
1.0.2 Create Dummy Variables with the OneHotEncoder

Description:

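Convert columns containing categorical values into numeric dummy variables with scikit-learn's OneHotEncoder. Null values (np.nan) are handled as well: they are treated as a category of their own and get a separate dummy column.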

Example:

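# Import the encoder from scikit-learn
from sklearn.preprocessing import OneHotEncoder

# Instantiate a OneHotEncoder used to create dummy variables; sparse_output=False returns the results as a dense array
# (scikit-learn versions before 1.2 name this parameter sparse)
my_OneHotEncoder = OneHotEncoder(sparse_output=False)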

########## Example:  
# Create an example DF with categorical columns and nulls (np.nan)
import pandas as pd
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Empty DF
DF_train = pd.DataFrame()

# Add columns
DF_train['column_1'] = ['A','B','C']
DF_train['column_2'] = ['E', np.nan,'G']

print( DF_train ) 
#  column_1 column_2
#  A        E
#  B      NaN
#  C        G


# Instantiate a OneHotEncoder used to create dummy variables; sparse_output=False returns the results as a dense array
# (scikit-learn versions before 1.2 name this parameter sparse)
my_OneHotEncoder = OneHotEncoder(sparse_output=False)

# fit the encoder on the columns to create dummy variables
my_OneHotEncoder.fit(DF_train[['column_1', 'column_2']])

# Create dummy values for the columns and save the results as an array
array = my_OneHotEncoder.transform(DF_train[['column_1', 'column_2']])

# Convert the array to a dataframe and add the auto-generated column names 
DF = pd.DataFrame(array, columns = my_OneHotEncoder.get_feature_names_out())

print(DF)

#  column_1_A  column_1_B  column_1_C  column_2_E  column_2_G  column_2_nan
#  1.0         0.0         0.0         1.0         0.0         0.0
#  0.0         1.0         0.0         0.0         0.0         1.0
#  0.0         0.0         1.0         0.0         1.0         0.0