1.0.2 Create Dummy Variables with the OneHotEncoder
Description:
Convert a column containing categorical values to numeric dummy variables. Null values (np.nan) are handled as well: they get their own dummy column.
Example:
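from sklearn.preprocessing import OneHotEncoder
# instantiate a OneHotEncoder used to create dummy variables; sparse=False returns the results as an array
# (scikit-learn 1.2 renamed this parameter to sparse_output)
my_OneHotEncoder = OneHotEncoder(sparse=False)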
########## Example:
# Create an example DF with categorical columns and nulls (np.nan)
import pandas as pd
import numpy as np
# Empty DF
DF_train = pd.DataFrame()
# Add columns
DF_train['column_1'] = ['A','B','C']
DF_train['column_2'] = ['E', np.nan,'G']
print( DF_train )
# column_1 column_2
# A E
# B NaN
# C G
# instantiate a OneHotEncoder used to create dummy variables; sparse=False returns the results as an array
my_OneHotEncoder = OneHotEncoder(sparse=False)
# fit the encoder on the columns to create dummy variables
my_OneHotEncoder.fit(DF_train[['column_1', 'column_2']])
# Create dummy values for the columns and save the results as an array
array = my_OneHotEncoder.transform(DF_train[['column_1', 'column_2']])
# Convert the array to a dataframe and add the auto-generated column names
DF = pd.DataFrame(array, columns = my_OneHotEncoder.get_feature_names_out())
print(DF)
# column_1_A column_1_B column_1_C column_2_E column_2_G column_2_nan
# 1.0 0.0 0.0 1.0 0.0 0.0
# 0.0 1.0 0.0 0.0 0.0 1.0
# 0.0 0.0 1.0 0.0 1.0 0.0