ONE_HOT_ENCODING

Create a one-hot encoding from a dataframe containing categorical features. Params: data : DataFrame The input dataframe containing the categorical features. feature_col : DataFrame A dataframe whose columns are used to create the one hot encoding. For example, if 'data' has columns ['a', 'b', 'c'] and 'feature_col' has columns ['a', 'b'], then the one hot encoding will be created only for columns ['a', 'b'] against 'data'. Defaults to None, meaning that all columns of categorizable objects are encoded. Returns: out : DataFrame The one hot encoding of the input features.

Python Code

from typing import Optional

import pandas as pd
from flojoy import DataFrame, flojoy


@flojoy
def ONE_HOT_ENCODING(
    data: DataFrame,
    feature_col: Optional[DataFrame] = None,
) -> DataFrame:
    """Create a one-hot encoding from a dataframe containing categorical features.

    Parameters
    ----------
    data : DataFrame
        The input dataframe containing the categorical features.
    feature_col: DataFrame, optional
        A dataframe whose columns are used to create the one hot encoding.
        For example, if 'data' has columns ['a', 'b', 'c'] and 'feature_col' has columns ['a', 'b'],
        then the one hot encoding will be created only for columns ['a', 'b'] against 'data'.
        Defaults to None, meaning that all columns of categorizable objects are encoded.

    Returns
    -------
    DataFrame
        The one hot encoding of the input features.
    """

    df = data.m
    if feature_col:
        encoded = pd.get_dummies(df, columns=feature_col.m.columns.to_list())

    else:
        cat_df = df.select_dtypes(include=["object", "category"]).columns.to_list()
        encoded = pd.get_dummies(df, columns=cat_df)

    return DataFrame(df=encoded)

Find this Flojoy Block on GitHub

Example

Having problems with this example app? Join our Discord community and we will help you out!

In this example, ONE_HOT_ENCODING is passed the tips dataset from the PLOTLY_DATASET node.

ONE_HOT_ENCODING is passed smoker,day for the columns parameter, so the output consists of a dataframe with one hot encodings for only the smoker and day columns from the input dataset.