stepsel.modeling.prep.interaction

Preprocessing functions for interaction terms.

Functions

interaction_categorical_numerical(series1, series2)

Create an interaction term between a categorical and a numerical variable.

interaction_categorical_categorical(series1, series2)

Create an interaction term between two categorical variables.

interaction_numerical_numerical(series1, series2)

Create an interaction term between two numerical variables.

Module Contents

stepsel.modeling.prep.interaction.interaction_categorical_numerical(series1: pandas.Series, series2: pandas.Series)

Create an interaction term between a categorical and a numerical variable.

Parameters:
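  • series1 (pandas.Series) – The first series: either the categorical or the numerical variable.

  • series2 (pandas.Series) – The second series: the other variable, so that one series is categorical and the other numerical.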

Returns:

interaction_df – A DataFrame with one interaction column per category of the categorical variable.

Return type:

pandas.DataFrame

Raises:

ValueError – If the two series are not exactly one categorical and one numerical variable.

Notes

The function creates dummy variables for the categorical variable and multiplies each of them by the numerical variable. Each resulting column is named “categorical_variable: category * numerical_variable”.
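The dummy-times-numerical construction described above can be sketched in plain pandas as follows. This is a minimal illustration, not stepsel's implementation: the function name `sketch_interaction_categorical_numerical` is hypothetical, and it assumes the argument order does not matter (as the Raises entry suggests) and that `pd.get_dummies` is an acceptable way to build the indicator columns.

```python
import pandas as pd


def sketch_interaction_categorical_numerical(series1: pd.Series, series2: pd.Series) -> pd.DataFrame:
    """Hypothetical re-implementation of the documented behaviour (not stepsel's code)."""
    is_cat1 = isinstance(series1.dtype, pd.CategoricalDtype)
    is_cat2 = isinstance(series2.dtype, pd.CategoricalDtype)
    # Assumption: argument order does not matter, but exactly one input must be categorical.
    if is_cat1 == is_cat2:
        raise ValueError("Exactly one of the series must be categorical.")
    categorical, numerical = (series1, series2) if is_cat1 else (series2, series1)
    if not pd.api.types.is_numeric_dtype(numerical):
        raise ValueError("The non-categorical series must be numerical.")

    # One 0/1 indicator column per category of the categorical series.
    dummies = pd.get_dummies(categorical, dtype=float)
    # Multiply every indicator column by the numerical values, aligned row by row.
    interaction_df = dummies.mul(numerical, axis=0)
    # Name columns as "categorical_variable: category * numerical_variable".
    interaction_df.columns = [
        f"{categorical.name}: {category} * {numerical.name}" for category in dummies.columns
    ]
    return interaction_df
```

Passing `dtype=float` to `pd.get_dummies` keeps the indicator columns, and therefore the products, in floating point. Because each row has exactly one indicator set to 1, summing a row of the result recovers that row's numerical value.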

Examples

>>> import pandas as pd
>>> import numpy as np
>>> from stepsel.modeling.prep import interaction_categorical_numerical
>>> categorical_series = pd.Series(np.random.choice(["A", "B", "C"], size=10), name="categorical").astype("category")
>>> numerical_series = pd.Series(np.random.normal(size=10), name="numerical")
>>> interaction_categorical_numerical(categorical_series, numerical_series)
        categorical: A * numerical  categorical: B * numerical  categorical: C * numerical
...

The printed values depend on the random draw. In each row, only the column for that row's category holds the numerical value; the other columns are zero.

stepsel.modeling.prep.interaction.interaction_categorical_categorical(series1: pandas.Series, series2: pandas.Series)

Create an interaction term between two categorical variables.

Parameters:
  • series1 (pandas.Series) – The first categorical series.

  • series2 (pandas.Series) – The second categorical series.

Returns:

interaction – A categorical Series whose values combine, row by row, the categories of both inputs.

Return type:

pandas.Series

Raises:

ValueError – If series1 or series2 is not categorical.

Notes

The function creates an interaction term between the two categorical variables. The resulting Series is named “categorical_variable1 * categorical_variable2”, and its values take the form “category1 * category2”.
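A minimal pandas sketch of the labelling scheme described above follows. It is not stepsel's implementation: the function name `sketch_interaction_categorical_categorical` is hypothetical, and building the labels by string concatenation via `astype(str)` is an assumption chosen for clarity.

```python
import pandas as pd


def sketch_interaction_categorical_categorical(series1: pd.Series, series2: pd.Series) -> pd.Series:
    """Hypothetical sketch of the documented behaviour (not stepsel's code)."""
    for series in (series1, series2):
        if not isinstance(series.dtype, pd.CategoricalDtype):
            raise ValueError(f"Series {series.name!r} is not categorical.")
    # Join the two labels row by row, e.g. "A" and "X" -> "A * X".
    labels = series1.astype(str) + " * " + series2.astype(str)
    # Store the combined labels as a new categorical variable.
    interaction = labels.astype("category")
    interaction.name = f"{series1.name} * {series2.name}"
    return interaction
```

One caveat of this sketch: because it converts the categories to strings, a missing value would become the literal text "nan" inside the combined label rather than staying missing.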

Examples

>>> import pandas as pd
>>> import numpy as np
>>> from stepsel.modeling.prep import interaction_categorical_categorical
>>> categorical_series1 = pd.Series(["A", "B", "C"], name="categorical1").astype("category")
>>> categorical_series2 = pd.Series(["X", "Y", "Z"], name="categorical2").astype("category")
>>> interaction_categorical_categorical(categorical_series1, categorical_series2)
0    A * X
1    B * Y
2    C * Z
Name: categorical1 * categorical2, dtype: category
Categories (3, object): ['A * X', 'B * Y', 'C * Z']

stepsel.modeling.prep.interaction.interaction_numerical_numerical(series1: pandas.Series, series2: pandas.Series)

Create an interaction term between two numerical variables.

Parameters:
  • series1 (pandas.Series) – The first numerical series.

  • series2 (pandas.Series) – The second numerical series.

Returns:

interaction – A Series holding the element-wise product of the two inputs.

Return type:

pandas.Series

Raises:

ValueError – If series1 or series2 is not numerical.

Notes

The function creates an interaction term between the two numerical variables and names the resulting Series “numerical_variable1 * numerical_variable2”.
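For two numerical variables, the interaction term is conventionally their row-wise product. The sketch below assumes that convention; the function name `sketch_interaction_numerical_numerical` is hypothetical and the code is an illustration, not stepsel's implementation.

```python
import pandas as pd


def sketch_interaction_numerical_numerical(series1: pd.Series, series2: pd.Series) -> pd.Series:
    """Hypothetical sketch: the interaction is the element-wise product."""
    for series in (series1, series2):
        if not pd.api.types.is_numeric_dtype(series):
            raise ValueError(f"Series {series.name!r} is not numerical.")
    # pandas aligns the two series on their index and multiplies row by row.
    interaction = series1 * series2
    interaction.name = f"{series1.name} * {series2.name}"
    return interaction
```

Note that `pd.api.types.is_numeric_dtype` returns False for a categorical Series, even one with numeric categories, so such input is rejected by this check.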

Examples

>>> import pandas as pd
>>> import numpy as np
>>> from stepsel.modeling.prep import interaction_numerical_numerical
>>> numerical_series1 = pd.Series(np.random.normal(size=10), name="numerical1")
>>> numerical_series2 = pd.Series(np.random.normal(size=10), name="numerical2")
>>> interaction_numerical_numerical(numerical_series1, numerical_series2)
0    -0.626453
1    -0.720788
2    -0.632650
...
Name: numerical1 * numerical2, dtype: float64