stepsel.modeling.prep.helper

Helper functions for modeling prep: small utilities shared by several other functions.

Functions

get_interaction_type(interaction, ...)

Get the interaction type of an interaction.

relevel_categorical_variable(series, new_order)

Relevel a categorical variable.

parse_model_formula(formula)

Parse a model formula into its components.

recognize_variable_types(data, interaction_variables, ...)

Recognize the types of the variables.

Module Contents

stepsel.modeling.prep.helper.get_interaction_type(interaction: str, interaction_numerical_variables: list, interaction_categorical_variables: list)

Get the interaction type of an interaction.

Parameters:
  • interaction (str) – The interaction to get the type of.

  • interaction_numerical_variables (list) – The numerical variables that are used in the interactions.

  • interaction_categorical_variables (list) – The categorical variables that are used in the interactions.

Returns:

interaction_type – The interaction type. One of “numerical_numerical”, “categorical_categorical”, “numerical_categorical”, “categorical_numerical”.

Return type:

str

Raises:

ValueError – If the interaction does not contain exactly one ‘*’ character, or if either interaction variable is not in exactly one of the two lists.
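
The contract described above can be sketched in a few lines of plain Python. This is only an illustration, not the library's implementation: `sketch_interaction_type` is a hypothetical name, and the sketch assumes the returned string follows the order in which the two variables appear in the interaction.

```python
def sketch_interaction_type(interaction, numerical, categorical):
    """Hypothetical re-implementation of the documented contract."""
    # The documented contract requires exactly one '*' separator.
    if interaction.count("*") != 1:
        raise ValueError(f"Expected exactly one '*' in {interaction!r}")

    kinds = []
    for name in (part.strip() for part in interaction.split("*")):
        in_numerical = name in numerical
        in_categorical = name in categorical
        # Each variable must appear in exactly one of the two lists
        # (being in both, or in neither, is rejected).
        if in_numerical == in_categorical:
            raise ValueError(f"{name!r} must be in exactly one of the two lists")
        kinds.append("numerical" if in_numerical else "categorical")

    # Assumption: the type string mirrors the order of the variables.
    return "_".join(kinds)


print(sketch_interaction_type("a * b", ["a"], ["b"]))  # numerical_categorical
print(sketch_interaction_type("b * a", ["a"], ["b"]))  # categorical_numerical
```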

Examples

>>> get_interaction_type("a * b", ["a"], ["b"])
"numerical_categorical"

stepsel.modeling.prep.helper.relevel_categorical_variable(series: pandas.Series, new_order: list)

Relevel a categorical variable.

Parameters:
  • series (pd.Series) – The categorical variable to relevel.

  • new_order (list) – The new order of the categories.

Returns:

series – The relevelled categorical variable.

Return type:

pd.Series

Raises:

ValueError – If the new order is not a subset of the current categories, or if it contains duplicates.
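
The two error conditions can be illustrated without pandas. The sketch below is not the library's code: `check_new_order` is a hypothetical helper that takes the current categories as a plain list, and it says nothing about how the real function treats categories that are left out of `new_order`.

```python
def check_new_order(current_categories, new_order):
    """Hypothetical validator mirroring the two documented error cases."""
    # Case 1: the same category is listed more than once.
    if len(set(new_order)) != len(new_order):
        raise ValueError("new_order contains duplicates")

    # Case 2: new_order names a category the series does not have.
    unknown = [c for c in new_order if c not in current_categories]
    if unknown:
        raise ValueError(f"not among the current categories: {unknown}")

    return list(new_order)


print(check_new_order(["low", "mid", "high"], ["high", "low"]))  # ['high', 'low']
```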

stepsel.modeling.prep.helper.parse_model_formula(formula: str)

Parse a model formula into its components.

Parameters:

formula (str) – The model formula to parse.

Returns:

  • left_side_variables (list) – The variables on the left side of the formula.

  • interaction_variables (list) – The interaction variables on the right side of the formula.

  • non_interaction_variables (list) – The non-interaction variables on the right side of the formula.

Raises:

ValueError – If the formula does not contain exactly one ‘~’ character.
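
A minimal sketch of this kind of parsing, under stated assumptions: terms are separated by ‘+’, any right-side term containing ‘*’ counts as an interaction, and whitespace around terms is stripped. `sketch_parse_formula` is a hypothetical name, and the real function may handle more syntax than this.

```python
def sketch_parse_formula(formula):
    """Hypothetical parser for formulas such as 'y ~ a + b + a * b'."""
    # The documented contract requires exactly one '~' separator.
    if formula.count("~") != 1:
        raise ValueError(f"Expected exactly one '~' in {formula!r}")

    left, right = formula.split("~")
    # Assumption: left-side variables are also '+'-separated.
    left_side = [term.strip() for term in left.split("+") if term.strip()]

    interactions, non_interactions = [], []
    for term in (t.strip() for t in right.split("+")):
        if not term:
            continue  # ignore empty terms from stray '+' signs
        # Assumption: a '*' marks a term as an interaction.
        (interactions if "*" in term else non_interactions).append(term)

    return left_side, interactions, non_interactions


print(sketch_parse_formula("y ~ a + b + a * b"))
# (['y'], ['a * b'], ['a', 'b'])
```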

Examples

>>> parse_model_formula("y ~ a + b + a * b")
(["y"], ["a * b"], ["a", "b"])

stepsel.modeling.prep.helper.recognize_variable_types(data: pandas.DataFrame, interaction_variables: list, non_interaction_variables: list)

Recognize the types of the variables.

Parameters:
  • data (pd.DataFrame) – The data from which the variable types are inferred.

  • interaction_variables (list) – The interaction terms whose variables should be typed.

  • non_interaction_variables (list) – The non-interaction variables whose types should be recognized.

Returns:

dictionary – A dictionary containing the variable types, with the following keys:

  • interaction_numerical_variables (list) – The numerical variables in the interaction variables.

  • interaction_categorical_variables (list) – The categorical variables in the interaction variables.

  • non_interaction_numerical_variables (list) – The numerical variables in the non-interaction variables.

  • non_interaction_categorical_variables (list) – The categorical variables in the non-interaction variables.

  • interaction_variables (list) – The interaction variables.

Return type:

dict

Raises:

ValueError – If any interaction or non-interaction variable is neither numerical nor categorical.
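
The bucketing into the five keys can be sketched as follows. To keep the example dependency-free, a plain dict of column name to list of values stands in for the DataFrame. The typing rule (all numbers means numerical, all strings means categorical) and the names `column_kind` and `sketch_recognize_types` are assumptions made for illustration, not the library's actual logic.

```python
from numbers import Number


def column_kind(values):
    """Assumed rule: all numbers -> numerical, all strings -> categorical."""
    if all(isinstance(v, Number) and not isinstance(v, bool) for v in values):
        return "numerical"
    if all(isinstance(v, str) for v in values):
        return "categorical"
    raise ValueError("column is neither numerical nor categorical")


def sketch_recognize_types(columns, interaction_variables, non_interaction_variables):
    """Hypothetical stand-in; `columns` maps column names to lists of values."""
    result = {
        "interaction_numerical_variables": [],
        "interaction_categorical_variables": [],
        "non_interaction_numerical_variables": [],
        "non_interaction_categorical_variables": [],
        "interaction_variables": list(interaction_variables),
    }

    # Collect the distinct variables named inside interaction terms, in order.
    interaction_names = []
    for term in interaction_variables:
        for name in (part.strip() for part in term.split("*")):
            if name not in interaction_names:
                interaction_names.append(name)

    # Sort each variable into the bucket matching its role and its kind.
    for prefix, names in (("interaction", interaction_names),
                          ("non_interaction", non_interaction_variables)):
        for name in names:
            kind = column_kind(columns[name])
            result[f"{prefix}_{kind}_variables"].append(name)
    return result


columns = {"a": [1.0, 2.5, 3.0], "b": ["x", "y", "x"], "c": ["u", "v", "u"]}
types = sketch_recognize_types(columns, ["a * b"], ["a", "b", "c"])
print(types["interaction_categorical_variables"])      # ['b']
print(types["non_interaction_categorical_variables"])  # ['b', 'c']
```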

Examples

>>> recognize_variable_types(data, ["a * b"], ["a", "b", "c"])
{"non_interaction_numerical_variables": ["a"],
 "non_interaction_categorical_variables": ["b", "c"],
 "interaction_numerical_variables": ["a"],
 "interaction_categorical_variables": ["b"],
 "interaction_variables": ["a * b"]}