stepsel.modeling

The stepsel.modeling module provides tools for statistical modeling and related tasks.

Classes

StepwiseGLM

Class for performing stepwise GLM.

Package Contents

class stepsel.modeling.StepwiseGLM(formula: str, data: pandas.DataFrame, include: list[str] | None = None, slentry: float = 0.001, slstay: float = 0.001, model_fit_log: pandas.DataFrame = None, family: Any | None = None, offset: Any | None = None, exposure: Any | None = None, freq_weights: Any | None = None, var_weights: Any | None = None, missing: str = 'none', **kwargs: Any)

Class for performing stepwise GLM.

fit()

Fit stepwise GLM.

formula
data
include
slentry
slstay
family
offset
exposure
freq_weights
var_weights
missing
kwargs
full_formula
feature_ids
model_fit_log
current_model = None
current_model_features = None
static _get_model_string(features: pandas.Series) -> str

Get model string from features.

Parameters:

features (pd.Series) – Features to include in model.

Returns:

Model string.

Return type:

str

Notes

The model string is used as a unique identifier for the model. Only the exogenous variables are included in the model string, and they are sorted alphabetically.
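The idea behind the model string can be illustrated with a minimal sketch. The helper name model_string and the " + " separator are assumptions made for this illustration; the original does not specify how the sorted names are joined, and the actual method takes a pd.Series rather than a plain list.

```python
from typing import Iterable


def model_string(features: Iterable[str], sep: str = " + ") -> str:
    """Build an order-independent identifier from feature names.

    Hypothetical helper: the separator is an assumption, not taken
    from the stepsel source.
    """
    # Sorting makes the identifier independent of the order in which
    # the features were added to the model.
    return sep.join(sorted(features))


# The same set of features always maps to the same identifier.
assert model_string(["region", "age"]) == model_string(["age", "region"])
print(model_string(["region", "age"]))  # age + region
```

Because the identifier does not depend on feature order, a model that has already been fitted can be recognized in the log even when the selection path reaches it by a different route.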

_log_model_fit(features: pandas.Series, fit: statsmodels.api.GLM) -> None

Log model fit.

Parameters:
  • features (pd.Series) – Features to include in model.

  • fit (sm.GLM) – Fitted model.

Returns:

None

Modifies:

self.model_fit_log (pd.DataFrame) – Log of model fits; adds a row for the model.

Notes

The model string is used as a unique identifier for the model.

_model_fit(features: list[str]) -> None

Fit model.

Parameters:

features (list[str]) – Features to include in model.

Returns:

fit – Fitted model.

Return type:

sm.GLM

_add1(scope: list[str]) -> str | None

Add one feature to the model if it improves the model fit.

Parameters:

scope (list[str]) – List of features to consider for adding to the model.

Returns:

best_new_eligible_feature (str | None) – Best feature to add to the model. None if no feature improves the model fit.

Modifies:

  • self.current_model (sm.GLM) – Fitted model if a feature is added.

  • self.current_model_features (list[str]) – Features in the model if a feature is added.

  • self.model_fit_log (pd.DataFrame) – Log of model fits; adds a row for the model.

Notes

The model fit is improved if the p-value of the likelihood ratio test is below slentry.
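The entry criterion can be sketched as follows. This is an illustration under stated assumptions, not the stepsel implementation: the function names are hypothetical, and the candidate feature is assumed to add exactly one parameter, so the test statistic is compared against a chi-squared distribution with one degree of freedom (whose survival function can be written with math.erfc). For features that add several parameters, scipy.stats.chi2.sf with the matching degrees of freedom would be needed instead. The log-likelihoods correspond to what statsmodels exposes as the llf attribute of a fitted result.

```python
import math


def lr_test_pvalue(llf_reduced: float, llf_full: float) -> float:
    """p-value of a likelihood ratio test with one degree of freedom.

    Hypothetical helper. Assumes the full model has exactly one more
    parameter than the reduced model.
    """
    # LR statistic: twice the gain in log-likelihood (clipped at zero
    # to guard against tiny negative values from numerical noise).
    stat = max(2.0 * (llf_full - llf_reduced), 0.0)
    # Survival function of chi-squared with 1 df: P(Z**2 > stat).
    return math.erfc(math.sqrt(stat / 2.0))


def improves_fit(llf_reduced: float, llf_full: float,
                 slentry: float = 0.001) -> bool:
    """A feature qualifies for entry when the p-value is below slentry."""
    return lr_test_pvalue(llf_reduced, llf_full) < slentry


# A gain of 10 log-likelihood units is significant at slentry = 0.001,
# a gain of 1 unit is not.
print(improves_fit(-100.0, -90.0))
print(improves_fit(-100.0, -99.0))
```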

_drop1(scope: list[str]) -> str | None

Drop one feature from the model if it does not worsen the model fit.

Parameters:

scope (list[str]) – List of features to consider for dropping from the model.

Returns:

best_dropped_eligible_feature (str | None) – Best feature to drop from the model. None if no feature can be dropped without worsening the model fit.

Modifies:

  • self.current_model (sm.GLM) – Fitted model if a feature is dropped.

  • self.current_model_features (list[str]) – Features in the model if a feature is dropped.

  • self.model_fit_log (pd.DataFrame) – Log of model fits; adds a row for the model.

Notes

The model fit is worsened if the p-value of the likelihood ratio test is above slstay.
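One plausible way to pick the feature to drop is sketched below. The original does not say how the "best" candidate is chosen, so selecting the feature with the largest p-value is an assumption, as are the helper name pick_feature_to_drop and the example p-values. Each p-value is taken to come from a likelihood ratio test of the current model against the model without that feature.

```python
from typing import Dict, Optional


def pick_feature_to_drop(p_values: Dict[str, float],
                         slstay: float = 0.001) -> Optional[str]:
    """Return the least significant feature if it may be dropped.

    Hypothetical helper. `p_values` maps each droppable feature to the
    p-value of the test comparing the model with and without it.
    """
    if not p_values:
        return None
    # Assumption: the best candidate is the one whose removal hurts
    # the fit the least, i.e. the one with the largest p-value.
    candidate = max(p_values, key=p_values.get)
    # Dropping is allowed only when the p-value is above slstay.
    return candidate if p_values[candidate] > slstay else None


print(pick_feature_to_drop({"age": 1e-6, "region": 0.2}))  # region
print(pick_feature_to_drop({"age": 1e-6}))                 # None
```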

fit()

Fit the model using stepwise selection.

Modifies:

  • self.current_model (sm.GLM) – Fitted model.

  • self.current_model_features (list[str]) – Features in the model.

  • self.model_fit_log (pd.DataFrame) – Log of model fits.

Notes

Selection proceeds in a forward-backward fashion. It starts with the intercept and the features in self.include. It then adds a feature if doing so improves the model fit, and drops a feature if doing so does not worsen the model fit. Adding and dropping are repeated until no feature can be added or dropped.
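The forward-backward loop described above can be sketched generically. Everything here is illustrative rather than the stepsel implementation: stepwise_select, p_add and p_drop are hypothetical names, the p-values come from caller-supplied functions instead of fitted GLMs, features in include are assumed never to be dropped, and max_steps is an added safeguard against cycling.

```python
from typing import Callable, FrozenSet, List, Sequence

# Signature of a function returning the p-value for adding or dropping
# `feature` given the features currently in the model.
PValueFn = Callable[[FrozenSet[str], str], float]


def stepwise_select(candidates: Sequence[str], include: Sequence[str],
                    p_add: PValueFn, p_drop: PValueFn,
                    slentry: float = 0.001, slstay: float = 0.001,
                    max_steps: int = 100) -> List[str]:
    """Forward-backward selection driven by p-value callbacks."""
    current = list(include)  # start from the always-included features
    for _ in range(max_steps):
        changed = False
        # Forward step: add the most significant outside feature.
        outside = [f for f in candidates if f not in current]
        if outside:
            model = frozenset(current)
            best = min(outside, key=lambda f: p_add(model, f))
            if p_add(model, best) < slentry:
                current.append(best)
                changed = True
        # Backward step: drop the least significant droppable feature.
        droppable = [f for f in current if f not in include]
        if droppable:
            model = frozenset(current)
            worst = max(droppable, key=lambda f: p_drop(model, f))
            if p_drop(model, worst) > slstay:
                current.remove(worst)
                changed = True
        if not changed:  # nothing added or dropped: selection is done
            break
    return current


# Toy p-values that do not depend on the current model (made up).
toy = {"age": 1e-6, "region": 0.2, "bmi": 1e-4}
selected = stepwise_select(["age", "region", "bmi"], ["sex"],
                           p_add=lambda model, f: toy[f],
                           p_drop=lambda model, f: toy[f])
print(selected)  # ['sex', 'age', 'bmi']
```

In a real run the callbacks would refit the model and run the likelihood ratio test, so the p-value of a feature changes as other features enter or leave. That is why the backward step is re-evaluated after every addition.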