stepsel.modeling.stepwise_glm
Classes
StepwiseGLM – Class for performing stepwise GLM.
Module Contents
- class stepsel.modeling.stepwise_glm.StepwiseGLM(formula: str, data: pandas.DataFrame, include: list[str] | None = None, slentry: float = 0.001, slstay: float = 0.001, model_fit_log: pandas.DataFrame = None, family: Any | None = None, offset: Any | None = None, exposure: Any | None = None, freq_weights: Any | None = None, var_weights: Any | None = None, missing: str = 'none', **kwargs: Any)
Class for performing stepwise GLM.
- formula
- data
- include
- slentry
- slstay
- family
- offset
- exposure
- freq_weights
- var_weights
- missing
- kwargs
- full_formula
- feature_ids
- model_fit_log
- current_model = None
- current_model_features = None
- static _get_model_string(features: pandas.Series) -> str
Get model string from features.
- Parameters:
features (pd.Series) – Features to include in model.
- Returns:
Model string.
- Return type:
str
Notes
The model string is used as unique identifier for the model. Only the exogenous variables are included in the model string. The variables are sorted alphabetically.
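A minimal sketch of how such an order-independent identifier could be built. The function name `model_string`, the `" + "` separator, and the feature names are assumptions for illustration only; the library's actual string format is not documented here.

```python
from typing import Iterable


def model_string(features: Iterable[str], sep: str = " + ") -> str:
    """Build an identifier from exogenous feature names (hypothetical helper).

    Sorting makes the result independent of the order in which
    features were added to the model.
    """
    return sep.join(sorted(features))


# The same feature set in a different order maps to the same identifier.
a = model_string(["veh_age", "area", "gender"])
b = model_string(["gender", "veh_age", "area"])
```

Because the identifier ignores ordering, a model that is reached by two different add/drop paths can be recognised as the same model in the fit log.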
- _log_model_fit(features: pandas.Series, fit: statsmodels.api.GLM) -> None
Log model fit.
- Parameters:
features (pd.Series) – Features to include in model.
fit (sm.GLM) – Fitted model.
- Returns:
None
Modifies
self.model_fit_log (pd.DataFrame) – Log of model fits - adds row for model.
Notes
The model string is used as unique identifier for the model.
- _model_fit(features: list[str]) -> None
Fit model.
- Parameters:
features (list[str]) – Features to include in model.
- Returns:
fit – Fitted model.
- Return type:
sm.GLM
- _add1(scope: list[str]) -> str | None
Add one feature to the model if it improves the model fit.
- Parameters:
scope (list[str]) – List of features to consider for adding to the model.
- Returns:
best_new_eligible_feature (str | None) – Best feature to add to the model. None if no feature improves the model fit.
Modifies
self.current_model (sm.GLM) – Fitted model if a feature is added.
self.current_model_features (list[str]) – Features in the model if a feature is added.
self.model_fit_log (pd.DataFrame) – Log of model fits - adds row for model.
Notes
The model fit is improved if the p-value of the likelihood ratio test is below slentry.
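The entry criterion can be illustrated with a small likelihood ratio test. This is a sketch under stated assumptions, not the library's implementation: it assumes the candidate feature adds exactly one parameter (one degree of freedom), so the chi-squared tail probability reduces to `erfc(sqrt(LR / 2))`. The helper name `lr_test_pvalue_1df` and the log-likelihood values are hypothetical. A feature with several levels would need the general chi-squared survival function instead.

```python
import math


def lr_test_pvalue_1df(llf_reduced: float, llf_full: float) -> float:
    """P-value of a likelihood ratio test with 1 degree of freedom.

    llf_reduced: log-likelihood of the model without the candidate feature.
    llf_full:    log-likelihood of the model with the candidate feature.
    """
    # LR statistic; clipped at 0 to guard against tiny negative values
    # caused by numerical noise in the fits.
    lr_stat = max(0.0, 2.0 * (llf_full - llf_reduced))
    # For 1 df, P(chi2 > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(lr_stat / 2.0))


slentry = 0.001  # default entry threshold from the class signature

# Made-up log-likelihoods: the larger model improves llf by 10 units.
p_value = lr_test_pvalue_1df(llf_reduced=-1250.0, llf_full=-1240.0)
enters = p_value < slentry  # the feature is eligible only below slentry
```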
- _drop1(scope: list[str]) -> str | None
Drop one feature from the model if it does not worsen the model fit.
- Parameters:
scope (list[str]) – List of features to consider for dropping from the model.
- Returns:
best_dropped_eligible_feature (str | None) – Best feature to drop from the model. None if no feature can be dropped without worsening the model fit.
Modifies
self.current_model (sm.GLM) – Fitted model if a feature is dropped.
self.current_model_features (list[str]) – Features in the model if a feature is dropped.
self.model_fit_log (pd.DataFrame) – Log of model fits - adds row for model.
Notes
The model fit is worsened if the p-value of the likelihood ratio test is above slstay.
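One way to pick the feature to drop, given a likelihood ratio p-value for removing each feature in scope, is sketched here. The helper `pick_feature_to_drop` and the rule "drop the feature with the largest p-value" are assumptions; the documentation only states that a feature may be dropped when its p-value is above slstay.

```python
from typing import Mapping, Optional


def pick_feature_to_drop(
    pvalues: Mapping[str, float], slstay: float = 0.001
) -> Optional[str]:
    """Return the least significant feature if it fails the stay criterion.

    pvalues maps each droppable feature to the p-value of the likelihood
    ratio test comparing the current model with the model lacking it.
    """
    if not pvalues:
        return None
    # Assumed tie-breaking rule: the largest p-value is the weakest feature.
    worst = max(pvalues, key=pvalues.get)
    return worst if pvalues[worst] > slstay else None


# Made-up p-values: "noise" is far above slstay, "age" is well below it.
dropped = pick_feature_to_drop({"age": 1e-6, "noise": 0.4})
kept_all = pick_feature_to_drop({"age": 1e-6})
```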
- fit()
Fit the model using stepwise selection.
Modifies
self.current_model (sm.GLM) – Fitted model.
self.current_model_features (list[str]) – Features in the model.
self.model_fit_log (pd.DataFrame) – Log of model fits.
Notes
Selection is done in a forward-backward fashion. It starts with the intercept and features in self.include. Then, it adds a feature if it improves the model fit. Then, it drops a feature if it does not worsen the model fit. Adding and dropping is repeated until no feature can be added or dropped.
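The forward-backward loop described above can be sketched in plain Python. This is an illustration under assumptions, not the library's code: `stepwise`, `pvalue_add`, and `pvalue_drop` are hypothetical names, the p-value callbacks stand in for real likelihood ratio tests on fitted GLMs, the features in `include` are assumed never to be dropped, and `max_iter` is an added safeguard against cycling.

```python
from typing import Callable, List, Sequence

PValueFn = Callable[[List[str], str], float]


def stepwise(
    candidates: Sequence[str],
    include: Sequence[str],
    pvalue_add: PValueFn,
    pvalue_drop: PValueFn,
    slentry: float = 0.001,
    slstay: float = 0.001,
    max_iter: int = 100,
) -> List[str]:
    """Alternate add and drop steps until neither changes the model."""
    selected = list(include)  # start from the intercept plus forced features
    for _ in range(max_iter):
        changed = False

        # Forward step: add the most significant candidate, if below slentry.
        remaining = [f for f in candidates if f not in selected]
        if remaining:
            scores = {f: pvalue_add(selected, f) for f in remaining}
            best = min(scores, key=scores.get)
            if scores[best] < slentry:
                selected.append(best)
                changed = True

        # Backward step: drop the least significant feature, if above slstay.
        droppable = [f for f in selected if f not in include]
        if droppable:
            scores = {f: pvalue_drop(selected, f) for f in droppable}
            worst = max(scores, key=scores.get)
            if scores[worst] > slstay:
                selected.remove(worst)
                changed = True

        if not changed:
            break
    return selected


# Toy p-values that do not depend on the current model (illustration only).
toy = {"age": 1e-6, "area": 1e-4, "noise": 0.4}
result = stepwise(
    candidates=list(toy),
    include=["gender"],
    pvalue_add=lambda selected, f: toy[f],
    pvalue_drop=lambda selected, f: toy[f],
)
```

With these toy values, `age` and `area` enter in successive iterations, `noise` never passes the entry threshold, and the loop stops once an iteration neither adds nor drops a feature.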