stepsel.modeling.stepwise_glm
=============================

.. py:module:: stepsel.modeling.stepwise_glm


Classes
-------

.. autoapisummary::

   stepsel.modeling.stepwise_glm.StepwiseGLM


Module Contents
---------------

.. py:class:: StepwiseGLM(formula: str, data: pandas.DataFrame, include: list[str] | None = None, slentry: float = 0.001, slstay: float = 0.001, model_fit_log: pandas.DataFrame = None, family: Any | None = None, offset: Any | None = None, exposure: Any | None = None, freq_weights: Any | None = None, var_weights: Any | None = None, missing: str = 'none', **kwargs: Any)

   Class for performing stepwise feature selection for a generalized linear model (GLM).

   .. method:: fit()

      Fit the GLM using stepwise feature selection.


   .. py:attribute:: formula


   .. py:attribute:: data


   .. py:attribute:: include


   .. py:attribute:: slentry


   .. py:attribute:: slstay


   .. py:attribute:: family


   .. py:attribute:: offset


   .. py:attribute:: exposure


   .. py:attribute:: freq_weights


   .. py:attribute:: var_weights


   .. py:attribute:: missing


   .. py:attribute:: kwargs


   .. py:attribute:: full_formula


   .. py:attribute:: feature_ids


   .. py:attribute:: model_fit_log


   .. py:attribute:: current_model
      :value: None


   .. py:attribute:: current_model_features
      :value: None


   .. py:method:: _get_model_string(features: pandas.Series) -> str
      :staticmethod:

      Get the model string from the features.

      :param features: Features to include in the model.
      :type features: pd.Series

      :returns: Model string.
      :rtype: str

      .. rubric:: Notes

      The model string is used as a unique identifier for the model.
      Only the exogenous variables are included in the model string,
      and they are sorted alphabetically.


   .. py:method:: _log_model_fit(features: pandas.Series, fit: statsmodels.api.GLM) -> None

      Log a model fit.

      :param features: Features to include in the model.
      :type features: pd.Series
      :param fit: Fitted model.
      :type fit: sm.GLM

      :returns: None

      .. rubric:: Modifies

      * **self.model_fit_log** (*pd.DataFrame*) -- Log of model fits; a row is added for the model.

      .. rubric:: Notes

      The model string is used as a unique identifier for the model.
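
      Because the variables are sorted, the model string does not depend on the
      order in which features entered the model, so the same feature set always
      maps to the same key. The minimal sketch below illustrates that idea with a
      plain ``dict`` standing in for the ``pandas.DataFrame`` log. The ``" + "``
      separator, the helper names ``get_model_string`` and ``log_model_fit``, and
      the ``llf`` field are hypothetical and are not ``stepsel``'s actual format.

      .. code-block:: python

         from typing import Dict, Iterable


         def get_model_string(features: Iterable[str]) -> str:
             """Hypothetical key format: exogenous variable names, sorted, joined by ' + '."""
             return " + ".join(sorted(features))


         def log_model_fit(log: Dict[str, dict], features: Iterable[str], llf: float) -> None:
             """Store one fit under its model string (the log-likelihood is an example field)."""
             log[get_model_string(features)] = {"llf": llf}


         fit_log: Dict[str, dict] = {}
         log_model_fit(fit_log, ["region", "age"], llf=-120.5)
         log_model_fit(fit_log, ["age", "region"], llf=-120.5)  # same features, other order
         print(list(fit_log))  # ['age + region']

      In this sketch a repeated fit overwrites its existing entry instead of
      adding a second row; whether ``stepsel`` deduplicates in the same way is an
      assumption, not something stated in this reference.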


   .. py:method:: _model_fit(features: list[str]) -> None

      Fit the model.

      :param features: Features to include in the model.
      :type features: list[str]

      :returns: **fit** -- Fitted model.
      :rtype: sm.GLM


   .. py:method:: _add1(scope: list[str]) -> str | None

      Add one feature to the model if it improves the model fit.

      :param scope: List of features to consider for adding to the model.
      :type scope: list[str]

      :returns: **best_new_eligible_feature** -- Best feature to add to the model, or None if no feature improves the model fit.
      :rtype: str | None

      .. rubric:: Modifies

      * **self.current_model** (*sm.GLM*) -- Fitted model, if a feature is added.
      * **self.current_model_features** (*list[str]*) -- Features in the model, if a feature is added.
      * **self.model_fit_log** (*pd.DataFrame*) -- Log of model fits; a row is added for the model.

      .. rubric:: Notes

      A feature is considered to improve the model fit if the p-value of the
      likelihood ratio test for adding it is below ``slentry``.


   .. py:method:: _drop1(scope: list[str]) -> str | None

      Drop one feature from the model if dropping it does not worsen the model fit.

      :param scope: List of features to consider for dropping from the model.
      :type scope: list[str]

      :returns: **best_dropped_eligible_feature** -- Best feature to drop from the model, or None if no feature can be dropped without worsening the model fit.
      :rtype: str | None

      .. rubric:: Modifies

      * **self.current_model** (*sm.GLM*) -- Fitted model, if a feature is dropped.
      * **self.current_model_features** (*list[str]*) -- Features in the model, if a feature is dropped.
      * **self.model_fit_log** (*pd.DataFrame*) -- Log of model fits; a row is added for the model.

      .. rubric:: Notes

      Dropping a feature is considered not to worsen the model fit if the
      p-value of the likelihood ratio test for that feature is above ``slstay``.


   .. py:method:: fit()

      Fit the model using stepwise selection.

      .. rubric:: Modifies

      * **self.current_model** (*sm.GLM*) -- Fitted model.
      * **self.current_model_features** (*list[str]*) -- Features in the model.
      * **self.model_fit_log** (*pd.DataFrame*) -- Log of model fits.

      .. rubric:: Notes

      Selection is done in a forward-backward fashion.
      It starts with the intercept and the features in ``self.include``.
      It then adds a feature if doing so improves the model fit, and drops a
      feature if doing so does not worsen the model fit. Adding and dropping
      are repeated until no feature can be added or dropped.
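
      The forward-backward loop described in these Notes can be sketched without
      ``statsmodels`` by working directly with log-likelihoods. The sketch below
      assumes that a caller-supplied ``loglik(features)`` returns the maximised
      log-likelihood of the GLM with an intercept plus those features, that the
      "best" candidate is the one with the smallest p-value to add (or the largest
      p-value to drop), and that features in ``include`` are never dropped. The
      names ``stepwise_select``, ``lrt_pvalue``, ``chi2_sf``, ``loglik`` and
      ``n_params`` are hypothetical, not ``stepsel``'s API, and the toy
      log-likelihood gains are made up for illustration. The chi-square tail is
      computed with the standard library only; ``scipy.stats.chi2.sf`` returns
      the same quantity.

      .. code-block:: python

         import math
         from typing import Callable, Dict, List, Optional, Sequence


         def chi2_sf(x: float, df: int) -> float:
             """Upper-tail probability P(X >= x) of a chi-square distribution with integer df >= 1."""
             if x <= 0.0:
                 return 1.0
             y = x / 2.0
             if df % 2 == 0:
                 # Even df: Q(df/2, y) = exp(-y) * sum_{i < df/2} y**i / i!
                 term = total = 1.0
                 for i in range(1, df // 2):
                     term *= y / i
                     total += term
                 return math.exp(-y) * total
             # Odd df: start from Q(1/2, y) = erfc(sqrt(y)), then step up with
             # Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
             total = math.erfc(math.sqrt(y))
             a = 0.5
             for _ in range((df - 1) // 2):
                 total += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
                 a += 1.0
             return total


         def lrt_pvalue(llf_small: float, llf_large: float, df_diff: int) -> float:
             """Likelihood ratio test p-value for two nested models (small is nested in large)."""
             statistic = max(2.0 * (llf_large - llf_small), 0.0)
             return chi2_sf(statistic, df_diff)


         def stepwise_select(
             candidates: Sequence[str],
             loglik: Callable[[List[str]], float],
             include: Sequence[str] = (),
             slentry: float = 0.001,
             slstay: float = 0.001,
             n_params: Optional[Dict[str, int]] = None,
             max_steps: int = 100,
         ) -> List[str]:
             """Forward-backward selection driven by likelihood ratio test p-values."""
             df = n_params or {}  # parameters added per feature; defaults to 1
             selected = list(include)  # intercept (inside loglik) plus forced features
             for _ in range(max_steps):
                 changed = False

                 # Forward step: add the candidate with the smallest p-value if below slentry.
                 base = loglik(selected)
                 pool = [f for f in candidates if f not in selected]
                 if pool:
                     p_add = {f: lrt_pvalue(base, loglik(selected + [f]), df.get(f, 1)) for f in pool}
                     best = min(p_add, key=p_add.get)
                     if p_add[best] < slentry:
                         selected.append(best)
                         changed = True

                 # Backward step: drop the non-forced feature with the largest p-value if above slstay.
                 base = loglik(selected)
                 droppable = [f for f in selected if f not in include]
                 if droppable:
                     p_drop = {
                         f: lrt_pvalue(loglik([g for g in selected if g != f]), base, df.get(f, 1))
                         for f in droppable
                     }
                     worst = max(p_drop, key=p_drop.get)
                     if p_drop[worst] > slstay:
                         selected.remove(worst)
                         changed = True

                 if not changed:  # no feature can be added or dropped
                     break
             return selected


         # Made-up log-likelihood gains per feature, for illustration only.
         GAINS = {"age": 30.0, "region": 12.0, "noise": 0.4}


         def toy_loglik(features: List[str]) -> float:
             return -500.0 + sum(GAINS[f] for f in features)


         print(stepwise_select(["age", "region", "noise"], toy_loglik))  # ['age', 'region']

      With ``slentry <= slstay`` (the defaults are equal), a feature added in the
      forward step cannot be removed in the backward step of the same pass,
      because both tests compare the same pair of models; ``max_steps`` guards
      against cycling when the thresholds are set the other way round.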