stepsel.modeling.predict
The stepsel.modeling.predict module includes functions for making predictions.
Classes
ScoringTableGLM – Class for scoring table and related functions
Package Contents
- class stepsel.modeling.predict.ScoringTableGLM(scoring_table: pandas.DataFrame)
Class for scoring table and related functions
- from_glm_model(model: GLMResultsWrapper, adjusted_coeffs: dict = None)
Create a scoring table from the model fit. Suitable for statsmodels GLM.
- predict_linear(X: pd.DataFrame)
Predict the linear part of the GLM model based on the scoring table and model matrix.
- _reconstruct_model_matrix_columns(scoring_table: pd.DataFrame)
Reconstruct model matrix columns and model variables based on the scoring table.
- required_columns = ['var1', 'var2', 'level_var1', 'level_var2', 'estimate']
- scoring_table
- classmethod from_glm_model(model: statsmodels.genmod.generalized_linear_model.GLMResultsWrapper, adjusted_coeffs: dict = None) → ScoringTableGLM
Create a scoring table from the model fit. Suitable for statsmodels GLM.
- Parameters:
model (GLMResultsWrapper) – Model fit of statsmodels GLM.
adjusted_coeffs (dict, optional) –
Dictionary of adjusted coefficients, by default None. The dictionary has the format {variable_name: adjusted_coefficient}, where variable_name is the name of the variable in the model.
Example: {"ts_new9_g: 06": 0.20, "drpou_cpp_dop3: H": -1.74}
- Returns:
Scoring table as a ScoringTableGLM object.
- Return type:
ScoringTableGLM
Notes
The output of this method can be uploaded to a database and used to build a scoring SQL query.
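The adjusted_coeffs override can be pictured as a dictionary merge over the fitted parameters. The sketch below is a minimal illustration, assuming that each adjusted value simply replaces the fitted estimate stored under the same name; apply_adjusted_coeffs is a hypothetical helper, not part of stepsel, and it works on a plain dict rather than on a GLMResultsWrapper.

```python
from typing import Optional


def apply_adjusted_coeffs(
    params: dict,
    adjusted_coeffs: Optional[dict] = None,
) -> dict:
    """Return a copy of `params` with adjusted coefficients substituted.

    Hypothetical helper: assumes every key in `adjusted_coeffs` must match an
    existing parameter name and that the adjusted value replaces the fit.
    """
    result = dict(params)  # never mutate the fitted parameters
    if not adjusted_coeffs:
        return result
    unknown = sorted(set(adjusted_coeffs) - set(result))
    if unknown:
        # Catch typos in variable names instead of silently adding new terms
        raise KeyError(f"Unknown coefficient names: {unknown}")
    result.update(adjusted_coeffs)
    return result


# Illustrative values only; the key format follows the adjusted_coeffs example
fitted = {"Intercept": -2.1, "ts_new9_g: 06": 0.35, "drpou_cpp_dop3: H": -1.52}
adjusted = apply_adjusted_coeffs(
    fitted, {"ts_new9_g: 06": 0.20, "drpou_cpp_dop3: H": -1.74}
)
```

With the class itself, the same dictionary is passed as ScoringTableGLM.from_glm_model(fit, adjusted_coeffs={...}), where fit is a fitted statsmodels GLM result.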
- classmethod from_csv(path: str, *args, **kwargs) → ScoringTableGLM
Create a scoring table from a csv file.
- Parameters:
path (str) – Path to the csv file.
*args – Additional positional arguments passed to pandas.read_csv().
**kwargs – Additional keyword arguments passed to pandas.read_csv().
- Returns:
Scoring table as a ScoringTableGLM object.
- Return type:
ScoringTableGLM
- Raises:
ValueError – If the csv file does not contain the required columns.
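The required-column check can be sketched with the standard library. This is an illustration only, assuming the check compares the CSV header against the names in required_columns; check_required_columns is a hypothetical helper, and the real method reads the file with pandas.read_csv() rather than the csv module.

```python
import csv
import io

# Column names as listed in ScoringTableGLM.required_columns
REQUIRED_COLUMNS = ["var1", "var2", "level_var1", "level_var2", "estimate"]


def check_required_columns(csv_text: str, delimiter: str = ",") -> list:
    """Return the CSV header, raising ValueError if a required column is missing.

    Hypothetical helper for illustration; it only inspects the header row.
    """
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    header = list(reader.fieldnames or [])
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"CSV is missing required columns: {missing}")
    return header
```

With the class itself, extra keyword arguments are forwarded to pandas.read_csv(), so a semicolon-separated file could be loaded with ScoringTableGLM.from_csv("scoring_table.csv", sep=";"), where the file name is hypothetical.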
- __repr__() → str
Returns a string representation of the ScoringTableGLM object.
- Returns:
String representation of the ScoringTableGLM object
- Return type:
str
- static _reconstruct_model_matrix_columns(scoring_table) → tuple
Reconstruct the model matrix column names, model features, and model variables from the scoring table.
- Parameters:
scoring_table (pd.DataFrame) – Required columns: var1, var2, level_var1, level_var2
- Returns:
Tuple of two lists and a dict: the list of model matrix column names, the list of model features, and a dict of model variables with keys "numerical" and "categorical".
- Return type:
tuple
- Raises:
ValueError – If the type of the row in the scoring table is not recognized.
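One way to picture the reconstruction is to classify each scoring-table row by which of var1, var2, level_var1 and level_var2 are filled in. The sketch below is hypothetical: the row-type rules, the intercept handling and the "variable: level" column naming (borrowed from the adjusted_coeffs example) are assumptions, not stepsel's actual logic. Rows are plain dicts instead of DataFrame rows.

```python
def reconstruct_model_matrix_columns(rows, intercept_name="Intercept"):
    """Hypothetical sketch returning (columns, features, variables)."""
    columns, features = [], []
    variables = {"numerical": [], "categorical": []}

    def register(name, level):
        # Assumption: a filled level marks a categorical variable
        kind = "categorical" if level else "numerical"
        if name not in variables[kind]:
            variables[kind].append(name)
        # Assumed naming: "variable: level" for categorical columns
        return f"{name}: {level}" if level else name

    for row in rows:
        var1, var2 = row.get("var1") or "", row.get("var2") or ""
        lvl1, lvl2 = row.get("level_var1") or "", row.get("level_var2") or ""
        if var1 == intercept_name and not var2:
            column, feature = intercept_name, intercept_name
        elif var1 and not var2 and not lvl2:
            # Main effect of a single variable
            column, feature = register(var1, lvl1), var1
        elif var1 and var2:
            # Interaction term (assumed "a * b" naming)
            column = f"{register(var1, lvl1)} * {register(var2, lvl2)}"
            feature = f"{var1} * {var2}"
        else:
            raise ValueError(f"Unrecognized scoring table row: {row!r}")
        columns.append(column)
        if feature not in features:
            features.append(feature)
    return columns, features, variables
```

Raising on any row that fits none of the patterns mirrors the documented ValueError behavior, so a malformed scoring table fails early instead of silently producing the wrong columns.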
- sql(intercept_name: str = 'Intercept', format: bool = False)
Returns the SQL statement for the scoring table.
- Parameters:
intercept_name (str, optional) – Name of the intercept variable, by default ‘Intercept’
format (bool, optional) – If True, the SQL statement is formatted for better readability, by default False
- Returns:
str – SQL statement for scoring table
Uses
self.scoring_table (pandas.DataFrame) – Scoring table with columns: var1, var2, level_var1, level_var2, estimate.
Notes
If format=True, a line break is added after each row, which is useful for manual inspection of the SQL statement. Use print() to display the statement in the console with the line breaks rendered.
- predict_linear(X: pandas.DataFrame) → pandas.Series
Predict the linear part of the GLM model based on the scoring table and model matrix.
- Parameters:
X (pd.DataFrame) – Model matrix as created by the prepare_model_matrix() function.
- Returns:
pd.Series – Linear part of the GLM model
Uses
scoring_table (pd.DataFrame) – Required columns: var1, var2, level_var1, level_var2
model_matrix_columns (list[str]) – List of model matrix column names as created by the _reconstruct_model_matrix_columns() function.
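The linear part of a GLM is the linear predictor η = Xβ: for each row, the sum of model-matrix values times their estimates. The sketch below is a minimal stand-in, assuming the model matrix is a list of dicts keyed by the same column names as the coefficients; predict_linear here is a hypothetical function, not the stepsel method, and it uses plain Python instead of pandas.

```python
import math


def predict_linear(model_matrix, coefficients):
    """Hypothetical sketch: eta_i = sum_j x_ij * beta_j for each row.

    model_matrix: list of dicts mapping model-matrix column name -> value.
    coefficients: dict mapping the same column names -> estimate.
    A missing column raises KeyError rather than being treated as zero.
    """
    return [
        math.fsum(row[column] * beta for column, beta in coefficients.items())
        for row in model_matrix
    ]


coefficients = {"Intercept": -2.0, "age": 0.01, "region: north": 0.3}
model_matrix = [
    {"Intercept": 1.0, "age": 40.0, "region: north": 1.0},
    {"Intercept": 1.0, "age": 25.0, "region: north": 0.0},
]
eta = predict_linear(model_matrix, coefficients)
```

Failing on a missing column, rather than defaulting it to zero, keeps a mismatch between the model matrix and the scoring table from going unnoticed. For a GLM with a log link, the prediction on the response scale would then be math.exp(eta_i); other links need their own inverse.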