stepsel.modeling.predict

The stepsel.modeling.predict module includes functions for making predictions.

Submodules

stepsel.modeling.predict.scoring_table

Classes

ScoringTableGLM

Class that holds a GLM scoring table and the functions for building, exporting, and applying it

Package Contents

class stepsel.modeling.predict.ScoringTableGLM(scoring_table: pandas.DataFrame)

Class that holds a GLM scoring table and the functions for building, exporting, and applying it

from_glm_model(model: GLMResultsWrapper, adjusted_coeffs: dict = None): Create a scoring table from the model fit. Suitable for statsmodels GLM.

from_csv(path: str, *args, **kwargs): Create a scoring table from a csv file.

sql(intercept_name: str = 'Intercept', format: bool = False): Returns SQL statement for scoring table

predict_linear(X: pd.DataFrame): Predict linear part of the GLM model based on the scoring table and model matrix.

to_sql(*args, **kwargs): Upload scoring table to the database

to_csv(path: str, *args, **kwargs): Save scoring table to a csv file

_reconstruct_model_matrix_columns(scoring_table: pd.DataFrame): Reconstruct model matrix columns and model variables based on the scoring table

required_columns = ['var1', 'var2', 'level_var1', 'level_var2', 'estimate']

scoring_table

classmethod from_glm_model(model: statsmodels.genmod.generalized_linear_model.GLMResultsWrapper, adjusted_coeffs: dict = None) → ScoringTableGLM

Create a scoring table from the model fit. Suitable for statsmodels GLM.

Parameters:

model (GLMResultsWrapper) – Model fit of statsmodels GLM.
adjusted_coeffs (dict, optional) – Dictionary of adjusted coefficients, by default None. The format is {variable_name: adjusted_coefficient}, where variable_name is the name of the variable in the model.

Example: {"ts_new9_g: 06": 0.20, "drpou_cpp_dop3: H": -1.74}

Returns:

Scoring table as a ScoringTableGLM object.

Return type:

ScoringTableGLM

Notes

The output of this function can be used to create a scoring SQL query once it has been uploaded to the database.
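To make the adjusted_coeffs format concrete, here is a minimal sketch using pandas only. It assumes that an adjusted value simply replaces the fitted coefficient with the same name and that all other coefficients are kept; this is one plausible reading of the parameter, not the library's implementation. The fitted values and the `final` variable are made up for illustration.

```python
import pandas as pd

# Hypothetical fitted coefficients, keyed by variable name as it appears in the model.
fitted = pd.Series(
    {"Intercept": -2.10, "ts_new9_g: 06": 0.35, "drpou_cpp_dop3: H": -1.20}
)

# Same format as the adjusted_coeffs argument: {variable_name: adjusted_coefficient}.
adjusted_coeffs = {"ts_new9_g: 06": 0.20, "drpou_cpp_dop3: H": -1.74}

# Guard against typos: every adjusted name should exist among the fitted names.
unknown = set(adjusted_coeffs) - set(fitted.index)
if unknown:
    raise KeyError(f"Not in the model: {sorted(unknown)}")

# Assumption: adjusted values override fitted ones; the rest stay untouched.
final = fitted.copy()
final.update(pd.Series(adjusted_coeffs))
print(final)
```

Checking the names before overriding is a design choice of this sketch: a misspelled key would otherwise be ignored silently by the update.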

classmethod from_csv(path: str, *args, **kwargs) → ScoringTableGLM

Create a scoring table from a csv file.

Parameters:

path (str) – Path to the csv file.
*args – Additional positional arguments passed to the pandas.read_csv() function.
**kwargs – Additional keyword arguments passed to the pandas.read_csv() function.

Returns:

Scoring table as a ScoringTableGLM object.

Return type:

ScoringTableGLM

Raises:

ValueError – If the csv file does not contain the required columns.
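The required-columns check can be pictured with the following sketch. It does not call stepsel; `read_scoring_csv` is a hypothetical helper, and the row layout (an intercept row with empty levels, a main effect with only var1 and level_var1 filled in) is an assumption made for the example.

```python
import io

import pandas as pd

# Column names taken from the required_columns attribute listed above.
REQUIRED_COLUMNS = ["var1", "var2", "level_var1", "level_var2", "estimate"]


def read_scoring_csv(path_or_buffer, **kwargs) -> pd.DataFrame:
    """Hypothetical helper: read a csv and verify the required columns exist."""
    table = pd.read_csv(path_or_buffer, **kwargs)
    missing = [c for c in REQUIRED_COLUMNS if c not in table.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
    return table


# A file with all required columns loads normally.
good_csv = io.StringIO(
    "var1,var2,level_var1,level_var2,estimate\n"
    "Intercept,,,,-2.1\n"
    "region,,north,,0.3\n"
)
table = read_scoring_csv(good_csv)

# A file lacking some of the columns is rejected with ValueError.
bad_csv = io.StringIO("var1,estimate\nIntercept,-2.1\n")
error_message = None
try:
    read_scoring_csv(bad_csv)
except ValueError as err:
    error_message = str(err)
print(error_message)
```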

__repr__() → str

Returns the string representation of the ScoringTableGLM object

Returns:

String representation of the ScoringTableGLM object

Return type:

str

static _reconstruct_model_matrix_columns(scoring_table) → tuple

Reconstruct model matrix columns, model features, and model variables based on the scoring table

Parameters:

scoring_table (pd.DataFrame) – Required columns: var1, var2, level_var1, level_var2

Returns:

Tuple of two lists and a dict: list of model matrix column names, list of model features, and dict of model variables with keys "numerical" and "categorical".

Return type:

tuple

Raises:

ValueError – If the type of a row in the scoring table is not recognized.

sql(intercept_name: str = 'Intercept', format: bool = False)

Returns SQL statement for scoring table

Parameters:

intercept_name (str, optional) – Name of the intercept variable, by default ‘Intercept’
format (bool, optional) – If True, the SQL statement is formatted for better readability, by default False

Returns:

str – SQL statement for scoring table

Uses

self.scoring_table (pandas.DataFrame) – Scoring table with columns: var1, var2, level_var1, level_var2, estimate.

Notes

If format=True, line breaks are added after each row, which makes the SQL statement easier to inspect manually. Use print() to display the statement in the console with the line breaks rendered.
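The point about print() is a general Python behavior and can be shown without the library. In the sketch below, `statement` is a stand-in string; the real output of sql(format=True) may look quite different.

```python
# Stand-in for a formatted statement; not actual output of sql(format=True).
statement = (
    "SELECT\n"
    "    -2.1\n"
    "    + CASE WHEN region = 'north' THEN 0.3 ELSE 0 END\n"
    "AS linear_predictor"
)

# Echoing the value in a REPL or notebook shows its repr:
# a single line in which each line break appears as a literal \n.
shown_by_repr = repr(statement)
print(shown_by_repr)

# print() renders the line breaks, one row per line.
print(statement)
```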

predict_linear(X: pandas.DataFrame) → pandas.Series

Predict linear part of the GLM model based on the scoring table and model matrix.

Parameters:

X (pd.DataFrame) – Model matrix as created by the prepare_model_matrix() function.

Returns:

pd.Series – Linear part of the GLM model

Uses

scoring_table (pd.DataFrame) – Required columns: var1, var2, level_var1, level_var2
model_matrix_columns (list[str]) – List of model matrix column names as created by the _reconstruct_model_matrix_columns() function.
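Conceptually, the linear part of a GLM is the model matrix multiplied by the coefficient vector. The sketch below illustrates that idea with pandas only. The column names, the presence of an explicit Intercept column, and the `estimates` series are assumptions for the example; they do not describe what prepare_model_matrix() or predict_linear() actually produce.

```python
import pandas as pd

# Hypothetical estimates keyed by model matrix column name.
estimates = pd.Series({"Intercept": -2.0, "region: north": 0.5, "age": 0.01})

# Hypothetical model matrix: one row per observation, one column per term.
# The extra "unused" column is not referenced by the estimates.
X = pd.DataFrame(
    {
        "Intercept": [1.0, 1.0, 1.0],
        "region: north": [1.0, 0.0, 1.0],
        "age": [30.0, 45.0, 60.0],
        "unused": [9.0, 9.0, 9.0],
    }
)

# Keep only the columns that have an estimate, in matching order,
# then take the row-wise dot product with the estimates.
linear = X[estimates.index].dot(estimates)
print(linear)
```

Selecting the columns by the estimate names before multiplying keeps the result independent of the column order in X and ignores columns the model does not use.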

to_sql(*args, **kwargs) → None

Upload scoring table to the database

Parameters:

*args, **kwargs – Arguments passed to the pandas.DataFrame.to_sql() function.

Uses

self.scoring_table (pd.DataFrame) – Required columns: var1, var2, level_var1, level_var2, estimate
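Since the arguments are forwarded to pandas.DataFrame.to_sql(), the call below shows what such arguments can look like. It is a sketch on a plain DataFrame with an in-memory SQLite database; the table name and the example rows (intercept, main effect, interaction) are assumptions, not data produced by the library.

```python
import sqlite3

import pandas as pd

# Hypothetical scoring table with the required columns.
scoring_table = pd.DataFrame(
    {
        "var1": ["Intercept", "region", "region"],
        "var2": [None, None, "segment"],
        "level_var1": [None, "north", "north"],
        "level_var2": [None, None, "retail"],
        "estimate": [-2.1, 0.3, 0.05],
    }
)

# In-memory SQLite database, so the example leaves nothing behind.
conn = sqlite3.connect(":memory:")

# The same arguments one would pass through to_sql(*args, **kwargs):
# table name, connection, and keyword options of DataFrame.to_sql().
scoring_table.to_sql("scoring_table", conn, index=False, if_exists="replace")

# Read the table back to confirm the upload.
stored = pd.read_sql_query("SELECT * FROM scoring_table", conn)
conn.close()
print(stored)
```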

to_csv(*args, **kwargs) → None

Save scoring table as csv file

Parameters:

*args, **kwargs – Arguments passed to the pandas.DataFrame.to_csv() function.

Uses

self.scoring_table (pd.DataFrame) – Required columns: var1, var2, level_var1, level_var2, estimate