stepsel.modeling.prep.model_matrix
==================================

.. py:module:: stepsel.modeling.prep.model_matrix


Functions
---------

.. autoapisummary::

   stepsel.modeling.prep.model_matrix.prepare_model_matrix
   stepsel.modeling.prep.model_matrix.adjust_model_matrix


Module Contents
---------------

.. py:function:: prepare_model_matrix(formula: str, data: pandas.DataFrame, intercept: bool = True, drop_first: bool = True, omit_left_side_variables: bool = False)

   Prepare a model matrix based on a formula and a data set.

   TODO: If intercept = False, keep all the levels of the first categorical variable.

   :param formula: The formula for the model.
   :type formula: str
   :param data: The data set.
   :type data: pandas.DataFrame
   :param intercept: Whether to include an intercept in the model matrix. Default is True.
   :type intercept: bool, optional
   :param drop_first: Whether to drop the first level of each categorical variable. Default is True.
   :type drop_first: bool, optional
   :param omit_left_side_variables: Whether to omit the left side variables from the output. Default is False.
                                    If True, the function will return only the model matrix and the feature IDs.
   :type omit_left_side_variables: bool, optional

   :returns: * **y** (*pandas.Series*) -- The response variable. If omit_left_side_variables is True, the function won't return y.
             * **model_matrix** (*pandas.DataFrame*) -- The model matrix.
             * **feature_ids** (*list*) -- The feature IDs.

   :raises ValueError: If interaction type is not supported.

   .. rubric:: Notes

   The function will create a model matrix based on the formula and the data set.
   Categories will be dummy-encoded.
   Interaction terms will be created and dummy-encoded if necessary.
   The feature IDs will be a list of strings of the variable names corresponding to the columns of the model matrix.

   .. rubric:: Examples

   >>> import pandas as pd
   >>> import numpy as np
   >>> from stepsel.modeling.prep import prepare_model_matrix
   >>> data = pd.DataFrame({"y": np.random.normal(size=100),
   ...                      "x1": np.random.normal(size=100),
   ...                      "x2": np.random.choice(["A", "B", "C"], size=100),
   ...                      "x3": np.random.choice(["A", "B", "C"], size=100)})
   >>> data[["x2", "x3"]] = data[["x2", "x3"]].astype("category")
   >>> y, model_matrix, feature_ids = prepare_model_matrix("y ~ x1 + x2 + x3 + x1*x2 + x1*x3", data)


.. py:function:: adjust_model_matrix(model_matrices: list, adjusted_coeffs: dict, offsets: list = None)

   Adjust model matrix (and offset) based on adjusted coefficients dictionary.

   :param model_matrices: The model matrices.
   :type model_matrices: list (of data frames)
   :param adjusted_coeffs: The adjusted coefficients dictionary.
                           The format of the dictionary is as follows: {variable_name: adjusted_coefficient}
                           Variable_name is the name of the variable in the model.
                           Example: {"ts_new9_g: 06": 0.20, "drpou_cpp_dop3: H": -1.74}
   :type adjusted_coeffs: dict
   :param offsets: The offsets. Default is None.
   :type offsets: list (of numpy arrays or pandas Series), optional

   :returns: * **model_matrices** (*tuple (of data frames)*) -- The adjusted model matrices.
             * **offsets** (*tuple (of numpy arrays or pandas Series)*) -- The adjusted offsets.

   :raises Exception: If the number of offsets is not equal to the number of model matrices.
                      If the number of rows in the model matrix is not equal to the number of offset values.

   .. rubric:: Notes

   The function will adjust the model matrices and offsets based on the adjusted coefficients dictionary.
   The function will delete the variables from the model matrices and add the adjusted coefficients to the offsets.
   The function will return a tuple of the adjusted model matrices and offsets.

   Adjustments are done in-place. If both matrices and offsets are provided, re-assignment is not necessary.
   If one wants to keep the original model matrices and offsets, make a copy of them before calling the function.
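
The adjustment described in the notes can be sketched with plain pandas and NumPy. This is a minimal illustration of the idea, not the stepsel implementation: ``fold_into_offset`` is a hypothetical helper, and the ``"x2: B"`` column name is an assumed naming pattern modeled on the dictionary example above. Each adjusted column is multiplied by its fixed coefficient, added to the offset, and then dropped from the matrix. Copies are passed in so the originals are left untouched.

```python
import numpy as np
import pandas as pd


def fold_into_offset(model_matrix, adjusted_coeffs, offset):
    """Hypothetical helper (not stepsel's code): move fixed-coefficient
    columns out of the model matrix and into the offset, in place."""
    for name, coeff in adjusted_coeffs.items():
        if name not in model_matrix.columns:
            continue  # nothing to adjust for this variable
        # The column's contribution to the linear predictor becomes
        # a fixed term in the offset.
        offset += coeff * model_matrix[name].to_numpy()
        # Drop the column in place so the model no longer estimates it.
        model_matrix.drop(columns=name, inplace=True)
    return model_matrix, offset


# Toy model matrix; the column names are assumptions for illustration.
X = pd.DataFrame({"Intercept": [1.0, 1.0, 1.0],
                  "x2: B": [0.0, 1.0, 1.0],
                  "x1": [0.5, -1.0, 2.0]})
offset = np.zeros(len(X))

# Copy first so the originals survive the in-place adjustment.
X_adj, offset_adj = fold_into_offset(X.copy(), {"x2: B": 0.2}, offset.copy())
```

After the call, ``X_adj`` no longer contains the ``"x2: B"`` column and ``offset_adj`` carries its fixed contribution, while ``X`` and ``offset`` are unchanged because copies were passed in.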