respy.pre_processing.model_processing

Process model specification files or objects.

Module Contents

Functions

process_params_and_options(params, options)

Process params and options.

_read_options(dict_or_path)

Read the options which can either be a dictionary or a path.

_create_internal_seeds_from_user_seeds(options)

Create internal seeds from user input.

_read_params(df_or_series)

Read the parameters which can either be a path, a Series, or a DataFrame.

_parse_parameters(params, options)

Parse the parameter vector into a dictionary of model quantities.

_parse_present_bias_parameter(optim_paras, params)

Parse present-bias parameter which is 1 by default.

_parse_exogenous_processes(optim_paras, params)

Parse exogenous processes.

_parse_observables(optim_paras, params)

Parse observed variables and their levels.

_parse_choices(optim_paras, params, options)

Define unique order of choices.

_parse_choice_parameters(optim_paras, params)

Parse utility parameters for choices.

_parse_initial_and_max_experience(optim_paras, params, ...)

Process initial experience distributions and maximum experience.

_parse_shocks(optim_paras, params)

Parse the shock parameters and create the Cholesky factor.

_parse_measurement_errors(optim_paras, params)

Parse the standard deviations of measurement errors.

_parse_types(optim_paras, params)

Parse type shifts and type parameters.

_infer_number_of_types(params)

Infer the number of types from parameters, which is one by default.

_infer_choices_with_experience(params, options)

Infer choices with experience.

_infer_choices_with_prefix(params, prefix)

Infer choices with prefix.

_parse_lagged_choices(optim_paras, options, params)

Parse lagged choices from covariates and params.

_parse_probabilities_or_logit_coefficients(params, ...)

Parse probabilities or logit coefficients of parameter groups.

_parse_observable_or_exog_process_names(params, keyword)

Parse the names of observables or exogenous processes.

_sync_optim_paras_and_options(optim_paras, options)

Sync optim_paras and options after they have been parsed separately.

_add_type_covariates(options, optim_paras)

Add type covariates.

_add_default_is_inadmissible(options, optim_paras)

Add default negative choice set constraints.

_convert_labels_in_formulas_to_codes(options, optim_paras)

Convert labels in covariates, filters and inadmissible formulas to codes.

_replace_in_single_or_double_quotes(val, from_, to)

Replace a value in a string enclosed in single or double quotes.

_replace_choices_and_observables_in_formula(formula, ...)

Replace choices and observables in formula.

_convert_labels_in_filters_to_codes(optim_paras, options)

Convert labels in "core_state_space_filters" to codes.

_parse_cache_directory(options)

Parse the location of the cache.

respy.pre_processing.model_processing.process_params_and_options(params, options)

Process params and options.

This function is the interface for parsing the model specification given by the user.

respy.pre_processing.model_processing._read_options(dict_or_path)

Read the options which can either be a dictionary or a path.

respy.pre_processing.model_processing._create_internal_seeds_from_user_seeds(options)

Create internal seeds from user input.

Instead of reusing the same seed, we use sequences of seeds incrementing by one. This ensures that we do not accidentally draw the same randomness twice.

As naive sequences started by the seeds given by the user might be overlapping, the user seeds are used to generate seeds within certain ranges. The seed for the

  • solution is between 1,000,000 and 2,000,000.

  • simulation is between 4,000,000 and 5,000,000.

  • likelihood estimation is between 7,000,000 and 8,000,000.

Furthermore, we need two sequences of seeds. The first sequence is for building simulate() or log_like(), where “startup” seeds are used to generate the draws. The second sequence starts at seed_start + SEED_STARTUP_ITERATION_GAP and has to be reset to the initial value at the beginning of every iteration.

See Randomness and Reproducibility for more information.

Examples

>>> options = {"solution_seed": 1, "simulation_seed": 2, "estimation_seed": 3}
>>> options = _create_internal_seeds_from_user_seeds(options)
>>> options["solution_seed_startup"], options["solution_seed_iteration"]
(count(1128037), count(2128037))
>>> options["simulation_seed_startup"], options["simulation_seed_iteration"]
(count(4875688), count(5875688))
>>> options["estimation_seed_startup"], options["estimation_seed_iteration"]
(count(7071530), count(8071530))
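
The scheme described above can be sketched with the standard library alone. This is a minimal illustration, not respy's implementation: the helper name create_internal_seeds, the use of random.Random, and the gap of 1,000,000 (suggested by the example output, but an assumption) are hypothetical, so the concrete seeds differ from the ones shown above.

```python
import itertools
import random

# Assumed gap between the startup and the iteration sequence.
SEED_STARTUP_ITERATION_GAP = 1_000_000

# Documented ranges from which the first seed of each sequence is drawn.
SEED_RANGES = {
    "solution_seed": (1_000_000, 2_000_000),
    "simulation_seed": (4_000_000, 5_000_000),
    "estimation_seed": (7_000_000, 8_000_000),
}


def create_internal_seeds(options):
    """Map each user seed to a startup and an iteration counter (illustrative)."""
    out = dict(options)
    for key, (low, high) in SEED_RANGES.items():
        # Seed a private generator with the user seed, then draw the start of
        # the sequence from the range reserved for this kind of seed.
        start = random.Random(options[key]).randrange(low, high)
        out[f"{key}_startup"] = itertools.count(start)
        out[f"{key}_iteration"] = itertools.count(start + SEED_STARTUP_ITERATION_GAP)
    return out


options = create_internal_seeds(
    {"solution_seed": 1, "simulation_seed": 2, "estimation_seed": 3}
)
# Every call to next() yields a fresh seed, so no seed is used twice.
first = next(options["solution_seed_startup"])
second = next(options["solution_seed_startup"])
first_iteration = next(options["solution_seed_iteration"])
```

Because the counters advance by one, resetting the iteration sequence at the start of every iteration amounts to creating a new counter from the same starting value.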

respy.pre_processing.model_processing._read_params(df_or_series)

Read the parameters which can either be a path, a Series, or a DataFrame.

respy.pre_processing.model_processing._parse_parameters(params, options)

Parse the parameter vector into a dictionary of model quantities.

respy.pre_processing.model_processing._parse_present_bias_parameter(optim_paras, params)

Parse present-bias parameter which is 1 by default.

Examples

An example where present-bias parameter is specified:

>>> tuples = [("beta", "beta")]
>>> index = pd.MultiIndex.from_tuples(tuples, names=["category", "name"])
>>> params = pd.Series(data=0.4, index=index)
>>> optim_paras = {"delta": 0.95}
>>> _parse_present_bias_parameter(optim_paras, params)
{'delta': 0.95, 'beta': 0.4, 'beta_delta': 0.38}

And one where present-bias parameter is not specified:

>>> params = pd.Series(dtype="float64")
>>> optim_paras = {"delta": 0.95}
>>> _parse_present_bias_parameter(optim_paras, params)
{'delta': 0.95, 'beta': 1, 'beta_delta': 0.95}

respy.pre_processing.model_processing._parse_exogenous_processes(optim_paras, params)

Parse exogenous processes.

respy.pre_processing.model_processing._parse_observables(optim_paras, params)

Parse observed variables and their levels.

respy.pre_processing.model_processing._parse_choices(optim_paras, params, options)

Define unique order of choices.

This function defines a unique order of choices. Choices can be separated into choices with experience and wage, choices with experience but without wage, and choices without experience and wage. This distinction is used to create a unique ordering of choices. Within each group, we order alphabetically.
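
The grouping can be illustrated with a short sketch. It assumes that the groups are concatenated in the order in which they are listed above; the function name order_choices and its arguments are hypothetical and not part of respy.

```python
def order_choices(choices_w_exp, choices_w_wage, all_choices):
    """Order choices by group and alphabetically within each group (illustrative)."""
    # Group 1: choices with experience and wage.
    with_exp_and_wage = sorted(set(choices_w_exp) & set(choices_w_wage))
    # Group 2: choices with experience but without wage.
    with_exp_only = sorted(set(choices_w_exp) - set(choices_w_wage))
    # Group 3: choices without experience and wage.
    without_exp = sorted(set(all_choices) - set(choices_w_exp))
    return with_exp_and_wage + with_exp_only + without_exp


ordered = order_choices(
    choices_w_exp=["edu", "b", "a"],
    choices_w_wage=["b", "a"],
    all_choices=["home", "edu", "b", "a"],
)
# ordered is ['a', 'b', 'edu', 'home']
```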

respy.pre_processing.model_processing._parse_choice_parameters(optim_paras, params)

Parse utility parameters for choices.

respy.pre_processing.model_processing._parse_initial_and_max_experience(optim_paras, params, options)

Process initial experience distributions and maximum experience.

respy.pre_processing.model_processing._parse_shocks(optim_paras, params)

Parse the shock parameters and create the Cholesky factor.

respy.pre_processing.model_processing._parse_measurement_errors(optim_paras, params)

Parse the standard deviations of measurement errors.

Measurement errors can be provided for all or none of the choices with wages. Measurement errors for non-wage choices are ignored.

optim_paras["has_meas_error"] is only False if there are no standard deviations of measurement errors in params, not if they are all zero. Otherwise, we would introduce a kink into the likelihood function.

respy.pre_processing.model_processing._parse_types(optim_paras, params)

Parse type shifts and type parameters.

It is not explicitly enforced that all types have the same covariates, but it is implicitly enforced that the parameters form a valid matrix.

respy.pre_processing.model_processing._infer_number_of_types(params)

Infer the number of types from parameters, which is one by default.

Examples

An example without types:

>>> tuples = [("wage_a", "constant"), ("nonpec_edu", "exp_edu")]
>>> index = pd.MultiIndex.from_tuples(tuples, names=["category", "name"])
>>> s = pd.Series(index=index, dtype="object")
>>> _infer_number_of_types(s)
1

And one with types:

>>> tuples = [("wage_a", "type_3"), ("nonpec_edu", "type_2")]
>>> index = pd.MultiIndex.from_tuples(tuples, names=["category", "name"])
>>> s = pd.Series(index=index, dtype="object")
>>> _infer_number_of_types(s)
4

respy.pre_processing.model_processing._infer_choices_with_experience(params, options)

Infer choices with experience.

Examples

>>> options = {"covariates": {"a": "exp_white_collar + exp_a", "b": "exp_b >= 2"}}
>>> index = pd.MultiIndex.from_product([["category"], ["a", "b"]])
>>> params = pd.Series(index=index, dtype="object")
>>> _infer_choices_with_experience(params, options)
['a', 'b', 'white_collar']

respy.pre_processing.model_processing._infer_choices_with_prefix(params, prefix)

Infer choices with prefix.

Examples

>>> params = pd.Series(
...     index=["wage_b", "wage_white_collar", "wage_a", "nonpec_c"], dtype="object"
... )
>>> _infer_choices_with_prefix(params, "wage")
['a', 'b', 'white_collar']

respy.pre_processing.model_processing._parse_lagged_choices(optim_paras, options, params)

Parse lagged choices from covariates and params.

Lagged choices can only influence behavior of individuals through covariates of the utility function. Thus, check the covariates for any patterns like “lagged_choice_[0-9]+”.

Then, compare the number of lags required by covariates with the information on lagged choices in the parameter specification. For the estimation, there does not have to be any information on lagged choices. For the simulation, we need parameters to define the probability of a choice being the lagged choice.

Warning

UserWarning

If not enough lagged choices are specified in params, so that the model can only be used for estimation.

UserWarning

If the model contains superfluous definitions of lagged choices.
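
A minimal sketch of the two steps, assuming covariates are given as a dictionary of formula strings. The helper names infer_required_lags and check_lagged_choices, the warning messages, and the example covariates are hypothetical; only the pattern lagged_choice_[0-9]+ and the two UserWarning cases come from the description above.

```python
import re
import warnings


def infer_required_lags(covariates):
    """Return the highest lag referenced in any covariate formula."""
    formulas = " ".join(covariates.values())
    matches = re.findall(r"lagged_choice_([0-9]+)", formulas)
    return max((int(match) for match in matches), default=0)


def check_lagged_choices(n_lags_covariates, n_lags_params):
    """Warn if covariates and params disagree on the number of lags."""
    if n_lags_params < n_lags_covariates:
        warnings.warn(
            "Not enough lagged choices in params; estimation only.", UserWarning
        )
    elif n_lags_params > n_lags_covariates:
        warnings.warn("Superfluous lagged choices in params.", UserWarning)


covariates = {
    "not_edu_last_period": "lagged_choice_1 != 'edu'",
    "edu_two_periods_ago": "lagged_choice_2 == 'edu'",
}
n_lags = infer_required_lags(covariates)
```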

respy.pre_processing.model_processing._parse_probabilities_or_logit_coefficients(params, regex_for_levels)

Parse probabilities or logit coefficients of parameter groups.

Some parameters form a group to specify a distribution. On the one hand, the parameters can be probabilities from a probability mass function. For example, see the specification of initial years of schooling in the extended model of Keane and Wolpin (1997).

On the other hand, parameters and their corresponding covariates can form the inputs of a scipy.special.softmax() which generates the probability mass function. This distribution can be more complex.

Internally, probabilities are also converted to logit coefficients to align the interfaces. To convert probabilities to the appropriate multinomial logit (softmax) coefficients, use a constant for covariates and note that the sum in the denominator is equal for all probabilities and, thus, can be treated as a constant. The following formula shows that the multinomial coefficients which produce the same probability mass function are equal to the logs of probabilities.

\[\begin{split}p_i &= \frac{e^{x_i \beta_i}}{\sum_j e^{x_j \beta_j}} \\ &= \frac{e^{\beta_i}}{\sum_j e^{\beta_j}} \\ \log(p_i) &= \beta_i - \log\Big(\sum_j e^{\beta_j}\Big) \\ &= \beta_i - C\end{split}\]

Raises:
ValueError

If probabilities and multinomial logit coefficients are mixed.

Warning

The user is warned if the discrete probabilities of a probability mass function do not sum to one.
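
The conversion from probabilities to logit coefficients can be checked numerically. The sketch below uses a small pure-Python softmax as a stand-in for scipy.special.softmax(); the probabilities are made up for illustration.

```python
import math


def softmax(values):
    """Plain-Python softmax, standing in for scipy.special.softmax."""
    largest = max(values)
    # Subtracting the maximum avoids overflow and does not change the result.
    exps = [math.exp(value - largest) for value in values]
    total = sum(exps)
    return [value / total for value in exps]


probabilities = [0.2, 0.3, 0.5]
# With a constant covariate, the logs of the probabilities act as coefficients.
coefficients = [math.log(p) for p in probabilities]
recovered = softmax(coefficients)
# Adding the same constant C to every coefficient leaves the result unchanged.
shifted = softmax([coefficient + 3.0 for coefficient in coefficients])
```

Both recovered and shifted reproduce the original probabilities up to floating-point error, which is why the constant C in the formula can be dropped.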

respy.pre_processing.model_processing._parse_observable_or_exog_process_names(params, keyword)

Parse the names of observables or exogenous processes.

The function accepts params and a keyword like observable and separates the names of the variables from their possible realizations.

Parameters:
params : pandas.Series

Contains the parameters of a model.

keyword : {"exogenous_process", "observable"}

Keyword for a group of parameters.

Examples

>>> index = pd.MultiIndex.from_tuples([
...     ("observable_observable_0_first", "probability"),
...     ("observable_observable_0_second", "probability"),
...     ("observable_observable_1_first", "probability"),
...     ("observable_observable_1_second", "probability"),
...     ("observable_children_two_or_less", "probability"),
...     ("observable_children_more_than_two", "probability"),
... ], names=["category", "name"])
>>> params = pd.Series(index=index, dtype="object")
>>> _parse_observable_or_exog_process_names(params, "observable")
['children', 'observable_0', 'observable_1']

respy.pre_processing.model_processing._sync_optim_paras_and_options(optim_paras, options)

Sync optim_paras and options after they have been parsed separately.

respy.pre_processing.model_processing._add_type_covariates(options, optim_paras)

Add type covariates.

Since types only introduce constant shifts in the utility functions, this function conveniently adds covariates for each type by default.

Examples

>>> options = {"covariates": {}}
>>> optim_paras = {"n_types": 2}
>>> _add_type_covariates(options, optim_paras)
{'covariates': {'type_1': 'type == 1'}}

respy.pre_processing.model_processing._add_default_is_inadmissible(options, optim_paras)

Add default negative choice set constraints.

This function adds negative choice set conditions based on maximum experience and no constraints for choices without experience.
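
As a rough sketch of the idea, the snippet below adds one formula per choice with a maximum experience and none for choices without experience. Everything about it is an assumption made for illustration: the function name, the layout of both dictionaries, and the exact form of the formula (exp_{choice} >= maximum) are not taken from respy.

```python
def add_default_is_inadmissible(constraints, max_experience):
    """Add one constraint per choice with a maximum experience (illustrative)."""
    # Copy the user-supplied constraints so the input is not modified.
    out = {choice: list(formulas) for choice, formulas in constraints.items()}
    for choice, max_exp in max_experience.items():
        if max_exp is None:
            # Choices without experience receive no default constraint.
            continue
        # Assumed form: the choice is unavailable once the cap is reached.
        out.setdefault(choice, []).append(f"exp_{choice} >= {max_exp}")
    return out


constraints = add_default_is_inadmissible(
    constraints={"edu": ["period > 10"]},
    max_experience={"a": 40, "edu": 20, "home": None},
)
```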

respy.pre_processing.model_processing._convert_labels_in_formulas_to_codes(options, optim_paras)

Convert labels in covariates, filters and inadmissible formulas to codes.

Characteristics with labels are either choices or observables. Choices are ordered as in optim_paras["choices"] and observables alphabetically.

Labels can be enclosed in either single or double quotes, so both cases have to be checked.

respy.pre_processing.model_processing._replace_in_single_or_double_quotes(val, from_, to)

Replace a value in a string enclosed in single or double quotes.
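
One way to implement such a replacement is a regular expression with a backreference, so that the opening and closing quote must match. This is a sketch under the assumption that the quotes are removed together with the label; the function name replace_in_quotes and the example formula are hypothetical.

```python
import re


def replace_in_quotes(val, from_, to):
    """Replace from_ with to wherever from_ is enclosed in matching quotes."""
    # Group 1 captures the opening quote; \1 requires the same closing quote.
    pattern = rf"""(["']){re.escape(from_)}\1"""
    # A function as replacement avoids interpreting backslashes in to.
    return re.sub(pattern, lambda _: str(to), val)


formula = "lagged_choice_1 == 'edu' and exp_edu > 0"
converted = replace_in_quotes(formula, "edu", 2)
# converted is "lagged_choice_1 == 2 and exp_edu > 0"
```

The unquoted occurrence in exp_edu is left untouched because only quoted labels match the pattern.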

respy.pre_processing.model_processing._replace_choices_and_observables_in_formula(formula, optim_paras)

Replace choices and observables in formula.

Choices and levels of an observable can have string identifiers, which are replaced with their codes.

respy.pre_processing.model_processing._convert_labels_in_filters_to_codes(optim_paras, options)

Convert labels in "core_state_space_filters" to codes.

The filters are used to remove states from the state space which are inadmissible anyway.

A filter might look like this:

"lagged_choice_1 == '{choice_w_wage}' and exp_{choice_w_wage} == 0"

{choice_w_wage} is replaced by the actual choice name, whereas '{choice_w_wage}' or "{choice_w_wage}" is replaced with the internal choice code.
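
The two kinds of replacement can be sketched with plain string operations. The function name fill_filter_template and the choice "a" with code 0 are assumptions for illustration; the template is the one shown above.

```python
def fill_filter_template(template, placeholder, choice, code):
    """Fill a filter template for one choice (illustrative)."""
    token = "{" + placeholder + "}"
    # Quoted placeholders stand for the label and become the internal code.
    filled = template.replace(f"'{token}'", str(code))
    filled = filled.replace(f'"{token}"', str(code))
    # Remaining placeholders are part of a variable name and become the name.
    return filled.replace(token, choice)


template = "lagged_choice_1 == '{choice_w_wage}' and exp_{choice_w_wage} == 0"
filter_ = fill_filter_template(template, "choice_w_wage", choice="a", code=0)
# filter_ is "lagged_choice_1 == 0 and exp_a == 0"
```

The quoted placeholders must be handled first; otherwise the quotes would survive around the choice name.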

respy.pre_processing.model_processing._parse_cache_directory(options)

Parse the location of the cache.