respy.pre_processing.model_processing
Process model specification files or objects.
process_params_and_options(params, options)
    Process params and options.

_read_options(dict_or_path)
    Read the options which can either be a dictionary or a path.

_create_internal_seeds_from_user_seeds(options)
    Create internal seeds from user input.

_read_params(df_or_series)
    Read the parameters which can either be a path, a Series, or a DataFrame.

_parse_parameters(params, options)
    Parse the parameter vector into a dictionary of model quantities.

_parse_present_bias_parameter(optim_paras, params)
    Parse the present-bias parameter, which is 1 by default.

_parse_observables(optim_paras, params)
    Parse observed variables and their levels.

_parse_choices(optim_paras, params, options)
    Define a unique order of choices.

_parse_choice_parameters(optim_paras, params)
    Parse utility parameters for choices.

_parse_initial_and_max_experience(optim_paras, params, options)
    Process initial experience distributions and maximum experience.

_parse_shocks(optim_paras, params)
    Parse the shock parameters and create the Cholesky factor.

_parse_measurement_errors(optim_paras, params)
    Parse the standard deviations of measurement errors.

_parse_types(optim_paras, params)
    Parse type shifts and type parameters.

_infer_number_of_types(params)
    Infer the number of types from the parameters; one type is the default.

_infer_choices_with_experience(params, options)
    Infer choices with experience.

_infer_choices_with_prefix(params, prefix)
    Infer choices with a given prefix.

_parse_lagged_choices(optim_paras, options, params)
    Parse lagged choices from covariates and params.

_parse_probabilities_or_logit_coefficients(params, regex_for_levels)
    Parse probabilities or logit coefficients of parameter groups.

_parse_observable_or_exog_process_names(params, keyword)
    Parse the names of observables or exogenous processes.

_sync_optim_paras_and_options(optim_paras, options)
    Sync optim_paras and options after they have been parsed separately.

_add_type_covariates(options, optim_paras)
    Add type covariates.

_add_default_is_inadmissible(options, optim_paras)
    Add default negative choice set constraints.

_convert_labels_in_formulas_to_codes(options, optim_paras)
    Convert labels in covariates, filters, and inadmissible formulas to codes.

_replace_in_single_or_double_quotes(val, from_, to)
    Replace a value in a string enclosed in single or double quotes.

_replace_choices_and_observables_in_formula(formula, optim_paras)
    Replace choices and observables in a formula.

_convert_labels_in_filters_to_codes(optim_paras, options)
    Convert labels in "core_state_space_filters" to codes.

_parse_cache_directory(options)
    Parse the location of the cache.
This function is the interface for parsing the model specification given by the user.
Instead of reusing the same seed, we use sequences of seeds incrementing by one. This ensures that we do not accidentally draw the same randomness twice.
Since naive sequences starting at the user-provided seeds might overlap, the user seeds are used to generate seeds within certain ranges. The seed for the
solution is between 1,000,000 and 2,000,000.
simulation is between 4,000,000 and 5,000,000.
likelihood estimation is between 7,000,000 and 8,000,000.
Furthermore, we need two sequences of seeds. The first sequence is used when building simulate() or log_like(), where "startup" seeds are used to generate the draws. The second sequence starts at seed_start + SEED_STARTUP_ITERATION_GAP and has to be reset to its initial value at the beginning of every iteration.
See Randomness and Reproducibility for more information.
Examples
>>> options = {"solution_seed": 1, "simulation_seed": 2, "estimation_seed": 3}
>>> options = _create_internal_seeds_from_user_seeds(options)
>>> options["solution_seed_startup"], options["solution_seed_iteration"]
(count(1128037), count(2128037))
>>> options["simulation_seed_startup"], options["simulation_seed_iteration"]
(count(4875688), count(5875688))
>>> options["estimation_seed_startup"], options["estimation_seed_iteration"]
(count(7071530), count(8071530))
An example where the present-bias parameter is specified:
>>> tuples = [("beta", "beta")]
>>> index = pd.MultiIndex.from_tuples(tuples, names=["category", "name"])
>>> params = pd.Series(data=0.4, index=index)
>>> optim_paras = {"delta": 0.95}
>>> _parse_present_bias_parameter(optim_paras, params)
{'delta': 0.95, 'beta': 0.4, 'beta_delta': 0.38}
And one where the present-bias parameter is not specified:
>>> params = pd.Series(dtype="float64")
>>> optim_paras = {"delta": 0.95}
>>> _parse_present_bias_parameter(optim_paras, params)
{'delta': 0.95, 'beta': 1, 'beta_delta': 0.95}
This function defines a unique order of choices. Choices can be separated into choices with experience and wage, choices with experience but without wage, and choices without experience and wage. This distinction is used to create a unique ordering of choices. Within each group, choices are ordered alphabetically.
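The grouping and ordering described above can be sketched with a small, hypothetical helper (this is an illustration of the ordering rule, not respy's actual implementation):

```python
def order_choices(choices_w_wage, choices_w_exp, all_choices):
    """Return choices in a unique order.

    First choices with experience and wage, then choices with experience but
    without wage, then choices without experience and wage. Within each group,
    choices are sorted alphabetically.
    """
    wage = sorted(choices_w_wage)
    exp_only = sorted(set(choices_w_exp) - set(choices_w_wage))
    rest = sorted(set(all_choices) - set(choices_w_exp) - set(choices_w_wage))
    return wage + exp_only + rest


# Example: "a" and "b" have wages and experience, "edu" accumulates experience
# without a wage, and "home" has neither.
order_choices(["b", "a"], ["a", "b", "edu"], ["a", "b", "edu", "home"])
# -> ['a', 'b', 'edu', 'home']
```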
Measurement errors can be provided either for all wage choices or for none. Measurement errors for non-wage choices are ignored.
optim_paras["has_meas_error"] is only False if there are no standard deviations of measurement errors in params, not if they are all zero. Otherwise, we would introduce a kink into the likelihood function.
It is not explicitly enforced that all types have the same covariates, but it is implicitly enforced that the parameters form a valid matrix.
An example without types:
>>> tuples = [("wage_a", "constant"), ("nonpec_edu", "exp_edu")]
>>> index = pd.MultiIndex.from_tuples(tuples, names=["category", "name"])
>>> s = pd.Series(index=index, dtype="object")
>>> _infer_number_of_types(s)
1
And one with types:
>>> tuples = [("wage_a", "type_3"), ("nonpec_edu", "type_2")]
>>> index = pd.MultiIndex.from_tuples(tuples, names=["category", "name"])
>>> s = pd.Series(index=index, dtype="object")
>>> _infer_number_of_types(s)
4
>>> options = {"covariates": {"a": "exp_white_collar + exp_a", "b": "exp_b >= 2"}}
>>> index = pd.MultiIndex.from_product([["category"], ["a", "b"]])
>>> params = pd.Series(index=index, dtype="object")
>>> _infer_choices_with_experience(params, options)
['a', 'b', 'white_collar']
>>> params = pd.Series(
...     index=["wage_b", "wage_white_collar", "wage_a", "nonpec_c"], dtype="object"
... )
>>> _infer_choices_with_prefix(params, "wage")
['a', 'b', 'white_collar']
Lagged choices can only influence the behavior of individuals through covariates of the utility function. Thus, check the covariates for any patterns like "lagged_choice_[0-9]+".
Then, compare the number of lags required by covariates with the information on lagged choices in the parameter specification. For the estimation, there does not have to be any information on lagged choices. For the simulation, we need parameters to define the probability of a choice being the lagged choice.
Warning
If not enough lagged choices are specified in params, the model can only be used for estimation.
If the model contains superfluous definitions of lagged choices.
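The first step, scanning covariate formulas for the highest required lag, can be sketched as follows (a hypothetical helper for illustration, not respy's actual code):

```python
import re


def infer_n_lagged_choices(covariates):
    """Infer the number of lags required by covariate formulas.

    Scans every formula for patterns like ``lagged_choice_1`` and returns the
    highest lag found, or 0 if no lagged choice appears.
    """
    lags = [
        int(match)
        for formula in covariates.values()
        for match in re.findall(r"lagged_choice_([0-9]+)", formula)
    ]
    return max(lags, default=0)


# A covariate referencing lagged_choice_1 implies that one lag is required.
infer_n_lagged_choices({"edu_lagged": "lagged_choice_1 == 'edu'", "x": "exp_a >= 2"})
# -> 1
```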
Some parameters form a group that specifies a distribution. The parameters can be probabilities from a probability mass function. For example, see the specification of initial years of schooling in the extended model of Keane and Wolpin (1997).
On the other hand, parameters and their corresponding covariates can form the inputs of scipy.special.softmax(), which generates the probability mass function. This distribution can be more complex.
Internally, probabilities are also converted to logit coefficients to align the interfaces. To convert probabilities to the appropriate multinomial logit (softmax) coefficients, use a constant for the covariates and note that the sum in the denominator is the same for all probabilities and can thus be treated as a constant. The multinomial logit coefficients which produce the same probability mass function are then simply the logs of the probabilities.
ValueError
If probabilities and multinomial logit coefficients are mixed.
The user is warned if the discrete probabilities of a probability mass function do not sum to one.
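The probability-to-coefficient conversion described above can be checked numerically: feeding the logs of the probabilities into a softmax reproduces the original probability mass function. This is a plain numpy sketch of that identity, not respy's internal code:

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax."""
    e = np.exp(x - x.max())
    return e / e.sum()


# Probabilities of a probability mass function.
probabilities = np.array([0.2, 0.3, 0.5])

# Using the logs of the probabilities as multinomial logit coefficients
# (with a constant covariate of 1) yields the same distribution, because the
# denominator of the softmax is shared by all entries.
coefficients = np.log(probabilities)
recovered = softmax(coefficients)

assert np.allclose(recovered, probabilities)
```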
The function accepts params and a keyword like "observable" and separates the names of the variables from their possible realizations.
Parameters
    params (pandas.Series)
        Contains the parameters of a model.
    keyword
        Keyword for a group of parameters.
>>> index = pd.MultiIndex.from_tuples([
...     ("observable_observable_0_first", "probability"),
...     ("observable_observable_0_second", "probability"),
...     ("observable_observable_1_first", "probability"),
...     ("observable_observable_1_second", "probability"),
...     ("observable_children_two_or_less", "probability"),
...     ("observable_children_more_than_two", "probability"),
... ], names=["category", "name"])
>>> params = pd.Series(index=index, dtype="object")
>>> _parse_observable_or_exog_process_names(params, "observable")
['children', 'observable_0', 'observable_1']
Since types only introduce constant shifts in the utility functions, this function conveniently adds covariates for each type by default.
>>> options = {"covariates": {}}
>>> optim_paras = {"n_types": 2}
>>> _add_type_covariates(options, optim_paras)
{'covariates': {'type_1': 'type == 1'}}
This function adds negative choice set conditions based on maximum experience and no constraints for choices without experience.
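A sketch of what such defaults might look like. The option key "negative_choice_set" and the formula strings are illustrative assumptions, not respy's exact schema:

```python
def add_default_negative_choice_set(options, choices_w_exp, choices_wo_exp):
    """Add default constraints to the negative choice set.

    A choice with experience becomes inadmissible once its maximum experience
    is reached; choices without experience receive no constraint.
    """
    constraints = options.setdefault("negative_choice_set", {})
    for choice in choices_w_exp:
        constraints.setdefault(choice, []).append(
            f"exp_{choice} == max_exp_{choice}"
        )
    for choice in choices_wo_exp:
        constraints.setdefault(choice, [])
    return options


add_default_negative_choice_set({}, ["a"], ["home"])
# -> {'negative_choice_set': {'a': ['exp_a == max_exp_a'], 'home': []}}
```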
Characteristics with labels are either choices or observables. Choices are ordered as in optim_paras["choices"] and observables alphabetically.
Labels can appear in single- or double-quoted strings, which has to be checked.
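A minimal regex-based sketch of such a quoted-label replacement (for illustration only; respy's _replace_in_single_or_double_quotes may differ):

```python
import re


def replace_in_single_or_double_quotes(val, from_, to):
    """Replace ``from_`` with ``to`` only where it is enclosed in quotes.

    The quoted label (including its quotes) is replaced by the code, so that
    e.g. ``lagged_choice_1 == 'a'`` becomes ``lagged_choice_1 == 0``.
    """
    return re.sub(
        rf"""(['"]){re.escape(str(from_))}\1""",
        str(to),
        val,
    )


replace_in_single_or_double_quotes("lagged_choice_1 == 'a'", "a", 0)
# -> "lagged_choice_1 == 0"
```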
Choices and levels of an observable can have string identifiers which are replaced with their codes.
The filters are used to remove states from the state space which are inadmissible anyway.
A filter might look like this:
"lagged_choice_1 == '{choice_w_wage}' and exp_{choice_w_wage} == 0"
{choice_w_wage} is replaced by the actual choice name, whereas '{choice_w_wage}' or "{choice_w_wage}" is replaced with the internal choice code.
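The expansion of such a filter template can be sketched as follows; the choice names and codes are assumed for illustration, and this is not respy's actual implementation:

```python
def expand_filter(template, choices_w_wage, codes):
    """Expand a filter template once per wage choice.

    ``'{choice_w_wage}'`` or ``"{choice_w_wage}"`` (quoted) is replaced with
    the numeric choice code, while a bare ``{choice_w_wage}`` becomes the
    choice name, e.g. in ``exp_a``.
    """
    filters = []
    for choice in choices_w_wage:
        f = template.replace("'{choice_w_wage}'", str(codes[choice]))
        f = f.replace('"{choice_w_wage}"', str(codes[choice]))
        f = f.replace("{choice_w_wage}", choice)
        filters.append(f)
    return filters


template = "lagged_choice_1 == '{choice_w_wage}' and exp_{choice_w_wage} == 0"
expand_filter(template, ["a"], {"a": 0})
# -> ["lagged_choice_1 == 0 and exp_a == 0"]
```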