respy.simulate
Everything related to the simulation of data with structural models.
get_simulate_func(params, options, method='n_step_ahead_with_sampling', df=None, n_simulation_periods=None)
get_simulate_func
Get the simulation function.
simulate(params, base_draws_sim, base_draws_wage, df, method, n_simulation_periods, solve, options)
simulate
Perform a simulation.
_extend_data_with_sampled_characteristics(df, optim_paras, options)
_extend_data_with_sampled_characteristics
Sample initial observations from initial conditions.
_simulate_single_period(df, choice_set, wages, nonpecs, continuation_values, optim_paras)
_simulate_single_period
Simulate individuals in a single period.
_sample_characteristic(states_df, options, level_dict, use_keys)
_sample_characteristic
Sample characteristic of individuals.
_convert_codes_to_original_labels(df, optim_paras)
_convert_codes_to_original_labels
Convert codes in choice-related and observed variables to labels.
_process_simulation_output(data, optim_paras)
_process_simulation_output
Create simulated data.
_random_choice(choices, probabilities=None, decimals=5)
_random_choice
Return elements of choices for a two-dimensional array of probabilities.
_apply_law_of_motion(df, optim_paras)
_apply_law_of_motion
Apply the law of motion to get the states in the next period.
_harmonize_simulation_arguments(method, df, n_simulation_periods, options)
_harmonize_simulation_arguments
Harmonize the arguments of the simulation.
_process_input_df_for_simulation(df, method, options, optim_paras)
_process_input_df_for_simulation
Process a pandas.DataFrame provided by the user for the simulation.
pandas.DataFrame
respy.simulate.
Return simulate() where all arguments except the parameter vector are fixed with functools.partial(). Thus, the function can be directly passed into an optimizer for estimation with simulated method of moments or other techniques.
DataFrame containing model parameters.
dict
Dictionary containing model options.
The simulation method, which is one of three options explained in more detail in simulate().
None
DataFrame containing one or multiple observations per individual.
int
Simulate data for a number of periods. This option does not affect options["n_periods"], which controls the number of periods for which decision rules are computed.
Simulation function where all arguments except the parameter vector are set.
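The pattern of fixing every argument except the parameters can be illustrated with a minimal sketch. It does not call respy: toy_simulate, base_draws, and n_periods are hypothetical stand-ins chosen only to show how functools.partial() yields a function of the parameters alone.

```python
from functools import partial

import numpy as np


def toy_simulate(params, base_draws, n_periods):
    """Hypothetical stand-in for a simulator: scales fixed draws by a parameter."""
    return params["scale"] * base_draws[:n_periods]


# Draws are created once so that repeated calls differ only through `params`.
rng = np.random.default_rng(0)
base_draws = rng.standard_normal((5, 3))

# Fix everything except the parameters.
simulate_func = partial(toy_simulate, base_draws=base_draws, n_periods=4)

out_a = simulate_func({"scale": 1.0})
out_b = simulate_func({"scale": 2.0})
print(out_a.shape)  # (4, 3)
```

Because the draws are fixed inside the partial, an optimizer that varies only the parameters sees a deterministic function.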
This function performs one of three possible simulation exercises. The type of the simulation is controlled by method in get_simulate_func(). Ordered from no data to panel data on individuals, there are:
n-step-ahead simulation with sampling: The first observation of an individual is sampled from the initial conditions, i.e., the distribution of observed variables or initial experiences, etc. in the first period. Then, the individuals are guided for n periods by the decision rules from the solution of the model.
n-step-ahead simulation with data: Instead of sampling individuals from the initial conditions, take the first observation of each individual in the data. Then, proceed as in the n-step-ahead simulation with sampling.
one-step-ahead simulation: Take the complete data and find for each observation the corresponding outcomes, e.g., choices and wages, using the decision rules from the model solution.
pandas.Series
Contains parameters.
numpy.ndarray
Array with shape (n_periods, n_individuals, n_choices) to provide a unique set of shocks for each individual in each period.
Array with shape (n_periods, n_individuals, n_choices) to provide a unique set of wage measurement errors for each individual in each period.
Can be one of three objects:
None if no data is provided. This triggers sampling from initial conditions and an n-step-ahead simulation.
pandas.DataFrame containing panel data on individuals, which triggers a one-step-ahead simulation.
pandas.DataFrame containing only first observations, which triggers an n-step-ahead simulation taking the data as initial conditions.
str
The simulation method.
Number of periods to simulate.
solve()
Function which creates the solution of the model with new parameters.
Contains model options.
DataFrame of simulated individuals.
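The three kinds of input for df described above can be told apart by the number of rows per individual. The sketch below is only an illustration: infer_simulation_type is a hypothetical helper, not part of respy, and the index level names "Identifier" and "Period" are assumptions.

```python
import pandas as pd


def infer_simulation_type(df):
    """Hypothetical helper: classify the input by how many rows each individual has."""
    if df is None:
        return "n-step-ahead with sampling"
    rows_per_individual = df.groupby(level="Identifier").size()
    if (rows_per_individual == 1).all():
        return "n-step-ahead with data"
    return "one-step-ahead"


# Assumed layout: a MultiIndex with one row per individual and period.
index = pd.MultiIndex.from_product([[0, 1], [0, 1, 2]], names=["Identifier", "Period"])
panel = pd.DataFrame({"Experience": [0, 1, 2, 0, 0, 1]}, index=index)

# Keep only the first observation of each individual.
first_obs = panel.xs(0, level="Period", drop_level=False)

print(infer_simulation_type(None))
print(infer_simulation_type(first_obs))
print(infer_simulation_type(panel))
```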
The function iterates over all state space dimensions and replaces NaNs with values sampled from initial conditions. In the case of an n-step-ahead simulation with sampling, all state space dimensions are sampled. For the other two simulation methods, potential NaNs in the data are replaced with sampled characteristics.
Characteristics are sampled regardless of the simulation type, which keeps randomness across the types constant.
A pandas DataFrame which contains only an index for n-step-ahead simulation with sampling. For the other simulation methods, it contains data on individuals, which may have missing values in the first period.
A pandas DataFrame with no missing values.
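The idea of sampling for everyone but only filling gaps can be sketched with plain pandas. This is a simplified illustration under assumed names (the experience column and the probabilities are made up), not the function's actual implementation.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({"experience": [0.0, np.nan, 2.0, np.nan]})

# Sample a value for every row, not only the missing ones, so the number of
# random draws does not depend on how much data was supplied.
sampled = pd.Series(
    rng.choice([0, 1, 2], size=len(df), p=[0.5, 0.3, 0.2]), index=df.index
)

# Observed values are kept, and only NaNs are replaced by the sampled values.
df["experience"] = df["experience"].fillna(sampled)
print(df)
```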
The function performs the following steps:
Map individuals in one period to the states in the model.
Simulate choices and wages for those individuals.
Store additional information in a pandas.DataFrame and return it.
So far, this function assumes that there are no mixed constraints. See the documentation for more information.
The function is used to sample the values of one state space characteristic, say experience. The keys of level_dict are the possible starting values of experience. The values of the dictionary are pandas.Series whose index contains the covariate names and whose values are the parameter values.
states_df is used to generate all possible covariates with the existing information.
For each level, the dot product of parameters and covariates determines the value z. The softmax function converts the level-specific z-values to probabilities. The probabilities are used to sample the characteristic.
Contains the state of each individual.
Options of the model.
A dictionary where the keys are the values distributed according to the probability mass function. Each value is a pandas.Series with covariate names as the index and parameter values as the data.
Indicator for whether the keys of the level dict should be used as variable values or whether numeric codes should be used instead. For example, assign numbers to choices.
Array with shape (n_individuals,) containing sampled values.
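The softmax-based sampling described above can be sketched as follows. The covariate names and coefficients are invented for illustration, and the code is a simplified reading of the description, not the library's implementation.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Covariates per individual; a constant is included explicitly.
covariates = pd.DataFrame({"constant": 1.0, "older_than_16": [0.0, 1.0, 1.0]})

# Hypothetical level_dict: keys are the candidate levels, values hold coefficients.
level_dict = {
    0: pd.Series({"constant": 0.0, "older_than_16": 0.0}),
    1: pd.Series({"constant": -1.0, "older_than_16": 2.0}),
}

# z has shape (n_individuals, n_levels): one dot product per level.
z = np.column_stack(
    [
        covariates[params.index].to_numpy() @ params.to_numpy()
        for params in level_dict.values()
    ]
)

# Numerically stable softmax across levels.
exp_z = np.exp(z - z.max(axis=1, keepdims=True))
probabilities = exp_z / exp_z.sum(axis=1, keepdims=True)

# Draw one level per individual from its own probability row.
levels = np.array(list(level_dict))
sampled = np.array([rng.choice(levels, p=p) for p in probabilities])
print(probabilities.round(3))
```

Subtracting the row maximum before exponentiating leaves the probabilities unchanged and avoids overflow for large z-values.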
This function takes an array of simulated outcomes and additional information for each period and stacks them together into one DataFrame.
list
List of DataFrames for each simulated period with internal codes and labels.
DataFrame with simulated data.
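Stacking per-period results into one frame can be sketched with pandas.concat(). The column names below are hypothetical, and the real function additionally handles internal codes and labels.

```python
import pandas as pd

# One small frame per simulated period (hypothetical columns).
periods = [
    pd.DataFrame({"identifier": [0, 1], "period": p, "choice": ["work", "school"]})
    for p in range(3)
]

# Stack the periods and index the result by individual and period.
data = (
    pd.concat(periods, ignore_index=True)
    .set_index(["identifier", "period"])
    .sort_index()
)
print(data)
```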
It is assumed that probabilities are ordered (n_samples, n_choices).
The function is taken from a StackOverflow post as a workaround for numpy.random.choice(), which can only handle one-dimensional probabilities.
Examples
Here is an example with non-zero probabilities.
>>> n_samples = 100_000
>>> n_choices = 3
>>> p = np.array([0.15, 0.35, 0.5])
>>> ps = np.tile(p, (n_samples, 1))
>>> choices = _random_choice(n_choices, ps)
>>> np.round(np.bincount(choices), decimals=-3) / n_samples
array([0.15, 0.35, 0.5 ])
Here is an example where one choice has probability zero.
>>> choices = np.arange(3)
>>> p = np.array([0.4, 0, 0.6])
>>> ps = np.tile(p, (n_samples, 1))
>>> choices = _random_choice(3, ps)
>>> np.round(np.bincount(choices), decimals=-3) / n_samples
array([0.4, 0. , 0.6])
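One common way to sample from row-wise probabilities without a Python loop is to compare a uniform draw per row against the cumulative probabilities. The sketch below uses a hypothetical function sample_rows to illustrate that technique. It is not claimed to match the implementation of _random_choice.

```python
import numpy as np


def sample_rows(probabilities, rng):
    """Draw one column index per row of a (n_samples, n_choices) probability array."""
    cdf = probabilities.cumsum(axis=1)
    cdf[:, -1] = 1.0  # guard against rounding error in the last column
    u = rng.random((probabilities.shape[0], 1))
    # The first column whose cumulative probability exceeds u is the draw.
    return (u < cdf).argmax(axis=1)


rng = np.random.default_rng(0)
ps = np.tile(np.array([0.4, 0.0, 0.6]), (10_000, 1))
draws = sample_rows(ps, rng)
print(np.bincount(draws, minlength=3) / len(draws))
```

A choice with probability zero adds nothing to the cumulative sum, so it can never be the first column to exceed the uniform draw.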
For n-step-ahead simulations, the states of the next period are generated from the current states and the current decision. This function changes experiences and previous choices according to the choice in the current period, to get the states of the next period.
We implicitly assume that observed variables are constant.
The DataFrame contains the simulated information of individuals in one period.
The DataFrame contains the states of individuals in the next period.
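The update of experiences and previous choices can be sketched with pandas. The column names (choice, experience_work, experience_school, lagged_choice_1) are assumptions made for this illustration and need not match the library's internal naming.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "choice": ["work", "school", "work"],
        "experience_work": [0, 3, 1],
        "experience_school": [10, 10, 12],
        "lagged_choice_1": ["school", "work", "work"],
    }
)

next_df = df.copy()
for choice in ["work", "school"]:
    # Experience grows by one only in the chosen alternative.
    next_df[f"experience_{choice}"] += (df["choice"] == choice).astype(int)

# The current choice becomes the lagged choice of the next period.
next_df["lagged_choice_1"] = df["choice"]
next_df = next_df.drop(columns="choice")
print(next_df)
```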
This function handles the interaction of the four inputs and aligns the number of simulated individuals and the number of simulated periods.