respy.simulate
Everything related to the simulation of data with structural models.
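Module Contents
Functions
| get_simulate_func | Get the simulation function. |
| simulate | Perform a simulation. |
| apply_law_of_motion_for_dense | Update dense variable, if exogenous process. |
| update_dense_state_variables | Update the value of the exogenous processes. |
| _extend_data_with_sampled_characteristics | Sample initial observations from initial conditions. |
| _simulate_single_period | Simulate individuals in a single period. |
| draw_dense_key_next_period | For exogenous processes draw the dense key for next period. |
| _sample_characteristic | Sample characteristic of individuals. |
| _convert_codes_to_original_labels | Convert codes in choice-related and observed variables to labels. |
| _process_simulation_output | Create simulated data. |
| _random_choice | Return elements of choices for a two-dimensional array of probabilities. |
| _harmonize_simulation_arguments | Harmonize the arguments of the simulation. |
| _process_input_df_for_simulation | Process a pandas.DataFrame provided by the user for the simulation. |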
- respy.simulate.get_simulate_func(params, options, method='n_step_ahead_with_sampling', df=None, n_simulation_periods=None)
Get the simulation function.
Return simulate() where all arguments except the parameter vector are fixed with functools.partial(). Thus, the function can be passed directly to an optimizer for estimation with the simulated method of moments or other techniques.
- Parameters:
- params
pandas.DataFrame
DataFrame containing model parameters.
- options
dict
Dictionary containing model options.
- method
{"n_step_ahead_with_sampling", "n_step_ahead_with_data", "one_step_ahead"}
The simulation method, which is one of the three values above and is explained in more detail in simulate().
- df
pandas.DataFrame or None, default None
DataFrame containing one or multiple observations per individual.
- n_simulation_periods
int or None, default None
Number of periods for which data is simulated. This option does not affect options["n_periods"], which controls the number of periods for which decision rules are computed.
- Returns:
- simulate_function
simulate()
Simulation function where all arguments except the parameter vector are set.
Examples
>>> import respy as rp
>>> params, options = rp.get_example_model("robinson_crusoe_basic", with_data=False)
>>> simulate = rp.get_simulate_func(params, options)
>>> data = simulate(params)
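The idea of fixing every argument except the parameter vector can be illustrated with a minimal sketch. The function `simulate_toy` and its arguments are hypothetical stand-ins and not part of respy; only `functools.partial` is the real tool named above.

```python
from functools import partial


def simulate_toy(params, n_periods, seed):
    """Hypothetical stand-in for a simulation that needs several fixed inputs."""
    return [params["wage"] * (period + 1) + seed for period in range(n_periods)]


# Fix everything except the parameter vector.
simulate_fixed = partial(simulate_toy, n_periods=3, seed=0)

# An optimizer would only need to vary `params` between calls.
result = simulate_fixed({"wage": 2.0})
print(result)
```

Because the returned callable takes a single argument, it has the shape most optimizers expect from a criterion or moment function.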
- respy.simulate.simulate(params, base_draws_sim, base_draws_wage, df, method, n_simulation_periods, solve, options)
Perform a simulation.
This function performs one of three possible simulation exercises. The type of the simulation is controlled by method in get_simulate_func(). Ordered from no data to panel data on individuals, the options are:
1. n-step-ahead simulation with sampling: The first observation of an individual is sampled from the initial conditions, i.e., the distribution of observed variables or initial experiences, etc. in the first period. Then, the individuals are guided for n periods by the decision rules from the solution of the model.
2. n-step-ahead simulation with data: Instead of sampling individuals from the initial conditions, take the first observation of each individual in the data. Then, proceed as in 1.
3. one-step-ahead simulation: Take the complete data and find for each observation the corresponding outcomes, e.g., choices and wages, using the decision rules from the model solution.
- Parameters:
- params
pandas.DataFrame or pandas.Series
Contains parameters.
- base_draws_sim
numpy.ndarray
Array with shape (n_periods, n_individuals, n_choices) to provide a unique set of shocks for each individual in each period.
- base_draws_wage
numpy.ndarray
Array with shape (n_periods, n_individuals, n_choices) to provide a unique set of wage measurement errors for each individual in each period.
- df
pandas.DataFrame or None
Can be one of three objects:
  - None if no data is provided. This triggers sampling from initial conditions and an n-step-ahead simulation.
  - pandas.DataFrame containing panel data on individuals, which triggers a one-step-ahead simulation.
  - pandas.DataFrame containing only first observations, which triggers an n-step-ahead simulation taking the data as initial conditions.
- method
str
The simulation method.
- n_simulation_periods
int
Number of periods to simulate.
- solve
solve()
Function which creates the solution of the model with new parameters.
- options
dict
Contains model options.
- Returns:
- simulated_data
pandas.DataFrame
DataFrame of simulated individuals.
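The difference between the three simulation exercises can be sketched with a toy model. Everything below is an illustrative assumption: the decision rule, the state variable `experience`, and the helper names are hypothetical and do not reproduce respy's implementation.

```python
import random


def decision_rule(experience):
    """Toy decision rule: work until three periods of experience are reached."""
    return "work" if experience < 3 else "rest"


def n_step_ahead(initial_experience, n_periods):
    """Guide one individual forward for n_periods using the decision rule."""
    rows = []
    experience = initial_experience
    for period in range(n_periods):
        choice = decision_rule(experience)
        rows.append({"period": period, "experience": experience, "choice": choice})
        if choice == "work":
            experience += 1
    return rows


def one_step_ahead(observed_rows):
    """Apply the decision rule to each observed state without rolling forward."""
    return [{**row, "choice": decision_rule(row["experience"])} for row in observed_rows]


rng = random.Random(0)

# 1. With sampling: the first observation is drawn from toy initial conditions.
sampled = n_step_ahead(rng.choice([0, 1, 2]), n_periods=4)

# 2. With data: the first observation is taken from the data.
from_data = n_step_ahead(initial_experience=2, n_periods=4)

# 3. One-step-ahead: every observed row receives its own simulated outcome.
panel = [{"period": 0, "experience": 0}, {"period": 1, "experience": 5}]
one_step = one_step_ahead(panel)
```

In the first two cases only the starting state differs, and later states are generated by the model. In the third case every state comes from the data and only the outcomes are simulated.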
- respy.simulate.apply_law_of_motion_for_dense(df, state_space, optim_paras)
Update dense variable, if exogenous process.
- Parameters:
- df
pandas.DataFrame
A pandas DataFrame containing the updated state variables, as well as the draw of the next period's dense key.
- state_space
- optim_paras
- Returns:
- df
pandas.DataFrame
A pandas DataFrame containing the updated state variables and the updated exogenous process.
- respy.simulate.update_dense_state_variables(df, dense_key_to_dense_covariates, optim_paras)
Update the value of the exogenous processes.
- Parameters:
- df
pandas.DataFrame
A pandas DataFrame containing the updated state variables, as well as the draw of the next period's dense key.
- dense_key_to_dense_covariates
dict
Dictionary with dense keys as keys and dense grid points as values.
- optim_paras
dict
- Returns:
- df
pandas.DataFrame
A pandas DataFrame containing the updated state variables and the updated exogenous process.
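As a rough sketch of the lookup described above, the drawn dense key of each row can be mapped to its covariates, which then overwrite the old values. Plain dictionaries stand in for DataFrame rows here; the column name `dense_key_next_period` and the covariate `health` are assumptions made for illustration and are not taken from respy's code.

```python
def update_dense_covariates(rows, dense_key_to_dense_covariates):
    """Overwrite exogenous covariates with those belonging to the drawn dense key."""
    updated = []
    for row in rows:
        covariates = dense_key_to_dense_covariates[row["dense_key_next_period"]]
        updated.append({**row, **covariates})
    return updated


# Hypothetical mapping from dense keys to the values of an exogenous process.
dense_key_to_dense_covariates = {0: {"health": "good"}, 1: {"health": "bad"}}

rows = [
    {"id": 0, "health": "good", "dense_key_next_period": 1},
    {"id": 1, "health": "bad", "dense_key_next_period": 0},
]
updated_rows = update_dense_covariates(rows, dense_key_to_dense_covariates)
```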
- respy.simulate._extend_data_with_sampled_characteristics(df, optim_paras, options)
Sample initial observations from initial conditions.
The function iterates over all state space dimensions and replaces NaNs with values sampled from initial conditions. In the case of an n-step-ahead simulation with sampling, all state space dimensions are sampled. For the other two simulation methods, potential NaNs in the data are replaced with sampled characteristics.
Characteristics are sampled regardless of the simulation type which keeps randomness across the types constant.
- Parameters:
- df
pandas.DataFrame
A pandas DataFrame which contains only an index for n-step-ahead simulation with sampling. For the other simulation methods, it contains information on individuals which is allowed to have missing information in the first period.
- optim_paras
dict
- options
dict
- Returns:
- df
pandas.DataFrame
A pandas DataFrame with no missing values.
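The point that characteristics are sampled regardless of the simulation type can be sketched as follows. This is a simplified illustration under stated assumptions: rows are plain dictionaries, missing values are represented by `None` or an absent key instead of NaN, and `fill_missing` is a hypothetical helper.

```python
import random


def fill_missing(rows, column, levels, rng):
    """Fill missing entries of `column` with draws from `levels`.

    A value is drawn for every row, even if the observed value is kept, so
    the random number stream does not depend on how much data is missing.
    """
    filled = []
    for row in rows:
        draw = rng.choice(levels)  # always consume one draw per row
        value = row.get(column)
        filled.append({**row, column: draw if value is None else value})
    return filled


levels = [0, 1, 2]

# Only an index is given, as in the n-step-ahead simulation with sampling.
only_index = [{"id": 0}, {"id": 1}, {"id": 2}]
# Partially observed data, as in the other two simulation methods.
partial_data = [{"id": 0, "experience": 5}, {"id": 1, "experience": None}, {"id": 2}]

with_sampling = fill_missing(only_index, "experience", levels, random.Random(42))
with_data = fill_missing(partial_data, "experience", levels, random.Random(42))
```

With the same seed, individuals whose value is missing in both inputs receive the same draw, which is what keeps randomness comparable across simulation types.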
- respy.simulate._simulate_single_period(df, complex_tuple, wages, nonpecs, continuation_values, optim_paras, options)
Simulate individuals in a single period.
The function performs the following steps:
Map individuals in one period to the states in the model.
Simulate choices and wages for those individuals.
Store additional information in a
pandas.DataFrame
and return it.
So far, this function assumes that there are no mixed constraints. See the documentation for more information.
- respy.simulate.draw_dense_key_next_period(complex_tuple, core_index, options)
For exogenous processes draw the dense key for next period.
- Parameters:
- complex_tuple
- core_index
- options
- Returns:
- dense_key_next_period
pandas.Series
A pandas Series containing the dense keys in the next period for all keys.
- respy.simulate._sample_characteristic(states_df, options, level_dict, use_keys)
Sample characteristic of individuals.
The function is used to sample the values of one state space characteristic, say experience. The keys of level_dict are the possible starting values of experience. The values of the dictionary are pandas.Series whose index contains covariate names and whose values are the parameter values. states_df is used to generate all possible covariates with the existing information.
For each level, the dot product of parameters and covariates determines the value z. The softmax function converts the level-specific z-values to probabilities. The probabilities are used to sample the characteristic.
- Parameters:
- states_df
pandas.DataFrame
Contains the state of each individual.
- options
dict
Options of the model.
- level_dict
dict
A dictionary where the keys are the values distributed according to the probability mass function. The values are pandas.Series with covariate names as the index and parameter values as the values.
- use_keys
bool
Whether to use the keys of level_dict as variable values or to use numeric codes instead, for example, to assign numbers to choices.
- Returns:
- characteristic
numpy.ndarray
Array with shape (n_individuals,) containing sampled values.
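The sampling scheme described for this function (dot product per level, softmax, then a draw) can be sketched for a single individual. This is a minimal illustration that uses plain dictionaries in place of pandas.Series; the covariate names and parameter values are made up, and the helper is not respy's implementation.

```python
import math
import random


def sample_characteristic(covariates, level_dict, rng):
    """Sample one level via softmax over level-specific dot products."""
    levels = list(level_dict)
    # z-value per level: dot product of parameters and covariates.
    z = [
        sum(value * covariates[name] for name, value in level_dict[level].items())
        for level in levels
    ]
    # Softmax, shifted by the maximum for numerical stability.
    z_max = max(z)
    weights = [math.exp(value - z_max) for value in z]
    total = sum(weights)
    probabilities = [weight / total for weight in weights]
    draw = rng.choices(levels, weights=probabilities, k=1)[0]
    return draw, probabilities


# Hypothetical levels of initial experience with made-up parameters.
level_dict = {
    0: {"constant": 0.0},
    1: {"constant": 0.0, "educated": 1.0},
}
covariates = {"constant": 1.0, "educated": 1.0}

draw, probabilities = sample_characteristic(covariates, level_dict, random.Random(0))
```

Here the z-values are 0 and 1, so the second level is drawn with probability e / (1 + e), roughly 0.73.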
- respy.simulate._convert_codes_to_original_labels(df, optim_paras)
Convert codes in choice-related and observed variables to labels.
- respy.simulate._process_simulation_output(data, optim_paras)
Create simulated data.
This function takes an array of simulated outcomes and additional information for each period and stacks them together to one DataFrame.
- Parameters:
- Returns:
- df
pandas.DataFrame
DataFrame with simulated data.
- respy.simulate._random_choice(choices, probabilities=None, decimals=5)
Return elements of choices for a two-dimensional array of probabilities.
It is assumed that probabilities are ordered (n_samples, n_choices).
The function is taken from a StackOverflow post as a workaround for numpy.random.choice(), which can only handle one-dimensional probabilities.
Examples
Here is an example with non-zero probabilities.
>>> import numpy as np
>>> from respy.simulate import _random_choice
>>> n_samples = 100_000
>>> n_choices = 3
>>> p = np.array([0.15, 0.35, 0.5])
>>> ps = np.tile(p, (n_samples, 1))
>>> choices = _random_choice(n_choices, ps)
>>> np.round(np.bincount(choices), decimals=-3) / n_samples
array([0.15, 0.35, 0.5 ])
Here is an example where one choice has probability zero.
>>> choices = np.arange(3)
>>> p = np.array([0.4, 0, 0.6])
>>> ps = np.tile(p, (n_samples, 1))
>>> choices = _random_choice(3, ps)
>>> np.round(np.bincount(choices), decimals=-3) / n_samples
array([0.4, 0. , 0.6])
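One common way to draw a choice per row of a probability matrix is inverse-CDF sampling: accumulate each row and locate a uniform draw in the cumulative sums. The sketch below does this row by row with the standard library only. It is an assumption about the general technique, not a copy of `_random_choice`; a vectorized variant would apply the same idea with a cumulative sum along the second axis.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def random_choice_rows(probabilities, rng):
    """Draw one choice index per row of a (n_samples, n_choices) nested list."""
    choices = []
    for row in probabilities:
        cumulative = list(accumulate(row))
        # Scale the uniform draw so rows need not sum to exactly one.
        u = rng.random() * cumulative[-1]
        # Count the cumulative sums that are <= u; clip to guard against rounding.
        choices.append(min(bisect_right(cumulative, u), len(row) - 1))
    return choices


n_samples = 20_000
ps = [[0.4, 0.0, 0.6] for _ in range(n_samples)]
draws = random_choice_rows(ps, random.Random(0))
share_first = draws.count(0) / n_samples
```

Because ties in the cumulative sums are resolved to the right, a choice with probability zero is never selected.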
- respy.simulate._harmonize_simulation_arguments(method, df, n_simulation_periods, options)
Harmonize the arguments of the simulation.
This function handles the interaction of the four inputs and aligns the number of simulated individuals and the number of simulated periods.
- respy.simulate._process_input_df_for_simulation(df, method, options, optim_paras)
Process a pandas.DataFrame provided by the user for the simulation.