View and download the notebook here!
[1]:
import pandas as pd import respy as rp
This notebook contains a step by step tutorial to simulated method of moments estimation using respy.
Respy can construct a msm function using get_msm_func. The function requires the following arguments:
get_msm_func
params (pandas.DataFrame)
options (dict)
calc_moments (callable, list, dict)
replace_nans (callable, list, dict)
empirical_moments (pandas.DataFrame, pandas.Series, list, dict)
weighting_matrix (numpy.ndarray)
n_simulation_periods (int, default None)
return_scalar (bool, default True)
return_simulated_moments (bool, default False)
return_comparison_plot_data (bool, default False)
get_msm_func returns a function where all arguments except params are held fixed. The returned function can then easily be passed on to an optimizer for estimation.
The following section discusses all the arguments in detail using an example model.
The first step to msm estimation is the simulation of data using a specified model. Respy simulates data using a vector of parameters params, which will be the variable of interest for estimation, and a set of options that help define the underlying model.
Respy provides a number of example models. For this tutorial we will be using the parameterization from Keane and Wolpin (1994).
[2]:
params, options, df_emp = rp.get_example_model("kw_94_one")
[3]:
params
[4]:
options
{'estimation_draws': 200, 'estimation_seed': 500, 'estimation_tau': 500, 'interpolation_points': -1, 'n_periods': 40, 'simulation_agents': 1000, 'simulation_seed': 132, 'solution_draws': 500, 'solution_seed': 3, 'monte_carlo_sequence': 'random', 'core_state_space_filters': ["period > 0 and exp_{i} == period and lagged_choice_1 != '{i}'", "period > 0 and exp_a + exp_b + exp_edu == period and lagged_choice_1 == '{j}'", "period > 0 and lagged_choice_1 == 'edu' and exp_edu == 0", "lagged_choice_1 == '{k}' and exp_{k} == 0", "period == 0 and lagged_choice_1 == '{k}'"], 'covariates': {'constant': '1', 'exp_a_square': 'exp_a ** 2', 'exp_b_square': 'exp_b ** 2', 'at_least_twelve_exp_edu': 'exp_edu >= 12', 'not_edu_last_period': "lagged_choice_1 != 'edu'"}}
The calc_moments argument is the function that will be used to calculate moments from the simulated data. It can also be specified as a list or dictionary of multiple functions if different sets of moments should be calculated from different functions.
In this case, we will calculate two sets of moments: choice frequencies and parameters that characterize the wage distribution. The moments are saved to a pandas.DataFrame with time periods as the index and the moments as columns.
[5]:
def calc_moments(df): choices = df.groupby("Period").Choice.value_counts(normalize=True).unstack() wages = df.groupby(['Period'])['Wage'].describe()[['mean', 'std']] return pd.concat([choices, wages], axis=1)
Next we define replace_nans is a function or list of functions that define how to handle missings in the data.
[6]:
def fill_nans_zero(df): return df.fillna(0)
The empirical moments are the moments that are calculated from the observed data which the simulated moments should be matched to. The empirical_moments argument requires a pandas.DataFrame or pandas.Series as inputs. Alternatively, users can input lists or dictionaries containing DataFrames or Series as items. It is necessary that calc_moments, replace_nans and empirical_moments correspond to each other i.e. calc_moments should output moments that are of the same structure as empirical_moments.
For this example we calculate the empirical moments the same way that we calculate the simulated moments, so we can be sure that this condition is fulfilled.
[7]:
empirical_moments = calc_moments(df_emp) empirical_moments = fill_nans_zero(empirical_moments)
[8]:
empirical_moments.head()
For the msm estimation, the user has to define a weighting matrix. get_diag_weighting_matrix allows users to create a diagonal weighting matrix that will match the moment vectors used for estimation. The required inputs are empirical_moments that are also used in get_msm_func and a set of weights that are of the same form as empirical_moments. If no weights are specified, the function will return the identity matrix.
get_diag_weighting_matrix
[9]:
weighting_matrix = rp.get_diag_weighting_matrix(empirical_moments)
[10]:
pd.DataFrame(weighting_matrix)
240 rows × 240 columns
If the user prefers to compute a weighting matrix manually, the respy function get_flat_moments may be of use. This function returns the empirical moments as an indexed pandas.Series which is the form they will be passed on to the loss function as.
get_flat_moments
[11]:
flat_empirical_moments = rp.get_flat_moments(empirical_moments) flat_empirical_moments
a_0 0.409000 a_1 0.435000 a_2 0.460000 a_3 0.445000 a_4 0.423000 ... std_35 12756.763553 std_36 12146.297978 std_37 13293.128148 std_38 13236.160871 std_39 13933.439538 Length: 240, dtype: float64
The n_simulation_periods is part of the simulator that is constructed by respy in get_msm_func. It dictates the number of periods in the simulated dataset and is not to be confused with options["n_periods"] which controls the number of periods for which decision rules are computed. If the desired dataset needs to include only a subset of the total number of periods realized in the model, n_simulation_periods can be set to a value lower number of periods.
options["n_periods"]
This argument, if not needed, can be left out when specifying inputs. By default, the simulator will produce a dataset with the number of periods specified in options["n_periods"].
The return_scalar argument allows us to return the moment errors in vector form. get_msm_func will return the moment error vector if return_scalar is set to False and will return the value of the weighted square product of the moment errors if return_scalar is set to True which is also the default.
If return_simulated_moments is set to True, the msm function will return the simulated in addition to the function value or moment error vector. This can be useful for visualization purposes and to compare simulated moments to empirical moments for a given parameterization. Its default value is False.
Similarly to return_simulated_moments, the return_comparison_plot_data argument can be used to return the empirical and simulated moments with other function output. Instead of lists, the data will be saved to one pandas.DataFrame that is designed to compliment the visualization capabilities of estimagic. It is False by default and acts as a substitute for return_simulated_moments, meaning only one can be set to True at the same time.
We can now compute the msm function. The function is constructed using get_msm_func. Adding all arguments to get_msm_func will return a function that holds all elements but the params argument fixed and can thus easily be passed on to an optimizer. The function will return a value of 0 if we use the true parameter vector as input.
[12]:
msm = rp.get_msm_func( params=params, options=options, calc_moments=calc_moments, replace_nans = fill_nans_zero, empirical_moments=empirical_moments, weighting_matrix = weighting_matrix, return_scalar=True, ) msm(params)
0.0
Using a different parameter vector will result in a value different from 0.
[13]:
params_sim = params.copy() params_sim.loc['delta', 'value'] = 0.8
[14]:
msm(params_sim)
3603224859.124363
If we set return_scalar to False, the function will return the vector of moment errors instead.
[15]:
msm_vector = rp.get_msm_func( params=params_sim, options=options, calc_moments=calc_moments, replace_nans = fill_nans_zero, empirical_moments=empirical_moments, weighting_matrix = weighting_matrix, return_scalar=False ) moment_errors = msm_vector(params_sim) moment_errors
a_0 0.130000 a_1 0.152000 a_2 0.151000 a_3 0.136000 a_4 0.095000 ... std_35 8112.352899 std_36 7292.742834 std_37 8414.066029 std_38 8509.249336 std_39 9398.821013 Length: 240, dtype: float64
In the example above we used single elements for all inputs i.e. we used one function to calculate moments, one function to replace missing moments and saved all sets of moments in a single pandas.DataFrame. This works well for the example at hand because the inputs are relatively simple, but other applications might require more flexibility. get_msm_func thus alternatively accepts lists and dictionaries as inputs. This way, different sets of moments can be stored separately. Using lists or dictionaries also allows the use of different replacement functions for different moments.
For the sake of this example, we add another set of moments to the estimation. In addition to the choice frequencies and wage distribution, we include the final education of agents. Here, the index is given by the educational experience agents have accumulated in period 39. The moments are given by the frequency of each level of experience in the dataset. Since this set of moments is not grouped by period, it cannot be saved to a DataFrame with the other moments. We hence give each set of moments its own function and save them to a list. The choice frequencies and wage distribution are saved to a pandas.DataFrame with multiple columns, the final education is given by a pandas.Series.
Instead of lists, the functions and moments may also be saved to a dictionary. Dictionaries will be sorted according to keys before being passed on the loss function. Using dictionaries therefore has the advantage that the user does not have to pay attention to storing items in the correct order as with lists, where inputs are matched according to position. For the same reason it is not recommended to mix lists and dictionaries as inputs.
[16]:
def calc_choice_freq(df): return df.groupby("Period").Choice.value_counts(normalize=True).unstack() def calc_wage_distr(df): return df.groupby(['Period'])['Wage'].describe()[['mean', 'std']] def calc_final_edu(df): last_period = max(df.index.get_level_values(1)) return df.xs(last_period, level=1).Experience_Edu.value_counts(normalize=True,sort=False) calc_moments = [calc_choice_freq, calc_wage_distr, calc_final_edu]
We can additionally specify different replacement functions for each set of moments and save them to a list just like calc_moments. However, here we will use the same replacement function for all moments and thus just need to specify one. Respy will automatically apply this function to all sets of moments.
Note that this only works if only one replacement function is given. Otherwise replace_nans must be a list of the same length as calc_moments with each replacement function holding the same position as the moment function it corresponds to. In the case of dictionaries, replacement functions should be saved with the same keys as set of moments they correspond to.
[17]:
def fill_nans_zero(df): return df.fillna(0) replace_nans = [fill_nans_zero]
We now calculate the empirical_moments. They are saved to a list as well. We can calculate the weighting_matrix as before.
[18]:
params, options, df = rp.get_example_model("kw_94_one") empirical_moments = [calc_choice_freq(df), calc_wage_distr(df), calc_final_edu(df)] empirical_moments = [fill_nans_zero(df) for df in empirical_moments]
[19]:
Finally, we can construct the msm function from the defined inputs.
[20]:
msm = rp.get_msm_func( params=params, options=options, calc_moments=calc_moments, replace_nans = replace_nans, empirical_moments=empirical_moments, weighting_matrix = weighting_matrix, return_scalar=True ) msm(params)
The result for the simulated moments slightly deviates from the introductory example because we added an additional set of moments.
[21]:
3603224859.597081
[22]:
msm_vector = rp.get_msm_func( params=params, options=options, calc_moments=calc_moments, replace_nans = replace_nans, empirical_moments=empirical_moments, weighting_matrix = weighting_matrix, return_scalar=False ) moment_errors = msm_vector(params_sim) moment_errors
a_0 0.130 a_1 0.152 a_2 0.151 a_3 0.136 a_4 0.095 ... Experience_Edu_16 0.075 Experience_Edu_17 0.059 Experience_Edu_18 0.051 Experience_Edu_19 0.020 Experience_Edu_20 0.010 Length: 251, dtype: float64
Keane, M. P. and Wolpin, K. I. (1994). The Solution and Estimation of Discrete Choice Dynamic Programming Models by Simulation and Interpolation: Monte Carlo Evidence. The Review of Economics and Statistics, 76(4): 648-672.
McFadden, D. (1989). A Method of Simulated Moments for Estimation of Discrete Response Models without Numerical Integration. Econometrica: Journal of the Econometric Society, 995-1026.