{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Method of Simulated Moments Criterion" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "ExecuteTime": { "end_time": "2020-01-20T16:34:25.655353Z", "start_time": "2020-01-20T16:34:22.511828Z" }, "pycharm": { "is_executing": false } }, "outputs": [], "source": [ "import pandas as pd \n", "import respy as rp" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**respy** can construct a criterion function for estimation with the Method of Simulated Moments (MSM) (McFadden, 1989) that can easily be passed to an optimizer. MSM estimation requires a number of calibration choices, and **respy**'s interface is designed to allow users as much flexibility as possible when setting up a criterion function for estimation. This guide discusses the functions of the interface with a focus on the different options for specifying inputs. For a concise overview of all functions, we refer users to the **respy** API." ] }, { "cell_type": "raw", "metadata": {}, "source": [ "\n", " API\n", "\n", " For all functions see respy API.\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introductory Example\n", "\n", "The following section discusses all the arguments of the interface's core function `get_moment_errors_func` in detail using an example model. The function processes all arguments needed for estimation, such as the empirical moments and the weighting matrix, to construct a `functools.partial` which only requires the parameter vector as input and is thus ideal to use for optimization." 
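, "\n", "\n", "The pattern of fixing every input except the parameter vector can be illustrated with a toy criterion built with `functools.partial`. This is only a sketch of the pattern, not **respy**'s implementation: `simulate_moments`, `toy_criterion`, and all numbers are hypothetical stand-ins, and the model simulation is replaced by a simple linear rule.

```python
from functools import partial


def simulate_moments(params):
    # Hypothetical stand-in for a model simulation: maps two parameters to two moments.
    return [params['delta'] * 2.0, params['beta'] + 1.0]


def toy_criterion(params, empirical_moments, weights):
    # Weighted sum of squared differences between simulated and empirical moments.
    simulated = simulate_moments(params)
    errors = [s - e for s, e in zip(simulated, empirical_moments)]
    return sum(w * err ** 2 for w, err in zip(weights, errors))


true_params = {'delta': 0.95, 'beta': 0.5}
empirical_moments = simulate_moments(true_params)

# Fix everything except params, so an optimizer only has to supply params.
criterion = partial(toy_criterion, empirical_moments=empirical_moments, weights=[1.0, 1.0])

value_at_truth = criterion(true_params)
value_elsewhere = criterion({'delta': 0.8, 'beta': 0.5})
```

Because the resulting object takes a single argument, it can be handed to an optimizer without any wrapper code."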
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### `get_moment_errors_func` Arguments and Example Inputs" ] }, { "cell_type": "markdown", "metadata": { "ExecuteTime": { "end_time": "2019-12-30T12:13:40.138047Z", "start_time": "2019-12-30T12:13:40.132338Z" } }, "source": [ "#### The `params` and `options` Arguments\n", "\n", "The first step to MSM estimation is the simulation of data using a specified model. **respy** simulates data using a vector of parameters `params`, which will be the variable of interest for estimation, and a set of `options` that help define the underlying model.\n", "\n", "**respy** provides a number of example models. For this tutorial we will be using the parameterization from Keane and Wolpin (1994)." ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "ExecuteTime": { "end_time": "2020-01-20T16:34:49.938436Z", "start_time": "2020-01-20T16:34:25.657999Z" }, "pycharm": { "is_executing": false } }, "outputs": [], "source": [ "params, options, df_emp = rp.get_example_model(\"kw_94_one\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### The `calc_moments` Argument\n", "\n", "The `calc_moments` argument is the function that will be used to calculate moments from the simulated data. It can also be specified as a list or dictionary of multiple functions if different sets of moments should be calculated from different functions.\n", "\n", "In this case, we will calculate two sets of moments: choice frequencies and parameters that characterize the wage distribution. The moments are saved to a pandas.DataFrame with time periods as the index and the moments as columns." 
] }, { "cell_type": "code", "execution_count": 21, "metadata": { "ExecuteTime": { "end_time": "2020-01-20T16:34:50.010547Z", "start_time": "2020-01-20T16:34:50.001417Z" }, "pycharm": { "is_executing": false } }, "outputs": [], "source": [ "def calc_moments(df):\n", " choices = df.groupby(\"Period\").Choice.value_counts(normalize=True).unstack()\n", " choices.columns = choices.columns.astype(str)\n", " wages = df.groupby(\"Period\").Wage.describe()[['mean', 'std']]\n", " \n", " return pd.concat([choices, wages], axis=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### The `replace_nans` Argument\n", "\n", "Next, we define `replace_nans`, a function or list of functions that defines how to handle missing values in the moments. It can be set to **None** if no replacements should be made." ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "ExecuteTime": { "end_time": "2020-01-20T16:34:50.024161Z", "start_time": "2020-01-20T16:34:50.016012Z" }, "pycharm": { "is_executing": false } }, "outputs": [], "source": [ "def fill_nans_zero(df):\n", " return df.fillna(0)" ] }, { "cell_type": "markdown", "metadata": { "ExecuteTime": { "end_time": "2019-12-30T12:41:52.010937Z", "start_time": "2019-12-30T12:41:52.007437Z" } }, "source": [ "#### The `empirical_moments` Argument\n", "\n", "The empirical moments are calculated from the observed data and are the targets that the simulated moments should match. The `empirical_moments` argument requires a `pandas.DataFrame` or `pandas.Series` as input. Alternatively, users can input a `list` or `dict` containing `pandas.DataFrames` or `pandas.Series` as items. It is necessary that `calc_moments`, `replace_nans` and `empirical_moments` correspond to each other, i.e. 
`calc_moments` should output moments that are of the same structure as `empirical_moments`.\n", "\n", "For this example we calculate the empirical moments the same way that we calculate the simulated moments, so we can be sure that this condition is fulfilled. " ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "ExecuteTime": { "end_time": "2020-01-20T16:34:50.159003Z", "start_time": "2020-01-20T16:34:50.026807Z" }, "pycharm": { "is_executing": false } }, "outputs": [], "source": [ "empirical_moments = calc_moments(df_emp)\n", "empirical_moments = fill_nans_zero(empirical_moments)" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "ExecuteTime": { "end_time": "2020-01-20T16:34:50.180162Z", "start_time": "2020-01-20T16:34:50.162826Z" }, "pycharm": { "is_executing": false }, "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " a b edu home mean std\n", "Period \n", "0 0.428 0.107 0.447 0.018 16795.770670 2763.572041\n", "1 0.476 0.158 0.319 0.047 16273.795611 3140.603821\n", "2 0.493 0.229 0.234 0.044 16399.690622 3281.481467\n", "3 0.481 0.259 0.220 0.040 16719.991373 3601.925045\n", "4 0.488 0.279 0.191 0.042 17129.490824 3717.262571" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "empirical_moments.head()" ] }, { "cell_type": "markdown", "metadata": { "ExecuteTime": { "end_time": "2019-12-30T12:13:28.165952Z", "start_time": "2019-12-30T12:13:28.163127Z" } }, "source": [ "#### The `weighting_matrix` Argument\n", "\n", "For the msm estimation, a weighting matrix has to be specified. `get_diag_weighting_matrix` allows users to create a diagonal weighting matrix that will match the moment vectors used for estimation. The required inputs are `empirical_moments` that are also used in `get_moment_errors_func` and a set of weights that are of the same form as `empirical_moments`. If no weights are specified, the function will return the identity matrix. " ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "ExecuteTime": { "end_time": "2020-01-20T16:34:50.204860Z", "start_time": "2020-01-20T16:34:50.185782Z" }, "pycharm": { "is_executing": false } }, "outputs": [], "source": [ "weighting_matrix = rp.get_diag_weighting_matrix(empirical_moments)" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "ExecuteTime": { "end_time": "2020-01-20T16:34:50.324845Z", "start_time": "2020-01-20T16:34:50.206916Z" }, "pycharm": { "is_executing": false }, "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " 0 1 2 3 4 5 6 7 8 9 ... 230 231 232 \\\n", "0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 \n", "1 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 \n", "2 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 \n", "3 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 \n", "4 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 \n", ".. ... ... ... ... ... ... ... ... ... ... ... ... ... ... \n", "235 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 \n", "236 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 \n", "237 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 \n", "238 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 \n", "239 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 \n", "\n", " 233 234 235 236 237 238 239 \n", "0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", "1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", "2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", "3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", "4 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", ".. ... ... ... ... ... ... ... \n", "235 0.0 0.0 1.0 0.0 0.0 0.0 0.0 \n", "236 0.0 0.0 0.0 1.0 0.0 0.0 0.0 \n", "237 0.0 0.0 0.0 0.0 1.0 0.0 0.0 \n", "238 0.0 0.0 0.0 0.0 0.0 1.0 0.0 \n", "239 0.0 0.0 0.0 0.0 0.0 0.0 1.0 \n", "\n", "[240 rows x 240 columns]" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.DataFrame(weighting_matrix)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If the user prefers to compute a weighting matrix manually, the respy function `get_flat_moments` may be of use. This function returns the empirical moments as an indexed `pandas.Series` which is the form they will be passed on to the loss function as. 
" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "ExecuteTime": { "end_time": "2020-01-20T16:34:50.335487Z", "start_time": "2020-01-20T16:34:50.326872Z" } }, "outputs": [ { "data": { "text/plain": [ "0_a_0 0.428000\n", "0_a_1 0.476000\n", "0_a_2 0.493000\n", "0_a_3 0.481000\n", "0_a_4 0.488000\n", " ... \n", "0_std_35 12714.065528\n", "0_std_36 13302.883105\n", "0_std_37 13179.870462\n", "0_std_38 13537.023045\n", "0_std_39 13406.485526\n", "Length: 240, dtype: float64" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "flat_empirical_moments = rp.get_flat_moments(empirical_moments)\n", "flat_empirical_moments" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### The `n_simulation_periods` Argument\n", "\n", "The `n_simulation_periods` is part of the simulator that is constructed by **respy** in `get_moment_errors_func`. It dictates the number of periods in the simulated dataset and is not to be confused with `options[\"n_periods\"]` which controls the number of periods for which decision rules are computed. If the desired dataset needs to include only a subset of the total number of periods realized in the model, `n_simulation_periods` can be set to a value lower number of periods.\n", "\n", "This argument, if not needed, can be left out when specifying inputs. By default, the simulator will produce a dataset with the number of periods specified in `options[\"n_periods\"]`. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### The `return_scalar` Argument\n", "\n", "The `return_scalar` argument can be used to return additional function out puts. If `return_scalar` is set to **True**, the function will only return the weighted square product of moment errors (this is also the default). 
If the argument is set to **False**, the function will instead return a dictionary that contains the scalar output, but also the root contributions that can be used to compute it, simulated moments that match the structure of the input empirical moments, and a pandas.DataFrame that can be used for visualization." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### MSM Criterion Function\n", "We can now construct the criterion function for estimation. `get_moment_errors_func` will return a function that holds all elements but the `params` argument fixed and can thus easily be passed on to an optimizer. The function will return a value of 0 if we use the true parameter vector as input." ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "ExecuteTime": { "end_time": "2020-01-20T16:35:10.064895Z", "start_time": "2020-01-20T16:34:50.337390Z" }, "pycharm": { "is_executing": false } }, "outputs": [ { "data": { "text/plain": [ "0.0" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "weighted_sum_squared_errors = rp.get_moment_errors_func(\n", " params=params, \n", " options=options, \n", " calc_moments=calc_moments, \n", " replace_nans = fill_nans_zero,\n", " empirical_moments=empirical_moments, \n", " weighting_matrix = weighting_matrix, \n", " return_scalar=True,\n", ")\n", "\n", "weighted_sum_squared_errors(params)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using a different parameter vector will result in a value different from 0." 
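, "\n", "\n", "The relation between the scalar criterion value and the root contributions can be sketched in a few lines. The snippet assumes a diagonal weighting matrix, so that each root contribution is a moment error scaled by the square root of its weight and the scalar value is the sum of their squares. The helper `root_contributions` is hypothetical and not taken from **respy**; the three empirical and simulated values are the first choice shares for option `a` that appear in the outputs of this guide.

```python
import math

# First three empirical and simulated choice shares for option a.
empirical = [0.428, 0.476, 0.493]
simulated = [0.250, 0.272, 0.297]
# Diagonal of the weighting matrix (identity weights).
weights = [1.0, 1.0, 1.0]


def root_contributions(empirical, simulated, weights):
    # Scale each moment error by the square root of its diagonal weight.
    return [math.sqrt(w) * (e - s) for e, s, w in zip(empirical, simulated, weights)]


roots = root_contributions(empirical, simulated, weights)
value = sum(r ** 2 for r in roots)  # scalar criterion restricted to these three moments

# Identical moments give a criterion value of exactly zero.
value_at_match = sum(r ** 2 for r in root_contributions(empirical, empirical, weights))
```

With identity weights the root contributions are simply the moment errors, which is why mismatched moments always push the criterion above zero."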
] }, { "cell_type": "code", "execution_count": 29, "metadata": { "ExecuteTime": { "end_time": "2020-01-20T16:35:10.074465Z", "start_time": "2020-01-20T16:35:10.067110Z" }, "pycharm": { "is_executing": false } }, "outputs": [], "source": [ "params_sim = params.copy()\n", "params_sim.loc['delta', 'value'] = 0.8" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "ExecuteTime": { "end_time": "2020-01-20T16:35:27.681782Z", "start_time": "2020-01-20T16:35:10.077186Z" }, "pycharm": { "is_executing": false } }, "outputs": [ { "data": { "text/plain": [ "3192548591.9774055" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "weighted_sum_squared_errors(params_sim)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we set `return_scalar` to **False**, the function will instead return a dictionary with more extensive information. " ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "ExecuteTime": { "end_time": "2020-01-20T16:35:48.001371Z", "start_time": "2020-01-20T16:35:27.685029Z" }, "pycharm": { "is_executing": false }, "scrolled": true }, "outputs": [], "source": [ "weighted_errors = rp.get_moment_errors_func(\n", " params=params_sim, \n", " options=options, \n", " calc_moments=calc_moments, \n", " replace_nans = fill_nans_zero,\n", " empirical_moments=empirical_moments, \n", " weighting_matrix = weighting_matrix, \n", " return_scalar=False\n", ")\n", "\n", "outputs = weighted_errors(params_sim)" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dict_keys(['value', 'root_contributions', 'comparison_plot_data', 'simulated_moments'])" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "outputs.keys()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Examples of the dictionary entries are shown below:" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": 
[ { "data": { "text/plain": [ "3192548591.9774055" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "outputs[\"value\"]" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 0.178, 0.204, 0.196, 0.168, 0.165, 0.133, 0.072, 0.048,\n", " -0.006, 0.012])" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "outputs[\"root_contributions\"][0:10]" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " moment_column moment_index value moment_set kind\n", "0 a 0 0.428 0 empirical\n", "1 a 1 0.476 0 empirical\n", "2 a 2 0.493 0 empirical\n", "3 a 3 0.481 0 empirical\n", "4 a 4 0.488 0 empirical" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "outputs[\"comparison_plot_data\"].head()" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " a b edu home mean std\n", "Period \n", "0 0.250 0.013 0.0 0.737 18656.778322 2466.027332\n", "1 0.272 0.009 0.0 0.719 18471.207370 2383.080064\n", "2 0.297 0.012 0.0 0.691 18540.220326 2161.267108\n", "3 0.313 0.011 0.0 0.676 18750.979003 2491.027312\n", "4 0.323 0.009 0.0 0.668 18891.761467 2626.142919" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "outputs[\"simulated_moments\"].head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Inputs as Lists or Dictionaries\n", "\n", "In the example above we used single elements for all inputs i.e. we used one function to calculate moments, one function to replace missing moments and saved all sets of moments in a single pandas.DataFrame. This works well for the example at hand because the inputs are relatively simple, but other applications might require more flexibility. `get_moment_errors_func` thus alternatively accepts inputs of type `list` and `dict`. This way, different sets of moments can be stored separately. Using lists or dictionaries also allows the use of different replacement functions for different moments. \n", "\n", "For the sake of this example, we add another set of moments to the estimation. In addition to the choice frequencies and wage distribution, we include the final education of agents. Here, the index is given by the educational experience agents have accumulated in period 39. The moments are given by the frequency of each level of experience in the dataset. Since this set of moments is not grouped by period, it cannot be saved to a `pandas.DataFrame` with the other moments. We hence give each set of moments its own function and save them to a `list`. The choice frequencies and wage distribution are saved to a `pandas.DataFrame` with multiple columns, the final education is given by a `pandas.Series`.\n", "\n", "Instead of lists, the functions and moments may also be saved to a `dict`. 
Internally, all inputs are converted into dictionaries, so input types should not be mixed." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "ExecuteTime": { "end_time": "2020-01-20T16:35:48.027530Z", "start_time": "2020-01-20T16:35:48.010926Z" }, "pycharm": { "is_executing": false } }, "outputs": [], "source": [ "def calc_choice_freq(df):\n", " return df.groupby(\"Period\").Choice.value_counts(normalize=True).unstack()\n", "\n", "def calc_wage_distr(df):\n", " return df.groupby(['Period'])['Wage'].describe()[['mean', 'std']]\n", "\n", "def calc_final_edu(df):\n", " last_period = max(df.index.get_level_values(1))\n", " return df.xs(last_period, level=1).Experience_Edu.value_counts(normalize=True, sort=False)\n", "\n", "calc_moments = [calc_choice_freq, calc_wage_distr, calc_final_edu]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can additionally specify different replacement functions for each set of moments and save them to a list just like `calc_moments`. However, here we will use the same replacement function for all moments and thus only need to specify one. **respy** will automatically apply this function to all sets of moments.\n", "\n", "Note that this automatic application only works if exactly one replacement function is given. Otherwise, `replace_nans` must be a list of the same length as `calc_moments`, with each replacement function holding the same position as the moment function it corresponds to. In the case of dictionaries, replacement functions should be saved with the same keys as the sets of moments they correspond to." 
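, "\n", "\n", "The matching rule described above can be sketched with plain dictionaries. The helper `apply_replacements` and the toy moment sets are hypothetical and only illustrate the rule that a single function is applied to every set of moments, while a `dict` of functions is matched to the sets by key. **respy** performs this matching internally, and its implementation may differ.

```python
import math


def fill_nans_zero(values):
    # Replace missing moments with zero.
    return [0.0 if math.isnan(v) else v for v in values]


def fill_nans_one(values):
    # Replace missing moments with one.
    return [1.0 if math.isnan(v) else v for v in values]


def apply_replacements(moment_sets, replace_nans):
    # A single callable is applied to every set of moments.
    if callable(replace_nans):
        replace_nans = {key: replace_nans for key in moment_sets}
    # Otherwise each set needs a replacement function stored under the same key.
    if set(replace_nans) != set(moment_sets):
        raise ValueError('replace_nans and the moment sets must share the same keys.')
    return {key: replace_nans[key](values) for key, values in moment_sets.items()}


moment_sets = {'choice_freq': [0.4, math.nan], 'final_edu': [math.nan, 0.6]}

same_for_all = apply_replacements(moment_sets, fill_nans_zero)
per_set = apply_replacements(
    moment_sets, {'choice_freq': fill_nans_zero, 'final_edu': fill_nans_one}
)
```

Raising an error on mismatched keys is a design choice of this sketch; it makes a forgotten replacement function visible immediately instead of silently leaving missing values in place."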
] }, { "cell_type": "code", "execution_count": 14, "metadata": { "ExecuteTime": { "end_time": "2020-01-20T16:35:48.042741Z", "start_time": "2020-01-20T16:35:48.030370Z" }, "pycharm": { "is_executing": false } }, "outputs": [], "source": [ "def fill_nans_zero(df):\n", " return df.fillna(0)" ] }, { "cell_type": "markdown", "metadata": { "ExecuteTime": { "end_time": "2020-01-12T09:47:28.537388Z", "start_time": "2020-01-12T09:47:28.529692Z" } }, "source": [ "We now calculate the `empirical_moments`. They are saved to a list as well. We can calculate the `weighting_matrix` as before." ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "ExecuteTime": { "end_time": "2020-01-20T16:36:09.205272Z", "start_time": "2020-01-20T16:35:48.045669Z" }, "pycharm": { "is_executing": false } }, "outputs": [], "source": [ "params, options, df = rp.get_example_model(\"kw_94_one\")\n", "empirical_moments = [calc_choice_freq(df), calc_wage_distr(df), calc_final_edu(df)]\n", "empirical_moments = [fill_nans_zero(df) for df in empirical_moments]" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "ExecuteTime": { "end_time": "2020-01-20T16:36:09.219941Z", "start_time": "2020-01-20T16:36:09.207328Z" }, "pycharm": { "is_executing": false } }, "outputs": [], "source": [ "weighting_matrix = rp.get_diag_weighting_matrix(empirical_moments)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we can construct the MSM criterion from the defined inputs." 
] }, { "cell_type": "code", "execution_count": 17, "metadata": { "ExecuteTime": { "end_time": "2020-01-20T16:36:29.989040Z", "start_time": "2020-01-20T16:36:09.222191Z" }, "pycharm": { "is_executing": false } }, "outputs": [ { "data": { "text/plain": [ "0.0" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "weighted_sum_squared_errors = rp.get_moment_errors_func(\n", " params=params, \n", " options=options, \n", " calc_moments=calc_moments, \n", " replace_nans = fill_nans_zero,\n", " empirical_moments=empirical_moments, \n", " weighting_matrix = weighting_matrix, \n", " return_scalar=True\n", ")\n", "\n", "weighted_sum_squared_errors(params)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The criterion value for the alternative parameter vector `params_sim` deviates slightly from the value in the introductory example because we added an additional set of moments." ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "ExecuteTime": { "end_time": "2020-01-20T16:36:49.185204Z", "start_time": "2020-01-20T16:36:29.991864Z" }, "pycharm": { "is_executing": false } }, "outputs": [ { "data": { "text/plain": [ "3192548592.3866115" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "weighted_sum_squared_errors(params_sim)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## References\n", "\n", "- Keane, M. P. and Wolpin, K. I. (1994). [The Solution and Estimation of Discrete Choice Dynamic Programming Models by Simulation and Interpolation: Monte Carlo Evidence](https://doi.org/10.2307/2109768). *The Review of Economics and Statistics*, 76(4): 648-672.\n", "\n", "- McFadden, D. (1989). [A Method of Simulated Moments for Estimation of Discrete Response Models without Numerical Integration](https://jstor.org/stable/1913621). 
*Econometrica*, 57(5): 995-1026.\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.10" }, "pycharm": { "stem_cell": { "cell_type": "raw", "metadata": { "collapsed": false }, "source": [] } }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": true, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": { "height": "calc(100% - 180px)", "left": "10px", "top": "150px", "width": "308.8px" }, "toc_section_display": true, "toc_window_display": true }, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())" }, "r": { "delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "types_to_exclude": [ "module", "function", "builtin_function_or_method", "instance", "_Feature" ], "window_display": false } }, "nbformat": 4, "nbformat_minor": 4 }