{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Method of Simulated Moments Criterion"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-20T16:34:25.655353Z",
     "start_time": "2020-01-20T16:34:22.511828Z"
    },
    "pycharm": {
     "is_executing": false
    }
   },
   "outputs": [],
   "source": [
    "import pandas as pd \n",
    "import respy as rp"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**respy** can construct a criterion function for estimation with the Method of Simulated Moments (MSM) (McFadden, 1989) that can easily be passed on to an optimizer for estimation. MSM estimation requires a number of calibration choices and **respy**'s interface is designed to allow users as much flexibility as possible when setting up a criterion function for estimation. This guide discusses the functions of the interface with focus on the different options for specifying inputs. For a concise overview of all functions, we refer users to the **respy** API."
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {},
   "source": [
    "<divto class=\"d-flex flex-row gs-torefguide\">\n",
    "    <span class=\"badge badge-info\">API</span>\n",
    "\n",
    "    For all functions see <a\n",
    "    href=\"../autoapi/index.html\">respy API</a>.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Introductory Example\n",
    "\n",
    "The following section discusses all the arguments of the interface's core function `get_moment_errors_func` in detail using an example model. The function processes all arguments needed for estimation such as the empirical moments and weighting matrix to construct a `functools.partial` which only requires the parameter vector as input and is thus ideal to use for optimization."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### `get_moment_errors_func` Arguments and Example Inputs"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-12-30T12:13:40.138047Z",
     "start_time": "2019-12-30T12:13:40.132338Z"
    }
   },
   "source": [
    "#### The `params` and `options` Arguments\n",
    "\n",
    "The first step to MSM estimation is the simulation of data using a specified model. **respy** simulates data using a vector of parameters `params`, which will be the variable of interest for estimation, and a set of `options` that help define the underlying model.\n",
    "\n",
    "**respy** provides a number of example models. For this tutorial we will be using the parameterization from Keane and Wolpin (1994)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-20T16:34:49.938436Z",
     "start_time": "2020-01-20T16:34:25.657999Z"
    },
    "pycharm": {
     "is_executing": false
    }
   },
   "outputs": [],
   "source": [
    "params, options, df_emp = rp.get_example_model(\"kw_94_one\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### The `calc_moments` Argument\n",
    "\n",
    "The `calc_moments` argument is the function that will be used to calculate moments from the simulated data. It can also be specified as a list or dictionary of multiple functions if different sets of moments should be calculated from different functions.\n",
    "\n",
    "In this case, we will calculate two sets of moments: choice frequencies and parameters that characterize the wage distribution. The moments are saved to a pandas.DataFrame with time periods as the index and the moments as columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-20T16:34:50.010547Z",
     "start_time": "2020-01-20T16:34:50.001417Z"
    },
    "pycharm": {
     "is_executing": false
    }
   },
   "outputs": [],
   "source": [
    "def calc_moments(df):\n",
    "    choices = df.groupby(\"Period\").Choice.value_counts(normalize=True).unstack()\n",
    "    choices.columns = choices.columns.astype(str)\n",
    "    wages = df.groupby(\"Period\").Wage.describe()[['mean', 'std']]\n",
    "    \n",
    "    return pd.concat([choices, wages], axis=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "####  The `replace_nans` Argument\n",
    "\n",
    "Next we define *replace_nans* is a function or list of functions that define how to handle missings in the data. It can be set to **None** if no replacements should be made."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-20T16:34:50.024161Z",
     "start_time": "2020-01-20T16:34:50.016012Z"
    },
    "pycharm": {
     "is_executing": false
    }
   },
   "outputs": [],
   "source": [
    "def fill_nans_zero(df):\n",
    "    return df.fillna(0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-12-30T12:41:52.010937Z",
     "start_time": "2019-12-30T12:41:52.007437Z"
    }
   },
   "source": [
    "#### The `empirical_moments` Argument\n",
    "\n",
    "The empirical moments are the moments that are calculated from the observed data which the simulated moments should be matched to. The `empirical_moments` argument requires a `pandas.DataFrame` or `pandas.Series` as inputs. Alternatively, users can input a `list` or `dict` containing `pandas.DataFrames` or `pandas.Series` as items. It is necessary that `calc_moments`, `replace_nans` and `empirical_moments` correspond to each other i.e. `calc_moments` should output moments that are of the same structure as `empirical_moments`.\n",
    "\n",
    "For this example we calculate the empirical moments the same way that we calculate the simulated moments, so we can be sure that this condition is fulfilled. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-20T16:34:50.159003Z",
     "start_time": "2020-01-20T16:34:50.026807Z"
    },
    "pycharm": {
     "is_executing": false
    }
   },
   "outputs": [],
   "source": [
    "empirical_moments = calc_moments(df_emp)\n",
    "empirical_moments = fill_nans_zero(empirical_moments)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-20T16:34:50.180162Z",
     "start_time": "2020-01-20T16:34:50.162826Z"
    },
    "pycharm": {
     "is_executing": false
    },
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "      <th>edu</th>\n",
       "      <th>home</th>\n",
       "      <th>mean</th>\n",
       "      <th>std</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Period</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.428</td>\n",
       "      <td>0.107</td>\n",
       "      <td>0.447</td>\n",
       "      <td>0.018</td>\n",
       "      <td>16795.770670</td>\n",
       "      <td>2763.572041</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.476</td>\n",
       "      <td>0.158</td>\n",
       "      <td>0.319</td>\n",
       "      <td>0.047</td>\n",
       "      <td>16273.795611</td>\n",
       "      <td>3140.603821</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.493</td>\n",
       "      <td>0.229</td>\n",
       "      <td>0.234</td>\n",
       "      <td>0.044</td>\n",
       "      <td>16399.690622</td>\n",
       "      <td>3281.481467</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.481</td>\n",
       "      <td>0.259</td>\n",
       "      <td>0.220</td>\n",
       "      <td>0.040</td>\n",
       "      <td>16719.991373</td>\n",
       "      <td>3601.925045</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.488</td>\n",
       "      <td>0.279</td>\n",
       "      <td>0.191</td>\n",
       "      <td>0.042</td>\n",
       "      <td>17129.490824</td>\n",
       "      <td>3717.262571</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            a      b    edu   home          mean          std\n",
       "Period                                                       \n",
       "0       0.428  0.107  0.447  0.018  16795.770670  2763.572041\n",
       "1       0.476  0.158  0.319  0.047  16273.795611  3140.603821\n",
       "2       0.493  0.229  0.234  0.044  16399.690622  3281.481467\n",
       "3       0.481  0.259  0.220  0.040  16719.991373  3601.925045\n",
       "4       0.488  0.279  0.191  0.042  17129.490824  3717.262571"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "empirical_moments.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2019-12-30T12:13:28.165952Z",
     "start_time": "2019-12-30T12:13:28.163127Z"
    }
   },
   "source": [
    "#### The `weighting_matrix` Argument\n",
    "\n",
    "For the msm estimation, a weighting matrix has to be specified. `get_diag_weighting_matrix` allows users to  create a diagonal weighting matrix that will match the moment vectors used for estimation. The required inputs are `empirical_moments` that are also used in `get_moment_errors_func` and a set of weights that are of the same form as `empirical_moments`. If no weights are specified, the function will return the identity matrix. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-20T16:34:50.204860Z",
     "start_time": "2020-01-20T16:34:50.185782Z"
    },
    "pycharm": {
     "is_executing": false
    }
   },
   "outputs": [],
   "source": [
    "weighting_matrix = rp.get_diag_weighting_matrix(empirical_moments)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-20T16:34:50.324845Z",
     "start_time": "2020-01-20T16:34:50.206916Z"
    },
    "pycharm": {
     "is_executing": false
    },
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "      <th>5</th>\n",
       "      <th>6</th>\n",
       "      <th>7</th>\n",
       "      <th>8</th>\n",
       "      <th>9</th>\n",
       "      <th>...</th>\n",
       "      <th>230</th>\n",
       "      <th>231</th>\n",
       "      <th>232</th>\n",
       "      <th>233</th>\n",
       "      <th>234</th>\n",
       "      <th>235</th>\n",
       "      <th>236</th>\n",
       "      <th>237</th>\n",
       "      <th>238</th>\n",
       "      <th>239</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>...</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>...</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>...</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>...</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>...</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>235</th>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>...</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>236</th>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>...</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>237</th>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>...</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>238</th>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>...</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>239</th>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>...</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>240 rows × 240 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "     0    1    2    3    4    5    6    7    8    9    ...  230  231  232  \\\n",
       "0    1.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...  0.0  0.0  0.0   \n",
       "1    0.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...  0.0  0.0  0.0   \n",
       "2    0.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...  0.0  0.0  0.0   \n",
       "3    0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0  ...  0.0  0.0  0.0   \n",
       "4    0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0  ...  0.0  0.0  0.0   \n",
       "..   ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...   \n",
       "235  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...  0.0  0.0  0.0   \n",
       "236  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...  0.0  0.0  0.0   \n",
       "237  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...  0.0  0.0  0.0   \n",
       "238  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...  0.0  0.0  0.0   \n",
       "239  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...  0.0  0.0  0.0   \n",
       "\n",
       "     233  234  235  236  237  238  239  \n",
       "0    0.0  0.0  0.0  0.0  0.0  0.0  0.0  \n",
       "1    0.0  0.0  0.0  0.0  0.0  0.0  0.0  \n",
       "2    0.0  0.0  0.0  0.0  0.0  0.0  0.0  \n",
       "3    0.0  0.0  0.0  0.0  0.0  0.0  0.0  \n",
       "4    0.0  0.0  0.0  0.0  0.0  0.0  0.0  \n",
       "..   ...  ...  ...  ...  ...  ...  ...  \n",
       "235  0.0  0.0  1.0  0.0  0.0  0.0  0.0  \n",
       "236  0.0  0.0  0.0  1.0  0.0  0.0  0.0  \n",
       "237  0.0  0.0  0.0  0.0  1.0  0.0  0.0  \n",
       "238  0.0  0.0  0.0  0.0  0.0  1.0  0.0  \n",
       "239  0.0  0.0  0.0  0.0  0.0  0.0  1.0  \n",
       "\n",
       "[240 rows x 240 columns]"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.DataFrame(weighting_matrix)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If the user prefers to compute a weighting matrix manually, the respy function `get_flat_moments` may be of use. This function returns the empirical moments as an indexed `pandas.Series` which is the form they will be passed on to the loss function as. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-20T16:34:50.335487Z",
     "start_time": "2020-01-20T16:34:50.326872Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0_a_0           0.428000\n",
       "0_a_1           0.476000\n",
       "0_a_2           0.493000\n",
       "0_a_3           0.481000\n",
       "0_a_4           0.488000\n",
       "                ...     \n",
       "0_std_35    12714.065528\n",
       "0_std_36    13302.883105\n",
       "0_std_37    13179.870462\n",
       "0_std_38    13537.023045\n",
       "0_std_39    13406.485526\n",
       "Length: 240, dtype: float64"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "flat_empirical_moments = rp.get_flat_moments(empirical_moments)\n",
    "flat_empirical_moments"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### The `n_simulation_periods` Argument\n",
    "\n",
    "The `n_simulation_periods` is part of the simulator that is constructed by **respy** in `get_moment_errors_func`. It dictates the number of periods in the simulated dataset and is not to be confused with `options[\"n_periods\"]` which controls the number of periods for which decision rules are computed. If the desired dataset needs to include only a subset of the total number of periods realized in the model, `n_simulation_periods` can be set to a value lower number of periods.\n",
    "\n",
    "This argument, if not needed, can be left out when specifying inputs. By default, the simulator will produce a dataset with the number of periods specified in `options[\"n_periods\"]`. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### The `return_scalar` Argument\n",
    "\n",
    "The `return_scalar` argument can be used to return additional function out puts. If `return_scalar` is set to **True**, the function will only return the weighted square product of moment errors (this is also the default). If the argument is set to **False**, the function will instead return a dictionary that contains the scalar output, but also the root contributions that can be used to compute it, simulated moments that match the structure of the input empirical moments, and a pandas.DataFrame that can be used for visualization."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### MSM Criterion Function\n",
    "We can now construct the criterion function for estimation. `get_moment_errors_func` will return a function that holds all elements but the `params` argument fixed and can thus easily be passed on to an optimizer. The function will return a value of 0 if we use the true parameter vector as input."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-20T16:35:10.064895Z",
     "start_time": "2020-01-20T16:34:50.337390Z"
    },
    "pycharm": {
     "is_executing": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.0"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "weighted_sum_squared_errors = rp.get_moment_errors_func(\n",
    "    params=params, \n",
    "    options=options, \n",
    "    calc_moments=calc_moments, \n",
    "    replace_nans = fill_nans_zero,\n",
    "    empirical_moments=empirical_moments, \n",
    "    weighting_matrix = weighting_matrix, \n",
    "    return_scalar=True,\n",
    ")\n",
    "\n",
    "weighted_sum_squared_errors(params)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Using a different parameter vector will result in a value different from 0."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-20T16:35:10.074465Z",
     "start_time": "2020-01-20T16:35:10.067110Z"
    },
    "pycharm": {
     "is_executing": false
    }
   },
   "outputs": [],
   "source": [
    "params_sim = params.copy()\n",
    "params_sim.loc['delta', 'value'] = 0.8"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-20T16:35:27.681782Z",
     "start_time": "2020-01-20T16:35:10.077186Z"
    },
    "pycharm": {
     "is_executing": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "3192548591.9774055"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "weighted_sum_squared_errors(params_sim)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If we set `return_scalar` to **False**, the function will return the a dictionary with more extensive information instead. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-20T16:35:48.001371Z",
     "start_time": "2020-01-20T16:35:27.685029Z"
    },
    "pycharm": {
     "is_executing": false
    },
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "weighted_errors = rp.get_moment_errors_func(\n",
    "    params=params_sim, \n",
    "    options=options, \n",
    "    calc_moments=calc_moments, \n",
    "    replace_nans = fill_nans_zero,\n",
    "    empirical_moments=empirical_moments, \n",
    "    weighting_matrix = weighting_matrix, \n",
    "    return_scalar=False\n",
    ")\n",
    "\n",
    "outputs=weighted_errors(params_sim)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "dict_keys(['value', 'root_contributions', 'comparison_plot_data', 'simulated_moments'])"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "outputs.keys()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Examples of the dictionary entries are shown below:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "3192548591.9774055"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "outputs[\"value\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 0.178,  0.204,  0.196,  0.168,  0.165,  0.133,  0.072,  0.048,\n",
       "       -0.006,  0.012])"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "outputs[\"root_contributions\"][0:10]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>moment_column</th>\n",
       "      <th>moment_index</th>\n",
       "      <th>value</th>\n",
       "      <th>moment_set</th>\n",
       "      <th>kind</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>a</td>\n",
       "      <td>0</td>\n",
       "      <td>0.428</td>\n",
       "      <td>0</td>\n",
       "      <td>empirical</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>a</td>\n",
       "      <td>1</td>\n",
       "      <td>0.476</td>\n",
       "      <td>0</td>\n",
       "      <td>empirical</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>a</td>\n",
       "      <td>2</td>\n",
       "      <td>0.493</td>\n",
       "      <td>0</td>\n",
       "      <td>empirical</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>a</td>\n",
       "      <td>3</td>\n",
       "      <td>0.481</td>\n",
       "      <td>0</td>\n",
       "      <td>empirical</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>a</td>\n",
       "      <td>4</td>\n",
       "      <td>0.488</td>\n",
       "      <td>0</td>\n",
       "      <td>empirical</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  moment_column  moment_index  value moment_set       kind\n",
       "0             a             0  0.428          0  empirical\n",
       "1             a             1  0.476          0  empirical\n",
       "2             a             2  0.493          0  empirical\n",
       "3             a             3  0.481          0  empirical\n",
       "4             a             4  0.488          0  empirical"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "outputs[\"comparison_plot_data\"].head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "      <th>edu</th>\n",
       "      <th>home</th>\n",
       "      <th>mean</th>\n",
       "      <th>std</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Period</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.250</td>\n",
       "      <td>0.013</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.737</td>\n",
       "      <td>18656.778322</td>\n",
       "      <td>2466.027332</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.272</td>\n",
       "      <td>0.009</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.719</td>\n",
       "      <td>18471.207370</td>\n",
       "      <td>2383.080064</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.297</td>\n",
       "      <td>0.012</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.691</td>\n",
       "      <td>18540.220326</td>\n",
       "      <td>2161.267108</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.313</td>\n",
       "      <td>0.011</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.676</td>\n",
       "      <td>18750.979003</td>\n",
       "      <td>2491.027312</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.323</td>\n",
       "      <td>0.009</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.668</td>\n",
       "      <td>18891.761467</td>\n",
       "      <td>2626.142919</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            a      b  edu   home          mean          std\n",
       "Period                                                     \n",
       "0       0.250  0.013  0.0  0.737  18656.778322  2466.027332\n",
       "1       0.272  0.009  0.0  0.719  18471.207370  2383.080064\n",
       "2       0.297  0.012  0.0  0.691  18540.220326  2161.267108\n",
       "3       0.313  0.011  0.0  0.676  18750.979003  2491.027312\n",
       "4       0.323  0.009  0.0  0.668  18891.761467  2626.142919"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "outputs[\"simulated_moments\"].head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Inputs as Lists or Dictionaries\n",
    "\n",
    "In the example above we used single elements for all inputs i.e. we used one function to calculate moments, one function to replace missing moments and saved all sets of moments in a single pandas.DataFrame. This works well for the example at hand because the inputs are relatively simple, but other applications might require more flexibility. `get_moment_errors_func` thus alternatively accepts inputs of type `list` and `dict`. This way, different sets of moments can be stored separately. Using lists or dictionaries also allows the use of different replacement functions for different moments. \n",
    "\n",
    "For the sake of this example, we add another set of moments to the estimation. In addition to the choice frequencies and wage distribution, we include the final education of agents. Here, the index is given by the educational experience agents have accumulated in period 39. The moments are given by the frequency of each level of experience in the dataset. Since this set of moments is not grouped by period, it cannot be saved to a `pandas.DataFrame` with the other moments. We hence give each set of moments its own function and save them to a `list`. The choice frequencies and wage distribution are saved to a `pandas.DataFrame` with multiple columns, the final education is given by a `pandas.Series`.\n",
    "\n",
    "Instead of lists, the functions and moments may also be saved to a `dict`. Internally, all inputs are converted into dictionaries, so input types should not be mixed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-20T16:35:48.027530Z",
     "start_time": "2020-01-20T16:35:48.010926Z"
    },
    "pycharm": {
     "is_executing": false
    }
   },
   "outputs": [],
   "source": [
    "def calc_choice_freq(df):\n",
    "    return df.groupby(\"Period\").Choice.value_counts(normalize=True).unstack()\n",
    "\n",
    "def calc_wage_distr(df):\n",
    "    return df.groupby(['Period'])['Wage'].describe()[['mean', 'std']]\n",
    "\n",
    "def calc_final_edu(df):\n",
    "    last_period = max(df.index.get_level_values(1))\n",
    "    return df.xs(last_period, level=1).Experience_Edu.value_counts(normalize=True,sort=False)\n",
    "\n",
    "calc_moments = [calc_choice_freq, calc_wage_distr, calc_final_edu]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can additionally specify different replacement functions for each set of moments and save them to a list just like `calc_moments`. However, here we will use the same replacement function for all moments and thus just need to specify one. **respy** will automatically apply this function to all sets of moments.\n",
    "\n",
    "Note that this only works if only one replacement function is given. Otherwise `replace_nans` must be a list of the same length as `calc_moments` with each replacement function holding the same position as the moment function it corresponds to. In the case of dictionaries, replacement functions should be saved with the same keys as set of moments they correspond to."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-20T16:35:48.042741Z",
     "start_time": "2020-01-20T16:35:48.030370Z"
    },
    "pycharm": {
     "is_executing": false
    }
   },
   "outputs": [],
   "source": [
    "def fill_nans_zero(df):\n",
    "    return df.fillna(0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-12T09:47:28.537388Z",
     "start_time": "2020-01-12T09:47:28.529692Z"
    }
   },
   "source": [
    "We now calculate the `empirical_moments`. They are saved to a list as well. We can calculate the `weighting_matrix` as before."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-20T16:36:09.205272Z",
     "start_time": "2020-01-20T16:35:48.045669Z"
    },
    "pycharm": {
     "is_executing": false
    }
   },
   "outputs": [],
   "source": [
    "params, options, df = rp.get_example_model(\"kw_94_one\")\n",
    "empirical_moments = [calc_choice_freq(df), calc_wage_distr(df), calc_final_edu(df)]\n",
    "empirical_moments = [fill_nans_zero(df) for df in empirical_moments]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-20T16:36:09.219941Z",
     "start_time": "2020-01-20T16:36:09.207328Z"
    },
    "pycharm": {
     "is_executing": false
    }
   },
   "outputs": [],
   "source": [
    "weighting_matrix = rp.get_diag_weighting_matrix(empirical_moments)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, we can construct the MSM criterion from the defined inputs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-20T16:36:29.989040Z",
     "start_time": "2020-01-20T16:36:09.222191Z"
    },
    "pycharm": {
     "is_executing": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.0"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "weighted_sum_squared_errors = rp.get_moment_errors_func(\n",
    "    params=params, \n",
    "    options=options, \n",
    "    calc_moments=calc_moments, \n",
    "    replace_nans = fill_nans_zero,\n",
    "    empirical_moments=empirical_moments, \n",
    "    weighting_matrix = weighting_matrix, \n",
    "    return_scalar=True\n",
    ")\n",
    "\n",
    "weighted_sum_squared_errors(params)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The result for the simulated moments slightly deviates from the introductory example because we added an additional set of moments."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-01-20T16:36:49.185204Z",
     "start_time": "2020-01-20T16:36:29.991864Z"
    },
    "pycharm": {
     "is_executing": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "3192548592.3866115"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "weighted_sum_squared_errors(params_sim)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## References\n",
    "\n",
    "- Keane, M. P. and  Wolpin, K. I. (1994). [The Solution and Estimation of Discrete Choice Dynamic Programming Models by Simulation and Interpolation: Monte Carlo Evidence](https://doi.org/10.2307/2109768). *The Review of Economics and Statistics*, 76(4): 648-672.\n",
    "\n",
    "- McFadden, D. (1989). [A Method of Simulated Moments for Estimation of Discrete Response Models without Numerical Integration](https://jstor.org/stable/1913621). *Econometrica*, 995-1026.\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.10"
  },
  "pycharm": {
   "stem_cell": {
    "cell_type": "raw",
    "metadata": {
     "collapsed": false
    },
    "source": []
   }
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": true,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {
    "height": "calc(100% - 180px)",
    "left": "10px",
    "top": "150px",
    "width": "308.8px"
   },
   "toc_section_display": true,
   "toc_window_display": true
  },
  "varInspector": {
   "cols": {
    "lenName": 16,
    "lenType": 16,
    "lenVar": 40
   },
   "kernels_config": {
    "python": {
     "delete_cmd_postfix": "",
     "delete_cmd_prefix": "del ",
     "library": "var_list.py",
     "varRefreshCmd": "print(var_dic_list())"
    },
    "r": {
     "delete_cmd_postfix": ") ",
     "delete_cmd_prefix": "rm(",
     "library": "var_list.r",
     "varRefreshCmd": "cat(var_dic_list()) "
    }
   },
   "types_to_exclude": [
    "module",
    "function",
    "builtin_function_or_method",
    "instance",
    "_Feature"
   ],
   "window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}