{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Specifying a Model\n", "\n", "One of the core features of **respy** is its flexible modeling capabilities. The guide on *example models* showcases a collection of economic models that have already been implemented and can be accessed freely. " ] }, { "cell_type": "raw", "metadata": {}, "source": [ "
\n", " To how-to guide\n", "\n", " Find out more about example models in How to load example models.\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "However, **respy** can also be used to implement models from scratch. This guide illustrates how to translate an economic model and underlying mathematical relations to the core objects in **respy**: `params` and `options`. As a guiding example we will follow the seminal work of Keane and Wolpin (1994) and replicate their dynamic discrete choice model of schooling and occupational choice. Insights carry over to the conceptually close model used by Keane and Wolpin (1997).\n", "\n", "---\n", " \n", "**Note:** Only models of the Eckstein-Keane-Wolpin (EKW) class are implementable in **respy**. You can find further information about this modeling framework in the explanations section of this documentation.\n", " " ] }, { "cell_type": "raw", "metadata": {}, "source": [ "
\n", " Explanations\n", "\n", " Find out more about EKW models in the Explanations.\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Components to modeling\n", "\n", "See the article in the explanations section linked below to find information on the exact model specification of Keane and Wolpin (1994)." ] }, { "cell_type": "raw", "metadata": {}, "source": [ "
\n", " Explanations\n", "\n", " Find the details about this model specification in Model in Keane and Wolpin (1994).\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How can we map the equations from the model into **respy** to construct a discrete choice dynamic programming model that allows us to estimate the structural parameters? \n", "\n", "A model in **respy** is defined by two components:\n", "\n", "1. The `params` DataFrame, where model parameters reside. It should be specified as a [pandas.DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html). \n", "\n", "2. The `options`, which specify the settings for the model solution and further restrictions on the model structure. They are defined in a Python dictionary. Examples of components that enter the options include the number of periods, the type of numerical integration, and infeasible states. \n", "\n", "In the next steps, we will examine these two components in detail to illustrate how they mirror the model outlined above. Since the model of Keane and Wolpin (1994) is already implemented, we can simply load it into memory." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "ExecuteTime": { "end_time": "2020-12-11T14:39:25.077275Z", "start_time": "2020-12-11T14:39:21.729183Z" } }, "outputs": [], "source": [ "import respy as rp" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "ExecuteTime": { "end_time": "2020-12-11T14:39:49.146639Z", "start_time": "2020-12-11T14:39:25.078716Z" } }, "outputs": [], "source": [ "params, options, data = rp.get_example_model(\"kw_94_one\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "Note that when you specify these objects yourself, doing so in separate files might facilitate your workflow. For example, `params` could be loaded from a .csv-file and `options` from a .yaml-file.\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Specifying the `params`\n", "\n", "We first inspect the `params` DataFrame. 
It contains all the parameters that enter the structural model. Usually, these parameters will be estimated, but this is not mandatory. For instance, the parameters of the shock distribution may be set exogenously rather than estimated. The `params` DataFrame may also contain auxiliary parameters that aid simulation but are not directly related to the model. **respy** allows considerable freedom in designing reward functions and naming parameters. However, certain rules must be followed so that **respy** can process a model correctly. Below, we discuss each parameter group of our example `params` DataFrame to outline how parameters can be specified." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "ExecuteTime": { "end_time": "2020-12-11T14:39:49.178321Z", "start_time": "2020-12-11T14:39:49.148433Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " value \\\n", "category name \n", "delta delta 0.9500 \n", "wage_a constant 9.2100 \n", " exp_edu 0.0380 \n", " exp_a 0.0330 \n", " exp_a_square -0.0005 \n", " exp_b 0.0000 \n", " exp_b_square 0.0000 \n", "wage_b constant 8.4800 \n", " exp_edu 0.0700 \n", " exp_b 0.0670 \n", " exp_b_square -0.0010 \n", " exp_a 0.0220 \n", " exp_a_square -0.0005 \n", "nonpec_edu constant 0.0000 \n", " at_least_twelve_exp_edu 0.0000 \n", " not_edu_last_period -4000.0000 \n", "nonpec_home constant 17750.0000 \n", "shocks_sdcorr sd_a 0.2000 \n", " sd_b 0.2500 \n", " sd_edu 1500.0000 \n", " sd_home 1500.0000 \n", " corr_b_a 0.0000 \n", " corr_edu_a 0.0000 \n", " corr_edu_b 0.0000 \n", " corr_home_a 0.0000 \n", " corr_home_b 0.0000 \n", " corr_home_edu 0.0000 \n", "lagged_choice_1_edu probability 1.0000 \n", "initial_exp_edu_10 probability 1.0000 \n", "maximum_exp edu 20.0000 \n", "\n", " comment \n", "category name \n", "delta delta discount factor \n", "wage_a constant log of rental price \n", " exp_edu return to an additional year of schooling \n", " exp_a return to same sector experience \n", " exp_a_square return to same sector, quadratic experience \n", " exp_b return to other sector experience \n", " exp_b_square return to other sector, quadratic experience \n", "wage_b constant log of rental price \n", " exp_edu return to an additional year of schooling \n", " exp_b return to same sector experience \n", " exp_b_square return to same sector, quadratic experience \n", " exp_a return to other sector experience \n", " exp_a_square return to other sector, quadratic experience \n", "nonpec_edu constant constant reward for choosing education \n", " at_least_twelve_exp_edu reward for going to college (tuition, etc.) \n", " not_edu_last_period reward for going back to school \n", "nonpec_home constant constant reward of non-market alternative \n", "shocks_sdcorr sd_a Element 1,1 of standard-deviation/correlation ... 
\n", " sd_b Element 2,2 of standard-deviation/correlation ... \n", " sd_edu Element 3,3 of standard-deviation/correlation ... \n", " sd_home Element 4,4 of standard-deviation/correlation ... \n", " corr_b_a Element 2,1 of standard-deviation/correlation ... \n", " corr_edu_a Element 3,1 of standard-deviation/correlation ... \n", " corr_edu_b Element 3,2 of standard-deviation/correlation ... \n", " corr_home_a Element 4,1 of standard-deviation/correlation ... \n", " corr_home_b Element 4,2 of standard-deviation/correlation ... \n", " corr_home_edu Element 4,3 of standard-deviation/correlation ... \n", "lagged_choice_1_edu probability Probability that the first lagged choice is ed... \n", "initial_exp_edu_10 probability Probability that the initial level of educatio... \n", "maximum_exp edu Maximum level of experience for education (opt... " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "params" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Index structure\n", "\n", "The `params` DataFrame needs to abide by a specific index structure:\n", "\n", "- **Index**: The DataFrame has a MultiIndex with two levels. The levels have to be named `category` and `name`. Categories need to be unique. Names may be repeated but never within the same category. This ensures that each parameter is uniquely identified in the `params` DataFrame.\n", "- **Columns**: The parameter value needs to be saved in a column called `value`. The `params` DataFrame may contain other columns like the comment column above. They do not influence the model but can be useful, for example in parameter estimation, where information like bounds may need to be specified as additional columns.\n", "\n", "### Discounting\n", "\n", "In **respy**, the discount factor has a pre-defined and immutable name: `delta`."
] }, { "cell_type": "code", "execution_count": 4, "metadata": { "ExecuteTime": { "end_time": "2020-12-11T14:39:49.195230Z", "start_time": "2020-12-11T14:39:49.181367Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " value comment\n", "name \n", "delta 0.95 discount factor" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "params.loc[\"delta\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**respy** also supports hyperbolic discounting. You can implement it in your model by adding a `category` and `name` called `beta` to your parameter vector." ] }, { "cell_type": "raw", "metadata": {}, "source": [ "
\n", " How-to Guide\n", "\n", " Find out how to implement hyperbolic discounting in Impatient Robinson.\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Choice Rewards\n", "\n", "The structural model consists of two building blocks: states and choices. Choices in general can have two types of rewards: \n", "\n", "- **pecuniary rewards**, e.g. wages, with corresponding `category`: `wage_{choice}`.\n", "- **non-pecuniary rewards**, e.g. intrinsic value of education, with corresponding `category`: `nonpec_{choice}`.\n", "\n", "Choices can be named freely, but it is important to use the appropriate prefixes so **respy** can process the model accordingly. In our example above, choices have either exclusively pecuniary rewards (occupation *A* and *B*) or non-pecuniary rewards (*education* and *home*), but **respy** also allows for combinations of both types to define reward functions. Each parameter in `params` then corresponds to a parameter in the reward functions.\n", "\n", "#### Example: Returns to Occupation A\n", "\n", "Take for example the reward function for choosing to work in occupation *A*:\n", "\n", "$$\n", "R_1(t) = w_{1t} = r_{1} \\exp\\{\\alpha_{10} + \\alpha_{11}s_{t} + \\alpha_{12}x_{1t} + \\alpha_{13}x^2_{1t} + \\alpha_{14}x_{2t} + \\alpha_{15}x^2_{2t} + \\epsilon_{1t}\\}\n", "$$\n", "\n", "We can directly map the `params` DataFrame to the equation. All parameters are saved under the `category` of `wage_a`. The pecuniary reward associated with working in occupation A, `wage_a`, is determined by state-specific returns. The index level `name` lists the covariates, and `value` holds the associated return. 
The state variables and returns are mapped to the entries in `category` `wage_a` according to the following table:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "| Covariate | `name` | Return | `value` |\n", "|---------------|----------------|----------------|-----------|\n", "| $1$ | `constant` | $\\alpha_{10}$ | $9.2100$ |\n", "| $s_{t}$ | `exp_edu` | $\\alpha_{11}$ | $0.0380$ |\n", "| $x_{1t}$ | `exp_a` | $\\alpha_{12}$ | $0.0330$ |\n", "| $x_{1t}^2$ | `exp_a_square` | $\\alpha_{13}$ | $-0.0005$ |\n", "| $x_{2t}$ | `exp_b` | $\\alpha_{14}$ | $0.0000$ |\n", "| $x_{2t}^2$ | `exp_b_square` | $\\alpha_{15}$ | $0.0000$ |" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Plugging in these values, the equation can be written as\n", "\n", "$$\n", " w_{1t} = 1 \\cdot \\exp\\{9.2100 \\cdot 1 + 0.0380 \\cdot s_{t} + 0.0330 \\cdot x_{1t} - 0.0005 \\cdot x_{1t}^2 + 0.0000 \\cdot x_{2t} + 0.0000 \\cdot x_{2t}^2 + \\epsilon_{1t}\\}.\n", "$$\n", "\n", "\n", "\n", "The choice-specific shock that is also part of this equation will be discussed in more detail below." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "**Note:** The prefix `exp_` is a special `name` in **respy** and must be complemented by the name of a choice. Parameters with this prefix indicate the return to experience in a certain choice alternative. Conversely, the names `constant` and `exp_{choice}_square` do not have this pre-specified structure. Instead, they require further user input in the `options` dictionary to be properly specified. \n", "\n", "Experience accumulation is a central component of EKW models and thus an important feature of **respy**. You will notice that `exp_home` does not appear in our `params` DataFrame. This is a direct result of our model equations: individuals do not accumulate any experience while at home. Omitted experience parameters indicate that experience accumulation for this alternative is not a model component. 
Notably, alternatives with a wage component automatically account for experience accumulation.\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Shocks\n", "\n", "For each choice reward, idiosyncratic and serially uncorrelated shocks alter the respective return. Those alternative-specific shocks are specified jointly in `category` `shocks_sdcorr`." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "ExecuteTime": { "end_time": "2020-12-11T14:39:49.207736Z", "start_time": "2020-12-11T14:39:49.197344Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " value comment\n", "name \n", "sd_a 0.20 Element 1,1 of standard-deviation/correlation ...\n", "sd_b 0.25 Element 2,2 of standard-deviation/correlation ...\n", "sd_edu 1500.00 Element 3,3 of standard-deviation/correlation ...\n", "sd_home 1500.00 Element 4,4 of standard-deviation/correlation ...\n", "corr_b_a 0.00 Element 2,1 of standard-deviation/correlation ...\n", "corr_edu_a 0.00 Element 3,1 of standard-deviation/correlation ...\n", "corr_edu_b 0.00 Element 3,2 of standard-deviation/correlation ...\n", "corr_home_a 0.00 Element 4,1 of standard-deviation/correlation ...\n", "corr_home_b 0.00 Element 4,2 of standard-deviation/correlation ...\n", "corr_home_edu 0.00 Element 4,3 of standard-deviation/correlation ..." ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "params.loc[\"shocks_sdcorr\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Shocks are **assumed to follow a multivariate normal distribution** with zero mean and covariance matrix $\\Sigma$. The **dimensionality** of the symmetric covariance matrix equals the number of modeled choices. The specification of $\\Sigma$ is left to the discretion of the user. Because covariance matrices are symmetric, it is sufficient to specify the lower triangular matrix. However, it is mandatory to follow the order prescribed by **respy**. \n", "\n", "- First, the **diagonal elements (standard deviations)** are specified via `sd_{choice}` according to the following order:\n", "\n", " 1. Working alternatives (alphabetically sorted).\n", " 2. Non-working alternatives with experience accumulation (alphabetically sorted).\n", " 3. Remaining alternatives (alphabetically sorted).\n", "\n", "- Second, the **off-diagonal elements (correlations)** are specified in order **by rows** of the matrix. 
\n", "\n", "---\n", "In all of the example models, the covariance matrices are specified in the form of a correlation matrix following Keane and Wolpin (1994, 1997, 2000) to allow direct comparison between the parameters presented in the papers and their **respy** implementation.\n", "\n", "---\n", "\n", "\n", "Aside from specifying shocks according to standard deviations and correlations, you can also specify the variance-covariance matrix. The parameters are ordered by appearance in the lower triangle. Variances have the name `var_{choice}` and covariances `cov_{choice_2}_{choice_1}` and so forth. Lastly, another option is the Cholesky factor of the variance-covariance matrix, ordered by appearance in the lower triangle. The labels are either `chol_{choice}` or `chol_{choice_2}_{choice_1}` and so forth. In contrast to the other two options, Cholesky shocks are not ordered according to diagonal and off-diagonal elements. Instead, they need to be ordered according to appearance by rows in the lower triangle of the shock matrix.\n", "\n", "\n", "---\n", "\n", "The specification of shocks may appear a bit confusing due to the ordering requirements. Notably, **respy** will raise an error if the shock parameters are not passed in the correct order. The error message will help you specify the parameters in the correct order.\n", "\n", "---\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Additional Parameters\n", "\n", "Aside from discounting and reward-specific parameters (pecuniary rewards, non-pecuniary rewards, and shocks), there are some additional parameters that you might want to add to specify your model. Below you will find a short overview of the types of parameters you may add." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Initial Conditions\n", "\n", "In many instances, you may need to add initial conditions to your model. This can include lagged choices, experience levels, and observable characteristics. 
Their `value` in `params` reflects the share of individuals who exhibit a specific characteristic. Importantly, initial conditions are usually non-estimable parameters. Our example model requires two such parameter specifications. \n", "\n", "The parameter `lagged_choice_1_edu` ensures that the model records *education* as the previous choice in period $t=-1$ for all individuals in the sample. Our model requires this specification because the reward function for education includes a cost of returning to school, which applies if the previous choice was another alternative. In order to compute the rewards for period $0$, **respy** thus needs to know the choice of the previous period, even if it is not directly part of the model's decision horizon. " ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "ExecuteTime": { "end_time": "2020-12-11T14:39:49.219301Z", "start_time": "2020-12-11T14:39:49.209549Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " value comment\n", "name \n", "probability 1.0 Probability that the first lagged choice is ed..." ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "params.loc[\"lagged_choice_1_edu\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The parameter `initial_exp_edu_10` assigns 10 periods of experience in education (i.e. 10 periods of completed schooling) to all individuals in period $0$. Adding an initial condition like this may be useful if we think about the correspondence between the model and potential empirical data. Since we are assessing occupational choices, we will be analyzing individuals of working age who will have accumulated schooling before they enter the labor market." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "ExecuteTime": { "end_time": "2020-12-11T14:39:49.235780Z", "start_time": "2020-12-11T14:39:49.220887Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " value comment\n", "name \n", "probability 1.0 Probability that the initial level of educatio..." ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "params.loc[\"initial_exp_edu_10\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this example, each of these parameters takes a single value that applies to all individuals. However, initial conditions are much more versatile and can be defined quite flexibly. Refer to the guide linked below for more information. " ] }, { "cell_type": "raw", "metadata": {}, "source": [ "
\n", " How-to Guide\n", "\n", " Find out how to implement initial conditions in Initial Conditions.\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Maximum Experience\n", "\n", "Much like adding initial experience, we may want to limit the maximum amount of experience. In our example, individuals can complete a maximum of 20 periods of schooling. The implementation is straightforward. We define a `category` called `maximum_exp` and add a parameter `name` that corresponds to the name of a choice (e.g. `edu`). The `value` column holds the maximum level of experience." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "ExecuteTime": { "end_time": "2020-12-11T14:39:49.252925Z", "start_time": "2020-12-11T14:39:49.240597Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " value comment\n", "name \n", "edu 20.0 Maximum level of experience for education (opt..." ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "params.loc[\"maximum_exp\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Unobserved Heterogeneity\n", "\n", "A component not implemented in this example is unobserved heterogeneity between individuals. **respy** allows you to add such components using finite mixture approaches. Check out the guide below and the example models based on Keane and Wolpin (1997) to learn more about adding unobserved heterogeneity to your model." ] }, { "cell_type": "raw", "metadata": {}, "source": [ "
\n", " How-to Guide\n", "\n", " Find out how to add unobserved heterogeneity in Unobserved Heterogeneity and Finite Mixture Models.\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Measurement Error\n", "\n", "You may also implement measurement error in wages. To do so, define a `category` called `meas_error` and add parameters named `sd_{choice}` for all choices with a wage. Each parameter `value` is the standard deviation of the measurement error for that choice. Check out the model parametrization of `kw_97_extended` for an example.\n", "\n", "---\n", "\n", "Note that this parameter `category` only requires standard deviations for choices with a wage. They can be provided for either *all* or *none* of the choices with wages. Measurement errors for non-wage choices are ignored, and no correlation between measurement errors can be defined.\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Defining the `options`\n", "\n", "The `options` dictionary is the second necessary component for defining models in **respy**. As we have learned above, structural parameters are defined in a pandas.DataFrame. The `options` dictionary holds additional settings and information about the model. Thus, the `params` DataFrame and `options` dictionary should be viewed as complementary objects. Some types of parameters require additional options in order for **respy** to process them. Below we will inspect our example model's `options`."
] }, { "cell_type": "code", "execution_count": 9, "metadata": { "ExecuteTime": { "end_time": "2020-12-11T14:39:49.263369Z", "start_time": "2020-12-11T14:39:49.256524Z" } }, "outputs": [ { "data": { "text/plain": [ "{'estimation_draws': 200,\n", " 'estimation_seed': 500,\n", " 'estimation_tau': 500,\n", " 'interpolation_points': -1,\n", " 'n_periods': 40,\n", " 'simulation_agents': 1000,\n", " 'simulation_seed': 132,\n", " 'solution_draws': 500,\n", " 'solution_seed': 15,\n", " 'monte_carlo_sequence': 'random',\n", " 'core_state_space_filters': [\"period > 0 and exp_{choices_w_exp} == period and lagged_choice_1 != '{choices_w_exp}'\",\n", " \"period > 0 and exp_a + exp_b + exp_edu == period and lagged_choice_1 == '{choices_wo_exp}'\",\n", " \"period > 0 and lagged_choice_1 == 'edu' and exp_edu == 0\",\n", " \"lagged_choice_1 == '{choices_w_wage}' and exp_{choices_w_wage} == 0\",\n", " \"period == 0 and lagged_choice_1 == '{choices_w_wage}'\"],\n", " 'covariates': {'constant': '1',\n", " 'exp_a_square': 'exp_a ** 2',\n", " 'exp_b_square': 'exp_b ** 2',\n", " 'at_least_twelve_exp_edu': 'exp_edu >= 12',\n", " 'not_edu_last_period': \"lagged_choice_1 != 'edu'\"}}" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "options" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### `n_periods`\n", "\n", "The option `n_periods` determines the number of periods that individuals take into account when evaluating their actions. That is, they choose the action that maximizes their expected utility over `n_periods`. Possible values are integers greater than or equal to one. This option is mandatory as no default is supplied. In most models, the model's complexity, i.e. the number of states in the state space, grows exponentially with the number of periods.\n", "\n", "Do not confuse this option with the number of periods for which you want to simulate the actions of individuals. 
This number can be lower: even if the actions of individuals are simulated for only, say, 10 periods, those actions can still aim to maximize utility over 50 periods." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "ExecuteTime": { "end_time": "2020-12-11T14:39:49.270115Z", "start_time": "2020-12-11T14:39:49.265810Z" } }, "outputs": [ { "data": { "text/plain": [ "40" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "options[\"n_periods\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### `simulation_agents`\n", "\n", "This option specifies the number of individuals that are simulated. It is ignored if you pass data to the simulation function." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "ExecuteTime": { "end_time": "2020-12-11T14:39:49.277594Z", "start_time": "2020-12-11T14:39:49.272251Z" } }, "outputs": [ { "data": { "text/plain": [ "1000" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "options[\"simulation_agents\"]" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "
\n", " To how-to guide\n", " Find out more about Simulation.\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### `covariates`\n", "\n", "In the subsection on the [parameterization of the choice rewards](#Choice-Rewards), we discussed the special role of `exp_{choice}` in defining parameters for the pecuniary reward of occupation A. However, the parameter vector includes further covariates like a constant and squared experience terms. \n", "\n", "These covariates need further specification so **respy** knows how to process them. Covariates with no pre-defined naming convention are specified in the model `options` as a nested dictionary called `covariates`. In the `covariates` dictionary, keys correspond to the parameter `name` in `params` and dictionary values hold the definition of the corresponding covariate.\n", "\n", "For example, the covariate `constant` takes the value 1 for every individual. The covariates `exp_a_square` and `exp_b_square` are the squared experience terms for the two occupations.\n", "\n", "The other two covariates enter the reward function for *education*. `at_least_twelve_exp_edu` is a boolean that evaluates to true when an individual has accumulated 12 periods of schooling or more, and triggers a cost component in the reward function. Lastly, the covariate `not_edu_last_period` is a boolean indicator for not having chosen *education* in the last period. As discussed in the section on [initial conditions](#Initial-Conditions), this requires the inclusion of a lagged choice in the `params` DataFrame."
] }, { "cell_type": "code", "execution_count": 12, "metadata": { "ExecuteTime": { "end_time": "2020-12-11T14:39:49.284581Z", "start_time": "2020-12-11T14:39:49.279554Z" } }, "outputs": [ { "data": { "text/plain": [ "{'constant': '1',\n", " 'exp_a_square': 'exp_a ** 2',\n", " 'exp_b_square': 'exp_b ** 2',\n", " 'at_least_twelve_exp_edu': 'exp_edu >= 12',\n", " 'not_edu_last_period': \"lagged_choice_1 != 'edu'\"}" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "options[\"covariates\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", " \n", "What should covariate definitions in the `options` look like so that **respy** can process them? Here are some pointers:\n", "\n", "- The statements are evaluated using [pandas.eval](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.eval.html). This means you can use all arithmetic operations that this method supports.\n", "- The following pre-defined terms are recognized to construct covariates: `period`, `exp_{choice}`, `lagged_choice_{number of periods}`.\n", "- You can also define new covariates as a function of already existing covariates.\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Seeds (optional)\n", "\n", "To make a model replicable, the `options` allow for three separate seeds, one each for solution, simulation, and estimation. The distinction enables us to vary the randomness in only one component, independently of the others. The dictionary keys are\n", "\n", "- `solution_seed` for the computation of the decision rules.\n", "- `simulation_seed` for the simulation.\n", "- `estimation_seed` for the computation of the log likelihood." 
] }, { "cell_type": "code", "execution_count": 13, "metadata": { "ExecuteTime": { "end_time": "2020-12-11T14:39:49.294083Z", "start_time": "2020-12-11T14:39:49.287575Z" } }, "outputs": [ { "data": { "text/plain": [ "{'estimation_seed': 500, 'simulation_seed': 132, 'solution_seed': 15}" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "{k: v for k, v in options.items() if \"seed\" in k}" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "
\n", " To reference guide\n", " Find out more about this topic in \n", " Randomness and Reproducibility.\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### `monte_carlo_sequence` and draws (optional)\n", "\n", "`monte_carlo_sequence` and draws refer more generally to approximations of integrals with Monte Carlo simulations inside **respy**. There exist two applications for Monte Carlo simulation.\n", "\n", "1. In the solution of a model, the value of expected value functions has to be simulated.\n", "2. While computing the log likelihood, (log) choice probabilities are simulated.\n", "\n", "The number of draws controls how many points are used to evaluate an integral. The default is 500 for the solution and 200 for the estimation of choice probabilities." ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "ExecuteTime": { "end_time": "2020-12-11T14:39:49.302942Z", "start_time": "2020-12-11T14:39:49.297006Z" } }, "outputs": [ { "data": { "text/plain": [ "{'estimation_draws': 200, 'solution_draws': 500}" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "{k: v for k, v in options.items() if \"draws\" in k}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The option `monte_carlo_sequence` controls how points are drawn.\n", "\n", "- `\"random\"`: Points are drawn randomly (crude Monte Carlo).\n", "- `\"sobol\"`or `\"halton\"`: Points are drawn from low-discrepancy sequences (superiority in coverage). This means a given approximation error can be achieved with less points. \n" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "
\n", " To how-to guide\n", " Find out more about \n", " Numerical Integration Methods.\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### `interpolation_points`\n", "\n", "The number of interpolation points specifies the number states or their corresponding expected value functions which are used to fit an interpolation model. The model is used to predict the expected value functions for all remaining states. The interpolation method available in **respy** is designed by Keane and Wolpin (1994). Their paper offers a detailed explanation of the method.\n", "\n", "If `interpolation_points` is set to -1, the full solution is computed." ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "ExecuteTime": { "end_time": "2020-12-11T14:39:49.311343Z", "start_time": "2020-12-11T14:39:49.305378Z" } }, "outputs": [ { "data": { "text/plain": [ "-1" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "options[\"interpolation_points\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### `negative_choice_set`\n", "\n", "You can limit the set of available choices at different points in time using the option `negative_choice_set`. To implement a negative choice set, define a nested dictionary where keys correspond to choice alternatives and values hold a list of conditions that will eliminate the corresponding choice for periods whenever it evaluates to `True`.\n", "\n", "For example, consider a scenario where individuals can only work in occupation A after the fifth period ($t=4$) (i.e. the occupation may have an age requirement). In this case, we need to implement a negative choice set for the first five periods as follows." 
] }, { "cell_type": "code", "execution_count": 16, "metadata": { "ExecuteTime": { "end_time": "2020-12-11T14:39:49.316746Z", "start_time": "2020-12-11T14:39:49.313395Z" } }, "outputs": [], "source": [ "options[\"negative_choice_set\"] = {\"a\": [\"period < 5\"]}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### `core_state_space_filters` (optional)\n", "\n", "Core state space filters partly complement the `negative_choice_set` option. First of all, what is the core state space? The core state space is the part of the state space spanned by the combinations of experiences and previous choices. Not all combinations are feasible, but it is not always possible to catch all invalid combinations.\n", "\n", "States with impossible combinations have no effect on the correctness of the model, but impose an additional computational burden which should be eliminated. Similar to `negative_choice_set`, the `core_state_space_filters` are a list of conditions, and whenever one of them is true, the state is eliminated from the state space.\n", "\n", "This option is a rather advanced feature of **respy** as it requires a sound understanding of the state space and at least partial knowledge of how it is processed internally. In most cases, you will not need to add filters, but they can be useful to:\n", "\n", "- Improve the computational performance of your model.\n", "- Implement restrictions on the choice set that cannot be implemented using the `params` or the `negative_choice_set` option." 
] }, { "cell_type": "code", "execution_count": 17, "metadata": { "ExecuteTime": { "end_time": "2020-12-11T14:39:49.324717Z", "start_time": "2020-12-11T14:39:49.318867Z" } }, "outputs": [ { "data": { "text/plain": [ "[\"period > 0 and exp_{choices_w_exp} == period and lagged_choice_1 != '{choices_w_exp}'\",\n", " \"period > 0 and exp_a + exp_b + exp_edu == period and lagged_choice_1 == '{choices_wo_exp}'\",\n", " \"period > 0 and lagged_choice_1 == 'edu' and exp_edu == 0\",\n", " \"lagged_choice_1 == '{choices_w_wage}' and exp_{choices_w_wage} == 0\",\n", " \"period == 0 and lagged_choice_1 == '{choices_w_wage}'\"]" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "options[\"core_state_space_filters\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "--- \n", "\n", "**Order is important**\n", "\n", "The conditions in `negative_choice_set` are applied **after** initial conditions are implemented.\n", "\n", "`core_state_space_filters` are applied **before** initial conditions are implemented. Pay attention to this when you have, for example, implemented initial experience for a choice. When adding a filter based on the experience for this choice, you will have to refer to within-model experience and discard knowledge of potential previous experience. \n", "\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### `estimation_tau`\n", "\n", "*This option is only relevant for maximum likelihood estimation.* \n", "\n", "The choice probabilities in the likelihood function are simulated, as there exists no closed-form solution for them. They are computed with the [softmax function](https://en.wikipedia.org/wiki/Softmax_function) and require the specification of a so-called temperature parameter tau. This parameter can be specified in the **respy** options." 
] }, { "cell_type": "code", "execution_count": 18, "metadata": { "ExecuteTime": { "end_time": "2020-12-11T14:39:49.333128Z", "start_time": "2020-12-11T14:39:49.327220Z" } }, "outputs": [ { "data": { "text/plain": [ "500" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "options[\"estimation_tau\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " " ] }, { "cell_type": "raw", "metadata": {}, "source": [ "
\n", " How-to Guide\n", "\n", " To learn more about the temperature parameter, see Maximum Likelihood Criterion.\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "## References\n", "\n", "- Keane, M. P., & Wolpin, K. I. (1994). The Solution and Estimation of Discrete Choice Dynamic Programming Models by Simulation and Interpolation: Monte Carlo Evidence. *The Review of Etheconomics and Statistics*, 648-672.\n", "\n", "- Keane, M. P., & Wolpin, K. I. (1997). The Career Decisions of Young Men. *Journal of Political Economy*, 105(3), 473-522." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.9" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": false, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": true }, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())" }, "r": { "delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "types_to_exclude": [ "module", "function", "builtin_function_or_method", "instance", "_Feature" ], "window_display": false } }, "nbformat": 4, "nbformat_minor": 4 }