{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Unobserved Heterogeneity and Finite Mixture Models\n", "\n",
 "Unobserved heterogeneity is a concern in every econometric application. Keane and Wolpin (1997) face the problem that individuals at the age of sixteen report varying years of schooling. Neglecting the issue of measurement error, it is unlikely that the differences in initial schooling are caused by exogenous factors. Instead, the schooling decision is affected by a variety of endogenous factors such as parental investment, school and teacher quality, intrinsic motivation, and ability. Without correction, estimation methods fail to recover the true parameters.\n", "\n",
 "One solution would be to extend the model and incorporate the whole human capital investment process starting from the age at which schooling is zero. Although such a model would be extremely interesting, it is almost infeasible to handle that many factors in terms of modeling, computation, and data.\n", "\n",
 "Another solution is to employ individual fixed effects. Then, the state space comprises a dimension with as many unique values as there are individuals in the sample. Thus, the decision rules have to be computed separately for every individual over the whole state space, which is computationally infeasible.\n", "\n",
 "Keane and Wolpin (1997) resort to modeling unobserved heterogeneity with a finite mixture. A mixture model can be used to capture the presence of subpopulations (types) in the general population without requiring the observed data to identify an individual's group affiliation. In contrast to fixed effects, the number of subpopulations is much lower than the number of individuals. There is also no fixed and unique assignment to one subpopulation; instead, the relation is defined by a probability mass function.\n", "\n",
 "Each type has a preference for a particular choice, which is modeled by a constant in the utility functions. For working alternatives, $w$, the constant enters the log wage equation whereas for non-working alternatives, $n$, it enters the nonpecuniary reward. Note that **respy** allows for type-specific effects in every utility component. Keane and Wolpin (1997) call it an endowment with the symbol $e_{ak}$ for type $k$ and alternative $a$.\n", "\n",
 "$$\\begin{align}\n", " \\log(W(s_t, a_t)) = x^w\\beta^w + e_{ak} + \\epsilon_{at}\\\\\n", " N^n(s_t, a_t) = x^n\\beta^n + e_{ak} + \\epsilon_{at}\n", "\\end{align}$$\n", "\n",
 "To estimate the model parameters with maximum likelihood, the likelihood contribution of one individual, conditional on being of type $k$, is defined as the joint probability of choices and wages accumulated over time.\n", "\n",
 "$$\n", " P(\\{a_t\\}^T_{t=0} \\mid s^-_t, e_{ak}, W_t) =\n", " \\prod^T_{t = 0} p(a_t \\mid s^-_t, e_{ak}, W_t)\n", "$$\n", "\n",
 "We can weight the contribution of type $k$ with the probability of being that type to get the unconditional likelihood contribution of an individual.\n", "\n",
 "$$\n", " P(\\{a_t, W_t\\}^T_{t=0}) = \\sum^K_{k=1} \\pi_k\n", " P(\\{a_t\\}^T_{t=0} \\mid s^-_t, e_{ak}, W_t)\n", "$$\n", "\n",
 "To avoid misspecification of the likelihood, $\\pi_k$ must be a function of all individual characteristics which are determined before individuals enter the model horizon and are not the result of exogenous factors. The type-specific probability $\\pi_k = f(x^\\pi \\beta^\\pi_k)$ is calculated with the softmax function based on a vector of covariates $x^\\pi$ and a matrix of coefficients $\\beta^\\pi$ with one coefficient per type-covariate combination, where $\\beta^\\pi_k$ denotes the column of coefficients for type $k$.\n", "\n",
 "$$\n", " \\pi_k = f(x^\\pi \\beta^\\pi_k) =\n", " \\frac{\\exp{\\{x^\\pi \\beta^\\pi_k\\}}}{\\sum^K_{j=1} \\exp \\{x^\\pi \\beta^\\pi_j\\}}\n", "$$\n",
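 "\n",
 "For intuition, the following sketch evaluates the two formulas directly with NumPy. All numbers are purely illustrative and are not taken from any **respy** model: we assume three types and a single individual with hypothetical type-conditional likelihood contributions and hypothetical values of $x^\\pi \\beta^\\pi_k$.\n", "\n",
 "```python\n",
 "import numpy as np\n",
 "\n",
 "# Hypothetical values of x^pi beta^pi_k for k = 0, 1, 2 (type 0 is normalized to zero).\n",
 "coefficients = np.array([0.0, 0.5, -0.3])\n",
 "\n",
 "# The softmax function turns the coefficients into type probabilities pi_k that sum to one.\n",
 "probabilities = np.exp(coefficients) / np.exp(coefficients).sum()\n",
 "\n",
 "# Hypothetical type-conditional likelihood contributions P({a_t} | s_t, e_ak, W_t).\n",
 "conditional_contributions = np.array([0.02, 0.10, 0.05])\n",
 "\n",
 "# The unconditional contribution is the pi_k-weighted sum over the three types.\n",
 "unconditional_contribution = probabilities @ conditional_contributions\n",
 "```\n",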
 "\n",
 "To implement a finite mixture, we have to include $e_{ak}$ and $\\beta^\\pi$ in the parameters. As an example, we start with the basic Robinson Crusoe Economy. " ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import io\n", "import pandas as pd\n", "import respy as rp" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " value\n", "category name \n", "delta delta 0.95\n", "wage_fishing exp_fishing 0.30\n", "nonpec_fishing constant -0.20\n", "nonpec_hammock constant 2.00\n", "shocks_sdcorr sd_fishing 0.50\n", " sd_hammock 0.50\n", " corr_hammock_fishing 0.00" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "params, options = rp.get_example_model(\"robinson_crusoe_basic\", with_data=False)\n", "params" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We extend the model by allowing for different periods of experience in fishing at $t = 0$. Robinsons start with zero, one, or two periods of experience in fishing because of different tastes for fishing. " ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "initial_exp_fishing = \"\"\"\n", "category,name,value\n", "initial_exp_fishing_0,probability,0.33\n", "initial_exp_fishing_1,probability,0.33\n", "initial_exp_fishing_2,probability,0.34\n", "\"\"\"" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " value\n", "category name \n", "initial_exp_fishing_0 probability 0.33\n", "initial_exp_fishing_1 probability 0.33\n", "initial_exp_fishing_2 probability 0.34" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "initial_exp_fishing = pd.read_csv(io.StringIO(initial_exp_fishing), index_col=[\"category\", \"name\"])\n", "initial_exp_fishing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the next step, we add type-specific endowment effects $e_{ak}$. We assume that there exist three types and that the additional utility is increasing from the first to the third type. For computational simplicity, the benefit of the first type is normalized to zero such that all other types are measured relative to the first." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "endowments = \"\"\"\n", "category,name,value\n", "wage_fishing,type_1,0.2\n", "wage_fishing,type_2,0.4\n", "\"\"\"" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " value\n", "category name \n", "wage_fishing type_1 0.2\n", " type_2 0.4" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "endowments = pd.read_csv(io.StringIO(endowments), index_col=[\"category\", \"name\"])\n", "endowments" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We assume no effect for choosing the hammock.\n", "\n",
 "At last, we need to specify the probability mass function which relates individuals to types. We simply assume that initial experience is positively correlated with a stronger taste for fishing. For a comprehensive overview on how to specify distributions with multinomial coefficients, see the guide on the [initial conditions](how_to_initial_conditions.ipynb). Note that the distribution is only specified for types 1 and 2; the coefficients for type 0 are left out for a parsimonious representation. In contrast to the initial experiences above, you cannot specify the distribution with simple probabilities because the assignment of types cannot be completely random. The following example is designed to specify a certain distribution and recover the pattern in the data. In reality, the distribution of unobservables is unknown.\n", "\n",
 "First, we define that Robinsons without prior experience are of type 0. Thus, we make the coefficients for types 1 and 2 extremely small. Robinsons with one period of prior experience are of type 1 with probability 0.66 and of type 2 with probability 0.33. For two periods of experience in fishing, the share of type 1 individuals is 0.33 and of type 2 is 0.66. The coefficients for types 1 and 2 are simply the logs of these probabilities.\n", "\n",
 "At last, we add a sufficiently large integer to all coefficients. The coefficient of type 0 is implicitly set to zero, so without a shift the distribution would also assign type 0 to individuals with one or two periods of experience in fishing. Shifting the coefficients of types 1 and 2 by a large positive value prevents this. At the same time, because both types are shifted by the same amount, the ratio of their shares under the softmax function is preserved. The sketch after the next cell illustrates the effect of the shift." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "type_probabilities = \"\"\"\n", "category,name,value\n", "type_1,initial_exp_fishing_0,-100\n", "type_1,initial_exp_fishing_1,-0.4055\n", "type_1,initial_exp_fishing_2,-1.0986\n", "type_2,initial_exp_fishing_0,-100\n", "type_2,initial_exp_fishing_1,-1.0986\n", "type_2,initial_exp_fishing_2,-0.4055\n", "\"\"\"" ] },
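{ "cell_type": "markdown", "metadata": {}, "source": [ "Before applying the shift in the next cell, here is a quick check of what it does to the implied type shares. This is only an illustration that evaluates the softmax formula by hand with NumPy; it is not part of the **respy** specification. We use the coefficients for one period of initial experience from the cell above, together with the implicit zero for type 0.\n", "\n",
 "```python\n",
 "import numpy as np\n",
 "\n",
 "def type_shares(coefficients):\n",
 "    # Plain softmax: exponentiate and normalize to shares that sum to one.\n",
 "    exponentials = np.exp(coefficients)\n",
 "    return exponentials / exponentials.sum()\n",
 "\n",
 "# Coefficients for exp_fishing == 1: type 0 (implicit), type 1, type 2.\n",
 "unshifted = np.array([0.0, -0.4055, -1.0986])\n",
 "\n",
 "# Without the shift, type 0 receives roughly half of the probability mass.\n",
 "print(type_shares(unshifted))  # approx. [0.5, 0.33, 0.17]\n",
 "\n",
 "# Shifting only the coefficients of types 1 and 2 by 10 drives the share of\n",
 "# type 0 towards zero while the ratio of type 1 to type 2 stays at 2 to 1.\n",
 "shifted = unshifted + np.array([0.0, 10.0, 10.0])\n",
 "print(type_shares(shifted))  # approx. [0.00005, 0.666, 0.333]\n",
 "```" ] },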
{ "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " value\n", "category name \n", "type_1 initial_exp_fishing_0 -90.0000\n", " initial_exp_fishing_1 9.5945\n", " initial_exp_fishing_2 8.9014\n", "type_2 initial_exp_fishing_0 -90.0000\n", " initial_exp_fishing_1 8.9014\n", " initial_exp_fishing_2 9.5945" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type_probabilities = pd.read_csv(io.StringIO(type_probabilities), index_col=[\"category\", \"name\"])\n", "type_probabilities += 10\n", "type_probabilities" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The covariates used for the probabilities are defined below." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'initial_exp_fishing_0': 'exp_fishing == 0',\n", " 'initial_exp_fishing_1': 'exp_fishing == 1',\n", " 'initial_exp_fishing_2': 'exp_fishing == 2'}" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type_covariates = {\n", " \"initial_exp_fishing_0\": \"exp_fishing == 0\",\n", " \"initial_exp_fishing_1\": \"exp_fishing == 1\",\n", " \"initial_exp_fishing_2\": \"exp_fishing == 2\",\n", "}\n", "type_covariates" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the next step, we put all pieces together to get the complete model specification." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " value\n", "category name \n", "delta delta 0.9500\n", "wage_fishing exp_fishing 0.3000\n", "nonpec_fishing constant -0.2000\n", "nonpec_hammock constant 2.0000\n", "shocks_sdcorr sd_fishing 0.5000\n", " sd_hammock 0.5000\n", " corr_hammock_fishing 0.0000\n", "initial_exp_fishing_0 probability 0.3300\n", "initial_exp_fishing_1 probability 0.3300\n", "initial_exp_fishing_2 probability 0.3400\n", "wage_fishing type_1 0.2000\n", " type_2 0.4000\n", "type_1 initial_exp_fishing_0 -90.0000\n", " initial_exp_fishing_1 9.5945\n", " initial_exp_fishing_2 8.9014\n", "type_2 initial_exp_fishing_0 -90.0000\n", " initial_exp_fishing_1 8.9014\n", " initial_exp_fishing_2 9.5945" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "params = params.append([initial_exp_fishing, endowments, type_probabilities])\n", "params" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'solution_draws': 100,\n", " 'solution_seed': 456,\n", " 'n_periods': 5,\n", " 'simulation_agents': 10000,\n", " 'simulation_seed': 132,\n", " 'estimation_draws': 100,\n", " 'estimation_seed': 100,\n", " 'estimation_tau': 0.001,\n", " 'interpolation_points': -1,\n", " 'covariates': {'constant': '1',\n", " 'initial_exp_fishing_0': 'exp_fishing == 0',\n", " 'initial_exp_fishing_1': 'exp_fishing == 1',\n", " 'initial_exp_fishing_2': 'exp_fishing == 2'}}" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "options[\"covariates\"] = {**options[\"covariates\"], **type_covariates}\n", "options[\"simulation_agents\"] = 10_000\n", "options" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let us simulate a dataset to see whether the distribution of types can be recovered from the data." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "simulate = rp.get_simulate_func(params, options)\n", "df = simulate(params)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "Type 0 1 2\n", "Experience_Fishing \n", "0 1.000000 0.000000 0.000000\n", "1 0.000000 0.665548 0.334452\n", "2 0.000296 0.330278 0.669426" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.query(\"Period == 0\").groupby(\"Experience_Fishing\").Type.value_counts(normalize=True).unstack().fillna(0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We also know that types 1 and 2 experience a higher utility from choosing fishing. Here are the choice probabilities for each type." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "Choice fishing hammock\n", "Type \n", "0 0.426571 0.573429\n", "1 0.992602 0.007398\n", "2 0.998036 0.001964" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.groupby(\"Type\").Choice.value_counts(normalize=True).unstack()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.6" } }, "nbformat": 4, "nbformat_minor": 4 }