
Unobserved Heterogeneity and Finite Mixture Models#

Unobserved heterogeneity is a concern in every econometric application. Keane and Wolpin (1997) face the problem that individuals at the age of sixteen report varying years of schooling. Neglecting the issue of measurement error, it is unlikely that the differences in initial schooling are caused by exogenous factors. Instead, the schooling decision is affected by a variety of endogenous factors such as parental investment, school and teacher quality, intrinsic motivation, and ability. Without correction, estimation methods fail to recover the true parameters.

One solution would be to extend the model and incorporate the whole human capital investment process up to the age where initial schooling was zero. Although such a model would be extremely interesting, capturing that many factors is almost infeasible in terms of modeling, computation, and data.

Another solution is to employ individual fixed effects. Then, the state space comprises a dimension with as many unique values as there are individuals in the sample. Thus, the decision rules have to be computed for every individual over the whole state space separately, which is computationally infeasible.

Keane and Wolpin (1997) resort to modeling unobserved heterogeneity with a finite mixture. A mixture model can capture the presence of subpopulations (types) in the general population without requiring the observed data to identify group affiliation. In contrast to fixed effects, the number of subpopulations is much lower than the number of individuals. There is also no fixed and unique assignment to one subpopulation; instead, the relation is defined by a probability mass function.

Each type has a preference for a particular choice, which is modeled by a constant in the utility functions. For working alternatives, \(w\), the constant is in the log wage equation, whereas for non-working alternatives, \(n\), it is in the nonpecuniary reward. Note that respy allows for type-specific effects in every utility component. Keane and Wolpin (1997) call this endowment, denoted \(e_{ak}\) for type \(k\) and alternative \(a\).

\[\begin{split}\begin{align} \log(W(s_t, a_t)) &= x^w\beta^w + e_{ak} + \epsilon_{at}\\ N^n(s_t, a_t) &= x^n\beta^n + e_{ak} + \epsilon_{at} \end{align}\end{split}\]
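As a small numerical sketch (the experience level is hypothetical; the endowment values anticipate the example constructed later in this notebook), the endowment simply shifts the log wage additively for each type:

```python
import numpy as np

# Hypothetical sketch: the endowment e_ak enters the log wage of the
# fishing alternative additively, one constant per type.
beta_w = np.array([0.3])               # return to fishing experience
x_w = np.array([2.0])                  # two periods of fishing experience
endowments = {0: 0.0, 1: 0.2, 2: 0.4}  # e_ak, type 0 normalized to zero

log_wages = {k: float(x_w @ beta_w) + e for k, e in endowments.items()}
# all else equal, type 2 earns exp(0.4) ~ 1.49 times the wage of type 0
```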

To estimate model parameters with maximum likelihood, the likelihood contribution for one individual is defined as the joint probability of choices and wages accumulated over time.

\[P(\{a_t\}^T_{t=0} \mid s^-_t, e_{ak}, W_t) = \prod^T_{t = 0} p(a_t \mid s^-_t, e_{ak}, W_t)\]

We can weight the contribution for type \(k\) with the probability of belonging to that type to obtain the unconditional likelihood contribution of an individual.

\[P(\{a_t, W_t\}^T_{t=0}) = \sum^K_{k=1} \pi_k P(\{a_t\}^T_{t=0} \mid s^-_t, e_{ak}, W_t)\]
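In code, the unconditional contribution is just a probability-weighted sum over the type-conditional likelihoods. A minimal sketch with made-up numbers for two types:

```python
import numpy as np

# Hypothetical conditional likelihoods of one individual's choice-wage
# history, one value per type k.
conditional = np.array([0.02, 0.08])  # P({a_t}^T | s^-_t, e_ak, W_t)
pi = np.array([0.6, 0.4])             # type shares pi_k, sum to one

unconditional = pi @ conditional      # sum_k pi_k * P(... | type k)
# -> 0.6 * 0.02 + 0.4 * 0.08 = 0.044
```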

To avoid misspecification of the likelihood, \(\pi_k\) must be a function of all individual characteristics that are determined before individuals enter the model horizon and are not the result of exogenous factors. The type-specific probability \(\pi_k = f(x^\pi \beta^\pi)\) is calculated with the softmax function based on a vector of covariates \(x^\pi\) and a matrix of coefficients \(\beta^\pi\) with one entry for each type-covariate combination.

\[\pi_k = f(x^\pi \beta^\pi_k) = \frac{\exp{\{x^\pi \beta^\pi_k\}}}{\sum^K_{j=1} \exp \{x^\pi \beta^\pi_j\}}\]
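The softmax mapping can be sketched as follows; the covariates and coefficients are made up, only the functional form matches the formula above:

```python
import numpy as np

def type_probabilities(x, betas):
    """Map covariates x and per-type coefficient vectors to shares pi_k."""
    z = np.array([x @ b for b in betas])
    z -= z.max()  # subtracting a constant leaves softmax unchanged,
    e = np.exp(z)  # but avoids numerical overflow
    return e / e.sum()

x = np.array([1.0, 0.0])  # hypothetical covariate vector x^pi
betas = [np.zeros(2), np.array([0.5, 0.2]), np.array([1.0, -0.3])]
pi = type_probabilities(x, betas)  # one probability per type, sums to one
```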

To implement a finite mixture, we have to add \(e_{ak}\) and \(\beta^\pi\) to the parameters. As an example, we start with the basic Robinson Crusoe economy.

[1]:
import io
import pandas as pd
import respy as rp
[2]:
params, options = rp.get_example_model("robinson_crusoe_basic", with_data=False)
params
[2]:
value
category name
delta delta 0.95
wage_fishing exp_fishing 0.30
nonpec_fishing constant -0.20
nonpec_hammock constant 2.00
shocks_sdcorr sd_fishing 0.50
sd_hammock 0.50
corr_hammock_fishing 0.00

We extend the model by allowing for different levels of experience in fishing at \(t = 0\). Robinson starts with zero, one, or two periods of fishing experience because of different tastes for fishing.

[3]:
initial_exp_fishing = """
category,name,value
initial_exp_fishing_0,probability,0.33
initial_exp_fishing_1,probability,0.33
initial_exp_fishing_2,probability,0.34
"""
[4]:
initial_exp_fishing = pd.read_csv(io.StringIO(initial_exp_fishing), index_col=["category", "name"])
initial_exp_fishing
[4]:
value
category name
initial_exp_fishing_0 probability 0.33
initial_exp_fishing_1 probability 0.33
initial_exp_fishing_2 probability 0.34

In the next step, we add type-specific endowment effects \(e_{ak}\). We assume that there are three types and that the additional utility increases from the first to the third type. For computational simplicity, the benefit of the first type is normalized to zero such that all other types are measured relative to the first.

[5]:
endowments = """
category,name,value
wage_fishing,type_1,0.2
wage_fishing,type_2,0.4
"""
[6]:
endowments = pd.read_csv(io.StringIO(endowments), index_col=["category", "name"])
endowments
[6]:
value
category name
wage_fishing type_1 0.2
type_2 0.4

We assume no effect for choosing the hammock.

Finally, we need to specify the probability mass function which relates individuals to types. We simply assume that initial experience is positively correlated with a stronger taste for fishing. For a comprehensive overview of how to specify distributions with multinomial coefficients, see the guide on the initial conditions. Note that the distribution is only specified for types 1 and 2; the coefficients for type 0 are left out for a parsimonious representation. You cannot simply use probabilities here, because type assignment cannot be completely random; it has to depend on observed characteristics. The following example is designed to specify a certain distribution and recover the pattern in the data. In reality, the distribution of unobservables is unknown.

First, we define that Robinsons without prior experience are of type 0. Thus, we make the coefficients for types 1 and 2 extremely small. Robinsons with one period of prior experience are of type 1 with probability two thirds and of type 2 with probability one third. For two periods of fishing experience, the share of type 1 individuals is one third and that of type 2 is two thirds. The coefficients for types 1 and 2 are simply the logs of these probabilities.

Finally, we add a sufficiently large constant to all coefficients. The coefficient of type 0 is implicitly set to zero, so without the shift the distribution would also sample type 0 individuals with one or two periods of fishing experience. Shifting the parameters by a large positive value prevents this. At the same time, because the type 1 and type 2 coefficients are shifted by the same amount, the ratio of their shares is preserved.
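We can verify this reasoning numerically. The sketch below uses the coefficients for one covariate (one period of initial experience), with type 0 implicit at zero: without the shift, type 0 would receive half of the probability mass; after shifting types 1 and 2 by 10, type 0 is effectively ruled out while the 2:1 ratio between types 1 and 2 survives.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Coefficients for one period of initial experience; type 0 is implicit.
z = np.array([0.0, np.log(2 / 3), np.log(1 / 3)])  # types 0, 1, 2

before = softmax(z)                                # type 0 gets share 0.5
after = softmax(z + np.array([0.0, 10.0, 10.0]))   # shift types 1 and 2
# after the shift, the type 0 share is negligible and
# the type 1 : type 2 ratio is still 2 : 1
```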

[7]:
type_probabilities = """
category,name,value
type_1,initial_exp_fishing_0,-100
type_1,initial_exp_fishing_1,-0.4055
type_1,initial_exp_fishing_2,-1.0986
type_2,initial_exp_fishing_0,-100
type_2,initial_exp_fishing_1,-1.0986
type_2,initial_exp_fishing_2,-0.4055
"""
[8]:
type_probabilities = pd.read_csv(io.StringIO(type_probabilities), index_col=["category", "name"])
type_probabilities += 10
type_probabilities
[8]:
value
category name
type_1 initial_exp_fishing_0 -90.0000
initial_exp_fishing_1 9.5945
initial_exp_fishing_2 8.9014
type_2 initial_exp_fishing_0 -90.0000
initial_exp_fishing_1 8.9014
initial_exp_fishing_2 9.5945

The covariates used for the probabilities are defined below.

[9]:
type_covariates = {
    "initial_exp_fishing_0": "exp_fishing == 0",
    "initial_exp_fishing_1": "exp_fishing == 1",
    "initial_exp_fishing_2": "exp_fishing == 2",
}
type_covariates
[9]:
{'initial_exp_fishing_0': 'exp_fishing == 0',
 'initial_exp_fishing_1': 'exp_fishing == 1',
 'initial_exp_fishing_2': 'exp_fishing == 2'}

In the next step, we put all pieces together to get the complete model specification.

[10]:
params = pd.concat([params, initial_exp_fishing, endowments, type_probabilities])
params
[10]:
value
category name
delta delta 0.9500
wage_fishing exp_fishing 0.3000
nonpec_fishing constant -0.2000
nonpec_hammock constant 2.0000
shocks_sdcorr sd_fishing 0.5000
sd_hammock 0.5000
corr_hammock_fishing 0.0000
initial_exp_fishing_0 probability 0.3300
initial_exp_fishing_1 probability 0.3300
initial_exp_fishing_2 probability 0.3400
wage_fishing type_1 0.2000
type_2 0.4000
type_1 initial_exp_fishing_0 -90.0000
initial_exp_fishing_1 9.5945
initial_exp_fishing_2 8.9014
type_2 initial_exp_fishing_0 -90.0000
initial_exp_fishing_1 8.9014
initial_exp_fishing_2 9.5945
[11]:
options["covariates"] = {**options["covariates"], **type_covariates}
options["simulation_agents"] = 10_000
options
[11]:
{'solution_draws': 100,
 'solution_seed': 456,
 'n_periods': 5,
 'simulation_agents': 10000,
 'simulation_seed': 132,
 'estimation_draws': 100,
 'estimation_seed': 100,
 'estimation_tau': 0.001,
 'interpolation_points': -1,
 'covariates': {'constant': '1',
  'initial_exp_fishing_0': 'exp_fishing == 0',
  'initial_exp_fishing_1': 'exp_fishing == 1',
  'initial_exp_fishing_2': 'exp_fishing == 2'}}

Let us simulate a dataset to see whether the distribution of types can be recovered from the data.

[12]:
simulate = rp.get_simulate_func(params, options)
df = simulate(params)
[13]:
df.query("Period == 0").groupby("Experience_Fishing").Type.value_counts(normalize=True).unstack().fillna(0)
[13]:
Type 0 1 2
Experience_Fishing
0 1.000000 0.000000 0.000000
1 0.000000 0.665548 0.334452
2 0.000296 0.330278 0.669426

We also know that types 1 and 2 derive a higher utility from choosing fishing. Here are the choice probabilities for each type.

[14]:
df.groupby("Type").Choice.value_counts(normalize=True).unstack()
[14]:
Choice fishing hammock
Type
0 0.426571 0.573429
1 0.992602 0.007398
2 0.998036 0.001964