
Maximum Likelihood Criterion

The respy interface supports two different types of estimation for parameter calibration:

  1. (Simulated) maximum likelihood estimation

  2. Method of simulated moments estimation

To calibrate a model, you can derive a criterion function from params, options, and empirical data. That criterion function can then be passed to an optimizer such as those provided by estimagic. This guide outlines the construction of a criterion function for simulated maximum likelihood estimation. For the method of simulated moments, see the guide linked below.

How-to Guide Construct a criterion function using the method of simulated moments.

To start off, we load an example model as usual.

[1]:
import respy as rp
import pandas as pd
[2]:
params, options, data = rp.get_example_model("robinson_crusoe_basic")

The log likelihood function

The criterion for maximum likelihood estimation is constructed in two steps. The respy function get_log_like_func takes the inputs params, options, and df to construct a function that only depends on the parameter vector. This function can then be passed to an optimizer to calibrate the model parameters.

[3]:
log_like = rp.get_log_like_func(params=params, options=options, df=data)
scalar = log_like(params)
scalar
[3]:
-5.494678164823001
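
The same two-step pattern can be illustrated without respy. The sketch below is a toy stand-in and not respy code: the helper `get_toy_log_like_func`, the Bernoulli model, and the data are all made up for illustration, and a crude grid search takes the place of a real optimizer such as those provided by estimagic.

```python
import math


def get_toy_log_like_func(data):
    """Return a criterion that depends only on the parameter (hypothetical helper)."""
    n_success = sum(data)
    n_obs = len(data)

    def log_like(p):
        # Mean log likelihood of i.i.d. Bernoulli observations.
        return (
            n_success * math.log(p) + (n_obs - n_success) * math.log(1 - p)
        ) / n_obs

    return log_like


data = [1, 0, 0, 1, 1, 0, 1, 1, 0, 1]  # made-up binary choices
toy_log_like = get_toy_log_like_func(data)

# Step two: hand the criterion to an optimizer. A grid search stands in here.
grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=toy_log_like)
```

The data are fixed inside the closure, so the optimizer only ever sees a function of the parameter, which is the role `log_like` plays above.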

By default, the function returns a scalar value given by the mean log likelihood. To return the log likelihood contributions, set the argument return_scalar to False. The function will then return a dictionary containing the scalar value, the contributions, and a pandas.DataFrame which can be used for visualization purposes.

[6]:
log_like_contribs = rp.get_log_like_func(params=params, options=options, df=data, return_scalar=False)
outputs = log_like_contribs(params)
outputs.keys()
[6]:
dict_keys(['value', 'contributions', 'comparison_plot_data'])
[7]:
outputs["value"]
[7]:
-5.494678164823001
[11]:
outputs["contributions"][0:10]
[11]:
array([-1.12998713, -1.16105606, -8.14899502, -1.18885353, -6.5085553 ,
       -1.22019297, -7.125007  , -5.29376864, -7.4765499 , -4.82486523])

The DataFrame saved under the key comparison_plot_data lists the individual contributions of each observation split up by choices and wages and is suited for estimagic’s visualization capabilities.

[12]:
outputs["comparison_plot_data"].head()
[12]:
   identifier  period   choice      value    kind
0           0       0  hammock  -0.597872  choice
1           0       1  hammock  -0.248358  choice
2           0       2  hammock  -0.127806  choice
3           0       3  hammock  -0.083382  choice
4           0       4  hammock  -0.072571  choice

options: The smoothing parameter \(\tau\)

The choice probabilities in the likelihood function are simulated, as there exists no closed-form solution for them. Application of a basic accept-reject (AR) simulator poses two challenges.

  1. Low-probability events are often simulated with a probability of exactly zero, which causes problems when evaluating the log likelihood.

  2. The choice probabilities are not smooth in the parameters and instead are a step function.
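
Both challenges can be seen in a minimal sketch of a crude frequency simulator. This is an illustration only and not respy's implementation: the function `crude_ar_probabilities`, the two systematic values, and the standard normal shocks are assumptions, and the gap between the alternatives is chosen to be extreme so that the result does not depend on the seed.

```python
import math
import random


def crude_ar_probabilities(values, n_draws, seed=0):
    """Share of draws in which each alternative has the highest shocked value."""
    rng = random.Random(seed)
    wins = [0] * len(values)
    for _ in range(n_draws):
        shocked = [v + rng.gauss(0.0, 1.0) for v in values]
        wins[shocked.index(max(shocked))] += 1
    return [w / n_draws for w in wins]


# Hypothetical systematic values; the first alternative is far less attractive.
values = [0.0, 25.0]
n_draws = 200
probs = crude_ar_probabilities(values, n_draws)

# Challenge 1: the unattractive alternative never wins a draw, so its simulated
# probability is exactly zero and its log is undefined.
try:
    math.log(probs[0])
    log_is_defined = True
except ValueError:
    log_is_defined = False

# Challenge 2: every simulated probability is a multiple of 1 / n_draws, so a
# small change in the values either leaves it unchanged or makes it jump.
on_grid = all(abs(p * n_draws - round(p * n_draws)) < 1e-9 for p in probs)
```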

McFadden (1989) introduces a class of smoothed AR simulators. The logit-smoothed AR simulator is the most popular of these and is the one implemented in respy. The implementation uses the softmax function to compute choice probabilities and requires specifying the smoothing (also called temperature) parameter \(\tau\).

For \(\tau \to \infty\), all choices become equiprobable, whereas for \(\tau \to 0\) some choices receive a probability of zero, which is undesirable when using gradient-based numerical optimization methods.
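
The role of \(\tau\) can be sketched with a small logit-smoothed simulator that averages softmax probabilities over simulated shocks instead of counting winners. The functions `softmax` and `smoothed_probabilities`, the systematic values, and the standard normal shocks are assumptions made for illustration; respy's internal code may differ.

```python
import math
import random


def softmax(values, tau):
    """Softmax of values / tau, shifted by the maximum for numerical stability."""
    scaled = [v / tau for v in values]
    top = max(scaled)
    weights = [math.exp(s - top) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def smoothed_probabilities(values, tau, n_draws=200, seed=0):
    """Average the softmax over simulated shocks instead of counting winners."""
    rng = random.Random(seed)
    sums = [0.0] * len(values)
    for _ in range(n_draws):
        shocked = [v + rng.gauss(0.0, 1.0) for v in values]
        for i, p in enumerate(softmax(shocked, tau)):
            sums[i] += p
    return [s / n_draws for s in sums]


values = [0.0, 2.0]  # hypothetical systematic values

probs_large_tau = smoothed_probabilities(values, tau=1e6)  # nearly equiprobable
probs_moderate_tau = smoothed_probabilities(values, tau=1.0)  # strictly positive
probs_small_tau = smoothed_probabilities(values, tau=1e-3)  # close to crude frequencies
```

Comparing the three results for a given model shows how strongly the simulated probabilities react to the choice of \(\tau\).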

The parameter has a large impact on the log likelihood of a sample, and a suitable value seems to be model-dependent. In Keane and Wolpin (1994) and related literature, the parameter is set to 500. We recommend testing different values, from just above 0 up to 500. Low values are feasible only because respy computes the log likelihood entirely in log space and uses robust methods to avoid underflow and overflow.
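
Why working in log space matters for small \(\tau\) can be sketched with the log-sum-exp trick. This is a generic illustration under assumed values, not respy's own routine; `log_softmax` is a hypothetical helper.

```python
import math


def log_softmax(values, tau):
    """Log choice probabilities computed with the log-sum-exp trick."""
    scaled = [v / tau for v in values]
    top = max(scaled)
    # Subtracting the maximum keeps every exponent at or below zero.
    log_sum = top + math.log(sum(math.exp(s - top) for s in scaled))
    return [s - log_sum for s in scaled]


values = [1.0, 2.0, 4.0]  # hypothetical values
tau = 0.001

# The naive route fails: exp(4.0 / 0.001) is far outside the float range.
try:
    math.exp(max(values) / tau)
    naive_works = True
except OverflowError:
    naive_works = False

# The log probabilities stay finite even though the corresponding
# probabilities of the first two alternatives underflow to zero.
log_probs = log_softmax(values, tau)
```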

The parameter \(\tau\) can be specified in the respy options.

[4]:
options["estimation_tau"]
[4]:
0.001

Note that this is not the only tuning parameter that affects the likelihood function. You also need to be mindful of options such as solution_draws, estimation_draws, and the number of simulated agents (simulation_agents) when specifying the likelihood function.
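
One way to organize such a sensitivity check is to prepare one copy of the options per candidate value of \(\tau\). The sketch below uses a plain dictionary so that it runs without respy. Only the option keys come from this guide; the values in `base_options` and the list `candidate_taus` are placeholders.

```python
import copy

# Placeholder options: only the keys come from the text, the values are made up.
base_options = {
    "estimation_tau": 0.001,
    "estimation_draws": 100,
    "solution_draws": 100,
    "simulation_agents": 1000,
}

candidate_taus = [0.001, 0.1, 10, 500]

option_sets = []
for tau in candidate_taus:
    trial = copy.deepcopy(base_options)  # leave the original options untouched
    trial["estimation_tau"] = tau
    option_sets.append(trial)
    # With respy installed, one would rebuild and evaluate the criterion here:
    # rp.get_log_like_func(params=params, options=trial, df=data)(params)
```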

How-to Guide To learn more about the model options see the guide Specifying a Model.

References

  • Keane, M. P., & Wolpin, K. I. (1994). The Solution and Estimation of Discrete Choice Dynamic Programming Models by Simulation and Interpolation: Monte Carlo Evidence. The Review of Economics and Statistics, 76(4), 648-672.

  • McFadden, D. (1989). A method of simulated moments for estimation of discrete response models without numerical integration. Econometrica, 57(5), 995-1026.