{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Maximum Likelihood Criterion" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The **respy** interface supports two types of estimation for parameter calibration:\n", "\n", "1. (Simulated) maximum likelihood estimation\n", "2. Method of simulated moments estimation\n", "\n", "To calibrate a model, you can derive a criterion function from `params`, `options`, and empirical data. That criterion function can then be passed to an optimizer such as those provided by [estimagic](https://estimagic.readthedocs.io). This guide outlines the construction of a criterion function for simulated maximum likelihood estimation. See the how-to guide linked below for the method of simulated moments." ] }, { "cell_type": "raw", "metadata": {}, "source": [ "
\n", " How-to Guide\n", "\n", " Construct a criterion function using the method of simulated moments.\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To start off, we load an example model as usual." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import respy as rp\n", "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "params, options, data = rp.get_example_model(\"robinson_crusoe_basic\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The log likelihood function\n", "\n", "The criterion for maximum likelihood estimation is constructed in two steps. First, the **respy** function `get_log_like_func` takes the inputs `params`, `options`, and `df` and constructs a function that depends only on the parameter vector. Second, this function can be passed to an optimizer to calibrate the model parameters." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "-5.494678164823001" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "log_like = rp.get_log_like_func(params=params, options=options, df=data)\n", "scalar = log_like(params)\n", "scalar" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By default, the function returns a scalar value given by the mean log likelihood. To return the log likelihood contributions, set the argument `return_scalar` to `False`. The function will then return a dictionary containing the scalar value, the contributions, and a pandas.DataFrame that can be used for visualization." 
] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dict_keys(['value', 'contributions', 'comparison_plot_data'])" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "log_like_contribs = rp.get_log_like_func(params=params, options=options, df=data, return_scalar=False)\n", "outputs = log_like_contribs(params)\n", "outputs.keys()" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "-5.494678164823001" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "outputs[\"value\"]" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([-1.12998713, -1.16105606, -8.14899502, -1.18885353, -6.5085553 ,\n", " -1.22019297, -7.125007 , -5.29376864, -7.4765499 , -4.82486523])" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "outputs[\"contributions\"][0:10]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The DataFrame saved under the key `comparison_plot_data` lists the individual contributions of each observation split up by choices and wages and is suited for [estimagic](https://estimagic.readthedocs.io/en/latest/)'s visualization capabilities." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table>\n", "<thead>\n",
"<tr><th></th><th>identifier</th><th>period</th><th>choice</th><th>value</th><th>kind</th></tr>\n",
"</thead>\n", "<tbody>\n",
"<tr><th>0</th><td>0</td><td>0</td><td>hammock</td><td>-0.597872</td><td>choice</td></tr>\n",
"<tr><th>1</th><td>0</td><td>1</td><td>hammock</td><td>-0.248358</td><td>choice</td></tr>\n",
"<tr><th>2</th><td>0</td><td>2</td><td>hammock</td><td>-0.127806</td><td>choice</td></tr>\n",
"<tr><th>3</th><td>0</td><td>3</td><td>hammock</td><td>-0.083382</td><td>choice</td></tr>\n",
"<tr><th>4</th><td>0</td><td>4</td><td>hammock</td><td>-0.072571</td><td>choice</td></tr>\n",
"</tbody>\n", "</table>\n", "</div>
\n", "
" ], "text/plain": [ " identifier period choice value kind\n", "0 0 0 hammock -0.597872 choice\n", "1 0 1 hammock -0.248358 choice\n", "2 0 2 hammock -0.127806 choice\n", "3 0 3 hammock -0.083382 choice\n", "4 0 4 hammock -0.072571 choice" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "outputs[\"comparison_plot_data\"].head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## options: The smoothing parameter $\\tau$\n", "\n", "The choice probabilities in the likelihood function are simulated, as there exists no closed-form solution for them. Application of a basic accept-reject (AR) simulator poses two challenges.\n", "\n", "1. Low-probability events can receive a simulated probability of zero, which causes problems when evaluating the log likelihood.\n", "\n", "2. The simulated choice probabilities are not smooth in the parameters; they are step functions.\n", "\n", "McFadden (1989) introduces a class of smoothed AR simulators. The logit-smoothed AR simulator is the most popular one and is the one implemented in **respy**. The implementation uses the [softmax function](https://en.wikipedia.org/wiki/Softmax_function) to compute choice probabilities and requires specifying the smoothing (also called temperature) parameter $\\tau$.\n", "\n", "For $\\tau \\to \\infty$, all choices become equiprobable, whereas for $\\tau \\to 0$, some choices receive zero probability, which is undesirable when using gradient-based numerical optimization methods.\n", "\n", "The parameter has a large impact on the log likelihood of a sample and appears to be model-dependent. In Keane and Wolpin (1994) and related literature, the parameter is set to 500. We recommend testing several values between just above 0 and 500. 
Lower values are feasible only because **respy** computes the log likelihood entirely in log space and uses robust methods to avoid underflow and overflow.\n", "\n", "The parameter $\\tau$ is specified in the **respy** options under the key `estimation_tau`." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.001" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "options[\"estimation_tau\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that $\\tau$ is not the only tuning parameter that affects the likelihood function. You also need to be mindful of options such as `solution_draws`, `estimation_draws`, and the number of simulated agents (`simulation_agents`) when specifying the likelihood function." ] }, { "cell_type": "raw", "metadata": {}, "source": [ "
\n", " How-to Guide\n", "\n", " To learn more about the model options see the guide Specifying a Model.\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## References\n", "\n", "\n", "- Keane, M. P., & Wolpin, K. I. (1994). The Solution and Estimation of Discrete Choice Dynamic Programming Models by Simulation and Interpolation: Monte Carlo Evidence. *The Review of Economics and Statistics*, 648-672.\n", "\n", "- McFadden, D. (1989). A method of simulated moments for estimation of discrete response models without numerical integration. *Econometrica*, 57(5), 995-1026." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.3" } }, "nbformat": 4, "nbformat_minor": 4 }