Mathematical Framework

Mathematically, the class of EKW models (and thus the flagship model of Keane and Wolpin (1997, [13])) is set up as a standard Markov decision process (White, 1993, [25]; Puterman, 1994, [19]). Individuals make sequential decisions under uncertainty and aim to implement an optimal policy \(\pi^* \in \Pi\), given the state of the world.

In principle, this requires evaluating the performance of all policies in \(\Pi\) based on all possible sequences of per-period utilities, each weighted by the probability with which it occurs. Fortunately, the multistage problem can be solved by a sequence of simpler, inductively defined single-stage problems. Puterman (1994, Chapter 4, [19]) shows that deterministic Markovian decision rules are always optimal in the economic setting presented here. Hence, we restrict our exposition to this special case.

For any given state \(s_t\), the value function \(v_t^{\pi}(s_t)\) captures the expected total discounted per-period utilities under policy \(\pi\) from period \(t\) onwards

\[v_t^{\pi}(s_t) \equiv \mathbb{E}_{p^{\pi}} \left[ \sum_{j=0}^{T-t} \delta^j u\left(s_{t+j}, a_{t+j}^{\pi}(s_{t+j})\right) \,\big|\, s_t \right]\]

For any policy \(\pi\) we can determine the value function at the initial state by recursively evaluating

(1)\[v_t^{\pi}(s_t) = u(s_t, a_t^{\pi}(s_t)) + \delta \,\mathbb{E}_{p^{\pi}} \left[ v_{t+1}^{\pi}(s_{t+1}) \,|\, s_t \right].\]

Invoking the principle of optimality (Bellman, 1954, 1957, [4], [3]) allows us to construct the optimal policy \(\pi^*\) by solving the optimality equations for all \(s_t\) recursively

(2)\[v_t^{\pi^*}(s_t) = \underset{a_t \in \mathcal{A}}{\max} \left\{ u(s_t, a_t) + \delta \,\mathbb{E}_{p} \left[ v_{t+1}^{\pi^*}(s_{t+1}) \,|\, s_t, a_t \right] \right\}.\]

The optimal value function \(v_t^{\pi^*}\) gives the expected discounted utility from period \(t\) over the remaining time horizon, given that the optimal policy is implemented in all subsequent periods.


In respy the value functions can be retrieved as part of any model solution. In the case of our Keane and Wolpin (1997) exposition the value functions are indexed via Value_Function_{Alternative}.
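As a minimal sketch of how this looks in practice (the example-model name kw_97_basic and the get_example_model / get_simulate_func interface are assumptions about the installed respy version), the value functions appear as columns of the simulated data:

```python
import respy as rp

# Assumed example-model name and interface; adjust to your respy version.
params, options = rp.get_example_model("kw_97_basic", with_data=False)

# Simulating a dataset returns a DataFrame with one
# Value_Function_{Alternative} column per alternative.
simulate = rp.get_simulate_func(params, options)
df = simulate(params)

print([col for col in df.columns if col.startswith("Value_Function")])
```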


The Value Function Iteration Algorithm solves the Markov decision process via a backward induction procedure and retrieves the decision rules of the agents. In the final period \(T\) there is no future to take into account, and the optimal action is to choose the alternative with the highest per-period utility in each state. The preceding decision rules are determined recursively.
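In the notation of Equation (2), the final-period problem therefore reduces to a static choice:

\[v_T^{\pi^*}(s_T) = \underset{a_T \in \mathcal{A}}{\max} \, u(s_T, a_T).\]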

[Figure: Value Function Iteration Algorithm]
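The following is a minimal, self-contained sketch of this backward induction logic for a toy problem with a small discrete state space, hypothetical per-period utilities, and deterministic transitions; it only illustrates the recursion and is not respy's actual implementation.

```python
import numpy as np

# Toy problem: n_states discrete states, n_actions alternatives, T periods.
# Utilities u[t, s, a] and deterministic transitions next_state[s, a] are
# made up purely for illustration.
n_states, n_actions, T = 4, 2, 5
delta = 0.95
rng = np.random.default_rng(seed=0)
u = rng.normal(size=(T, n_states, n_actions))
next_state = rng.integers(n_states, size=(n_states, n_actions))

v = np.zeros((T + 1, n_states))             # v[T] = 0: no future beyond the horizon
policy = np.zeros((T, n_states), dtype=int)

# Backward induction: solve the final period first, then work backwards.
for t in reversed(range(T)):
    continuation = v[t + 1][next_state]     # continuation value for each (state, action)
    action_values = u[t] + delta * continuation
    policy[t] = action_values.argmax(axis=1)
    v[t] = action_values.max(axis=1)

print(policy[0], v[0])                      # first-period decision rule and value function
```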

Solving the Integrated Value Function

As already suggested, the state space contains a stochastic component \(\epsilon_t\). The expectation in Equation (2) constitutes the major source of computational complexity in DCDP models. The integrated value function

(3)\[\begin{split}\text{Emax}(s_{t}) &\equiv \int_{\mathcal{S}} v_{t+1}^{\pi^*}(s_{t+1}) \, \mathrm{d}p(s_t, a_t) \\ &= \int_{\mathcal{S}} \underset{a \in \mathcal{A}}{\max} \left\{ v_{t+1}^{\pi}(s_{t+1}) \right\} \, \mathrm{d}p(s_t, a_t)\end{split}\]

has no analytical solution, so the application of numerical methods is mandatory. Keane and Wolpin (1997, [13]) impose two assumptions in order to simplify the expression:

  • Stochastic shocks \(\epsilon_{t}\) are independently and identically distributed over individuals and time (serially uncorrelated), conditional on \(s_{t}\). We denote their probability density function by \(\phi_{\mu, \Sigma}(\epsilon_{t})\).

  • State variables are independent of the realizations of \(\epsilon_{t}\), conditional on decisions. This is the reason we can write \(p(s_t, \epsilon_t, a_t) = p(s_t, a_t)\).

We can reformulate parts of the integrated value function

(4)\[\int_{\mathcal{S}} \, \underset{a \in \mathcal{A}}{\max} \, \left\{ v_{t+1}^{\pi}(s_{t+1}) \right\} \mathrm{d}p(s_t, a_t) = \int_{\epsilon} \underset{a \in \mathcal{A}}{\max} \, \left\{ v_{t+1}^{\pi}(s_{t+1}) \right\} \, \mathrm{d}\phi_{\mu, \Sigma}(\epsilon_{t}).\]

This expression is a \((|\mathcal{A}| = 5)\)-dimensional integral that has to be solved for every possible state \(s_{t} \in \mathcal{S}_t\), which in most applications means millions of times. [1]

Most of the current implementations use Monte Carlo integration to solve the integral numerically. Judd and Skrainka (2011, [11]) lament the resulting numerical errors and computational instabilities.
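Concretely, Monte Carlo integration replaces the integral by an average over \(R\) draws from \(\phi_{\mu, \Sigma}\), i.e. \(\text{Emax}(s_t) \approx \frac{1}{R} \sum_{r=1}^{R} \max_{a \in \mathcal{A}} v_{t+1}^{\pi}\bigl(s_{t+1}^{(r)}\bigr)\). Below is a minimal sketch for a single state, with hypothetical alternative-specific values rather than respy code:

```python
import numpy as np

def emax_monte_carlo(values_given_shocks, mean, cov, n_draws=400, seed=0):
    """Approximate Emax = E[max_a v_a(eps)] by simple Monte Carlo.

    values_given_shocks maps an (n_draws, dim) array of shocks to an
    (n_draws, n_alternatives) array of alternative-specific values.
    """
    rng = np.random.default_rng(seed)
    shocks = rng.multivariate_normal(mean, cov, size=n_draws)
    values = values_given_shocks(shocks)
    return values.max(axis=1).mean()

# Hypothetical example: five alternatives whose values are linear in the shocks.
mean, cov = np.zeros(5), np.eye(5)
print(emax_monte_carlo(lambda eps: 1.0 + eps, mean, cov))
```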



The Emax calculation in respy relies on advanced numerical methods. The use of quasi-Monte Carlo methods mitigates numerical errors and dramatically reduces the time needed to solve the model. A How-to guide is provided in Numerical Integration.
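As an illustration of the quasi-Monte Carlo idea (using scipy's Sobol sequence; this does not reproduce respy's internal routine), the random draws in the sketch above can be replaced by low-discrepancy points:

```python
import numpy as np
from scipy.stats import norm, qmc

def emax_quasi_monte_carlo(values_given_shocks, mean, cov, n_draws=512, seed=0):
    """Approximate Emax with Sobol points mapped to multivariate normal shocks."""
    dim = len(mean)
    sobol = qmc.Sobol(d=dim, scramble=True, seed=seed)
    uniforms = sobol.random(n_draws)          # low-discrepancy points in [0, 1)^dim
    standard_normals = norm.ppf(uniforms)     # map to standard normal draws
    shocks = mean + standard_normals @ np.linalg.cholesky(cov).T
    return values_given_shocks(shocks).max(axis=1).mean()

# Same hypothetical alternative-specific values as in the Monte Carlo sketch.
mean, cov = np.zeros(5), np.eye(5)
print(emax_quasi_monte_carlo(lambda eps: 1.0 + eps, mean, cov))
```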


The formulation in Equation (4) indicates that the computational complexity is governed by the size of the (observable) state space [2] and the multi-dimensionality of the integral. Notably, to retrieve the optimal policy \(\pi^*\) it is necessary to solve the value function at every point of the state space. This illustrates the so-called "curse of dimensionality" (Bellman, 1957, [3]). The number of states grows exponentially with the number of available choices (\(|\mathcal{A}|\)) and linearly in the number of periods. Any extension of the Keane and Wolpin (1997) model that affects either of these factors is computationally more demanding.

A comparison of Keane and Wolpin (1997, [13]) and Keane and Wolpin (2000, [14]) quantifies this link between state space and computational complexity. In Keane and Wolpin (2000, [14]) the authors enrich the model with a dummy variable that captures a binary characteristic of the individual decision-maker. This binary state variable doubles the state space from initially 52 million states to 104 million states. For a given parameterization of the model, it is necessary to evaluate Equation (4) at each of these points.
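As a back-of-the-envelope illustration of these magnitudes (the number of shock draws per Emax evaluation is an assumed figure for illustration only, not taken from the papers):

```python
n_states_kw_1997 = 52_000_000             # states reported for Keane and Wolpin (1997)
n_states_kw_2000 = 2 * n_states_kw_1997   # the binary characteristic doubles the state space

n_draws = 200                             # assumed Monte Carlo draws per Emax evaluation

print(f"Emax evaluations per model solution: {n_states_kw_2000:,}")
print(f"Total shock draws per model solution: {n_states_kw_2000 * n_draws:,}")
```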


We present the specification of Keane and Wolpin (1997) in Computational Implementation.

Footnotes