Solution and Estimation


From a mathematical perspective, this type of model boils down to a finite-horizon dynamic programming (DP) problem under uncertainty that can be solved by backward induction. For the discussion, it is useful to define the value function \(V(S(t),t)\) as a shorthand for the agent's objective function. \(V(S(t),t)\) depends on the state \(S(t)\) and, because the time horizon is finite, on \(t\) itself. It can be written as:

\[V(S(t),t) = \max_{k \in K}\{V_k(S(t),t)\},\]

where \(V_k(S(t),t)\) is the alternative-specific value function of choosing alternative \(k\). \(V_k(S(t),t)\) obeys the Bellman equation (Bellman, 1957) and is thus amenable to backward recursion:

\[V_k(S(t),t) = \begin{cases} R_k(S(t)) + \delta E\left[V(S(t + 1), t + 1) \mid S(t), d_k(t) = 1\right] & \text{if } t < T, \\ R_k(S(t)) & \text{if } t = T. \end{cases}\]

Assuming continued optimal behavior, the expected future value of state \(S(t + 1)\), taken over the maximum of all \(K\) alternatives and given today's state \(S(t)\) and choice \(d_k(t) = 1\), is denoted \(E\max(S(t + 1))\) for short:

\[E\max(S(t + 1)) = E\left[V(S(t + 1), t + 1) \mid S(t), d_k(t) = 1\right].\]

This requires the evaluation of a \(K\)-dimensional integral, as future rewards are partly uncertain due to the unknown realization of the shocks:

\[E\max(S(t)) = \int_{\epsilon_1(t)} \cdots \int_{\epsilon_K(t)} \max\{V_1(S(t),t), \ldots, V_K(S(t),t)\} \, f_{\epsilon}(\epsilon_1(t), \ldots, \epsilon_K(t)) \, d\epsilon_1(t) \cdots d\epsilon_K(t),\]

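where \(f_{\epsilon}\) is the joint density of the uncertain components of the rewards in \(t\) that are not known at \(t - 1\). With all ingredients at hand, the solution of the model by backward induction is straightforward: in the final period \(T\), \(V_k(S(T),T) = R_k(S(T))\), so \(E\max(S(T))\) can be computed for every state; these values enter the Bellman equation for \(T - 1\), and the procedure is repeated one period at a time back to the first period.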
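The integral above rarely has a closed form. One common way to approximate it (assumed here, not stated in the text) is Monte Carlo integration: draw shock vectors from \(f_{\epsilon}\), evaluate the maximum for each draw, and average. The Python sketch below combines this with the backward induction just described on a hypothetical two-alternative model. The functions `emax_monte_carlo`, `rewards`, `next_state`, and `solve`, the reward numbers, and the additive, i.i.d. normal shocks are all illustrative assumptions. Because the shocks are serially independent and the law of motion in this toy model is deterministic, the continuation term \(E\left[V(S(t + 1), t + 1) \mid S(t), d_k(t) = 1\right]\) reduces to \(E\max\) evaluated at the successor state.

```python
import math
import random

# Hypothetical toy model for illustration only (not from the text above):
# K = 2 alternatives, k = 0 "work" and k = 1 "home"; the state S(t) is the
# number of periods worked so far. Shocks are assumed additive and i.i.d. normal.


def emax_monte_carlo(values, draws):
    """Approximate E[max_k {values[k] + eps_k}] by averaging over shock draws."""
    return sum(max(v + e for v, e in zip(values, eps)) for eps in draws) / len(draws)


def rewards(experience):
    """Deterministic part of R_k(S(t)) for each alternative (illustrative numbers)."""
    return [1.0 + 0.1 * experience, 1.5]


def next_state(experience, k):
    """Law of motion: working adds one period of experience, staying home does not."""
    return experience + 1 if k == 0 else experience


def solve(num_periods, delta=0.95, shock_sd=1.0, num_draws=2_000, seed=0):
    """Backward induction; returns emax[t][x], i.e. E max(S(t)) with experience x.

    Period index num_periods - 1 plays the role of T in the text.
    """
    rng = random.Random(seed)
    # One fixed set of draws from f_epsilon, reused for every state and period.
    draws = [[rng.gauss(0.0, shock_sd) for _ in range(2)] for _ in range(num_draws)]
    emax = [{} for _ in range(num_periods)]
    for t in reversed(range(num_periods)):
        for x in range(t + 1):  # starting from zero, experience at t is at most t
            values = rewards(x)
            if t < num_periods - 1:
                # Add delta * E max(S(t + 1)), already computed for period t + 1.
                values = [r + delta * emax[t + 1][next_state(x, k)] for k, r in enumerate(values)]
            emax[t][x] = emax_monte_carlo(values, draws)
    return emax


# Sanity check of the integral: for two i.i.d. N(0, 1) shocks and equal
# deterministic values, the exact answer is 1 / sqrt(pi), roughly 0.564.
check_rng = random.Random(1)
check_draws = [[check_rng.gauss(0.0, 1.0), check_rng.gauss(0.0, 1.0)] for _ in range(200_000)]
approx = emax_monte_carlo([0.0, 0.0], check_draws)
exact = 1 / math.sqrt(math.pi)

emax = solve(num_periods=5)
```

Reusing the same draws for every state keeps the approximated \(E\max\) values smooth across states; a vectorized implementation would evaluate all states of a period at once rather than looping over them.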


We estimate the parameters of the reward functions \(\theta\) based on a sample of agents whose behavior and state experiences are described by the model. Although all shocks to the rewards are eventually known to the agent, they remain unobserved by the econometrician. Consequently, each parameterization induces a different probability distribution over the sequence of observed agent choices and their state experiences. We implement maximum likelihood estimation and appraise each candidate parameterization of the model using the likelihood function of the observed sample (Fisher, 1922). Given the serial independence of the shocks, we can compute the likelihood contribution by agent and period. The sample likelihood is then the product of the likelihood contributions over all agents and time periods. As we need to simulate the agent's choice probabilities, we end up with a simulated maximum likelihood estimator (Manski and Lerman, 1977) and minimize the simulated negative log-likelihood of the observed sample.
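A minimal Python sketch of a simulated negative log-likelihood, assuming the solution step has already delivered the deterministic parts of \(V_k\) for each observation. Choice probabilities are approximated by averaging over shock draws, and a logit kernel with temperature `tau` (our assumption, one common smoothing choice) replaces the 0/1 indicator so the objective is smooth in \(\theta\). The functions `smoothed_choice_probability` and `simulated_negative_log_likelihood`, the static binary-choice data, and the grid search are hypothetical illustrations, not the estimator's actual implementation.

```python
import math
import random


def smoothed_choice_probability(values, choice, draws, tau=0.05):
    """Simulated probability that `choice` is the optimal alternative.

    `values` holds the deterministic parts of V_k for one agent and period.
    A logit kernel with temperature `tau` (an assumption of this sketch)
    replaces the 0/1 indicator so the result is smooth in the parameters.
    """
    total = 0.0
    for eps in draws:
        u = [(v + e) / tau for v, e in zip(values, eps)]
        top = max(u)  # subtract the maximum for numerical stability
        weights = [math.exp(x - top) for x in u]
        total += weights[choice] / sum(weights)
    return total / len(draws)


def simulated_negative_log_likelihood(observations, draws, tau=0.05):
    """`observations` holds (values, observed_choice) pairs, one per agent and period.

    Serially independent shocks let the sample likelihood factor into a product
    over agents and periods, which becomes a sum after taking logs.
    """
    return -sum(
        math.log(smoothed_choice_probability(values, choice, draws, tau))
        for values, choice in observations
    )


# Hypothetical static example: theta is the value of alternative 0 relative to 1.
rng = random.Random(1)
true_theta = 0.5
choices = []
for _ in range(300):
    e0, e1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    choices.append(0 if true_theta + e0 > e1 else 1)

draws = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(100)]
grid = [i / 4 for i in range(-4, 9)]  # candidate values -1.0, -0.75, ..., 2.0
nll = {
    theta: simulated_negative_log_likelihood([([theta, 0.0], c) for c in choices], draws)
    for theta in grid
}
theta_hat = min(nll, key=nll.get)
```

In practice a numerical optimizer would replace the grid. A smaller `tau` brings the smoothed simulator closer to the raw frequency simulator at the cost of a rougher objective function.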