:py:mod:`respy.likelihood`
==========================

.. py:module:: respy.likelihood

.. autoapi-nested-parse::

   Everything related to the estimation with maximum likelihood.

   .. !! processed by numpydoc !!


Module Contents
---------------


Functions
~~~~~~~~~

.. autoapisummary::

   respy.likelihood.get_log_like_func
   respy.likelihood.log_like
   respy.likelihood._internal_log_like_obs
   respy.likelihood._compute_wage_and_choice_log_likelihood_contributions
   respy.likelihood._compute_log_type_probabilities
   respy.likelihood._compute_x_beta_for_type_probabilities
   respy.likelihood._logsumexp
   respy.likelihood._simulate_log_probability_of_individuals_observed_choice
   respy.likelihood._process_estimation_data
   respy.likelihood._update_optim_paras_with_initial_experience_levels
   respy.likelihood._create_comparison_plot_data
   respy.likelihood._map_choice_codes_to_indices_of_valid_choice_set



.. py:function:: get_log_like_func(params, options, df, return_scalar=True)

   Get the criterion function for maximum likelihood estimation.

   Return a version of the likelihood functions in respy where all arguments except
   the parameter vector are fixed with :func:`functools.partial`. Thus, the function
   can be passed directly into an optimizer or a function for taking numerical
   derivatives.

   :Parameters:

       **params** : :obj:`pandas.DataFrame`
           DataFrame containing model parameters.

       **options** : :class:`python:dict`
           Dictionary containing model options.

       **df** : :obj:`pandas.DataFrame`
           The model is fit to this dataset.

       **return_scalar** : :class:`python:bool`, default :data:`python:True`
           Indicator for whether the mean log likelihood should be returned as a
           scalar. If False, a dictionary with the following key-value pairs is
           returned:

           - "value": mean log likelihood (float)
           - "contributions": log likelihood contributions (numpy.array)
           - "comparison_plot_data": DataFrame with various contributions for the
             visualization with estimagic. The data contains the following columns:

             - ``identifier``: Individual identifiers derived from the input df.
             - ``period``: Periods derived from the input df.
             - ``choice``: Choice that ``value`` is connected to.
             - ``value``: Value of the log likelihood contribution.
             - ``kind``: Kind of contribution (e.g. choice or wage).
             - ``type`` and ``log_type_probability``: Only included in models with
               types.

   :Returns:

       **criterion_function** : :func:`log_like`
           Criterion function where all arguments except the parameter vector are set.

   :Raises:

       :obj:`AssertionError`
           If the data does not have the expected format.




   .. rubric:: Examples

   >>> import respy as rp
   >>> params, options, data = rp.get_example_model("robinson_crusoe_basic")

   By default, the function returns the mean log likelihood as a scalar value.

   >>> log_like = rp.get_log_like_func(params=params, options=options, df=data)
   >>> scalar = log_like(params)

   Alternatively, a dictionary containing the log likelihood, the log likelihood
   contributions, and a :class:`pandas.DataFrame` for the comparison plot can be
   returned.

   >>> log_like = rp.get_log_like_func(
   ...     params=params, options=options, df=data, return_scalar=False
   ... )
   >>> outputs = log_like(params)
   >>> outputs.keys()
   dict_keys(['value', 'contributions', 'comparison_plot_data'])


   .. !! processed by numpydoc !!
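As a rough illustration of the numerical-derivatives use case mentioned above, and
continuing from the example, the scalar criterion can be differentiated with a simple
central finite difference. This is only a sketch, not part of respy's API; the
parameter location ``("delta", "delta")`` is an assumption about the model's parameter
index.

.. code-block:: python

    # Minimal sketch, assuming ``log_like`` was created with ``return_scalar=True``
    # as in the first example above and ``params`` has a "value" column.
    def central_difference(log_like, params, loc, step=1e-6):
        up, down = params.copy(), params.copy()
        up.loc[loc, "value"] += step
        down.loc[loc, "value"] -= step
        return (log_like(up) - log_like(down)) / (2 * step)

    # Hypothetical parameter location; depends on the model's parameter index.
    derivative = central_difference(log_like, params, loc=("delta", "delta"))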
.. py:function:: log_like(params, df, base_draws_est, solve, type_covariates, options, return_scalar)

   Criterion function for the likelihood maximization.

   This function calculates the likelihood contributions of the sample.

   :Parameters:

       **params** : :obj:`pandas.Series`
           Parameter Series.

       **df** : :obj:`pandas.DataFrame`
           The DataFrame contains choices, log wages, and the indices of the states
           for the different types.

       **base_draws_est** : :obj:`numpy.ndarray`
           Set of draws to calculate the probability of observed wages.

       **solve** : :func:`~respy.solve.solve`
           Function which solves the model with new parameters.

       **options** : :class:`python:dict`
           Contains model options.

   .. !! processed by numpydoc !!

.. py:function:: _internal_log_like_obs(state_space, df, base_draws_est, type_covariates, optim_paras, options)

   Calculate the likelihood contribution of each individual in the sample.

   The function calculates all likelihood contributions for all observations in the
   data, that is, for all individual-period-type combinations. Then, the likelihoods
   are accumulated within each individual and type over all periods. After that, the
   result is multiplied by the type-specific shares, which yields the contribution to
   the likelihood for each individual.

   :Parameters:

       **state_space** : :class:`~respy.state_space.StateSpace`
           Class of state space.

       **df** : :obj:`pandas.DataFrame`
           The DataFrame contains choices, log wages, and the indices of the states
           for the different types.

       **base_draws_est** : :obj:`numpy.ndarray`
           Array with shape (n_periods, n_draws, n_choices) containing i.i.d. draws
           from standard normal distributions.

       **type_covariates** : :obj:`pandas.DataFrame` or :data:`python:None`
           If the model includes types, this is a :class:`pandas.DataFrame` containing
           the covariates to compute the type probabilities.

       **optim_paras** : :class:`python:dict`
           Dictionary with quantities that were extracted from the parameter vector.

       **options** : :class:`python:dict`
           Options of the model.

   :Returns:

       **contribs** : :obj:`numpy.ndarray`
           Array with shape (n_individuals,) containing the contributions of the
           individuals in the empirical data.

       **df** : :obj:`pandas.DataFrame`
           Contains log wages, choices, and the corresponding log likelihood
           contributions.

   .. !! processed by numpydoc !!

.. py:function:: _compute_wage_and_choice_log_likelihood_contributions(df, base_draws_est, wages, nonpecs, continuation_values, choice_set, optim_paras, options)

   Compute wage and choice log likelihood contributions.

   .. !! processed by numpydoc !!

.. py:function:: _compute_log_type_probabilities(df, optim_paras, options)

   Compute the log type probabilities.

   .. !! processed by numpydoc !!

.. py:function:: _compute_x_beta_for_type_probabilities(df, optim_paras, options)

   Compute the vector dot product of type covariates and type coefficients.

   For each individual, compute as many vector dot products as there are types. The
   scalars are later passed to a softmax function to compute the type probabilities,
   i.e. the probability of each individual being of a certain type.

   .. !! processed by numpydoc !!

.. py:function:: _logsumexp(x)

   Compute the logsumexp of `x`.

   The function does the same as the following code, but faster.

   .. code-block:: python

       log_sum_exp = np.max(x) + np.log(np.sum(np.exp(x - np.max(x))))

   The subtraction of the maximum prevents overflows and mitigates the impact of
   underflows.

   .. !! processed by numpydoc !!
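The effect of subtracting the maximum can be checked directly with NumPy. The snippet
below is purely illustrative and not respy's implementation.

.. code-block:: python

    import numpy as np

    x = np.array([1000.0, 1001.0, 1002.0])

    # Naive evaluation overflows because exp(1000) is not representable as float64.
    naive = np.log(np.sum(np.exp(x)))  # inf, with an overflow warning

    # Subtracting the maximum keeps every exponent <= 0, so nothing overflows.
    stable = np.max(x) + np.log(np.sum(np.exp(x - np.max(x))))
    print(stable)  # approx. 1002.4076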
.. py:function:: _simulate_log_probability_of_individuals_observed_choice(wages, nonpec, continuation_values, draws, delta, choice, tau, smoothed_log_probability)

   Simulate the probability of observing the agent's choice.

   The probability is simulated by iterating over a distribution of unobservables.
   First, the utility of each choice is computed. Then, the probability of observing
   the choice of the agent, given the maximum utility from all choices, is computed.

   The naive implementation calculates the log probability for choice `i` with the
   softmax function.

   .. math::

       \log(\text{softmax}(x)_i) = \log\left(
           \frac{e^{x_i}}{\sum_{j=1}^{J} e^{x_j}}
       \right)

   The following function is numerically more robust. The derivation with the two
   consecutive `logsumexp` functions is included in #278.

   :Parameters:

       **wages** : :obj:`numpy.ndarray`
           Array with shape (n_choices,).

       **nonpec** : :obj:`numpy.ndarray`
           Array with shape (n_choices,).

       **continuation_values** : :obj:`numpy.ndarray`
           Array with shape (n_choices,).

       **draws** : :obj:`numpy.ndarray`
           Array with shape (n_draws, n_choices).

       **delta** : :class:`python:float`
           Discount rate.

       **choice** : :class:`python:int`
           Choice of the agent.

       **tau** : :class:`python:float`
           Smoothing parameter for choice probabilities.

   :Returns:

       **smoothed_log_probability** : :class:`python:float`
           Smoothed simulated log probability of the observed choice.

   .. !! processed by numpydoc !!
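The smoothing with `tau` can be pictured with a short, self-contained sketch: a
softmax with temperature `tau` over simulated value functions, averaged over draws in
log space. The value-function formula in the comments is an assumption made for the
illustration; this is not respy's exact implementation.

.. code-block:: python

    import numpy as np
    from scipy.special import logsumexp


    def smoothed_log_choice_probability(wages, nonpec, continuation_values, draws,
                                        delta, choice, tau):
        """Illustrative sketch of a smoothed log probability of the observed choice.

        Assumes the value function for a draw is
        nonpec + wages * shock + delta * continuation_values, which mirrors the
        inputs listed above but is not taken verbatim from respy.
        """
        # Value functions for every draw and choice, shape (n_draws, n_choices).
        value_functions = nonpec + wages * draws + delta * continuation_values

        # Per draw: log of the smoothed probability of the observed choice.
        scaled = value_functions / tau
        per_draw = scaled[:, choice] - logsumexp(scaled, axis=1)

        # Average the probabilities over draws while staying in log space.
        return logsumexp(per_draw) - np.log(len(draws))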
.. py:function:: _process_estimation_data(df, state_space, optim_paras, options)

   Process the estimation data.

   All objects required by :func:`_internal_log_like_obs` that depend on the data are
   produced. Some objects have to be repeated for each type, which is a desirable
   format for the estimation where every observation is weighted by type
   probabilities.

   :Parameters:

       **df** : :obj:`pandas.DataFrame`
           The DataFrame which contains the data used for estimation. The DataFrame
           contains individual identifiers, periods, experiences, lagged choices,
           choices in the current period, the wage, and other observed data.

       **state_space** : :class:`~respy.state_space.StateSpace`
           State space of the model, including the indexer for the core state space.

       **optim_paras** : :class:`python:dict`
           ..

       **options** : :class:`python:dict`
           ..

   :Returns:

       **choices** : :obj:`numpy.ndarray`
           Array with shape (n_observations, n_types) where information is only
           repeated over the second axis.

       **idx_indiv_first_obs** : :obj:`numpy.ndarray`
           Array with shape (n_individuals,) containing indices for the first
           observations of each individual.

       **indices** : :obj:`numpy.ndarray`
           Array with shape (n_observations, n_types) containing indices for states
           which correspond to observations.

       **log_wages_observed** : :obj:`numpy.ndarray`
           Array with shape (n_observations, n_types) containing clipped log wages.

       **type_covariates** : :obj:`numpy.ndarray`
           Array with shape (n_individuals, n_type_covariates) containing covariates
           to predict probabilities for each type.

   .. !! processed by numpydoc !!

.. py:function:: _update_optim_paras_with_initial_experience_levels(optim_paras, df)

   Adjust the initial experience levels in ``optim_paras`` from the data.

   .. !! processed by numpydoc !!

.. py:function:: _create_comparison_plot_data(df, log_type_probabilities, optim_paras)

   Create a DataFrame for estimagic's comparison plot.

   .. !! processed by numpydoc !!

.. py:function:: _map_choice_codes_to_indices_of_valid_choice_set(choices, choice_set)

   Map choice codes to the indices of the valid choice set.

   Choice codes number all choices from 0 to `n_choices` - 1. In some dense indices,
   not all choices are available and, thus, arrays like wages have only as many
   columns as there are available choices. Therefore, we need to number the available
   choices from 0 to `n_available_choices` - 1 and replace the old choice codes with
   the new ones.

   .. rubric:: Examples

   >>> import numpy as np
   >>> wages = np.arange(4).reshape(2, 2)
   >>> choices = np.array([0, 2])
   >>> choice_set = (True, False, True)

   >>> np.choose(choices, wages)
   Traceback (most recent call last):
   ...
   ValueError: invalid entry in choice array

   >>> new_choices = _map_choice_codes_to_indices_of_valid_choice_set(
   ...     choices, choice_set
   ... )
   >>> np.choose(new_choices, wages)
   array([0, 3])

   .. !! processed by numpydoc !!
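One way to build such a renumbering, shown here only as an illustrative sketch and not
necessarily respy's implementation, is to count for each choice code how many valid
choices precede it.

.. code-block:: python

    import numpy as np

    choice_set = (True, False, True)
    choices = np.array([0, 2])

    # New index of each choice code: the number of valid choices up to and
    # including it, minus one.
    new_index_of_code = np.cumsum(choice_set) - 1  # array([0, 0, 1])
    new_choices = new_index_of_code[choices]       # array([0, 1])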