respy.likelihood#

Everything related to the estimation with maximum likelihood.

Module Contents#

Functions#

get_log_like_func(params, options, df[, return_scalar])

Get the criterion function for maximum likelihood estimation.

log_like(params, df, base_draws_est, solve, ...)

Criterion function for the likelihood maximization.

_internal_log_like_obs(state_space, df, ...)

Calculate the likelihood contribution of each individual in the sample.

_compute_wage_and_choice_log_likelihood_contributions(df, ...)

Compute wage and choice log likelihood contributions.

_compute_log_type_probabilities(df, optim_paras, options)

Compute the log type probabilities.

_compute_x_beta_for_type_probabilities(df, ...)

Compute the vector dot product of type covariates and type coefficients.

_logsumexp(x)

Compute logsumexp of x.

_simulate_log_probability_of_individuals_observed_choice(...)

Simulate the probability of observing the agent's choice.

_process_estimation_data(df, state_space, optim_paras, ...)

Process estimation data.

_update_optim_paras_with_initial_experience_levels(...)

Adjust the initial experience levels in optim_paras from the data.

_create_comparison_plot_data(df, ...)

Create DataFrame for estimagic's comparison plot.

_map_choice_codes_to_indices_of_valid_choice_set(...)

Map choice codes to the indices of the valid choice set.

respy.likelihood.get_log_like_func(params, options, df, return_scalar=True)[source]#

Get the criterion function for maximum likelihood estimation.

Return a version of the likelihood function in respy where all arguments except the parameter vector are fixed with functools.partial(). Thus, the function can be passed directly into an optimizer or a function for taking numerical derivatives.

Parameters:
params : pandas.DataFrame

DataFrame containing model parameters.

options : dict

Dictionary containing model options.

df : pandas.DataFrame

The model is fit to this dataset.

return_scalar : bool, default True

Indicator for whether the mean log likelihood should be returned. If False, a dictionary with the following key and value pairs is returned:

• “value” : mean log likelihood (float)

• “contributions” : log likelihood contributions (numpy.array)

• “comparison_plot_data” : DataFrame with various contributions for the visualization with estimagic. The data contains the following columns:

  • identifier : Individual identifiers derived from input df.

  • period : Periods derived from input df.

  • choice : Choice that value is connected to.

  • value : Value of log likelihood contribution.

  • kind : Kind of contribution (e.g. choice or wage).

  • type and log_type_probability : Only included in models with types.

Returns:
criterion_function : log_like()

Criterion function where all arguments except the parameter vector are set.

Raises:
AssertionError

If the data does not have the expected format.

Examples

>>> import respy as rp
>>> params, options, data = rp.get_example_model("robinson_crusoe_basic")

By default, the function returns the mean log likelihood as a scalar value.

>>> log_like = rp.get_log_like_func(params=params, options=options, df=data)
>>> scalar = log_like(params)

Alternatively, a dictionary containing the mean log likelihood, the log likelihood contributions, and a pandas.DataFrame for estimagic's comparison plot can be returned.

>>> log_like = rp.get_log_like_func(params=params, options=options, df=data,
...     return_scalar=False
... )
>>> outputs = log_like(params)
>>> outputs.keys()
dict_keys(['value', 'contributions', 'comparison_plot_data'])
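Since all arguments except the parameter vector are fixed, the returned criterion can be handed to a generic optimizer. The following is only a rough sketch, not respy's recommended estimation workflow: continuing from the example above, it optimizes the “value” column of params with scipy while ignoring bounds, constraints, and scaling; the wrapper name is hypothetical.

import numpy as np
from scipy.optimize import minimize

log_like = rp.get_log_like_func(params=params, options=options, df=data)

def neg_mean_log_like(x):
    # Rebuild the parameter DataFrame from the flat vector and flip the sign
    # so that minimizing corresponds to maximizing the mean log likelihood.
    p = params.copy()
    p["value"] = x
    return -log_like(p)

result = minimize(neg_mean_log_like, params["value"].to_numpy(), method="Nelder-Mead")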
respy.likelihood.log_like(params, df, base_draws_est, solve, type_covariates, options, return_scalar)[source]#

Criterion function for the likelihood maximization.

This function calculates the likelihood contributions of the sample.

Parameters:
params : pandas.Series

Parameter Series

df : pandas.DataFrame

The DataFrame contains choices, log wages, and the indices of the states for the different types.

base_draws_est : numpy.ndarray

Set of draws to calculate the probability of observed wages.

solve : solve()

Function which solves the model with new parameters.

options : dict

Contains model options.

respy.likelihood._internal_log_like_obs(state_space, df, base_draws_est, type_covariates, optim_paras, options)[source]#

Calculate the likelihood contribution of each individual in the sample.

The function calculates the likelihood contributions for all observations in the data, i.e., for all individual-period-type combinations.

Then, likelihoods are accumulated within each individual and type over all periods. After that, the result is multiplied by the type-specific shares, which yields the likelihood contribution of each individual (a schematic sketch follows the parameter description below).

Parameters:
state_space : StateSpace

Class of state space.

df : pandas.DataFrame

The DataFrame contains choices, log wages, and the indices of the states for the different types.

base_draws_est : numpy.ndarray

Array with shape (n_periods, n_draws, n_choices) containing i.i.d. draws from standard normal distributions.

type_covariates : pandas.DataFrame or None

If the model includes types, this is a pandas.DataFrame containing the covariates to compute the type probabilities.

optim_paras : dict

Dictionary with quantities that were extracted from the parameter vector.

options : dict

Options of the model.

Returns:
contribs : numpy.ndarray

Array with shape (n_individuals,) containing contributions of individuals in the empirical data.

df : pandas.DataFrame

Contains log wages, choices and
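A schematic sketch of the aggregation described above, assuming the per-observation log likelihood contributions have already been collected in a long DataFrame; the column names are hypothetical and the actual implementation works on arrays.

from scipy.special import logsumexp

def aggregate_log_contributions(per_obs):
    # per_obs: one row per individual-period-type combination with the columns
    # "identifier", "type", "log_contribution", and "log_type_probability".
    summed = per_obs.groupby(["identifier", "type"]).agg(
        log_contribution=("log_contribution", "sum"),
        log_type_probability=("log_type_probability", "first"),
    )
    # Adding the log type probability corresponds to multiplying by the
    # type-specific share; logsumexp over types yields one value per individual.
    weighted = summed["log_contribution"] + summed["log_type_probability"]
    return weighted.groupby(level="identifier").agg(logsumexp)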

respy.likelihood._compute_wage_and_choice_log_likelihood_contributions(df, base_draws_est, wages, nonpecs, continuation_values, choice_set, optim_paras, options)[source]#

Compute wage and choice log likelihood contributions.

respy.likelihood._compute_log_type_probabilities(df, optim_paras, options)[source]#

Compute the log type probabilities.

respy.likelihood._compute_x_beta_for_type_probabilities(df, optim_paras, options)[source]#

Compute the vector dot product of type covariates and type coefficients.

For each individual, compute as many vector dot products as there are types. The scalars are later passed to a softmax function to compute the type probabilities, i.e., the probability of each individual being of each type.
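A minimal numeric sketch of this step for a single individual in a model with two types; the covariates and coefficients are made up, and normalizing the first type's coefficients to zero is only illustrative.

import numpy as np
from scipy.special import softmax

covariates = np.array([1.0, 0.5])              # type covariates of one individual
coefficients = np.array([
    [0.0, 0.0],                                # type 0 (illustrative normalization)
    [0.4, -0.2],                               # type 1
])
x_beta = coefficients @ covariates             # one x'beta per type
type_probabilities = softmax(x_beta)           # probabilities sum to one across types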

respy.likelihood._logsumexp(x)[source]#

Compute logsumexp of x.

The function does the same as the following code, but faster.

log_sum_exp = np.max(x) + np.log(np.sum(np.exp(x - np.max(x))))

The subtraction of the maximum prevents overflows and mitigates the impact of underflows.
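A sketch of an explicit-loop variant that a JIT compiler such as numba can accelerate; the decorator and function name are illustrative and not necessarily respy's exact implementation.

import numpy as np
from numba import njit

@njit
def logsumexp_loop(x):
    # Find the maximum first so that the exponentials cannot overflow.
    max_x = x[0]
    for value in x:
        if value > max_x:
            max_x = value
    # Accumulate exp(x - max) in one pass and shift the result back afterwards.
    total = 0.0
    for value in x:
        total += np.exp(value - max_x)
    return max_x + np.log(total)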

respy.likelihood._simulate_log_probability_of_individuals_observed_choice(wages, nonpec, continuation_values, draws, delta, choice, tau, smoothed_log_probability)[source]#

Simulate the probability of observing the agent’s choice.

The probability is simulated by iterating over a distribution of unobservables. First, the utility of each choice is computed. Then, the probability of observing the choice of the agent given the maximum utility from all choices is computed.

The naive implementation calculates the log probability for choice i with the softmax function.

\[\log(\text{softmax}(x)_i) = \log\left( \frac{e^{x_i}}{\sum_{j=1}^{J} e^{x_j}} \right)\]

This function is numerically more robust. The derivation with the two consecutive logsumexp functions is included in #278; a schematic sketch follows the parameter description below.

Parameters:
wages : numpy.ndarray

Array with shape (n_choices,).

nonpec : numpy.ndarray

Array with shape (n_choices,).

continuation_values : numpy.ndarray

Array with shape (n_choices,)

draws : numpy.ndarray

Array with shape (n_draws, n_choices)

delta : float

Discount rate.

choice : int

Choice of the agent.

tau : float

Smoothing parameter for choice probabilities.

Returns:
smoothed_log_probability : float

Simulated smoothed log probability of the choice.
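A schematic sketch of the two consecutive logsumexp steps, assuming the per-draw value of each choice has already been assembled into an array value_functions with shape (n_draws, n_choices). The function name and the exact form of the smoothing are illustrative; see #278 for the derivation used by respy.

import numpy as np
from scipy.special import logsumexp

def smoothed_log_prob_sketch(value_functions, choice, tau):
    scaled = value_functions / tau
    # First logsumexp: smoothed log probability of the observed choice per draw.
    log_probs_per_draw = scaled[:, choice] - logsumexp(scaled, axis=1)
    # Second logsumexp: average the simulated probabilities over draws in log space.
    n_draws = value_functions.shape[0]
    return logsumexp(log_probs_per_draw) - np.log(n_draws)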

respy.likelihood._process_estimation_data(df, state_space, optim_paras, options)[source]#

Process estimation data.

All data-dependent objects necessary for _internal_log_like_obs() are produced.

Some objects have to be repeated for each type, which is a convenient format for the estimation where every observation is weighted by type probabilities (see the toy example below).

Parameters:
df : pandas.DataFrame

The DataFrame which contains the data used for estimation. The DataFrame contains individual identifiers, periods, experiences, lagged choices, choices in current period, the wage and other observed data.

indexer : numpy.ndarray

Indexer for the core state space.

optim_paras : dict
options : dict
Returns:
choices : numpy.ndarray

Array with shape (n_observations, n_types) where information is only repeated over the second axis.

idx_indiv_first_obs : numpy.ndarray

Array with shape (n_individuals,) containing indices for the first observations of each individual.

indices : numpy.ndarray

Array with shape (n_observations, n_types) containing indices for states which correspond to observations.

log_wages_observed : numpy.ndarray

Array with shape (n_observations, n_types) containing clipped log wages.

type_covariates : numpy.ndarray

Array with shape (n_individuals, n_type_covariates) containing covariates to predict probabilities for each type.
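The repetition over the type axis mentioned above can be illustrated with a toy array; the values and the number of types are made up.

import numpy as np

n_types = 2
choices = np.array([0, 1, 1])  # shape (n_observations,)
# Repeat observation-level information along a second, type-specific axis so
# that every observation can later be weighted by its type probabilities.
choices_by_type = np.repeat(choices[:, None], n_types, axis=1)  # (n_observations, n_types)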

respy.likelihood._update_optim_paras_with_initial_experience_levels(optim_paras, df)[source]#

Adjust the initial experience levels in optim_paras from the data.

respy.likelihood._create_comparison_plot_data(df, log_type_probabilities, optim_paras)[source]#

Create DataFrame for estimagic’s comparison plot.

respy.likelihood._map_choice_codes_to_indices_of_valid_choice_set(choices, choice_set)[source]#

Map choice codes to the indices of the valid choice set.

Choice codes number all choices from 0 to n_choices - 1. In some dense indices, not all choices are available and, thus, arrays like wages have only as many columns as there are available choices. Therefore, we need to renumber the available choices from 0 to n_available_choices - 1 and replace the old choice codes with the new ones.

Examples

>>> import numpy as np
>>> wages = np.arange(4).reshape(2, 2)
>>> choices = np.array([0, 2])
>>> choice_set = (True, False, True)
>>> np.choose(choices, wages)
Traceback (most recent call last):
 ...
ValueError: invalid entry in choice array
>>> new_choices = _map_choice_codes_to_indices_of_valid_choice_set(
...     choices, choice_set
... )
>>> np.choose(new_choices, wages)
array([0, 3])