respy.shared

Contains functions which are shared across other modules.

This module should only import from other packages or from modules of respy that do not themselves import from respy. This prevents circular imports.

Module Contents

Functions

aggregate_keane_wolpin_utility(wage, nonpec, ...)

Calculate the utility of Keane and Wolpin models.

create_base_draws(shape, seed, monte_carlo_sequence)

Create a set of draws from the standard normal distribution.

transform_base_draws_with_cholesky_factor(draws, ...)

Transform standard normal draws with the Cholesky factor.

generate_column_dtype_dict_for_estimation(optim_paras)

Generate a dictionary mapping the column labels of the data necessary for the estimation to their dtypes.

downcast_to_smallest_dtype(series[, downcast_options])

Downcast the dtype of a pandas.Series to the lowest possible dtype.

compute_covariates(df, definitions[, check_nans, ...])

Compute covariates.

convert_labeled_variables_to_codes(df, optim_paras)

Convert labeled variables to codes.

rename_labels_to_internal(x)

Shorten labels and convert them to lower-case.

rename_labels_from_internal(x)

Convert shortened internal labels back to their original form.

normalize_probabilities(probabilities)

Normalize probabilities such that their sum equals one.

calculate_value_functions_and_flow_utilities(wage, ...)

Calculate the choice-specific value functions and flow utilities.

create_core_state_space_columns(optim_paras)

Create internal column names for the core state space.

create_dense_state_space_columns(optim_paras)

Create internal column names for the dense state space.

create_dense_choice_state_space_columns(optim_paras)

Create internal column names for the dense choice state space.

create_state_space_columns(optim_paras)

Create names of state space dimensions excluding the period and identifier.

calculate_expected_value_functions(wages, nonpecs, ...)

Calculate the expected maximum of value functions for a set of unobservables.

convert_dictionary_keys_to_dense_indices(dictionary)

Convert the keys to tuples containing integers.

subset_cholesky_factor_to_choice_set(cholesky_factor, ...)

Subset the Cholesky factor to dimensions required by the admissible choice set.

return_core_dense_key(core_idx[, dense])

Return core dense keys in the right format.

pandas_dot(x, beta[, out])

Compute the dot product for a DataFrame and a Series.

map_observations_to_states(states, state_space, ...)

Map observations in data to states.

map_states_to_core_key_and_core_index(states, indexer)

Map states to the core key and core index.

_map_observations_to_dense_index(dense, core_index, ...)

dump_objects(objects, topic, complex_, options)

Dump states.

load_objects(topic, complex_, options)

Load states.

_create_file_name_from_complex_index(topic, complex_)

Create a file name from a complex index.

prepare_cache_directory(options)

Prepare cache directory.

select_valid_choices(choices, choice_set)

Select valid choices.

apply_law_of_motion_for_core(df, optim_paras)

Apply the law of motion for the core dimensions.

get_choice_set_from_complex(complex_tuple)

Select the choice set from a complex tuple.

get_exogenous_from_dense_covariates(dense_covariates, ...)

Select exogenous grid points from dense grid points.

respy.shared.aggregate_keane_wolpin_utility(wage, nonpec, continuation_value, draw, delta)

Calculate the utility of Keane and Wolpin models.

Note that the function works for both working and non-working alternatives because wages are set to one for non-working alternatives, so that the draws enter the utility function additively.

Parameters:
wage : float

Value of the wage component. Note that for non-working alternatives this value is actually zero, but to simplify computations it is set to one.

nonpec : float

Value of the non-pecuniary component.

continuation_value : float

The continuation value, i.e., the expected present value of the following state.

draw : float

The shock which enters the reward of working alternatives multiplicatively and the reward of non-working alternatives additively.

delta : float

The discount factor to calculate the present value of continuation values.

Returns:
alternative_specific_value_function : float

The expected present value of an alternative.

flow_utility : float

The immediate reward of an alternative.
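The aggregation can be illustrated with a minimal sketch. It assumes that the flow utility is wage * draw + nonpec and that the value function adds the discounted continuation value, in line with the flow-utility formula documented for calculate_expected_value_functions(). The function aggregate_utility_sketch is a hypothetical name, not part of respy.

```python
def aggregate_utility_sketch(wage, nonpec, continuation_value, draw, delta):
    """Hypothetical re-implementation for illustration only."""
    # For non-working alternatives wage == 1, so the draw enters additively.
    flow_utility = wage * draw + nonpec
    # Add the discounted expected present value of the following state.
    value_function = flow_utility + delta * continuation_value
    return value_function, flow_utility


# Working alternative: the draw scales the wage.
value_work, flow_work = aggregate_utility_sketch(2.0, 1.0, 10.0, 1.5, 0.95)
# Non-working alternative: the wage is set to one and the draw is simply added.
value_home, flow_home = aggregate_utility_sketch(1.0, 0.5, 10.0, -0.2, 0.95)
```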

respy.shared.create_base_draws(shape, seed, monte_carlo_sequence)

Create a set of draws from the standard normal distribution.

The draws are either drawn randomly or from quasi-random low-discrepancy sequences, i.e., Sobol or Halton.

“random” draws pseudo-random standard normal shocks, which are used for the Monte Carlo integrations and for the random shocks individuals face in the simulation.

“halton” or “sobol” can be used to change the sequence for two Monte Carlo integrations: first, the calculation of the expected value function (EMAX) in the solution and, second, the calculation of the choice probabilities in the maximum likelihood estimation.

For the solution and estimation, it is necessary to have the same randomness in every iteration. Otherwise, there is chatter in the simulation, i.e., a difference in simulated values caused not only by different parameters but also by different draws (see 10.5 in [1]). At the same time, the variance-covariance matrix of the shocks is estimated along with all other parameters and changes in every iteration. Thus, instead of sampling draws from a varying multivariate normal distribution, standard normal draws are sampled here and transformed to the distribution specified by the parameters in transform_base_draws_with_cholesky_factor().

Parameters:
shape : tuple(int)

Tuple representing the shape of the resulting array.

seed : int

Seed to control randomness.

monte_carlo_sequence : {“random”, “halton”, “sobol”}

Name of the sequence.

Returns:
draws : numpy.ndarray

Array with the shape given by shape, typically (n_periods, n_draws, n_choices).

References

[1]

Train, K. (2009). Discrete Choice Methods with Simulation. Cambridge: Cambridge University Press.

[2]

Lemieux, C. (2009). Monte Carlo and Quasi-Monte Carlo Sampling. New York: Springer Verlag New York.
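The following sketch illustrates fixed base draws under stated assumptions: the "random" branch uses NumPy's Generator API, and the "halton" branch uses a hand-written Halton sequence mapped through the standard normal inverse CDF from Python's statistics module. Neither is necessarily how respy implements create_base_draws(), the "sobol" option is omitted, and base_draws_sketch and radical_inverse are hypothetical helpers.

```python
from statistics import NormalDist

import numpy as np

# One prime base per dimension; enough for up to ten choices.
PRIMES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29)


def radical_inverse(index, base):
    """Van der Corput radical inverse of a positive integer in the given base."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def base_draws_sketch(shape, seed, monte_carlo_sequence="random"):
    """Hypothetical sketch of standard normal base draws, not respy's code."""
    if monte_carlo_sequence == "random":
        # The same seed yields the same draws in every call, which avoids chatter.
        return np.random.default_rng(seed).standard_normal(shape)
    if monte_carlo_sequence == "halton":
        # The Halton sequence is deterministic, so seed is not used here.
        n_points, n_dims = int(np.prod(shape[:-1])), shape[-1]
        inv_cdf = NormalDist().inv_cdf
        # Start at index 1 because index 0 maps to 0, where inv_cdf is undefined.
        points = [
            [inv_cdf(radical_inverse(i, PRIMES[d])) for d in range(n_dims)]
            for i in range(1, n_points + 1)
        ]
        return np.array(points).reshape(shape)
    raise NotImplementedError(f"{monte_carlo_sequence!r} is not covered by this sketch.")
```

Calling the sketch twice with the same seed returns identical arrays, which is the property the solution and estimation rely on.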

respy.shared.transform_base_draws_with_cholesky_factor(draws, choice_set, shocks_cholesky, optim_paras)

Transform standard normal draws with the Cholesky factor.

The standard normal draws are transformed to normal draws with variance-covariance matrix \(\Sigma\) by multiplication with the lower-triangular Cholesky factor \(L\), where \(LL^T = \Sigma\). See chapter 7.4 in [1] for more information.

This function relates to create_base_draws() in the sense that it transforms the unchanging standard normal draws to the distribution with the variance-covariance matrix specified by the parameters.
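The transformation can be checked numerically with a small sketch. It assumes a lower-triangular factor as returned by np.linalg.cholesky, which satisfies \(LL^T = \Sigma\); with draws stored as rows, multiplying by \(L^T\) yields rows with covariance \(\Sigma\). The covariance matrix below is made up, and respy-specific steps, such as subsetting the factor to a choice set, are left out.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical target covariance matrix of two shocks.
sigma = np.array([[1.0, 0.5], [0.5, 2.0]])
# NumPy returns the lower-triangular factor L with L @ L.T == sigma.
chol = np.linalg.cholesky(sigma)

# Unchanging standard normal base draws, one draw per row.
base_draws = rng.standard_normal((200_000, 2))
# Each row z becomes (L z)^T = z^T L^T, so the rows have covariance L L^T.
transformed = base_draws @ chol.T

empirical_cov = np.cov(transformed, rowvar=False)
```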

References

[1]

Gentle, J. E. (2009). Computational statistics (Vol. 308). New York: Springer.

respy.shared.generate_column_dtype_dict_for_estimation(optim_paras)

Generate a dictionary mapping the column labels of the data necessary for the estimation to their dtypes.

respy.shared.downcast_to_smallest_dtype(series, downcast_options=None)

Downcast the dtype of a pandas.Series to the lowest possible dtype.

By default, variables are converted to the smallest signed or unsigned integer dtype. Use "float" as one of the downcast_options to cast variables from float64 to float32.

Be aware that NumPy integers silently overflow, which is why conversion to small dtypes should be done after calculations. For example, squaring the elements of a numpy.uint8 array silently overflows for every element larger than 15 because the result exceeds 255.

For more information on the dtype boundaries see the NumPy documentation under https://docs.scipy.org/doc/numpy-1.17.0/user/basics.types.html.
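A short illustration of the underlying idea, assuming pandas.to_numeric() with its downcast argument as the building block (respy's own implementation may differ). It also reproduces the silent overflow of numpy.uint8 described above.

```python
import numpy as np
import pandas as pd

series = pd.Series([0, 3, 200])
# Smallest unsigned integer dtype that holds all values.
unsigned = pd.to_numeric(series, downcast="unsigned")
# Smallest signed integer dtype; int8 is too small because 200 > 127.
signed = pd.to_numeric(series, downcast="signed")

# "float" downcasting turns float64 into float32.
floats = pd.to_numeric(pd.Series([0.5, 1.25]), downcast="float")

# Integer arithmetic on small dtypes wraps around silently: 16**2 = 256 -> 0.
small = np.array([15, 16], dtype=np.uint8)
squared = small**2
```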

respy.shared.compute_covariates(df, definitions, check_nans=False, raise_errors=True)

Compute covariates.

The function iterates over the definitions of covariates and tries to compute them. It keeps track of how many covariates still need to be computed and stops once this number no longer changes, which might be due to missing information.

Parameters:
df : pandas.DataFrame

DataFrame with some, but possibly not all, state space dimensions such as period and experiences.

definitions : dict

Keys represent covariates and values are strings passed to df.eval.

check_nans : bool, default False

Check whether the variables used to compute the selected covariate contain any np.nan. This is necessary in respy.simulate._sample_characteristic() where some characteristics may contain missings.

raise_errors : bool, default True

Whether to raise errors if variables cannot be computed. This option is necessary for, e.g., _sample_characteristic() where not all necessary variables exist and it is not easy to exclude covariates which depend on them.

Returns:
covariates : pandas.DataFrame

DataFrame with shape (n_states, n_covariates).

Raises:
Exception

If variables cannot be computed and raise_errors is True.
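The iteration described above can be sketched as a simple fixed-point loop. This is a hypothetical simplification: compute_covariates_sketch ignores check_nans, raises a generic Exception, and relies only on pandas.DataFrame.eval().

```python
import pandas as pd


def compute_covariates_sketch(df, definitions, raise_errors=True):
    """Hypothetical sketch: evaluate definitions until no further progress is made."""
    df = df.copy()
    remaining = dict(definitions)
    while remaining:
        n_before = len(remaining)
        for name, formula in list(remaining.items()):
            try:
                df[name] = df.eval(formula)
            except Exception:
                # E.g., the formula references a covariate not computed yet.
                continue
            del remaining[name]
        # Stop once the number of missing covariates no longer changes.
        if len(remaining) == n_before:
            if raise_errors:
                raise Exception(f"Cannot compute covariates: {sorted(remaining)}")
            break
    return df[[name for name in definitions if name in df.columns]]


states = pd.DataFrame({"period": [0, 1, 2]})
# "b" depends on "a", which is only available after the first pass.
covariates = compute_covariates_sketch(states, {"b": "a * 2", "a": "period + 1"})
```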

respy.shared.convert_labeled_variables_to_codes(df, optim_paras)

Convert labeled variables to codes.

Choice variables and observables are checked for potential labels. The mapping from labels to codes is inferred from the order in optim_paras.
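A minimal sketch of the label-to-code mapping, assuming the codes are the positions of the labels in an ordered list such as the choices in optim_paras. labels_to_codes_sketch and the example choices are hypothetical.

```python
import pandas as pd


def labels_to_codes_sketch(series, ordered_labels):
    """Hypothetical helper: the position of a label in ordered_labels is its code."""
    codes = pd.Categorical(series, categories=ordered_labels).codes
    # pandas uses the code -1 for values which are not among the categories.
    if (codes == -1).any():
        raise ValueError("Found labels which are not in ordered_labels.")
    return pd.Series(codes, index=series.index)


# Hypothetical choices in the order in which they might appear in optim_paras.
choices = ["a", "b", "edu", "home"]
choice_codes = labels_to_codes_sketch(pd.Series(["b", "a", "edu"]), choices)
```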

respy.shared.rename_labels_to_internal(x)

Shorten labels and convert them to lower-case.

respy.shared.rename_labels_from_internal(x)

Convert shortened internal labels back to their original form.

respy.shared.normalize_probabilities(probabilities)

Normalize probabilities such that their sum equals one.

Examples

The following probs do not sum exactly to one after dividing by their sum because of floating-point rounding.

>>> probs = np.array([0.3775843411510946, 0.5384246942799851, 0.6522988820635421])
>>> normalize_probabilities(probs)
array([0.24075906, 0.34331568, 0.41592526])
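One way to obtain probabilities that sum to one despite rounding is to set the last entry to one minus the sum of the others. The sketch below assumes this correction; normalize_probabilities_sketch is a hypothetical name and may not match respy's implementation exactly.

```python
import numpy as np


def normalize_probabilities_sketch(probabilities):
    """Hypothetical sketch; the last-entry correction is an assumption."""
    # Division creates a new array, so the input is not modified.
    probabilities = probabilities / probabilities.sum()
    # Absorb the remaining rounding error in the last entry.
    probabilities[-1] = 1 - probabilities[:-1].sum()
    return probabilities


probs = np.array([0.3775843411510946, 0.5384246942799851, 0.6522988820635421])
normalized = normalize_probabilities_sketch(probs)
```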
respy.shared.calculate_value_functions_and_flow_utilities(wage, nonpec, continuation_value, draw, delta, value_function, flow_utility)

Calculate the choice-specific value functions and flow utilities.

To apply aggregate_keane_wolpin_utility() to arrays with arbitrary dimensions, this function uses numba.guvectorize(). One cannot use numba.vectorize() because it does not support multiple return values.

respy.shared.create_core_state_space_columns(optim_paras)

Create internal column names for the core state space.

respy.shared.create_dense_state_space_columns(optim_paras)

Create internal column names for the dense state space.

respy.shared.create_dense_choice_state_space_columns(optim_paras)

Create internal column names for the dense choice state space.

respy.shared.create_state_space_columns(optim_paras)

Create names of state space dimensions excluding the period and identifier.

respy.shared.calculate_expected_value_functions(wages, nonpecs, continuation_values, draws, delta, expected_value_functions)

Calculate the expected maximum of value functions for a set of unobservables.

For a given agent, the function calculates the ex-post rewards, i.e., the utility of each choice, for multiple draws from the distribution of unobservables and adds the discounted expected maximum utility of the subsequent period resulting from each choice. Averaging the maximum utilities over all draws yields the expected maximum utility of this state.

The underlying process in this function is called Monte Carlo integration. The goal is to approximate an integral by evaluating the integrand at randomly chosen points. In this setting, one wants to approximate the expected maximum utility of the current state.

Note that wages have the same length as nonpecs even though wages only exist for some choices. The entries for choices without a wage are filled with ones. Thus, the flow utilities of a choice with a wage and of a choice without a wage are

\[\begin{aligned}
\text{Flow Utility} &= \text{Wage} \cdot \epsilon + \text{Non-pecuniary} \\
\text{Flow Utility} &= 1 \cdot \epsilon + \text{Non-pecuniary}
\end{aligned}\]
Parameters:
wages : numpy.ndarray

Array with shape (n_choices,) containing wages.

nonpecs : numpy.ndarray

Array with shape (n_choices,) containing non-pecuniary rewards.

continuation_values : numpy.ndarray

Array with shape (n_choices,) containing expected maximum utility for each choice in the subsequent period.

draws : numpy.ndarray

Array with shape (n_draws, n_choices).

delta : float

The discount factor.

Returns:
expected_value_functions : float

Expected maximum utility of an agent.
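The Monte Carlo integration can be written compactly with NumPy broadcasting. This sketch handles a single state and returns the estimate directly instead of filling the expected_value_functions output argument; emax_sketch is a hypothetical name.

```python
import numpy as np


def emax_sketch(wages, nonpecs, continuation_values, draws, delta):
    """Hypothetical Monte Carlo estimate of the expected maximum value function."""
    # Broadcast the (n_choices,) vectors against draws of shape (n_draws, n_choices).
    flow_utilities = wages * draws + nonpecs
    value_functions = flow_utilities + delta * continuation_values
    # Maximum over choices for each draw, then the average over all draws.
    return value_functions.max(axis=1).mean()


wages = np.array([1.0, 1.0])  # filled with ones, i.e., choices without a wage
nonpecs = np.zeros(2)
continuation_values = np.zeros(2)
draws = np.array([[1.0, 0.0], [0.0, 2.0]])
emax = emax_sketch(wages, nonpecs, continuation_values, draws, delta=0.95)
```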

respy.shared.convert_dictionary_keys_to_dense_indices(dictionary)

Convert the keys to tuples containing integers.

Examples

>>> dictionary = {(0.0, 1): 0, 2: 1}
>>> convert_dictionary_keys_to_dense_indices(dictionary)
{(0, 1): 0, (2,): 1}
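A sketch that reproduces the documented example: scalar keys are wrapped in a tuple and every element is cast to int. convert_keys_sketch is hypothetical and may differ from respy's implementation in edge cases.

```python
def convert_keys_sketch(dictionary):
    """Hypothetical sketch reproducing the documented example."""
    converted = {}
    for key, value in dictionary.items():
        # Wrap scalar keys in a tuple so that all keys share the same format.
        key = key if isinstance(key, tuple) else (key,)
        converted[tuple(int(element) for element in key)] = value
    return converted


result = convert_keys_sketch({(0.0, 1): 0, 2: 1})
```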
respy.shared.subset_cholesky_factor_to_choice_set(cholesky_factor, choice_set)

Subset the Cholesky factor to dimensions required by the admissible choice set.

Examples

>>> m = np.arange(9).reshape(3, 3)
>>> subset_cholesky_factor_to_choice_set(m, (False, True, False))
array([[4]])
respy.shared.return_core_dense_key(core_idx, dense=False)

Return core dense keys in the right format.

respy.shared.pandas_dot(x, beta, out=None)

Compute the dot product for a DataFrame and a Series.

The function computes each product in the dot product separately to limit the impact of converting a Series to an array.

To access the NumPy array, .values is used instead of .to_numpy() because it is faster. The advantage of the latter, avoiding problems with extension arrays, is irrelevant here because extension arrays are not used.

Parameters:
x : pandas.DataFrame

A DataFrame containing the covariates of the dot product.

beta : pandas.Series

A Series containing the parameters or coefficients of the dot product.

out : numpy.ndarray, optional

An output array can be passed to the function which is filled instead of allocating a new array.

Returns:
out : numpy.ndarray

Array with shape (len(x),) which contains the solution of the dot product.

Examples

>>> x = pd.DataFrame(np.arange(10).reshape(5, 2), columns=list("ab"))
>>> beta = pd.Series([1, 2], index=list("ab"))
>>> x.dot(beta).to_numpy()
array([ 2,  8, 14, 20, 26]...
>>> pandas_dot(x, beta)
array([ 2.,  8., 14., 20., 26.])
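The column-by-column accumulation can be sketched as follows. pandas_dot_sketch is a hypothetical name, and resetting a passed out array before filling it is an assumption of this sketch.

```python
import numpy as np
import pandas as pd


def pandas_dot_sketch(x, beta, out=None):
    """Hypothetical sketch: accumulate one column-coefficient product at a time."""
    if out is None:
        out = np.zeros(len(x))
    else:
        out[:] = 0.0
    for column, coefficient in beta.items():
        # .values returns the underlying NumPy array of the column.
        out += x[column].values * coefficient
    return out


x = pd.DataFrame(np.arange(10).reshape(5, 2), columns=list("ab"))
beta = pd.Series([1, 2], index=list("ab"))
dot = pandas_dot_sketch(x, beta)
```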
respy.shared.map_observations_to_states(states, state_space, optim_paras)

Map observations in data to states.

respy.shared.map_states_to_core_key_and_core_index(states, indexer)

Map states to the core key and core index.

Parameters:
states : numpy.ndarray

Multidimensional array containing only core dimensions of states.

indexer : numba.typed.Dict

A dictionary with core states as keys and the core key and core index as values.

Returns:
core_key : numpy.ndarray

An array containing the core key. See Core Key.

core_index : numpy.ndarray

An array containing the core index. See Core Indices.
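The lookup can be sketched with a plain Python dict instead of a numba.typed.Dict. The helper map_states_sketch and the example indexer, including its state columns, are hypothetical.

```python
import numpy as np


def map_states_sketch(states, indexer):
    """Hypothetical sketch with a plain dict instead of numba.typed.Dict."""
    n_states = len(states)
    core_key = np.empty(n_states, dtype=np.int64)
    core_index = np.empty(n_states, dtype=np.int64)
    for i, state in enumerate(states):
        # Look up each core state; an unknown state raises a KeyError.
        core_key[i], core_index[i] = indexer[tuple(int(value) for value in state)]
    return core_key, core_index


# Hypothetical core states with columns (period, exp_a, lagged_choice_1).
indexer = {(0, 0, 1): (0, 0), (0, 0, 2): (0, 1), (1, 1, 0): (1, 0)}
states = np.array([[1, 1, 0], [0, 0, 2]])
core_key, core_index = map_states_sketch(states, indexer)
```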

respy.shared._map_observations_to_dense_index(dense, core_index, dense_covariates_to_dense_index, core_key_and_dense_index_to_dense_key)
respy.shared.dump_objects(objects, topic, complex_, options)

Dump states.

respy.shared.load_objects(topic, complex_, options)

Load states.

respy.shared._create_file_name_from_complex_index(topic, complex_)

Create a file name from a complex index.

respy.shared.prepare_cache_directory(options)

Prepare cache directory.

The directory contains the parts of the state space.

respy.shared.select_valid_choices(choices, choice_set)

Select valid choices.

Examples

>>> select_valid_choices(list("abcde"), (1, 0, 1, 0, 1))
['a', 'c', 'e']
>>> select_valid_choices(list("abc"), (0, 1, 0, 1, 0))
['b']
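Both examples are consistent with a simple filter over pairs of choices and flags. The sketch below is a hypothetical re-implementation; note that zip() silently ignores surplus flags, which matches the second example.

```python
def select_valid_choices_sketch(choices, choice_set):
    """Hypothetical sketch that keeps choices whose flag is truthy."""
    # zip() stops at the shorter input, so surplus flags are ignored.
    return [choice for choice, is_valid in zip(choices, choice_set) if is_valid]


first = select_valid_choices_sketch(list("abcde"), (1, 0, 1, 0, 1))
second = select_valid_choices_sketch(list("abc"), (0, 1, 0, 1, 0))
```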
respy.shared.apply_law_of_motion_for_core(df, optim_paras)

Apply the law of motion for the core dimensions.

This function only applies the law of motion for core dimensions which are the period, experiences, and previous choices. Depending on the integer-encoded choice in df["choice"], the new state is computed.

Parameters:
df : pandas.DataFrame

The DataFrame contains states with information on the period, experiences, and previous choices. The current choice is encoded as an integer in a column named "choice".

optim_paras : dict

Contains model parameters.

Returns:
df : pandas.DataFrame

The DataFrame contains the states in the next period.
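A minimal sketch of such a law of motion in pandas. It assumes experience columns named exp_{choice}, lagged choices named lagged_choice_{i}, and choice codes given by the position in a list of choices; all of these names, like law_of_motion_sketch itself, are hypothetical stand-ins for what optim_paras would provide.

```python
import pandas as pd


def law_of_motion_sketch(df, choices, n_lagged_choices=1):
    """Hypothetical sketch; column names exp_{choice}, lagged_choice_{i} are assumed."""
    df = df.copy()
    # Add one unit of experience to the alternative chosen in this period.
    for code, choice in enumerate(choices):
        column = f"exp_{choice}"
        if column in df.columns:
            df[column] += (df["choice"] == code).astype(df[column].dtype)
    # Shift older lagged choices back by one period, starting with the oldest.
    for lag in range(n_lagged_choices, 1, -1):
        df[f"lagged_choice_{lag}"] = df[f"lagged_choice_{lag - 1}"]
    # The current choice becomes the most recent lagged choice.
    if n_lagged_choices > 0:
        df["lagged_choice_1"] = df["choice"]
    df["period"] += 1
    return df


states = pd.DataFrame(
    {"period": [0, 0], "exp_a": [0, 2], "exp_edu": [10, 10],
     "lagged_choice_1": [2, 0], "choice": [0, 1]}
)
next_states = law_of_motion_sketch(states, choices=["a", "edu", "home"])
```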

respy.shared.get_choice_set_from_complex(complex_tuple)

Select the choice set from a complex tuple.

Parameters:
complex_tuple : tuple

The complex tuple.

Returns:
The choice set as a tuple.
respy.shared.get_exogenous_from_dense_covariates(dense_covariates, optim_paras)

Select exogenous grid points from dense grid points.

Parameters:
dense_covariates : tuple

Dense covariates grid point.

optim_paras : dict

Contains model parameters.

Returns:
The exogenous grid point as a tuple.