Randomness and Reproducibility

respy embraces randomness to study individual behavior under risk. At the same time, it is crucial to make results reproducible. To build a reproducible model, users must define three seeds in the options: one each for the solution, simulation, and estimation of the model. This makes it possible to study the impact of randomness on each of the components independently.

options = {"solution_seed": 1, "simulation_seed": 2, "estimation_seed": 3}

The seeds for the solution, simulation and estimation are used to draw a 3-, 5- and 7-digit seed sequence, respectively [1]. The first 100 seeds in each sequence are reserved for randomness in the startup of functions like simulate() or log_like(), e.g., to create draws from a uniform distribution. All other seeds are used during the iterations of those functions, and the sequence is reset to its initial value at the beginning of every iteration.
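The mechanics described above can be sketched roughly as follows. This is a minimal illustration, not respy's actual implementation: the helpers draw_start_seed() and run_iterations() are hypothetical, and the use of numpy.random.default_rng and itertools.count is an assumption made for the example. Only the digit counts and the 100 reserved startup seeds are taken from the text.

```python
import itertools

import numpy as np

N_STARTUP_SEEDS = 100  # the text reserves the first 100 seeds for startup


def draw_start_seed(user_seed, n_digits):
    """Hypothetical helper: derive an n-digit starting seed from a user seed."""
    rng = np.random.default_rng(user_seed)
    return int(rng.integers(10 ** (n_digits - 1), 10 ** n_digits))


def run_iterations(start_seed, n_iterations, seeds_per_iteration):
    """Hypothetical helper: collect the seeds consumed in each iteration."""
    used = []
    for _ in range(n_iterations):
        # Reset the iteration sequence to its initial value every time.
        iteration_seeds = itertools.count(start_seed + N_STARTUP_SEEDS)
        used.append([next(iteration_seeds) for _ in range(seeds_per_iteration)])
    return used


options = {"solution_seed": 1, "simulation_seed": 2, "estimation_seed": 3}
digits = {"solution_seed": 3, "simulation_seed": 5, "estimation_seed": 7}

# One starting seed per component, with 3, 5 and 7 digits respectively.
start_seeds = {key: draw_start_seed(options[key], digits[key]) for key in options}

# Every iteration consumes exactly the same seeds because of the reset.
seeds_per_iteration = run_iterations(start_seeds["estimation_seed"], 3, 2)
```

Because the iteration sequence is recreated at the start of each pass, repeated evaluations with unchanged options see identical seeds, which is the property the paragraph above relies on.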

As a general rule, models in respy are reproducible, that is, they use the same randomness, as long as only model parameters such as utility parameters are changed and the structure of the model stays the same. The following list includes examples of structural changes to the model.

  • Changing the choice set (any form of renaming or removing choices).

  • Changing the initial conditions (experiences, lagged choices, type probabilities).

  • Changing the Monte Carlo integrations (sequence, number of draws).

  • Using interpolation and changing the number of non-interpolated states.

  • Removing states from the state space via filters.

In the following, we document for each module the functions which use seeds to control randomness.

respy.shared

The function create_base_draws() is used in all parts (solution, simulation, and estimation) to generate random draws. transform_base_draws_with_cholesky_factor() transforms the base draws so that they have the variance-covariance matrix implied by the model parameters.

create_base_draws(shape, seed, ...)

Create a set of draws from the standard normal distribution.

transform_base_draws_with_cholesky_factor(...)

Transform standard normal draws with the Cholesky factor.
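The split between seeded base draws and a parameter-dependent transformation is what keeps the randomness fixed when only model parameters change. The following is a minimal sketch of that idea. The functions sketch_base_draws() and sketch_transform() are hypothetical stand-ins written for this example; they do not reproduce the signatures or internals of the respy functions listed above, and the covariance matrices are illustrative values.

```python
import numpy as np


def sketch_base_draws(shape, seed):
    """Hypothetical stand-in: standard normal draws from a seeded generator."""
    return np.random.default_rng(seed).standard_normal(shape)


def sketch_transform(draws, cholesky_factor):
    """Hypothetical stand-in: give the draws the covariance chol @ chol.T."""
    return draws @ cholesky_factor.T


# The same seed and shape always produce the same base draws.
base_a = sketch_base_draws((1000, 2), seed=123)
base_b = sketch_base_draws((1000, 2), seed=123)

# Two different parameterizations of the shock covariance matrix.
chol_low = np.linalg.cholesky(np.array([[1.0, 0.2], [0.2, 1.0]]))
chol_high = np.linalg.cholesky(np.array([[4.0, 0.5], [0.5, 2.0]]))

# The transformed shocks differ, but the underlying randomness is identical.
shocks_low = sketch_transform(base_a, chol_low)
shocks_high = sketch_transform(base_b, chol_high)
```

Changing the covariance parameters only rescales and rotates the same base draws, so differences between two model evaluations come from the parameters rather than from new random numbers.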

respy.solve

Routines under respy.solve use a seed from the sequence initialized by options["solution_seed"] to control randomness. Apart from the draws, solve() relies on the following function.

_get_not_interpolated_indicator(...)

Get indicator for states which will not be interpolated.

respy.simulate

Routines under respy.simulate use a seed from the sequence of options["simulation_seed"] to control randomness. Apart from the draws, simulate() relies on the following function to generate starting values for simulated individuals (experiences, types, etc.).

_sample_characteristic(states_df, options, ...)

Sample characteristics of individuals.
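Sampling starting values with a seeded generator could look roughly like the sketch below. The function sketch_sample_characteristic() is a hypothetical illustration and not the respy function documented above; its arguments, the use of numpy.random.default_rng, and the type probabilities are all assumptions made for the example.

```python
import numpy as np


def sketch_sample_characteristic(n_individuals, probabilities, seed):
    """Hypothetical stand-in: draw one level per individual with a fixed seed."""
    rng = np.random.default_rng(seed)
    levels = np.array(list(probabilities))
    probs = np.array(list(probabilities.values()))
    return rng.choice(levels, size=n_individuals, p=probs)


# Illustrative type probabilities, not taken from any respy model.
type_probabilities = {0: 0.6, 1: 0.3, 2: 0.1}

# Repeating the call with the same seed reproduces the same initial types.
types_first = sketch_sample_characteristic(1000, type_probabilities, seed=2)
types_second = sketch_sample_characteristic(1000, type_probabilities, seed=2)
```

Note that changing the probabilities or the set of levels changes which individuals receive which starting values, which is one way to see why altering the initial conditions counts as a structural change.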

respy.likelihood

Routines under respy.likelihood use a seed from the sequence specified under options["estimation_seed"] to control randomness. The seed is used to create the draws to simulate the probability of observed wages with create_base_draws().
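Holding the draws fixed matters because a simulated probability is otherwise a noisy function of the parameters. The toy example below illustrates this with a deliberately simple frequency simulator. It is not respy's likelihood: the function sketch_simulated_probability(), the parameter mu, and the threshold are hypothetical and exist only to show the effect of reusing the same draws.

```python
import numpy as np


def sketch_simulated_probability(mu, draws, threshold=0.0):
    """Toy simulator: share of draws for which mu + draw exceeds the threshold."""
    return float(np.mean(mu + draws > threshold))


# Draws created once from a fixed seed, as with an estimation seed.
fixed_draws = np.random.default_rng(3).standard_normal(500)

# Evaluating twice at the same parameter gives exactly the same value.
prob_first = sketch_simulated_probability(0.25, fixed_draws)
prob_second = sketch_simulated_probability(0.25, fixed_draws)

# With fresh draws, the value at the same parameter will generally differ.
fresh_draws = np.random.default_rng(4).standard_normal(500)
prob_fresh = sketch_simulated_probability(0.25, fresh_draws)
```

With fixed draws, any change in the simulated value between two evaluations can be attributed to the parameters, which is what an optimizer needs during estimation.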

respy.tests.random_model

The regression tests are run on a truncated data set which contains truncated histories of individuals or missing wage information. The truncation process is controlled via a seed in the sequence initialized by options["simulation_seed"].

simulate_truncated_data(params, options[, ...])

Simulate a (truncated) dataset.

See also

See Random number generator seed mistakes for a general introduction to seeding problems.

See this comment in the same post which verifies independence between sequential seeds.

See the NumPy documentation on the RandomState object, which wraps the Mersenne Twister pseudo-random number generator.

Footnotes