.. _randomness-and-reproducibility:
Randomness and Reproducibility
==============================
**respy** embraces randomness to study individual behavior under risk. At the same time,
it is crucial to make results reproducible. To build a reproducible model, users must
define three seeds for the solution, simulation and estimation of the model in the
options. This makes it possible to study the impact of randomness on each of the
components independently.
.. code-block:: python

    options = {"solution_seed": 1, "simulation_seed": 2, "estimation_seed": 3}

The seeds for the solution, simulation and estimation are used to draw a 3-, 5- and
7-digit seed sequence, respectively [#f1]_. The first 100 seeds in each sequence are
reserved for randomness in the startup of functions like
:func:`~respy.simulate.simulate` or :func:`~respy.likelihood.log_like`, e.g., to create
draws from a uniform distribution. All other seeds are used during the iterations of
those functions and are reset to the initial value at the beginning of every iteration.
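
The scheme can be sketched as follows. This is a minimal illustration, not the actual
**respy** implementation: the helper ``sketch_seed_sequences`` and the constant
``N_STARTUP_SEEDS`` are hypothetical names, and drawing the first seed with NumPy's
``default_rng`` is an assumption.

.. code-block:: python

    import itertools

    import numpy as np

    N_STARTUP_SEEDS = 100  # the first 100 seeds are reserved for startup


    def sketch_seed_sequences(user_seed, n_digits):
        """Return the first seed plus startup and iteration seed counters."""
        rng = np.random.default_rng(user_seed)
        # Draw a first seed with exactly ``n_digits`` digits, e.g. 10000..99999.
        first_seed = int(rng.integers(10 ** (n_digits - 1), 10 ** n_digits))
        startup = itertools.count(first_seed)
        iteration = itertools.count(first_seed + N_STARTUP_SEEDS)
        return first_seed, startup, iteration


    first_seed, startup, iteration = sketch_seed_sequences(user_seed=2, n_digits=5)

    # Seeds consumed once during startup.
    startup_seeds = [next(startup) for _ in range(3)]

    # Seeds consumed within an iteration; recreating the counter resets it.
    seeds_first_iteration = [next(iteration) for _ in range(3)]
    iteration = itertools.count(first_seed + N_STARTUP_SEEDS)
    seeds_second_iteration = [next(iteration) for _ in range(3)]

Because the iteration counter is reset, every iteration sees the same seeds and
therefore the same random numbers.
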
As a general rule, models in **respy** are reproducible or use the same randomness as
long as only model parameters, for example utility parameters, are changed and the
structure of the model stays the same. The following list includes examples of
structural changes to the model.
- Changing the choice set (renaming or removing choices).
- Changing the initial conditions (experiences, lagged choices, type probabilities).
- Changing the Monte Carlo integrations (sequence, number of draws).
- Using interpolation and changing the number of non-interpolated states.
- Removing states from the state space via filters.
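
Why a structural change alters the randomness can be illustrated with plain NumPy. The
sketch below is not **respy** code; the array layout ``(n_periods, n_draws, n_choices)``
and the helper ``sketch_draws`` are assumptions made for the illustration. With the same
seed and the same shape the draws are identical, whereas changing the number of draws
changes which random numbers end up in later periods.

.. code-block:: python

    import numpy as np


    def sketch_draws(seed, n_periods, n_draws, n_choices):
        """Draw standard normal shocks for each period, draw, and choice."""
        rng = np.random.default_rng(seed)
        return rng.standard_normal((n_periods, n_draws, n_choices))


    baseline = sketch_draws(seed=1, n_periods=2, n_draws=3, n_choices=4)
    repeated = sketch_draws(seed=1, n_periods=2, n_draws=3, n_choices=4)
    more_draws = sketch_draws(seed=1, n_periods=2, n_draws=5, n_choices=4)

    # Same seed and same structure: the draws are reproduced exactly.
    same_structure_matches = np.array_equal(baseline, repeated)

    # Same seed but more draws: the shocks in the second period differ.
    second_period_matches = np.array_equal(baseline[1], more_draws[1, :3])
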
In the following, we document for each module the functions which use seeds to control
randomness.
respy.shared
------------
.. currentmodule:: respy.shared
The function :func:`create_base_draws` is used in all parts (solution, simulation, and
estimation) to generate random draws. :func:`transform_base_draws_with_cholesky_factor`
transforms the base draws so that they have the variance-covariance matrix implied by
the model parameters.
.. autosummary::
create_base_draws
transform_base_draws_with_cholesky_factor
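
The two steps can be sketched with NumPy as follows. The helper names and signatures
below are hypothetical and do not reproduce the actual **respy** functions, and the
covariance matrix is made up for the illustration. Multiplying standard normal base
draws by the transpose of the Cholesky factor ``L`` of a covariance matrix
``Sigma = L L'`` yields draws with covariance ``Sigma``.

.. code-block:: python

    import numpy as np


    def sketch_base_draws(shape, seed):
        """Create independent standard normal draws from a seed."""
        rng = np.random.default_rng(seed)
        return rng.standard_normal(shape)


    def sketch_transform_with_cholesky(base_draws, cholesky_factor):
        """Correlate the draws so that their covariance is L @ L.T."""
        return base_draws @ cholesky_factor.T


    # Hypothetical covariance matrix of the shocks of two choices.
    covariance = np.array([[1.0, 0.5], [0.5, 2.0]])
    cholesky_factor = np.linalg.cholesky(covariance)  # lower triangular

    base_draws = sketch_base_draws(shape=(200_000, 2), seed=3)
    draws = sketch_transform_with_cholesky(base_draws, cholesky_factor)

    # The sample covariance is close to the implied covariance matrix.
    sample_covariance = np.cov(draws, rowvar=False)

Keeping the base draws fixed and changing only the Cholesky factor is what lets the
same randomness be reused when model parameters change.
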
respy.solve
-----------
Routines under ``respy.solve`` use a seed from the sequence initialized by
``options["solution_seed"]`` to control randomness. Apart from the draws,
:func:`~respy.solve.solve` relies on the following function.
.. currentmodule:: respy.interpolate
.. autosummary::
_get_not_interpolated_indicator
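
A seeded choice of the states that are not interpolated might look like the sketch
below. It is an illustration under assumptions, not the body of the **respy** function:
the name ``sketch_not_interpolated_indicator`` and its arguments are hypothetical.

.. code-block:: python

    import numpy as np


    def sketch_not_interpolated_indicator(n_states, n_points, seed):
        """Mark ``n_points`` randomly chosen states as not interpolated."""
        rng = np.random.default_rng(seed)
        indices = rng.choice(n_states, size=n_points, replace=False)
        not_interpolated = np.zeros(n_states, dtype=bool)
        not_interpolated[indices] = True
        return not_interpolated


    indicator = sketch_not_interpolated_indicator(n_states=10, n_points=4, seed=1)
    same_seed = sketch_not_interpolated_indicator(n_states=10, n_points=4, seed=1)
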
respy.simulate
--------------
Routines under ``respy.simulate`` use a seed from the sequence of
``options["simulation_seed"]`` to control randomness. Apart from the draws,
:func:`~respy.simulate.simulate` relies on the following function to generate
starting values for simulated individuals (experiences, types, etc.).
.. currentmodule:: respy.simulate
.. autosummary::
_sample_characteristic
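
Sampling a starting value from given probabilities can be sketched like this. The
function below is a hypothetical stand-in, not the **respy** function
``_sample_characteristic``, and the levels and probabilities are invented for the
example.

.. code-block:: python

    import numpy as np


    def sketch_sample_characteristic(levels, probabilities, n_individuals, seed):
        """Sample one level per individual according to ``probabilities``."""
        rng = np.random.default_rng(seed)
        return rng.choice(levels, size=n_individuals, p=probabilities)


    # Hypothetical initial years of experience and their shares.
    levels = np.array([0, 1, 2])
    probabilities = np.array([0.5, 0.3, 0.2])

    initial_experience = sketch_sample_characteristic(
        levels, probabilities, n_individuals=1_000, seed=2
    )
    repeated = sketch_sample_characteristic(
        levels, probabilities, n_individuals=1_000, seed=2
    )
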
respy.likelihood
----------------
Routines under ``respy.likelihood`` use a seed from the sequence specified under
``options["estimation_seed"]`` to control randomness. The seed is used to create the
draws to simulate the probability of observed wages with
:func:`~respy.shared.create_base_draws`.
respy.tests.random_model
------------------------
The regression tests are run on a truncated data set which contains truncated histories
of individuals or missing wage information. The truncation process is controlled via a
seed in the sequence initialized by ``options["simulation_seed"]``.
.. currentmodule:: respy.tests.random_model
.. autosummary::
simulate_truncated_data
.. seealso::
See `Random number generator seed mistakes `_ for a general introduction to seeding
problems.
See `this comment `_ in the same post which
verifies independence between sequential seeds.
NumPy documentation on their `RandomState object `_ which wraps the
pseudo-random number generator `Mersenne Twister `_.
.. rubric:: Footnotes
.. [#f1] The need for seed sequences became apparent in `#268
`_.