Tutorial - Model

We now illustrate the basic capabilities of the respy package in a simple tutorial.

The model specification

To perform a simulation and/or estimation with the respy package, a model specification is needed. It consists of two files: the parameter specification contains the initial parameter values, and the options specification contains settings such as the dimensions of the data set and the arguments for the optimization algorithms. Details on the components of the model specification are presented in the section Model specification. Please note that the two specification files should be in your current working directory or another accessible directory when executing the commands and scripts discussed below.

Example

Now we can explore the basic functionalities of the respy package based on a simple example.

[1]:
import os
import respy
import shutil

from pathlib import Path
[2]:
# Create temporary directory and walk into it, so that the output does not
# clutter your directory.
temp_dir = Path("__tutorial__").resolve()
if temp_dir.exists():
    shutil.rmtree(temp_dir)
temp_dir.mkdir()
os.chdir(temp_dir)
[3]:
# Get an exemplary model specification.
options_spec, params_spec = respy.get_example_model("kw_data_one")

The options specification contains all settings of the model that are not part of the optimization process, such as the number of periods in the model or the optimizer used to fit the model to data.

[4]:
options_spec
[4]:
{'estimation': {'file': 'data.respy.dat',
  'maxfun': 1000,
  'agents': 1000,
  'draws': 200,
  'optimizer': 'FORT-BOBYQA',
  'seed': 500,
  'tau': 500.0},
 'simulation': {'file': 'data', 'agents': 1000, 'seed': 132},
 'program': {'debug': False, 'procs': 1, 'threads': 1, 'version': 'fortran'},
 'interpolation': {'flag': False, 'points': 200},
 'solution': {'store': True, 'seed': 456, 'draws': 500},
 'preconditioning': {'minimum': 1e-05, 'type': 'magnitudes', 'eps': 0.0001},
 'derivatives': 'forward-differences',
 'edu_spec': {'lagged': [1.0], 'start': [10], 'share': [1.0], 'max': 20},
 'num_periods': 40,
 'FORT-NEWUOA': {'maxfun': 1000000, 'npt': 1, 'rhobeg': 1.0, 'rhoend': 1e-06},
 'FORT-BFGS': {'eps': 0.0001, 'gtol': 1e-05, 'maxiter': 10, 'stpmx': 100.0},
 'FORT-BOBYQA': {'maxfun': 1000000, 'npt': 1, 'rhobeg': 1.0, 'rhoend': 1e-06},
 'SCIPY-BFGS': {'eps': 0.0001, 'gtol': 0.0001, 'maxiter': 1},
 'SCIPY-POWELL': {'ftol': 0.0001,
  'maxfun': 100000,
  'maxiter': 1,
  'xtol': 0.0001},
 'SCIPY-LBFGSB': {'eps': 4.41037423e-07,
  'factr': 30.401091854739622,
  'm': 5,
  'maxiter': 2,
  'maxls': 2,
  'pgtol': 8.6554171164e-05}}
[5]:
# We use the Python version for compatibility.
options_spec["program"]["version"] = "python"
# We need to change from a Fortran to a Python optimizer.
options_spec["estimation"]["optimizer"] = "SCIPY-LBFGSB"
# We limit the model to five periods to make runtime shorter
# and to avoid memory errors on mybinder.org.
options_spec["num_periods"] = 5
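
The cell above changes the options dictionary in place. If you want to keep the original options for comparison, the same overrides can be applied to a copy instead. The following is a minimal sketch in plain Python: `base_options` is a trimmed, hand-written copy of the dictionary printed above, and `with_overrides` is a hypothetical helper, not part of respy.

```python
import copy

# Trimmed, hand-written copy of the options printed above (not loaded from respy).
base_options = {
    "estimation": {"optimizer": "FORT-BOBYQA", "maxfun": 1000},
    "program": {"version": "fortran", "procs": 1},
    "num_periods": 40,
}


def with_overrides(options, overrides):
    """Return a deep copy of `options` with nested `overrides` applied.

    Hypothetical helper for illustration only; it is not part of respy.
    """
    result = copy.deepcopy(options)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            # Merge nested sections key by key instead of replacing them.
            result[key] = with_overrides(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


python_options = with_overrides(
    base_options,
    {
        "program": {"version": "python"},
        "estimation": {"optimizer": "SCIPY-LBFGSB"},
        "num_periods": 5,
    },
)
```

Because the helper merges nested sections, keys that are not overridden (such as `procs`) carry over unchanged, and `base_options` still holds the Fortran settings.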

The parameter specification includes all parameters of the model which are affected by the optimization routine.

[6]:
params_spec.head(5)
[6]:
category name para fixed lower upper comment
0 delta delta 0.950 False 0.7 1.0 discount factor
1 coeffs_common return_hs_degree 0.000 False NaN NaN return to high school degree (non pecuniary)
2 coeffs_common return_col_degree 0.000 False NaN NaN return to college degree (non pecuniary)
3 coeffs_a skill_price 9.210 False NaN NaN skill rental price if the base skill endowment...
4 coeffs_a return_schooling 0.038 False NaN NaN linear return to an additional year of schooli...
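
Before estimating, it can be useful to check that every starting value lies within its bounds. The sketch below does this in plain Python on the five rows shown above, copied by hand as tuples rather than read from `params_spec`. It rests on two assumptions that the output above does not confirm: that a `NaN` bound means no bound on that side, and that `fixed = False` marks a parameter as free in the estimation.

```python
import math

# (name, para, fixed, lower, upper), copied by hand from the rows shown above.
rows = [
    ("delta", 0.950, False, 0.7, 1.0),
    ("return_hs_degree", 0.000, False, math.nan, math.nan),
    ("return_col_degree", 0.000, False, math.nan, math.nan),
    ("skill_price", 9.210, False, math.nan, math.nan),
    ("return_schooling", 0.038, False, math.nan, math.nan),
]


def within_bounds(para, lower, upper):
    """Check lower <= para <= upper, treating NaN as 'no bound' (assumption)."""
    lower_ok = math.isnan(lower) or lower <= para
    upper_ok = math.isnan(upper) or para <= upper
    return lower_ok and upper_ok


# Parameters assumed to be free in the estimation (fixed is False).
free = [name for name, _, fixed, _, _ in rows if not fixed]

# Parameters whose starting value violates a bound.
violations = [
    name for name, para, _, lower, upper in rows
    if not within_bounds(para, lower, upper)
]
```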
[7]:
# Instantiate the respy model class with parameters and options
model = respy.RespyCls(params_spec, options_spec)
[8]:
# Simulate a sample from the specified model
model, df = model.simulate()
[9]:
# Set maximum number of function evaluations to 5
model.attr["maxfun"] = 5
[10]:
# Estimate the model using the simulated data as an observed sample
x, crit_val = model.fit()
[11]:
# Simulate a sample based on the estimated parameters
model.update_optim_paras(x)
model, df = model.simulate()
[12]:
# Step out of the folder and delete it.
os.chdir(temp_dir.parent)
shutil.rmtree(temp_dir)
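
The setup and cleanup cells of this example can also be combined into a single context manager, so that the directory is removed even if a step in between raises an error. This is a sketch using only the standard library; `scratch_directory` is a hypothetical helper, not part of respy, and the file written inside the block is a placeholder for the respy calls.

```python
import contextlib
import os
import tempfile
from pathlib import Path


@contextlib.contextmanager
def scratch_directory(prefix="respy_tutorial_"):
    """Create a temporary directory, change into it, and clean up afterwards.

    Hypothetical helper for illustration only; it is not part of respy.
    """
    previous = Path.cwd()
    with tempfile.TemporaryDirectory(prefix=prefix) as name:
        os.chdir(name)
        try:
            yield Path(name).resolve()
        finally:
            # Step out of the directory before it is deleted.
            os.chdir(previous)


start_dir = Path.cwd()

with scratch_directory() as work_dir:
    # The simulation and estimation steps would run here.
    (work_dir / "example.txt").write_text("output lands in the scratch directory\n")
    created = sorted(path.name for path in work_dir.iterdir())
```

After the `with` block, the working directory is restored and the scratch directory no longer exists.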

The simulation and estimation functionalities of the respy package can also be used separately. To perform a simulation, only an initialization file, as discussed above, is required. To estimate the model parameters directly, your working directory has to contain the initialization file and your data set. Here we use the simulated data for the estimation, but you can also use other data sources. Just make sure they follow the layout of the simulated sample in data.respy.dat. For more information on the required structure of the data set, see Model specification. The coefficient values in the initialization file serve as the starting values.

Output Files

During the script execution, several files will appear in the current working directory. First, we outline the files generated during the initial simulation.

  • data.respy.sol

Records the progress of the backward induction procedure. If the interpolation method is used during the backward induction procedure, the coefficient estimates and goodness of fit statistics are provided.

  • data.respy.pkl

This file is an instance of RespyCls and contains detailed information about the solution of the model, such as the \(E\max\) of each state. For details, please consult the source code directly. It is created if persistent storage of results is requested in the SOLUTION section of the initialization file.

  • data.respy.sim

Allows you to monitor the progress of the simulation. It provides information about the seed used to sample the random components of the agents’ state experience and the total number of simulated agents.

  • data.respy.dat

Contains the simulated data on agents’ choices and state experiences. It has the following structure:

Column     Information
1          agent identifier
2          time period
3          choice (1 = Occupation A, 2 = Occupation B, 3 = education, 4 = home)
4          wages (missing value if not working)
5          work experience in Occupation A
6          work experience in Occupation B
7          years of schooling
8          lagged choice
9          type number (0 for the whole column, if homogeneous agents)
10 - 13    total rewards - all components
14 - 17    systematic reward - no shock
18 - 21    shock reward - shock component
22         discount rate
23 - 24    general reward - non-monetary rewards and non-common rewards, example cm1 cm2 and alpha for occupation A
25         common reward - indicators assoc with beta 1 and beta 2
26 - 29    immediate reward - period reward
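
When preparing your own data set, it can help to have the column layout of data.respy.dat available as a lookup. The sketch below encodes the column numbers and short labels listed in this section and expands the ranges into one entry per column. The names `LAYOUT`, `CHOICES`, and `column_info` are illustrative and not part of respy, and the sketch makes no assumption about how the file itself is delimited.

```python
# Column layout of data.respy.dat as listed in this section
# (1-based column numbers, inclusive ranges).
LAYOUT = [
    ((1, 1), "agent identifier"),
    ((2, 2), "time period"),
    ((3, 3), "choice"),
    ((4, 4), "wages"),
    ((5, 5), "work experience in Occupation A"),
    ((6, 6), "work experience in Occupation B"),
    ((7, 7), "years of schooling"),
    ((8, 8), "lagged choice"),
    ((9, 9), "type number"),
    ((10, 13), "total rewards"),
    ((14, 17), "systematic reward"),
    ((18, 21), "shock reward"),
    ((22, 22), "discount rate"),
    ((23, 24), "general reward"),
    ((25, 25), "common reward"),
    ((26, 29), "immediate reward"),
]

# Coding of the choice column (column 3).
CHOICES = {1: "Occupation A", 2: "Occupation B", 3: "education", 4: "home"}

# Expand the ranges so that every column number maps to its label.
column_info = {}
for (first, last), label in LAYOUT:
    for number in range(first, last + 1):
        column_info[number] = label

num_columns = len(column_info)
```

Expanding the ranges also serves as a consistency check: the layout covers columns 1 through 29 without gaps or overlaps.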

  • data.respy.info

Provides descriptive statistics such as the choice probabilities, the transition matrix, number of agents per period and occupation, and the respective wage distributions. It also prints out the underlying parameterization of the model.

Second, we turn to the estimation output. The fit procedure directly returns the value of the coefficients at the final step of the optimizer, as well as the value of the criterion function. However, some additional files appear in the meantime.

  • est.respy.info

Allows you to monitor the estimation as it progresses. It provides information about starting values, step values, and current values as well as the corresponding value of the criterion function.

  • est.respy.log

Documents details about the estimation procedure. Provides information on the preconditioning of the parameters including the original parameter value, the scaling factor and the rescaled parameter. Further, details about each of the evaluations of the criterion function are included. Most importantly, once an estimation is completed, it provides the return message from the optimizer.

Third, additional information is provided in two further generated files:

  • scaling.respy.out

  • solution.respy.pkl

Finally, when a second simulation is performed, now based on the parameter estimates, the existing simulation output files are replaced by new ones referring to the current simulation run.
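
To confirm that a run produced the files described in this section, you can compare the working directory against the expected file names. The sketch below uses only the standard library; `missing_files` is a hypothetical helper, not part of respy, and the demonstration creates empty placeholder files instead of running a simulation. Note that data.respy.pkl only appears if persistent storage of results was requested.

```python
import tempfile
from pathlib import Path

# File names as listed in this section.
SIMULATION_FILES = [
    "data.respy.sol",
    "data.respy.pkl",  # only created if persistent storage is requested
    "data.respy.sim",
    "data.respy.dat",
    "data.respy.info",
]
ESTIMATION_FILES = ["est.respy.info", "est.respy.log"]
ADDITIONAL_FILES = ["scaling.respy.out", "solution.respy.pkl"]


def missing_files(directory, expected):
    """Return the names in `expected` that are not regular files in `directory`.

    Hypothetical helper for illustration only; it is not part of respy.
    """
    directory = Path(directory)
    return [name for name in expected if not (directory / name).is_file()]


# Demonstration with empty placeholder files instead of real respy output.
with tempfile.TemporaryDirectory() as tmp:
    for name in SIMULATION_FILES[:3]:
        (Path(tmp) / name).touch()
    missing = missing_files(tmp, SIMULATION_FILES)
```

In a real session you would call `missing_files(".", SIMULATION_FILES)` after the simulation step and `missing_files(".", ESTIMATION_FILES)` after the estimation step.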