Tutorial - Model
We now illustrate the basic capabilities of the respy package in a simple tutorial.
The model specification
In order to perform simulation and/or estimation using the respy package, a model specification is needed. It consists of two files: the parameter specification contains the initial parameter values, and the options specification includes important data set dimensions, arguments for the optimization algorithms, etc. Details on the components of the model specification are presented in the section Model specification. Please note that the two specification files should be in your current working directory or another accessible directory when executing the commands and scripts discussed below.
Example
Now we can explore the basic functionalities of the respy package based on a simple example.
[1]:
import os
import respy
import shutil
from pathlib import Path
[2]:
# Create temporary directory and walk into it, so that the output does not
# clutter your directory.
temp_dir = Path("__tutorial__").resolve()
if temp_dir.exists():
    shutil.rmtree(temp_dir)
temp_dir.mkdir()
os.chdir(temp_dir)
[3]:
# Get an exemplary model specification.
options_spec, params_spec = respy.get_example_model("kw_data_one")
The options specification holds all settings of the model that are not themselves optimized, such as the number of periods in the model or the optimizer used to fit the model to data.
[4]:
options_spec
[4]:
{'estimation': {'file': 'data.respy.dat',
'maxfun': 1000,
'agents': 1000,
'draws': 200,
'optimizer': 'FORT-BOBYQA',
'seed': 500,
'tau': 500.0},
'simulation': {'file': 'data', 'agents': 1000, 'seed': 132},
'program': {'debug': False, 'procs': 1, 'threads': 1, 'version': 'fortran'},
'interpolation': {'flag': False, 'points': 200},
'solution': {'store': True, 'seed': 456, 'draws': 500},
'preconditioning': {'minimum': 1e-05, 'type': 'magnitudes', 'eps': 0.0001},
'derivatives': 'forward-differences',
'edu_spec': {'lagged': [1.0], 'start': [10], 'share': [1.0], 'max': 20},
'num_periods': 40,
'FORT-NEWUOA': {'maxfun': 1000000, 'npt': 1, 'rhobeg': 1.0, 'rhoend': 1e-06},
'FORT-BFGS': {'eps': 0.0001, 'gtol': 1e-05, 'maxiter': 10, 'stpmx': 100.0},
'FORT-BOBYQA': {'maxfun': 1000000, 'npt': 1, 'rhobeg': 1.0, 'rhoend': 1e-06},
'SCIPY-BFGS': {'eps': 0.0001, 'gtol': 0.0001, 'maxiter': 1},
'SCIPY-POWELL': {'ftol': 0.0001,
'maxfun': 100000,
'maxiter': 1,
'xtol': 0.0001},
'SCIPY-LBFGSB': {'eps': 4.41037423e-07,
'factr': 30.401091854739622,
'm': 5,
'maxiter': 2,
'maxls': 2,
'pgtol': 8.6554171164e-05}}
[5]:
# We use the Python version for compatibility.
options_spec["program"]["version"] = "python"
# We need to change from a Fortran to a Python optimizer
options_spec["estimation"]["optimizer"] = "SCIPY-LBFGSB"
# We limit the model to five periods to make runtime shorter
# and to avoid memory errors on mybinder.org.
options_spec["num_periods"] = 5
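The in-place edits above modify the dictionary returned by get_example_model. If you would rather keep the original options untouched, one possible approach is to apply the changes to a deep copy. The following is a minimal sketch in plain Python: `with_overrides` and `_merge` are hypothetical helpers that are not part of respy, and `example_options` only mirrors a few entries from the output shown above.

```python
import copy


def with_overrides(options, overrides):
    """Return a deep copy of ``options`` with nested ``overrides`` applied.

    Hypothetical helper for illustration; it is not part of respy.
    """
    updated = copy.deepcopy(options)
    _merge(updated, overrides)
    return updated


def _merge(target, overrides):
    # Recurse into nested dictionaries, replace everything else.
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            _merge(target[key], value)
        else:
            target[key] = value


# A few entries copied from the options shown above.
example_options = {
    "estimation": {"maxfun": 1000, "optimizer": "FORT-BOBYQA"},
    "program": {"debug": False, "version": "fortran"},
    "num_periods": 40,
}

python_options = with_overrides(
    example_options,
    {
        "program": {"version": "python"},
        "estimation": {"optimizer": "SCIPY-LBFGSB"},
        "num_periods": 5,
    },
)
```

Because the overrides are merged into a copy, `example_options` still describes the Fortran setup afterwards, which can be convenient when comparing several configurations side by side.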
The parameter specification includes all parameters of the model which are affected by the optimization routine.
[6]:
params_spec.head(5)
[6]:
 | category | name | para | fixed | lower | upper | comment |
---|---|---|---|---|---|---|---|
0 | delta | delta | 0.950 | False | 0.7 | 1.0 | discount factor |
1 | coeffs_common | return_hs_degree | 0.000 | False | NaN | NaN | return to high school degree (non pecuniary) |
2 | coeffs_common | return_col_degree | 0.000 | False | NaN | NaN | return to college degree (non pecuniary) |
3 | coeffs_a | skill_price | 9.210 | False | NaN | NaN | skill rental price if the base skill endowment... |
4 | coeffs_a | return_schooling | 0.038 | False | NaN | NaN | linear return to an additional year of schooli... |
[7]:
# Instantiate the respy model class with parameters and options
model = respy.RespyCls(params_spec, options_spec)
[8]:
# Simulate a sample from the specified model
model, df = model.simulate()
[9]:
# Set maximum number of function evaluations to 5
model.attr["maxfun"] = 5
[10]:
# Estimate the model using the simulated data as an observed sample
x, crit_val = model.fit()
[11]:
# Simulate a sample based on the estimated parameters
model.update_optim_paras(x)
model, df = model.simulate()
[12]:
# Step out of the folder and delete it.
os.chdir(temp_dir.parent)
shutil.rmtree(temp_dir)
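The create, change into, and delete steps used for the working directory in this example can also be bundled into a context manager, so that the cleanup runs even if a step in between fails. This is a minimal sketch using only the standard library; `temporary_working_directory` is a hypothetical helper and not part of respy, and it uses a system temporary directory instead of the `__tutorial__` folder.

```python
import contextlib
import os
import tempfile
from pathlib import Path


@contextlib.contextmanager
def temporary_working_directory():
    """Change into a fresh temporary directory and clean it up afterwards.

    Hypothetical helper for illustration; it is not part of respy.
    """
    previous = Path.cwd()
    with tempfile.TemporaryDirectory() as name:
        os.chdir(name)
        try:
            yield Path(name).resolve()
        finally:
            # Leave the directory before it is deleted.
            os.chdir(previous)


start_dir = Path.cwd()

with temporary_working_directory() as work_dir:
    # Any files written here disappear once the block ends.
    (work_dir / "scratch.txt").write_text("temporary output")
    created = (work_dir / "scratch.txt").exists()
```

In the tutorial, the simulation and estimation calls would go inside the `with` block in place of the scratch file.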
The simulation and estimation functionalities of the respy package can also be used separately. To perform a simulation, only the model specification discussed above is required. To estimate the model parameters directly, your working directory has to contain the model specification and your data set. Here we use the simulated data for the estimation, but you can also use other data sources. Just make sure they follow the layout of the simulated sample in data.respy.dat. For more information on the required structure of the data set, see Model specification. The parameter values in the parameter specification serve as the starting values.
Output Files
During the script execution, several files will appear in the current working directory. First, we outline the files generated during the initial simulation.
data.respy.sol
Records the progress of the backward induction procedure. If the interpolation method is used during the backward induction procedure, the coefficient estimates and goodness of fit statistics are provided.
data.respy.pkl
This file contains an instance of RespyCls with detailed information about the solution of the model, for example the \(E\max\) of each state. For details, please consult the source code directly. It is created if persistent storage of results is requested in the solution section of the options specification (the store entry).
data.respy.sim
Lets you monitor the progress of the simulation. It provides information about the seed used to sample the random components of the agents’ state experience and the total number of simulated agents.
data.respy.dat
Contains the simulated data on agents’ choices and state experiences. It has the following structure:
Column | Information |
---|---|
1 | agent identifier |
2 | time period |
3 | choice (1 = Occupation A, 2 = Occupation B, 3 = education, 4 = home) |
4 | wages (missing value if not working) |
5 | work experience in Occupation A |
6 | work experience in Occupation B |
7 | years of schooling |
8 | lagged choice |
9 | type number (0 for the whole column, if homogeneous agents) |
10 - 13 | total rewards - all components |
14 - 17 | systematic reward - no shock |
18 - 21 | shock reward - shock component |
22 | discount rate |
23 - 24 | general reward - non-monetary rewards and non-common rewards, example cm1 cm2 and alpha for occupation A |
25 | common reward - indicators associated with beta 1 and beta 2 |
26 - 29 | immediate reward - period reward |
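To make the column layout more concrete, here is a minimal sketch that attaches labels to the first nine columns of a row. It rests on several assumptions that are not confirmed by the text above: that the columns are separated by whitespace, and that missing wages are written as a single `.`. The column labels, `parse_row`, and the two sample rows are made up for illustration and are not respy output or part of the respy API.

```python
# Labels for the first nine columns, following the table above. The names
# themselves are illustrative and are not defined by respy.
COLUMN_LABELS = [
    "identifier",
    "period",
    "choice",
    "wage",
    "experience_a",
    "experience_b",
    "years_schooling",
    "lagged_choice",
    "type",
]

CHOICE_LABELS = {1: "Occupation A", 2: "Occupation B", 3: "education", 4: "home"}


def parse_row(line, missing="."):
    """Map the first nine whitespace-separated fields of a row to labels.

    Assumes whitespace-delimited columns and ``missing`` as the marker for
    missing values; both are assumptions about the file layout.
    """
    fields = line.split()
    row = {}
    for label, field in zip(COLUMN_LABELS, fields):
        row[label] = None if field == missing else float(field)
    return row


# Two made-up rows, not actual respy output.
sample = """\
0 0 3 . 0 0 10 3 0
0 1 1 17750.5 0 0 11 3 0
"""

rows = [parse_row(line) for line in sample.splitlines() if line.strip()]
choices = [CHOICE_LABELS[int(row["choice"])] for row in rows]
```

If your file uses a different separator or missing-value marker, adjust `line.split()` and the `missing` argument accordingly.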
data.respy.info
Provides descriptive statistics such as the choice probabilities, the transition matrix, number of agents per period and occupation, and the respective wage distributions. It also prints out the underlying parameterization of the model.
Second, we turn to the estimation output. The fit procedure directly returns the value of the coefficients at the final step of the optimizer, as well as the value of the criterion function. However, some additional files appear in the meantime.
est.respy.info
Lets you monitor the estimation as it progresses. It provides information about starting values, step values, and current values as well as the corresponding value of the criterion function.
est.respy.log
Documents details about the estimation procedure. Provides information on the preconditioning of the parameters including the original parameter value, the scaling factor and the rescaled parameter. Further, details about each of the evaluations of the criterion function are included. Most importantly, once an estimation is completed, it provides the return message from the optimizer.
Third, additional information is provided in two further generated files:
scaling.respy.out
solution.respy.pkl
Finally, when a second simulation is performed, now based on the parameter estimates, the existing simulation output files are replaced by new ones referring to the current simulation run.
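Since the second simulation replaces the existing output files, you may want to keep a copy of the first run for comparison. The sketch below shows one way to do that with the standard library. `backup_simulation_output` is a hypothetical helper, not part of respy, and its default pattern `data.respy.*` is based on the file names listed above.

```python
import shutil
from pathlib import Path


def backup_simulation_output(
    directory=".", target="first_simulation", pattern="data.respy.*"
):
    """Copy files matching ``pattern`` from ``directory`` into a subfolder.

    Hypothetical helper for illustration; it is not part of respy. The
    default pattern is based on the file names listed above.
    """
    directory = Path(directory)
    backup_dir = directory / target
    backup_dir.mkdir(exist_ok=True)
    copied = []
    for path in sorted(directory.glob(pattern)):
        if path.is_file():
            # copy2 also preserves the modification time of each file.
            shutil.copy2(path, backup_dir / path.name)
            copied.append(path.name)
    return copied
```

Calling `backup_simulation_output()` after the first simulation and before the second one would leave the original files in a `first_simulation` subfolder of the working directory.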