Model Specification
In the following, we discuss the model specification in greater detail. When the model specification is used to simulate a data set, the data are generated from the chosen parameter values. When the estimation procedure is invoked, the values in the model specification serve as starting values for the optimization.
The model is specified in two separate files as we differentiate between the parameters of the model and other options. As an example we take the first parametrization of Keane and Wolpin (1994).
Parameter specification
The following table shows a parameter specification for respy. The first two columns, category and name, can be used for indexing. The column para contains the parameter value. The column fixed indicates whether the parameter is held constant during optimization. The columns lower and upper indicate lower and upper bounds for the parameter, which are used in conjunction with a constrained optimizer. In this example, the discount factor is bounded between 0.7 and 1.0. The column comment contains a short description of the parameter.
| category | name | para | fixed | lower | upper | comment |
| --- | --- | --- | --- | --- | --- | --- |
| delta | delta | 0.95 | False | 0.7 | 1.0 | discount factor |
| coeffs_common | return_hs_degree | 0.0 | False | | | return to high school degree (non pecuniary) |
| coeffs_common | return_col_degree | 0.0 | False | | | return to college degree (non pecuniary) |
| coeffs_a | skill_price | 9.21 | False | | | skill rental price if the base skill endowment of type 1 is normalized to 0 (wage) |
| coeffs_a | return_schooling | 0.038 | False | | | linear return to an additional year of schooling (wage) |
| coeffs_a | exp_a | 0.033 | False | | | return to experience, same sector, linear (wage) |
| coeffs_a | exp_a_square | 0.0005 | False | | | return to experience, same sector, quadratic (divided by 100) (wage) |
| coeffs_a | exp_b | 0.0 | False | | | return to experience, other civilian sector, linear (wage) |
| coeffs_a | exp_b_square | 0.0 | False | | | return to experience, other civilian sector, quadratic (divided by 100) (wage) |
| coeffs_a | premium_hs | 0.0 | False | | | skill premium of having finished high school (wage) |
| coeffs_a | premium_col | 0.0 | False | | | skill premium of having finished college (wage) |
| coeffs_a | age | 0.0 | False | | | linear age effect (wage) |
| coeffs_a | minor | 0.0 | False | | | effect of being a minor (wage) |
| coeffs_a | not_first | 0.0 | False | | | gain of having worked in the same occupation at least once before (wage) |
| coeffs_a | no_switch | 0.0 | False | | | gain of remaining in the same occupation as previous period (wage) |
| coeffs_a | constant | 0.0 | False | | | constant (non pecuniary) |
| coeffs_a | first | 0.0 | False | | | reward of working in a for the first time (non pecuniary) |
| coeffs_a | switch | 0.0 | False | | | reward of switching to a from other occupation (non pecuniary) |
| coeffs_b | skill_price | 8.48 | False | | | skill rental price if the base skill endowment of type 1 is normalized to 0 (wage) |
| coeffs_b | return_schooling | 0.07 | False | | | linear return to an additional year of schooling (wage) |
| coeffs_b | exp_a | 0.022 | False | | | return to experience, other civilian sector, linear (wage) |
| coeffs_b | exp_a_square | 0.0005 | False | | | return to experience, other civilian sector, quadratic (divided by 100) (wage) |
| coeffs_b | exp_b | 0.067 | False | | | return to experience, same sector, linear (wage) |
| coeffs_b | exp_b_square | 0.001 | False | | | return to experience, same sector, quadratic (divided by 100) (wage) |
| coeffs_b | premium_hs | 0.0 | False | | | skill premium of having finished high school (wage) |
| coeffs_b | premium_col | 0.0 | False | | | skill premium of having finished college (wage) |
| coeffs_b | age | 0.0 | False | | | linear age effect (wage) |
| coeffs_b | minor | 0.0 | False | | | effect of being a minor (wage) |
| coeffs_b | not_first | 0.0 | False | | | gain of having worked in the same occupation at least once before (wage) |
| coeffs_b | no_switch | 0.0 | False | | | gain of remaining in the same occupation as previous period (wage) |
| coeffs_b | constant | 0.0 | False | | | constant (non pecuniary) |
| coeffs_b | first | 0.0 | False | | | reward of working in b for the first time (non pecuniary) |
| coeffs_b | switch | 0.0 | False | | | reward of switching to b from other occupation (non pecuniary) |
| coeffs_edu | constant | 0.0 | False | | | consumption value of school attendance for type 1 |
| coeffs_edu | value_col | 0.0 | False | | | consumption value of college |
| coeffs_edu | value_grad | 0.0 | False | | | consumption value of graduate school |
| coeffs_edu | reenroll_col | 4000.0 | False | | | reward for going back to college |
| coeffs_edu | reenroll_hs | 4000.0 | False | | | reward for going back to high school |
| coeffs_edu | age | 0.0 | False | | | linear age effect |
| coeffs_edu | minor | 0.0 | False | | | effect of being a minor |
| coeffs_home | constant | 17750.0 | False | | | mean value of nonmarket alternative for type 1 |
| coeffs_home | 18_to_20 | 0.0 | False | | | additional value of staying home if aged 18-20 |
| coeffs_home | 21_plus | 0.0 | False | | | additional value of staying home if 21 or older |
| shocks | chol_sigma_1 | 0.2 | False | | | Element 1,1 of Cholesky factor of shock covariance matrix |
| shocks | chol_sigma_21 | 0.0 | False | | | Element 2,1 of Cholesky factor of shock covariance matrix |
| shocks | chol_sigma_2 | 0.25 | False | | | Element 2,2 of Cholesky factor of shock covariance matrix |
| shocks | chol_sigma_31 | 0.0 | False | | | Element 3,1 of Cholesky factor of shock covariance matrix |
| shocks | chol_sigma_32 | 0.0 | False | | | Element 3,2 of Cholesky factor of shock covariance matrix |
| shocks | chol_sigma_3 | 1500.0 | False | | | Element 3,3 of Cholesky factor of shock covariance matrix |
| shocks | chol_sigma_41 | 0.0 | False | | | Element 4,1 of Cholesky factor of shock covariance matrix |
| shocks | chol_sigma_42 | 0.0 | False | | | Element 4,2 of Cholesky factor of shock covariance matrix |
| shocks | chol_sigma_43 | 0.0 | False | | | Element 4,3 of Cholesky factor of shock covariance matrix |
| shocks | chol_sigma_4 | 1500.0 | False | | | Element 4,4 of Cholesky factor of shock covariance matrix |
| type_shares | base_share_2 | 1.6094379124341 | False | | | share_of_agents_of_type_2 |
| type_shares | ten_years_2 | 0.0 | False | | | effect of more than ten years of schooling on probability of being type 2 |
| type_shares | base_share_3 | 0.22314355131421 | False | | | share_of_agents_of_type_3 |
| type_shares | ten_years_3 | 0.0 | False | | | effect of more than ten years of schooling on probability of being type 3 |
| type_shift | type_2_in_occ_a | 0.1 | False | | | deviation for type 2 from type 1 in occ_a |
| type_shift | type_2_in_occ_b | 0.15 | False | | | deviation for type 2 from type 1 in occ_b |
| type_shift | type_2_in_edu | 1000.0 | False | | | deviation for type 2 from type 1 in edu |
| type_shift | type_2_in_home | 1000.0 | False | | | deviation for type 2 from type 1 in home |
| type_shift | type_3_in_occ_a | 0.1 | False | | | deviation for type 3 from type 1 in occ_a |
| type_shift | type_3_in_occ_b | 0.15 | False | | | deviation for type 3 from type 1 in occ_b |
| type_shift | type_3_in_edu | 1000.0 | False | | | deviation for type 3 from type 1 in edu |
| type_shift | type_3_in_home | 1000.0 | False | | | deviation for type 3 from type 1 in home |
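To illustrate how the category and name columns can serve as an index, the following minimal sketch builds a few rows of the table by hand in a pandas DataFrame. This is not how respy itself reads the specification; the variable names (params, free, bounded) and the shortened comments are assumptions made for the example.

```python
import pandas as pd

# A hand-typed excerpt of the parameter table (illustrative only).
rows = [
    ("delta", "delta", 0.95, False, 0.7, 1.0, "discount factor"),
    ("coeffs_a", "skill_price", 9.21, False, None, None, "skill rental price (wage)"),
    ("coeffs_a", "return_schooling", 0.038, False, None, None, "return to schooling (wage)"),
    ("coeffs_home", "constant", 17750.0, False, None, None, "value of nonmarket alternative"),
]
columns = ["category", "name", "para", "fixed", "lower", "upper", "comment"]

# Use the first two columns as a two-level index.
params = pd.DataFrame(rows, columns=columns).set_index(["category", "name"])

# Look up a single parameter value by (category, name).
delta = params.loc[("delta", "delta"), "para"]

# Select all parameter values of one category.
coeffs_a = params.xs("coeffs_a", level="category")["para"]

# Parameters that are not held constant during optimization.
free = params[~params["fixed"]]

# Parameters for which both bounds are specified.
bounded = params.dropna(subset=["lower", "upper"])
```

In this excerpt only the discount factor carries bounds, so bounded contains a single row.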
Following Keane and Wolpin (1994), the error terms of the model follow a multivariate normal distribution that allows for cross-correlation but excludes serial correlation. In the parameter specification, the shock parameters have to be specified as the lower triangular Cholesky factor of the covariance matrix. In the implementation, the requested number of realizations is drawn from the standard normal distribution. The draws are then multiplied by the Cholesky factor to generate the desired variance-covariance structure.
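The transformation of standard normal draws can be sketched with NumPy as follows. This is an illustration of the general technique under stated assumptions, not the respy implementation: the Cholesky factor uses the diagonal values from the example table, while the seed, the number of draws, and all variable names are chosen freely for the example.

```python
import numpy as np

# Lower triangular Cholesky factor; diagonal values taken from the example
# table, all off-diagonal elements are zero there.
chol = np.array(
    [
        [0.2, 0.0, 0.0, 0.0],
        [0.0, 0.25, 0.0, 0.0],
        [0.0, 0.0, 1500.0, 0.0],
        [0.0, 0.0, 0.0, 1500.0],
    ]
)

# Draw independent standard normal realizations (one row per draw).
rng = np.random.default_rng(456)  # seed chosen arbitrarily for the sketch
standard_draws = rng.standard_normal((500, 4))

# Multiplying each draw by the Cholesky factor induces the covariance
# structure chol @ chol.T.
shocks = standard_draws @ chol.T

# Covariance matrix implied by the Cholesky factor.
cov_implied = chol @ chol.T
```

With a diagonal factor, the implied variances are simply the squared diagonal elements; non-zero off-diagonal elements would introduce correlation between the shocks.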
In this example specification, the model contains three types of heterogeneous agents. The current version of the code works with more than three types as well as with homogeneous agents (only one type). To add a type, a block of two coefficients and a block of four coefficients need to be specified in the categories type_shares and type_shift, respectively.
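As a rough sketch of what adding a type involves, the hypothetical helper below lists the six (category, name) pairs that one further type would require. The function name is invented for this example, and the coefficient names are extrapolated from the pattern used for types 2 and 3 in the table above; respy may expect something different.

```python
def coefficient_names_for_type(type_number):
    """Return the (category, name) pairs needed for one additional type.

    Hypothetical helper: names follow the pattern of types 2 and 3 in the
    example specification.
    """
    if type_number < 2:
        raise ValueError("Type 1 is the baseline and needs no extra coefficients.")

    # Block of two coefficients in the category type_shares.
    shares = [
        ("type_shares", f"base_share_{type_number}"),
        ("type_shares", f"ten_years_{type_number}"),
    ]
    # Block of four coefficients in the category type_shift, one per alternative.
    shifts = [
        ("type_shift", f"type_{type_number}_in_{alternative}")
        for alternative in ("occ_a", "occ_b", "edu", "home")
    ]
    return shares + shifts


new_rows = coefficient_names_for_type(4)
```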
Warning
There are two small differences compared to Keane and Wolpin (1997). First, all coefficients enter the return function with a positive sign, while the squared terms enter with a minus in the original paper. Second, the order of covariates is fixed across the two occupations. In the original paper, own experience always comes before other experience.
Warning
Again, there is a small difference between this setup and Keane and Wolpin (1997). There is no automatic change in sign for the costs. Thus, for example, a $1,000 tuition cost must be specified as -1000.
Options specification
In addition to the model parameters, other model options are kept in a separate specification file in the JSON format.
{
    "estimation": {
        "file": "data.respy.dat",
        "maxfun": 1000,
        "agents": 1000,
        "draws": 200,
        "optimizer": "FORT-BOBYQA",
        "seed": 500,
        "tau": 500.0
    },
    "simulation": {
        "file": "data",
        "agents": 1000,
        "seed": 132
    },
    "program": {
        "debug": false,
        "procs": 1,
        "threads": 1,
        "version": "fortran"
    },
    "interpolation": {
        "flag": false,
        "points": 200
    },
    "solution": {
        "store": true,
        "seed": 456,
        "draws": 500
    },
    "preconditioning": {
        "minimum": 1e-05,
        "type": "magnitudes",
        "eps": 0.0001
    },
    "derivatives": "forward-differences",
    "edu_spec": {
        "lagged": [
            1.0,
            1.0
        ],
        "start": [
            10,
            9
        ],
        "share": [
            0.5,
            0.5
        ],
        "maxiter": 10,
        "stpmx": 100.0
    },
    "FORT-BOBYQA": {
        "maxfun": 1000000,
        "npt": 1,
        "maxiter": 1,
        "xtol": 0.0001
    },
    "SCIPY-LBFGSB": {
        "eps": 4.41037423e-07,
        "factr": 30.401091854739622,
        "m": 5,
        "maxiter": 2,
        "maxls": 2,
        "pgtol": 8.6554171164e-05
    }
}
Note that, to implement a model in which agents differ in their initial levels of schooling, the three entries start, share, and lagged have to be specified together as a block.
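The following sketch reads a reduced options file with the standard json module and checks that the three entries form a consistent block. The helper check_edu_spec is hypothetical and not part of respy. The two checks reflect one reading of "specified together as a block" (equal list lengths) plus the assumption that the shares should sum to one.

```python
import json

# A reduced options specification, embedded as a string for the example.
options_text = """
{
    "simulation": {"file": "data", "agents": 1000, "seed": 132},
    "edu_spec": {"lagged": [1.0, 1.0], "start": [10, 9], "share": [0.5, 0.5]}
}
"""


def check_edu_spec(edu_spec):
    """Hypothetical consistency check for the start, share, and lagged block."""
    keys = ("start", "share", "lagged")
    missing = [key for key in keys if key not in edu_spec]
    if missing:
        raise KeyError(f"Missing entries in edu_spec: {missing}")

    # Assumption: each entry holds one value per initial schooling level.
    lengths = {len(edu_spec[key]) for key in keys}
    if len(lengths) != 1:
        raise ValueError("start, share, and lagged must have the same length.")

    # Assumption: the shares describe a complete partition of the sample.
    if abs(sum(edu_spec["share"]) - 1.0) > 1e-8:
        raise ValueError("The entries in share must sum to one.")

    return lengths.pop()


options = json.loads(options_text)
num_start_levels = check_edu_spec(options["edu_spec"])
```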
SOLUTION
| Key | Value | Interpretation |
| --- | --- | --- |
| draws | int | number of draws for \(E\max\) |
| store | bool | persistent storage of results |
| seed | int | random seed for \(E\max\) |
SIMULATION
| Key | Value | Interpretation |
| --- | --- | --- |
| agents | int | number of simulated agents |
| file | str | file to print simulated sample |
| seed | int | random seed for agent experience |
ESTIMATION
| Key | Value | Interpretation |
| --- | --- | --- |
| agents | int | number of agents to read from sample |
| draws | int | number of draws for choice probabilities |
| file | str | file to read observed sample |
| maxfun | int | maximum number of function evaluations |
| optimizer | str | optimizer to use |
| seed | int | random seed for choice probability |
| tau | float | scale parameter for function smoothing |
DERIVATIVES
| Key | Value | Interpretation |
| --- | --- | --- |
| version | str | approximation scheme |
The derivatives are computed numerically and are used in the calculation of the standard errors.
PRECONDITIONING
| Key | Value | Interpretation |
| --- | --- | --- |
| eps | float | step size |
| minimum | float | minimum admissible value |
| type | str | preconditioning type |
The inputs in the preconditioning block help the optimizer reach a solution (faster). The coefficients are transformed so that the optimizer can handle them better. Three types of transformations can be selected via the preconditioning type:
identity: no transformation
magnitudes: division by the number of digits
gradient based: weighting by the inverse contribution to the likelihood function
PROGRAM
| Key | Value | Interpretation |
| --- | --- | --- |
| debug | bool | debug mode |
| procs | int | number of processors |
| threads | int | number of threads |
| version | str | program version |
INTERPOLATION
| Key | Value | Interpretation |
| --- | --- | --- |
| flag | bool | flag to use interpolation |
| points | int | number of interpolation points |
The implemented optimization algorithms vary with the program’s version. If you request the Python version of the program, you can choose from the scipy implementations of the BFGS (Nocedal and Wright, 2006), L-BFGS-B, and POWELL (Powell, 1964) algorithms. In essence, POWELL is a conjugate direction method that performs sequential one-dimensional minimizations; it does not require the function to be differentiable, and no derivatives are taken. The BFGS algorithm is a quasi-Newton optimizer that uses first derivatives only but performs reasonably well even in non-smooth optimizations. The L-BFGS-B algorithm can use simple box constraints to potentially improve accuracy. Further implementation details are available in the scipy documentation.
For Fortran, we implemented the BFGS, BOBYQA and NEWUOA (Powell, 2004) algorithms. NEWUOA is a gradient-free algorithm that performs unconstrained optimization. Similarly, BOBYQA performs gradient-free, bound-constrained optimization.
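To make the differences between the three scipy algorithms concrete, the sketch below calls scipy.optimize.minimize directly on a toy function. It does not go through respy: the criterion function, the starting values, and the bounds are invented for the illustration. The first parameter is bounded between 0.7 and 1.0, mirroring the bounds of the discount factor in the parameter specification.

```python
import numpy as np
from scipy.optimize import minimize


def criterion(x):
    """Toy stand-in for a criterion function with its minimum at (1.2, 0.5)."""
    return (x[0] - 1.2) ** 2 + (x[1] - 0.5) ** 2


start = np.array([0.95, 0.0])

# POWELL: derivative-free, sequential one-dimensional minimizations.
res_powell = minimize(criterion, start, method="Powell")

# BFGS: quasi-Newton, gradient approximated numerically here.
res_bfgs = minimize(criterion, start, method="BFGS")

# L-BFGS-B: quasi-Newton with simple box constraints on the first parameter.
res_lbfgsb = minimize(
    criterion,
    start,
    method="L-BFGS-B",
    bounds=[(0.7, 1.0), (None, None)],
)
```

The two unconstrained runs should end up near the unconstrained minimum, whereas the bounded run should stop at the upper bound of the first parameter.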
FORT-NEWUOA
| Key | Value | Interpretation |
| --- | --- | --- |
| maxfun | int | maximum number of function evaluations |
| npt | int | number of points for approximation model |
| rhobeg | float | starting value for size of trust region |
| rhoend | float | minimum value of size for trust region |
FORT-BFGS
| Key | Value | Interpretation |
| --- | --- | --- |
| eps | float | value to use for step size if fprime is approximated |
| gtol | float | gradient norm must be less than gtol before successful termination |
| maxiter | int | maximum number of iterations |
| stpmx | float | maximum step size |
FORT-BOBYQA
| Key | Value | Interpretation |
| --- | --- | --- |
| maxfun | int | maximum number of function evaluations |
| npt | int | number of points for approximation model |
| rhobeg | float | starting value for size of trust region |
| rhoend | float | minimum value of size for trust region |
SCIPY-BFGS
| Key | Value | Interpretation |
| --- | --- | --- |
| eps | float | value to use for step size if fprime is approximated |
| gtol | float | gradient norm must be less than gtol before successful termination |
| maxiter | int | maximum number of iterations |
| stpmx | float | maximum step size |
SCIPY-POWELL
| Key | Value | Interpretation |
| --- | --- | --- |
| ftol | float | relative error in func(xopt) acceptable for convergence |
| maxfun | int | maximum number of function evaluations to make |
| maxiter | int | maximum number of iterations |
| xtol | float | line-search error tolerance |
SCIPY-LBFGSB
| Key | Value | Interpretation |
| --- | --- | --- |
| eps | float | step size used when approx_grad is True, for numerically calculating the gradient |
| factr | float | multiple of the default machine precision used to determine the relative error in func(xopt) acceptable for convergence |
| m | int | maximum number of variable metric corrections used to define the limited memory matrix |
| maxiter | int | maximum number of iterations |
| maxls | int | maximum number of line search steps (per iteration); default is 20 |
| pgtol | float | gradient norm must be less than pgtol before successful termination |
Helper functions
We provide some helper functions to write a model specification. You can use the following function to output a template of the parameter specification.

respy.pre_processing.specification_helpers.csv_template(num_types=1, save_path=None, initialize_coeffs=True)
Creates a template for the parameter specification.
Parameters
num_types (int, optional) – Number of types in the model. Default is one.
save_path (str, pathlib.Path, optional) – The template is saved to this path. Default is None.
initialize_coeffs (bool, optional) – Whether coefficients are initialized with values or not. Default is True.
Dataset
To use respy, you need a dataset with the following columns:
Identifier: identifies the different individuals in the sample
Period: identifies the different rounds of observation for each individual
Choice: an integer variable that indicates the labor market choice
1 = Occupation A
2 = Occupation B
3 = Education
4 = Home
Earnings: a float variable that indicates how much people are earning. This variable is missing (indicated by a dot) if individuals don’t work.
Experience_A: labor market experience in sector A
Experience_B: labor market experience in sector B
Years_Schooling: years of schooling
Lagged_Choice: choice in the period before the model starts. Codes are the same as in Choice.
The information in the data file should be first sorted by individual and then by period as visualized below:
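| ID | Period | Choice | Earnings | Exp_A | Exp_B | sch_y | choice_lag |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0 | 4 | 0 | 0 | 0 | 10 | 1 |
| 0 | 1 | 4 | 0 | 0 | 0 | 10 | 0 |
| 0 | 2 | 4 | 0 | 0 | 0 | 10 | 0 |
| 1 | 0 | 4 | 0 | 0 | 0 | 10 | 1 |
| 1 | 1 | 4 | 0 | 0 | 0 | 10 | 0 |
| 1 | 2 | 4 | 0 | 0 | 0 | 10 | 0 |
| 2 | 0 | 4 | 0 | 0 | 0 | 10 | 1 |
| 2 | 1 | 4 | 0 | 0 | 0 | 10 | 0 |
| 2 | 2 | 4 | 0 | 0 | 0 | 10 | 0 |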
Datasets for respy are stored in simple text files in which columns are separated by spaces. The easiest way to write such a text file in Python is to create a pandas DataFrame with all relevant columns and then store it in the following way:
with open("my_data.respy.dat", "w") as file:
    df.to_string(file, index=False, header=True, na_rep=".")
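A more complete, hedged sketch of the workflow is given below. It builds a tiny DataFrame with the columns listed above, sorts it by individual and period, writes it with the same to_string call, and reads it back as a check. The data values are made up, the file is written to a temporary directory, and reading the file back with pandas.read_csv is only a convenience for this example, not a description of how respy parses the file.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Made-up observations for two individuals, deliberately out of order.
df = pd.DataFrame(
    {
        "Identifier": [1, 0, 0, 1],
        "Period": [0, 1, 0, 1],
        "Choice": [1, 4, 3, 2],
        # Earnings are missing whenever the individual does not work.
        "Earnings": [12000.0, np.nan, np.nan, 15500.5],
        "Experience_A": [0, 0, 0, 1],
        "Experience_B": [0, 0, 0, 0],
        "Years_Schooling": [10, 11, 10, 10],
        "Lagged_Choice": [3, 3, 3, 1],
    }
)

# Sort first by individual and then by period.
df = df.sort_values(["Identifier", "Period"]).reset_index(drop=True)

# Write the space-separated text file; missing values become a dot.
path = os.path.join(tempfile.mkdtemp(), "my_data.respy.dat")
with open(path, "w") as file:
    df.to_string(file, index=False, header=True, na_rep=".")

# Read the file back to confirm that the layout round-trips.
check = pd.read_csv(path, sep=r"\s+", na_values=".")
```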