Data

“It is a capital mistake to theorize before one has data.”

― Sherlock Holmes

Here is the code that I’ve developed to help me work with various datasets. You can also get more recent code by emailing me (bengriffy[at]gmail[dot]com) or through my GitHub account.

Panel Study of Income Dynamics (PSID):

  • PSID Data and Cleaning: link
    This is based on the code that I use to create longitudinal datasets from the PSID. It selects the data, reshapes it from a cross-sectional format into a panel format, and then preps it for analysis.
  • PyPSID: link
    This program downloads the entire Panel Study of Income Dynamics from UMich and lets the user create a panel from selected variables. It then writes the necessary code in Stata format. Use it together with my data and cleaning code to work with the PSID.
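
To give a rough idea of what "building a panel from a cross-sectional format" means, here is a minimal sketch in plain Python. It is not taken from the linked code: the function name, the crosswalk, and the column names (INC99, HRS01, and so on) are all made up for illustration and are not real PSID variable codes. It only assumes that each survey year stores the same concept under a different column name.

```python
# Hypothetical crosswalk: concept -> {survey year: column name in that year's data}.
# These column names are invented for illustration; they are not real PSID codes.
CROSSWALK = {
    "income": {1999: "INC99", 2001: "INC01"},
    "hours": {1999: "HRS99", 2001: "HRS01"},
}


def wide_to_panel(wide_rows, crosswalk, id_key="person_id"):
    """Turn one-row-per-person records into one-row-per-person-year records."""
    # Collect every survey year that appears anywhere in the crosswalk.
    years = sorted({year for mapping in crosswalk.values() for year in mapping})
    panel = []
    for row in wide_rows:
        for year in years:
            record = {id_key: row[id_key], "year": year}
            for concept, mapping in crosswalk.items():
                column = mapping.get(year)
                # A missing column or value is recorded as None, not guessed.
                record[concept] = row.get(column) if column is not None else None
            panel.append(record)
    return panel


# Two people in "wide" form; person 2 has no hours reported for 2001.
wide = [
    {"person_id": 1, "INC99": 30000, "INC01": 32000, "HRS99": 2000, "HRS01": 1900},
    {"person_id": 2, "INC99": 45000, "INC01": 47000, "HRS99": 2100},
]
panel = wide_to_panel(wide, CROSSWALK)
for record in panel:
    print(record)
```

Each person ends up with one row per survey year, which is the layout that panel estimators generally expect. Keeping the crosswalk separate from the reshaping step is just one possible design; it makes it easy to add a year or a variable without touching the loop.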

Survey of Income and Program Participation (SIPP):

  • SIPP Data and Cleaning: link
    This is based on the code that I use to estimate re-employment elasticities in the SIPP. It updates the CEPR data-cleaning code and the programs from Raj Chetty’s 2008 paper on Liquidity and Moral Hazard (both below).
  • CEPR SIPP Code: link
    Much of my original code comes from two sources: the CEPR’s SIPP data site (I have updated their code in mine) and Raj Chetty’s 2008 paper on Liquidity and Moral Hazard (link to programs).

National Longitudinal Survey of Youth, 1979 (NLSY79):

  • NLSY79 Data and Cleaning: link
    This code will create a panel and run basic estimations in the NLSY79. To run this code, you’ll need to go to the NLS Investigator website and upload the included variable list (“nov19.NLSY79”).

Other:

  • One of the best resources that I’ve found is Anthony Damico’s website for survey data: asdfree (analyze survey data for free).
  • Interactive Map of US Unemployment and Weekly Earnings (state and county): link