Data and Code

Data and Do Files for "The National Rise of Residential Segregation" (with Trevon Logan)

The following do files can be used to calculate the segregation indices in our 2017 Journal of Economic History article. The do files are designed for the IPUMS 100% sample of the 1880 federal census. The key things to modify in the files are the directory paths, the reference to the dta file for your IPUMS extract of the 1880 census, and the sorting criteria (note that the files are currently set up to look at segregation by occupation and to produce estimates by city as opposed to county).

Main do file (defines segregation variable and calls other do files)

Neighbor-based do file (calculates our neighbor-based segregation measure)

Traditional do file (calculates isolation and dissimilarity indices)
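For readers unfamiliar with the two traditional indices, the following is a minimal Python sketch of their standard textbook definitions for two groups. It is not a translation of the Stata do file listed above, and the function names, the argument layout (one count per geographic unit, such as a ward or enumeration district), and the restriction to two groups are assumptions made for illustration.

```python
from typing import Sequence


def dissimilarity_index(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Dissimilarity index: 0.5 * sum_i |a_i/A - b_i/B|.

    group_a[i] and group_b[i] are the counts of each group in unit i
    (hypothetical layout; any geographic unit works).
    """
    total_a = sum(group_a)
    total_b = sum(group_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("each group needs at least one member")
    return 0.5 * sum(
        abs(a / total_a - b / total_b) for a, b in zip(group_a, group_b)
    )


def isolation_index(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Isolation index for group A: sum_i (a_i/A) * (a_i / (a_i + b_i)).

    Assumes the unit population is just the two groups combined.
    Units with no residents are skipped.
    """
    total_a = sum(group_a)
    if total_a == 0:
        raise ValueError("group A needs at least one member")
    return sum(
        (a / total_a) * (a / (a + b))
        for a, b in zip(group_a, group_b)
        if a + b > 0
    )


if __name__ == "__main__":
    # Two units, groups completely separated.
    print(dissimilarity_index([10, 0], [0, 10]))
    print(isolation_index([10, 0], [0, 10]))
```

With completely separated groups both indices equal 1. With identical distributions across units the dissimilarity index is 0 and the isolation index equals group A's share of the total population.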

County-level data file for 1880 and 1940 (Stata dta file, 2.6MB)

Annual reports of county superintendents of schools, Iowa, 1900

The following data are transcribed from the microfilms of the original report files. To date, I have transcribed the complete records for thirteen counties. If you are interested in data for other counties, I can provide PDFs of the original report pages.

Variable List (pdf)

Transcribed Data (Excel file, 651KB)

Mapping files for the human capital spillovers in agriculture project

Complete ArcMap and data files (zip file, 83MB)

The following scripts determine a farmer's adjacent neighbors and calculate summary statistics regarding those neighbors. The scripts are based on the QueryAll script written by Shan Chen.

Script to calculate neighbor characteristics (VBScript file, 26KB)

Script to calculate neighbor characteristics within and outside of social networks (VBScript file, 33KB)

City level mortality and morbidity statistics by disease, averages for 1918-1924 and reported figures for 1925

The file below contains data on the number of cases and deaths from various diseases between 1918 and 1925 for cities with a population of over 100,000 in 1925. The data are transcribed from tables in volume 41, issue 38 of the Public Health Reports published by the Department of Public Health. I will post similar information for all cities with populations between 10,000 and 100,000 once I have checked the data for errors.

City level morbidity and mortality data, 1918-1925 (Excel file, 52KB)

Intergenerational data for the mobility and public school expansion project

The raw data and Stata files for this project are not annotated and are therefore not particularly useful without additional guidance. If you have any questions about these data, feel free to email me.

String comparison do-files for the influenza project

The following Stata do-files are used to compare names for assessing matches between federal censuses. The first file calculates the Phonex code for a name. The second file calculates the Damerau-Levenshtein distance between two names. Please note that the Damerau-Levenshtein script is still a bit buggy.

Do-file to calculate Phonex code for a name (Stata Version 10 do-file, 5KB)

Do-file to calculate Damerau-Levenshtein distance between two names (Stata Version 10 do-file, 6KB)
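As a point of reference for the second do-file, here is a minimal Python sketch of the restricted form of the Damerau-Levenshtein distance, often called optimal string alignment. It counts insertions, deletions, substitutions, and swaps of adjacent characters, but never edits a swapped pair again. This is an illustration rather than a port of the Stata do-file, which may implement a different variant; the function name osa_distance is an assumption.

```python
def osa_distance(s: str, t: str) -> int:
    """Optimal string alignment (restricted Damerau-Levenshtein) distance."""
    rows, cols = len(s) + 1, len(t) + 1
    # d[i][j] holds the distance between the first i characters of s
    # and the first j characters of t.
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters
    for j in range(cols):
        d[0][j] = j  # insert all j characters

    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters.
            if (
                i > 1
                and j > 1
                and s[i - 1] == t[j - 2]
                and s[i - 2] == t[j - 1]
            ):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


if __name__ == "__main__":
    # A single adjacent swap, a common transcription error in names.
    print(osa_distance("smith", "smiht"))
```

The restricted form is simpler than the unrestricted Damerau-Levenshtein distance and can give a larger value for some pairs (for example, "ca" and "abc"), which is worth keeping in mind when choosing a match threshold for names.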