This project converts the original R and Stata code from Burke, Hsiang, and Miguel (2015) "Global non-linear effect of temperature on economic production" to Python.
The translation from Stata and R to python was done by Ken Caldeira with the help of Cursor and Claude Code. Use at your own risk. This translation could contain errors. If you find any mistake, please let me know (caldeira@stanford.edu).
The original analysis examines the relationship between temperature and economic growth using historical data and projects future impacts under climate change scenarios. This Python implementation maintains the same processing steps and methodology while providing a modern, reproducible framework.
burke-replication-python/
├── config.py # Configuration settings and file paths
├── main.py # Main orchestration script
├── step1_data_preparation.py # Data preparation and initial analysis
├── step2_climate_projections.py # Climate projections
├── step3_socioeconomic_scenarios.py # Socioeconomic scenarios
├── step4_impact_projections.py # Impact projections
├── step5_damage_function.py # Damage function calculations
├── step6_figure_generation.py # Figure generation
├── requirements.txt # Python dependencies
├── PROCESSING_FLOW.md # Detailed processing steps documentation
├── CLAUDE.md # Development guidelines
├── data/
│ ├── input/ # Input data (tracked in git)
│ │ ├── GrowthClimateDataset.csv
│ │ ├── SSP/
│ │ ├── CCprojections/
│ │ └── IAMdata/
│ └── output/ # Generated outputs (not tracked in git)
│ └── output_{run_name}_{timestamp}/
│ ├── *.csv files
│ ├── *.pdf figures
│ ├── *.log files
│ ├── bootstrap/
│ └── projectionOutput/
└── BurkeHsiangMiguel2015_Replication/ # Original R/Stata code (reference)
- Clone this repository
- Install dependencies:
pip install -r requirements.txt
Place the original Burke, Hsiang, and Miguel 2015 data files in data/input/:
data/input/
├── GrowthClimateDataset.csv
├── SSP/
│ ├── SSP_PopulationProjections.csv
│ └── SSP_GrowthProjections.csv
├── CCprojections/
│ └── CountryTempChange_RCP85.csv
└── IAMdata/
└── ProcessedKoppData.csv
The original replication code (for reference) should be in:
BurkeHsiangMiguel2015_Replication/
├── data/
└── script/
To run all processing steps with default settings:
python main.pyYou can specify an optional run name to organize different experiments:
python main.py [run_name]Examples:
python main.py # Uses run_name="default"
python main.py test # Uses run_name="test"
python main.py experiment1 # Uses run_name="experiment1"The run name affects:
- Output directory:
data/output/output_{run_name}_{timestamp}/ - Output files: All CSV, PDF, and log files include
_{run_name}suffix- e.g.,
estimatedGlobalResponse_test.csv,Figure2_test.pdf
- e.g.,
This allows you to run multiple experiments and keep their outputs separate.
You can modify the skip flags in config.py to bypass steps that have already been completed:
# Processing flags
SKIP_STEP_1 = False # Skip data preparation and initial analysis
SKIP_STEP_2 = False # Skip climate projections
SKIP_STEP_3 = False # Skip socioeconomic scenarios
SKIP_STEP_4 = False # Skip impact projections
SKIP_STEP_5 = False # Skip damage function
SKIP_STEP_6 = False # Skip figure generationYou can run individual steps directly:
python step1_data_preparation.py
python step2_climate_projections.py
python step3_socioeconomic_scenarios.py
python step4_impact_projections.py
python step5_damage_function.py
python step6_figure_generation.pyOriginal Files: GenerateFigure2Data.do, GenerateBootstrapData.do
- Input:
GrowthClimateDataset.csv(main dataset with temperature, precipitation, GDP growth data) - Process:
- Run baseline quadratic temperature response regression
- Estimate global response function with temperature and temperature squared
- Generate marginal effects and confidence intervals using full variance-covariance matrix (matching Stata's
marginscommand)
- Output:
estimatedGlobalResponse_{run_name}.csv(response function data)estimatedCoefficients_{run_name}.csv(regression coefficients)mainDataset_{run_name}.csv(cleaned dataset)
- Process:
- Analyze rich vs poor country responses (GDP percentile < 50)
- Analyze agricultural vs non-agricultural GDP growth
- Analyze early vs late period responses (pre/post 1990)
- Output:
EffectHeterogeneity_{run_name}.csv(rich/poor, agricultural responses)EffectHeterogeneityOverTime_{run_name}.csv(temporal heterogeneity)
- Process:
- Bootstrap regression coefficients (1000 replicates by default, configurable)
- Sample countries with replacement
- Run multiple model specifications:
- Pooled model (no lags)
- Rich/poor model (no lags)
- Pooled model (5 lags)
- Rich/poor model (5 lags)
- Output:
bootstrap/bootstrap_noLag_{run_name}.csvbootstrap/bootstrap_richpoor_{run_name}.csvbootstrap/bootstrap_5Lag_{run_name}.csvbootstrap/bootstrap_richpoor_5lag_{run_name}.csv
Original Files: getTemperatureChange.R
- Input: CMIP5 RCP8.5 ensemble mean temperature data, population data, country shapefiles
- Process: Calculate population-weighted country-specific temperature changes
- Output:
CountryTempChange_RCP85_{run_name}.csv
Original Files: ComputeMainProjections.R (first part)
- Input: SSP data, UN population projections, historical growth rates
- Process: Interpolate 5-year SSP projections to annual data, create baseline and SSP scenarios
- Output:
projectionOutput/popProjections_{run_name}.RdataprojectionOutput/growthProjections_{run_name}.Rdata
Original Files: ComputeMainProjections.R (main projection section)
- Input: Bootstrap coefficients, population/growth projections, temperature changes
- Process: Project GDP per capita with and without climate change (2010-2099)
- Output:
projectionOutput/GDPcapCC_*_*.RdataprojectionOutput/GDPcapNoCC_*_*.RdataprojectionOutput/GlobalChanges_*_*.Rdata
Original Files: ComputeDamageFunction.R
- Input: Impact projections, IAM temperature scenarios
- Process: Calculate damages for different global temperature increases (0.8°C to 6°C)
- Output:
projectionOutput/DamageFunction_*.Rdata
Original Files: MakeFigure*.R, MakeExtendedDataFigure*.R
- Input: All output data from previous steps
- Process: Generate main figures (2-5) and extended data figures
- Output:
Figure2_{run_name}.pdfFigure3_{run_name}.pdfFigure4_{run_name}.pdfFigure5_{run_name}.pdf
The confidence interval calculations use the full variance-covariance matrix with sample means for all covariates, matching Stata's margins, at() command. This ensures proper uncertainty quantification across the full temperature range.
Organize multiple experiments with unique run names that propagate to all output files and directories.
The implementation includes intelligent skip logic that:
- Checks if output files already exist
- Respects skip flags in configuration
- Provides warnings if skip flags are set but files are missing
- Each step is implemented as a separate module
- Clear interfaces between steps
- Easy to test individual components
- All runs generate timestamped log files
- Detailed progress tracking
- Error handling with informative messages
The config.py file contains all configuration settings:
- RUN_NAME: Name for the current run (set via command line)
- Paths: File paths for input and output data
- Flags: Skip flags for each processing step
- Settings: Bootstrap parameters (N_BOOTSTRAP), temperature ranges, etc.
- Models: Model specifications and scenarios
Key settings:
N_BOOTSTRAP = 1000 # Number of bootstrap replicates (set to 10 for testing)
RANDOM_SEED = 8675309 # Same as original Stata code
VERBOSITY_LEVEL = 3 # 0=quiet, 1=main steps, 2=detailed, 3=debugAll output files are placed in data/output/output_{run_name}_{timestamp}/ and include the run_name suffix.
estimatedGlobalResponse_{run_name}.csv: Global response functionestimatedCoefficients_{run_name}.csv: Regression coefficientsmainDataset_{run_name}.csv: Cleaned main datasetEffectHeterogeneity_{run_name}.csv: Rich/poor heterogeneityEffectHeterogeneityOverTime_{run_name}.csv: Temporal heterogeneity- Bootstrap files in
bootstrap/subdirectory
- Various CSV and Rdata files in main directory and
projectionOutput/subdirectory
Figure2_{run_name}.pdfthroughFigure5_{run_name}.pdf
burke_replication_{timestamp}_{run_name}.log: Complete processing log
- pandas
- numpy
- statsmodels
- scipy
- matplotlib
- seaborn
- tqdm
This project is for research purposes. Please cite the original paper:
Burke, M., Hsiang, S. M., & Miguel, E. (2015). Global non-linear effect of temperature on economic production. Nature, 527(7577), 235-239.
This is a replication project. The goal is to maintain consistency with the original analysis while providing a modern Python implementation. See CLAUDE.md for development guidelines.