Code for emissions and damage factor analysis in PJM

priyald17 · Jun 13, 2019 · commit e333c40

Showing 71 changed files with 20,152 additions and 0 deletions.
diff --git a/.gitattributes b/.gitattributes
@@ -0,0 +1 @@
+*.csv filter=lfs diff=lfs merge=lfs -text
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,7 @@
+*.DS_Store
+
+*.ipynb_checkpoints
+*__pycache__
+*.pyc
+
+*.pdf
diff --git a/README.md b/README.md
@@ -0,0 +1,58 @@
+# How much are we saving after all? Characterizing the effects of commonly-varying assumptions on emissions and damage estimates in PJM
+
+This repository is by 
+[Priya L. Donti](https://www.priyadonti.com), [J. Zico Kolter](http://zicokolter.com), and [Inês Azevedo](https://inesazevedo.org) and contains the Python source code to
+reproduce the experiments in our paper "How much are we saving after all? Characterizing the effects of commonly-varying assumptions on emissions and damage estimates in PJM."
+
+# Introduction
+
+In recent years, several methods have emerged to estimate the emissions and health, environmental, and climate change damages avoided by interventions such as energy efficiency, demand response, and renewables integration. However, differing assumptions employed in these analyses could yield contradictory recommendations regarding intervention implementation. We test the magnitude of the effect of using different key assumptions -- average vs. marginal emissions, year of calculation, temporal and regional scope, and inclusion of non-emitting generation -- to estimate PJM emissions and damage factors. We further highlight the importance of factor selection by evaluating three illustrative 2017 power system examples in PJM.
+
+Please see our paper for additional details.
+
+## Setup and Dependencies
+
+This code uses Python 3. All Python-related dependencies can be installed into a
+[conda environment](https://conda.io/docs/user-guide/tasks/manage-environments.html)
+using the [environment.yml](./environment.yml) file.
+
+Cloning this repository also requires [Git Large File Storage](https://git-lfs.github.com/), which is used to store the repository's CSV data files.
+
+## Usage
+
+To run all experiments, simply run the following command:
+`bash run_all.sh`
+
+You can also see pre-run results and visualizations by viewing the notebook files in this repository (which end with the extension `.ipynb`). The structure of this repository is below.
+
+```
+run_all.sh - Script to reproduce all experiments.
+data
+├── format_data.sh - Script to format all data.
+├── cems - Folder containing raw CEMS data, formatting scripts, and notebooks.
+├── metered_loads - Folder containing raw metered load data, formatting scripts, and notebooks.
+├── pjm_gen_by_fuel - Folder containing raw PJM generation by fuel data, formatting scripts, and notebooks.
+├── pjm_marginal_fuel - Folder containing PJM marginal fuel type data, formatting scripts, and notebooks.
+└── date_helpers.py - Helper functions for parsing and formatting dates.
+factor_estimates
+├── estimate_factors.sh - Script to get marginal and average factor estimates based on formatted data.
+├── get_factor_estimates.py - Python code to get emissions factors (see estimate_factors.sh for usage).
+├── get_plots.py - Python code to plot emissions factors (see estimate_factors.sh for usage).
+└── notebooks - Notebooks with visualizations and summaries of emissions factors.
+interventions
+├── run_interventions.sh - Script to get plots for intervention effects based on factor estimates.
+├── run_intervention.py - Python code to get intervention effects (see run_interventions.sh for usage).
+├── plot_intervention.py - Helper functions for plotting intervention effects.
+├── monthly_dr.csv - Demand response reduction data (for demand response experiments).
+└── notebooks - Notebooks with visualizations and summaries of interventions/power system examples.
+si - Folder with notebooks and data to reproduce analyses in the SI.
+```
+
+# Acknowledgments
+
+This work was supported by the National Science Foundation Graduate Research Fellowship Program under Grant No. DGE1252522, the Department of Energy Computational Science Graduate Fellowship under Grant No. DE-FG02-97ER25308, and the Center for Climate and Energy Decision Making (CEDM) through a cooperative agreement between Carnegie Mellon University and the National Science Foundation under Grant No. SES-1463492. 
+
+# Licensing
+
+Unless otherwise stated, the source code is copyright Carnegie Mellon University and licensed under the [Apache 2.0 License](./LICENSE).
+
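The README distinguishes average from marginal emissions factors. The snippet below is a minimal sketch of that distinction on synthetic data. The function names are hypothetical, and the marginal estimator (the least-squares slope of hour-to-hour emissions changes on hour-to-hour generation changes, i.e. the kind of differenced data `cems_formatting.py` produces) is an assumed, commonly used approach, not a confirmed copy of what `get_factor_estimates.py` does.

```python
import numpy as np
import pandas as pd


def average_factor(emissions_kg, generation_mwh):
    # Hypothetical helper: total emissions divided by total generation (kg/MWh)
    return emissions_kg.sum() / generation_mwh.sum()


def marginal_factor(emissions_kg, generation_mwh):
    # Hypothetical helper (assumed estimator): least-squares slope of
    # hour-to-hour emissions changes on hour-to-hour generation changes (kg/MWh)
    d_gen = generation_mwh.diff()
    d_emit = emissions_kg.diff()
    valid = d_gen.notna() & d_emit.notna()  # the first hour has no difference
    slope, _intercept = np.polyfit(d_gen[valid], d_emit[valid], deg=1)
    return slope


# Synthetic hourly data: a fixed 20,000 kg per hour plus 900 kg per MWh generated
generation = pd.Series([100.0, 120.0, 150.0, 130.0, 170.0])
emissions = 20000.0 + 900.0 * generation

avg = average_factor(emissions, generation)
marg = marginal_factor(emissions, generation)
print(f'average: {avg:.1f} kg/MWh, marginal: {marg:.1f} kg/MWh')
```

Here the average factor (about 1049 kg/MWh) exceeds the marginal one (900 kg/MWh): the fixed hourly component is spread over all generation in the average, but it cancels out of the hour-to-hour differences.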
diff --git a/data/cems/cems_formatting.py b/data/cems/cems_formatting.py
@@ -0,0 +1,97 @@
+'''
+    Get clean aggregated hourly generation/emissions and the associated generation/emissions
+        differences between hours.
+    Input: Aggregated hourly data from CEMS. This data was obtained from EPA CEMS and then
+        aggregated by RTO/ISO or NERC region.
+    Output: Clean aggregated hourly data, and differenced hourly data.
+'''
+
+import argparse
+import os
+import pandas as pd
+import numpy as np
+
+def main():
+    parser = argparse.ArgumentParser()
+    parser.add_argument('--aggRegion', choices=['isorto', 'nerc'], required=True,
+        help='region type to which data should be aggregated')
+    parser.add_argument('--save', default='formatted_data', 
+        help='save folder path')
+    parser.add_argument('--startYear', type=int, default=2006, help='first year of data')
+    parser.add_argument('--endYear', type=int, default=2017, help='last year of data (inclusive)')
+    args = parser.parse_args()
+
+    agg_region = args.aggRegion
+
+    # Read in CEMS data, aggregated by region. Drop rows without region label.
+    print('getting emissions data')
+    emissions = pd.read_csv(
+        os.path.join('raw_data', 'emit_agg_by_{}.csv'.format(agg_region)))
+    emissions = emissions[pd.notnull(emissions[agg_region])]
+
+    # Convert timestamp to datetime
+    # TODO: Timestamps are actually UTC-5, not UTC, so the DATE_UTC name used below is a misnomer.
+    emissions['ts'] = pd.to_datetime(emissions['ts'])
+
+    # Organize by timestamp and region
+    emissions = emissions.set_index(['ts', agg_region]).sort_index()
+    emissions.index.names = ['DATE_UTC', agg_region]
+
+    # Convert units to kg
+    KG_IN_LB = 0.453592
+    KG_IN_TON = 907.185  # short ton
+    emissions = convert_to_kg(emissions, 'lbs', KG_IN_LB)
+    emissions = convert_to_kg(emissions, 'tons', KG_IN_TON)
+
+    # Get hour-to-hour differenced data (the column renaming below is disabled)
+    print('getting differenced data')
+    diffs = get_diffs(emissions, agg_region, args.startYear, args.endYear)
+    # diffs.columns = diffs.columns.map(lambda x: '{}-diffs'.format(x))
+
+    # Save data
+    print('saving data')
+    save = args.save
+    os.makedirs(save, exist_ok=True)
+    emissions.to_csv(os.path.join(save, 'cems_{}.csv'.format(agg_region)))
+    diffs.to_csv(os.path.join(save, 'cems_diffs_{}.csv'.format(agg_region)))
+
+
+
+def convert_to_kg(df, unit_label, conversion_factor):
+    old_unit_cols = [x for x in df.columns if unit_label in x]
+    df[old_unit_cols] = df[old_unit_cols] * conversion_factor
+    df.columns = [x.replace(unit_label, 'kg') for x in df.columns]
+    return df
+
+
+def get_diffs(df, agg_region, start_year, end_year):
+    '''Hour-to-hour differences within each region. df must be indexed by (date, aggregation region).'''
+
+    # Reindex to ensure all hours of start_year..end_year and all regions are represented
+    all_hours = pd.date_range(
+        start='{}-01-01'.format(start_year), end='{}-12-31 23:00'.format(end_year), freq='H')
+    all_hours_multidx = pd.MultiIndex.from_product(
+        [all_hours, df.index.get_level_values(agg_region).unique()], 
+        names=['DATE_UTC', agg_region])
+    df = df.reindex(all_hours_multidx)
+
+    # Sort index by region and then date
+    df = df.reset_index().set_index([agg_region, 'DATE_UTC']).sort_index()
+
+    # Take diffs and correct "spillover" between boundaries of regions
+    diffs = df.diff().reset_index()
+    mask = diffs[agg_region] != diffs[agg_region].shift(1)
+    diffs.loc[mask, diffs.columns.difference([agg_region, 'DATE_UTC'])] = np.nan
+
+    # Rearrange back to being sorted by date, then region
+    diffs = diffs.set_index(['DATE_UTC', agg_region]).sort_index()
+
+    # Drop any null diffs
+    diffs = diffs.dropna(how='all')
+
+    return diffs
+
+
+
+if __name__ == '__main__':
+    main()
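To check what `convert_to_kg` and `get_diffs` produce, the sketch below applies the same steps to a tiny hypothetical frame. The column names (`so2_mass_lbs`, `co2_mass_tons`), the region labels, and `diffs_by_region` are illustrative assumptions. `diffs_by_region` swaps the file's sort, `diff`, and mask sequence for `groupby(level=...).diff()`, which keeps each difference inside a single region by construction, and it takes an explicit list of hours instead of building one from year bounds.

```python
import numpy as np
import pandas as pd

# Same conversion constants as cems_formatting.py
KG_IN_LB = 0.453592
KG_IN_TON = 907.185


def convert_to_kg(df, unit_label, conversion_factor):
    # Scale every column whose name contains unit_label and relabel it as kg.
    # Unlike the original, this works on a copy instead of mutating df.
    df = df.copy()
    cols = [c for c in df.columns if unit_label in c]
    df[cols] = df[cols] * conversion_factor
    df.columns = [c.replace(unit_label, 'kg') for c in df.columns]
    return df


def diffs_by_region(df, region, hours):
    # Hypothetical variant of get_diffs. Reindex onto the full hour x region
    # grid so a missing hour yields NaN rather than a multi-hour difference.
    grid = pd.MultiIndex.from_product(
        [hours, df.index.get_level_values(region).unique()],
        names=df.index.names)
    df = df.reindex(grid)
    # Difference within each region only, so no value spills across regions
    diffs = df.groupby(level=region).diff()
    return diffs.dropna(how='all').sort_index()


hours = pd.to_datetime(['2017-01-01 00:00', '2017-01-01 01:00', '2017-01-01 02:00'])
idx = pd.MultiIndex.from_tuples(
    [(hours[0], 'PJM'), (hours[1], 'PJM'), (hours[2], 'PJM'),
     (hours[0], 'MISO'), (hours[2], 'MISO')],  # MISO is missing 01:00
    names=['DATE_UTC', 'isorto'])
raw = pd.DataFrame({'so2_mass_lbs': [1000.0, 1200.0, 1100.0, 500.0, 700.0],
                    'co2_mass_tons': [10.0, 12.0, 11.0, 5.0, 8.0]},
                   index=idx).sort_index()

kg = convert_to_kg(convert_to_kg(raw, 'lbs', KG_IN_LB), 'tons', KG_IN_TON)
diffs = diffs_by_region(kg, 'isorto', hours)
print(diffs)
```

Only the 01:00 and 02:00 PJM rows survive: each PJM difference spans exactly one hour, while every MISO difference touches the missing 01:00 hour and is dropped. For example, the 01:00 PJM SO2 change is 200 lb, or about 90.7 kg.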