Code for emissions and damage factor analysis in PJM
priyald17 committed Jun 13, 2019
0 parents commit e333c40
Showing 71 changed files with 20,152 additions and 0 deletions.
1 change: 1 addition & 0 deletions .gitattributes
@@ -0,0 +1 @@
*.csv filter=lfs diff=lfs merge=lfs -text
7 changes: 7 additions & 0 deletions .gitignore
@@ -0,0 +1,7 @@
*.DS_Store

*.ipynb_checkpoints
*__pycache__
*.pyc

*.pdf
58 changes: 58 additions & 0 deletions README.md
@@ -0,0 +1,58 @@
# How much are we saving after all? Characterizing the effects of commonly-varying assumptions on emissions and damage estimates in PJM

This repository is by
[Priya L. Donti](https://www.priyadonti.com), [J. Zico Kolter](http://zicokolter.com), and [Inês Azevedo](https://inesazevedo.org) and contains the Python source code to
reproduce the experiments in our paper "How much are we saving after all? Characterizing the effects of commonly-varying assumptions on emissions and damage estimates in PJM."

## Introduction

In recent years, several methods have emerged to estimate the emissions and the health, environmental, and climate change damages avoided by interventions such as energy efficiency, demand response, and renewables integration. However, the differing assumptions employed in these analyses could yield contradictory recommendations regarding intervention implementation. We test the magnitude of the effect of varying key assumptions (average vs. marginal emissions, year of calculation, temporal and regional scope, and inclusion of non-emitting generation) on estimates of PJM emissions and damage factors. We further highlight the importance of factor selection by evaluating three illustrative 2017 power system examples in PJM.

Please see our paper for additional details.
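
To make the average vs. marginal distinction concrete, here is a minimal sketch on made-up numbers. It assumes one common definition of each factor: the average factor is total emissions divided by total generation, and the marginal factor is the least-squares slope of hourly changes in emissions against hourly changes in generation. The function names and the toy data are hypothetical; the estimation code actually used for the paper is in `factor_estimates/get_factor_estimates.py` and may differ.

```python
import numpy as np

def average_factor(emissions_kg, generation_mwh):
    """Average factor (kg/MWh): total emissions divided by total generation."""
    return np.sum(emissions_kg) / np.sum(generation_mwh)

def marginal_factor(emissions_kg, generation_mwh):
    """Marginal factor (kg/MWh): slope of a least-squares line fit to
    hour-to-hour changes in emissions vs. changes in generation."""
    d_emissions = np.diff(emissions_kg)
    d_generation = np.diff(generation_mwh)
    slope, _intercept = np.polyfit(d_generation, d_emissions, deg=1)
    return slope

# Hypothetical toy system: 100 MWh of non-emitting baseload in every hour,
# plus a load-following unit that emits 500 kg per MWh.
generation_mwh = np.array([120.0, 150.0, 180.0, 140.0, 110.0])
emissions_kg = 500.0 * (generation_mwh - 100.0)

aef = average_factor(emissions_kg, generation_mwh)
mef = marginal_factor(emissions_kg, generation_mwh)
print('average: {:.1f} kg/MWh, marginal: {:.1f} kg/MWh'.format(aef, mef))
```

In this toy system the average factor is diluted by the non-emitting baseload, while the marginal factor reflects only the unit that responds to changes in load, so the two can differ severalfold for the same hours.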

## Setup and Dependencies

This code uses Python 3. All Python-related dependencies can be installed into a
[conda environment](https://conda.io/docs/user-guide/tasks/manage-environments.html)
using the [environment.yml](./environment.yml) file.

Cloning this repository also requires [Git Large File Storage](https://git-lfs.github.com/), which is used to store some of the raw data files.

## Usage

To run all experiments, simply run the following command:
`bash run_all.sh`

You can also see pre-run results and visualizations by viewing the notebook files in this repository (which end with the extension `.ipynb`). The structure of this repository is below.

```
run_all.sh - Script to reproduce all experiments.
data
├── format_data.sh - Script to format all data.
├── cems - Folder containing raw CEMS data, formatting scripts, and notebooks.
├── metered_loads - Folder containing raw metered load data, formatting scripts, and notebooks.
├── pjm_gen_by_fuel - Folder containing raw PJM generation by fuel data, formatting scripts, and notebooks.
├── pjm_marginal_fuel - Folder containing PJM marginal fuel type data, formatting scripts, and notebooks.
├── date_helpers.py - Helper functions for parsing and formatting dates.
factor_estimates
├── estimate_factors.sh - Script to get marginal and average factor estimates based on formatted data.
├── get_factor_estimates.py - Python code to get emissions factors (see estimate_factors.sh for usage).
├── get_plots.py - Python code to plot emissions factors (see estimate_factors.sh for usage).
├── notebooks - Notebooks with visualizations and summaries of emissions factors.
interventions
├── run_interventions.sh - Script to get plots for intervention effects based on factor estimates.
├── run_intervention.py - Python code to get intervention effects (see run_interventions.sh for usage).
├── plot_intervention.py - Helper functions for plotting intervention effects.
├── monthly_dr.csv - Demand response reduction data (for demand response experiments).
├── notebooks - Notebooks with visualizations and summaries of interventions/power system examples.
si - Folder with notebooks and data to reproduce analyses in the SI.
```

## Acknowledgments

This work was supported by the National Science Foundation Graduate Research Fellowship Program under Grant No. DGE1252522, the Department of Energy Computational Science Graduate Fellowship under Grant No. DE-FG02-97ER25308, and the Center for Climate and Energy Decision Making (CEDM) through a cooperative agreement between Carnegie Mellon University and the National Science Foundation under Grant No. SES-1463492.

## Licensing

Unless otherwise stated, the source code is copyright Carnegie Mellon University and licensed under the [Apache 2.0 License](./LICENSE).

97 changes: 97 additions & 0 deletions data/cems/cems_formatting.py
@@ -0,0 +1,97 @@
import argparse
import os
import pandas as pd
import numpy as np

'''
Get clean aggregated hourly generation/emissions and the associated generation/emissions
differences between hours.
Input: Aggregated hourly data from CEMS. This data was obtained from EPA CEMS and then
aggregated by RTO/ISO or NERC region.
Output: Clean aggregated hourly data, and differenced hourly data.
'''

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--aggRegion', choices=['isorto', 'nerc'], required=True,
        help='region type to which data should be aggregated')
    parser.add_argument('--save', default='formatted_data',
        help='save folder path')
    parser.add_argument('--startYear', type=int, default=2006)
    parser.add_argument('--endYear', type=int, default=2017)
    args = parser.parse_args()

    agg_region = args.aggRegion

    # Read in CEMS data, aggregated by region. Drop rows without region label.
    print('getting emissions data')
    emissions = pd.read_csv(
        os.path.join('raw_data', 'emit_agg_by_{}.csv'.format(agg_region)))
    emissions = emissions[pd.notnull(emissions[agg_region])]

    # Convert timestamp to datetime
    # TODO: The time is actually UTC-5, not UTC. Need to change column name.
    emissions['ts'] = pd.to_datetime(emissions['ts'])

    # Organize by timestamp and region
    emissions = emissions.set_index(['ts', agg_region]).sort_index()
    emissions.index.names = ['DATE_UTC', agg_region]

    # Convert units to kg
    KG_IN_LB = 0.453592
    KG_IN_TON = 907.185
    emissions = convert_to_kg(emissions, 'lbs', KG_IN_LB)
    emissions = convert_to_kg(emissions, 'tons', KG_IN_TON)

    # Get differenced data, and format columns
    print('getting differenced data')
    diffs = get_diffs(emissions, agg_region, args.startYear, args.endYear)
    # diffs.columns = diffs.columns.map(lambda x: '{}-diffs'.format(x))

    # Save data
    print('saving data')
    save = args.save
    os.makedirs(save, exist_ok=True)
    emissions.to_csv(os.path.join(save, 'cems_{}.csv'.format(agg_region)))
    diffs.to_csv(os.path.join(save, 'cems_diffs_{}.csv'.format(agg_region)))



def convert_to_kg(df, unit_label, conversion_factor):
    old_unit_cols = [x for x in df.columns if unit_label in x]
    df[old_unit_cols] = df[old_unit_cols] * conversion_factor
    df.columns = [x.replace(unit_label, 'kg') for x in df.columns]
    return df


# Note: df must be indexed by date and aggregation region
def get_diffs(df, agg_region, start_year, end_year):

    # Reindex to ensure all hours and regions are represented
    all_hours = pd.date_range(
        start='{}-01-01'.format(start_year), end='{}-01-01'.format(end_year+1), freq='H')
    all_hours_multidx = pd.MultiIndex.from_product(
        [all_hours, df.index.get_level_values(agg_region).unique()],
        names=['DATE_UTC', agg_region])
    df = df.reindex(all_hours_multidx)

    # Sort index by region and then date
    df = df.reset_index().set_index([agg_region, 'DATE_UTC']).sort_index()

    # Take diffs and correct "spillover" between boundaries of regions
    diffs = df.diff().reset_index()
    mask = diffs[agg_region] != diffs[agg_region].shift(1)
    diffs[mask] = np.nan

    # Rearrange back to being sorted by date, then region
    diffs = diffs.set_index(['DATE_UTC', agg_region]).sort_index()

    # Drop any null diffs
    diffs = diffs.dropna(how='all')

    return diffs



if __name__=='__main__':
    main()
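
The `get_diffs` function above differences the whole table at once and then blanks the first row of each region, so that a difference is never taken across a region boundary. As a cross-check, the same idea can be sketched with a per-region `groupby(...).diff()`. This is an alternative technique, not the code from this commit: the helper name `diff_within_region` and the toy values are hypothetical, and the sketch skips the reindexing to a complete hourly grid that `get_diffs` performs, so it assumes the input has no missing hours.

```python
import pandas as pd

def diff_within_region(df, region_level):
    """Hour-to-hour differences computed separately for each region.

    Assumes df has a (timestamp, region) MultiIndex, is sorted by timestamp,
    and has no missing hours.
    """
    diffs = df.groupby(level=region_level).diff()
    # The first hour of each region has no predecessor, so its diff is NaN.
    return diffs.dropna(how='all').sort_index()

# Hypothetical toy data: two regions, three consecutive hours each.
ts = pd.to_datetime(['2017-01-01 00:00', '2017-01-01 01:00', '2017-01-01 02:00'])
index = pd.MultiIndex.from_product(
    [ts, ['PJM', 'MISO']], names=['DATE_UTC', 'isorto'])
df = pd.DataFrame(
    {'co2_kg': [10.0, 100.0, 13.0, 90.0, 11.0, 95.0]}, index=index)

diffs = diff_within_region(df, 'isorto')
print(diffs)
```

Each region contributes one fewer row of differences than it has hours, and no value mixes readings from two regions. If hours can be missing, reindexing to the full hourly grid first (as `get_diffs` does) makes a gap produce a null difference instead of a multi-hour jump.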