Swiss energy balance data introduces invalid column in annual energy balances. #430

Open
irm-codebase opened this issue Aug 26, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@irm-codebase
Contributor

irm-codebase commented Aug 26, 2024

What happened?

After updating the repo to the current version, building the energy balances causes scripts that depend on 'annual-energy-balances.csv' to fail.

This is because two new columns ('NaN', 'FERN') are introduced somewhere in the CHE processing.

>>> df.unstack()
year                                 NaN  1950  1960  1970  1978  1979  1980  1981  1982  1983  1984  1985  ...  2013  2014  2015  2016  2017  2018  2019  2020  2021  2022  2023  FERN
cat_code  carrier_code unit country                                                                         ...                                                                        
AFC       BIOE         TJ   ALB      NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN  ...   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN
                            AUT      NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN  ...   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN
                            BEL      NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN  ...   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN
                            BGR      NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN  ...   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN
                            BIH      NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN  ...   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN

To replicate, just read the dataset like most rules do:

import pandas as pd
df = pd.read_csv("build/data/annual-energy-balances.csv", index_col=["cat_code", "carrier_code", "unit", "country", "year"], header=0).squeeze()
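To see which labels are at fault without scrolling through the unstacked frame, one option is to list the 'year' index labels that do not parse as numbers. This is a minimal sketch, not code from the repository: the helper name `invalid_year_labels` is hypothetical, and the in-memory CSV is made-up sample data that only imitates the layout of 'annual-energy-balances.csv'.

```python
import io

import pandas as pd


def invalid_year_labels(balances: pd.Series) -> list:
    """Return the 'year' index labels that cannot be parsed as numbers."""
    years = pd.Series(balances.index.get_level_values("year").unique())
    # Non-numeric labels (and missing ones) become NaN after coercion.
    parsed = pd.to_numeric(years.astype(str), errors="coerce")
    return years[parsed.isna()].tolist()


# Made-up sample rows (assumption): two valid years, one stray text label,
# and one missing year, mirroring the 'FERN' and 'NaN' columns reported above.
sample_csv = io.StringIO(
    "cat_code,carrier_code,unit,country,year,value\n"
    "AFC,BIOE,TJ,CHE,2022,1.0\n"
    "AFC,BIOE,TJ,CHE,2023,2.0\n"
    "AFC,BIOE,TJ,CHE,FERN,3.0\n"
    "AFC,BIOE,TJ,CHE,,4.0\n"
)
sample = pd.read_csv(
    sample_csv,
    index_col=["cat_code", "carrier_code", "unit", "country", "year"],
    header=0,
).squeeze()

print(invalid_year_labels(sample))
```

On the real file, the same helper could be called on the series returned by the `read_csv` call above; an empty list would indicate that every year label is numeric.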

Version

1.0.0

Relevant log output

No response

@irm-codebase irm-codebase added the bug Something isn't working label Aug 26, 2024
@irm-codebase
Contributor Author

The invalid labels first appear in ch_industry_subsector_energy_use:

ipdb> ch_industry_subsector_energy_use.unstack('year').columns.unique()
Index(['2004', '2005', '2006', '2007', '2008', '2009', '2010', '2011', '2012',
       '2013', '2014', '2015', '2016', '2017', '2018', '2019', '2020', '2021',
       '2022', '2023', 'FERN', 'nan'],
      dtype='object', name='year')

@irm-codebase
Contributor Author

irm-codebase commented Aug 26, 2024

Fixes (in annual_energy_balances.py)
The core problem is that the CHE industry data is read using fixed row positions, which 'desync' as the source document is updated each year. The current values probably correspond to the 2022 document; someone updated the link, which broke the script.

  • in read_industry_subsector, update nrows=11
  • update ch_carriers to the following:
ch_carriers = {  # first row in which carriers are defined in the file
    25: "E7000",  # 'electricity',
    53: "O4000XBIO",  # 'oil',
    81: "G3000",  # 'gas',
    108: "C0000X0350-0370",  # 'solid_fuel',
    132: "W6100_6220",  # 'waste',
    156: "O4000XBIO",  # 'oil',
    198: "H8000",  # 'heat',  # purchased
    237: "R5110-5150_W6000RI",  # 'biofuel'
}
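Since the row numbers above will drift again with the next yearly update, one possible mitigation is to look the rows up by label instead of hard-coding them. The following is only a sketch under stated assumptions: `locate_carrier_rows` is a hypothetical helper, the toy sheet stands in for the real CHE file, and the labels "Electricity" and "Gas" are invented, since the actual header text in the source document is not shown in this issue.

```python
import pandas as pd


def locate_carrier_rows(raw: pd.DataFrame, labels: dict) -> dict:
    """Map the first row whose first column equals each label to its carrier code."""
    first_col = raw.iloc[:, 0].astype(str).str.strip()
    rows = {}
    for label, carrier_code in labels.items():
        matches = first_col.index[first_col == label]
        if len(matches) == 0:
            # Fail loudly instead of silently reading the wrong rows.
            raise ValueError(
                f"Carrier label {label!r} not found; has the source layout changed?"
            )
        rows[int(matches[0])] = carrier_code
    return rows


# Toy stand-in for the raw sheet (assumption): carrier headers in the first column.
toy_sheet = pd.DataFrame({0: ["Title", "Electricity", "x", "Gas", "y"]})
toy_labels = {"Electricity": "E7000", "Gas": "G3000"}

print(locate_carrier_rows(toy_sheet, toy_labels))  # → {1: 'E7000', 3: 'G3000'}
```

This version only returns the first match per label, so it would need extending for carriers that appear in more than one section, as "O4000XBIO" does in ch_carriers.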
