Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LCFS - Insert Legacy Fuel Types and Related Data #1551

Open
15 tasks
AlexZorkin opened this issue Dec 20, 2024 · 0 comments
Open
15 tasks

LCFS - Insert Legacy Fuel Types and Related Data #1551

AlexZorkin opened this issue Dec 20, 2024 · 0 comments
Assignees
Labels
Added Banana 4 Scale Label for tickets added mid-sprint Medium Medium priority Task Work that does not directly impact the user

Comments

@AlexZorkin
Copy link
Collaborator

Description:
Add new entries for legacy fuel types derived from TFRS-approved fuels. For each legacy variant, insert a fuel_type record with is_legacy = true and all related defaults (energy density, energy effectiveness ratios, carbon intensities, and associated units of measure), ensuring legacy data remains distinct from current records. This migration will transfer real values from the production TFRS lookup tables into the LCFS system through Alembic migrations, maintaining data integrity and consistency.

Purpose and benefit to user:
Allows the system to accurately represent historical or legacy fuels, enabling comparisons and analysis across multiple time periods or regulatory regimes. Users can perform comprehensive historical reporting and ensure consistency between TFRS and LCFS datasets without data duplication issues.

Acceptance Criteria:

  • Given I am a data migration engineer, when I run the Alembic migration, then all specified legacy fuel types from TFRS appear in the fuel_type table with is_legacy = true.
  • Given I am a data analyst, when I review the energy_density and energy_effectiveness_ratio tables, then corresponding legacy entries exist and correctly link to their respective legacy fuel_type IDs.
  • Given the migration is complete, when validating data integrity, then all legacy fuel types have their associated default values accurately reflecting the production TFRS lookup table values.
  • Given legacy fuel types share names with current fuels, when querying the fuel_type table, then duplicates are correctly identified and distinguished by the is_legacy flag.

Development Checklist:

  1. Preparation:

    • Review and extract legacy fuel types from the production TFRS approved_fuel_type table.
    • Identify which legacy fuel types already exist in LCFS and determine if they require duplication with is_legacy = true.
    • Gather corresponding default values for each legacy fuel type, including energy_density, energy_effectiveness_ratio, carbon_intensity, and unit_of_measure.
  2. Alembic Migration Script Creation:

    • Create a new Alembic revision for inserting legacy fuel types.
    • Define the upgrade function to:
      • Insert new rows into the fuel_type table with is_legacy = true for each legacy fuel type.
      • Insert corresponding entries into related tables (energy_density, energy_effectiveness_ratio, etc.) linked to the newly inserted legacy fuel_type IDs.
    • Define the downgrade function to:
      • Remove the inserted legacy fuel types and their related entries, ensuring rollback capability.
  3. Example Alembic Migration Script:

    from alembic import op
    import sqlalchemy as sa
    from sqlalchemy.sql import table, column
    from sqlalchemy import Integer, Text, Boolean, Numeric, Enum
    
    # Define tables for insertions
    fuel_type_table = table('fuel_type',
        column('fuel_type_id', Integer),
        column('fuel_type', Text),
        column('fossil_derived', Boolean),
        column('other_uses_fossil_derived', Boolean),
        column('default_carbon_intensity', Numeric),
        column('units', Enum),
        column('unrecognized', Boolean),
        column('is_legacy', Boolean)
    )
    
    energy_density_table = table('energy_density',
        column('fuel_type_id', Integer),
        column('density', Numeric),
        column('uom_id', Integer)
    )
    
    energy_effectiveness_ratio_table = table('energy_effectiveness_ratio',
        column('fuel_category_id', Integer),
        column('fuel_type_id', Integer),
        column('end_use_type_id', Integer),
        column('ratio', Numeric)
    )
    
    def upgrade():
        # Insert legacy fuel types
        op.bulk_insert(fuel_type_table,
            [
                {
                    'fuel_type': 'Biodiesel',
                    'fossil_derived': False,
                    'other_uses_fossil_derived': False,
                    'default_carbon_intensity': 100.21,
                    'units': 'Litres',
                    'unrecognized': False,
                    'is_legacy': True
                },
                {
                    'fuel_type': 'CNG',
                    'fossil_derived': False,
                    'other_uses_fossil_derived': True,
                    'default_carbon_intensity': 63.91,
                    'units': 'Cubic_metres',
                    'unrecognized': False,
                    'is_legacy': True
                },
                {
                    'fuel_type': 'Electricity',
                    'fossil_derived': False,
                    'other_uses_fossil_derived': True,
                    'default_carbon_intensity': 12.14,
                    'units': 'Kilowatt_hour',
                    'unrecognized': False,
                    'is_legacy': True
                },
                # Add additional legacy fuel types here...
            ]
        )
    
        # Retrieve the newly inserted fuel_type_ids
        # This typically requires selecting based on unique constraints or using RETURNING in databases that support it.
        # For simplicity, assuming fuel_type_id is auto-incremented and follows the order of insertion.
    
        # Example fuel_type_id assignments (these IDs should be dynamically retrieved in a real scenario)
        legacy_fuel_ids = {
            'Biodiesel': 101,
            'CNG': 102,
            'Electricity': 103,
            # Add additional mappings here...
        }
    
        # Insert into energy_density
        op.bulk_insert(energy_density_table,
            [
                {'fuel_type_id': legacy_fuel_ids['Biodiesel'], 'density': 35.40, 'uom_id': 1},
                {'fuel_type_id': legacy_fuel_ids['CNG'], 'density': 38.27, 'uom_id': 3},
                {'fuel_type_id': legacy_fuel_ids['Electricity'], 'density': 3.60, 'uom_id': 2},
                # Add additional energy_density entries here...
            ]
        )
    
        # Insert into energy_effectiveness_ratio
        op.bulk_insert(energy_effectiveness_ratio_table,
            [
                {'fuel_category_id': 2, 'fuel_type_id': legacy_fuel_ids['Biodiesel'], 'end_use_type_id': None, 'ratio': 1.0},
                {'fuel_category_id': 3, 'fuel_type_id': legacy_fuel_ids['CNG'], 'end_use_type_id': 4, 'ratio': 3.8},
                {'fuel_category_id': 3, 'fuel_type_id': legacy_fuel_ids['Electricity'], 'end_use_type_id': 5, 'ratio': 3.2},
                # Add additional energy_effectiveness_ratio entries here...
            ]
        )
    
    def downgrade():
        # Delete inserted energy_effectiveness_ratio entries
        op.execute("""
            DELETE FROM energy_effectiveness_ratio
            WHERE fuel_type_id IN (101, 102, 103)
        """)
    
        # Delete inserted energy_density entries
        op.execute("""
            DELETE FROM energy_density
            WHERE fuel_type_id IN (101, 102, 103)
        """)
    
        # Delete inserted fuel_type entries
        op.execute("""
            DELETE FROM fuel_type
            WHERE is_legacy = true AND fuel_type IN ('Biodiesel', 'CNG', 'Electricity')
        """)

    Notes:

    • The above script assumes specific fuel_type_id values for legacy entries (101, 102, 103, etc.). In a real-world scenario, retrieve these IDs dynamically after insertion using RETURNING or other database-specific features.
    • Ensure that all related tables (energy_density, energy_effectiveness_ratio, etc.) are correctly referenced with the appropriate fuel_type_id.
    • Extend the bulk_insert lists with all necessary legacy fuel types and their associated default values as per the production TFRS data.
    • Validate that the units field corresponds to the correct QuantityUnitsEnum values defined in the ORM model.
  4. Data Transfer Validation and Analysis:

    • After running the migration, execute queries to ensure all legacy fuel types are present with is_legacy = true.
    • Cross-verify the inserted energy_density and energy_effectiveness_ratio records against the production TFRS lookup tables to ensure accuracy.
    • Perform sample checks on several legacy fuel types to confirm that all related defaults are correctly linked and reflect the intended legacy values.
    • Utilize data validation scripts or tools to automate the comparison between TFRS source data and the migrated LCFS data.
    • Document any discrepancies found during validation and address them promptly.

Notes:

  • Legacy fuels may share names with current fuels; duplicates are allowed as long as the is_legacy flag distinguishes them.
  • Ensure that all unit_of_measure references align with the QuantityUnitsEnum defined in the ORM models.
  • Coordinate with the business area team to verify that all legacy default values are accurately captured and inserted.
  • Consider implementing logging within the migration script to track inserted records and facilitate troubleshooting if needed.
@AlexZorkin AlexZorkin added Medium Medium priority Task Work that does not directly impact the user labels Dec 20, 2024
@dhaselhan dhaselhan self-assigned this Jan 6, 2025
@dhaselhan dhaselhan added the Added Banana 4 Scale Label for tickets added mid-sprint label Jan 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Added Banana 4 Scale Label for tickets added mid-sprint Medium Medium priority Task Work that does not directly impact the user
Projects
None yet
Development

No branches or pull requests

2 participants