Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LCFS - Migrate Schedule B Records from TFRS to Fuel Supply Table in LCFS with Version Chaining #1554

Open
24 tasks
AlexZorkin opened this issue Dec 20, 2024 · 0 comments
Labels
Medium Medium priority Task Work that does not directly impact the user

Comments

@AlexZorkin
Copy link
Collaborator

AlexZorkin commented Dec 20, 2024

Description:
Develop a Groovy ETL script to migrate Schedule B records from TFRS to the fuel_supply table in LCFS. Each TFRS compliance report generates a full set of Schedule B records, including supplemental reports. The script will process compliance reports, identify supplemental chains, and compare Schedule B records to detect changes, ensuring accurate version chaining in LCFS.

Additionally, Schedule D records in TFRS must be handled uniquely. For any fuel_supply records derived from Schedule D values, the provision type will be set to GHGenius modelled.

Key tasks include:

  • Loop through compliance reports in TFRS to extract Schedule B records.
  • Identify supplemental chains and compare records across versions.
  • Establish version chains in LCFS using group_uuid and version.
  • Populate the fuel_supply table, associating records with compliance reports in LCFS using the legacy_id field.
  • Identify and process Schedule D records, setting provision type to GHGenius modelled.

Purpose and Benefit to User:
Ensures accurate migration of Schedule B data while preserving version chains for supplemental reports. This approach maintains the integrity of compliance records in LCFS and accounts for unique handling of Schedule D records for future editing scenarios.

Acceptance Criteria:

  • Given I am a developer, when the ETL process runs, all Schedule B records from TFRS are migrated to LCFS with correct associations to their compliance reports via legacy_id.
  • Given I am a developer, when the process runs, fuel_supply records are grouped by group_uuid and incremented versions are assigned for supplemental reports.
  • Given I am a developer, when the process runs, changes between supplemental reports are accurately identified and reflected in the version chains.
  • Given I am a developer, when the process runs, all required fields (quantity, fuel_type_id, provision_of_the_act_id, etc.) are correctly mapped to LCFS.
  • Given I am a developer, when the process runs, Schedule D records are identified, and their provision type is set to GHGenius modelled.
  • Given I am running the ETL process, errors are logged, and processing continues without halting for other compliance reports.

Development Checklist:

  1. Data Extraction:

    • Query TFRS compliance reports to extract Schedule B records, grouped by compliance report ID.
    • Identify supplemental chains for each compliance report.
  2. Mapping and Comparison:

    • Compare Schedule B records within each chain to detect changes (e.g., added, modified, or removed records).
    • Map TFRS fields to LCFS fuel_supply fields:
      • quantityquantity
      • fuel_type_idfuel_type_id
      • fuel_class_idfuel_category_id
      • schedule_id → determine provision type:
        • Set GHGenius modelled for Schedule D records.
        • Map other provisions appropriately based on provision_of_the_act_id.
  3. Version Chaining:

    • Assign a group_uuid to all records within a compliance report chain.
    • Increment version for each supplemental report in the chain.
    • Set action_type based on the type of change (e.g., CREATE, UPDATE, or DELETE).
  4. Data Insertion:

    • Insert mapped records into the fuel_supply table with the correct version and compliance report associations via legacy_id.
    • Ensure Schedule D-derived records have provision type set to GHGenius modelled.
  5. Error Handling and Logging:

    • Implement robust error handling to log issues without interrupting the ETL process for other records.
    • Log comparisons and detected changes between supplemental reports.
  6. Testing and Validation:

    • Verify that all records are migrated accurately, including version chaining.
    • Validate associations with compliance reports in LCFS.
    • Test edge cases, such as compliance reports with multiple supplemental versions or significant changes in Schedule B records.
    • Confirm Schedule D records are correctly flagged and provision type is set to GHGenius modelled.
  7. Documentation:

    • Document the script’s purpose, logic, and field mappings for future reference and maintenance.
    • Provide detailed guidance for running and verifying the ETL process.
    • Note future requirements for editability of Schedule D-derived records.

Notes:

  • The legacy_id field in LCFS compliance reports will match the TFRS compliance report ID for associations.
  • Schedule D records in TFRS are treated differently, with provision type set to GHGenius modelled, allowing for future edits.
  • Use existing ETL scripts (e.g., compliance reports) as references for structuring and implementing version chaining.
  • Ensure accurate handling of changes in supplemental reports to maintain data integrity.
@AlexZorkin AlexZorkin added Medium Medium priority Task Work that does not directly impact the user labels Dec 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Medium Medium priority Task Work that does not directly impact the user
Projects
None yet
Development

No branches or pull requests

1 participant