Store Reconstruction and AO2D Parameters in Metadata #13806

Open
miranov25 opened this issue Dec 14, 2024 · 1 comment

Comments

@miranov25
Contributor

Store Reconstruction and AO2D Parameters in Metadata

Problem Description

Currently, many parameters used during reconstruction and AO2D creation are hardcoded into the code or configuration files. This creates challenges in adapting to changes in encoding or decoding configurations and complicates the process of querying reconstruction parameters after production. The absence of a standardized mechanism for storing reconstruction/AO2D metadata often necessitates manual searches through logs or source code.

Proposal

Store reconstruction parameters and AO2D creation settings as metadata directly in the AO2D files. This approach offers several advantages:

  1. Flexibility: Enables adaptation to parameter changes without requiring code modifications.
    • Example: The precision of encoded delta parameters is defined as sigma/nBins. If nBins changes (e.g., reduced to 2 for better compression), decoding can automatically adapt by referencing the updated metadata.
  2. Queryability: Allows direct retrieval of parameters used during AO2D creation or reconstruction, eliminating the need for log or source code searches.
  3. Consistency: Ensures alignment between encoding and decoding settings across all production steps.
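The sigma/nBins example from the Flexibility item can be sketched as follows. This is a minimal illustration, not O2 code: the metadata keys `deltaSigma` and `deltaNBins`, the function names, and the rounding-based quantization are all assumptions made for the example.

```python
from typing import Dict


def make_metadata(sigma: float, n_bins: int) -> Dict[str, str]:
    """Record the encoding settings as strings (hypothetical key names)."""
    return {"deltaSigma": repr(sigma), "deltaNBins": str(n_bins)}


def encode_delta(delta: float, sigma: float, n_bins: int) -> int:
    """Quantize delta in steps of sigma / n_bins, the encoding precision."""
    step = sigma / n_bins
    return round(delta / step)


def decode_delta(code: int, metadata: Dict[str, str]) -> float:
    """Recover delta using the precision stored in the metadata,
    so the decoder never hardcodes n_bins."""
    sigma = float(metadata["deltaSigma"])
    n_bins = int(metadata["deltaNBins"])
    return code * (sigma / n_bins)
```

If a later production changes nBins, only the metadata written alongside the data changes; `decode_delta` stays the same and still reproduces each value to within half a quantization step.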

Background

As discussed in the emails summarized below, using the CCDB to store such metadata may not be ideal:

  • The CCDB is designed to manage properties tied to data collection (e.g., time of data taking).
  • Reconstruction and AO2D parameters pertain to the production process rather than the raw data, requiring a separate mechanism.

Instead, embedding these parameters directly in AO2D metadata would provide a more robust solution. Similarly, metadata for reconstruction outputs (e.g., tpctracks, combinedtracks, itstracks) should also be included in their respective output files.

Suggested Implementation

  1. AO2D Metadata:
    • Add a section in AO2D files for reconstruction parameters, selections, and switches.
    • Ensure these parameters are easily accessible for future queries.
  2. Reconstruction Outputs:
    • Include metadata in files such as tpctracks and itstracks to record reconstruction settings.
    • Use minimal additional space by employing compact formats (e.g., JSON).
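One way to keep the space overhead small is to serialize the settings as compact JSON and store the result as a single metadata string. The sketch below assumes a plain nested dictionary of parameters; the parameter names and the `recoParameters` key are hypothetical and do not correspond to actual O2 configurables.

```python
import json


def pack_parameters(params: dict) -> str:
    """Serialize parameters as compact JSON (no extra whitespace, sorted keys)."""
    return json.dumps(params, sort_keys=True, separators=(",", ":"))


def unpack_parameters(text: str) -> dict:
    """Restore the parameter dictionary from its JSON string."""
    return json.loads(text)


# Hypothetical reconstruction settings, for illustration only.
reco_params = {
    "tpc": {"deltaNBins": 2, "deltaSigma": 0.5},
    "its": {"minClusters": 4},
}

# One metadata entry per output file (hypothetical key name).
file_metadata = {"recoParameters": pack_parameters(reco_params)}
```

Sorting the keys makes the stored string deterministic, so two files produced with identical settings carry identical metadata and can be compared directly.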

Benefits

  • Provides a centralized and accessible way to track production settings.
  • Reduces reliance on manual log searches or hardcoded assumptions.
  • Avoids potential mismatches between encoding and decoding parameters in future data analyses.

Related Discussions

Email 1 (Summary):
Discusses the challenge of managing parameters such as sigma/nBins for encoding and decoding. Including these settings in the AO2D metadata would allow flexibility for future changes, such as adjusting nBins to reduce data volume.

Email 2 (Summary):
Storing reconstruction parameters in metadata provides significant advantages over CCDB for production-specific settings. The CCDB is designed for data-taking conditions, not reconstruction configurations. A metadata-based solution offers a scalable, space-efficient, and queryable approach to managing reconstruction and AO2D parameters.

@miranov25
Contributor Author

Posting an older e-mail exchange with David:

Hi Marian,

Thank you for your detailed explanation and suggestions. I understand your points, and I agree that the disk space overhead for adding metadata is minimal. Having the metadata saved at the same time as the generation ensures a 1-to-1 correspondence, which is highly convenient and eliminates the need for additional CCDB maintenance.

As far as I understand, the AO2D metadata is implemented as a map<string, string> and is flexible enough to support additions without requiring changes to the O2 codebase. It should be straightforward to add entries to this map during AO2D production, which can then be queried during analysis. I suggest you explore this option further.
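A string-to-string map of this kind could be filled at production time and queried during analysis roughly as sketched here. A Python dict stands in for the map<string, string>; the dotted key convention and the helper name `flatten` are assumptions for illustration, not part of the O2 codebase.

```python
from typing import Dict


def flatten(params: dict, prefix: str = "") -> Dict[str, str]:
    """Flatten nested parameters into dotted string keys with string values."""
    flat: Dict[str, str] = {}
    for key, value in params.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = str(value)
    return flat


# Production side: add entries to the metadata map (hypothetical names).
metadata = flatten({"tpc": {"deltaNBins": 2, "deltaSigma": 0.5}})

# Analysis side: query the stored value and convert it back to a number.
n_bins = int(metadata["tpc.deltaNBins"])
```

Because every value is stored as a string, the reader has to know the intended type of each key; a missing key raises `KeyError` rather than silently falling back to a hardcoded default.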


Context and Discussion

Email Excerpt from Marian on 25 Nov 2024:

Hello David and all,

Adding David, Sandro, and Andreas to the discussion as we addressed similar topics about a year ago during the reconstruction scan at GSI.

In my opinion, using the CCDB to store AO2D parameters (reconstruction parameters) is not suitable for this particular case.

The CCDB is designed to characterize properties tied to the time of data taking. However, in this context, we are looking to parameterize settings used during reconstruction or AO2D creation (production, test characterization). These settings are not attributes of the data collection period but rather pertain to the parameters, selections, and switches applied during reconstruction and AO2D preparation.

To keep track of these properties, I believe all relevant parameters, switches, and their default values should be stored as metadata within the AO2D files. This would make them easily accessible for later queries and provide a cleaner, more structured approach than the current system.

I propose extending this approach to the reconstruction outputs as well (e.g., tpctracks, combinedtracks, itstracks). The reconstruction output properties depend on the parameters used, and currently, retrieving these settings involves a cumbersome process of searching through log files (for changed parameters) and source code (for a specific GitHub version). This is inefficient and prone to errors.

Storing parameter values in metadata would require minimal additional space and is, in my opinion, the cleanest strategy for both AO2D and reconstruction outputs. In the past, we have encountered problems due to the lack of standardization in handling parameters during parameter scans, and it would be wise to avoid similar problems in the future. While we implemented a schema using JSON files (together with Jens) and I developed log parsers to support parameter tracking, David (Rohr) later introduced a workaround for TPC tracking. However, this workaround is not a general or scalable solution for managing parameter metadata.

Unless CCDB functionality can be adapted for parameter storage using appropriate keys, I strongly advocate storing these parameters as metadata alongside the reconstruction output (one set of metadata per file).

Please let me know your thoughts on this proposal and whether the CCDB approach you suggested could fulfill these requirements.

Response Summary:

As discussed, using AO2D metadata to store reconstruction parameters seems like the most viable and efficient approach. This avoids the limitations of CCDB for such purposes and provides a clean, flexible way to manage these parameters.

Thank you and best regards,
David
