Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New Access type READ_LINEAR #1291

Merged
merged 14 commits into from
Feb 13, 2023
Merged

New Access type READ_LINEAR #1291

merged 14 commits into from
Feb 13, 2023

Conversation

franzpoeschel
Copy link
Contributor

@franzpoeschel franzpoeschel commented Jun 9, 2022

Situation:
Using ADIOS2 steps in openPMD was always a bit challenging due to slightly incompatible IO workflows.
Traditionally in openPMD, we parse a Series upon opening it and users expect to use random-access for navigating through the Series. Alternatively, in ADIOS2 one can parse each iteration only after opening its corresponding IO step. With an IO step opened, ADIOS2 will only show the data pertaining to exactly that step and no other data (except for attributes that are designed to be constant). With the novel BP5 engine, this distinction is now mandatory to make since it has two access modes: adios2::Mode::Read and adios2::Mode::ReadRandomAccess.

Also adds the missing Append mode to the Python bindings.

This PR adds a new read mode READ_LINEAR and slightly restricts the usage of READ_ONLY:

  • READ_ONLY corresponds to our traditional workflow of parsing everything up-front and then not again. It is implemented in the backend via adios2::Mode::ReadRandomAccess.
    New in this PR: If the engine supports it at all, the Series will be parsed before opening any IO step. In variable-based iteration encoding, this means that data from a later iteration might leak into the first.
    Iterations that were parsed up front will not be parsed again upon opening their corresponding IO step.
    Optionally: Stop using IO steps altogether in this access mode in front- and backend. That would be API breaking for streaming (all other API breakage of this PR only affects the experimental new ADIOS2 schema), but with streaming, the other read mode makes more sense anyway.
  • READ_LINEAR: Don't parse anything upon opening the Series. To get any data, Series::readIterations() must be used.

TODO

  • Don't use steps at all in READ_ONLY (-> breaking change since this makes READ_LINEAR required for streaming workflows)
    → Probably best for the next release: Give a runtime warning when using READ_ONLY in contexts that should require READ_LINEAR (such as streaming), and disable the functionality altogether in the release after that
  • Memory optimization: Delete old iterations in READ_LINEAR mode (also ok for a future PR)
  • Merge Parsing logic: fail gracefully on unexpected input #1237 first
  • ADIOS 2.7 compatibility
  • Actually test BP5, especially with old iteration encoding
  • Documentation
  • Test that Streaming keeps working with READ_ONLY mode for now
  • What happens if writing a Series with /data/snapshot attribute and then reading in random-access mode
    -> /data/snapshot gets properly ignored, bp4_steps tests this
  • Ignore /data/snapshot attribute if the backend reports that it prefers upfront parsing

Commits:

Basic implementation
9a7fab7 Prepare scaffolding for Access::ReadLinearly
6f0e74c Adapt frontend to linear read mode
ed40837 Make OPEN_FILE task parameters writable
73bae08 ADIOS2 implementation

Testing and bindings
0b64fa2 Necessary fixes for tests
8dad845 Add and adapt Python bindings
0ff801b Testing

Documentation
b06d257 Documentation: C++ and Python
fbca5e8 rst documentation

Allow parsing global attributes before opening first iteration¹
c841156 Make SeriesIterator into a handle
357e487 Reading attributes without needing to open the first step

Workaround for ADIOS2 aggregation issue
891cba0 Don't write attributes too early

Delete old iterations from data structures ²
dd4228a Delete old iterations in READ_LINEAR mode
87d2004 DEREGISTER task

¹) In READ_LINEAR, the Series object is completely empty after constructing it. This is weird when we want to read global attributes:

Series s("data.bp", Access::READ_LINEAR);
// no attributes available
for(auto iteration: s.readIterations())
{
    // Attributes are now readable
    // It's weird having to open the first iteration for this
}

Solution:

Series s("data.bp", Access::READ_LINEAR);
// no attributes available
s.readIterations();
// global attributes can now be read, no iterations yet accessible
for(auto iteration: s.readIterations()) ...

²) In READ_LINEAR mode, iterations can only be accessed via series.readIterations(). We should not store them into series.iterations (except the currently active one), to avoid (1) users getting confused why old iterations are still there and (2) slowly rising memory usage in long-running simulation workflows.

@lgtm-com
Copy link
Contributor

lgtm-com bot commented Jun 9, 2022

This pull request introduces 2 alerts when merging 77522f9 into 20e271b - view on LGTM.com

new alerts:

  • 2 for Missing return statement

@lgtm-com
Copy link
Contributor

lgtm-com bot commented Jun 10, 2022

This pull request introduces 2 alerts when merging 018608b into 20e271b - view on LGTM.com

new alerts:

  • 2 for Missing return statement

@lgtm-com
Copy link
Contributor

lgtm-com bot commented Jun 13, 2022

This pull request introduces 2 alerts when merging 5b63442 into 20e271b - view on LGTM.com

new alerts:

  • 2 for Missing return statement

@franzpoeschel franzpoeschel marked this pull request as ready for review June 13, 2022 15:00
@franzpoeschel franzpoeschel force-pushed the topic-linear-read branch 2 times, most recently from d581733 to 43d1316 Compare June 17, 2022 13:56
@franzpoeschel franzpoeschel force-pushed the topic-linear-read branch 2 times, most recently from a62161d to ff8f8e8 Compare June 22, 2022 11:37
@franzpoeschel franzpoeschel force-pushed the topic-linear-read branch 3 times, most recently from 35ae314 to be43a49 Compare July 11, 2022 10:27
@franzpoeschel franzpoeschel force-pushed the topic-linear-read branch 2 times, most recently from df20208 to 148706d Compare July 19, 2022 09:57
franzpoeschel added a commit to franzpoeschel/openPMD-api that referenced this pull request Jul 25, 2022
Currently only available for BP5 engine, will be generalized into Linear
read mode in openPMD#1291.
If the backend does not support the snapshot attribute, then iterate in
ascending order, skipping duplicate and non-linear iteration indices.
Not possible if the Series is parsed ahead of time.
@franzpoeschel franzpoeschel force-pushed the topic-linear-read branch 4 times, most recently from 54aec65 to 5681b62 Compare July 27, 2022 12:20
franzpoeschel added a commit to franzpoeschel/openPMD-api that referenced this pull request Jul 29, 2022
Currently only available for BP5 engine, will be generalized into Linear
read mode in openPMD#1291.
If the backend does not support the snapshot attribute, then iterate in
ascending order, skipping duplicate and non-linear iteration indices.
Not possible if the Series is parsed ahead of time.
franzpoeschel added a commit to franzpoeschel/openPMD-api that referenced this pull request Aug 15, 2022
Currently only available for BP5 engine, will be generalized into Linear
read mode in openPMD#1291.
If the backend does not support the snapshot attribute, then iterate in
ascending order, skipping duplicate and non-linear iteration indices.
Not possible if the Series is parsed ahead of time.
m_mode = adios2::Mode::ReadRandomAccess;
// Overwrite the old IO object since it is stained with the
// objects of the now closed engine
create_IO();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Problem: This deletes all configuration

src/ReadIterations.cpp Fixed Show fixed Hide fixed
ReadIterations::ReadIterations(Series series) : m_series(std::move(series))
{}
ReadIterations::ReadIterations(
Series series,

Check notice

Code scanning / CodeQL

Large object passed by value

This parameter of type [Series](1) is 80 bytes - consider passing a const pointer/reference instead.
Use access::readOnly for runtime checks in backends
Needed so backends can give feedback on their preferred parse mode.
Commit with proper tests will follow later, this is just to highlight
the API changes brought forth by this PR (API changes only for things
that were dev and/or experimental)
Copy link
Member

@ax3l ax3l left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great to me 🚀 ✨

Thank you also for the clear structure for reviewing 🙏

@@ -4923,6 +4925,17 @@ TEST_CASE("bp4_steps", "[serial][adios2]")
bp4_steps("../samples/nullcore.bp", nullcore, "");
bp4_steps("../samples/bp4steps_default.bp", "{}", "{}");

// bp4_steps(
Copy link
Member

@ax3l ax3l Feb 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just flagging: these tests are still commented - intentional?
Remove dead code in favor of a comment?

ParseMode::WithSnapshot,
jsonConfigOld);
// This test config does not make sense
// append_mode(
Copy link
Member

@ax3l ax3l Feb 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just flagging: this test is still commented - intentional?
Remove and just leave a comment?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants