Create PathProvider that uses numtracker #991

Open
callumforrester opened this issue Jan 15, 2025 · 7 comments
Labels
enhancement New feature or request

Comments

@callumforrester
Contributor

callumforrester commented Jan 15, 2025

We now have an instance of numtracker deployed at https://numtracker.diamond.ac.uk. We should write a PathProvider, iterating on StaticVisitPathProvider, that makes use of it.
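
A minimal sketch of the shape this could take, assuming the ophyd-async PathProvider protocol (a callable returning PathInfo). The NumtrackerPathProvider name, the HTTP endpoint, and the response fields are all hypothetical; the real numtracker API may differ:

from pathlib import Path

import requests
from ophyd_async.core import PathInfo, PathProvider


class NumtrackerPathProvider(PathProvider):
    """Hypothetical provider that asks numtracker for a directory and scan number."""

    def __init__(self, url: str, instrument: str, visit: str) -> None:
        self._url = url
        self._instrument = instrument
        self._visit = visit
        self._current: PathInfo | None = None

    def update(self) -> None:
        # Illustrative request/response shape only: the real service API may differ
        response = requests.post(
            f"{self._url}/scan",
            json={"instrument": self._instrument, "visit": self._visit},
        )
        response.raise_for_status()
        info = response.json()
        self._current = PathInfo(
            directory_path=Path(info["directory"]),
            filename=f"{self._instrument}-{info['scan_number']}",
        )

    def __call__(self, device_name: str | None = None) -> PathInfo:
        if self._current is None:
            raise RuntimeError("update() must be called before the first run")
        return self._current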

Acceptance Criteria

  • TBD
callumforrester added the enhancement label Jan 15, 2025
@callumforrester
Contributor Author

@DiamondJoseph Where are we going to get visit information from? Does it still have to be hard-coded for the time being?

@DiamondJoseph
Contributor

@callumforrester I believe we thought that the visit should be passed as part of the request to blueapi, or as an extra param when logging in that is cached and passed with every request. That way we could collect data to a commissioning visit, then switch to the live visit once the beamline is configured.

@callumforrester
Contributor Author

In which case, do we start here? DiamondLightSource/blueapi#552

@DiamondJoseph
Contributor

I believe that ticket is about passing the metadata into the documents, not how it is passed into the run/pod (e.g. instrument could be a configuration value that gets mounted as an env var: it doesn't need to be passed into each run command; the scan number comes from numtracker and shouldn't (?) be overridden).

I think the ticket you're looking for is #452

@DiamondJoseph
Contributor

Although #452 probably belongs in blueapi

@callumforrester
Contributor Author

To get this straight...

The current problem is that we have a messy decorator on our plans that triggers the global PathProvider defined in dodal to update itself. This breaks if you are not using ophyd-async devices (see DiamondLightSource/blueapi#784) and is less portable.

The current idea is to write a PathProvider that hooks into the RunEngine's start hook, so that whenever a run starts, the scan counter increments. See below:

sequenceDiagram
    Client->>RunEngine: run plan
    RunEngine->>Scan Number Source: next number?
    Scan Number Source->>RunEngine: <number>
    RunEngine->>Detector: prepare
    Detector->>PathProvider: path?
    PathProvider->>RunEngine: number?
    RunEngine->>PathProvider: <number>
    PathProvider->>Detector: <path(<number>)>
    RunEngine->>Client: run start(<number>)

For added context, we are thinking that the scan number source will be https://github.com/DiamondLightSource/numtracker.
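
One concrete hook that already exists in bluesky is the RunEngine's scan_id_source argument: a callable given the metadata dict, returning the next scan number. A sketch, where the numtracker endpoint and response shape are assumptions:

import requests
from bluesky.run_engine import RunEngine


def numtracker_scan_id_source(md: dict) -> int:
    # Hypothetical endpoint and payload: ask numtracker for the next scan number
    response = requests.post(
        "https://numtracker.diamond.ac.uk/scan",
        json={"instrument": md.get("instrument")},
    )
    response.raise_for_status()
    return response.json()["scan_number"]


RE = RunEngine(scan_id_source=numtracker_scan_id_source)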

We would create and inject this provider in blueapi (where the RunEngine is). dodal modules would default to a static path provider that writes to, for example, /tmp, and blueapi would override it.
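
A sketch of that split, assuming dodal's set_path_provider helper and ophyd-async's StaticPathProvider/StaticFilenameProvider; the override reuses the hypothetical NumtrackerPathProvider sketched earlier in this thread:

from pathlib import Path

from dodal.common.beamlines.beamline_utils import set_path_provider
from ophyd_async.core import StaticFilenameProvider, StaticPathProvider

# In a dodal beamline module: a safe default for local development
set_path_provider(
    StaticPathProvider(StaticFilenameProvider("ixx-dev"), Path("/tmp"))
)

# In blueapi, before instantiating devices: swap in the numtracker-backed provider
set_path_provider(
    NumtrackerPathProvider(
        url="https://numtracker.diamond.ac.uk",
        instrument="ixx",
        visit="cm12345-1",
    )
)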

Take a simple use case: a single plan running a single scan (run) with a single detector. This would call the scan number source once, get a new number, and produce an HDF5 file with that number and a RunStart document with the same number.

from bluesky.plans import count
from bluesky.utils import MsgGenerator
from ophyd_async.core import StandardDetector


def very_simple_plan(det: StandardDetector) -> MsgGenerator[None]:
    yield from count([det], num=5)

# Output files
# |-> ixx-1234-det.h5
# |-- ixx-1234.nxs

Downstream we would process documents associated with each RunStart into a single NeXus file, so this scenario would still result in a single NeXus file with the same scan number (1234).

Bluesky supports multiple runs per plan, for example:

def simple_plan(det: StandardDetector) -> MsgGenerator[None]:
    yield from count([det], num=5)
    yield from count([det], num=10)

# Output files
# |-> ixx-1235-det.h5
# |-- ixx-1235.nxs
#
# |-> ixx-1236-det.h5
# |-- ixx-1236.nxs

Each count plan separately prepares and unstages the detectors, meaning that a new file and scan number are produced for each run.

The other use case we need to support is multiple runs linking to the same file, primarily for detector performance reasons. For example:

import bluesky.plan_stubs as bps


def plan(det: StandardDetector) -> MsgGenerator[None]:
    yield from bps.stage(det)

    # Actually causes the HDF5 AD plugin to open the file for writing
    yield from bps.prepare(det, ...)
    for num_frames in [5, 8, 2]:
        yield from bps.open_run()
        for _ in range(num_frames):
            yield from bps.one_shot([det])
        yield from bps.close_run()
    yield from bps.unstage(det)

# Output files
# |-> ixx-1237-det.h5
# |-- ixx-1237.nxs
# |-- ixx-1238.nxs
# |-- ixx-1239.nxs

This leaves us with default behaviour that is largely customizable.

  • Every run has a unique ID in accordance with DLS scan numbers, if using blueapi.
  • Every run results in a NeXus file; this can be changed by configuring the downstream NeXus writer service or writing your own.
  • When multiple runs reference the same file, the HDF5 file name is derived from only one of the run numbers; this can be changed by writing your own PathProvider that names things accordingly.
  • For developing/debugging devices directly in a Python shell, dodal has a default global PathProvider that writes to a sensible default location (e.g. /tmp). We would need to document this clearly. This can be changed by setting the global singleton prior to instantiating devices.

Tagging @DominicOram and @olliesilvester for thoughts about how well this would support MX use cases.

@DominicOram
Contributor

This works for some of our use cases, but we will also need the ability for one run to produce one HDF5 file but multiple NeXus files. This is because we have one hardware fly scan that we want to split into two NeXus files. We currently do this with something like:

def plan(det: StandardDetector) -> MsgGenerator[None]:
    yield from bps.stage(det)

    # Actually causes the HDF5 AD plugin to open the file for writing
    yield from bps.prepare(det, ...)
    yield from bps.open_run(md={"nxs_indexes": [0, 100]})
    yield from bps.kickoff(det)
    yield from bps.complete(det)
    yield from bps.close_run()
    yield from bps.unstage(det)

# Output files
# |-> sample_name_1_000001.h5
# |-> sample_name_1_000002.h5
# |-- sample_name_1.nxs
# |-- sample_name_2.nxs

I think this is covered by "Every run results in a NeXus file; this can be changed by configuring the downstream NeXus writer service or writing your own."

There is also an added complication, which you haven't covered above: the Eiger will actually spit out multiple h5 files, as it will only put a maximum of 1000 frames in each. I don't think this makes a difference to your assumptions, but it's worth mentioning.
