Create PathProvider that uses numtracker #991

Open
callumforrester opened this issue Jan 15, 2025 · 7 comments
Labels
enhancement New feature or request

Comments

@callumforrester
Contributor

callumforrester commented Jan 15, 2025

We now have an instance of numtracker deployed at https://numtracker.diamond.ac.uk. We should write a PathProvider, iterating on StaticVisitPathProvider, that makes use of it.
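
A minimal sketch of the shape this could take, assuming the ophyd-async PathProvider protocol (a callable returning PathInfo). The NumtrackerPathProvider name, the HTTP endpoint, and the response fields are all hypothetical; the real numtracker API may differ:

from pathlib import Path

import requests
from ophyd_async.core import PathInfo, PathProvider


class NumtrackerPathProvider(PathProvider):
    """Hypothetical provider that asks numtracker for a directory and scan number."""

    def __init__(self, url: str, instrument: str, visit: str) -> None:
        self._url = url
        self._instrument = instrument
        self._visit = visit
        self._current: PathInfo | None = None

    def update(self) -> None:
        # Illustrative request/response shape only: the real service API may differ
        response = requests.post(
            f"{self._url}/scan",
            json={"instrument": self._instrument, "visit": self._visit},
        )
        response.raise_for_status()
        info = response.json()
        self._current = PathInfo(
            directory_path=Path(info["directory"]),
            filename=f"{self._instrument}-{info['scan_number']}",
        )

    def __call__(self, device_name: str | None = None) -> PathInfo:
        if self._current is None:
            raise RuntimeError("update() must be called before the first run")
        return self._current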

Acceptance Criteria

  • TBD
callumforrester added the enhancement label Jan 15, 2025
@callumforrester
Contributor Author

@DiamondJoseph Where are we going to get visit information from? Does it still have to be hard-coded for the time being?

@DiamondJoseph
Contributor

@callumforrester I believe we thought that the visit should be passed as part of the request to blueapi, or as an extra param when logging in that is cached and passed with every request. That way we could collect data to a commissioning visit, then switch to the live visit once the beamline is configured.

@callumforrester
Contributor Author

In which case, do we start here? DiamondLightSource/blueapi#552

@DiamondJoseph
Contributor

I believe that ticket is about passing the metadata into the documents, not how it is passed into the run/pod (e.g. instrument could be a configuration value that gets mounted as an env var: it doesn't need to be passed into each run command; the scan number comes from numtracker and shouldn't (?) be overridden).

I think the ticket you're looking for is #452

@DiamondJoseph
Contributor

Although #452 probably belongs in blueapi

@callumforrester
Contributor Author

To get this straight...

The current problem is that we have a messy decorator on our plans that triggers the global PathProvider defined in dodal to update itself. This breaks if you are not using ophyd-async devices (see DiamondLightSource/blueapi#784) and is less portable.

The current idea is to write a PathProvider that hooks into the RunEngine's start hook, so that whenever a run starts, the scan counter increments. See below:

sequenceDiagram
    Client->>RunEngine: run plan
    RunEngine->>Scan Number Source: next number?
    Scan Number Source->>RunEngine: <number>
    RunEngine->>Detector: prepare
    Detector->>PathProvider: path?
    PathProvider->>RunEngine: number?
    RunEngine->>PathProvider: <number>
    PathProvider->>Detector: <path(<number>)>
    RunEngine->>Client: run start(<number>)

For added context, we are thinking that the scan number source will be https://github.com/DiamondLightSource/numtracker.
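
One concrete hook that already exists in bluesky is the RunEngine's scan_id_source argument: a callable given the metadata dict, returning the next scan number. A sketch, where the numtracker endpoint and response shape are assumptions:

import requests
from bluesky.run_engine import RunEngine


def numtracker_scan_id_source(md: dict) -> int:
    # Hypothetical endpoint and payload: ask numtracker for the next scan number
    response = requests.post(
        "https://numtracker.diamond.ac.uk/scan",
        json={"instrument": md.get("instrument")},
    )
    response.raise_for_status()
    return response.json()["scan_number"]


RE = RunEngine(scan_id_source=numtracker_scan_id_source)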

We would create and inject this provider in blueapi (where the RunEngine is). dodal modules would default to a static path provider that writes to, for example, /tmp, and blueapi would override it.
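
A sketch of that split, assuming dodal's set_path_provider helper and ophyd-async's StaticPathProvider/StaticFilenameProvider; the override reuses the hypothetical NumtrackerPathProvider sketched earlier in this thread:

from pathlib import Path

from dodal.common.beamlines.beamline_utils import set_path_provider
from ophyd_async.core import StaticFilenameProvider, StaticPathProvider

# In a dodal beamline module: a safe default for local development
set_path_provider(
    StaticPathProvider(StaticFilenameProvider("ixx-dev"), Path("/tmp"))
)

# In blueapi, before instantiating devices: swap in the numtracker-backed provider
set_path_provider(
    NumtrackerPathProvider(
        url="https://numtracker.diamond.ac.uk",
        instrument="ixx",
        visit="cm12345-1",
    )
)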

Take a simple use case: a single plan running a single scan (run) with a single detector. This would call the scan number source once, get a new number, and produce an HDF5 file with that number and a RunStart document with the same number.

from bluesky.plans import count
from bluesky.utils import MsgGenerator
from ophyd_async.core import StandardDetector


def very_simple_plan(det: StandardDetector) -> MsgGenerator[None]:
    yield from count([det], num=5)

# Output files
# |-> ixx-1234-det.h5
# |-- ixx-1234.nxs

Downstream we would process documents associated with each RunStart into a single NeXus file, so this scenario would still result in a single NeXus file with the same scan number (1234).

Bluesky supports multiple runs per plan, for example:

def simple_plan(det: StandardDetector) -> MsgGenerator[None]:
    yield from count([det], num=5)
    yield from count([det], num=10)

# Output files
# |-> ixx-1235-det.h5
# |-- ixx-1235.nxs
#
# |-> ixx-1236-det.h5
# |-- ixx-1236.nxs

Each count plan separately prepares and unstages the detectors, meaning that a new file and scan number are produced for each run.

The other use case we need to support is multiple runs linking to the same file, primarily for detector performance reasons. For example:

import bluesky.plan_stubs as bps


def plan(det: StandardDetector) -> MsgGenerator[None]:
    yield from bps.stage(det)

    # Actually causes the HDF5 AD plugin to open the file for writing
    yield from bps.prepare(det, ...)
    for num_frames in [5, 8, 2]:
        yield from bps.open_run()
        for _ in range(num_frames):
            yield from bps.one_shot([det])
        yield from bps.close_run()
    yield from bps.unstage(det)

# Output files
# |-> ixx-1237-det.h5
# |-- ixx-1237.nxs
# |-- ixx-1238.nxs
# |-- ixx-1239.nxs

This leaves us with default behaviour that is largely customizable.

  • Every run has a unique ID in accordance with DLS scan numbers, if using blueapi.
  • Every run results in a NeXus file; this can be changed by configuring the downstream NeXus writer service or writing your own.
  • When multiple runs reference the same file, the HDF5 file name is derived from only one of the run numbers; this can be changed by writing your own PathProvider that names things accordingly.
  • For developing/debugging devices directly in a Python shell, dodal has a default global PathProvider that writes to a sensible default location (e.g. /tmp). We would need to document this clearly. This can be changed by setting the global singleton prior to instantiating devices.

Tagging @DominicOram and @olliesilvester for thoughts about how well this would support MX use cases.

@DominicOram
Contributor

This works for some of our use cases, but we will also need the ability for one run to produce one HDF5 file but multiple NeXus files. This is because we have one hardware fly scan that we want to split into two NeXus files. We currently do this with something like:

def plan(det: StandardDetector) -> MsgGenerator[None]:
    yield from bps.stage(det)

    # Actually causes the HDF5 AD plugin to open the file for writing
    yield from bps.prepare(det, ...)
    yield from bps.open_run(md={"nxs_indexes": [0, 100]})
    yield from bps.kickoff(det)
    yield from bps.complete(det)
    yield from bps.close_run()
    yield from bps.unstage(det)

# Output files
# |-> sample_name_1_000001.h5
# |-> sample_name_1_000002.h5
# |-- sample_name_1.nxs
# |-- sample_name_2.nxs

I think this is covered by "Every run results in a NeXus file; this can be changed by configuring the downstream NeXus writer service or writing your own."

There is also an added complication, which you haven't covered above: the Eiger will actually spit out multiple h5 files, as it will only put a maximum of 1000 frames in each. I don't think this makes a difference to your assumptions, but it's worth mentioning.
