Lobster Tries to Stream Gridpacks with XRootD #629

Open
klannon opened this issue Mar 14, 2018 · 2 comments


klannon commented Mar 14, 2018

As encountered by @Andrew42, when running with MultiProductionDataset, Lobster blithely decides that it should stream gridpack files, even though CMSSW doesn't know how to do that. This leads to the gridpack file being passed into the config as root://deepthrought.crc.nd.edu://.... A workaround is to disable streaming, but if we are doing multistage production (i.e. GEN-SIM+DIGI-RECO+MiniAOD), that will mean none of the steps can stream inputs, since disable_input_streaming is a global parameter of StorageConfiguration. It would be nice to have finer-grained control over XRootD streaming so that we could stream some input files but not others.

I can think of two options for accomplishing this:

  • Quick and Dirty: Provide an option for enabling streaming only for files that match a particular pattern. I would probably make the default value be .*\.root$ or something like that, so that only files that end in .root would be streamed unless the user changed that behavior.
  • Bigger Re-Engineering: We could make the StorageConfiguration object a property of the Workflow instead of the global Config for the whole Lobster run. This has the benefit of providing a lot more flexibility as each Workflow can have a separate input and output config, but I think this would require a major re-engineering of Lobster because every time files were being accessed (e.g. even in the master) you'd need to know which Workflow those files were coming from and load the appropriate config.
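To make the "Quick and Dirty" option concrete, here is a rough sketch of pattern-based selection. The helper names (`should_stream`, `split_inputs`) and the example paths are made up for illustration and are not part of Lobster; only the default pattern comes from the proposal above.

```python
import re

# Default from the proposal: stream only files whose names end in .root.
DEFAULT_STREAM_PATTERN = r".*\.root$"


def should_stream(filename, pattern=DEFAULT_STREAM_PATTERN):
    """Return True if `filename` matches the streaming pattern (hypothetical helper)."""
    return re.match(pattern, filename) is not None


def split_inputs(filenames, pattern=DEFAULT_STREAM_PATTERN):
    """Partition inputs into (streamed, staged) lists (hypothetical helper)."""
    streamed = [f for f in filenames if should_stream(f, pattern)]
    staged = [f for f in filenames if not should_stream(f, pattern)]
    return streamed, staged


# Made-up example paths: the ROOT file is streamed, the gridpack is staged in.
streamed, staged = split_inputs([
    "/store/user/example/events_1.root",
    "/store/user/example/gridpack.tar.xz",
])
```

With the default pattern, a gridpack tarball would fall into the staged list and never be handed to CMSSW as an XRootD URL, while a user could still override the pattern.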

Although I like the thought of being more flexible, I'm leaning towards the "Quick and Dirty" solution. I suppose another response would be that nothing's broken, so don't fix it. It's a feature, not a bug. Input welcome, especially from @annawoodard and @matz-e!


annawoodard commented Mar 14, 2018

One alternative quick and dirty approach that would be more flexible than pattern matching but similarly simple would be to make disable_input_streaming a property of the Workflow (passed as an argument in the constructor) instead of the StorageConfiguration (so do not completely re-engineer everything, just that one property). Then, instead of setting parameters['disable streaming'] where it is set now, you would set it in Workflow.adjust. Note that if you go that route, it would probably make sense to also make disable_stage_in_acceleration a property of the Workflow.
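As a minimal sketch of that idea (a toy stand-in class, not Lobster's real Workflow, whose constructor and adjust signature are not shown in this thread): the flag is passed to the constructor and written into the task parameters in adjust. The 'disable stage-in acceleration' key name is a guess.

```python
class Workflow:
    """Toy stand-in for a workflow that owns its own streaming flags."""

    def __init__(self, label, disable_input_streaming=False,
                 disable_stage_in_acceleration=False):
        self.label = label
        self.disable_input_streaming = disable_input_streaming
        self.disable_stage_in_acceleration = disable_stage_in_acceleration

    def adjust(self, parameters):
        """Write this workflow's own flags into its task parameters."""
        # Key taken from the comment above.
        parameters['disable streaming'] = self.disable_input_streaming
        # Hypothetical key name, chosen here only for illustration.
        parameters['disable stage-in acceleration'] = (
            self.disable_stage_in_acceleration)
        return parameters


# The gridpack-consuming step opts out of streaming; later steps keep it.
gen_sim = Workflow('gen_sim', disable_input_streaming=True)
digi_reco = Workflow('digi_reco')

gen_params = gen_sim.adjust({})
reco_params = digi_reco.adjust({})
```

In this arrangement only the step that reads the gridpack disables streaming, and the remaining steps of a multistage production keep the default.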

I think that would completely solve this specific problem. So the next question would be: what are the other use cases of the bigger re-engineering approach, and are they worth the development effort?


klannon commented Mar 14, 2018

@annawoodard: I like that suggestion. That's what I'll plan to do, unless I run into a problem when I start working out the implementation. Regarding the more expansive solution, no one is asking for this. The only use case I can dream up is one where, in a single Lobster project, you'd be coordinating a multistage/multisite production where, for example, you want to store GEN-SIM at ND, DIGI-RECO at the LPC CAF, and mini-AOD/nano-AOD at UVa, or something crazy like that. I think we can safely defer any idea of doing that until someone actually asks whether such a thing would be feasible.
