Feature request: find files on disk #40
Comments
Could PPD rather create new Rucio containers with the list of kept files (hopefully they did the deletion by blocks, btw) ? Then users can do |
That would be a nice solution using Rucio. I am not sure that it fully addresses all the problems, because cmsDriver commands still use DAS queries to find files. Maybe we need to upgrade that functionality to use Rucio directly... @smuzaffar ? |
could also make |
Hi, deciding which files are kept on disk is not in PPD's scope: we give the fractions and DM implements the exact details. I am in favor of moving to whatever functionality lets us query what is on disk. |
I am not sure this is about capability of changing the |
I made this request for You can search this codebase, or look at the help/examples from the executable, and see that rucio is included as a source. |
@todor-ivanov just try |
DAS supports queries to Rucio, as it did for Phedex, and it aggregates results from both DBS and Rucio. Therefore, this request is valid and can be implemented in DAS. But as I replied to Kevin in private email my time is no longer allocated to DAS development. To make appropriate changes the following steps should be done:
The appropriate syntax of DAS queries should be modified as follows:
So, the amount of work is not negligible as it requires changes in DAS QL, DAS APIs, DAS maps, but it is doable. |
thanks @DickyChant IMHO the simplest solution is:
That said, none of this is in my scope. |
Just to clarify, on my comment here: #40 (comment). Since it is an issue created in the So, I think this issue would be better moved to the DAS repository: https://github.com/dmwm/das2go, because I suspect the information could indeed already be accessible through the DAS link to Rucio. We just need to cross-check whether we make the proper queries to Rucio and whether we convert them to the proper fields in the structure returned by DAS: https://github.com/dmwm/das2go/blob/master/services/rucio.go |
BTW, @kpedro88 A clever approach (pointed out to me by @mapellidario in a private chat) could actually help identify the location of every file. And as seen in the query output, the storage type is already provided. Of course this would require external output aggregation, but at least the information is already retrievable. Wouldn't that suffice? BTW, thanks @mapellidario for the clever suggestion! [1]
[2]
|
thanks @todor-ivanov and @mapellidario. Hopefully it will be enough to check block locations, not file-by-file. In any case changes to cmsdriver will be needed, so the maintainer of that tool can decide whether to use dasgoclient and parse the JSON, or push for DM to create new container names and use the rucio CLI. |
hi @belforte
Unfortunately, it won't work at block-level granularity [1]. Probably this is because Rucio does not know about our concept of blocks, or maybe because the Go code does not make the full translation between CMS and Rucio data abstractions (blocks vs. datasets) for this query; I have not investigated for the moment. In case I get a request that more complex filtering and aggregation needs to happen on the client side, then I'll have to dive deeper into the code anyway. [1]
|
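For reference, the client-side aggregation mentioned above could look roughly like the sketch below. It assumes the per-file information has already been retrieved by some other means: `file_to_block` (file name to block name) and `disk_files` (files known to have a disk replica) are hypothetical inputs, not the output format of dasgoclient or Rucio, and `blocks_fully_on_disk` is an illustrative name.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def blocks_fully_on_disk(file_to_block: Dict[str, str],
                         disk_files: Iterable[str]) -> List[str]:
    """Return the blocks for which every file has a disk replica.

    Both arguments are hypothetical, pre-fetched inputs:
    file_to_block maps each file name to its block name, and
    disk_files lists the files known to be on disk.
    """
    on_disk = set(disk_files)
    # For each block, collect one flag per file: True if the file is on disk.
    per_block = defaultdict(list)
    for lfn, block in file_to_block.items():
        per_block[block].append(lfn in on_disk)
    # Keep only blocks where all files are on disk.
    return sorted(block for block, flags in per_block.items() if all(flags))


if __name__ == "__main__":
    # Made-up names, for illustration only.
    mapping = {"f1.root": "/A/B/C#1", "f2.root": "/A/B/C#1", "f3.root": "/A/B/C#2"}
    print(blocks_fully_on_disk(mapping, ["f1.root", "f3.root"]))
```

A block with even one file missing from disk is dropped, which is the conservative choice if jobs are expected to read whole blocks.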
Uh, thanks Stefano for the suggestion! After playing a bit with dasgoclient I realized that removing the just for completeness, the list of blocks for a given dataset can be obtained with [2] [1]
[2]
|
No matter what, the core issue is that cmsdriver assumes that the pileup dataset is fully on disk. So it needs to be modified. As long as it accepts only DBS datasets, they cannot overlap, so new dataset names (as containers in Rucio) will not help, although they look like a good thing to me. My 2c's are that changing cmsdriver to work with current tools "the way they are now" is faster than changing it to use a yet-to-be-developed new dasgoclient functionality of the sort |
PPD recently deleted large fractions of existing premixed pileup input samples in order to make room for the new (2024) premixed pileup input sample. This means that DAS queries for the lists of files in those samples return many files that are now only on tape, and jobs that try to read those files will obviously fail.
For users, I recently implemented a simple script to get a list of files on disk using Rucio: https://github.com/FNALLPC/lpc-scripts/blob/master/get_files_on_disk.py. This is probably not the optimal set of Rucio queries, but I had some snippet from years ago and just wrapped it up into a nicer/more generalized script.
Is it possible to add a custom dasgoclient query that only finds files on disk? The information should all be there (since Rucio is one of the sources), it's just a matter of how to extract it in a suitable way. This would help both production operations and users.
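To illustrate the kind of filtering being requested, here is a minimal sketch that works on replica information already obtained from Rucio. It is not the logic of the linked script or of dasgoclient. The input `replicas` (file name to list of RSE names) is a hypothetical structure, the RSE names in the example are only illustrative, and the rule "an RSE whose name contains `Tape` is a tape endpoint" is an assumption based on naming convention.

```python
from typing import Dict, List


def is_disk_rse(rse: str) -> bool:
    """Assumption: tape endpoints carry 'Tape' in the RSE name."""
    return "tape" not in rse.lower()


def files_on_disk(replicas: Dict[str, List[str]]) -> List[str]:
    """Return files that have at least one replica on a non-tape RSE.

    replicas is a hypothetical, pre-fetched mapping of
    file name -> RSE names holding a replica of that file.
    """
    return sorted(
        lfn for lfn, rses in replicas.items()
        if any(is_disk_rse(rse) for rse in rses)
    )


if __name__ == "__main__":
    # Made-up file names; RSE names are illustrative.
    example = {
        "/store/a.root": ["T1_US_FNAL_Tape"],
        "/store/b.root": ["T1_US_FNAL_Tape", "T2_CH_CERN"],
        "/store/c.root": [],
    }
    for lfn in files_on_disk(example):
        print(lfn)
```

Files with no replicas at all, or with replicas only on tape, are excluded from the result.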