Rationalize the storage of all DC2 data products #309
More generally, for Run1.1p, Run1.2p, Run1.2i, Run2.0i (omit?), and Run2.1i, we need the following data products to be saved in permanent storage and in standard locations (so they can be readily found): For on-sky simulations:
Calibration products:
Image processing related:
@villarrealas should have the final say but, as far as I understand, you are correct, @heather999. |
This is not exactly the nomenclature used with the --rerun syntax, and we should agree on how reruns are named. The CC pipeline currently uses only two rerun values: one for calexp production and the other for coadd and multiband processing. Something like calexp-vX:coadd-vY has been proposed recently, with X = Y nominally. |
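As we understand the Gen2 command-line tasks, --rerun takes either OUTPUT or INPUT:OUTPUT, so calexp-vX:coadd-vY would read from the calexp-vX rerun and write to the coadd-vY rerun. A minimal sketch of helpers for the proposed naming scheme follows; make_rerun, parse_rerun, and RerunSpec are hypothetical names, and they only build and split strings. They do not call any DM code.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical naming scheme from this thread: calexp-v<X> for the calexp
# rerun, optionally chained to coadd-v<Y> for coadd/multiband processing.
_RERUN_RE = re.compile(r"^calexp-v(?P<x>\d+)(?::coadd-v(?P<y>\d+))?$")


class RerunSpec(NamedTuple):
    calexp_version: int
    coadd_version: Optional[int]  # None when only the calexp rerun is given


def make_rerun(calexp_version: int, coadd_version: Optional[int] = None) -> str:
    """Build an INPUT:OUTPUT --rerun value; Y defaults to X (the nominal case)."""
    if coadd_version is None:
        coadd_version = calexp_version
    return f"calexp-v{calexp_version}:coadd-v{coadd_version}"


def parse_rerun(value: str) -> RerunSpec:
    """Split a 'calexp-vX[:coadd-vY]' string into its version numbers."""
    match = _RERUN_RE.match(value)
    if match is None:
        raise ValueError(f"not a calexp-vX[:coadd-vY] rerun: {value!r}")
    y = match.group("y")
    return RerunSpec(int(match.group("x")), int(y) if y is not None else None)


if __name__ == "__main__":
    spec = make_rerun(1)
    print(spec)               # calexp-v1:coadd-v1
    print(parse_rerun(spec))  # RerunSpec(calexp_version=1, coadd_version=1)
```

Parsing with a single anchored regular expression keeps malformed names (for example a bare coadd-vY) from slipping through silently.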
I'm all in favor of using |
I'd rather avoid it if I can, as it implies modifying many scripts. At least for the time being... |
Some files under the Run2.0i instance catalogs ( |
Ack. Yes, we do care about the instance catalogs. I guess there is still a copy at Argonne; Antonio can confirm.
On 1/11/19 1:26 PM, Heather Kelly wrote:
Some files under the Run2.0i instance catalogs (/global/cscratch1/sd/desc/Run2.0i/instCat at NERSC on CSCRATCH) have started to be purged as of Jan 7th (see /global/cscratch1/sd/desc/.purged.20190107). We do not have a copy, as I was hoping we could sort out the permissions on this particular directory first. @katrinheitmann @villarrealas @jchiang87 Is there a copy at ANL? Do we care about the Run2.0i instance catalogs? Fortunately, the Run2.0i outputs have already been copied to projecta:
data: /global/cscratch1/sd/desc/Run2.0i/outputs => /global/projecta/projectdirs/lsst/production/DC2_ImSim/Run2.0i/outputs
Anyway, I'll see about getting all the data organized; hopefully this is something we can discuss further at an upcoming CI meeting.
Hi! Is there a plan to archive data to HPSS at NERSC for old runs? That would free a lot of space in |
@JulienPeloton We can certainly do that (and ultimately we will!). We need to review our HPSS quota and current tape use. Migrating to HPSS really should be done when we feel we are effectively "done" actively working with some of the DC2 data. Transfer out of HPSS is very slow, so I'm hesitant to start moving data to free up space until we know certain parts of the data are really unnecessary for immediate access. That said, we can certainly start copying portions of the data into HPSS just to get that done. There is also some question of which tape resource we are going to use: is IN2P3 planning to use their tape archive? I thought there was also mention of ANL doing the same. It would be helpful to get that straight too. And if we use multiple tape archives, how do we keep track of them? |
Thanks @heather999 for your detailed answer! |
@danielsf Can I trouble you to point me to the instance catalogs for Run1.1p, Run1.2p, and Run1.2i? |
Keeping track of all the data here:
Is it those
@wmwv and @rearmstr should confirm, but I think you can delete everything in
i.e., everything in |
Be careful, if you delete everything under |
The |
I realize I put this in the meeting notes, but not this thread. Copying here: We don't need to keep the
We don't need to keep the
Contains
We can remove all of the We don't need any of the {{deepCoadd/?/????/?,?}} level directories once the deepCoadd is made. The actual coadd data are stored one level higher in deepCoadd/?/????. |
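The deepCoadd cleanup described above (dropping the {{deepCoadd/?/????/?,?}} patch-level directories while keeping the coadds one level up) could be scripted roughly as follows. This is a minimal sketch, assuming a repository laid out as deepCoadd/<filter>/<tract>/<patch> exactly as the glob in the comment implies; patch_level_dirs and remove_patch_level_dirs are hypothetical names, and dry_run defaults to True so nothing is deleted until the listing has been reviewed.

```python
import shutil
from pathlib import Path
from typing import List


def patch_level_dirs(repo_root: Path) -> List[Path]:
    """Return the deepCoadd/<filter>/<tract>/<patch> directories under repo_root.

    The glob mirrors the deepCoadd/?/????/?,? pattern: a one-character filter,
    a four-character tract, and a patch named "<x>,<y>" with single-character
    indices. Only directories are returned, so files stored directly in
    deepCoadd/?/???? (the coadds themselves) are never selected.
    """
    return sorted(p for p in repo_root.glob("deepCoadd/?/????/?,?") if p.is_dir())


def remove_patch_level_dirs(repo_root: Path, dry_run: bool = True) -> List[Path]:
    """Delete the patch-level directories, or by default only report them."""
    targets = patch_level_dirs(repo_root)
    for path in targets:
        if dry_run:
            print(f"would remove {path}")
        else:
            shutil.rmtree(path)
    return targets
```

Patch names with multi-digit indices would not match this pattern; the glob would need widening if such directories exist.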
Removing the
And should the forced_source_catalog be considered for long-term storage, or is that too hot off the presses for now, @wmwv? |
At today's DM-DC2 meeting, we planned to review the directories that are stored under
Here is the breakdown for
Pinging @jchiang87 @wmwv @rearmstr |
The |
The |
@heather999 Given your great work to organize the DC2 products, I believe we can close this. Do you agree? |
NERSC CSCRATCH space is temporary. To avoid losing any Run2.xi data, we have started copying files over to
/global/projecta/projectdirs/lsst/production/DC2_ImSim/
This issue is meant to document the transfer and sort out some remaining questions. Here is the copy plan for Run2.0i:
instance catalogs:
/global/cscratch1/sd/desc/Run2.0i/instCat
=>/global/projecta/projectdirs/lsst/production/DC2_ImSim/Run2.0i/instCat
data:
/global/cscratch1/sd/desc/Run2.0i/outputs
=>/global/projecta/projectdirs/lsst/production/DC2_ImSim/Run2.0i/outputs
Run2.1i:
instance catalogs:
/global/cscratch1/sd/desc/DC2/Run2.0i/cosmoDC2_v1.1.4
=>/global/projecta/projectdirs/lsst/production/DC2_ImSim/Run2.1i/cosmoDC2_v1.1.4
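The copy plan above amounts to a fixed mapping from CSCRATCH directories to projecta directories. As a minimal sketch, assuming the three source/destination pairs listed here are the whole plan (the Run2.1i/instCat area asked about in this issue is deliberately left out), the mapping can be encoded once and used to compute where any file should land, for instance when checking a finished copy. COPY_PLAN and destination_for are hypothetical names, and the sketch only computes paths; it does not copy anything.

```python
from pathlib import PurePosixPath
from typing import Dict

SCRATCH = PurePosixPath("/global/cscratch1/sd/desc")
PROJECTA = PurePosixPath("/global/projecta/projectdirs/lsst/production/DC2_ImSim")

# Source -> destination directories, transcribed from the copy plan.
COPY_PLAN: Dict[PurePosixPath, PurePosixPath] = {
    SCRATCH / "Run2.0i/instCat": PROJECTA / "Run2.0i/instCat",
    SCRATCH / "Run2.0i/outputs": PROJECTA / "Run2.0i/outputs",
    SCRATCH / "DC2/Run2.0i/cosmoDC2_v1.1.4": PROJECTA / "Run2.1i/cosmoDC2_v1.1.4",
}


def destination_for(src_file: str) -> PurePosixPath:
    """Map a file under one of the planned source dirs to its projecta path."""
    src = PurePosixPath(src_file)
    for src_root, dest_root in COPY_PLAN.items():
        try:
            rel = src.relative_to(src_root)
        except ValueError:
            continue  # src is not under this source root; try the next one
        return dest_root / rel
    raise KeyError(f"{src_file} is not covered by the copy plan")
```

A path outside all three source roots raises KeyError, which flags files the plan does not cover rather than silently sending them somewhere.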
There are some files which lack proper permissions to allow copying, such as:
-rw-r----- 1 asv13 asv13 3411 Sep 22 10:18 /global/cscratch1/sd/desc/DC2/Run2.0i/instCat/edison_packed_submissions.py
To allow a clean copy, we request that these files have their permissions adjusted so they can be read by the lsst group.
Concerning the Run2.1i instance catalogs, it is my understanding that this is the area used for production:
/global/cscratch1/sd/desc/DC2/Run2.0i/cosmoDC2_v1.1.4
and not:
/global/cscratch1/sd/desc/DC2/Run2.0i/Run2.1i/instCat
meaning this last area does NOT need to be copied over to projecta. Correct?
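Files like edison_packed_submissions.py above (mode -rw-r-----, group asv13) are unreadable to lsst group members who are not the owner, which is what blocks the copy. One way to find all such files before copying is to walk the source tree and check mode bits. The helpers below are a hypothetical sketch, assuming plain POSIX mode bits decide access (ACLs are ignored) and that the copying account is not the file's owner: when the file's group is the target group, the group bits apply; otherwise the "other" bits do.

```python
import os
import stat
from pathlib import Path
from typing import Iterator, Union


def group_can_read(path: Path, gid: int) -> bool:
    """Approximate whether a member of group `gid` (not the owner) can read `path`.

    Uses only POSIX mode bits: if the file's group is `gid`, the group bits
    apply; otherwise the "other" bits do. ACLs are ignored.
    """
    st = path.stat()
    mode = st.st_mode
    if st.st_gid == gid:
        read_bit, exec_bit = stat.S_IRGRP, stat.S_IXGRP
    else:
        read_bit, exec_bit = stat.S_IROTH, stat.S_IXOTH
    if not mode & read_bit:
        return False
    # Directories must also be searchable (execute bit) to copy their contents.
    return not stat.S_ISDIR(mode) or bool(mode & exec_bit)


def unreadable_paths(root: Union[str, Path], gid: int) -> Iterator[Path]:
    """Yield files and directories under `root` that group `gid` cannot read.

    Symlinks are skipped. os.walk silently skips directories the current
    user cannot list, so run this from an account that can see the whole tree.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            candidate = Path(dirpath) / name
            if candidate.is_symlink():
                continue
            if not group_can_read(candidate, gid):
                yield candidate
```

For example, with gid = grp.getgrnam("lsst").gr_gid, running list(unreadable_paths("/global/cscratch1/sd/desc/DC2/Run2.0i/instCat", gid)) should report files that still need a chgrp to lsst or an added read bit before the copy.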