Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set up a database at CSD3 #2

Open
nsevilla opened this issue Oct 29, 2024 · 2 comments
Open

Set up a database at CSD3 #2

nsevilla opened this issue Oct 29, 2024 · 2 comments

Comments

@nsevilla
Copy link
Contributor

According to George Beckett:

CSD3 recommend we set up a PostgreSQL database in a service call ARCAS: this is a cloud platform that sits next to CSD3 (I think it is in the same racks). The only limitation they note is that ARCAS VMs cannot mount the (Lustre) work file system of CSD3. I'm assuming this isn't an issue for our case?
Another option, which we have used when running on CSD3, is to use an SQLite database, though this may/ may not be capable enough for what you have planned.
10:17
Assuming the ARCAS platform sounds suitable, could you let me know roughly what specs we would need for the database (number of cores, memory, and disk space).

@nsevilla
Copy link
Contributor Author

nsevilla commented Nov 7, 2024

After some discussion, we think that the PostgresSQL database to be set up at the ARCAS cloud system does not need to see the data directly.

Jim: I’m not sure that the db itself needs to be able to access the files on disk. The LSST software that queries the db does certainly.
George: Yes, I wasn't sure if PostgreSQL had an API that let it source table data from a remote file system so, perhaps, the key is that the Pipeline (running on CSD3 compute nodes) can see both the PostgreSQL database and the Lustre file system and mediates between the two in terms of writing metadata to the Butler registry or reading metadata from the Butler registry?
Heather: If Jim is correct (and I believe he is) - then the LSST Sci Pipelines code would handle that mediation. The Postgres db is just a registry that provides the necessary information for the LSST Sci Pipelines to find the data in the "data store".

@heather999
Copy link

Jim reminded us we should try to use Postgresql 16 to match the version Rubin is using

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants