Entity and attribute names and formats for sample and diffraction plan shipment/upload #4
Comments
Thanks Karl for your very comprehensive starting point! Prior to the SLS dark time, this is what our users could provide before their experiment (by email): V6_TELLSamplesSpreadsheetTemplate.xlsx. Our website heidi.psi.ch allowed users to validate their spreadsheets before emailing them to us. Our desktop sample-changer GUI would also run the same sample-import validation when the spreadsheet is uploaded prior to an experiment. Pydantic model: https://github.com/HeidiProject/backend/blob/main/app/sample_models.py
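To make the idea concrete, here is a minimal sketch of the kind of row-level validation such a model performs. The field names and rules below are illustrative assumptions for this thread, not the actual HeidiProject pydantic schema (stdlib only, so it runs anywhere):

```python
from dataclasses import dataclass

@dataclass
class SampleRow:
    # Field names are illustrative assumptions, not the actual
    # HeidiProject schema (see sample_models.py in their repo).
    puck_name: str
    position: int
    crystal_name: str

    def validate(self) -> list:
        """Return human-readable problems; an empty list means the row is valid."""
        problems = []
        if not self.puck_name:
            problems.append("puck_name must not be empty")
        if not 1 <= self.position <= 16:  # assumed 16-position puck
            problems.append(f"position {self.position} outside 1-16")
        if not self.crystal_name:
            problems.append("crystal_name must not be empty")
        return problems
```

Collecting all problems per row (rather than failing on the first) is what makes a pre-submission web validator like heidi.psi.ch useful: the scientist can fix the whole spreadsheet in one pass.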
What I like about both of these is that the column names appear to be scientist-friendly and completely decoupled from those in the database :) Here's some JSON Schema from a previous attempt at a one-shot shipment submission, intended to encompass both pin and plate shipments as well as retrieval of crystal coordinates when putting a plate onto a home source: https://icebear.fi/shiplink/v0_3_0/schema.json (Karl, you might remember this one, back in the day...) A more human-friendly representation is here: https://icebear.fi/shiplink/schemadoc/?schema=https://icebear.fi/shiplink/v0_3_0/schema.json Some of this doesn't make any sense to me after not having seen it for a few years, and there's some stuff missing, but there is nothing fundamentally wrong with it as far as I can see.
Hi, our column names are pretty similar to what Karl has described, with some minor differences. The CSV can be downloaded from here.
Currently, we are adding more parameters from online data analysis, but it is still in a very immature state.
Hi,
We are working on a new template in Excel to apply some restrictions to the diffraction-plan columns; the user will then need to export the file as CSV and import it into py-ispyb-ui or exi.
Maybe too early, but a few comments about some of those items, mainly to show the kind of connection one could make between some of the items here and a dictionary like PDBx/mmCIF (the definitions there are also not perfect in some places, but it seems the best we have, and it is actively developed and maintained). "aimed Resolution" and "required Resolution":
"aimed multiplicity":
"aimed completeness":
Nothing mentioned above has any impact right now - apart from maybe a renaming of "Resolution" ;-)
Hi @CV-GPhL I remember discussing 'aimed resolution' and 'required resolution' for quite a long time in a recent meeting. The word 'desired' was also mentioned. I have no say in this. My opinion, at this stage of the project, is that we should encourage more scientists to participate in the discussions. I've tried to involve some at the ESRF with little (or zero) success.
At least in my case, I have just copied and pasted what we have in the CSV example template. It is only for listing purposes. These should not be considered the final names that will be used to define the metadata in the catalog, where I presume each implementation will have its own style.
As I said, this kind of discussion is maybe a bit too early (and others might join in at later stages). What is important is that a discussion about the "proper" (whatever that means) scientific definition of the various categories has to happen before anything goes into production. At the moment we shouldn't really care what a box is called - it's just a name, after all, with only a very rough meaning.
As a starting-point, below is documentation for the CSV format we currently use for this at Diamond.
I imagine we would want to agree on a standard for attribute names as well as a JSON format to replace this.
These are the CSV column names:
In our actual CSV files, the first line is a header which "dynamically" defines which columns you have and their ordering. So, you can have different columns and ordering for each file, just as long as the column names are ones we know about, and you have included the mandatory columns.
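The dynamic-header behaviour described above could be sketched like this. The known-columns set here is deliberately truncated and illustrative; the real uploader's list is the full set of column names documented in this comment:

```python
import csv
import io

# Truncated, illustrative subsets of the real column lists.
KNOWN = {"proposalCode", "proposalNumber", "shippingName", "dewarCode",
         "containerCode", "proteinAcronym", "proteinName", "sampleName",
         "sampleBarcode", "subLocation", "userPath"}
MANDATORY = {"proposalCode", "proposalNumber", "shippingName", "dewarCode",
             "containerCode", "proteinAcronym", "proteinName", "sampleName",
             "sampleBarcode"}

def parse_shipment_csv(text: str) -> list:
    """Parse a shipment CSV whose first line defines the columns and order."""
    reader = csv.DictReader(io.StringIO(text))
    header = set(reader.fieldnames or [])
    unknown = header - KNOWN
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    missing = MANDATORY - header
    if missing:
        raise ValueError(f"missing mandatory columns: {sorted(missing)}")
    # Drop empty cells, mirroring the "empty columns are ignored" rule.
    return [{k: v for k, v in row.items() if v} for row in reader]
```

Because the header drives the parse, each facility's file can order (or omit optional) columns differently while still passing the same validation.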
Here is an example - only the first three lines of data - and note that empty columns are ignored:
I assume many of the attribute/column names are familiar and self-explanatory, but here is some extra info:
- `subLocation`: an index referring to a position within a multipin sample.
- `userPath`: describes one or two levels of folders (folder1/folder2) that will be created inside the visit directory and into which the acquisition system will write diffraction images for the given sample.
- `screenAndCollectRecipe`: can be "best", "all" or "none" (or empty). If using "best", then set `screenAndCollectNValue` to some integer, e.g. 3 if you want the best 3 samples from the group collected on. It has to be a value in the range 1 to 5.
- `sampleGroup`: should be the name of a new group. If you want an existing group, use the group id.

The following fields are mandatory:

- `proposalCode`, `proposalNumber`, `shippingName`
- `dewarCode` (i.e. dewar name) + `containerCode` + `proteinAcronym` + `proteinName` + `sampleName` + `sampleBarcode`
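The name-or-id convention for `sampleGroup` described above could be resolved with a small helper like this. The digits-means-existing-id heuristic is an assumption for illustration, not the documented Diamond behaviour:

```python
def resolve_sample_group(value: str) -> tuple:
    """Interpret a sampleGroup cell: a purely numeric value is assumed to
    reference an existing group by id; anything else names a new group.
    (Illustrative heuristic only, not the actual uploader logic.)"""
    value = value.strip()
    if value.isdigit():
        return ("existing", int(value))
    return ("new", value)
```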
Additionally, you can specify flags when you upload the file:

- `--queuecontainer` so that the container is queued for Unattended Data Collection (UDC)
- `--highpriority|mediumpriority|lowpriority` so that the container is moved in the UDC queue (DLS staff only)
- `--allowanyregcontainer` to use any puck, not just the ones associated with a proposal
- `--allowmissingfacilitycode` so you don't need to specify a dewar facility code

Validation
If validation is not successful, the uploader will abort with an error message. If there was only a minor problem, it will complete, but with a warning message.
The warning messages are:
The error messages are: