-
Notifications
You must be signed in to change notification settings - Fork 87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
datamodel v1 #49
datamodel v1 #49
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First of all: thanks for writing up this big PR - it's really good to have it all laid out and that was a lot of work.
I only went through 2 files -(jsonschema + mapping) so far (took most of my day). I will finish reviewing the other files tomorrow with a fresh mind, because this is important. However I wanted to get the accumulated comments out right away to get the process moving.
Here are my thoughts on some of the questions / notes you had:
Both attribute and data_key are needed
I haven't looked at the marshmallow code yet, but marshmallow 2 doesn't know about data_key
...
Creators type changed to name_type to avoid confusion with contributors type (role).
Contributors type
is a name type just like creators's type
. role
is a different field. So I would keep it as it is but add a "description"
to it to make it clear what type
means.
Should the recid be repeated? i.e. once (id) as part of the record, not the metadata, and then again recid inside the metadata.
How I see it: if a field is both metadata and administrative in some sense, it should be in the metadata only. This way there is no repetition / unsyncing and no hand-wringing about where to place fields.
Titles: Shall we add type "MainTitle" to be able to identify which is the main one in an easier way?
First title should be the main one. We can add 'MainTitle' to it, but shouldn't have to is my thinking there.
Shall
_created_by
field be required in Marshmallow (i.e. having a default value even if not required in the JSONSchema)
Yes, I think it should be required... but I also think it should be required in the JSONSchema :P
Closes 43:
I think resource_type should be required and the case of file upload without metadata should be the exception
-> this should apply to the other fields anyway. That goes to a discussion point in the review below about the required fields in the jsonschema.
Thanks again, I will finish up looking at things tomorrow.
@fenekku Thanks a lot for the PR, I know its humongous :`) I am answering most of your comments in this one. I answered some specif ones above. However, I will wait for others to review before fixing the rest (the ones not answered, I mostly agree and will implement the change) just waiting in order no to double work, just in case.
Since Marshmallow 2.x was for Python 2.x and it is not supported anymore, InvenioRDM should come with Marshmallow 3.x. Even though Invenio will support Marshmallow 2.x for a transition period we do not have to do so :hands_up:.
Noted, will change it.
Agree also, that's why I was "confused" lets see what others think about it.
I see, but if we do not add it. What type do users select for it? Depends on how we do deposit UI tho, maybe is a different part in the end.
Agree Thanks once again! |
After IRL dicsussion we have agreed (@fenekku , @zzacharo and me): 1- Keep the JsonSchema with required fields for realistic (currently in mind) use cases. Therefore, 2- In a similar fashion, not to change the access related fields until a consensus is reached. Mainly to have something that is working. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finished full review. Really good to have something concrete to discuss. Thanks!
Re-order administrative fields alphabetically Re-order metadata fields into datacite's order Move back to initial access fields (until we have clear which ones to use) Remove globally required fields from JSONschema. Using only Marshmallow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved with caveats for these outstanding points:
-
required fields for the subjects jsonschema :what is the status on this? (obviously I am rooting for my take :) )
-
removal of required=True on the Method field (doesn't do anything)
-
_embargo_date... stuff. I understand now. I would remove the comment about extra non-administrative fields (because they are -- anything that is not datacite core is administrative I think)
-
a test for Creator that is of type "Organizational" : we should be clear about how an organization would be filled
-
test for minimum record : given what you said we need to amend the test for minimal record, because a minimal record is bigger than what we have right now.
@fenekku Thanks! I think I answered all your questions in the corresponding comment. Except subject which I am leaving open for now. I think I have all fields that are minimal according to ours/datacite model, but please let me know if I forgot anything. As for:
I shelved to do further validation later on: #52 |
Contains fields according to discussed in the InvenioRDM Project meeting in January:
Notes/Questions:
All administrative fields are at the beginning preceded by underscore. Warning: There are some posts online about ES/Kibana not accepting fields prefixed with underscore. However, I did not find any official documentation except the one for Beats and it does allow them.
No enumeration has been set in the JSONSchema. The ones from Datacite are enforced/validated via Marshamllow schemas. However, they will be made configurable in the near future.
ES mappings contain only
object
. Since the current querying system does not support nested queries, even if the results are not optimal the user can still query the fields. Moreover, once nested queries are supported re-indexing with nested where necessary is not costly since the schema per se does not change.ES fields might require sub-fields (e.g. text/keyword to implement better search).
Opted for
fields.List()
rather thanmany=True
in marshmallow due to RFC: Remove Nested(many=True) to keep only the List(Nested()) form marshmallow-code/marshmallow#779.Both
attribute
anddata_key
are needed. See attribute used instead of data_key during deserialization marshmallow-code/marshmallow#791Creators
type
changed toname_type
to avoid confusion with contributorstype
(role).Should
owner
be required (e.g. at least one)?Should the
recid
be repeated? i.e. once (id
) as part of the record, not the metadata, and then againrecid
inside the metadata.Titles
: Shall we add type "MainTitle" to be able to identify which is the main one in an easier way?Shall
_created_by
field be required in Marshmallow (i.e. having a default value even if not required in the JSONSchema). This could be done with apre_load
er as is the case forpublication_date
.Dependencies:
Marshmallow
Missing:
Migrated here #54
Closes:
Closes #23
Closes #22
Closes #38 <-- I think it gets closed. Since it is slowly getting solved.
Closes #43 <-- I agree that a resource should have a
resource_type
. However @lnielsen mentioned that for workflow reasons (e.g. create a record when the user starts submitting files to they can be uploaded while the user fills the form) the required fields are minimal. However, we can put a default value. WDYT?