Updated check: version number #124

kathryn-ods · 2024-08-14T10:35:25Z

This check is currently in cove as inconsistent_schema_version_used. It needs to be rewritten to allow for inconsistent minor versions.

Check: all Statements MUST have the same major version number.

On fail:

Error message: Statements have different major version numbers.
Info message: Version number (bodsVersion): [VALUE], Version number (bodsVersion): [VALUE2]
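For illustration, a minimal Python sketch of the check as described above. This is not the cove implementation: the function name, the return shape, and the decision to skip statements with a missing or non-string bodsVersion are all assumptions made for the example.

```python
def check_major_versions(statements):
    """Return None on pass, or a dict with the error and info messages on fail.

    Assumption: statements without a string publicationDetails.bodsVersion
    are ignored here; how they should be treated is a separate question.
    """
    versions = []  # distinct bodsVersion values, in order of first appearance
    for statement in statements:
        details = statement.get("publicationDetails")
        version = details.get("bodsVersion") if isinstance(details, dict) else None
        if isinstance(version, str) and version not in versions:
            versions.append(version)

    # The major version is taken to be everything before the first dot.
    majors = {version.split(".")[0] for version in versions}
    if len(majors) <= 1:
        return None  # differing minor versions alone do not fail the check

    return {
        "error": "Statements have different major version numbers.",
        "info": ", ".join(
            f"Version number (bodsVersion): {version}" for version in versions
        ),
    }
```

With this sketch a dataset mixing "0.3" and "0.4" passes, while one mixing "0.4" and "1.0" fails and lists both values in the info message.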

kathryn-ods · 2024-08-14T10:55:45Z

@radix0000 does it make sense to implement this test at this point in time? Because the data is only invalid if the major versions don't match, an invalid example would need to include statements with e.g. "1.0" and "0.4". As 1.0 doesn't exist yet, would that be flagged up for not being a valid BODS version as well as for having inconsistent values?

kathryn-ods · 2024-08-19T14:59:40Z

@kd-ods you might be able to advise on the above now you're back

kd-ods · 2024-11-05T10:26:05Z

This is a special kind of check, since the outcome relates to how the whole dataset is processed. I think we should hold off implementing this. Pre-v1 things are having to be handled a little differently.

For future reference this is where I think we are and where we are going:

At this point (following the BODS 0.4 release)

When it comes to the DRT 'choosing' which version of the schema to validate a dataset against, it looks at the first statement in the dataset and:

if it has no publicationDetails.bodsVersion field, validates against BODS 0.1
if it has a publicationDetails.bodsVersion field with a valid BODS version, validates against it
if it has a publicationDetails.bodsVersion field with an invalid BODS version, validates against the latest version of BODS

@radix0000 - is that right? (We should document exactly what the process is.)

After BODS v1

This check, that 'all Statements MUST have the same major version number.' is done as part of the initial parsing of the data.

It passes if either (a) no statement has a publicationDetails.bodsVersion field or (b) all statements have a publicationDetails.bodsVersion field and all Statements have the same major version number
It fails if (c) some statements have a publicationDetails.bodsVersion field and some don't or (d) all statements have a publicationDetails.bodsVersion field but not the same major version number.

On fail: the dataset is not validated and the user gets an informative error message

On pass (case (a)): the dataset is validated against BODS 0.1

On pass (case (b)): the dataset is validated against the latest MINOR.PATCH version release for the given MAJOR version number.
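The four cases above could be sketched in Python roughly as follows. This is only an illustration of the proposed post-v1 behaviour, not existing DRT code: the function names, the latest_for_major lookup, and the choice to treat a non-string bodsVersion as absent are assumptions.

```python
from typing import Optional


def bods_version_of(statement: dict) -> Optional[str]:
    """Return publicationDetails.bodsVersion if it is a string, else None."""
    details = statement.get("publicationDetails")
    if isinstance(details, dict) and isinstance(details.get("bodsVersion"), str):
        return details["bodsVersion"]
    return None


def classify(statements: list) -> str:
    """Return 'a', 'b', 'c' or 'd', matching the cases listed above."""
    versions = [bods_version_of(statement) for statement in statements]
    present = [version for version in versions if version is not None]
    if not present:
        return "a"  # no statement has the field
    if len(present) < len(versions):
        return "c"  # some statements have the field and some don't
    majors = {version.split(".")[0] for version in present}
    return "b" if len(majors) == 1 else "d"


def outcome(statements: list, latest_for_major: dict) -> tuple:
    """Map a dataset to ('validate', schema_version) or ('fail', case).

    latest_for_major is a hypothetical lookup from a MAJOR number to its
    latest MINOR.PATCH release; an unknown major raises KeyError here.
    """
    case = classify(statements)
    if case == "a":
        return ("validate", "0.1")
    if case == "b":
        major = bods_version_of(statements[0]).split(".")[0]
        return ("validate", latest_for_major[major])
    return ("fail", case)
```

Note that an empty dataset falls under case (a) in this sketch, since no statement has the field.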

Reflections

Having worked through all that.... maybe post BODS v1 we should actually do a complete overhaul of the DRT too. We could relegate work so far to a 'beta' version then clean everything up for a v1 of the DRT. Then direct pre BODS v1 users to the beta version of the tool and BODS v1 + users to the new release. Then we don't need to maintain any overly-complicated BODS version-handling.

radix0000 · 2024-11-05T12:17:34Z

@kd-ods Re DRT choosing a schema version, it is slightly more complicated than that (because as well as not being present, the cases where bodsVersion isn't a string, or isn't in the list of known versions, need to be covered), but the main tweak I have introduced is that it detects whether the data is record-based (i.e. if it has "recordDetails", "recordId" or "recordType" in the statement), and if so it doesn't use BODS 0.1 as the default; instead it uses the latest version (i.e. currently 0.4). Having these 2 categories, record-based and non-record-based, and having different defaults for each seems sensible to me (given how different they are) but let me know what you think. There is a question of what the best defaults are as well (e.g. out of 0.1, 0.2, and 0.3 what is the "most used" version and should we be using that as the default for non-record-based data?).

kd-ods · 2024-11-08T10:56:41Z

Ah, thanks @radix0000. So is this a correct summary of what happens atm?

The entire dataset is validated against a single schema version.
The schema version is selected based on the contents of the first Statement in the array.
If that first statement is 'record-based' the whole dataset is validated against bodsVersion (if it is present and valid). If that field is missing or invalid then validation is against BODS 0.4.
If that first statement is not record-based the whole dataset is validated against bodsVersion (if it is present and valid). If that field is missing or invalid then validation is against BODS 0.1.

(If so - that looks sensible to me.)
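As a rough Python sketch of the selection logic summarised above (not the actual DRT code): the function names and the exact strings in KNOWN_VERSIONS are assumptions, and "valid" is taken to mean a string that appears in that list.

```python
# Assumed list of known versions; the real tool may store these differently.
KNOWN_VERSIONS = ("0.1", "0.2", "0.3", "0.4")
RECORD_KEYS = ("recordDetails", "recordId", "recordType")


def is_record_based(statement: dict) -> bool:
    """A statement counts as record-based if it has any record field."""
    return any(key in statement for key in RECORD_KEYS)


def select_schema_version(statements: list) -> str:
    """Pick one schema version for the whole dataset from the first statement."""
    first = statements[0]
    details = first.get("publicationDetails")
    version = details.get("bodsVersion") if isinstance(details, dict) else None

    # Present and valid: validate against the declared version.
    if isinstance(version, str) and version in KNOWN_VERSIONS:
        return version

    # Missing, not a string, or unknown: fall back to a default that
    # depends on whether the first statement is record-based.
    return "0.4" if is_record_based(first) else "0.1"
```

Only the first statement is inspected, so later statements with a different bodsVersion do not change the result.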

kathryn-ods added this to Data Review Tool update (for BODS 0.4) Aug 14, 2024

kathryn-ods converted this from a draft issue Aug 14, 2024

kathryn-ods assigned kathryn-ods and kd-ods and unassigned kathryn-ods Aug 21, 2024

kd-ods removed the status in Data Review Tool update (for BODS 0.4) Jan 14, 2025

kd-ods removed this from Data Review Tool update (for BODS 0.4) Jan 14, 2025
