Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Best way to model groups of optional fields? #300

Open
mtcalvin opened this issue Dec 21, 2023 · 3 comments
Open

Best way to model groups of optional fields? #300

mtcalvin opened this issue Dec 21, 2023 · 3 comments
Labels
question Further information is requested

Comments

@mtcalvin
Copy link

This is a somewhat contrived example, but it's simple enough to illustrate the issue without divulging any private details of the real thing I'm dealing with.

Imagine I have a list of structs representing times when an event happened, all of which must always contain a valid (year, month, day), and some of which also include additional resolution (hour, minute, second). I want to assert that if any of these additional fields are included, then all of them are, and all of them are valid.

(this is probably not the best way of modeling things, but in my case I'm trying to retrofit an ISL to describe the schema of an existing data format, so I'm not completely at liberty to change it).

  • Valid:
    • {year: 2023, month: 12, day: 21}
    • {year: 2023, month: 12, day: 21, hour: 13, minute: 50, second: 00}
  • Invalid:
    • {year: 2023, month: 12, day: 21, hour: 13}
    • {year: 2023, month: 12, day: 21, hour: "oops", minute: "oops", second: "oops"}

One naive way I might try to model this as ISL:

type::{
  name: Date,
  type: struct,
  fields: {
    year: { type: int, occurs: 1, range: [0, max] },
    month: { type: int, occurs: 1, range: [1, 12] },
    day: { type: int, occurs: 1, range: [1, 31] }
  }
}

type::{
  name: Time,
  type: struct,
  fields: {
    hour: { type: int, occurs: 1, range: [0, 23] },
    minute: { type: int, occurs: 1, range: [0, 59] },
    second: { type: int, occurs: 1, range: [0, 59] }
  }
}

type::{
  name: EventTime,
  one_of:[
    Date,
    { all_of: [Date, Time] }
  ]
}

In the above schema, the "Invalid" cases I listed above are considered valid EventTime, because one_of matches Date, which is not a closed struct. But if I change Date to have fields: closed::{ ... }, then the all_of in EventTime can never be satisfied. This is similarly the case if I rework the schema to have a DateTime which has type: Date and adds the fields from Time (AFAIK, you can't inherit from a closed-struct and add more fields to it).

So far the only solution I've been able to come up with is to duplicate the definitions of the year, month, and day fields from Date into a second DateTime type, mark both Date and DateTime as closed::{...}, and make EventTime be one_of: [Date, DateTime], but this is mildly annoying because I have to remember to keep the field definitions of Date and DateTime in sync.

I would describe the use case generally as one of these two:

  1. I have a base class, and a derived class which includes all of the fields of the base class and adds its own fields.
  2. I have a group of optional fields that either must all be present and valid, or none of them can be present.

Is there an intended way to describe this in ISL?

@mtcalvin mtcalvin added the question Further information is requested label Dec 21, 2023
@popematt
Copy link
Contributor

I have a group of optional fields that either must all be present and valid, or none of them can be present.

Here is a cookbook describing how to express logical relationships between fields.

I have a base class, and a derived class which includes all of the fields of the base class and adds its own fields.

We still need to provide some documentation about this, but at a high level, there are two approaches that I can think of.

The first is to define all possible fields of all child types in the base type. Combining this with expressing logical relationships, we have an approach where we model all of the constraints for the fields in one base type, and then the derived types can specify either that a field doesn't appear (using nothing) or that it must appear (using occurs: required). Here's what it looks like, applied to your example (using ISL 2.0, but translate-able to ISL 1.0).

$ion_schema_2_0
type::{
  name: DateTimeBase,
  type: struct,
  fields: closed::{
    year: { type: int, occurs: 1, range: [0, max] },
    month: { type: int, occurs: 1, range: [1, 12] },
    day: { type: int, occurs: 1, range: [1, 31] },
    hour: { type: int, range: [0, 23] },
    minute: { type: int, range: [0, 59] },
    second: { type: int, range: [0, 59] }
  }
}
type::{
  name: Date,
  type: DateTimeBase,
  fields: {
    hour: nothing,
    minute: nothing,
    second: nothing
  }
}
type::{
  name: DateTime,
  type: DateTimeBase,
  fields: {
    hour: { occurs: required },
    minute: { occurs: required },
    second: { occurs: required }
  }
}

type::{
  name: EventTime,
  one_of:[Date,DateTime]
}

The second approach is to define a base type with only the common fields. Derived types then name the fields that they inherit from the base type, and include definitions for their own types. Here's what it looks like, applied to your example (using ISL 2.0, but translate-able to ISL 1.0).

$ion_schema_2_0
type::{
  name: DateTimeBase,
  type: struct,
  fields: {
    year: { type: int, occurs: 1, range: [0, max] },
    month: { type: int, occurs: 1, range: [1, 12] },
    day: { type: int, occurs: 1, range: [1, 31] },
  }
}
type::{
  name: Date,
  type: DateTimeBase,
  fields: closed::{
    year: any,
    month: any,
    day: any,
  }
}
type::{
  name: DateTime,
  type: DateTimeBase,
  fields: closed::{
    year: any,
    month: any,
    day: any,
    hour: { type: int, occurs: 1, range: [0, 23] },
    minute: { type: int, occurs: 1, range: [0, 59] },
    second: { type: int, occurs: 1, range: [0, 59] }
  }
}

type::{
  name: EventTime,
  one_of:[Date,DateTime]
}

The real difference between these two is whether the base type has closed fields or the child types close their fields.

@popematt
Copy link
Contributor

Both of these are a little verbose. We do have an open issue to improve this amazon-ion/ion-schema#113.

If the proposed solution here is adopted, then it would become much simpler. You can define open base types, and then compose them into closed child types.

$ion_schema_2_1
type::{
  name: DateBase,
  type: struct,
  fields: {
    year: { type: int, occurs: 1, range: [0, max] },
    month: { type: int, occurs: 1, range: [1, 12] },
    day: { type: int, occurs: 1, range: [1, 31] },
  }
}

type::{
  name: TimeBase,
  type: DateBase,
  fields: {
    hour: { type: int, occurs: 1, range: [0, 23] },
    minute: { type: int, occurs: 1, range: [0, 59] },
    second: { type: int, occurs: 1, range: [0, 59] }
  },
}

type::{
  name: DateTime,
  all_of: [DateBase, TimeBase],
  // closes the fields to only those fields named in DateBase plus the fields defined in TimeBase
  field_names: fields_of::[DateBase, TimeBase]
}

type::{
  name: Date,
  type: DateBase,
  // closes the fields to only those fields named in DateTimeBase
  fields_names: fields_of::[DateBase],
}

type::{
  name: EventTime,
  one_of:[Date,DateTime]
}

Feel free to leave feedback over on amazon-ion/ion-schema#113 if you have any opinions about it.

@mtcalvin
Copy link
Author

Ah, yes, amazon-ion/ion-schema#113 looks promising, I guess I was just searching the wrong repo so I didn't find it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants