Adding the `property_ranges` query parameter and associated metadata #481

JPBergsma · 2023-06-29T15:01:19Z

This PR describes how a client can request a specific part of a property that is returned as partial data.
I have tried to keep things as close as possible to the original ranged properties proposal #452.

rartino · 2023-06-30T14:41:27Z

Thanks for writing this up!

Why did you move the whole section "Transmission of large property values"? It makes it tricky to see what has changed. Can you move it back?

Also, my interpretation of the workshop discussions was to make the property_ranges query parameter completely separate from, and orthogonal to, the mechanism for "Transmission of large property values" (this was a strong point of @sauliusg). Hence, it isn't clear to me why anything has to change in the "Transmission of large property values". Why can't we keep all documentation about this feature in the definition of the property_ranges query parameter?

JPBergsma · 2023-06-30T14:49:47Z

It seemed more logical to place the metadata section before the "Transmission of large property values" section, as the "Transmission of large property values" section depends on the metadata section but not the other way round. So it is more useful to read the metadata section before the "Transmission of large property values" section.

JPBergsma · 2023-06-30T14:57:18Z

You are right, I could still separate the property ranges query parameter and the extra metadata fields from the transmission of large data section, as each could still be used separately from the other. I'll try to do this next week.

ml-evs

Mostly formatting changes and suggesting rewordings -- I think I can go through and force most of them in if there are no objections

rartino · 2024-01-09T14:30:31Z

This PR appears a bit stalled and is important. @JPBergsma do you plan to continue working on this, or are you ok with me (or someone else) starting to merge changes into it?

vaitkus

Looks good overall, just a few minor typos.

rartino · 2024-06-14T12:27:34Z

On the topic of separators: the business with URL-safe characters is complicated. I note that according to RFC 3986 both square brackets and colons are in the same category, "gen-delims", which I think is generally worse in terms of having to be escaped in practice than the characters in "sub-delims" (and in particular worse than the unreserved characters listed right below the linked segment, which are always safe). However, specifically for square brackets, Chrome and Firefox have apparently decided to violate the standard and generally do not encode those characters out of 'tradition'.
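
To make the escaping concern concrete, here is a minimal Python sketch using the standard library's `urllib.parse.quote`. The slice string is only an illustrative value in the bracket syntax discussed in this thread, not a normative example. A strict encoder escapes both the brackets and the colons, while an encoder told to treat them as safe leaves the string readable.

```python
from urllib.parse import quote, unquote

# Illustrative slice specification (an assumption, not taken from the spec text).
spec = "dim_frames[0:999:10]"

# Escape everything except letters, digits and "_.-~".
strict = quote(spec, safe="")

# Treat the gen-delims under discussion as safe and leave them unescaped.
lenient = quote(spec, safe="[]:")

print(strict)   # dim_frames%5B0%3A999%3A10%5D
print(lenient)  # dim_frames[0:999:10]

# Either form decodes back to the same value on the server side.
assert unquote(strict) == unquote(lenient) == spec
```

Whichever separator is chosen, a server that percent-decodes the query string before parsing sees the same value, so the choice mostly affects how readable the URL is in practice.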

merkys

Looks good, I left only some minor comments.

merkys · 2024-06-14T21:43:42Z

optimade.rst

@@ -602,6 +605,162 @@ Example of the corresponding metadata property definition contained in the field
     }
     // ...

+Slices of array properties


I would suggest using "list" instead of "array" everywhere, as in the specification "list" is defined as data type, whereas "array" is used much less often and mostly as a synonym of "list".

You are right, good catch. Array should almost everywhere be replaced by list. Array should only be kept when we explicitly refer to JSON:API output arrays.

After thinking about this, I don't find a good way to handle this without defining the term 'array' to be a general word for a list or a structure of nested lists. (IMO "Multidimensional lists" is far worse as a term.) So, I've pushed a commit that does this.

(Note: 'resolve' this conversation if you are happy with the fix, otherwise comment below).

I'll leave it up to somebody more experienced in the protocol to resolve this.

giovannipizzi · 2024-07-04T07:18:03Z

optimade.rst

+  The start value specifies the first index in that dimension for which values should be returned (which is 0-based and inclusive).
+  The default is :val:`0`.
+  The stop value specifies the last index for which values should be returned (inclusive).
+  The default is the last index of the array along the specified dimension.


Suggested change

The default is the last index of the array along the specified dimension.

The default is :val:`null`, which represents the last index of the array along the specified dimension.

It's actually already explained that :val:`null` refers to the default value, and that it should be returned unless a specified value was used.
From how I understood it, the same applies to the start and step keys too.
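
As a rough illustration of the semantics quoted above (0-based start, inclusive stop, and `null` standing for the default), the following Python sketch resolves a slice to the list of indexes it selects. The function name `slice_indices` and its `axis_length` argument are hypothetical helpers for this comment, not part of the specification; the default step of 1 is taken from the step description elsewhere in this PR.

```python
def slice_indices(axis_length, start=None, stop=None, step=None):
    """Resolve an inclusive slice to concrete indexes; None means 'use the default'."""
    start = 0 if start is None else start
    stop = axis_length - 1 if stop is None else stop  # default: last index of the axis
    step = 1 if step is None else step
    # The stop value is inclusive, unlike Python's own range/slice convention.
    return list(range(start, stop + 1, step))


# All defaults select the whole axis.
print(slice_indices(5))  # [0, 1, 2, 3, 4]

# Every 10th index over 0..999 inclusive ends at 990.
frames = slice_indices(2000, stop=999, step=10)
print(frames[:4], frames[-1], len(frames))  # [0, 10, 20, 30] 990 100
```

Note the `stop + 1`: it is the one place where the inclusive convention differs from native Python slicing, and an easy spot for off-by-one errors in an implementation.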

giovannipizzi · 2024-07-04T07:24:40Z

In view of Monday's online meeting where we need to discuss this, it would be great if we can get a last check and, if all is OK, approvals.

The only remaining things are:

* a comment between @rartino and @merkys, but it seems to me that the current state is probably OK?
* the discussion on the syntax dim_xxx:start:stop:step vs dim_xxx[start:stop:step] or similar. I still think the second is easier to read, but I'm OK to keep the currently suggested version only with : if nobody else feels strongly about it

ndaelman-hu

Hey! In anticipation of the upcoming meeting, here is the promised review.

Mostly just questions for clarification, or me pointing out where the phrasing may be somewhat dense for those less familiar with OPTIMADE.

From the NOMAD perspective, my main question would be "whether array dimensions MAY also just be integers (served as strings)?"
We are also working on a system for adding (named) variables, so some dimensions may be dynamic. I don't think this conflicts with the specification per se, but I'd have to consult my coworkers about the implementation.

ndaelman-hu · 2024-08-28T14:44:56Z

optimade.rst

+Slices are used for a client to ask the server to only provide a subset of items of an array, which can result in a small or large set of items.
+In contrast, the protocol for large property values is used by the server implementation to transmit a set of items that it deems too large to provide inside the normal OPTIMADE response.


I gather that "small" and "large" have a technical meaning here (with respect to the transmission).
Do you have any kind of annotation trick to make this clearer?

Suggested change

Slices are used for a client to ask the server to only provide a subset of items of an array, which can result in a small or large set of items.

In contrast, the protocol for large property values is used by the server implementation to transmit a set of items that it deems too large to provide inside the normal OPTIMADE response.

The protocol for large property values is used by the server implementation to transmit a set of items that it deems too large to provide inside the normal OPTIMADE response.

Slices, on the other hand, are used for a client to request a subset of any size of the items of an array, which can possibly (but not necessarily) result in such a large amount of values that the protocol for large property values is required to transmit them.

I like this better, yes. Given that only large has a meaning, it's better to remove small, as you did.
2 minor suggestions:

The protocol for large property values: can this refer to the relevant section?

If we decide on using some kind of highlighting (as brought up above), I'd mark server and client. I think it's a good short-hand for navigating the protocol.

The protocol for large property values: can this refer to the relevant section?

I realize that the way these comments chop up the text can make it a bit difficult to see the context, but the link you ask for is already right on the row before this line (numbered 616 right now).

ndaelman-hu · 2024-08-28T14:48:05Z

optimade.rst

+In contrast, the protocol for large property values is used by the server implementation to transmit a set of items that it deems too large to provide inside the normal OPTIMADE response.
+
+The main mechanism is provided through the query parameter :query-param:`property_slices` defined in section `Single Entry URL Query Parameters`_.
+Information relating to the ability of the server to handle this query parameter and the relevant ranges of indexes is provided using metadata property field :field:`array_axes` (see `Metadata properties`_).


the relevant ranges of indexes

Above you used the term "array axis". I'd use the same term here, as with "index" idk whether you mean the dimension or a slice along a dimension.

I'm not sure what you mean here, can you use the GitHub suggest edit feature to show a suggestion? (Also, the discussion of terminology with regards to indices and axes may have been clarified by a discussion we had last web meet?)

(Also, the discussion of terminology with regards to indices and axes may have been clarified by a discussion we had last web meet?)

I'd need a refresher there, sorry.

Suggested change

Information relating to the ability of the server to handle this query parameter and the relevant ranges of indexes is provided using metadata property field :field:`array_axes` (see `Metadata properties`_).

Information relating to the ability of the server to handle this query parameter. This SHOULD include the property field :field:`array_axes` (see `Metadata properties`_), listing the numerical indices of axes that may be sliced.

I'm confused, this doesn't read cleanly to me in the context it appears. The first sentence in the suggestion isn't even a complete sentence?

The context here is that we are in the general text at the top of the section trying to outline what this feature is and how it is used. The sentence right now is just trying to communicate that the metadata property array_axes is one of the essential mechanisms of the protocol.

ndaelman-hu · 2024-08-28T15:32:27Z

optimade.rst

+  The step value specifies the step size in that dimension.
+  The default is :val:`1`.
+
+  An empty value of the :query-param:`property_slices` query parameter MUST be interpreted as equivalent to the query parameter not being included in the request.


Is this then equivalent to property_ranges=[<first_dim>:::, ...]?
I.e. should this return the full property or leave it out?

This might be defined somewhere else. If so, pls refer to that section.

The query parameter not being included means the client isn't trying to use this protocol; so I think the natural interpretation is that "everything works as usual without the slices protocol". I guess that could be clarified somewhere, but I'm not sure exactly where.

ndaelman-hu · 2024-08-28T15:40:57Z

optimade.rst

+
+  - :query-url:`http://optimade.example.com/v1/trajectories/id_12345?response_fields=frame_cartesian_site_positions&property_ranges=dim_frames::999:10,dim_sites:30:70:`
+
+    This query URL requests items from the array :field:`frame_cartesian_site_positions` only for the 31st to 71st sites (i.e., with indexes 30 through 70 inclusive) for 1 out of every 10 frames of the first 1000 frames (i.e., taking steps of 10 over indexes 0 through 999 inclusive, which requests the frames with indexes 0, 10, 20, 30, ..., 990) of a trajectory with ID :val:`id_12345`.


Minor style changes to make the phrase clearer:

Split the phrase up by both attributes.

Align the order of the example with the explanation, i.e. concerning dim_frames and dim_sites.

Just reiterate the 0-indexing convention after "the 31st to 71st sites". Apart from that, it's a great refresher.

Saying "the frames with indexes 0, 10, 20, 30, ..., 990" explains it enough. Otherwise the parentheses become too verbose.

ndaelman-hu · 2024-08-28T15:44:34Z

optimade.rst

@@ -1211,6 +1375,31 @@ While the following URL query parameters are OPTIONAL for clients, API implement
 The URL query parameter :query-param:`include` is OPTIONAL for both clients and API implementations.
 The meaning of these URL query parameters are as defined above in section `Entry Listing URL Query Parameters`_.

+One additional query parameter :query-param:`property_slices` MUST be handled by the API implementation either as defined below or by returning the error :http-error:`501 Not Implemented`:
+
+- **property\_slices**: A number of slice specifications to request only parts of array properties for the functionality described in `Slices of array properties`_.


May the client request multiple (different) slices from the same array axis in the same query?
E.g. property_slices=[dim_frames:3:37:5,dim_frames:49:105:2]

This is clarified in "REQUIRED keys". A link suffices then.

In the present version (which may have changed since you commented) I think this information is quite clear in the text right below.

ndaelman-hu · 2024-08-28T15:47:26Z

optimade.rst

+    For example, let us consider the property :property:`frame_cartesian_site_positions` of the trajectory entry, where the first dimension name is :val:`dim_frames`.
+    If there is another one-dimensional (i.e., with a single axis) array property :property:`_exmpl_energy` of the same trajectory entry that specifies in its property definition the same dimension name :val:`dim_frames` for its axis, then the values of :property:`_exmpl_energy` and of :property:`frame_cartesian_site_positions` at index *i* pertain to the same frame.


Here, you lost me.

giovannipizzi · 2024-09-06T15:47:10Z

As a follow-up on my comment, the decision in today's online meeting is to go for the syntax dim_xxx[start:stop:step]
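
For readers who want to see what handling the agreed `dim_xxx[start:stop:step]` form could look like, here is a minimal Python sketch of a parser. It is an assumption-laden illustration, not the normative grammar: the function name `parse_property_slices`, the comma-separated layout (mirroring the earlier query examples in this PR), and the restriction to word characters and non-negative integers are all choices made for this sketch.

```python
import re

# Hypothetical grammar: a name, then [start:stop:step] where each part may be empty.
_SLICE = re.compile(r"(?P<name>\w+)\[(?P<start>\d*):(?P<stop>\d*):(?P<step>\d*)\]")


def parse_property_slices(value):
    """Parse e.g. 'dim_frames[:999:10],dim_sites[30:70:]' into a dict of tuples."""
    if not value:
        # An empty value is treated like an absent query parameter.
        return {}
    slices = {}
    for part in value.split(","):
        match = _SLICE.fullmatch(part.strip())
        if match is None:
            raise ValueError(f"malformed slice specification: {part!r}")
        # Keep omitted parts as None so defaults stay distinguishable from explicit values.
        slices[match["name"]] = tuple(
            int(text) if text else None
            for text in match.group("start", "stop", "step")
        )
    return slices


print(parse_property_slices("dim_frames[:999:10],dim_sites[30:70:]"))
# {'dim_frames': (None, 999, 10), 'dim_sites': (30, 70, None)}
```

Keeping omitted parts as `None` rather than substituting defaults during parsing makes it straightforward to echo back what the client actually sent.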

ndaelman-hu · 2024-09-25T17:21:57Z

Coming back to last meeting's discussion about the slicing, I'd like to explain my question further via an example.

In NOMAD, we have a tree-like schema with data at the nodes. That data may have any tensor rank, i.e. scalar, vector, matrix, etc. and slicing it is very sensible. However, our schema is being extended to be more dynamic and easier to tailor.

Consequently, we foresee that rank may vary. Consider for example a property, like forces, sliceable along the constituent atoms or spatial axes (e.g. x, y, z). Any entry can extend that definition's rank by adding independent variables, e.g. forces(time). A client could now, in principle, also slice along the time dimension. The table below lists various examples of independent variables and the full rank:

dependent variable | rank dependent variable | independent variables | rank independent variables | full rank
--|--|--|--|--
forces | 2 | time | 1 | 3
forces | 2 | temperature | 1 | 3
forces | 2 | electric field | 2 | 4
forces | 2 | strain | 3 | 5

At a database-wide level, we can only ensure a minimum set of dimensionalities for forces, i.e. the force vector over each atom. Additional, entry-specific dimensions could at best only be returned after a query has filtered down the data. Does this limitation conflict with "query parameter :query-param:property_slices for metadata"?

Lastly, while I named the dimensions here, most of them only bear integers in practice (at least the dependent ones). Would integers (represented as strings) also be fine to identify dimensions, or is that semantically too vague?

rartino · 2024-10-18T10:50:20Z

optimade.rst

+  - :field:`requested_slice`: Dictionary.
+    A field that describes the requested slice that was provided via the query parameter :query-param:`property_slices`.
+    The subfields MUST reflect the values provided via the :query-param:`property_slices`.
+    The implementation MUST preserve the values as given in the query parameter, including the distinction between specific values and default values even when they are equivalent.


Suggested change

The implementation MUST preserve the values as given in the query parameter, including the distinction between specific values and default values even when they are equivalent.

The implementation MUST preserve the values as given in the query parameter, including the distinction between specific values and default values even when they are equivalent (see example below).

rartino · 2024-10-18T12:56:15Z

optimade.rst

+  - :query-url:`http://optimade.example.com/v1/trajectories/id_12345?response_fields=frame_cartesian_site_positions&property_ranges=dim_frames::999:10,dim_sites:30:70:`
+
+    This query URL requests items from the array :field:`frame_cartesian_site_positions` only for the 31st to 71st sites (i.e., with indexes 30 through 70 inclusive) for 1 out of every 10 frames of the first 1000 frames (i.e., taking steps of 10 over indexes 0 through 999 inclusive, which requests the frames with indexes 0, 10, 20, 30, ..., 990) of a trajectory with ID :val:`id_12345`.


I think this addresses @ndaelman-hu's suggestions, except for removing the explanation in the last parenthesis, which, from our discussions at the workshop, I do think is still necessary to be completely clear.

Suggested change

- :query-url:`http://optimade.example.com/v1/trajectories/id_12345?response_fields=frame_cartesian_site_positions&property_ranges=dim_frames::999:10,dim_sites:30:70:`

This query URL requests items from the array :field:`frame_cartesian_site_positions` only for the 31st to 71st sites (i.e., with indexes 30 through 70 inclusive) for 1 out of every 10 frames of the first 1000 frames (i.e., taking steps of 10 over indexes 0 through 999 inclusive, which requests the frames with indexes 0, 10, 20, 30, ..., 990) of a trajectory with ID :val:`id_12345`.

- :query-url:`http://optimade.example.com/v1/trajectories/id_12345?response_fields=frame_cartesian_site_positions&property_ranges=dim_sites:30:70:,dim_frames::999:10`

This query URL requests items from the trajectory with ID :val:`id_12345`.

It requests items from the array :field:`frame_cartesian_site_positions` for this trajectory.

The items that are requested are for only the 31st to 71st sites (i.e., with indexes 30 through 70 inclusive) for 1 out of every 10 frames of the first 1000 frames (i.e., taking steps of 10 over indexes 0 through 999 inclusive, which requests the frames with indexes 0, 10, 20, 30, ..., 990).

rartino · 2024-10-18T13:14:48Z

optimade.rst

+    Dimension names defined by database or definition providers MUST be prefixed by the corresponding database or namespace prefix, and SHOULD also be prefixed by ``dim_``, e.g., ``_exmpl_dim_particles``.
+    If, within one entry, two or more array axes in one or more properties share the same dimension :field:`name`, those represent the same dimension.
+    For example, let us consider the property :property:`frame_cartesian_site_positions` of the trajectory entry, where the first dimension name is :val:`dim_frames`.
+    If there is another one-dimensional (i.e., with a single axis) array property :property:`_exmpl_energy` of the same trajectory entry that specifies in its property definition the same dimension name :val:`dim_frames` for its axis, then the values of :property:`_exmpl_energy` and of :property:`frame_cartesian_site_positions` at index *i* pertain to the same frame.


@ndaelman-hu is this more clear?:

Suggested change

If there is another one-dimensional (i.e., with a single axis) array property :property:`_exmpl_energy` of the same trajectory entry that specifies in its property definition the same dimension name :val:`dim_frames` for its axis, then the values of :property:`_exmpl_energy` and of :property:`frame_cartesian_site_positions` at index *i* pertain to the same frame.

Let the trajectory entry in this example have another, one-dimensional, array property :property:`_exmpl_energy`, which in its property definition specifies *the same name*, :val:`dim_frames`, as the name of the axis corresponding to its single dimension.

The joint dimension name means the values of :property:`_exmpl_energy` and of :property:`frame_cartesian_site_positions` at index *i* pertain to the same frame.

If slicing is used to request only parts of the data along the :val:`dim_frames` dimension, that is a request to slice both the properties according to the specified slice.
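
A small Python sketch may help show what "slice both the properties" means in practice. Everything here is illustrative: the helper names `slice_axis` and `slice_entry`, the tiny data set, and the `dimension_names` mapping are assumptions made for this comment rather than anything defined by the specification.

```python
def slice_axis(data, axis, indices):
    """Keep only `indices` along nesting level `axis` of a nested list."""
    if axis == 0:
        return [data[i] for i in indices]
    return [slice_axis(item, axis - 1, indices) for item in data]


def slice_entry(properties, dimension_names, dim, indices):
    """Apply one slice to every axis, in every property, that carries the name `dim`."""
    sliced = {}
    for name, data in properties.items():
        for axis, axis_dim in enumerate(dimension_names[name]):
            if axis_dim == dim:
                data = slice_axis(data, axis, indices)
        sliced[name] = data
    return sliced


# Toy trajectory: 4 frames, 2 sites, 3 coordinates per site.
properties = {
    "frame_cartesian_site_positions": [
        [[float(frame), float(site), 0.0] for site in range(2)] for frame in range(4)
    ],
    "_exmpl_energy": [-1.0, -1.5, -2.0, -2.5],
}
dimension_names = {
    "frame_cartesian_site_positions": ["dim_frames", "dim_sites", "dim_spatial"],
    "_exmpl_energy": ["dim_frames"],
}

# Requesting frames 0 and 2 slices both properties, because both name dim_frames.
result = slice_entry(properties, dimension_names, "dim_frames", [0, 2])
print(result["_exmpl_energy"])  # [-1.0, -2.0]
print(len(result["frame_cartesian_site_positions"]))  # 2
```

After slicing, index *i* in either returned property still refers to the same frame, which is the point of sharing the dimension name.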

rartino · 2024-10-18T13:45:57Z

I have now updated the PR to reflect the syntax dim_xxx[start:stop:step].

I have also tried to somehow address all outstanding comments of @ndaelman-hu. It would be great if you can:

* "thumb up" any edit suggestions that match your intent.
* counter-suggest edits if my suggested edits do not match your intent.
* press 'resolve' on any conversations where I replied without an edit where you think the matter is resolved.
* reply in any conversation where you think the matter is not resolved.

Finally, as a reply to this:

At a database-wide level, we can only ensure a minimum set of dimensionalities for forces, i.e. the force vector over each atom. Additional, entry-specific dimensions could at best only be returned after a query has filtered down the data. Does this limitation conflict with "query parameter :query-param:property_slices for metadata"?

Just to make sure I understand: do you mean that according to the new schemas of NOMAD:

* There exists a concept "force".
* The NOMAD schemas say "a force is an array of at least 3xN spatial dimensions", e.g., force = [3.2, 4.2, 6.4], [6.2, 3.6, 6.2] for a two-atom system (in units of, e.g., E_h/a0, which you also specify somehow).
* However, at any time when I (as a client) get a force from NOMAD, it may instead be an array of dimension, e.g., 3 x N x 2064, or even 3 x N x 2064 x 160, or 3 x N x 160 x 2064, where the force is resolved along a new 'dimension' such as time and (just to add something) trajectory_set_index (to index 160 trajectory sets).
* Presumably, I can get information about what these dimensions are and in which order they come; however, NOMAD does not already assign these different options different names, they are all 'force'?

Because then the answer is that you cannot today represent this kind of freedom as a single OPTIMADE property definition. A single property has to have a fixed dimensionality. However, if you are OK with dynamically assigning alternative names to the different options, you can translate your data representation to that of OPTIMADE.

Let's for the moment disregard that if your example with forces vs time is actually the atomic forces in a trajectory, we will already have a specific standard field for that; let's instead just discuss these as if they were completely custom NOMAD properties.

What you would do is provide a set of force property definitions:

* If your force is a 3 x N array, then you define _nomad_force via an OPTIMADE property definition to be a two-dimensional array with the dimension names dim_spatial and dim_sites.
* If your force is a 3 x N x 2064 array with forces for 2064 times, you'd call it force_per_time and define it via an OPTIMADE property definition to be a three-dimensional array with the dimension names dim_spatial, dim_sites, and _nomad_dim_time.
* and so on...

Lastly, while I named the dimensions here, most of them only bear integers in practice (at least the dependent ones). Would integers (represented as strings) also be fine to identify dimensions, or is that semantically too vague?

In the OPTIMADE property definitions, a dimension name is a string. There is nothing preventing you from generating OPTIMADE dimension names from your numbers, e.g., _nomad_dim_force_3 as the dimension name of the third axis of your force property.

ndaelman-hu · 2024-10-18T13:58:44Z

I have now updated the PR to reflect the syntax dim_xxx[start:stop:step].

I have also tried to somehow address all outstanding comments of @ndaelman-hu. It would be great if you can:
* "thumb up" any edit suggestions that match your intent.

* counter-suggest edits if my suggested edits do not match your intent.

* press 'resolve' on any conversations where I replied without an edit where you think the matter is resolved.

* reply in any conversation where you think the matter is not resolved.
Finally, as a reply to this:

At a database-wide level, we can only ensure a minimum set of dimensionalities for forces, i.e. the force vector over each atom. Additional, entry-specific dimensions could at best only be returned after a query has filtered down the data. Does this limitation conflict with "query parameter :query-param:property_slices for metadata"?

Just to make sure I understand: do you mean that according to the new schemas of NOMAD:
* There exists a concept "force".

* The NOMAD schemas say "a force is an array of at least 3xN spatial dimensions", e.g., force = [3.2, 4.2, 6.4], [6.2, 3.6, 6.2] for a two-atom system (in units of, e.g., E_h/a0, which you also specify somehow).

* However, at any time when I (as a client) get a force from NOMAD, it may instead be an array of dimension, e.g., 3 x N x 2064, or even 3 x N x 2064 x 160, or 3 x N x 160 x 2064, where the force is resolved along a new 'dimension' such as time and (just to add something) trajectory_set_index (to index 160 trajectory sets).

Thank you for clarifying @rartino ! Indeed, you understood the example correctly.

* Presumably, I can get information about what these dimensions are and in which order they come; however, NOMAD does not already assign these different options different names, they are all 'force'?

While NOMAD natively provides aggregation queries, those are more so for overall statistics.
In some cases, the dimensions are stored as individual properties that may be returned. This isn't applicable across the board, however.

Because then the answer is that you cannot today represent this kind of freedom as a single OPTIMADE property definition. A single property has to have a fixed dimensionality. However, if you are OK with dynamically assigning alternative names to the different options, you can translate your data representation to that of OPTIMADE.

Let's for the moment disregard that if your example with forces vs time is actually the atomic forces in a trajectory, we will already have a specific standard field for that; let's instead just discuss these as if they were completely custom NOMAD properties.

What you would do is provide a set of force property definitions:
* If your `force` is a 3 x N array, then you define `_nomad_force` via an OPTIMADE property definition to be a two-dimensional array with the dimension names `dim_spatial` and `dim_sites`.

* If your `force` is a 3 x N x 2064 array with forces for 2064 times, you'd call it `force_per_time` and define it via an OPTIMADE property definition to be a three-dimensional array with the dimension names `dim_spatial`, `dim_sites`, and `_nomad_dim_time`.

* and so on...

This dynamic dimensionality has been requested in several cases. Atm, we have a preliminary implementation (in our new, revised schema). The final form isn't set yet, however. If the dimensionality remains dynamic, we'll make sure to project out the varieties as you suggested here.

Lastly, while I named the dimensions here, most of them only bear integers in practice (at least the dependent ones). Would integers (represented as strings) also be fine to identify dimensions, or is that semantically too vague?

In the OPTIMADE property definitions, a dimension name is a string. There is nothing preventing you from generating OPTIMADE dimension names from your numbers, e.g., _nomad_dim_force_3 as the dimension name of the third axis of your force property.

Yes, I just wanted to know whether this was appropriate. Thx for confirming.

sauliusg · 2024-10-18T14:11:30Z

optimade.rst

- **dictionary**: an associative array of **keys** and **values**, where **keys** are pre-determined strings, i.e., for the same entry property, the **keys** remain the same among different entries whereas the **values** change.
+  Multidimensional collections of items are represented as nested lists.
+  The specification uses **array** as a more general term for structures of nested lists representing single or multidimensional data, and the term **array axes** for the levels of nesting.
+  Note that arrays are represented using lists and not as a separate data type.


In JSON? If we move to e.g. HDF5 then there the arrays will be a real distinct data type.

This part discusses the data model internal to OPTIMADE itself, where we (for now) only formally need lists in lists (but we say here ~"let's use the term 'arrays' for lists in lists"). How this data model is mapped onto JSON types is first described in section 4.2, where indeed lists -> JSON lists. A future section discussing HDF5, or for that matter XML, parquet, etc., would define how these types are mapped into native data types. It would be valid for HDF5 to say that lists in lists (in lists...) should be mapped to the HDF5 array format.

If you ask "why don't we just adopt arrays as a fundamental data type and say that in JSON arrays map to lists in lists", I think that would be a valid change (but not in this PR).

sauliusg · 2024-10-18T14:13:35Z

optimade.rst

@@ -218,7 +218,10 @@ representation in all contexts. They are as follows:
 - Basic types: **string**, **integer**, **float**, **boolean**, **timestamp**.
 - **list**: an ordered collection of items, where all items are of the same type, unless they are unknown.
  A list can be empty, i.e., contain no items.
- **dictionary**: an associative array of **keys** and **values**, where **keys** are pre-determined strings, i.e., for the same entry property, the **keys** remain the same among different entries whereas the **values** change.
+  Multidimensional collections of items are represented as nested lists.
+  The specification uses **array** as a more general term for structures of nested lists representing single or multidimensional data, and the term **array axes** for the levels of nesting.


Would it make sense to require that a multidimensional array, as opposed to just a list of lists, MUST always have the same size for all sublists at the same dimension? I.e. a matrix (2D array) can be square or rectangular, but cannot be triangular or arbitrarily ragged?

Right; we deliberately wanted to allow raggedness in the design when referencing items using the property_ranges protocol.

I guess I should take your comment to mean: "should we reserve the term Array for non-ragged multidimensional data?" That isn't a bad idea, but, what is then a good term for lists-in-lists-in-lists that can be ragged? Because this PR got very tricky to formulate without a specific term for that (which now is "Array"...)
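
To pin down the distinction being discussed, here is a minimal Python sketch that tells a regular (rectangular) nested list from a ragged one. The helper name `regular_shape` is hypothetical, and the choice to report raggedness as `None` is just a convention for this sketch; nothing here is prescribed by the specification.

```python
def regular_shape(data):
    """Return the shape of a nested list as a tuple, or None if it is ragged."""
    if not isinstance(data, list):
        return ()  # a scalar item contributes no further axes
    shapes = [regular_shape(item) for item in data]
    if any(shape is None for shape in shapes):
        return None  # raggedness found deeper down
    if len(set(shapes)) > 1:
        return None  # siblings at this level disagree on their shape
    inner = shapes[0] if shapes else ()
    return (len(data),) + inner


print(regular_shape([[1, 2, 3], [4, 5, 6]]))  # (2, 3)
print(regular_shape([[1], [2, 3]]))           # None
```

Under the current wording both inputs would count as arrays; reserving the term for the first kind would need a separate name for the second.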

Co-authored-by: ndaelman-hu

JPBergsma marked this pull request as ready for review June 29, 2023 16:32

JPBergsma requested review from rartino, vaitkus, gmrigna, sauliusg and giovannipizzi June 29, 2023 16:32

blokhin changed the title ~~Property_Ranges Querry Parameter~~ Property_Ranges Query Parameter Jun 29, 2023

vaitkus reviewed Jun 29, 2023

rartino mentioned this pull request Jun 30, 2023

The road to trajectories #469

JPBergsma commented Aug 21, 2023

JPBergsma mentioned this pull request Oct 12, 2023

JPBergsma/partial data Materials-Consortia/optimade-python-tools#1812

Draft

JPBergsma and others added 10 commits December 19, 2023 14:07

Intermediate progress writing property ranges parameter.

3b9cbc5

Added metadata fields to be able to use property ranges query parameter.

eae1679

Small corrections.

f00b52d

moved property ranges to single_entryendpoint query param + further r…

2a0bef8

…efinement text

Some small changes after proof reading.

4a00af4

added that fields should be placed in the range dictionary.

6905d7f

Some more sentences that had to be moved to a new line.

d579c6e

Update optimade.rst

6792962

Make sure each sentence starts on a new line. Co-authored-by: Antanas Vaitkus <[email protected]>

removed unintentional changes in appendix.

93ecd0b

Seperated property_ranges query parameter from the partial data.

a84c72e

ml-evs force-pushed the JPBergsma/property_ranges branch from d2a18b7 to a84c72e Compare December 19, 2023 14:08

ml-evs added the blocking-release This is a PR or issue that presently blocks the release of next version of the spec. label Dec 19, 2023

ml-evs requested changes Dec 19, 2023

Merge branch 'develop' into JPBergsma/property_ranges

dc11574

ml-evs changed the title ~~Property_Ranges Query Parameter~~ Adding the property_ranges query parameter and associated metadata Dec 19, 2023

vaitkus reviewed Jun 14, 2024

merkys requested changes Jun 14, 2024

rartino reviewed Jun 15, 2024

rartino and others added 4 commits June 15, 2024 15:30

Apply suggestions from review

7e29652

Co-authored-by: Andrius Merkys <[email protected]> Co-authored-by: Antanas Vaitkus <[email protected]>

Define the term array for OPTMADE data types

085030d

Fix examples for slicing assuming a trajectory cartesian_site_coordinates with different format than the structure one

b3a96a0

Adjust confusing phrasing about slicing based on review comment

6231d2a

vaitkus reviewed Jun 21, 2024

Apply suggestions from code review

9f639d1

Co-authored-by: Antanas Vaitkus <[email protected]>

giovannipizzi reviewed Jul 4, 2024

Apply suggestions from code review

46eafb5

Co-authored-by: Antanas Vaitkus <[email protected]>

giovannipizzi requested review from merkys, rartino and vaitkus July 4, 2024 07:22

ndaelman-hu reviewed Aug 28, 2024

rartino added 2 commits October 18, 2024 11:11

Change the format of property_slices to the syntax dim_xxx[start:stop:step]

657a696

Fix in formulation for property_slices format dim_xxx[start:stop:step]

919b83d

rartino reviewed Oct 18, 2024

rartino reviewed Oct 18, 2024

sauliusg reviewed Oct 18, 2024

Apply suggestions from review

a623390

Co-authored-by: ndaelman-hu <[email protected]>

	The default is the last index of the array along the specified dimension.
	The default is :val:`null`, which represents the last index of the array along the specified dimension.

		Slices are used for a client to ask the server to only provide a subset of items of an array, which can result in a small or large set of items.
		In contrast, the protocol for large property values is used by the server implementation to transmit a set of items that it deems too large to provide inside the normal OPTIMADE response.

	Information relating to the ability of the server to handle this query parameter and the relevant ranges of indexes is provided using metadata property field :field:`array_axes` (see `Metadata properties`_).
	Information relating to the ability of the server to handle this query parameter. This SHOULD include the property field :field:`array_axes` (see `Metadata properties`_), listing the numerical indices of axes that may be sliced.


		- :query-url:`http://optimade.example.com/v1/trajectories/id_12345?response_fields=frame_cartesian_site_positions&property_ranges=dim_frames::999:10,dim_sites:30:70:`

		This query URL requests items from the array :field:`frame_cartesian_site_positions` only for the 31st to 71st sites (i.e., with indexes 30 through 70 inclusive) for 1 out of every 10 frames of the first 1000 frames (i.e., taking steps of 10 over indexes 0 through 999 inclusive, which requests the frames with indexes 0, 10, 20, 30, ..., 990) of a trajectory with ID :val:`id_12345`.
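The index arithmetic in this example can be checked with a short Python sketch. It assumes the `name:start:stop:step` form shown in the query URL above, zero-based indexes, and an inclusive stop, as the example text describes. The defaults used for an empty start (0) and an empty step (1) are assumptions inferred from the example; only the default for stop (the last index) is stated in the excerpt. The function names and the array lengths 5000 and 200 are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (start, stop, step); None marks a field the client left empty.
SliceSpec = Tuple[Optional[int], Optional[int], Optional[int]]


def parse_property_ranges(value: str) -> Dict[str, SliceSpec]:
    """Parse comma-separated 'name:start:stop:step' items (illustrative only)."""
    specs: Dict[str, SliceSpec] = {}
    for item in value.split(","):
        name, start, stop, step = item.split(":")
        specs[name] = (
            int(start) if start else None,
            int(stop) if stop else None,
            int(step) if step else None,
        )
    return specs


def selected_indexes(spec: SliceSpec, length: int) -> List[int]:
    """List the zero-based indexes a slice selects; stop is inclusive."""
    start, stop, step = spec
    first = 0 if start is None else start          # assumed default
    last = length - 1 if stop is None else stop    # default: last index
    stride = 1 if step is None else step           # assumed default
    return list(range(first, last + 1, stride))


specs = parse_property_ranges("dim_frames::999:10,dim_sites:30:70:")
frame_indexes = selected_indexes(specs["dim_frames"], length=5000)
site_indexes = selected_indexes(specs["dim_sites"], length=200)
```

Here `frame_indexes` runs 0, 10, 20, ..., 990 (100 frames) and `site_indexes` runs 30 through 70 (41 sites). Keeping `None` for omitted fields, rather than substituting the defaults while parsing, is one way to retain the distinction between given and default values that the text asks implementations to preserve.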

		For example, let us consider the property :property:`frame_cartesian_site_positions` of the trajectory entry, where the first dimension name is :val:`dim_frames`.
		If there is another one-dimensional (i.e., with a single axis) array property :property:`_exmpl_energy` of the same trajectory entry that specifies in its property definition the same dimension name :val:`dim_frames` for its axis, then the values of :property:`_exmpl_energy` and of :property:`frame_cartesian_site_positions` at index i pertain to the same frame.

	The implementation MUST preserve the values as given in the query parameter, including the distinction between specific values and default values even when they are equivalent.
	The implementation MUST preserve the values as given in the query parameter, including the distinction between specific values and default values even when they are equivalent (see example below).

-  - :query-url:`http://optimade.example.com/v1/trajectories/id_12345?response_fields=frame_cartesian_site_positions&property_ranges=dim_frames::999:10,dim_sites:30:70:`
-    This query URL requests items from the array :field:`frame_cartesian_site_positions` only for the 31st to 71st sites (i.e., with indexes 30 through 70 inclusive) for 1 out of every 10 frames of the first 1000 frames (i.e., taking steps of 10 over indexes 0 through 999 inclusive, which requests the frames with indexes 0, 10, 20, 30, ..., 990) of a trajectory with ID :val:`id_12345`.
+  - :query-url:`http://optimade.example.com/v1/trajectories/id_12345?response_fields=frame_cartesian_site_positions&property_ranges=dim_sites:30:70:,dim_frames::999:10`
+    This query URL requests items from the trajectory with ID :val:`id_12345`.
+    It requests items from the array :field:`frame_cartesian_site_positions` for this trajectory.
+    The items that are requested are for only the 31st to 71st sites (i.e., with indexes 30 through 70 inclusive) for 1 out of every 10 frames of the first 1000 frames (i.e., taking steps of 10 over indexes 0 through 999 inclusive, which requests the frames with indexes 0, 10, 20, 30, ..., 990).

-    If there is another one-dimensional (i.e., with a single axis) array property :property:`_exmpl_energy` of the same trajectory entry that specifies in its property definition the same dimension name :val:`dim_frames` for its axis, then the values of :property:`_exmpl_energy` and of :property:`frame_cartesian_site_positions` at index *i* pertain to the same frame.
+    Let the trajectory entry in this example have another, one-dimensional, array property :property:`_exmpl_energy`, which in its property definition specifies *the same name*, :val:`dim_frames`, as the name of the axis corresponding to its single dimension.
+    The joint dimension name means the values of :property:`_exmpl_energy` and of :property:`frame_cartesian_site_positions` at index *i* pertain to the same frame.
+    If slicing is used to request only parts of the data along the :val:`dim_frames` dimension, that is a request to slice both the properties according to the specified slice.
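The effect of a shared dimension name can be sketched in Python as follows. This is an illustration under stated assumptions, not the reference implementation: the helper `slice_nested`, the toy data, and the third dimension name `dim_spatial` are hypothetical, slice stops are treated as inclusive as in the example above, and indexes are assumed non-negative. Note that Python slicing silently clamps out-of-range stops, which is a property of this sketch and not a statement about required server behavior.

```python
from typing import Any, Dict, List, Optional, Tuple

SliceSpec = Tuple[Optional[int], Optional[int], Optional[int]]


def slice_nested(data: Any, dim_names: List[str], specs: Dict[str, SliceSpec]) -> Any:
    """Slice a nested list axis by axis, looking up each axis by its dimension name."""
    if not dim_names:
        return data
    start, stop, step = specs.get(dim_names[0], (None, None, None))
    end = None if stop is None else stop + 1  # inclusive stop -> Python's exclusive end
    selected = data[slice(start, end, step)]
    return [slice_nested(item, dim_names[1:], specs) for item in selected]


# Toy trajectory: 6 frames, 4 sites; each position is [frame, site, 0].
positions = [[[f, s, 0] for s in range(4)] for f in range(6)]
energies = [float(f) for f in range(6)]

# Every second frame among indexes 0..4, and sites with indexes 1..2.
specs = {"dim_frames": (None, 4, 2), "dim_sites": (1, 2, None)}

sliced_positions = slice_nested(
    positions, ["dim_frames", "dim_sites", "dim_spatial"], specs
)
sliced_energies = slice_nested(energies, ["dim_frames"], specs)
```

Both results now cover frames 0, 2, and 4, so item *i* of `sliced_energies` still pertains to the same frame as item *i* of `sliced_positions`. The `dim_sites` slice has no effect on `energies` because that property does not declare an axis with that name.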

JPBergsma commented Jun 29, 2023

rartino commented Jun 30, 2023

JPBergsma commented Jun 30, 2023

JPBergsma commented Jun 30, 2023

ml-evs left a comment

rartino commented Jan 9, 2024

vaitkus left a comment

rartino commented Jun 14, 2024

merkys left a comment

rartino Jun 15, 2024

giovannipizzi commented Jul 4, 2024

ndaelman-hu left a comment

rartino Oct 18, 2024

ndaelman-hu Oct 18, 2024

giovannipizzi commented Sep 6, 2024

ndaelman-hu commented Sep 25, 2024

rartino commented Oct 18, 2024

ndaelman-hu commented Oct 18, 2024
