Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CQL2 - comparison with NULL values #963

Open
pmauduit opened this issue Nov 7, 2024 · 8 comments
Open

CQL2 - comparison with NULL values #963

pmauduit opened this issue Nov 7, 2024 · 8 comments
Labels

Comments

@pmauduit
Copy link

pmauduit commented Nov 7, 2024

I am in the process of updating the geotools CQL2 module to match with the current version of the specifications, see geoserver/geoserver-ogcapi#10 for the context.

Relying on the ATS - A.3.5. conformance Test 8 - Table 7, the following filter: "date"<>DATE('2022-04-16') run onto the ne_110m_populated_places_simple dataset is expected to return only 2 features. In the current state of the GeoTools code, it returns 242 features instead. Opening the dataset into QGis, we can see that there are 240 NULL values which are also returned when querying the dataset with the filter.

Shall we consider that a NULL date evaluates "different from '2022-04-16'" as the current GeoTools codebase does, or shall we exclude NULL values from the evaluation ?

@cportele
Copy link
Member

cportele commented Nov 7, 2024

From the standard:

A predicate is an expression that evaluates to the Boolean values of TRUE or FALSE or that evaluates to the value NULL when dealing with unknown values.

If a CQL2 filter expression evaluates to TRUE for an item, the item is included in the result set and is thus available for further processing such as presentation in a response document.

If a CQL2 filter expression overall evaluates to FALSE or NULL for an item, the item is not included in the result set and is thus not available for further processing.

If "date" is NULL the filter evaluates to NULL and the item is not in the result set. So, 2 is the correct result.

@aaime
Copy link
Contributor

aaime commented Nov 7, 2024

So the difference is that CQL2 logic is a 3 state logic, unlike the original CQL and OGC Filter, which was a 2 state logic (from OGC Filter, "The overall filter expression always evaluates to true or false"), and thus a system implementing both, needs to know where the filter was parsed from, and modify it accordingly?

@cportele
Copy link
Member

cportele commented Nov 7, 2024

As far as I know/remember, there is no difference in the intention between CQL2, Filter Encoding. It was just that "unknown" was somehow folded into "false" in Filter Encoding, but without discussing the handling of NULL/unknown explicitly. @pvretano?

The wording in CQL2 is the result of a discussion about this lack of clarity. See #841.

@aaime
Copy link
Contributor

aaime commented Nov 7, 2024

I see the discussion and appreciate it... but at the same time, I don't think Filter Encoding left room for doubt.

Quoting a full section of the introduction:

Each data instance in the source set is evaluated using the filter expression. The overall filter expression
always evaluates to true or false. If the expression evaluates to true, the data instance satisfies the expression
and is marked as being in the result set. If the overall filter expression evaluates to false, the data instance is
not in the result set

Quoting from section 9, comparison operators:

A comparison operator is used to form expressions that evaluate the mathematical
comparison between two arguments. If the arguments satisfy the comparison then the
expression evaluates to TRUE. Otherwise the expression evaluates to false.

Same goes for spatial operators:

A spatial operator (see Figure 6) shall determine whether its geometric arguments satisfy the stated spatial
relationship. The operator shall evaluate to true if the spatial relationship is satisfied. Otherwise, the operator
shall evaluate to false

@cportele
Copy link
Member

cportele commented Nov 7, 2024

NULL<>DATE('2022-04-16') does not evaluate to TRUE in Filter Encoding, because it is unknown, if the NULL value is that literal date. NULL does not indicate no value, it indicates an unknown or absent value. Hence the result of the predicate is FALSE in Filter Encoding, too. What is arguably missing in Filter Encoding is an explicit discussion of NULL. It was probably just assumed that the standard SQL behavior applies and everybody is familiar with it. That is at least my understanding. @pvretano will correct me, if I am wrong.

In any case, the logic in CQL2 is intentional.

@aaime
Copy link
Contributor

aaime commented Nov 7, 2024

@cportele that's interesting, can you point me where in the Filter Encoding spec I can find that the comparison filter should return false for that case?

In GeoTools we made some complex SQL encoding additions to force filter evaluation to a 2-state logic, if I can legitimately remove it, that would simplify things and I can point users back to this discussion (and would also allow 3-state logic to work out of the box at least against database sources).

@cportele
Copy link
Member

cportele commented Nov 7, 2024

@aaime - As I wrote before, I don't think Filter Encoding is clear how to evaluate NULL<>DATE('2022-04-16'). My conclusion is the result of the fact that it does not evaluate to TRUE, if the NULL semantics of SQL are used, and as a consequence, the result is FALSE.

@cportele
Copy link
Member

Meeting 2024-11-18: @pvretano will take a look and comment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

No branches or pull requests

3 participants