Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix set semantics bug #7

Merged
merged 6 commits into from
Jan 11, 2024
Merged

Fix set semantics bug #7

merged 6 commits into from
Jan 11, 2024

Conversation

maartyman
Copy link
Collaborator

No description provided.

@maartyman maartyman linked an issue Oct 25, 2023 that may be closed by this pull request
@coveralls
Copy link

coveralls commented Oct 25, 2023

Pull Request Test Coverage Report for Build 6657081957

  • 29 of 29 (100.0%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 100.0%

Totals Coverage Status
Change from base Build 4676623935: 0.0%
Covered Lines: 81
Relevant Lines: 81

💛 - Coveralls

@maartyman
Copy link
Collaborator Author

High level overview:

  1. Create new PassThrough stream: storeImportStream.
  2. Read the import stream in on-demand mode. For each value check if it has a match.
  3. Await match if it matches, do nothing. If it doesn't match, push into pendingStream and storeImportStream.
  4. Import storeImportStream into store.

@maartyman
Copy link
Collaborator Author

maartyman commented Oct 25, 2023

A clean-up of the implementation still needs to be done!

lib/StreamingStore.ts Outdated Show resolved Hide resolved
lib/StreamingStore.ts Outdated Show resolved Hide resolved
lib/StreamingStore.ts Outdated Show resolved Hide resolved
@maartyman
Copy link
Collaborator Author

maartyman commented Oct 25, 2023

Everything above has been implemented. Are more tests needed?

Copy link
Member

@rubensworks rubensworks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides the docs comments, everything looks fine, great tests as well 👍

lib/StreamingStore.ts Show resolved Hide resolved
@rubensworks
Copy link
Member

@surilindur Before we merge this, is it easy for you to test this in your setup with the aggregate store enabled? If so, can you check if this change solves your problem? If not, you can ignore this comment, and we can check it later for you.

@surilindur
Copy link

Thank you for looking into a fix!

After changing to the branch here, I stopped getting results from any queries with the aggregate store enabled. After some additional investigation, I found out that using the branch here in base Comunica (master branch) causes this set of unit tests to fail: packages/actor-rdf-resolve-quad-pattern-hypermedia/test/MediatedQuadSource-test.ts

The failure seems to indicate missing quads from a match call (the same test file has two other failing tests):

Summary of all failing tests
 FAIL  packages/actor-rdf-resolve-quad-pattern-hypermedia/test/MediatedQuadSource-test.ts (11.457 s)
  ● MediatedQuadSource › A MediatedQuadSource instance with aggregated store › match › should match three chained sources when queried multiple times

    expected two graphs to be isomorphic.

      Expected:
    [
      {"subject":"s11","predicate":"p11","object":"o11","graph":""},
      {"subject":"s21","predicate":"p21","object":"o21","graph":""},
      {"subject":"s12","predicate":"p12","object":"o12","graph":""},
      {"subject":"s22","predicate":"p22","object":"o22","graph":""},
      {"subject":"s13","predicate":"p13","object":"o13","graph":""},
      {"subject":"s23","predicate":"p23","object":"o23","graph":""}
    ]

      Actual:
    [

    ]

    Missing Quads (that don't contain Blank Nodes):
    [
      {"subject":"s11","predicate":"p11","object":"o11","graph":""},
      {"subject":"s21","predicate":"p21","object":"o21","graph":""},
      {"subject":"s12","predicate":"p12","object":"o12","graph":""},
      {"subject":"s22","predicate":"p22","object":"o22","graph":""},
      {"subject":"s13","predicate":"p13","object":"o13","graph":""},
      {"subject":"s23","predicate":"p23","object":"o23","graph":""}
    ]

    Additional Quads (that don't contain Blank Nodes):
    [

    ]

    Missing Blank Node Patterns:


    Additional Blank Node Patterns:

      254 |         ]);
      255 |         expect(mediatorRdfResolveHypermedia.mediate).toHaveBeenCalledTimes(4);
    > 256 |         expect(await arrayifyStream(source.match(v, v, undefined!, v, context))).toBeRdfIsomorphic([
          |                                                                                  ^
      257 |           quad('s11', 'p11', 'o11'),
      258 |           quad('s21', 'p21', 'o21'),
      259 |           quad('s12', 'p12', 'o12'),

      at Object.<anonymous> (packages/actor-rdf-resolve-quad-pattern-hypermedia/test/MediatedQuadSource-test.ts:256:82)

If I find the time, I can look into what is going wrong, but if someone else has the time before it, that would be amazing!

@rubensworks
Copy link
Member

Hmm, good catch @surilindur. There might be an edge-case that we're forgetting about.
Do you see any spec/integration tests failing by any chance?

We should probably delay merging this until we know for certain that this doesn't break anything in Comunica.

@surilindur
Copy link

This one seems to be happening consistently:

 FAIL  engines/query-sparql/test/QuerySparql-test.ts (51.179 s)
  ● System test: QuerySparql › foaf ontology broken link (no browser) › returns results with link recovery on [using full key]

    thrown: "Exceeded timeout of 20000 ms for a test.
    Add a timeout value to this test to increase the timeout, if this is a long-running test. See https://jestjs.io/docs/api#testname-fn-timeout."

      782 |   // We skip these tests in browsers due to CORS issues
      783 |   describe('foaf ontology broken link (no browser)', () => {
    > 784 |     it('returns results with link recovery on [using full key]', async() => {
          |     ^
      785 |       const result = <QueryBindings> await engine.query(`SELECT * WHERE {
      786 |     <http://xmlns.com/foaf/0.1/> a <http://www.w3.org/2002/07/owl#Ontology>.
      787 |   }`, {

      at engines/query-sparql/test/QuerySparql-test.ts:784:5
      at engines/query-sparql/test/QuerySparql-test.ts:783:3
      at Object.<anonymous> (engines/query-sparql/test/QuerySparql-test.ts:21:1)

But it looks like just a timeout at a first glance?

@surilindur
Copy link

To clarify, the following cases from the MediatedQuadSource seem to fail:

  • MediatedQuadSource › A MediatedQuadSource instance with aggregated store › match › should match three chained sources when queried multiple times
  • MediatedQuadSource › A MediatedQuadSource instance with aggregated store › match › should match three chained sources when queried multiple times in parallel
  • MediatedQuadSource › A MediatedQuadSource instance with aggregated store › match › should match three chained sources when queried multiple times for multiple queries

And then that random timeout from QuerySparql tests.

@rubensworks
Copy link
Member

engines/query-sparql/test/QuerySparql-test.ts

@surilindur That one also fails in the CI sometimes. Honestly, I don't know what it is about. I've tried reproducing locally, but to no avail.

So no errors when running the integration and spec tests in query-sparql?

@surilindur
Copy link

The integration and spec tests in query-sparql seem to pass, yes.

@surilindur
Copy link

surilindur commented Oct 26, 2023

After trying out a lot of things, I realised the aggregate store option is toggled off in base Comunica by default. 😬

Enabling the aggregateStore option with the @default JSDoc value here caused the integration tests to fail.

Example failure in the spoiler here.
✖ SELECT - OPTIONAL FILTER EXISTS
  FILTER EXISTS works within an OPTIONAL block
  Error: Invalid query evaluation

        Query: SELECT * WHERE {
  ?s ?p ?o.
  OPTIONAL {
    FILTER EXISTS {
      ?s2 ?p ?o.
      FILTER (?s != ?s2)
    }
  }
}

        Data: [{"value":"https://comunica.github.io/manifest-ldf-tests/sparql/optional-filter-exists/2triples.ttl","type":"file"}]

        Result Source: https://comunica.github.io/manifest-ldf-tests/sparql/optional-filter-exists/result.srj

        Expected:
 [QueryResultBindings:
    Variables: [
  "?o",
  "?p",
  "?s"
]
    Bindings:  [
  {
    "?s": {
      "termType": "BlankNode",
      "value": "bc_0_b0_blank"
    },
    "?p": {
      "termType": "NamedNode",
      "value": "http://predicate"
    },
    "?o": {
      "termType": "NamedNode",
      "value": "http://object"
    }
  },
  {
    "?s": {
      "termType": "NamedNode",
      "value": "http://subject"
    },
    "?p": {
      "termType": "NamedNode",
      "value": "http://predicate"
    },
    "?o": {
      "termType": "NamedNode",
      "value": "http://object"
    }
  }
]
]

        Got:
 [QueryResultBindings:
    Variables: [
  "?o",
  "?p",
  "?s"
]
    Bindings:  [
  {
    "?s": {
      "termType": "BlankNode",
      "value": "bc_0_b35_blank",
      "skolemized": {
        "termType": "NamedNode",
        "value": "urn:comunica_skolem:source_0:b35_blank"
      }
    },
    "?p": {
      "termType": "NamedNode",
      "value": "http://predicate"
    },
    "?o": {
      "termType": "NamedNode",
      "value": "http://object"
    }
  }
]
]

Hopefully that helps. Sorry about the confusion earlier.

@rubensworks
Copy link
Member

After enabling the aggregateStore option with the @default JSDoc value here, most of the integration tests started failing.

That is to be expected. The aggregateStore is not compatible with TPF-like sources.

@surilindur
Copy link

Oh okay, thank you! I had no idea. Sorry. 😦 What if I use query-sparql to query a single turtle document? Should the aggregated store support that?

With the aggregate store enabled, query-sparql fails to produce results for this, but without the aggregate store it works:

node bin/query.js https://comunica.github.io/manifest-ldf-tests/sparql/starwars-valuesbind/starwars.ttl -q 'SELECT ?person ?hair_colour WHERE {
  ?person <http://swapi.dev/documentation#name> "Luke Skywalker" ;
    <http://swapi.dev/documentation#hair_color> ?hair_colour .
}'

The same behaviour seems to happen with the link traversal repository, as well. With just one triple pattern it works, but with two or more it fails.

@rubensworks
Copy link
Member

The same behaviour seems to happen with the link traversal repository, as well. With just one triple pattern it works, but with two or more it fails.

@surilindur So this fails only with the new aggregate store from this PR, but not with the old aggregate store? Or does it fail on both?

@surilindur
Copy link

surilindur commented Oct 27, 2023

After checking it again, using the query from the previous comment here, with the aggregate store enabled:

  • query-sparql fails to produce the expected result with both master and fix-set-semantics-bug branch, which probably means that is supposed to happen and everything is fine?
  • query-sparql-solid does the same as query-sparql
  • query-sparql-link-traversal produces the expected result with master, but fails to produce any results with fix-set-semantics-bug

So if the non-link-traversal engines are not supposed to work with the aggregate store for querying a turtle document, then everything is fine. Why the link traversal one works with master branch and not the fix, I do not know.

This is really confusing to me, so apologies for the spam and confusion.

@rubensworks
Copy link
Member

query-sparql fails to produce the expected result with both master and fix-set-semantics-bug branch, which probably means that is supposed to happen and everything is fine?

The aggregate store was designed mainly for link traversal, and is untested for regular querying. I would expect it to only break for TPF queries, but I may be forgetting about something, and the general breakage is not surprising.
So indeed, all is fine here.

query-sparql-solid does the same as query-sparql

👌

query-sparql-link-traversal produces the expected result with master, but fails to produce any results with fix-set-semantics-bug

This is not ok indeed. And what exactly fails here? So the unit tests pass?

@surilindur
Copy link

I think I forgot to rebuild the link traversal engine because it seems to work now with the fix branch, as well.

But it does not make sense, because I definitely did build it with the fix branch when some of the queries failed.

@rubensworks rubensworks merged commit d11c8ce into master Jan 11, 2024
13 checks passed
@rubensworks rubensworks deleted the fix-set-semantics-bug branch January 11, 2024 12:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PendingStream and internal store get out of sync due to set-semantics
4 participants