Not all data is correctly indexed/returned #8

Open
adamretter opened this issue Jun 24, 2020 · 1 comment

adamretter commented Jun 24, 2020

Using the following RDF/XML data file - http://static.adamretter.org.uk/HHS_Provider_Relief_Fund.rdf.gz

I can never seem to get more than 10 results back when querying it with SPARQL in eXist-db:

xquery version "3.1";

import module namespace sparql = "http://exist-db.org/xquery/sparql";

let $query1 := '
    PREFIX ds:  <https://data.cdc.gov/resource/kh8y-3es6/>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

    SELECT (count(DISTINCT ?state) as ?count)
    WHERE {
        ?provider ds:state ?state
    }
'
return
    sparql:query($query1)

returns the count of 10, i.e.:

<sparql xmlns="http://www.w3.org/2005/sparql-results#">
    <head>
        <variable name="count"/>
    </head>
    <results>
        <result>
            <binding name="count">
                <literal datatype="http://www.w3.org/2001/XMLSchema#integer">10</literal>
            </binding>
        </result>
    </results>
</sparql>

However, running XQuery directly over the RDF/XML shows that the result should actually be 55:

count(distinct-values(doc("/db/hhs-provider/hhs-provider.rdf")//*:state/string(.)))

The result from the SPARQL query (10) is wrong; the XQuery result (55) is correct.
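
To narrow down which values go missing, the two results can be compared value by value. The following is only a sketch, not something that has been run against this dataset: it assumes that sparql:query returns the SPARQL results element shown above, and that each state arrives as a single child (literal or uri) of its binding. The missing and state element names are made up for the report.

```xquery
xquery version "3.1";

import module namespace sparql = "http://exist-db.org/xquery/sparql";

(: Namespace of the SPARQL results format, as seen in the output above :)
declare namespace sr = "http://www.w3.org/2005/sparql-results#";

(: Distinct states as seen by the SPARQL index.
   Assumption: each binding holds one child element carrying the value. :)
let $from-sparql := sparql:query('
    PREFIX ds: <https://data.cdc.gov/resource/kh8y-3es6/>

    SELECT DISTINCT ?state
    WHERE {
        ?provider ds:state ?state
    }
')//sr:binding[@name = "state"]/*/string(.)

(: Distinct states as seen by XQuery over the stored RDF/XML document :)
let $from-xml := distinct-values(
    doc("/db/hhs-provider/hhs-provider.rdf")//*:state/string(.)
)

(: States present in the document but absent from the SPARQL result :)
let $missing := $from-xml[not(. = $from-sparql)]
return
    <missing count="{count($missing)}">{
        for $s in $missing
        order by $s
        return <state>{$s}</state>
    }</missing>
```

If the missing states cluster in one part of the file, that would point at indexing being cut short rather than at the query itself.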


adamretter commented Jun 24, 2020

I also decided to test this directly with TDB from Apache Jena 3.15.0.

I loaded the data:

$ bin/tdbloader --loc=/tmp/tdb /tmp/HHS_Provider_Relief_Fund.rd

...

** Completed: 1,471,085 triples loaded in 18.07 seconds [Rate: 81,396.84 per second]

I created the SPARQL file /tmp/states.sparql:

PREFIX ds:  <https://data.cdc.gov/resource/kh8y-3es6/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT (count(DISTINCT ?state) as ?count)
WHERE {
   ?provider ds:state ?state
}

I then executed the SPARQL query:

$ bin/tdbquery --loc=/tmp/tdb --file /tmp/states.sparql
---------
| count |
=========
| 55    |
---------

So using TDB directly returns the correct result; I therefore have to suspect a bug somewhere in the exist-sparql module.
