Improve SPARQL query optimization by using selectivity #70

jindrichmynarz · 2019-08-07T11:28:31Z

SPARQL query optimization in Halyard is based on cardinalities pre-computed by Halyard Stats. SPARQL queries frequently contain triple patterns with similar cardinality but drastically different selectivity (see e.g., here for a description of selectivity). Selectivity can be considered as the ratio of a predicate's distinct values to all of its values, i.e. COUNT(DISTINCT ?val) / COUNT(?val).

In some cases the optimizer can prioritize triple patterns that have low cardinality but are unselective. For example, in Wikidata the properties that link Wikidata entities with Wikipedia pages have similar cardinality. See the Halyard Stats for schema:about, schema:inLanguage, and schema:isPartOf:

PREFIX halyard: <http://merck.github.io/Halyard/ns#>
PREFIX schema:  <http://schema.org/>
PREFIX void:    <http://rdfs.org/ns/void#>

SELECT ?property ?cardinality
WHERE {
  GRAPH halyard:statsContext {
    VALUES ?property {
      schema:about
      schema:inLanguage
      schema:isPartOf
    }
    [] void:propertyPartition [
        void:property ?property ;
        void:triples ?cardinality
      ] .
  }
}

property	cardinality
schema:about	"126270886"^^xsd:long
schema:inLanguage	"67986056"^^xsd:long
schema:isPartOf	"67986056"^^xsd:long

schema:about has the highest cardinality, but it has vastly better object selectivity than schema:inLanguage or schema:isPartOf. The object selectivity of schema:about is high, since most of its objects (i.e. Wikidata entities) are unique, whereas the object selectivity of schema:inLanguage is low, since it has very few distinct objects (i.e. languages of Wikipedias). Given that the object of schema:about is known, the query optimizer would produce a better plan if it gave that triple pattern priority.
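As a rough sketch, object selectivity for these three properties could be computed directly from the data with the ratio defined above. The variable names are only illustrative, and on a dataset the size of Wikidata this query scans every matching triple, so it is likely too expensive to run ad hoc.

```sparql
PREFIX schema: <http://schema.org/>

# Object selectivity per property: distinct objects divided by all objects.
# Values close to 1 mean that nearly every object is distinct.
SELECT ?property
       (COUNT(?object) AS ?cardinality)
       (COUNT(DISTINCT ?object) AS ?distinctObjects)
       (COUNT(DISTINCT ?object) / COUNT(?object) AS ?objectSelectivity)
WHERE {
  VALUES ?property {
    schema:about
    schema:inLanguage
    schema:isPartOf
  }
  ?subject ?property ?object .
}
GROUP BY ?property
```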

Adding selectivity to Halyard Stats could mean adding two numbers to each partition. For property partitions, for example, these would be the selectivity with respect to objects and the selectivity with respect to subjects. The query optimizer can then use them to produce better query plans.
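One possible shape for this, sketched under the assumption that each property partition in the stats also carried the VoID terms void:distinctSubjects and void:distinctObjects (whether Halyard Stats emits them at this level is an assumption here), would let both selectivities be derived from the stats graph alone:

```sparql
PREFIX halyard: <http://merck.github.io/Halyard/ns#>
PREFIX schema:  <http://schema.org/>
PREFIX void:    <http://rdfs.org/ns/void#>

# Assumes each property partition carries void:distinctSubjects and
# void:distinctObjects in addition to void:triples.
SELECT ?property
       (?distinctSubjects / ?triples AS ?subjectSelectivity)
       (?distinctObjects / ?triples AS ?objectSelectivity)
WHERE {
  GRAPH halyard:statsContext {
    VALUES ?property {
      schema:about
      schema:inLanguage
      schema:isPartOf
    }
    [] void:propertyPartition [
        void:property ?property ;
        void:triples ?triples ;
        void:distinctSubjects ?distinctSubjects ;
        void:distinctObjects ?distinctObjects
      ] .
  }
}
```

Storing the distinct counts rather than the ratios would keep the stats in plain VoID terms and leave the division to the optimizer.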

jindrichmynarz added the enhancement label Aug 7, 2019
