SPARQL query optimization in Halyard is based on cardinalities pre-computed by Halyard Stats. SPARQL queries frequently contain triple patterns with similar cardinality but drastically different selectivity (see e.g., here for a description of selectivity). Selectivity can be considered the ratio of distinct values of a predicate to all of its values (i.e., COUNT(DISTINCT ?val) / COUNT(?val)).
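As a minimal sketch of the ratio described above, the following computes selectivity over a plain list of values. The function name `selectivity` and the toy values are assumptions made for illustration; they are not part of Halyard and are not real Wikidata data.

```python
from typing import Hashable, Sequence


def selectivity(values: Sequence[Hashable]) -> float:
    """Ratio of distinct values to all values; 0.0 for an empty input."""
    if not values:
        return 0.0
    return len(set(values)) / len(values)


# Toy object values (hypothetical, for illustration only).
about_objects = ["wd:Q1", "wd:Q2", "wd:Q3", "wd:Q4"]  # all distinct
language_objects = ["en", "en", "de", "en"]           # two distinct values

print(selectivity(about_objects))     # 1.0
print(selectivity(language_objects))  # 0.5
```

A value close to 1 means that almost every value is unique, while a value close to 0 means that a few values repeat many times.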
In some cases the optimizer can prioritize triple patterns that have low cardinality but are unselective. For example, in Wikidata the properties that link Wikidata entities with Wikipedia pages have similar cardinality. See the Halyard Stats for schema:about, schema:inLanguage, and schema:isPartOf:
schema:about has the highest cardinality, but it has vastly better object selectivity than schema:inLanguage or schema:isPartOf, which have only a few distinct values. The object selectivity of schema:about is high, since most of its objects are unique (i.e. Wikidata entities), whereas the object selectivity of schema:inLanguage is low, since it has very few unique objects (i.e. languages of Wikipedias). Given that the object of schema:about is known, the query optimizer would produce a better plan if it prioritized that triple pattern.
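To illustrate why this matters for join ordering, here is a rough sketch that estimates how many triples a pattern with a known object matches. It assumes objects are uniformly distributed, so the estimate is the cardinality divided by the number of distinct objects. The `PatternStats` class, the estimator function, and all numbers are hypothetical; they are neither Halyard's API nor actual Wikidata statistics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatternStats:
    predicate: str
    cardinality: int           # number of triples with this predicate
    object_selectivity: float  # distinct objects / all objects


def estimated_matches_bound_object(stats: PatternStats) -> float:
    """Estimate matches for `?s <predicate> <known object>`.

    Assumes a uniform distribution of objects (a simplification):
    distinct objects ~= cardinality * object_selectivity.
    """
    distinct_objects = max(1.0, stats.cardinality * stats.object_selectivity)
    return stats.cardinality / distinct_objects


# Made-up numbers that only mirror the shape of the situation described.
stats = [
    PatternStats("schema:about", 1200, 1.0),
    PatternStats("schema:inLanguage", 1000, 0.01),
    PatternStats("schema:isPartOf", 1000, 0.01),
]

# Ordering by cardinality alone puts schema:about last ...
by_cardinality = sorted(stats, key=lambda s: s.cardinality)
# ... while ordering by the selectivity-aware estimate puts it first.
by_estimate = sorted(stats, key=estimated_matches_bound_object)

print([s.predicate for s in by_cardinality])
print([s.predicate for s in by_estimate])
```

Under these assumptions the pattern with the highest cardinality is the cheapest one to evaluate first once its object is bound.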
Adding selectivity to Halyard Stats could mean adding two numbers to each partition, e.g., for property partitions, the selectivity with respect to objects and the selectivity with respect to subjects. The query optimizer can then use these numbers to produce better query plans.
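The two numbers per property partition could be derived along the following lines. This is only a sketch over an in-memory list of triples; `PropertyStats`, `property_partition_stats`, and the toy triples are assumptions made for illustration and do not reflect how Halyard Stats is implemented or stored.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class PropertyStats(NamedTuple):
    cardinality: int
    subject_selectivity: float  # distinct subjects / all subjects
    object_selectivity: float   # distinct objects / all objects


def property_partition_stats(triples: Iterable[Triple]) -> Dict[str, PropertyStats]:
    """Group triples by predicate and compute both selectivities."""
    subjects: Dict[str, List[str]] = defaultdict(list)
    objects: Dict[str, List[str]] = defaultdict(list)
    for s, p, o in triples:
        subjects[p].append(s)
        objects[p].append(o)
    return {
        p: PropertyStats(
            cardinality=len(objects[p]),
            subject_selectivity=len(set(subjects[p])) / len(subjects[p]),
            object_selectivity=len(set(objects[p])) / len(objects[p]),
        )
        for p in objects
    }


# Toy triples (hypothetical identifiers, not real Wikidata data).
triples = [
    ("wiki:en/A", "schema:about", "wd:Q1"),
    ("wiki:en/B", "schema:about", "wd:Q2"),
    ("wiki:de/A", "schema:about", "wd:Q1"),
    ("wiki:de/C", "schema:about", "wd:Q3"),
    ("wiki:en/A", "schema:inLanguage", "en"),
    ("wiki:en/B", "schema:inLanguage", "en"),
    ("wiki:de/A", "schema:inLanguage", "de"),
    ("wiki:de/C", "schema:inLanguage", "de"),
]

result = property_partition_stats(triples)
for predicate, stats in result.items():
    print(predicate, stats)
```

In this toy data both properties have the same cardinality, and only the object selectivity tells them apart.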