Facet Indexing and Query

Drew Farris edited this page Sep 3, 2020 · 6 revisions

Work in Progress

Precomputed Facets

| Table | Purpose | Row | Column Family | Column Qualifier | Value |
| --- | --- | --- | --- | --- | --- |
| facets | Holds cardinality for pivot/facet pairs | pivot field value \0 facet field value \0 datatype | pivot field name \0 facet field name | datestamp | Serialized HyperLogLog Plus |
| facets | Cardinality hash reference | pivot field value \0 facet hash \0 datatype | pivot field name \0 facet field name '.hash' | datestamp | Serialized HyperLogLog Plus |
| facetHashes | Hashes for fields with a large number of values | field value hash | field value | none | none |
| facetMetadata | Tracks pivot/facet pairs | pivot field name \0 facet field name | 'pv' | none | none |

Precomputed Facet Example

The ingest configuration for the myjson datatype (myjson-ingest-config.xml) contains the configuration directive:

    <property>
        <name>myjson.facet.category.name.network</name>
        <value>NETWORK_NAME;GENRES,EMBEDDED_CAST_PERSON_GENDER,RATING_AVERAGE</value>
    </property>
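
Read against the table entries in this example, the directive value appears to name the pivot field before the semicolon and a comma-separated list of facet fields after it. The sketch below illustrates that reading in Python. The `pivot;facet,facet,...` grammar is inferred from this single example, and `parse_facet_directive` is a hypothetical helper, not part of the ingest code.

```python
def parse_facet_directive(value):
    """Split a directive value into (pivot_field, [facet_fields]).

    Assumes the form 'PIVOT;FACET1,FACET2,...', which is inferred from
    the example on this page rather than taken from the ingest code.
    """
    pivot, _, facet_list = value.partition(";")
    facets = [name.strip() for name in facet_list.split(",") if name.strip()]
    return pivot.strip(), facets


pivot, facets = parse_facet_directive(
    "NETWORK_NAME;GENRES,EMBEDDED_CAST_PERSON_GENDER,RATING_AVERAGE"
)
print(pivot)   # NETWORK_NAME
print(facets)  # ['GENRES', 'EMBEDDED_CAST_PERSON_GENDER', 'RATING_AVERAGE']
```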

The facets table entries look like this (values omitted):

bbc one\x00female\x00myjson NETWORK_NAME\x00EMBEDDED_CAST_PERSON_GENDER:20200831 [PRIVATE|(BAR&FOO)]
bbc one\x00male\x00myjson NETWORK_NAME\x00EMBEDDED_CAST_PERSON_GENDER:20200831 [PRIVATE|(BAR&FOO)]
bbc one\x00romance\x00myjson NETWORK_NAME\x00GENRES:20200831 [PRIVATE|(BAR&FOO)]
cbs\x005.7\x00myjson NETWORK_NAME\x00RATING_AVERAGE:20200707 [PRIVATE|(BAR&FOO)]
cbs\x005.8\x00myjson NETWORK_NAME\x00RATING_AVERAGE:20200707 [PRIVATE|(BAR&FOO)]
cbs\x008.2\x00myjson NETWORK_NAME\x00RATING_AVERAGE:20200707 [PRIVATE|(BAR&FOO)]
cbs\x008.2\x00myjson NETWORK_NAME\x00RATING_AVERAGE:20200831 [PRIVATE|(BAR&FOO)]
cbs\x008.6\x00myjson NETWORK_NAME\x00RATING_AVERAGE:20200831 [PRIVATE|(BAR&FOO)]
cbs\x008.8\x00myjson NETWORK_NAME\x00RATING_AVERAGE:20200831 [PRIVATE|(BAR&FOO)]
cbs\x009\x00myjson NETWORK_NAME\x00RATING_AVERAGE:20200707 [PRIVATE|(BAR&FOO)]
cbs\x00action\x00myjson NETWORK_NAME\x00GENRES:20200707 [PRIVATE|(BAR&FOO)]
cbs\x00cbs\x00myjson NETWORK_NAME\x00NETWORK_NAME:20200707 [PRIVATE|(BAR&FOO)]
cbs\x00cbs\x00myjson NETWORK_NAME\x00NETWORK_NAME:20200831 [PRIVATE|(BAR&FOO)]
cbs\x00comedy\x00myjson NETWORK_NAME\x00GENRES:20200707 [PRIVATE|(BAR&FOO)]
cbs\x00comedy\x00myjson NETWORK_NAME\x00GENRES:20200831 [PRIVATE|(BAR&FOO)]
cbs\x00crime\x00myjson NETWORK_NAME\x00GENRES:20200707 [PRIVATE|(BAR&FOO)]
cbs\x00drama\x00myjson NETWORK_NAME\x00GENRES:20200707 [PRIVATE|(BAR&FOO)]
cbs\x00family\x00myjson NETWORK_NAME\x00GENRES:20200707 [PRIVATE|(BAR&FOO)]
cbs\x00female\x00myjson NETWORK_NAME\x00EMBEDDED_CAST_PERSON_GENDER:20200831 [PRIVATE|(BAR&FOO)]
cbs\x00male\x00myjson NETWORK_NAME\x00EMBEDDED_CAST_PERSON_GENDER:20200831 [PRIVATE|(BAR&FOO)]
cbs\x00medical\x00myjson NETWORK_NAME\x00GENRES:20200831 [PRIVATE|(BAR&FOO)]
cbs\x00war\x00myjson NETWORK_NAME\x00GENRES:20200831 [PRIVATE|(BAR&FOO)]
fox\x007.2\x00myjson NETWORK_NAME\x00RATING_AVERAGE:20200707 [PRIVATE|(BAR&FOO)]
fox\x007.8\x00myjson NETWORK_NAME\x00RATING_AVERAGE:20200831 [PRIVATE|(BAR&FOO)]
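
To make the key layout concrete, the following Python sketch splits one printed entry into its parts: the row (pivot field value, facet field value, datatype), the column family (pivot field name, facet field name), and the column qualifier (datestamp). It works on the escaped text form shown above, where the null separator is printed as a literal `\x00`. `FacetKey` and `parse_facet_entry` are hypothetical names for illustration, and the parser assumes the column family contains no spaces, as in this listing.

```python
from collections import namedtuple

# Separator as it is printed in the listing (backslash, x, 0, 0), not a real NUL byte.
PRINTED_NUL = "\\x00"

FacetKey = namedtuple(
    "FacetKey",
    "pivot_value facet_value datatype pivot_field facet_field datestamp",
)


def parse_facet_entry(line):
    """Parse one printed facets entry into a FacetKey (illustrative only)."""
    # Drop the trailing visibility expression, e.g. " [PRIVATE|(BAR&FOO)]".
    key_part = line.rsplit(" [", 1)[0]
    # The last space separates the row from "family:qualifier".
    row, column = key_part.rsplit(" ", 1)
    family, datestamp = column.rsplit(":", 1)
    pivot_value, facet_value, datatype = row.split(PRINTED_NUL)
    pivot_field, facet_field = family.split(PRINTED_NUL)
    return FacetKey(pivot_value, facet_value, datatype,
                    pivot_field, facet_field, datestamp)


key = parse_facet_entry(
    r"bbc one\x00female\x00myjson NETWORK_NAME\x00EMBEDDED_CAST_PERSON_GENDER:20200831 [PRIVATE|(BAR&FOO)]"
)
print(key.pivot_value, key.facet_value, key.facet_field, key.datestamp)
```

Here the pivot value `bbc one` keeps its internal space because only the last space in the key is treated as the row/column boundary.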

The facetMetadata table records which pivot/facet field pairs we have seen:

NETWORK_NAME\x00EMBEDDED_CAST_PERSON_GENDER pv: []
NETWORK_NAME\x00GENRES pv: []
NETWORK_NAME\x00NETWORK_NAME pv: []
NETWORK_NAME\x00RATING_AVERAGE pv: []
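
In this example the facetMetadata rows match the distinct column families of the facets entries. A minimal Python sketch of that relationship follows, assuming the metadata is simply the de-duplicated set of pivot/facet field-name pairs. `metadata_rows` is a hypothetical helper and does not describe how the ingest code actually writes the table.

```python
NUL = "\x00"


def metadata_rows(facet_column_families):
    """Return one (row, column_family) pair per distinct pivot/facet field pair.

    Each input string is 'pivot field name' + NUL + 'facet field name',
    the same layout used for the facets table column family.
    """
    return [(pair, "pv") for pair in sorted(set(facet_column_families))]


seen = [
    "NETWORK_NAME" + NUL + "GENRES",
    "NETWORK_NAME" + NUL + "RATING_AVERAGE",
    "NETWORK_NAME" + NUL + "GENRES",  # repeated pair collapses to one row
]
rows = metadata_rows(seen)
print(len(rows))  # 2
```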

The facetHashes table holds the one-to-many mapping between a field value hash and the values for that field:

085a7d11b23a6367b8ad http://static.tvmaze.com/uploads/images/medium_portrait/0/1116.jpg: []
085a7d11b23a6367b8ad http://static.tvmaze.com/uploads/images/medium_portrait/0/1117.jpg: []
085a7d11b23a6367b8ad http://static.tvmaze.com/uploads/images/medium_portrait/0/516.jpg: []
085a7d11b23a6367b8ad http://static.tvmaze.com/uploads/images/medium_portrait/0/517.jpg: []
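
The one-to-many shape can be illustrated by grouping printed facetHashes entries by their row. The Python sketch below only parses the text form shown above. It does not reproduce the hash function, which this page does not specify, and `group_values_by_hash` is a hypothetical helper.

```python
from collections import defaultdict


def group_values_by_hash(lines):
    """Map each field value hash (row) to the list of field values (column family)."""
    values_by_hash = defaultdict(list)
    for line in lines:
        # The row ends at the first space; the rest is "value: []".
        value_hash, rest = line.split(" ", 1)
        # Split from the right so colons inside the value (e.g. "http:") are kept.
        value = rest.rsplit(": [", 1)[0]
        values_by_hash[value_hash].append(value)
    return dict(values_by_hash)


entries = [
    "085a7d11b23a6367b8ad http://static.tvmaze.com/uploads/images/medium_portrait/0/1116.jpg: []",
    "085a7d11b23a6367b8ad http://static.tvmaze.com/uploads/images/medium_portrait/0/1117.jpg: []",
]
grouped = group_values_by_hash(entries)
print(len(grouped["085a7d11b23a6367b8ad"]))  # 2
```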