201411HierarchicalFacetSupport
Date | November, 2014 | Contacts | Craig Jones |
---|---|---|---|
Status | Approved | Release | 2.12 |
Resources | Resources available | Ticket # | 679 |
Source code | Pull request | | |
Funding | Integrated Marine Observing System | | |
GeoNetwork currently supports only simple faceted searching: facets can be built from terms marked up in the metadata or from simple translations of these terms. This proposal seeks to add back-end support for more advanced faceted searching. In particular, it seeks to add:
- indexing of multi-level category paths generated using a hierarchical classification process
- search summaries broken down hierarchically according to the indexed category hierarchy
- drilling down on a path or paths through the hierarchy as a part of a search request
It includes support for:
- hierarchical classification using classification schemes loaded into GeoNetwork (using broader term relationships)
- plugging in custom classification methods using Spring
- language translation for categories sourced from GeoNetwork classification schemes
This proposal does not include:
- user interface support for display of hierarchical facet summaries/drill-down. The existing simple faceted searching functionality (indexing, summarisation and search) is still supported and no changes to the GeoNetwork client code have been made or are required to continue using this functionality.
- hierarchical facet drilldown or summaries as a custom extension to the CSW service
An example application using this support in a customised 2.10 instance can be found at https://imos.aodn.org.au/imos123/home . Refer to the faceted search for collections in step 1. The Measured parameter facet has over a hundred possible parameters. Previously these were all displayed as a single long list, making it difficult to find the relevant option or an area of interest such as temperature.
Configuration of available facets and the content of summary responses is now specified separately. Configuration is still performed in WEB-INF/config-summary.xml, although custom Spring elements are now used.
Available facets, how they are indexed, and their default formatting options are now defined in a facets element. Each facet is configured in a facet child element using the following attributes:
attribute | description |
---|---|
name | the name of the facet |
indexKey | the name of a field returned for indexing |
label | a label to use for the facet (used in response summaries) |
classifier | (optional) a reference to the classifier used to generate facet values (see the classifier configuration described below). The default classifier simply returns the value of the indexKey field for indexing. |
Example:
```xml
<facets>
  <facet name="keyword" label="keywords" indexKey="keyword"/>
  <facet name="createDateYear" label="createDateYears" indexKey="createDateYear"/>
  ...
  <facet name="parameter" label="Parameter" indexKey="longParamName" classifier="parameterClassifier"/>
  <facet name="platform" label="Platform" indexKey="platform" classifier="platformClassifier"/>
</facets>
```
The content and format of predefined summary types are now configured using the summaryTypes element.
Each available summary type is configured in summaryType children using the following attributes:
attribute | description |
---|---|
name | the name of the summary type |
format | (optional) the format to use in the response. The default is the current format ('FACET_NAME'); 'DIMENSION' can also be specified (see the description of the DIMENSION response format below). |
Each facet included in the summary type is configured in item children using the following attributes:
attribute | description |
---|---|
facet | the name of the facet to include |
sortBy | (optional) what to order the facet values by: count or value. Default is count. |
sortOrder | (optional) asc or desc. Default is descending. |
max | (optional) the maximum number of values to return for the facet. Default is 10. |
depth | (optional) the maximum depth to which sub-categories are returned. Default is 1; other values only make sense for multi-level facets. |
translator | (optional) code of the translator used to translate facet values into language-specific labels. See the description of language translation below for configuring the classification scheme term translator included in this proposal. |
Example:
```xml
<summaryTypes>
  <summaryType name="hits">
    <item facet="keyword" max="15"/>
    <item facet="inspireTheme" sortBy="value" sortOrder="asc" max="35"/>
    ...
  </summaryType>
  <summaryType name="hierarchical_facets" format="DIMENSION">
    <item facet="parameter" max="10" depth="3" translator="term:http://vocab.aodn.org.au/classificationSchemes/parameterDiscovery"/>
    <item facet="organisation" max="10"/>
    <item facet="platform" max="10" depth="2" translator="term:http://vocab.aodn.org.au/classificationSchemes/platformDiscovery"/>
  </summaryType>
</summaryTypes>
```
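To make the item attributes concrete, the following Java sketch shows one way sortBy, sortOrder and max could combine to select the values reported for a single flat facet. It is an illustration under stated assumptions, not GeoNetwork's implementation: FacetValue and SummaryItemSketch are hypothetical names, sortBy is assumed to accept 'count' or 'value', and depth is ignored because the values are flat.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical value/count pair for one facet value (not a GeoNetwork class).
record FacetValue(String value, int count) {}

final class SummaryItemSketch {
    /**
     * Orders facet values the way a summary item might, then keeps the first max entries.
     * sortBy: "value" sorts by value, anything else (including null) sorts by count.
     * sortOrder: "asc" sorts ascending, anything else (including null) sorts descending.
     */
    static List<FacetValue> select(List<FacetValue> values, String sortBy, String sortOrder, int max) {
        Comparator<FacetValue> order;
        if ("value".equals(sortBy)) {
            order = Comparator.comparing(FacetValue::value);
        } else {
            order = Comparator.comparingInt(FacetValue::count);
        }
        if (!"asc".equals(sortOrder)) {
            order = order.reversed();
        }
        return values.stream().sorted(order).limit(max).toList();
    }
}
```

With the documented defaults, select(values, null, null, 10) returns the ten most frequent values, highest count first.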
A new summary response format has been added for summarising hierarchical facet counts. This format simplifies the construction of drill down search requests. To use it for a service, specify 'DIMENSION' as the format attribute of the summaryType element used by that service, as shown above for the "hierarchical_facets" summaryType.
Example response for this format:
```xml
<response from="1" to="5" selected="0">
  <summary count="5" type="local">
    <dimension name="regionKeyword" label="Region">
      <category value="http://geonetwork-opensource.org/regions#country" label="country" count="5">
        <category value="http://geonetwork-opensource.org/regions#10" label="Australia" count="2" />
        <category value="http://geonetwork-opensource.org/regions#181" label="Zimbabwe" count="1" />
        <category value="http://geonetwork-opensource.org/regions#1220" label="All fishing areas" count="1" />
        <category value="http://geonetwork-opensource.org/regions#68" label="France" count="1" />
      </category>
      <category value="http://geonetwork-opensource.org/regions#ocean" label="ocean" count="1">
        <category value="http://geonetwork-opensource.org/regions#1220" label="All fishing areas" count="1" />
      </category>
    </dimension>
  </summary>
  <metadata>
  ...
```
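The counts in a DIMENSION response nest, with each parent category carrying its own count. The Java sketch below builds such a tree from per-record category paths. It is a hypothetical illustration (DimensionSketch is not a GeoNetwork class), and it assumes a record is counted once per category even when several of its paths pass through that category; for example, a record classified under both country/All fishing areas and ocean/All fishing areas adds one to each branch.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class DimensionSketch {
    /** One category in the tree: how many records reached it, and its sub-categories. */
    static final class Node {
        int count;
        final Map<String, Node> children = new LinkedHashMap<>();
    }

    /**
     * records: for each record, the list of category paths produced by its classifier.
     * maxDepth: categories deeper than this level are not recorded (compare the depth attribute).
     * The returned root stands for the dimension itself; its own count is left at 0.
     */
    static Node build(List<List<List<String>>> records, int maxDepth) {
        Node root = new Node();
        for (List<List<String>> paths : records) {
            // Categories already counted for this record (identity-based: Node keeps Object.equals).
            Set<Node> counted = new HashSet<>();
            for (List<String> path : paths) {
                Node node = root;
                for (int i = 0; i < Math.min(path.size(), maxDepth); i++) {
                    node = node.children.computeIfAbsent(path.get(i), key -> new Node());
                    if (counted.add(node)) {
                        node.count++;
                    }
                }
            }
        }
        return root;
    }
}
```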
A new search parameter 'facet.q' has been added that allows drill down queries to be added to a search request. A drill down path is constructed as follows:
```
<dimension_name>{"/"<category_value>}
```
For example, to drill down on the country category above:
```
http://localhost:8080/geonetwork/srv/eng/xml.search.facet?fast=index&from=1&to=50&facet.q=regionKeyword/http%253A%252F%252Fgeonetwork-opensource.org%252Fregions%2523country
```
Note that drill down paths use '/' as the separator between categories, so any '/' embedded in a category must be escaped as %2F. Alternatively, URL-encode each category in the path in addition to the normal parameter encoding. For example, to drill down on Australia above:
```
http://localhost:8080/geonetwork/srv/eng/xml.search.facet?fast=index&from=1&to=50&facet.q=regionKeyword/http%253A%252F%252Fgeonetwork-opensource.org%252Fregions%2523country%2Fhttp%253A%252F%252Fgeonetwork-opensource.org%252Fregions%252310
```
Multiple drill down queries can be specified either by providing multiple facet.q parameters or by combining them in a single facet.q parameter separated by '&', which must itself be encoded (as %26) within the parameter value.
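The encoding rules above can be sketched in Java with java.net.URLEncoder. DrillDownSketch is a hypothetical helper, not part of GeoNetwork: path encodes each category once and joins the results with '/', and facetQueryParam applies the normal parameter encoding on top. Because facetQueryParam also encodes the '/' after the dimension name as %2F, its output differs textually from the URLs above, but it should decode to the same facet.q value on the server.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

final class DrillDownSketch {
    /**
     * Builds a drill down path: the dimension name followed by each category,
     * URL-encoded once so that embedded '/' characters become %2F.
     * Note: URLEncoder uses form encoding, so spaces become '+'.
     */
    static String path(String dimension, List<String> categories) {
        StringBuilder sb = new StringBuilder(dimension);
        for (String category : categories) {
            sb.append('/').append(URLEncoder.encode(category, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    /** Applies the normal query-parameter encoding to the path and prefixes the parameter name. */
    static String facetQueryParam(String dimension, List<String> categories) {
        return "facet.q=" + URLEncoder.encode(path(dimension, categories), StandardCharsets.UTF_8);
    }
}
```

A request URL can then be assembled by appending facetQueryParam(...) to the xml.search.facet query string; repeating it with different paths gives multiple facet.q parameters.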
Classifiers implement the org.fao.geonet.kernel.search.classifier.Classifier interface, which has one method:
```java
public List<CategoryPath> classify(String value);
```
A classifier takes the value of an index field passed to the GeoNetwork indexing engine and returns the list of category paths that should be indexed for that value.
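As a minimal illustration of the classify contract, the sketch below splits a value into a single multi-level path using a regular expression, in the spirit of the Split classifier listed below. It does not use GeoNetwork types: the hypothetical PathClassifier interface models a category path as a List<String> instead of a CategoryPath so the example compiles on its own, and the separator regex passed to SplitSketch is an assumption.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for the Classifier contract: a category path is modelled
// as a List<String> rather than a CategoryPath so the sketch has no dependencies.
interface PathClassifier {
    List<List<String>> classify(String value);
}

// Splits a value into one multi-level path on a regular expression.
final class SplitSketch implements PathClassifier {
    private final Pattern separator;

    SplitSketch(String regex) {
        this.separator = Pattern.compile(regex);
    }

    @Override
    public List<List<String>> classify(String value) {
        List<String> categories = separator.splitAsStream(value.trim())
                .map(String::trim)
                .filter(s -> !s.isEmpty())   // drop empty categories, e.g. from repeated separators
                .toList();
        // A blank value yields no category paths at all.
        return categories.isEmpty() ? List.of() : List.of(categories);
    }
}
```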
Classifiers are configured as Spring beans, for example:
```xml
<bean id="regionKeywordClassifier" class="org.fao.geonet.kernel.search.classifier.TermLabel" lazy-init="true">
  <constructor-arg name="finder" ref="ThesaurusManager"/>
  <constructor-arg name="conceptScheme" value="http://geonetwork-opensource.org/regions"/>
  <constructor-arg name="langCode" value="eng"/>
</bean>
```
The bean reference is used when configuring the facet to use this classifier:
```xml
<facet name="region" label="regions" indexKey="region" classifier="regionKeywordClassifier"/>
```
Note: the above assumes region keywords are passed to the indexing engine using the region field.
Four classifiers are included in this proposal:
class | description |
---|---|
org.fao.geonet.kernel.search.classifier.Value | the default classifier: returns a single category path containing one category, the value passed |
org.fao.geonet.kernel.search.classifier.TermLabel | returns a list of category paths created by looking up broader terms for the value passed in a classification scheme. The value passed is assumed to be the preferred label of a term in the classification scheme. |
org.fao.geonet.kernel.search.classifier.TermUri | returns a list of category paths created by looking up broader terms for the value passed in a classification scheme. The value passed is assumed to be the identifier (URI) of a term in the classification scheme. |
org.fao.geonet.kernel.search.classifier.Split | returns a category path containing categories created by splitting the passed value using a regular expression. |
Note: TermLabel and TermUri classifiers may return multiple category paths if there are many possible parent paths (e.g. a term or terms in the parent hierarchy has more than one parent).
As an example, looking up 'Practical salinity of the water body' in a parameter thesaurus using a TermLabel classifier may return the following category path:
```
http://vocab.aodn.org.au/def/ClassScheme/parameter1/Category/56,
http://vocab.aodn.org.au/def/ClassScheme/parameter1/Category/50,
http://vocab.nerc.ac.uk/collection/P01/current/PSLTZZ01
```
Or, using preferred labels in English: Physical-Water, Salinity, Practical salinity of the water body
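The multiple-path behaviour noted above follows from walking broader-term relationships up to the top of the scheme. The Java sketch below shows that walk over an in-memory map. BroaderPathsSketch and the map representation are assumptions for illustration, not the TermLabel or TermUri implementation, which look terms up in a GeoNetwork classification scheme; the sketch also assumes the broader-term graph has no cycles.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class BroaderPathsSketch {
    /**
     * Returns every path from a top concept down to the given term.
     * broader maps a term to its broader terms; a term without an entry is a top concept.
     * A term with several broader terms (directly or further up) yields several paths.
     * Assumes the broader-term graph is acyclic; a cycle would recurse forever.
     */
    static List<List<String>> paths(String term, Map<String, List<String>> broader) {
        List<String> parents = broader.getOrDefault(term, List.of());
        List<List<String>> result = new ArrayList<>();
        if (parents.isEmpty()) {
            result.add(List.of(term));
            return result;
        }
        for (String parent : parents) {
            for (List<String> parentPath : paths(parent, broader)) {
                List<String> path = new ArrayList<>(parentPath);
                path.add(term);   // extend each parent path with the term itself
                result.add(path);
            }
        }
        return result;
    }
}
```

For the salinity example, a map where 'Practical salinity of the water body' has broader term 'Salinity', which in turn has broader term 'Physical-Water', yields the single path Physical-Water, Salinity, Practical salinity of the water body; giving any term in that chain a second broader term would yield a second path.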
This proposal includes a term URI translator that translates the URI categories returned by the TermUri and TermLabel classifiers into the detected or requested language of the search response. The translator is specified as 'term:' followed by the identifier (URI) of the classification scheme used to look up labels.
For example, to return labels in the response language (e.g. French) for the region keywords indexed above, use the following in the summaryType configuration for the service:
```xml
<item facet="regionKeyword" translator="term:http://geonetwork-opensource.org/regions"/>
```
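Conceptually, the term translator replaces each URI category with the classification scheme's preferred label for the response language. The Java sketch below models that lookup with an in-memory map. TermTranslatorSketch is hypothetical, and falling back to the URI itself when no label exists is an assumption rather than documented behaviour.

```java
import java.util.Map;

final class TermTranslatorSketch {
    // Hypothetical in-memory stand-in for a classification scheme:
    // term URI -> (language code -> preferred label).
    private final Map<String, Map<String, String>> prefLabels;

    TermTranslatorSketch(Map<String, Map<String, String>> prefLabels) {
        this.prefLabels = prefLabels;
    }

    /** Returns the preferred label for uri in langCode, or uri itself if no such label is known. */
    String translate(String uri, String langCode) {
        return prefLabels.getOrDefault(uri, Map.of()).getOrDefault(langCode, uri);
    }
}
```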
TermLabel classifier constructor arguments:
Arguments | Type | Description |
---|---|---|
finder | org.fao.geonet.kernel.ThesaurusFinder | the thesaurus finder to use to find the classification scheme |
conceptScheme | java.lang.String | the identifier (URI) of the classification scheme to be used for term classification |
langCode | java.lang.String | the language of preferred labels passed to the classify method |
TermUri classifier constructor arguments:
Arguments | Type | Description |
---|---|---|
finder | org.fao.geonet.kernel.ThesaurusFinder | the thesaurus finder to use to find the classification scheme |
conceptScheme | java.lang.String | the identifier (URI) of the classification scheme to be used for term classification |
- Type:
- Module:
- Vote Proposed: 2/12/14
- +1 Patrizia, +1 Francois, +1 Jose, +1 Jesse
- Craig Jones
- Angus Scheibner