The Webindex example has three major code components.
- Spark component: Generates the initial Fluo and Query tables.
- Fluo component: Updates the Query table as web pages are added, removed, and updated.
- Web component: Web application that uses the Query table.
Since all of these components either read or write the Query table, you may want to read about the Query Table before reading about the code.
The following image shows a high level view of how data flows through the Fluo Webindex code.
![Observer Map](webindex_graphic.png)

The PageLoader queues updated page content for processing by the PageObserver.
All observers are set up by WebindexObservers, which wires up everything discussed below.
The PageObserver computes changes to the links in a page. It queues `+1` and `-1` on the `uriQ` for new and deleted URIs, respectively. It also queues the URI changes on the export queue.
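The link comparison the PageObserver performs can be sketched as a set difference between a page's old and new outbound links. This is a minimal, self-contained sketch, not the Webindex code: the class `LinkDiff` and the method `linkDeltas` are hypothetical, links are plain strings, and the Fluo transaction and `uriQ` calls are omitted.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/**
 * Hypothetical sketch: compute the +1/-1 updates a page observer could queue
 * for the URIs a page links to, given the page's links before and after an update.
 */
class LinkDiff {

  /** Returns a map from target URI to +1 (link added) or -1 (link removed). */
  static Map<String, Long> linkDeltas(Set<String> oldLinks, Set<String> newLinks) {
    Map<String, Long> deltas = new HashMap<>();
    for (String uri : newLinks) {
      if (!oldLinks.contains(uri)) {
        deltas.put(uri, 1L); // new outbound link: the target gains an inbound link
      }
    }
    for (String uri : oldLinks) {
      if (!newLinks.contains(uri)) {
        deltas.put(uri, -1L); // deleted link: the target loses an inbound link
      }
    }
    return deltas;
  }

  public static void main(String[] args) {
    Set<String> before = Set.of("a.com/x", "b.com/y");
    Set<String> after = Set.of("b.com/y", "c.com/z");
    // TreeMap only to make the printed order deterministic.
    System.out.println(new TreeMap<>(linkDeltas(before, after)));
    // prints {a.com/x=-1, c.com/z=1}
  }
}
```

A removed page fits the same shape: its new link set is simply empty, so every old link produces `-1`.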
A CombineQueue is set up to track the number of pages linking to a URI. The `reduce()` function in UriInfo combines multiple updates into a single value.
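To illustrate what combining multiple updates into a single value means, here is a hypothetical stand-in for UriInfo. The record fields `linksTo` and `docs` and the `reduce` helper are assumptions made for illustration; the real class and the CombineQueue API in Fluo Recipes may differ.

```java
import java.util.List;

/**
 * Hypothetical sketch of a per-URI value and a reduce step that folds the
 * current value and many queued updates into one. Field names are illustrative.
 */
class UriInfoSketch {

  /** linksTo: inbound link count; docs: a second illustrative counter. */
  record UriInfo(long linksTo, int docs) {
    static final UriInfo ZERO = new UriInfo(0, 0);

    UriInfo add(UriInfo other) {
      return new UriInfo(linksTo + other.linksTo, docs + other.docs);
    }

    boolean isZero() {
      return linksTo == 0 && docs == 0;
    }
  }

  /** Fold the current value (ZERO if the key is new) with all queued updates. */
  static UriInfo reduce(UriInfo current, List<UriInfo> updates) {
    UriInfo result = current;
    for (UriInfo u : updates) {
      result = result.add(u);
    }
    return result;
  }

  public static void main(String[] args) {
    List<UriInfo> queued = List.of(new UriInfo(1, 0), new UriInfo(1, 0), new UriInfo(-1, 0));
    System.out.println(reduce(new UriInfo(5, 1), queued));
    // prints UriInfo[linksTo=6, docs=1]
  }
}
```

Folding many `+1`/`-1` updates into one value means the stored count is written once per batch rather than once per link change.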
UriCombineQ.UriUpdateObserver is called when a key's value changes. The update observer queues `+1` and `-1` to the domain map. It also queues changes in URI inbound link counts to the export queue.
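Because the domain map counts unique URIs, one reading is that the update observer emits `+1` the first time a URI gets a value and `-1` when its value goes away. The sketch below assumes exactly that, models old and new values as `Optional<Long>`, and uses the URI host as a stand-in for the domain. `UriUpdateSketch`, `domainDelta`, and `domainOf` are hypothetical names, not Webindex code.

```java
import java.net.URI;
import java.util.Optional;

/** Hypothetical sketch of deriving a domain-map update from a URI value change. */
class UriUpdateSketch {

  /** Returns +1 if the URI just appeared, -1 if it just disappeared, else 0. */
  static long domainDelta(Optional<Long> oldCount, Optional<Long> newCount) {
    boolean existedBefore = oldCount.isPresent();
    boolean existsNow = newCount.isPresent();
    if (!existedBefore && existsNow) {
      return 1; // first time this URI is seen: one more unique URI in its domain
    }
    if (existedBefore && !existsNow) {
      return -1; // URI no longer tracked: one fewer unique URI in its domain
    }
    return 0; // count changed but the URI still exists: domain total unchanged
  }

  /** Assumption: the host name stands in for the domain. */
  static String domainOf(String uri) {
    return URI.create(uri).getHost();
  }

  public static void main(String[] args) {
    String uri = "http://www.example.com/about";
    System.out.println(domainOf(uri) + " " + domainDelta(Optional.empty(), Optional.of(1L)));
    // prints www.example.com 1
  }
}
```

Only the appear/disappear transitions touch the domain map; a change from 3 to 4 inbound links would still be exported, but leaves the domain's unique-URI count alone.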
A CombineQueue is set up to track the number of unique URIs observed in each domain. The SummingCombiner from Fluo Recipes combines the updates. DomainCombineQ.DomainUpdateObserver is called when a key's value changes, and it queues the changes on the export queue.
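A summing combine step for per-domain counts might look like the following. This is a hypothetical sketch, not the Fluo Recipes `SummingCombiner` API: the `combine` signature is invented for illustration, and deleting the entry when the sum reaches zero is an assumption of this sketch.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of summing queued +1/-1 updates for one domain key. */
class DomainCountSketch {

  /**
   * Adds all queued updates to the current count (0 if absent). Returning
   * empty when the total is zero models deleting the domain entry; this is
   * an assumption of the sketch, not documented recipe behavior.
   */
  static Optional<Long> combine(Optional<Long> current, Iterator<Long> updates) {
    long sum = current.orElse(0L);
    while (updates.hasNext()) {
      sum += updates.next();
    }
    return sum == 0 ? Optional.empty() : Optional.of(sum);
  }

  public static void main(String[] args) {
    System.out.println(combine(Optional.of(2L), List.of(1L, -1L, 1L).iterator()));
    // prints Optional[3]
    System.out.println(combine(Optional.of(1L), List.of(-1L).iterator()));
    // prints Optional.empty
  }
}
```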
All other observers place IndexUpdate objects on the export queue. IndexUpdateTranslator is a function that translates IndexUpdates into Accumulo Mutations. This function is passed to the Fluo Recipe that exports to Accumulo tables.
IndexUpdate is implemented by the following classes:
- DomainUpdate - Updates information related to a domain (like page count).
- PageUpdate - Updates information related to a page (like links being added or deleted).
- UriUpdate - Updates information related to a URI.
These objects are translated to mutations using code in the IndexClient.
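The relationship between the IndexUpdate types and the translator can be sketched with plain Java records. Everything here is hypothetical: the record fields, the row/column naming scheme, and the `Cell` record, which stands in for an Accumulo `Mutation` so the example runs without Accumulo on the classpath. In Webindex the actual translation uses code in the IndexClient.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical sketch of IndexUpdate types and a translator function. */
class IndexUpdateSketch {

  interface IndexUpdate {}

  record DomainUpdate(String domain, long pageCount) implements IndexUpdate {}

  record PageUpdate(String uri, Set<String> addedLinks, Set<String> deletedLinks)
      implements IndexUpdate {}

  record UriUpdate(String uri, long inboundLinks) implements IndexUpdate {}

  /** Stand-in for an Accumulo Mutation entry; a null value means "delete". */
  record Cell(String row, String column, String value) {}

  /** Translates one update into the cells it would write (illustrative layout). */
  static final Function<IndexUpdate, List<Cell>> TRANSLATOR = update -> {
    List<Cell> cells = new ArrayList<>();
    if (update instanceof DomainUpdate d) {
      cells.add(new Cell("d:" + d.domain(), "pagecount", Long.toString(d.pageCount())));
    } else if (update instanceof PageUpdate p) {
      for (String link : p.addedLinks()) {
        cells.add(new Cell("p:" + p.uri(), "link:" + link, ""));
      }
      for (String link : p.deletedLinks()) {
        cells.add(new Cell("p:" + p.uri(), "link:" + link, null));
      }
    } else if (update instanceof UriUpdate u) {
      cells.add(new Cell("u:" + u.uri(), "inbound", Long.toString(u.inboundLinks())));
    } else {
      throw new IllegalArgumentException("Unknown update: " + update);
    }
    return cells;
  };

  public static void main(String[] args) {
    System.out.println(TRANSLATOR.apply(new DomainUpdate("example.com", 3)));
    // prints [Cell[row=d:example.com, column=pagecount, value=3]]
  }
}
```

Modelling the translator as a `Function` mirrors the description of IndexUpdateTranslator as a function handed to the export recipe; each update type maps independently to its own writes.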