
Elephant Bird Lucene

isnotinvain edited this page Jan 4, 2013 · 13 revisions

Elephant-Bird provides two modules that make it easier to build and query Lucene indexes in HDFS from either a MapReduce or a Pig job.

Module layout

Elephant-Bird has two Lucene modules:

  • elephant-bird-lucene, which contains:
    • LuceneIndexOutputFormat, for creating Lucene indexes in HDFS from an MR job
    • LuceneIndexInputFormat, for querying Lucene indexes in HDFS from an MR job
    • HdfsMergeTool, for merging Lucene indexes in HDFS
  • elephant-bird-pig-lucene, which contains:
    • LuceneIndexStorage, which wraps LuceneIndexOutputFormat
    • LuceneIndexLoader, which wraps LuceneIndexInputFormat

The Javadocs cover these classes in much more detail. The pages linked below give examples of creating and searching indexes.

Creating Indexes

See Creating Indexes

Querying Indexes

See Querying Indexes