Binary document upload

SolrNet supports Solr's "extract" feature (a.k.a. Solr "Cell"), which indexes data from binary document formats such as Word and PDF.

Here's a simple example showing how to extract text from a PDF file, without indexing it:

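ISolrOperations<Something> solr = ...
// Open the PDF as a stream; the using block disposes it after the request.
using (var file = File.OpenRead(@"test.pdf")) {
    var response = solr.Extract(new ExtractParameters(file, "some_document_id") {
        ExtractOnly = true,                 // extract only, do not index
        ExtractFormat = ExtractFormat.Text, // return plain text
    });
    // Content holds the text extracted from the document.
    Console.WriteLine(response.Content);
}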

Setting ExtractOnly = true tells Solr to perform text extraction only, without indexing the uploaded document. With ExtractOnly = false, the document is indexed, and you can attach additional fields to it through the Fields property. Other options are set through the remaining properties of the ExtractParameters class. Providing the StreamType (the content type of the upload) is usually recommended, because auto-detection might fail.
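As a rough illustration of the indexing case, the sketch below uploads a PDF with ExtractOnly = false, sets StreamType explicitly, and attaches one extra field. It rests on several assumptions: the field name "author" and its value are hypothetical and would have to exist in your Solr schema, `Something` is whatever mapped type your application already uses, and the `ExtractField(name, value)` constructor and the array assignment to `Fields` should be checked against the SolrNet version you have installed.

```csharp
using System.IO;
using SolrNet;

public static class ExtractIndexingSketch {
    // Sketch only: indexes the file at `path` under the given document id.
    // `Something` is assumed to be the application's own mapped document type.
    public static void IndexPdf(ISolrOperations<Something> solr, string path, string id) {
        using (var file = File.OpenRead(path)) {
            solr.Extract(new ExtractParameters(file, id) {
                // Index the document instead of only returning extracted text.
                ExtractOnly = false,
                // Declare the content type rather than relying on auto-detection.
                StreamType = "application/pdf",
                // Hypothetical extra field; "author" must be defined in the schema.
                Fields = new[] {
                    new ExtractField("author", "Jane Doe"),
                },
            });
        }
        // Make the newly indexed document visible to searches.
        solr.Commit();
    }
}
```

Committing explicitly after the upload keeps the sketch simple; whether you commit per document or in batches depends on your indexing volume.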

For more details about each option in ExtractParameters see the Solr wiki and the Solr reference guide.