
Solr Plugin For Data Import

Steve McDonald edited this page Jun 4, 2016 · 2 revisions

Several SQL columns (e.g., tags and attributes) are stored in PHP’s serialized data format. Since Solr doesn’t understand this format, these values must be converted to JSON during data import. The conversion is done by Java code in the class org.zeega.solr.ItemTransformer, which is only a couple dozen lines long and relies on the Java class org.lovecraft.phparser.SerializedPhpParser.
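To make the conversion concrete, here is a minimal, self-contained sketch of turning a PHP-serialized value into JSON. It is not the actual ItemTransformer or SerializedPhpParser code: the class name PhpToJsonSketch and its convert method are hypothetical, it handles only null, boolean, integer, string, and array values, and it assumes ASCII data so that PHP's byte lengths match Java character counts.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch; not the actual org.zeega.solr.ItemTransformer. */
public class PhpToJsonSketch {
    private final String in;
    private int pos = 0;

    private PhpToJsonSketch(String in) { this.in = in; }

    /** Converts one PHP-serialized value to a JSON string. */
    public static String convert(String serialized) {
        return toJson(new PhpToJsonSketch(serialized).parseValue());
    }

    // Reads up to (not including) the delimiter, then skips the delimiter.
    private String readUntil(char delim) {
        int end = in.indexOf(delim, pos);
        if (end < 0) throw new IllegalArgumentException("Expected '" + delim + "' after offset " + pos);
        String token = in.substring(pos, end);
        pos = end + 1;
        return token;
    }

    private Object parseValue() {
        char type = in.charAt(pos);
        pos += 2; // skip the type letter and the ':' (or the ';' of "N;")
        switch (type) {
            case 'N': return null;                          // N;
            case 'b': return readUntil(';').equals("1");    // b:1;
            case 'i': return Long.valueOf(readUntil(';'));  // i:42;
            case 's': {                                     // s:3:"abc";
                int len = Integer.parseInt(readUntil(':'));
                // Assumption: ASCII data, so byte length equals char count.
                String s = in.substring(pos + 1, pos + 1 + len);
                pos += len + 3; // opening quote, body, closing quote, ';'
                return s;
            }
            case 'a': {                                     // a:1:{key value}
                int count = Integer.parseInt(readUntil(':'));
                pos++; // skip '{'
                Map<Object, Object> map = new LinkedHashMap<Object, Object>();
                for (int i = 0; i < count; i++) {
                    Object key = parseValue();
                    map.put(key, parseValue());
                }
                pos++; // skip '}'
                return map;
            }
            default:
                throw new IllegalArgumentException("Unsupported type '" + type + "'");
        }
    }

    private static String toJson(Object value) {
        if (value == null) return "null";
        if (value instanceof String) return quote((String) value);
        if (value instanceof Map) {
            Map<?, ?> map = (Map<?, ?>) value;
            // PHP lists are arrays keyed 0..n-1; emit those as JSON arrays.
            boolean isList = true;
            long expected = 0;
            for (Object key : map.keySet()) {
                if (!Long.valueOf(expected++).equals(key)) { isList = false; break; }
            }
            StringBuilder sb = new StringBuilder(isList ? "[" : "{");
            boolean first = true;
            for (Map.Entry<?, ?> e : map.entrySet()) {
                if (!first) sb.append(',');
                first = false;
                if (!isList) sb.append(quote(String.valueOf(e.getKey()))).append(':');
                sb.append(toJson(e.getValue()));
            }
            return sb.append(isList ? ']' : '}').toString();
        }
        return String.valueOf(value); // Long or Boolean
    }

    private static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') sb.append('\\').append(c);
            else if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
            else sb.append(c);
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        // A two-element PHP list of tags.
        System.out.println(convert("a:2:{i:0;s:4:\"solr\";i:1;s:3:\"php\";}"));
    }
}
```

Under these assumptions, a sequentially keyed PHP array such as a tag list becomes a JSON array, while an array with string keys becomes a JSON object. A production transformer would also need to handle doubles, objects, multi-byte strings, and malformed input.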

These two Java source files are under the directory solr/solr-item-transformer/src. Before they can be used by Solr, they must be compiled and packaged in a jar file. There is a very simple script to do this in solr/solr-item-transformer/src/build.sh. This script compiles the source files and creates the jar file zeega-solr-0.2.jar. This file must be copied into the Solr directory and renamed to contrib/zeega/lib/zeega-solr.jar. After a restart, Solr loads this file and uses it to convert the serialized PHP objects stored in the database to JSON for data import.

It wasn't clear whether the Java code on GitHub matched the zeega-solr.jar on the production server. SpacemanSteve decompiled the jar file from the production server (using Procyon) and added error handling. This code will be committed to GitHub.

With the newer version of Solr, it may be important to run export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64 in the shell before running the build script.