From SQL to Solr
In JDA, MySQL is the system of record. It receives new data from user-supplied, JSON-formatted text files, which are ingested via the command line. Data is then transferred from MySQL to Solr to provide improved search and, soon, heatmaps. JDA provides a DataImportHandler configuration that extracts archive data from MySQL and adds it to Solr. This feature is built entirely in Solr and uses Solr’s import infrastructure. Because using Solr as a secondary, search-oriented datastore is typical, Solr supports reading data from SQL stores.
Via solrconfig.xml, the data import request handler points to a “data import config file”. This file specifies what runs when the request handler URL (at dataimport/dataimport) is accessed. It includes the details of how to connect to the database (URL, username, password, etc.). The handler can perform either a full import, which moves all items from MySQL into Solr, or a partial import of just the new items (Solr calls this a delta import). The file specifies exactly which columns from which tables in the SQL database are moved to Solr. Several SQL columns (e.g., attributes) are stored in PHP’s internal data format. Since Solr doesn’t understand this format, these values must be converted to JSON, which is done by Java code (org.zeega.solr.ItemTransformer). The data import config file joins the Item table with the User table to get the user’s name and display name. Only data from the Item and User tables is added to Solr.
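The wiring described above might look roughly like the following sketch. It is not JDA’s actual configuration: the config file name, JDBC URL, credentials, table aliases, column names, and the date_updated column are all hypothetical placeholders chosen for illustration. Only the handler class, the dataimporter variables, and the org.zeega.solr.ItemTransformer class name come from Solr or from the text above.

```xml
<!-- solrconfig.xml (fragment): register the import handler and point it
     at the data import config file. The file name is an assumption. -->
<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>

<!-- data-config.xml (separate file): connection details plus the queries.
     All SQL identifiers and credentials below are hypothetical. -->
<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/jda"
              user="solr_reader"
              password="CHANGE_ME"/>
  <document>
    <!-- query runs on a full import; deltaQuery and deltaImportQuery run on
         a delta import. The custom transformer converts PHP-serialized
         columns (e.g., attributes) to JSON. -->
    <entity name="item"
            transformer="org.zeega.solr.ItemTransformer"
            query="SELECT i.id, i.title, i.attributes, u.username, u.display_name
                   FROM item i JOIN user u ON u.id = i.user_id"
            deltaQuery="SELECT id FROM item
                        WHERE date_updated &gt; '${dataimporter.last_index_time}'"
            deltaImportQuery="SELECT i.id, i.title, i.attributes, u.username, u.display_name
                              FROM item i JOIN user u ON u.id = i.user_id
                              WHERE i.id = '${dataimporter.delta.id}'">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
      <field column="attributes" name="attributes"/>
      <field column="username" name="username"/>
      <field column="display_name" name="display_name"/>
    </entity>
  </document>
</dataConfig>
```

With a handler registered this way, requesting the handler URL with command=full-import starts a full import and command=delta-import starts a partial one.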
Solr can only generate heatmaps from one specific type of field, a recursive prefix tree (RPT) spatial field. Since the current Solr schema does not have a field of this type, we must extend the schema. The field’s value is a string representation of the tweet’s latitude and longitude in Well-Known Text (WKT) format. This can be specified in the data import config file using a template such as:
<field column="point_rpt" template="POINT(${media_geo_longitude} ${media_geo_latitude})"/>
This will populate the RPT field of Solr documents with the geodetic coordinates from the rows of the Item table. Adding this field to existing items requires a full import.
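The schema extension mentioned above could be sketched as follows. The field type name location_rpt and the attribute values are assumptions borrowed from Solr’s example schemas, not taken from JDA’s schema. Only the field name point_rpt comes from the template line above.

```xml
<!-- schema.xml (fragment): an RPT field type and the field that uses it.
     The type name and tuning values are assumptions for illustration. -->
<fieldType name="location_rpt"
           class="solr.SpatialRecursivePrefixTreeFieldType"
           geo="true"
           distErrPct="0.025"
           maxDistErr="0.001"
           distanceUnits="kilometers"/>

<!-- Holds the WKT point built by the import template, e.g. POINT(lon lat). -->
<field name="point_rpt" type="location_rpt" indexed="true" stored="true"/>
```

Two details are worth checking when applying this. First, the template attribute only takes effect if the enclosing entity lists TemplateTransformer in its transformer attribute. Second, DataImportHandler templates usually reference columns as ${entityName.column}, so the variable names in the template may need an entity prefix depending on how the entity is declared. Once the field is populated, a heatmap can be requested by adding facet=true&facet.heatmap=point_rpt to a search request.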