This tool migrates the documents in a Solr node to an Elasticsearch index. It requires:
- Python 3+
- `elasticsearch`
- `pysolr`
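
If the dependencies are not already installed, installing them with pip should work (a sketch; the appropriate `elasticsearch` package version depends on your Elasticsearch server):

```
pip install elasticsearch pysolr
```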
```
usage: solr-to-es [-h] [--solr-query SOLR_QUERY] [--solr-fields COMMA_SEP_FIELDS]
                  [--rows-per-page ROWS_PER_PAGE] [--es-timeout ES_TIMEOUT]
                  solr_url elasticsearch_url elasticsearch_index doc_type
```
The following example will page through all documents on the local Solr server and submit them to the local Elasticsearch server, using your collection name as the index and `solr_docs` as the document type.
```
solr-to-es http://localhost:8983/solr/<<collection_name>> http://localhost:9200 <<collection_name>> solr_docs
```
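
Under the hood, the flow is roughly the following Python sketch (illustrative only, not the tool's actual code; it assumes the `pysolr` and `elasticsearch` packages from the requirements above, plus hypothetical `my_collection` and `my_index` names):

```python
# Illustrative sketch of the Solr-to-Elasticsearch flow, not the tool's actual code.
import pysolr
from elasticsearch import Elasticsearch, helpers

solr = pysolr.Solr('http://localhost:8983/solr/my_collection')  # hypothetical collection
es = Elasticsearch('http://localhost:9200')

def solr_docs(query='*:*', rows=500):
    """Page through Solr results and yield bulk actions for Elasticsearch."""
    start = 0
    while True:
        page = solr.search(query, start=start, rows=rows)
        if not page.docs:
            break
        for doc in page.docs:
            # Older Elasticsearch versions would also need a '_type' here.
            yield {'_index': 'my_index', '_source': doc}
        start += rows

# Submit the documents in bulk; helpers.bulk handles batching.
helpers.bulk(es, solr_docs())
```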
- `solr_url` is the full URL to your Solr collection.
- `elasticsearch_url` is the URL of your Elasticsearch server.
- `elasticsearch_index` is the index on Elasticsearch that the Solr documents will be submitted to.
- `doc_type` is the type of document Elasticsearch should assume you are importing.
- `--solr-query` defaults to `*:*`.
- `--solr-fields` defaults to blank (i.e. all fields are exported).
- `--rows-per-page` defaults to 500.
- `--es-timeout` defaults to 60 seconds.
- `--es-user` is the username used for authentication with Elasticsearch.
- `--es-password` is the password used for authentication with Elasticsearch.
- `--es-max-retries` is the maximum number of times a document will be retried when HTTP 429 (Too Many Requests) is received; set it to 0 for no retries on 429.
- `--es-initial-backoff` is the number of seconds to wait before the first retry; subsequent retries wait `initial_backoff * 2**retry_number` seconds (e.g. 2, 4, 8, … for an initial backoff of 2).
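
Putting several of these options together, an invocation might look like the following (all hostnames, credentials, and values here are illustrative):

```
solr-to-es --solr-query 'title:cat*' --solr-fields id,title \
    --rows-per-page 1000 --es-timeout 120 \
    --es-user elastic --es-password changeme \
    --es-max-retries 5 --es-initial-backoff 2 \
    http://localhost:8983/solr/my_collection http://localhost:9200 my_index solr_docs
```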
Run `python setup.py install` to install the script.
Here is an example that grabs the more than 114,000 journal articles about animals from the Plos.org API.
```
docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" elasticsearch
solr-to-es --solr-query animal http://api.plos.org/search localhost:9200 es_plos solr_docs
curl http://localhost:9200/_cat/indices?v
```
Note: you will get a 403 Forbidden error from the script because the remote Solr endpoint doesn't allow deep paging; however, you will still have documents in your ES cluster.
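
To confirm that documents actually landed, you can count them in the target index using the standard Elasticsearch `_count` API:

```
curl 'http://localhost:9200/es_plos/_count?pretty'
```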