
Error during training when using remote ElasticSearch #57

Open
thisismana opened this issue Sep 4, 2018 · 20 comments

@thisismana

I was trying to set up a PIO server with a remote ES and remote HBase/ZooKeeper via Docker.

versions used:

  • SCALA_VERSION 2.11.8
  • PIO_VERSION 0.12.1 ("from source", downloaded from an Apache mirror)
  • SPARK_VERSION 2.1.2
  • ELASTICSEARCH_VERSION 5.5.2
  • HBASE_VERSION 1.3.1

Here is my config:

pio-env.sh:

#!/usr/bin/env bash

# Filesystem paths that PredictionIO uses as block storage.
PIO_FS_BASEDIR=${HOME}/.pio_store
PIO_FS_ENGINESDIR=${PIO_FS_BASEDIR}/engines
PIO_FS_TMPDIR=${PIO_FS_BASEDIR}/tmp

SPARK_HOME=${PIO_HOME}/vendors/spark-${SPARK_VERSION}-bin-hadoop2.7

HBASE_CONF_DIR=${PIO_HOME}/vendors/hbase-${HBASE_VERSION}/conf

# Storage Repositories
PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta
PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH

PIO_STORAGE_REPOSITORIES_APPDATA_NAME=pio_appdata
PIO_STORAGE_REPOSITORIES_APPDATA_SOURCE=ELASTICSEARCH

PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event
PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE

PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model
PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=LOCALFS

# ES config
PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
PIO_STORAGE_SOURCES_ELASTICSEARCH_CLUSTERNAME=predictionio
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=es
# PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost
PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9200
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=${PIO_HOME}/vendors/elasticsearch-${ELASTICSEARCH_VERSION}

PIO_STORAGE_SOURCES_LOCALFS_TYPE=localfs
PIO_STORAGE_SOURCES_LOCALFS_PATH=${PIO_FS_BASEDIR}/models

PIO_STORAGE_SOURCES_HBASE_TYPE=hbase
PIO_STORAGE_SOURCES_HBASE_HOME=${PIO_HOME}/vendors/hbase-${HBASE_VERSION}
# http://actionml.com/docs/small_ha_cluster
HBASE_MANAGES_ZK=true # when you want HBase to manage zookeeper

PIO itself seems to be running fine; here is the output of pio status:

pio status
[INFO] [Management$] Inspecting PredictionIO...
[INFO] [Management$] PredictionIO 0.12.1 is installed at /PredictionIO-0.12.1
[INFO] [Management$] Inspecting Apache Spark...
[INFO] [Management$] Apache Spark is installed at /PredictionIO-0.12.1/vendors/spark-2.1.2-bin-hadoop2.7
[INFO] [Management$] Apache Spark 2.1.2 detected (meets minimum requirement of 1.3.0)
[INFO] [Management$] Inspecting storage backend connections...
[INFO] [Storage$] Verifying Meta Data Backend (Source: ELASTICSEARCH)...
[INFO] [Storage$] Verifying Model Data Backend (Source: LOCALFS)...
[INFO] [Storage$] Verifying Event Data Backend (Source: HBASE)...
[INFO] [Storage$] Test writing to Event Store (App Id 0)...
[INFO] [HBLEvents] The table pio_event:events_0 doesn't exist yet. Creating now...
[INFO] [HBLEvents] Removing table pio_event:events_0...
[INFO] [Management$] Your system is all ready to go.

It seems as if the universal recommender does not pick up the PIO storage settings but keeps its own settings. When running the integration tests, it uses the template examples/handmade-engine.json, where I added two lines within the sparkConf object (es.nodes and es.nodes.wan.only):

  "sparkConf": {
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.kryo.registrator": "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator",
    "spark.kryo.referenceTracking": "false",
    "spark.kryoserializer.buffer": "300m",
    "es.index.auto.create": "true",
    "es.nodes.wan.only":"true",
    "es.nodes":"es"
  }

It seems to be talking to the right ES server, but I always get the following exception during the training phase (pio train -- --driver-memory 4g --executor-memory 4g):

2018-09-04 07:49:35,562 ERROR org.apache.predictionio.data.storage.elasticsearch.ESEngineInstances [main] - Failed to update pio_meta/engine_instances/AWWjjner32JscvS-r-c9
org.apache.predictionio.shaded.org.elasticsearch.client.ResponseException: POST http://es:9200/pio_meta/engine_instances/AWWjjner32JscvS-r-c9?refresh=true: HTTP/1.1 400 Bad Request
{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"mapper [sparkConf.es.nodes] of different type, current_type [text], merged_type [ObjectMapper]"}],"type":"illegal_argument_exception","reason":"mapper [sparkConf.es.nodes] of different type, current_type [text], merged_type [ObjectMapper]"},"status":400}
	at org.apache.predictionio.shaded.org.elasticsearch.client.RestClient$1.completed(RestClient.java:354)
	at org.apache.predictionio.shaded.org.elasticsearch.client.RestClient$1.completed(RestClient.java:343)
	at org.apache.predictionio.shaded.org.apache.http.concurrent.BasicFuture.completed(BasicFuture.java:119)
	at org.apache.predictionio.shaded.org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl.responseCompleted(DefaultClientExchangeHandlerImpl.java:177)
	at org.apache.predictionio.shaded.org.apache.http.nio.protocol.HttpAsyncRequestExecutor.processResponse(HttpAsyncRequestExecutor.java:436)
	at org.apache.predictionio.shaded.org.apache.http.nio.protocol.HttpAsyncRequestExecutor.inputReady(HttpAsyncRequestExecutor.java:326)
	at org.apache.predictionio.shaded.org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:265)
	at org.apache.predictionio.shaded.org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:81)
	at org.apache.predictionio.shaded.org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:39)
	at org.apache.predictionio.shaded.org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:114)
	at org.apache.predictionio.shaded.org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:162)
	at org.apache.predictionio.shaded.org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:337)
	at org.apache.predictionio.shaded.org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315)
	at org.apache.predictionio.shaded.org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:276)
	at org.apache.predictionio.shaded.org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)
	at org.apache.predictionio.shaded.org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:588)
	at java.lang.Thread.run(Thread.java:748)

This does not happen when I start a local ES on the same machine where PIO is located (using the original engine.json).

@thisismana

The solution is not to put es.nodes.wan.only into the sparkConf but to pass it to the pio train command as follows:

pio train -- --driver-memory 4g --executor-memory 4g --conf spark.es.nodes.wan.only=true

This fixes my problem, but the Elasticsearch mapping is still problematic, since there are many more options under the es.nodes prefix that cannot be added to the sparkConf.
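
Presumably any other es.* option can be passed the same way, prefixed with spark. so that Spark accepts the key and the ES-Hadoop connector strips the prefix. A sketch with placeholder host/port values:

pio train -- --driver-memory 4g --executor-memory 4g \
  --conf spark.es.nodes.wan.only=true \
  --conf spark.es.nodes=es \
  --conf spark.es.port=9200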

@ahmet8282

I have the same error. I have Elasticsearch and PredictionIO in two Docker containers. In the end I get Failed to update pio_meta/engine_instances; however, adding wan.only did not help. Did you also use something else?

[INFO] [URModel] ES fields[3]: List(popRank, read, id)
[INFO] [EsClient$] Create new index: urindex_1542467924140, items, List(popRank, read, id), Map(popRank -> (float,false), read -> (keyword,true))
[INFO] [EsClient$] Number of ES connections for saveToEs: 4
[INFO] [Engine$] org.apache.predictionio.data.storage.NullModel does not support data sanity check. Skipping check.
[INFO] [Engine$] EngineWorkflow.train completed
[INFO] [Engine] engineInstanceId=AWciPZF84YvwTDFicADa
[INFO] [CoreWorkflow$] Inserting persistent model
[INFO] [CoreWorkflow$] Updating engine instance
[ERROR] [ESEngineInstances] Failed to update pio_meta/engine_instances/AWciPZF84YvwTDFicADa
[INFO] [CoreWorkflow$] Training completed successfully.
[INFO] [AbstractConnector] Stopped Spark@59c500f7{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}

@IlCingalese

You can pass "es.nodes.wan.only": "true" in sparkConf.
The error was caused by "es.nodes": "es".
The only way to set the node names is in the PredictionIO config file (pio-env.sh), using:
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=es
So remove the es.nodes row and everything works fine.
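
With that change, the sparkConf from the original report would look roughly like this (the block from above, minus the es.nodes entry):

  "sparkConf": {
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.kryo.registrator": "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator",
    "spark.kryo.referenceTracking": "false",
    "spark.kryoserializer.buffer": "300m",
    "es.index.auto.create": "true",
    "es.nodes.wan.only": "true"
  }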

@joshhornby

Same issue for me.

When following the docs to run Docker:

docker-compose -f docker-compose.yml \
  -f elasticsearch/docker-compose.base.yml \
  -f elasticsearch/docker-compose.meta.yml \
  -f elasticsearch/docker-compose.event.yml \
  -f localfs/docker-compose.model.yml \
  up

And when trying to run pio train I get: no other nodes left - aborting... Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'

I've tried both:

pio train -- --driver-memory 4g --executor-memory 4g --conf spark.es.nodes.wan.only=true

and

  "sparkConf": {
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.kryo.registrator": "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator",
    "spark.kryo.referenceTracking": "false",
    "spark.kryoserializer.buffer": "300m",
    "es.index.auto.create": "true",
    "es.nodes.wan.only":"true",
    "es.nodes":"es"
  },

But still no luck, any ideas?

@IlCingalese

IlCingalese commented Feb 8, 2019 via email

@IlCingalese

IlCingalese commented Feb 8, 2019 via email

@joshhornby

joshhornby commented Feb 8, 2019

Hi @IlCingalese

Thanks for the response, although still no luck:

  "sparkConf": {
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.kryo.registrator": "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator",
    "spark.kryo.referenceTracking": "false",
    "spark.kryoserializer.buffer": "64kb",
    "es.index.auto.create": "true"
  },

And then running:

pio train -- --driver-memory 4g --executor-memory 4g --conf spark.es.nodes.wan.only=true

or

pio train -- --driver-memory 4g --executor-memory 4g

Still results in the same error.

[ERROR] [NetworkClient] Node [localhost:9200] failed (Connection refused (Connection refused)); no other nodes left - aborting...
[INFO] [AbstractConnector] Stopped Spark@73c48264{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
Exception in thread "main" org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'

@joshhornby

Can you confirm what the pio-env.sh should be?

@IlCingalese

IlCingalese commented Feb 8, 2019 via email

@joshhornby

PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost
PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9200
PIO_STORAGE_SOURCES_ELASTICSEARCH_SCHEMES=http
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=$PIO_HOME/vendors/elasticsearch-5.6.9

Although I'm not sure if Docker is correctly pulling in this file.

Looking through the output from the CLI I can see:

[INFO] [Runner$] Submission command: /usr/share/spark-2.2.3-bin-hadoop2.7/bin/spark-submit --driver-memory 4g --executor-memory 4g --conf spark.es.nodes.wan.only=true --class org.apache.predictionio.workflow.CreateWorkflow --jars file:/usr/share/predictionio/lib/postgresql-42.2.4.jar,file:/templates/universal-recommender/target/scala-2.11/universal-recommender-assembly-0.7.3-deps.jar,file:/templates/universal-recommender/target/scala-2.11/universal-recommender_2.11-0.7.3.jar,file:/usr/share/predictionio/lib/spark/pio-data-jdbc-assembly-0.13.0.jar,file:/usr/share/predictionio/lib/spark/pio-data-hdfs-assembly-0.13.0.jar,file:/usr/share/predictionio/lib/spark/pio-data-elasticsearch-assembly-0.13.0.jar,file:/usr/share/predictionio/lib/spark/pio-data-localfs-assembly-0.13.0.jar,file:/usr/share/predictionio/lib/spark/pio-data-hbase-assembly-0.13.0.jar,file:/usr/share/predictionio/lib/spark/pio-data-s3-assembly-0.13.0.jar --files file:/etc/predictionio/log4j.properties --driver-class-path /etc/predictionio:/usr/share/predictionio/lib/postgresql-42.2.4.jar:/usr/share/predictionio/lib/mysql-connector-java-8.0.12.jar --driver-java-options -Dpio.log.dir=/var/log/predictionio file:/usr/share/predictionio/lib/pio-assembly-0.13.0.jar --engine-id com.actionml.RecommendationEngine --engine-version 9f1b62f2fb4487a817672952d831b2ea9f46f65a --engine-variant file:/templates/universal-recommender/engine.json --verbosity 0 --json-extractor Both --env PIO_ENV_LOADED=1,PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta,PIO_FS_BASEDIR=/work/pio_store,PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=elasticsearch,PIO_HOME=/usr/share/predictionio,PIO_FS_ENGINESDIR=/work/pio_store/engines,PIO_STORAGE_SOURCES_LOCALFS_PATH=/work/pio_store/models,PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch,PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH,PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=LOCALFS,PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event,PIO_FS_TMPDIR=/work/pio_store/tmp,PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model,PIO_STORAGE_SOURCES_ELASTICSEARCH_SCHEMES=http,PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=ELASTICSEARCH,PIO_CONF_DIR=/etc/predictionio,PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9200,PIO_STORAGE_SOURCES_LOCALFS_TYPE=localfs
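
Note that the submission command shows PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=elasticsearch rather than the localhost value from the pio-env.sh above, which suggests the Docker setup injects its own environment. One way to check what the container actually sees (the container name is a placeholder for your own):

docker exec -it <pio-container> env | grep PIO_STORAGE_SOURCES_ELASTICSEARCH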

@IlCingalese

IlCingalese commented Feb 8, 2019 via email

@joshhornby

Running jps -l from inside the docker container returns:

129 org.apache.predictionio.tools.console.Console
451 sun.tools.jps.Jps

No Elasticsearch process is expected inside this container, of course, but I can confirm the Elasticsearch container itself is running:

80131dca9e35        docker.elastic.co/elasticsearch/elasticsearch:5.6.4   "/bin/bash bin/es-do…"   4 hours ago         Up 12 minutes              9200/tcp, 9300/tcp                               docker_elasticsearch_1
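
Note that 9200/tcp appears without a host mapping (e.g. 0.0.0.0:9200->9200/tcp), so the port is not published to the host and localhost:9200 will be refused. A quick connectivity check from inside the PIO container, assuming curl is available there and the service is reachable as elasticsearch on the Compose network:

docker exec -it <pio-container> curl -s http://elasticsearch:9200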

@IlCingalese

IlCingalese commented Feb 8, 2019 via email

@lucacanella

lucacanella commented Feb 20, 2019

Same problem here. The Elasticsearch docker container is up and running; it responds correctly when queried with curl, node, or wget.

My spark conf:

"sparkConf": {
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.kryo.registrator": "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator",
    "spark.kryo.referenceTracking": "false",
    "spark.kryoserializer.buffer": "1024",
    "es.nodes": "elasticsearch",
    "es.port": "9200",
    "es.index.auto.create": "true",
    "es.nodes.wan.only": "true"
  }

When I run pio train I get this error twice:

[ERROR] [ESEngineInstances] Failed to update pio_meta/engine_instances/AWkLF89GASGnnVc278Ds

And some of these warnings:

[WARN] [EsInputFormat] Cannot determine task id...

In the submission command I see these env vars:

PIO_ENV_LOADED=1
PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=elasticsearch
PIO_HOME=/usr/share/predictionio
PIO_STORAGE_SOURCES_PGSQL_URL=jdbc:postgresql://postgres/pio
PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH
PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=PGSQL
PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event
PIO_STORAGE_SOURCES_PGSQL_PASSWORD=pio
PIO_STORAGE_SOURCES_PGSQL_TYPE=jdbc
PIO_STORAGE_SOURCES_PGSQL_USERNAME=pio
PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model
PIO_STORAGE_SOURCES_ELASTICSEARCH_SCHEMES=http
PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=ELASTICSEARCH
PIO_CONF_DIR=/etc/predictionio
PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9200

@pshoukry

pshoukry commented Apr 1, 2019

@thisismana to fix this issue you need to set wan.only on the Hadoop client in both PIO and UR. Please check these 2 pull requests:

PredictionIO FIX
UR FIX

specifically: here and here

@holodazoltan

I had the same Failed to update pio_meta/engine_instance error message when I started pio with docker-compose. Then I edited the UR engine.json file:

  • add the spark.driver-memory property
  • add the spark.executor-memory property
  • add the es.nodes property
  • remove the spark.kryoserializer.buffer property

"sparkConf": {
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.kryo.registrator": "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator",
    "spark.kryo.referenceTracking": "false",
    "es.index.auto.create": "true",
    "spark.driver-memory": "4g",
    "spark.executor-memory": "4g",
    "es.nodes":"elasticsearch"
  }

I used PredictionIO v0.13.0 and UR v0.7.3. pio+UR works like a charm with any official datasource configuration.

@happinessandlove

(quoting @holodazoltan's fix above)

Thank you very much. Your answer fixed my problem.

@happinessandlove

(quoting @ahmet8282's comment above)

I have the same problem as you. Could you tell me your solution?

@andresviikmaa

andresviikmaa commented Apr 24, 2019

A quick and dirty solution is to forward the localhost 9200 port to one of the Elasticsearch nodes, for example by running:
socat tcp-listen:9200,fork tcp:elasticsearch:9200 &

and then no changes are needed in sparkConf.

But the real problem is that sparkConf is saved into the pio_meta engine_instances index. If es.nodes.wan.only was indexed first, es.nodes gets mapped as an object, and Elasticsearch then cannot write a string value into es.nodes because it conflicts with the already-created mapping type:
MapperParsingException: object mapping for [sparkConf.es.nodes] tried to parse field [es.nodes] as object, but found a concrete value
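
The conflicting mapping can be inspected directly; a sketch against ES 5.x, assuming the host is reachable as elasticsearch (the sparkConf.es.nodes field will show up as either text or object, depending on which document was indexed first):

curl -s 'http://elasticsearch:9200/pio_meta/_mapping/engine_instances?pretty'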

@twshiels

What worked for me:

pio train -- --driver-memory 6g --executor-memory 6g --conf spark.es.nodes.wan.only=true --conf spark.es.nodes=my_es_server_name:9200

Also, in the engine.json, my sparkConf:

"sparkConf": {
"spark.serializer": "org.apache.spark.serializer.KryoSerializer",
"spark.kryo.registrator": "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator",
"spark.kryo.referenceTracking": "false",
"spark.kryoserializer.buffer": "300m",
"es.index.auto.create": "true",
"spark.master": "yarn",
"spark.submit.deployMode": "client"
}

Hope that helps someone...
