Error during training when using remote ElasticSearch #57
Comments
The solution is not to put the es.nodes setting in the sparkConf of engine.json.
This fixes my problem, but the ElasticSearch mapping is still problematic, since there are many more options available to the
I have the same error. I have Elasticsearch and PredictionIO in two Docker containers. In the end I get:
You can pass "es.nodes.wan.only": "true" in sparkConf.
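For example, a minimal sparkConf block in engine.json carrying just that flag (a sketch, not a complete UR configuration):

```json
"sparkConf": {
  "es.nodes.wan.only": "true"
}
```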
Same issue for me. When following the docs to run Docker:

docker-compose -f docker-compose.yml \
  -f elasticsearch/docker-compose.base.yml \
  -f elasticsearch/docker-compose.meta.yml \
  -f elasticsearch/docker-compose.event.yml \
  -f localfs/docker-compose.model.yml \
  up

and trying to run pio train, I get:

no other nodes left - aborting... Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'

I've tried both:

pio train -- --driver-memory 4g --executor-memory 4g --conf spark.es.nodes.wan.only=true

and

"sparkConf": {
  "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
  "spark.kryo.registrator": "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator",
  "spark.kryo.referenceTracking": "false",
  "spark.kryoserializer.buffer": "300m",
  "es.index.auto.create": "true",
  "es.nodes.wan.only": "true",
  "es.nodes": "es"
},

But still no luck, any ideas?
Remove "es.nodes":"es" from engine.json... You can set es nodes only in
prediction config file "pio-env.sh"
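For reference, the Elasticsearch block in conf/pio-env.sh looks like this (the host name elasticsearch is an assumption for a Docker setup; substitute your actual ES host):

```sh
PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=elasticsearch
PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9200
PIO_STORAGE_SOURCES_ELASTICSEARCH_SCHEMES=http
```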
Sorry, I forgot another option. Change this:

"spark.kryoserializer.buffer": "300m"

into

"spark.kryoserializer.buffer.mb": "300"

I think 300 MB is too much; you can set it to 64 KB:

"spark.kryoserializer.buffer.kb": "64"
Hi @IlCingalese, thanks for the response, although still no luck:

And then running:

or

Still results in the same error.
Can you confirm what the pio-env.sh should be?
Are you using PredictionIO? If yes, pio-env.sh is the configuration file where you must set the Elasticsearch server. You'll find it in your PredictionIO installation dir, under the config folder.
PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost
PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9200
PIO_STORAGE_SOURCES_ELASTICSEARCH_SCHEMES=http
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=$PIO_HOME/vendors/elasticsearch-5.6.9

Although I'm not sure if Docker is correctly pulling in this file. Looking through the output from the CLI I can see:

[INFO] [Runner$] Submission command: /usr/share/spark-2.2.3-bin-hadoop2.7/bin/spark-submit --driver-memory 4g --executor-memory 4g --conf spark.es.nodes.wan.only=true --class org.apache.predictionio.workflow.CreateWorkflow --jars file:/usr/share/predictionio/lib/postgresql-42.2.4.jar,file:/templates/universal-recommender/target/scala-2.11/universal-recommender-assembly-0.7.3-deps.jar,file:/templates/universal-recommender/target/scala-2.11/universal-recommender_2.11-0.7.3.jar,file:/usr/share/predictionio/lib/spark/pio-data-jdbc-assembly-0.13.0.jar,file:/usr/share/predictionio/lib/spark/pio-data-hdfs-assembly-0.13.0.jar,file:/usr/share/predictionio/lib/spark/pio-data-elasticsearch-assembly-0.13.0.jar,file:/usr/share/predictionio/lib/spark/pio-data-localfs-assembly-0.13.0.jar,file:/usr/share/predictionio/lib/spark/pio-data-hbase-assembly-0.13.0.jar,file:/usr/share/predictionio/lib/spark/pio-data-s3-assembly-0.13.0.jar --files file:/etc/predictionio/log4j.properties --driver-class-path /etc/predictionio:/usr/share/predictionio/lib/postgresql-42.2.4.jar:/usr/share/predictionio/lib/mysql-connector-java-8.0.12.jar --driver-java-options -Dpio.log.dir=/var/log/predictionio file:/usr/share/predictionio/lib/pio-assembly-0.13.0.jar --engine-id com.actionml.RecommendationEngine --engine-version 9f1b62f2fb4487a817672952d831b2ea9f46f65a --engine-variant file:/templates/universal-recommender/engine.json --verbosity 0 --json-extractor Both --env PIO_ENV_LOADED=1,PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta,PIO_FS_BASEDIR=/work/pio_store,PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=elasticsearch,PIO_HOME=/usr/share/predictionio,PIO_FS_ENGINESDIR=/work/pio_store/engines,PIO_STORAGE_SOURCES_LOCALFS_PATH=/work/pio_store/models,PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch,PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH,PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=LOCALFS,PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event,PIO_FS_TMPDIR=/work/pio_store/tmp,PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model,PIO_STORAGE_SOURCES_ELASTICSEARCH_SCHEMES=http,PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=ELASTICSEARCH,PIO_CONF_DIR=/etc/predictionio,PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9200,PIO_STORAGE_SOURCES_LOCALFS_TYPE=localfs

If you run jps -l in the console, do you see the Elasticsearch process?
Running jps -l from inside the Docker container returns:

129 org.apache.predictionio.tools.console.Console
451 sun.tools.jps.Jps

Although it's expected that no Elasticsearch process is running inside this box, I can confirm the Elasticsearch box is running:

80131dca9e35 docker.elastic.co/elasticsearch/elasticsearch:5.6.4 "/bin/bash bin/es-do…" 4 hours ago Up 12 minutes 9200/tcp, 9300/tcp docker_elasticsearch_1

I'm sorry, but I didn't use Docker so I can't help you. But now the problem is that your ES service is not reachable.
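One quick way to verify reachability from the PIO side (assuming curl is available in the PIO image; the container and host names here are placeholders) is to curl the ES HTTP port from inside the PIO container:

```sh
# prints the ES version banner JSON if the host is reachable from the PIO container
docker exec -it pio curl -s http://elasticsearch:9200
```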
Same problem here. The Elasticsearch Docker container is up and running; it responds correctly when queried with curl, node, or wget. My sparkConf:

"sparkConf": {
  "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
  "spark.kryo.registrator": "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator",
  "spark.kryo.referenceTracking": "false",
  "spark.kryoserializer.buffer": "1024",
  "es.nodes": "elasticsearch",
  "es.port": "9200",
  "es.index.auto.create": "true",
  "es.nodes.wan.only": "true"
}

When I run

And some of these warnings:

In the submission command I see these env vars:
@thisismana to fix this issue you need to set wan.only on the Hadoop client in PIO and in UR; please check these two pull requests:
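For reference, elasticsearch-hadoop settings can also be passed on the command line by prefixing them with spark., which Spark forwards to the connector; a sketch (the host name is a placeholder):

```sh
pio train -- --conf spark.es.nodes.wan.only=true --conf spark.es.nodes=your-es-host
```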
I had the same Failed to update pio_meta/engine_instance error message when I started PIO with Docker Compose. Then I edited the UR engine.json file:

I used PredictionIO v0.13.0 and UR v0.7.3. PIO + UR work like a charm with any official datasource configuration.
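Presumably the edit was along these lines (a hypothetical sketch based on the rest of this thread: enable WAN-only mode and drop any hard-coded es.nodes from sparkConf):

```json
"sparkConf": {
  "es.nodes.wan.only": "true"
}
```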
Thank you very much. Your answer fixed my problem.
I have the same problem as you. Could you tell me your solution?
The quick and dirty solution is to forward the localhost 9200 port to one of the Elasticsearch nodes, and then no changes are needed in sparkConf. But the real problem is that sparkConf is saved into the pio_meta_engine_instances index.
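As a sketch of that quick-and-dirty approach using an SSH tunnel (user and host names are placeholders):

```sh
# forward localhost:9200 to the remote ES node so PIO can keep using the default host
ssh -N -L 9200:localhost:9200 user@es-node-host
```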
What worked for me:

pio train -- --driver-memory 6g --executor-memory 6g --conf spark.es.nodes.wan.only=true --conf spark.es.nodes=my_es_server_name:9200

Also, in the engine.json, my sparkConf:

"sparkConf": {

Hope that helps someone...
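The quoted sparkConf is truncated above; a sketch consistent with the command-line flags (an assumption, not the poster's exact file) would be:

```json
"sparkConf": {
  "es.nodes": "my_es_server_name:9200",
  "es.nodes.wan.only": "true"
}
```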
I was trying to set up a PIO server with a remote ES and a remote HBase/Zookeeper via Docker.

Versions used:

Here is my config, pio-env.sh:

PIO itself seems to be running fine; here is the output of pio status:

It seems as if the Universal Recommender does not pick up the PIO storage settings, but keeps its own settings. Running the integration tests, it is using the template examples/handmade-engine.json, where I added two lines within the sparkConf object (es.nodes and es.nodes.wan.only):

It seems to be talking to the right ES server, but I always get the following exception during the training phase (pio train -- --driver-memory 4g --executor-memory 4g):

This does not happen when I start a local ES on the same machine where PIO is located (using the original engine.json).
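The two added lines in the sparkConf object would look like this (a sketch reconstructed from the description above; the host name is a placeholder for the remote ES server):

```json
"es.nodes": "your-remote-es-host",
"es.nodes.wan.only": "true"
```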