Problem with libtorch_cpu library (Pytorch) error: "si_signo 4, si_code: 2, si_errno: 0" #116756

Open
ventaubain opened this issue Nov 13, 2024 · 0 comments
Labels
>bug, needs:triage (requires assignment of a team area label)

Elasticsearch Version

8.15.3

Installed Plugins

No response

Java Version

bundled

OS Version

docker elasticsearch:8.15.3

Problem Description

I deployed the .elser_model_2_linux-x86_64 model (Elastic Learned Sparse EncodeR v2, optimized for linux-x86_64, version 12.0.0) on an ML node in order to use the Knowledge Base. During Knowledge Base setup, the task crashed because model inference failed. Before inference, the model is reported as "started"; after inference, it becomes "failed".
I am running Elasticsearch 8.15.3 (Docker, Linux) on an Intel(R) Xeon(R) CPU E7540.
The problem seems to be linked to the PyTorch library (libtorch_cpu.so), but I don't know how to fix it.
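
Since the crash may depend on the CPU model, it can help to record which instruction-set extensions the host reports. The following is only a minimal sketch, assuming a Linux host where `/proc/cpuinfo` is readable. The helper names and the list of flags are illustrative assumptions: this issue does not state which extensions `libtorch_cpu.so` actually requires.

```python
from pathlib import Path

# Illustrative list only (an assumption): the issue does not say which
# CPU extensions libtorch_cpu.so needs.
FLAGS_OF_INTEREST = ("sse4_2", "avx", "avx2", "fma", "avx512f")


def cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def report(cpuinfo_text):
    """Map each flag of interest to True/False depending on whether the CPU reports it."""
    flags = cpu_flags(cpuinfo_text)
    return {name: name in flags for name in FLAGS_OF_INTEREST}


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        for name, present in report(path.read_text()).items():
            print(f"{name}: {'yes' if present else 'no'}")
```

Running this inside the container and attaching the output to the report would show what the CPU advertises, without asserting that any particular flag is the cause.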

Related issue: #106206

Steps to Reproduce

Logs (if relevant)

{"@timestamp":"2024-11-13T16:34:25.473Z", "log.level":"ERROR", "message":"[.elser_model_2_linux-x86_64] pytorch_inference/83613 process stopped unexpectedly: Fatal error: 'si_signo 4, si_code: 2, si_errno: 0, address: 0x7deb66c697df, library: /usr/share/elasticsearch/modules/x-pack-ml/platform/linux-x86_64/bin/../lib/libtorch_cpu.so, base: 0x7deb605df000, normalized address: 0x668a7df', version: 8.15.3 (build 44a990dc4c07de)\n", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[XXX][ml_native_inference_comms][T#3]","log.logger":"org.elasticsearch.xpack.ml.process.AbstractNativeProcess","elasticsearch.cluster.uuid":"wIMpuv9HQwSIikUkvahZLg","elasticsearch.node.id":"5_7J8bnFT2iEeArkoUWVpg","elasticsearch.node.name":"XXX","elasticsearch.cluster.name":"XXX"}
{"@timestamp":"2024-11-13T16:34:25.474Z", "log.level":"ERROR", "message":"[.elser_model_2_linux-x86_64] inference process crashed due to reason [[.elser_model_2_linux-x86_64] pytorch_inference/83613 process stopped unexpectedly: Fatal error: 'si_signo 4, si_code: 2, si_errno: 0, address: 0x7deb66c697df, library: /usr/share/elasticsearch/modules/x-pack-ml/platform/linux-x86_64/bin/../lib/libtorch_cpu.so, base: 0x7deb605df000, normalized address: 0x668a7df', version: 8.15.3 (build 44a990dc4c07de)\n]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[XXX[ml_native_inference_comms][T#3]","log.logger":"org.elasticsearch.xpack.ml.inference.deployment.DeploymentManager","elasticsearch.cluster.uuid":"wIMpuv9HQwSIikUkvahZLg","elasticsearch.node.id":"5_7J8bnFT2iEeArkoUWVpg","elasticsearch.node.name":"XXX","elasticsearch.cluster.name":"XXX"}
{"@timestamp":"2024-11-13T16:34:25.473Z", "log.level":"ERROR", "message":"[.elser_model_2_linux-x86_64] Error processing results", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[XXX][ml_native_inference_comms][T#1]","log.logger":"org.elasticsearch.xpack.ml.inference.pytorch.process.PyTorchResultProcessor","elasticsearch.cluster.uuid":"wIMpuv9HQwSIikUkvahZLg","elasticsearch.node.id":"5_7J8bnFT2iEeArkoUWVpg","elasticsearch.node.name":"XXX","elasticsearch.cluster.name":"XXX","error.type":"org.elasticsearch.xcontent.XContentEOFException","error.message":"[3:1] Unexpected end of file","error.stack_trace":"org.elasticsearch.xcontent.XContentEOFException: [3:1] Unexpected end of file\n\tat [email protected]/org.elasticsearch.xcontent.provider.json.JsonXContentParser.nextToken(JsonXContentParser.java:61)\n\tat [email protected]/org.elasticsearch.xpack.ml.process.ProcessResultsParser$ResultIterator.hasNext(ProcessResultsParser.java:70)\n\tat [email protected]/org.elasticsearch.xpack.ml.inference.pytorch.process.PyTorchResultProcessor.process(PyTorchResultProcessor.java:105)\n\tat [email protected]/org.elasticsearch.xpack.ml.inference.deployment.DeploymentManager.lambda$startDeployment$2(DeploymentManager.java:180)\n\tat [email protected]/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:917)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)\n\tat java.base/java.lang.Thread.run(Thread.java:1570)\nCaused by: com.fasterxml.jackson.core.io.JsonEOFException: Unexpected end-of-input: expected close marker for Array (start marker at [Source: (FileInputStream); line: 2, column: 1])\n at [Source: (FileInputStream); line: 3, column: 1]\n\tat [email protected]/com.fasterxml.jackson.core.base.ParserMinimalBase._reportInvalidEOF(ParserMinimalBase.java:697)\n\tat [email protected]/com.fasterxml.jackson.core.base.ParserBase._handleEOF(ParserBase.java:512)\n\tat [email protected]/com.fasterxml.jackson.core.base.ParserBase._eofAsNextChar(ParserBase.java:529)\n\tat [email protected]/com.fasterxml.jackson.core.json.UTF8StreamJsonParser._skipWSOrEnd2(UTF8StreamJsonParser.java:3175)\n\tat [email protected]/com.fasterxml.jackson.core.json.UTF8StreamJsonParser._skipWSOrEnd(UTF8StreamJsonParser.java:3145)\n\tat [email protected]/com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:757)\n\tat [email protected]/org.elasticsearch.xcontent.provider.json.JsonXContentParser.nextToken(JsonXContentParser.java:58)\n\t... 7 more\n"}
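
To make the "Fatal error" string in the logs above easier to read, here is a minimal sketch that decodes its fields. It assumes the message keeps the exact layout shown in these logs and that it runs on Linux, where signal 4 is SIGILL. The `si_code` names are the SIGILL codes from the Linux siginfo definitions. `decode_fatal` and `FATAL_RE` are hypothetical helper names, not part of Elasticsearch.

```python
import re
import signal

# si_code meanings for SIGILL, as defined in the Linux siginfo headers.
ILL_CODES = {
    1: "ILL_ILLOPC", 2: "ILL_ILLOPN", 3: "ILL_ILLADR", 4: "ILL_ILLTRP",
    5: "ILL_PRVOPC", 6: "ILL_PRVREG", 7: "ILL_COPROC", 8: "ILL_BADSTK",
}

# Assumes the field order and punctuation seen in the log lines of this issue.
FATAL_RE = re.compile(
    r"si_signo (?P<signo>\d+), si_code: (?P<code>\d+), si_errno: (?P<errno>\d+), "
    r"address: (?P<address>0x[0-9a-f]+), library: (?P<library>[^,]+), "
    r"base: (?P<base>0x[0-9a-f]+), normalized address: (?P<normalized>0x[0-9a-f]+)"
)


def decode_fatal(message):
    """Decode the signal fields of a 'Fatal error' log message, or return None."""
    m = FATAL_RE.search(message)
    if m is None:
        return None
    signo, code = int(m["signo"]), int(m["code"])
    address, base = int(m["address"], 16), int(m["base"], 16)
    name = signal.Signals(signo).name
    return {
        "signal": name,
        "si_code": ILL_CODES.get(code, str(code)) if name == "SIGILL" else str(code),
        "library": m["library"].rsplit("/", 1)[-1],
        # Offset of the faulting instruction inside the shared library.
        "offset": hex(address - base),
        "offset_matches_log": address - base == int(m["normalized"], 16),
    }


if __name__ == "__main__":
    msg = (
        "Fatal error: 'si_signo 4, si_code: 2, si_errno: 0, address: 0x7deb66c697df, "
        "library: /usr/share/elasticsearch/modules/x-pack-ml/platform/linux-x86_64/"
        "bin/../lib/libtorch_cpu.so, base: 0x7deb605df000, normalized address: 0x668a7df'"
    )
    print(decode_fatal(msg))
```

For the message in this issue, the sketch reports SIGILL (illegal instruction) with `ILL_ILLOPN` at offset `0x668a7df` inside `libtorch_cpu.so`. That is consistent with the logged normalized address, but it does not by itself identify the offending instruction.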

ventaubain added the >bug and needs:triage labels on Nov 13, 2024