Hi @JGAUG26, I'm having the same issue. How did you resolve it? Thanks!
-
I'm encountering a binary incompatibility issue between scikit-learn and numpy while running a script in a Python environment. The error occurs when trying to import CountVectorizer from sklearn.feature_extraction.text. Below is the error traceback:
Traceback (most recent call last):
  File "/tmp/dolphinscheduler/exec/process/root/1/14927164157472_1/564/683/py_564_683.py", line 13, in <module>
    from sklearn.feature_extraction.text import CountVectorizer
  File "/scity/miniconda3/envs/dp-pdfocr/lib/python3.9/site-packages/sklearn/__init__.py", line 82, in <module>
    from .base import clone
  File "/scity/miniconda3/envs/dp-pdfocr/lib/python3.9/site-packages/sklearn/base.py", line 17, in <module>
    from .utils import _IS_32BIT
  File "/scity/miniconda3/envs/dp-pdfocr/lib/python3.9/site-packages/sklearn/utils/__init__.py", line 19, in <module>
    from .murmurhash import murmurhash3_32
  File "sklearn/utils/murmurhash.pyx", line 1, in init sklearn.utils.murmurhash
ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
Environment Details:
numpy version: 1.26.4
scikit-learn version: 1.0
Python version: 3.9
OS: Linux (running on server via dolphinscheduler)
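Since the script runs under dolphinscheduler rather than an interactive shell, it may help to confirm which interpreter and which package builds the job actually picks up. The following is only a minimal sketch for gathering that information; the helper names `describe_package` and `environment_report` are hypothetical, and it deliberately avoids importing sklearn, because that import is what fails.

```python
import sys
from importlib import metadata, util


def describe_package(dist_name, import_name):
    """Return the installed version and on-disk location without importing the package."""
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # The distribution is not installed for this interpreter.
        version = None
    # find_spec locates a top-level package without executing its code.
    spec = util.find_spec(import_name)
    location = spec.origin if spec is not None else None
    return {"version": version, "location": location}


def environment_report():
    """Collect the interpreter and package details relevant to this error."""
    report = {"python": sys.version.split()[0], "executable": sys.executable}
    report["numpy"] = describe_package("numpy", "numpy")
    report["scikit-learn"] = describe_package("scikit-learn", "sklearn")
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Running this as the scheduled task (not just in a terminal) shows whether the job uses the dp-pdfocr environment and whether the numpy on the import path is the same one the version list above refers to.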
I have tried upgrading scikit-learn and downgrading numpy to older versions, but the issue persists. I've also attempted recompiling scikit-learn using --no-binary :all: but that did not resolve the problem.
Code Snippet (for context):
Here's a simplified version of the operator I'm running, which uses whisper for audio transcription (note that the issue does not directly relate to this code but rather to the environment setup):
import os
import whisper


def read_audio_file(file_path):
    """Reads the audio file."""
    with open(file_path, "rb") as file:
        return file.read()


def write_text_file(file_path, text):
    """Writes the transcribed text to the output file."""
    with open(file_path, "w", encoding="utf-8") as file:
        file.write(text)


def clean_files(nas_source_path, nas_converted_path, config: dict):
    """Processes audio files and transcribes them using Whisper."""
    files = os.listdir(nas_source_path)
Can you suggest how I can resolve this binary incompatibility issue between numpy and scikit-learn? Any guidance or suggestions would be greatly appreciated. Thank you!