SpeechRecognition is a massive dependency, as is pocketsphinx, and possibly others too. Making those dependencies "extras" would remove a lot of distribution load, maintenance burden, and install errors.
In any case, one wouldn't expect a tool called "textract" to run complex AI recognition tools so casually: they are unstable and non-deterministic, and one wouldn't use them in serious projects. Usually such file types need to be filtered out before textract is allowed to try them.
So these massive AI libraries should all become "extra" dependencies, at the very least.
I came here to suggest the same thing: it would be great if textract were more lightweight by default. I only need something to extract text from common document formats such as .pdf, .rtf, and .docx. The dependency on SpeechRecognition is problematic because its massive size greatly slows down our project's build and substantially increases the size of the resulting Docker image.
As @kxrob suggested, the dependency could be moved to "extra" and the tool could provide clear instructions if the package is unavailable when trying to extract text from an audio file, e.g. "Extracting text from audio files is an optional feature. Please run pip install SpeechRecognition~=3.8.1".
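To make the suggestion concrete, here is a minimal sketch of the lazy-import pattern, not textract's actual code. The names `require_optional`, `AUDIO_HINT`, and the `audio` extra are hypothetical; the only things taken from this thread are the package names and the version pin quoted above. It assumes the packaging side would declare the heavy libraries under setuptools' `extras_require`, as shown in the comment.

```python
import importlib

# Assumed packaging side (setup.py), shown as a comment only:
#   extras_require={"audio": ["SpeechRecognition~=3.8.1", "pocketsphinx"]}
# Users who need audio support would then opt in explicitly.

# Hypothetical hint text, modeled on the message proposed in this thread.
AUDIO_HINT = (
    "Extracting text from audio files is an optional feature. "
    "Please run: pip install SpeechRecognition~=3.8.1"
)


def require_optional(module_name, hint):
    """Import a module only when it is needed.

    Raises ImportError carrying `hint` if the module is not installed,
    so the user sees an actionable message instead of a bare traceback.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(hint) from exc


def load_audio_backend():
    # SpeechRecognition is imported as `speech_recognition`.
    # Called only from the audio code path, so .pdf/.docx users never need it.
    return require_optional("speech_recognition", AUDIO_HINT)
```

With something like this, the import cost and the install failure both move from "everyone, at install time" to "only users who actually hand textract an audio file".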