Since Spark-NLP 2.7.2+, loading check_spelling_dl crashes

Building a pipeline with `ContextSpellCheckerModel.pretrained()` works fine, but loading the pretrained pipeline `check_spelling_dl` crashes.
This runs fine:

```python
import sparknlp
from sparknlp.annotator import *
from sparknlp.common import *
from sparknlp.base import *
from pyspark.ml import Pipeline
from sparknlp.pretrained import PretrainedPipeline, LightPipeline

spark = sparknlp.start()

document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

spell = ContextSpellCheckerModel.pretrained() \
    .setInputCols(["token"]) \
    .setOutputCol("spell")

nlp_pipeline = Pipeline(stages=[document_assembler, tokenizer, spell])

data = [{"text": 'Some text hello world'}]
df = spark.createDataFrame(data)

nlp_pipeline.fit(df).transform(df).show()
```
But it crashes based on this snippet, which loads the pretrained pipeline instead:

```python
import sparknlp
from sparknlp.annotator import *
from sparknlp.common import *
from sparknlp.base import *
from pyspark.ml import Pipeline
from sparknlp.pretrained import PretrainedPipeline, LightPipeline

spark = sparknlp.start()

pipeline = PretrainedPipeline('check_spelling_dl', lang='en')

data = [{"text": 'Some text hello world'}]
df = spark.createDataFrame(data)

pipeline.transform(df).show()
```
Colab link for reproduction:
https://colab.research.google.com/drive/1QpV7RYj65DXJQm2xxB1s2o_6J88yB8-n?usp=sharing
Results in the following error message:

```
check_spelling_dl download started this may take some time.
Approx size to download 112.1 MB
[OK!]
---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
<ipython-input-2-f7b1aa24d037> in <module>()
     10
     11 spark = sparknlp.start()
---> 12 pipeline = PretrainedPipeline('check_spelling_dl', lang='en')
     13 data = [ {"text": 'Some text hello world'}, ]
     14 df = spark.createDataFrame(data)

8 frames
/usr/local/lib/python3.6/dist-packages/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    326                 raise Py4JJavaError(
    327                     "An error occurred while calling {0}{1}{2}.\n".
--> 328                     format(target_id, ".", name), value)
    329             else:
    330                 raise Py4JError(

Py4JJavaError: An error occurred while calling z:com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.downloadPipeline.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 12.0 failed 1 times, most recent failure: Lost task 1.0 in stage 12.0 (TID 23, localhost, executor driver): java.lang.ArrayStoreException: java.lang.Byte
    at scala.runtime.ScalaRunTime$.array_update(ScalaRunTime.scala:90)
    at scala.Array$.slowcopy(Array.scala:81)
    at scala.Array$.copy(Array.scala:107)
    at scala.collection.mutable.ResizableArray$class.copyToArray(ResizableArray.scala:77)
    at scala.collection.mutable.ArrayBuffer.copyToArray(ArrayBuffer.scala:48)
    at scala.collection.TraversableOnce$class.copyToArray(TraversableOnce.scala:278)
    at scala.collection.AbstractTraversable.copyToArray(Traversable.scala:104)
    at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:286)
    at scala.collection.AbstractTraversable.toArray(Traversable.scala:104)
    at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
    at scala.collection.AbstractIterator.toArray(Iterator.scala:1334)
    at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:945)
    at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:945)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:123)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1889)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1877)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1876)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1876)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
    at scala.Option.foreach(Option.scala:257)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:926)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2110)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2059)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2048)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:737)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2082)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2101)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2126)
    at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:945)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
    at org.apache.spark.rdd.RDD.collect(RDD.scala:944)
    at com.johnsnowlabs.nlp.serialization.TransducerFeature.deserializeObject(Feature.scala:281)
    at com.johnsnowlabs.nlp.serialization.Feature.deserialize(Feature.scala:47)
    at com.johnsnowlabs.nlp.FeaturesReader$$anonfun$load$1.apply(ParamsAndFeaturesReadable.scala:15)
    at com.johnsnowlabs.nlp.FeaturesReader$$anonfun$load$1.apply(ParamsAndFeaturesReadable.scala:14)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at com.johnsnowlabs.nlp.FeaturesReader.load(ParamsAndFeaturesReadable.scala:14)
    at com.johnsnowlabs.nlp.FeaturesReader.load(ParamsAndFeaturesReadable.scala:8)
    at org.apache.spark.ml.util.DefaultParamsReader$.loadParamsInstance(ReadWrite.scala:652)
    at org.apache.spark.ml.Pipeline$SharedReadWrite$$anonfun$4.apply(Pipeline.scala:274)
    at org.apache.spark.ml.Pipeline$SharedReadWrite$$anonfun$4.apply(Pipeline.scala:272)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
    at org.apache.spark.ml.Pipeline$SharedReadWrite$.load(Pipeline.scala:272)
    at org.apache.spark.ml.PipelineModel$PipelineModelReader.load(Pipeline.scala:348)
    at org.apache.spark.ml.PipelineModel$PipelineModelReader.load(Pipeline.scala:342)
    at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.downloadPipeline(ResourceDownloader.scala:379)
    at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.downloadPipeline(ResourceDownloader.scala:373)
    at com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader$.downloadPipeline(ResourceDownloader.scala:479)
    at com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.downloadPipeline(ResourceDownloader.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ArrayStoreException: java.lang.Byte
    at scala.runtime.ScalaRunTime$.array_update(ScalaRunTime.scala:90)
    at scala.Array$.slowcopy(Array.scala:81)
    at scala.Array$.copy(Array.scala:107)
    at scala.collection.mutable.ResizableArray$class.copyToArray(ResizableArray.scala:77)
    at scala.collection.mutable.ArrayBuffer.copyToArray(ArrayBuffer.scala:48)
    at scala.collection.TraversableOnce$class.copyToArray(TraversableOnce.scala:278)
    at scala.collection.AbstractTraversable.copyToArray(Traversable.scala:104)
    at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:286)
    at scala.collection.AbstractTraversable.toArray(Traversable.scala:104)
    at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
    at scala.collection.AbstractIterator.toArray(Iterator.scala:1334)
    at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:945)
    at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:945)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:123)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    ... 1 more
```
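The trace fails inside `com.johnsnowlabs.nlp.serialization.TransducerFeature.deserializeObject`, i.e. while deserializing the downloaded pipeline, not while running it. In the meantime, a possible stopgap (a sketch reusing `nlp_pipeline` and `df` from the working snippet above, not an official fix) is to keep the manually assembled pipeline and wrap it in a `LightPipeline` for quick, pretrained-pipeline-like annotation:

```python
# Stopgap sketch: reuses nlp_pipeline and df from the first snippet above.
# LightPipeline is already imported there; annotate() runs the fitted
# pipeline directly on plain strings, without building a DataFrame.
model = nlp_pipeline.fit(df)
light = LightPipeline(model)
print(light.annotate("Plaese spellcheck this sentense"))
```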
I think @albertoandreottiATgmail was training new models based on the new graph introduced in 2.7.2, but they may not have been published yet.
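If that is the case, one way to check (a sketch assuming the `ResourceDownloader.showPublicPipelines` helper available in recent Spark NLP releases) would be to list the published pipelines and see whether a `check_spelling_dl` rebuilt for the new graph has appeared:

```python
# Sketch only: assumes ResourceDownloader.showPublicPipelines exists in
# this Spark NLP release. It prints the publicly published pipelines
# (names, languages, versions), which should show when an updated
# check_spelling_dl is released.
import sparknlp
from sparknlp.pretrained import ResourceDownloader

spark = sparknlp.start()
ResourceDownloader.showPublicPipelines(lang="en")
```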