"Frame length should be positive" problem in XGBoost with CPU (Mortgage-large) #71
Comments
Hi, you could also try another parameter set, for example: "--num-executors 1 --executor-cores 1 --conf spark.task.cpus=1 -numWorkers=19 -nthread=1 -treeMethod=hist". For the hanging problem, there is a "timeout_request_workers" parameter that may help (but not always). It can reduce how long the app hangs when it cannot get enough resources from Spark. There are also other reasons the program may hang. To see where it hangs, go to Spark's web UI, open the "Executors" tab, and look at the "Thread Dump".
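To make the "not enough resources" case concrete, here is a minimal sketch of the slot arithmetic. It assumes that XGBoost4J-Spark needs all numWorkers tasks running at the same time, and that the number of concurrent tasks is the granted core count divided by spark.task.cpus. The helper names `task_slots` and `enough_slots` are hypothetical and not part of Spark or XGBoost; the values are the ones from the original report.

```shell
# Hypothetical helper: how many tasks can run at once, assuming
# total_cores cores are granted and each task reserves task_cpus of them.
task_slots() {
  total_cores=$1
  task_cpus=$2
  echo $(( total_cores / task_cpus ))
}

# Hypothetical helper: succeeds when every worker can be scheduled at once.
enough_slots() {
  num_workers=$1
  total_cores=$2
  task_cpus=$3
  [ "$num_workers" -le "$(task_slots "$total_cores" "$task_cpus")" ]
}

# Values from the original report:
# -numWorkers=19, spark.cores.max=19, spark.task.cpus=1
if enough_slots 19 19 1; then
  echo "ok: 19 workers fit in $(task_slots 19 1) task slots"
else
  echo "not enough task slots: training may wait for workers"
fi
```

If the check fails for your own settings, the job would likely sit waiting for workers, which is the situation "timeout_request_workers" is meant to shorten.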
Hi,
Hi, I guess you probably used the wrong version of the cudf jar. You should choose the version that matches your CUDA version, e.g. mvn package -Dcuda.classifier=cuda10 if your CUDA is 10.0. You can check your CUDA version with "cat /usr/local/cuda/version.txt".
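The two steps above (read the CUDA version, then pick the classifier) could be scripted roughly as follows. This is only a sketch: it assumes version.txt contains a line like "CUDA Version 10.0.130", the function names and the `CUDA_VERSION_FILE` override are hypothetical, and only the 10.0 to cuda10 mapping comes from this thread. For any other version it prints a reminder instead of guessing a classifier.

```shell
# Extract "10.0" from a line such as "CUDA Version 10.0.130".
cuda_version() {
  sed -n 's/^CUDA Version \([0-9][0-9]*\.[0-9][0-9]*\).*$/\1/p' "$1"
}

# Map a CUDA version to a Maven classifier.
# Only 10.0 -> cuda10 is taken from the thread; other versions print nothing.
cuda_classifier() {
  case "$1" in
    10.0) echo "cuda10" ;;
    *)    echo "" ;;
  esac
}

# CUDA_VERSION_FILE is a hypothetical override, handy for trying this out.
version_file="${CUDA_VERSION_FILE:-/usr/local/cuda/version.txt}"
if [ -r "$version_file" ]; then
  ver="$(cuda_version "$version_file")"
  cls="$(cuda_classifier "$ver")"
  if [ -n "$cls" ]; then
    echo "mvn package -Dcuda.classifier=$cls"
  else
    echo "CUDA $ver: check the project docs for the matching classifier"
  fi
else
  echo "no readable $version_file"
fi
```

The script only prints the suggested mvn command rather than running it, so you can review it before building.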
Thanks for your tip!
Dear author,
I came across this article: "https://github.com/rapidsai/spark-examples/blob/master/getting-started-guides/on-prem-cluster/standalone-scala.md".
When I launch distributed training without GPUs (tree method hist), I use the following parameters: "--num-executors 1 --executor-cores 19 --conf spark.cores.max=19 --conf spark.task.cpus=1 --class ai.rapids.spark.examples.mortgage.CPUMain -numWorkers=19 -treeMethod=hist"
However, the tasks of the stage "foreachPartition at XGBoost.scala:703" always stay blocked in the "running" state. A few hours after submitting the job, we got the following error:
java.lang.IllegalArgumentException: Frame length should be positive: -9223371863126827765
  at org.spark_project.guava.base.Preconditions.checkArgument(Preconditions.java:119)
  at org.apache.spark.network.util.TransportFrameDecoder.decodeNext(TransportFrameDecoder.java:134)
  at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:81)
  at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
  at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
  at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
  at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1359)
  at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
  at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
  at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:935)
  at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:138)
  at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
  at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
  at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
  at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
  at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
  at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
  at java.lang.Thread.run(Thread.java:748)
Could you please offer some tips on this issue? Thanks.
Sincerely