[TOREE-540] Fix deadlock on closing ZMQ by upgrading jeromq to 0.5.3 #205

Merged: 2 commits, Aug 6, 2023

pan3793 (Member) commented Aug 6, 2023

This PR upgrades jeromq to 0.5.3 to fix a deadlock (zeromq/jeromq#919) that can occur when closing a ZMQ context.
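In an sbt build, the fix amounts to bumping the jeromq dependency. The fragment below is a minimal sketch that assumes jeromq is declared as a plain Java dependency via `libraryDependencies`. It does not show which Toree project or build file actually declares it.

```scala
// Sketch of an sbt setting (assumed location; Toree's real build layout may differ).
// jeromq is a pure-Java library, so `%` is used rather than the Scala-versioned `%%`.
libraryDependencies += "org.zeromq" % "jeromq" % "0.5.3"
```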

I observed that the unit tests occasionally hang indefinitely, e.g. https://github.com/apache/incubator-toree/actions/runs/5771673473/job/15645873429?pr=203

I also reproduced it on macOS. The stack trace of the hung thread is:

"pool-8-thread-2-ScalaTest-running-ZeroMQSocketRunnableSpec" #121 prio=5 os_prio=31 tid=0x000000011dff8800 nid=0xf81f runnable [0x000000017459b000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.KQueueArrayWrapper.kevent0(Native Method)
        at sun.nio.ch.KQueueArrayWrapper.poll(KQueueArrayWrapper.java:198)
        at sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:117)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
        - locked <0x00000006f2d5eb48> (a sun.nio.ch.Util$3)
        - locked <0x00000006f2d5eb60> (a java.util.Collections$UnmodifiableSet)
        - locked <0x000000070d174c10> (a sun.nio.ch.KQueueSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
        at zmq.Signaler.waitEvent(Signaler.java:130)
        at zmq.Mailbox.recv(Mailbox.java:90)
        at zmq.Ctx.terminate(Ctx.java:249)
        at org.zeromq.ZMQ$Context.term(ZMQ.java:357)
        at org.zeromq.ZMQ$Context.close(ZMQ.java:426)
        at org.apache.toree.communication.socket.ZeroMQSocketRunnableSpec.$anonfun$new$3(ZeroMQSocketRunnableSpec.scala:63)
        at org.apache.toree.communication.socket.ZeroMQSocketRunnableSpec$$Lambda$8298/275687375.apply$mcV$sp(Unknown Source)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at scala.util.Try$.apply(Try.scala:213)
        at org.apache.toree.communication.socket.ZeroMQSocketRunnableSpec.$anonfun$new$2(ZeroMQSocketRunnableSpec.scala:63)
        at org.apache.toree.communication.socket.ZeroMQSocketRunnableSpec$$Lambda$8103/639565717.apply(Unknown Source)
        at org.scalatest.BeforeAndAfter.$anonfun$runTest$1(BeforeAndAfter.scala:219)
        at org.scalatest.BeforeAndAfter$$Lambda$8296/1165993060.apply$mcV$sp(Unknown Source)
        at org.scalatest.Status.$anonfun$withAfterEffect$1(Status.scala:379)
        at org.scalatest.Status.$anonfun$withAfterEffect$1$adapted(Status.scala:375)
        at org.scalatest.Status$$Lambda$8297/1458487907.apply(Unknown Source)
        at org.scalatest.FailedStatus$.whenCompleted(Status.scala:497)
        at org.scalatest.Status.withAfterEffect(Status.scala:375)
        at org.scalatest.Status.withAfterEffect$(Status.scala:373)
        at org.scalatest.FailedStatus$.withAfterEffect(Status.scala:469)
        at org.scalatest.BeforeAndAfter.runTest(BeforeAndAfter.scala:217)
        at org.scalatest.BeforeAndAfter.runTest$(BeforeAndAfter.scala:192)
        at org.apache.toree.communication.socket.ZeroMQSocketRunnableSpec.runTest(ZeroMQSocketRunnableSpec.scala:28)
        at org.scalatest.FunSpecLike.$anonfun$runTests$1(FunSpecLike.scala:498)
        at org.scalatest.FunSpecLike$$Lambda$7954/1683415490.apply(Unknown Source)
        at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:393)
        at org.scalatest.SuperEngine$$Lambda$7955/1549536483.apply(Unknown Source)
        at scala.collection.immutable.List.foreach(List.scala:431)
        at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:381)
        at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:370)
        at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:407)
        at org.scalatest.SuperEngine$$Lambda$7955/1549536483.apply(Unknown Source)

TOREE-326 also reported a hanging test, which may have the same root cause.

@pan3793 pan3793 changed the title [TOREE-540] Fix deadlock on closing by upgrading jeromq to 0.5.3 [TOREE-540] Fix deadlock on closing ZMQ by upgrading jeromq to 0.5.3 Aug 6, 2023
@lresende lresende merged commit 47970f5 into apache:master Aug 6, 2023
2 checks passed
@pan3793 pan3793 deleted the TOREE-540 branch August 6, 2023 21:29
pan3793 (Member, Author) commented Aug 7, 2023

It seems the test can still hang occasionally; this needs further investigation.
