Segfault in TestJlmRemoteThreadAuth and TestJlmRemoteThreadNoAuth with JITServer #20800

Open
cjjdespres opened this issue Dec 10, 2024 · 7 comments
Labels: comp:jitserver, test failure

Comments

@cjjdespres
Contributor

cjjdespres commented Dec 10, 2024

Failure link

https://hyc-runtimes-jenkins.swg-devops.com/job/Test_openjdk11_j9_sanity.system_x86-64_linux_jit_Personal_testList_2/1191/

At least TestJlmRemoteThreadAuth_0 and TestJlmRemoteThreadNoAuth_0 are failing.

Optional info

Based on a preliminary search through the failed tests, the failures seem to have started in the JITServer nightly tests on Dec 6. I'll look a bit more to confirm, but if so, this may have been introduced by a commit on Dec 5 in openj9 or Dec 3 in omr.

The exact ranges are ba6e625...78500d8 and eclipse-openj9/openj9-omr@82df57f...b893b2b

Failure output (captured from console output)

Retrieved from the saved stderr of one of the failed processes:

Unhandled exception
Type=Segmentation error vmState=0x0005ffff
J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
Handler1=00007FAECABD1AC0 Handler2=00007FAECAB2B7D0 InaccessibleAddress=0000000000000207
RDI=00007FAEC40FCA14 RSI=0000000000513F00 RAX=0000000000513F00 RBX=00007FAEC83ACA0C
RCX=00007FAEC83AF6D0 RDX=00000000000001F5 R8=0000000000513F00 R9=0000000000000000
R10=0000000000000000 R11=0000000000000246 R12=00007FAEC83AE610 R13=00007FAEC4146B30
R14=00007FAEA9C09FA0 R15=0000000000513F00
RIP=00007FAEC96322BA GS=0000 FS=0000 RSP=00007FAEC83AC910
EFlags=0000000000010202 CS=0033 RBP=00007FAECA0CC5A0 ERR=0000000000000004
TRAPNO=000000000000000E OLDMASK=0000000000000000 CR2=0000000000000207
xmm0=0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm1=0000000000044d00 (f: 281856.000000, d: 1.392554e-318)
xmm2=00007faea8f56754 (f: 2834654976.000000, d: 6.936095e-310)
xmm3=2f746e656d656761 (f: 1835362176.000000, d: 4.307803e-80)
xmm4=000000000007d700 (f: 513792.000000, d: 2.538470e-318)
xmm5=00007faec40f004a (f: 3289317376.000000, d: 6.936118e-310)
xmm6=0000736b6e756874 (f: 1853188224.000000, d: 6.269953e-310)
xmm7=0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm8=0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm9=0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm10=0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm11=0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm12=0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm13=00007f0afde00060 (f: 4259315712.000000, d: 6.901365e-310)
xmm14=0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm15=00007faeb8014cc0 (f: 3087092992.000000, d: 6.936108e-310)
Module=/home/jenkins/workspace/Test_openjdk8_j9_sanity.system_x86-64_linux_jit_Personal_testList_1/jdkbinary/j2sdk-image/jre/lib/amd64/default/libj9jit29.so
Module_base_address=00007FAEC93D6000

Method_being_compiled=com/ibm/java/lang/management/internal/ThreadMXBeanImpl.makeThreadInfo(Lopenj9/management/internal/ThreadInfoBase;)Ljava/lang/management/ThreadInfo;
Target=2_90_20241207_1972 (Linux 5.15.0-125-generic)
CPU=amd64 (4 logical CPUs) (0x1f016c000 RAM)
----------- Stack Backtrace -----------
_ZN11TR_J9VMBase27staticGetBaseComponentClassEP19TR_OpaqueClassBlockRi+0x2a (0x00007FAEC96322BA [libj9jit29.so+0x25c2ba])
_ZN16JITServerHelpers22packRemoteROMClassInfoB5cxx11EP7J9ClassP10J9VMThreadP9TR_Memoryb+0x98 (0x00007FAEC95D9BE8 [libj9jit29.so+0x203be8])
_ZL19handleServerMessagePN9JITServer12ClientStreamEP7TR_J9VMRNS_11MessageTypeE+0x27b9 (0x00007FAEC9580019 [libj9jit29.so+0x1aa019])
_Z13remoteCompileP10J9VMThreadPN2TR11CompilationEP17TR_ResolvedMethodP8J9MethodRNS1_24IlGeneratorMethodDetailsEPNS1_28CompilationInfoPerThreadBaseE.localalias+0x13aa (0x00007FAEC95937EA [libj9jit29.so+0x1bd7ea])
_ZN2TR28CompilationInfoPerThreadBase7compileEP10J9VMThreadPNS_11CompilationEP17TR_ResolvedMethodR11TR_J9VMBaseP19TR_OptimizationPlanRKNS_16SegmentAllocatorE+0x749 (0x00007FAEC95524E9 [libj9jit29.so+0x17c4e9])
_ZN2TR28CompilationInfoPerThreadBase14wrappedCompileEP13J9PortLibraryPv+0x381 (0x00007FAEC9553291 [libj9jit29.so+0x17d291])
omrsig_protect+0x239 (0x00007FAECAB2C459 [libj9prt29.so+0x2a459])
_ZN2TR28CompilationInfoPerThreadBase7compileEP10J9VMThreadP21TR_MethodToBeCompiledRN2J917J9SegmentProviderE+0x385 (0x00007FAEC9550DF5 [libj9jit29.so+0x17adf5])
_ZN2TR24CompilationInfoPerThread12processEntryER21TR_MethodToBeCompiledRN2J917J9SegmentProviderE+0x128 (0x00007FAEC9551118 [libj9jit29.so+0x17b118])
_ZN2TR24CompilationInfoPerThread14processEntriesEv+0x35b (0x00007FAEC955005B [libj9jit29.so+0x17a05b])
_ZN2TR24CompilationInfoPerThread3runEv+0x42 (0x00007FAEC95503B2 [libj9jit29.so+0x17a3b2])
_Z30protectedCompilationThreadProcP13J9PortLibraryPN2TR24CompilationInfoPerThreadE+0x82 (0x00007FAEC9550462 [libj9jit29.so+0x17a462])
omrsig_protect+0x239 (0x00007FAECAB2C459 [libj9prt29.so+0x2a459])
_Z21compilationThreadProcPv+0x17b (0x00007FAEC955082B [libj9jit29.so+0x17a82b])
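
As a reading aid for the register dump above, here is a minimal Python sketch that decodes the fault fields. It assumes Linux x86-64 conventions (signal 11 is SIGSEGV, si_code 1 is SEGV_MAPERR, trap 14 is a page fault, and the low three bits of the page-fault error code mean present/write/user) and assumes that Signal_Code carries si_code. The function names `parse_fields` and `decode_page_fault_error` are hypothetical helpers, not part of OpenJ9.

```python
import re

# Lines copied verbatim from the crash report above.
REPORT = """\
J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
Handler1=00007FAECABD1AC0 Handler2=00007FAECAB2B7D0 InaccessibleAddress=0000000000000207
EFlags=0000000000010202 CS=0033 RBP=00007FAECA0CC5A0 ERR=0000000000000004
TRAPNO=000000000000000E OLDMASK=0000000000000000 CR2=0000000000000207
"""


def parse_fields(text):
    """Return {name: int} for every NAME=HEX pair in the report text."""
    pairs = re.findall(r"(\w+)=([0-9A-Fa-f]+)", text)
    return {name: int(value, 16) for name, value in pairs}


def decode_page_fault_error(err):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "page_present": bool(err & 0x1),  # 0 means the page was not mapped
        "write": bool(err & 0x2),         # 0 means the access was a read
        "user_mode": bool(err & 0x4),     # 1 means the fault came from user mode
    }


if __name__ == "__main__":
    fields = parse_fields(REPORT)
    print("signal number:", fields["Signal_Number"])
    print("signal code:", fields["Signal_Code"])
    print("trap number:", fields["TRAPNO"])
    print("fault address:", hex(fields["CR2"]))
    print("error code bits:", decode_page_fault_error(fields["ERR"]))
```

Under those assumptions the dump describes a user-mode read of an unmapped address, 0x207. An address that low is typical of reading a field through a null or near-null pointer, although the dump alone does not prove that.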
@cjjdespres
Contributor Author

Attn @mpirvu


Issue Number: 20800
Status: Open
Recommended Components: comp:jitserver, comp:jit, comp:vm

@mpirvu added the comp:jitserver label on Dec 10, 2024
@cjjdespres
Contributor Author

This appears to affect all platforms, but only JDK8 and JDK11.

@cjjdespres
Contributor Author

It does reproduce locally. The path to the segfault starts in this frame:

(gdb) frame
#16 0x00007fe7f81829b4 in handleServerMessage (client=client@entry=0x7fe7dce87580, fe=0x7fe7c8007e40, response=@0x7fe7dc95aaf8: JITServer::ResolvedMethod_getRemoteROMClassAndMethods)
    at /home/despresc/dev/openj9-openjdk-jdk11/openj9/runtime/compiler/control/JITClientCompilationThread.cpp:1306
1306	         client->write(response, JITServerHelpers::packRemoteROMClassInfo(clazz, fe->vmThread(), trMemory, true));
(gdb) print *clazz
$13 = {eyecatcher = 0, romClass = 0x0, superclasses = 0x0, classDepthAndFlags = 0, classDepthWithFlags = 0, classFlags = 0, classLoader = 0x0, classObject = 0x0, initializeStatus = 0, ramMethods = 0x0, 
  ramStatics = 0x0, arrayClass = 0x0, totalInstanceSize = 0, lastITable = 0x0, instanceDescription = 0x0, instanceLeafDescription = 0x0, instanceHotFieldDescription = 0, selfReferencingField1 = 0, 
  selfReferencingField2 = 0, initializerCache = 0x0, romableAotITable = 0, packageID = 0, module = 0x0, subclassTraversalLink = 0x0, subclassTraversalReverseLink = 0x0, iTable = 0x0, castClassCache = 0, 
  jniIDs = 0x0, lockOffset = 0, paddingForGLRCounters = 0, reservedCounter = 0, cancelCounter = 0, newInstanceCount = 0, backfillOffset = 0, replacedClass = 0x0, finalizeLinkOffset = 0, 
  nextClassInSegment = 0x0, ramConstantPool = 0x0, callSites = 0x0, methodTypes = 0x0, varHandleMethodTypes = 0x0, customSpinOption = 0x0, staticSplitMethodTable = 0x0, specialSplitMethodTable = 0x0, 
  jitMetaDataList = 0x0, gcLink = 0x0, hostClass = 0x0, nestHost = 0x0, flattenedClassCache = 0x0, hotFieldsInfo = 0x0}

and eventually we crash in TR_J9VMBase::staticGetBaseComponentClass.
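
Every field in the `print *clazz` output above is zero. A small Python sketch like the following could confirm that mechanically for a flat gdb struct print. It uses a shortened excerpt of the fields rather than the full dump, the helper names `parse_gdb_struct` and `nonzero_fields` are hypothetical, and nested structs or non-numeric values are not handled.

```python
import re

# Shortened excerpt of the gdb output above (the full print has many more fields).
GDB_OUTPUT = (
    "$13 = {eyecatcher = 0, romClass = 0x0, superclasses = 0x0, "
    "classDepthAndFlags = 0, classLoader = 0x0, ramMethods = 0x0}"
)


def parse_gdb_struct(text):
    """Return {field: int} for each 'name = value' pair in a flat gdb struct print."""
    pairs = re.findall(r"(\w+) = (0x[0-9a-fA-F]+|\d+)", text)
    # Base 0 lets int() accept both decimal ("0") and hex ("0x0") literals.
    return {name: int(value, 0) for name, value in pairs}


def nonzero_fields(fields):
    """Return the names of fields whose value is not zero, in print order."""
    return [name for name, value in fields.items() if value != 0]


if __name__ == "__main__":
    fields = parse_gdb_struct(GDB_OUTPUT)
    leftover = nonzero_fields(fields)
    if leftover:
        print("non-zero fields:", ", ".join(leftover))
    else:
        print("all", len(fields), "parsed fields are zero")
```

An all-zero J9Class means `clazz` points at memory that does not hold an initialized class, so any later field read through it (for example via `romClass`) may fault. This only helps triage the dump; it says nothing about why the pointer was passed in.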

@cjjdespres
Contributor Author

Looking at the description of #20757, this might be caused by eclipse-omr/omr#7565 - there could be another case that needs to be bypassed at the server.

@mpirvu
Contributor

mpirvu commented Dec 10, 2024

Very likely that we didn't catch all cases.

@cjjdespres
Contributor Author

A more recent example of this failure is in the test suite https://hyc-runtimes-jenkins.swg-devops.com/job/Test_openjdk8_j9_sanity.system_s390x_linux_jit_Personal/1238/ (saved test output from one of the failures is in this large archive file); it ran on Jan 7. The crash is in a child JVM process, so the only lines in the test output are:

STF 02:45:16.721 - +------ Step 3 - Wait for processes to complete
STF 02:45:16.721 - | Wait for processes to meet expectations
STF 02:45:16.721 - |   Processes: [LT1, CL1]
STF 02:45:16.721 - |
STF 02:45:16.721 - Monitoring processes: CL1 LT1
CL1 j> 2025/01/07 02:45:16.979 ServerURL=service:jmx:rmi:///jndi/rmi://localhost:1234/jmxrmi
CL1 j> 2025/01/07 02:45:17.041 Attempting to connect
CL1 j> 2025/01/07 02:45:18.371 Connection established!
CL1 j> 2025/01/07 02:45:19.536 Starting to write data
STF 02:46:49.351 - Found dump at: /home/jenkins/workspace/Test_openjdk8_j9_sanity.system_s390x_linux_jit_Personal_testList_2/aqa-tests/TKG/output_17362423519558/TestJlmRemoteThreadNoAuth_0/20250107-024508-TestJlmRemoteThreadNoAuth/results/javacore.20250107.024646.873053.0002.txt
STF 02:46:49.352 - Found dump at: /home/jenkins/workspace/Test_openjdk8_j9_sanity.system_s390x_linux_jit_Personal_testList_2/aqa-tests/TKG/output_17362423519558/TestJlmRemoteThreadNoAuth_0/20250107-024508-TestJlmRemoteThreadNoAuth/results/core.20250107.024646.873053.0001.dmp
CL1 stderr javacore file generated - /home/jenkins/workspace/Test_openjdk8_j9_sanity.system_s390x_linux_jit_Personal_testList_2/aqa-tests/TKG/output_17362423519558/TestJlmRemoteThreadNoAuth_0/20250107-024508-TestJlmRemoteThreadNoAuth/results/javacore.20250107.024646.873053.0002.txt
CL1 stderr core file generated - /home/jenkins/workspace/Test_openjdk8_j9_sanity.system_s390x_linux_jit_Personal_testList_2/aqa-tests/TKG/output_17362423519558/TestJlmRemoteThreadNoAuth_0/20250107-024508-TestJlmRemoteThreadNoAuth/results/core.20250107.024646.873053.0001.dmp
STF 02:46:50.890 - Found dump at: /home/jenkins/workspace/Test_openjdk8_j9_sanity.system_s390x_linux_jit_Personal_testList_2/aqa-tests/TKG/output_17362423519558/TestJlmRemoteThreadNoAuth_0/20250107-024508-TestJlmRemoteThreadNoAuth/results/Snap.20250107.024646.873053.0003.trc
CL1 stderr Snap file generated - /home/jenkins/workspace/Test_openjdk8_j9_sanity.system_s390x_linux_jit_Personal_testList_2/aqa-tests/TKG/output_17362423519558/TestJlmRemoteThreadNoAuth_0/20250107-024508-TestJlmRemoteThreadNoAuth/results/Snap.20250107.024646.873053.0003.trc
STF 02:50:16.212 - Heartbeat: Process CL1 is still running
STF 02:52:19.644 - Monitoring Report Summary:
STF 02:52:19.644 -   o Process CL1 has crashed unexpectedly
STF 02:52:19.644 -   o Process LT1 is still running as expected
STF 02:52:19.645 - Killing processes: CL1 LT1
STF 02:52:19.645 -   o Process CL1 pid 873054 is not running
STF 02:52:19.645 -   o Process clean up attempt 1 for LT1 pid 873053
STF 02:52:19.645 -   o Process LT1 pid 873053 stop()
STF 02:52:29.730 -   o Process LT1 pid 873053 terminate()
STF 02:52:30.731 -   o Process LT1 pid 873053 killed
**FAILED** at step 3 (Wait for processes to complete). Expected return value=0 Actual=1 at /home/jenkins/workspace/Test_openjdk8_j9_sanity.system_s390x_linux_jit_Personal_testList_2/aqa-tests/TKG/../TKG/output_17362423519558/TestJlmRemoteThreadNoAuth_0/20250107-024508-TestJlmRemoteThreadNoAuth/execute.pl line 165.
STF 02:52:31.037 - **FAILED** execute script failed. Expected return value=0 Actual=1

The actual crash data and relevant logs are in the saved test output.
