Crash in javac compiling functionality tests #10623

Closed
pshipton opened this issue Sep 16, 2020 · 16 comments · Fixed by #10672
Assignees
Labels
comp:jit os:windows segfault Issues that describe segfaults / JVM crashes

Comments

@pshipton
Member

https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_sanity.functional_x86-32_windows_OMR_testList_0/105
There is a core file in the artifact.
May be the same problem as #10490

09:48:02      [javac] Compiling 20 source files to C:\Users\****\workspace\Test_openjdk8_j9_sanity.functional_x86-32_windows_OMR_testList_0\openjdk-tests\functional\cmdLineTests\shareClassTests\SCHelperCompatTests\bin\BallSports
09:48:03      [javac] Unhandled exception
09:48:03      [javac] Type=Segmentation error vmState=0x00000000
09:48:03      [javac] J9Generic_Signal_Number=00000004 ExceptionCode=c0000005 ExceptionAddress=719FD174 ContextFlags=0001007f
09:48:03      [javac] Handler1=72209F60 Handler2=72135120 InaccessibleReadAddress=FFFFFFF7
09:48:03      [javac] EDI=21899818 ESI=24CEF718 EAX=FFFFFFFB EBX=00000009
09:48:03      [javac] ECX=112500DB EDX=21F63E54
09:48:03      [javac] EIP=719FD174 ESP=00C4F06C EBP=FFFFFFFB EFLAGS=00210202
09:48:03      [javac] GS=002B FS=0053 ES=002B DS=002B
09:48:03      [javac] Module=C:\Users\****\workspace\Test_openjdk8_j9_sanity.functional_x86-32_windows_OMR_testList_0\openjdkbinary\j2sdk-image\jre\bin\default\j9jit29.dll
09:48:03      [javac] Module_base_address=718B0000 Offset_in_DLL=0014d174
09:48:03      [javac] Target=2_90_20200916_721 (Windows Server 2012 R2 6.3 build 9600)
09:48:03      [javac] CPU=x86 (8 logical CPUs) (0x1ffb9c000 RAM)
09:48:03      [javac] ----------- Stack Backtrace -----------
09:48:03      [javac] Java_java_lang_invoke_MutableCallSite_invalidate+0xc8054 (0x719FD174 [j9jit29+0x14d174])
09:48:03      [javac] J9VMDllMain+0x4a6b (0x7190AAFB [j9jit29+0x5aafb])
09:48:03      [javac] ---------------------------------------
@pshipton pshipton added comp:jit os:windows segfault Issues that describe segfaults / JVM crashes labels Sep 16, 2020
@pshipton
Member Author

@andrewcraik fyi

@andrewcraik
Contributor

That backtrace looks very dubious, especially since it is Windows; the offset is insane.
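For reference, the numbers in the first crash report can be cross-checked with plain address arithmetic. The sketch below is illustrative only: `offsetFrom` and the constant names are made up for this example and are not OpenJ9 APIs. It shows that Offset_in_DLL is ExceptionAddress minus Module_base_address, and that the symbol named in the backtrace starts roughly 800 KB before the faulting address. The disassembly later in this thread places 0x719FD174 inside J9::Recompilation::getJittedBodyInfoFromPC, so the name printed in the backtrace is presumably just the nearest symbol the dump code could resolve.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an OpenJ9 API): distance of an address from a base.
constexpr std::uint32_t offsetFrom(std::uint32_t address, std::uint32_t base) {
    return address - base;
}

// Values copied from the first crash report in this issue.
constexpr std::uint32_t kExceptionAddress = 0x719FD174;
constexpr std::uint32_t kModuleBase       = 0x718B0000;
constexpr std::uint32_t kSymbolOffset     = 0xC8054;  // "+0xc8054" in the backtrace

// Offset_in_DLL as printed in the report.
static_assert(offsetFrom(kExceptionAddress, kModuleBase) == 0x0014D174,
              "Offset_in_DLL should match the report");

// Start address the backtrace implies for the named symbol.
constexpr std::uint32_t kReportedSymbolStart = kExceptionAddress - kSymbolOffset;
static_assert(kReportedSymbolStart == 0x71935120,
              "symbol start implied by the backtrace line");

// 0xC8054 bytes is 819284 in decimal, far larger than a typical function body.
static_assert(kSymbolOffset == 819284, "offset in decimal");
```

The same subtraction reproduces the offset in the later x86-32 report in this thread (0x714FE254 - 0x713B0000 = 0x14E254).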

@pshipton
Member Author

yup, but there is a core file to look at, available for a short time.

@pshipton
Member Author

Another "similar" crash, adding to the 0.23 milestone plan.

MauveSingleInvocationLoadTest_special_22
https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_special.system_x86-64_windows_Nightly_mauveLoadTest/134

LT  23:12:45.733 - Completed 80.1%. Number of tests started=2256
LT  stderr Unhandled exception
LT  stderr Type=Segmentation error vmState=0x00000000
LT  stderr Windows_ExceptionCode=c0000005 J9Generic_Signal=00000004 ExceptionAddress=00007FFA4ED40052 ContextFlags=0010005f
LT  stderr Handler1=00007FFA5024FD00 Handler2=00007FFA50168C50 InaccessibleReadAddress=FFFFFFFFFFFFFFFF
LT  stderr RDI=00007FFA3BBE85C3 RSI=00007FFA3BBE85C8 RAX=FFBC1C10FFBC1C00 RBX=0000000001DA3100
LT  stderr RCX=00000000FFBC1900 RDX=00007FFA3BBE85C8 R8=0000000000000000 R9=0000000001DA3500
LT  stderr R10=00000000013EFFF0 R11=00000000FFFF0000 R12=00000000FFFD13E0 R13=0000000000000010
LT  stderr R14=0000000000000000 R15=00000000FFFD1390
LT  stderr RIP=00007FFA4ED40052 RSP=0000000001A0C660 RBP=0000000001A05000 GS=002B
LT  stderr FS=0053 ES=002B DS=002B
LT  stderr XMM0 0000000000000000 (f: 0.000000, d: 0.000000e+000)
LT  stderr XMM1 0000000000000000 (f: 0.000000, d: 0.000000e+000)
LT  stderr XMM2 0000000000000000 (f: 0.000000, d: 0.000000e+000)
LT  stderr XMM3 0000000000000000 (f: 0.000000, d: 0.000000e+000)
LT  stderr XMM4 0000000000000000 (f: 0.000000, d: 0.000000e+000)
LT  stderr XMM5 0000000000000000 (f: 0.000000, d: 0.000000e+000)
LT  stderr XMM6 0000000000000000 (f: 0.000000, d: 0.000000e+000)
LT  stderr XMM7 0000000000000000 (f: 0.000000, d: 0.000000e+000)
LT  stderr XMM8 3fdfd535dd2acfe8 (f: 3710570496.000000, d: 4.973883e-001)
LT  stderr XMM9 0000000000000000 (f: 0.000000, d: 0.000000e+000)
LT  stderr XMM10 0000000000000000 (f: 0.000000, d: 0.000000e+000)
LT  stderr XMM11 0000000000000000 (f: 0.000000, d: 0.000000e+000)
LT  stderr XMM12 0000000000000000 (f: 0.000000, d: 0.000000e+000)
LT  stderr XMM13 0000000000000000 (f: 0.000000, d: 0.000000e+000)
LT  stderr XMM14 0000000000000000 (f: 0.000000, d: 0.000000e+000)
LT  stderr XMM15 0000000000000000 (f: 0.000000, d: 0.000000e+000)
LT  stderr Module=C:\Users\jenkins\workspace\Test_openjdk8_j9_special.system_x86-64_windows_Nightly_mauveLoadTest\openjdkbinary\j2sdk-image\jre\bin\compressedrefs\j9jit29.dll
LT  stderr Module_base_address=00007FFA4EBB0000 Offset_in_DLL=0000000000190052
LT  stderr Target=2_90_20200921_521 (Windows Server 2012 R2 6.3 build 9600)
LT  stderr CPU=amd64 (8 logical CPUs) (0x1ffb9c000 RAM)
LT  stderr ----------- Stack Backtrace -----------
LT  stderr Java_java_lang_invoke_MutableCallSite_invalidate+0xf6ad2 (0x00007FFA4ED40052 [j9jit29+0x190052])
LT  stderr (0x00000000FFBC1900)
LT  stderr (0x00000000FFBC1900)
LT  stderr (0x00000000FFFD0040)
LT  stderr (0x00000000FFFD0060)
LT  stderr Java_java_lang_invoke_MutableCallSite_invalidate+0x5df750 (0x00007FFA4F228CD0 [j9jit29+0x678cd0])
LT  stderr (0x00007FFA3BBE85C8)
LT  stderr (0x00007FFA3BB39A14)
LT  stderr (0x00007FFA3BB39A14)
LT  stderr (0x00000000FFBC1798)
LT  stderr ---------------------------------------

@liqunl
Contributor

liqunl commented Sep 22, 2020

@rpshukla Could you do a grinder to get the failure rate?

@dsouzai
Contributor

dsouzai commented Sep 22, 2020

Not really sure what's going on. I took a look at the core from the original crash this issue was opened against. Even with symbols, WinDbg wasn't able to print the backtrace properly. However, looking at it manually:

    [javac] EDI=21899818 ESI=24CEF718 EAX=FFFFFFFB EBX=00000009
    [javac] ECX=112500DB EDX=21F63E54
    [javac] EIP=719FD174 ESP=00C4F06C EBP=FFFFFFFB EFLAGS=00210202

Crashing instruction 0x719FD174:

    j9jit29!J9::Recompilation::getJittedBodyInfoFromPC:
719fd170 8b442404   mov     eax, dword ptr [esp+4]
719fd174 f640fc30   test    byte ptr [eax-4], 30h
719fd178 7404       je      j9jit29!J9::Recompilation::getJittedBodyInfoFromPC+0xe (719fd17e)
719fd17a 8b40f8     mov     eax, dword ptr [eax-8]
719fd17d c3         ret     

Stack:

00000000`00C4F06C  7190AAFB FFFFFFFB 2189BD80 00000000 00000003 21899500 00000009 00C4F338
00000000`00C4F08C  2189C4D8 01730918 21899500 2189BD80 2189C4D8 00000001 716ED859 719026EB
00000000`00C4F0AC  21899500 00C4F338 00000000 00000002 00000000 00C4F332 71902792 00000000

Method before calling getJittedBodyInfoFromPC (i.e. near 0x7190AAFB):

7190aabe 8b6a0c         mov     ebp, dword ptr [edx+0Ch]
7190aac1 83fdfd         cmp     ebp, 0FFFFFFFDh
7190aac4 0f8472050000   je      j9jit29!DLTLogic+0x6cc (7190b03c)
7190aaca 83e0f0         and     eax, 0FFFFFFF0h
7190aacd 8b00           mov     eax, dword ptr [eax]
7190aacf f7400c00000004 test    dword ptr [eax+0Ch], 4000000h
7190aad6 0f8560050000   jne     j9jit29!DLTLogic+0x6cc (7190b03c)
7190aadc 8b842428010000 mov     eax, dword ptr [esp+128h]
7190aae3 85c0           test    eax, eax
7190aae5 0f8e51050000   jle     j9jit29!DLTLogic+0x6cc (7190b03c)
7190aaeb 8a4a0c         mov     cl, byte ptr [edx+0Ch]
7190aaee f6d1           not     cl
7190aaf0 f6c101         test    cl, 1
7190aaf3 742c           je      j9jit29!DLTLogic+0x1b1 (7190ab21)
7190aaf5 55             push    ebp
7190aaf6 e875260f00     call    j9jit29!J9::Recompilation::getJittedBodyInfoFromPC (719fd170)
7190aafb 83c404         add     esp, 4

Note, ebp comes from

7190aabe 8b6a0c         mov     ebp, dword ptr [edx+0Ch]

and is pushed onto the stack before the call to getJittedBodyInfoFromPC; it is visible on the stack as FFFFFFFB. getJittedBodyInfoFromPC then reads the byte at [eax-4], which is why the crash happens at inaccessible address 0xFFFFFFF7: 0xFFFFFFFB - 4 = 0xFFFFFFF7.

However, note also that edx does not change between the assignment of ebp (in instr 0x7190aabe) all the way to the crash in getJittedBodyInfoFromPC. So, looking at edx from the reg value printed out during the crash:

00000000`21F63E54  24CEF72C 21F64430 00000018 23166A24 24CEF834 21F64430 00000006 00000011
00000000`21F63E74  24CEF878 21F64430 00000006 000007D1 24CEF8C0 21F64430 00000018 23164BE4
00000000`21F63E94  24CEFA74 21F64430 00000018 230EAA84 24CEFBA0 21F64430 00000006 000007D1

So the value at [edx + 0Ch] should be 0x23166A24. Furthermore, if the value at [edx + 0Ch] were indeed 0xFFFFFFFB, then the test at instr 0x7190aaf0

7190aaeb 8a4a0c         mov     cl, byte ptr [edx+0Ch]
7190aaee f6d1           not     cl
7190aaf0 f6c101         test    cl, 1
7190aaf3 742c           je      j9jit29!DLTLogic+0x1b1 (7190ab21)

would have resulted in the thread jumping over the call to getJittedBodyInfoFromPC. I have no idea how ebp got the value 0xFFFFFFFB.
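The reasoning above can be replayed as a small compile-time check. This is only a sketch of the three facts taken from the dump and the disassembly; the names (`kWordsAtEdx`, `callIsReached`, `flagsByteAddress`) are invented for illustration and do not exist in OpenJ9.

```cpp
#include <cassert>
#include <cstdint>

// First four 32-bit words at edx (0x21F63E54), copied from the memory dump.
constexpr std::uint32_t kWordsAtEdx[] = {0x24CEF72C, 0x21F64430, 0x00000018, 0x23166A24};

// Word at [edx + 0Ch]: byte offset 0x0C is the fourth 32-bit word.
constexpr std::uint32_t kWordAtEdxPlus0C = kWordsAtEdx[0x0C / sizeof(std::uint32_t)];

// Models "mov cl,[edx+0Ch]; not cl; test cl,1; je <skip>":
// the call is reached only when the low bit of the loaded byte is 0.
constexpr bool callIsReached(std::uint32_t word) {
    return (static_cast<std::uint8_t>(~word) & 1u) != 0;  // je not taken
}

// Models the address read by "test byte ptr [eax-4], 30h" with 32-bit wraparound.
constexpr std::uint32_t flagsByteAddress(std::uint32_t arg) {
    return arg - 4u;
}

static_assert(kWordAtEdxPlus0C == 0x23166A24, "value seen at [edx+0Ch] in the dump");
static_assert(callIsReached(0x23166A24), "low bit clear, so the call is made");
static_assert(!callIsReached(0xFFFFFFFB), "low bit set, so the call would be skipped");
static_assert(flagsByteAddress(0xFFFFFFFB) == 0xFFFFFFF7, "matches InaccessibleReadAddress");
```

In other words, the value that passed the test and the value that was passed to the call cannot have been the same read of memory.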

@fjeremic fjeremic changed the title crash in javac compiling functionality tests Crash in javac compiling functionality tests Sep 22, 2020
@dsouzai
Contributor

dsouzai commented Sep 22, 2020

Heh, actually, thinking on it a bit more I believe I figured out what's going on. It's a race condition caused by the compiler (Visual Studio) privatizing a field but not consistently using the privatized copy.

At the time ebp is assigned, walkState.method->extra == J9_JIT_QUEUED_FOR_COMPILATION == 0xFFFFFFFB. At the time the isCompiled call is made, walkState.method->extra == 0x23166A24. edx == walkState.method.

The compiler chose to read [edx+0Ch] again when it inlined the isCompiled test, but passed ebp (the privatized copy of the field) as the parameter to getJittedBodyInfoFromPC.

Not 100% sure what a "fix" should be here... @andrewcraik do you have any suggestions? The relevant code is https://github.com/eclipse/openj9/blob/352f71b11badb3a9fd137c84237810224c36855a/runtime/compiler/control/HookedByTheJit.cpp#L867-L869
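To make the interleaving described above concrete, here is a single-threaded replay of it. This is a simplified model, not the OpenJ9 code: `FakeMethod`, `looksCompiled`, `Outcome`, and `replayInterleaving` are hypothetical names, the low-bit check stands in for the inlined isCompiled test seen in the disassembly, and the "other thread" is just an assignment placed between the two reads.

```cpp
#include <cassert>
#include <cstdint>

// Value quoted in the comment above for J9_JIT_QUEUED_FOR_COMPILATION.
constexpr std::uintptr_t kQueuedForCompilation = 0xFFFFFFFB;

// Simplified stand-in for J9Method; only the field under discussion is modeled.
struct FakeMethod {
    std::uintptr_t extra;
};

// Stand-in for the inlined isCompiled test: low bit clear means "compiled".
inline bool looksCompiled(std::uintptr_t extra) {
    return (extra & 1u) == 0;
}

struct Outcome {
    bool testPassed;           // result of the check done on the second read
    std::uintptr_t valueUsed;  // value from the first read, passed on as the "startPC"
};

// Replays the problematic order: read #1, concurrent update, read #2 for the test.
inline Outcome replayInterleaving(FakeMethod &m, std::uintptr_t newStartPC) {
    const std::uintptr_t privatized = m.extra;   // read #1 (the copy kept in ebp)
    m.extra = newStartPC;                        // compilation finishes on another thread
    const bool passed = looksCompiled(m.extra);  // read #2 (the reload from [edx+0Ch])
    return Outcome{passed, privatized};
}
```

With `extra` starting at `kQueuedForCompilation` and the update writing 0x23166A24, the test passes while the value handed on is still 0xFFFFFFFB, which is the combination seen in the crash.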

@dsouzai
Contributor

dsouzai commented Sep 22, 2020

Not 100% sure what a "fix" should be here...

I say this because the C++ code is technically correct here:

If TR::CompilationInfo::isCompiled(walkState.method) returns false, then it doesn't matter even if extra changes afterwards.
If TR::CompilationInfo::isCompiled(walkState.method) returns true, then walkState.method->extra is necessarily a valid startPC.

@dsouzai dsouzai self-assigned this Sep 22, 2020
@dsouzai
Contributor

dsouzai commented Sep 22, 2020

Leo mentioned the issue might be happening because of a lack of volatile somewhere. Perhaps one solution is to explicitly privatize walkState.method, something like

volatile J9Method* method = walkState.method;

@andrewcraik
Contributor

My first reaction was that we are missing a volatile. If the code does not rely on a single canonical read of the extra field, and other threads could be changing it, then IMO we need to mark extra volatile to stop the compiler from privatizing it or otherwise eliding reads or writes to memory.

@mayshukla
Contributor

@liqunl
Contributor

liqunl commented Sep 22, 2020

@rpshukla Thanks for the runs. The second crash is a different issue: it is a crash in populateVPicSlotCall. I created #10665 to track it.

@gacholio
Contributor

Leo mentioned the issue might be happening because of a lack of volatile somewhere. Perhaps one solution is to explicitly privatize walkState.method, something like

volatile J9Method* method = walkState.method;

I am strongly against this - it's a hack which fixes exactly one place. Does this even have any real meaning in the specification? We had to do this in a few places in the interpreter to fix ZOS XLC, but the correct solution would have been to make the field volatile. It also results in the compiler doing very foolish things (even though it correctly read the field and stored it to the stack, it would go to the stack every time instead of registerizing the value, which makes no sense).

@mpirvu
Contributor

mpirvu commented Sep 22, 2020

Is it possible for j9method->extra to move from "startPC" (i.e. last bit 0) to something else? If so, then with volatile we have the original problem: we do the test, find the method compiled, then read extra again to find the jitted body info, but this changed to something else and we crash.

@gacholio
Contributor

Yes, the field would need to be volatile and also read only once into a local.
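A minimal sketch of that shape, under stated assumptions: `MethodSketch`, `hasStartPC`, and `startPCIfCompiled` are hypothetical names, the struct is not the real J9Method, and the low-bit check only stands in for the isCompiled test discussed above. The point is that the volatile field is read exactly once and both the test and the use operate on that one local copy. Whether this matches the change that eventually went in is not something this thread states.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for J9Method with the field marked volatile.
struct MethodSketch {
    volatile std::uintptr_t extra;
};

// Stand-in for the isCompiled test: low bit clear means extra holds a start PC.
inline bool hasStartPC(std::uintptr_t extra) {
    return (extra & 1u) == 0;
}

// Returns the start PC if the method looks compiled, otherwise 0.
// The field is read once; the test and the result both use that single value.
inline std::uintptr_t startPCIfCompiled(const MethodSketch *method) {
    const std::uintptr_t extra = method->extra;  // the only read of the volatile field
    return hasStartPC(extra) ? extra : 0;
}
```

In standard C++, std::atomic is the portable way to express a field that other threads may write concurrently; volatile is used in this sketch only because it is what the comments above propose.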

@pshipton
Member Author

pshipton commented Sep 23, 2020

Another similar crash.
https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_extended.system_x86-32_windows_Nightly/496

MT2 stderr Type=Segmentation error vmState=0x00000000
MT2 stderr J9Generic_Signal_Number=00000004 ExceptionCode=c0000005 ExceptionAddress=714FE254 ContextFlags=0001007f
MT2 stderr Handler1=71D1A070 Handler2=71C45120 InaccessibleReadAddress=FFFFFFF7
MT2 stderr EDI=2AFDFD18 ESI=35629A0C EAX=FFFFFFFB EBX=00000001
MT2 stderr ECX=1124005B EDX=3543FAB4
MT2 stderr EIP=714FE254 ESP=324DF710 EBP=FFFFFFFB EFLAGS=00010202
MT2 stderr GS=002B FS=0053 ES=002B DS=002B
MT2 stderr Module=C:\Users\jenkins\workspace\Test_openjdk8_j9_extended.system_x86-32_windows_Nightly\openjdkbinary\j2sdk-image\jre\bin\default\j9jit29.dll
MT2 stderr Module_base_address=713B0000 Offset_in_DLL=0014e254
MT2 stderr Target=2_90_20200922_517 (Windows Server 2012 R2 6.3 build 9600)
MT2 stderr CPU=x86 (8 logical CPUs) (0x1ffb9c000 RAM)
MT2 stderr ----------- Stack Backtrace -----------
STF 22:47:26.463 - Found dump at: C:\Users\jenkins\workspace\Test_openjdk8_j9_extended.system_x86-32_windows_Nightly\openjdk-tests\TKG\test_output_16008298786808\SharedClasses.SCM23.MultiThread_0\20200922-224406-SharedClasses\results\core.20200922.224726.3384.0001.dmp
MT2 stderr Java_java_lang_invoke_MutableCallSite_invalidate+0xc9574 (0x714FE254 [j9jit29+0x14e254])
MT2 stderr J9VMDllMain+0x4a6b (0x7140ABAB [j9jit29+0x5abab])
MT2 stderr ---------------------------------------

7 participants