Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash in interruptEventLoop trying to dereference garbage pointer #251

Open
claui opened this issue Feb 9, 2024 · 0 comments
Open

Crash in interruptEventLoop trying to dereference garbage pointer #251

claui opened this issue Feb 9, 2024 · 0 comments

Comments

@claui
Copy link

claui commented Feb 9, 2024

Summary

While shutting down the JVM in a heavily multithreaded context, we’ve been encountering crashes in interruptEventLoop:

# Problematic frame:
# C  [libNRJavaSerial.so+0x818c]  Java_gnu_io_RXTXPort_interruptEventLoop+0x4c
[…]
siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x0000000d00000000
Details
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f4c14be818c, pid=92950, tid=0x00007f4bd2649700
#
# JRE version: OpenJDK Runtime Environment (8.0_342-b07) (build 1.8.0_342-b07)
# Java VM: OpenJDK 64-Bit Server VM (25.342-b07 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libNRJavaSerial.so+0x818c]  Java_gnu_io_RXTXPort_interruptEventLoop+0x4c
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# If you would like to submit a bug report, please visit:
#   https://github.com/corretto/corretto-8/issues/
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

---------------  T H R E A D  ---------------

Current thread (0x00007f4bf063b800):  JavaThread "SpringContextShutdownHook" daemon [_thread_in_native, id=621408, stack(0x00007f4bd2611000,0x00007f4bd264a000)]

siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x0000000d00000000

Registers:
RAX=0x00000000000000c3, RBX=0x00000000000000c3, RCX=0x0000000000000000, RDX=0x0000000d00000000
RSP=0x00007f4bd2647bd0, RBP=0x00007f4c14dee380, RSI=0x00007f4c14beaad6, RDI=0x00007f4c14beaad6
R8 =0x00007f4c3529d6e0, R9 =0x000000000000000b, R10=0x00007f4c14ffb2bc, R11=0x00007f4c34cb3b80
R12=0x00007f4c14beaf23, R13=0x00007f4c14ffdc80, R14=0x00007f4bd2647c60, R15=0x00007f4bf063b800
RIP=0x00007f4c14be818c, EFLAGS=0x0000000000010206, CSGSFS=0x002b000000000033, ERR=0x0000000000000004
  TRAPNO=0x000000000000000e

Top of Stack: (sp=0x00007f4bd2647bd0)
0x00007f4bd2647bd0:   00007f4c14ffdc80 00007f4bd2647c40
0x00007f4bd2647be0:   0000000000000000 00007f4c272a01a8
0x00007f4bd2647bf0:   00007f4bd2647c40 00007f4c299421a6
0x00007f4bd2647c00:   00007f4bd2647c00 0000000000000000
0x00007f4bd2647c10:   00007f4bd2647c60 00007f4c14fc5f30
0x00007f4bd2647c20:   0000000000000000 00007f4c14ffdc80
0x00007f4bd2647c30:   0000000000000000 00007f4bd2647c60
0x00007f4bd2647c40:   00007f4bd2647ca8 00007f4c2728b09d
0x00007f4bd2647c50:   0000000000000000 00007f4c27299a78
0x00007f4bd2647c60:   00000000fc6b3f20 00007f4bd2647c68
0x00007f4bd2647c70:   00007f4c14ffe256 00007f4bd2647cc0
0x00007f4bd2647c80:   00007f4c14fc5f30 0000000000000000
0x00007f4bd2647c90:   00007f4c14ffe2d8 00007f4bd2647c60
0x00007f4bd2647ca0:   00007f4bd2647cc0 00007f4bd2647d08
0x00007f4bd2647cb0:   00007f4c2728b09d 0000000000000000
0x00007f4bd2647cc0:   00000000fc6b3f20 00007f4bd2647cc8
0x00007f4bd2647cd0:   00007f4c14ffee03 00007f4bd2647d28
0x00007f4bd2647ce0:   00007f4c14fc5f30 0000000000000000
0x00007f4bd2647cf0:   00007f4c14ffeed8 00007f4bd2647cc0
0x00007f4bd2647d00:   00007f4bd2647d28 00007f4bd2647d70
0x00007f4bd2647d10:   00007f4c2728b09d 0000000000000000
0x00007f4bd2647d20:   0000000000000000 00000000fc6b3f20
0x00007f4bd2647d30:   00007f4bd2647d30 00007f4c15867f52
0x00007f4bd2647d40:   00007f4bd2647d88 00007f4c14fb6bb0
0x00007f4bd2647d50:   0000000000000000 00007f4c14fef028
0x00007f4bd2647d60:   00007f4bd2647d28 00007f4bd2647d88
0x00007f4bd2647d70:   00007f4bd2647dd0 00007f4c2728b09d
0x00007f4bd2647d80:   0000000000000000 00000000fc699b60
0x00007f4bd2647d90:   00007f4bd2647d90 00007f4c153df16c
0x00007f4bd2647da0:   00007f4bd2647de8 00007f4c153df5d0
0x00007f4bd2647db0:   0000000000000000 00007f4c153df1c8
0x00007f4bd2647dc0:   00007f4bd2647d88 00007f4bd2647de8 

Instructions: (pc=0x00007f4c14be818c)
0x00007f4c14be816c:   00 00 00 00 48 8b 55 00 48 85 d2 75 13 eb 3d 0f
0x00007f4c14be817c:   1f 44 00 00 48 8b 92 c8 00 00 00 48 85 d2 74 34
0x00007f4c14be818c:   39 1a 75 f0 c7 82 98 00 00 00 01 00 00 00 48 8d
0x00007f4c14be819c:   3d 87 38 00 00 c7 82 68 01 00 00 01 00 00 00 5b 

Register to memory mapping:

RAX=0x00000000000000c3 is an unknown value
RBX=0x00000000000000c3 is an unknown value
RCX=0x0000000000000000 is an unknown value
RDX=0x0000000d00000000 is an unknown value
RSP=0x00007f4bd2647bd0 is pointing into the stack for thread: 0x00007f4bf063b800
RBP=0x00007f4c14dee380: master_index+0 in /var/lib/tomcat8/temp/libNRJavaSerial_tomcat8_0/libNRJavaSerial.so at 0x00007f4c14be0000
RSI=0x00007f4c14beaad6: <offset 0xaad6> in /var/lib/tomcat8/temp/libNRJavaSerial_tomcat8_0/libNRJavaSerial.so at 0x00007f4c14be0000
RDI=0x00007f4c14beaad6: <offset 0xaad6> in /var/lib/tomcat8/temp/libNRJavaSerial_tomcat8_0/libNRJavaSerial.so at 0x00007f4c14be0000
R8 =0x00007f4c3529d6e0: <offset 0x10086e0> in /usr/lib/jvm/jre1.8/jre/lib/amd64/server/libjvm.so at 0x00007f4c34295000
R9 =0x000000000000000b is an unknown value
R10=0x00007f4c14ffb2bc is pointing into metadata
R11=0x00007f4c34cb3b80: <offset 0xa1eb80> in /usr/lib/jvm/jre1.8/jre/lib/amd64/server/libjvm.so at 0x00007f4c34295000
R12=0x00007f4c14beaf23: <offset 0xaf23> in /var/lib/tomcat8/temp/libNRJavaSerial_tomcat8_0/libNRJavaSerial.so at 0x00007f4c14be0000
R13={method} {0x00007f4c14ffdc80} 'interruptEventLoop' '()V' in 'gnu/io/RXTXPort'
R14=0x00007f4bd2647c60 is pointing into the stack for thread: 0x00007f4bf063b800
R15=0x00007f4bf063b800 is a thread


Stack: [0x00007f4bd2611000,0x00007f4bd264a000],  sp=0x00007f4bd2647bd0,  free space=218k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libNRJavaSerial.so+0x818c]  Java_gnu_io_RXTXPort_interruptEventLoop+0x4c

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j  gnu.io.RXTXPort.interruptEventLoop()V+0
j  gnu.io.RXTXPort.removeEventListener()V+62
j  gnu.io.RXTXPort.close()V+99
j  gnu.io.NRSerialPort.disconnect()V+18
j  ***redacted***
j  ***redacted***
j  ***redacted***
v  ~StubRoutines::call_stub
J 1834  sun.reflect.NativeMethodAccessorImpl.invoke0(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (0 bytes) @ 0x00007f4c2795f2f7 [0x00007f4c2795f280+0x77]
j  sun.reflect.NativeMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+100
J 11853 C2 java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (62 bytes) @ 0x00007f4c2955c1d4 [0x00007f4c2955c120+0xb4]
j  org.springframework.beans.factory.annotation.InitDestroyAnnotationBeanPostProcessor$LifecycleElement.invoke(Ljava/lang/Object;)V+16
j  org.springframework.beans.factory.annotation.InitDestroyAnnotationBeanPostProcessor$LifecycleMetadata.invokeDestroyMethods(Ljava/lang/Object;Ljava/lang/String;)V+122
j  org.springframework.beans.factory.annotation.InitDestroyAnnotationBeanPostProcessor.postProcessBeforeDestruction(Ljava/lang/Object;Ljava/lang/String;)V+12
j  org.springframework.beans.factory.support.DisposableBeanAdapter.destroy()V+48
j  org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.destroyBean(Ljava/lang/String;Lorg/springframework/beans/factory/DisposableBean;)V+133
j  org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.destroySingleton(Ljava/lang/String;)V+41
j  org.springframework.beans.factory.support.DefaultListableBeanFactory.destroySingleton(Ljava/lang/String;)V+2
j  org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.destroySingletons()V+105
j  org.springframework.beans.factory.support.DefaultListableBeanFactory.destroySingletons()V+1
j  org.springframework.context.support.AbstractApplicationContext.destroyBeans()V+4
j  org.springframework.context.support.AbstractApplicationContext.doClose()V+133
j  org.springframework.context.support.AbstractApplicationContext$1.run()V+14
v  ~StubRoutines::call_stub

---------------  P R O C E S S  ---------------

Java Threads: ( => current thread )
=>0x00007f4bf063b800 JavaThread "SpringContextShutdownHook" daemon [_thread_in_native, id=621408, stack(0x00007f4bd2611000,0x00007f4bd264a000)]
  0x00007f4c04026800 JavaThread "SIGTERM handler" daemon [_thread_blocked, id=621404, stack(0x00007f4bd445b000,0x00007f4bd4494000)]
  0x00007f4c18294800 JavaThread "RXTXPortMonitor(/dev/ttyS0)" daemon [_thread_in_native, id=128537, stack(0x00007f4bd44cd000,0x00007f4bd4506000)]
  [241 more Java threads omitted]

[details omitted]

---------------  S Y S T E M  ---------------

OS:PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
NAME="Debian GNU/Linux"
VERSION_ID="11"
VERSION="11 (bullseye)"
VERSION_CODENAME=bullseye
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

uname:Linux 5.10.0-27-amd64 #1 SMP Debian 5.10.205-2 (2023-12-31) x86_64
libc:glibc 2.31 NPTL 2.31 
rlimit: STACK 8192k, CORE 0k, NPROC 6141, NOFILE 524288, AS infinity
load average:0,42 0,54 0,45

[details omitted]

Memory: 4k page, physical 1607956k(77304k free), swap 2440188k(1500068k free)

vm_info: OpenJDK 64-Bit Server VM (25.342-b07) for linux-amd64 JRE (1.8.0_342-b07), built on Jul 27 2022 21:29:28 by "jenkins" with gcc 7.3.1 20180303 (Red Hat 7.3.1-5)

time: Thu Feb  8 09:24:33 2024
timezone: CET
elapsed time: 57181.688148 seconds (0d 15h 53m 1s)

Observations

Note that the address being dereferenced doesn’t even remotely resemble a memory location nor a near-zero value, but instead looks more like unrelated, non-address data:

si_addr: 0x0000000d00000000

The instruction pointer at the time of the crash:

Instructions: (pc=0x00007f4c14be818c)
0x00007f4c14be816c:   00 00 00 00 48 8b 55 00 48 85 d2 75 13 eb 3d 0f
0x00007f4c14be817c:   1f 44 00 00 48 8b 92 c8 00 00 00 48 85 d2 74 34
0x00007f4c14be818c:   39 1a 75 f0 c7 82 98 00 00 00 01 00 00 00 48 8d
0x00007f4c14be819c:   3d 87 38 00 00 c7 82 68 01 00 00 01 00 00 00 5b 

points to this piece of code (obtained via objdump -d):

Disassembly of section .text:

0000000000008140 <Java_gnu_io_RXTXPort_interruptEventLoop>:
    []
    8187:	48 85 d2             	test   %rdx,%rdx
    818a:	74 34                	je     81c0 <Java_gnu_io_RXTXPort_interruptEventLoop+0x80>
 => 818c:	39 1a                	cmp    %ebx,(%rdx)
    818e:	75 f0                	jne    8180 <Java_gnu_io_RXTXPort_interruptEventLoop+0x40>
    8190:	c7 82 98 00 00 00 01 	movl   $0x1,0x98(%rdx)
    []
Details
0000000000008140 <Java_gnu_io_RXTXPort_interruptEventLoop>:
    8140:	41 54                	push   %r12
    8142:	48 8d 0d ce 2a 00 00 	lea    0x2ace(%rip),%rcx        # ac17 <_fini+0x18b>
    8149:	48 8d 15 84 29 00 00 	lea    0x2984(%rip),%rdx        # aad4 <_fini+0x48>
    8150:	55                   	push   %rbp
    8151:	4c 8d 25 cb 2d 00 00 	lea    0x2dcb(%rip),%r12        # af23 <_fini+0x497>
    8158:	53                   	push   %rbx
    8159:	e8 42 b8 ff ff       	callq  39a0 <get_java_var@plt>
    815e:	48 8b 2d 8b 5e 20 00 	mov    0x205e8b(%rip),%rbp        # 20dff0 <master_index@@Base-0x390>
    8165:	89 c3                	mov    %eax,%ebx
    8167:	66 0f 1f 84 00 00 00 	nopw   0x0(%rax,%rax,1)
    816e:	00 00 
    8170:	48 8b 55 00          	mov    0x0(%rbp),%rdx
    8174:	48 85 d2             	test   %rdx,%rdx
    8177:	75 13                	jne    818c <Java_gnu_io_RXTXPort_interruptEventLoop+0x4c>
    8179:	eb 3d                	jmp    81b8 <Java_gnu_io_RXTXPort_interruptEventLoop+0x78>
    817b:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
    8180:	48 8b 92 c8 00 00 00 	mov    0xc8(%rdx),%rdx
    8187:	48 85 d2             	test   %rdx,%rdx
    818a:	74 34                	je     81c0 <Java_gnu_io_RXTXPort_interruptEventLoop+0x80>
    818c:	39 1a                	cmp    %ebx,(%rdx)
    818e:	75 f0                	jne    8180 <Java_gnu_io_RXTXPort_interruptEventLoop+0x40>
    8190:	c7 82 98 00 00 00 01 	movl   $0x1,0x98(%rdx)
    8197:	00 00 00 
    819a:	48 8d 3d 87 38 00 00 	lea    0x3887(%rip),%rdi        # ba28 <_fini+0xf9c>
    81a1:	c7 82 68 01 00 00 01 	movl   $0x1,0x168(%rdx)
    81a8:	00 00 00 
    81ab:	5b                   	pop    %rbx
    81ac:	5d                   	pop    %rbp
    81ad:	41 5c                	pop    %r12
    81af:	e9 cc b8 ff ff       	jmpq   3a80 <report@plt>
    81b4:	0f 1f 40 00          	nopl   0x0(%rax)
    81b8:	4c 89 e7             	mov    %r12,%rdi
    81bb:	e8 c0 b8 ff ff       	callq  3a80 <report@plt>
    81c0:	48 8d 3d 5e 2d 00 00 	lea    0x2d5e(%rip),%rdi        # af25 <_fini+0x499>
    81c7:	e8 b4 b8 ff ff       	callq  3a80 <report@plt>
    81cc:	bf e8 03 00 00       	mov    $0x3e8,%edi
    81d1:	e8 2a b9 ff ff       	callq  3b00 <usleep@plt>
    81d6:	eb 98                	jmp    8170 <Java_gnu_io_RXTXPort_interruptEventLoop+0x30>
    81d8:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
    81df:	00 

Which in turn matches the index->fd != fd expression here:

while( index->fd != fd &&
index->next ) index = index->next;

In other words, while navigating the chain of event_info_structs, the code in interruptEventLoop appears to be bumping into a garbage next pointer and tries to dereference it, causing the crash.

Possibly related to issue #197, which is reporting an infinite loop rather than a crash.

Possible explanation

If I’m reading the source code correctly, each event_info_struct is owned by its monitor thread, and lives in the stack frame of that thread’s eventLoop.

I couldn’t ultimately figure out how garbage could be possibly ending up in the middle of an event_info_struct.

But a possible explanation would be: a monitor thread might be dying before it gets to bail out of its event_loop. Or maybe a TERM or some other signal is beaming the thread out of its event_loop. Either way, the event_loop never gets to run finalize_event_info_struct. This causes a stale pointer to remain in the global chain, even after the stack frame of event_loop is unrolled. The memory might later be reused by another stack frame or stack.

This theory could explain why during a subsequent invocation of interruptEventLoop in some other thread, while navigating the global chain of event_info_structs, the code might be taking some piece of stack garbage as the real thing.

All that is just speculation from static analysis of the code. But if my theory happens to be correct, it might also explain the stalling reported in issue #197; if the stack segment of a reaped monitor thread is immediately reused by another monitor thread, the stack frame address of the new event_loop might perfectly align with the stale stack frame from the past thread’s event_loop, which means its event_info_struct pointer might end up twice in the global chain. This would cause a cycle in the chain, and a subsequent call to interruptEventLoop from a Java thread might become stuck in an infinite cycle.

Fix

Preliminary testing suggests that PR #211 might be fixing this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant