fix: scan PDUs before exiting event_loop due to connect timedout #434

raywang-dev · 2024-12-04T10:01:22Z

In iSCSI synchronous operations, a struct iscsi_sync_state variable (state) is allocated on the stack, and its address is assigned to pdu->scsi_cbdata.private_data. This address is eventually used in the PDU callback function.

However, if a reconnection occurs during a synchronous operation (e.g., read or write), but the connect function fails (iscsi->fd will be set to -1), the event_loop times out and exits. At this point, unprocessed PDUs remain. If the PDU callback function is triggered after the timeout (e.g., during iscsi_destroy_context), it may access the pdu->scsi_cbdata.private_data address, which no longer points to the original stack allocation. Writing to this invalid address in the callback corrupts the current stack structure, leading to process crash.

This patch addresses the issue by scanning PDUs before exiting the event_loop due connect timedout, ensuring the unprocessed PDUs are properly handled to prevent stack corruption and crash.

The following is an example showing the crash during scsi_sync_cb, where writing to the address 0x7fffffffde80 corrupts the stack:

(gdb) bt
#0  scsi_sync_cb (iscsi=0xbbf2e0, status=251658240, command_data=0x9b5600, private_data=0x7fffffffde80) at sync.c:372
#1  0x00007fffea0313a9 in iscsi_cancel_pdus (iscsi=iscsi@entry=0xbbf2e0) at pdu.c:783
#2  0x00007fffea02c205 in iscsi_destroy_context (iscsi=0xbbf2e0) at init.c:400
#3  0x00007fffea02c477 in iscsi_destroy_context (iscsi=iscsi@entry=0xbb1ad0) at init.c:427
#4  0x00007fffea06681d in _wrap_iscsi_destroy_context (self=<optimized out>, args=<optimized out>) at libiscsi_wrap.c:4542
#5  0x00007ffff79838f3 in ?? () from /opt/smtx/python310/lib/libpython3.10.so.1.0
#6  0x00007ffff78ba2e5 in ?? () from /opt/smtx/python310/lib/libpython3.10.so.1.0
#7  0x00007ffff78c1fc3 in _PyEval_EvalFrameDefault () from /opt/smtx/python310/lib/libpython3.10.so.1.0
#8  0x00007ffff7a37154 in ?? () from /opt/smtx/python310/lib/libpython3.10.so.1.0
#9  0x00007ffff78ba2e5 in ?? () from /opt/smtx/python310/lib/libpython3.10.so.1.0
#10 0x00007ffff78c1fc3 in _PyEval_EvalFrameDefault () from /opt/smtx/python310/lib/libpython3.10.so.1.0
#11 0x00007ffff7a37154 in ?? () from /opt/smtx/python310/lib/libpython3.10.so.1.0
#12 0x00007ffff78ba2e5 in ?? () from /opt/smtx/python310/lib/libpython3.10.so.1.0
#13 0x00007ffff78be737 in _PyEval_EvalFrameDefault () from /opt/smtx/python310/lib/libpython3.10.so.1.0
#14 0x00007ffff7a37154 in ?? () from /opt/smtx/python310/lib/libpython3.10.so.1.0
#15 0x00007ffff78ba2e5 in ?? () from /opt/smtx/python310/lib/libpython3.10.so.1.0
#16 0x00007ffff78be737 in _PyEval_EvalFrameDefault () from /opt/smtx/python310/lib/libpython3.10.so.1.0
#17 0x00007ffff7a36374 in PyEval_EvalCode () from /opt/smtx/python310/lib/libpython3.10.so.1.0
#18 0x00007ffff7a88a95 in ?? () from /opt/smtx/python310/lib/libpython3.10.so.1.0
#19 0x00007ffff7a8a1f0 in _PyRun_SimpleFileObject () from /opt/smtx/python310/lib/libpython3.10.so.1.0
#20 0x00007ffff7a8bb9c in _PyRun_AnyFileObject () from /opt/smtx/python310/lib/libpython3.10.so.1.0
#21 0x00007ffff7ab4383 in Py_RunMain () from /opt/smtx/python310/lib/libpython3.10.so.1.0
#22 0x00007ffff7ab4b13 in Py_BytesMain () from /opt/smtx/python310/lib/libpython3.10.so.1.0
#23 0x00007ffff74ceb27 in __libc_start_main (main=0x401040, argc=2, argv=0x7fffffffeac8, init=<optimized out>, fini=<optimized out>,
--Type <RET> for more, q to quit, c to continue without paging--q
Quit
(gdb) x/40gx $sp
0x7fffffffdd88:	0x00007fffea0313a9	0x0000000000bbf2e0
0x7fffffffdd98:	0x0000000000bb1ad0	0x00007fffea377600
0x7fffffffdda8:	0x00007fffea02c205	0x0000000000bb1ad0
0x7fffffffddb8:	0x0000000000bb1ad0	0x00007fffea377600
0x7fffffffddc8:	0x00007fffea02c477	0x0000000000422a60
0x7fffffffddd8:	0x0000000000bb1ad0	0x00007fffea377600
0x7fffffffdde8:	0x00007fffea06681d	0x0000000000bb1ad0
0x7fffffffddf8:	0x00007ffff79699db	0x00007fffffffde88
0x7fffffffde08:	0x0000000000000001	0x0000000000422a60
0x7fffffffde18:	0x00007ffff79838f3	0x00007ffff7c25f60
0x7fffffffde28:	0x00007fffffffdf68	0x00007fffe7408d10
0x7fffffffde38:	0x00007fffe7408d18	0x0000000000422a60
0x7fffffffde48:	0x00007ffff78ba2e5	0x0000000000000000
0x7fffffffde58:	0x00007fffea341600	0x00007fffffffde88
0x7fffffffde68:	0x00007fffea6f2fc0	0x00007fffe7408d08
0x7fffffffde78:	0x00007ffff797015c	0x00007fffea693d80
0x7fffffffde88:	0x0000000000422a60	0x00007fffea3392b8
0x7fffffffde98:	0x00007fffea5318f0	0x00007fffe7408ba0
0x7fffffffdea8:	0x00007fffea376cf0	0x00007fffe7408d20
0x7fffffffdeb8:	0x00007ffff78c1fc3	0x00007fffea52eb80

In iSCSI synchronous operations, a struct iscsi_sync_state variable (state) is allocated on the stack, and its address is assigned to pdu->scsi_cbdata.private_data. This address is eventually used in the PDU callback function. However, if a reconnection occurs during a synchronous operation (e.g., read or write), but the connect function fails (iscsi->fd will be set to -1), the event_loop times out and exits. At this point, unprocessed PDUs remain. If the PDU callback function is triggered after the timeout (e.g., during iscsi_destroy_context), it may access the pdu->scsi_cbdata.private_data address, which no longer points to the original stack allocation. Writing to this invalid address in the callback corrupts the current stack structure, leading to process crash. This patch addresses the issue by scanning PDUs before exiting the event_loop due connect timedout, ensuring the unprocessed PDUs are properly handled to prevent stack corruption and crash. Signed-off-by: raywang <[email protected]>

sahlberg · 2024-12-07T06:30:55Z

Merged, thanks!

sahlberg merged commit 9637e13 into sahlberg:master Dec 7, 2024
2 checks passed

raywang-dev mentioned this pull request Dec 25, 2024

Potential Crash Risk in libiscsi Synchronous Operations #436

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: scan PDUs before exiting event_loop due to connect timedout #434

fix: scan PDUs before exiting event_loop due to connect timedout #434

raywang-dev commented Dec 4, 2024

sahlberg commented Dec 7, 2024

fix: scan PDUs before exiting event_loop due to connect timedout #434

fix: scan PDUs before exiting event_loop due to connect timedout #434

Conversation

raywang-dev commented Dec 4, 2024

sahlberg commented Dec 7, 2024