-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
nvmf/rdma: Add NUMA aware policy #12
base: upstream_master
Are you sure you want to change the base?
Conversation
Increase performance SPDK NVME-oF target RDMA running on system with multiple NUMA nodes and multiple network adapter. Add param NumaPolicy in RDMA conf Signed-off-by: Ivan Betsis <[email protected]> Change-Id: Ib64033f3cfeefda02f886c31a3331cd9b219875e
2afc751
to
59bbb89
Compare
} else { | ||
pg = &rtransport->conn_sched.next_io_pgs[rqpair->device->numa_node]; | ||
} | ||
assert(*pg != NULL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The user can specify reactors mask which will use only 1 numa node (e.g. 0xf - 4 cores on numa 0). But the new connection can be triggered on device attached to numa 1. In this case conn_sched.next_io_pgs[1] will be equal NULL and you'll hit this assert. I think that we should fall back to existing scheduling policy if we don't find a poll group for specific numa node
The controller data structure may be freed before subsystem resume done callback, we can take endpoint as the input parameter to avoid this issue. AddressSanitizer: heap-use-after-free on address 0x625000046100 at pc 0x00000082818f bp 0x7fff7b09bd10 sp 0x7fff7b09bd00 READ of size 8 at 0x625000046100 thread T0 (reactor_0) #0 0x82818e in vfio_user_dev_quiesce_resume_done /spdk/lib/nvmf/vfio_user.c:2147 #1 0x782cc0 in subsystem_state_change_done /spdk/lib/nvmf/subsystem.c:634 #2 0xad047b in _call_completion /spdk/lib/thread/thread.c:2344 #3 0xabc48d in msg_queue_run_batch /spdk/lib/thread/thread.c:710 #4 0xac0670 in thread_poll /spdk/lib/thread/thread.c:926 #5 0xac0ead in spdk_thread_poll /spdk/lib/thread/thread.c:986 #6 0x9a5b4f in _reactor_run /spdk/lib/event/reactor.c:920 #7 0x9a6442 in reactor_run /spdk/lib/event/reactor.c:958 #8 0x9a717c in spdk_reactors_start /spdk/lib/event/reactor.c:1060 #9 0x99884a in spdk_app_start /spdk/lib/event/app.c:643 #10 0x407e82 in main /spdk/app/nvmf_tgt/nvmf_main.c:75 #11 0x7f822095ff42 in __libc_start_main (/lib64/libc.so.6+0x23f42) #12 0x407abd in _start (/spdk/build/bin/nvmf_tgt+0x407abd) 0x625000046100 is located 0 bytes inside of 8320-byte region [0x625000046100,0x625000048180) freed by thread T0 (reactor_0) here: #0 0x7f82219ff91f in __interceptor_free (/lib64/libasan.so.5+0x10d91f) #1 0x837059 in _free_ctrlr /spdk/lib/nvmf/vfio_user.c:2976 #2 0x837327 in free_ctrlr /spdk/lib/nvmf/vfio_user.c:2996 #3 0x843541 in nvmf_vfio_user_close_qpair /spdk/lib/nvmf/vfio_user.c:3742 #4 0x7d1d91 in nvmf_transport_qpair_fini /spdk/lib/nvmf/transport.c:604 #5 0x7ad922 in _nvmf_qpair_destroy /spdk/lib/nvmf/nvmf.c:1055 #6 0x761362 in nvmf_qpair_request_cleanup /spdk/lib/nvmf/ctrlr.c:4026 #7 0x761906 in spdk_nvmf_request_free /spdk/lib/nvmf/ctrlr.c:4041 #8 0x75a931 in nvmf_qpair_free_aer /spdk/lib/nvmf/ctrlr.c:3576 #9 0x7ae626 in spdk_nvmf_qpair_disconnect /spdk/lib/nvmf/nvmf.c:1127 #10 0x83db36 in _vfio_user_qpair_disconnect /spdk/lib/nvmf/vfio_user.c:3433 #11 0xabc48d in msg_queue_run_batch /spdk/lib/thread/thread.c:710 #12 0xac0670 in thread_poll /spdk/lib/thread/thread.c:926 #13 0xac0ead in spdk_thread_poll /spdk/lib/thread/thread.c:986 #14 0x9a5b4f in _reactor_run /spdk/lib/event/reactor.c:920 #15 0x9a6442 in reactor_run /spdk/lib/event/reactor.c:958 #16 0x9a717c in spdk_reactors_start /spdk/lib/event/reactor.c:1060 #17 0x99884a in spdk_app_start /spdk/lib/event/app.c:643 #18 0x407e82 in main /spdk/app/nvmf_tgt/nvmf_main.c:75 #19 0x7f822095ff42 in __libc_start_main (/lib64/libc.so.6+0x23f42) previously allocated by thread T0 (reactor_0) here: #0 0x7f82219fff16 in __interceptor_calloc (/lib64/libasan.so.5+0x10df16) #1 0x837413 in nvmf_vfio_user_create_ctrlr /spdk/lib/nvmf/vfio_user.c:3010 #2 0x83bc68 in nvmf_vfio_user_accept /spdk/lib/nvmf/vfio_user.c:3313 #3 0xabfbd8 in thread_execute_timed_poller /spdk/lib/thread/thread.c:872 #4 0xac0c75 in thread_poll /spdk/lib/thread/thread.c:960 #5 0xac0ead in spdk_thread_poll /spdk/lib/thread/thread.c:986 #6 0x9a5b4f in _reactor_run /spdk/lib/event/reactor.c:920 #7 0x9a6442 in reactor_run /spdk/lib/event/reactor.c:958 #8 0x9a717c in spdk_reactors_start /spdk/lib/event/reactor.c:1060 #9 0x99884a in spdk_app_start /spdk/lib/event/app.c:643 #10 0x407e82 in main /spdk/app/nvmf_tgt/nvmf_main.c:75 #11 0x7f822095ff42 in __libc_start_main (/lib64/libc.so.6+0x23f42) SUMMARY: AddressSanitizer: heap-use-after-free /spdk/lib/nvmf/vfio_user.c:2147 in vfio_user_dev_quiesce_resume_done Change-Id: Icf5e5b360b9107a3c5eb960ae59b7fe10ace1c66 Signed-off-by: Changpeng Liu <[email protected]> Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11420 Community-CI: Broadcom CI <[email protected]> Tested-by: SPDK CI Jenkins <[email protected]> Reviewed-by: Dong Yi <[email protected]> Reviewed-by: John Levon <[email protected]> Reviewed-by: Ben Walker <[email protected]> Reviewed-by: Jim Harris <[email protected]>
Ubsan with clang complains when using spdk_ioviter with more iters than declared in the array: iov.c:69:9: runtime error: index 3 out of bounds for type 'struct spdk_single_ioviter[2]' #0 0x5df709 in spdk_ioviter_firstv /home/vagrant/spdk_repo/spdk/lib/util/iov.c:69:9 #1 0x53780b in raid5f_xor_stripe /home/vagrant/spdk_repo/spdk/module/bdev/raid/raid5f.c:270:24 #2 0x531bd8 in raid5f_submit_write_request /home/vagrant/spdk_repo/spdk/module/bdev/raid/raid5f.c:520:2 #3 0x52a03a in raid5f_submit_rw_request /home/vagrant/spdk_repo/spdk/module/bdev/raid/raid5f.c:596:9 #4 0x548c17 in test_raid5f_write_request /home/vagrant/spdk_repo/spdk/test/unit/lib/bdev/raid/raid5f.c/raid5f_ut.c:550:2 #5 0x544e18 in test_raid5f_submit_rw_request /home/vagrant/spdk_repo/spdk/test/unit/lib/bdev/raid/raid5f.c/raid5f_ut.c:714:3 #6 0x553e61 in __test_raid5f_submit_full_stripe_write_request /home/vagrant/spdk_repo/spdk/test/unit/lib/bdev/raid/raid5f.c/raid5f_ut.c:878:3 #7 0x543f84 in run_for_each_raid5f_config /home/vagrant/spdk_repo/spdk/test/unit/lib/bdev/raid/raid5f.c/raid5f_ut.c:748:3 #8 0x527ac1 in test_raid5f_submit_full_stripe_write_request /home/vagrant/spdk_repo/spdk/test/unit/lib/bdev/raid/raid5f.c/raid5f_ut.c:885:2 #9 0x7f4a71a0960a (/usr/lib64/libcunit.so.1+0x460a) (BuildId: 9c82dd336cbccd99721651ac0a04435e746e0fc0) #10 0x7f4a71a09937 (/usr/lib64/libcunit.so.1+0x4937) (BuildId: 9c82dd336cbccd99721651ac0a04435e746e0fc0) #11 0x7f4a71a0a897 in CU_run_all_tests (/usr/lib64/libcunit.so.1+0x5897) (BuildId: 9c82dd336cbccd99721651ac0a04435e746e0fc0) #12 0x524fe8 in main /home/vagrant/spdk_repo/spdk/test/unit/lib/bdev/raid/raid5f.c/raid5f_ut.c:1006:2 #13 0x7f4a711d750f in __libc_start_call_main (/usr/lib64/libc.so.6+0x2750f) (BuildId: 81daba31ee66dbd63efdc4252a872949d874d136) #14 0x7f4a711d75c8 in __libc_start_main@GLIBC_2.2.5 (/usr/lib64/libc.so.6+0x275c8) (BuildId: 81daba31ee66dbd63efdc4252a872949d874d136) #15 0x4235b4 in _start (/home/vagrant/spdk_repo/spdk/test/unit/lib/bdev/raid/raid5f.c/raid5f_ut+0x4235b4) (BuildId: 028d075edd1a7cd17881fd678ef076adfdbac13d) Fix this by making iters a zero-length array and put it in a union with a two-element array to keep the default size for compatibility. Change-Id: I8573b015755e9986cdadbfa1705d269d51a7c2b7 Signed-off-by: Artur Paszkiewicz <[email protected]> Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/18402 Reviewed-by: Jim Harris <[email protected]> Community-CI: Mellanox Build Bot Tested-by: SPDK CI Jenkins <[email protected]> Reviewed-by: Shuhei Matsumoto <[email protected]>
Increase performance SPDK NVME-oF target RDMA running on system with
multiple NUMA nodes and multiple network adapter.
Add param NumaPolicy in RDMA conf
Signed-off-by: Ivan Betsis [email protected]
Change-Id: Ib64033f3cfeefda02f886c31a3331cd9b219875e