Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

saftlib 3.0.3: saft-ctl snoop hangs if source sends 20.000 messages per second #110

Open
skorsky opened this issue Oct 25, 2023 · 1 comment

Comments

@skorsky
Copy link
Contributor

skorsky commented Oct 25, 2023

Steps to reproduce:

  1. Use an event source which produces 20.000 messages per second, i.e. a datamaster with attached schedule (rename from .txt to .dot).
    unilac900.txt
  2. saft-ctl tr0 -xv snoop 0 0 0 1 hangs (no events shown).
    snoop all events for 1 second from tr0. Snooping with an event id and mask works.
  3. saft-ctl tr0 -tij also hangs. Also commands to other timing receivers are blocked.
  4. Stopping the event source brings saft-ctl back. While saft-ctl ist blocked, saftbusd is busy on 1 CPU with nearly 100%.

This works with saftlib 2.5.0.
Hints from @miree : inspect saftlib/src/eb-source.cpp. This has been reworked for version 3.0.3. Compare to previous version.

@skorsky
Copy link
Contributor Author

skorsky commented Oct 26, 2023

Stack traces from gdb

(gdb) run
Starting program: /usr/local/bin/saftbusd libsaft-service.so tr0:dev/wbm0
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
configurable chunck allocator
creating allocator with 16384 chuncks of size 128
creating allocator with 1024 chuncks of size 1024
creating allocator with 64 chuncks of size 16384
libsaft-service.so is plugin
tr0:dev/wbm0 is argument
OpenDevice::OpenDevice("dev/wbm0")
OpenDevice first,last,mask = 0,fff,fff
msi_target_adr for poll check: 00010870
needs polling? no
LM32Cluster::LM32Cluster
found 1 lm32 cpus
^C
Program received signal SIGINT, Interrupt.
0x00007ffff7914a77 in __GI___libc_write (fd=7, buf=0x7fffffffbb90, nbytes=36) at ../sysdeps/unix/sysv/linux/write.c:26
26 ../sysdeps/unix/sysv/linux/write.c: No such file or directory.
(gdb) bt
#0 0x00007ffff7914a77 in __GI___libc_write (fd=7, buf=0x7fffffffbb90, nbytes=36) at ../sysdeps/unix/sysv/linux/write.c:26
#1 0x00007ffff74f15ac in eb_device_flush (devicep=devicep@entry=136, completed=completed@entry=0x7fffffffcc14) at format/master.c:553
#2 0x00007ffff74f6d24 in eb_socket_check (socketp=socketp@entry=128, now=, user=user@entry=0x7fffffffcc90, ready=ready@entry=0x7ffff74f80f0 <eb_check_sets>) at glue/socket.c:446
#3 0x00007ffff74f81f4 in eb_socket_run (socketp=socketp@entry=128, timeout_us=timeout_us@entry=-1) at transport/run.c:78
#4 0x00007ffff74f29f6 in eb_cycle_block (cyclep=, devicep=) at glue/cycle.c:154
#5 eb_cycle_close (cyclep=) at glue/cycle.c:187
#6 0x00007ffff7a73687 in etherbone::Cycle::close (this=0x7fffffffce1e) at /usr/local/include/etherbone.h:1099
#7 saftlib::ActionSink::updateAction (this=0x7ffff7b0a810) at src/ActionSink.cpp:400
#8 0x00007ffff7a7bdfd in saftlib::SoftwareActionSink::receiveMSI (code=4 '\004', this=0x7ffff7b0a810) at src/SoftwareActionSink.cpp:60
#9 saftlib::SoftwareActionSink::receiveMSI (this=0x7ffff7b0a810, code=) at src/SoftwareActionSink.cpp:53
#10 0x00007ffff7a6bb9e in std::function<void (unsigned long)>::operator()(unsigned long) const (__args#0=, this=) at /usr/include/c++/11/bits/std_function.h:590
#11 saftlib::SAFTd::write (this=, address=1096, width=, data=) at src/SAFTd.cpp:102
#12 0x00007ffff74f4306 in eb_socket_write (socketp=socketp@entry=128, widths=widths@entry=68 'D', ddr_b=addr_b@entry=1096, addr_l=addr_l@entry=1096, value=, error=error@entry=0x7fffffffd058) at glue/readwrite.c:164
#13 0x00007ffff74f0bca in eb_device_slave (socketp=socketp@entry=128, transportp=, evicep=devicep@entry=136, user_data=user_data@entry=0x7ffff75ff990, ready=ready@entry=0x7ffff7a685d0 <saftlib::EB_Source::get_fd(void*, int, unsigned char)>, completed=completed@entry=0x7fffffffe0e4) at format/slave.c:339
#14 0x00007ffff74f6d00 in eb_socket_check (socketp=128, now=, user=, ready=) at glue/socket.c:441
#15 0x00007ffff7a686b3 in etherbone::Socket::check (ready=0x7ffff7a685d0 <saftlib::EB_Source::get_fd(void*, int, unsigned char)>, user=0x7ffff75ff990, now=, this=0x7ffff75ff9c0) at /usr/local/include/etherbone.h:946
#16 saftlib::EB_Source::dispatch (this=0x7ffff75ff990) at src/eb-source.cpp:157
#17 0x00007ffff7f7840b in saftbus::Loop::iteration (this=0x7ffff7f987e0 saftbus::Loop::get_default()::default_loop, may_block=) at saftbus/loop.cpp:170
#18 0x00007ffff7f788ad in saftbus::Loop::run (this=0x7ffff7f987e0 saftbus::Loop::get_default()::default_loop) at saftbus/loop.cpp:205
#19 0x00005555555578fd in main (argc=, argv=0x7fffffffe458) at saftbus/saftbusd-noda.cpp:129
(gdb)

(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /usr/local/bin/saftbusd libsaft-service.so tr0:dev/wbm0
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
configurable chunck allocator
creating allocator with 16384 chuncks of size 128
creating allocator with 1024 chuncks of size 1024
creating allocator with 64 chuncks of size 16384
libsaft-service.so is plugin
tr0:dev/wbm0 is argument
OpenDevice::OpenDevice("dev/wbm0")
OpenDevice first,last,mask = 0,fff,fff
msi_target_adr for poll check: 00010870
needs polling? no
No handler for MSI 0x448
LM32Cluster::LM32Cluster
found 1 lm32 cpus
^C
Program received signal SIGINT, Interrupt.
__fcntl64_nocancel_adjusted (fd=7, cmd=cmd@entry=4, arg=0x8802) at
../sysdeps/unix/sysv/linux/fcntl_nocancel.c:63
63 ../sysdeps/unix/sysv/linux/fcntl_nocancel.c: No such file or directory.
(gdb) bt
#0 __fcntl64_nocancel_adjusted (fd=7, cmd=cmd@entry=4, arg=0x8802) at ../sysdeps/unix/sysv/linux/fcntl_nocancel.c:63
#1 0x00007ffff7914f28 in __GI___libc_fcntl64 (fd=, cmd=cmd@entry=4) at ../sysdeps/unix/sysv/linux/fcntl64.c:51
#2 0x00007ffff74f7143 in eb_dev_set_blocking (block=0, link=0x55555557a3e8) at transport/dev.c:52
#3 eb_dev_poll (transportp=, linkp=0x55555557a3e8, data=, ready=, buf=0x7fffffffbbd0 "\240\017", len=4104) at transport/dev.c:149
#4 0x00007ffff74f029c in eb_device_slave (socketp=socketp@entry=128, transportp=, devicep=devicep@entry=136, user_data=user_data@entry=0x7fffffffccd0, ready=ready@entry=0x7ffff74f80f0 <eb_check_sets>, completed=completed@entry=0x7fffffffcc54) at format/slave.c:122
#5 0x00007ffff74f6d00 in eb_socket_check (socketp=socketp@entry=128, now=, user=user@entry=0x7fffffffccd0, ready=ready@entry=0x7ffff74f80f0 <eb_check_sets>) at glue/socket.c:441
#6 0x00007ffff74f82c8 in eb_socket_run (socketp=socketp@entry=128, timeout_us=, timeout_us@entry=-1) at transport/run.c:105
#7 0x00007ffff74f29f6 in eb_cycle_block (cyclep=, devicep=) at glue/cycle.c:154
#8 eb_cycle_close (cyclep=) at glue/cycle.c:187
#9 0x00007ffff7a7bfcb in etherbone::Cycle::close (this=0x7fffffffce54) at /usr/local/include/etherbone.h:1099
#10 saftlib::SoftwareActionSink::receiveMSI (code=4 '\004', this=0x7ffff7b0a810) at src/SoftwareActionSink.cpp:81
#11 saftlib::SoftwareActionSink::receiveMSI (this=0x7ffff7b0a810, code=) at src/SoftwareActionSink.cpp:53
#12 0x00007ffff7a6bb9e in std::function<void (unsigned long)>::operator()(unsigned long) const (__args#0=, this=) at /usr/include/c++/11/bits/std_function.h:590
#13 saftlib::SAFTd::write (this=, address=1096, width=, data=) at src/SAFTd.cpp:102
#14 0x00007ffff74f4306 in eb_socket_write (socketp=socketp@entry=128, widths=widths@entry=68 'D', addr_b=addr_b@entry=1096, addr_l=addr_l@entry=1096, value=, error=error@entry=0x7fffffffd058) at glue/readwrite.c:164
#15 0x00007ffff74f0bca in eb_device_slave (socketp=socketp@entry=128, transportp=, devicep=devicep@entry=136, user_data=user_data@entry=0x7ffff75ff990, ready=ready@entry=0x7ffff7a685d0 <saftlib::EB_Source::get_fd(void*, int, unsigned char)>, completed=completed@entry=0x7fffffffe0e4) at format/slave.c:339
#16 0x00007ffff74f6d00 in eb_socket_check (socketp=128, now=, user=, ready=) at glue/socket.c:441
#17 0x00007ffff7a686b3 in etherbone::Socket::check (ready=0x7ffff7a685d0 <saftlib::EB_Source::get_fd(void*, int, unsigned char)>, user=0x7ffff75ff990, now=, this=0x7ffff75ff9c0) at /usr/local/include/etherbone.h:946
#18 saftlib::EB_Source::dispatch (this=0x7ffff75ff990) at src/eb-source.cpp:157
#19 0x00007ffff7f7840b in saftbus::Loop::iteration (this=0x7ffff7f987e0 saftbus::Loop::get_default()::default_loop, may_block=) at saftbus/loop.cpp:170
#20 0x00007ffff7f788ad in saftbus::Loop::run (this=0x7ffff7f987e0 saftbus::Loop::get_default()::default_loop) at saftbus/loop.cpp:205
#21 0x00005555555578fd in main (argc=, argv=0x7fffffffe458) at saftbus/saftbusd-noda.cpp:129
(gdb) continue
Continuing.
^C
Program received signal SIGINT, Interrupt.
0x00007ffff7914a77 in __GI___libc_write (fd=7, buf=0x7fffffffbbd0,
nbytes=76) at ../sysdeps/unix/sysv/linux/write.c:26
26 ../sysdeps/unix/sysv/linux/write.c: No such file or directory.
(gdb) bt
#0 0x00007ffff7914a77 in __GI___libc_write (fd=7, buf=0x7fffffffbbd0, nbytes=76) at ../sysdeps/unix/sysv/linux/write.c:26
#1 0x00007ffff74f15ac in eb_device_flush (devicep=devicep@entry=136, completed=completed@entry=0x7fffffffcc54) at format/master.c:553
#2 0x00007ffff74f6d24 in eb_socket_check (socketp=socketp@entry=128, now=, user=user@entry=0x7fffffffccd0, ready=ready@entry=0x7ffff74f80f0 <eb_check_sets>) at glue/socket.c:446
#3 0x00007ffff74f81f4 in eb_socket_run (socketp=socketp@entry=128, timeout_us=timeout_us@entry=-1) at transport/run.c:78
#4 0x00007ffff74f29f6 in eb_cycle_block (cyclep=, devicep=) at glue/cycle.c:154
#5 eb_cycle_close (cyclep=) at glue/cycle.c:187
#6 0x00007ffff7a7bfcb in etherbone::Cycle::close (this=0x7fffffffce54) at /usr/local/include/etherbone.h:1099
#7 saftlib::SoftwareActionSink::receiveMSI (code=4 '\004', this=0x7ffff7b0a810) at src/SoftwareActionSink.cpp:81
#8 saftlib::SoftwareActionSink::receiveMSI (this=0x7ffff7b0a810, code=) at src/SoftwareActionSink.cpp:53
#9 0x00007ffff7a6bb9e in std::function<void (unsigned long)>::operator()(unsigned long) const (__args#0=, this=) at /usr/include/c++/11/bits/std_function.h:590
#10 saftlib::SAFTd::write (this=, address=1096, width=, data=) at src/SAFTd.cpp:102
#11 0x00007ffff74f4306 in eb_socket_write (socketp=socketp@entry=128, widths=widths@entry=68 'D', addr_b=addr_b@entry=1096, addr_l=addr_l@entry=1096, value=, error=error@entry=0x7fffffffd058) at glue/readwrite.c:164
#12 0x00007ffff74f0bca in eb_device_slave (socketp=socketp@entry=128, transportp=, devicep=devicep@entry=136, user_data=user_data@entry=0x7ffff75ff990, ready=ready@entry=0x7ffff7a685d0 <saftlib::EB_Source::get_fd(void*, int, unsigned char)>, completed=completed@entry=0x7fffffffe0e4) at format/slave.c:339
#13 0x00007ffff74f6d00 in eb_socket_check (socketp=128, now=, user=, ready=) at glue/socket.c:441
#14 0x00007ffff7a686b3 in etherbone::Socket::check (ready=0x7ffff7a685d0 <saftlib::EB_Source::get_fd(void*, int, unsigned char)>, user=0x7ffff75ff990, now=, this=0x7ffff75ff9c0) at /usr/local/include/etherbone.h:946
#15 saftlib::EB_Source::dispatch (this=0x7ffff75ff990) at src/eb-source.cpp:157
#16 0x00007ffff7f7840b in saftbus::Loop::iteration (this=0x7ffff7f987e0 saftbus::Loop::get_default()::default_loop, may_block=) at saftbus/loop.cpp:170
#17 0x00007ffff7f788ad in saftbus::Loop::run (this=0x7ffff7f987e0 saftbus::Loop::get_default()::default_loop) at saftbus/loop.cpp:205
#18 0x00005555555578fd in main (argc=, argv=0x7fffffffe458) at saftbus/saftbusd-noda.cpp:129
(gdb) continue
Continuing.
Unhandled unknown exception in MSI handler for 0x448
^C
Program received signal SIGINT, Interrupt.
0x00007ffff791b78d in __GI___select (nfds=8,
readfds=readfds@entry=0x7fffffffcc98,
writefds=writefds@entry=0x7fffffffcd18, exceptfds=exceptfds@entry=0x0,
timeout=timeout@entry=0x7fffffffcc60) at
../sysdeps/unix/sysv/linux/select.c:69
69 ../sysdeps/unix/sysv/linux/select.c: No such file or directory.
(gdb) bt
#0 0x00007ffff791b78d in __GI___select (nfds=8, readfds=readfds@entry=0x7fffffffcc98,
writefds=writefds@entry=0x7fffffffcd18, exceptfds=exceptfds@entry=0x0, timeout=timeout@entry=0x7fffffffcc60) at
../sysdeps/unix/sysv/linux/select.c:69
#1 0x00007ffff74f82ab in eb_socket_run (socketp=socketp@entry=128, timeout_us=, timeout_us@entry=-1) at transport/run.c:101
#2 0x00007ffff74f29f6 in eb_cycle_block (cyclep=, devicep=) at glue/cycle.c:154
#3 eb_cycle_close (cyclep=) at glue/cycle.c:187
#4 0x00007ffff7a73687 in etherbone::Cycle::close (this=0x7fffffffce1e) at /usr/local/include/etherbone.h:1099
#5 saftlib::ActionSink::updateAction (this=0x7ffff7b0a810) at src/ActionSink.cpp:400
#6 0x00007ffff7a7bdfd in saftlib::SoftwareActionSink::receiveMSI (code=4 '\004', this=0x7ffff7b0a810) at src/SoftwareActionSink.cpp:60
#7 saftlib::SoftwareActionSink::receiveMSI (this=0x7ffff7b0a810, code=) at src/SoftwareActionSink.cpp:53
#8 0x00007ffff7a6bb9e in std::function<void (unsigned long)>::operator()(unsigned long) const (__args#0=, this=) at /usr/include/c++/11/bits/std_function.h:590
#9 saftlib::SAFTd::write (this=, address=1096, width=, data=) at src/SAFTd.cpp:102
#10 0x00007ffff74f4306 in eb_socket_write (socketp=socketp@entry=128, widths=widths@entry=68 'D', addr_b=addr_b@entry=1096, addr_l=addr_l@entry=1096, value=, error=error@entry=0x7fffffffd058) at glue/readwrite.c:164
#11 0x00007ffff74f0bca in eb_device_slave (socketp=socketp@entry=128, transportp=, devicep=devicep@entry=136, user_data=user_data@entry=0x7ffff75ff990, ready=ready@entry=0x7ffff7a685d0 <saftlib::EB_Source::get_fd(void*, int, unsigned char)>, completed=completed@entry=0x7fffffffe0e4) at format/slave.c:339
#12 0x00007ffff74f6d00 in eb_socket_check (socketp=128, now=, user=, ready=) at glue/socket.c:441
#13 0x00007ffff7a686b3 in etherbone::Socket::check (ready=0x7ffff7a685d0 <saftlib::EB_Source::get_fd(void*, int, unsigned char)>, user=0x7ffff75ff990, now=, this=0x7ffff75ff9c0) at /usr/local/include/etherbone.h:946
#14 saftlib::EB_Source::dispatch (this=0x7ffff75ff990) at src/eb-source.cpp:157
#15 0x00007ffff7f7840b in saftbus::Loop::iteration (this=0x7ffff7f987e0 saftbus::Loop::get_default()::default_loop, may_block=) at saftbus/loop.cpp:170
#16 0x00007ffff7f788ad in saftbus::Loop::run (this=0x7ffff7f987e0 saftbus::Loop::get_default()::default_loop) at saftbus/loop.cpp:205
#17 0x00005555555578fd in main (argc=, argv=0x7fffffffe458) at saftbus/saftbusd-noda.cpp:129
(gdb)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant