forked from iovisor/bcc
-
Notifications
You must be signed in to change notification settings - Fork 1
/
stackcount_example.txt
963 lines (836 loc) · 21.5 KB
/
stackcount_example.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
Demonstrations of stackcount, the Linux eBPF/bcc version.
This program traces functions and frequency counts them with their entire
stack trace, summarized in-kernel for efficiency. For example, counting
stack traces that led to the submit_bio() kernel function, which creates
block device I/O:
# ./stackcount submit_bio
Tracing 1 functions for "submit_bio"... Hit Ctrl-C to end.
^C
submit_bio
submit_bh
journal_submit_commit_record.isra.13
jbd2_journal_commit_transaction
kjournald2
kthread
ret_from_fork
mb_cache_list
1
submit_bio
__block_write_full_page.constprop.39
block_write_full_page
blkdev_writepage
__writepage
write_cache_pages
generic_writepages
do_writepages
__writeback_single_inode
writeback_sb_inodes
__writeback_inodes_wb
2
submit_bio
__block_write_full_page.constprop.39
block_write_full_page
blkdev_writepage
__writepage
write_cache_pages
generic_writepages
do_writepages
__filemap_fdatawrite_range
filemap_fdatawrite
fdatawrite_one_bdev
36
submit_bio
submit_bh
jbd2_journal_commit_transaction
kjournald2
kthread
ret_from_fork
mb_cache_list
38
submit_bio
ext4_writepages
do_writepages
__filemap_fdatawrite_range
filemap_flush
ext4_alloc_da_blocks
ext4_rename
ext4_rename2
vfs_rename
sys_rename
entry_SYSCALL_64_fastpath
79
Detaching...
The output shows unique stack traces, in order from leaf (on-CPU) to root,
followed by their occurrence count. The last stack trace in the above output
shows syscall handling, ext4_rename(), and filemap_flush(): looks like an
application issued file rename has caused back end disk I/O due to ext4
block allocation and a filemap_flush().
Now adding the -P option to display stacks separately for each process:
# ./stackcount -P submit_bio
Tracing 1 functions for "submit_bio"... Hit Ctrl-C to end.
^C
submit_bio
ext4_writepages
do_writepages
__filemap_fdatawrite_range
filemap_flush
ext4_alloc_da_blocks
ext4_release_file
__fput
____fput
task_work_run
exit_to_usermode_loop
syscall_return_slowpath
entry_SYSCALL_64_fastpath
[unknown]
[unknown]
tar [15069]
5
submit_bio
ext4_bio_write_page
mpage_submit_page
mpage_map_and_submit_buffers
ext4_writepages
do_writepages
__filemap_fdatawrite_range
filemap_flush
ext4_alloc_da_blocks
ext4_release_file
__fput
____fput
task_work_run
exit_to_usermode_loop
syscall_return_slowpath
entry_SYSCALL_64_fastpath
[unknown]
[unknown]
tar [15069]
15
submit_bio
ext4_readpages
__do_page_cache_readahead
ondemand_readahead
page_cache_async_readahead
generic_file_read_iter
__vfs_read
vfs_read
sys_read
entry_SYSCALL_64_fastpath
[unknown]
tar [15069]
113
Detaching...
The last stack trace in the above output shows syscall handling, sys_read(),
vfs_read(), and then "readahead" functions: looks like an application issued
file read has triggered read ahead. With "-P", the application can be seen
after the stack trace, in this case, "tar [15069]" for the "tar" command,
PID 15069.
The order of printed stack traces is from least to most frequent. The most
frequent in this case, the ext4_readpages() stack, was taken 113 times during
tracing.
The "[unknown]" frames are from user-level, since this simple workload is
the tar command, which apparently has been compiled without frame pointers.
It's a common compiler optimization, but it breaks frame pointer-based stack
walkers. Similar broken stacks will be seen by other profilers and debuggers
that use frame pointers. Hopefully your application preserves them so that
the user-level stack trace is visible. So how does one get frame pointers, if
your application doesn't have them to start with? For the current bcc (until
it supports other stack walkers), you need to be running an application binaries
that preserves frame pointers, eg, using gcc's -fno-omit-frame-pointer. That's
about all I'll say here: this is a big topic that is not bcc/BPF specific.
It can be useful to trace the path to submit_bio to explain unusual rates of
disk IOPS. These could have in-kernel origins (eg, background scrub).
Now adding the -d option to delimit kernel and user stacks:
# ./stackcount -P -d submit_bio
Tracing 1 functions for "submit_bio"... Hit Ctrl-C to end.
^C
submit_bio
submit_bh
journal_submit_commit_record
jbd2_journal_commit_transaction
kjournald2
kthread
ret_from_fork
--
jbd2/xvda1-8 [405]
1
submit_bio
submit_bh
jbd2_journal_commit_transaction
kjournald2
kthread
ret_from_fork
--
jbd2/xvda1-8 [405]
2
submit_bio
ext4_writepages
do_writepages
__filemap_fdatawrite_range
filemap_flush
ext4_alloc_da_blocks
ext4_release_file
__fput
____fput
task_work_run
exit_to_usermode_loop
syscall_return_slowpath
entry_SYSCALL_64_fastpath
--
[unknown]
[unknown]
tar [15187]
5
submit_bio
ext4_bio_write_page
mpage_submit_page
mpage_map_and_submit_buffers
ext4_writepages
do_writepages
__filemap_fdatawrite_range
filemap_flush
ext4_alloc_da_blocks
ext4_release_file
__fput
____fput
task_work_run
exit_to_usermode_loop
syscall_return_slowpath
entry_SYSCALL_64_fastpath
--
[unknown]
[unknown]
tar [15187]
15
submit_bio
ext4_readpages
__do_page_cache_readahead
ondemand_readahead
page_cache_async_readahead
generic_file_read_iter
__vfs_read
vfs_read
sys_read
entry_SYSCALL_64_fastpath
--
[unknown]
[unknown]
[unknown]
tar [15187]
171
Detaching...
A "--" is printed between the kernel and user stacks.
As a different example, here is the kernel function hrtimer_init_sleeper():
# ./stackcount.py -P -d hrtimer_init_sleeper
Tracing 1 functions for "hrtimer_init_sleeper"... Hit Ctrl-C to end.
^C
hrtimer_init_sleeper
do_futex
SyS_futex
entry_SYSCALL_64_fastpath
--
[unknown]
containerd [16020]
1
hrtimer_init_sleeper
do_futex
SyS_futex
entry_SYSCALL_64_fastpath
--
__pthread_cond_timedwait
Monitor::IWait(Thread*, long)
Monitor::wait(bool, long, bool)
CompileQueue::get()
CompileBroker::compiler_thread_loop()
JavaThread::thread_main_inner()
JavaThread::run()
java_start(Thread*)
start_thread
java [4996]
1
hrtimer_init_sleeper
do_futex
SyS_futex
entry_SYSCALL_64_fastpath
--
[unknown]
[unknown]
containerd [16020]
1
hrtimer_init_sleeper
do_futex
SyS_futex
entry_SYSCALL_64_fastpath
--
__pthread_cond_timedwait
VMThread::loop()
VMThread::run()
java_start(Thread*)
start_thread
java [4996]
3
hrtimer_init_sleeper
do_futex
SyS_futex
entry_SYSCALL_64_fastpath
--
[unknown]
dockerd [16008]
4
hrtimer_init_sleeper
do_futex
SyS_futex
entry_SYSCALL_64_fastpath
--
[unknown]
[unknown]
dockerd [16008]
4
hrtimer_init_sleeper
do_futex
SyS_futex
entry_SYSCALL_64_fastpath
--
__pthread_cond_timedwait
Lio/netty/util/ThreadDeathWatcher$Watcher;::run
Interpreter
Interpreter
call_stub
JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*)
JavaCalls::call_virtual(JavaValue*, KlassHandle, Symbol*, Symbol*, JavaCallArguments*, Thread*)
JavaCalls::call_virtual(JavaValue*, Handle, KlassHandle, Symbol*, Symbol*, Thread*)
thread_entry(JavaThread*, Thread*)
JavaThread::thread_main_inner()
JavaThread::run()
java_start(Thread*)
start_thread
java [4996]
4
hrtimer_init_sleeper
do_futex
SyS_futex
entry_SYSCALL_64_fastpath
--
__pthread_cond_timedwait
clock_gettime
[unknown]
java [4996]
79
Detaching...
I was just trying to find a more interesting example. This output includes
some Java stacks where user-level has been walked correctly (even includes a
JIT symbol translation). dockerd and containerd don't have frame pointers
(grumble), but Java does (which is running with -XX:+PreserveFramePointer).
Here's another kernel function, ip_output():
# ./stackcount.py -P -d ip_output
Tracing 1 functions for "ip_output"... Hit Ctrl-C to end.
^C
ip_output
ip_queue_xmit
tcp_transmit_skb
tcp_write_xmit
__tcp_push_pending_frames
tcp_push
tcp_sendmsg
inet_sendmsg
sock_sendmsg
sock_write_iter
__vfs_write
vfs_write
SyS_write
entry_SYSCALL_64_fastpath
--
__write_nocancel
[unknown]
sshd [15015]
5
ip_output
ip_queue_xmit
tcp_transmit_skb
tcp_write_xmit
__tcp_push_pending_frames
tcp_push
tcp_sendmsg
inet_sendmsg
sock_sendmsg
sock_write_iter
__vfs_write
vfs_write
SyS_write
entry_SYSCALL_64_fastpath
--
__write_nocancel
[unknown]
[unknown]
sshd [8234]
5
ip_output
ip_queue_xmit
tcp_transmit_skb
tcp_write_xmit
__tcp_push_pending_frames
tcp_push
tcp_sendmsg
inet_sendmsg
sock_sendmsg
sock_write_iter
__vfs_write
vfs_write
SyS_write
entry_SYSCALL_64_fastpath
--
__write_nocancel
sshd [15015]
7
Detaching...
This time just sshd is triggering ip_output() calls.
Watch what happens if I filter on kernel stacks only (-K) for ip_output():
# ./stackcount.py -K ip_output
Tracing 1 functions for "ip_output"... Hit Ctrl-C to end.
^C
ip_output
ip_queue_xmit
tcp_transmit_skb
tcp_write_xmit
__tcp_push_pending_frames
tcp_push
tcp_sendmsg
inet_sendmsg
sock_sendmsg
sock_write_iter
__vfs_write
vfs_write
SyS_write
entry_SYSCALL_64_fastpath
13
Detaching...
They have grouped together as a single unique stack, since the kernel part
was the same.
Here is just the user stacks, fetched during the kernel function ip_output():
# ./stackcount.py -P -U ip_output
Tracing 1 functions for "ip_output"... Hit Ctrl-C to end.
^C
[unknown]
snmpd [1645]
1
__write_nocancel
[unknown]
[unknown]
sshd [8234]
3
__write_nocancel
sshd [15015]
4
I should really run a custom sshd with frame pointers so we can see its
stack trace...
User-space functions can also be traced if a library name is provided. For
example, to quickly identify code locations that allocate heap memory for
PID 4902 (using -p), by tracing malloc from libc ("c:malloc"):
# ./stackcount -p 4902 c:malloc
Tracing 1 functions for "malloc"... Hit Ctrl-C to end.
^C
malloc
rbtree_new
main
[unknown]
12
malloc
_rbtree_node_new_internal
_rbtree_node_insert
rbtree_insert
main
[unknown]
1189
Detaching...
Kernel stacks are absent as this didn't enter kernel code.
Note that user-space uses of stackcount can be somewhat more limited because
a lot of user-space libraries and binaries are compiled without frame-pointers
as discussed earlier (the -fomit-frame-pointer compiler default) or are used
without debuginfo.
In addition to kernel and user-space functions, kernel tracepoints and USDT
tracepoints are also supported.
For example, to determine where threads are being created in a particular
process, use the pthread_create USDT tracepoint:
# ./stackcount -P -p $(pidof parprimes) u:pthread:pthread_create
Tracing 1 functions for "u:pthread:pthread_create"... Hit Ctrl-C to end.
^C
parprimes [11923]
pthread_create@@GLIBC_2.2.5
main
__libc_start_main
[unknown]
7
You can use "readelf -n file" to see if it has USDT tracepoints.
Similarly, to determine where context switching is happening in the kernel,
use the sched:sched_switch kernel tracepoint:
# ./stackcount -P t:sched:sched_switch
__schedule
schedule
worker_thread
kthread
ret_from_fork
kworker/0:2 [25482]
1
__schedule
schedule
schedule_hrtimeout_range_clock
schedule_hrtimeout_range
ep_poll
SyS_epoll_wait
entry_SYSCALL_64_fastpath
epoll_wait
Lsun/nio/ch/SelectorImpl;::lockAndDoSelect
Lsun/nio/ch/SelectorImpl;::select
Lio/netty/channel/nio/NioEventLoop;::select
Lio/netty/channel/nio/NioEventLoop;::run
Interpreter
Interpreter
call_stub
JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*)
JavaCalls::call_virtual(JavaValue*, KlassHandle, Symbol*, Symbol*, JavaCallArguments*, Thread*)
JavaCalls::call_virtual(JavaValue*, Handle, KlassHandle, Symbol*, Symbol*, Thread*)
thread_entry(JavaThread*, Thread*)
JavaThread::thread_main_inner()
JavaThread::run()
java_start(Thread*)
start_thread
java [4996]
1
... (omitted for brevity)
__schedule
schedule
schedule_preempt_disabled
cpu_startup_entry
xen_play_dead
arch_cpu_idle_dead
cpu_startup_entry
cpu_bringup_and_idle
swapper/1 [0]
289
A -i option can be used to set an output interval, and -T to include a
timestamp. For example:
# ./stackcount.py -P -Tdi 1 submit_bio
Tracing 1 functions for "submit_bio"... Hit Ctrl-C to end.
06:05:13
06:05:14
submit_bio
xfs_do_writepage
write_cache_pages
xfs_vm_writepages
do_writepages
__writeback_single_inode
writeback_sb_inodes
__writeback_inodes_wb
wb_writeback
wb_workfn
process_one_work
worker_thread
kthread
ret_from_fork
--
kworker/u16:1 [15686]
1
submit_bio
process_one_work
worker_thread
kthread
ret_from_fork
--
kworker/u16:0 [16007]
1
submit_bio
xfs_buf_submit
xlog_bdstrat
xlog_sync
xlog_state_release_iclog
_xfs_log_force
xfs_log_force
xfs_fs_sync_fs
sync_fs_one_sb
iterate_supers
sys_sync
entry_SYSCALL_64_fastpath
--
[unknown]
sync [16039]
1
submit_bio
submit_bh
journal_submit_commit_record
jbd2_journal_commit_transaction
kjournald2
kthread
ret_from_fork
--
jbd2/xvda1-8 [405]
1
submit_bio
process_one_work
worker_thread
kthread
ret_from_fork
--
kworker/0:2 [25482]
2
submit_bio
ext4_writepages
do_writepages
__writeback_single_inode
writeback_sb_inodes
__writeback_inodes_wb
wb_writeback
wb_workfn
process_one_work
worker_thread
kthread
ret_from_fork
--
kworker/u16:0 [16007]
4
submit_bio
xfs_vm_writepages
do_writepages
__writeback_single_inode
writeback_sb_inodes
__writeback_inodes_wb
wb_writeback
wb_workfn
process_one_work
worker_thread
kthread
ret_from_fork
--
kworker/u16:1 [15686]
5
submit_bio
__block_write_full_page
block_write_full_page
blkdev_writepage
__writepage
write_cache_pages
generic_writepages
blkdev_writepages
do_writepages
__filemap_fdatawrite_range
filemap_fdatawrite
fdatawrite_one_bdev
iterate_bdevs
sys_sync
entry_SYSCALL_64_fastpath
--
[unknown]
sync [16039]
7
submit_bio
submit_bh
jbd2_journal_commit_transaction
kjournald2
kthread
ret_from_fork
--
jbd2/xvda1-8 [405]
8
submit_bio
ext4_bio_write_page
mpage_submit_page
mpage_map_and_submit_buffers
ext4_writepages
do_writepages
__writeback_single_inode
writeback_sb_inodes
__writeback_inodes_wb
wb_writeback
wb_workfn
process_one_work
worker_thread
kthread
ret_from_fork
--
kworker/u16:0 [16007]
8
submit_bio
__block_write_full_page
block_write_full_page
blkdev_writepage
__writepage
write_cache_pages
generic_writepages
blkdev_writepages
do_writepages
__writeback_single_inode
writeback_sb_inodes
__writeback_inodes_wb
wb_writeback
wb_workfn
process_one_work
worker_thread
kthread
ret_from_fork
--
kworker/u16:0 [16007]
60
06:05:15
06:05:16
Detaching...
This only included output for the 06:05:14 interval. The other internals
did not span block device I/O.
The -s output prints the return instruction offset for each function (aka
symbol offset). Eg:
# ./stackcount.py -P -s tcp_sendmsg
Tracing 1 functions for "tcp_sendmsg"... Hit Ctrl-C to end.
^C
tcp_sendmsg+0x1
sock_sendmsg+0x38
sock_write_iter+0x85
__vfs_write+0xe3
vfs_write+0xb8
SyS_write+0x55
entry_SYSCALL_64_fastpath+0x1e
__write_nocancel+0x7
sshd [15015]
3
tcp_sendmsg+0x1
sock_sendmsg+0x38
sock_write_iter+0x85
__vfs_write+0xe3
vfs_write+0xb8
SyS_write+0x55
entry_SYSCALL_64_fastpath+0x1e
__write_nocancel+0x7
sshd [8234]
3
Detaching...
If it wasn't clear how one function called another, knowing the instruction
offset can help you locate the lines of code from a disassembly dump.
The -v output is verbose, and shows raw addresses:
./stackcount.py -P -v tcp_sendmsg
Tracing 1 functions for "tcp_sendmsg"... Hit Ctrl-C to end.
^C
ffffffff817b05c1 tcp_sendmsg
ffffffff8173ea48 sock_sendmsg
ffffffff8173eae5 sock_write_iter
ffffffff81232b33 __vfs_write
ffffffff812331b8 vfs_write
ffffffff81234625 SyS_write
ffffffff818739bb entry_SYSCALL_64_fastpath
7f120511e6e0 __write_nocancel
sshd [8234]
3
ffffffff817b05c1 tcp_sendmsg
ffffffff8173ea48 sock_sendmsg
ffffffff8173eae5 sock_write_iter
ffffffff81232b33 __vfs_write
ffffffff812331b8 vfs_write
ffffffff81234625 SyS_write
ffffffff818739bb entry_SYSCALL_64_fastpath
7f919c5a26e0 __write_nocancel
sshd [15015]
11
Detaching...
A wildcard can also be used. Eg, all functions beginning with "tcp_send",
kernel stacks only (-K) with offsets (-s):
# ./stackcount -Ks 'tcp_send*'
Tracing 14 functions for "tcp_send*"... Hit Ctrl-C to end.
^C
tcp_send_delayed_ack0x1
tcp_rcv_established0x3b1
tcp_v4_do_rcv0x130
tcp_v4_rcv0x8e0
ip_local_deliver_finish0x9f
ip_local_deliver0x51
ip_rcv_finish0x8a
ip_rcv0x29d
__netif_receive_skb_core0x637
__netif_receive_skb0x18
netif_receive_skb_internal0x23
1
tcp_send_delayed_ack0x1
tcp_rcv_established0x222
tcp_v4_do_rcv0x130
tcp_v4_rcv0x8e0
ip_local_deliver_finish0x9f
ip_local_deliver0x51
ip_rcv_finish0x8a
ip_rcv0x29d
__netif_receive_skb_core0x637
__netif_receive_skb0x18
netif_receive_skb_internal0x23
4
tcp_send_mss0x1
inet_sendmsg0x67
sock_sendmsg0x38
sock_write_iter0x78
__vfs_write0xaa
vfs_write0xa9
sys_write0x46
entry_SYSCALL_64_fastpath0x16
7
tcp_sendmsg0x1
sock_sendmsg0x38
sock_write_iter0x78
__vfs_write0xaa
vfs_write0xa9
sys_write0x46
entry_SYSCALL_64_fastpath0x16
7
Detaching...
Use -r to allow regular expressions.
The -f option will emit folded output, which can be used as input to other
tools including flame graphs. For example, with delimiters as well:
# ./stackcount.py -P -df t:sched:sched_switch
^Csnmp-pass;[unknown];[unknown];[unknown];[unknown];[unknown];-;entry_SYSCALL_64_fastpath;SyS_select;core_sys_select;do_select;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule;__schedule 1
kworker/7:0;-;ret_from_fork;kthread;worker_thread;schedule;__schedule 1
watchdog/0;-;ret_from_fork;kthread;smpboot_thread_fn;schedule;__schedule 1
snmp-pass;[unknown];[unknown];[unknown];[unknown];[unknown];-;entry_SYSCALL_64_fastpath;SyS_select;core_sys_select;do_select;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule;__schedule 1
svscan;[unknown];-;entry_SYSCALL_64_fastpath;SyS_nanosleep;hrtimer_nanosleep;do_nanosleep;schedule;__schedule 1
python;[unknown];__select_nocancel;-;entry_SYSCALL_64_fastpath;SyS_select;core_sys_select;do_select;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule;__schedule 1
kworker/2:0;-;ret_from_fork;kthread;worker_thread;schedule;__schedule 1
[...]
Flame graphs visualize stack traces. For information about them and links
to open source software, see http://www.brendangregg.com/flamegraphs.html .
This folded output can be piped directly into flamegraph.pl (the Perl version).
USAGE message:
# ./stackcount -h
usage: stackcount [-h] [-p PID] [-c CPU] [-i INTERVAL] [-D DURATION] [-T] [-r]
[-s] [-P] [-K] [-U] [-v] [-d] [-f] [--debug]
pattern
Count events and their stack traces
positional arguments:
pattern search expression for events
optional arguments:
-h, --help show this help message and exit
-p PID, --pid PID trace this PID only
-c CPU, --cpu CPU trace this CPU only
-i INTERVAL, --interval INTERVAL
summary interval, seconds
-D DURATION, --duration DURATION
total duration of trace, seconds
-T, --timestamp include timestamp on output
-r, --regexp use regular expressions. Default is "*" wildcards
only.
-s, --offset show address offsets
-P, --perpid display stacks separately for each process
-K, --kernel-stacks-only
kernel stack only
-U, --user-stacks-only
user stack only
-v, --verbose show raw addresses
-d, --delimited insert delimiter between kernel/user stacks
-f, --folded output folded format
--debug print BPF program before starting (for debugging
purposes)
examples:
./stackcount submit_bio # count kernel stack traces for submit_bio
./stackcount -d ip_output # include a user/kernel stack delimiter
./stackcount -s ip_output # show symbol offsets
./stackcount -sv ip_output # show offsets and raw addresses (verbose)
./stackcount 'tcp_send*' # count stacks for funcs matching tcp_send*
./stackcount -r '^tcp_send.*' # same as above, using regular expressions
./stackcount -Ti 5 ip_output # output every 5 seconds, with timestamps
./stackcount -p 185 ip_output # count ip_output stacks for PID 185 only
./stackcount -p 185 c:malloc # count stacks for malloc in PID 185
./stackcount t:sched:sched_fork # count stacks for sched_fork tracepoint
./stackcount -p 185 u:node:* # count stacks for all USDT probes in node
./stackcount -c 1 put_prev_entity # count put_prev_entity stacks for CPU 1 only
./stackcount -K t:sched:sched_switch # kernel stacks only
./stackcount -U t:sched:sched_switch # user stacks only