----------------------------------------
HPCToolkit 2021.01.10
----------------------------------------
Bug fixes
- fix thread memory pool for amd gpus
- fix hpcrun use of ignore thread
- cleanup gpu traces
- fix auditor initialization bug
hpcrun
- add spack smoke test
hpcviewer
- Use the NatTable widget instead of the SWT table for more scalable data representation.
- Support for more than two databases in a window (issue #135)
- Search for text in the tree table (issue #142)
Issue #20 (partial fix): enhancement for x-axis in the trace view
Issue #62 (partial fix): render tiny gpu trace when it's possible
Issue #101: fix clipped figures in the statistics view
Issue #112: fix white dots on the trace view (Windows only)
Issue #113: fix trace view refresh problem (Windows only)
Issue #118: fix computing the ideal width of table columns
Issue #121: fix tree image in the bottom-up view
Issue #126: fix illegal argument exception when flattening a tree
Issue #127: fix dark mode on macOS
Issue #128: fix -v option on Linux command line
Issue #132: fix font color for row selection on macOS
Issue #134: fix dark mode in Linux GNOME window manager
Issue #137: fix a spacer in a leaf node
Issue #138: fix call site icon position in the table
Issue #139: fix data concurrency problems
Issue #140: fix row number in the metric view
Issue #141: ensure consistency between metric view and thread view
Issue #143: fix support for ISO 8859 character set
Issue #145: fix the column width when resizing the window or the table
----------------------------------------
HPCToolkit 2021.05.15
----------------------------------------
Bug fixes
hpcrun
CPU issues
- avoid deadlock by not sampling an openmp thread before it finishes
setting up TLS
- avoid having the UCX communication library used by MPI terminate
a program when an unwind fails rather than just dropping a
sample
- fix initialization of control knobs when a process forks but
does not exec
- add a timeout to interrupt a hung cuptiActivityFlushAll so that a program
can terminate and write out all performance data already collected.
Intel GPUs
- always dump Intel GPU binaries so we can extract kernel names
even if not using GTPin binary instrumentation
NVIDIA GPUs
- avoid introducing kernel serialization while using coarse-grain
measurement by monitoring CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL rather
than CUPTI_ACTIVITY_KIND_KERNEL
hpcstruct
- correct reconstruction of loop nests for Intel GPU binaries
hpcviewer
- Fix issue #80 and #81 (null pointer exception for empty databases)
- Fix issue #79 (CCT filter on the trace view, preserve tree expansion)
- Fix issue #73 (sort direction is not shown on Linux for the first appearance)
- Fix issue #75 (closing only a window in multiple windows mode)
- Fix issue #74 (no sort direction on Linux/GTK)
- Fix issue #85 (keyboard shortcut to minimize the window)
- Fix filtering CCT nodes for thread views
- Fix hot path to select the child node instead of the parent
- Fix merging GPU databases which contain aggregate and derived metrics
by deep copying the metric descriptors.
- Fix build script to include notarization for mac
- Fix storing recent open database: store the absolute path, not the relative one.
- Fix SWT resource leaks
- Fix flickering issue on Windows when splitting the hpcviewer window.
- Fix trace view’s color map changes to also refresh other panes and windows
- Fix Find dialog layout on Linux/GTK
- Fix merging GPU databases
- Fix a procedure-color mapping bug in the trace view
- Partial fix for issue 42: Fix a performance bug when sorting a table
Improvements
hpcviewer
- Improve the performance of hot-path operation by not re-revealing the tree path.
- Default window size is 1400x1000 or the screen size
- Trace view: Move depth field into a separate pane so users can change the depth easily
even when call stack view is not visible.
- Reduce memory consumption.
- Use Java XML parser to slightly improve XML parsing performance and avoid using
the old Apache xerces.
- Code clean-up, remove dead code and remove unused variables
- Issue 77: Add support for different color mapping policy in the trace view.
Default: procedure-name color instead of random color.
- Warn users when filtering is enabled
- Default is to build with Eclipse 4.19 (2021.03) except for Linux
ppc64le (built with Eclipse 4.16). Some fixes include improved dark color theme.
dotgraph
- enhance dotgraph to dump control flow graphs for GPU binaries
----------------------------------------
HPCToolkit 2021.03.01
----------------------------------------
The principal objective of this release was to support profiling for
various GPU-based applications, under a variety of programming methods,
and for a variety of GPUs.
New and Improved functionality
hpcrun
Support for GPU Profiling and Tracing
- Intel GPUs
- Coarse-grain profiling of GPU operations
(kernel launch, memory copy, synchronization, etc.)
- for OpenCL programs
- For dpcpp programs using either OpenCL or Level 0 runtimes
Tracing of GPU operations for OpenCL programs or dpcpp programs
when using the OpenCL runtime
Fine-grain measurements (instruction counts) using GT-Pin
on OpenCL or dpcpp programs using the OpenCL runtime
Support for instruction counts with GT-Pin
For OpenCL or dpcpp programs using the OpenCL runtime,
including attribution of performance to source lines
- AMD GPUs
- Profiling and tracing of GPU operations for HIP programs
- Support for OpenCL programs
- Nvidia GPUs
- Profiling and tracing of OpenCL programs
- Fine-grain measurement of GPU kernels with PC sampling
Support for CPU profiling
- Improved Handling of Dynamic Libraries
- Use LD_AUDIT to track loading and unloading of shared libraries;
enabled by default; may be disabled if necessary.
- Data Collection for Flat-profiles
- Adds resilience by bypassing stack unwind to avoid problems with new OSs,
libraries, compilers, etc.
- Improved Handling of Exported and Imported Interfaces
- Avoids conflicts with applications and other tools.
hpcstruct
Support for Intel and AMD GPU binaries
- Improved Processing of All GPU binaries
- Better handling of line-numbers, inlining information, loops, and
statements.
- Improved Handling of Paths
- Improved Performance by Analyzing Binaries in Parallel
hpcprof
Adjustments to Hide Detailed GPU Metrics by Default
hpcviewer and hpctraceviewer
Merge the Two Views into a Single hpcviewer Command
- The integrated interface enables one to view both profiles and traces.
The new user interface, based on Eclipse 4, works with Java versions 8-14.
The integrated interface can be built from source on MacOS, Windows, and
Linux using Apache Maven.
Enhance Filter Panel
- Make Panel resizable, and add an “Apply” button. Trace filters can use a
regular expression for filtering ranks.
Improved Preferences, About, Info, and Log Dialog Boxes
- New support for log files using slf4j. Logs are stored in the same
workspace and can be cleared.
Improved Color Management
- Colors are now allocated lazily.
Improved Performance
- Especially with respect to expanding or contracting trees, and for handling
very many metrics. Reduced memory usage for the viewers.
Beta Support for ARM Machines
- Eclipse has beta support for ARM.
Additional Release Notes for hpcviewer
For more information, see https://github.com/HPCToolkit/hpcviewer.e4/releases
Fixed Issues
- Overestimation of CPU time for a Thread that is Sleeping
- Previously CPU time for a sleeping thread would be incremented by the
real-time the thread was sleeping; it is now correctly incremented only by
CPU-time. The issue is not yet fixed in Trace View.
- Mishandling of map/unmap/map of Dynamic Libraries
- When a dynamic library is unmapped and a second dynamic library is mapped to
the same region of the address space, samples may not be properly attributed.
- Conflicts with Application Libraries
- HPCToolkit uses libunwind to interpret compiler-recorded unwinding
information and libxz for compression. An application using a different
version of these libraries could fail if its calls are bound to HPCToolkit’s
copies. The entry-points in HPCToolkit’s copies of libunwind and libxz are
now invisible to applications.
- Incorrect Interactions with UCX
- UCX is a communication substrate used by OpenMPI. UCX wraps mmap and munmap
and calls from an HPCToolkit signal-handler to mmap or munmap could be
intercepted by UCX and cause a program to hang or fail. HPCToolkit now uses
private versions of mmap and munmap that cannot be intercepted by other
layers in the software stack.
- libmonitor Problems Relating to fork and Thread-Termination
- Patches were issued to fix crashes or deadlocks relating to these issues.
- Problems with Applications Using RUNPATH
- The RUNPATH settings are now properly handled to find dynamic libraries
- Workaround for LD_AUDIT Bug Present before glibc 2.32
- LD_AUDIT transparently converts dlmopen calls to dlopen to merge all shared
libraries into the global namespace. If that doesn’t work, you can ask
hpcrun to use an alternative to LD_AUDIT using --disable-auditor. This
alternative provides full support
for dlmopen with multiple namespaces. However, any executables or
libraries that depend upon a RUNPATH attribute will only work when
HPCToolkit uses LD_AUDIT to track dynamic library operations. When using
--disable-auditor, one can use --namespace-single or --namespace-multiple
to explicitly control whether dlmopen calls should be converted to
dlopen or not, respectively.
- Workaround for LD_AUDIT Performance Degradation
- Applications with many interlibrary calls suffer a performance degradation
when LD_AUDIT is used. To address this issue, hpcrun rewrites an
application’s Global Offset Table (GOT). Because this is a new feature,
it may cause your application to crash. If that happens, disable the
rewriting with the hpcrun argument --disable-auditor-got-rewriting to
determine whether GOT rewriting caused the crash.
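The sleeping-thread entry under Fixed Issues above turns on the difference between real time and CPU time. The following is a minimal Python sketch of that distinction only; it is not HPCToolkit code. The helper name measure_sleep is hypothetical, and process CPU time stands in for per-thread CPU time because the sketch is single-threaded.

```python
import time

def measure_sleep(seconds):
    """Return (wall_clock_elapsed, cpu_elapsed) around a sleep.

    Hypothetical helper for illustration: a sleeping thread consumes
    real time but almost no CPU time.
    """
    wall_start = time.perf_counter()   # real (wall-clock) time
    cpu_start = time.process_time()    # CPU time; excludes time spent asleep
    time.sleep(seconds)
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu

wall, cpu = measure_sleep(0.2)
print(f"wall-clock: {wall:.3f} s, CPU: {cpu:.3f} s")
```

A tool that charged the wall-clock figure to a CPU-time metric would overestimate the cost of the sleeping thread, which is the behavior the fix removes.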
Open Issues
- CPU Time is Shown Continuously in Trace View
- CPU time is divided between the last event before the thread sleeps and
the next event after the sleep; it should be blank when the thread is sleeping
- Problems with PC Samples from NVIDIA GPUs and CUDA 11.2
- HPCToolkit does not properly attribute PC samples collected for binaries
generated by CUDA 11.2. NVIDIA changed the format of line map information
produced by nvcc in CUDA 11.2. HPCToolkit uses Red Hat’s Elfutils to parse
line map information in CPU and GPU binaries. Elfutils does not yet parse
NVIDIA’s extensions to the line map, which leads to incorrect attribution.
- Problems Installing for AMD GPUs
- Installation of ROCm either as an installed version or as a spack-built
package requires a two-step build, specifying ROCm as external packages.
- Problems Installing for Intel GPUs
- Installation for Intel GPUs requires manual configuration and build of
HPCToolkit. Spack can be used to build all HPCToolkit dependencies other
than Intel components. Intel’s software (oneapi, GT-Pin, Metrics Discovery
API, and Intel Graphics compiler) must be built manually and have their
paths provided to HPCToolkit’s configure as they are not available as spack
packages.
- Problems with Instruction-level Analysis of Intel GPU Binaries
- Instruction-level analysis only works with Intel’s newest compute runtime,
which has not been publicly released.
- Moving Measurements Directories may Cause Problems
- Recorded data contains absolute paths, and processing the data may not find
the files needed to generate the database.
- Issues Concerning hpcviewer
- hpcviewer requires Java 8 or newer; it may fail on MacOS Big Sur because of
Eclipse bugs.
Improved Installation, Configuration, and Build
- Restructure Build to Hide Symbols from Third-party Libraries
- Use objcopy, and +pic options for those libraries to avoid conflicts with
applications using the same libraries.
- Rewrite Configure Script to Remove Manual Dependencies
- Update Dependencies to Latest Versions and Options
- Specify spack packages for intel-xed, libunwind, libmonitor, comgr,
llvm-amdgpu, libpfm4 (perfmon), and xz (lzma). Use binutils+nls to avoid
spack conflicts with other packages.
- Update Build of hpcviewer to use Maven
- Build can now be done from the command line.
----------------------------------------
HPCToolkit 2019.08 + some earlier
----------------------------------------
Enhancements
Build
Replaced hpctoolkit-externals with building prerequisites from spack.
See: http://hpctoolkit.org/software-instructions.html
or README.Install
Moved to using Dyninst 10 to provide better binary analysis used by
hpcstruct.
Hpcrun
Hpcrun can now tag some metrics to be hidden by default in
hpcviewer. This support was added to enable the full accounting for
the myriad GPU instruction stall reasons to be hidden by default.
Launch scripts are more robust for finding the install directory, and
provide an option for identifying the git version (branch and commit).
Hpcstruct
Hpcstruct now exploits threaded parallelism to analyze large
application binaries quickly. Use 'hpcstruct -j num binary' to
specify number of threads.
Improved handling of irreducible loops.
Hpcviewer
Hide a metric column if it has no metric values.
Use raw metric values when copying to the clipboard and exporting to CSV
files, instead of the displayed format of the value.
Bug fixes
hpcprof:
Fixed bug so that inlined procedures are now merged in bottom-up and
flat views. Previously, an internal buffer overflow prevented fusion
of some instances of inlined procedures in bottom-up and flat views.
Hpcrun
Fixed handling of dynamic thread allocation.
Fixed handling of the precise-ip modifier. Use the Linux perf convention to
force hpcrun to enable the precise-ip attribute (:p for slightly precise,
:pp for mostly precise, :ppp for exactly precise).
Fixed handling of memory barrier when manipulating buffer of events
recorded by Linux perf. Incorrect memory barrier handling caused some
event records to be handled incorrectly.
Fixed kernel blocking sample-source. Context-switch attribute is now
supported on Linux kernel 4.3 or newer. It is also available on RedHat
3.x or newer, which has a backport of the kernel support.
Hpcviewer
Fixed filtering nodes on Flat view.
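The precise-ip fix listed under Hpcrun above relies on the Linux perf suffix convention, where :p, :pp, and :ppp request precise_ip levels 1, 2, and 3. The following is a minimal Python sketch of how such a suffix could be separated from an event name. The function split_precise_ip and the event name "cycles" are illustrative assumptions; hpcrun's actual parsing may differ.

```python
# Hypothetical mapping from perf-style suffix to precise_ip level.
PRECISE_SUFFIXES = {":ppp": 3, ":pp": 2, ":p": 1}

def split_precise_ip(event):
    """Split 'name:pp' into ('name', 2); events without a suffix get level 0."""
    for suffix, level in PRECISE_SUFFIXES.items():
        if event.endswith(suffix):
            return event[: -len(suffix)], level
    return event, 0

for spec in ("cycles", "cycles:p", "cycles:pp", "cycles:ppp"):
    print(spec, "->", split_precise_ip(spec))
```

Level 0 here simply means that no precision was requested.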
----------
Merged the 'banal' branch into master. This fully replaces hpcstruct
and the banal code with the new inline tree format. However, the
format of the .hpcstruct file is unchanged.
The new banal code produces a better loop analysis (placement of loop
header) and has a smaller memory footprint. This also includes
support for displaying parseapi gaps (unclaimed code regions) in the
viewer.
Merged the 'perf-datacentric' branch into master. This adds support
for the perf family of events as a way of accessing the hardware
performance counters without using PAPI.
----------------------------------------
HPCToolkit Version 2017.06, June 2017
----------------------------------------
Updated the 'ompt' branch and merged master into ompt. This improves
scalability of the ompt support to large thread counts, as found on
KNL and Power8. The ompt branch, together with the LLVM OpenMP runtime
library provides
- support for attributing performance measurements to a global-view,
source-level program representation by unifying measurements of OpenMP
activity in worker threads with the main thread.
- support for attributing causes of idleness in OpenMP programs using
HPCToolkit event OMP_IDLE.
The LLVM OpenMP runtime can be used with Clang, Intel, and GNU compilers.
The LLVM OpenMP runtime with OMPT support for performance analysis
can be obtained as follows:
svn co http://llvm.org/svn/llvm-project/openmp/trunk openmp
To use the draft OMPT support present in LLVM's OpenMP runtime, one must
build the LLVM OpenMP runtime with LIBOMP_OMPT_SUPPORT=TRUE. This version
of the runtime is designed to be used with HPCToolkit's ompt branch.
Note: support for the OMPT tools interface in HPCToolkit and LLVM
currently follows a design outlined in TR2 at openmp.org. An improved
version of the OMPT interface was designed for OpenMP 5.0, which
is described in TR4 at openmp.org. At some point soon, both LLVM and
HPCToolkit will transition to support the new API; presently, both lack
support for the new API.
Merged the 'atomics' branch into master. This branch replaces the
atomic operations in hpcrun with C11 atomics through the stdatomic.h
header file.
Updated Dyninst to version 9.3.2 in externals, plus a patch for better
binary analysis of functions that use jump tables.
Updated hpcstruct to handle new ABI on Power/LE architectures, which
has both internal and external interfaces for functions.
Various bug fixes:
-- enhanced binary analysis for call stack unwinding on x86-64, including
an enhancement to analyze functions that perform stack alignment and
a bug fix to track stack frame allocation/deallocation of space for local
variables using the load effective address (LEA) instruction.
-- fixed bug in hpcrun to correct data reinitialization after fork(). This
bug prevented using hpcrun to profile programs launched with shell scripts.
-- fixed bug in hpcrun in binary analysis related to ld.so.
-- fixed bug in hpcstruct in getRealPath() that caused hpcstruct
to sometimes report incorrect file names.
Known issues:
When profiling optimized code with HPCToolkit, one may find that a program
generates a significant number of "partial unwinds" where the call stack
can't be unwound all the way up to main. This more commonly happens on
x86-64 architectures than on PowerPC and ARM. A large number of partial unwinds
may make it harder to use the top-down calling context view in hpcviewer,
which works best when call stacks unwind all the way up to main. Even
with significant numbers of partial unwinds, the bottom-up caller's view
and the flat view in hpcviewer can be used effectively for analyzing
performance. Ongoing work aims to improve call stack unwinding of
optimized code by employing compiler-generated unwinding information
where available in addition to using binary analysis to discover unwinding
recipes.
On x86-64, hpcfnbounds occasionally is too aggressive about inferring the
presence of stripped functions in optimized programs. We have noticed
this particularly for optimized Fortran. This can cause "partial unwinds",
where a call stack can't be unwound fully up to main. Improving this
analysis is the subject of ongoing work.
When using the LLVM OpenMP runtime's OMPT support, measurements
of programs compiled with GCC using HPCToolkit's ompt branch sometimes
reveal implementation-level stack frames that belong to the OpenMP
runtime system. This will improve with the transition of HPCToolkit
and the LLVM OpenMP runtime to the new OMPT ABI designed for OpenMP
5.0. This transition should occur over the next 6 months. In the
meantime, there is nothing wrong with the quality of the information
collected. The only problem is HPCToolkit's measurements reveal more of
an implementation-level view of OpenMP than intended.
----------------------------------------
HPCToolkit Version 2016.12, Dec 2016
----------------------------------------
Added preliminary measurement, analysis, attribution, and GUI support
for Power8/LE.
Added preliminary measurement, analysis, and attribution support for
ARM64.
Added support for KNL, Knights Landing (configure as x86-64).
Overhauled data structures for managing shared state (binary analysis results)
in hpcrun to avoid mutual exclusion in the common case. This improves manycore
scalability.
Overhaul binary analysis to better attribute performance to highly-optimized
code that involves inlined functions, inlined templates, outlined OpenMP
functions.
Removed dependence on a locally-modified copy of binutils and switched
to use binutils 2.27, which supports Power8/LE and ARM64.
Updated build infrastructure to use newer versions of autotools that
recognize the Power8 little-endian system type.
- autoconf 2.69.
- automake 1.15.
- libtool 2.4.6.
Use boost version 1.59.0.
Use dyninst version 9.3.0.
Use elfutils version 0.167.
Use libdwarf version 2016-11-24.
Use libmonitor version from Sept 15, 2016.
Use libunwind version from Feb 29, 2016.
Known problems:
HPCToolkit's GUIs are not yet available for ARM64 platforms.
Binary analysis on Power8/LE may fail to fully analyze routines that contain
switch tables. This has several effects.
(1) Samples attributed to code regions in a routine that are overlooked by
the binary analyzer will be attributed to the first source line of the
enclosing routine.
(2) Loops in code regions that are overlooked will not be reported in
hpcviewer.
----------------------------------------
HPCToolkit Version 5.4.x, Dec 2015
----------------------------------------
Merged the hpctoolkit-parseapi branch into trunk. Overhaul hpcstruct
and banal code to use ParseAPI to parse functions, blocks and loops.
Rework how hpcstruct identifies loop headers to use a top-down search
of the inline tree.
Partial merge of the non-openmp features from the hpctoolkit-ompt
branch into trunk. Support for name mappings for <program root>,
<thread root>, etc, removing redundant procedure names, and better
name demangling.
Support for filtering nodes in the viewer, allows hiding a subset of
the CCT nodes according to some pattern.
Move repository to github.
----------------------------------------
HPCToolkit Version 5.3.2 [2012.09.21]
----------------------------------------
Fix a few show-stopper problems related to recent upgrades on jaguarpf
(interlagos) at ORNL since 5.3.1.
- hpcrun
- add support for decoding the AMD XOP instructions. this was
causing many unwind failures on jaguarpf.
- fixed a bug in hpcrun-flat that was breaking the build with
PAPI 5.0
- externals
- fixed symtabAPI to handle the new STT_GNU_IFUNC symbols in the
latest glibc. this was breaking hpclink on jaguarpf.
----------------------------------------
HPCToolkit Version 5.3.1 [2012.08.27]
----------------------------------------
- hpcrun
- add documentation about environment variable HPCTOOLKIT, needed for
use on Cray XE6/XK6
- add support to use HPCRUN_TMPDIR or TMPDIR. On Cray platforms,
/tmp is unavailable
- don't try to integrate calling context trees for threads into the
calling context tree for the main thread.
- miscellaneous changes to satisfy stricter dependence checking by gcc 4.7
and include files that omit unistd.h on Debian Linux
- ensure that partial unwinds are properly categorized
- address portability issues with idleness sample source, which supports
blame shifting for OpenMP
- hpcprof
- handle case when measurements directory and database directory are on
different file systems. in this case, files can't be simply unlinked and
relinked.
- hpctraceviewer
- add support for filtering out threads and processes that are not of
interest.
- add -j option to install script to enable installer to curry a path
for the java required into the hpctraceviewer script
- fix memory leak for SWT native objects
- hpcviewer
- support a mix of flattening and zooming in flat view
- add new formatting options for derived metrics
- add -j option to install script to enable installer to curry a path
for the java required into the hpctraceviewer script
- externals
- fixed problem with GNU binutils on 64-bit PowerPC that caused performance
data to be misattributed because function "D" symbols were overlooked
- added support to disable thread tracking (needed for CUDA support)
- update documentation
----------------------------------------
HPCToolkit Version 5.3.0 [2012.06.25]
----------------------------------------
- hpcrun
- add an IO sample source to count the number of bytes read and written
- add a Global Arrays sample source
- add an API for the application to start and stop sampling
- get the process rank for gasnet and dmapp programs (static only)
- have hpcrun compress recursive calls by default
- early support for a plugin feature
- strengthen the async blocks to better handle the case of running
sync and async sample sources in the same run
- add a check that hpctoolkit and externals use the same C/C++ compilers.
this should all but eliminate the 'GLIBCXX not found' errors
- externals
- binutils: put the line info table into a splay tree. for some
BlueGene binaries, this speeds up hpcstruct by as much as 20x
- libmonitor: update to rev 140. this better handles signals and
the side threads from the Cray UPC compiler
- open analysis: bug fix to avoid missing some loops
- symtabAPI: fix an off-by-one error that sometimes caused a
segfault (thanks to Gary Mohr at Bull)
- xerces: fix a compilation bug on BlueGene
----------------------------------------
HPCToolkit Version 5.2.1: [2011.12.30]
----------------------------------------
A performance enhancement and maintenance release.
- hpcrun
- instead of discarding partial unwinds, record them in a special subtree
- create option for compressing call paths with (simple) recursive
function invocations
- rework representation of call path profile metrics to support
collection of tens or hundreds of metrics per run
- program PAPI events to count during system calls
- correct several subtle monitoring problems
- improve performance of memory leak detection
- fix how hpcrun opens files to be robust on systems where gethostid
or getpid don't behave as expected
- hpcviewer includes key performance enhancements:
- rapidly render scatter plots of metric values over all "MPI ranks"
(or threads)
- several bug fixes
- hpctraceviewer
- significantly enhance rendering performance
- several bug fixes
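The hpcrun option above for compressing call paths with (simple) recursive function invocations can be pictured with a minimal Python sketch. It assumes that "simple" recursion means a procedure calling itself directly, so that adjacent identical frames collapse to one. The function compress_simple_recursion and the frame names are hypothetical, and hpcrun's real data structures are not shown.

```python
from itertools import groupby

def compress_simple_recursion(call_path):
    """Collapse each run of adjacent identical frames into a single frame."""
    return [frame for frame, _ in groupby(call_path)]

path = ["main", "solve", "fib", "fib", "fib", "leaf"]
print(compress_simple_recursion(path))
# prints ['main', 'solve', 'fib', 'leaf']
```

Under this assumption, mutual recursion such as a -> b -> a -> b is left unchanged, because no two adjacent frames are identical.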
----------------------------------------
HPCToolkit Version 5.1.0: [2011.05.27]
----------------------------------------
- Full support for analyzing 64-bit Linux-POWER7 binaries [hpcprof, hpcstruct]
- To profile/trace large-scale executions, add support for sampling
processes to measure. Experimental. [hpcrun]
- W.r.t. binary analysis to compute unwind information for call path
profiling: improvements computing function bounds on partially
stripped code (32-bit x86 and libc on Ubuntu) [hpcfnbounds]
- Monitoring threaded applications: fix a rare race condition tracking
thread creation/destruction [libmonitor]
- When analyzing multiple measurement databases: resolve certain bugs
[hpcprof/mpi]
- When analyzing measurement databases in parallel: resolve a bug that
could cause summary metrics for certain Calling Context Tree nodes
to be computed incorrectly. [hpcprof-mpi]
- Upgrade libraries for reading binaries [hpctoolkit-externals]
- upgrade to Symtab API 7.0.1
- upgrade to libelf 0.8.13
- upgrade to libdwarf-20110113
----------------------------------------
HPCToolkit Version 5.0.0: [2011.02.15]
----------------------------------------
- HPCToolkit supports Cray XT, IBM BG/P, generic Linux-x86_64 and Linux-x86
- Major overhaul of hpcrun:
- fully support dynamic loading
- fully support gathering thread creation contexts
- reenable support on x86 for tracking return counts
- use a CCT where a node keeps its children in a splay tree
- Rework post-mortem analysis tool, hpcprof:
- create scalable tool (hpcprof-mpi)
- unify hpcprof and hpcprof-mpi interface (and, when possible, internals)
- automatically compute summary metrics in both hpcprof/mpi
(using both incremental and 'standard' methods)
- rework how hpcprof/mpi searches for source files to 1) find files
more frequently; and 2) perform much more rapidly when multiple
source files must be found (the common case). No longer enter an
infinite loop if symbolic links create a cycle.
- support for --replace-path
- Rework hpcviewer:
- incrementally construct Callers View
- correctly construct Callers and Flat Views, even with summary
metrics (and without thread-level metrics)
- generate scatter plots of out-of-core thread-level metrics
- correctly expand hot path
- Rewrite hpcsummary
- Create (and recreate) a Users Manual
- Updates to HPCToolkit's externals
- full support for configuring/building from an arbitrary directory
and installing to an arbitrary --prefix.
- performance-patched binutils 2.20.1 (from 2.17)
- xerces 3.1.1 (from 2.8)
- libmonitor r116
----------------------------------------
HPCToolkit Version 4.9.9: [2009.08.29]
----------------------------------------
- Rewrite hpcrun (nee csprof)
- Support monitoring statically linked applications using hpclink
- Rename hpcrun to hpcrun-flat
- Rewrite hpcstruct (nee bloop)
- Rewrite hpcprof (nee xcsprof)
- Rewrite hpcprof-flat (nee hpcview, hpcquick, hpcprof)
- Rewrite hpcviewer
- Rewrite externals manager
- Move source code to SciDAC Outreach
- Web page: www.hpctoolkit.org
----------------------------------------
HPCToolkit Version 4.2.1: [2006.06.17]
----------------------------------------
- Use binutils 2.16.1
----------------------------------------
HPCToolkit Version 4.2.0: [2006.04.07]
----------------------------------------
- Add csprof to source tree
- Integrate separate version of xcsprof into source tree
- Integrate csprof viewer into source tree
----------------------------------------
HPCToolkit Version 4.1.3: [2006.01.09]
----------------------------------------
- Split code for hpcrun and hpcprof.
- Automake 1.9.5; Libtool 1.5.20 (still Autoconf 2.59)
----------------------------------------
HPCToolkit Version 4.1.2: [2005.08.15]
----------------------------------------
- Merge 'lump' (load module dumper) into source tree. Build when
configured with --enable-devtools.
- Automake 1.9.5; Libtool 1.5.18 (still Autoconf 2.59)
----------------------------------------
HPCToolkit Version 4.1.1: [2005.04.21]
----------------------------------------
- Add support for Group scopes
- hpcquick produces hpcviewer output by default
----------------------------------------
HPCToolkit Version 4.1.0: [2005.03.18]
----------------------------------------
- HPCToolkitRoot
- Improve sub-repository checkout
- Improve building by determining and propagating compiler
optimization and develop options when building external
repositories (OA, xerces, binutils)
- OpenAnalysis is now 'NewOA'
----------------------------------------
HPCToolkit Version 4.0.5: [2005.01.20]
----------------------------------------
- Binutils performance tuning:
- Replace binutils' DWARF2 linear-time line-lookup algorithm with
binary search.
- Use a one-element cache to drastically speed up ELF symbol table
function search
- Update libtool, autoconf and automake to fix build problems on
IRIX64 (linking templates), Tru64 (including templates in libtool
archives), and Linux (missing .so).
- Upgrade xercesc from 2.3.0 to 2.6.0
- hpcrun/hpcquick: minor fixes
----------------------------------------
HPCToolkit Version 4.0.0: [2004.11.06]
----------------------------------------
- Create HPCToolkitRoot, a shell repository to make obtaining sources,
building and installation easier.
- Autoconf HPCToolkitRoot and hpcviewer.
- HPCToolkit now uses libtool to build libraries.
- Revamp launching of hpcview et al. to not be dependent upon
Sourcemes. All needed environment variables are set dynamically.
- Overhaul hpcquick to canonicalize all performance data files into
PROFILE files before extracting metrics. Include support for processing
hpcrun files.
- Merge separate hpcrun/hpcprof into HPCToolkit code base.
- Extend hpcrun:
- introduce support for profiling statically linked applications
- create profiles of multiple PAPI or native events
- monitor POSIX threads
- follow forks
- profile through execs
- create WALLCLK profiles
- Fix bugs in hpcprof:
- When no line information could be found, samples were dropped
- Fix several type-size problems.
- bloop:
- process DSOs
- recognize one-bundle loops on IA64 (PC-relative target is 0)
- classify return instructions in IA64, x86 and Sparc ISAs,
ensuring that in CFG construction no fallthru edge is placed
between the return and possible subsequent (error handling) code.
On Itanium with Intel's compiler, this can have drastic effects.
Make necessary changes to GNU binutils to propagate this information.
- Add options to treat irreducible intervals as loops and to turn
off potentially unsafe normalizations.
- hpcview and hpcquick handle multiple structure files
- Port HPCToolkit to opteron-Linux.
- Update documentation accordingly.
----------------------------------------
HPCToolkit Version 3.7.0: [2004.03.12]
----------------------------------------
- Replace make system with Autoconf/Automake make system
- Update code so it can be compiled by GCC 3.3.2
- hpcview:
- Use of hpcviewer is now preferred method for viewing data (as
opposed to static HTML database).
- bloop:
- Enable support for Sun's WorkShop/Forte/ONE compiler
- Revamp scope tree builder
- Rewrite key normalization routine (coalesce duplicate statements)
- Enable support of long option switches; improve option parsing.
- xprof:
- Enable reading of profile data from stdin or file
- Add basic support for processing Alpha relocated shared libraries
- Enable support of long option switches; improve option parsing.
----------------------------------------
HPCToolkit Version 3.6.0: [2003.07.05]
----------------------------------------
- Extend xprof to compute derived metrics from DCPI profiles. In the
process, significantly revise/rewrite most existing xprof code and
extend ISA and AlphaISA class.
- Significantly revise hpcquick to accept PROFILE files with -P option.
- Fix various hpcview bugs and use new xerces SAX interface.
- Revamp HPCToolkit make system (portions of the source tree can easily
be removed without breaking the build).
- Remove OpenAnalysis, binutils and xercesc from source tree
- Convert OpenAnalysis' make system to Autoconf/make.
- Add front end make system for binutils and xercesc.
- Update HPCToolkit tests for ISA changes and new xprof.
- Update to binutils 2.13.92 (snapshot) and then 2.14 (official release).
- Update to xercesc 2.3.0.
----------------------------------------
HPCToolkit Version 3.5.2: [2003.03.28]
----------------------------------------
- Rename from HPCTools to HPCToolkit to distinguish from others' use of
the name.
- Convert from RCS to CVS.
----------------------------------------
HPCTools Version 3.5.1: [2003.03.07]
----------------------------------------
- Update PGM and PROFILE formats to support load modules; other minor
tweaks. Update hpcview, bloop, xprof and ptran to use the new formats.
- Add initial DSO support to LoadModule classes.
- Test updates
- Add LoadModule library tester.
- Update support library tests.
- Add filter script for f90 modules (f90modfilt).
- Miscellaneous tweaks
- Fix strcpy bug in GetDemangledFuncName().
- Makefile tweaks
- Convert ArchIndType.h limits from const (statics) to #defines.
- Make trace a global variable so that tracing can be globally
switched on/off.
- Update alpha macros to support alpha GCC compiler
- hpcquick now supports recursive paths to option -I.
----------------------------------------
HPCTools Version 3.5.0: [2003.02.24]
----------------------------------------
- Merge HPCView 3.1 and HPCTools 1.20 into one distribution
- eliminate code duplication (support library, DTDs)
- port HPCView to ANSI/ISO style C includes (<cheader>; all functions in
std namespace)
- unify and improve documentation
- Improve make system (e.g., each library is now built in a separate
location).
- Improve code organization (rename 'libs' to 'lib'; rename and cleanup
bloop's scope tree files; move general types files to 'src/include')
- Improve and test hvprof with PAPI 2.3.1
[see below for HPCView revision history]
----------------------------------------
HPCTools Version 1.20: (bloop 1.20, xprof 1.20) [2002.10.11]
----------------------------------------
- Add xprof test suite.
- Make minor changes to support GCC 3.2.
- Rewrite GNU binutils patch (for dwarf2.c) that handles the
out-of-order line sequences of Intel's 6.0 compiler. (The patch is
faster, slightly more accurate, and makes GNU happy.)
- Change the method of preventing conflict between GNU and Sun's demangler.
----------------------------------------
HPCTools Version 1.10: (bloop 1.10, xprof 1.10) [2002.08.30]
----------------------------------------
- Allow for use of either old-style C headers (<header.h>) or new
ANSI/ISO style (<cheader>; all functions are in std namespace). The
new style is now default.
- Improve error and exception handling. Detect memory allocation errors.
- Fix bug in GNU-binutils ECOFF reader.
----------------------------------------
HPCTools Version 1.05: (bloop 1.05, xprof 1.05) [2002.08.23]
----------------------------------------
- Update to use binutils-2.13
- Extend VLIW interface throughout HPCTools. (Impose explicit pc +
operation index interface for instructions. Now, many comments are
not lies!)
- Miscellaneous fixes and cleanup.
----------------------------------------
HPCTools Version 1.00: (bloop 1.00, xprof 1.00) [2002.08.16]
----------------------------------------
- Major revisions:
- Replace EEL binary support with new binutils library (uses GNU's
binutils) and new ISA library
- bloop: Replace EEL analysis with OpenAnalysis
- bloop: Add two new targets (i686-Linux, ia64-Linux) and improve support
for existing targets (alpha-OSF1, mips-IRIX64, sparc-SunOS)
- Support 'cross target' processing
- Miscellaneous fixes and cleanup.
- bloop: Use both system and GNU demangler in demangling attempts
- Update to use binutils-2.12
- Update to read (abnormal) Compaq ECOFF debugging info.
- Update to read (abnormal) SGI -64 DWARF2 and g++ -64 DWARF2 debugging
info.
- Update to read (abnormal) Intel DWARF2 debugging info
- Misc. updates
- Reorganize HPCTools directory tree
- Remove (outdated) backwards compatibility for non-standard STL headers
- Add hvprof to HPCTools/tools/hvprof
----------------------------------------
HPCTools Version 0.90: (bloop 0.90, xprof 0.90) [2001.09]
----------------------------------------
- Port to alpha-OSF1
- Port EEL to alpha
- Fix binutils 2.10 ECOFF support
- Add xprof tool (beta) for processing Compaq dcpi output
- Replace EEL dominator analysis with Tarjan analysis
- Improve PGM scope tree normalization
- Bring code into compliance with ANSI/ISO C++
- Remove STLPort and use standard STL
- Bug fixes
----------------------------------------
HPCTools Version 0.80: [2001.02]
----------------------------------------
- Port to mips-IRIX64
- Port EEL to IRIX64
- Fix binutils 2.10 DWARF2 support
- Bug fixes
----------------------------------------
HPCTools Initial Version
----------------------------------------
- Support for processing sparc-SunOS binaries compiled with GCC
- Use EEL to read binaries and find loops
- Update EEL to use binutils 2.10