From 1ee0b4bca08cd45df45c91769a4df7cc50bb4fd1 Mon Sep 17 00:00:00 2001
From: John Mellor-Crummey <johnmc@rice.edu>
Date: Sun, 30 Sep 2018 16:40:15 -0500
Subject: [PATCH] update hpcrun documentation (man page, script)

1. update hpcrun man page
2. remove outdated and misnamed -q (--quiet) option to hpcrun
   in both the hpcrun script and in its implementation.
3. update the documentation in the hpcrun script based on the new text
   in the man page.
---
 doc/man/hpcrun.tex                    | 209 +++++++++++++-------------
 src/tool/hpcrun/messages/debug-flag.c |  14 --
 src/tool/hpcrun/scripts/hpcrun.in     |  82 +++++-----
 3 files changed, 150 insertions(+), 155 deletions(-)

diff --git a/doc/man/hpcrun.tex b/doc/man/hpcrun.tex
index 647cd5b79d..006247ee6c 100644
--- a/doc/man/hpcrun.tex
+++ b/doc/man/hpcrun.tex
@@ -30,7 +30,7 @@
 \rcsInfo $Id$
 \setDate{\rcsInfoLongDate}
 }{
-\setDate{2018/06/28}
+\setDate{2018/09/30}
 \message{package rcsinfo not present, discard it}
 }
 
@@ -44,9 +44,8 @@
 
 \begin{Name}{1}{hpcrun}{The HPCToolkit Performance Tools}{The HPCToolkit Performance Tools}{hpcrun:\\ Statistical Profiling}
 
-\Prog{hpcrun} is a call path profiler based on statistical sampling.
-It supports multiple sample sources during one execution.
-\Prog{hpcrun} profiles complex applications (forks, execs, threads and dynamic linking) and may be used in conjunction with parallel process launchers such as MPICH's \texttt{mpiexec} and SLURM's \texttt{srun}.
+\Prog{hpcrun} is a profiling tool that collects call path profiles of program executions
+using statistical sampling of hardware counters, software counters, or timers.
 
 See \HTMLhref{hpctoolkit.html}{\Cmd{hpctoolkit}{1}} for an overview of \textbf{HPCToolkit}.
 
@@ -63,24 +62,30 @@ \section{Synopsis}
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \section{Description}
 
-\Prog{hpcrun} profiles the execution of an arbitrary command \Arg{command} using statistical sampling (rather than instrumentation).
-It collects per-thread call path profiles that represent the full calling context of sample points.
-Sample points may be generated from multiple simultaneous sampling sources.
-\Prog{hpcrun} profiles complex applications that use forks, execs, threads, and dynamic linking/unlinking; it may be used in conjuction with parallel process launchers such as MPICH's \texttt{mpiexec} and SLURM's \texttt{srun}.
+\Prog{hpcrun} profiles the execution of an arbitrary command \Arg{command} using statistical sampling.
+\Prog{hpcrun} can profile an execution using multiple sample sources simultaneously, 
+supports measurement of applications with multiple processes and/or multiple threads, and handles complex runtime behaviors including
+fork, exec, and/or dynamic loading of shared libraries.
+\Prog{hpcrun} can be used in conjunction with program launchers such as \texttt{mpiexec} and SLURM's \texttt{srun}.
 
-To profile a statically linked executable, make sure to link with \HTMLhref{hpclink.html}{\Cmd{hpclink}{1}}.
+To profile a statically-linked executable, make sure to link with \HTMLhref{hpclink.html}{\Cmd{hpclink}{1}}.
 
 To configure \Prog{hpcrun}'s sampling sources, specify events and periods using the \texttt{-e/--event} option.
-For an event \emph{e} and period \emph{p}, after every \emph{p} instances of \emph{e}, a sample is generated that causes \Prog{hpcrun} to inspect the and record information about the monitored \Arg{command}.
+For an event \emph{e} and period \emph{p}, after every \emph{p} instances of \emph{e}, a sample is generated that causes \Prog{hpcrun} to inspect the 
+current calling context and augment its execution measurements of the monitored \Arg{command}.
 
-When \Arg{command} terminates, a profile measurement databse will be written to the directory:\\
+If no sample source is specified, by default \Prog{hpcrun} profiles using the timer
+CPUTIME on Linux or WALLCLOCK on Blue Gene at a frequency of 200 samples per second per thread. 
+
+When \Arg{command} terminates, a profile measurement database will be written to the directory:\\
 \\
-\SP\SP\SP \Prog{hpctoolkit-}\Arg{command}\Prog{-measurements}[\Prog{-}\Arg{pid}]\\
+\SP\SP\SP \Prog{hpctoolkit-}\Arg{command}\Prog{-measurements}[\Prog{-}\Arg{jobid}]\\
 \\
-where \Arg{pid} is an operating system process id, if available.
+where \Arg{jobid} is a parallel job launcher id associated with the execution, if available.
 
-\Prog{hpcrun} enables you to abort a process and write the partial profiling data to disk by sending the Interrupt signal (INT or Ctrl-C).
-This can be extremely useful on long-running or misbehaving applications.
+\Prog{hpcrun} allows you to abort an execution and write the partial profiling data to disk by sending a signal such as SIGHUP or SIGINT 
+(which is often bound to Control-c).
+This can be extremely useful to collect data for long-running or misbehaving applications.
 
 
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
@@ -125,49 +130,54 @@ \subsection{Options: Profiling}
 \begin{Description}
 
 \item[\Opt{-ds}, \Opt{--delay-sampling}]
-Don't start sampling until the application chooses to turn it on.
-Use this option to measure only a subset of the application's execution by bracketing interesting regions
+Don't start sampling until the application enables sampling under program control. 
+Use this option to measure specific intervals in an application's execution by bracketing code 
+regions where measurement is desired 
 with calls to \Prog{hpctoolkit_sampling_start()} and \Prog{hpctoolkit_sampling_stop()}.
-Sampling may be turned on and off any number of times during an execution;
-and measurements from all sampled regions are aggregated for attribution and display.
-
-\item[\OptArg{-e}{event\Lbr@period\Rbr}, \OptArg{--event}{event\Lbr@period\Rbr}]
-Profile the  event with corresponding sample period.
-\Arg{event} may be a PAPI event, a Perf event, a native processor event,
-or the special event \Prog{WALLCLOCK}.
-This option may be given multiple times to profile several events at once;
-there may be system-dependent limits on how many events can be profiled simultaneously
-and on which events may be combined for profiling.
-If no events are given, the default is to profile \Prog{WALLCLOCK@5000}.
-For perf event's counter, it is possible to specify
-the number of frequency as the sample threshold by
-prefixing with f before the number.
-For instance, to have 100 samples per second, the period
-is: @f100 .\\
-N.B.:
+Sampling may be started and stopped any number of times during an execution;
+measurements from all measurement intervals are aggregated.
+
+\item[\OptArg{-e}{event\Lbr@howoften\Rbr}, \OptArg{--event}{event\Lbr@howoften\Rbr}]
+\Arg{event} may be an architecture-independent hardware or software event supported by Linux perf, a native hardware counter event, 
+a hardware counter event supported by the PAPI library, a Linux system timer (\Prog{CPUTIME} and \Prog{REALTIME}), or the
+operating system interval timer \Prog{WALLCLOCK}.
+This option may be given multiple times to profile several events at once. 
+While events measured using the Linux perf monitoring infrastructure will be transparently multiplexed if necessary, 
+for other sampling sources or on operating systems such as the Blue Gene Compute Node Kernel,
+there may be system-dependent limits on how many events can be profiled simultaneously and on which events may be combined for profiling.
+If the value for \Arg{howoften} is a number, it will be interpreted as a sample period.
+For Linux perf events, one may specify a sampling frequency for \Arg{howoften} by writing f before a number.  
+For instance, to sample an event 100 times per second, specify \Arg{howoften} as '@f100'.
+For Linux perf events, if no value for \Arg{howoften} is specified, \Prog{hpcrun} will monitor the event using frequency-based sampling at 300 samples/second.
 \begin{itemize}
-  \item The special event \Prog{WALLCLOCK} may be used to profile the actual elapsed time in microseconds
-  \item WALLCLOCK and hardware events cannot be mixed.
+  \item For timer events \Prog{CPUTIME}, \Prog{REALTIME}, and \Prog{WALLCLOCK}, the units for a sample period are microseconds.
+  \item Timer events should not be mixed with hardware events.
   \item See the ``Sample sources'' under \textbf{NOTES} for additional details.
 \end{itemize}
 
-\item[\OptArg{-c}{number}, \OptArg{--count}{number}]
-                       Only available for perf event's counter. This option
-                       specifies the event period to sample. It uses the same
-                       format of period as the option -e mentioned above.
-
+\item[\OptArg{-c}{howoften}, \OptArg{--count}{howoften}]
+                       Only available for events managed by Linux perf. This option
+                       specifies a default value for how often to sample. The value for \Arg{howoften} may be a number that will be used as a default
+                       event period or an f followed by a number, e.g. f100, to specify a default sampling frequency in samples/second.
 
 \item[\OptArg{-p}{level}, \OptArg{--precise-ip}{level}]
-                       Specify the precise ip level (used only with perf events):
+Specify how precisely a Linux perf sample source must attribute a hardware counter event to an instruction. 
+On modern out-of-order processors, without making special arrangements, a hardware counter event may be attributed to a 
+nearby instruction rather than the instruction that caused the event -- a phenomenon known as 'skid'. 
+For instance, skid can cause a cache miss to be attributed to an instruction that operates only on register values. 
+\Prog{hpcrun} allows a user to request that hardware counter events be attributed with a specific level of precision.
+Values for \Arg{level}:
 \begin{itemize}
- \item 0: sample ip can have arbitrary skid
- \item 1: sample ip must have constant skid
- \item 2: sample ip requested to have 0 skid
- \item 3: sample ip must have 0 skid
+ \item 0: instruction attribution may have arbitrary skid
+ \item 1: instruction attribution must have constant skid
+ \item 2: instruction attribution is requested to have 0 skid
+ \item 3: instruction attribution must have 0 skid
 \end{itemize}
-                       NOTE: Some architectures support a precise IP with 0 skid.
-                             Incorrect level will unable hpcrun to sample the events.
 
+By default, \Prog{hpcrun} will allow attribution of hardware counter events to have arbitrary skid. 
+Some processor architectures, e.g., ARM,  don't support attribution with any higher level of precision.
+If a processor does not support the specified level of attribution precision for a hardware counter event, 
+\Prog{hpcrun} may record 0 occurrences of the event without reporting an error.
 
 
 \item[\OptArg{-f}{frac}, \OptArg{-fp}{frac}, \OptArg{--process-fraction}{frac}]
@@ -197,21 +207,24 @@ \subsection{Options: Profiling}
 
 \item[\OptArg{-o}{outpath}, \OptArg{--output}{outpath}]
 Directory to receive output data.
-If not given, the default directory ia \Prog{hpctoolkit-<command>-measurements[-<pid>]}.
+If not given, the default directory is \Prog{hpctoolkit-<command>-measurements[-<jobid>]}.
 \begin{itemize}
- Bug: If no \Arg{pid} is available and no output option is given,
- profiles from multiple runs of the same <command>  will be placed into the same output directory.
+ Caution: If no \Arg{jobid} is available and no output option is given,
+ profiles from multiple runs of the same <command> will be placed into the same output directory,
+ which may lead to confusing or incorrect analysis results.
 \end{itemize}
 
  \item[\Opt{-r}, \Opt{--retain-recursion}]
-Do not collapse simple recursive call chains to a single node.
-Normally \Prog{hpcrun} does collapse such chains to present a more useful attribution of costs.
-If this option is given, all elements of a recursive call chain are recorded.
-Note: When you use the \Prog{RETCNT} sample source then this option is enabled automatically
-in order to gather accurate counts.
+Do not collapse simple recursive call chains.
+Normally, as \Prog{hpcrun} monitors an application that employs simple recursion, it collapses call chains of recursive calls to a single level.
+This design enables a user to see how the aggregate costs of recursion are associated with each recursive call yet
+saves space and time during post-mortem analysis by collapsing long chains of recursive calls.
+If this option is given, \Prog{hpcrun} will record all elements of a recursive call chain.
+Note: When you use the \Prog{RETCNT} sample source, this option is enabled automatically
+to gather accurate counts.
 
 \item[\Opt{-t}, \Opt{--trace}]
-Generate a call path trace (in addition to a call path profile).
+Generate a call path trace in addition to a call path profile.
 
 \end{Description}
 
@@ -227,18 +240,15 @@ \subsection{Options: HPCToolkit Development}
 After initialization, spin wait until you attach a debugger
 to one or more of the application's processes.
 After attaching you can set breakpoints or watchpoints in your application's code
-or in HPCToolkit’s \Prog{hpcrun} code before beginning application execution.
-To continue after attaching, use the debugger to set program variable \Prog{DEBUGGER WAIT} to zero
+or in HPCToolkit's \Prog{hpcrun} code before beginning application execution.
+To continue after attaching, use the debugger to call \Prog{hpcrun_continue()}
 and then resume execution.
-Note: Your  can only set \Prog{HPCRUN WAIT} if your HPCToolkit was built with debugging symbols.
-To build HPCToolkit with debugging symbols,
-include the option \Prog{–enable-develop} when configuring HPCToolkit.
 
 \item[\OptArg{-dd}{flag}, \OptArg{--dynamic-debug}{flag}]
 Enable the flag \Prog{flag},
 causing \Prog{hpcrun} to log debug messages guarded with that flag
 during execution.
-A list of dynamic debug flags can be found in HPCToolkit’s source code
+A list of dynamic debug flags can be found in HPCToolkit's source code
 in the file \Prog{src/tool/hpcrun/messages/messages.flag-defns}.
 Note that not all flags are meaningful on all architectures.
 The special value \Prog{ALL} enables all debug flags.
@@ -246,15 +256,8 @@ \subsection{Options: HPCToolkit Development}
 Caution: turning on debug flags produces many log messages,
 often dramatically slowing the application and potentially distorting the measured profile.
 
-\item[\Opt{-q}, \Opt{--quiet}]
-Turn on a default set of dynamic debugging vari\-ables to log information
-about HPCToolkit’s stack unwinding based on on-the-fly binary analysis.
-See the HPCToolkit User Manual for more details. \\
-Bug: this option is unfortunately named.
-
 \item[\Opt{-md}, \Opt{--monitor-debug}]
 Enable debug tracing of \Prog{libmonitor}, the \Prog{hpcrun} subsystem which implements process/thread control.
-See the HPCToolkit User Manual for more details. 
 
 \end{Description}
 
@@ -262,26 +265,25 @@ \subsection{Options: HPCToolkit Development}
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 
 \section{Environment Variables}
-For most systems, \Prog{hpcrun} requires no special environment variable settings.
-There are two situations, however, where \Prog{hpcrun}, to function correctly,
-\emph{must} refer to environment variables. These environment variables, and
-corresponding situations are:
-\begin{Description}
+To function correctly, \Prog{hpcrun} must know the location of the \Prog{HPCToolkit} 
+top-level installation directory so that it can access toolkit components located 
+in its \File{lib} and \File{libexec} subdirectories. 
+Under most circumstances, \Prog{hpcrun} requires no special environment variable settings.
 
-  \item[\verb+HPCTOOLKIT+] To function correctly, \Prog{hpcrun} must know
-       the location of the \Prog{HPCToolkit} top-level installation directory.
-       The \Prog{hpcrun} script uses elements of the installation \File{lib} and
-       \File{libexec} subdirectories. For most systems, the 
-       installation procedure ensures that \Prog{hpcrun} can find the requisite
-       components. Some parallel job launchers, however, will \emph{copy} the
-       \Prog{hpcrun} script to a different location from the installed base. If your
-       system uses this copying mechanism, you must set the \verb+HPCTOOLKIT+
-       environment variable to the top-level installation directory.
-       
-  \item[\verb+hpcrun+] If you refer to the \Prog{hpcrun} script via a file system link
-       you must also set \verb+HPCTOOLKIT+, for the same reason.
-       
-\end{Description}
+There are two situations, however, where \Prog{hpcrun}
+\emph{must} consult the \verb+HPCTOOLKIT+ environment variable to determine the location
+of the top-level installation directory: 
+
+\begin{itemize}
+\item On some systems, parallel job launchers (e.g., Cray's aprun) \emph{copy} the
+       \Prog{hpcrun} script to a different location. For \Prog{hpcrun} to know
+       the location of its top-level installation directory, 
+       you must set the \verb+HPCTOOLKIT+ environment variable to the 
+       top-level installation directory.
+\item 
+       If you launch the \Prog{hpcrun} script via a file system link,
+       you must set \verb+HPCTOOLKIT+ for the same reason.
+\end{itemize}
 
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 
@@ -289,13 +291,13 @@ \section{Launching}
 
 When sampling with native events, by default hpcrun will profile using perf events.
 To force HPCToolkit to use PAPI (assuming it's available) instead of perf events, one
-must prefix the event with ‘\texttt{papi::}’ as follows:
+must prefix the event with '\texttt{papi::}' as follows:
 
 \begin{verbatim}
 hpcrun -e papi::CYCLES
 \end{verbatim}
 
-For PAPI presets, there is no need to prefix the event with ‘papi::’. For instance it is
+For PAPI presets, there is no need to prefix the event with 'papi::'. For instance it is
 sufficient to specify \texttt{PAPI\_TOT\_CYC} event without any prefix to profile using PAPI.
 
 To sample an execution 100 times per second (frequency-based sampling) counting
@@ -325,11 +327,10 @@ \section{Examples}
 
 Assume we wish to profile the application \texttt{zoo}.
 The following examples lists some useful events for different processor architectures.
-In each case, the special option \texttt{--} is used to clearly demarcate the end of \Prog{hpcrun} options.
 
 \begin{itemize}
 \item \Prog{hpcrun -e CYCLES -e INSTRUCTIONS zoo}
-\item \Prog{hpcrun -e WALLCLOCK@5000 zoo}
+\item \Prog{hpcrun -e REALTIME@5000 zoo}
 \item \Prog{hpcrun -e DC_L2_REFILL@1300013 -e PAPI_L2_DCM@510011 -e PAPI_STL_ICY@5300013 -e PAPI_TOT_CYC@13000019 zoo}
 \item \Prog{hpcrun -e PAPI_L2_DCM@510011 -e PAPI_TLB_DM@510013 -e PAPI_STL_ICY@5300013 -e PAPI_TOT_CYC@13000019 zoo}
 \end{itemize}
@@ -420,21 +421,22 @@ \subsubsection{PAPI Interface (optional)}
 Some events have standard names across all platforms, e.g. \verb+PAPI_TOT_CYC+, the event that measures total cycles.
 In addition to events whose names begin with the \verb+PAPI_+ prefix, platforms also provide access to a set of native events with names that are specific to the platform's processor.
 A complete list of events supported by the PAPI library for your platform may be obtained by using the \texttt{--list-events} option.
-Any event whose name begins with the \verb+PAPI_+ prefix that is listed as "Profilable" can be used as an event in a sampling source --- provided it does not conflict with another event.
+Any event whose name begins with the \verb+PAPI_+ prefix that is listed as "Profilable" can be used as an event in a sampling source provided it does not conflict with another event.
 
 The precise rules for selecting good events and periods are complex.
 \begin{itemize}
 
 \item \textbf{Choosing sampling events.}
-We recommend using PAPI events in general.
-However, some PAPI events are not profilable because of PAPI implementation details.
+Some PAPI events are not profilable because of PAPI implementation details.
 Also, PAPI's standard event list may not cover an architectural feature you are interested in.
 In such cases, it is necessary to resort to native events.
 In many cases, you will have to consult the architecture's manual to fully understand what the event means: there is no standard event list or naming scheme and events sometimes have unusual meanings.
 
 \item \textbf{Number of sampling events.}
-Currently, hpcrun does not multiplex hardware counters.
-This means that the number of events that you may concurrently profile against is limited by your architecture's performance monitoring unit.
+\Prog{hpcrun} does not multiplex hardware counters for events measured using PAPI. (Events measured using the Linux
+perf interface will be multiplexed automatically.)
+Without multiplexing, the number of events that you may use to profile a single execution 
+is limited by your architecture's performance monitoring unit.
 Note that some architectures hard-wire one or more counters to a specific event (such as cycles).
 
 \item \textbf{Choosing sampling periods.}
@@ -465,8 +467,8 @@ \subsubsection{System itimer (WALLCLOCK).}
 For example, if the Hz rate is 1000 microseconds, one can use 500 microseconds (or just 1) and obtain about 999 interrupts per second.
 
 \subsection{Platform-specific notes}
-\subsubsection{Cray XE and XK}
-When using dynamically linked binaries on Cray XE and XK systems, you
+\subsubsection{Cray Systems}
+When using dynamically linked binaries on Cray systems, you
 should add the \verb+HPCTOOLKIT+ environment variable to your launch
 script.  Set \verb+HPCTOOLKIT+ to the top-level \verb+HPCToolkit+ install
 prefix (the directory containing the \File{bin}, \File{lib} and
@@ -493,12 +495,13 @@ \subsubsection{Cray XE and XK}
 or in your environment, and try again.
 \end{verbatim}
 
-The problem is that the Cray job launcher copies the \Prog{hpcrun}
+The problem is that the Cray ALPS job launcher copies the \Prog{hpcrun}
 script to a directory somewhere below \File{/var/spool/alps/} and runs
 it from there.  By moving \Prog{hpcrun} to a different directory, this
-breaks \Prog{hpcrun}'s method for finding its own install directory.
+breaks \Prog{hpcrun}'s default method for finding HPCToolkit's top-level
+installation directory.
 The solution is to add \verb+HPCTOOLKIT+ to your environment so that
-\Prog{hpcrun} can find its install directory.
+\Prog{hpcrun} can find HPCToolkit's top-level installation directory.
 
 \subsection{Miscellaneous}
 
diff --git a/src/tool/hpcrun/messages/debug-flag.c b/src/tool/hpcrun/messages/debug-flag.c
index 58a6fd90fb..e91c96d694 100644
--- a/src/tool/hpcrun/messages/debug-flag.c
+++ b/src/tool/hpcrun/messages/debug-flag.c
@@ -173,14 +173,6 @@ static flag_list_t all_list = {
 static int dbg_flags[N_DBG_CATEGORIES];
 
 
-static int defaults[] = {
-  DBG_PREFIX(TROLL),
-  DBG_PREFIX(DROP),
-  DBG_PREFIX(SUSPICIOUS_INTERVAL)
-};
-#define NDEFAULTS (sizeof(defaults)/sizeof(defaults[0]))
-
-
 
 //*****************************************************************************
 // forward declarations 
@@ -320,12 +312,6 @@ debug_flag_process_string(char *in, int debug_initialization)
 static void 
 debug_flag_process_env(int debug_initialization)
 {
-  if (getenv("HPCRUN_QUIET") != NULL){
-    for (int i=0; i < NDEFAULTS; i++){
-      debug_flag_set(defaults[i], 1);
-    }
-  }
-
   char *s = getenv("HPCRUN_DEBUG_FLAGS");
   if(s){
     debug_flag_process_string(s, debug_initialization);
diff --git a/src/tool/hpcrun/scripts/hpcrun.in b/src/tool/hpcrun/scripts/hpcrun.in
index fbc14960fa..deb1234547 100644
--- a/src/tool/hpcrun/scripts/hpcrun.in
+++ b/src/tool/hpcrun/scripts/hpcrun.in
@@ -141,11 +141,11 @@ and record information about the monitored <command>.
 When <command> terminates, a profile measurement databse will be written to
 the directory:
   hpctoolkit-<command>-measurements[-<jobid>]
-where <jobid> is an operating system process id.
+where <jobid> is a job launcher id associated with the execution, if any.
 
 hpcrun enables a user to abort a process and write the partial profiling
-data to disk by sending the Interrupt signal (SIGINT or Ctrl-C).  This can
-be extremely useful on long-running or misbehaving applications.
+data to disk by sending a signal such as SIGINT (often bound to Ctrl-C).  
+This can be extremely useful on long-running or misbehaving applications.
 
 Options: Informational
   -l, -L --list-events List available events. (N.B.: some may not be profilable)
@@ -153,39 +153,51 @@ Options: Informational
   -h, --help           Print help.
 
 Options: Profiling (Defaults shown in curly brackets {})
-  -e <event>[@<period>], --event <event>[@<period>]
-                       An event to profile and its corresponding sample
-                       period. <event> may be either a PAPI, native
-                       processor event or WALLCLOCK (microseconds).  May pass
-                       multiple times as implementations permit.
-                       {WALLCLOCK@5000}.
-                       For perf event's counter, it is possible to specify
-                       the number of frequency as the sample threshold by
-                       prefixing with f before the number. 
-                       For instance, to have 100 samples per second, the period
-                       is: @f100 .
-                       N.B.: WALLCLOCK and hardware events cannot be mixed.
-
-  -c, --count <number>
-                       Only available for perf event's counter. This option
-                       specifies the event period to sample. It uses the same
-                       format of period as the option -e mentioned above. 
-                       
-  -t, --trace          Generate a call path trace (in addition to a call
-                       path profile).
+  -e <event>[@<howoften>], --event <event>[@<howoften>]
+                       <event> may be an architecture-independent hardware or
+                       software event supported by Linux perf, a native hardware
+                       counter event, a hardware counter event supported by the
+                       PAPI library, a Linux system timer (CPUTIME and REALTIME),
+                       or the operating system interval timer WALLCLOCK. This option
+                       may be given multiple times to profile several events at once.
+                       If the value for <howoften> is a number, it will be
+                       interpreted as a sample period. For Linux perf events, one
+                       may specify a sampling frequency for <howoften> by writing f
+                       before a number. For instance, to sample an event 100 times
+                       per second, specify <howoften> as '@f100'. For Linux perf
+                       events, if no value for <howoften> is specified, hpcrun
+                       will monitor the event using frequency-based sampling at 300
+                       samples/second.
+
+  -c, --count <howoften>
+                       Only available for events managed by Linux perf. This
+                       option specifies a default value for how often to sample. The
+                       value for <howoften> may be a number that will be used as a
+                       default event period or an f followed by a number, e.g. f100,
+                       to specify a default sampling frequency in samples/second.
+
+  -t, --trace          Generate a call path trace in addition to a call
+                       path profile.
 
   -ds, --delay-sampling
                        Delay starting sampling until the application calls
                        hpctoolkit_sampling_start().
 
   -p,  --precise-ip <level>
-                       Specify the precise ip level (used only with perf events):
-                       0: sample ip can have arbitrary skid
-                       1: sample ip must have constant skid
-                       2: sample ip requested to have 0 skid
-                       3: sample ip must have 0 skid
-                       NOTE: Some architectures support a precise IP with 0 skid.
-                             Incorrect level will unable hpcrun to sample the events.
+                       Specify how precisely a Linux perf sample source must attribute 
+                       a hardware counter event to an instruction. Values for <level>:
+
+                       0: instruction attribution may have arbitrary skid
+                       1: instruction attribution must have constant skid
+                       2: instruction attribution is requested to have 0 skid
+                       3: instruction attribution must have 0 skid
+
+                       By default, hpcrun will allow attribution of hardware counter 
+                       events to have arbitrary skid. Some processor architectures, 
+                       e.g., ARM, don't support attribution with any higher level of 
+                       precision.  If a processor does not support the specified level 
+                       of attribution precision for a hardware counter event, hpcrun 
+                       may record 0 occurrences of the event without reporting an error.
    
   -f <frac>, -fp <frac>, --process-fraction <frac>
                        Measure only a fraction <frac> of the execution's
@@ -203,8 +215,8 @@ Options: Profiling (Defaults shown in curly brackets {})
 
   -r, --retain-recursion
                        Normally, hpcrun will collapse (simple) recursive call chains
-                       to a single node. This option disables that behavior: all
-                       elements of a recursive call chain are recorded
+                       to save space and analysis time. This option disables that
+                       behavior: all elements of a recursive call chain will be recorded.
                        NOTE: If the user employs the RETCNT sample source, then this
                              option is enabled: RETCNT implies *all* elements of
                              call chains, including recursive elements, are recorded.
@@ -351,12 +363,6 @@ do
 
 	# --------------------------------------------------
 
-	-q | --quiet )
-	    export HPCRUN_QUIET=1
-	    ;;
-
-	# --------------------------------------------------
-
 	-f | -fp | --process-fraction )
 	    arg_ok "$1" || die "missing argument for $arg"
 	    export HPCRUN_PROCESS_FRACTION="$1"