Clsotog/coresight mmap #36

clsotog · 2024-12-18T16:08:02Z

We got the following error when running perf to trace ETE on multi-socket systems with more than 108 CPUs:
./perf record -e cs_etm//u ls
failed to mmap with 12 (Cannot allocate memory)

The above is per-process monitoring, which initiates ETE tracing on all CPUs.
System-wide tracing restricted to a subset of CPUs works, e.g.: perf record -e cs_etm//u -C 0-108

The current finding is that the failure is related to the maximum number of ETE trace IDs, which is 128 on Demeter.
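
To illustrate why a single global pool of trace IDs runs out, here is a minimal Python model. It assumes that, out of the 128 values in the 7-bit ID space, 0x00 and 0x70 to 0x7F are reserved (my reading of the upstream `coresight-pmu.h`; please verify against the tree being patched). `TraceIdMap`, `alloc` and `max_sources` are hypothetical names, not kernel APIs, and the exact CPU count at which a real system fails can differ from what this model gives.

```python
# Toy model of one global trace ID pool shared by every trace source.
TRACE_IDS_MAX = 128   # size of the 7-bit ID space, as stated above
RES_0 = 0x00          # assumed reserved value
RES_TOP = 0x70        # assumed start of the reserved top range


def is_valid_trace_id(trace_id):
    """True if the ID lies strictly between the assumed reserved values."""
    return RES_0 < trace_id < RES_TOP


class TraceIdMap:
    """Hypothetical stand-in for a single global ID map."""

    def __init__(self):
        self.used = set()

    def alloc(self):
        """Hand out the lowest free valid ID, or fail if none is left."""
        for trace_id in range(TRACE_IDS_MAX):
            if is_valid_trace_id(trace_id) and trace_id not in self.used:
                self.used.add(trace_id)
                return trace_id
        raise RuntimeError("no free trace ID")  # models the allocation failure


def max_sources(id_map):
    """Count how many sources can get an ID before allocation fails."""
    count = 0
    try:
        while True:
            id_map.alloc()
            count += 1
    except RuntimeError:
        return count
```

Under these assumptions `max_sources(TraceIdMap())` comes out below 128, which is consistent with per-process tracing on all CPUs failing on large systems while a limited `-C` range works.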

Nvbug: https://nvbugspro.nvidia.com/bug/4678994

Lore discussion: https://lore.kernel.org/lkml/[email protected]/

One patch from the lore discussion was not accepted upstream, so I did not include it in this PR:
(https://lore.kernel.org/lkml/[email protected]/)
I tested with and without it and did not see any difference in functionality.

BugLink: https://bugs.launchpad.net/bugs/2091186 The HMAT messages printed at boot, beyond being noisy, can also print details for nodes that are not yet enabled. The primary method to consume HMAT details is via sysfs, and the sysfs interface gates what is emitted by whether the node is online or not. Hide the messages by default by moving them from "info" to "debug" log level. Otherwise, these prints are just a pretty-print way to dump the ACPI HMAT table. It has always been the case that post-analysis was required for these messages to map proximity-domains to Linux NUMA nodes, and as Priya points out that analysis also needs to consider whether the proximity domain is marked "enabled" in the SRAT. Reported-by: Priya Autee <[email protected]> Signed-off-by: Dan Williams <[email protected]> Acked-by: Rafael J. Wysocki <[email protected]> Link: https://patch.msgid.link/170668982094.318782.2963631284830500182.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dave Jiang <[email protected]> (cherry picked from commit e2b952ffafced49fa6bd5cdc90f472b8bd932b5d cxl-next) Signed-off-by: Carol L Soto <[email protected]> Acked-by: Brad Figg <[email protected]> Acked-by: Jacob Martin <[email protected]> Acked-by: Noah Wager <[email protected]> Signed-off-by: Brad Figg <[email protected]>

Activated has the specific meaning of a sink that's selected for use by the user via sysfs. But comments in some code that's shared by Perf use the same word, so in those cases change them to just say "selected" instead. With selected implying either via Perf or "activated" via sysfs. coresight_get_enabled_sink() doesn't actually get an enabled sink, it only gets an activated one, so change that too. And change the activated variable name to include "sysfs" so it can't be confused as a general status. Signed-off-by: James Clark <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Suzuki K Poulose <[email protected]> (cherry picked from commit a0fef3f) Signed-off-by: Carol L Soto <[email protected]>

The check for the existence of callbacks before using them implies that this happens and is supported. There are no devices without enable/disable callbacks, and it wouldn't be possible to add a new working device without adding them either, so just remove them. Furthermore, there are more callbacks than just enable and disable that are already used unguarded in other places. The comment about new session compatibility doesn't seem to match up to the line of code that it's on so remove it. I think it's alluding to the fact that sinks will check if they were already enabled via sysfs or Perf and fail the enable. But there are more detailed comments at those places, and this one isn't very useful. Signed-off-by: James Clark <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Suzuki K Poulose <[email protected]> (cherry picked from commit a11ebe1) Signed-off-by: Carol L Soto <[email protected]>

Most devices use mode, so move the mode definition out of the individual devices and up to the Coresight device. This will allow the core code to also know the mode which will be useful in a later commit. This also fixes the inconsistency of the documentation of the mode field on the individual device types. For example ETB10 had "this ETB is being used". Two devices didn't require an atomic mode type, so these usages have been converted to atomic_get() and atomic_set() only to make it compile, but the documentation of the field in struct coresight_device explains this type of usage. In the future, manipulation of the mode could be completely moved out of the individual devices and into the core code because it's almost all duplicate code, and this change is a step towards that. Signed-off-by: James Clark <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Suzuki K Poulose <[email protected]> (cherry picked from commit 9cae77c) Signed-off-by: Carol L Soto <[email protected]>

'enable', which probably should have been 'enabled', is only ever read in the core code in relation to controlling sources, and specifically only sources in sysfs mode. Confusingly it's not labelled as such and relying on it can be a source of bugs like the one fixed by commit 078dbba3f0c9 ("coresight: Fix crash when Perf and sysfs modes are used concurrently"). Most importantly, it can only be used when the coresight_mutex is held which is only done when enabling and disabling paths in sysfs mode, and not Perf mode. So to prevent its usage spreading and leaking out to other devices, remove it. Its use is equivalent to checking if the mode is currently sysfs, as due to the coresight_mutex lock, mode == CS_MODE_SYSFS can only become true or untrue when that lock is held, and when mode == CS_MODE_SYSFS the device is both enabled and in sysfs mode. The one place it was used outside of the core code is in TPDA, but that pattern is more appropriately represented using refcounts inside the device's own spinlock. Signed-off-by: James Clark <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Suzuki K Poulose <[email protected]> (cherry picked from commit d5e83f9) Signed-off-by: Carol L Soto <[email protected]>

At the moment the core file contains both sysfs functionality and core functionality, while the Perf mode is in a separate file in coresight-etm-perf.c. Many of the functions have ambiguous names like coresight_enable_source() which actually only work in relation to the sysfs mode. To avoid further confusion, move everything that isn't core functionality into the sysfs file and append _sysfs to the ambiguous functions. Signed-off-by: James Clark <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Suzuki K Poulose <[email protected]> (cherry picked from commit 1f5149c) Signed-off-by: Carol L Soto <[email protected]>

Refcnt is only ever accessed from either inside the coresight_mutex, or the device's spinlock, making the atomic type and atomic_dec_return() calls confusing and unnecessary. The only point of synchronisation outside of these two types of locks is already done with a compare and swap on 'mode', which a comment has been added for. There was one instance of refcnt being used outside of a lock in TPIU, but that can easily be fixed by making it the same as all the other devices and adding a spinlock. Potentially in the future all the refcounting and locking can be moved up into the core code, and all the mostly duplicate code from the individual devices can be removed. Signed-off-by: James Clark <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Suzuki K Poulose <[email protected]> (cherry picked from commit 4545b38) Signed-off-by: Carol L Soto <[email protected]>

These are a bit annoying to keep up to date when the function signatures change. But if CONFIG_CORESIGHT isn't enabled, then they're not used anyway so just delete them. Signed-off-by: James Clark <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Suzuki K Poulose <[email protected]> (cherry picked from commit 053ad9a) Signed-off-by: Carol L Soto <[email protected]>

These could potentially become wrong silently if the enum is changed, so explicitly initialize them. Signed-off-by: James Clark <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Suzuki K Poulose <[email protected]> (cherry picked from commit 812265e) Signed-off-by: Carol L Soto <[email protected]>

Now that mode is in struct coresight_device, this pattern can be wrapped in a helper. Signed-off-by: James Clark <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Suzuki K Poulose <[email protected]> (cherry picked from commit d724f65) Signed-off-by: Carol L Soto <[email protected]>

Now that mode is in struct coresight_device accesses can be wrapped. Signed-off-by: James Clark <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Suzuki K Poulose <[email protected]> (cherry picked from commit c95c273) Signed-off-by: Carol L Soto <[email protected]>

Now that mode is in struct coresight_device, sets can be wrapped. This also allows us to add a sanity check that there have been no concurrent modifications of mode. Currently all usages of local_set() were inside the device's spin locks so this new warning shouldn't be triggered. coresight_take_mode() could maybe have been used in place of adding the warning, but there may be use cases which set the mode to the same mode which are valid but would fail in coresight_take_mode() because it requires the device to only be in the disabled state. Signed-off-by: James Clark <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Suzuki K Poulose <[email protected]> (cherry picked from commit bcaabb9) Signed-off-by: Carol L Soto <[email protected]>
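
The difference between taking and setting the mode, as described in the mode-helper commits above, can be sketched in Python. This is only a model: the `Device` class and its methods are hypothetical, a lock stands in for the atomic compare-and-swap, and the exact condition for the warning is an assumption based on the commit text rather than the kernel source.

```python
import threading
import warnings

CS_MODE_DISABLED, CS_MODE_SYSFS, CS_MODE_PERF = 0, 1, 2


class Device:
    """Toy stand-in for struct coresight_device; only models `mode`."""

    def __init__(self):
        self.mode = CS_MODE_DISABLED
        self._lock = threading.Lock()  # emulates an atomic compare-and-swap

    def take_mode(self, new_mode):
        """Claim the device; succeeds only from the disabled state."""
        with self._lock:
            if self.mode != CS_MODE_DISABLED:
                return False
            self.mode = new_mode
            return True

    def get_mode(self):
        return self.mode

    def set_mode(self, new_mode):
        """Set the mode unconditionally, warning on a suspicious change.

        Assumed check: switching directly between two different enabled
        modes suggests a concurrent modification. Setting the same mode
        again is allowed here, although take_mode() would reject it.
        """
        current = self.mode
        if (new_mode != CS_MODE_DISABLED and current != CS_MODE_DISABLED
                and current != new_mode):
            warnings.warn("device already in use")
        self.mode = new_mode
```

In this model a second `take_mode()` on an enabled device returns `False`, while `set_mode()` with the mode the device already has passes silently, which is the use case the commit message gives for not reusing coresight_take_mode().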

Currently it's only possible to initialize with the default number of queues and then use auxtrace_queues__add_event() to grow the array. But that's problematic if you don't have a real event to pass into that function yet. The queues hold a void *priv member to store custom state, and for Coresight we want to create decoders upfront before receiving data, so add a new function that allows pre-allocating queues. One reason to do this is because we might need to store metadata (HW_ID events) that affects other queues, but never actually receive auxtrace data on that queue. Reviewed-by: Anshuman Khandual <[email protected]> Signed-off-by: James Clark <[email protected]> Tested-by: Ganapatrao Kulkarni <[email protected]> Acked-by: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexandre Torgue <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Maxime Coquelin <[email protected]> Cc: Mike Leach <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steve Clevenger <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Will Deacon <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> (cherry picked from commit ee73fe9) Signed-off-by: Carol L Soto <[email protected]>

The likely fix for this is to update perf, so print a helpful message. Signed-off-by: James Clark <[email protected]> Tested-by: Ganapatrao Kulkarni <[email protected]> Acked-by: Anshuman Khandual <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexandre Torgue <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Maxime Coquelin <[email protected]> Cc: Mike Leach <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steve Clevenger <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Will Deacon <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> (cherry picked from commit 0d2e3f2) Signed-off-by: Carol L Soto <[email protected]>

Additional helpers to better replace perf_cpu_map__has_any_cpu_or_is_empty(). Signed-off-by: Ian Rogers <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexandre Ghiti <[email protected]> Cc: Andrew Jones <[email protected]> Cc: André Almeida <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Atish Patra <[email protected]> Cc: Changbin Du <[email protected]> Cc: Darren Hart <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Huacai Chen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: James Clark <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: K Prateek Nayak <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mike Leach <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Paran Lee <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Sandipan Das <[email protected]> Cc: Sean Christopherson <[email protected]> Cc: Steinar H. Gunderson <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yang Jihong <[email protected]> Cc: Yang Li <[email protected]> Cc: Yanteng Si <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> (cherry picked from commit b6b4a62) Signed-off-by: Carol L Soto <[email protected]>

Rather than iterate all CPUs and see if they are in CPU maps, directly iterate the CPU map. Similarly make use of the intersect function taking care for when "any" CPU is specified. Switch perf_cpu_map__has_any_cpu_or_is_empty() to more appropriate alternatives. Reviewed-by: James Clark <[email protected]> Signed-off-by: Ian Rogers <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexandre Ghiti <[email protected]> Cc: Andrew Jones <[email protected]> Cc: André Almeida <[email protected]> Cc: Athira Rajeev <[email protected]> Cc: Atish Patra <[email protected]> Cc: Changbin Du <[email protected]> Cc: Darren Hart <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Huacai Chen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: K Prateek Nayak <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mike Leach <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Paran Lee <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Sandipan Das <[email protected]> Cc: Sean Christopherson <[email protected]> Cc: Steinar H. Gunderson <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yang Jihong <[email protected]> Cc: Yang Li <[email protected]> Cc: Yanteng Si <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> (cherry picked from commit e28ee12) Signed-off-by: Carol L Soto <[email protected]>

The perf_cpu struct makes some iterators simpler and avoids some mistakes with interchanging CPU IDs with indexes etc. At the moment in this file the conversion to an integer is done somewhere in the middle of the call tree. Change it to delay the conversion to an int until the leaf functions. Some of the usage patterns are duplicated, so instead of changing them all, make cs_etm_get_ro() more reusable and use that everywhere. cs_etm_get_ro() didn't return an error before, but return one now so that it can also be used where an error is needed. Continue to ignore the error where it was already ignored. Use cs_etm_pmu_path_exists() instead of cs_etm_get_ro() in cs_etm_is_etmv4() because cs_etm_get_ro() prints a warning, but path exists is sufficient for this use case. Signed-off-by: James Clark <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mike Leach <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Will Deacon <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> (cherry picked from commit cbaf2c4) Signed-off-by: Carol L Soto <[email protected]>

Both of these passes gather information about how to create the decoders. AUX records determine formatted/unformatted, and the HW_IDs determine the traceID/metadata mappings. Therefore it makes sense to cache the information and wait until both passes are over until creating the decoders, rather than creating them at the first HW_ID found. This will allow a simplification of the creation process where cs_etm_queue->traceid_list will exclusively be used to create the decoders, rather than the current two methods depending on whether the trace is formatted or not. Previously the sample CPU from the AUX record was used to initialize the decoder CPU, but actually sample CPU == AUX queue index in per-CPU mode, so saving the sample CPU isn't required. Similarly formatted/unformatted was used upfront to create the decoders, but now it's cached until later. Reviewed-by: Anshuman Khandual <[email protected]> Reviewed-by: Mike Leach <[email protected]> Signed-off-by: James Clark <[email protected]> Signed-off-by: James Clark <[email protected]> Tested-by: Ganapatrao Kulkarni <[email protected]> Tested-by: Leo Yan <[email protected]> Acked-by: Suzuki Poulouse <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexandre Torgue <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Maxime Coquelin <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Will Deacon <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> (cherry picked from commit b6aa0de) Signed-off-by: Carol L Soto <[email protected]>

Make cs_etm__setup_queue() setup a queue even if it's empty, and pre-allocate queues based on the max CPU that was recorded. In per-CPU mode aux queues are indexed based on CPU ID even if all CPUs aren't recorded; sparse queue arrays aren't used. This will allow HW_IDs to be saved even if no aux data was received in that queue without having to call cs_etm__setup_queue() from two different places. Reviewed-by: Mike Leach <[email protected]> Signed-off-by: James Clark <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexandre Torgue <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Ganapatrao Kulkarni <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Maxime Coquelin <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Will Deacon <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: James Clark <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> (cherry picked from commit 57880a7) Signed-off-by: Carol L Soto <[email protected]>
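
As a rough illustration of the pre-allocation described above, the following Python sketch sizes a dense queue array from the highest recorded CPU, so that HW_ID metadata can be stored on a queue that never receives AUX data. All class and function names are hypothetical and only loosely mirror the perf structures named in the commit message.

```python
class AuxQueue:
    """Toy stand-in for an auxtrace queue (hypothetical, not perf's struct)."""

    def __init__(self):
        self.buffers = []  # AUX data received on this queue, possibly none
        self.hw_ids = {}   # trace ID -> CPU, saved from HW_ID events


def init_queues(recorded_cpus):
    """Pre-allocate one queue per CPU index up to the highest recorded CPU.

    In per-CPU mode the queue index is the CPU ID, so the array is dense
    even when only some CPUs were recorded.
    """
    return [AuxQueue() for _ in range(max(recorded_cpus) + 1)]


def save_hw_id(queues, cpu, trace_id):
    """Store a HW_ID mapping; works even if the queue has no AUX data."""
    queues[cpu].hw_ids[trace_id] = cpu


def queues_with_data(queues):
    """Indexes of the queues that actually received AUX data."""
    return [idx for idx, queue in enumerate(queues) if queue.buffers]
```

With CPUs 0 and 5 recorded, `init_queues([0, 5])` creates six queues, and `save_hw_id(queues, 5, 0x12)` succeeds although queue 5 holds no buffers yet.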

The global list won't work for per-sink trace ID allocations, so put a list in each queue where the IDs will be unique to that queue. To keep the same behavior as before, for version 0 of the HW_ID packets, copy all the HW_ID mappings into all queues. This change doesn't affect the decoders, only trace ID lookups on the Perf side. The decoders are still created with global mappings which will be fixed in a later commit. Reviewed-by: Mike Leach <[email protected]> Signed-off-by: James Clark <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexandre Torgue <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Ganapatrao Kulkarni <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Maxime Coquelin <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Will Deacon <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: James Clark <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> (cherry picked from commit 77c123f) Signed-off-by: Carol L Soto <[email protected]>

Now that each queue has a unique set of trace ID mappings, use this list to create the decoders. In unformatted mode just add a single mapping so only one decoder is made. Previously each queue would have a decoder created for each traced CPU on the system but this won't work anymore because CPUs can have overlapping trace IDs. This also means that the CORESIGHT_TRACE_ID_UNUSED_FLAG isn't needed any more. If mappings aren't added then decoders aren't created, rather than needing a flag to suppress creation. Reviewed-by: Mike Leach <[email protected]> Signed-off-by: James Clark <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexandre Torgue <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Ganapatrao Kulkarni <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Maxime Coquelin <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Will Deacon <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: James Clark <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> (cherry picked from commit 19c3e4d) Signed-off-by: Carol L Soto <[email protected]>
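
A small Python model may help show why creating decoders from each queue's own mapping list tolerates overlapping trace IDs. It is a sketch only: `EtmQueue`, `add_mapping` and `create_decoders` are hypothetical names, and a "decoder" is represented by a plain dictionary.

```python
class EtmQueue:
    """Toy queue holding its own trace ID -> CPU mappings (hypothetical)."""

    def __init__(self):
        self.traceid_list = {}


def add_mapping(queue, trace_id, cpu):
    """Add a mapping; within one queue an ID may belong to one CPU only."""
    owner = queue.traceid_list.get(trace_id)
    if owner is not None and owner != cpu:
        raise ValueError("trace ID %#x already mapped in this queue" % trace_id)
    queue.traceid_list[trace_id] = cpu


def create_decoders(queue):
    """One decoder per mapping; no mappings means no decoders, no flag needed."""
    return [{"trace_id": trace_id, "cpu": cpu}
            for trace_id, cpu in sorted(queue.traceid_list.items())]
```

Two queues can each map ID 0x10 to a different CPU and still get one decoder apiece, whereas a queue without mappings simply yields an empty list.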

This isn't a bug because Perf always masks with CORESIGHT_TRACE_ID_VAL_MASK before using these values, but to avoid it looking like it could be, make an effort to not save bad values. Reviewed-by: Mike Leach <[email protected]> Signed-off-by: James Clark <[email protected]> Signed-off-by: James Clark <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexandre Torgue <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Ganapatrao Kulkarni <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Maxime Coquelin <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Will Deacon <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> (cherry picked from commit 940007c) Signed-off-by: Carol L Soto <[email protected]>

v0.1 HW_ID packets have a new field that describes which sink each CPU writes to. Use the sink ID to link trace ID maps to each other so that mappings are shared wherever the sink is shared. Also update the error message to show that overlapping IDs aren't an error in per-thread mode, just not supported. In the future we can use the CPU ID from the AUX records, or watch for changing sink IDs on HW_ID packets to use the correct decoders. Reviewed-by: Mike Leach <[email protected]> Signed-off-by: James Clark <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexandre Torgue <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Ganapatrao Kulkarni <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Maxime Coquelin <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Will Deacon <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: James Clark <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> (cherry picked from commit 1506af6) Signed-off-by: Carol L Soto <[email protected]>
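
The linking of trace ID maps through the sink ID can be sketched as follows. This Python model is an approximation built from the description above: the names are hypothetical, a dictionary stands in for the ID list, and the error handling is simplified.

```python
SINK_UNSET = None


class EtmQueue:
    """Toy per-CPU queue (hypothetical; loosely modelled on the text above)."""

    def __init__(self):
        self.sink_id = SINK_UNSET
        self.traceid_list = {}  # trace ID -> CPU; may be shared between queues


def process_hw_id_v0_1(queues, cpu, trace_id, sink_id):
    """Record one mapping, sharing the ID map with queues on the same sink."""
    queue = queues[cpu]  # per-CPU mode: queue index == CPU ID
    if queue.sink_id not in (SINK_UNSET, sink_id):
        raise ValueError("sink changed for CPU %d (not handled here)" % cpu)
    queue.sink_id = sink_id

    for other in queues:
        if other is queue or other.sink_id != sink_id:
            continue
        if other.traceid_list is queue.traceid_list:
            break  # already linked to the shared map
        if queue.traceid_list:
            raise ValueError("cannot link a non-empty ID map")
        queue.traceid_list = other.traceid_list
        break

    owner = queue.traceid_list.get(trace_id)
    if owner is not None and owner != cpu:
        raise ValueError("trace ID %#x overlaps on sink %r" % (trace_id, sink_id))
    queue.traceid_list[trace_id] = cpu
```

CPUs that report the same sink end up looking at one shared map, while a CPU on a different sink keeps its own map and may reuse the same trace ID values.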

Now that we have overlapping trace IDs it's also useful to know what the queue number is to be able to distinguish the source of the trace so print it inline. Hide it behind the -v option because it might not be obvious to users what the queue number is. Reviewed-by: Mike Leach <[email protected]> Signed-off-by: James Clark <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexandre Torgue <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Ganapatrao Kulkarni <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: John Garry <[email protected]> Cc: Kan Liang <[email protected]> Cc: Leo Yan <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Maxime Coquelin <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Suzuki Poulouse <[email protected]> Cc: Will Deacon <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: James Clark <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> (cherry picked from commit 022aa67) Signed-off-by: Carol L Soto <[email protected]>

This file is never included anywhere if CONFIG_CORESIGHT is not set so they are unused and aren't currently compile tested with any config so remove them. Reviewed-by: Anshuman Khandual <[email protected]> Reviewed-by: Mike Leach <[email protected]> Signed-off-by: James Clark <[email protected]> Tested-by: Leo Yan <[email protected]> Tested-by: Ganapatrao Kulkarni <[email protected]> Signed-off-by: James Clark <[email protected]> Signed-off-by: Suzuki K Poulose <[email protected]> Link: https://lore.kernel.org/r/[email protected] (cherry picked from commit 3417200) Signed-off-by: Carol L Soto <[email protected]>

"Process being monitored" and "pid of the process to monitor" imply that this would be the same PID if there were two sessions targeting the same process. But this is actually the PID of the process that did the Perf event open call, rather than the target of the session. So update the comments to make this clearer. Reviewed-by: Anshuman Khandual <[email protected]> Reviewed-by: Mike Leach <[email protected]> Signed-off-by: James Clark <[email protected]> Tested-by: Leo Yan <[email protected]> Tested-by: Ganapatrao Kulkarni <[email protected]> Signed-off-by: James Clark <[email protected]> Signed-off-by: Suzuki K Poulose <[email protected]> Link: https://lore.kernel.org/r/[email protected] (cherry picked from commit eda1d11) Signed-off-by: Carol L Soto <[email protected]>

The trace ID maps will need to be created and stored by the core and Perf code so move the definition up to the common header. Reviewed-by: Anshuman Khandual <[email protected]> Reviewed-by: Mike Leach <[email protected]> Signed-off-by: James Clark <[email protected]> Tested-by: Leo Yan <[email protected]> Tested-by: Ganapatrao Kulkarni <[email protected]> Signed-off-by: James Clark <[email protected]> Signed-off-by: Suzuki K Poulose <[email protected]> Link: https://lore.kernel.org/r/[email protected] (cherry picked from commit acb0184) Signed-off-by: Carol L Soto <[email protected]>

The trace ID API is currently hard coded to always use the global map. Add public versions that allow the map to be passed in so that Perf mode can use per-sink maps. Keep the non-map versions so that sysfs mode can continue to use the default global map. System ID functions are unchanged because they will always use the default map. Signed-off-by: James Clark <[email protected]> Reviewed-by: Mike Leach <[email protected]> Tested-by: Leo Yan <[email protected]> Signed-off-by: James Clark <[email protected]> Signed-off-by: Suzuki K Poulose <[email protected]> Link: https://lore.kernel.org/r/[email protected] (cherry picked from commit 7e52877) Signed-off-by: Carol L Soto <[email protected]>

The global CPU ID mappings won't work for per-sink ID maps so move it to the ID map struct. coresight_trace_id_release_all_pending() is hard coded to operate on the default map, but once Perf sessions use their own maps the pending release mechanism will be deleted. So it doesn't need to be extended to accept a trace ID map argument at this point. Signed-off-by: James Clark <[email protected]> Reviewed-by: Mike Leach <[email protected]> Tested-by: Leo Yan <[email protected]> Signed-off-by: James Clark <[email protected]> Signed-off-by: Suzuki K Poulose <[email protected]> Link: https://lore.kernel.org/r/[email protected] (cherry picked from commit d53c825) Signed-off-by: Carol L Soto <[email protected]>

This will allow sessions with more than CORESIGHT_TRACE_IDS_MAX ETMs as long as there are fewer than that many ETMs connected to each sink. Each sink owns its own trace ID map, and any Perf session connecting to that sink will allocate from it, even if the sink is currently in use by other users. This is similar to the existing behavior where the dynamic trace IDs are constant as long as there is any concurrent Perf session active. It's not completely optimal because slightly more IDs will be used than necessary, but the optimal solution involves tracking the PIDs of each session and allocating ID maps based on the session owner. This is difficult to do with the combination of per-thread and per-cpu modes and some scheduling issues. The complexity of this isn't likely to be worth it because even with multiple users they'd just see a difference in the ordering of ID allocations rather than hitting any limits (unless the hardware does have too many ETMs connected to one sink). Signed-off-by: James Clark <[email protected]> Reviewed-by: Mike Leach <[email protected]> Signed-off-by: James Clark <[email protected]> Signed-off-by: Suzuki K Poulose <[email protected]> Link: https://lore.kernel.org/r/[email protected] (cherry picked from commit 5ad628a) Signed-off-by: Carol L Soto <[email protected]>
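
The effect of per-sink ID maps on large systems can be modelled in a few lines of Python. This is a sketch under assumptions: the valid ID range 0x01 to 0x6F is my reading of the upstream headers, and `TraceIdMap`, `get_cpu_id` and `allocate` are hypothetical helpers rather than the kernel functions.

```python
VALID_IDS = range(0x01, 0x70)  # assumed valid trace ID range


class TraceIdMap:
    """Toy ID map that also holds the CPU -> ID assignments."""

    def __init__(self):
        self.cpu_id = {}

    def get_cpu_id(self, cpu):
        """Return the CPU's ID, allocating the lowest free one if needed."""
        if cpu in self.cpu_id:
            return self.cpu_id[cpu]
        used = set(self.cpu_id.values())
        for trace_id in VALID_IDS:
            if trace_id not in used:
                self.cpu_id[cpu] = trace_id
                return trace_id
        raise RuntimeError("trace ID map exhausted")


def allocate(cpus, sink_of, per_sink):
    """Assign IDs to all CPUs using either one global map or one per sink."""
    global_map = TraceIdMap()
    sink_maps = {}
    result = {}
    for cpu in cpus:
        sink = sink_of(cpu)
        if per_sink:
            if sink not in sink_maps:
                sink_maps[sink] = TraceIdMap()
            id_map = sink_maps[sink]
        else:
            id_map = global_map
        result[cpu] = (sink, id_map.get_cpu_id(cpu))
    return result
```

For example, `allocate(range(256), lambda cpu: cpu // 64, per_sink=True)` succeeds because each of the four modelled sinks only needs 64 IDs, while the same call with `per_sink=False` raises once the single map is exhausted. IDs repeat across sinks, so the (sink, ID) pair is what identifies a source.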

Pending the release of IDs was a way of managing concurrent sysfs and Perf sessions in a single global ID map. Perf may have finished while sysfs hadn't, and Perf shouldn't release the IDs in use by sysfs and vice versa. Now that Perf uses its own exclusive ID maps, pending release doesn't result in any different behavior than just releasing all IDs when the last Perf session finishes. As part of the per-sink trace ID change, we would have still had to make the pending mechanism work on a per-sink basis, due to the overlapping ID allocations, so instead of making that more complicated, just remove it. Signed-off-by: James Clark <[email protected]> Reviewed-by: Mike Leach <[email protected]> Signed-off-by: James Clark <[email protected]> Signed-off-by: Suzuki K Poulose <[email protected]> Link: https://lore.kernel.org/r/[email protected] (cherry picked from commit de0029f) Signed-off-by: Carol L Soto <[email protected]>

For Perf to be able to decode when per-sink trace IDs are used, emit the sink that's being written to for each ETM. Perf currently errors out if it sees a newer packet version so instead of bumping it, add a new minor version field. This can be used to signify new versions that have backwards compatible fields. Considering this change is only for high core count machines, it doesn't make sense to make a breaking change for everyone. Signed-off-by: James Clark <[email protected]> Tested-by: Leo Yan <[email protected]> Reviewed-by: Mike Leach <[email protected]> Signed-off-by: James Clark <[email protected]> Signed-off-by: Suzuki K Poulose <[email protected]> Link: https://lore.kernel.org/r/[email protected] (cherry picked from commit 487eec8) Signed-off-by: Carol L Soto <[email protected]>
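
To make the major/minor versioning idea concrete, here is a hedged Python sketch of packing and unpacking a 64-bit HW_ID value. The bit positions below are assumptions chosen for illustration and should be checked against `coresight-pmu.h`; the function names are hypothetical.

```python
def genmask(high, low):
    """Bit mask with bits low..high set, like the kernel's GENMASK_ULL()."""
    return ((1 << (high - low + 1)) - 1) << low


# Assumed field layout (illustrative only).
MAJOR_VERSION_MASK = genmask(63, 60)
MINOR_VERSION_MASK = genmask(59, 56)
SINK_ID_MASK = genmask(39, 8)
TRACE_ID_MASK = genmask(7, 0)


def _shift(mask):
    return (mask & -mask).bit_length() - 1  # index of the lowest set bit


def field_prep(mask, value):
    return (value << _shift(mask)) & mask


def field_get(mask, word):
    return (word & mask) >> _shift(mask)


def encode_hw_id(trace_id, sink_id=0, major=0, minor=1):
    return (field_prep(MAJOR_VERSION_MASK, major)
            | field_prep(MINOR_VERSION_MASK, minor)
            | field_prep(SINK_ID_MASK, sink_id)
            | field_prep(TRACE_ID_MASK, trace_id))


def decode_hw_id(word):
    """Return (trace_id, sink_id); sink_id is None for minor version 0."""
    if field_get(MAJOR_VERSION_MASK, word) > 0:
        raise ValueError("unsupported major version, update the tool")
    minor = field_get(MINOR_VERSION_MASK, word)
    trace_id = field_get(TRACE_ID_MASK, word)
    sink_id = field_get(SINK_ID_MASK, word) if minor >= 1 else None
    return trace_id, sink_id
```

A packet that carries only a trace ID, with all version bits zero, still decodes in this model, and a reader that looks only at the major version and the trace ID bits would not be affected by the added fields, which is the backwards compatibility the minor version is meant to signal.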

Reduce contention on the lock by replacing the global lock with one for each map. Signed-off-by: James Clark <[email protected]> Reviewed-by: Mike Leach <[email protected]> Signed-off-by: James Clark <[email protected]> Signed-off-by: Suzuki K Poulose <[email protected]> Link: https://lore.kernel.org/r/[email protected] (cherry picked from commit 988d40a) Signed-off-by: Carol L Soto <[email protected]>

clsotog added 30 commits December 9, 2024 12:06

clsotog added 3 commits December 17, 2024 16:07

nvidia-bfigg force-pushed the 24.04_linux-nvidia branch from 5dfc765 to 5ac091f Compare December 19, 2024 16:00
