Shuffle gpu serde #28

Open · wants to merge 25 commits into base: 0625

Changes from all commits · 25 commits
3a984f2
Support serializing packed tables directly for shuffle write
firestarman May 6, 2024
baadb4b
Disable GPU serde for the AQE tests
firestarman May 15, 2024
11e933d
Disable by default
firestarman May 16, 2024
6e8bb5c
Fix a build error
firestarman May 17, 2024
d6082ae
Address comments
firestarman May 20, 2024
0419224
Merge branch 'branch-24.06' of github.com:NVIDIA/spark-rapids into sh…
firestarman May 20, 2024
99820e1
Support buffering small tables for Shuffle read
firestarman May 27, 2024
9727161
Merge remote-tracking branch 'NVDA/branch-24.08' into shuffle-gpu-serde
firestarman May 28, 2024
1bb4cfc
Moving split batches to host by a single copying
firestarman May 29, 2024
b3b5b5e
Add GpuBucketingUtils shim to Spark 4.0.0 (#11092)
razajafri Jun 25, 2024
6455396
Improve the diagnostics for 'conv' fallback explain (#11076)
jihoonson Jun 25, 2024
34e6bc8
Disable ANSI mode for window function tests [databricks] (#11073)
mythrocks Jun 26, 2024
3cb54c4
Fix some test issues in Spark UT and keep RapidsTestSettings update-t…
thirtiseven Jun 27, 2024
9dafc54
exclude a case based on JDK version (#11083)
thirtiseven Jun 27, 2024
3b6c5cd
Replaced spark3xx-common references to spark-shared [databricks] (#11…
razajafri Jun 28, 2024
7dc52bc
Fixed some cast_tests (#11049)
razajafri Jun 28, 2024
dd62000
Fixed array_tests for Spark 4.0.0 [databricks] (#11048)
razajafri Jun 28, 2024
f954026
Add a heuristic to skip second or third agg pass (#10950)
binmahone Jun 29, 2024
2498204
Support regex patterns with brackets when rewriting to PrefixRange pa…
thirtiseven Jun 29, 2024
f56fe2c
Fix match error in RapidsShuffleIterator.scala [scala2.13] (#11115)
xieshuaihu Jul 1, 2024
850365c
Spark 4: Handle ANSI mode in sort_test.py (#11099)
mythrocks Jul 1, 2024
9bb295a
Introduce LORE framework. (#11084)
liurenjie1024 Jul 2, 2024
b52038e
Merge branch 'branch-24.08' into shuffle-gpu-serde
firestarman Jul 2, 2024
b4ea48f
d1
firestarman Jul 4, 2024
8c4b318
retry when copying data to host for merged buffers
firestarman Jul 12, 2024
6 changes: 3 additions & 3 deletions build/coverage-report
@@ -1,7 +1,7 @@
#!/bin/bash

#
# Copyright (c) 2020-2022, NVIDIA CORPORATION. All rights reserved.
# Copyright (c) 2020-2024, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -33,11 +33,11 @@ SOURCE_WITH_ARGS="--sourcefiles "$(echo $SOURCE_DIRS | sed -e 's/:/ --sourcefile
rm -rf "$TMP_CLASS"
mkdir -p "$TMP_CLASS"
pushd "$TMP_CLASS"
jar xf "$DIST_JAR" com org rapids spark3xx-common "spark${SPK_VER}/"
jar xf "$DIST_JAR" com org rapids spark-shared "spark${SPK_VER}/"
# extract the .class files in udf jar and replace the existing ones in spark3xx-common and spark$SPK_VER
# because the class files in udf jar will be modified in aggregator's shade phase
jar xf "$UDF_JAR" com/nvidia/spark/udf
rm -rf com/nvidia/shaded/ org/openucx/ spark3xx-common/com/nvidia/spark/udf/ spark${SPK_VER}/com/nvidia/spark/udf/
rm -rf com/nvidia/shaded/ org/openucx/ spark-shared/com/nvidia/spark/udf/ spark${SPK_VER}/com/nvidia/spark/udf/
popd

if [ ! -f "$JDEST" ]; then
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2023, NVIDIA CORPORATION.
* Copyright (c) 2023-2024, NVIDIA CORPORATION.
*
* This file was derived from OptimizeWriteExchange.scala
* in the Delta Lake project at https://github.com/delta-io/delta
@@ -26,7 +26,7 @@ import scala.concurrent.Future
import scala.concurrent.duration.Duration

import com.databricks.sql.transaction.tahoe.sources.DeltaSQLConf
import com.nvidia.spark.rapids.{GpuColumnarBatchSerializer, GpuExec, GpuMetric, GpuPartitioning, GpuRoundRobinPartitioning}
import com.nvidia.spark.rapids.{GpuColumnarBatchSerializer, GpuExec, GpuMetric, GpuPartitioning, GpuRoundRobinPartitioning, RapidsConf}
import com.nvidia.spark.rapids.delta.RapidsDeltaSQLConf

import org.apache.spark.{MapOutputStatistics, ShuffleDependency}
@@ -39,6 +39,7 @@ import org.apache.spark.sql.execution.{CoalescedPartitionSpec, ShufflePartitionS
import org.apache.spark.sql.execution.exchange.Exchange
import org.apache.spark.sql.execution.metric.{SQLMetrics, SQLShuffleReadMetricsReporter, SQLShuffleWriteMetricsReporter}
import org.apache.spark.sql.rapids.execution.{GpuShuffleExchangeExecBase, ShuffledBatchRDD}
import org.apache.spark.sql.types.DataType
import org.apache.spark.sql.vectorized.ColumnarBatch
import org.apache.spark.util.ThreadUtils

@@ -84,6 +85,8 @@ case class GpuOptimizeWriteExchangeExec(
createNanoTimingMetric(DEBUG_LEVEL, "rs. shuffle combine time"),
"rapidsShuffleWriteIoTime" ->
createNanoTimingMetric(DEBUG_LEVEL, "rs. shuffle write io time"),
"rapidsShufflePartitionTime" ->
createNanoTimingMetric(DEBUG_LEVEL, "rs. shuffle partition time"),
"rapidsShuffleReadTime" ->
createNanoTimingMetric(ESSENTIAL_LEVEL, "rs. shuffle read time")
) ++ GpuMetric.wrap(readMetrics) ++ GpuMetric.wrap(writeMetrics)
@@ -97,8 +100,12 @@ case class GpuOptimizeWriteExchangeExec(
) ++ additionalMetrics
}

private lazy val serializer: Serializer =
new GpuColumnarBatchSerializer(gpuLongMetric("dataSize"))
private lazy val sparkTypes: Array[DataType] = child.output.map(_.dataType).toArray

private lazy val serializer: Serializer = new GpuColumnarBatchSerializer(
gpuLongMetric("dataSize"), allMetrics("rapidsShuffleSerializationTime"),
allMetrics("rapidsShuffleDeserializationTime"), partitioning.serdeOnGPU,
sparkTypes, new RapidsConf(conf).gpuTargetBatchSizeBytes)

@transient lazy val inputRDD: RDD[ColumnarBatch] = child.executeColumnar()

@@ -116,7 +123,7 @@ case class GpuOptimizeWriteExchangeExec(
inputRDD,
child.output,
partitioning,
child.output.map(_.dataType).toArray,
sparkTypes,
serializer,
useGPUShuffle=partitioning.usesGPUShuffle,
useMultiThreadedShuffle=partitioning.usesMultiThreadedShuffle,
4 changes: 4 additions & 0 deletions delta-lake/delta-20x/pom.xml
@@ -39,6 +39,10 @@
</properties>

<dependencies>
<dependency>
<groupId>org.roaringbitmap</groupId>
<artifactId>RoaringBitmap</artifactId>
</dependency>
<dependency>
<groupId>com.nvidia</groupId>
<artifactId>rapids-4-spark-sql_${scala.binary.version}</artifactId>
4 changes: 4 additions & 0 deletions delta-lake/delta-21x/pom.xml
@@ -39,6 +39,10 @@
</properties>

<dependencies>
<dependency>
<groupId>org.roaringbitmap</groupId>
<artifactId>RoaringBitmap</artifactId>
</dependency>
<dependency>
<groupId>com.nvidia</groupId>
<artifactId>rapids-4-spark-sql_${scala.binary.version}</artifactId>
4 changes: 4 additions & 0 deletions delta-lake/delta-22x/pom.xml
@@ -39,6 +39,10 @@
</properties>

<dependencies>
<dependency>
<groupId>org.roaringbitmap</groupId>
<artifactId>RoaringBitmap</artifactId>
</dependency>
<dependency>
<groupId>com.nvidia</groupId>
<artifactId>rapids-4-spark-sql_${scala.binary.version}</artifactId>
4 changes: 4 additions & 0 deletions delta-lake/delta-23x/pom.xml
@@ -39,6 +39,10 @@
</properties>

<dependencies>
<dependency>
<groupId>org.roaringbitmap</groupId>
<artifactId>RoaringBitmap</artifactId>
</dependency>
<dependency>
<groupId>com.nvidia</groupId>
<artifactId>rapids-4-spark-sql_${scala.binary.version}</artifactId>
4 changes: 4 additions & 0 deletions delta-lake/delta-24x/pom.xml
@@ -39,6 +39,10 @@
</properties>

<dependencies>
<dependency>
<groupId>org.roaringbitmap</groupId>
<artifactId>RoaringBitmap</artifactId>
</dependency>
<dependency>
<groupId>com.nvidia</groupId>
<artifactId>rapids-4-spark-sql_${scala.binary.version}</artifactId>
4 changes: 4 additions & 0 deletions delta-lake/delta-spark330db/pom.xml
@@ -39,6 +39,10 @@
</properties>

<dependencies>
<dependency>
<groupId>org.roaringbitmap</groupId>
<artifactId>RoaringBitmap</artifactId>
</dependency>
<dependency>
<groupId>com.nvidia</groupId>
<artifactId>rapids-4-spark-sql_${scala.binary.version}</artifactId>
4 changes: 4 additions & 0 deletions delta-lake/delta-spark332db/pom.xml
@@ -39,6 +39,10 @@
</properties>

<dependencies>
<dependency>
<groupId>org.roaringbitmap</groupId>
<artifactId>RoaringBitmap</artifactId>
</dependency>
<dependency>
<groupId>com.nvidia</groupId>
<artifactId>rapids-4-spark-sql_${scala.binary.version}</artifactId>
4 changes: 4 additions & 0 deletions delta-lake/delta-spark341db/pom.xml
@@ -38,6 +38,10 @@
</properties>

<dependencies>
<dependency>
<groupId>org.roaringbitmap</groupId>
<artifactId>RoaringBitmap</artifactId>
</dependency>
<dependency>
<groupId>com.nvidia</groupId>
<artifactId>rapids-4-spark-sql_${scala.binary.version}</artifactId>
5 changes: 0 additions & 5 deletions dist/scripts/binary-dedupe.sh
@@ -127,11 +127,6 @@ done

mv "$SPARK_SHARED_DIR" parallel-world/

# TODO further dedupe by FEATURE version lines:
# spark30x-common
# spark31x-common
# spark32x-common

# Verify that all class files in the conventional jar location are bitwise
# identical regardless of the Spark-version-specific jar.
#
3 changes: 3 additions & 0 deletions docs/additional-functionality/advanced_configs.md
@@ -60,6 +60,7 @@ Name | Description | Default Value | Applicable at
<a name="shuffle.ucx.activeMessages.forceRndv"></a>spark.rapids.shuffle.ucx.activeMessages.forceRndv|Set to true to force 'rndv' mode for all UCX Active Messages. This should only be required with UCX 1.10.x. UCX 1.11.x deployments should set to false.|false|Startup
<a name="shuffle.ucx.managementServerHost"></a>spark.rapids.shuffle.ucx.managementServerHost|The host to be used to start the management server|null|Startup
<a name="shuffle.ucx.useWakeup"></a>spark.rapids.shuffle.ucx.useWakeup|When set to true, use UCX's event-based progress (epoll) in order to wake up the progress thread when needed, instead of a hot loop.|true|Startup
<a name="sql.agg.skipAggPassReductionRatio"></a>spark.rapids.sql.agg.skipAggPassReductionRatio|In non-final aggregation stages, if the previous pass has a row reduction ratio greater than this value, the next aggregation pass will be skipped. Setting this to 1 essentially disables this feature.|1.0|Runtime
<a name="sql.allowMultipleJars"></a>spark.rapids.sql.allowMultipleJars|Allow multiple rapids-4-spark, spark-rapids-jni, and cudf jars on the classpath. Spark will take the first one it finds, so the version may not be expected. Possible values are ALWAYS: allow all jars, SAME_REVISION: only allow jars with the same revision, NEVER: do not allow multiple jars at all.|SAME_REVISION|Startup
<a name="sql.castDecimalToFloat.enabled"></a>spark.rapids.sql.castDecimalToFloat.enabled|Casting from decimal to floating point types on the GPU returns results that have tiny difference compared to results returned from CPU.|true|Runtime
<a name="sql.castFloatToDecimal.enabled"></a>spark.rapids.sql.castFloatToDecimal.enabled|Casting from floating point types to decimal on the GPU returns results that have tiny difference compared to results returned from CPU.|true|Runtime
@@ -135,6 +136,8 @@ Name | Description | Default Value | Applicable at
<a name="sql.json.read.decimal.enabled"></a>spark.rapids.sql.json.read.decimal.enabled|When reading a quoted string as a decimal Spark supports reading non-ascii unicode digits, and the RAPIDS Accelerator does not.|true|Runtime
<a name="sql.json.read.double.enabled"></a>spark.rapids.sql.json.read.double.enabled|JSON reading is not 100% compatible when reading doubles.|true|Runtime
<a name="sql.json.read.float.enabled"></a>spark.rapids.sql.json.read.float.enabled|JSON reading is not 100% compatible when reading floats.|true|Runtime
<a name="sql.lore.dumpPath"></a>spark.rapids.sql.lore.dumpPath|The path to dump the LORE nodes' input data. This must be set if spark.rapids.sql.lore.idsToDump has been set. The data of each LORE node will be dumped to a subfolder with name 'loreId-<LORE id>' under this path. For more details, please refer to [the LORE documentation](../dev/lore.md).|None|Runtime
<a name="sql.lore.idsToDump"></a>spark.rapids.sql.lore.idsToDump|Specify the LORE ids of operators to dump. The format is a comma separated list of LORE ids. For example: "1[0]" will dump partition 0 of input of gpu operator with lore id 1. For more details, please refer to [the LORE documentation](../dev/lore.md). If this is not set, no data will be dumped.|None|Runtime
<a name="sql.mode"></a>spark.rapids.sql.mode|Set the mode for the Rapids Accelerator. The supported modes are explainOnly and executeOnGPU. This config cannot be changed at runtime; you must restart the application for it to take effect. The default mode is executeOnGPU, which means the RAPIDS Accelerator plugin converts the Spark operations and executes them on the GPU when possible. The explainOnly mode allows running queries on the CPU and the RAPIDS Accelerator will evaluate the queries as if it was going to run on the GPU. The explanations of what would have run on the GPU and why are output in log messages. When using explainOnly mode, the default explain output is ALL; this can be changed by setting spark.rapids.sql.explain. See that config for more details.|executeongpu|Startup
<a name="sql.optimizer.joinReorder.enabled"></a>spark.rapids.sql.optimizer.joinReorder.enabled|When enabled, joins may be reordered for improved query performance|true|Runtime
<a name="sql.python.gpu.enabled"></a>spark.rapids.sql.python.gpu.enabled|This is an experimental feature and is likely to change in the future. Enable (true) or disable (false) support for scheduling Python Pandas UDFs with GPU resources. When enabled, pandas UDFs are assumed to share the same GPU that the RAPIDs accelerator uses and will honor the python GPU configs|false|Runtime
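The `skipAggPassReductionRatio` heuristic above can be sketched as follows. This is an illustrative sketch only — the function name and signature are hypothetical, not the plugin's actual implementation:

```python
# Hypothetical sketch of the skipAggPassReductionRatio heuristic described
# above; the real logic lives in the plugin's aggregation code.
def should_skip_next_agg_pass(rows_in, rows_out, skip_ratio=1.0):
    """Return True when the previous non-final aggregation pass reduced rows
    so poorly (rows_out / rows_in above the threshold) that the next pass
    should be skipped. The default of 1.0 disables skipping, because the
    retention ratio of an aggregation pass can never exceed 1.0."""
    return rows_in > 0 and (rows_out / rows_in) > skip_ratio
```

With a threshold of 0.9, a pass that keeps 95 of 100 rows triggers the skip, while a pass that keeps 50 of 100 rows does not.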
70 changes: 70 additions & 0 deletions docs/dev/lore.md
@@ -0,0 +1,70 @@
---
layout: page
title: The Local Replay Framework
nav_order: 13
parent: Developer Overview
---

# Local Replay Framework

## Overview

LORE (the Local Replay Framework) is a tool that allows developers to replay the execution of a
GPU operator in a local environment, so that they can debug and profile the operator for
performance analysis. At a high level it works as follows:

1. Each GPU operator is assigned a LORE id, a unique identifier for the operator. This id is
guaranteed to be unique within the same query, and to be the same when two SQL executions
have the same SQL, the same configuration, and the same data.
2. In the first run of the query, developers can find the LORE id of the operator they are
interested in by checking the Spark UI, where the LORE id usually appears in the operator's
arguments.
3. In the second run of the query, developers configure the LORE ids of the operators they are
interested in, and LORE dumps the input data of those operators to the given path.
4. Developers can then copy the dumped data to a local environment and replay the operator
there.

## Configuration

By default, a LORE id is always generated for each operator, but users can disable this
behavior by setting `spark.rapids.sql.lore.tag.enabled` to `false`.

To tell LORE which operators you are interested in, set `spark.rapids.sql.lore.idsToDump`. For
example, you could set it to "1[*], 2[*], 3[*]" to tell LORE to dump all partitions of the
input data of the operators with LORE id 1, 2, or 3. You can also dump only some partitions of
an operator's input by appending partition numbers to the LORE ids. For example,
"1[0 4-6 7], 2[*]" tells LORE to dump the operator with LORE id 1, but only partitions 0, 4, 5,
6, and 7; for the operator with LORE id 2, it dumps all partitions.
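The id-list format can be illustrated with a small parser. This is a hypothetical sketch for illustration only, not the plugin's actual parsing code:

```python
# Hypothetical sketch: interpret an idsToDump spec such as "1[0 4-6 7], 2[*]".
def parse_ids_to_dump(spec):
    """Map each LORE id to the set of partitions to dump, or "*" for all."""
    result = {}
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue
        lore_id, _, parts = entry.partition("[")
        parts = parts.rstrip("]")
        if parts == "*":
            result[int(lore_id)] = "*"
        else:
            partitions = set()
            for token in parts.split():
                if "-" in token:  # a range such as 4-6
                    lo, hi = token.split("-")
                    partitions.update(range(int(lo), int(hi) + 1))
                else:
                    partitions.add(int(token))
            result[int(lore_id)] = partitions
    return result
```

For instance, `parse_ids_to_dump("1[0 4-6 7], 2[*]")` yields `{1: {0, 4, 5, 6, 7}, 2: "*"}`.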

You also need to set `spark.rapids.sql.lore.dumpPath` to tell LORE where to dump the data; its
value should point to a directory. All dumped data of a query lives in this directory. A
typical directory hierarchy looks like this:

```console
+ loreId-10/
- plan.meta
+ input-0/
- rdd.meta
+ partition-0/
- partition.meta
- batch-0.parquet
- batch-1.parquet
+ partition-1/
- partition.meta
- batch-0.parquet
+ input-1/
- rdd.meta
+ partition-0/
- partition.meta
- batch-0.parquet
- batch-1.parquet

+ loreId-15/
- plan.meta
+ input-0/
- rdd.meta
+ partition-0/
- partition.meta
- batch-0.parquet
```
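Given the layout above, replay tooling only needs to walk the directory tree to find the dumped batches. A minimal sketch (a hypothetical helper, not part of LORE itself):

```python
from pathlib import Path

# Hypothetical helper: group dumped batch files by LORE id, input, and
# partition, following the directory layout shown above.
def list_dumped_batches(dump_root):
    layout = {}
    pattern = "loreId-*/input-*/partition-*/batch-*.parquet"
    for batch in sorted(Path(dump_root).glob(pattern)):
        part = batch.parent    # partition-N directory
        rdd = part.parent      # input-N directory
        lore = rdd.parent      # loreId-N directory
        layout.setdefault(lore.name, {}) \
              .setdefault(rdd.name, {}) \
              .setdefault(part.name, []).append(batch.name)
    return layout
```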


30 changes: 15 additions & 15 deletions docs/dev/shims.md
@@ -69,15 +69,15 @@ Spark 3.0.2's URLs:

```text
jar:file:/home/spark/rapids-4-spark_2.12-24.08.0.jar!/
jar:file:/home/spark/rapids-4-spark_2.12-24.08.0.jar!/spark3xx-common/
jar:file:/home/spark/rapids-4-spark_2.12-24.08.0.jar!/spark-shared/
jar:file:/home/spark/rapids-4-spark_2.12-24.08.0.jar!/spark302/
```

Spark 3.2.0's URLs:

```text
jar:file:/home/spark/rapids-4-spark_2.12-24.08.0.jar!/
jar:file:/home/spark/rapids-4-spark_2.12-24.08.0.jar!/spark3xx-common/
jar:file:/home/spark/rapids-4-spark_2.12-24.08.0.jar!/spark-shared/
jar:file:/home/spark/rapids-4-spark_2.12-24.08.0.jar!/spark320/
```

@@ -143,7 +143,7 @@ This has two pre-requisites:

1. The .class file with the bytecode is bitwise-identical among the currently
supported Spark versions. To verify this you can inspect the dist jar and check
if the class file is under `spark3xx-common` jar entry. If this is not the case then
if the class file is under `spark-shared` jar entry. If this is not the case then
code should be refactored until all discrepancies are shimmed away.
1. The transitive closure of the classes compile-time-referenced by `A` should
have the property above.
@@ -181,28 +181,28 @@ mv org com ai public/
and you will see the dependencies of `public` classes. By design `public` classes
should have edges only to other `public` classes in the dist jar.

Execute `jdeps` against `public`, `spark3xx-common` and an *exactly one* parallel
Execute `jdeps` against `public`, `spark-shared` and an *exactly one* parallel
world such as `spark330`

```bash
${JAVA_HOME}/bin/jdeps -v \
-dotoutput /tmp/jdeps330 \
-regex '(com|org)\..*\.rapids\..*' \
public spark3xx-common spark330
public spark-shared spark330
```

This will produce three DOT files for each "archive" with directed edges for
a class in the archive to a class either in this or another archive.

Looking at an output file, e.g. `/tmp/jdeps330/spark3xx-common.dot`,
Looking at an output file, e.g. `/tmp/jdeps330/spark-shared.dot`,
unfortunately you see that jdeps does not label the source class node but labels
the target class node of an edge. Thus the graph is incorrect as it breaks paths
if a node has both incoming and outgoing edges.

```bash
$ grep 'com.nvidia.spark.rapids.GpuFilterExec\$' spark3xx-common.dot
$ grep 'com.nvidia.spark.rapids.GpuFilterExec\$' spark-shared.dot
"com.nvidia.spark.rapids.GpuFilterExec$" -> "com.nvidia.spark.rapids.GpuFilterExec (spark330)";
"com.nvidia.spark.rapids.GpuOverrides$$anon$204" -> "com.nvidia.spark.rapids.GpuFilterExec$ (spark3xx-common)";
"com.nvidia.spark.rapids.GpuOverrides$$anon$204" -> "com.nvidia.spark.rapids.GpuFilterExec$ (spark-shared)";
```

So first create and `cd` to some other directory `/tmp/jdep330.processed` to massage
@@ -214,8 +214,8 @@ that the source nodes are guaranteed to be from the `<archive>`.
```bash
sed 's/"\([^(]*\)"\(\s*->.*;\)/"\1 (public)"\2/' \
/tmp/jdeps330/public.dot > public.dot
sed 's/"\([^(]*\)"\(\s*->.*;\)/"\1 (spark3xx-common)"\2/' \
/tmp/jdeps330/spark3xx-common.dot > spark3xx-common.dot
sed 's/"\([^(]*\)"\(\s*->.*;\)/"\1 (spark-shared)"\2/' \
/tmp/jdeps330/spark-shared.dot > spark-shared.dot
sed 's/"\([^(]*\)"\(\s*->.*;\)/"\1 (spark330)"\2/' \
/tmp/jdeps330/spark330.dot > spark330.dot
```
@@ -224,7 +224,7 @@ Next you need to union edges of all three graphs into a single graph to be able
to analyze cross-archive paths.

```bash
cat public.dot spark3xx-common.dot spark330.dot | \
cat public.dot spark-shared.dot spark330.dot | \
tr '\n' '\r' | \
sed 's/}\rdigraph "[^"]*" {\r//g' | \
tr '\r' '\n' > merged.dot
```
@@ -245,7 +245,7 @@ GpuTypeColumnVector needs refactoring prior externalization as of the time
of this writing:

```bash
$ dijkstra -d -p "com.nvidia.spark.rapids.GpuColumnVector (spark3xx-common)" merged.dot | \
$ dijkstra -d -p "com.nvidia.spark.rapids.GpuColumnVector (spark-shared)" merged.dot | \
grep '\[dist=' | grep '(spark330)'
"org.apache.spark.sql.rapids.GpuFileSourceScanExec (spark330)" [dist=5.000,
"com.nvidia.spark.rapids.GpuExec (spark330)" [dist=3.000,
```
@@ -255,9 +255,9 @@ $ dijkstra -d -p "com.nvidia.spark.rapids.GpuColumnVector (spark3xx-common)" mer
RegexReplace could be externalized safely:

```bash
$ dijkstra -d -p "org.apache.spark.sql.rapids.RegexReplace (spark3xx-common)" merged.dot | grep '\[dist='
"org.apache.spark.sql.rapids.RegexReplace (spark3xx-common)" [dist=0.000];
"org.apache.spark.sql.rapids.RegexReplace$ (spark3xx-common)" [dist=1.000,
$ dijkstra -d -p "org.apache.spark.sql.rapids.RegexReplace (spark-shared)" merged.dot | grep '\[dist='
"org.apache.spark.sql.rapids.RegexReplace (spark-shared)" [dist=0.000];
"org.apache.spark.sql.rapids.RegexReplace$ (spark-shared)" [dist=1.000,
```

because it is self-contained.