Skip to content

Commit

Permalink
Implement target distance metrics (#230)
Browse files Browse the repository at this point in the history
* refactor: Update hashing logic to use TargetHash and TargetDigest

The new fields in TargetHash and TargetDigest are to track the direct srcs/attrs hash separately from the overall hash. Currently, the directHash is not used at all.

* Update RuleHasher to track separate digests

* Add ability to compute target distance metrics

* Add ability to dump dependency edges to json file.

* Update get-impacted-targets

Add e2e test for dump distances

TODO: It would be nice if there was a better e2e test here -- Ask
Maxwell how he generates the integration workspace data.

* Update documentation

* Move target type filtering to happen after impacted targets are computed

* Address review feedback

1. Output detailed hash data as json
2. Use a different serialization format for targethashes to avoid
   ambiguity

* Fix typo in variable name

* Add test workspace and e2e test

* remove printlns

* Missing bazelignore

* Remove e2e test that is duplicated by other one

* Update BUILD
  • Loading branch information
alex-torok authored Oct 15, 2024
1 parent 3fffd5f commit 43ac136
Show file tree
Hide file tree
Showing 33 changed files with 1,003 additions and 121 deletions.
1 change: 1 addition & 0 deletions .bazelignore
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
cli/src/test/resources/workspaces
47 changes: 47 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -61,6 +61,46 @@ Open `bazel-diff-example.sh` to see how this is implemented. This is purely an e

* We run `bazel-diff` on the starting and final JSON hash filepaths to get our impacted set of targets. This impacted set of targets is written to a file.

## Build Graph Distance Metrics

`bazel-diff` can optionally compute build graph distance metrics between two revisions. This is
useful for understanding the impact of a change on the build graph. Directly impacted targets are
targets that have had their rule attributes or source file dependencies changed. Indirectly impacted
targets are that are impacted only due to a change in one of their target dependencies.

For each target, the following metrics are computed:

* `target_distance`: The number of dependency hops that it takes to get from an impacted target to a directly impacted target.
* `package_distance`: The number of dependency hops that cross a package boundary to get from an impacted target to a directly impacted target.

Build graph distance metrics can be used by downstream tools to power features such as:

* Only running sanitizers on impacted tests that are in the same package as a directly impacted target.
* Only running large-sized tests that are within a few package hops of a directly impacted target.
* Only running computationally expensive jobs when an impacted target is within a certain distance of a directly impacted target.

To enable this feature, you must generate a dependency mapping on your final revision when computing hashes, then pass it into the `get-impacted-targets` command.

```bash
git checkout BASE_REV
bazel-diff generate-hashes [...]

git checkout FINAL_REV
bazel-diff generate-hashes --depsFile deps.json [...]

bazel-diff get-impacted-targets --depsFile deps.json [...]
```

This will produce an impacted targets json list with target label, target distance, and package distance:

```text
[
{"label": "//foo:bar", "targetDistance": 0, "packageDistance": 0},
{"label": "//foo:baz", "targetDistance": 1, "packageDistance": 0},
{"label": "//bar:qux", "targetDistance": 1, "packageDistance": 1}
]
```

## CLI Interface

`bazel-diff` Command
Expand Down Expand Up @@ -355,6 +395,13 @@ Now you can simply run `bazel-diff` from your project:
bazel run @bazel_diff//:bazel-diff -- bazel-diff -h
```

## Learn More

Take a look at the following bazelcon talks to learn more about `bazel-diff`:

* [BazelCon 2023: Improving CI efficiency with Bazel querying and bazel-diff](https://www.youtube.com/watch?v=QYAbmE_1fSo)
* BazelCon 2024: Not Going the Distance: Filtering Tests by Build Graph Distance: Coming Soon

## Running the tests

To run the tests simply run
Expand Down
16 changes: 16 additions & 0 deletions cli/BUILD
Original file line number Diff line number Diff line change
Expand Up @@ -51,6 +51,14 @@ kt_jvm_test(
runtime_deps = [":cli-test-lib"],
)

kt_jvm_test(
name = "TargetHashTest",
jvm_flags = ["-Djava.security.manager=allow"],
test_class = "com.bazel_diff.hash.TargetHashTest",
runtime_deps = [":cli-test-lib"],
)


kt_jvm_test(
name = "SourceFileHasherTest",
data = [
Expand Down Expand Up @@ -101,6 +109,7 @@ kt_jvm_test(
jvm_flags = ["-Djava.security.manager=allow"],
test_class = "com.bazel_diff.e2e.E2ETest",
runtime_deps = [":cli-test-lib"],
data = [":workspaces"],
)

kt_jvm_test(
Expand Down Expand Up @@ -130,3 +139,10 @@ kt_jvm_library(
"@bazel_diff_maven//:org_mockito_kotlin_mockito_kotlin",
],
)

filegroup(
name = "workspaces",
srcs = [
"src/test/resources/workspaces",
],
)
11 changes: 10 additions & 1 deletion cli/src/main/kotlin/com/bazel_diff/cli/GenerateHashesCommand.kt
Original file line number Diff line number Diff line change
Expand Up @@ -136,6 +136,14 @@ class GenerateHashesCommand : Callable<Int> {
)
var ignoredRuleHashingAttributes: Set<String> = emptySet()

@CommandLine.Option(
names = ["-d", "--depEdgesFile"],
description = ["Path to the file where dependency edges are written to. If not specified, the dependency edges will not be written to a file. Needed for computing build graph distance metrics. See bazel-diff docs for more details about build graph distance metrics."],
scope = CommandLine.ScopeType.INHERIT,
defaultValue = CommandLine.Parameters.NULL_VALUE
)
var depsMappingJSONPath: File? = null

@CommandLine.Option(
names = ["-m", "--modified-filepaths"],
description = ["Experimental: A text file containing a newline separated list of filepaths (relative to the workspace) these filepaths should represent the modified files between the specified revisions and will be used to scope what files are hashed during hash generation."]
Expand All @@ -159,14 +167,15 @@ class GenerateHashesCommand : Callable<Int> {
cqueryCommandOptions,
useCquery,
keepGoing,
depsMappingJSONPath != null,
fineGrainedHashExternalRepos,
),
loggingModule(parent.verbose),
serialisationModule(),
)
}

return when (GenerateHashesInteractor().execute(seedFilepaths, outputPath, ignoredRuleHashingAttributes, targetType, includeTargetType, modifiedFilepaths)) {
return when (GenerateHashesInteractor().execute(seedFilepaths, outputPath, depsMappingJSONPath, ignoredRuleHashingAttributes, targetType, includeTargetType, modifiedFilepaths)) {
true -> CommandLine.ExitCode.OK
false -> CommandLine.ExitCode.SOFTWARE
}.also { stopKoin() }
Expand Down
40 changes: 27 additions & 13 deletions cli/src/main/kotlin/com/bazel_diff/cli/GetImpactedTargetsCommand.kt
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@ import com.bazel_diff.di.loggingModule
import com.bazel_diff.di.serialisationModule
import com.bazel_diff.interactor.CalculateImpactedTargetsInteractor
import com.bazel_diff.interactor.DeserialiseHashesInteractor
import com.bazel_diff.interactor.TargetTypeFilter
import org.koin.core.context.startKoin
import org.koin.core.context.stopKoin
import picocli.CommandLine
Expand Down Expand Up @@ -38,6 +39,14 @@ class GetImpactedTargetsCommand : Callable<Int> {
)
lateinit var finalHashesJSONPath: File

@CommandLine.Option(
names = ["-d", "--depEdgesFile"],
description = ["Path to the file where dependency edges are. If specified, build graph distance metrics will be computed from the given hash data."],
scope = CommandLine.ScopeType.INHERIT,
defaultValue = CommandLine.Parameters.NULL_VALUE
)
var depsMappingJSONPath: File? = null

@CommandLine.Option(
names = ["-tt", "--targetType"],
split = ",",
Expand All @@ -49,7 +58,7 @@ class GetImpactedTargetsCommand : Callable<Int> {
@CommandLine.Option(
names = ["-o", "--output"],
scope = CommandLine.ScopeType.LOCAL,
description = ["Filepath to write the impacted Bazel targets to, newline separated. If not specified, the targets will be written to STDOUT."],
description = ["Filepath to write the impacted Bazel targets to. If using depEdgesFile: formatted in json, otherwise: newline separated. If not specified, the output will be written to STDOUT."],
)
var outputPath: File? = null

Expand All @@ -66,21 +75,20 @@ class GetImpactedTargetsCommand : Callable<Int> {

validate()
val deserialiser = DeserialiseHashesInteractor()
val from = deserialiser.execute(startingHashesJSONPath, targetType)
val to = deserialiser.execute(finalHashesJSONPath, targetType)
val from = deserialiser.executeTargetHash(startingHashesJSONPath)
val to = deserialiser.executeTargetHash(finalHashesJSONPath)

val impactedTargets = CalculateImpactedTargetsInteractor().execute(from, to)

return try {
BufferedWriter(when (val path=outputPath) {
val outputWriter = BufferedWriter(when (val path = outputPath) {
null -> FileWriter(FileDescriptor.out)
else -> FileWriter(path)
}).use { writer ->
impactedTargets.forEach {
writer.write(it)
//Should not depend on OS
writer.write("\n")
}
})

return try {
if (depsMappingJSONPath != null) {
val depsMapping = deserialiser.deserializeDeps(depsMappingJSONPath!!)
CalculateImpactedTargetsInteractor().executeWithDistances(from, to, depsMapping, outputWriter, targetType)
} else {
CalculateImpactedTargetsInteractor().execute(from, to, outputWriter, targetType)
}
CommandLine.ExitCode.OK
} catch (e: IOException) {
Expand All @@ -101,5 +109,11 @@ class GetImpactedTargetsCommand : Callable<Int> {
"Incorrect final hashes: file doesn't exist or can't be read."
)
}
if (depsMappingJSONPath != null && !depsMappingJSONPath!!.canRead()) {
throw CommandLine.ParameterException(
spec.commandLine(),
"Incorrect dep edges file: file doesn't exist or can't be read."
)
}
}
}
5 changes: 3 additions & 2 deletions cli/src/main/kotlin/com/bazel_diff/di/Modules.kt
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,7 @@ fun hasherModule(
cqueryOptions: List<String>,
useCquery: Boolean,
keepGoing: Boolean,
trackDeps: Boolean,
fineGrainedHashExternalRepos: Set<String>,
): Module = module {
val cmd: MutableList<String> = ArrayList<String>().apply {
Expand Down Expand Up @@ -61,8 +62,8 @@ fun hasherModule(
single { BazelClient(useCquery, fineGrainedHashExternalRepos) }
single { BuildGraphHasher(get()) }
single { TargetHasher() }
single { RuleHasher(useCquery, fineGrainedHashExternalRepos) }
single { SourceFileHasher(fineGrainedHashExternalRepos) }
single { RuleHasher(useCquery, trackDeps, fineGrainedHashExternalRepos) }
single<SourceFileHasher> { SourceFileHasherImpl(fineGrainedHashExternalRepos) }
single { ExternalRepoResolver(workingDirectory, bazelPath, outputPath) }
single(named("working-directory")) { workingDirectory }
single(named("output-base")) { outputPath }
Expand Down
9 changes: 7 additions & 2 deletions cli/src/main/kotlin/com/bazel_diff/hash/BuildGraphHasher.kt
Original file line number Diff line number Diff line change
Expand Up @@ -99,7 +99,7 @@ class BuildGraphHasher(private val bazelClient: BazelClient) : KoinComponent {
ignoredAttrs: Set<String>,
modifiedFilepaths: Set<Path>
): Map<String, TargetHash> {
val ruleHashes: ConcurrentMap<String, ByteArray> = ConcurrentHashMap()
val ruleHashes: ConcurrentMap<String, TargetDigest> = ConcurrentHashMap()
val targetToRule: MutableMap<String, BazelRule> = HashMap()
traverseGraph(allTargets, targetToRule)

Expand All @@ -114,7 +114,12 @@ class BuildGraphHasher(private val bazelClient: BazelClient) : KoinComponent {
ignoredAttrs,
modifiedFilepaths
)
Pair(target.name, TargetHash(target.javaClass.name.substringAfterLast('$'), targetDigest.toHexString()))
Pair(target.name, TargetHash(
target.javaClass.name.substringAfterLast('$'),
targetDigest.overallDigest.toHexString(),
targetDigest.directDigest.toHexString(),
targetDigest.deps,
))
}
.filter { targetEntry: Pair<String, TargetHash>? -> targetEntry != null }
.collect(
Expand Down
20 changes: 10 additions & 10 deletions cli/src/main/kotlin/com/bazel_diff/hash/RuleHasher.kt
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@ import org.koin.core.component.inject
import java.util.concurrent.ConcurrentMap
import java.nio.file.Path

class RuleHasher(private val useCquery: Boolean, private val fineGrainedHashExternalRepos: Set<String>) : KoinComponent {
class RuleHasher(private val useCquery: Boolean, private val trackDepLabels: Boolean, private val fineGrainedHashExternalRepos: Set<String>) : KoinComponent {
private val logger: Logger by inject()
private val sourceFileHasher: SourceFileHasher by inject()

Expand All @@ -28,31 +28,31 @@ class RuleHasher(private val useCquery: Boolean, private val fineGrainedHashExte
fun digest(
rule: BazelRule,
allRulesMap: Map<String, BazelRule>,
ruleHashes: ConcurrentMap<String, ByteArray>,
ruleHashes: ConcurrentMap<String, TargetDigest>,
sourceDigests: ConcurrentMap<String, ByteArray>,
seedHash: ByteArray?,
depPath: LinkedHashSet<String>?,
ignoredAttrs: Set<String>,
modifiedFilepaths: Set<Path>
): ByteArray {
): TargetDigest {
val depPathClone = if (depPath != null) LinkedHashSet(depPath) else LinkedHashSet()
if (depPathClone.contains(rule.name)) {
throw raiseCircularDependency(depPathClone, rule.name)
}
depPathClone.add(rule.name)
ruleHashes[rule.name]?.let { return it }

val finalHashValue = sha256 {
safePutBytes(rule.digest(ignoredAttrs))
safePutBytes(seedHash)
val finalHashValue = targetSha256(trackDepLabels) {
putDirectBytes(rule.digest(ignoredAttrs))
putDirectBytes(seedHash)

for (ruleInput in rule.ruleInputList(useCquery, fineGrainedHashExternalRepos)) {
safePutBytes(ruleInput.toByteArray())
putDirectBytes(ruleInput.toByteArray())

val inputRule = allRulesMap[ruleInput]
when {
inputRule == null && sourceDigests.containsKey(ruleInput) -> {
safePutBytes(sourceDigests[ruleInput])
putDirectBytes(sourceDigests[ruleInput])
}

inputRule?.name != null && inputRule.name != rule.name -> {
Expand All @@ -66,7 +66,7 @@ class RuleHasher(private val useCquery: Boolean, private val fineGrainedHashExte
ignoredAttrs,
modifiedFilepaths
)
safePutBytes(ruleInputHash)
putTransitiveBytes(ruleInput, ruleInputHash.overallDigest)
}

else -> {
Expand All @@ -75,7 +75,7 @@ class RuleHasher(private val useCquery: Boolean, private val fineGrainedHashExte
heuristicDigest != null -> {
logger.i { "Source file $ruleInput picked up as an input for rule ${rule.name}" }
sourceDigests[ruleInput] = heuristicDigest
safePutBytes(heuristicDigest)
putDirectBytes(heuristicDigest)
}

else -> logger.w { "Unable to calculate digest for input $ruleInput for rule ${rule.name}" }
Expand Down
13 changes: 9 additions & 4 deletions cli/src/main/kotlin/com/bazel_diff/hash/SourceFileHasher.kt
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,12 @@ import org.koin.core.qualifier.named
import java.nio.file.Path
import java.nio.file.Paths

class SourceFileHasher : KoinComponent {
interface SourceFileHasher {
fun digest(sourceFileTarget: BazelSourceFileTarget, modifiedFilepaths: Set<Path> = emptySet()): ByteArray
fun softDigest(sourceFileTarget: BazelSourceFileTarget, modifiedFilepaths: Set<Path> = emptySet()): ByteArray?
}

class SourceFileHasherImpl : KoinComponent, SourceFileHasher {
private val workingDirectory: Path
private val logger: Logger
private val relativeFilenameToContentHash: Map<String, String>?
Expand Down Expand Up @@ -38,9 +43,9 @@ class SourceFileHasher : KoinComponent {
this.externalRepoResolver = externalRepoResolver
}

fun digest(
override fun digest(
sourceFileTarget: BazelSourceFileTarget,
modifiedFilepaths: Set<Path> = emptySet()
modifiedFilepaths: Set<Path>
): ByteArray {
return sha256 {
val name = sourceFileTarget.name
Expand Down Expand Up @@ -94,7 +99,7 @@ class SourceFileHasher : KoinComponent {
}
}

fun softDigest(sourceFileTarget: BazelSourceFileTarget, modifiedFilepaths: Set<Path> = emptySet()): ByteArray? {
override fun softDigest(sourceFileTarget: BazelSourceFileTarget, modifiedFilepaths: Set<Path>): ByteArray? {
val name = sourceFileTarget.name
val index = isMainRepo(name)
if (index == -1) return null
Expand Down
Loading

0 comments on commit 43ac136

Please sign in to comment.