Try alternative strategy for extracting tags #4928

Merged
merged 88 commits into from
May 1, 2024
Changes from 29 commits
Commits
efe8588
Try alternative strategy for extracting tags
adamrtalbot Feb 15, 2024
ef0eaf0
Add check to prevent confirm pass if no tags present
adamrtalbot Feb 15, 2024
5a184d5
Fix some names for GHA steps
adamrtalbot Feb 15, 2024
692690f
Add fake change in fastp module
adamrtalbot Feb 15, 2024
7cac598
Add fake change in bam_sort_stats_samtools module
adamrtalbot Feb 15, 2024
54750f2
Add fake changes to pytest module and subworkflow
adamrtalbot Feb 15, 2024
b9b1834
Add debug step
adamrtalbot Feb 15, 2024
d588526
fixup
adamrtalbot Feb 15, 2024
73ce73a
Got some variable names wrong...
adamrtalbot Feb 15, 2024
37fb559
Add subworkflows back to nf-tests
adamrtalbot Feb 15, 2024
cc88375
revert confirm-pass if statement
adamrtalbot Feb 15, 2024
0908cad
use JQ to separate modules and subworkflows
adamrtalbot Feb 15, 2024
ae0e2e3
Fix quoting when parsing tags
adamrtalbot Feb 15, 2024
b06e4d2
coerce modules and subworkflows to lower case
adamrtalbot Feb 15, 2024
24c981c
Add specific linting checks to dependencies
adamrtalbot Feb 15, 2024
39ff1ce
Single vs double quotes again
adamrtalbot Feb 15, 2024
8b3d6b7
Some more variable fixing
adamrtalbot Feb 15, 2024
9e9b5cb
this might work
adamrtalbot Apr 5, 2024
065b312
Merge branch 'master' into new_tag_strategy
adamrtalbot Apr 5, 2024
2ddab9a
fixup
adamrtalbot Apr 5, 2024
4dd114a
yet another fixup
adamrtalbot Apr 5, 2024
bc20f2a
Correct modules -> subworkflows
adamrtalbot Apr 5, 2024
2819c54
Fake change to pytest module
adamrtalbot Apr 5, 2024
dc297b8
Correct dependency
adamrtalbot Apr 5, 2024
21cccaa
One more time
adamrtalbot Apr 5, 2024
57fa3b0
More sorting stuff out
adamrtalbot Apr 5, 2024
64c072b
Update modules/nf-core/gunzip/tests/main.nf.test
adamrtalbot Apr 5, 2024
073ae2b
Add fake change to module without nf-test so pytest ONLY is triggered
adamrtalbot Apr 5, 2024
c345358
Merge branch 'master' into new_tag_strategy
adamrtalbot Apr 5, 2024
b606403
Merge branch 'master' into new_tag_strategy
adamrtalbot Apr 9, 2024
acc5b39
fixup incorrect variable
adamrtalbot Apr 10, 2024
7eeb5da
Merge branch 'master' into new_tag_strategy
adamrtalbot Apr 10, 2024
c2ceb2d
Switch to using self-hosted Docker profile
adamrtalbot Apr 10, 2024
c2f09a8
Add dependency checker from Carson Miller
adamrtalbot Apr 11, 2024
6979760
fixup
adamrtalbot Apr 11, 2024
dfcff56
fix(.github/python/find_changed_files.py): detect nf files less greed…
adamrtalbot Apr 11, 2024
703631c
Revert to directories instead of files for module tagging
adamrtalbot Apr 11, 2024
b99e46a
Create optional returntype for Python script
adamrtalbot Apr 11, 2024
a888a91
Remove superfluous python cache
adamrtalbot Apr 11, 2024
c6a06fd
Merge branch 'master' into new_tag_strategy
adamrtalbot Apr 11, 2024
82b40d2
feat: replace custom python with reusable github action for detecting…
adamrtalbot Apr 15, 2024
5d902b5
Fix variable name in output paths
adamrtalbot Apr 15, 2024
2238fe6
Switch to new parents dir syntax
adamrtalbot Apr 15, 2024
2b94d38
prod to bump version of action
adamrtalbot Apr 15, 2024
aeca735
Merge branch 'master' into new_tag_strategy
adamrtalbot Apr 15, 2024
6846f96
Merge branch 'master' into new_tag_strategy
adamrtalbot Apr 16, 2024
3fed071
Merge branch 'master' into new_tag_strategy
adamrtalbot Apr 23, 2024
5a28a91
Update .github/workflows/test.yml
adamrtalbot Apr 23, 2024
173c667
Update .github/workflows/test.yml
adamrtalbot Apr 23, 2024
fc43815
Exclude more subworkflows from Conda
adamrtalbot Apr 23, 2024
9db4b22
Update .github/workflows/test.yml
adamrtalbot Apr 24, 2024
979fe8b
Update .github/workflows/test.yml
adamrtalbot Apr 24, 2024
cde646a
QUILT nf-test and bamlist (#5515)
LouisLeNezet Apr 23, 2024
62f3409
New module: gnu/split (#5428)
k1sauce Apr 24, 2024
a84c33e
Add Demuxem (#5504)
mari-ga Apr 24, 2024
d50427d
Fix AMRFinderPlus (#5521)
jasmezz Apr 24, 2024
951a2dd
fix stubs salmon (#5517)
Lucpen Apr 24, 2024
ab707a2
Update mcquant: Added nf-test and meta.yml information (#5507)
FloWuenne Apr 24, 2024
0b10545
Added repeatmodeler/builddatabase (#5416)
GallVp Apr 25, 2024
891aae4
bump wisecondorx to v1.2.7 (#5523)
nvnieuwk Apr 25, 2024
1f8a032
Add Freemuxlet to modules (#5520)
wxicu Apr 25, 2024
f18f965
chore(deps): update pre-commit hook astral-sh/ruff-pre-commit to v0.4…
renovate[bot] Apr 26, 2024
b2b2180
Updating sentieon to 202308.02 (#5525)
asp8200 Apr 26, 2024
c804f96
Fix stale action (#5530)
edmundmiller Apr 26, 2024
8ffdbb0
Changed tag from single quotes to double quotes (#5531)
LeonHafner Apr 26, 2024
880bd94
[TYPO] Align commas (#5380)
lrauschning Apr 27, 2024
1023de4
GFFREAD: updated to 0.12.7, added meta and fasta input/output (#5448)
GallVp Apr 29, 2024
896d83f
re delete gatk4/bedtointervallist from pytest_modules.yml after it wa…
vlebars Apr 29, 2024
cb2fed3
Update mosdepth to 0.3.8 (#5538)
asp8200 Apr 29, 2024
912e7fd
migrating flash to nf-test (#5470)
DavideBag Apr 29, 2024
f3fb8b7
add riboseq transcriptome bam to test config (#5537)
iraiosub Apr 29, 2024
7a22dfc
Update nanoplot/main.nf (#5460)
ashotmarg Apr 29, 2024
0c817a3
Added repeatmodeler/repeatmodeler (#5536)
GallVp Apr 30, 2024
4441c29
Blastdbcmd new module (#5482)
toniher Apr 30, 2024
7644668
Merge branch 'master' into new_tag_strategy
adamrtalbot Apr 30, 2024
17210f3
Remove fake changes
adamrtalbot Apr 30, 2024
9635f63
Use merge group base ref instead of PR base_ref when merging
adamrtalbot May 1, 2024
a6369df
fixup
adamrtalbot May 1, 2024
0f052bc
Use SHA instead of head ref
adamrtalbot May 1, 2024
2ec307f
fixup
adamrtalbot May 1, 2024
8e6cf9d
work
adamrtalbot May 1, 2024
3408db5
origin/
adamrtalbot May 1, 2024
f7273f8
cmon
adamrtalbot May 1, 2024
c9eb127
one more time with feeling
adamrtalbot May 1, 2024
454616a
fixup again
adamrtalbot May 1, 2024
65047b4
Worth a try
adamrtalbot May 1, 2024
7ef86b9
Increase fetch depth to 2
adamrtalbot May 1, 2024
06bf16b
Use SHA instead of head ref
adamrtalbot May 1, 2024
318 changes: 318 additions & 0 deletions .github/python/find_changed_files.py
@@ -0,0 +1,318 @@
#!/usr/bin/env python

# This script identifies *.nf.test files for changed functions/processes/workflows/pipelines and *.nf.test files
# with changed dependencies, then returns the result as a JSON list

import argparse
import json
import logging
import re
import yaml

from itertools import chain
from pathlib import Path
from git import Repo


def parse_args() -> argparse.Namespace:
    """
    Parse command line arguments and return them as a Namespace object.

    Returns:
        argparse.Namespace: The namespace holding the parsed arguments.
    """
    parser = argparse.ArgumentParser(
        description="Scan *.nf.test files for function/process/workflow name and return as a JSON list"
    )
    parser.add_argument(
        "-r",
        "--head_ref",
        required=True,
        help="Head reference branch (Source branch for a PR).",
    )
    parser.add_argument(
        "-b",
        "--base_ref",
        required=True,
        help="Base reference branch (Target branch for a PR).",
    )
    parser.add_argument(
        "-x",
        "--ignored_files",
        nargs="+",
        default=[
            ".git/*",
            ".gitpod.yml",
            ".prettierignore",
            ".prettierrc.yml",
            "*.md",
            "*.png",
            "modules.json",
            "pyproject.toml",
            "tower.yml",
        ],
        help="List of files or glob patterns to ignore.",
    )
    parser.add_argument(
        "-i",
        "--include",
        type=Path,
        default=None,
        help="Path to a YAML include file of key-value pairs to add to the changed files, e.g. return the current directory if an important file is changed.",
    )
    parser.add_argument(
        "-l",
        "--log-level",
        choices=["DEBUG", "INFO", "WARNING", "ERROR"],
        default="INFO",
        help="Logging level",
    )
    parser.add_argument(
        "-t",
        "--types",
        nargs="+",
        choices=["function", "process", "workflow", "pipeline"],
        default=["function", "process", "workflow", "pipeline"],
        help="Types of tests to include.",
    )
    return parser.parse_args()
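
A note on how the `-t/--types` option behaves: with `nargs="+"` and `choices`, argparse validates each value separately and collects them into a list. The following is a minimal sketch, not the script itself; `build_parser` is a hypothetical helper that reproduces only three of the options above so it can be run without a repository.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical helper: a cut-down copy of the options defined above
    parser = argparse.ArgumentParser()
    parser.add_argument("-r", "--head_ref", required=True)
    parser.add_argument("-b", "--base_ref", required=True)
    parser.add_argument(
        "-t",
        "--types",
        nargs="+",
        choices=["function", "process", "workflow", "pipeline"],
        default=["function", "process", "workflow", "pipeline"],
    )
    return parser


# Passing an explicit argument list instead of reading sys.argv
args = build_parser().parse_args(
    ["-r", "new_tag_strategy", "-b", "master", "-t", "process", "workflow"]
)
defaults = build_parser().parse_args(["-r", "new_tag_strategy", "-b", "master"])
```

Omitting `-t` falls back to all four types, so restricting the scan is opt-in.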


def read_yaml_inverted(file_path: str) -> dict:
    """
    Read a YAML file and return its contents as an inverted dictionary, i.e. the values become the keys and the keys become the values.

    Args:
        file_path (str): The path to the YAML file.

    Returns:
        dict: The contents of the YAML file as an inverted dictionary.
    """
    with open(file_path, "r") as f:
        data = yaml.safe_load(f)

    # Invert a dictionary of lists: each list item becomes a key and the original key becomes its value
    # { "key": ["item1", "item2"] } --> { "item1": "key", "item2": "key" }
    return {value: key for key, values in data.items() for value in values}
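
To make the inversion concrete, here is a small sketch that applies the same dict comprehension to an in-memory dictionary rather than a YAML file. The mapping contents and the `invert` helper name are assumptions for illustration only; they are not taken from the repository's include file.

```python
# Hypothetical include mapping, shaped like the result of yaml.safe_load
include = {
    "modules/nf-core/foo": ["tests/config/*.config", "modules/nf-core/bar/*"],
    ".": [".github/workflows/test.yml"],
}


def invert(data: dict[str, list[str]]) -> dict[str, str]:
    """Map every list item back to the key whose list contained it."""
    return {value: key for key, values in data.items() for value in values}


inverted = invert(include)

# If the same item is listed under two keys, the later key silently wins
collision = invert({"a": ["x"], "b": ["x"]})
```

The collision case may be worth keeping in mind when writing an include file: a pattern can only point at one target.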


def find_changed_files(
    branch1: str,
    branch2: str,
    ignore: list[str],
) -> list[Path]:
    """
    Find all files that have changed between two specified branches, excluding ignored files.

    Args:
        branch1 (str)    : The first branch being compared
        branch2 (str)    : The second branch being compared
        ignore  (list)   : List of files or glob patterns to ignore.

    Returns:
        list: Deduplicated list of paths that changed between branch1 and branch2 and match no ignore pattern.
    """
    # create repo
    repo = Repo(".")
    # identify commit on branch1
    branch1_commit = repo.commit(branch1)
    # identify commit on branch2
    branch2_commit = repo.commit(branch2)
    # compare two branches
    diff_index = branch1_commit.diff(branch2_commit)

    # Start empty list of changed files
    changed_files = []

    # For every file that has changed between commits
    for file in diff_index:
        # Get pathlib.Path object
        filepath = Path(file.a_path)
        # If file does not match any pattern in the ignore list, add it to changed_files
        if not any(filepath.match(ignored_path) for ignored_path in ignore):
            changed_files.append(filepath)

    # Deduplicate the results before returning
    return list(set(changed_files))
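
The ignore list is applied with `Path.match`, which does glob matching anchored at the right-hand end of the path, not substring matching. The sketch below isolates that filter so it can be tried without a Git repository; `filter_ignored` and the sample paths are hypothetical, and `PurePosixPath` is used only to keep the behaviour identical across platforms.

```python
from pathlib import PurePosixPath


def filter_ignored(paths: list[str], ignore: list[str]) -> list[str]:
    """Keep only the paths that match none of the ignore patterns."""
    return [
        p
        for p in paths
        if not any(PurePosixPath(p).match(pattern) for pattern in ignore)
    ]


# Hypothetical set of changed paths
changed = [
    "README.md",
    "docs/images/logo.png",
    "modules.json",
    ".git/config",
    "modules/nf-core/fastp/main.nf",
]
kept = filter_ignored(changed, [".git/*", "*.md", "*.png", "modules.json"])
```

A single-component pattern such as `*.png` matches the file name at any depth, which is why `docs/images/logo.png` is dropped here.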


def detect_include_files(
    changed_files: list[Path], include_files: dict[str, str]
) -> list[Path]:
    """
    Detects the include files based on the changed files.

    Args:
        changed_files (list[Path]): List of paths to the changed files.
        include_files (dict[str, str]): Inverted include mapping of file pattern to the path to return when a changed file matches that pattern. A changed file in one directory can therefore point to a different directory.

    Returns:
        list[Path]: List of paths taken from the keys of the include YAML (the values of the inverted include_files dictionary) whose pattern matched a path in changed_files.
    """
    new_changed_files = []
    for filepath in changed_files:
        # If the file matches an include pattern, add the path that pattern points to
        for include_path, include_key in include_files.items():
            if filepath.match(include_path):
                new_changed_files.append(Path(include_key))
    return new_changed_files
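
As a rough illustration of the include lookup, the sketch below runs the same nested loop over plain strings. The pattern, the target directory and the `detect_includes` name are all assumptions made up for the example.

```python
from pathlib import PurePosixPath


def detect_includes(changed: list[str], include_files: dict[str, str]) -> list[str]:
    """Return the target of every include pattern matched by a changed path."""
    hits = []
    for path in changed:
        for pattern, target in include_files.items():
            if PurePosixPath(path).match(pattern):
                hits.append(target)
    return hits


# Hypothetical inverted include mapping: a config change should trigger the repo root
include_files = {"tests/config/*.config": "."}
extra = detect_includes(
    ["tests/config/nextflow.config", "modules/nf-core/fastp/main.nf"],
    include_files,
)
```

The result is appended to the changed files rather than replacing them, so the original paths are still scanned.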


def detect_nf_test_files(changed_files: list[Path]) -> list[Path]:
    """
    Detects and returns a list of nf-test files from the given list of changed files.

    Args:
        changed_files (list[Path]): A list of file paths.

    Returns:
        list[Path]: A list of nf-test file paths.
    """
    result: list[Path] = []
    for path in changed_files:
        # If Path is the exact nf-test file add to list:
        if path.match("*.nf.test") and path.exists():
            result.append(path)
        # Else recursively search for nf-test files:
        else:
            # Get the enclosing dir so files in the same dir can be found.
            # e.g.
            # dir/
            # ├─ main.nf
            # ├─ main.nf.test
            for file in path.parent.rglob("*.nf.test"):
                result.append(file)
    return result
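
The fallback branch searches the changed file's parent directory recursively, so a change to `main.nf` picks up tests in a sibling `tests/` folder. This can be checked in a throwaway directory; the sketch below assumes a hypothetical module called `demo` and a single-path helper named `sibling_tests`.

```python
import tempfile
from pathlib import Path


def sibling_tests(changed: Path) -> list[Path]:
    """Return the nf-test files associated with one changed path."""
    if changed.match("*.nf.test") and changed.exists():
        return [changed]
    # Search the enclosing directory and everything below it
    return sorted(changed.parent.rglob("*.nf.test"))


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical module layout: demo/main.nf and demo/tests/main.nf.test
    module = Path(tmp) / "modules" / "demo"
    (module / "tests").mkdir(parents=True)
    (module / "main.nf").write_text("process DEMO {}\n")
    (module / "tests" / "main.nf.test").write_text('process "DEMO"\n')

    found = sibling_tests(module / "main.nf")
    names = [p.relative_to(module).as_posix() for p in found]
```

One consequence of this design is that a changed file sitting in the repository root would trigger a search of the whole tree.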


def process_files(files: list[Path]) -> list[str]:
    """
    Process the files and return lines that begin with 'workflow', 'process', or 'function' and have a single string afterwards.

    Args:
        files (list): List of files to process.

    Returns:
        list: List of lines that match the criteria.
    """
    result = []
    for file in files:
        with open(file, "r") as f:
            is_pipeline_test = True
            lines = f.readlines()
            for line in lines:
                line = line.strip()
                if line.startswith(("workflow", "process", "function")):
                    words = line.split()
                    if len(words) == 2 and re.match(r'^".*"$', words[1]):
                        result.append(line)
                        is_pipeline_test = False

            # If the file declared no workflow, process or function,
            # add a dummy result to fill the 'pipeline' category.
            # The name is double-quoted because the downstream parser only accepts double-quoted names.
            if is_pipeline_test:
                result.append('pipeline "PIPELINE"')

    return result
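
The line filter is easiest to follow on a sample test file. The sketch below applies the same prefix check and regex to an in-memory string; the nf-test snippet is a made-up example and `declared_tests` is a hypothetical name.

```python
import re


def declared_tests(text: str) -> list[str]:
    """Return stripped lines of the form: <workflow|process|function> "NAME"."""
    found = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(("workflow", "process", "function")):
            words = line.split()
            if len(words) == 2 and re.match(r'^".*"$', words[1]):
                found.append(line)
    return found


# Made-up nf-test file content
sample = '''
nextflow_process {
    name "Test Process FASTP"
    script "../main.nf"
    process "FASTP"
    tag "modules"
    test("runs") {
        when {
            process {
            }
        }
    }
}
'''
declared = declared_tests(sample)
```

The `process {` block opener also has two words, but its second word is not a double-quoted string, so the regex rejects it.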


def convert_nf_test_files_to_test_types(
    lines: list[str], types: list[str] = ["function", "process", "workflow", "pipeline"]
) -> dict[str, list[str]]:
    """
    Generate a dictionary mapping each requested type to the list of names found in the lines.

    Args:
        lines (list): List of lines to process.
        types (list): List of types to include.

    Returns:
        dict: Dictionary with one list of names per requested type.
    """
    # Populate empty dict from types
    result: dict[str, list[str]] = {key: [] for key in types}

    for line in lines:
        words = line.split()
        # Only lines of the form: <keyword> "<name>" (double-quoted) are considered
        if len(words) == 2 and re.match(r'^".*"$', words[1]):
            keyword = words[0]
            name = words[1].strip("'\"")  # Strip both single and double quotes
            if keyword in types:
                result[keyword].append(name)
    return result
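
A short worked example of the grouping step, written as a standalone sketch under the assumption that input lines look like `keyword "NAME"`. `group_by_type` is a hypothetical re-statement of the logic above, included mainly to show that the regex accepts double-quoted names only.

```python
import re


def group_by_type(lines: list[str], types: list[str]) -> dict[str, list[str]]:
    """Group double-quoted names by their leading keyword."""
    result: dict[str, list[str]] = {key: [] for key in types}
    for line in lines:
        words = line.split()
        if len(words) == 2 and re.match(r'^".*"$', words[1]):
            keyword, name = words[0], words[1].strip('"')
            if keyword in types:
                result[keyword].append(name)
    return result


lines = [
    'process "FASTP"',
    'workflow "BAM_SORT_STATS_SAMTOOLS"',
    "pipeline 'PIPELINE'",  # single-quoted, so the regex skips it
    'tag "modules"',  # keyword not requested, so it is dropped
]
grouped = group_by_type(lines, ["process", "workflow", "pipeline"])
```

Any caller that wants an entry to survive this step needs to emit the name in double quotes.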


def find_changed_dependencies(paths: list[Path], tags: list[str]) -> list[Path]:
    """
    Find all *.nf.test files with changed dependencies from a list of paths.

    Args:
        paths (list): List of directories or files to scan.
        tags (list): List of tags identified as having changes.

    Returns:
        list: List of *.nf.test files with changed dependencies.
    """

    result: list[Path] = []

    nf_test_files = detect_nf_test_files(paths)

    # find nf-test files with changed dependencies
    for nf_test_file in nf_test_files:
        with open(nf_test_file, "r") as f:
            lines = f.readlines()
            # Get all tags from nf-test file
            # Make case insensitive with .casefold()
            tags_in_nf_test_file = [
                tag.casefold().replace("/", "_")
                for tag in convert_nf_test_files_to_test_types(lines, types=["tag"])[
                    "tag"
                ]
            ]
            # Check whether any changed tag appears among the tags of this nf-test file.
            # Use .casefold() to be case insensitive
            if any(
                tag.casefold().replace("/", "_") in tags_in_nf_test_file for tag in tags
            ):
                result.append(nf_test_file)

    return result
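
The dependency check hinges on normalising both sides the same way: case-folding and replacing `/` with `_`, so that a tag such as `samtools/sort` lines up with a process name such as `SAMTOOLS_SORT`. A minimal sketch of that comparison follows; `normalise` and `depends_on` are hypothetical names and the tag lists are invented.

```python
def normalise(tag: str) -> str:
    """Case-fold a tag and treat '/' and '_' as the same separator."""
    return tag.casefold().replace("/", "_")


def depends_on(test_tags: list[str], changed: list[str]) -> bool:
    """True if any changed name matches one of the test's tags after normalising."""
    own = {normalise(tag) for tag in test_tags}
    return any(normalise(name) in own for name in changed)


# Invented tags for a subworkflow test that uses samtools/sort
subworkflow_tags = ["subworkflows", "bam_sort_stats_samtools", "samtools/sort"]
hit = depends_on(subworkflow_tags, ["SAMTOOLS_SORT"])
miss = depends_on(["modules", "fastp"], ["SAMTOOLS_SORT"])
```

This is why a test only reruns for a changed dependency if that dependency is listed among its tags.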


if __name__ == "__main__":

    # Utility stuff
    args = parse_args()
    logging.basicConfig(level=args.log_level)

    # Find the files that changed between the head and base refs
    changed_files = find_changed_files(args.head_ref, args.base_ref, args.ignored_files)

    # If an additional include YAML is added, we detect additional changed dirs to include
    if args.include:
        include_files = read_yaml_inverted(args.include)
        changed_files = changed_files + detect_include_files(
            changed_files, include_files
        )
    # Parse nf-test files for target test tags
    nf_test_files = detect_nf_test_files(changed_files)
    lines = process_files(nf_test_files)
    result = convert_nf_test_files_to_test_types(lines)

    # Get only relevant results (specified by -t)
    # Unique using a set
    target_results = list(
        {item for sublist in map(result.get, args.types) for item in sublist}
    )

    # Parse files to identify nf-tests with changed dependencies
    changed_dep_files = find_changed_dependencies([Path(".")], target_results)

    # Combine target nf-test files and nf-test files with changed dependencies
    # Go up two levels (tests/main.nf.test -> containing dir) so we get the module or subworkflow path
    all_nf_tests = [
        str(test_path.parent.parent) for test_path in set(changed_dep_files + nf_test_files)
    ]

    # Print to stdout
    print(json.dumps(all_nf_tests))
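
For reference, the final step turns test file paths into module or subworkflow directories and prints them as JSON. The sketch below assumes the `<component>/tests/main.nf.test` layout seen in the commit list (for example `modules/nf-core/gunzip/tests/main.nf.test`); the subworkflow path is an illustrative guess, and sorting is added here only to make the output deterministic.

```python
import json
from pathlib import PurePosixPath

# The first path appears in the commit list; the second is an assumed example
tests = [
    PurePosixPath("modules/nf-core/gunzip/tests/main.nf.test"),
    PurePosixPath("subworkflows/nf-core/bam_sort_stats_samtools/tests/main.nf.test"),
    PurePosixPath("modules/nf-core/gunzip/tests/main.nf.test"),  # duplicate
]

# Two levels up from the test file is the component directory
dirs = sorted({str(test.parent.parent) for test in tests})
payload = json.dumps(dirs)
```

A test file that sits directly next to `main.nf` rather than in a `tests/` folder would resolve one directory too high with this approach.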