Run bigscape v2 #251

Merged
merged 57 commits into from
Jul 17, 2024
Merged
Show file tree
Hide file tree
Changes from 37 commits
Commits
Show all changes
57 commits
Select commit Hold shift + click to select a range
d2dc38b
add BiG-SCAPE 2 to dependencies
adraismawur May 29, 2024
93848dd
add example config for bigscape 2
adraismawur May 29, 2024
9d0a2d0
implement running BiG-SCAPE
adraismawur May 29, 2024
6d40414
fix bigscape2 dependency
adraismawur May 31, 2024
02bc843
copy db file properly
adraismawur May 31, 2024
2351055
remove cluster arg
adraismawur May 31, 2024
1b43a31
run ruff formatter
adraismawur May 31, 2024
eb28d38
fix ruff check issues
adraismawur May 31, 2024
9293c64
ensure str for mypy static type checking
adraismawur Jun 3, 2024
f18ecb5
Merge branch 'dev' of github.com:NPLinker/nplinker into run-bigscape-v2
adraismawur Jun 6, 2024
4a0e86b
Move configuration to correct file
adraismawur Jun 7, 2024
7361a98
use os.path.join instead of string concat
adraismawur Jun 7, 2024
a329e43
fix merge mistake
adraismawur Jun 7, 2024
caf2711
remove extra bigscape 2 files
adraismawur Jun 14, 2024
525c707
add missing library
adraismawur Jun 14, 2024
2433d76
add validator for bigscape version
adraismawur Jun 14, 2024
8447272
add test for bigscape version
adraismawur Jun 14, 2024
f6330e9
fix typo
adraismawur Jun 14, 2024
bc096bf
Merge branch 'dev' of github.com:NPLinker/nplinker into run-bigscape-v2
adraismawur Jun 19, 2024
84095b7
add simple run testing
adraismawur Jun 19, 2024
21b4600
add test to check for nonextent input path
adraismawur Jun 19, 2024
a2b6eb8
add info to docstring
adraismawur Jul 15, 2024
c03f64a
add exception on invalid version number
adraismawur Jul 15, 2024
9e9758e
move log to after validation
adraismawur Jul 15, 2024
9e8c767
add version info to log
adraismawur Jul 15, 2024
e9f7345
use specific exception
adraismawur Jul 15, 2024
775cbf5
rework return codes and exceptions
adraismawur Jul 15, 2024
874ea3a
add wrong version test
adraismawur Jul 16, 2024
bd699de
add invalid path test for v2
adraismawur Jul 16, 2024
3189999
specify exception
adraismawur Jul 16, 2024
19e72f2
fix tests not correctly running
adraismawur Jul 16, 2024
a9c9cec
change imports to reflect style in other tests
adraismawur Jul 16, 2024
2164f6c
specify exception type
adraismawur Jul 16, 2024
92578fd
add minimal test data
adraismawur Jul 16, 2024
3ac3e91
add real data tests
adraismawur Jul 16, 2024
0db25d8
remove class
adraismawur Jul 16, 2024
65fa549
force string for mypy
adraismawur Jul 16, 2024
3096bcc
Apply suggestions from code review
adraismawur Jul 17, 2024
d4cf769
add exceptions to docstring
adraismawur Jul 17, 2024
c00d59c
add docstring to tests
adraismawur Jul 17, 2024
18b2317
use tmp path instead of data path
adraismawur Jul 17, 2024
8a356a5
add missing typing
adraismawur Jul 17, 2024
5726f22
add explanation of cluster mode
adraismawur Jul 17, 2024
aab5e69
parameterize tests
adraismawur Jul 17, 2024
1d6da60
remove two gbks
adraismawur Jul 17, 2024
7914f3a
better documentation
adraismawur Jul 17, 2024
88c1f19
skip tests with dataset
adraismawur Jul 17, 2024
195c791
do not check output code within run
adraismawur Jul 17, 2024
6cc45d1
move log
adraismawur Jul 17, 2024
4a288a9
add test with incorrect parameters for runtime exception
adraismawur Jul 17, 2024
a24454c
remove temporary nplinker.toml
adraismawur Jul 17, 2024
a4b3a46
add stderr to error log
adraismawur Jul 17, 2024
84eb933
add import needed for skipping test on CI
adraismawur Jul 17, 2024
69f7674
Apply suggestions from code review
adraismawur Jul 17, 2024
5cadd45
expand docstring
adraismawur Jul 17, 2024
1efc8fd
Apply suggestions from code review
adraismawur Jul 17, 2024
bdd1f8e
fix ruff complaints
adraismawur Jul 17, 2024
15 changes: 15 additions & 0 deletions bin/install-nplinker-deps
@@ -125,6 +125,7 @@ pip install -q -U pip setuptools
echo "🔥 Start installing BigScape ..."
[[ -d BiG-SCAPE ]] || git clone https://github.com/medema-group/BiG-SCAPE.git
cd BiG-SCAPE
git reset --hard
git config --add advice.detachedHead false # disable advice
git config pull.ff only
git checkout master
@@ -136,6 +137,20 @@ echo "🔥 Start installing BigScape ..."
chmod 775 Annotated_MIBiG_reference
ln -sf $LIB_PATH/BiG-SCAPE/bigscape.py $PY_PATH/bin
cd ..
# limit blob size to avoid downloading large files left in the repository history
[[ -d BiG-SCAPE-v2 ]] || git clone -b dev --filter=blob:limit=10m https://github.com/medema-group/BiG-SCAPE.git BiG-SCAPE-v2
cd BiG-SCAPE-v2
git config --add advice.detachedHead false # disable advice
git checkout 99a4c2e4923bb50e175b2e619c2cee0a14918789 # Commits on Jun 14, 2024
pip install click
pip install sqlalchemy
pip install pyhmmer
pip install tqdm
chmod 754 bigscape.py
ln -sf $LIB_PATH/BiG-SCAPE-v2/bigscape.py $PY_PATH/bin/bigscape-v2.py
cd ..


echo -e "✅ BigScape installed successfully\n"

#--- Install FastTree (required by BigScape; does not support Windows)
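The install script links BiG-SCAPE 1 as `bigscape.py` and the version 2 checkout as `bigscape-v2.py` in `$PY_PATH/bin`. As a post-install sanity check, the sketch below (a hypothetical helper, not part of NPLinker) reports which of the two entry points resolve on `PATH`. It assumes `$PY_PATH/bin` is on `PATH` and that the scripts are executable, as the `chmod 754 bigscape.py` step intends for version 2.

```python
from __future__ import annotations

import shutil

# Script names created by bin/install-nplinker-deps; assumed to be on PATH.
BIGSCAPE_SCRIPTS = {1: "bigscape.py", 2: "bigscape-v2.py"}


def find_bigscape_scripts(path: str | None = None) -> dict[int, str | None]:
    """Map each BiG-SCAPE version to its resolved script path, or None if missing.

    `path` overrides the PATH environment variable (useful for testing).
    """
    return {
        version: shutil.which(script, path=path)
        for version, script in BIGSCAPE_SCRIPTS.items()
    }


if __name__ == "__main__":
    for version, location in find_bigscape_scripts().items():
        print(f"BiG-SCAPE v{version}: {location or 'NOT FOUND'}")
```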
18 changes: 18 additions & 0 deletions nplinker.toml
@@ -0,0 +1,18 @@
# NPLinker default configurations

[log]
level = "INFO"
use_console = true

[mibig]
to_use = true
version = "3.1"

[bigscape]
version = 1
parameters = "--mibig --clans-off --mix --include_singletons --cutoffs 0.30"

cutoff = "0.30"

[scoring]
methods = ["metcalf"]
31 changes: 22 additions & 9 deletions src/nplinker/arranger.py
@@ -282,21 +282,34 @@ def _run_bigscape(self) -> None:
default BiG-SCAPE directory.
"""
self.bigscape_running_output_dir.mkdir(exist_ok=True, parents=True)

version = self.config.bigscape.version

run_bigscape(
self.antismash_dir,
self.bigscape_running_output_dir,
self.config.bigscape.parameters,
version,
)
for f in glob(
str(
self.bigscape_running_output_dir
/ "network_files"
/ "*"
/ "mix"
/ "mix_clustering_c*.tsv"

if version == 1:
for f in glob(
str(
self.bigscape_running_output_dir
/ "network_files"
/ "*"
/ "mix"
/ "mix_clustering_c*.tsv"
)
):
shutil.copy(f, self.bigscape_dir)
elif version == 2:
shutil.copy(
self.bigscape_running_output_dir / "data_sqlite.db",
self.bigscape_dir,
)
):
shutil.copy(f, self.bigscape_dir)
else:
raise ValueError(f"Invalid BiG-SCAPE version: {version}")

def arrange_strain_mappings(self) -> None:
"""Arrange the strain mappings file.
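The arranger change copies different artifacts depending on the version: BiG-SCAPE 1 writes per-cutoff `mix_clustering_c*.tsv` files under `network_files/*/mix/`, while BiG-SCAPE 2 writes a single `data_sqlite.db`. The sketch below restates that branch as a standalone function so it can be exercised against a temporary directory. `collect_bigscape_outputs` is a hypothetical name; in NPLinker the logic lives inside `_run_bigscape`.

```python
from __future__ import annotations

import shutil
from glob import glob
from pathlib import Path


def collect_bigscape_outputs(run_dir: Path, dest_dir: Path, version: int) -> list[Path]:
    """Copy the BiG-SCAPE results NPLinker needs from run_dir into dest_dir.

    Mirrors the version switch in the arranger diff; returns the copied paths.
    """
    copied: list[Path] = []
    if version == 1:
        # v1: one clustering TSV per cutoff, inside the "mix" class folder
        pattern = str(run_dir / "network_files" / "*" / "mix" / "mix_clustering_c*.tsv")
        for f in sorted(glob(pattern)):
            copied.append(Path(shutil.copy(f, dest_dir)))
    elif version == 2:
        # v2: all results are stored in a single SQLite database
        copied.append(Path(shutil.copy(run_dir / "data_sqlite.db", dest_dir)))
    else:
        raise ValueError(f"Invalid BiG-SCAPE version: {version}")
    return copied
```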
1 change: 1 addition & 0 deletions src/nplinker/config.py
@@ -70,6 +70,7 @@ def load_config(config_file: str | PathLike) -> Dynaconf:
# BigScape
Validator("bigscape.parameters", required=True, is_type_of=str),
Validator("bigscape.cutoff", required=True, is_type_of=str),
Validator("bigscape.version", required=True, is_type_of=int),
# Scoring
## `scoring.methods` must be a list of strings and must contain at least one of the
## supported scoring methods.
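The new `Validator("bigscape.version", required=True, is_type_of=int)` enforces only presence and type. The accepted values (1 or 2) are checked later, when `run_bigscape` raises `ValueError` for any other number. A plain-Python sketch of both checks against an already-parsed `[bigscape]` table follows. The dict shape and the function name are assumptions for illustration, and this is not Dynaconf's API. The sketch also rejects booleans, because `isinstance(True, int)` is `True` in Python.

```python
from __future__ import annotations

from typing import Any

SUPPORTED_BIGSCAPE_VERSIONS = (1, 2)


def validate_bigscape_section(section: dict[str, Any]) -> None:
    """Raise ValueError if the [bigscape] table is missing keys or has bad values."""
    # Same required keys and types as the Validators in config.py
    expected_types = {"parameters": str, "cutoff": str, "version": int}
    for key, expected in expected_types.items():
        if key not in section:
            raise ValueError(f"bigscape.{key} is required")
        value = section[key]
        # bool is a subclass of int, so reject it explicitly
        if not isinstance(value, expected) or isinstance(value, bool):
            raise ValueError(
                f"bigscape.{key} must be of type {expected.__name__}, "
                f"got {type(value).__name__}"
            )
    # Extra check (not in config.py): only versions 1 and 2 can be run
    if section["version"] not in SUPPORTED_BIGSCAPE_VERSIONS:
        raise ValueError(f"Unsupported BiG-SCAPE version: {section['version']}")
```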
5 changes: 5 additions & 0 deletions src/nplinker/data/nplinker.toml
@@ -43,6 +43,9 @@ version = "3.1"

[bigscape]
# The parameters to use for running BiG-SCAPE.
# Version of BiG-SCAPE to run. Make sure to change the parameters property below as well
# when changing versions.
version = 1
# Required bigscape parameters are `--mix`, `--include_singletons` and `--cutoffs`. NPLinker needs
# them to run the analysis properly.
# Parameters that must NOT exist: `--inputdir`, `--outputdir`, `--pfam_dir`. NPLinker will
@@ -51,6 +54,8 @@ version = "3.1"
# `mibig.version` to the version of mibig in bigscape.
# The default value is "--mibig --clans-off --mix --include_singletons --cutoffs 0.30".
parameters = "--mibig --clans-off --mix --include_singletons --cutoffs 0.30"
# for version 2, use the following parameters string:
# parameters = "--mibig_version 3.1 --include_singletons --gcf_cutoffs 0.30"
# Which bigscape cutoff to use for NPLinker analysis.
# There might be multiple cutoffs in bigscape output.
# Note that this value must be a string.
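The comments above state which flags the version 1 parameter string must and must not contain, and that `cutoff` selects one of possibly several cutoffs in the output. The sketch below turns those rules into a check. `check_bigscape_parameters` is a hypothetical helper. The version 2 required flags are an assumption inferred from the example v2 string, and accepting comma-separated cutoff values is likewise an assumption about `--gcf_cutoffs`.

```python
from __future__ import annotations

import shlex

# v1 flags come from the config comments; the v2 entries are an assumption
# inferred from the example v2 parameters string.
REQUIRED_FLAGS = {
    1: {"--mix", "--include_singletons", "--cutoffs"},
    2: {"--include_singletons", "--gcf_cutoffs"},
}
CUTOFF_FLAG = {1: "--cutoffs", 2: "--gcf_cutoffs"}
FORBIDDEN_FLAGS = {"--inputdir", "--outputdir", "--pfam_dir"}


def check_bigscape_parameters(version: int, parameters: str, cutoff: str) -> list[str]:
    """Return a list of problems found in a [bigscape] config; empty means OK."""
    if version not in CUTOFF_FLAG:
        raise ValueError(f"Unsupported BiG-SCAPE version: {version}")
    tokens = shlex.split(parameters)
    flags = {t for t in tokens if t.startswith("--")}
    problems = [f"missing required flag {f}" for f in sorted(REQUIRED_FLAGS[version] - flags)]
    problems += [f"flag {f} must not be set" for f in sorted(FORBIDDEN_FLAGS & flags)]
    # Collect the values after the cutoff flag (space- or comma-separated)
    flag = CUTOFF_FLAG[version]
    values: list[str] = []
    if flag in tokens:
        for tok in tokens[tokens.index(flag) + 1 :]:
            if tok.startswith("-"):
                break
            values.extend(v for v in tok.split(",") if v)
    if cutoff not in values:
        problems.append(f"cutoff {cutoff!r} is not among {flag} values {values}")
    return problems
```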
75 changes: 61 additions & 14 deletions src/nplinker/genomics/bigscape/runbigscape.py
@@ -15,22 +15,65 @@ def run_bigscape(
antismash_path: str | PathLike,
output_path: str | PathLike,
extra_params: str,
):
bigscape_py_path = "bigscape.py"
logger.info(
f'run_bigscape: input="{antismash_path}", output="{output_path}", extra_params={extra_params}"'
)
version: int = 1,
) -> bool:
"""Runs BiG-SCAPE to cluster BGCs.

The behavior of this function differs slightly depending on the BiG-SCAPE
version set in the configuration file. Mainly, the two versions take different
sets of command-line parameters.

The antiSMASH output directory should contain GBK files.
The directory can contain subdirectories, in which case BiG-SCAPE will search
recursively for GBK files.

By default, only GBK files with "cluster" or "region" in the filename are
accepted. GBK files with "final" in the filename are excluded.

Args:
antismash_path: Path to the antiSMASH output directory.
output_path: Path to the output directory where BiG-SCAPE will write its results.
extra_params: Additional parameters to pass to BiG-SCAPE.
version: The version of BiG-SCAPE to run. Can be 1 or 2.

Returns:
True if BiG-SCAPE ran successfully.

Raises:
ValueError: If an unexpected BiG-SCAPE version number is given.
FileNotFoundError: If the BiG-SCAPE script cannot be run or `antismash_path` does not exist.
RuntimeError: If BiG-SCAPE exits with a non-zero return code.
"""
# switch to correct version of BiG-SCAPE
if version == 1:
bigscape_py_path = "bigscape.py"
elif version == 2:
bigscape_py_path = "bigscape-v2.py"
else:
raise ValueError("Unexpected BiG-SCAPE version number specified")

try:
subprocess.run([bigscape_py_path, "-h"], capture_output=True, check=True)
except Exception as e:
raise Exception(f"Failed to find/run bigscape.py (path={bigscape_py_path}, err={e})") from e
raise FileNotFoundError(
f"Failed to find/run bigscape.py (path={bigscape_py_path}, err={e})"
) from e

if not os.path.exists(antismash_path):
raise Exception(f'antismash_path "{antismash_path}" does not exist!')
raise FileNotFoundError(f'antismash_path "{antismash_path}" does not exist!')

# configure the IO-related parameters, including pfam_dir
args = [bigscape_py_path, "-i", antismash_path, "-o", output_path, "--pfam_dir", PFAM_PATH]
logger.info(f"Running BiG-SCAPE version {version}")
logger.info(
f'run_bigscape: input="{antismash_path}", output="{output_path}", extra_params="{extra_params}"'
)

# assemble arguments. first argument is the python file
args = [bigscape_py_path]

# version 2 points to specific Pfam file, version 1 points to directory
# version 2 also requires the cluster subcommand
if version == 1:
args.extend(["--pfam_dir", PFAM_PATH])
elif version == 2:
args.extend(["cluster", "--pfam_path", os.path.join(PFAM_PATH, "Pfam-A.hmm")])

# add input and output paths. these are unchanged
args.extend(["-i", str(antismash_path), "-o", str(output_path)])

# append the user supplied params, if any
if len(extra_params) > 0:
@@ -39,9 +82,13 @@ def run_bigscape(
logger.info(f"BiG-SCAPE command: {args}")
result = subprocess.run(args, stdout=sys.stdout, stderr=sys.stderr, check=True)
logger.info(f"BiG-SCAPE completed with return code {result.returncode}")
# use subprocess.CompletedProcess.check_returncode() to test if the BiG-SCAPE
# process exited successfully. This throws an exception for non-zero returncodes
# which will indicate to the PODPDownloader module that something went wrong.
result.check_returncode()

return True
# return True if BiG-SCAPE exited with return code 0
if result.returncode == 0:
return True

# otherwise log details and raise a runtime error
logger.error(f"BiG-SCAPE failed with return code {result.returncode}")
logger.error(f"output: {str(result.stdout)}")

raise RuntimeError(f"Failed to run BiG-SCAPE with error code {result.returncode}")
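Restating the version switch and argument assembly in `run_bigscape` as a pure function makes the two command shapes easy to compare. In the sketch below, `build_bigscape_args` and `run_checked` are hypothetical names. The Pfam directory is passed in instead of being read from `PFAM_PATH`. Splitting `extra_params` on whitespace is an assumption, because those lines are collapsed in the diff. `run_checked` also captures output instead of forwarding it to `sys.stdout`/`sys.stderr` as the diff does. When output is forwarded, `CompletedProcess.stdout` is `None`, so there is nothing to log from it. In addition, if `check=True` is passed, a non-zero exit raises `subprocess.CalledProcessError` before any return-code branch runs.

```python
from __future__ import annotations

import os
import subprocess
from os import PathLike

# Script names as linked by bin/install-nplinker-deps
SCRIPT_FOR_VERSION = {1: "bigscape.py", 2: "bigscape-v2.py"}


def build_bigscape_args(
    version: int,
    antismash_path: str | PathLike,
    output_path: str | PathLike,
    pfam_dir: str,
    extra_params: str = "",
) -> list[str]:
    """Assemble a BiG-SCAPE command line following the layout in the diff."""
    if version not in SCRIPT_FOR_VERSION:
        raise ValueError("Unexpected BiG-SCAPE version number specified")
    args = [SCRIPT_FOR_VERSION[version]]
    if version == 1:
        # v1 takes the directory that holds the Pfam files
        args.extend(["--pfam_dir", pfam_dir])
    else:
        # v2 needs the `cluster` subcommand and the Pfam-A.hmm file itself
        args.extend(["cluster", "--pfam_path", os.path.join(pfam_dir, "Pfam-A.hmm")])
    args.extend(["-i", str(antismash_path), "-o", str(output_path)])
    if extra_params:
        # assumption: user parameters are split on whitespace
        args.extend(extra_params.split())
    return args


def run_checked(args: list[str]) -> bool:
    """Run a command; return True on exit code 0, otherwise raise RuntimeError."""
    # Capture output (unlike the diff) so stderr can go into the error message
    result = subprocess.run(args, capture_output=True, text=True)
    if result.returncode == 0:
        return True
    raise RuntimeError(
        f"Failed to run BiG-SCAPE with error code {result.returncode}: "
        f"{result.stderr.strip()}"
    )
```

With version 1, `build_bigscape_args(1, "in", "out", "/pfam", "--mix")` yields `["bigscape.py", "--pfam_dir", "/pfam", "-i", "in", "-o", "out", "--mix"]`. With version 2, the same call starts with `bigscape-v2.py cluster --pfam_path /pfam/Pfam-A.hmm`.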
2 changes: 2 additions & 0 deletions src/nplinker/nplinker_default.toml
@@ -9,7 +9,9 @@ to_use = true
version = "3.1"

[bigscape]
version = 1
parameters = "--mibig --clans-off --mix --include_singletons --cutoffs 0.30"

cutoff = "0.30"

[scoring]