Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skipping BLAST because ncbiblast+ is not in PATH #159

Open
guanghua13 opened this issue Jan 17, 2024 · 4 comments
Open

Skipping BLAST because ncbiblast+ is not in PATH #159

guanghua13 opened this issue Jan 17, 2024 · 4 comments

Comments

@guanghua13
Copy link

guanghua13 commented Jan 17, 2024

v %>% foreignness_score(db = "human") %>% print
Removing temporary fasta files.
nmer
1: SIINFEKL
2: ILAKFLHWL
3: GILGFVFTL
Warning message:
In foreignness_score(., db = "human") :
Skipping BLAST because ncbiblast+ is not in PATH

Any help would be appreciated.
Thanks.

@marshelma
Copy link

I got the same issue, ncbiblast+ is installed:

(base) xma@xma-Precision-M4800:~/biosoft$ which blastn

return:
/home/xma/biosoft/ncbi-blast-2.16.0+/bin/blastn

@andrewrech
Copy link
Owner

Could you drop a debug(function_you_are_using) and then check what is in the path from R?

@andrewrech andrewrech reopened this Sep 19, 2024
@marshelma
Copy link

Could you drop a debug(function_you_are_using) and then check what is in the path from R?

Thank you for your reply! I check the path and the blast+ is not in the PATH from R. After added it, the output changed but still no score calculated:

v %>% foreignness_score(db = "human") %>% print
Generating FASTA to query.
Running blastp for homology to IEDB antigens.
Removing temporary fasta files.
nmer

1: SIINFEKL
2: ILAKFLHWL
3: GILGFVFTL

I also run the debug function, here is the output:

debugging in: foreignness_score(., db = "human")
debug: {
if (!db %chin% c("mouse", "human"))
stop("db must be "human" or "mouse"")
on.exit({
message("Removing temporary fasta files.")
try(list.files(pattern = "foreignness_score|blastp") %>%
file.remove())
})
if (suppressWarnings(system("which blastp 2> /dev/null",
intern = TRUE)) %>% length() == 0) {
warning("Skipping BLAST because ncbiblast+ is not in PATH")
return(data.table::data.table(nmer = v))
}
message("Generating FASTA to query.")
names(v) <- 1:length(v) %>% as.character()
sdt <- v %>% data.table::as.data.table() %>% .[, :=(nmer_id,
names(v))]
AA <- Biostrings::AAStringSet(v, use.names = TRUE)
Biostrings::writeXStringSet(AA, file = "foreignness_score_fasta.fa",
format = "fasta")
message("Running blastp for homology to IEDB antigens.")
ag_dir <- Sys.getenv("AG_DATA_DIR")
if (db == "mouse")
db <- paste0(ag_dir, "/Mu_iedb.fasta.pin")
if (db == "human")
db <- paste0(ag_dir, "/iedb.bdb.pin")
if (!file.exists(db)) {
warn <- paste("Skipping foreignness_score because BLAST database cannot be found.",
"To set a custom path to the antigen.garnish data folder",
"set environomental variable AG_DATA_DIR from the shell",
"or from R using Sys.setenv", "", "Re-download installation data:",
"$ curl -fsSL "https://s3.amazonaws.com/get.rech.io/antigen.garnish-2.3.0.tar.gz\" | tar -xvz",
"", "Documentation:", "https://neoantigens.rech.io",
sep = "\n")
warnings(warn)
return(data.table::data.table(nmer = v))
}
db <- paste("-db", db %>% stringr::str_replace("\.pin",
""), sep = " ")
threads <- Sys.getenv("AG_THREADS") %>% as.numeric(.)
if (threads == "" || threads %>% is.na()) {
threads <- parallel::detectCores()
}
if ((threads %>% as.numeric()) %>% is.na()) {
stop("Error reading environment variable AG_THREADS as numeric.")
}
system(paste0("blastp -query foreignness_score_fasta.fa ",
db, " -evalue 100000000 -matrix BLOSUM62 -gapopen 11 -gapextend 1 -out blastp_iedbout.csv -num_threads ",
threads, " -outfmt '10 qseqid sseqid qseq qstart qend sseq sstart send length mismatch pident evalue bitscore'"))
blastdt <- list.files(pattern = "iedbout\.csv")
if (length(blastdt) == 0) {
message("No blast output against IEDB returned.")
return(data.table::data.table(nmer = v))
}
if (all(file.info(blastdt)$size == 0)) {
message("No IEDB matches found by blast.")
return(data.table::data.table(nmer = v, foreignness_score = 0))
}
blastdt <- blastdt %>% data.table::fread()
blastdt %>% data.table::setnames(names(.), c("nmer_id", "IEDB_anno",
"nmer", "q_start", "q_stop", "WT.peptide", "s_start",
"s_end", "overlap_length", "mismatch_length", "pident",
"evalue", "bitscore"))
blastdt <- blastdt[, :=(nmer, nmer %>% stringr::str_replace_all(pattern = "-|\",
replacement = ""))] %>% .[, :=(WT.peptide, WT.peptide %>%
stringr::str_replace_all(pattern = "-|\
", replacement = ""))] %>%
.[!is.na(nmer) & !is.na(WT.peptide)]
blastdt <- blastdt[nmer %like% "^[ARNDCQEGHILKMFPSTWYV]+$" &
WT.peptide %like% "^[ARNDCQEGHILKMFPSTWYV]+$"]
if (nrow(blastdt) == 0) {
message(paste("No IEDB matches found with cannonical AAs, can't compute foreignness score...."))
return(data.table::data.table(nmer = v))
}
message("Summing IEDB local alignments...")
blastdt[, :=(SW, make_sw_alignment(nmer, WT.peptide))]
modeleR <- function(als, a = 26, k = 4.86936) {
be <- -k * (a - als)
sumexp <- sum(exp(be))
Zk <- 1 + sumexp
R <- sumexp/Zk
return(R)
}
blastdt[, :=(foreignness_score, SW %>% modeleR()), by = "nmer_id"]
fa <- Biostrings::readAAStringSet(db %>% stringr::str_replace("bdb$",
"fasta") %>% stringr::str_replace("^-db\ ", ""))
f <- fa %>% as.character()
names(f) <- names(fa)
blastdt[, :=(IEDB_anno, lapply(IEDB_anno, function(i) {
mv <- f[which(stringr::str_detect(pattern = stringr::fixed(i),
names(f)))]
mv <- mv[which(stringr::str_detect(pattern = stringr::fixed(WT.peptide),
mv))]
return(paste(names(mv), WT.peptide, collapse = "|"))
})), by = 1:nrow(blastdt)]
anndt <- blastdt[, .SD %>% unique(), .SDcols = c("nmer_id",
"nmer", "IEDB_anno", "foreignness_score", "SW")]
blastdt <- blastdt[, .SD %>% unique(), .SDcols = c("nmer_id",
"foreignness_score")]
sdt[, :=(nmer_id, as.character(nmer_id))]
blastdt[, :=(nmer_id, as.character(nmer_id))]
sdt <- merge(sdt, blastdt, by = "nmer_id")
sdt %>% data.table::setnames(".", "nmer")
anndt <- anndt[, :=(msw, max(SW)), by = "nmer_id"] %>%
.[SW == msw] %>% .[, .SD %>% unique(), .SDcols = c("nmer_id",
"IEDB_anno")]
anndt[, :=(IEDB_anno, paste(IEDB_anno %>% unique(), collapse = "|")),
by = "nmer_id"]
anndt %<>% unique
anndt[, :=(nmer_id, as.character(nmer_id))]
sdt <- merge(sdt, anndt, by = "nmer_id", all.x = TRUE)
sdt <- sdt[, .SD %>% unique(), .SDcols = c("nmer", "foreignness_score",
"IEDB_anno")]
return(sdt)
}

@leeprichman
Copy link
Collaborator

blastdt <- list.files(pattern = "iedbout\\.csv")
if (length(blastdt) == 0) {
message("No blast output against IEDB returned.")
return(data.table::data.table(nmer = v))
}

May explain whats happening here. BLAST may now be in path but no file is being output from blast to read. Not sure why the canned message is not displaying that explains this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants