
Think I can't connect to Sirius because of the new 5.0 version #33

Open
jsaintvanne opened this issue Mar 14, 2023 · 18 comments
Labels
bug Something isn't working

Comments

@jsaintvanne

Hi,

I would like to try your tool and workflow, but when I launch your test data I get a lot of output, including the following:

WARNING 10:47:05 - 4: Cannot parse retention time: 'NAs'
WARNING 10:47:05 - Could not load GrbSolver! Sirius was compiled with the following ILP solvers: GLPK-v1.7.0 (included), Gurobi-v9.1.1, CPLEX-v12.7.1, COIN-OR-v1.17.3: gurobi/GRBException
WARNING 10:47:05 - Could not load CPLEXSolver! Sirius was compiled with the following ILP solvers: GLPK-v1.7.0 (included), Gurobi-v9.1.1, CPLEX-v12.7.1, COIN-OR-v1.17.3: ilog/concert/IloNumVar
WARNING 10:47:07 - Error when try to connect to Server. Try again in 4.0s 
 Cause: Connection reset
WARNING 10:47:08 - Error when try to connect to Server. Try again in 4.0s 
 Cause: Error when querying REST service. Bad Response Code: 404 | Message: Not Found| Content: <html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx</center>
</body>
</html>

After that, it keeps retrying the connection in a loop and never finishes... I think SIRIUS closed all connections for the 4.9 version...

Can someone help me, please?

@zmahnoor14
Owner

Hello,

Yes, the error is due to SIRIUS4: the web services linked to SIRIUS4 are not supported anymore.
We are currently working on adding the latest SIRIUS version (in a non-interactive way), however, the SIRIUS5 integration might take some time.

Could you let me know whether you are using MAW to run your dataset, and how big your whole dataset (all .mzML files) is? I can help with using the workflow interactively with SIRIUS5.

Mahnoor

@jsaintvanne
Author

Hello,

Thanks for your answer; it is much clearer to me now why it doesn't work.

We have to test on 4 DDA pos files:
[image]

And 4 DDA neg files:
[image]

It is not much, but it is just to test your tool and see what it is able to do.

Could you tell me how you think we should proceed? First the R part, then export the files to run the SIRIUS GUI, then the R part again for the post-SIRIUS steps?
Thanks!

@zmahnoor14
Owner

Alright. Here are a few points that you can consider; let me know what you think:

  1. If you only plan to use SIRIUS5, you can use the MAW-R module (without the spectral database dereplication function which takes substantial time, based on the size of your file). This option will only give .ms input files for SIRIUS.

  2. Do you also want to run your data against spectral DBs (HMDB, MassBank, GNPS)? If yes, you will need the three databases as R objects, which I can share with you. In this option you run the whole MAW-R module. At the end you will have spectral database dereplication results AND .ms files that can be used as input to SIRIUS.

Once you have run the MAW-R module and have the .ms files, I can also write a quick script to run SIRIUS for all files instead of using the GUI, but that depends on your preference. Using the script is only possible if you already have the SIRIUS CLI installed on your system and have already logged in with your credentials in the terminal. Also, which OS are you using?
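Such a batch script could look roughly like the sketch below. This is a hypothetical example, not part of MAW: it assumes the `sirius` CLI is on your PATH and that you already ran `sirius login`; the `--input`/`--output` flags and the `formula structure` subcommands are illustrative and should be checked against `sirius --help` for your installed SIRIUS 5 version. All paths are placeholders.

```shell
#!/bin/sh
# Hypothetical batch runner: invoke the SIRIUS CLI once per .ms file.
# Assumes `sirius` is on PATH and `sirius login` was already done.
MS_DIR="${1:-./insilico/SIRIUS}"   # directory with .ms files produced by MAW-R
OUT_DIR="${2:-./sirius_results}"
mkdir -p "$OUT_DIR"
for ms in "$MS_DIR"/*.ms; do
  [ -e "$ms" ] || continue         # glob matched nothing: skip the loop
  name=$(basename "$ms" .ms)
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Dry-run mode: print the command instead of executing it,
    # useful for checking paths before a long run.
    echo "sirius --input $ms --output $OUT_DIR/$name formula structure"
  else
    sirius --input "$ms" --output "$OUT_DIR/$name" formula structure
  fi
done
```

Running it as `DRY_RUN=1 sh run_sirius_batch.sh insilico/SIRIUS results/` first shows the commands it would execute without touching SIRIUS.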

Also, the workflow for now takes only one file (we are working on a different route for parallelisation), so you can only run one file at a time for now.

My next question would be: Would you like a docker container for this task?

Let me know your thoughts here or write to me on [email protected]

Hope this information helps.

Kind regards,
Mahnoor

@LiZhihua1982

Dear Mahnoor,
I also met this problem. "I can also write quick script to run SIRIUS for all files instead of using the GUI......" That would be very useful! Thank you very much!

Best regards

Li Zhihua

@LiZhihua1982

Hi, I've hit another problem, maybe with a similar cause, shown below:
The following object is masked from ‘package:readr’:

parse_date

Error in unserialize(node$con) :
MultisessionFuture () failed to receive results from cluster RichSOCKnode #1 (PID 105 on ‘localhost’). The reason reported was ‘error reading from connection’. Post-mortem diagnostic: The total size of the 31 globals exported is 1.63 MiB. The three largest globals are ‘spec_dereplication_file’ (479.83 KiB of class ‘function’), ‘order’ (358.62 KiB of class ‘function’) and ‘sirius_param’ (114.21 KiB of class ‘function’)
Calls: future ... resolved -> resolved.ClusterFuture -> receiveMessageFromWorker
Execution halted

@LiZhihua1982

Hi, I have not found the file spectral_results_for_xxxx.csv in the folder spectral_dereplication:
root@ed67702f821e:/opt/workdir/data/HY1/spectral_dereplication# ls
GNPS HMDB MassBank

@zmahnoor14
Owner

Hello,

Please refer to the updated sections in the "provenance" branch README.md.

README.md link from provenance branch

I have described SIRIUS5 in a section separate from MAW-R; following the steps there to use SIRIUS5 should work now. It is important to adjust the parameters of the function run_sirius according to your data. The function also only takes results from MAW-R, which must be run beforehand so that the .ms input files are in the directory /file_Name/insilico/SIRIUS. A list of the .ms files and their corresponding .json output files is written to /file_Name/insilico/MS1DATA_SiriusP.tsv.

I hope this is clear enough to run SIRIUS5. Please let me know if you encounter any further issues.

@zmahnoor14
Owner

Hi I have not found the file spectral_results_for_xxxx.csv in the folder spectral_dereplication root@ed67702f821e:/opt/workdir/data/HY1/spectral_dereplication# ls GNPS HMDB MassBank

I would assume that the function didn't finish, because this file is generated only after the function completes. Did you encounter any error message, or do you think the function was interrupted?

@zmahnoor14
Owner

Hi I meet another maybe similar reason problem as below, The following object is masked from ‘package:readr’:

parse_date

Error in unserialize(node$con) : MultisessionFuture () failed to receive results from cluster RichSOCKnode #1 (PID 105 on ‘localhost’). The reason reported was ‘error reading from connection’. Post-mortem diagnostic: The total size of the 31 globals exported is 1.63 MiB. The three largest globals are ‘spec_dereplication_file’ (479.83 KiB of class ‘function’), ‘order’ (358.62 KiB of class ‘function’) and ‘sirius_param’ (114.21 KiB of class ‘function’) Calls: future ... resolved -> resolved.ClusterFuture -> receiveMessageFromWorker Execution halted

If you re-run the workflow, do you still get this error? Generally it should work, as this error comes from the parallelisation of the workflow. If you re-run and the error persists, let me know.

@LiZhihua1982

Hi,
Yes, I also get this error with both maw-r:1.0.0 and maw-r:1.0.1:
root@9e99c13fa386:/opt/workdir# Rscript --no-save --no-restore --verbose Workflow_R_Script.r >outputFile.txt
running
'/usr/local/lib/R/bin/R --no-echo --no-restore --no-save --no-restore --file=Workflow_R_Script.r'

Loading required package: foreach
Loading required package: iterators
Loading required package: S4Vectors
Loading required package: stats4
Loading required package: BiocGenerics

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:stats’:

IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

anyDuplicated, append, as.data.frame, basename, cbind, colnames,
dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
union, unique, unsplit, which.max, which.min

Attaching package: ‘S4Vectors’

The following object is masked from ‘package:future’:

values

The following objects are masked from ‘package:base’:

expand.grid, I, unname

Loading required package: BiocParallel
Loading required package: ProtGenerics

Attaching package: ‘ProtGenerics’

The following object is masked from ‘package:stats’:

smooth

Attaching package: ‘Spectra’

The following object is masked from ‘package:ProtGenerics’:

addProcessing

Attaching package: ‘MsCoreUtils’

The following objects are masked from ‘package:Spectra’:

bin, smooth

The following objects are masked from ‘package:ProtGenerics’:

bin, smooth

The following object is masked from ‘package:stats’:

smooth

Attaching package: ‘dplyr’

The following object is masked from ‘package:MsCoreUtils’:

between

The following objects are masked from ‘package:S4Vectors’:

first, intersect, rename, setdiff, setequal, union

The following objects are masked from ‘package:BiocGenerics’:

combine, intersect, setdiff, union

The following objects are masked from ‘package:stats’:

filter, lag

The following objects are masked from ‘package:base’:

intersect, setdiff, setequal, union

Attaching package: ‘rvest’

The following object is masked from ‘package:readr’:

guess_encoding

Loading required package: Rcpp
Using libcurl 7.68.0 with OpenSSL/1.1.1f

Attaching package: ‘curl’

The following object is masked from ‘package:readr’:

parse_date

Error in unserialize(node$con) :
MultisessionFuture () failed to receive results from cluster RichSOCKnode #1 (PID 99 on ‘localhost’). The reason reported was ‘error reading from connection’. Post-mortem diagnostic: The total size of the 31 globals exported is 1.64 MiB. The three largest globals are ‘spec_dereplication_file’ (486.30 KiB of class ‘function’), ‘order’ (358.62 KiB of class ‘function’) and ‘sirius_param’ (114.21 KiB of class ‘function’)
Calls: future ... resolved -> resolved.ClusterFuture -> receiveMessageFromWorker
Execution halted

@zmahnoor14 zmahnoor14 added the bug Something isn't working label Apr 7, 2023
@LiZhihua1982

Dear Mahnoor,
Maybe this link http://gforge.se/2015/02/how-to-go-parallel-in-r-basics-tips/ will be useful for fixing this problem. Thanks.

@zmahnoor14
Copy link
Owner

Sorry for the delayed response here. I am currently trying to fix all these issues with CWL, but it is not ready yet. So, to work around these issues, here are some solutions:

  • Running spectral database dereplication: I understand the error comes from the future library. Since my collaborator @lgadelha added this function, I will ask him about this possible error. I don't get this error with the files I run here, so I would ask you to send me one example mzML file to [email protected], so that I can also take a look and try to reproduce the error.
    Meanwhile you can try to run the latest R script. You will need the following:
  1. Workflow_R_Script_all.r
  2. All Spectral databases downloaded from Zenodo
  3. your one input file .mzML
  4. COCONUT database for MetFrag
    Now, first pull the latest docker image and run the workflow script from the docker container. This container won't have all these files, so I recommend keeping all the above files (points 1-4) in your working directory (pwd).
    docker pull zmahnoor/maw-r:1.0.8
    docker run --name checkMAW_R -v $(pwd):/opt/workdir -i -t zmahnoor/maw-r:1.0.8 /bin/bash
    This way, the files in your pwd (working directory) are mounted to /opt/workdir inside the container. Now, from within the container, you can run:
    Rscript Workflow_R_Script_all.r your_input.mzML gnps.rda hmdb.rda mbankNIST.rda any-string-as-id 15 coconut COCONUT_Jan2022.csv &
    This will give you a PID for the task within the container. You can disown that PID to keep it running in the background. To leave the container, press CTRL+p and then CTRL+q to detach.
    Now your container will run in the background and write all the outputs to your pwd.
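One way to make the background step above robust is to start the workflow with nohup so it keeps running after you detach from the container. The file names follow the comment above; your_input.mzML is a placeholder for your own file, and workflow.log is just a chosen log name:

```shell
# Start the workflow in the background, immune to hangups, logging to a file.
nohup Rscript Workflow_R_Script_all.r your_input.mzML gnps.rda hmdb.rda \
  mbankNIST.rda any-string-as-id 15 coconut COCONUT_Jan2022.csv \
  > workflow.log 2>&1 &
# $! holds the PID of the background job, in case you want to monitor or kill it.
echo "workflow started with PID $!"
```

You can then detach with CTRL+p, CTRL+q and later check progress with `tail -f workflow.log` from inside the container or from the mounted pwd.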

You will get the spectral database results, and in insilico folder you will find metfrag parameter files (if you want to use metfrag CLI) and SIRIUS parameter files (that you can use to run SIRIUS or SIRIUS CLI).

Once you have these results, you can ideally use MAW-Py to perform candidate selection. I have already prepared scripts for candidate selection with spectral DBs and MetFrag, but due to some changes in the SIRIUS results, that is not currently possible. However, I will work on it after MAW-CWL is ready.

Let me know if this solves the problem. If the future library problem persists, can you send me an example file, along with your system's RAM size? I will try to recreate the error and solve it with Luiz.
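A future worker dying with "failed to receive results ... error reading from connection" is often the worker process being killed for lack of memory, which is why the RAM size matters. As a first diagnostic (a general sketch, not a confirmed fix for this issue), one could check the host's memory and whether the kernel OOM killer fired, and if so rerun the container with an explicit memory limit:

```shell
# How much RAM does the host have, and how much is free? (Linux)
free -h

# Did the kernel OOM-killer terminate a process recently? (may need root)
dmesg 2>/dev/null | grep -iE 'killed process|out of memory' | tail -n 5

# If memory is the bottleneck, Docker's --memory flag sets an explicit limit;
# 16g is an example value, adjust to your machine:
#   docker run --name checkMAW_R --memory 16g -v $(pwd):/opt/workdir \
#     -i -t zmahnoor/maw-r:1.0.8 /bin/bash
```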

Kind regards,
Mahnoor

@LiZhihua1982

LiZhihua1982 commented May 1, 2023 via email

@LiZhihua1982

LiZhihua1982 commented May 1, 2023 via email

@LiZhihua1982

LiZhihua1982 commented May 3, 2023 via email

@LiZhihua1982

LiZhihua1982 commented May 3, 2023 via email

@zmahnoor14
Owner

Dear Li Zhihua,

Sorry for my late response; I am still a PhD student and am currently trying to finish my projects, so I have a lot of work on my hands :)

Here is the link to download the COCONUT database: https://upload.uni-jena.de/data/6459b9bd932378.80487188/COCONUT_Jan2022.csv

I would also suggest using the updated HMDB 5.0 version; here is the download link: https://upload.uni-jena.de/data/6459ba4f540612.50374105/hmdb.rda

Regarding Workflow_R_Script_all.r, you can download the latest version from the GitHub repository: https://github.com/zmahnoor14/MAW/blob/provenance/cwl/Workflow_R_Script_all.r
