test existing pipeline implementation #19
Hi @bintriz I am having a hard time understanding your sample_list.txt format. I am looking at your code in
The pipeline accepts both kinds of files, bam and fastq, as input. If the input file type is bam, it converts the bam into fastq files and then proceeds forward. Three kinds of location are possible. At first, it only accepted a Synapse ID as the location, since it used the synapse client as the interface to download input files. Later, I added an interface to download files directly from the NDA using the aws client, and also added the ability to use local input files. The location where output files are uploaded is controlled by the parentid option. If that option is turned on, output files are uploaded to a certain Synapse folder, specified by its Synapse ID, and then deleted locally. Without this option, output files stay local. If you use AWS, the occupied space will be charged; that's the reason I added this option as the final step. If you rephrase the README file from your point of view, that would be great!
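As a hypothetical illustration of the three location kinds (the column names and exact layout here are assumptions; the repository's README has the authoritative format):

```
# sample_id   file_name              location
sample1       sample1.bam            syn12345678                    # Synapse ID
sample2       sample2.bam            s3://nda-bucket/sample2.bam    # NDA, fetched with the aws client
sample3       sample3_R1.fastq.gz    /data/sample3_R1.fastq.gz      # local path
```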
@bintriz I ran the pipeline. The command was
where
This failed with the following errors
The
Output file names are determined by the sample name in the sample_list file. If you'd like to use fastq files as input, you should put the R1 and R2 files together. This pipeline groups all inputs by sample name. So, if you have multiple input files due to separate library preparations or multiple sequencing runs, please just put all input files together and use the same sample name. Then, the pipeline groups all files and makes one merged, reprocessed bam file.
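For instance, paired-end files from two sequencing runs of one sample could all carry the same sample name, so they end up in a single merged bam (a hypothetical layout, matching the sketch above):

```
sample1    sample1_run1_R1.fastq.gz    /data/sample1_run1_R1.fastq.gz
sample1    sample1_run1_R2.fastq.gz    /data/sample1_run1_R2.fastq.gz
sample1    sample1_run2_R1.fastq.gz    /data/sample1_run2_R1.fastq.gz
sample1    sample1_run2_R2.fastq.gz    /data/sample1_run2_R2.fastq.gz
```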
Thanks for the explanations @bintriz! Make sure this information makes it into the README. This is slightly out of scope, but allowing multiple input formats and moving that logic into the code is not optimal. There should be separate steps for each task - if you start with
If somebody can make it optimal, that would be good. I just made it work.
@bintriz as you suggested I put the R1 and R2 files together in the following ways but received the same error as before. See details below. The input files are in this Synapse folder (syn17931318). The first way I tried was using local FASTQ files:
The second way was with the Synapse location:
In my environment, your sample file works. It looks like the error means the python library handling qsub somehow doesn't parse the job ID well on your pcluster. I'll look at it.
@bintriz would it help you if I gave you access to our AWS EC2 instance?
That's a good idea. I'll send my public key in a separate email. Please add it to ~/.ssh/authorized_keys. Then I should be able to log in to your AWS cluster.
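For reference, appending a key would look something like this (the key file name is hypothetical):

```bash
cat bintriz_id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
```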
Hi Attila,
I found the reason. This error is due to your SGE system not having a parallel environment named "threaded" set up, which this pipeline's job scripts rely on. The README already mentions it in https://github.com/bsmn/bsmn-pipeline#extra-set-up-for-sge. Please set this up and try again; it should work.
@bintriz, I didn't know that our AWS EC2 cluster was also an SGE system, therefore I ignored that part of the documentation since it seemed irrelevant for AWS EC2.
@bintriz, in any case, I ran the code in the documentation but got the following error message. Can you advise what might have gone wrong? Thanks.
I realized that a heredoc doesn't work with qconf -Ap. So, I separated it into two steps by creating a temp file. Try this below.
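A sketch of this two-step approach, assuming a typical SGE parallel environment definition named threaded; the slot count and the all.q queue name are assumptions, and the README linked above has the authoritative version:

```bash
# Step 1: write the PE definition to a temp file instead of piping a heredoc
cat > /tmp/threaded.pe <<'EOF'
pe_name            threaded
slots              99999
user_lists         NONE
xuser_lists        NONE
start_proc_args    NONE
stop_proc_args     NONE
allocation_rule    $pe_slots
control_slaves     FALSE
job_is_first_task  TRUE
urgency_slots      min
accounting_summary TRUE
EOF

# Step 2: register the PE from the file and attach it to the queue
qconf -Ap /tmp/threaded.pe
qconf -mattr queue pe_list threaded all.q
```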
Hi @bintriz I ran the qconf commands now with success; at least I didn't get any error messages. I submitted the same pair of small FASTQ files for mapping as before; see my previous note in this thread. As before, I called your script. This time I didn't get any error running it
Hi @kdaily sorry if this is not the best place to report this kind of issue, but I got an error. With the command line synapse client:
With the python synapseclient:
genome_mapping.sh is just a submitter of jobs to SGE. The qstat command of SGE will give you the current job status. By the way, if you run the same samples twice with the same sample ID, it would be a problem. My job scripts rely on the sample ID to create the working directory. So, two sets of jobs with the same sample ID would compete with each other and try to overwrite files with the same file names.
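For example, standard SGE commands for checking on jobs look like this (the job ID is a placeholder):

```bash
qstat -u "$USER"   # list pending/running jobs for the current user
qstat -j 12345     # details for one job, e.g. why it is still queued
qacct -j 12345     # accounting record (including exit_status) once a job has finished
```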
Thanks for the explanation @bintriz. Hopefully the two jobs won't mess up each other because they differ in their input: for the first job it's local files and for the second job it's files on Synapse. The fact that the files in the two locations are copies of each other shouldn't matter, should it?
Once the files are downloaded, all of the names of intermediate and result files are determined based on the sample ID, using it as a prefix. So, if you use the same sample ID, it will be a problem.
I see. I've just checked the two jobs with qstat.
The start of both jobs had exit status 100, as assessed by
I deleted both jobs and now I'll try to resubmit only one of them :)
Hi @bintriz I've been checking the status of the mapping I submitted a week ago.
Something went wrong in
The logfile
The previous error was raised when I tried to map fastq files stored on the AWS EC2 instance. I deleted the pending job from the queue and started a new run, but this time with the same fastq files stored on the BSMN scratch space on Synapse.
Hi @bintriz two days ago I submitted the mapping jobs for the small test FASTQs that are stored on Synapse (the BSMN Scratch Space). The jobs are still in the waiting queue. What do you think, is a two-day wait in the queue normal?
Hi @bintriz! The latest run also failed, but this time the error occurred in a different step. Apart from the error, the jobs were stuck in the waiting queue. Below are the details.
@attilagk I can't think of any reason other than misconfiguration that your jobs would not run. All nodes that you are requesting should be built immediately, unless you are using spot pricing for EC2 instances. The default for CFN (and ParallelCluster) is to use on-demand nodes though.
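For reference, in a ParallelCluster 2.x-style config the relevant setting is cluster_type; a minimal sketch (the section name and the spot price are illustrative):

```ini
[cluster default]
# On-demand is the default: requested nodes are built immediately.
cluster_type = ondemand

# For spot pricing instead (nodes may be delayed or reclaimed):
# cluster_type = spot
# spot_price = 0.50
```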
Exit code 100 is usually reserved for a 'general error', which is defined by the running application. Looks like @bintriz is doing something with that here: https://github.com/bsmn/bsmn-pipeline/blob/master/genome_mapping/job_scripts/pre_1.download.sh#L5
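One common way a job script yields a blanket exit code of 100 is an ERR trap; this is an assumed pattern for illustration only, not necessarily the exact code at that line:

```bash
#!/bin/bash
set -eu
# Any failing command below fires this trap, which reports the failure
# and exits with code 100 so SGE records the job as failed.
trap 'echo "error at line $LINENO" >&2; exit 100' ERR

# ... download commands would go here ...
```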
I'm going to open this issue to track the work related to the first milestone, which is getting the existing pipeline working for other people besides @bintriz!
I'm going to assign it to @bintriz and @attilagk as they are the ones working the most on it at this point.
Please do open specific issues related to bugs, feature requests, etc. that arise.