try to move on from six.moves #36

Open
mpoelchau opened this issue Jun 4, 2020 · 3 comments

@mpoelchau
Contributor

Do we still need the six.moves library for urlretrieve?

https://github.com/NAL-i5K/NAL_RNA_seq_annotation_pipeline/blob/update-rnannot/rnannot/RNAseq_annotate.py#L12

This might be a moot issue if we figure out a different, more reliable way to download SRA data.

@HsiuKangHuang
Contributor

We can use wget, curl, or urlretrieve to download SRA files from NCBI, but these methods are not recommended.
Downloading files over HTTPS is really slow, and many SRA samples will not work because the reference genome used for their compression is missing. (http://www.metagenomics.wiki/tools/short-read/ncbi-sra-file-format/wget-download)
We may get this kind of error: "name not found while resolving tree within virtual file system module - failed"

A better way to download SRA files is to use prefetch or fasterq-dump (both are included in sratoolkit).

  • prefetch downloads the whole SRR file. The default output directory is /home/[user]/ncbi/public/sra/; it can be changed in the SRA toolkit default directory configuration.
    However, when I tried this on Ceres, nothing happened after I ran the prefetch command.

  • fasterq-dump also downloads the SRR file and converts it to FASTQ format.
    I tried this but got the error: Failed to call external service.
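
For reference, here is a rough sketch of how the pipeline might call the two tools from Python instead of urlretrieve. The helper names (build_prefetch_cmd, build_fasterq_dump_cmd, download_fastq) and the accession SRR000001 are hypothetical and not taken from RNAseq_annotate.py. It assumes prefetch and fasterq-dump are on PATH, and where the prefetched file ends up may differ between sratoolkit versions.

```python
import shutil
import subprocess
from pathlib import Path


def build_prefetch_cmd(accession, out_dir):
    """Build the argument list for `prefetch` (hypothetical helper)."""
    return ["prefetch", accession, "--output-directory", str(out_dir)]


def build_fasterq_dump_cmd(accession, out_dir, threads=4):
    """Build the argument list for `fasterq-dump` (hypothetical helper)."""
    return [
        "fasterq-dump", accession,
        "--outdir", str(out_dir),
        "--threads", str(threads),
    ]


def download_fastq(accession, out_dir, threads=4):
    """Fetch one run with prefetch, then convert it to FASTQ.

    Assumes both tools are installed and that remote access is
    enabled in the sratoolkit configuration.
    """
    for tool in ("prefetch", "fasterq-dump"):
        if shutil.which(tool) is None:
            raise RuntimeError(tool + " not found on PATH; is sratoolkit installed?")

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # check=True raises CalledProcessError if a tool exits non-zero,
    # so a failed download is not silently ignored.
    subprocess.run(build_prefetch_cmd(accession, out_dir), check=True)
    subprocess.run(build_fasterq_dump_cmd(accession, out_dir, threads), check=True)
    return out_dir
```

Something like download_fastq("SRR000001", "fastq_out") would then replace the urlretrieve call. Keeping the command construction in separate functions makes it possible to check the arguments without touching the network.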

@HsiuKangHuang
Contributor

HsiuKangHuang commented Jun 11, 2020

I found a way to solve this problem.
Before using prefetch and fasterq-dump, I need to make sure that remote access is enabled in the sratoolkit configuration. (https://ncbi.github.io/sra-tools/install_config.html)
Run vdb-config -i and enable remote access. (Press 'M' to go to the main page, then press 'E' to enable it. Finally, press 's' and 'x' to save and exit the configuration page.)

@mpoelchau
Contributor Author

It works! I'd just suggest removing the six.moves requirement - https://github.com/NAL-i5K/NAL_RNA_seq_annotation_pipeline/blob/update-rnannot/rnannot/RNAseq_annotate.py#L12
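
If urlretrieve is still needed anywhere and the pipeline only targets Python 3, the standard library provides it directly, so the six.moves import could probably be swapped for something like the sketch below. The fetch wrapper is a hypothetical name for illustration; I have not checked how the import on that line is actually written.

```python
# Python 3 only: urlretrieve lives in the standard library,
# so six.moves is not required for it.
from urllib.request import urlretrieve


def fetch(url, dest):
    """Download url to dest and return the local path (hypothetical helper)."""
    local_path, _headers = urlretrieve(url, dest)
    return local_path
```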
