Downloading slice of BAM recognized as complete but BAM download is incomplete #184

Open
JasperO98 opened this issue Jan 2, 2023 · 9 comments
Labels
bug Something isn't working

JasperO98 commented Jan 2, 2023

When downloading a slice of a BAM file, PyEGA3 downloads only a few KB of data and then marks the download as complete.

Used versions

  • Operating System version: Ubuntu 22.04 LTS (5.10.102.1-microsoft-standard-WSL2) on Windows 10 Enterprise 22H2 (19045.2364)
  • Python version: 3.11.0
  • PyEGA3 version: PyEGA3-5.0.1

To Reproduce

Steps to reproduce the behaviour that led you to the bug.

  1. Try to download a slice (reference 19) of BAM file EGAF00005572695:
     pyega3 -c 10 -cf ../ega_credentials.json fetch --max-retries -1 --format BAM --output-dir . -r 19 EGAF00005572695
  2. After a few minutes the download stops and is marked as complete (see pyega3_output.log).

Additional context

I have already contacted the EGA helpdesk about this problem, but they could not resolve it. I also checked whether port 8443 is open on my side, and it is. Looking forward to your reply.

JasperO98 added the bug label Jan 2, 2023

@aaclan-ebi
Collaborator

Hi @JasperO98,

Were you referring to EGAF00005572698 or EGAF00005572695? The log file seems truncated as well. Do you still have the full logs?

I looked into this, and the log says the client successfully downloaded a 221 KB BAM file for accession EGAF00005572695. That file should contain all the reads matching the reference name 19, which was specified with the -r argument in the command. Do you still have the incomplete file you downloaded? Would you be able to send it to our helpdesk ([email protected]) for us to check?

Many thanks
Alegria

@JasperO98
Author

Hey @aaclan-ebi,
Sorry I was referring to EGAF00005572695 and have updated the log file in the initial issue message.
Yes, the download is reported as complete, but only 221 KB of data was downloaded, whereas I'm expecting a file of 4 GB.
I've already sent the incomplete file to the helpdesk, but they were not able to help me.

@aaclan-ebi
Collaborator

I see. Thanks, @JasperO98, we will investigate this further.

@CsabaHalmagyi
Contributor

@JasperO98 Could you please confirm whether the issue still exists? We have pushed updates to both the API and the client since this issue was created.

@JasperO98
Author

Yes, the problem still persists with PyEGA3 5.0.2 (see pyega3_output.log).

@CsabaHalmagyi
Contributor

CsabaHalmagyi commented Sep 29, 2023

@JasperO98 In the log you attached I do not see any sign that the client is actually downloading the file. The client checks whether a subdirectory named after the file ID already exists at the download location and contains files; if it does, it won't attempt to download the file. Could you please check whether a directory named EGAF00005572695 is already present in the output directory and, if so, delete it and retry downloading the slice?
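
A minimal sketch of that check, assuming the layout described in this comment (a subdirectory named after the file ID inside the output directory). The helper name `clear_stale_download` is hypothetical and not part of pyega3; it only lists the leftovers unless `dry_run=False` is passed.

```python
import shutil
from pathlib import Path


def clear_stale_download(output_dir: str, file_id: str, dry_run: bool = True) -> list[str]:
    """List, and optionally delete, what is left in <output_dir>/<file_id>.

    Hypothetical helper for illustration only; it is not part of pyega3.
    Returns the names of the entries found in the subdirectory (empty list
    if the subdirectory does not exist).
    """
    target = Path(output_dir) / file_id
    if not target.is_dir():
        return []
    leftovers = sorted(p.name for p in target.iterdir())
    if not dry_run:
        # Remove the whole subdirectory so the next fetch starts from scratch.
        shutil.rmtree(target)
    return leftovers
```

Calling `clear_stale_download(".", "EGAF00005572695")` first shows what would be removed; repeating the call with `dry_run=False` deletes the directory before the fetch is retried.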

@JasperO98
Author

I tried that just now and it results in the same issue.

@CsabaHalmagyi
Contributor

Could you also try the following command (delete the previously downloaded data first)?

pyega3 -c 10 -cf ../ega_credentials.json fetch --max-retries -1 --format BAM --output-dir . -r chr19 EGAF00005572695

(Submitters might use different reference naming conventions when uploading files, e.g. 19 vs. chr19.)
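
One way to see which naming convention a file uses is to read the reference names from the BAM header of whatever was already downloaded. `samtools view -H` is the usual tool for this; the sketch below does the same with the Python standard library only. It assumes the downloaded slice still carries the full reference dictionary in its binary header, and the function name `bam_reference_names` is made up for this example.

```python
import gzip
import struct


def bam_reference_names(path: str) -> list[str]:
    """Return the reference sequence names stored in a BAM file's binary header.

    Illustrative helper, not part of pyega3 or htsget.
    """
    # BAM is BGZF-compressed; BGZF is a series of gzip members, which
    # gzip.open decompresses transparently.
    with gzip.open(path, "rb") as fh:
        if fh.read(4) != b"BAM\x01":
            raise ValueError("not a BAM file (bad magic bytes)")
        (l_text,) = struct.unpack("<I", fh.read(4))
        fh.read(l_text)  # skip the plain-text SAM header
        (n_ref,) = struct.unpack("<I", fh.read(4))
        names = []
        for _ in range(n_ref):
            (l_name,) = struct.unpack("<I", fh.read(4))
            # Names are NUL-terminated; l_name includes the terminator.
            names.append(fh.read(l_name).rstrip(b"\x00").decode("ascii"))
            fh.read(4)  # skip l_ref, the reference length
        return names
```

If the returned list contains `19` but not `chr19`, then `-r 19` is the spelling the server expects for this file.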

@JasperO98
Author

Unfortunately that also does not work.

pyega3 -c 10 -cf ../ega_credentials.json fetch --max-retries -1 --format BAM --output-dir . -r chr19 EGAF00005572695
[2023-09-29 13:00:59 +0200]
[2023-09-29 13:00:59 +0200] pyEGA3 - EGA python client version 5.0.2 (https://github.com/EGA-archive/ega-download-client)
[2023-09-29 13:00:59 +0200] Parts of this software are derived from pyEGA (https://github.com/blachlylab/pyega) by James Blachly
[2023-09-29 13:00:59 +0200] Python version : 3.11.5
[2023-09-29 13:00:59 +0200] OS version : Linux #2311-Microsoft Tue Nov 08 17:09:00 PST 2022
[2023-09-29 13:00:59 +0200] Server URL: https://ega.ebi.ac.uk:8443/v2
[2023-09-29 13:00:59 +0200] Session-Id: 325751053
[2023-09-29 13:01:00 +0200]
[2023-09-29 13:01:00 +0200] Authentication success for user '[email protected]'
[2023-09-29 13:01:00 +0200] File Id: 'EGAF00005572695'(236223204951 bytes).
[2023-09-29 13:01:00 +0200] Total space : 7452.02 GiB
[2023-09-29 13:01:00 +0200] Used space : 6552.95 GiB
[2023-09-29 13:01:00 +0200] Free space : 899.07 GiB
Traceback (most recent call last):
  File "/home/jasper/anaconda3/envs/pyega/lib/python3.11/site-packages/htsget/io.py", line 96, in __get
    response.raise_for_status()
  File "/home/jasper/anaconda3/envs/pyega/lib/python3.11/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error:  for url: https://ega.ebi.ac.uk:8443/v2/htsget/reads/EGAF00005572695?referenceName=chr19&format=BAM

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/jasper/anaconda3/envs/pyega/bin/pyega3", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/jasper/anaconda3/envs/pyega/lib/python3.11/site-packages/pyega3/pyega3.py", line 150, in main
    execute_subcommand(args, data_client)
  File "/home/jasper/anaconda3/envs/pyega/lib/python3.11/site-packages/pyega3/libs/commands.py", line 22, in execute_subcommand
    fetch_data(args, data_client)
  File "/home/jasper/anaconda3/envs/pyega/lib/python3.11/site-packages/pyega3/libs/commands.py", line 49, in fetch_data
    file.download_file_retry(num_connections=args.connections,
  File "/home/jasper/anaconda3/envs/pyega/lib/python3.11/site-packages/pyega3/libs/data_file.py", line 307, in download_file_retry
    htsget.get(
  File "/home/jasper/anaconda3/envs/pyega/lib/python3.11/site-packages/htsget/io.py", line 81, in get
    manager.run()
  File "/home/jasper/anaconda3/envs/pyega/lib/python3.11/site-packages/htsget/protocol.py", line 143, in run
    self.__retry(self._handle_ticket_request)
  File "/home/jasper/anaconda3/envs/pyega/lib/python3.11/site-packages/htsget/protocol.py", line 114, in __retry
    method(*args)
  File "/home/jasper/anaconda3/envs/pyega/lib/python3.11/site-packages/htsget/io.py", line 147, in _handle_ticket_request
    first_piece = next(stream, "").decode(encoding)
                  ^^^^^^^^^^^^^^^^
  File "/home/jasper/anaconda3/envs/pyega/lib/python3.11/site-packages/htsget/io.py", line 107, in _stream
    response = self.__get(url, headers=headers, stream=True, timeout=self.timeout)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jasper/anaconda3/envs/pyega/lib/python3.11/site-packages/htsget/io.py", line 100, in __get
    raise exceptions.ClientError(str(he), response.text)
htsget.exceptions.ClientError: 404 Client Error:  for url: https://ega.ebi.ac.uk:8443/v2/htsget/reads/EGAF00005572695?referenceName=chr19&format=BAM:{"htsget":{"timestamp":"2023-09-29T11:01:01.063+00:00","url":"http://ega.ebi.ac.uk/v2/htsget/reads/EGAF00005572695","error":"NotFound","message":"Sequence \"chr19\" not found"}}
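
The last line of the traceback carries a JSON payload from the server after the URL. Its message, `Sequence "chr19" not found`, indicates that this file has no reference named chr19, which fits the earlier `-r 19` request being accepted. A small sketch for pulling that payload out of the exception text follows; it assumes the message keeps the `<HTTP error>:{json}` shape shown in this log, and `parse_htsget_error` is a name invented here, not an htsget or pyega3 function.

```python
import json


def parse_htsget_error(message: str) -> dict:
    """Extract the "htsget" error object from a ClientError message string.

    Illustrative helper only. Returns an empty dict when no JSON payload
    with an "htsget" object can be found.
    """
    start = message.find("{")
    if start == -1:
        return {}
    try:
        payload = json.loads(message[start:])
    except json.JSONDecodeError:
        return {}
    if not isinstance(payload, dict):
        return {}
    details = payload.get("htsget", {})
    return details if isinstance(details, dict) else {}


if __name__ == "__main__":
    # Shortened copy of the message from the log above.
    sample = (
        r'404 Client Error:  for url: https://ega.ebi.ac.uk:8443/v2/htsget/reads/'
        r'EGAF00005572695?referenceName=chr19&format=BAM:'
        r'{"htsget":{"error":"NotFound","message":"Sequence \"chr19\" not found"}}'
    )
    details = parse_htsget_error(sample)
    print(details.get("error"), "-", details.get("message"))
```

Catching `htsget.exceptions.ClientError` around the fetch and passing `str(exc)` to a helper like this would let a script tell a wrong reference name apart from other failures.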
