zipfile.BadZipFile: File is not a zip file #39

Open
Intellouis opened this issue Oct 10, 2023 · 0 comments

When I run "python local_nlp_evaluation.py" in https://gitlab.aicrowd.com/aicrowd/challenges/iglu-challenge-2022/iglu-2022-rl-mhb-baseline, it should download a dataset using IGLU gridworld, as written in local_nlp_evaluation.py, line 29:
dataset = IGLUDataset(task_kwargs=None, force_download=False, )

This calls the library function in ~/.conda/envs/iglu_mhb/lib/python3.9/site-packages/gridworld/data/iglu_dataset.py, line 181:
download(url=url, destination=path, data_prefix=data_path, description='downloading multiturn dataset')

That in turn calls download(...), defined in ~/.conda/envs/iglu_mhb/lib/python3.9/site-packages/gridworld/data/load.py, line 9:

def download(url, destination, data_prefix, description='downloading dataset into'):
    os.makedirs(data_prefix, exist_ok=True)
    r = requests.get(url, stream=True)
    CHUNK_SIZE = 1048576
    total_length = int(r.headers.get('content-length'))
    with open(destination, "wb") as f:
        with tqdm(desc=description, 
                  total=(total_length // CHUNK_SIZE) + 1) as pbar:
            for chunk in r.iter_content(chunk_size=CHUNK_SIZE): 
                if chunk: # filter out keep-alive new chunks
                    f.write(chunk)
                    pbar.update(1)

I checked the values inside download(...) and found CHUNK_SIZE = 1048576, total_length = 248, and (total_length // CHUNK_SIZE) + 1 = 1. The downloaded zip file is 248 bytes on disk, which matches total_length.
However, the code then reaches ~/.conda/envs/iglu_mhb/lib/python3.9/site-packages/gridworld/data/iglu_dataset.py, line 186:
with ZipFile(path) as zfile:
There was an error:

zipfile.BadZipFile: File is not a zip file

This indicates that the downloaded file is not a complete zip archive, which confirms my suspicion that total_length (248 bytes) is unexpectedly small, far below CHUNK_SIZE. I wonder how to solve this problem.
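For anyone reproducing this, a minimal sketch of how the 248-byte file could be inspected is below. It only uses the standard library. The helper name describe_download and the 200-byte preview length are made up for illustration and are not part of gridworld. The idea is that a file this small may well be a text error body rather than a truncated archive, and printing its first bytes would show which.

```python
import zipfile
from pathlib import Path


def describe_download(path, preview_bytes=200):
    """Summarize a downloaded file: size, zip validity, and its first bytes.

    Hypothetical helper for debugging; not part of the gridworld package.
    """
    data = Path(path).read_bytes()
    return {
        "size": len(data),
        # is_zipfile looks for the zip end-of-central-directory record.
        "is_zip": zipfile.is_zipfile(path),
        # Zip archives normally start with the bytes b"PK";
        # an HTML, XML, or JSON error body does not.
        "starts_with_pk": data[:2] == b"PK",
        # Decode leniently so binary content does not raise.
        "preview": data[:preview_bytes].decode("utf-8", errors="replace"),
    }
```

Calling describe_download on the path passed to ZipFile and printing the "preview" entry should make it clear whether the server returned an archive or some other response.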

The whole traceback is as follows, for your reference:

(iglu_mhb) lab@lab-Precision-Tower-7910:~/Desktop/IGLU/codes/iglu-2022-rl-mhb-baseline$ python local_nlp_evaluation.py 

Invalid data stream
Loading parsed dataset failed. Downloading full dataset.
downloading multiturn dataset: 100%|███████████████| 1/1 [00:00<00:00, 799.98it/s]
Traceback (most recent call last):
  File "/home/lab/Desktop/IGLU/codes/iglu-2022-rl-mhb-baseline/local_nlp_evaluation.py", line 62, in <module>
    main()
  File "/home/lab/Desktop/IGLU/codes/iglu-2022-rl-mhb-baseline/local_nlp_evaluation.py", line 29, in main
    dataset = IGLUDataset(task_kwargs=None, force_download=False, )
  File "/home/lab/.conda/envs/iglu_mhb/lib/python3.9/site-packages/gridworld/data/iglu_dataset.py", line 139, in __init__
    self.download_dataset(data_path, force_download)
  File "/home/lab/.conda/envs/iglu_mhb/lib/python3.9/site-packages/gridworld/data/iglu_dataset.py", line 188, in download_dataset
    with ZipFile(path) as zfile:
  File "/home/lab/.conda/envs/iglu_mhb/lib/python3.9/zipfile.py", line 1266, in __init__
    self._RealGetContents()
  File "/home/lab/.conda/envs/iglu_mhb/lib/python3.9/zipfile.py", line 1333, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file
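
One possible way to make this failure easier to diagnose is to verify the file right after writing it, instead of letting ZipFile fail later. The sketch below is an assumption about how that could look, not the library's code: save_and_verify_zip is a hypothetical name, and it takes any iterable of byte chunks so it does not depend on how the request is made.

```python
import os
import zipfile


def save_and_verify_zip(chunks, destination):
    """Write byte chunks to `destination`, then check the result is a zip archive.

    `chunks` is any iterable of bytes objects, for example the value of
    `r.iter_content(chunk_size=CHUNK_SIZE)` from the requests library.
    Hypothetical helper; not part of the gridworld package.
    """
    with open(destination, "wb") as f:
        for chunk in chunks:
            if chunk:  # skip empty keep-alive chunks
                f.write(chunk)

    if not zipfile.is_zipfile(destination):
        size = os.path.getsize(destination)
        with open(destination, "rb") as f:
            preview = f.read(200).decode("utf-8", errors="replace")
        # Remove the bad file so it cannot be mistaken for a finished download.
        os.remove(destination)
        raise RuntimeError(
            f"Downloaded {size} bytes that are not a zip archive; "
            f"the response starts with: {preview!r}"
        )
    return destination
```

Inside download(...), this could be combined with calling r.raise_for_status() right after requests.get(url, stream=True), so that an HTTP error status is reported before anything is written to disk.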
@artemZholus artemZholus self-assigned this Oct 10, 2023