Data download error: time data does not match format #523

Open
junkoda opened this issue Dec 7, 2023 · 2 comments

@junkoda

junkoda commented Dec 7, 2023

Sorry for reopening this issue, but I encountered it again, this time with a non-English locale.

#484

The previous fix handled AM/PM in the en_US locale but still assumed that the locale is English. At the time, I was not sure whether the Python locale could be non-English, but it indeed can be.

The error

$ kaggle competitions download -c blood-vessel-segmentation
time data 'Fri, 03 Nov 2023 22:06:42 GMT' does not match format '%a, %d %b %Y %H:%M:%S %Z'

kaggle==1.5.16
Ubuntu 22.04.3 LTS
Python 3.11.5 (miniconda)

$ echo $LC_TIME
ja_JP.UTF-8

An easy workaround on the user side is:

export LC_ALL=C

Reproduce

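from datetime import datetime
import locale

# Show the current LC_TIME locale and how it renders the format
print(locale.getlocale(locale.LC_TIME))
print(datetime.now().strftime('%a, %d %b %Y %H:%M:%S %Z'))

# Parse an English timestamp; this raises ValueError under a non-English LC_TIME
s = 'Fri, 03 Nov 2023 22:06:42 GMT'
t = datetime.strptime(s, '%a, %d %b %Y %H:%M:%S %Z')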

Saving this script as locale_error.py and running it gives:

$ python3 locale_error.py
('ja_JP', 'UTF-8')
木, 07 12月 2023 14:23:15
Traceback ...
ValueError: time data 'Fri, 03 Nov 2023 22:06:42 GMT' does not match format '%a, %d %b %Y %H:%M:%S %Z'

%a and %b in strptime() are locale-dependent, and the locale of my new computer is Japanese.
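To see this dependence directly, here is a small sketch (the helper name `abbrevs_under` is hypothetical) that renders the weekday and month abbreviations of the same date under several LC_TIME settings. It assumes glibc-style locale names like `de_DE.UTF-8`; locales that are not installed are reported as `None` instead of raising.

```python
import locale
from datetime import datetime


def abbrevs_under(loc):
    """Render the weekday/month abbreviations of 2023-11-03 with LC_TIME=loc.

    Returns None if `loc` is not installed on this system.
    """
    saved = locale.setlocale(locale.LC_TIME)  # query only, changes nothing
    try:
        locale.setlocale(locale.LC_TIME, loc)
    except locale.Error:
        return None
    try:
        return datetime(2023, 11, 3).strftime('%a %b')
    finally:
        locale.setlocale(locale.LC_TIME, saved)  # restore the previous LC_TIME


for loc in ('C', 'de_DE.UTF-8', 'ja_JP.UTF-8'):
    print(loc, abbrevs_under(loc))
```

Under `C` this gives the English `Fri Nov` that the timestamp uses; any locale that renders something else will make `strptime()` reject the English string.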

This can be reproduced with:

$ LC_TIME=ja_JP.utf8 python3 locale_error.py

if that locale is available on your OS:

$ locale -a
C.utf8
en_US.utf8
...
ja_JP.utf8

but it probably is not. To add a locale, say German, on Ubuntu:

$ sudo locale-gen de_DE.utf8
$ locale -a
...
de_DE.utf8
$ LC_TIME=de_DE.utf8 python3 locale_error.py
('de_DE', 'UTF-8')
Do, 07 Dez 2023 14:27:45
...
ValueError: time data 'Fri, 03 Nov 2023 22:06:42 GMT' does not match format '%a, %d %b %Y %H:%M:%S %Z'
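The same check can be run from inside Python for several locales at once. This is a minimal sketch with a hypothetical helper `try_parse_under`; it sets LC_TIME explicitly with `locale.setlocale()` instead of relying on the environment, and reports locales that are not installed rather than failing.

```python
import locale
from datetime import datetime

FMT = '%a, %d %b %Y %H:%M:%S %Z'
SAMPLE = 'Fri, 03 Nov 2023 22:06:42 GMT'


def try_parse_under(loc):
    """Parse SAMPLE with LC_TIME=loc and report 'ok', the ValueError, or a missing locale."""
    saved = locale.setlocale(locale.LC_TIME)  # remember the current LC_TIME
    try:
        locale.setlocale(locale.LC_TIME, loc)
    except locale.Error:
        return 'locale not installed'
    try:
        datetime.strptime(SAMPLE, FMT)
        return 'ok'
    except ValueError as e:
        return f'ValueError: {e}'
    finally:
        locale.setlocale(locale.LC_TIME, saved)  # always restore


for loc in ('C', 'en_US.UTF-8', 'de_DE.UTF-8', 'ja_JP.UTF-8'):
    print(f'{loc:12} {try_parse_under(loc)}')
```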

If you get the error with this Python script, the same error should be raised by the kaggle command:

$ LC_TIME=de_DE.utf8 kaggle competitions download -c blood-vessel-segmentation
time data 'Fri, 03 Nov 2023 22:06:42 GMT' does not match format '%a, %d %b %Y %H:%M:%S %Z'

This is certainly OS-dependent. I confirmed the error on Ubuntu 22.04 LTS and 18.04 LTS, but macOS Ventura behaves differently: there is no error on macOS.

Fix

One way to fix this is to switch to the C locale around every strptime() call:

saved = locale.setlocale(locale.LC_ALL)  # query the current locale without changing it
locale.setlocale(locale.LC_ALL, 'C')
try:
    t = datetime.strptime(s, '%a, %d %b %Y %H:%M:%S %Z')
finally:
    locale.setlocale(locale.LC_ALL, saved)  # restore even if parsing fails
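A slightly more defensive variant of the same idea, sketched as a context manager. Assumptions: the names `c_time_locale` and `parse_gmt_timestamp` are hypothetical, and only LC_TIME (the category that `%a`/`%b` depend on) is switched instead of LC_ALL. The previous setting is restored even if parsing raises.

```python
import locale
from contextlib import contextmanager
from datetime import datetime

FMT = '%a, %d %b %Y %H:%M:%S %Z'


@contextmanager
def c_time_locale():
    """Temporarily switch LC_TIME to the C locale, restoring the old value on exit."""
    saved = locale.setlocale(locale.LC_TIME)  # query without changing
    locale.setlocale(locale.LC_TIME, 'C')
    try:
        yield
    finally:
        locale.setlocale(locale.LC_TIME, saved)


def parse_gmt_timestamp(s):
    """Parse an English 'Fri, 03 Nov 2023 22:06:42 GMT' style string regardless of LC_TIME."""
    with c_time_locale():
        return datetime.strptime(s, FMT)


t = parse_gmt_timestamp('Fri, 03 Nov 2023 22:06:42 GMT')
print(t)  # 2023-11-03 22:06:42
```

Note that `locale.setlocale()` changes process-wide state, so this approach is not thread-safe: another thread formatting or parsing dates at the same moment would also see the C locale.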

An alternative is to convert the string to ISO 8601 format manually and then parse it with datetime.fromisoformat().
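A minimal sketch of the ISO 8601 route, assuming the input always has the exact layout `Fri, 03 Nov 2023 22:06:42 GMT` with a `GMT` zone. The month table is hard-coded in English, so the result does not depend on the process locale; `to_iso8601` is a hypothetical helper name.

```python
from datetime import datetime

# English month abbreviations; a fixed table, independent of LC_TIME
_MONTHS = {
    'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5, 'Jun': 6,
    'Jul': 7, 'Aug': 8, 'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12,
}


def to_iso8601(s):
    """Convert 'Fri, 03 Nov 2023 22:06:42 GMT' to '2023-11-03T22:06:42+00:00'.

    Assumes exactly this layout; the weekday name is ignored, not validated.
    """
    _weekday, day, mon, year, hms, zone = s.replace(',', '').split()
    if zone != 'GMT':
        raise ValueError(f'unexpected time zone: {zone!r}')
    return f'{int(year):04d}-{_MONTHS[mon]:02d}-{int(day):02d}T{hms}+00:00'


t = datetime.fromisoformat(to_iso8601('Fri, 03 Nov 2023 22:06:42 GMT'))
print(t)  # 2023-11-03 22:06:42+00:00
```

Unlike `strptime()` with `%Z`, which returns a naive datetime, this produces a timezone-aware UTC datetime. The standard library's `email.utils.parsedate_to_datetime()` also parses this RFC 2822-style date using its own English month list, so it would be another locale-independent option.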

I don't know why no one has reported this since July, but I hope this helps many Kagglers. I use Ubuntu in English and didn't expect LC_TIME to be Japanese; this is probably happening in many countries. Thanks.

Jun

@ritikgupta65

Kindly assign this issue to me, @junkoda.
