XLSX files are not detected correctly #87
Comments
Ah, thanks for identifying this. We've had a few similar reports, mostly involving DOCX files. Do you find that re-uploading the same file resolves it? Perhaps we could use |
Hm no, re-uploading doesn't seem to change anything.
Sounds good to me!
I think that re-uploading works for us because we're also using ckanext-resource-type-validation (https://github.com/qld-gov-au/ckanext-resource-type-validation/). This fix will likely be prioritised, as it's affecting our clients.
Thanks a lot! I can confirm it works! In case this is useful for others: the XLS icon was still not displayed for me, and format was still set to |
BTW, another issue for us is that XLSX files aren't pushed to the CKAN DataStore (anymore). This seems to be due to xlrd removing support for XLSX files: ckan/datapusher#232. Are you using datapusher or xloader to submit tabular data to the DataStore?
We use XLoader. I see you've raised this there, at ckan/ckanext-xloader#161. As far as I can see, it's not related to ckanext-s3filestore? But if you want to coordinate with us about it, we do have an XLoader fork, https://github.com/qld-gov-au/ckanext-xloader
It's not related to ckanext-s3filestore at all. Thanks, I will check out your fork of XLoader.
Hi all, XLSX files are detected as Zip files when ckanext-s3filestore is active. The reason is that the extension only looks at the first 512 bytes of the uploaded file: https://github.com/qld-gov-au/ckanext-s3filestore/blob/cf0c5bd/ckanext/s3filestore/uploader.py#L545
The part that differentiates an XLSX file from a regular Zip comes later in the file, and exactly where depends on the file.
Since the number of bytes you have to read before python-magic determines it's an XLSX file varies with the file's size and complexity, you probably have to pass the whole file for detection to work reliably.
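To illustrate why a fixed-size prefix is ambiguous, here is a minimal standard-library sketch. It is not how python-magic/libmagic works internally; it swaps in a simpler technique (checking Zip member names) just to show that a plain Zip and an XLSX-like archive share the same leading bytes, and that a 512-byte prefix alone is not enough to tell them apart. The helper names (build_archive, classify) and the archive contents are hypothetical.

```python
import io
import zipfile

XLSX_MIME = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
ZIP_MIME = "application/zip"


def build_archive(members):
    """Build an uncompressed in-memory Zip from a {name: bytes} mapping."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        for name, payload in members.items():
            archive.writestr(name, payload)
    return buf.getvalue()


def classify(data):
    """Classify Zip data by member names; return None if it is not a readable Zip."""
    stream = io.BytesIO(data)
    if not zipfile.is_zipfile(stream):
        return None
    with zipfile.ZipFile(stream) as archive:
        names = set(archive.namelist())
    # Assumption for this sketch: these two members are enough to call it XLSX.
    if "[Content_Types].xml" in names and "xl/workbook.xml" in names:
        return XLSX_MIME
    return ZIP_MIME


# A plain Zip, and an XLSX-like archive whose telling members come
# after roughly 2 kB of unrelated data.
plain_zip = build_archive({"notes.txt": b"x" * 2000})
xlsx_like = build_archive({
    "docProps/app.xml": b"x" * 2000,
    "[Content_Types].xml": b"<Types/>",
    "xl/workbook.xml": b"<workbook/>",
})

same_signature = plain_zip[:4] == xlsx_like[:4]  # both start with the Zip signature
full_result = classify(xlsx_like)                # whole file: recognised as XLSX
prefix_result = classify(xlsx_like[:512])        # first 512 bytes: not classifiable
```

In this sketch the full data classifies as XLSX, while the 512-byte prefix cannot be classified at all, because the members that identify the format sit beyond the prefix.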
By default, CKAN tries to determine the mimetype from the file extension (config ckan.mimetype_guess = file_ext), or reads the whole file (ckan.mimetype_guess = file_contents): https://github.com/ckan/ckan/blob/0a596b8/ckan/lib/uploader.py#L274
What do you think is the best way to fix this? Behave the same way as CKAN does, or stick to magic.from_buffer but read the whole file? Performance-wise, that probably won't do any harm. Or use magic.from_file/from_descriptor?
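For discussion, here is a rough sketch of what the first two options could look like side by side. It is not the extension's or CKAN's actual code: guess_mimetype, its parameters, and the private type registry are hypothetical, and the content detector is passed in as a callable so the sketch runs without python-magic installed.

```python
import io
import mimetypes

XLSX_MIME = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"

# Private registry with an explicit .xlsx entry, so the result does not
# depend on which mime.types files happen to exist on the host.
_TYPES = mimetypes.MimeTypes()
_TYPES.add_type(XLSX_MIME, ".xlsx")


def guess_mimetype(filename, stream, strategy="file_ext", detect=None):
    """Guess a mimetype by extension, or by handing the whole stream to `detect`.

    `strategy` mirrors the two ckan.mimetype_guess values from this issue.
    `detect` is a hypothetical hook taking bytes and returning a mimetype.
    """
    if strategy == "file_ext":
        return _TYPES.guess_type(filename)[0]
    if strategy == "file_contents":
        if detect is None:
            raise ValueError("file_contents requires a detect callable")
        position = stream.tell()
        try:
            # Read everything from the current position, not just a 512-byte prefix.
            return detect(stream.read())
        finally:
            # Restore the position so the upload itself is unaffected.
            stream.seek(position)
    raise ValueError("unknown strategy: %r" % (strategy,))


upload = io.BytesIO(b"PK\x03\x04" + b"\0" * 1000)
by_extension = guess_mimetype("report.xlsx", upload)
```

With python-magic available, the detector could be something like lambda data: magic.from_buffer(data, mime=True). Whether reading the whole upload into memory is acceptable for large files is a separate question that this sketch does not address.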