Populate metadata should support other encodings than utf-8 #323
Comments
After a quick glance, I can't see how populate_metadata.py is affected, as the code suspected of causing the issue is in populate_roi.py.
This issue has been mentioned on Image.sc Forum. There might be relevant details there: https://forum.image.sc/t/populate-metadata-py-and-non-utf-8-csvs/64595/2
@dominikl It's maybe not the most natural place for the code to live, but
@dominikl: Thanks for opening the issue again. Here is the diff of my modified I'm not sure if it makes sense to redefine this within the metadata script or if it would make sense to make this option available directly within
5,6d4
<
<
15d12
<
19d15
<
21d16
<
30c25,26
<
---
> import tempfile
> from past.utils import old_div
55d50
<
60a56,84
> class OwnFileProvider(DownloadingOriginalFileProvider):
>
>     def get_original_file_data(self, original_file):
>         """
>         Downloads an original file to a temporary file and returns an open
>         file handle to that temporary file seeked to zero. The caller is
>         responsible for closing the temporary file.
>         """
>
>
>         self.raw_file_store.setFileId(original_file.id.val)
>         temporary_file = tempfile.NamedTemporaryFile(mode='rt+',
>                                                      dir=str(self.dir),
>                                                      encoding="utf-8-sig")
>         size = original_file.size.val
>         size_new = 0
>         for i in range((old_div(size, self.BUFFER_SIZE)) + 1):
>             index = i * self.BUFFER_SIZE
>             data = self.raw_file_store.read(index, self.BUFFER_SIZE)
>             try:
>                 data_write = data.decode("utf-8").rstrip('\0')
>                 size_new += len(data_write.encode("utf-8-sig"))
>             except UnicodeDecodeError:
>                 data_write = data.decode("latin-1").rstrip('\0')
>                 size_new += len(data_write.encode("utf-8-sig"))
>             temporary_file.write(data_write)
>         temporary_file.seek(0)
>         temporary_file.truncate(size_new)
>         return temporary_file
122c146
< provider = DownloadingOriginalFileProvider(conn)
---
> provider = OwnFileProvider(conn)
198a223,224
>
>
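One thing to watch for in the chunk-wise approach in the diff: if a buffer boundary falls in the middle of a multi-byte UTF-8 character, `data.decode("utf-8")` raises `UnicodeDecodeError` for that chunk, and the whole chunk is then decoded as latin-1. A minimal sketch of the same fallback idea applied to the complete byte string instead, which avoids that boundary case, might look like this. The function name `decode_with_fallback` and the default encoding order are assumptions for illustration and are not part of the scripts discussed here.

```python
def decode_with_fallback(raw, encodings=("utf-8-sig", "latin-1")):
    """Decode raw bytes with the first encoding that succeeds.

    Hypothetical helper, not part of the OMERO scripts. "utf-8-sig"
    accepts UTF-8 with or without a leading byte order mark. "latin-1"
    maps every byte to a character, so it never fails and acts as a
    catch-all when placed last.
    """
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the given encodings could decode the data")


# The same text stored in two encodings decodes to the same string.
utf8_text = decode_with_fallback("Café,1\n".encode("utf-8"))
latin1_text = decode_with_fallback("Café,1\n".encode("latin-1"))

# A chunk cut in the middle of a UTF-8 character is not valid UTF-8,
# so it silently falls through to latin-1 and yields a wrong character.
split_chunk = "é".encode("utf-8")[:1]
garbled = decode_with_fallback(split_chunk)
```

Decoding the complete content in one call means the fallback only triggers when the file as a whole is not valid UTF-8. The trade-off is that the full file is held in memory, which may or may not be acceptable for the CSV sizes involved.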
populate_metadata.py assumes that the CSV files are encoded as UTF-8 and fails if that's not the case. Maybe there should be an option to specify the encoding.
See https://forum.image.sc/t/populate-metadata-py-and-non-utf-8-csvs/64595
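As a rough illustration of what an encoding option could look like, here is a standalone sketch that reads CSV rows from a binary stream using a caller-supplied encoding. The function name `read_csv_rows` and its `encoding` parameter are hypothetical; this does not reflect how populate_metadata.py is structured, only the general idea using the Python standard library.

```python
import csv
import io


def read_csv_rows(binary_stream, encoding="utf-8"):
    """Parse CSV rows from a binary stream using the given encoding.

    Hypothetical helper for illustration only. The default stays UTF-8,
    so existing behaviour would be unchanged unless a caller passes a
    different encoding.
    """
    # newline="" lets the csv module handle line endings itself.
    text_stream = io.TextIOWrapper(binary_stream, encoding=encoding, newline="")
    return list(csv.reader(text_stream))


# A latin-1 encoded CSV is read correctly once the encoding is given.
latin1_bytes = "name,value\nCafé,1\n".encode("latin-1")
rows = read_csv_rows(io.BytesIO(latin1_bytes), encoding="latin-1")

# With the UTF-8 default, the same bytes fail to decode.
try:
    read_csv_rows(io.BytesIO(latin1_bytes))
    default_failed = False
except UnicodeDecodeError:
    default_failed = True
```

An explicit option like this leaves the choice with the user, who usually knows how the file was exported, rather than guessing the encoding from the data.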