Resource Can't load data from some file-like objects #1675

Open
LincolnPuzey opened this issue Aug 26, 2024 · 2 comments
Labels
feature New functionality

Comments

@LincolnPuzey

frictionless version: 5.17.0
python version: 3.12
platform: linux

The docs at https://framework.frictionlessdata.io/docs/schemes/stream.html say that data can be loaded from file-like objects.

This works for the example given there, where the object is an opened file:

from frictionless import Resource

with open("example.csv", "rb") as f:
    resource = Resource(
        source=f,
        format="csv",
        encoding="utf-8",
    )
    # read while the file is still open
    print(resource.read_rows())

# prints [{'A': 1, 'B': 2}]

However, it doesn't work for other file-like classes such as BytesIO and SpooledTemporaryFile.

e.g. BytesIO

from io import BytesIO
from frictionless import Resource

file_like_object = BytesIO(b"A,B\n1,2")
resource = Resource(
    source=file_like_object,
    format="csv",
    encoding="utf-8",
)
print(resource.read_rows())

Error:

Traceback (most recent call last):
    print(resource.read_rows())
          ^^^^^^^^^^^^^^^^^^^^
  File "python3.12/site-packages/frictionless/resources/table.py", line 423, in read_rows
    with helpers.ensure_open(self):
  File "python3.12/contextlib.py", line 137, in __enter__
    return next(self.gen)
           ^^^^^^^^^^^^^^
  File "python3.12/site-packages/frictionless/helpers/general.py", line 97, in ensure_open
    thing.open()
  File "python3.12/site-packages/frictionless/resources/table.py", line 161, in open
    self.__open_parser()
  File "python3.12/site-packages/frictionless/resources/table.py", line 178, in __open_parser
    self.__parser.open()
  File "python3.12/site-packages/frictionless/system/parser.py", line 95, in open
    self.__loader = self.read_loader()
                    ^^^^^^^^^^^^^^^^^^
  File "python3.12/site-packages/frictionless/system/parser.py", line 126, in read_loader
    return loader.open()
           ^^^^^^^^^^^^^
  File "python3.12/site-packages/frictionless/system/loader.py", line 107, in open
    self.__byte_stream = self.read_byte_stream()
                         ^^^^^^^^^^^^^^^^^^^^^^^
  File "python3.12/site-packages/frictionless/system/loader.py", line 137, in read_byte_stream
    byte_stream = self.read_byte_stream_create()
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "python3.12/site-packages/frictionless/schemes/stream/loader.py", line 17, in read_byte_stream_create
    if not os.path.isfile(byte_stream.name):  # type: ignore
                          ^^^^^^^^^^^^^^^^
AttributeError: '_io.BytesIO' object has no attribute 'name'

e.g. SpooledTemporaryFile

from tempfile import SpooledTemporaryFile
from frictionless import Resource

file_like_object = SpooledTemporaryFile(max_size=1024, mode="w+b")
file_like_object.write(b"A,B\n1,2")
file_like_object.seek(0)

resource = Resource(
    source=file_like_object,
    format="csv",
    encoding="utf-8",
)

print(resource.read_rows())

The error is mostly the same, except for the final frames:

File "python3.12/site-packages/frictionless/schemes/stream/loader.py", line 17, in read_byte_stream_create
    if not os.path.isfile(byte_stream.name):  # type: ignore
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen genericpath>", line 30, in isfile
TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType

Is there a reason frictionless expects the file object to have a .name that points to an actual file
in read_byte_stream_create?
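
As a minimal sketch of what that check sees, the snippet below inspects the name attribute of each kind of object using only the standard library. The helper describe_name is hypothetical and not part of frictionless; it only mirrors the two failure modes shown in the tracebacks above.

```python
from io import BytesIO
from tempfile import SpooledTemporaryFile


def describe_name(file_like):
    """Report how a name-based check would see this object.

    Hypothetical helper for illustration; not part of frictionless.
    """
    if not hasattr(file_like, "name"):
        # Accessing obj.name would raise AttributeError.
        return "no name attribute"
    if file_like.name is None:
        # os.path.isfile(None) would raise TypeError.
        return "name is None"
    return "name is set"


in_memory = BytesIO(b"A,B\n1,2")
spooled = SpooledTemporaryFile(max_size=1024, mode="w+b")
spooled.write(b"A,B\n1,2")
spooled.seek(0)

before_rollover = describe_name(spooled)
spooled.rollover()  # moves the data into a real temporary file
after_rollover = describe_name(spooled)

print(describe_name(in_memory))
print(before_rollover)
print(after_rollover)
```

What the name is set to after rollover (a path or a file descriptor) depends on the platform, so the sketch only reports that it is no longer missing.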

Being able to pass a SpooledTemporaryFile would be particularly useful,
because FastAPI presents uploaded files as SpooledTemporaryFile instances.

@LincolnPuzey
Author

With some further testing, I found that calling .rollover() on the SpooledTemporaryFile makes it work.

e.g. this works:

from tempfile import SpooledTemporaryFile
from frictionless import Resource

file_like_object = SpooledTemporaryFile(max_size=1024, mode="w+b")
file_like_object.write(b"A,B\n1,2")
file_like_object.seek(0)
file_like_object.rollover()  # force file to disk
resource = Resource(
    source=file_like_object,
    format="csv",
    encoding="utf-8",
)

print(resource.read_rows())
# prints [{'A': 1, 'B': 2}]

However, it also appears that frictionless closes the file at some point, because when we run

file_like_object.seek(0)
print(file_like_object.read())

after frictionless does its work, we get "ValueError: seek of closed file".
This is not ideal, since we want to use frictionless to process the file and then read it again to save it somewhere else.
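
One possible way around this, sketched under assumptions: read the bytes once, rewind the original object, and hand frictionless an independent copy instead of the object we still need. The helper snapshot_bytes is hypothetical. Passing the copy as raw bytes, e.g. Resource(payload, format="csv"), is an assumption based on the buffer scheme described in the docs and is not verified here.

```python
from tempfile import SpooledTemporaryFile


def snapshot_bytes(file_like):
    """Return the full contents of file_like as bytes and rewind it.

    Hypothetical helper: the original object is left open and positioned
    at the start, so it can still be saved elsewhere afterwards.
    """
    file_like.seek(0)
    data = file_like.read()
    file_like.seek(0)
    return data


upload = SpooledTemporaryFile(max_size=1024, mode="w+b")
upload.write(b"A,B\n1,2")

payload = snapshot_bytes(upload)

# Assumption, not verified here: give frictionless the copy rather than
# the original object, e.g. Resource(payload, format="csv").

# The original was never handed over, so it can still be read and stored.
saved_copy = upload.read()
still_open = not upload.closed
upload.close()
```

This loads the whole file into memory, which gives up the benefit of spooling for large uploads; copying to a named temporary file would be the alternative in that case.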

@pierrecamilleri
Collaborator

Thanks for the report and for the workaround. I agree that automatically closing the file may not be the best behaviour.

The line that crashes is part of an error-handling check:

    if not os.path.isfile(byte_stream.name):  # type: ignore
        note = f"only local streams are supported: {byte_stream}"
        raise FrictionlessException(errors.SchemeError(note=note))

So supporting in-memory file-like data seems like a new feature to me.

In the meantime, however, I can at least:

  • Make sure that the right exception is raised, with an explicit error message
  • Update the docs to make this clearer
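
A rough sketch of what the first bullet could look like, assuming the check keeps its current meaning of only accepting streams backed by a local file. StreamSourceError and ensure_local_stream are stand-in names invented for this example; the real code would presumably keep raising FrictionlessException with a SchemeError, as in the snippet above.

```python
import os
from io import BytesIO


class StreamSourceError(Exception):
    """Stand-in for frictionless's own exception type (hypothetical)."""


def ensure_local_stream(byte_stream):
    """Raise a descriptive error unless byte_stream is backed by a local file."""
    # getattr with a default avoids the AttributeError seen with BytesIO.
    name = getattr(byte_stream, "name", None)
    # The isinstance guard avoids the TypeError seen when name is None.
    valid_type = isinstance(name, (str, bytes, os.PathLike, int))
    if not valid_type or not os.path.isfile(name):
        raise StreamSourceError(f"only local streams are supported: {byte_stream}")


try:
    ensure_local_stream(BytesIO(b"A,B\n1,2"))
except StreamSourceError as error:
    message = str(error)
    print(message)
```

With this shape, both BytesIO and a SpooledTemporaryFile that has not rolled over would reach the intended error message instead of an AttributeError or TypeError.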

@pierrecamilleri pierrecamilleri added the feature New functionality label Oct 11, 2024