Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parser can fail for large files (>128 KiB) due to stale reference #19

Open
SydMontague opened this issue Aug 14, 2022 · 3 comments
Open

Comments

@SydMontague
Copy link
Collaborator

Description

When constructing a parser a std::istream referenced is used and stored. However, the lifecycle of the std::istream can be shorter than the parser, leaving it with a stale reference.

This won't affect small files (<= INPUTBUF_CAP), since those will be buffered on construction. However, with larger files it will continuously attempt to read from the stream. If the reference has become stale at that point it will result in an error.

This can for example happen when a Parser is constructed and stored as a class member with a locally scoped std::ifstream.

Sample Code

struct Test {
  std::unique_ptr<aria::csv::Parser> parser;
  
  Test(std::string path) {
    std::ifstream stream(path, std::ios::in);
    parser = std::make_unique<area::csv::Parser>(stream);
  }
};

Solutions/Workarounds

The problem can be avoided by making sure the std::istream doesn't go out of scope before the parser, making this error mostly a user error.

However, I think for a future revision it might be worthwhile to have the parser take possession of the passed stream, for example in form of a smart pointer.

Until then this issue might serve as help for the next poor fellow who falls into that trap. ;)

@ajtruckle
Copy link

@SydMontague
Do you know if your suggestions have been applied to the current codebase? If not, can you please show me what changes I need to make at my end to address this? Thank you. 🙏

@SydMontague
Copy link
Collaborator Author

The issue hasn't been directly addressed in the codebase yet. Doing so would require an API breakage of some kind.

If you're just using the library, simply make sure the istream you're passing to the CsvParser class never goes out of scope before you're done using the parser.
If you want to create a patch for the library itself then you'd have to pick one of the ways the class can take possession of the argument passed to it. The easiest way would be changing the member and constructor argument from std::istream& to std::shared_ptr<std::istream>.

There might be other solutions, but I'd need to invest a bit more brain power to think about their viability.

@ajtruckle
Copy link

@SydMontague That's fine then. In this case I have a simple need for reading a CSV and it is all done in the one function.

I did manage to fork a new branch with my static DetectUtf8_BOM method for the benefit of others (#22). I created a pull request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants