Make sure a truncated file always uses EOFError #228

Open
maxnoe opened this issue May 5, 2021 · 5 comments

Comments

@maxnoe
Member

maxnoe commented May 5, 2021

Right now, a truncated file can raise many different types of exceptions, e.g. a ValueError when np.frombuffer is given a buffer that is too short.

We should make sure that the same root cause always results in the same exception.
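A minimal sketch of one way to get there, assuming a reader that pulls fixed-size chunks from a binary stream: check how many bytes were actually read before decoding, so a short read always becomes an EOFError instead of whatever the decoder happens to raise. The helpers read_exactly and read_uint32_array and the uint32 record layout are hypothetical, and struct stands in for np.frombuffer to keep the example dependency-free.

```python
import io
import struct


def read_exactly(stream, n_bytes):
    """Read exactly n_bytes from a binary stream, or raise EOFError.

    Hypothetical helper: checking the length before decoding means a
    truncated file surfaces as EOFError regardless of which decoder
    (struct, np.frombuffer, ...) would have failed on the short buffer.
    """
    data = stream.read(n_bytes)
    if len(data) < n_bytes:
        raise EOFError(f"file truncated: expected {n_bytes} bytes, got {len(data)}")
    return data


def read_uint32_array(stream, count):
    """Decode `count` little-endian uint32 values (hypothetical record layout)."""
    data = read_exactly(stream, 4 * count)
    return list(struct.unpack(f"<{count}I", data))


payload = struct.pack("<3I", 1, 2, 3)  # 12 bytes
print(read_uint32_array(io.BytesIO(payload), 3))  # [1, 2, 3]

try:
    read_uint32_array(io.BytesIO(payload[:10]), 3)  # cut off mid-value
except EOFError as e:
    print(e)  # file truncated: expected 12 bytes, got 10
```

Doing the check at the lowest read level is usually simpler than translating ValueError or struct.error further up, since those exceptions can also signal genuinely malformed (not truncated) data.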

@mexanick

Actually, there's more to it: since a generator is used to iterate over the file, the iteration would stop if the file is corrupted (i.e. not truncated, but some record in the middle is erroneous). I'm not 100% sure whether it can happen that one record in a simtel file gets corrupted but the next one does not, but in such a case the iteration would stop and the remainder of the file would not be processed.
This could perhaps be resolved by yielding the exception instead of raising it. See e.g. https://stackoverflow.com/questions/11366892/handle-generator-exceptions-in-its-consumer
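A minimal sketch of the yield-the-exception pattern from the linked answer, using a toy generator. parse and MyException are hypothetical stand-ins for the record decoder and its error type. The consumer checks each item with isinstance and keeps going. This only helps if the reader can actually locate the start of the next record after a bad one.

```python
class MyException(Exception):
    """Hypothetical stand-in for an error raised while decoding a record."""


def parse(raw):
    # Hypothetical decoder: fails on one specific record.
    if raw == "bad":
        raise MyException(f"cannot parse {raw!r}")
    return raw


def events_yielding_errors(raw_records):
    """Yield parsed records, or the exception instance for records that fail."""
    for raw in raw_records:
        try:
            yield parse(raw)
        except MyException as e:
            yield e  # hand the error to the consumer; iteration continues


seen, errors = [], []
for item in events_yielding_errors(["a", "bad", "c"]):
    if isinstance(item, MyException):
        errors.append(str(item))
    else:
        seen.append(item)

print(seen, errors)  # ['a', 'c'] ["cannot parse 'bad'"]
```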

P.S. In the current implementation it is also not possible to catch an exception like this:

for event in EventSource('event_file.simtel'):
    try:
        do_something(event)
    except MyException as e:
        print(f'Caught exception!: {e}')

because the exception is raised inside the generator, so it propagates from the for statement itself rather than from the guarded loop body.
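A toy reproduction of that behaviour, with hypothetical names (events stands in for the file reader, parse for the record decoder): the try inside the loop body never sees the error, because it is raised by the for statement's implicit next() call.

```python
class MyException(Exception):
    """Hypothetical stand-in for an error raised while decoding a record."""


def parse(raw):
    # Hypothetical decoder: fails on one specific record.
    if raw == "bad":
        raise MyException(f"cannot parse {raw!r}")
    return raw


def events(raw_records):
    # Toy generator standing in for the file reader.
    for raw in raw_records:
        yield parse(raw)


def loop_with_try_in_body(raw_records):
    seen = []
    try:
        for event in events(raw_records):
            try:
                seen.append(event)  # only the loop body is guarded here
            except MyException:
                seen.append("caught in body")  # never reached
    except MyException as e:
        # The error surfaces from the for statement's implicit next() call.
        return seen, f"escaped loop: {e}"
    return seen, None


print(loop_with_try_in_body(["a", "bad", "c"]))
# (['a'], "escaped loop: cannot parse 'bad'")
```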

@maxnoe
Member Author

maxnoe commented May 19, 2021

> I'm not 100% sure whether it can happen that one record in a simtel file gets corrupted but the next one does not,

This is not really possible, as simtel array files are a streaming data format. If something is broken, everything after that point will be garbage, or at least impossible to verify as correct. You won't be able to find your way back to valid data.

> P.S. In the current implementation it is also not possible to catch an exception like this:

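That's how iterators work: the exception is raised by the implicit next() call of the for statement, which is outside your try block. If you want to guard against that, call next() yourself inside the try block, like this: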

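from ctapipe.io import EventSource

with EventSource(...) as source:
    it = iter(source)
    while True:
        try:
            event = next(it)
        except StopIteration:
            # all events have been read
            break
        except MyException:
            # MyException is a placeholder for the error you want to handle.
            # If the source iterates via a generator, it is finished after
            # raising, so the following next(it) raises StopIteration.
            continue
        do_something(event)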
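The same toy setup run through this pattern, as a sketch with hypothetical names: the error is now caught, but because a generator is closed once an exception escapes it, the following next(it) raises StopIteration and the remaining record "c" is never read. The pattern makes the failure catchable; it does not let iteration resume.

```python
class MyException(Exception):
    """Hypothetical stand-in for an error raised while decoding a record."""


def parse(raw):
    # Hypothetical decoder: fails on one specific record.
    if raw == "bad":
        raise MyException(f"cannot parse {raw!r}")
    return raw


def events(raw_records):
    # Toy generator standing in for the file reader.
    for raw in raw_records:
        yield parse(raw)


def loop_with_manual_next(raw_records):
    seen, errors = [], []
    it = iter(events(raw_records))
    while True:
        try:
            event = next(it)
        except StopIteration:
            break
        except MyException as e:
            errors.append(str(e))
            # The generator is finished once it has raised, so the next
            # call to next(it) raises StopIteration and the loop ends.
            continue
        seen.append(event)
    return seen, errors


print(loop_with_manual_next(["a", "bad", "c"]))
# (['a'], ["cannot parse 'bad'"])
```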

@kosack

kosack commented Mar 8, 2023

> This is not really possible, as simtel array files are a streaming data format. If something is broken, everything after that point will be garbage, or at least impossible to verify as correct. You won't be able to find your way back to valid data.

Are there no "sync markers"? Some event-wise binary formats allow re-syncing after corrupt data by seeking to the next start-of-event marker and re-starting the stream.

Though I'm less worried about that for simulations, and more for observations...
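A minimal sketch of the re-sync idea, assuming a toy record layout (a 4-byte marker, a little-endian uint32 payload length, then the payload) and a made-up marker value; neither is the actual simtel/eventio layout. A record is accepted only if it ends exactly at end-of-data or at the next marker; otherwise the reader scans forward to the next marker. A payload that happens to contain the marker bytes could cause a false re-sync, which is why real formats usually add further consistency checks.

```python
import struct

# Hypothetical 4-byte sync marker; a real format defines its own value.
SYNC = b"\xaa\x55\xaa\x55"


def make_record(payload: bytes) -> bytes:
    """Toy record layout (assumption): marker, uint32 LE payload length, payload."""
    return SYNC + struct.pack("<I", len(payload)) + payload


def read_records(data: bytes):
    """Return (payloads, offsets_of_bad_records), re-syncing after corruption."""
    payloads, bad = [], []
    header = len(SYNC) + 4
    pos = data.find(SYNC)
    while pos != -1:
        end = -1  # stays -1 if even the header is cut off
        if pos + header <= len(data):
            (length,) = struct.unpack_from("<I", data, pos + len(SYNC))
            end = pos + header + length
        # Accept a record only if it ends exactly at EOF or at the next marker.
        if end == len(data) or (0 <= end < len(data) and data.startswith(SYNC, end)):
            payloads.append(data[pos + header:end])
            pos = end if end < len(data) else -1
        else:
            bad.append(pos)
            # Re-sync: scan forward for the next marker after this bad one.
            pos = data.find(SYNC, pos + 1)
    return payloads, bad


good = make_record(b"first")
broken = SYNC + struct.pack("<I", 1000) + b"garbage"  # length points past EOF
data = good + broken + make_record(b"third")
print(read_records(data))  # ([b'first', b'third'], [13])
```

With this layout a truncated final record also ends up in the bad-offset list instead of raising, so both failure modes are reported in one place.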

@maxnoe
Member Author

maxnoe commented Mar 8, 2023

> Are there no "sync markers"?

Yes, there are. But we have never encountered such a "broken in the middle" file. Until now, it has always been truncated files, where processing stopped due to some error.

@kosack

kosack commented Mar 8, 2023

Yeah, I think for simulations it's not worth worrying about, as the files are fairly small and can be regenerated. I will, however, propose a requirement for DL0...
