Skip to content
This repository has been archived by the owner on Oct 25, 2024. It is now read-only.

OOM while peeking data #173

Open
prat-z opened this issue Sep 20, 2017 · 18 comments
Open

OOM while peeking data #173

prat-z opened this issue Sep 20, 2017 · 18 comments

Comments

@prat-z
Copy link

prat-z commented Sep 20, 2017

Trying to use Tape, for a logging library, to persist logs, it is throwing OOM when calling peek(int max) . The individual data size is ~50 bytes, and calling the peek method with max = 20.

FATAL EXCEPTION: JSync thread Process: bla.bla, PID: 2396 java.lang.OutOfMemoryError: Failed to allocate a 570425356 byte allocation with 15877888 free bytes and 361MB until OOM at com.squareup.tape2.QueueFile$ElementIterator.next(QueueFile.java:549) at com.squareup.tape2.QueueFile$ElementIterator.next(QueueFile.java:514) at com.squareup.tape2.FileObjectQueue$QueueFileIterator.next(FileObjectQueue.java:93) at com.squareup.tape2.ObjectQueue.peek(ObjectQueue.java:58) at bla.bla.TapeCacheManager.getCache(TapeCacheManager.java:85) at bla.bla.JCacheSyncManager$2.run(JCacheSyncManager.java:124) at android.os.Handler.handleCallback(Handler.java:751) at android.os.Handler.dispatchMessage(Handler.java:95) at android.os.Looper.loop(Looper.java:154) at android.os.HandlerThread.run(HandlerThread.java:61)

@anismiles
Copy link

I am also facing the same issue. Any luck with a solution?

@prat-z
Copy link
Author

prat-z commented Sep 22, 2017

On testing on pre N devices, I found out that it works as expected. But the issue is recurring on many Android N devices, and no solution yet.

@NightlyNexus
Copy link
Contributor

Can you reliably reproduce this?
OOMs can be deceptive because the heavy memory usage may be elsewhere, but the OOM will show up later as something else begins allocating.

@f2prateek
Copy link
Collaborator

I do think it's possible Tape is at fault here. From the trace, Tape is trying to allocate 570425356 bytes (~500MB). Unless an entry in the queue is that big, it shouldn't be doing that.

@pforhan
Copy link
Contributor

pforhan commented Sep 22, 2017

Try pulling one entry off the queue at a time, just for fun.

I'd suggest debugging. Perhaps the item is getting corrupted somehow? 570 MB is an enormous allocation attempt. At line 548 see if current.length really is that big.

Likewise, debug while writing the entries in, and see if anything is too large. How big is the file on disk?

@f2prateek
Copy link
Collaborator

Also FWIW, we've seen this behaviour in analytics-android, where Tape tried to allocate a byte array of an insanely large size, even though we guard against adding data that is too big (https://github.com/segmentio/analytics-android/blob/master/analytics/src/main/java/com/segment/analytics/SegmentIntegration.java#L287-L289).

Try pulling one entry off the queue at a time, just for fun.

We tried doing that and it would print a few elements and then fail at a particular element (same problem of running out of memory when trying to allocate a byte array to read the data).

Unfortunately I'd exhausted the goodwill of the customer(s) running into the issue so I added some other application level fallbacks for this case and wasn't able to debug more.

If I had access to the corrupted file, I would have probably done something like modifying these few lines (https://github.com/square/tape/blob/master/tape/src/main/java/com/squareup/tape2/QueueFile.java#L548-L549) to figure out some more details.

try {
        Element current = readElement(nextElementPosition);
        byte[] buffer = new byte[current.length];
} catch (OutOfMemoryError e) {
        log(e);
        print("length in file", readFileChunk(nextElementPosition + Element.HEADER_LENGTH));
        print("element chunk in file", readFileChunk(nextElementPosition + Element.HEADER_LENGTH + { increase this in chunks of 10 bytes }));
}

@prat-z
Copy link
Author

prat-z commented Sep 22, 2017

@pforhan Tested with individual entry size ~50bytes and with only 10 entries in the queue, still crashing.

@pforhan
Copy link
Contributor

pforhan commented Sep 22, 2017

I'd recommend writing up a test project that can reproduce the issue. My suspicion is that your converter logic is actually writing way more than you thought you were. (One time we accidentally serialized our entire app data structure by a misplaced field or two.) If that's true then your file would also be this massive size though. @f2prateek's tweak would be good for debugging as well, if you don't want to step through individual lines.

I'd go further to also println / log out the size of all entries as they go into the queue.

@pforhan
Copy link
Contributor

pforhan commented Sep 22, 2017

If it's not that, then I suspect maybe we have some misalignment of entry headers or something causing us to read the length wrong.

We don't use this version of Tape yet, so I haven't seen this yet.

@f2prateek
Copy link
Collaborator

f2prateek commented Sep 22, 2017

Also for additional context, the reports I shared were actually from the old version. We haven't seen this in the new version yet (but we haven't rolled it out to as many folks yet).

@gavinwilliams
Copy link

We're having the same issue here, I've attached the queue file.
uploadcustomer.queue.txt

@gavinwilliams
Copy link

What's even weirder is that if I remove the special characters, I now get this exception...

java.lang.RuntimeException: Unable to create application com.itstheflashpack.flashpackamplify.MainApplication: java.io.IOException: File is truncated. Expected length: 2064261152, Actual length: 4032

@kcraigie
Copy link

Note: We ran into this exact issue then discovered we had made a change that began instantiating multiple FileObjectQueue's on the same file. This resource contention created the scenario described above (a tiny 4K queue file with a single task with a gargantuan size field).

@ber4444
Copy link

ber4444 commented May 16, 2019

same runtime crash here, showing up in crashlytics from time to time

@CarlesContell
Copy link

Same crash, I had to wrap the iterator.next() with a try/catch and delete the queue file (and assuming data loss) to avoid the app being stuck.

The error uploaded to crashlytics has this stacktrace:
Fatal Exception: java.lang.OutOfMemoryError: Failed to allocate a 1768515960 byte allocation with 12582912 free bytes and 382MB until OOM, max allowed footprint 14634304, growth limit 402653184
at com.squareup.tape2.QueueFile$ElementIterator.next(QueueFile.java:549)
at com.squareup.tape2.QueueFile$ElementIterator.next(QueueFile.java:514)
at com.squareup.tape2.FileObjectQueue$QueueFileIterator.next(FileObjectQueue.java:93)

But the queue file is actually 8,19KB.

My guess is the queue file gets corrupted and a register has its length attribute modified with a very high value that when it's trying to allocate the byte array crashes.

I am using the beta version (implementation 'com.squareup.tape2:tape:2.0.0-beta1')

@fangzhzh
Copy link

fangzhzh commented Feb 24, 2021

Yup, had been having the same issue, for a long time.
Now I guard all queue file operation in one thread, it still happen.

@jasells
Copy link

jasells commented Feb 21, 2024

What was the final solution to this? I am getting a similar OOM using Stripe.Terminal, which in turn uses this squareup.tape2 lib.

@w2ji
Copy link

w2ji commented Jun 11, 2024

I was able to reproduce this issue given the corrupted file uploaded by @gavinwilliams. It seems like when the file is corrupted, the length of the element returned at this line is extremely big, something like (2064261152), causing the next line to allocate large amount of memory resulting in OOM:

byte[] buffer = new byte[current.length];

Error:

Failed to allocate a 2064261168 byte allocation with 8199903 free bytes and 504MB until OOM\, target footprint 16399807\, growth limit 536870912, stack_trace=java.lang.OutOfMemoryError: Failed to allocate a 2064261168 byte allocation with 8199903 free bytes and 504MB until OOM\, target footprint 16399807\, growth limit 536870912

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests