This repository has been archived by the owner on Oct 25, 2024. It is now read-only.

Batched remove() - Using remove() less frequently than peek() #189

Open
roytmana opened this issue Apr 15, 2018 · 6 comments

Comments

@roytmana

Hi,

Sorry for posting a question here, but I am not sure where else I can ask one.
Is it safe to call remove() less frequently than peek() (one batched remove for every n peek operations)?
I am using your queue to implement a persistent blocking queue (my take() operation blocks until the queue has data in it), and I do not need to remove() every taken (peeked) element immediately, because I can tolerate replaying some processed-but-unremoved items during recovery from a failure. So for performance reasons I would prefer not to remove items immediately but only every, say, 100 items. My queue processing happens as soon as a new element shows up, so it will be very rare to have more than one or two elements in the queue unless the consumer cannot process them in a timely fashion, or at all. In practice, then, I would have one remove() for nearly every peek().
I did not try to measure what impact batched removes would have, but given the "rwd" file mode, every operation is costly. On my fast machine with an SSD drive I was only able to push about 1,500 add() operations per second with tiny byte[20] payloads.

Your answer, and your opinion on the usefulness of batched removes, would be very much appreciated.
Thank you,
Alex
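[Editor's note] The batched-remove idea described above (read many items, pay the removal cost once) can be sketched as follows. This is an illustrative stand-in, not Tape's API: a plain ArrayDeque plays the role of the persistent QueueFile, and with a real file-backed queue the single `removeBatch` call is where the per-item fsync savings would come from.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/**
 * Sketch of batched consumption: read up to max items via the queue's
 * iterator (which, like Tape's, walks forward without advancing the
 * head), process them, then remove them all at once. ArrayDeque is a
 * stand-in for a persistent QueueFile; names here are illustrative.
 */
public class BatchedDrain {
    /** Collects up to max items from the head without removing them. */
    static List<byte[]> drain(ArrayDeque<byte[]> queue, int max) {
        List<byte[]> batch = new ArrayList<>();
        Iterator<byte[]> it = queue.iterator();
        while (it.hasNext() && batch.size() < max) {
            batch.add(it.next());
        }
        return batch;
    }

    /** Removes n items from the head in one batch. With a file-backed
     *  queue this would be a single header write (one fsync), not n. */
    static void removeBatch(ArrayDeque<byte[]> queue, int n) {
        for (int i = 0; i < n; i++) queue.poll();
    }
}
```

The trade-off is exactly the one the question anticipates: if the process crashes after processing but before `removeBatch`, up to a batch's worth of already-processed items will be replayed on recovery.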

@roytmana
Author

roytmana commented Apr 15, 2018

I retract my question. I did not realize that peek()/iteration does not keep its own position in the queue: it always starts from the head, and the head is only advanced by remove(). So it looks like in a multi-threaded environment there is no way to read from the queue, process the data, and remove it on success, except by holding the lock for the entire period from the start of iteration through processing and removal. If the consumer gets stuck for some reason, producers will not be able to put anything on the queue until the consumer releases the lock.

I guess that with a single consumer and multiple producers I could still lock, peek(), unlock, process, then lock again and remove(), assuming that only the one consumer ever performs a remove operation.
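[Editor's note] That single-consumer pattern can be sketched as below. The lock is held only around peek() and remove(), never during processing, so producers are not blocked by a slow consumer. This is safe only because exactly one thread ever removes, so the head cannot move between the unlock and the re-lock. ArrayDeque again stands in for a persistent QueueFile; all names are illustrative.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Sketch of the lock / peek / unlock / process / lock / remove pattern
 * for one consumer and many producers.
 */
public class SingleConsumer {
    private final ReentrantLock lock = new ReentrantLock();
    private final Queue<byte[]> queue = new ArrayDeque<>();

    void produce(byte[] item) {
        lock.lock();
        try { queue.add(item); } finally { lock.unlock(); }
    }

    /** Processes and removes the head; returns it, or null if empty. */
    byte[] consumeOne() {
        byte[] head;
        lock.lock();
        try { head = queue.peek(); } finally { lock.unlock(); }
        if (head == null) return null;

        process(head);   // long-running work happens with the lock released

        lock.lock();
        try { queue.remove(); } finally { lock.unlock(); }
        return head;
    }

    void process(byte[] item) { /* application work */ }

    int size() {
        lock.lock();
        try { return queue.size(); } finally { lock.unlock(); }
    }
}
```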

@JakeWharton
Collaborator

JakeWharton commented Apr 15, 2018 via email

@roytmana
Author

Thanks @JakeWharton. I have switched to a single consumer for the queue, with multiple parallel fork/join processors, removing from the queue once all processors are done.
I am curious what performance to expect when adding small items (20 bytes). For me it is pretty slow: about 1,500 items/sec on a fast machine with an SSD. I can push data to a remote database faster than that. I tried turning the antivirus off, but it does not seem to make much difference. What is your typical insertion rate with small items?

@swankjesse
Contributor

Tape wasn't designed to be super high throughput; it was designed to be simple and safe.

If you want to ramp up throughput, encode items in batches. It's a more complicated API, but you'll save fsync calls, which dominate performance.
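[Editor's note] One way to realize this suggestion is to frame several small payloads into a single byte[] and hand that one blob to a single add() call, so one fsync covers the whole batch. The length-prefixed framing below is our own choice for illustration, not part of Tape's API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch: pack many small items into one queue entry (count, then an
 * int length prefix per item), and unpack them on the read side.
 */
public class BatchCodec {
    static byte[] encode(List<byte[]> items) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeInt(items.size());
        for (byte[] item : items) {
            out.writeInt(item.length);
            out.write(item);
        }
        return buf.toByteArray();   // pass this single blob to one add()
    }

    static List<byte[]> decode(byte[] entry) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(entry));
        int count = in.readInt();
        List<byte[]> items = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            byte[] item = new byte[in.readInt()];
            in.readFully(item);
            items.add(item);
        }
        return items;
    }
}
```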

@roytmana
Author

Thank you @swankjesse. I was thinking about that, but adding does not lend itself to batching in my case: the whole point is to put each item on a persistent queue immediately for reliability, not to accumulate items in memory. On the read side I do drain the whole queue and then remove all drained items at once to avoid extra fsyncs, but iteration, I believe, still reads records one by one. I do not know if there is anything else I can do beyond bulk removal.
I really liked Tape when I found it and would prefer not to roll my own solution, although my case is simpler: I just need the ability to recover the queue on crash/restart, and to dump overflow data if the consumer is not available. I could get by with an in-memory queue backed by an append-only, fixed-record-size file with a head index that is moved only on every Nth peek, since I can tolerate replaying already-processed transactions. Given all that, I think I could make it a good deal faster, but it would still have many corner cases to worry about and test...
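[Editor's note] The simpler scheme described above can be sketched as follows. This is a rough illustration of the described design, not Tape: an append-only file of fixed-size records, with the head index persisted only every SYNC_EVERY reads, so a crash replays at most SYNC_EVERY already-processed records. All names and constants are assumptions; a real implementation would also need checksums and truncation/compaction, which is exactly the corner-case work the comment anticipates.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

/** Append-only fixed-record log with a lazily persisted head index. */
public class FixedRecordLog {
    static final int RECORD_SIZE = 20;  // matches the byte[20] payloads
    static final int SYNC_EVERY = 100;  // head persisted every Nth read
    static final int HEADER = 8;        // long head index at offset 0

    private final RandomAccessFile file;
    private long head;                  // next record index to read
    private long count;                 // records appended so far
    private int readsSinceSync = 0;

    FixedRecordLog(File f) throws IOException {
        file = new RandomAccessFile(f, "rw");
        if (file.length() >= HEADER) {
            head = file.readLong();     // recover head after a restart
        } else {
            file.writeLong(0);          // initialize the header
        }
        count = (file.length() - HEADER) / RECORD_SIZE;
    }

    void append(byte[] record) throws IOException {
        file.seek(HEADER + count * RECORD_SIZE);
        file.write(record, 0, RECORD_SIZE);
        file.getFD().sync();            // the one fsync per add
        count++;
    }

    /** Reads the next record, or null if none; persists the head
     *  (with an fsync) only once every SYNC_EVERY reads. */
    byte[] read() throws IOException {
        if (head >= count) return null;
        byte[] record = new byte[RECORD_SIZE];
        file.seek(HEADER + head * RECORD_SIZE);
        file.readFully(record);
        head++;
        if (++readsSinceSync >= SYNC_EVERY) {
            file.seek(0);
            file.writeLong(head);
            file.getFD().sync();
            readsSinceSync = 0;
        }
        return record;
    }
}
```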

@roytmana
Author

Sorry, one last question: when is the 2.0 release expected? I am using the snapshot from Maven but would prefer a release if possible.
