Custom log ids #8

Open
laurentsenta opened this issue Apr 8, 2017 · 4 comments

Comments

@laurentsenta

konserve is pretty straightforward to use and to extend, thanks for sharing & maintaining it.

I use the append API a lot, and it would be more efficient for me if the log ids were ordered.

From what I understand, it should be easy to pass a channel as a source of ids.
Because we take a go-lock on the current key, append operations can't be interleaved, so append would preserve the order of the ids coming from the id-source channel.
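A minimal sketch of the proposal, assuming a per-key lock behaves like the go-lock described above. `OrderedAppendLog` is a hypothetical in-memory stand-in, not konserve's API, and a Python iterator plays the role of the id-source channel. Because the id is drawn and the entry is written while the key's lock is held, entries on one key come out in the order the ids were produced.

```python
import itertools
import threading
from typing import Any, Dict, Iterator, List, Tuple


class OrderedAppendLog:
    """Hypothetical in-memory stand-in (not konserve's API): an append log
    whose entry ids are drawn, in order, from a caller-supplied source."""

    def __init__(self, id_source: Iterator[int]) -> None:
        self._id_source = id_source
        self._id_guard = threading.Lock()     # a shared iterator is not thread-safe
        self._locks_guard = threading.Lock()  # protects creation of per-key locks
        self._locks: Dict[str, threading.Lock] = {}
        self._logs: Dict[str, List[Tuple[int, Any]]] = {}

    def _lock_for(self, key: str) -> threading.Lock:
        with self._locks_guard:
            return self._locks.setdefault(key, threading.Lock())

    def append(self, key: str, value: Any) -> int:
        # Stand-in for the go-lock on the key: while one append on `key` runs,
        # no other append on `key` can interleave, so ids land in draw order.
        with self._lock_for(key):
            with self._id_guard:
                entry_id = next(self._id_source)
            self._logs.setdefault(key, []).append((entry_id, value))
            return entry_id

    def entries(self, key: str) -> List[Tuple[int, Any]]:
        with self._lock_for(key):
            return list(self._logs.get(key, []))


log = OrderedAppendLog(itertools.count(1))
for v in ["a", "b", "c"]:
    log.append("wal", v)
print(log.entries("wal"))  # [(1, 'a'), (2, 'b'), (3, 'c')]
```

In a real store the ordering guarantee only holds per key and only as long as every writer goes through the same lock.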

Do you see any problem with this approach?
I'd be happy to provide a patch with this feature.

@whilo
Member

whilo commented Apr 8, 2017

I am not sure I understand exactly what you are trying to do; maybe I can clarify a few points from my side and we can figure something out together. First of all, append could be part of a library on top of konserve. For me, konserve just abstracts the bare minimum of durable IO that is necessary. Everything else should be implementable on top, as a library against the konserve protocols. This keeps konserve extremely portable, but it has tradeoffs with regard to access to richer storage semantics provided by the underlying store.

I wrote append last summer to scale replikativ up, which worked extremely well. Having said that, it is not a very efficient implementation: at the moment we need 3 IO ops, which is bad. One thing that I desire for all data structures built on top of konserve is that they are persistent in the Clojure sense. Do you propose to replace the ids with something non-random/hashed? In that case the user can easily screw up the log by using the same ids in two concurrent operations.
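A toy illustration of the hazard with caller-chosen ids, assuming each log entry is stored under its id in a flat key-value map. That is a simplification made for this example; konserve's actual append layout is not reproduced here, and `append_with_id` is a hypothetical helper.

```python
import uuid
from typing import Any, Dict


def append_with_id(store: Dict[str, Any], entry_id: str, value: Any) -> None:
    # Hypothetical helper: a plain key-value write. A second write under
    # the same id silently replaces the first one.
    store[entry_id] = value


# Two writers that (incorrectly) pick the same custom id:
store: Dict[str, Any] = {}
append_with_id(store, "entry-42", "from writer A")
append_with_id(store, "entry-42", "from writer B")
print(len(store))  # 1: writer A's entry is gone

# Random ids make such a clash practically impossible:
random_store: Dict[str, Any] = {}
append_with_id(random_store, str(uuid.uuid4()), "from writer A")
append_with_id(random_store, str(uuid.uuid4()), "from writer B")
print(len(random_store))  # 2
```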

I am currently porting the hitchhiker tree to konserve. For me it solves most problems I have in this space. It even covers the append log reasonably well, I suppose: you just use larger op-buffers and a smaller B-tree component. But even with default settings it has less IO than the current append operation. It allows you to insert elements with a key, so you could provide your id simply by inserting under it, and it gives you efficient range queries. There is still a bit of work to do, but I am confident that it will work in cljs soon and that I can keep the performance of the clj version. We will see... If you would like to help, feel free to chime in :)
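The interface described here (insert under a caller-provided key, then query a key range) can be sketched with a sorted list and Python's `bisect` module. This is a stand-in, not a hitchhiker tree: there are no op-buffers and inserts are O(n) list shifts. It only shows what keyed insertion plus range queries give you for an id-ordered log. `SortedIndex` is a hypothetical name.

```python
import bisect
from typing import Any, List, Tuple


class SortedIndex:
    """Stand-in for an ordered key-value index (NOT a hitchhiker tree):
    keyed insertion and range queries over parallel sorted lists."""

    def __init__(self) -> None:
        self._keys: List[Any] = []
        self._vals: List[Any] = []

    def insert(self, key: Any, value: Any) -> None:
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._vals[i] = value  # same key: overwrite in place
        else:
            self._keys.insert(i, key)
            self._vals.insert(i, value)

    def range(self, lo: Any, hi: Any) -> List[Tuple[Any, Any]]:
        """All (key, value) pairs with lo <= key < hi, in key order."""
        start = bisect.bisect_left(self._keys, lo)
        end = bisect.bisect_left(self._keys, hi)
        return list(zip(self._keys[start:end], self._vals[start:end]))


idx = SortedIndex()
for k, v in [(30, "c"), (10, "a"), (20, "b")]:
    idx.insert(k, v)
print(idx.range(10, 30))  # [(10, 'a'), (20, 'b')]
```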

@laurentsenta
Author

laurentsenta commented Apr 8, 2017

My use case is to store a WAL (among other things). I'd need to know whether a log has already been processed. I could build an index (more I/Os and more code) or simply use a custom function (f key) -> id. It's pretty common IMHO:
https://firebase.googleblog.com/2015/02/the-2120-ways-to-ensure-unique_68.html
https://engineering.instagram.com/sharding-ids-at-instagram-1cf5a71e5a5c
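One way to build such an `(f key) -> id` is the 64-bit layout from the Instagram post linked above: 41 bits of milliseconds since a custom epoch, 13 bits of logical shard id and a 10-bit sequence taken modulo 1024. The sketch below follows that layout; the epoch value, the class name and the `already_processed` helper are assumptions for illustration. The processed check assumes WAL entries are applied in id order, so a single high-water mark is enough.

```python
import threading
import time

EPOCH_MS = 1_483_228_800_000  # assumed custom epoch: 2017-01-01T00:00:00Z


class TimeOrderedIds:
    """Hypothetical id generator: 41 bits ms since EPOCH_MS, 13 bits
    shard id, 10 bits per-millisecond sequence (as in the Instagram post)."""

    def __init__(self, shard_id: int, epoch_ms: int = EPOCH_MS) -> None:
        if not 0 <= shard_id < (1 << 13):
            raise ValueError("shard_id must fit in 13 bits")
        self._shard_id = shard_id
        self._epoch_ms = epoch_ms
        self._last_ms = -1
        self._seq = 0
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            # Never move backwards, even if the wall clock does.
            now_ms = max(int(time.time() * 1000) - self._epoch_ms, self._last_ms)
            if now_ms == self._last_ms:
                self._seq = (self._seq + 1) % 1024
                if self._seq == 0:
                    # 1024 ids already used in this millisecond: borrow the next one.
                    now_ms += 1
            else:
                self._seq = 0
            self._last_ms = now_ms
            return (now_ms << 23) | (self._shard_id << 10) | self._seq


def already_processed(entry_id: int, high_water_mark: int) -> bool:
    # Valid only if entries from this shard are processed in id order.
    return entry_id <= high_water_mark


gen = TimeOrderedIds(shard_id=5)
first, second = gen.next_id(), gen.next_id()
print(first < second)  # True
print(already_processed(first, high_water_mark=second))  # True
```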

A hitchhiker tree would be overkill in my case.

@whilo
Member

whilo commented Apr 9, 2017

Ah, I think I might get what you are trying to do. Their ids behave like Datomic's squuid. I have built the same in the replikativ streaming API recently, where you have a channel which sends you ack messages after values have been applied to the stream. https://github.com/replikativ/replikativ/blob/master/src/replikativ/crdt/cdvcs/realize.cljc#L172
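For reference, a Datomic-style squuid is commonly described as a random UUID whose most significant 32 bits are replaced by the current Unix time in seconds, so ids sort by creation second while the remaining bits stay random. A Python sketch under that description (not Datomic's code; the optional `secs` parameter is added only to make the ordering easy to show):

```python
import time
import uuid
from typing import Optional


def squuid(secs: Optional[int] = None) -> uuid.UUID:
    """Sketch of a semi-sequential UUID: a random (version 4) UUID whose
    top 32 bits are replaced by Unix time in seconds."""
    if secs is None:
        secs = int(time.time())
    random_bits = uuid.uuid4().int & ((1 << 96) - 1)  # keep the low 96 random bits
    return uuid.UUID(int=((secs & 0xFFFFFFFF) << 96) | random_bits)


def squuid_seconds(u: uuid.UUID) -> int:
    """Recover the timestamp prefix."""
    return u.int >> 96


earlier, later = squuid(1_000_000), squuid(1_000_001)
print(earlier < later)          # True: different seconds sort by time
print(squuid_seconds(earlier))  # 1000000
```

Within the same second the order is random, so squuids give rough time ordering rather than a strict sequence.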

You want to keep the log fixed and provide the ids incrementally? Feel free to copy the code of append and provide a demo :)

@whilo
Member

whilo commented Oct 5, 2017

@lsenta Have you managed to use append for your purpose?
