Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add DB serialization support #132

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Conversation

cretz
Copy link

@cretz cretz commented Dec 27, 2021

@FiloSottile
Copy link
Contributor

I was about to send a PR to expose the same functions!

IMHO, bundling the schema name and the serialization does not make much sense, though. The schema name is a context-specific choice, not a property of the serialization. For example, one can serialize the main database, and then deserialize it later as snapshot.


// Reopens the database as in-memory representation of given serialized bytes.
// The given *Serialized instance should remain referenced (i.e. not GC'd) for
// the life of the DB since the bytes within are referenced directly.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not something the caller can guarantee, short of importing go4.org/unsafe/assume-no-moving-gc.

Generally, it's probably not a good idea to let C and Go memory cross the boundaries like this. It adds complexity including the need for the finalizers, and the invalid state of Serialized values after being used.

In Serialize, I'd always make a copy into Go memory unless SQLITE_SERIALIZE_NOCOPY is set. That way there is never a need for freeing the memory. If an application can't stand copies for performance reasons, they can use SQLITE_SERIALIZE_NOCOPY. Otherwise, they can probably tolerate two as well as one.

In Deserialize, I would always copy the input into sqlite3_malloc64'd memory, and force SQLITE_DESERIALIZE_FREEONCLOSE on. This is unfortunate in terms of memory usage, especially when using //go:embed which is my use case, and basically precludes the use of the szBuf > szDb feature, but it is the only safe API because we can't rely on Go memory staying still.

The good news is that this removes any need for the Serialized type.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrmm, yeah all my machinations here were to avoid copies, but it sounds like I can't have a stable pointer. So I guess we have to have copies.

I did this for a side project (https://github.com/cretz/temporal-sdk-go-advanced/tree/main/temporalsqlite for SQLite on https://temporal.io/) so it may be a few days until I can get to simplifying. If needing sooner or you have the bandwidth, feel free to steal the impl here and simplify w/ always-copy.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, actually it might be worth waiting for https://go.dev/issue/46787. I wonder if we can provide the same API I describe above, and implement it with copies in Go 1.18, and with Pinner in Go 1.19.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

golang/go#46787 has landed.

@cretz
Copy link
Author

cretz commented Feb 23, 2022

IMHO, bundling the schema name and the serialization does not make much sense, though.

Agreed.

Comment on lines +47 to +50
// Bytes returns the serialized bytes. Do not mutate this value. This is only
// valid for the life of its receiver and should be copied for any other
// longer-term use.
func (s *Serialized) Bytes() []byte { return s.data }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer more robust API semantics for this. This is too error prone. Assume the user of this API is not going to read your docs. How can we make it fool proof. Consider that there is virtually no case where someone is going to instantiate a Serialized object and not copy the data. So why bother with the intermediary? Why even have a Serialized object at all?

IMO the ideal API should just accept a Writer where the serialized data is written, or accept a Reader where the serialized data can be read from so it can be instantiated into the database.

Comment on lines +53 to +57
if len(s.data) > 0 && s.shouldFreeData {
s.shouldFreeData = false
s.sqliteOwnsData = false
C.sqlite3_free(unsafe.Pointer(&s.data[0]))
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

complex logic with no comments. Why is this necessary?

// the database.
//
// https://www.sqlite.org/c3ref/serialize.html
func (conn *Conn) Serialize(schema string, flags ...SerializeFlags) *Serialized {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accept a writer and write the copied serialized data to it.

Comment on lines +107 to +111
const (
SQLITE_DESERIALIZE_FREEONCLOSE DeserializeFlags = C.SQLITE_DESERIALIZE_FREEONCLOSE
SQLITE_DESERIALIZE_RESIZEABLE DeserializeFlags = C.SQLITE_DESERIALIZE_RESIZEABLE
SQLITE_DESERIALIZE_READONLY DeserializeFlags = C.SQLITE_DESERIALIZE_READONLY
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is too low level to expose in the Go API here. When we deserialize we are taking go memory with the serialized DB and malloc-ing a space of C memory for which sqlite can operate an in-memory database. I think the only option we should expose is READONLY and maybe RESIZEABLE. I think since we are managing the C memory, we should control FREEONCLOSE, which we should probably set, so we don't have to continue to manage that memory.

// The Serialized parameter should no longer be used after this call.
//
// https://www.sqlite.org/c3ref/deserialize.html
func (conn *Conn) Deserialize(s *Serialized, flags ...DeserializeFlags) error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend this function signature:

Suggested change
func (conn *Conn) Deserialize(s *Serialized, flags ...DeserializeFlags) error {
func (conn *Conn) Deserialize(serialized io.Reader, size, additional int, flags ...DeserializeFlags) error {

And then use size+additional to malloc the C memory and then copy the contents of serialized into that memory.

Set FREEONCLOSE so we don't have to continue to manage that C memory.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants