VFSFile iVersion 3 methods, version 2 passthrough #418

Open
1 of 5 tasks
rogerbinns opened this issue Mar 24, 2023 · 8 comments

@rogerbinns
Owner

rogerbinns commented Mar 24, 2023

Context: https://groups.google.com/g/python-sqlite/c/IIpnmLGyhrE

Items to fix:

  • The doc is weak on details
  • You can't control whether the version 2 methods pass through or not
  • APSW shell should have .vfslist / .vfsinfo commands

Items to not do:

  • You can't implement the version 2 methods
  • The version 3 methods are not exposed or implementable

@Nikratio

My use-case for this is that I want to track all writes to the main database file so that I can keep it in sync with a remote copy. I think this means that I can't allow any shared memory use. At the same time, I would otherwise like to forward all operations to the default VFSFile implementation.

I think Shm is used only for the WAL file (not the main database file) so this should work fine in practice. However, I would feel a lot safer if attempts to call xShm* would fail loudly. It would be great if there was a way to do that.

Would it be feasible to e.g. make an xShmMap = None definition in the VFSFile class result in the corresponding sqlite3_io_methods element being set to null?

@rogerbinns
Owner Author

How strictly do you want to track writes? For example, do you want to block local writes until remote is in sync, or do you just need to know something changed so you can eventually get around to it? If your use case is on the strict side, then there are already a variety of solutions out there like SQLiteCloud.

It is correct that you can't detect writes when shared memory is in use. (Technically you could, by using mprotect on the area with a signal handler to detect writes, or similar expensive schemes.)

I will add an iVersion flag or similar to control which version of the interface is presented to SQLite. Something similar was done for virtual tables.

@Nikratio

I want to replicate writes asynchronously, so no need to block.

Would setting an iVersion=1 flag be the recommended way to set the xShm* pointers to null? In that case, would it ever be possible to use xFetch (iVersion=3) while still not implementing xShm*?

(Not sure what xFetch is actually used for, just wondering)

@rogerbinns
Owner Author

There is a small combinatorial problem because there are 3 sets of methods and you may want some of them NULL, so that will affect exactly how many parameters there are.

xFetch looks like a way for you to own the in-memory storage. Regular xRead requires you to copy the data into a buffer SQLite provides. xFetch lets you return a pointer, avoiding that copy. The current SQLite VFS implementations only implement xFetch if mmap is enabled. But even that has issues: if the file size has changed then mremap can change the address of the mapping. The VFS keeps a reference count of Fetch/UnFetch calls and only does mremap if the outstanding count is zero.
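
The reference counting rule described above can be illustrated with a toy model. This is only a sketch of the bookkeeping, not SQLite's code: the class `MappingModel` and its method names are hypothetical, and no real memory mapping is involved.

```python
class MappingModel:
    """Toy model of a mapping that may only move when no pointers are handed out."""

    def __init__(self, size):
        self.size = size
        self.outstanding = 0  # fetches not yet matched by an unfetch

    def fetch(self):
        # A caller now holds a pointer into the mapping
        self.outstanding += 1

    def unfetch(self):
        if self.outstanding == 0:
            raise RuntimeError("unfetch without a matching fetch")
        self.outstanding -= 1

    def try_remap(self, new_size):
        """Resize and return True, or return False while pointers are outstanding."""
        if self.outstanding:
            return False  # remapping could move the address under a live pointer
        self.size = new_size
        return True
```

While any fetch is outstanding, `try_remap` declines, which mirrors the point that the mapping address cannot be allowed to change under a caller still holding a pointer.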

It does look like there is no sense in making it possible to implement the iVersion 2 & 3 calls in Python.

If you only need loose tracking, it would seem that a VFS approach is way overkill. Couldn't you just periodically poll the last modify timestamp on the database files and sync on those changing?
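
A rough sketch of that polling idea, using only the standard library. The helper names `snapshot` and `changed_paths` are made up for illustration, and comparing size alongside the modification time is an assumption added here to cope with coarse timestamp resolution.

```python
import os


def snapshot(paths):
    """Return {path: (mtime_ns, size)} for each path that currently exists."""
    result = {}
    for path in paths:
        try:
            st = os.stat(path)
        except FileNotFoundError:
            continue  # for example, no -wal file at the moment
        result[path] = (st.st_mtime_ns, st.st_size)
    return result


def changed_paths(before, after):
    """Paths that appeared, disappeared, or whose mtime or size differ."""
    return sorted(
        path for path in before.keys() | after.keys()
        if before.get(path) != after.get(path)
    )
```

A caller would take a snapshot of the database file and its `-wal` sibling on a timer, compare it with the previous snapshot, and start a sync when `changed_paths` returns anything.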

There is also a tracing vfs.

@Nikratio

I need to know which specific parts of the database are changing, so that I don't needlessly upload the entire file. So mtime doesn't quite do it.
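
To make the requirement concrete, here is a minimal sketch of the bookkeeping involved: collecting written byte ranges and merging overlapping or adjacent ones so that only those regions need uploading. The class `DirtyRanges` and its methods are hypothetical names, not part of APSW.

```python
class DirtyRanges:
    """Keeps a sorted list of non-overlapping [start, end) byte ranges."""

    def __init__(self):
        self.ranges = []

    def record(self, offset, length):
        """Note that `length` bytes starting at `offset` were written."""
        if length <= 0:
            return
        start, end = offset, offset + length
        merged = []
        for s, e in self.ranges:
            if e < start or s > end:
                merged.append((s, e))  # disjoint, keep unchanged
            else:
                # overlaps or touches the new range, so absorb it
                start, end = min(start, s), max(end, e)
        merged.append((start, end))
        merged.sort()
        self.ranges = merged

    def drain(self):
        """Return the accumulated ranges and reset, e.g. before an upload."""
        ranges, self.ranges = self.ranges, []
        return ranges
```

In an APSW `VFSFile` subclass, an `xWrite(data, offset)` override could call `record(offset, len(data))` before forwarding the write to the inherited implementation, and a background task could call `drain()` to decide what to upload.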

I could use tracing_vfs, but to me that seems like overkill (why parse text messages for all VFS operations?)

@rogerbinns
Owner Author

I don't know the specifics of your requirements, but I'd tend to go for a simpler, more robust solution using rsync to transfer the files on change. rsync does per block checksums and then transfers changed blocks only. It would also handle the case of data moving from wal to the main file, since the checksum would remain the same, so there is no need to transfer copies of that block. An inotify style hook would then invoke it as needed.

I wouldn't expect you to use the tracing vfs as is, but rather to hack it down to exactly what you need in the most convenient way.

No matter what, the issue description currently has my thinking on what will and won't be implemented, and I believe it will also work for your needs.
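
The block checksum idea can be sketched as follows. This is a simplification and not rsync's actual algorithm: rsync pairs a rolling weak checksum with a strong one so it can match blocks at arbitrary offsets, whereas this sketch only compares blocks at fixed offsets. The function names and the 4096 byte block size are assumptions.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed; matching the database page size would be a natural choice


def block_digests(data, block_size=BLOCK_SIZE):
    """SHA-256 digest of each fixed-size block of `data`."""
    return [
        hashlib.sha256(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old_digests, new_digests):
    """Indexes of blocks in the new file that differ from, or extend past, the old one."""
    changed = []
    for index, digest in enumerate(new_digests):
        if index >= len(old_digests) or old_digests[index] != digest:
            changed.append(index)
    return changed
```

Only the blocks listed by `changed_blocks` would need transferring. A file that became shorter is not handled here and would need a separate truncate step on the remote side.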

@Nikratio

I'm confused about the tracing VFS idea. You're saying I should write a custom C VFS (based on tracing VFS) instead of doing this in Python through APSW?

(I can't use rsync because I'm dealing with a dumb REST backend, not a server under my control)

@rogerbinns
Owner Author

APSW has to receive parameters in C, convert them to Python, convert back to C again to call the VFS being inherited from, then go through the conversions again with the result values and out parameters. It is doing a lot of work, when all you wanted to know was modified ranges! Hacking down the tracing vfs would leave you with a module having little footprint or code getting only the information you need.

It is a shame you can't update the REST backend. If it at least gave checksums for blocks then you could do the syncing without having to mess with VFS, and it would be far more resilient against transient network issues etc.
