Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rough draft of a database library, for discussion #76

Draft
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

qaphla
Copy link

@qaphla qaphla commented Oct 15, 2024

This is a first attempt at a database library, which is not intended to work as is (and includes correspondingly many assumes, and a few admits), but rather as a starting point for discussion of what such a library should look like. There are a few main design decisions that I think are reasonable, but which are open for discussion, and also several questions where I'm not sure what the best thing to do is at this point, which are more relevant for discussion.

Design

  • A database consists of a main state, containing a list of state IDs, each of which points to a state containing a single "row" of the database.

  • Rows may have an arbitrary (user-specified) type, and corresponding local predicates specifying what is a valid row and how rows may be updated (not yet implementable), along with the usual later and knowable lemmas.

  • The database as a whole can take in a global predicate, which will be used to build the predicate for the main state, specifying some property over the list of rows pointed to by the main state. Similar to the later and knowable lemmas, this comes with a pair of properties, making it possible to prove the global predicate for empty databases and when updating a database.

  • The interface allows us to:

    1. Initialize a database (as empty)
    2. Add new rows to an existing database
    3. Update an existing row in a database (selected by a unique identifier, currently just its state ID)
    4. Find a row satisfying a predicate in a database (if one exists). Currently, this finds the first such row, but could be altered to give all such rows.

    Open questions

    1. Currently, the database-global predicate is very general. This is, broadly, a good thing, but may overcomplicate common cases. Most of the time, I believe that the core property we want is that some portions of each database row are unique and immutable, and therefore can reliably be used as identifiers for rows, so that if we extract (via find, say) two rows from the database that agree on a unique field, then those two rows are the same, up to updates (e.g., they agree on all immutable fields, and the older is related to the newer by the (transitive closure of?) the row update predicate. It is also useful to have immutable portions of rows, but this can be expressed purely locally, as part of the row update predicate.
    2. Currently, of course, we do not have any conditions on how states may be updated, so the row update predicate is not implementable. One question that comes up with this predicate is what it should have access to --- of course, the old and new state, along with the principal (and potentially the state ID), as well as the current trace at time of update. Does it also need the trace at the time that the old state was set? This may simplify expressing some properties, but it is unclear if they are actually useful properties to express --- e.g., state 1 may be updated to state 2 only if there is an event ev with a particular form between when state 1 was set and when state 2 is set.
    3. How should we best get information about the global database predicate out when retrieving rows? It feels like the current form of the predicate is perhaps too general to get useful information from. This relates to 1 --- the more restricted the global predicate is, and the more aware of its form the database library is, the more we can encapsulate getting information out of it, but it may also make specifying the parameters of the database more unwieldy.
    4. In principle, we could store additional information in the main database, instead of just a list of state IDs --- for instance, we could use a different unique identifier for database rows, so that the state IDs used by the rows are not revealed to the user, but then this doesn't actually give any better guarantees than just using state IDs --- the state could be arbitrarily updated according to its predicates regardless.

#row_t:Type0 -> {|db_types row_t|} ->
{|protocol_invariants|} -> db_pred:db_predicate row_t -> prop
let has_db_invariants #row_t #db_t #invs db_pred =
let db_local_state : local_state db = local_state_db row_t in
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried to avoid needing this, but for some reason, can't get typeclass inference to work property to find an instance of local_state db, probably because db doesn't depend on the row type. Maybe the db_types need to be split into two parts, one that depends on the row type and one that doesn't?

@qaphla
Copy link
Author

qaphla commented Oct 15, 2024

Another general question is if we want to include support for deletion in some form from databases. My impression is that in most cases, this causes additional problems without much benefit, because uniqueness becomes much less useful in the presence of deletions, but I would be curious to hear if someone has applications for it. Of course, we can "fake" deletion by having a "deleted" flag as part of a row, and always query only for rows that are not deleted, which may be sufficient for any such applications, without needing a change to the library itself.

@TWal
Copy link
Collaborator

TWal commented Oct 15, 2024

I will have a deeper look later, but overall it seems that this could replace DY.Lib.State.Map as it is more general? The corruption model of the database is more fine-grained than DY.Lib.State.Map which seems good to have.

@qaphla
Copy link
Author

qaphla commented Oct 15, 2024

I will have a deeper look later, but overall it seems that this could replace DY.Lib.State.Map as it is more general? The corruption model of the database is more fine-grained than DY.Lib.State.Map which seems good to have.

Yes, I believe that this is (will be, once it works) a strict generalization of DY.Lib.State.Map in its current form, particularly with regard to corruption, but also with more general search options. That said, it might be worth building a thin layer over it for the specific use case of a key-value map (or maybe a multi-key-value map, or similar), so that the user can, for instance, avoid needing to manually write down the global predicate that specifies that keys are unique, and other such simplifications.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants