Designing the Data Layer for Efficiency #20

emperorjm · 2023-09-18T17:11:22Z

emperorjm
Sep 18, 2023
Maintainer

Background:

When discussing data organization, those of us rooted in traditional software development naturally think of Relational Databases. We're familiar with tables, primary keys, defining relationships, and the relentless pursuit of data normalization. Contrasting this is the blockchain realm, where data predominantly lives in Key-Value stores. This setup draws parallels with NoSQL databases, but it's more reminiscent of tools like Redis, Cassandra, or Memcached.

The tech community today often regards relational structures as complex and outdated. However, in blockchain dapp design, there's no one-size-fits-all solution. Let's delve deeper.

Example:

Let’s look at an example. Suppose we have a multi-library system that tracks all the books in each library.

Relational Database Approach:

Here's a typical relational structure:

CREATE TABLE Library (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255),
    description TEXT
);

-- Foreign-key columns use INT so they match the type of the keys they reference.
CREATE TABLE Location (
    id INT AUTO_INCREMENT PRIMARY KEY,
    library_id INT,
    name VARCHAR(255),
    address VARCHAR(255),
    FOREIGN KEY (library_id) REFERENCES Library(id)
);

CREATE TABLE Book (
    id INT AUTO_INCREMENT PRIMARY KEY,
    isbn VARCHAR(13) NOT NULL UNIQUE,
    title VARCHAR(255) NOT NULL
);

-- Join table: which book is held at which location.
CREATE TABLE BookLocation (
    book_id INT,
    location_id INT,
    library_id INT,
    PRIMARY KEY (book_id, location_id),
    FOREIGN KEY (book_id) REFERENCES Book(id),
    FOREIGN KEY (location_id) REFERENCES Location(id),
    FOREIGN KEY (library_id) REFERENCES Library(id)
);

NoSQL (Document-Oriented) Approach:

With NoSQL document databases such as MongoDB, data is typically denormalized, which offers flexibility. We might consider two collections: one for libraries (with their locations) and another for books. However, let's explore a single library collection in which locations and books are embedded:

{
    "_id": ObjectId("some_id_for_library"),
    "title": "Library Name",
    "description": "Library Description",
    "locations": [
        {
            "location_id": ObjectId("some_id_for_location_1"),
            "name": "Location Name",
            "description": "Location Description",
            "address": "Location Address",
            "books": [
                {
                    "book_id": ObjectId("some_id_for_book_1"),
                    "name": "Book Name 1",
                    "description": "Book Description 1"
                },
                {
                    "book_id": ObjectId("some_id_for_book_2"),
                    "name": "Book Name 2",
                    "description": "Book Description 2"
                }
                // ... more books for this location
            ]
        },
        {
            "location_id": ObjectId("some_id_for_location_2"),
            "name": "Another Location Name",
            "description": "Another Location Description",
            "address": "Another Location Address",
            "books": [
                // ... books for this location
            ]
        }
        // ... more locations if any
    ]
}

Key-Value Database Approach:

What’s the ideal structure for a key-value database, like the one we have in CosmWasm? Should we follow the traditional normalized structure, or should we simply put everything together, as document databases recommend (MongoDB, n.d.)? One might be tempted to do something like the following:

use cw_storage_plus::{Item, Map};
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize, Clone, Debug, PartialEq)]
pub struct Library {
    pub id: u32,
    pub name: String,
    pub description: String,
    pub locations: Vec<Location>,
}

#[derive(Serialize, Deserialize, Clone, Debug, PartialEq)]
pub struct Location {
    pub id: u32,
    pub name: String,
    pub description: Option<String>,
    pub address: String,
    pub books: Vec<Book>,
}

#[derive(Serialize, Deserialize, Clone, Debug, PartialEq)]
pub struct Book {
    pub id: u32,
    pub name: String,
    pub description: String,
}

// Constants for storage keys
pub const LIBRARY: Map<u32, Library> = Map::new("library");
pub const LIBRARY_COUNTER: Item<u32> = Item::new("library_counter");
pub const LOCATION_COUNTER: Item<u32> = Item::new("location_counter");
pub const BOOK_COUNTER: Item<u32> = Item::new("book_counter");

Past experience suggests that this structure may be inefficient for large data sets. Because every location and book is stored inside a single Library value under one key, modifying just one book out of a million would require loading, deserializing, and rewriting all one million book entries, leading to increased gas usage. Search operations could also become sluggish.
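To make the cost difference concrete, here is a minimal sketch that counts the bytes read and written when one book is renamed under each layout. Everything in it is an assumption for illustration: `MeteredStore` is a hypothetical in-memory stand-in for contract storage, the fixed 32-byte record replaces real serialization, and byte counts are only a rough proxy for gas, not the actual gas schedule.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory key-value store that tallies bytes read and written.
/// It stands in for contract storage; real gas pricing is not modelled.
#[derive(Default)]
pub struct MeteredStore {
    data: BTreeMap<Vec<u8>, Vec<u8>>,
    pub bytes_read: usize,
    pub bytes_written: usize,
}

impl MeteredStore {
    pub fn get(&mut self, key: &[u8]) -> Option<Vec<u8>> {
        let value = self.data.get(key).cloned();
        if let Some(v) = &value {
            self.bytes_read += v.len();
        }
        value
    }

    pub fn set(&mut self, key: &[u8], value: Vec<u8>) {
        self.bytes_written += value.len();
        self.data.insert(key.to_vec(), value);
    }

    pub fn reset_counters(&mut self) {
        self.bytes_read = 0;
        self.bytes_written = 0;
    }
}

/// Each book is a fixed-width record so the encoding stays trivial.
const TITLE_WIDTH: usize = 32;

fn encode_title(title: &str) -> Vec<u8> {
    let mut buf = vec![0u8; TITLE_WIDTH];
    let bytes = title.as_bytes();
    let n = bytes.len().min(TITLE_WIDTH);
    buf[..n].copy_from_slice(&bytes[..n]);
    buf
}

/// Nested layout: every book of the library lives in one value under one key,
/// so the whole value is read and written back to change one record.
pub fn rename_book_nested(store: &mut MeteredStore, index: usize, title: &str) {
    let mut blob = store.get(b"library/1").expect("library exists");
    let start = index * TITLE_WIDTH;
    blob[start..start + TITLE_WIDTH].copy_from_slice(&encode_title(title));
    store.set(b"library/1", blob);
}

/// Per-book layout: each book has its own key, so only one record is touched.
pub fn rename_book_flat(store: &mut MeteredStore, index: usize, title: &str) {
    let key = format!("book/{}", index);
    let _old = store.get(key.as_bytes()).expect("book exists");
    store.set(key.as_bytes(), encode_title(title));
}

/// Returns (bytes touched with the nested layout, bytes touched with per-book keys).
pub fn compare_update_cost(book_count: usize) -> (usize, usize) {
    assert!(book_count > 0, "need at least one book");

    let mut nested = MeteredStore::default();
    nested.set(b"library/1", vec![0u8; book_count * TITLE_WIDTH]);
    nested.reset_counters(); // ignore setup cost
    rename_book_nested(&mut nested, 0, "New title");

    let mut flat = MeteredStore::default();
    for i in 0..book_count {
        flat.set(format!("book/{}", i).as_bytes(), encode_title("Old title"));
    }
    flat.reset_counters(); // ignore setup cost
    rename_book_flat(&mut flat, 0, "New title");

    (
        nested.bytes_read + nested.bytes_written,
        flat.bytes_read + flat.bytes_written,
    )
}
```

With 1,000 books, `compare_update_cost(1_000)` returns `(64_000, 64)` in this model: the nested layout touches every record, while the per-book layout touches only the one being changed.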

Questions:

Gas Consumption and Efficiency: Pulling large records can inflate gas costs. If a library holds a million books in a single record, every read or update must load and deserialize that entire record. In Big O terms, finding or updating one book is O(n) in the number of books stored in the record, since the whole record is loaded and then scanned linearly.

Relational Mimicry in Key-Value Store: To update attributes like the library's name or location, it's inefficient to pull records with redundant data. Perhaps we should lean on our relational instincts and try to normalize the data to increase efficiency.
Something like the following could suffice:

pub const LIBRARY: Map<u32, Library> = Map::new("library");
pub const BOOK: Map<u32, Book> = Map::new("book");
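One possible way to take the normalized layout a step further is to give each book a composite key, so that a single book can be updated directly and the books of one location can still be listed together. The sketch below is an assumption-laden illustration, not contract code: it uses a standard-library `BTreeMap` in place of contract storage, and the `BookStore` type, the `(library_id, location_id, book_id)` key shape, and the sample names are all hypothetical. It also assumes each book sits at exactly one location, which is simpler than the many-to-many `BookLocation` table in the relational example.

```rust
use std::collections::BTreeMap;

#[derive(Clone, Debug, PartialEq)]
pub struct Book {
    pub name: String,
}

/// Key = (library_id, location_id, book_id). Because keys are ordered, the
/// books of one location are adjacent and can be read with a range scan.
pub type BookKey = (u32, u32, u32);

/// Hypothetical stand-in for a keyed storage map.
#[derive(Default)]
pub struct BookStore {
    books: BTreeMap<BookKey, Book>,
}

impl BookStore {
    pub fn save(&mut self, key: BookKey, book: Book) {
        self.books.insert(key, book);
    }

    /// Updates one book without touching any other entry.
    /// Returns false when the key does not exist.
    pub fn rename(&mut self, key: BookKey, name: &str) -> bool {
        match self.books.get_mut(&key) {
            Some(book) => {
                book.name = name.to_string();
                true
            }
            None => false,
        }
    }

    /// Lists the books at one location by scanning only the matching key range.
    pub fn books_at(&self, library_id: u32, location_id: u32) -> Vec<&Book> {
        let low = (library_id, location_id, u32::MIN);
        let high = (library_id, location_id, u32::MAX);
        self.books.range(low..=high).map(|(_, book)| book).collect()
    }
}

/// Small walkthrough: four books across two libraries, one rename, one listing.
pub fn demo() -> Vec<String> {
    let mut store = BookStore::default();
    store.save((1, 1, 1), Book { name: "Book A".into() });
    store.save((1, 1, 2), Book { name: "Book B".into() });
    store.save((1, 2, 3), Book { name: "Book C".into() });
    store.save((2, 1, 4), Book { name: "Book D".into() });

    store.rename((1, 1, 2), "Book B (2nd ed.)");

    store
        .books_at(1, 1)
        .into_iter()
        .map(|book| book.name.clone())
        .collect()
}
```

The trade-off in this sketch is that a book's location becomes part of its key, so moving a book means deleting one entry and writing another rather than editing a field.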

Design Philosophy: Do you think the design principles applied in traditional databases can or should be directly translated into blockchain data design, or do we need to reconsider our foundational assumptions?

Scalability Concerns: How do you see the trade-offs between data normalization in relational databases and the flatter structures in key-value stores evolving as the volume of data on the blockchain grows?

Performance Metrics: Beyond gas consumption, what other performance metrics should we consider crucial when designing data structures for blockchain?

Key-value stores generally lack the transactional integrity and query capabilities of relational databases, so you may need to implement additional logic to handle those aspects if your application requires them. Our next discussion will cover the effective use of indexes.
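As one example of the kind of additional logic meant here, the following sketch hand-enforces what a relational FOREIGN KEY constraint would provide automatically: a book cannot be added to a library that does not exist, and a library cannot be removed while it still has books. It is only an illustration under stated assumptions; `Catalog`, `StoreError`, and the method names are hypothetical, and a `BTreeMap` again stands in for contract storage.

```rust
use std::collections::BTreeMap;

#[derive(Debug, PartialEq)]
pub enum StoreError {
    LibraryNotFound(u32),
    LibraryNotEmpty(u32),
}

/// Hypothetical normalized store: libraries and books live under separate keys.
#[derive(Default)]
pub struct Catalog {
    libraries: BTreeMap<u32, String>,
    /// (library_id, book_id) -> title
    books: BTreeMap<(u32, u32), String>,
}

impl Catalog {
    pub fn add_library(&mut self, id: u32, name: &str) {
        self.libraries.insert(id, name.to_string());
    }

    /// Manual foreign-key check: the parent library must exist first.
    pub fn add_book(
        &mut self,
        library_id: u32,
        book_id: u32,
        title: &str,
    ) -> Result<(), StoreError> {
        if !self.libraries.contains_key(&library_id) {
            return Err(StoreError::LibraryNotFound(library_id));
        }
        self.books.insert((library_id, book_id), title.to_string());
        Ok(())
    }

    /// Manual "restrict on delete": refuse to orphan existing books.
    pub fn remove_library(&mut self, id: u32) -> Result<(), StoreError> {
        let has_books = self
            .books
            .range((id, u32::MIN)..=(id, u32::MAX))
            .next()
            .is_some();
        if has_books {
            return Err(StoreError::LibraryNotEmpty(id));
        }
        self.libraries
            .remove(&id)
            .map(|_| ())
            .ok_or(StoreError::LibraryNotFound(id))
    }
}
```

Every such check costs an extra read, so which constraints are worth enforcing on-chain is itself a design decision.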

References

MongoDB. (n.d.). Document Database - NoSQL. [online] Available at: https://www.mongodb.com/document-databases#:~:text=Document%20databases%20have%20the%20following [Accessed 18 Sep. 2023].

drewstaylor · 2024-02-08T17:05:00Z

drewstaylor
Feb 8, 2024
Maintainer

Thanks for this content @emperorjm

The suggested approach is very much the same as what I just wrote in the Governance forum regarding the "high gas fees" topic (read the post)
