feat: Add Data Sync Plugin for external database synchronization #75

onyedikachi-david · 2025-01-24T23:04:31Z

Purpose

Fixes: #72
/claim #72
Implement data synchronization functionality between external data sources and StarbaseDB's internal SQLite database. This plugin enables close-to-edge replica creation by automatically pulling and synchronizing data from external sources (like PostgreSQL) at configurable intervals.

Tasks

Verify

Start a PostgreSQL instance:

docker run --name starbasedb-postgres -e POSTGRES_PASSWORD=postgres -e POSTGRES_DB=demo -p 5432:5432 -d postgres:15

Configure the plugin in wrangler.toml:

[plugins.data-sync]
sync_interval = 300 # 5 minutes
tables = ["users", "products"]

Set up environment variables for database credentials
Test synchronization:

Monitor sync status: curl http://localhost:8787/sync-status
View synced data: curl http://localhost:8787/sync-data
Check sync metadata: curl http://localhost:8787/debug

Before

StarbaseDB instances could only:

Use the internal SQLite database
Connect to external databases for direct queries

There was no automatic data synchronization and no edge-replica capability.

After

StarbaseDB now supports:

Automatic data synchronization from external sources
Configurable sync intervals and table selection
Incremental updates based on timestamps and IDs
Type mapping between PostgreSQL and SQLite
Close-to-edge replica functionality
Monitoring and debugging capabilities
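
The type-mapping item above can be pictured as a small lookup from PostgreSQL type names to SQLite storage types. This is a minimal sketch, not the plugin's actual table: the `mapPostgresType` name, the selection of types, and the `TEXT` fallback are all assumptions made for illustration.

```typescript
// Hypothetical PostgreSQL -> SQLite mapping; the PR's real mapping may differ.
type SqliteType = "INTEGER" | "REAL" | "TEXT" | "BLOB" | "NUMERIC";

const PG_TO_SQLITE = new Map<string, SqliteType>([
    ["smallint", "INTEGER"],
    ["integer", "INTEGER"],
    ["bigint", "INTEGER"],
    ["boolean", "INTEGER"], // SQLite has no boolean storage class; 0/1 is conventional
    ["real", "REAL"],
    ["double precision", "REAL"],
    ["numeric", "NUMERIC"],
    ["bytea", "BLOB"],
    ["text", "TEXT"],
    ["character varying", "TEXT"],
    ["uuid", "TEXT"],
    ["json", "TEXT"],
    ["jsonb", "TEXT"],
    ["timestamp without time zone", "TEXT"], // stored as ISO-8601 strings
    ["timestamp with time zone", "TEXT"],
]);

// Assumption: unknown source types fall back to TEXT rather than failing the sync.
function mapPostgresType(pgType: string): SqliteType {
    return PG_TO_SQLITE.get(pgType.trim().toLowerCase()) ?? "TEXT";
}
```

Falling back to `TEXT` keeps a sync from aborting on an unrecognized column type, at the cost of losing numeric ordering for that column; a stricter variant could throw instead.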

Signed-off-by: David Anyatonwu <[email protected]>

Brayden

I left a few thoughts on the pull request. I still need to run through the demo and test; I'm just starting with a code review.

One question I have: is there any way to abstract the Postgres-specific code out so we could easily support other types of databases in the future? For example, the column mappings between Postgres <> SQLite would obviously only work if the user connects a Postgres database, and would likely break for MySQL. Maybe there is a high-level plugin like DataSyncPlugin, and then another plugin you can pass into it that overrides the database-specific functions, like:

new DataSyncPlugin(syncSource: PostgresSync | MySQLSync | etc)

And maybe PostgresSync overrides some default class implementation, and all the database-specific logic (e.g. querying the information_schema and mapping column types) could live in there?

Thoughts?
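
One way to read the suggestion above, sketched in TypeScript. Only `DataSyncPlugin`, `PostgresSync`, and `MySQLSync` come from the comment; the `SyncSource` interface, its method names, and the abbreviated type rules are hypothetical and are not taken from the PR's code.

```typescript
interface ColumnInfo {
    name: string;
    type: string; // type name as reported by the source database
}

// Hypothetical contract: everything database-specific lives behind this interface.
interface SyncSource {
    readonly dialect: string;
    schemaQuery(table: string): { sql: string; params: string[] };
    mapColumnType(sourceType: string): string;
}

class PostgresSync implements SyncSource {
    readonly dialect = "postgresql";
    schemaQuery(table: string) {
        return {
            sql: "SELECT column_name, data_type FROM information_schema.columns WHERE table_name = $1",
            params: [table],
        };
    }
    mapColumnType(sourceType: string): string {
        switch (sourceType.toLowerCase()) {
            case "smallint": case "integer": case "bigint": case "boolean":
                return "INTEGER";
            case "real": case "double precision": case "numeric":
                return "REAL";
            case "bytea":
                return "BLOB";
            default:
                return "TEXT";
        }
    }
}

class MySQLSync implements SyncSource {
    readonly dialect = "mysql";
    schemaQuery(table: string) {
        return {
            sql: "SELECT column_name, data_type FROM information_schema.columns WHERE table_name = ?",
            params: [table],
        };
    }
    mapColumnType(sourceType: string): string {
        switch (sourceType.toLowerCase()) {
            case "tinyint": case "smallint": case "int": case "bigint":
                return "INTEGER";
            case "float": case "double": case "decimal":
                return "REAL";
            case "blob":
                return "BLOB";
            default:
                return "TEXT";
        }
    }
}

// The plugin itself stays database-agnostic and only talks to the interface.
class DataSyncPlugin {
    private readonly syncSource: SyncSource;
    constructor(syncSource: SyncSource) {
        this.syncSource = syncSource;
    }
    createTableSql(table: string, columns: ColumnInfo[]): string {
        const defs = columns
            .map((c) => `"${c.name}" ${this.syncSource.mapColumnType(c.type)}`)
            .join(", ");
        return `CREATE TABLE IF NOT EXISTS "${table}" (${defs})`;
    }
}
```

With this shape, supporting another database would mean adding one more class that implements `SyncSource`, and `new DataSyncPlugin(new MySQLSync())` would reuse the same sync loop unchanged.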

Brayden · 2025-01-26T16:28:41Z

plugins/data-sync/README.md

+2. For each configured table:
+    - Retrieves the table schema from the external database
+    - Creates a corresponding table in the internal database
+    - Periodically checks for new or updated records based on `created_at` timestamp and `id`


Is there a mechanism we can put in place for tables that don't contain a created_at or id column? Perhaps, as part of the sync config where they define the array of tables they want to sync, they could also specify which key to use to track the latest entry.
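
A minimal sketch of what such a per-table setting could look like, assuming a hypothetical `cursorColumn` option that defaults to `created_at`. The `TableSyncConfig` shape, the function names, and the PostgreSQL-style `$1` placeholder are illustrative assumptions, not the PR's API.

```typescript
// Hypothetical per-table config: a bare string keeps the default cursor column.
interface TableSyncConfig {
    name: string;
    cursorColumn?: string;
}

const IDENTIFIER = /^[A-Za-z_][A-Za-z0-9_]*$/;

// Only allow plain identifiers, since names are interpolated into SQL text.
function quoteIdent(id: string): string {
    if (!IDENTIFIER.test(id)) throw new Error(`Invalid identifier: ${id}`);
    return `"${id}"`;
}

function normalizeTables(
    tables: Array<string | TableSyncConfig>
): Required<TableSyncConfig>[] {
    return tables.map((t) =>
        typeof t === "string"
            ? { name: t, cursorColumn: "created_at" }
            : { name: t.name, cursorColumn: t.cursorColumn ?? "created_at" }
    );
}

// Builds the query that fetches rows newer than the last synced cursor value.
function buildIncrementalQuery(
    cfg: Required<TableSyncConfig>,
    lastCursor: string | number | null,
    batchSize = 1000
): { sql: string; params: Array<string | number> } {
    const table = quoteIdent(cfg.name);
    const col = quoteIdent(cfg.cursorColumn);
    const limit = Math.max(1, Math.floor(batchSize));
    if (lastCursor === null) {
        // First sync: no cursor yet, start from the beginning.
        return { sql: `SELECT * FROM ${table} ORDER BY ${col} ASC LIMIT ${limit}`, params: [] };
    }
    return {
        sql: `SELECT * FROM ${table} WHERE ${col} > $1 ORDER BY ${col} ASC LIMIT ${limit}`,
        params: [lastCursor],
    };
}
```

One caveat with this approach: a strict `>` comparison can skip rows that share the cursor value of the last row in a batch, so a non-unique cursor column would need a tie-breaker (for example a second column) in a real implementation.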

Brayden · 2025-01-26T16:32:40Z

plugins/data-sync/README.md

+1. The plugin creates a metadata table in the internal database to track sync state
+2. For each configured table:
+    - Retrieves the table schema from the external database
+    - Creates a corresponding table in the internal database


What happens when a Postgres table has both a schema and a table name (e.g. users.profile), given that SQLite only supports tables without schemas? Would the name of the table become users.profile? Would we want users to query that table with the ${schema}.${table} notation moving forward?

I assume for any Postgres public schema tables we would just create them with simply their table name (e.g. ${table}) without a schema prefix, correct?

Lastly, if the user did decide to query public.users, would this plugin have a beforeQuery hook smart enough to know it can omit public. from the name, since that table is in our SQLite root?
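
To make the naming question concrete, here is one possible convention, sketched under assumptions: tables in the default `public` schema keep their bare name, and other schemas are folded into the name with an underscore. The `toLocalTableName` function and the underscore separator are hypothetical choices, not what the PR implements.

```typescript
// Maps a possibly schema-qualified source name to a schema-less SQLite table name.
function toLocalTableName(qualified: string, defaultSchema = "public"): string {
    const dot = qualified.indexOf(".");
    const schema = dot === -1 ? defaultSchema : qualified.slice(0, dot);
    const table = dot === -1 ? qualified : qualified.slice(dot + 1);
    if (schema.length === 0 || table.length === 0 || table.includes(".")) {
        throw new Error(`Unsupported table name: ${qualified}`);
    }
    // Assumption: default-schema tables drop the prefix; others become schema_table.
    return schema === defaultSchema ? table : `${schema}_${table}`;
}
```

Under this convention `users` and `public.users` both resolve to `users`, while `users.profile` resolves to `users_profile`. A query hook could apply the same function to table names it extracts, though rewriting SQL text safely would need a real parser rather than string replacement.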

Brayden · 2025-01-26T16:39:17Z

plugins/data-sync/index.ts

+            // Create metadata table if it doesn't exist
+            await this.dataSource?.rpc.executeQuery({
+                sql: `
+          CREATE TABLE IF NOT EXISTS data_sync_metadata (


So far our plugins have used a tmp_ prefix for their table names, so users can tell what was created by Starbase from what they created themselves. Would you mind updating the name to include that prefix, so it reads:

CREATE TABLE IF NOT EXISTS tmp_data_sync_metadata (

onyedikachi-david · 2025-01-26T17:12:12Z

I left a few thoughts on the pull request. I still need to run through the demo and test; I'm just starting with a code review.

One question I have: is there any way to abstract the Postgres-specific code out so we could easily support other types of databases in the future? For example, the column mappings between Postgres <> SQLite would obviously only work if the user connects a Postgres database, and would likely break for MySQL. Maybe there is a high-level plugin like DataSyncPlugin, and then another plugin you can pass into it that overrides the database-specific functions, like:

new DataSyncPlugin(syncSource: PostgresSync | MySQLSync | etc)

And maybe PostgresSync overrides some default class implementation, and all the database-specific logic (e.g. querying the information_schema and mapping column types) could live in there?

Thoughts?

That's actually the best approach; let me see how to implement it.

onyedikachi-david · 2025-01-27T17:27:26Z

@Brayden

Refactored the data sync plugin to:

Abstract database-specific code into a DatabaseSyncSource class
Add the tmp_ prefix to all synced tables
Support custom schemas with proper table name mapping
Add flexible sync configuration (custom columns, batch sizes)
Improve type mapping and validation
Add comprehensive error handling and logging
Remove the demo test, which was not working

feat: Add Data Sync Plugin for external database synchronization

b9fb9f8

Signed-off-by: David Anyatonwu <[email protected]>

algora-pbc bot mentioned this pull request Jan 24, 2025

Replicate data from external source to internal source with a Plugin #72

Open

algora-pbc bot added the 🙋 Bounty claim label Jan 24, 2025

refactor: abstract database-specific code from data sync plugin for multi-db support rm test demo; not working

34be2b6

onyedikachi-david added 2 commits January 27, 2025 18:33

docs: add documentation for data sync plugins

40023d1

chore: update meta.json files

7ab1edb
