-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add Data Sync Plugin for external database synchronization #75
base: main
Are you sure you want to change the base?
feat: Add Data Sync Plugin for external database synchronization #75
Conversation
Signed-off-by: David Anyatonwu <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Put a few thoughts on the pull request. I still need to run through the demo and test, just starting with a code review.
One question I have: is there any way to abstract the Postgres specific code out so we could easily support other types of databases easily in the future? For example the column mappings between Postgres <> SQLite would obviously only work if the user connects a Postgres database to it, and would break likely for MySQL. Maybe there is a high level plugin like DataSyncPlugin
and then another plugin you can pass inside of it that can override functions that would be database specific like:
new DataSyncPlugin(syncSource: PostgresSync | MySQLSync | etc)
And maybe PostgresSync
overrides some default class implementation and all the database specific logic (e.g. calling to the information_schema & column type mapping) could exist in there?
Thoughts?
plugins/data-sync/README.md
Outdated
2. For each configured table: | ||
- Retrieves the table schema from the external database | ||
- Creates a corresponding table in the internal database | ||
- Periodically checks for new or updated records based on `created_at` timestamp and `id` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a mechanism we can put in place for tables that don't contain a created_at
or id
column? Perhaps as part of the sync config where they define an array of tables they want to sync they can also say what key they want to use to annotate latest entry.
plugins/data-sync/README.md
Outdated
1. The plugin creates a metadata table in the internal database to track sync state | ||
2. For each configured table: | ||
- Retrieves the table schema from the external database | ||
- Creates a corresponding table in the internal database |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens when a Postgres table contains both a schema and table name (e.g. users
.profile
) and SQLite only supports tables without schemas. Would the name of the table become users.profile
? Would we want the users moving forward to query that table with the ${schema}.${table}
name notation moving forward?
I assume for any Postgres public schema tables we would just create them with simply their table name (e.g. ${table}
) without a schema prefix, correct?
Lastly, if the user did decide to do public.users
would we have a beforeQuery
hook that was smart enough in this plugin to know we could omit public.
from it as that table is in our SQLite root?
plugins/data-sync/index.ts
Outdated
// Create metadata table if it doesn't exist | ||
await this.dataSource?.rpc.executeQuery({ | ||
sql: ` | ||
CREATE TABLE IF NOT EXISTS data_sync_metadata ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So far for our plugins we've been following the tmp_
prefix for our table names so users know what might have been created from Starbase versus what they've created themselves. Would you mind updating to contain that prefix so it should be:
CREATE TABLE IF NOT EXISTS tmp_data_sync_metadata (
It's best actually, let me see how to implement it. |
…ulti-db support rm test demo; not working
Refactored the data sync plugin to:
|
Purpose
Fixes: #72
/claim #72
Implement data synchronization functionality between external data sources and StarbaseDB's internal SQLite database. This plugin enables close-to-edge replica creation by automatically pulling and synchronizing data from external sources (like PostgreSQL) at configurable intervals.
Tasks
Implement core Data Sync Plugin
sync_interval
optiontables
optioncreated_at
andid
columnsAdd plugin configuration support
Implement sync mechanisms
Create comprehensive demo and documentation
Verify
wrangler.toml
:Set up environment variables for database credentials
Test synchronization:
curl http://localhost:8787/sync-status
curl http://localhost:8787/sync-data
curl http://localhost:8787/debug
Before
StarbaseDB instances could only:
After
StarbaseDB now supports: