Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DPLT-1049 Provision separate DB per user #144

Merged
merged 10 commits into from
Jul 26, 2023

Conversation

morgsmccauley
Copy link
Collaborator

@morgsmccauley morgsmccauley commented Jul 21, 2023

Revert of #143, which therefore contains the changes from #132 #139 and #140. The original implementation (#132) had a few problems, which I'll address in the PR.


This PR updates the Lambda Runner provisioning step to create a separate Postgres database per user. This provides greater security, preventing users from manipulating other users data within the provided SQL.

Provisioning High-Level

This section outlines the high level steps during provisioning, as well as some notes on the changes made to them.

1. Create database/user in Postgres

It's possible to run SQL against Postgres via Hasura, but the provided statements are run within a transaction block. As CREATE DATABASE can not be run within a transaction, we need a direct connection to Postgres to create new databases.

Crypto random passwords are generated for each PG user, these will be stored by Hasura in plain text within its metadata table. Exposing the password/connection string via an environment variable isn't really feasible as this would require: creating a secret, updating the Cloud Run revision, and restarting the instance. We can look at more secure options for this in future.

2. Add database to Hasura

Database connections are added as 'data sources' to Hasura. To maintain the same query/mutation shape as we have currently, i.e. {account_id}_{function_name}_{table_name}, root_field and type_name prefixes will be configured on the data source. The requirement for these is discussed below.

3. Untrack existing tables

As the new DB is a direct copy of the existing schema, 'tracking' the tables will result in conflicts within GraphQL. To prevent this, the existing tables must be 'untracked' in Hasura, removing them from the GraphQL schema.

3. Run user provided SQL

A user could theoretically create two different functions with the same tables, resulting in conflicts within Postgres. To avoid this, a new schema is created per function. Since this is scoped within the accounts DB, we can name this after just the function, rather than a concatenation of both account/function like we have currently.

Hasura surfaces tables in GraphQL via {schema_name}_{table_name}. With the shared DB we have currently, the schema_name is set to {account_id}_{function_name}. As mentioned above, this is now only set to {function_name}, which could potentially lead to conflicts if another account forks the same indexer. To prevent this, the data source is configured to prefix all root fields and types with {account_id}

4. Track tables, foreign key relationships, and add permissions

These steps are mostly unchanged, but have been slightly refactored to take in to account execution against different databases/sources.

Migration

All accounts/functions will be migrated to a new DB after Coordinator restart, this happens for the following reasons:

  1. Coordinator requests provisioning when first encountering the function, it then disables it for that function to prevent the Runner from unnecessarily checking if the infra exists.
  2. The 'isProvisioned' check has been updated to check if the DB exists. This will be false for all indexers, which therefore triggers the updated provisioning step.

The migration which happens within the Runner will not migrate the existing data, the user will need to re-index to re-populate their database.

A script has been created to migrate users data if required. This will be done on an ad-hoc basis and requires some manual intervention. To migrate data, the following will need be completed:

  1. Configure the environment variables in ./scripts/migrate-schema-to-db.js
  2. Before deploying these changes, stop Coordinator and wait for the SQS queue to drain - this ensures no Indexer functions are running, and therefore automatic provisioning wont occur.
  3. Run (for example) node ./scripts/migrate-schema-to-db.js dataplatform_near social_feed - this moves the schema to it's own DB, along with its data, and configures it within Hasura

These steps are essentially the same as the automatic provisioning within Provisioner, with the addition of data migration. Automatic provisioning will be skipped within the Runner as the DB/Schema already exist.

Other Considerations

By default, all users are able to connect to the template databases, which are used as 'templates' for new databases. To prevent users from modifying these templates we should remove PUBLIC access from them:

REVOKE CONNECT ON DATABASE template0 FROM PUBLIC
REVOKE CONNECT ON DATABASE template1 FROM PUBLIC

@@ -7,12 +7,17 @@ provider:
name: aws
runtime: nodejs16.x
region: eu-central-1
timeout: 15
timeout: 120
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting a large timeout to ensure we don't timeout, I'll reduce this once we have a clearer idea of how long provisioning is taking.

startFromBlock-runner:
type: queue
fifo: true
maxRetries: 1
# batchSize: 100
worker:
handler: handler.consumer
timeout: 15 # 1.5 minutes as lift multiplies this value by 6 (https://github.com/getlift/lift/blob/master/docs/queue.md#retry-delay)
timeout: 120 # 12 minutes as lift multiplies this value by 6 (https://github.com/getlift/lift/blob/master/docs/queue.md#retry-delay)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Visibility timeout of 12 minutes doesn't really apply anymore since we aren't retrying

@morgsmccauley morgsmccauley marked this pull request as ready for review July 26, 2023 03:08
@morgsmccauley morgsmccauley requested a review from a team as a code owner July 26, 2023 03:08
@gabehamilton
Copy link
Collaborator

Ok, so the REVOKE CONNECT is something we need to manually do on each new environment?

@morgsmccauley
Copy link
Collaborator Author

Ok, so the REVOKE CONNECT is something we need to manually do on each new environment?

@gabehamilton yep

@morgsmccauley morgsmccauley merged commit c1e2fb2 into main Jul 26, 2023
1 check passed
@morgsmccauley morgsmccauley deleted the revert-143-DPLT-1049-revert branch July 26, 2023 23:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants