This repo contains an example of a scalable architecture for AI-powered applications. On the surface, it's an AI app where users can upload PDFs and chat with them. However, under the hood, each user gets a dedicated vector database instance (Postgres on Neon with pgvector).
You can check out the live version at https://ai-vector-db-per-tenant.pages.dev
The app is built using the following technologies:
- Neon - Fully managed Postgres
- Remix - Full-stack React framework
- Remix Auth - Authentication
- Drizzle ORM - TypeScript ORM
- Railway - Deployment Platform
- Vercel AI SDK - TypeScript toolkit for building AI-powered applications
- Cloudflare R2 - Object storage
- OpenAI with gpt-4o-mini - LLM
- Upstash - Redis for rate limiting
- Langchain - Framework for developing applications powered by large language models (LLMs)
Rather than having all vector embeddings stored in a single Postgres database, you provide each tenant (a user, an organization, a workspace, or any other entity requiring isolation) with its own dedicated Postgres database instance where you can store and query its embeddings.
Depending on your application, you will provision a vector database after a specific event (e.g., user signup, organization creation, or upgrade to paid tier). You will then track tenants and their associated vector databases in your application's main database.
This approach offers several benefits:
- Each tenant's data is stored in a separate, isolated database not shared with other tenants. This makes it possible for you to be compliant with data residency requirements (e.g., GDPR)
- Database resources can be allocated based on each tenant's requirements.
- A tenant with a large workload that can impact the database's performance won't affect other tenants; it would also be easier to manage.
Here's the database architecture diagram of the demo app that's in this repo:
The main application's database consists of three tables: documents
, users
, and vector_databases
.
- The
documents
table stores information about files, including their titles, sizes, and timestamps, and is linked to users via a foreign key. - The
users
table maintains user profiles, including names, emails, and avatar URLs. - The
vector_databases
table tracks which vector database belongs to which user.
Then, each vector database that gets provisioned has an embeddings
table for storing document chunks for retrieval-augmented generation (RAG).
For this app, vector databases are provisioned when a user signs up. Once they upload a document, it gets chunked and stored in their dedicated vector database. Finally, once the user chats with their document, the vector similarity search runs against their database to retrieve the relevant information to answer their prompt.
Code snippet example of provisioning a vector database
// Code from app/lib/auth.ts
authenticator.use(
new GoogleStrategy(
{
clientID: process.env.GOOGLE_CLIENT_ID,
clientSecret: process.env.GOOGLE_CLIENT_SECRET,
callbackURL: process.env.GOOGLE_CALLBACK_URL,
},
async ({ profile }) => {
const email = profile.emails[0].value;
try {
const userData = await db
.select({
user: users,
vectorDatabase: vectorDatabases,
})
.from(users)
.leftJoin(vectorDatabases, eq(users.id, vectorDatabases.userId))
.where(eq(users.email, email));
if (
userData.length === 0 ||
!userData[0].vectorDatabase ||
!userData[0].user
) {
const { data, error } = await neonApiClient.POST("/projects", {
body: {
project: {},
},
});
if (error) {
throw new Error(`Failed to create Neon project, ${error}`);
}
const vectorDbId = data?.project.id;
const vectorDbConnectionUri = data.connection_uris[0]?.connection_uri;
const sql = postgres(vectorDbConnectionUri);
await sql`CREATE EXTENSION IF NOT EXISTS vector;`;
await migrate(drizzle(sql), { migrationsFolder: "./drizzle" });
const newUser = await db
.insert(users)
.values({
email,
name: profile.displayName,
avatarUrl: profile.photos[0].value,
userId: generateId({ object: "user" }),
})
.onConflictDoNothing()
.returning();
await db
.insert(vectorDatabases)
.values({
vectorDbId,
userId: newUser[0].id,
})
.returning();
const result = {
...newUser[0],
vectorDbId,
};
return result;
}
return {
...userData[0].user,
vectorDbId: userData[0].vectorDatabase.vectorDbId,
};
} catch (error) {
console.error("User creation error:", error);
throw new Error(getErrorMessage(error));
}
},
),
);
Code snippet and diagram of RAG
// Code from app/routes/api/document/chat
// Get the user's messages and the document ID from the request body.
const {
messages,
documentId,
}: {
messages: Message[];
documentId: string;
} = await request.json();
const { content: prompt } = messages[messages.length - 1];
const { data, error } = await neonApiClient.GET(
"/projects/{project_id}/connection_uri",
{
params: {
path: {
project_id: user.vectorDbId,
},
query: {
role_name: "neondb_owner",
database_name: "neondb",
},
},
},
);
if (error) {
return json({
error: error,
});
}
const embeddings = new OpenAIEmbeddings({
apiKey: process.env.OPENAI_API_KEY,
dimensions: 1536,
model: "text-embedding-3-small",
});
const vectorStore = await NeonPostgres.initialize(embeddings, {
connectionString: data.uri,
tableName: "embeddings",
columns: {
contentColumnName: "content",
metadataColumnName: "metadata",
vectorColumnName: "embedding",
},
});
const result = await vectorStore.similaritySearch(prompt, 2, {
documentId,
});
const model = new ChatOpenAI({
apiKey: process.env.OPENAI_API_KEY,
model: "gpt-4o-mini",
temperature: 0,
});
const allMessages = messages.map((message) =>
message.role === "user"
? new HumanMessage(message.content)
: new AIMessage(message.content),
);
const systemMessage = new SystemMessage(
`You are a helpful assistant, here's some extra additional context that you can use to answer questions. Only use this information if it's relevant:
${result.map((r) => r.pageContent).join(" ")}`,
);
allMessages.push(systemMessage);
const stream = await model.stream(allMessages);
return LangChainAdapter.toDataStreamResponse(stream);
While this approach is beneficial, it can also be challenging to implement. You need to manage each database's lifecycle, including provisioning, scaling, and de-provisioning. Fortunately, Postgres on Neon is set up differently:
- Postgres on Neon can be provisioned via the in ~2 seconds, making provisioning a Postgres database for every tenant possible. You don't need to wait several minutes for the database to be ready.
- The database's compute can automatically scale up to meet an application's workload and can shut down when the database is unused.
instant-postgres.mp4
This makes the proposed pattern of creating a database per tenant not only possible but also cost-effective.
When you have a database per tenant, you need to manage migrations for each database. This project uses Drizzle:
- The schema is defined in
/app/lib/vector-db/schema.ts
using TypeScript. - Migrations are then generated by running
bun run vector-db:generate
, and stored in/app/lib/vector-db/migrations
. - Finally, to migrate all databases, you can run
bun run vector-db:migrate
. This command will run a script that connects to each tenant's database and applies the migrations.
It's important to note that any schema changes you would like to introduce should be backward-compatible. Otherwise, you would need to handle schema migrations differently.
While this pattern is useful in building AI applications, you can simply use it to provide each tenant with its own database. You could also use a database other than Postgres for your main application's database (e.g., MySQL, MongoDB, MSSQL server, etc.).
If you have any questions, feel free to reach out to in the Neon Discord or contact the Neon Sales team. We'd love to hear from you.