Skip to content

citusdata/pg_shard

Repository files navigation

Deprecation Notice

The new open-source Citus extension for PostgreSQL natively integrates pg_shard and provides a superset of its functionality. We’ll not be making any more pg_shard releases and the project is entering end-of-life status.

We have migration steps to help existing customers perform a drop-in replacement. Contact us for more information, or feel free to stop by our Slack channel to have questions answered in real-time.


pg_shard

Build Status Coverage Release License

pg_shard is a sharding extension for PostgreSQL. It shards and replicates your PostgreSQL tables for horizontal scale and high availability. The extension also seamlessly distributes your SQL statements, without requiring any changes to your application. Join the mailing list to stay on top of the latest developments.

As a standalone extension, pg_shard addresses many NoSQL use cases. It also enables real-time analytics, and has an easy upgrade path to CitusDB for complex analytical workloads (distributed joins). Further, the extension provides access to standard SQL tools, and powerful PostgreSQL features, such as diverse set of indexes and semi-structured data types.

This README serves as a quick start guide. We address architectural questions on sharding, shard rebalancing, failure handling, and distributed consistency mechanisms on our webpage. Also, we're actively working on improving pg_shard, and welcome any questions or feedback on our mailing list.

Building

pg_shard runs on Linux and OS X. The extension works with PostgreSQL 9.3.4+, PostgreSQL 9.4.0+, and CitusDB 3.2+. Building pg_shard requires GCC 4.6 or greater (or a similarly recent version of Clang).

Once you have PostgreSQL or CitusDB installed, you're ready to build pg_shard. For this, you will need to include the pg_config directory path in your make command. This path is typically the same as your PostgreSQL installation's bin/ directory path. For example:

# Path when PostgreSQL is compiled from source
PATH=/usr/local/pgsql/bin/:$PATH make
sudo PATH=/usr/local/pgsql/bin/:$PATH make install

# Path when CitusDB package is installed
PATH=/opt/citusdb/4.0/bin/:$PATH make
sudo PATH=/opt/citusdb/4.0/bin/:$PATH make install

pg_shard also includes regression tests. To verify your installation, start your PostgreSQL instance with the shared_preload_libraries setting mentioned below, and run make installcheck.

Note: If you'd like to build against CitusDB, please contact us at engage @ citusdata.com.

Upgrading from Previous Versions

To upgrade an existing installation, simply:

  1. Build and install the latest pg_shard release (see the Building section)
  2. Restart your PostgreSQL server
  3. Run ALTER EXTENSION pg_shard UPDATE; on the PostgreSQL server

Note that taking advantage of the new repair functionality requires that you also install pg_shard on all your worker nodes.

Setup

pg_shard uses a master node to store shard metadata. In the simple setup, this node also acts as the interface for all queries to the cluster. As a user, you can pick any one of your PostgreSQL nodes as the master, and the other nodes in the cluster will then be your workers.

An easy way to get started is by running your master and worker instances on the same machine. In that case, each instance will be one PostgreSQL database that runs on a different port. You can simply use localhost as the worker node's name in this setup.

Alternatively, you could start up one PostgreSQL database per machine; this is more applicable for production workloads. If you do this, you'll need to configure your PostgreSQL instances so that they can talk to each other. For that, you'll need to update the listen_addresses setting in your postgresql.conf file, and change access control settings in pg_hba.conf.

Whatever you decide, the master must be able to connect to the workers over TCP without any interactive authentication. In addition, a database using the same name as the master's database must already exist on all worker nodes.

Once you decide on your cluster setup, you will need to make two changes on the master node. First, you will need to add pg_shard to shared_preload_libraries in your postgresql.conf:

shared_preload_libraries = 'pg_shard'    # (change requires restart)

Second, the master node in pg_shard reads worker host information from a file called pg_worker_list.conf in the data directory. You need to add the hostname and port number of each worker node in your cluster to this file. For example, to add two worker nodes running on the default PostgreSQL port:

$ emacs -nw $PGDATA/pg_worker_list.conf

# hostname port-number
worker-101  5432
worker-102  5432

Then, you can save these settings and restart the master node.

Table Sharding

Now, let's log into the master node and create the extension:

CREATE EXTENSION pg_shard;

At this point you're ready to distribute a table. To let pg_shard know the structure of your table, define its schema as you would do with a normal table:

CREATE TABLE customer_reviews
(
    customer_id TEXT NOT NULL,
    review_date DATE,
    review_rating INTEGER,
    review_votes INTEGER,
    review_helpful_votes INTEGER,
    product_id CHAR(10),
    product_title TEXT,
    product_sales_rank BIGINT,
    product_group TEXT,
    product_category TEXT,
    product_subcategory TEXT,
    similar_product_ids CHAR(10)[]
);

This table will not be used to store any data on the master but serves as a prototype of what a customer_reviews table should look like on the worker nodes. After you're happy with your schema, and have created the desired indexes on your table, tell pg_shard to distribute the table:

-- Specify the table to distribute and the column to distribute it on
SELECT master_create_distributed_table(table_name := 'customer_reviews',
                                       partition_column := 'customer_id');

This function informs pg_shard that the table customer_reviews should be hash partitioned on the customer_id column. Now, let's create shards for this table on the worker nodes:

-- Specify the table name, total shard count and replication factor
SELECT master_create_worker_shards(table_name := 'customer_reviews',
                                   shard_count := 16,
                                   replication_factor := 2);

This function creates a total of 16 shards. Each shard owns a portion of a hash token space, and gets replicated on 2 worker nodes. The shard replicas created on the worker nodes have the same table schema, index, and constraint definitions as the table on the master node. Once all replicas are created, this function saves all distributed metadata on the master node.

Usage

Once you created your shards, you can start issuing queries against the cluster. Currently, UPDATE and DELETE commands require the partition column in the WHERE clause.

INSERT INTO customer_reviews (customer_id, review_rating) VALUES ('HN802', 5);
INSERT INTO customer_reviews VALUES
  ('HN802', '2004-01-01', 1, 10, 4, 'B00007B5DN',
   'Tug of War', 133191, 'Music', 'Indie Music', 'Pop', '{}');
INSERT INTO customer_reviews (customer_id, review_rating) VALUES ('FA2K1', 10);

SELECT avg(review_rating) FROM customer_reviews WHERE customer_id = 'HN802';
SELECT count(*) FROM customer_reviews;

UPDATE customer_reviews SET review_votes = 10 WHERE customer_id = 'HN802';
DELETE FROM customer_reviews WHERE customer_id = 'FA2K1';

Loading Data from a File

A script named copy_to_distributed_table is provided to facilitate loading many rows of data from a file, similar to the functionality provided by PostgreSQL's COPY command. It will be installed into the scripts directory for your PostgreSQL installation (you can find this by running pg_config --bindir).

As an example, the invocation below would copy rows into the users table from a CSV-like file using pipe characters as a delimiter and the word NULL to signify a null value. The file contains a header line, which will be skipped.

copy_to_distributed_table -CH -d '|' -n NULL input.csv users

Call the script with the -h for more usage information.

Maximizing Throughput

Because INSERT commands must wait for a response before returning, loading rows through a single connection can never fully exploit the write capacity of your pg_shard cluster. To achieve the highest insert rates, it is necessary to load data in a concurrent fashion. The difference, even in a small cluster, will be significant.

For example, we could split an input file (input.csv) into chunks and load each chunk using a separate instance of copy_to_distributed_table (requires GNU split):

mkdir chunks
split -nl/64 input.csv chunks/
find chunks -type f | xargs -t -n1 -P64 -I% copy_to_distributed_table -C % users

Note that the above example loads the contents of input.csv using 64 processes. The optimal value will vary depending on factors such as cluster size and hardware.

This advice applies similarly to application design: if you have workers loading data into a pg_shard cluster, experiment to determine the number of workers that maximizes cluster utilization.

Repairing Shards

If for whatever reason a shard placement fails to be updated during a modification command, it will be marked as inactive. The master_copy_shard_placement function can be called to repair an inactive shard placement using data from a healthy placement. In order for this function to operate, pg_shard must be installed on all worker nodes and not just the master node. The shard will be protected from any concurrent modifications during the repair.

SELECT master_copy_shard_placement(shard_id := 12345,
                                   source_node_name := 'good_host',
                                   source_node_port := 5432,
                                   target_node_name := 'bad_host',
                                   target_node_port := 5432);

Usage with CitusDB

When installed within CitusDB, pg_shard will use the distribution metadata catalogs provided by CitusDB. No special syncing step is necessary: your pg_shard-distributed tables will be visible to CitusDB and vice versa. Just ensure the pg_shard.use_citusdb_select_logic config variable is turned on (the default when running within CitusDB) and you'll be good to go!

Look Under the Hood

When you distribute a table and create shards for it, pg_shard saves related metadata on the master node. You can probe into this metadata by logging into the master and running the following:

SELECT * FROM pgs_distribution_metadata.partition;
SELECT * FROM pgs_distribution_metadata.shard;
SELECT * FROM pgs_distribution_metadata.shard_placement;

The partition metadata table indicates to pg_shard which PostgreSQL tables are distributed and how. The shard metadata table then maps a distributed table to its logical shards, and associates each shard with a portion of a hash token space spanning between ]-2B, +2B[. Last, the shard_placement table maintains each shard's location information, that is, the worker node name and port for that shard. As an example, if you're using a replication factor of 2, then each shard will have two shard placements.

Each shard placement in pg_shard corresponds to one PostgreSQL table on a worker node. You can probe into these tables by connecting to any one of the workers, and running standard PostgreSQL commands:

psql -d postgres -h worker-101 -p 5432
postgres=# \d

Limitations

pg_shard is intentionally limited in scope during its first release, but is fully functional within that scope. We classify pg_shard's current limitations into two groups. In one group, we have features that we don't intend to support in the medium term due to architectural decisions we made:

  • Transactional semantics for queries that span across multiple shards — For example, you're a financial institution and you sharded your data based on customer_id. You'd now like to withdraw money from one customer's account and debit it to another one's account, in a single transaction block.
  • Unique constraints on columns other than the partition key, or foreign key constraints.
  • Distributed JOINs also aren't supported in pg_shard - If you'd like to run complex analytic queries, please consider upgrading to CitusDB.

Another group of limitations are shorter-term but we're calling them out here to be clear about unsupported features:

  • Table alterations are not supported: customers who do need table alterations accomplish them by using a script that propagates such changes to all worker nodes.
  • DROP TABLE does not have any special semantics when used on a distributed table. An upcoming release will add a shard cleanup command to aid in removing shard objects from worker nodes.
  • Queries such as INSERT INTO foo SELECT bar, baz FROM qux are not supported.

Besides these limitations, we have a list of features that we're looking to add. Instead of prioritizing this list ourselves, we decided to keep an open discussion on GitHub issues and hear what you have to say. So, if you have a favorite feature missing from pg_shard, please do get in touch!

License

Copyright © 2012–2015 Citus Data, Inc.

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

See the LICENSE file for full details.