Skip to content
domino-succ edited this page Sep 4, 2014 · 69 revisions

News

      
Domino v0.1 is released! (06/25/2014)

Overview


Domino is a transaction engine implementing SUCC on the top of HBase. Stateless Update Concurrency Control (SUCC) is a distributed transaction control model. SUCC has the following properties:

  1. largely removes the conflict detection;
  2. does not require rollback;
  3. assists the failed commit/forward to finish.

These features make SUCC support high concurrency and low delay. Meanwhile, we present P4B abnormal, a new abnormal in Snapshot Isolation (SI). SUCC prevents not only all the abnormal of SI, also P4B abnormal.

Domino is a transaction engine implementing SUCC. Domino ameliorates the system architecture with the benefit of the HBase Coprocessor. The TPC-C results shows that Domino achieves high score (tpmC/W=12.24) under distributed environment, which is nearly the same with the performance of Oracle with its dedicated disk array. Compared with the typical HBased method, our scheme achieves 700% increase in throughput on average. Domino is open source under GPL License.

P4B Anomaly: Lost Commit

Snapshot isolation model is able to prevent the following anomalies: P0/P1/P2/P3/P4. However, snapshot isolation brings another anomaly when it performs commit. We present this new anomaly as P4B, Lost Commit.

During the commit procedure of snapshot isolation, a transaction T firstly gets its commit id c. If there are no transactions TS within T’s procedure where TS updates data W that T also updates, T will then commit (W is a data set : W=x1,x2,…,xn).

If another transaction T’ read x after T’s commit-timestamp c (commit_id), an anomaly will happen as shown in the following table.

From the table, we can see that T[t] reads diffrent values: x3[ 3] and x3[t], while the right result is that T[t] should read the same value of x3, e.g., x3[ 3]. This anomaly seems like P2 (Non-repeatable). However, the two reads in P2 happened respectively before and after T[t]’s commit operation, which is diffrent from this anomaly. In this anomaly described above, the inconsistent reads happened after T[t]’s commit and the first read result is wrong (lost T[t]’s commit). We define this anomaly as P4B-Lost Commit.

The anomaly P4B is because of the non-atomic commit procedure of the snapshot isolation model. Snapshot isolation and existing works do not present this anomaly, therefore the exist transaction engines based on snapshot isolation mostly have this anomaly. For example, HAcid handles commit procedure completely in terms of snapshot isolation, and does not present and consider P4B anomaly; HBaseSI caches the read result on client to ensure consistency of multiple reads. An client of HBaseSI reads the first value as the cache result, such as x = x1, then it uses x1 as the value of x in the later reads. However, according to P4B, the first read value is wrong; Percolator uses lock to block reads, which in turn prevents P4B. But it degrades the performance.

The model of Domino/SUCC

The motivation of our work is to reduce lock contention which is the fundamental cause of transaction dependency. Our primary idea is to save multiple snapshots for each data item in the data table when it is updated by new transactions. Every snapshot has two kinds of tags to support FULL-ACID properties. The first one is the “version” which indicates the available data. The second one is “status”. It is designed to show the data updated by which type of update operation. In our system, the update operations fall into two basic types:Stateless-Update and Stateful-Update.

Stateless-Update is a kind of insert operation which inserts new data regardless of the current data. Stateful-Update depends on the current data, such as B′ = B+1 where B′ is the new value and B is the current value. Besides, there is a meta table storing the status of the transactions. When a transaction commits, it sets its transaction id in the meta table. Unlike the previous methods, we find a new abnormal in snapshot isolation, then we present our scheme with the three advantages:

  • Almost no lock-contention. We store multiple snapshots for each data item with different versions. When a transaction T[a] starts up, it acquires its transaction id. Only the snapshots whose versions are less than the transaction id of T[a] is visible to T[a]. Among the visible snapshots, only the latest one is available for T[a] and T[a] can preform CRUD operations on this snapshot. So every transaction operates its own version data and the concurrent transactions have no lock-competition except Stateful Update. Although it brings redundancy, the current distributed systems and scalable hardware can afford it. Such multiple snapshots design brings significant high concurrency.
  • No rollback. When encountering failures, a transaction only needs to give up the rest operations and return a fail signal to the caller. Although the failed transaction does not finish, the “version” involved by the transaction has not been set yet. Therefore, the data already been updated by the failed transaction is invisible to the other transactions. Such dirty data can be moved by the “Data Cleaner” (we do not discuss the data cleaner in this paper). Rollback is a heavy-cost operation, which brings high IO and lock-contention in the traditional database systems. Our scheme decreases the cost of rollback.
  • Commit/forward assist. When committing a transaction T[a], we first set the meta table with T[a]’s commit id C_id_a. After that, the “version” in the data table is set with C_id_a. When a latter transaction T[b] arrives, if the “version” is NULL, T[b] will check the meta table. If C_id_a is already set, the transaction regards the data as valid, and set the “version” in the data table with C_id_a. That is, if the former transaction fail to complete the commit procedure, the latter transaction will help to finish it. We call this procedure Commit/forward assist. These features make our transaction scheme support high concurrency and low delay.

Proof

We prove the correctness of SUCC through analyzing the anomaly prevent.

  1. P0 (Dirty Write). Suppose T[a] and T[b] are two concurrent transactions, ωa(x[a] = Xa) ∈ T[a], and ωb(x[b] =Xb) ∈ T[b]. Before the startup of T[a] and T[b], x[t] = X0 and x.version[c0] = t. After the complement of ωa and ωb, any rollbacks of T[a] and T[b] will not bring P0 anomaly. Because:
    If the execution sequence is c[a]a[b] or a[b]c[a], suppose the commit id of T[a] is ca, then x.version[ca] = a and x[a] = Xa. Meanwhile, x.status[b] is removed by HANDLE STATUS. So the final value of x is Xa; In the same way, if the execution sequence is c[a]a[b] or a[b]c[a], the final value of x is Xb. If the execution sequence is a[a]a[b] or a[b]a[a], then x.status[a] and x.status[b] are both remove by HANDLE STATUS, and final value of x is still X0. The above scenarios only exist for stateless updates (ω(x)). Any stateful updates (φ(x)) is protected by the conflict detection of SUCC, so it is impossible that φ(x) and ω(x) can execute and commit concurrently. Therefore, P0 is prevented.
  2. P1 (Dirty Read). Suppose T[a] and T[b] are two concurrent transactions, wa(x[a] = Xa) ∈ T[a], rb(x) ∈ T[b], and wa is executed before rb. Then before T[a] commits or rollbacks, T[b] is not able to read Xa. Because before T[a]’s commit, x.version does not point to x[a]. And during the execution of T[a], metaa.status = ACTIV E, so x.status[a] is also skipped by T[b]. Therefore, P1 is prevented.
  3. P2 (Non-repeatable Read) and P3 (Phantom Read). Suppose T[a] and T[b] are two concurrent transactions, and they have the following reads and writes sequence:
    ra1(x[t] = Xt)wb(x[b] = Xb)c[b]ra2(x), where ra_1, ra2 ∈ T[a], wb ∈ T[b]. c is T[b]’s commit id.
    According to GET T ID to write a new item Y , ra2(Q) is not able to reach Y , because T[b]’s commit id c > a. Here, Q is query which can locate Y . Therefore, SUCC is able to prevent P3.
  4. P4 (Lost Update). Suppose T[a] and T[b] are two concurrent transactions, and they have the following reads and writes sequences:

    sequence 1:ra(x) rb(x) wa(x)wb(x),where ra, wa∈ T[a],rb, wb∈ T[b]

    If wa and wb are both stateful updates, wb will detect x.sataus[a] and then T[b] rollbacks, which prevents P4. If one of wa and wb is stateless update, wb will also cause T[B] rollback. Because we do not know the commit sequence of T[a] and T[b], conflict may happen.

    sequence 2:ra(x) rb(x) wa(x) c[a] wb(x),where ra, wa∈ T[a],rb, wb∈ T[b]

    If wb is stateful update, whether wa is stateful or stateless, T[b] will rollback. Because stateful update will commit later, which prevents P4. If wb is stateless update and wa is stateful update, because the commit point of T[b] is later than that of T[a], T[b]’s commit will not conflict with T[a]. Therefore, P4 will not happen.

System Architecture

Domino consists of four components, DETS, TME, DTO and Domino Client. DETS (Domino Endpoint Transaction System) runs on HBase RegionServer with Coprocessor. Therefore, the scalability of DETS is as strong as HBase. The routing protocol is based on the RowKey, which highly saves the maintenance costs. Meanwhile, the routing protocol reduces the most communication between client and server, highly improving the system performance.
Domino also uses Coprocessor Framework to implement DTO (Domino Timestamp Oracle) and TME (Transaction Metadata Endpoint). DTO is responsible for generating increasing transaction id and TME manages the metadata for Domino. Domino client includes HBaseClient and provides APIs for the user.

Interface (APIs)

Initiate Client

  • public Domino(String zookeeperAddress) throws IOException;
  • public Domino(Configuration config) throws IOException;

DDL Handler

The APIs is used to manage data table and start a transaction.
Create a Table

  • public void createTable(HTableDescriptor table) throws IOException;
  • public void createTable(HTableDescriptor table, byte[ ][ ] splitKeys) throws IOException;
  • public void createTable(HTableDescriptor table, byte[ ] startKey, byte[] endKey, int numRegions) throws IOException;
    Delete a Table
  • public void dropTable(byte[ ] name) throws IOException;
Begin a transaction
  • public Transaction startTransaction() throws IOException;
    startTransaction() returns a handler “Transaction” to deal with the transaction.

Transaction operators

. Transaction’s commit, rollback, read and write are all provided by Class Transaction.


  1. Read a record.
    public Result get(Get get, byte[ ] table) throws IOException;
    This api uses HBase’s ‘Get’ as a parameter, and the result is returned in the format of HBase’s ‘Result’.

  2. Write a stateless record.
    public void put(Put put, byte[ ] table) throws IOException;

  3. Write a stateful record.
    public void putStateful(Put put, byte[ ] table) throws IOException;

  4. Delete a record.
    public void delete(byte[ ] row, byte[ ] table) throws IOException;

  5. Scan the given range data.
    public ResultScanner scan(Scan scan, byte[] table) throws IOException;
    This api uses HBase’s Scan as a parameter.

  6. Commit a transaction.
    public void commit() throws IOException;

  7. Rollback a transaction.
    public void rollback() throws IOException;


An example of how to use domino client can be found at https://github.com/domino-succ/domino/blob/master/example.java.

Deployment

It is very simple to deploy Domino in the real-world system. The prerequisite is that there should be an executable HBase cluster and the version of HBase should be higher than 0.94.

Domino consist four library files (JARs) as follows:

  1. domino-client-.jar, the client class library;
  2. domino-common-.jar, Common class library;
  3. domino-core-.jar, the core Endpoint class library;
  4. domino-id-service-.jar, transaction ID service class library.
Server side deployment.

Before HBase’s startup, we should first put the above four library files (JARs) into HBase’s , and then startup HBase. Follow the above steps, the service of Domino is deployed successfully.

Client use.

When using domino at client, we only need to include three jars, domino-client, domino-common and domino-id-service, into . The example how to use the client can be found at https://github.com/domino-succ/domino/blob/master/example.java.

Evaluation

The TPC-C results shows that SUCC achieves high score (tpmC/W=12.24) under distributed environment, which is nearly the same with the performance of oracle with its dedicated disk array.

For more evaluation results, please refer our papers.

Papers

Download our papers

People

Tieying Zhang
Zhen Zhao
Xiaobo Hao

Welcome you to join Domino.

Clone this wiki locally