This repository has been archived by the owner on Feb 17, 2022. It is now read-only.

Schedule automatic Cassandra backup #29

Open
seanshahkarami opened this issue Sep 5, 2017 · 3 comments

Comments

@seanshahkarami
Member

seanshahkarami commented Sep 5, 2017

Since we're about to start a lot of work on beehive, we should make sure we have a Cassandra backup process in place.

I went ahead and built a tool to pull datasets; we just need to schedule it and have a place to keep the backups: https://github.com/waggle-sensor/beehive-server/tree/master/data-exporter

The missing half right now is a complementary restore script, but at least we have the raw data available.
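A rough sketch of the scheduling piece, assuming a cron-driven wrapper around the exporter (the `export_datasets.py` command and the backup path below are placeholders, not the actual data-exporter entrypoint):

```python
# Placeholder wrapper around the data-exporter, meant to be run nightly from cron.
# The exporter command and backup location below are stand-ins for the real ones.
import datetime
import pathlib
import subprocess

BACKUP_ROOT = pathlib.Path("/mnt/backups/cassandra")  # placeholder location

def run_export():
    # One dated directory per run so old snapshots can be kept or pruned easily.
    dest = BACKUP_ROOT / datetime.date.today().isoformat()
    dest.mkdir(parents=True, exist_ok=True)
    # Placeholder invocation; the real exporter may take different arguments.
    subprocess.run(["python", "export_datasets.py", "--output", str(dest)], check=True)

if __name__ == "__main__":
    run_export()
```

A crontab entry along the lines of `0 3 * * * python /path/to/nightly_backup.py` would then take care of the actual scheduling.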

@seanshahkarami
Member Author

seanshahkarami commented Sep 6, 2017

The main thing blocking this is choosing a reliable storage location. One interesting point: if we're willing to consider S3 as a redundancy location, we automatically get an access-controlled, directory-like interface to the data stored there. In other words, we could go ahead and start pointing some people to it for pulling datasets.
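For example, pushing an export up to S3 with boto3 could look something like the sketch below (the bucket name and layout are just assumptions); the same bucket, with suitable IAM policies, is what would give people that access-controlled, directory-like view:

```python
# Sketch: mirror an exported dataset directory into S3, keeping the directory
# layout as the key prefix so the data stays browsable. Bucket name is made up.
import pathlib
import boto3

s3 = boto3.client("s3")

def upload_export(export_dir, bucket="beehive-backups"):
    export_dir = pathlib.Path(export_dir)
    for path in export_dir.rglob("*"):
        if path.is_file():
            key = str(path.relative_to(export_dir))
            s3.upload_file(str(path), bucket, key)

upload_export("/mnt/backups/cassandra/2017-09-06")
```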

@seanshahkarami
Member Author

After thinking this over last night, I don't think this is the right thing to have for the real deployment. It was more of an emergency reaction to how beehive1 is currently deployed. Cassandra is set up as a single node, so you don't get any of Cassandra's resilience guarantees during a failure...

Cassandra is designed specifically around replication and eventual consistency between cluster nodes. In production, you'd run a number of nodes in a cluster so you can lose a certain number of them at any time and still keep running without data loss. Building on top of that reliability feature is the right way to go if we're using Cassandra.
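For reference, the replication factor is just a keyspace property. A minimal sketch with the DataStax Python driver (contact points and keyspace name below are made up):

```python
# Minimal sketch: create a keyspace whose data is replicated across 2 nodes.
# With replication_factor = 2, every row lives on two nodes, so losing any
# single node doesn't lose data.
from cassandra.cluster import Cluster

cluster = Cluster(["node1", "node2", "node3"])  # placeholder contact points
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS beehive_test
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2}
""")
```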

@seanshahkarami
Member Author

seanshahkarami commented Sep 28, 2017

Turns out clustering works beautifully in the example I tried... It only took 10 minutes to set up a 3-node cluster on my own machine and load it with some test data. Using a keyspace with a replication factor of 2, things worked as expected: any single node could be taken completely offline and I still had access to the entire dataset. Just something to think about for the production deployment...
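Roughly how that availability check can be reproduced (contact points, keyspace, and table name below are placeholders, not beehive's schema): with one of the three nodes stopped, reads at consistency ONE through the surviving nodes still return everything.

```python
# Sketch: with one node of the 3-node cluster offline, read the table at
# consistency level ONE through the remaining nodes.
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(["node1", "node2"])      # third node intentionally offline
session = cluster.connect("beehive_test")  # keyspace from the sketch above

query = SimpleStatement(
    "SELECT COUNT(*) FROM sensor_data",    # placeholder table name
    consistency_level=ConsistencyLevel.ONE,
)
print(session.execute(query).one())
```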
