This repository has been archived by the owner on Feb 17, 2022. It is now read-only.
The main thing blocking this is choosing a reliable storage location. One interesting point: if we're ready to consider S3 as a possible redundancy location, we automatically get an access-controlled, directory-like interface to the data stored there. In other words, we could go ahead and start pointing some people to this for pulling datasets.
After thinking this over last night, I don't think this is the right thing to have for the real deployment. It was more of an emergency reaction to how beehive1 is currently deployed. Cassandra is set up with a single node, so you don't get any of Cassandra's resilience guarantees during failure...
Cassandra is designed specifically to use replication and eventual consistency between the nodes of a cluster. In production, you'd have a number of nodes running in a cluster, so you can drop a certain number of nodes at any time and still continue running without data loss. Building on top of that reliability feature is the right way to go if we're using Cassandra.
Turns out clustering works beautifully in the example I tried... It only took 10 minutes to set up a 3-node cluster on my own machine and load it with some test data. Using a keyspace with a replication factor of 2, things worked as expected - any single node could be taken completely offline and I still had access to the entire dataset. Just something to think about for the production deployment...
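For reference, the keyspace setup for a test like this is roughly the following CQL (the keyspace name here is a placeholder, not taken from the actual experiment). With `replication_factor: 2` on a 3-node cluster, every row lives on 2 of the 3 nodes, which is why any single node can go offline without losing access to data:

```sql
-- Sketch of a test keyspace; 'beehive_test' is a placeholder name.
-- SimpleStrategy is fine for a single-datacenter dev cluster;
-- production would normally use NetworkTopologyStrategy instead.
CREATE KEYSPACE IF NOT EXISTS beehive_test
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2};
```

More generally, a replication factor of RF means up to RF - 1 nodes can fail while every row still has at least one live replica (for reads at consistency level ONE).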
Since we're about to start a lot of work on beehive, we should make sure we have a Cassandra backup process in place.
I went ahead and built a tool to pull datasets; we just need to schedule it and have a place to keep the backups: https://github.com/waggle-sensor/beehive-server/tree/master/data-exporter
The missing half is a complementary script to do a restore, but at least we have the raw data available now.
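One minimal way to schedule the exporter would be a cron entry along these lines. This is only a sketch: the script path, backup directory, and S3 bucket name are all placeholders, not part of the data-exporter repo, and it assumes the AWS CLI is configured on the host:

```
# Hypothetical crontab entry: run the exporter nightly at 02:00, then sync
# the dumped datasets to a dated S3 prefix for off-site redundancy.
# Note: '%' must be escaped as '\%' inside crontab lines.
0 2 * * *  /opt/beehive/run-data-exporter.sh /var/backups/beehive \
           && aws s3 sync /var/backups/beehive s3://BUCKET/beehive/$(date +\%F)/
```

Dated prefixes keep each night's export separate, which also gives us something concrete for a future restore script to point at.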