This repository has been archived by the owner on Feb 17, 2022. It is now read-only.
Having a clear idea of what kinds of queries and reports we'd like to extract from our databases is crucial to knowing how to organize them. This impacts a number of things, which I'll add in the comments.
Which data stores do we need? Cassandra? MySQL? Elasticsearch? In particular, do we even need MySQL if the other two can cover all our use cases? You could imagine using Cassandra as our data and configuration warehouse, with Elasticsearch providing all the searchability and analytics.
How do we organize Cassandra tables? Cassandra is very sensitive to how you choose your partition / primary keys, particularly since it has no joins and secondary indexes are of limited use. This often means you need to design a table for a particular query, even if it means duplicating data.
Here's a concrete example: Suppose we want to support both bulk (daily) data pulls and efficient viewing into the last 72 hours of data from a particular node.
We may keep a table partitioned by node-id+date, as we do now. In addition, we can create a per-node "rolling window" table of recent data, partitioned by node-id in a "time sliceable" way (clustered by timestamp), where entries have a TTL of 72 hours. Then, our loader just inserts a copy of the data into both.
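Here's a rough sketch of what that two-table layout could look like. All table and column names (`sensor_data_by_day`, `sensor_data_recent`, `sensor`, `value`) are made up for illustration and aren't our current schema. The function only builds the CQL and parameters. I'm assuming the loader would pass each pair to something like the Python driver's `session.execute(query, params)`.

```python
from datetime import datetime, timezone

TTL_72H = 72 * 60 * 60  # 72 hours expressed in seconds

# Hypothetical bulk table: one partition per node per day, for daily pulls.
DAILY_TABLE = """
CREATE TABLE IF NOT EXISTS sensor_data_by_day (
    node_id text,
    date text,
    ts timestamp,
    sensor text,
    value double,
    PRIMARY KEY ((node_id, date), ts, sensor)
)
"""

# Hypothetical rolling-window table: one partition per node, newest rows
# first, with rows expiring on their own after 72 hours.
RECENT_TABLE = f"""
CREATE TABLE IF NOT EXISTS sensor_data_recent (
    node_id text,
    ts timestamp,
    sensor text,
    value double,
    PRIMARY KEY ((node_id), ts, sensor)
) WITH CLUSTERING ORDER BY (ts DESC, sensor ASC)
  AND default_time_to_live = {TTL_72H}
"""

INSERT_DAILY = (
    "INSERT INTO sensor_data_by_day (node_id, date, ts, sensor, value) "
    "VALUES (%s, %s, %s, %s, %s)"
)
INSERT_RECENT = (
    "INSERT INTO sensor_data_recent (node_id, ts, sensor, value) "
    "VALUES (%s, %s, %s, %s)"
)


def dual_write_statements(node_id, ts, sensor, value):
    """Return the (query, params) pairs the loader would run for one sample."""
    date = ts.strftime("%Y-%m-%d")  # day bucket used by the bulk table
    return [
        (INSERT_DAILY, (node_id, date, ts, sensor, value)),
        (INSERT_RECENT, (node_id, ts, sensor, value)),
    ]
```

With this layout, a query for the last 72 hours hits a single partition in `sensor_data_recent` and can be sliced with a range on `ts`, while daily pulls read one `(node_id, date)` partition from the bulk table. The cost is that every sample gets written twice.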