This repository has been archived by the owner on Feb 17, 2022. It is now read-only.

Layout expected queries and reports we'd like to generate #36

Open
seanshahkarami opened this issue Sep 5, 2017 · 2 comments

@seanshahkarami

Having a clear idea of what kinds of queries and reports we'd like to extract from our databases is crucial to knowing how to organize them. This impacts a number of things, which I'll add in the comments.

@seanshahkarami

Which data stores do we need? Cassandra? MySQL? Elasticsearch? In particular, do we even need MySQL if the other two can cover all our use cases? You could imagine using Cassandra as our data and configuration warehouse, with Elasticsearch providing all the searchability and analytics.

@seanshahkarami

seanshahkarami commented Sep 5, 2017

How do we organize Cassandra tables? Cassandra is very sensitive to how you choose your partition / primary keys, particularly since there's not really a good concept of joins or building additional indices. This often means you need to design a table for a particular query, even if it means duplicating data.

Here's a concrete example: Suppose we want to support both bulk (daily) data pulls and efficient viewing into the last 72 hours of data from a particular node.

We may keep a table partitioned by node-id+date, as we do now. In addition, we can create a per-node "rolling window" table of recent data, partitioned by node-id in a "time sliceable" way (for example, clustered by timestamp), where entries have a TTL of 72 hours. Then our loader just inserts a copy of the data into both.
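
As a rough sketch of what the two-table layout could look like: the table names, column names, and the `build_inserts` / `build_recent_query` helpers below are all made up for illustration and are not our actual schema or loader. The 72 hour expiry is expressed with Cassandra's `default_time_to_live` table option (in seconds), and the "time sliceable" part is assumed to mean clustering on the timestamp. The helpers only build statement/parameter pairs, so they can be checked without a running cluster.

```python
from datetime import datetime, timezone

TTL_72_HOURS = 72 * 60 * 60  # 259200 seconds

# Hypothetical schema: one partition per (node_id, date) for bulk daily pulls.
DAILY_TABLE_CQL = """
CREATE TABLE IF NOT EXISTS sensor_data_by_node_date (
    node_id text,
    date text,
    ts timestamp,
    payload blob,
    PRIMARY KEY ((node_id, date), ts)
)
"""

# Hypothetical schema: one partition per node, newest rows first,
# rows expire automatically after 72 hours.
RECENT_TABLE_CQL = f"""
CREATE TABLE IF NOT EXISTS sensor_data_recent (
    node_id text,
    ts timestamp,
    payload blob,
    PRIMARY KEY ((node_id), ts)
) WITH CLUSTERING ORDER BY (ts DESC)
  AND default_time_to_live = {TTL_72_HOURS}
"""


def build_inserts(node_id, ts, payload):
    """Return (statement, params) pairs that write one sample to both tables."""
    # Partition the daily table by the UTC calendar date of the sample.
    date = ts.astimezone(timezone.utc).strftime("%Y-%m-%d")
    daily = (
        "INSERT INTO sensor_data_by_node_date (node_id, date, ts, payload) "
        "VALUES (%s, %s, %s, %s)",
        (node_id, date, ts, payload),
    )
    # No explicit TTL here: the table-level default applies.
    recent = (
        "INSERT INTO sensor_data_recent (node_id, ts, payload) "
        "VALUES (%s, %s, %s)",
        (node_id, ts, payload),
    )
    return [daily, recent]


def build_recent_query(node_id, since):
    """Return a (statement, params) pair for a time slice of one node's recent data."""
    return (
        "SELECT ts, payload FROM sensor_data_recent "
        "WHERE node_id = %s AND ts >= %s",
        (node_id, since),
    )


if __name__ == "__main__":
    sample_ts = datetime(2017, 9, 5, 12, 0, tzinfo=timezone.utc)
    for statement, params in build_inserts("node-001", sample_ts, b"\x00"):
        print(statement, params)
```

The trade-off is the one described above: the data is stored twice, but each query (a full day for a node, or a recent time slice for a node) reads from a single partition without needing a join or a secondary index.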
