potential db compression settings? #665
I guess I'm not clear on whether MMW / ODM2 is using TimescaleDB or InfluxDB. InfluxDB is referenced in the architecture diagram outlined 4 years ago: https://github.com/ODM2/ODM2DataSharingPortal/blob/main/doc/ArchitectureDiagram/Data%20Sharing%20Portal%20Architecture%20with%20Logos%20-%20Copy.png
@neilh10, we migrated from InfluxDB to the PostgreSQL Timescale extension (i.e. TimescaleDB) with our v0.12 release in December 2021. We've unfortunately gotten behind on updating all of our documentation to reflect cumulative changes since then (outside of our release notes). To answer the original question, we're compressing old data into 90-day chunks that are auto-created every 90 days, with a 1-month buffer, so there is always at least 1 month of uncompressed data. Regardless, the server load required to insert historical data should be substantially reduced when we implement #674 with our planned v0.18 milestone.
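For reference, a minimal sketch of what such a policy might look like in TimescaleDB 2.x, assuming a hypothetical hypertable named `timeseriesresultvalues` and a made-up connection string (the actual table name and deployment details may differ):

```python
import psycopg2

# Hypothetical connection string; adjust for the actual deployment.
conn = psycopg2.connect("dbname=odm2 user=postgres")
with conn, conn.cursor() as cur:
    # Each new chunk spans 90 days of data.
    cur.execute(
        "SELECT set_chunk_time_interval('timeseriesresultvalues', INTERVAL '90 days');"
    )
    # Enable compression on the hypertable...
    cur.execute("ALTER TABLE timeseriesresultvalues SET (timescaledb.compress);")
    # ...and compress chunks once their data is at least 1 month old,
    # which keeps at least 1 month of data uncompressed at all times.
    cur.execute(
        "SELECT add_compression_policy('timeseriesresultvalues', INTERVAL '1 month');"
    )
```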
Closing to track work on this under: |
@aufdenkampe, thanks for the update. I'm just wondering, then, what the impact is when a POST is made with a timestamp that falls in a range that has already been compressed.
The effect of a POST of data into a compressed time chunk is that the server needs substantially more resources to insert the data, because the entire chunk must be decompressed, appended to, and then compressed again. This only applies to writes; reading from a compressed chunk is fast. #688 does alleviate the problem because it spreads out the workload. We don't mind our server doing the work; the issue is that too many POSTs arriving at once overload the server in the minute they all land. We have plenty of time when the server CPU is idle, and the point of SQS is to spread the work into that idle time.
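As a rough illustration of that work-spreading idea (using a local in-process queue as a stand-in for SQS, with made-up names and a made-up rate limit):

```python
import queue
import threading
import time

# Incoming POST payloads are enqueued immediately, so the HTTP handler
# can return quickly even during a burst of arrivals.
work_queue = queue.Queue()

def drain_worker(insert_fn, max_per_second=5.0):
    """Drain queued payloads at a steady rate, pushing the expensive
    decompress/insert/recompress work into otherwise-idle CPU time."""
    while True:
        payload = work_queue.get()        # blocks until a payload arrives
        insert_fn(payload)                # the costly historical insert
        work_queue.task_done()
        time.sleep(1.0 / max_per_second)  # throttle to smooth the load

# Hypothetical startup, where my_insert_fn performs the actual DB write:
# threading.Thread(target=drain_worker, args=(my_insert_fn,), daemon=True).start()
```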
It's likely that systems in the field stop transmitting readings and then reconnect at a later date and upload their stored readings. #485
It would be optimal to have a documented model for what historical timeframe is supported.
I'm just wondering what the current internal settings for the TimescaleDB compression are, if any.
https://www.timescale.com/features/compression
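If helpful, the active settings can be read back from TimescaleDB's informational views; a sketch, assuming TimescaleDB 2.x and read access to the database (connection string is hypothetical):

```python
import psycopg2

conn = psycopg2.connect("dbname=odm2")  # hypothetical connection string
with conn, conn.cursor() as cur:
    # Which hypertables have compression enabled, and on which columns.
    cur.execute("""
        SELECT hypertable_name, attname, segmentby_column_index, orderby_column_index
        FROM timescaledb_information.compression_settings;
    """)
    for row in cur.fetchall():
        print(row)
    # Which chunks exist, their time ranges, and whether they are compressed.
    cur.execute("""
        SELECT chunk_name, range_start, range_end, is_compressed
        FROM timescaledb_information.chunks;
    """)
    for row in cur.fetchall():
        print(row)
```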
Examples are:
- #658 - three systems were interrupted Mar 12, reconnected May 23rd, and uploaded their data when restored.
- Salmon Creek - had its solar power cable pulled May 15, was restored June 27th, and transmitted its data on reconnection: https://monitormywatershed.org/sites/TUCA_Sa01/
- Mill Creek - installed June 12, and stopped a couple of days later: https://monitormywatershed.org/sites/TUCA_Mi06/
- or the very remote Navarro Creek, which stopped connecting June 2: https://monitormywatershed.org/sites/TUCA-Na13/
As I understand it, TimescaleDB will compress data based on a historical timeframe, and then later, if there is a POST into that period, it will decompress the rows associated with the device and then insert the new record.
It's likely that the decompression will take time, and could exceed the 10-second timeout that the device waits for an HTTP 201.
In that case the device will retry on the next connection, and continue retrying until an HTTP 201 is received.
Then, I'm guessing, TimescaleDB will recompress the data on some schedule.
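A sketch of that device-side retry behavior, with a hypothetical endpoint and backlog structure (the real firmware and API will differ):

```python
import requests

ENDPOINT = "https://monitormywatershed.org/api/data-stream/"  # assumed endpoint

def post_reading(payload, timeout_s=10.0):
    """One POST attempt; anything other than HTTP 201 counts as a failure,
    so the reading stays queued for the next connection."""
    try:
        resp = requests.post(ENDPOINT, json=payload, timeout=timeout_s)
        return resp.status_code == 201
    except requests.RequestException:
        return False  # timeout or network error: retry on a later connection

def flush_backlog(backlog):
    """Keep the oldest unacknowledged reading at the front of the backlog
    and retry it on every connection until a 201 is received."""
    while backlog and post_reading(backlog[0]):
        backlog.pop(0)
```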