docs: archive

open-telemetry · Jan 7, 2025 · fc0fd18 · fc0fd18
1 parent 67e8e35
commit fc0fd18
Showing 2 changed files with 172 additions and 0 deletions.
diff --git a/pkg/stanza/fileconsumer/design/archive.md b/pkg/stanza/fileconsumer/design/archive.md
@@ -0,0 +1,172 @@
+# File archiving
+
+With archiving enabled, file offsets older than three poll cycles are stored on disk rather than being discarded. This feature enables fileconsumer to remember files for a longer period while using a limited amount of memory.
+
+## Settings exposed for archiving
+
+1. `polls_to_archive`
+    - This setting controls the number of poll cycles to archive (beyond the in-memory limit of three poll cycles).
+    - If you set `polls_to_archive` to 10, then fileconsumer will store up to 10 poll cycles on disk.
+
+
+## How does archiving work?
+
+- We store the offsets older than three poll cycles on disk. If we use `polls_to_archive: 10`, the on-disk structure looks like the following:
+![on-disk](images/on-disk.png)
+    - Once we hit the limit of `polls_to_archive` poll cycles, we roll over and overwrite the oldest data. The on-disk structure is a ring buffer.
+    - We retain a total of 13 poll cycles: 3 cycles in memory and 10 cycles on disk.
+
+Basic terminology before we proceed further:
+1. `archiveIndex`: The on-disk position where the next data will be written.
+2. `polls_to_archive`: The number of poll cycles to archive, i.e. the maximum size of the on-disk ring buffer.
+
+### How does reading from archiving work?
+
+During reader creation, we group all the new (or unmatched) files and try to find a match in the archive. At a high level, this consists of the following steps:
+1. We start from the most recently written index in the archive and load the data from it.
+2. If we don't have any unmatched files, we exit the loop.
+3. We loop through all the unmatched files and cross-reference each file's fingerprint against the archived data.
+    a. If a match is found, we update the offset for the file.
+4. We move to the next most recent index and continue from step 2.
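
The steps above can be sketched in Go as follows. This is a minimal in-memory model, not the fileconsumer implementation: the `archive` type, the `findOffsets` method, and the use of a plain string as a fingerprint are all assumptions made for illustration, and a `nil` slot stands in for "nothing written at this index yet".

```go
package main

import "fmt"

// fingerprint is a simplified stand-in for fileconsumer's real fingerprint type.
type fingerprint string

// archive is a hypothetical in-memory model of the on-disk ring buffer.
// slots[i] == nil means nothing has been written at index i yet.
type archive struct {
	slots        []map[fingerprint]int64 // fingerprint -> stored offset
	archiveIndex int                     // position of the next write
}

// findOffsets walks the archive from the most recently written slot backwards
// and returns the stored offset for every unmatched fingerprint it can find.
func (a *archive) findOffsets(unmatched []fingerprint) map[fingerprint]int64 {
	found := make(map[fingerprint]int64)
	n := len(a.slots)
	for step := 1; step <= n; step++ { // at most one full pass over the archive
		if len(found) == len(unmatched) {
			break // no unmatched files left
		}
		// Go's % keeps the sign of the dividend, so normalise to [0, n).
		idx := ((a.archiveIndex-step)%n + n) % n
		slot := a.slots[idx]
		if slot == nil {
			break // empty value: the archive is only partially filled
		}
		for _, fp := range unmatched {
			if _, done := found[fp]; done {
				continue // keep the most recent offset already found
			}
			if off, ok := slot[fp]; ok {
				found[fp] = off
			}
		}
	}
	return found
}

func main() {
	a := &archive{slots: make([]map[fingerprint]int64, 10), archiveIndex: 5}
	for i := 0; i < 5; i++ {
		a.slots[i] = map[fingerprint]int64{}
	}
	a.slots[4]["a"] = 100 // most recent slot
	a.slots[2]["a"] = 10  // older entry for the same file, ignored
	a.slots[2]["b"] = 50

	fmt.Println(a.findOffsets([]fingerprint{"a", "b", "c"}))
}
```

In this run, `a` and `b` are matched, while `c` stays unmatched because the walk stops at index `9`, which is empty.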
+
+Let's take a few examples to understand this:
+
+- Consider the following structure:
+![read-1](images/read-1.png)
+    - Here, we have stored data for the previous eight poll cycles (3 poll cycles in memory + 5 on disk).
+    - When we enter the reading mode, we first read data from the most recently written index.
+        - The most recent data is stored at `archiveIndex-1` because `archiveIndex` points to the position where the next data will be written.
+    - After evaluating data at this index, we move to the next most recent index.
+    - We continue this process until one of the following conditions is met:
+        - We have no unmatched files left.
+        - We have read through the entire archive.
+        - We encounter an empty value. This can happen if the archive is partially filled.
+            - In the above diagram, once we reach the beginning of the archive (i.e. index `0`), we roll over and proceed to the next most recent index. In this case, it is index `9`, which contains no data.
+- Let's take one more example where we have overwritten older data:
+![read-2](images/read-2.png)
+    - Here, the archive is completely filled and we have rolled over, overwriting older data.
+    - `archiveIndex` points to `4`, i.e. the least recent data.
+    - We first load the most recent data (i.e. `archiveIndex-1`) and try to match offsets against it.
+    - Once we evaluate data from this index, we move to the previous index, and we continue this process until we have read through the entire archive.
+
+### How does writing to archive work?
+
+Writing to archive is rather simple:
+
+- At the end of each poll cycle, instead of purging the readers older than 3 cycles, we move the oldest readers to the archive.
+- We write data to `archiveIndex` and increment the index. Consider the following image:
+![write](images/write.png)
+    - Before the poll cycle, `archiveIndex` points to the slot next to `5`.
+    - At the end of each poll cycle, we write the data to `archiveIndex` and increment the index.
+    - After the cycle, the on-disk structure looks like the one on the right.
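
A minimal sketch of the write path, assuming an in-memory slice in place of the on-disk storage. The `ring` type and its `write` method are hypothetical names used only for this illustration; the point is that the index advances after every write and wraps around once it reaches `polls_to_archive`.

```go
package main

import "fmt"

// ring is a hypothetical in-memory model of the archive.
// An empty string marks a slot that has never been written.
type ring struct {
	slots        []string
	archiveIndex int // position of the next write
}

// write stores data at archiveIndex and advances the index,
// wrapping around so that the oldest data is overwritten next.
func (r *ring) write(data string) {
	r.slots[r.archiveIndex] = data
	r.archiveIndex = (r.archiveIndex + 1) % len(r.slots)
}

func main() {
	r := &ring{slots: make([]string, 3)} // polls_to_archive: 3
	for _, poll := range []string{"p1", "p2", "p3", "p4"} {
		r.write(poll)
	}
	fmt.Println(r.slots, r.archiveIndex) // prints "[p4 p2 p3] 1"
}
```

After the fourth write, `p1` has been overwritten and `archiveIndex` points at `p2`, which is now the least recent entry.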
+
+## Archive restoration
+
+Archive restoration is an important step if the user changes the `polls_to_archive` setting. This section explains how changing this setting impacts the underlying disk structure after a collector run.
+
+There are two cases to consider:
+1. When `polls_to_archive` has increased. In other words, the new archive will be larger than the older one.
+1. When `polls_to_archive` has decreased. In other words, the archive size has shrunk.
+
+### Case 1: `polls_to_archive` has increased
+This case is straightforward.
+
+Consider the following image:
+
+![grown](images/grown-1.png)
+
+The previous archive size was `10` and it was later changed to `15`. We just move `archiveIndex` to the next free slot. In this case, the next available slot is at index `10`.
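
One way to picture this case is the sketch below. It assumes that "next free slot" means the slot right after the highest filled index of the old archive; the `grow` function and the string-slice model (empty string = empty slot) are hypothetical and only illustrate the idea.

```go
package main

import "fmt"

// grow copies the old archive into a larger one and returns the new
// archiveIndex: the slot right after the highest filled slot.
// Assumption: an empty string marks a slot that was never written.
func grow(old []string, newSize int) ([]string, int) {
	grown := make([]string, newSize)
	copy(grown, old)

	next := 0
	for i := len(old) - 1; i >= 0; i-- {
		if old[i] != "" {
			next = (i + 1) % newSize
			break
		}
	}
	return grown, next
}

func main() {
	old := make([]string, 10)
	for i := range old {
		old[i] = fmt.Sprintf("poll-%d", i)
	}
	grown, archiveIndex := grow(old, 15)
	fmt.Println(len(grown), archiveIndex) // prints "15 10"
}
```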
+
+### Case 2: `polls_to_archive` has decreased
+
+There are different sub-cases to consider.
+
+#### Case 2.1: Most recently written index is in bounds w.r.t. new `polls_to_archive`
+
+*Scenario 1: Most recently written index is in bounds and we have overwritten the data at least once*
+
+![case-3](images/case-3.png)
+The following configuration applies to this case:
+- previous `polls_to_archive` was `10`
+- new `polls_to_archive` is `7`
+- most recently written index is `4` (pointing to data `14`)
+- `t.archiveIndex` i.e. least recently written index is `5`
+
+Here, we can see that the most recently written index (i.e. `4`) is in bounds w.r.t. the new `polls_to_archive` (i.e. `7`). In other words, `most recently written index < new polls_to_archive`.
+
+We now need to construct a new, smaller archive with the 7 most recent elements.
+These elements are (from most recent to least recent):
+
+```14, 13, 12, 11, 10, 9, 8```
+
+We do this in the following manner:
+- The elements to the left of `archiveIndex` will always be included in the new archive. Hence, we don't touch them.
+- We then take the remaining elements and reconstruct the archive.
+    - The number of remaining elements equals `new polls_to_archive - archiveIndex`.
+    - In the above image, there are five elements to the left of `archiveIndex` and we will always include them.
+    - We take the two most recent elements from the right side and include them in the archive, discarding the rest.
+
+Pseudocode:
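+```go
+if storage[archiveIndex] == nil {
+    // the archive has not rolled over yet; we'll talk about this condition in scenario 2
+    return
+}
+// Go's % can return a negative result, so add previous_polls_to_archive before taking the modulus.
+most_recent_index := (archiveIndex - 1 + previous_polls_to_archive) % previous_polls_to_archive // index 4 in above image
+// the oldest element we keep sits new_polls_to_archive-1 slots behind the most recent one
+least_recent_index := (most_recent_index - new_polls_to_archive + 1 + previous_polls_to_archive) % previous_polls_to_archive // index 8 in above image
+
+for i := 0; i < new_polls_to_archive-archiveIndex; i++ {
+    storage[archiveIndex+i] = storage[least_recent_index] // fill the slots from archiveIndex up to the new bound
+    least_recent_index++
+}
+// archiveIndex remains unchanged in this case, as it's already pointing at the least recently written data.
+```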
+
+*Scenario 2: Most recently written index is in bounds and we have not overwritten the data*
+
+![case-4](images/case-4.png)
+
+The following configuration applies to this case:
+- previous `polls_to_archive` was `10`
+- new `polls_to_archive` is `6`
+- most recently written index is `5` (pointing to data `14`)
+- `t.archiveIndex` i.e. least recently written index is `6`
+
+If the slot pointed to by `archiveIndex` is nil, it means that we haven't rolled over and that the following slots are empty, so we don't need to perform any swapping.
+In the above pseudocode, the first condition handles this scenario.
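
Both scenarios of Case 2.1 can be checked with a small runnable sketch. It models the archive as an `[]int` in which `0` marks an empty slot and the values are the data labels from the examples (`14` being the most recent). The function name `shrinkInBounds` and this model are assumptions for illustration; real storage holds reader metadata, not integers.

```go
package main

import "fmt"

// shrinkInBounds rebuilds a smaller archive when the most recently written
// index is already inside the new bounds. It returns the new archive;
// archiveIndex itself does not change in this case.
// Assumption: 0 marks an empty slot.
func shrinkInBounds(storage []int, archiveIndex, newSize int) []int {
	prev := len(storage)
	out := append([]int(nil), storage...) // work on a copy

	if storage[archiveIndex] == 0 {
		// Scenario 2: never rolled over, nothing to move.
		return out[:newSize]
	}

	// Oldest element to keep, normalised because Go's % can be negative.
	least := ((archiveIndex-newSize)%prev + prev) % prev
	for i := archiveIndex; i < newSize; i++ {
		out[i] = storage[least]
		least++
	}
	return out[:newSize]
}

func main() {
	// Scenario 1: previous size 10, new size 7, archiveIndex 5.
	rolled := []int{10, 11, 12, 13, 14, 5, 6, 7, 8, 9}
	fmt.Println(shrinkInBounds(rolled, 5, 7)) // prints "[10 11 12 13 14 8 9]"

	// Scenario 2: previous size 10, new size 6, archiveIndex 6.
	partial := []int{9, 10, 11, 12, 13, 14, 0, 0, 0, 0}
	fmt.Println(shrinkInBounds(partial, 6, 6)) // prints "[9 10 11 12 13 14]"
}
```

In the first run, `archiveIndex` (`5`) ends up pointing at `8`, the least recent of the seven retained elements, which is why it does not need to move.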
+
+#### Case 2.2: Most recently written index is out of bounds or at bounds w.r.t. new `polls_to_archive`
+
+*Scenario 1: Most recently written index is out of bounds*
+
+![case-2](images/case-2.png)
+
+The following configuration applies to this case:
+- previous `polls_to_archive` was `10`
+- new `polls_to_archive` is `5`
+- most recently written index is `9`
+- `t.archiveIndex` i.e. least recently written index is `0`
+
+Here, we can see that the most recently written index (i.e. `9`) is out of bounds w.r.t. the new `polls_to_archive` (i.e. `5`). In other words, `most recently written index > new polls_to_archive`.
+
+We take the five (because the new `polls_to_archive` is `5`) most recently written elements and construct a new, smaller archive.
+Pseudocode:
+
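+```go
+// Go's % can return a negative result, so add previous_polls_to_archive before taking the modulus.
+most_recent_index := (archiveIndex - 1 + previous_polls_to_archive) % previous_polls_to_archive // index 9 in above image
+// the oldest element we keep sits new_polls_to_archive-1 slots behind the most recent one
+least_recent_index := (most_recent_index - new_polls_to_archive + 1) % previous_polls_to_archive // index 5 in above image
+
+for i := 0; i < new_polls_to_archive; i++ {
+    storage[i] = storage[least_recent_index] // rewrite from beginning of storage
+    least_recent_index++
+}
+archiveIndex = 0 // point archiveIndex at the least recently written data
+```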
+
+The new archive is represented by the lower list in the image above.
+
+*Scenario 2: Most recently written index is at the bounds*
+
+![case-1](images/case-1.png)
+
+The pseudocode remains the same and the same steps are performed.
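
The two scenarios of Case 2.2 can be exercised with the runnable sketch below. As before, the `[]int` model and the name `shrinkOutOfBounds` are assumptions for illustration; the values are data labels, with larger numbers being more recent. The data used for the at-bounds run is invented for the example, since the figure's contents are not reproduced here.

```go
package main

import "fmt"

// shrinkOutOfBounds keeps the newSize most recently written elements and
// moves them to the beginning of the archive. It returns the new archive
// and the new archiveIndex, which is always 0 in this case.
func shrinkOutOfBounds(storage []int, archiveIndex, newSize int) ([]int, int) {
	prev := len(storage)
	out := make([]int, newSize)

	// Oldest element to keep, normalised because Go's % can be negative.
	least := ((archiveIndex-newSize)%prev + prev) % prev
	for i := 0; i < newSize; i++ {
		out[i] = storage[least]
		least = (least + 1) % prev
	}
	return out, 0 // index 0 now holds the least recently written data
}

func main() {
	// Scenario 1: previous size 10, new size 5, most recent index 9, archiveIndex 0.
	full := []int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
	fmt.Println(shrinkOutOfBounds(full, 0, 5)) // prints "[6 7 8 9 10] 0"

	// Scenario 2 (at bounds): most recent index 5 equals the new size 5, archiveIndex 6.
	rolled := []int{11, 12, 13, 14, 15, 16, 7, 8, 9, 10}
	fmt.Println(shrinkOutOfBounds(rolled, 6, 5)) // prints "[12 13 14 15 16] 0"
}
```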
diff --git a/pkg/stanza/fileconsumer/design.md → pkg/stanza/fileconsumer/design/design.md b/pkg/stanza/fileconsumer/design.md → pkg/stanza/fileconsumer/design/design.md