* Updating docs in delta_bench
* Adding allow dead code for assert_almost_equal function
Showing 3 changed files with 97 additions and 7 deletions.
# Delta Bench

To run the various benchmarks for Delta, execute `cargo bench`. However, it’s important to note that the current benchmarks are more of a template to show the structure of how we should approach benchmarking, rather than an optimized or effective test of our code.

There are major improvement opportunities to be discovered and acted upon, and this guide is designed to help you properly evaluate Delta’s performance.
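
For orientation, this is roughly what a `cargo bench` target looks like. The sketch below assumes the `criterion` crate as the harness and uses a naive matrix multiply as a placeholder for a real Delta operation; the names are illustrative, not Delta’s actual API.

```rust
// A minimal `cargo bench` target, assuming the `criterion` crate.
// The naive matmul below stands in for a real Delta operation.
use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn bench_matmul(c: &mut Criterion) {
    // Small synthetic inputs, mirroring the current template benchmarks.
    let a = vec![1.0f32; 64 * 64];
    let b = vec![1.0f32; 64 * 64];

    c.bench_function("matmul_64x64", |bench| {
        bench.iter(|| {
            let (a, b) = (black_box(&a), black_box(&b));
            let mut out = vec![0.0f32; 64 * 64];
            for i in 0..64 {
                for j in 0..64 {
                    let mut acc = 0.0;
                    for k in 0..64 {
                        acc += a[i * 64 + k] * b[k * 64 + j];
                    }
                    out[i * 64 + j] = acc;
                }
            }
            black_box(out)
        })
    });
}

criterion_group!(benches, bench_matmul);
criterion_main!(benches);
```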

## Current State of the Benchmarks

The existing benchmarks are in their infancy and are structured as starting points. They are not ideal representations of real-world workloads, and they don’t fully capture the performance characteristics of the system. Our goal is to evolve them into effective tools for evaluating realistic performance.

**Currently, the benchmarks:**

- Use small synthetic data.
- Lack measures for system resource utilization (e.g., memory, CPU).
- Do not simulate production-like loads or variability in workloads.
- Focus on individual operations but don’t consider the broader context of performance goals such as throughput, latency, and scalability.

While these are useful for early-stage development, they need to be expanded and refined to make Delta ready for real-world use cases and heavy workloads.

## Benchmarking Guidelines

### 1. Define Clear Performance Goals

Before you dive into improving the benchmarks, start by defining what you are measuring. Are you focused on:

- **Throughput:** The rate at which the system performs operations (e.g., SGD updates per second).
- **Latency:** The time it takes to complete a single operation (e.g., time to process one batch of data).
- **Percentile Performance:** Measuring performance at different percentiles (e.g., ensuring the 90th or 95th percentile of operations meet latency goals).
- **Worst-case Performance:** Evaluating the system under extreme load conditions (e.g., the maximum number of operations before performance degrades unacceptably).

These performance goals will shape how the benchmarks are structured and how the data is measured. Without clear goals, the benchmarks will not provide actionable insights.
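
To make percentile goals concrete, here is a minimal sketch (standard library only) that records per-operation latencies and reports p50/p90/p95; the `percentile` helper is illustrative, not part of Delta.

```rust
use std::time::{Duration, Instant};

/// Latency at the given percentile (0.0..=100.0) over recorded samples.
/// Illustrative helper, not part of Delta's API.
fn percentile(samples: &mut [Duration], pct: f64) -> Duration {
    samples.sort_unstable();
    let idx = ((pct / 100.0) * (samples.len() - 1) as f64).round() as usize;
    samples[idx]
}

fn main() {
    // Record per-operation latencies for a toy workload.
    let mut latencies: Vec<Duration> = (0..1000u64)
        .map(|i| {
            let start = Instant::now();
            // Stand-in for one benchmarked operation, e.g. an SGD update.
            std::hint::black_box((0..(i % 97) + 1).map(|x| x * x).sum::<u64>());
            start.elapsed()
        })
        .collect();

    println!("p50: {:?}", percentile(&mut latencies, 50.0));
    println!("p90: {:?}", percentile(&mut latencies, 90.0));
    println!("p95: {:?}", percentile(&mut latencies, 95.0));
}
```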

### 2. Simulate Realistic Workloads

The key to effective benchmarking is ensuring that the workload closely reflects how Delta will be used in production. This means:

- **Data Size & Shape:** Ensure the data you’re using in the benchmarks is representative of what the model will actually encounter. For example, training data sizes, tensor shapes, and model complexity should mirror real-world scenarios.
- **Workload Variability:** The current benchmarks assume static and predictable loads. To improve this, we need to simulate dynamic loads, where the data and training conditions vary in ways that resemble real-world applications (see the sketch after this list).
- **Longer Runs:** Instead of short and quick runs, consider longer benchmark tests to observe performance over time, especially if memory management or other resources need to be monitored under prolonged load.
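
As a rough illustration of workload variability, the sketch below randomizes batch size and feature count per step. It assumes the `rand` crate, and `run_training_step` is a hypothetical stand-in for a Delta training call.

```rust
use rand::Rng;

// Hypothetical stand-in for a Delta training call: mean over the batch.
fn run_training_step(batch: &[f32], features: usize) -> f32 {
    batch.iter().sum::<f32>() / features as f32
}

fn main() {
    let mut rng = rand::thread_rng();
    for step in 0..100 {
        // Vary both batch size and feature count per step, instead of
        // benchmarking one fixed input shape.
        let batch_size = rng.gen_range(8..=256);
        let features = rng.gen_range(16..=512);
        let batch = vec![1.0f32; batch_size * features];
        let out = run_training_step(&batch, features);
        if step % 25 == 0 {
            println!("step {step}: batch={batch_size} features={features} out={out}");
        }
    }
}
```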

### 3. Measure System Resource Utilization

Benchmarking isn’t just about how fast an operation runs; it’s about how efficiently resources (like CPU and memory) are used. To get a more comprehensive view of performance, incorporate the following:

- **CPU/GPU Usage:** How much CPU/GPU is consumed during operations, especially during parallel training steps or large tensor operations.
- **Memory Usage:** Monitor memory usage as your tensors grow or as the optimizer works on larger datasets. Memory bottlenecks can have significant performance impacts that are not immediately visible from raw execution time.
- **Disk I/O:** If applicable, measure how often the system performs disk reads/writes (especially if data loading is a bottleneck).
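
One low-effort way to observe memory behavior in Rust is a counting global allocator, sketched below. It tracks only Rust heap allocations, so GPU memory or memory-mapped data would need separate tooling.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// Wraps the system allocator to track live and peak heap usage.
struct CountingAlloc;

static LIVE: AtomicUsize = AtomicUsize::new(0);
static PEAK: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let p = System.alloc(layout);
        if !p.is_null() {
            let live = LIVE.fetch_add(layout.size(), Ordering::Relaxed) + layout.size();
            PEAK.fetch_max(live, Ordering::Relaxed);
        }
        p
    }
    unsafe fn dealloc(&self, p: *mut u8, layout: Layout) {
        LIVE.fetch_sub(layout.size(), Ordering::Relaxed);
        System.dealloc(p, layout);
    }
}

#[global_allocator]
static A: CountingAlloc = CountingAlloc;

fn main() {
    let v: Vec<f32> = vec![0.0; 1_000_000]; // the "workload"
    drop(v);
    println!("peak heap: {} bytes", PEAK.load(Ordering::Relaxed));
}
```

Relaxed ordering keeps the counters cheap; the peak is approximate under heavy multi-threaded allocation, which is usually acceptable for benchmarking.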

### 4. Incorporate Parallelism and Concurrency

Delta is designed to benefit from parallelism and concurrency, so it’s crucial that benchmarks reflect this aspect of the framework. Some recommendations:

- **Multi-core Utilization:** Ensure your benchmarks take advantage of multi-threading, especially for operations like matrix multiplication and gradient updates.
- **Parallel Optimizer Steps:** Test how well Delta can perform optimizer steps concurrently on multiple data batches or models.
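
A minimal sketch of the idea, using scoped threads from the standard library to compare a serial optimizer step against a chunked parallel one; `sgd_step` is a hypothetical stand-in for a Delta optimizer update.

```rust
use std::time::Instant;

// Hypothetical stand-in for a Delta optimizer update.
fn sgd_step(params: &mut [f32], grads: &[f32], lr: f32) {
    for (p, g) in params.iter_mut().zip(grads) {
        *p -= lr * g;
    }
}

fn main() {
    const N: usize = 8_000_000;
    let grads = vec![0.01f32; N];
    let mut params = vec![1.0f32; N];

    // Serial baseline.
    let start = Instant::now();
    sgd_step(&mut params, &grads, 0.1);
    println!("serial:   {:?}", start.elapsed());

    // Split the parameter vector across available cores.
    let threads = std::thread::available_parallelism().map_or(4, |n| n.get());
    let chunk = (N + threads - 1) / threads;
    let start = Instant::now();
    std::thread::scope(|s| {
        for (p, g) in params.chunks_mut(chunk).zip(grads.chunks(chunk)) {
            s.spawn(move || sgd_step(p, g, 0.1));
        }
    });
    println!("parallel: {:?}", start.elapsed());
}
```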

### 5. Realistic Load Testing

Arguably the hardest but most important aspect of benchmarking is testing under realistic load conditions. Here’s how to approach this:

- **Record and Replay:** Ideally, we want to capture real production data and replay it under various conditions. This will help simulate realistic performance scenarios without needing to constantly gather new data.
- **Simulate Production Stress:** Add load to the system progressively until it reaches realistic stress points. This will help us understand how Delta scales and where bottlenecks appear.
- **Vary the Load:** Rather than running the system at a constant rate, introduce dynamic load variations: some data points may be heavier than others, or the optimizer might be tested on more complex datasets.
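
A simple way to approximate progressive stress is to ramp the operation count each round and watch how throughput responds, as in this sketch (`process` is a placeholder for one unit of Delta work):

```rust
use std::time::Instant;

// Placeholder for one unit of Delta work.
fn process(input: &[f32]) -> f32 {
    input.iter().map(|x| x * x).sum()
}

fn main() {
    let mut ops_per_round = 1_000u64;
    for round in 0..8 {
        let input = vec![1.0f32; 1024];
        let start = Instant::now();
        for _ in 0..ops_per_round {
            // black_box keeps the work from being optimized away.
            std::hint::black_box(process(std::hint::black_box(&input)));
        }
        let secs = start.elapsed().as_secs_f64();
        println!(
            "round {round}: {ops_per_round} ops in {secs:.3}s ({:.0} ops/sec)",
            ops_per_round as f64 / secs
        );
        ops_per_round *= 2; // ramp the load
    }
}
```

A flattening or falling ops/sec curve as the load doubles is the signal that a bottleneck has been reached.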

### 6. Refining the Benchmark Structure

While the existing benchmarks are a good starting point, they are limited in scope. Here are some improvements to consider:

- **Workload Diversity:** Add more test cases that reflect different scenarios (e.g., small and large datasets, different tensor shapes, multi-model optimizations).
- **Latency & Throughput Metrics:** Capture latency for individual operations and throughput for batches of operations, and analyze how these metrics behave as the system is loaded.
- **Error Handling:** Include tests for how well the system handles edge cases (e.g., memory exhaustion, data corruption, interrupted operations).
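
Criterion supports this kind of diversity directly via benchmark groups. The sketch below, again assuming the `criterion` crate, sweeps one benchmark over several input sizes and reports throughput alongside latency; the summation is a placeholder for a Delta op.

```rust
use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion, Throughput};

fn bench_sizes(c: &mut Criterion) {
    let mut group = c.benchmark_group("sum_f32");
    for size in [1_000usize, 100_000, 1_000_000] {
        let data = vec![1.0f32; size];
        // Report elements/sec so results are comparable across sizes.
        group.throughput(Throughput::Elements(size as u64));
        group.bench_with_input(BenchmarkId::from_parameter(size), &data, |b, d| {
            b.iter(|| d.iter().sum::<f32>())
        });
    }
    group.finish();
}

criterion_group!(benches, bench_sizes);
criterion_main!(benches);
```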

### 7. Iterative Refinement and Continuous Monitoring

Finally, remember that benchmarking is an iterative process. As we add features and refine the system, it’s important to revisit benchmarks and update them to reflect any changes in system behavior or performance goals.

- **Continuous Monitoring:** Once the improved benchmarks are in place, use them as part of continuous integration (CI) pipelines to monitor ongoing performance.
- **Refinement Over Time:** As we gather more data and insights, continually refine the benchmarks to better reflect real-world conditions.

## Next Steps

The current state of the Delta benchmarks is a starting point. However, there are significant improvements to be made. By following the guidelines above, we can create meaningful, real-world benchmarks that provide insight into the true performance of our system.

If you’re contributing to the Delta project, start by enhancing the existing benchmarks with these principles in mind. Focus on making the benchmarks:

- More representative of real-world data and loads.
- More comprehensive, measuring not just speed but also resource usage and system behavior under stress.
- More scalable, testing how well Delta performs as it grows.

Let’s make sure that our benchmarks not only test performance but give us valuable insights into how Delta behaves in production-like conditions.