ShiftHandler - README

This is an implementation of the paper Modeling Shifting Workloads for Learned Database Systems (SIGMOD 2024):

@article{wu2024modeling,
  title={Modeling Shifting Workloads for Learned Database Systems},
  author={Wu, Peizhi and Ives, Zachary G},
  journal={Proceedings of the ACM on Management of Data},
  volume={2},
  number={1},
  pages={1--27},
  year={2024},
  publisher={ACM New York, NY, USA}
}

Overview

This project uses replay buffers to handle workload shifts in query-driven learned database systems.

Replay Buffer Strategies: The project uses the following replay buffers:

All: Uses all available queries for replay.
Random Sampling (RS): Randomly samples queries from the workload for replay.
Latest: Uses the latest k queries for replay.
ShiftHandler w/ CBP: Tries to keep the classes balanced in the replay buffer.
ShiftHandler w/ LWP: Prioritizes queries by their computed loss to emphasize hard examples.
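The strategies above can be illustrated with a minimal sketch. This is not the code in the ShiftHandler directory: the class names, the `add` signature, and the eviction rules are assumptions chosen only to show the idea behind each policy (keep the latest k, keep a uniform random sample, keep classes balanced, keep high-loss queries).

```python
import random
from collections import defaultdict, deque


class LatestBuffer:
    """Keeps only the k most recent queries."""

    def __init__(self, k):
        self.items = deque(maxlen=k)  # old entries fall off automatically

    def add(self, query, label=None, loss=None):
        self.items.append(query)


class ReservoirBuffer:
    """Keeps a uniform random sample of k queries (reservoir sampling)."""

    def __init__(self, k, seed=0):
        self.k, self.seen, self.items = k, 0, []
        self.rng = random.Random(seed)

    def add(self, query, label=None, loss=None):
        self.seen += 1
        if len(self.items) < self.k:
            self.items.append(query)
        else:
            j = self.rng.randrange(self.seen)  # uniform in [0, seen)
            if j < self.k:
                self.items[j] = query


class ClassBalancedBuffer:
    """When full, evicts a random query from the currently largest class."""

    def __init__(self, k, seed=0):
        self.k = k
        self.by_class = defaultdict(list)
        self.rng = random.Random(seed)

    def __len__(self):
        return sum(len(v) for v in self.by_class.values())

    def add(self, query, label, loss=None):
        self.by_class[label].append(query)
        if len(self) > self.k:
            largest = max(self.by_class, key=lambda c: len(self.by_class[c]))
            victims = self.by_class[largest]
            victims.pop(self.rng.randrange(len(victims)))


class LossWeightedBuffer:
    """When full, drops the query with the lowest recorded loss."""

    def __init__(self, k):
        self.k = k
        self.items = []  # list of (loss, query) pairs

    def add(self, query, label=None, loss=0.0):
        self.items.append((loss, query))
        if len(self.items) > self.k:
            idx = min(range(len(self.items)), key=lambda i: self.items[i][0])
            self.items.pop(idx)
```

In this sketch a label stands for whatever notion of class the experiment uses (for example a query template), and the loss is assumed to be supplied by the caller after evaluating the model on the query. The actual CBP and LWP policies in the paper may differ from these simplified eviction rules.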

File Structure

ShiftHandler: Directory containing the code for ShiftHandler.
card_exp: Directory containing the code for the cardinality estimation experiment.
cost_exp: Directory containing the code for the cost estimation experiment.

How to run the experiment of cardinality estimation

Enter the card_exp folder.
Download all files from https://github.com/andreaskipf/learnedcardinalities/tree/master/data and place them in the ./data directory.

Run the following command:

 python run_card_exp.py --buffersize 300 --numtrain 1000 --numtest 100

Important configuration options:
- --buffersize: Size of the replay buffer.
- --batch: Batch size for each training step.
- --queries: Total number of queries to use for training.
- --numtrain: Number of training queries per template/task.
- --numtest: Number of test queries per template/task.
- --imbalance: Include this flag if there is class imbalance.
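
As a rough illustration of how the options above fit together, the sketch below declares them with `argparse`. It is hypothetical and not taken from run_card_exp.py: the integer types are an assumption, and the defaults are left as `None` because the real defaults are not documented here. Only the flag names come from the list above.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented flags (not the real script)."""
    parser = argparse.ArgumentParser(description="Cardinality experiment options (sketch)")
    parser.add_argument("--buffersize", type=int, default=None,
                        help="Size of the replay buffer.")
    parser.add_argument("--batch", type=int, default=None,
                        help="Batch size for each training step.")
    parser.add_argument("--queries", type=int, default=None,
                        help="Total number of queries to use for training.")
    parser.add_argument("--numtrain", type=int, default=None,
                        help="Number of training queries per template/task.")
    parser.add_argument("--numtest", type=int, default=None,
                        help="Number of test queries per template/task.")
    # A bare switch: present means True, absent means False.
    parser.add_argument("--imbalance", action="store_true",
                        help="Include this flag if there is class imbalance.")
    return parser


if __name__ == "__main__":
    # Same arguments as the example command, plus the imbalance switch.
    args = build_parser().parse_args(
        ["--buffersize", "300", "--numtrain", "1000", "--numtest", "100", "--imbalance"]
    )
    print(args)
```

The point of the sketch is that --imbalance takes no value, while the other options each expect a number.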

Analyzing Results

To ensure a fair comparison, the experiment is run with 10 different random seeds.
The overall results are saved in JSON format in a .txt file (e.g., card_result_imb_False.txt) for each replay buffer approach.
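
A result file of this kind could be summarized across seeds along the following lines. The layout of the JSON is an assumption: the sketch expects a single object that maps each replay buffer approach to a list of per-seed error values, which may not match what the experiment script actually writes. The helper names `load_results` and `summarize` are hypothetical.

```python
import json
import statistics


def load_results(path):
    """Reads a result file, assuming it holds one JSON object."""
    with open(path) as f:
        return json.load(f)


def summarize(results):
    """Returns mean, sample standard deviation, and run count per approach.

    `results` is assumed to map an approach name to a list of numbers,
    one per random seed.
    """
    summary = {}
    for approach, values in results.items():
        summary[approach] = {
            "mean": statistics.mean(values),
            # stdev needs at least two values
            "std": statistics.stdev(values) if len(values) > 1 else 0.0,
            "runs": len(values),
        }
    return summary


if __name__ == "__main__":
    # Made-up numbers, only to show the call; not experimental results.
    example = {"RS": [1.0, 2.0, 3.0], "Latest": [2.0, 4.0]}
    for name, stats in summarize(example).items():
        print(name, stats)
```

If the file turns out to use a different structure, only `summarize` needs to change; reporting mean and spread over the 10 seeds is the part that matters for comparing approaches.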

How to run the experiment of cost estimation

Enter the cost_exp folder.
Download the query plan file (plans.txt) from this link and place it in the folder.
Run the following command:
```
 python run_cost_exp.py --buffersize 50
```
Important configuration options:
- --buffersize: Size of the replay buffer.
- --batch: Batch size for each training step.
- --imbalance: Include this flag if there is class imbalance.

Analyzing Results

To ensure a fair comparison, this experiment is also run with 10 different random seeds.
The overall results are saved in JSON format in a .txt file (e.g., cost_result_imb_False.txt) for each replay buffer approach.

Contact

If you have any questions, feel free to contact me through email ([email protected]).
