
[FEATURE] Rss partition sliced store #2086

Closed
3 tasks done
maobaolong opened this issue Aug 26, 2024 · 0 comments · Fixed by #2093
maobaolong commented Aug 26, 2024

Code of Conduct

Search before asking

  • I have searched in the issues and found no similar issues.

Describe the feature

Divide each RSS store partition into slices, with a default slice size of 10 GB.

Motivation

Prevent a single extremely large partition from consuming the I/O of an entire node/disk.

Describe the solution

Divide each RSS store partition into slices with a default slice size of 10 GB; for example, a 21 GB partition would then be stored as 3 slices.
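The slicing arithmetic described above can be sketched as follows. This is an illustrative sketch only; the object and method names (`SliceMath`, `sliceCount`) are hypothetical and not part of the actual RSS codebase or its configuration keys:

```scala
object SliceMath {
  // Default slice size proposed in this issue: 10 GB.
  val DefaultSliceSizeBytes: Long = 10L * 1024 * 1024 * 1024

  // Number of slices needed to hold a partition of the given size.
  // Ceiling division, so any remainder gets a slice of its own:
  // a 21 GB partition needs 3 slices at the 10 GB default.
  def sliceCount(partitionBytes: Long,
                 sliceBytes: Long = DefaultSliceSizeBytes): Long =
    if (partitionBytes <= 0) 0L
    else (partitionBytes + sliceBytes - 1) / sliceBytes
}
```

For instance, `SliceMath.sliceCount(21L * 1024 * 1024 * 1024)` yields 3, matching the 21 GB example in the solution description.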

Additional context

https://docs.google.com/document/d/1R9LcPIkmWml0aD3rQbhKgO9qWBSE9NknRCyhljC09Uw/edit?usp=sharing

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
maobaolong added a commit that referenced this issue Nov 7, 2024
…ltiply server (#2093)

### What changes were proposed in this pull request?

Support storing a sliced partition across multiple servers.

Limitations:

- Only the Netty mode has been tested.

### Why are the changes needed?

Fix: #2086

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

- Start multiple servers and a coordinator locally
- Start a Spark standalone environment locally
- Start spark-shell and execute `test.scala`

```console
bin/spark-shell  --master  spark://localhost:7077  --deploy-mode client --conf spark.rss.client.reassign.blockRetryMaxTimes=3 --conf spark.rss.writer.buffer.spill.size=30 --conf spark.rss.client.reassign.enabled=true  --conf spark.shuffle.manager=org.apache.spark.shuffle.RssShuffleManager --conf spark.rss.coordinator.quorum=localhost:19999  --conf spark.rss.storage.type=LOCALFILE --conf spark.serializer=org.apache.spark.serializer.KryoSerializer --conf spark.rss.test.mode.enable=true --conf spark.rss.client.type=GRPC_NETTY --conf spark.sql.shuffle.partitions=1  -i test.scala
```

- test.scala
```scala
val data = sc.parallelize(Seq(("A", 1), ("B", 2), ("C", 3), ("A", 4), ("B", 5), ("A", 6), ("A", 7),("A", 7), ("A", 7), ("A", 7), ("A", 7), ("A", 7), ("A", 7), ("A", 7), ("A", 7), ("A", 7), ("A", 7), ("A", 7), ("A", 7)));
val result = data.reduceByKey(_ + _);
result.collect().foreach(println);
System.exit(0);
```

<img width="410" alt="image" src="https://github.com/user-attachments/assets/7c72fa3e-cfb5-4361-9875-a82b6aeeedfb">