
[FEATURE] Rss partition sliced store #2086

Closed
3 tasks done
maobaolong opened this issue Aug 26, 2024 · 0 comments · Fixed by #2093
maobaolong commented Aug 26, 2024

Code of Conduct

Search before asking

  • I have searched in the issues and found no similar issues.

Describe the feature

Divide each RSS store partition into slices, with a default slice size of 10 GB.

Motivation

Prevent a single extremely large partition from consuming the I/O of an entire node/disk.

Describe the solution

Divide each RSS store partition into slices with a default slice size of 10 GB; for example, a 21 GB partition would then be stored as 3 slices.
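The slicing arithmetic described above can be sketched as follows. This is an illustrative sketch only; the object and method names (`SliceMath`, `sliceCount`) are hypothetical and not part of the actual RSS codebase or its configuration keys:

```scala
object SliceMath {
  // Default slice size proposed in this issue: 10 GB.
  val DefaultSliceSizeBytes: Long = 10L * 1024 * 1024 * 1024

  // Number of slices needed to hold a partition of the given size.
  // Ceiling division, so any remainder gets a slice of its own:
  // a 21 GB partition needs 3 slices at the 10 GB default.
  def sliceCount(partitionBytes: Long,
                 sliceBytes: Long = DefaultSliceSizeBytes): Long =
    if (partitionBytes <= 0) 0L
    else (partitionBytes + sliceBytes - 1) / sliceBytes
}
```

For instance, `SliceMath.sliceCount(21L * 1024 * 1024 * 1024)` yields 3, matching the 21 GB example in the solution description.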

Additional context

https://docs.google.com/document/d/1R9LcPIkmWml0aD3rQbhKgO9qWBSE9NknRCyhljC09Uw/edit?usp=sharing

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
maobaolong added a commit that referenced this issue Nov 7, 2024
…ltiply server (#2093)

### What changes were proposed in this pull request?

Support storing a sliced partition across multiple servers.

Limitations:

- Only the Netty mode has been tested.

### Why are the changes needed?

Fix: #2086

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

- Start multiple servers and a coordinator locally
- Start a Spark standalone environment locally
- Start spark-shell and execute `test.scala`

```console
bin/spark-shell  --master  spark://localhost:7077  --deploy-mode client --conf spark.rss.client.reassign.blockRetryMaxTimes=3 --conf spark.rss.writer.buffer.spill.size=30 --conf spark.rss.client.reassign.enabled=true  --conf spark.shuffle.manager=org.apache.spark.shuffle.RssShuffleManager --conf spark.rss.coordinator.quorum=localhost:19999  --conf spark.rss.storage.type=LOCALFILE --conf spark.serializer=org.apache.spark.serializer.KryoSerializer --conf spark.rss.test.mode.enable=true --conf spark.rss.client.type=GRPC_NETTY --conf spark.sql.shuffle.partitions=1  -i test.scala
```

- test.scala
```scala
val data = sc.parallelize(Seq(("A", 1), ("B", 2), ("C", 3), ("A", 4), ("B", 5), ("A", 6), ("A", 7),("A", 7), ("A", 7), ("A", 7), ("A", 7), ("A", 7), ("A", 7), ("A", 7), ("A", 7), ("A", 7), ("A", 7), ("A", 7), ("A", 7)));
val result = data.reduceByKey(_ + _);
result.collect().foreach(println);
System.exit(0);
```

<img width="410" alt="image" src="https://github.com/user-attachments/assets/7c72fa3e-cfb5-4361-9875-a82b6aeeedfb">