Server-side performance issue with small I/O #190

Open
wangvsa opened this issue Mar 15, 2024 · 0 comments

Comments

Collaborator

wangvsa commented Mar 15, 2024

This issue documents the performance challenges observed during the Montage experiments. The high server-side overhead encountered when processing small I/O requests needs to be mitigated to make PDC beneficial for Montage or AI applications.

What does this feature solve or improve?

The Montage results suggest that the server may not be operating at its peak efficiency. In scenarios with numerous concurrent requests, the server could potentially become a bottleneck. We may observe similar I/O patterns from AI applications as well.

Describe the solution you'd like

The server-side algorithm for processing I/O requests could be improved.
Server-side multi-threading should also improve efficiency.
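
To make the multi-threading suggestion concrete, here is a minimal sketch of a request loop backed by a worker pool. It is not PDC code: `handle_request`, `serve`, and the request tuples are hypothetical names chosen for illustration, and the actual PDC server is written in C, where the equivalent would likely use native threads. In CPython, threads mainly help when handlers block on I/O, so treat this as a picture of the structure rather than a performance claim.

```python
from concurrent.futures import ThreadPoolExecutor


def handle_request(request):
    """Hypothetical per-request handler: returns (request id, bytes handled)."""
    request_id, payload = request
    return request_id, len(payload)


def serve(requests, num_workers=1):
    """Process requests on a pool of worker threads.

    pool.map returns results in submission order, so callers see the same
    ordering regardless of how many workers are used.
    """
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(handle_request, requests))


# Eight small requests of 3000 bytes each, matching the size reported below.
requests = [(i, b"x" * 3000) for i in range(8)]
single = serve(requests, num_workers=1)
multi = serve(requests, num_workers=4)
```

Both calls return the same results in the same order; only the number of requests in flight at once changes.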

Montage results on Perlmutter

The Montage components execute a large number of small reads and writes. Within the tested workflow, each I/O operation transfers approximately 3000 bytes.
PDC performs similarly with and without the cache, which indicates that most of the time was spent in server-side processing.
image

I investigated one component, mProjExecMPI, in more detail. This component executes N small writes, followed by one read, and then another M writes. I implemented optimizations, including using session consistency and combining all writes into one batched call. However, the performance remains suboptimal. In particular, the single read operation takes 2 seconds, which suggests it was waiting to be processed on the server side.

image
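
The access pattern above (N writes, one read, M writes) can be reasoned about with a toy cost model of a single-threaded, first-in-first-out server. This is a sketch under stated assumptions, not a model of the PDC server: the function names are hypothetical, and the overhead and per-byte costs are placeholder units rather than measurements. It shows what batching writes removes (per-request overhead queued ahead of the read) and what it leaves in place (whatever the server spends on the batched write and the read themselves).

```python
def completion_times(queue, per_request_overhead, per_byte_cost):
    """Return (kind, finish time) for each request in a FIFO, one-at-a-time server."""
    clock = 0.0
    finished = []
    for kind, nbytes in queue:
        clock += per_request_overhead + nbytes * per_byte_cost
        finished.append((kind, clock))
    return finished


def batch_writes(queue):
    """Merge each run of consecutive writes into a single larger write request."""
    batched = []
    for kind, nbytes in queue:
        if kind == "write" and batched and batched[-1][0] == "write":
            batched[-1] = ("write", batched[-1][1] + nbytes)
        else:
            batched.append((kind, nbytes))
    return batched


def read_finish_time(queue, per_request_overhead, per_byte_cost):
    """Finish time of the first read in the queue."""
    for kind, t in completion_times(queue, per_request_overhead, per_byte_cost):
        if kind == "read":
            return t
    raise ValueError("queue contains no read")


# Placeholder workload: N = 4 writes, one read, M = 2 writes, 3000 bytes each.
workload = [("write", 3000)] * 4 + [("read", 3000)] + [("write", 3000)] * 2
batched = batch_writes(workload)

# Placeholder costs: 1 time unit of overhead per request, no per-byte cost.
unbatched_read = read_finish_time(workload, per_request_overhead=1, per_byte_cost=0)
batched_read = read_finish_time(batched, per_request_overhead=1, per_byte_cost=0)
```

In this model the read completes earlier once the writes ahead of it are batched, because it waits behind one request instead of N. A read that still takes seconds after batching, as observed here, would point at the cost of processing individual requests on the server rather than at the number of requests queued.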
@wangvsa wangvsa added the type: new feature Request for new feature label Mar 15, 2024
@houjun houjun self-assigned this Mar 18, 2024
@jeanbez jeanbez added the priority: medium Medium priority label Jun 4, 2024
This was referenced Jul 1, 2024

3 participants