[FEA] Add diagnostic reason for GPU slowness in Profiler tool #1374

Open
3 tasks
cindyyuanjiang opened this issue Oct 4, 2024 · 1 comment
Labels
affect-output A change that modifies the output (add/remove/rename files, add/remove/rename columns) core_tools Scope the core module (scala) feature request New feature or request

Comments

@cindyyuanjiang
Collaborator

cindyyuanjiang commented Oct 4, 2024

We want the Profiler tool to support the following three top use cases for diagnosing why a job is slow on the GPU.

  1. Top contributors to stage time, with stages listed in descending order of duration: spill amount per stage, skew (input/output bytes), shuffle bytes depending on the stage plan, GPU semaphore time, and number of tasks.
  2. SQL plan buffer time, operator time, and the number of input and output rows for the execs in each of those stages.
  3. Table sizes (row and column counts) for interesting operators (joins, aggregations, etc.) in these long-running stages.
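
To make the skew contributor in item 1 concrete, here is a minimal Python sketch. It assumes skew is measured as the ratio of the largest per-task byte count to the median within a stage. The issue does not define skew, so that definition and the name `skew_ratio` are illustrative assumptions, not the Profiler tool's implementation.

```python
from statistics import median


def skew_ratio(task_bytes):
    """Return max/median of per-task byte counts for one stage.

    Assumption: skew is expressed as max/median. A value near 1.0 means
    the tasks processed similar amounts of data; larger values mean a few
    tasks handled far more data than the typical task.
    """
    if not task_bytes:
        return 0.0  # no tasks, nothing to measure
    med = median(task_bytes)
    if med == 0:
        # Avoid division by zero: all-zero input counts as even.
        return float("inf") if max(task_bytes) > 0 else 1.0
    return max(task_bytes) / med
```

The same helper could be applied to input bytes and output bytes separately, giving one skew figure per direction for each stage.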

Subtasks

  • Create a stage-level summary view with memory/disk spilled, input/output bytes, shuffle read/write, and GPU semaphore info, sorted in descending order of stage duration
  • Create an I/O view that highlights CPU- and GPU-specific SQL/operator metrics for scans and shuffles
  • Create a filtered operator and SQL metric view for the top 7 stages by duration
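
A minimal Python sketch of how the stage-level summary and the top-7 filter could fit together. The `StageSummary` fields and the `top_stages` helper are hypothetical names chosen to mirror the metrics listed above. The actual tool lives in the Scala core module, and its schema and column names may differ.

```python
from dataclasses import dataclass


@dataclass
class StageSummary:
    # Hypothetical per-stage row; field names are illustrative only.
    stage_id: int
    duration_ms: int
    memory_spilled_bytes: int = 0
    disk_spilled_bytes: int = 0
    input_bytes: int = 0
    output_bytes: int = 0
    shuffle_read_bytes: int = 0
    shuffle_write_bytes: int = 0
    gpu_semaphore_wait_ms: int = 0
    num_tasks: int = 0


def top_stages(stages, n=7):
    """Return the n longest-running stages, longest first."""
    return sorted(stages, key=lambda s: s.duration_ms, reverse=True)[:n]
```

The stage IDs returned by `top_stages` could then be used to filter the operator and SQL metric rows down to the stages worth inspecting.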
@kuhushukla
Collaborator

Can we break this down into subtasks?

On second thought, we should combine the two tables shown as an example here. My original intent was to keep the first view simple, but the latter table is not too bad for that.

@cindyyuanjiang cindyyuanjiang added the affect-output A change that modifies the output (add/remove/rename files, add/remove/rename columns) label Oct 30, 2024

2 participants