Add alternative implementation of device timer to SyclTimer class #1872
Deleted rendered PR docs from intelpython.github.com/dpctl, latest should be updated shortly. 🤞
Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_149 ran successfully.
Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_150 ran successfully.
52e211c to 9322201
Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_174 ran successfully.
9322201 to ecd1c17
Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_177 ran successfully.
ecd1c17 to f4fa901
Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_199 ran successfully.
SyclTimer now supports a device_timer keyword argument with two behaviors: the legacy one, "queue_barrier", and a new one based on the sequential order manager, which inserts an empty task into the manager to record the start and end of the timed block of code. The SyclTimer docstring is updated. All data attributes the timer needs are now created during class instance construction.
Check different device_timer values, test argument validation, and test cumulative timing.
f4fa901 to d1011c5
Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_209 ran successfully.
Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_210 ran successfully.
LGTM!
This PR adds a Python API to submit an empty-body single task to a queue.
dpctl.SyclTimer is modified to accept a device_timer keyword argument, with supported values "queue_barrier" (legacy behavior, the default) and "order_manager".
With "order_manager", the timer submits empty-body single tasks (fence tasks) to the queue, using the order manager to order them so that they fence the timed submissions. For example, executing a timed block of code results in the task graph
[prior_tasks] -> [fence_start_task] -> [compute_tasks] -> [fence_end_task] -> [subsequent_tasks]
The timer uses profiling data from the events associated with the fence tasks to estimate the execution time of the compute tasks, as measured by the device's timer.
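The fencing idea can be illustrated with a small pure-Python model. This is only a sketch: ToyInOrderQueue, ToyEvent, and ToyTimer are hypothetical names that do not exist in dpctl, the device clock is simulated with integer ticks, and the formula used for the elapsed time (start of the end fence minus end of the start fence) is an assumption about how the profiling data could be combined, not a statement of what the PR implements.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyEvent:
    """Profiling record for one task, in simulated device clock ticks."""
    name: str
    start: int
    end: int


@dataclass
class ToyInOrderQueue:
    """Hypothetical stand-in for a queue whose tasks run in submission order."""
    clock: int = 0
    events: List[ToyEvent] = field(default_factory=list)

    def submit(self, name: str, cost: int) -> ToyEvent:
        # Each task starts when the previous one ends (sequential ordering).
        ev = ToyEvent(name, self.clock, self.clock + cost)
        self.clock = ev.end
        self.events.append(ev)
        return ev

    def submit_fence(self, name: str) -> ToyEvent:
        # A fence is an empty-body task: it takes no simulated time.
        return self.submit(name, cost=0)


class ToyTimer:
    """Brackets a block with fence tasks and accumulates device time."""

    def __init__(self, queue: ToyInOrderQueue):
        self.queue = queue
        self.device_dt = 0

    def __enter__(self):
        self._start = self.queue.submit_fence("fence_start_task")
        return self

    def __exit__(self, *exc):
        end = self.queue.submit_fence("fence_end_task")
        # Assumed formula: time between the two fences on the device clock.
        self.device_dt += end.start - self._start.end
        return False


q = ToyInOrderQueue()
q.submit("prior_task", cost=5)
timer = ToyTimer(q)
with timer:
    q.submit("compute_task_1", cost=7)
    q.submit("compute_task_2", cost=3)
q.submit("subsequent_task", cost=4)

graph = " -> ".join(ev.name for ev in q.events)
print(graph)
print("device ticks spent in timed block:", timer.device_dt)
```

Because the fences are ordered with the compute tasks, the work submitted before and after the timed block does not contribute to the measured interval in this model.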
The device_timer="order_manager" setting is useful for timing dpctl.tensor operations, which leverage the order manager.
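A usage sketch for timing dpctl.tensor operations follows. It is not the snippet from the PR description, which did not survive extraction. It assumes a dpctl build that includes this PR and an available SYCL device that supports profiling; the helper name time_tensor_ops, the array size, and the choice of linspace followed by sin are illustrative only. The helper returns None when dpctl or a device is unavailable.

```python
def time_tensor_ops(device_timer="order_manager"):
    """Time two dpctl.tensor operations; return (host_dt, device_dt) or None.

    Hypothetical helper for illustration, not part of dpctl.
    """
    try:
        import dpctl
        import dpctl.tensor as dpt

        # Profiling must be enabled on the queue for device timing to work.
        q = dpctl.SyclQueue(property="enable_profiling")
    except Exception:
        # dpctl is not installed, or no suitable device could be selected.
        return None

    timer = dpctl.SyclTimer(device_timer=device_timer)
    with timer(q):
        # Tensor operations submitted to q inside the block are timed.
        x = dpt.linspace(0, 1, num=10**6, sycl_queue=q)
        y = dpt.sin(x)

    # timer.dt unpacks into host-side and device-side elapsed times.
    host_dt, device_dt = timer.dt
    return host_dt, device_dt


result = time_tensor_ops()
if result is None:
    print("dpctl or a SYCL device is not available; nothing was timed")
else:
    print("host_dt = {:.6f}, device_dt = {:.6f}".format(*result))
```

Passing device_timer="queue_barrier" to the same helper would select the legacy behavior described above, which makes it easy to compare the two modes on the same workload.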