Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Dynamic Selection] Adding sycl profiling for auto tune policy #1464

Merged
merged 31 commits into from
Apr 30, 2024
Merged
Show file tree
Hide file tree
Changes from 11 commits
Commits
Show all changes
31 commits
Select commit Hold shift + click to select a range
9417ddb
Added static assert for keyArgs==Args
AnuyaWelling2801 Jan 9, 2024
7e54e28
WIP for jit compilation
AnuyaWelling2801 Jan 10, 2024
268f0d5
Fixing Autotune bug
AnuyaWelling2801 Jan 17, 2024
0dda712
Added sycl profiling without host task
AnuyaWelling2801 Mar 21, 2024
a38327a
Corrections in dynamic traits
AnuyaWelling2801 Mar 22, 2024
70f993c
Remove comments in dynamic_selection_traits
AnuyaWelling2801 Mar 22, 2024
59fa9a8
Merged main
AnuyaWelling2801 Mar 22, 2024
69f5014
Adding back jit compolation restrictions
AnuyaWelling2801 Mar 22, 2024
757c470
Adressing comments to add thread safety, better traits
AnuyaWelling2801 Mar 27, 2024
37a895a
Added backend trait to check if profiling is enabled
AnuyaWelling2801 Mar 27, 2024
d9fe635
Backend traits, renaming and adding a lock to lazy report
AnuyaWelling2801 Apr 4, 2024
95ee3ac
Fixed memory leaks
AnuyaWelling2801 Apr 9, 2024
cd5b318
No nullptr for selection handle in async waiter constructor
AnuyaWelling2801 Apr 15, 2024
e58f264
Added an erase and remove_if
AnuyaWelling2801 Apr 15, 2024
e134770
Adressing comments for profiling report, Adding profiling to default …
AnuyaWelling2801 Apr 24, 2024
6ce3b5e
Changes to std::chrono::duration
AnuyaWelling2801 Apr 25, 2024
a8ad96b
Addressed comments for sycl backend
AnuyaWelling2801 Apr 25, 2024
b70aeec
Addressed comments to sycl backend
AnuyaWelling2801 Apr 26, 2024
60710e2
Fix USM memory leaks and create unique task names (#1537)
SergeyKopienko Apr 26, 2024
1e76278
Fix the format of report method - using `report_duration` in second p…
SergeyKopienko Apr 26, 2024
c217e1a
Fixing clang format
AnuyaWelling2801 Apr 26, 2024
75dd226
Adding header file for backend_traits
AnuyaWelling2801 Apr 26, 2024
bc95cd2
Changed structure for sycl_backend submit
AnuyaWelling2801 Apr 26, 2024
19e78e7
Fixed clang format
AnuyaWelling2801 Apr 26, 2024
3d3dab2
Changes to make variables is_profiling_enabled and aliases for report…
AnuyaWelling2801 Apr 29, 2024
8009669
Moved is_profiling_enabled outside the async_waiter class
AnuyaWelling2801 Apr 29, 2024
ae46b12
Making sycl backend submit function more readable
AnuyaWelling2801 Apr 29, 2024
0cf0daa
Fixed clang format
AnuyaWelling2801 Apr 29, 2024
5028e63
Simplify sycl_backend::submit(SelectionHandle s, Function&& f, Args&&…
SergeyKopienko Apr 30, 2024
aa07b29
Minor changes to auto_tune_policy.h and sycl_backend.h
AnuyaWelling2801 Apr 30, 2024
797ea84
Improving readability
AnuyaWelling2801 Apr 30, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@
#include <tuple>
#include <unordered_map>
#include "oneapi/dpl/internal/dynamic_selection_traits.h"
#include "oneapi/dpl/internal/dynamic_selection_impl/backend_traits.h"
#if _DS_BACKEND_SYCL != 0
# include "oneapi/dpl/internal/dynamic_selection_impl/sycl_backend.h"
#endif
Expand Down Expand Up @@ -217,6 +218,11 @@ class auto_tune_policy
select(Function&& f, Args&&... args)
{
static_assert(sizeof...(KeyArgs) == sizeof...(Args));
if constexpr(backend_traits::lazy_report_v<Backend> && backend_traits::enable_profiling_v<Backend>){
if(backend_->has_enable_profiling == true){
backend_->lazy_report();
}
}
akukanov marked this conversation as resolved.
Show resolved Hide resolved
if (state_)
{
std::lock_guard<std::mutex> l(state_->m_);
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,66 @@
// -*- C++ -*-
//===----------------------------------------------------------------------===//
//
// Copyright (C) 2023 Intel Corporation
//
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

#ifndef _ONEDPL_INTERNAL_BACKEND_TRAITS_H
#define _ONEDPL_INTERNAL_BACKEND_TRAITS_H

#include "oneapi/dpl/internal/dynamic_selection_traits.h"
akukanov marked this conversation as resolved.
Show resolved Hide resolved

namespace oneapi
{
namespace dpl
{
namespace experimental
{
namespace internal
{
template <typename Backend>
auto
has_lazy_report_impl(...) -> std::false_type;

template <typename Backend>
auto
has_lazy_report_impl(int) -> decltype(std::declval<Backend>().lazy_report(), std::true_type{});

template <typename Backend>
struct has_lazy_report : decltype(has_lazy_report_impl<Backend>(0))
{
};

template <typename Backend, typename U = bool>
struct has_enable_profiling : std::false_type { };

// Specialization for U = int
template <typename Backend>
struct has_enable_profiling <Backend, decltype((void) Backend::has_enable_profiling, true)> : std::true_type { };
}//namespace internal

namespace backend_traits {
template <typename S>
struct lazy_report_value
{
static constexpr bool value = ::oneapi::dpl::experimental::internal::has_lazy_report<S>::value;
};
template <typename S>
inline constexpr bool lazy_report_v = lazy_report_value<S>::value;
akukanov marked this conversation as resolved.
Show resolved Hide resolved
akukanov marked this conversation as resolved.
Show resolved Hide resolved

template <typename S>
struct enable_profiling_value
{
static constexpr bool value = ::oneapi::dpl::experimental::internal::has_enable_profiling<S>::value;
};
template <typename S>
inline constexpr bool enable_profiling_v = enable_profiling_value<S>::value;
akukanov marked this conversation as resolved.
Show resolved Hide resolved
} //namespace backend_traits

} // namespace experimental
} // namespace dpl
} // namespace oneapi

#endif /*_ONEDPL_INTERNAL_BACKEND_TRAITS_H*/
Original file line number Diff line number Diff line change
Expand Up @@ -150,6 +150,11 @@ struct dynamic_load_policy
selection_type
select(Args&&...)
{
if constexpr(backend_traits::lazy_report_v<Backend> && backend_traits::enable_profiling_v<Backend>){
if(backend_->has_enable_profiling == true){
backend_->lazy_report();
}
}
if (state_)
{
std::lock_guard<std::mutex> l(state_->m_);
Expand Down
111 changes: 95 additions & 16 deletions include/oneapi/dpl/internal/dynamic_selection_impl/sycl_backend.h
Original file line number Diff line number Diff line change
Expand Up @@ -35,25 +35,79 @@ class sycl_backend
using execution_resource_t = resource_type;
using resource_container_t = std::vector<execution_resource_t>;

bool has_enable_profiling = false;
akukanov marked this conversation as resolved.
Show resolved Hide resolved
private:
class async_waiter
class async_waiter_base{
SergeyKopienko marked this conversation as resolved.
Show resolved Hide resolved
public:
virtual void wait() = 0;
akukanov marked this conversation as resolved.
Show resolved Hide resolved
virtual void report() = 0;
virtual bool is_complete() = 0;
};

template<typename Selection>
class async_waiter : public async_waiter_base
{
sycl::event e_;

Selection* s;
public:
async_waiter(const sycl::event& e) : e_(e) {}
async_waiter(sycl::event e) : e_(e){}
async_waiter(sycl::event e, Selection* selection) : e_(e), s(selection) {}

sycl::event
unwrap()
{
return e_;
}

void
wait()
wait() override
{
e_.wait();
}

void
report() override{
if constexpr (report_value_v<Selection, execution_info::task_time_t>){
cl_ulong time_start = e_.template get_profiling_info<sycl::info::event_profiling::command_start>();
cl_ulong time_end = e_.template get_profiling_info<sycl::info::event_profiling::command_end>();
s->report(execution_info::task_time, time_end-time_start);
}

}
akukanov marked this conversation as resolved.
Show resolved Hide resolved

bool
is_complete() override{
return e_.get_info<sycl::info::event::command_execution_status>() == sycl::info::event_command_status::complete;
}

};

struct async_waiter_list_t{

std::mutex m_;
std::vector<async_waiter_base*> async_waiters;
SergeyKopienko marked this conversation as resolved.
Show resolved Hide resolved

template<typename T>
akukanov marked this conversation as resolved.
Show resolved Hide resolved
void add_waiter(T *t){
std::lock_guard<std::mutex> l(m_);
async_waiters.push_back(t);
}

void lazy_report(){
std::lock_guard<std::mutex> l(m_);
int size = async_waiters.size();
for(auto i = async_waiters.begin(); i!=async_waiters.begin()+size; i++){
if((*i)->is_complete()){
(*i)->report();
async_waiters.erase(i);
SergeyKopienko marked this conversation as resolved.
Show resolved Hide resolved
SergeyKopienko marked this conversation as resolved.
Show resolved Hide resolved
}
}
}
};

async_waiter_list_t async_waiter_list;


class submission_group
{
resource_container_t resources_;
Expand Down Expand Up @@ -89,6 +143,9 @@ class sycl_backend
for (auto e : v)
{
global_rank_.push_back(e);
if(e.template has_property<sycl::property::queue::enable_profiling>()){
has_enable_profiling = true;
}
akukanov marked this conversation as resolved.
Show resolved Hide resolved
}
sgroup_ptr_ = std::make_unique<submission_group>(global_rank_);
}
Expand All @@ -106,27 +163,45 @@ class sycl_backend
report_value_v<SelectionHandle, execution_info::task_time_t>)
{
std::chrono::steady_clock::time_point t0;
bool use_event_profiling = q.template has_property<sycl::property::queue::enable_profiling>();
if constexpr (report_value_v<SelectionHandle, execution_info::task_time_t>)
{
t0 = std::chrono::steady_clock::now();
if (!use_event_profiling)
{
t0 = std::chrono::steady_clock::now();
}
}
auto e1 = f(q, std::forward<Args>(args)...);
auto e2 = q.submit([=](sycl::handler& h) {
h.depends_on(e1);
h.host_task([=]() {
if constexpr (report_value_v<SelectionHandle, execution_info::task_time_t>)
s.report(execution_info::task_time, (std::chrono::steady_clock::now() - t0).count());
if constexpr (report_info_v<SelectionHandle, execution_info::task_completion_t>)
{
if constexpr(report_info_v<SelectionHandle, execution_info::task_completion_t>){
auto e2 = q.submit([=](sycl::handler& h){
h.depends_on(e1);
h.host_task([=](){
s.report(execution_info::task_completion);
}
});
});
});
return async_waiter{e2};
return async_waiter{e2, new SelectionHandle(s)};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know we don't currently have a policy that requires reporting of both task completion and time, but this implementation would not support it because of this return. If we assumed they were mutually exclusive, the next condition would be an "else if", but its just an "if" so that both might be supported.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

}
else if constexpr(report_value_v<SelectionHandle, execution_info::task_time_t>){
akukanov marked this conversation as resolved.
Show resolved Hide resolved
if (use_event_profiling)
{
auto waiter = async_waiter{e1, new SelectionHandle(s)};
async_waiter_list.add_waiter(new async_waiter(waiter));
SergeyKopienko marked this conversation as resolved.
Show resolved Hide resolved
return waiter;
}
else{
auto e2 = q.submit([=](sycl::handler& h){
h.depends_on(e1);
h.host_task([=](){
s.report(execution_info::task_time, (std::chrono::steady_clock::now() - t0).count());
});
});
return async_waiter{e2, new SelectionHandle(s)};
}
}
}
else
{
return async_waiter{f(unwrap(s), std::forward<Args>(args)...)};
return async_waiter{f(unwrap(s), std::forward<Args>(args)...), new SelectionHandle(s)};
}
}

Expand All @@ -142,6 +217,10 @@ class sycl_backend
return global_rank_;
}

void lazy_report(){
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is not thread safe. You might have concurrent accesses to the async_waiter array that change its size -- calls to lazy_report and submit (which calls add_waiter).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Created a new structure async_wait_arr.

async_waiter_list.lazy_report();
}

private:
resource_container_t global_rank_;
std::unique_ptr<submission_group> sgroup_ptr_;
Expand Down
Loading
Loading