ENH: Data management update to support SUA ifaces for Homogen OneDAL tables #2045
Conversation
checking to_table and from_table for dpnp/dpctl
Wrong inputs check TBD
without refactoring
/intelci: run
/intelci: run
/intelci: run
Will continue my review at 8:15.
onedal/datatypes/_data_conversion.py (Outdated)
# of the compute-follows-data execution.
# Host tables are first converted into numpy.ndarrays and then to an array from the xp
# namespace.
return xp.asarray(
This opens the possibility that we will store non-numpy arrays as estimator attributes (we can store USM, non-array-API-compliant values). This is going to complicate things when a computation falls back to sklearn in a fit step: the results could be of a completely different type. How will this be handled for fit attributes which fall back? Do we iterate over and convert all instance attributes? At minimum, please make a ticket for an investigation into this.
This shouldn't be a follow-up ticket of this PR. I don't think a ticket is needed without any concrete input.
Potentially this could be raised for DBSCAN via Array API enabling integrated with this PR's changes, since stock scikit-learn doesn't support non-numpy array inputs.
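For illustration of the concern above, a minimal sketch of one way fallback results could be normalized back to host arrays; the helper name and the use of `dpctl.tensor.asnumpy` are assumptions for this example, not the project's implementation:

```python
import dpctl.tensor as dpt


def _attributes_to_host(estimator):
    # Copy any USM-backed fit attribute back to a host numpy.ndarray so that
    # estimators which fell back to stock scikit-learn keep a consistent
    # attribute type.
    for name, value in vars(estimator).items():
        if hasattr(value, "__sycl_usm_array_interface__"):
            setattr(estimator, name, dpt.asnumpy(dpt.asarray(value)))
    return estimator
```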
)
@pytest.mark.parametrize("order", ["F", "C"])
@pytest.mark.parametrize("data_shape", data_shapes)
@pytest.mark.parametrize("dtype", [np.float32, np.float64])
You aren't using dtype in the test, even though you've modified the kfold template for it.
Good catch, thank you!
See 8eca500
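For reference, a minimal sketch of what actually exercising the parametrized dtype can look like; the test name and data generation below are illustrative and are not the contents of 8eca500:

```python
import numpy as np
import pytest


@pytest.mark.parametrize("order", ["F", "C"])
@pytest.mark.parametrize("dtype", [np.float32, np.float64])
def test_dtype_reaches_generated_data(order, dtype):
    rng = np.random.default_rng(0)
    # The dtype parameter has to reach the generated data; otherwise the
    # parametrization has no effect on the test.
    x = np.asarray(rng.random((100, 20)), dtype=dtype, order=order)
    assert x.dtype == dtype
    assert (x.flags.f_contiguous if order == "F" else x.flags.c_contiguous)
```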
Some additional comments about heterogeneous inputs and failures, as well as a question about the use of std::decay_t (as I thought it was only there to handle arrays/functions).
onedal/datatypes/_data_conversion.py (Outdated)
and sycl_queue.sycl_device.is_cpu
and table.__sycl_usm_array_interface__["syclobj"] is None
):
# oneDAL returns tables with None sycl queue for CPU sycl queue inputs.
Thank you for the comment, it helped a lot. I guess: is this a problem in oneDAL? If someone were insistent on using the same queue throughout, this would cause them problems, since this would generate a new CPU queue. Not a blocker, just something for us to think about.
Actually, no new queue will be created there; the queue will be used to copy the host data to the USM allocation.
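A hedged illustration of the corner case discussed here, assuming dpctl and a CPU SYCL device are available; the dict manipulation only emulates what the code comment above says oneDAL reports:

```python
import dpctl
import dpctl.tensor as dpt

# Data created on an explicit CPU queue.
cpu_queue = dpctl.SyclQueue("cpu")
x = dpt.ones((4, 3), dtype="float32", sycl_queue=cpu_queue)

sua = dict(x.__sycl_usm_array_interface__)
# Emulate the oneDAL behaviour described above: CPU inputs come back with
# "syclobj" set to None.
sua["syclobj"] = None

# Consumer side: reuse the known CPU queue instead of constructing a new one.
queue = sua["syclobj"] if sua["syclobj"] is not None else cpu_queue
assert queue.sycl_device.is_cpu
```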
@icfaust why are we talking about potential support of heterogeneous tables in oneDAL here? And how should this be covered by the scope of the PR "ENH: Data management update to support SUA ifaces for Homogen OneDAL tables"? I will address some minor refactoring comments that shouldn't affect perf, but I still think this is good to go.
Can you guarantee that a non-sliced homogeneous USM nd_array will not be encountered by to_table at this moment? Maybe an offline discussion is warranted.
@icfaust It is planned that all these implementations will be used after array validation. Currently we have primitives for numpy ndarrays; on the scikit-learn side such checks are supported for Array API inputs, and they will be used for usm_ndarrays (dpctl). In #2096 some of stock scikit-learn's array API primitives are already checked, and check_array or validate_array will also be added there for dpctl array API inputs. These changes for
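A minimal sketch of the round trip under discussion, assuming dpctl is available and that `to_table` / `from_table` are importable from `onedal.datatypes`; the exact call signatures and the return type of `from_table` are assumptions here, not taken from the diff:

```python
import dpctl
import dpctl.tensor as dpt
from onedal.datatypes import from_table, to_table

queue = dpctl.SyclQueue()
x = dpt.ones((3, 4), dtype="float32", sycl_queue=queue)

table = to_table(x)    # homogen oneDAL table built from the SUA dict
y = from_table(table)  # result exposes __sycl_usm_array_interface__ again

# Zero-copy means both objects report the same USM data pointer.
x_ptr = x.__sycl_usm_array_interface__["data"][0]
y_ptr = y.__sycl_usm_array_interface__["data"][0]
assert x_ptr == y_ptr
```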
following uxlfoundation#1936
check sua_iface interop if_no_dpc_backend
Looks good to merge.
…tables (uxlfoundation#2045)
* ENH: Data management update to support SUA ifaces
* TEST: added memory leak tests
* TEST: enabled tests for the dpnp/dpctl inputs
* Support of __sycl_usm_array_interface__ implemented without dpctl as a build dependency.
* Covered by tests:
* refactored onedal/utils/_array_api.py file.
* updated take primitive in sklearnex/tests/test_memory_usage.py.
* Documented code
Description
Data management update. Update `to_table` and `from_table` primitives to support SUA ifaces with zero-copy.

Detailed
Major updates
- Support of `__sycl_usm_array_interface__` implemented without dpctl as a build dependency.
- Input support via the `__sycl_usm_array_interface__` protocol.
- `__sycl_usm_array_interface__` protocol for the oneDAL tables.
- `to_table` uses the input's `__sycl_usm_array_interface__` attr to create a oneDAL table on the device.
- `from_table` adds the `__sycl_usm_array_interface__` attr to the table to follow SYCL USM zero-copy conversion to the required dpnp.ndarray or dpctl.tensor.usm_ndarray datatype (a short protocol sketch follows this section).
- `__sycl_usm_array_interface__["syclobj"] == None` case: requires fixing the flow in the backend oneDAL lib first. Ticket is created.
- `datatypes` module: moved helpers and common srcs into `datatypes.utils`.
- Covered by tests:
  - `onedal/datatypes/tests/test_data.py::test_input_sua_iface_zero_copy`
  - `onedal/datatypes/tests/test_data.py::test_table_conversions`
  - `sklearnex/tests/test_memory_usage.py::test_table_conversions_memory_leaks` to check memory leaks with table conversions for inputs with `__sycl_usm_array_interface__` or SUA tensors.

Minor
- Refactored `onedal/utils/_array_api.py` file.
- Updated `take` primitive in `sklearnex/tests/test_memory_usage.py`.
- Benchmark numbers are shared.
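As a quick look at the protocol the bullets above rely on (the sketch referenced in the list), assuming dpctl is installed; concrete values differ per device and run:

```python
import dpctl.tensor as dpt

x = dpt.zeros((3, 4), dtype="float32")
sua = x.__sycl_usm_array_interface__

# The dict carries the USM pointer, shape, dtype string and the owning queue.
print(sorted(sua))
print(sua["shape"], sua["typestr"])  # (3, 4) '<f4'
print(sua["syclobj"] is not None)    # a queue/context is attached here
```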
Test statistics
New test cases: 164++.

Documentation:
- `onedal4py` table helpers #2139

Checklist to comply with before moving PR from draft:
- PR completeness and readability
- Testing
- Performance