-
Notifications
You must be signed in to change notification settings - Fork 179
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[enhancement] add oneDAL finiteness_checker implementation to onedal #2126
[enhancement] add oneDAL finiteness_checker implementation to onedal #2126
Conversation
/azp run CI |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run CI |
Azure Pipelines successfully started running 1 pipeline(s). |
/intelci: run |
onedal/datatypes/table.cpp
Outdated
@@ -72,6 +72,10 @@ ONEDAL_PY_INIT_MODULE(table) { | |||
const auto column_count = t.get_column_count(); | |||
return py::make_tuple(row_count, column_count); | |||
}); | |||
table_obj.def_property_readonly("dtype", [](const table& t){ | |||
// returns a numpy dtype, even if source was not from numpy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@icfaust Thank you for the clarification! I don't like this extension of table API which is wo for numpy and dpnp/dpctl.
I would suggest to solve the problem at python level https://github.com/intel/scikit-learn-intelex/blob/5bb54a520f5f3d1cf719e149df6526de943e393b/onedal/utils/validation.py#L452 without needing to extend table API
/intelci: run |
/intelci: run |
/intelci: run |
using task_list = types<task::compute>; | ||
auto sub = m.def_submodule("finiteness_checker"); | ||
|
||
#ifndef ONEDAL_DATA_PARALLEL_SPMD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only instantiate if not data parallel spmd?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As of now there isn't an spmd implimentation in oneDAL. If you forsee this as a problem, we will have to go back to oneDAL and add it (we can make that available for next release if important). @ethanglaser let me know.
) | ||
@pytest.mark.parametrize("allow_nan", [False, True]) | ||
@pytest.mark.parametrize( | ||
"dataframe, queue", get_dataframes_and_queues("numpy,dpnp,dpctl") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are there issues with pandas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pandas inputs are not to be encountered by this function, check_array should always convert them to numpy inputs, we also do not support heterogeneous tables yet, which will be done at a later point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This issue is handled in the follow up PR: #2177
X = _convert_to_dataframe(X, sycl_queue=queue, target_df=dataframe) | ||
|
||
if check is None or (allow_nan and check == "NaN"): | ||
_assert_all_finite(X, allow_nan=allow_nan) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are we checking that result of call is expected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_assert_all_finite will raise a ValueError if a non-finite occurs, meaning we are using the native functionality of the function in the test to cause a fail (i.e. it will raise an error if there is a problem, in this case, no error should be expected).
/intelci: run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
…_sample_weight``` (#2177) * add finiteness_checker pybind11 bindings * added finiteness checker * Update finiteness_checker.cpp * Update finiteness_checker.cpp * Update finiteness_checker.cpp * Update finiteness_checker.cpp * Update finiteness_checker.cpp * Update finiteness_checker.cpp * Rename finiteness_checker.cpp to finiteness_checker.cpp * Update finiteness_checker.cpp * add next step * follow conventions * make xtable explicit * remove comment * Update validation.py * Update __init__.py * Update validation.py * Update __init__.py * Update __init__.py * Update validation.py * Update _data_conversion.py * Update _data_conversion.py * Update policy_common.cpp * Update policy_common.cpp * Update _policy.py * Update policy_common.cpp * Rename finiteness_checker.cpp to finiteness_checker.cpp * Create finiteness_checker.py * Update validation.py * Update __init__.py * attempt at fixing circular imports again * fix isort * remove __init__ changes * last move * Update policy_common.cpp * Update policy_common.cpp * Update policy_common.cpp * Update policy_common.cpp * Update validation.py * add testing * isort * attempt to fix module error * add fptype * fix typo * Update validation.py * remove sua_ifcae from to_table * isort and black * Update test_memory_usage.py * format * Update _data_conversion.py * Update _data_conversion.py * Update test_validation.py * remove unnecessary code * make reviewer changes * make dtype check change * add sparse testing * try again * try again * try again * temporary commit * first attempt * missing change? * modify DummyEstimator for testing * generalize DummyEstimator * switch test * further testing changes * add initial validate_data test, will be refactored * fixes for CI * Update validation.py * Update validation.py * Update test_memory_usage.py * Update base.py * Update base.py * improve tests * fix logic * fix logic * fix logic again * rename file * Revert "rename file" This reverts commit 8d47744. * remove duplication * fix imports * Rename test_finite.py to test_validation.py * Revert "Rename test_finite.py to test_validation.py" This reverts commit ee799f6. * updates * Update validation.py * fixes for some test failures * fix text * fixes for some failures * make consistent * fix bad logic * fix in string * attempt tp see if dataframe conversion is causing the issue * fix iter problem * fix testing issues * formatting * revert change * fixes for pandas * there is a slowdown with pandas that needs to be solved * swap to transpose for speed * more clarity * add _check_sample_weight * add more testing' * rename * remove unnecessary imports * fix test slowness * focus get_dataframes_and_queues * put config_context around * Update test_validation.py * Update base.py * Update test_validation.py * generalize regex * add fixes for sklearn 1.0 and input_name * fixes for test failures * Update validation.py * Update test_validation.py * Update validation.py * formattintg * make suggested changes * follow changes made in #2126 * fix future device problem * Update validation.py * minor changes based on #2206, suggestions * remove xp as keyword * only_non_negative -> ensure_non_negative * add commentary * formatting * address changes * Update test_validation.py * Update base.py * Update test_validation.py * Update sklearnex/utils/validation.py Co-authored-by: ethanglaser <[email protected]> --------- Co-authored-by: ethanglaser <[email protected]>
Description
A GPU side finiteness checker was added to oneDAL in 2024.7 (uxlfoundation/oneDAL#2781), this becomes useful with the merge of #2045. This will expose the first step, which is to include it in the onedal folder. This may replace daal4py's assert_all_finite, but will occur in a later PR. This adds limited testing on onedal side. This doesn't require performance benchmarking at this stage, but will be necessary in the next PR.
Aspects have been integrated in the following PRs:
#2152
#2151
Checklist to comply with before moving PR from draft:
PR completeness and readability
Testing
Performance