Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[enhancement] add oneDAL finiteness_checker implementation to onedal #2126

Merged
merged 79 commits into from
Nov 28, 2024

Conversation

icfaust
Copy link
Contributor

@icfaust icfaust commented Oct 23, 2024

Description

A GPU side finiteness checker was added to oneDAL in 2024.7 (uxlfoundation/oneDAL#2781), this becomes useful with the merge of #2045. This will expose the first step, which is to include it in the onedal folder. This may replace daal4py's assert_all_finite, but will occur in a later PR. This adds limited testing on onedal side. This doesn't require performance benchmarking at this stage, but will be necessary in the next PR.

Aspects have been integrated in the following PRs:
#2152
#2151


Checklist to comply with before moving PR from draft:

PR completeness and readability

  • I have reviewed my changes thoroughly before submitting this pull request.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation to reflect the changes or created a separate PR with update and provided its number in the description, if necessary.
  • Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
  • I have added a respective label(s) to PR if I have a permission for that.
  • I have resolved any merge conflicts that might occur with the base branch.

Testing

  • I have run it locally and tested the changes extensively.
  • All CI jobs are green or I have provided justification why they aren't.
  • I have extended testing suite if new functionality was introduced in this PR.

Performance

  • I have measured performance for affected algorithms using scikit-learn_bench and provided at least summary table with measured data, if performance change is expected.
  • I have provided justification why performance has changed or why changes are not expected.
  • I have provided justification why quality metrics have changed or why changes are not expected.
  • I have extended benchmarking suite and provided corresponding scikit-learn_bench PR if new measurable functionality was introduced in this PR.

@icfaust icfaust marked this pull request as ready for review October 24, 2024 08:52
@icfaust icfaust marked this pull request as draft October 24, 2024 08:53
@icfaust
Copy link
Contributor Author

icfaust commented Oct 28, 2024

/azp run CI

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@icfaust
Copy link
Contributor Author

icfaust commented Oct 29, 2024

/azp run CI

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@icfaust
Copy link
Contributor Author

icfaust commented Nov 21, 2024

/intelci: run

onedal/utils/tests/test_validation.py Outdated Show resolved Hide resolved
@@ -72,6 +72,10 @@ ONEDAL_PY_INIT_MODULE(table) {
const auto column_count = t.get_column_count();
return py::make_tuple(row_count, column_count);
});
table_obj.def_property_readonly("dtype", [](const table& t){
// returns a numpy dtype, even if source was not from numpy
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@icfaust Thank you for the clarification! I don't like this extension of table API which is wo for numpy and dpnp/dpctl.

I would suggest to solve the problem at python level https://github.com/intel/scikit-learn-intelex/blob/5bb54a520f5f3d1cf719e149df6526de943e393b/onedal/utils/validation.py#L452 without needing to extend table API

@icfaust
Copy link
Contributor Author

icfaust commented Nov 21, 2024

/intelci: run

@icfaust
Copy link
Contributor Author

icfaust commented Nov 25, 2024

/intelci: run

@icfaust icfaust requested a review from ethanglaser November 27, 2024 05:36
icfaust added a commit to icfaust/scikit-learn-intelex that referenced this pull request Nov 27, 2024
@icfaust
Copy link
Contributor Author

icfaust commented Nov 27, 2024

/intelci: run

using task_list = types<task::compute>;
auto sub = m.def_submodule("finiteness_checker");

#ifndef ONEDAL_DATA_PARALLEL_SPMD
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only instantiate if not data parallel spmd?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As of now there isn't an spmd implimentation in oneDAL. If you forsee this as a problem, we will have to go back to oneDAL and add it (we can make that available for next release if important). @ethanglaser let me know.

)
@pytest.mark.parametrize("allow_nan", [False, True])
@pytest.mark.parametrize(
"dataframe, queue", get_dataframes_and_queues("numpy,dpnp,dpctl")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are there issues with pandas?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pandas inputs are not to be encountered by this function, check_array should always convert them to numpy inputs, we also do not support heterogeneous tables yet, which will be done at a later point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This issue is handled in the follow up PR: #2177

onedal/utils/tests/test_validation.py Outdated Show resolved Hide resolved
X = _convert_to_dataframe(X, sycl_queue=queue, target_df=dataframe)

if check is None or (allow_nan and check == "NaN"):
_assert_all_finite(X, allow_nan=allow_nan)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we checking that result of call is expected?

Copy link
Contributor Author

@icfaust icfaust Nov 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_assert_all_finite will raise a ValueError if a non-finite occurs, meaning we are using the native functionality of the function in the test to cause a fail (i.e. it will raise an error if there is a problem, in this case, no error should be expected).

@icfaust icfaust requested a review from ethanglaser November 28, 2024 08:22
@icfaust
Copy link
Contributor Author

icfaust commented Nov 28, 2024

/intelci: run

Copy link
Contributor

@Vika-F Vika-F left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@icfaust icfaust merged commit 5fcc7fb into uxlfoundation:main Nov 28, 2024
27 checks passed
icfaust added a commit that referenced this pull request Dec 10, 2024
…_sample_weight``` (#2177)

* add finiteness_checker pybind11 bindings

* added finiteness checker

* Update finiteness_checker.cpp

* Update finiteness_checker.cpp

* Update finiteness_checker.cpp

* Update finiteness_checker.cpp

* Update finiteness_checker.cpp

* Update finiteness_checker.cpp

* Rename finiteness_checker.cpp to finiteness_checker.cpp

* Update finiteness_checker.cpp

* add next step

* follow conventions

* make xtable explicit

* remove comment

* Update validation.py

* Update __init__.py

* Update validation.py

* Update __init__.py

* Update __init__.py

* Update validation.py

* Update _data_conversion.py

* Update _data_conversion.py

* Update policy_common.cpp

* Update policy_common.cpp

* Update _policy.py

* Update policy_common.cpp

* Rename finiteness_checker.cpp to finiteness_checker.cpp

* Create finiteness_checker.py

* Update validation.py

* Update __init__.py

* attempt at fixing circular imports again

* fix isort

* remove __init__ changes

* last move

* Update policy_common.cpp

* Update policy_common.cpp

* Update policy_common.cpp

* Update policy_common.cpp

* Update validation.py

* add testing

* isort

* attempt to fix module error

* add fptype

* fix typo

* Update validation.py

* remove sua_ifcae from to_table

* isort and black

* Update test_memory_usage.py

* format

* Update _data_conversion.py

* Update _data_conversion.py

* Update test_validation.py

* remove unnecessary code

* make reviewer changes

* make dtype check change

* add sparse testing

* try again

* try again

* try again

* temporary commit

* first attempt

* missing change?

* modify DummyEstimator for testing

* generalize DummyEstimator

* switch test

* further testing changes

* add initial validate_data test, will be refactored

* fixes for CI

* Update validation.py

* Update validation.py

* Update test_memory_usage.py

* Update base.py

* Update base.py

* improve tests

* fix logic

* fix logic

* fix logic again

* rename file

* Revert "rename file"

This reverts commit 8d47744.

* remove duplication

* fix imports

* Rename test_finite.py to test_validation.py

* Revert "Rename test_finite.py to test_validation.py"

This reverts commit ee799f6.

* updates

* Update validation.py

* fixes for some test failures

* fix text

* fixes for some failures

* make consistent

* fix bad logic

* fix in string

* attempt tp see if dataframe conversion is causing the issue

* fix iter problem

* fix testing issues

* formatting

* revert change

* fixes for pandas

* there is a slowdown with pandas that needs to be solved

* swap to transpose for speed

* more clarity

* add _check_sample_weight

* add more testing'

* rename

* remove unnecessary imports

* fix test slowness

* focus get_dataframes_and_queues

* put config_context around

* Update test_validation.py

* Update base.py

* Update test_validation.py

* generalize regex

* add fixes for sklearn 1.0 and input_name

* fixes for test failures

* Update validation.py

* Update test_validation.py

* Update validation.py

* formattintg

* make suggested changes

* follow changes made in #2126

* fix future device problem

* Update validation.py

* minor changes based on #2206, suggestions

* remove xp as keyword

* only_non_negative -> ensure_non_negative

* add commentary

* formatting

* address changes

* Update test_validation.py

* Update base.py

* Update test_validation.py

* Update sklearnex/utils/validation.py

Co-authored-by: ethanglaser <[email protected]>

---------

Co-authored-by: ethanglaser <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants