ENH: SPMD interface for IncrementalBasicStatistics #1961

Merged

olegkkruglov merged 18 commits into uxlfoundation:main from olegkkruglov:incbs-spmd

Sep 2, 2024

Contributor

olegkkruglov commented Jul 25, 2024

Description

Added an SPMD interface for IncrementalBasicStatistics.
Changed the policy-saving workflow: the queue is now saved as an attribute instead of the policy. This is necessary because, on the oneDAL side, finalize_fit requires spmd_policy while partial_fit requires data_parallel_policy.
finalize_fit now uses the provided queue for computations on the onedal4py side.
Includes some content from "TEST: test coverage for sklearnex SPMD ifaces" (#1777) for the test implementation.
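
The queue-saving change described above can be pictured with a minimal sketch. The classes below are hypothetical stand-ins written for illustration only; they are not the onedal4py API, and the running-mean logic is an assumed placeholder for the real statistics. The point is only the pattern: the estimator remembers the queue, builds a data-parallel policy in each partial_fit call, and builds the SPMD policy from the saved queue in finalize_fit.

```python
# Illustrative stand-ins only: none of these classes are the onedal4py API.
class DataParallelPolicy:
    def __init__(self, queue):
        self.queue = queue


class SpmdPolicy:
    def __init__(self, queue):
        self.queue = queue


class IncrementalStatsSketch:
    """Stores the queue (not a policy) between calls."""

    def __init__(self):
        self._queue = None
        self._count = 0
        self._total = 0.0

    def partial_fit(self, batch, queue=None):
        # Refresh the stored queue on every call so finalize_fit sees the latest one.
        self._queue = queue
        policy = DataParallelPolicy(self._queue)  # policy used for per-batch accumulation
        self._count += len(batch)
        self._total += sum(batch)
        return policy

    def finalize_fit(self):
        # The SPMD policy is created only here, from the saved queue.
        policy = SpmdPolicy(self._queue)
        self.mean_ = self._total / self._count
        return policy


est = IncrementalStatsSketch()
p1 = est.partial_fit([1.0, 2.0], queue="gpu:0")  # "gpu:0" is a placeholder, not a real SYCL queue
p2 = est.partial_fit([3.0], queue="gpu:0")
pf = est.finalize_fit()
```

Because only the queue is stored, each method is free to construct whichever policy type it needs, which is the motivation given in the description.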

I have reviewed my changes thoroughly before submitting this pull request.
I have commented my code, particularly in hard-to-understand areas.
I have updated the documentation to reflect the changes, if necessary.
The unit tests pass successfully.
I have run it locally and tested the changes extensively.
I have resolved any merge conflicts that might occur with the base branch.
Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details)
I have added a respective label(s) to PR if I have a permission for that.

olegkkruglov requested review from samir-nasibli and Alexsandruss as code owners

July 25, 2024 18:39

olegkkruglov requested review from ethanglaser, icfaust, samir-nasibli and Alexsandruss and removed request for samir-nasibli and Alexsandruss

July 25, 2024 18:39

Contributor Author

olegkkruglov commented Jul 25, 2024

/intelci:run

olegkkruglov added enhancement testing labels

Contributor Author

olegkkruglov commented Jul 26, 2024

/intelci: run

ethanglaser reviewed

onedal/spmd/basic_statistics/__init__.py (outdated, resolved)

onedal/spmd/basic_statistics/incremental_basic_statistics.py (outdated, resolved)

olegkkruglov force-pushed the incbs-spmd branch from 4c1faad to 26a5f85 Compare

August 19, 2024 13:46

Contributor Author

olegkkruglov commented Aug 19, 2024

/intelci: run

3 similar comments

Contributor Author

olegkkruglov commented Aug 19, 2024

/intelci: run

Contributor

ethanglaser commented Aug 20, 2024

/intelci: run

Contributor

ethanglaser commented Aug 20, 2024

/intelci: run

uxlfoundation deleted a comment from olegkkruglov

Contributor

ethanglaser commented Aug 20, 2024

https://intel-ci.intel.com/ef5f1a8a-fd29-f1cd-b7c8-a4bf010d0e2e

Contributor Author

olegkkruglov commented Aug 22, 2024

/intelci: run

icfaust reviewed

Contributor

icfaust left a comment

Most of my comments are about the tests, as the inheritance is rather straightforward. Most are semantic questions, since we are setting a precedent for the other incremental algos. Please ping me when the changes related to the review are complete and I will re-review.

sklearnex/spmd/basic_statistics/tests/test_incremental_basic_statistics_spmd.py (outdated, resolved)

sklearnex/spmd/basic_statistics/tests/test_incremental_basic_statistics_spmd.py (outdated, resolved)

sklearnex/spmd/basic_statistics/tests/test_incremental_basic_statistics_spmd.py (outdated, resolved)

sklearnex/spmd/basic_statistics/tests/test_incremental_basic_statistics_spmd.py (resolved)

sklearnex/spmd/basic_statistics/tests/test_incremental_basic_statistics_spmd.py (outdated, resolved)

sklearnex/spmd/basic_statistics/tests/test_incremental_basic_statistics_spmd.py (resolved)

sklearnex/spmd/basic_statistics/tests/test_incremental_basic_statistics_spmd.py (outdated, resolved)

sklearnex/spmd/basic_statistics/tests/test_incremental_basic_statistics_spmd.py (outdated, resolved)

onedal/spmd/basic_statistics/incremental_basic_statistics.py (outdated, resolved)

onedal/spmd/basic_statistics/incremental_basic_statistics.py (outdated, resolved)

icfaust mentioned this pull request

ENH: SPMD interface for IncrementalEmpiricalCovariance #1941

Merged

8 tasks

icfaust reviewed

Contributor

icfaust left a comment

Follow up changes associated with tolerance and dtyping, please also see my note on support_usm_ndarray.

sklearnex/spmd/basic_statistics/tests/test_incremental_basic_statistics_spmd.py

+                  dpt_data = _convert_to_dataframe(data, sycl_queue=queue, target_df=dataframe)
+                  local_dpt_data = _convert_to_dataframe(
+                      _get_local_tensor(data), sycl_queue=queue, target_df=dataframe

Contributor

icfaust Aug 28, 2024

Please be explicit with dtype in _convert_to_dataframe as per the other comment

Contributor

samir-nasibli Aug 30, 2024

_convert_to_dataframe will not work for array API inputs, for example; the array API case should be handled in _convert_to_dataframe.
Even an explicitly provided dtype doesn't guarantee that the result will have the same dtype. I think asserts after _convert_to_dataframe are preferable.
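
A minimal sketch of the "assert after conversion" idea follows. Everything in it is hypothetical: FakeArray and convert_stub are stand-ins for a device array and for a converter such as _convert_to_dataframe, and the float32 fallback is an assumed scenario chosen only to show how a requested dtype and the resulting dtype could differ.

```python
from dataclasses import dataclass


@dataclass
class FakeArray:
    """Hypothetical stand-in for a device array; it only carries a dtype name."""
    values: list
    dtype: str


def convert_stub(values, dtype="float64", device_supports_fp64=True):
    """Hypothetical converter that may silently fall back to float32 (assumed behavior)."""
    actual = dtype
    if dtype == "float64" and not device_supports_fp64:
        actual = "float32"
    return FakeArray(list(values), actual)


def assert_dtype(arr, expected):
    # Check the dtype of the converted object instead of trusting the request.
    assert arr.dtype == expected, f"expected {expected}, got {arr.dtype}"
    return arr


# The requested dtype is honored, so the check passes.
ok = assert_dtype(convert_stub([1.0, 2.0], dtype="float64"), "float64")

# The requested dtype is silently changed, so the check catches it.
try:
    assert_dtype(
        convert_stub([1.0, 2.0], dtype="float64", device_supports_fp64=False),
        "float64",
    )
    mismatch_caught = False
except AssertionError:
    mismatch_caught = True
```

The check inspects the converted object rather than the arguments, so it still holds if the converter changes the dtype for reasons the test author did not anticipate.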

Contributor Author

olegkkruglov Aug 30, 2024

I still don't see why this should be done only here and not in the many other places where this call occurs in our repo. As I understand it, it will become completely unnecessary once we test _convert_to_dataframe separately, which is already planned. Moreover, checking the dtype of the _convert_to_dataframe result is not straightforward, as far as I understand, because the return value may have different types depending on the arguments of _convert_to_dataframe. That would noticeably increase the test size, which I don't like, and I don't think it belongs here.

sklearnex/spmd/basic_statistics/tests/test_incremental_basic_statistics_spmd.py (resolved)

olegkkruglov requested review from maria-Petrova, napetrov and bdmoore1 as code owners

August 28, 2024 16:03

bdmoore1 approved these changes

olegkkruglov force-pushed the incbs-spmd branch from eadb1d7 to a9ab64e Compare

August 28, 2024 16:39

olegkkruglov removed the request for review from maria-Petrova

August 28, 2024 16:49

Contributor

samir-nasibli commented Aug 30, 2024

/intelci:run

samir-nasibli reviewed

onedal/spmd/basic_statistics/incremental_basic_statistics.py (outdated, resolved)

sklearnex/spmd/basic_statistics/tests/test_incremental_basic_statistics_spmd.py

+                  "dataframe,queue",
+                  get_dataframes_and_queues(dataframe_filter_="dpnp,dpctl", device_filter_="gpu"),
+              )
+              @pytest.mark.parametrize("weighted", [True, False])

Contributor

samir-nasibli Aug 30, 2024

Just add the ids weighted and non_weighted for better logs.

Contributor Author

olegkkruglov Aug 30, 2024

could you clarify please or give the example?

Contributor

samir-nasibli Aug 30, 2024

This could be done in a follow-up refactoring. It is not critical.

@pytest.mark.parametrize("weighted", [True, False], ids=["weighted", "non_weighted"])

samir-nasibli reviewed

Contributor

samir-nasibli left a comment

Please provide the latest green CI run after all comments are addressed. Generally, it looks good to me. Just the minor comments should be addressed. I am OK with doing some follow-up work after the merge.

Contributor

samir-nasibli commented Aug 30, 2024

Please rebase and run internal CI as well

icfaust approved these changes

Contributor

icfaust left a comment

Approval contingent on:

@ethanglaser 's weigh-in on this comment: https://github.com/intel/scikit-learn-intelex/pull/1961/files#r1732104503

Implementing @samir-nasibli 's suggestions from his review (negotiate with him on which to implement/not implement)

@ethanglaser 's suggestion to have a follow-up on the gold data DRY issue #1961 (comment)

@ethanglaser 's suggestion to have a follow-up refactor of spmd (#1961 (comment))

@samir-nasibli 's suggestion to have testing implemented for _convert_to_dataframe and adding assert statements (#1961 (comment))

When we have these odds and ends complete then we can merge. I can write the ticket for @samir-nasibli 's suggestion for _convert_to_supported testing, would @olegkkruglov or @ethanglaser write the tickets for the other two? I think this should be doable today.

olegkkruglov added 17 commits

August 30, 2024 05:11


          Add IncrementalBasicStatisticsSPMD

08327e7


          Fix __init__.py

94f6925


          Remove accidentally added comment

65eda21


          Fix spmd estimator setting on sklearnex side

690d7fa


          Rename class


          Fix lint

bd505e5


          Add weights generator

1d2de5d


          Fix dtype and tests

2cbcc47


          Address comments

b322137


          dtype update

a7acfaf


          Remove finalize_fit definition

403c606


          Removed support_usm_nd_array

aab119e


          Revert accidentally pushed changes in docs

e21b495


          Address comments

3c5f7cb


          Remove generate_weights

88307b3


          Rename class reference

b10b58a


          Update self._queue in every partial_fit call

4bc41d0

olegkkruglov force-pushed the incbs-spmd branch from 23d7f4c to 4bc41d0 Compare

August 30, 2024 12:12

Contributor Author

olegkkruglov commented Aug 30, 2024

/intelci: run

Contributor Author

olegkkruglov commented Aug 30, 2024

https://intel-ci.intel.com/ef66d25b-494b-f180-8e51-a4bf010d0e2e

Contributor Author

olegkkruglov commented Aug 30, 2024

@ethanglaser 's suggestion to have a follow-up on the gold data DRY issue #1961 (comment)

@ethanglaser 's suggestion to have a follow-up refactor of spmd (#1961 (comment))

ticket 8352 covers both


          Change naming for base class reference

8d5bc56

samir-nasibli approved these changes

Contributor

samir-nasibli left a comment

I expect all follow-up tickets to be addressed. Thank you for the work done!

olegkkruglov merged commit 7ecc9f1 into uxlfoundation:main

23 checks passed

icfaust mentioned this pull request

ENH: SPMD interface for IncrementalPCA #1979

Merged

10 tasks

Reviewers

ethanglaser left review comments

bdmoore1 approved these changes

samir-nasibli approved these changes

icfaust approved these changes

Alexsandruss: awaiting requested review (Alexsandruss is a code owner)

Labels

enhancement testing