ENH: SPMD interface for IncrementalBasicStatistics #1961

Merged: 18 commits, Sep 2, 2024

Conversation

olegkkruglov
Contributor

Description

  • Added an SPMD interface for IncrementalBasicStatistics.
  • Changed the policy-saving workflow: the queue is now saved as an attribute instead of the policy. This is necessary because, on the oneDAL side, finalize_fit requires spmd_policy while partial_fit requires data_parallel_policy.
  • finalize_fit now uses the provided queue for computations on the onedal4py side.
  • Contains some content from "TEST: test coverage for sklearnex SPMD ifaces" (#1777) for the test implementation.
  • I have reviewed my changes thoroughly before submitting this pull request.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation to reflect the changes, if necessary.
  • The unit tests pass successfully.
  • I have run it locally and tested the changes extensively.
  • I have resolved any merge conflicts that might occur with the base branch.
  • Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details)
  • I have added a respective label(s) to PR if I have a permission for that.
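
The policy-saving change described in the list above can be illustrated with a minimal, self-contained sketch. It is not the onedal4py implementation: `DataParallelPolicy`, `SpmdPolicy`, and `IncrementalEstimatorSketch` are hypothetical stand-ins, and the "queue" is just an opaque value. The only point is that the estimator keeps the queue and builds whichever policy type each step needs, rather than caching a single policy object. No cross-rank communication is modeled.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class DataParallelPolicy:
    """Hypothetical stand-in for a single-device policy."""
    queue: Any


@dataclass
class SpmdPolicy:
    """Hypothetical stand-in for a multi-rank (SPMD) policy."""
    queue: Any


class IncrementalEstimatorSketch:
    """Toy incremental mean that stores the queue, not a policy."""

    def __init__(self) -> None:
        self._queue: Optional[Any] = None
        self._partial_sum = 0.0
        self._count = 0
        self.policies_used: List[str] = []  # recorded for illustration only

    def partial_fit(self, batch, queue=None):
        # Save the queue itself so later steps can build their own policy.
        self._queue = queue
        policy = DataParallelPolicy(queue)
        self.policies_used.append(type(policy).__name__)
        self._partial_sum += sum(batch)
        self._count += len(batch)
        return self

    def finalize_fit(self):
        # Build a different policy type from the same saved queue.
        policy = SpmdPolicy(self._queue)
        self.policies_used.append(type(policy).__name__)
        self.mean_ = self._partial_sum / self._count
        return self


est = IncrementalEstimatorSketch()
est.partial_fit([1.0, 2.0], queue="gpu-queue")
est.partial_fit([3.0], queue="gpu-queue")
est.finalize_fit()
print(est.mean_, est.policies_used)
```

If the policy object were cached instead of the queue, the finalize step in this sketch would be stuck with whatever policy type the partial step created, which is the mismatch the description refers to.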

@olegkkruglov
Contributor Author

/intelci:run

@olegkkruglov added the enhancement (New feature or request) and testing (Tests for sklearnex/daal4py/onedal4py & patching sklearn) labels Jul 25, 2024
@olegkkruglov
Contributor Author

/intelci: run

@olegkkruglov
Contributor Author

/intelci: run

3 similar comments
@olegkkruglov
Contributor Author

/intelci: run

@ethanglaser
Contributor

/intelci: run

@ethanglaser
Contributor

/intelci: run

@uxlfoundation uxlfoundation deleted a comment from olegkkruglov Aug 20, 2024
@ethanglaser
Contributor

@olegkkruglov
Contributor Author

/intelci: run

Contributor

@icfaust left a comment

Most of my comments are about the tests, as the inheritance is rather straightforward. Most are semantic questions, since we are setting a precedent for the other incremental algos. Please ping me when the changes related to the review are complete and I will re-review.

Contributor

@icfaust left a comment

Follow-up changes associated with tolerance and dtyping; please also see my note on support_usm_ndarray.

dpt_data = _convert_to_dataframe(data, sycl_queue=queue, target_df=dataframe)

local_dpt_data = _convert_to_dataframe(
_get_local_tensor(data), sycl_queue=queue, target_df=dataframe
Contributor

Please be explicit with dtype in _convert_to_dataframe as per the other comment

Contributor

This _convert_to_dataframe call will not work for array API inputs, for example; the array API case should be handled in _convert_to_dataframe itself.
Even an explicitly provided dtype doesn't guarantee that the result will have the same dtype. I think asserts after _convert_to_dataframe are preferable.

Contributor Author

I still don't get why this should be done only here and not in the many other occurrences of this call in our repo. In my understanding, it will become completely unnecessary once we test _convert_to_dataframe separately, which is already planned. Moreover, as far as I understand, checking the dtype of the _convert_to_dataframe result is not straightforward, because the return value may have different types depending on the arguments passed to _convert_to_dataframe. So it would lead to a noticeable increase in test size, which I don't like and don't think belongs here.
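
For readers following this thread, the assert-after-conversion idea can be sketched roughly as below. This is not the repository's `_convert_to_dataframe`: `FakeArray`, `fake_convert`, and `convert_checked` are hypothetical names, and the silent float64-to-float32 downgrade is invented purely to show a case where a requested dtype is not honored.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeArray:
    """Hypothetical container standing in for a device array."""
    values: List[float]
    dtype: str


def fake_convert(data, dtype="float64", device_supports_fp64=True):
    """Hypothetical converter that may silently change the dtype."""
    if dtype == "float64" and not device_supports_fp64:
        dtype = "float32"  # the requested dtype is not honored here
    return FakeArray([float(v) for v in data], dtype)


def convert_checked(data, dtype, **kwargs):
    """Convert, then assert that the result has the requested dtype."""
    result = fake_convert(data, dtype=dtype, **kwargs)
    assert result.dtype == dtype, f"expected {dtype}, got {result.dtype}"
    return result


checked = convert_checked([1, 2, 3], "float32")
print(checked.dtype)
```

Wrapping the check in one helper is one way to keep individual tests short, which speaks to the test-size concern raised above; whether that trade-off is worthwhile is exactly what the thread leaves for the follow-up.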

@samir-nasibli
Contributor

/intelci:run

"dataframe,queue",
get_dataframes_and_queues(dataframe_filter_="dpnp,dpctl", device_filter_="gpu"),
)
@pytest.mark.parametrize("weighted", [True, False])
Contributor

Just add the ids weighted and non_weighted for better logs.

Contributor Author

Could you please clarify or give an example?

Contributor

This could be done in a follow-up refactoring. Non-critical.

@pytest.mark.parametrize("weighted", [True, False], ids=["weighted", "non_weighted"])

Contributor

@samir-nasibli left a comment

Please provide the latest green CI run after all comments are addressed. Generally, this looks good to me; only minor comments need to be addressed. I am OK with doing some follow-up work after the merge.

@samir-nasibli
Contributor

Please rebase and run internal CI as well

Contributor

@icfaust left a comment

Approval contingent on:

@ethanglaser's weigh-in on this comment: https://github.com/intel/scikit-learn-intelex/pull/1961/files#r1732104503

Implementing @samir-nasibli's suggestions from his review (negotiate with him on which to implement and which not to)

@ethanglaser's suggestion to have a follow-up on the gold data DRY issue #1961 (comment)

@ethanglaser's suggestion to have a follow-up refactor of spmd (#1961 (comment))

@samir-nasibli's suggestion to have testing implemented for _convert_to_dataframe and adding assert statements (#1961 (comment))

Once these odds and ends are complete, we can merge. I can write the ticket for @samir-nasibli's suggestion for _convert_to_supported testing; would @olegkkruglov or @ethanglaser write the tickets for the other two? I think this should be doable today.

@olegkkruglov
Contributor Author

/intelci: run

@olegkkruglov
Contributor Author

olegkkruglov commented Aug 30, 2024

@olegkkruglov
Contributor Author

@ethanglaser's suggestion to have a follow-up on the gold data DRY issue #1961 (comment)

@ethanglaser's suggestion to have a follow-up refactor of spmd (#1961 (comment))

Ticket 8352 covers both.

Contributor

@samir-nasibli left a comment

Expecting all follow-up tickets to be addressed. Thank you for the work done!

@olegkkruglov olegkkruglov merged commit 7ecc9f1 into uxlfoundation:main Sep 2, 2024
23 checks passed
@icfaust icfaust mentioned this pull request Sep 3, 2024
10 tasks
Labels
enhancement (New feature or request), testing (Tests for sklearnex/daal4py/onedal4py & patching sklearn)
5 participants