-
Notifications
You must be signed in to change notification settings - Fork 179
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: SPMD interface for IncrementalBasicStatistics #1961
Conversation
/intelci:run |
/intelci: run |
4c1faad
to
26a5f85
Compare
/intelci: run |
3 similar comments
/intelci: run |
/intelci: run |
/intelci: run |
/intelci: run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of my comments are about the tests, as the inheritance is rather straightforward. Most are semantic questions since we are setting precedence for the other incremental algos. Please ping me when the changes related to the review are complete and I will re-review.
sklearnex/spmd/basic_statistics/tests/test_incremental_basic_statistics_spmd.py
Outdated
Show resolved
Hide resolved
sklearnex/spmd/basic_statistics/tests/test_incremental_basic_statistics_spmd.py
Outdated
Show resolved
Hide resolved
sklearnex/spmd/basic_statistics/tests/test_incremental_basic_statistics_spmd.py
Outdated
Show resolved
Hide resolved
sklearnex/spmd/basic_statistics/tests/test_incremental_basic_statistics_spmd.py
Show resolved
Hide resolved
sklearnex/spmd/basic_statistics/tests/test_incremental_basic_statistics_spmd.py
Outdated
Show resolved
Hide resolved
sklearnex/spmd/basic_statistics/tests/test_incremental_basic_statistics_spmd.py
Show resolved
Hide resolved
sklearnex/spmd/basic_statistics/tests/test_incremental_basic_statistics_spmd.py
Outdated
Show resolved
Hide resolved
sklearnex/spmd/basic_statistics/tests/test_incremental_basic_statistics_spmd.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Follow up changes associated with tolerance and dtyping, please also see my note on support_usm_ndarray.
dpt_data = _convert_to_dataframe(data, sycl_queue=queue, target_df=dataframe) | ||
|
||
local_dpt_data = _convert_to_dataframe( | ||
_get_local_tensor(data), sycl_queue=queue, target_df=dataframe |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please be explicit with dtype in _convert_to_dataframe as per the other comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_convert_to_dataframe
this will not work for array api for example, array api case should be updated in _convert_to_dataframe
.
Even explicitly provided dtype doesn't guarantee that this will be the same dtpye. I think, asserts after _convert_to_dataframe
more preferable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i still can't get why should it be only here and not in tons of other occurrences of this call in our repo. in my understanding it will become completely unnecessary after we test _convert_to_dataframe separately which is already planned. moreover, dtype check of the _convert_to_dataframe result is not straightforward as far as i understand because the returning value may have different types depending on arguments of convert_to_dataframe. so it would lead to perceptible test size increase which i don't like and don't think it should be here
sklearnex/spmd/basic_statistics/tests/test_incremental_basic_statistics_spmd.py
Show resolved
Hide resolved
eadb1d7
to
a9ab64e
Compare
/intelci:run |
dpt_data = _convert_to_dataframe(data, sycl_queue=queue, target_df=dataframe) | ||
|
||
local_dpt_data = _convert_to_dataframe( | ||
_get_local_tensor(data), sycl_queue=queue, target_df=dataframe |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_convert_to_dataframe
this will not work for array api for example, array api case should be updated in _convert_to_dataframe
.
Even explicitly provided dtype doesn't guarantee that this will be the same dtpye. I think, asserts after _convert_to_dataframe
more preferable.
"dataframe,queue", | ||
get_dataframes_and_queues(dataframe_filter_="dpnp,dpctl", device_filter_="gpu"), | ||
) | ||
@pytest.mark.parametrize("weighted", [True, False]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just add indexes weighted
and non_weighted
for better logs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you clarify please or give the example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be done on follow up refactoring. Non critical
@pytest.mark.parametrize("weighted", [True, False], ids=["weighted", "non_weighted"])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please provide latest green CI run after all comments addressed. Generally, looks to me good. Just minor comments should be addressed. I am ok to do some follow-up work after the merge
Please rebase and run internal CI as well |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approval contingent on:
@ethanglaser 's weigh-in on this comment: https://github.com/intel/scikit-learn-intelex/pull/1961/files#r1732104503
Implementing @samir-nasibli 's suggestions from his review (negotiate with him on which to implement/not implement)
@ethanglaser 's suggestion to have a follow-up on the gold data DRY issue #1961 (comment)
@ethanglaser 's suggestion to have a follow-up refactor of spmd (#1961 (comment))
@samir-nasibli 's suggestion to have testing implemented for _convert_to_dataframe
and adding assert statements (#1961 (comment))
When we have these odds and ends complete then we can merge. I can write the ticket for @samir-nasibli 's suggestion for _convert_to_supported
testing, would @olegkkruglov or @ethanglaser write the tickets for the other two? I think this should be doable today.
23d7f4c
to
4bc41d0
Compare
/intelci: run |
ticket 8352 covers both |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Expecting all follow up tickets addressed. Thank you for the work done!
Description
finalize_fit
requiresspmd_policy
, butpartial_fit
requiresdata_parallel_policy
on oneDAL sidefinalize_fit
now uses provided queue for computations on onedal4py side.