Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: SPMD interface for IncrementalPCA #1979

Merged
merged 14 commits into from
Sep 5, 2024

Conversation

olegkkruglov
Copy link
Contributor

@olegkkruglov olegkkruglov commented Jul 31, 2024

Description

  • Added SPMD interface for IncrementalEmpiricalCovariance
  • Changed policy saving workflow, now queue is saved to attributes instead of policy. It is necessary because finalize_fit requires spmd_policy, but partial_fit requires data_parallel_policy on oneDAL side
  • finalize_fit now uses provided queue for computations on onedal4py side.
  • Contains some content from TEST: test coverage for sklearnex SPMD ifaces #1777 for test implementation

Checklist to comply with before moving PR from draft:

PR completeness and readability

  • I have reviewed my changes thoroughly before submitting this pull request.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation to reflect the changes or created a separate PR with update and provided its number in the description, if necessary.
  • Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
  • I have added a respective label(s) to PR if I have a permission for that.
  • I have resolved any merge conflicts that might occur with the base branch.

Testing

  • The unit tests pass successfully.
  • I have run it locally and tested the changes extensively.

Performance

  • I have measured performance for affected algorithms using scikit-learn_bench and provided at least summary table with measured data, if performance change is expected.
  • I have provided justification why performance has changed or why changes are not expected.

@olegkkruglov olegkkruglov added enhancement New feature or request testing Tests for sklearnex/daal4py/onedal4py & patching sklearn labels Jul 31, 2024
@samir-nasibli
Copy link
Contributor

@olegkkruglov please rebase your branch

@ethanglaser
Copy link
Contributor

/intelci: run

@ethanglaser
Copy link
Contributor

@olegkkruglov
Copy link
Contributor Author

/intelci: run

@samir-nasibli
Copy link
Contributor

Description

  • Added SPMD interface for IncrementalEmpiricalCovariance
  • Changed policy saving workflow, now queue is saved to attributes instead of policy. It is necessary because finalize_fit requires spmd_policy, but partial_fit requires data_parallel_policy on oneDAL side
  • finalize_fit now uses provided queue for computations on onedal4py side.
  • Contains some content from TEST: test coverage for sklearnex SPMD ifaces #1777 for test implementation

Checklist to comply with before moving PR from draft:

PR completeness and readability

  • I have reviewed my changes thoroughly before submitting this pull request.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation to reflect the changes or created a separate PR with update and provided its number in the description, if necessary.
  • Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
  • I have added a respective label(s) to PR if I have a permission for that.
  • I have resolved any merge conflicts that might occur with the base branch.

Testing

  • The unit tests pass successfully.
  • I have run it locally and tested the changes extensively.

Performance

  • I have measured performance for affected algorithms using scikit-learn_bench and provided at least summary table with measured data, if performance change is expected.
  • I have provided justification why performance has changed or why changes are not expected.

@olegkkruglov Is it possible to update and use global queue in scikit-learn-intelex for the finalize_fit? Could be done as a followup for all Incr algos, just to remove storing queue in models itself.

Copy link
Contributor

@samir-nasibli samir-nasibli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@olegkkruglov All CI checks should be included to the PR before making it ready for the review. For SPMD algos this is mandatory to have internal CI run.

onedal/decomposition/incremental_pca.py Outdated Show resolved Hide resolved
onedal/spmd/decomposition/incremental_pca.py Outdated Show resolved Hide resolved
onedal/spmd/decomposition/incremental_pca.py Outdated Show resolved Hide resolved
onedal/spmd/decomposition/incremental_pca.py Show resolved Hide resolved
@samir-nasibli
Copy link
Contributor

/intelci: run

@olegkkruglov
Copy link
Contributor Author

/intelci: run

@olegkkruglov
Copy link
Contributor Author

@olegkkruglov
Copy link
Contributor Author

@olegkkruglov Is it possible to update and use global queue in scikit-learn-intelex for the finalize_fit? Could be done as a followup for all Incr algos, just to remove storing queue in models itself

Probably it is possible but what's the point? Serialization problem is already resolved and I don't see any other problems with it

Copy link
Contributor

@icfaust icfaust left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't comment much on the implementation of the spmd object side of things, since I really didn't review the initial IncrementalPCA PR. My focus was on similar issue as the other Incremental SPMD PRs for 2025/ testing. I assume the _create_model method comes from that implementation, and that it was justified for sklearn conformance reasons.

Copy link
Contributor

@samir-nasibli samir-nasibli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rebase your branch and run CI

onedal/spmd/decomposition/incremental_pca.py Outdated Show resolved Hide resolved
onedal/spmd/decomposition/incremental_pca.py Show resolved Hide resolved
@icfaust
Copy link
Contributor

icfaust commented Sep 4, 2024

@olegkkruglov ping me when you want another review.

Copy link
Contributor

@samir-nasibli samir-nasibli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Expecting other reviewers approvals.
Assuming green CI

@olegkkruglov
Copy link
Contributor Author

/intelci: run

Copy link
Contributor

@icfaust icfaust left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merge whenever you are ready.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting that spmd incremental pca tests are passing but not normal pca spmd

@olegkkruglov olegkkruglov merged commit 9f63db2 into uxlfoundation:main Sep 5, 2024
9 of 11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request testing Tests for sklearnex/daal4py/onedal4py & patching sklearn
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants