[ENH] Sort out clustering base class #2251

Open: wants to merge 16 commits into main

TonyBagnall
Contributor

@TonyBagnall TonyBagnall commented Oct 25, 2024

Fixes #1530

  • The predict function has a y input; this is inconsistent with other collection classes and with predict_proba in the same class.
  • The score function is functionally different from the other base classes and is a required method. Can all algorithms produce scores like this? What about pipelines, etc.?
  • fit_predict does not have a private equivalent, unlike other classes. Also, by default it seems like this should return labels_ rather than calling _predict?
  • n_clusters is a required attribute, but not all algorithms have a fixed number of clusters.
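A minimal sketch of how a base class could address the four points above, assuming the usual aeon pattern of public methods that check state and delegate to private `_`-prefixed methods. `BaseClustererSketch`, `MeanThresholdClusterer` and the "score is optional" default are hypothetical names and choices for illustration, not the code in this PR: `predict` takes no `y`, `fit_predict` gets a private `_fit_predict` that reuses `labels_` when available, `_score` is optional, and nothing requires `n_clusters`.

```python
import numpy as np


class BaseClustererSketch:
    """Hypothetical base class: public methods check state, private ones do the work."""

    def fit(self, X):
        self._fit(np.asarray(X))
        self.is_fitted = True
        return self

    def predict(self, X):
        # no y argument, matching predict_proba and the other collection bases
        self._check_is_fitted()
        return self._predict(np.asarray(X))

    def predict_proba(self, X):
        self._check_is_fitted()
        return self._predict_proba(np.asarray(X))

    def fit_predict(self, X):
        # public/private pair, like fit/_fit and predict/_predict
        preds = self._fit_predict(np.asarray(X))
        self.is_fitted = True
        return preds

    def score(self, X, y=None):
        self._check_is_fitted()
        return self._score(np.asarray(X), y)

    def _fit_predict(self, X):
        # default: fit once, then reuse labels_ if the algorithm stored them
        self._fit(X)
        if hasattr(self, "labels_"):
            return self.labels_
        return self._predict(X)

    def _score(self, X, y=None):
        # optional: not every clusterer (or pipeline) has a natural score
        raise NotImplementedError(f"{type(self).__name__} does not implement score")

    def _check_is_fitted(self):
        if not getattr(self, "is_fitted", False):
            raise ValueError(f"{type(self).__name__} has not been fitted yet")

    # subclasses override these three; the defaults just raise
    def _fit(self, X):
        raise NotImplementedError

    def _predict(self, X):
        raise NotImplementedError

    def _predict_proba(self, X):
        raise NotImplementedError


class MeanThresholdClusterer(BaseClustererSketch):
    """Toy clusterer splitting cases at the mean of their means; no n_clusters attribute."""

    def _fit(self, X):
        case_means = X.reshape(X.shape[0], -1).mean(axis=1)
        self.threshold_ = case_means.mean()
        self.labels_ = (case_means > self.threshold_).astype(int)
        return self

    def _predict(self, X):
        case_means = X.reshape(X.shape[0], -1).mean(axis=1)
        return (case_means > self.threshold_).astype(int)

    def _predict_proba(self, X):
        # one-hot rows for the two possible labels
        return np.eye(2)[self._predict(X)]


# shape (n_cases, n_channels, n_timepoints)
X = np.array([[[0.0, 1.0]], [[9.0, 10.0]], [[1.0, 0.0]], [[10.0, 9.0]]])
clst = MeanThresholdClusterer()
print(clst.fit_predict(X))  # [0 1 0 1]
print(clst.predict(np.array([[[2.0, 2.0]], [[8.0, 8.0]]])))  # [0 1]
```

Calling `clst.score(X)` here raises `NotImplementedError`, which is one possible answer to "can all algorithms produce scores like this?": the base class exposes the method, but only estimators with a meaningful score implement it.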

@TonyBagnall TonyBagnall added the clustering Clustering package label Oct 25, 2024
@aeon-actions-bot aeon-actions-bot bot added the enhancement New feature, improvement request or other non-bug code enhancement label Oct 25, 2024
@aeon-actions-bot
Contributor

Thank you for contributing to aeon

I have added the following labels to this PR based on the title: [ enhancement ].
I would have added the following labels to this PR based on the changes made: [ clustering ], however some package labels are already present.

The Checks tab will show the status of our automated tests. You can click on individual test runs in the tab or "Details" in the panel below to see more information if there is a failure.

If our pre-commit code quality check fails, any trivial fixes will automatically be pushed to your PR unless it is a draft.

Don't hesitate to ask questions on the aeon Slack channel if you have any.

PR CI actions

These checkboxes will add labels to enable/disable CI functionality for this PR. This may not take effect immediately, and a new commit may be required to run the new configuration.

  • Run pre-commit checks for all files
  • Run mypy typecheck tests
  • Run all pytest tests and configurations
  • Run all notebook example tests
  • Run numba-disabled codecov tests
  • Stop automatic pre-commit fixes (always disabled for drafts)
  • Disable numba cache loading
  • Push an empty commit to re-run CI checks

@TonyBagnall TonyBagnall closed this Nov 2, 2024
@TonyBagnall TonyBagnall deleted the ajb/base_clst branch November 2, 2024 12:00
@TonyBagnall TonyBagnall restored the ajb/base_clst branch November 4, 2024 10:15
@TonyBagnall TonyBagnall reopened this Nov 4, 2024
@TonyBagnall TonyBagnall marked this pull request as ready for review November 4, 2024 10:49

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

Member

@MatthewMiddlehurst MatthewMiddlehurst left a comment

@hadifawaz1999 is this the one you wanted delayed?

Approving, assuming that's fine. Left some comments.

Comment on lines +198 to +201
if hasattr(self, "n_clusters"):
    n_clusters = self.n_clusters
else:
    n_clusters = len(np.unique(preds))
Member

I think this could be risky down the line if some methods generate n_clusters as something other than an int, but there is no solution for now other than always using unique.
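One way to contain the risk described above is to trust `n_clusters` only when it is a plain integer and otherwise fall back to counting unique predictions. This is a minimal sketch; `resolve_n_clusters` is a hypothetical helper name, not a function in aeon.

```python
from numbers import Integral
from types import SimpleNamespace

import numpy as np


def resolve_n_clusters(clusterer, preds):
    """Hypothetical helper: use clusterer.n_clusters only when it is a plain integer."""
    n_clusters = getattr(clusterer, "n_clusters", None)
    # bool is a subclass of int, so exclude it explicitly
    if isinstance(n_clusters, Integral) and not isinstance(n_clusters, bool):
        return int(n_clusters)
    # missing, None, "auto", a float, ...: count the labels actually predicted
    return len(np.unique(preds))


print(resolve_n_clusters(SimpleNamespace(n_clusters=3), [0, 1]))  # 3
print(resolve_n_clusters(SimpleNamespace(n_clusters="auto"), [0, 1, 1, 4]))  # 3
print(resolve_n_clusters(SimpleNamespace(), [2, 2]))  # 1
```

The trade-off is visible in the first call: a declared integer wins even when fewer labels appear in `preds`, which keeps output widths (e.g. probability columns) stable, whereas always using `np.unique` would return 2 there.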

@@ -173,10 +173,10 @@ def __init__(
self.save_last_model = save_last_model
self.best_file_name = best_file_name
self.random_state = random_state
self.n_clusters = n_clusters
Member

Can keep this for now, but I think @hadifawaz1999 said these would be better removed (n_clusters should be set through the estimator that is passed in).

Member

I think it should not be added, as we're going to remove it in a few days.
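A minimal sketch of the "n_clusters should be set through the estimator" idea discussed above: the deep clusterer stores no `n_clusters` of its own and delegates clustering of its embeddings to an `estimator` object. `DeepClustererSketch` and `TinyKMeans` are hypothetical stand-ins (the encoder is replaced by a plain flatten, and `TinyKMeans` stands in for a real k-means), so aeon's actual deep clusterer API may differ.

```python
import copy

import numpy as np


class TinyKMeans:
    """Hypothetical stand-in for a clustering estimator (plain k-means, first-k init)."""

    def __init__(self, n_clusters=2, n_iter=10):
        self.n_clusters = n_clusters
        self.n_iter = n_iter

    def _assign(self, X):
        # squared Euclidean distance from every case to every centre
        dists = ((X[:, None, :] - self.cluster_centers_[None, :, :]) ** 2).sum(axis=2)
        return dists.argmin(axis=1)

    def fit(self, X):
        self.cluster_centers_ = X[: self.n_clusters].astype(float).copy()
        for _ in range(self.n_iter):
            labels = self._assign(X)
            for k in range(self.n_clusters):
                if np.any(labels == k):
                    self.cluster_centers_[k] = X[labels == k].mean(axis=0)
        self.labels_ = self._assign(X)
        return self

    def predict(self, X):
        return self._assign(X)


class DeepClustererSketch:
    """Hypothetical deep clusterer: the number of clusters lives on `estimator` only."""

    def __init__(self, estimator=None):
        self.estimator = estimator

    def _embed(self, X):
        # placeholder for a trained encoder: flatten each (n_channels, n_timepoints) case
        return X.reshape(X.shape[0], -1)

    def fit(self, X):
        base = self.estimator if self.estimator is not None else TinyKMeans()
        # copy so the user's estimator object is left untouched
        self.estimator_ = copy.deepcopy(base)
        self.estimator_.fit(self._embed(X))
        self.labels_ = self.estimator_.labels_
        return self

    def predict(self, X):
        return self.estimator_.predict(self._embed(X))


X = np.array([[[0.0, 0.0]], [[10.0, 10.0]], [[0.1, 0.1]], [[10.1, 10.1]]])
clst = DeepClustererSketch(estimator=TinyKMeans(n_clusters=2)).fit(X)
print(clst.labels_)  # [0 1 0 1]
print(clst.predict(np.array([[[0.2, 0.2]], [[9.9, 9.9]]])))  # [0 1]
```

With this layout, changing the number of clusters means configuring the estimator, and the base class never has to ask the deep clusterer for `n_clusters`.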

Comment on lines +163 to +164
if hasattr(self, "labels_"):
    return self.labels_
Member

Ideally, everything should have this.

@hadifawaz1999
Member

> @hadifawaz1999 is this the one you wanted delayed?
>
> Approving, assuming that's fine. Left some comments.

No, the one I'm delaying is Chris's deprecation PR, but I think the additions to deep clustering here should not be made, @TonyBagnall, as they're going to be removed in the purge in a few days once all of Aadya's PRs are in.

@TonyBagnall
Contributor Author

Could also do this at the same time, @chrisholder?
#2252

Labels: clustering (Clustering package), enhancement (New feature, improvement request or other non-bug code enhancement)
Projects: None yet
Development

Successfully merging this pull request may close these issues.

[ENH] BaseClusterer inconsistencies with other base classes
4 participants