Discretize: rounding problem #6876

Open
ZanMervic opened this issue Aug 21, 2024 · 1 comment
Labels
bug A bug confirmed by the core team snack This will take an hour or two

Comments

@ZanMervic
Contributor

What's wrong?

Using PCA on the Titanic dataset and then discretizing the output produces strangely rounded values in the discretization. As a result, multiple values end up with the same "name".

This is the workflow I have (I have also included the .ows):
image

On the left is the Data Table showing the PCA results (note the PC7 attribute). On the right are the results of the Discretize widget: the discretized PC7 attribute has been rounded strangely, and several PC7 values share the same "name" (highlighted).

image

How can we reproduce the problem?
Zip of the workflow:
discretize_bug.zip

To reproduce the problem, set the PCA components to 8 in the provided workflow.

image

What's your environment?

  • Operating system: Windows 10
  • Orange version: 3.38
  • How you installed Orange: Using pip in a conda environment
@ZanMervic ZanMervic added the bug report Bug is reported by user, not yet confirmed by the core team label Aug 21, 2024
@janezd
Contributor

janezd commented Aug 21, 2024

Rounding was introduced in df34d90.

I think we could (and probably should) simply add bins_ = np.unique(bins_) after rounding.

Decimal binning doesn't guarantee the exact number of intervals specified by the user; it returns the closest match across the different possible "nice" thresholds. If rounding followed by unique decreases the number of intervals for a certain bin width, the method may choose another (smaller) width or return a smaller number of bins; both are OK.
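
To illustrate the idea, here is a minimal standalone sketch of rounding followed by np.unique. It is not Orange's actual code: the helper name round_and_dedupe, the sample thresholds, and the number of decimals are all made up for illustration, and only np.round and np.unique are real NumPy calls.

```python
import numpy as np


def round_and_dedupe(thresholds, decimals):
    """Round bin thresholds, then drop the duplicates that rounding created.

    Hypothetical helper for illustration; not part of Orange.
    """
    rounded = np.round(np.asarray(thresholds, dtype=float), decimals)
    # np.unique returns the sorted distinct values, so thresholds that
    # collapse to the same rounded number are kept only once.
    return np.unique(rounded)


# Made-up thresholds that are distinct before rounding but collide
# once rounded to three decimals.
raw = np.array([0.0012, 0.0014, 0.0026, 0.0031])
rounded_only = np.round(raw, 3)
deduped = round_and_dedupe(raw, 3)

print(rounded_only)
print(deduped)
```

Without the np.unique step, the rounded array still has four entries, and adjacent intervals get identical labels. With it, fewer thresholds remain, which matches the point above that ending up with fewer bins is acceptable.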

If you wish, change this (don't forget to add a simple test).

@janezd janezd added bug A bug confirmed by the core team snack This will take an hour or two and removed bug report Bug is reported by user, not yet confirmed by the core team labels Aug 21, 2024