DOC improve documentation of NCR #1017
base: master
Changes from 4 commits
fdda014
ff4fe04
d9d7613
01af8df
c26157a
8d4508f
9bbc4a4
72b94d0
@@ -347,10 +347,30 @@ place. The class can be used as::
Our implementation lets you choose the number of seeds initially placed in the set
:math:`C` by setting the parameter ``n_seeds_S``.

:class:`NeighbourhoodCleaningRule` focuses on cleaning the data rather than
condensing them :cite:`laurikkala2001improving`. Therefore, it uses the
union of the samples to be rejected by :class:`EditedNearestNeighbours`
and by the output of a 3 nearest neighbors classifier. The class can be used as::

Neighbourhood Cleaning Rule
^^^^^^^^^^^^^^^^^^^^^^^^^^^

The :class:`NeighbourhoodCleaningRule` is another "cleaning" algorithm. It removes
samples from the majority class that are closest to the boundary with the minority
class :cite:`laurikkala2001improving`.

The :class:`NeighbourhoodCleaningRule` expands on the cleaning performed by
:class:`EditedNearestNeighbours` by eliminating additional majority class samples if
they are among the 3 closest neighbours of a sample from the minority class.

Review comment: We have a parameter controlling the

Suggested change

Review comment: Throughout the docs we are using K as the number of neighbours, not N. I guess the n in n_neighbours comes from n=number. I'd rather stick to K if that's alright with you, for consistency. I'll fix this in a separate commit.

Review comment: Actually, I removed this sentence altogether as per below suggestion.

The procedure for the :class:`NeighbourhoodCleaningRule` is as follows:

1. Remove observations from the majority class with edited nearest neighbors (ENN).
2. Remove additional samples from the majority class if they are one of the k closest
   neighbors of a minority sample, where all or most of those neighbors are not minority.

Review comment: Since we are repeating the same sentence as above, I would remove the paragraph above and only go with the bullet point sequence.

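The two steps above can be illustrated with a small, self-contained sketch. It is not the library's implementation: the helper names ``k_nearest`` and ``ncr_sketch`` are hypothetical, the search is a brute-force Euclidean one, step 1 is assumed to drop a majority sample when most of its neighbours disagree with it, and no class-size condition is applied.

```python
import math
from collections import Counter


def k_nearest(X, i, k):
    """Indices of the k samples closest to X[i] (Euclidean), excluding i itself."""
    others = [j for j in range(len(X)) if j != i]
    others.sort(key=lambda j: math.dist(X[i], X[j]))
    return others[:k]


def ncr_sketch(X, y, k=3):
    """Return the sorted indices kept by a simplified two-step cleaning.

    Illustrative only: the minority class is taken to be the least frequent
    label, and both steps look at neighbourhoods in the original data.
    """
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    to_remove = set()
    for i, label in enumerate(y):
        neighbours = k_nearest(X, i, k)
        n_same = sum(y[j] == label for j in neighbours)
        most_disagree = 2 * n_same < len(neighbours)
        if label != minority:
            # Step 1 (ENN-like): drop a majority sample whose neighbours
            # mostly belong to another class.
            if most_disagree:
                to_remove.add(i)
        elif most_disagree:
            # Step 2: this minority sample is surrounded mostly by other
            # classes, so drop its non-minority neighbours.
            to_remove.update(j for j in neighbours if y[j] != minority)
    return [i for i in range(len(X)) if i not in to_remove]


# Toy data: a majority cluster on the left, a minority cluster on the right,
# one majority point inside the minority cluster (index 4) and one minority
# point next to the majority cluster (index 8).
X = [(0, 0), (0, 1), (1, 0), (1, 1), (10.4, 0.4),
     (10, 0), (10, 1), (11, 0), (1.5, 0.4),
     (-1, 0), (-1, 1)]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0]

print(ncr_sketch(X, y, k=3))
```

In this toy data, index 4 is dropped by step 1, while the majority points nearest to index 8 are dropped by step 2 even though step 1 alone would have kept them.
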
To carry out step 2 there is one condition: a sample will only be removed if its class
has a minimum number of observations. The minimum number of observations is regulated
by the ``threshold_cleaning`` parameter. In the original article
:cite:`laurikkala2001improving`, samples would be removed if the class had at
least half as many observations as the minority class.

Review comment: I would not go into details regarding the original paper but instead just phrase that we check that the number of samples in the class to under-sample is above the threshold times the number of samples in the minority class.

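The class-size condition can be sketched in a few lines, following the phrasing in the review comment: a class is cleaned in step 2 only if its size is above ``threshold_cleaning`` times the size of the minority class. The function name ``classes_eligible_for_cleaning`` is hypothetical, and both the strict comparison and the default of 0.5 (chosen to echo the "half as many" rule) are assumptions rather than a statement about the library's behaviour.

```python
from collections import Counter


def classes_eligible_for_cleaning(y, threshold_cleaning=0.5):
    """Return the non-minority classes that are large enough to be cleaned.

    Assumption: a class qualifies when its count is strictly above
    threshold_cleaning * (number of minority samples).
    """
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    n_minority = counts[minority]
    return sorted(
        label
        for label, n_samples in counts.items()
        if label != minority and n_samples > threshold_cleaning * n_minority
    )


# Three classes with 10, 4 and 100 samples; class 1 is the minority.
labels = [0] * 10 + [1] * 4 + [2] * 100

print(classes_eligible_for_cleaning(labels))                        # [0, 2]
print(classes_eligible_for_cleaning(labels, threshold_cleaning=5))  # [2]
```

With the larger threshold, class 0 (10 samples) no longer exceeds 5 times the 4 minority samples, so only class 2 would be cleaned.
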
The class can be used as::

  >>> from imblearn.under_sampling import NeighbourhoodCleaningRule
  >>> ncr = NeighbourhoodCleaningRule(n_neighbors=11)

Review comment: I don't totally understand this sentence. Let me try a modification in a new commit.