
Add a deduplicate preprocessor #103

Closed
dotsdl opened this issue May 19, 2020 · 1 comment

Comments

@dotsdl
Member

dotsdl commented May 19, 2020

See this comment for the context.

We are taking the approach of providing composable preprocessing functions so that users can manipulate our DataFrame-based data structures without reinventing the wheel each time or diving into complex, error-prone pandas-fu. To that end, we would like a preprocessor that simply and safely deduplicates records in our standard-form DataFrames.
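A minimal sketch of such a preprocessor, assuming pandas and assuming that "duplicate" means rows sharing the same index entry (for example, the same time and lambda values, as can happen when overlapping trajectory segments are concatenated). The function name `deduplicate` and its `keep` parameter are hypothetical and not part of any existing API.

```python
import pandas as pd


def deduplicate(df: pd.DataFrame, keep: str = "first") -> pd.DataFrame:
    """Return a copy of ``df`` with duplicated index entries removed.

    Hypothetical preprocessor: rows count as duplicates when they share
    the same (Multi)Index entry, e.g. the same (time, lambda) pair.
    The input DataFrame is never modified.
    """
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first' or 'last'")
    # Index.duplicated flags every repeat of an index entry except the one kept.
    mask = df.index.duplicated(keep=keep)
    # Boolean indexing builds a new DataFrame, leaving the caller's data untouched.
    return df[~mask].copy()


# Two overlapping segments of a toy dH/dl series at a single lambda value.
names = ["time", "fep-lambda"]
idx_a = pd.MultiIndex.from_tuples([(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)], names=names)
idx_b = pd.MultiIndex.from_tuples([(2.0, 0.0), (3.0, 0.0)], names=names)
seg_a = pd.DataFrame({"fep": [1.0, 2.0, 3.0]}, index=idx_a)
seg_b = pd.DataFrame({"fep": [3.0, 4.0]}, index=idx_b)

combined = pd.concat([seg_a, seg_b])  # the (2.0, 0.0) row appears twice
clean = deduplicate(combined)
print(len(combined), len(clean))  # 5 4
```

Because the function takes a DataFrame and returns a new one, it composes with other preprocessing steps, e.g. `deduplicate(df).sort_index()`.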

@xiki-tempula
Collaborator

I think this is already handled by the `drop_duplicates` keyword.


3 participants