File | Description | Source link (with details) | Preprocessing applied | Label column |
---|---|---|---|---|
generated.csv | Automatically generated dataset containing data samples separated into very well-delineated categories. This can be considered a "best-case scenario" test case. | | | label |
defaults.csv | Defaults on credit card payments | UCI | Minor (column name reformatting) | defaulted |
winequality.csv | Quality ratings of Portuguese white wines | UCI | Added binarized label column `recommend` indicating quality >= 7 | recommend |
vehicles.csv | Recognizing vehicle type from its silhouette | OpenML | None | Class |
eeg.csv | EEG eye state measurements | OpenML | Dropped a few outlier rows | Class |
kick_starter.csv | Kickstarter project state | Kaggle | Dropped unnamed columns; minor column name reformatting; calculated the duration of each project and dropped start and end dates; dropped some rows with wrong input type; dropped main category column and kept category column; randomly sampled 30% of the data; filled NA with 0 for numeric values | state |
mushrooms.csv | Classifying mushroom edibility based on physical features | UCI | Renamed the column `class` to `edibility` for descriptiveness | edibility |
Surgical-deepnet.csv | Surgical cases related to complication | Kaggle | None | complication |
gender_classification.csv | Using hobbies to guess gender | Kaggle | None | Gender |

These can all be loaded using Pandas:

```python
import pandas as pd

dataset = pd.read_csv("file.csv")
```
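
Since every file has a designated label column, a common next step is to separate that column from the features. The following is a minimal sketch of one way to do this, not part of the datasets themselves. The `LABEL_COLUMNS` mapping is copied from the table above; the helper name `split_features_and_label` is hypothetical, and the tiny in-memory CSV (with made-up `alcohol` and `pH` columns) only stands in for a real file so the snippet runs on its own.

```python
import io

import pandas as pd

# Label column for each file, copied from the table above.
LABEL_COLUMNS = {
    "generated.csv": "label",
    "defaults.csv": "defaulted",
    "winequality.csv": "recommend",
    "vehicles.csv": "Class",
    "eeg.csv": "Class",
    "kick_starter.csv": "state",
    "mushrooms.csv": "edibility",
    "Surgical-deepnet.csv": "complication",
    "gender_classification.csv": "Gender",
}


def split_features_and_label(source, label_column):
    """Hypothetical helper: read a CSV and separate the label column.

    `source` can be a file path or any file-like object accepted by
    pd.read_csv.
    """
    dataset = pd.read_csv(source)
    if label_column not in dataset.columns:
        raise KeyError(f"label column {label_column!r} not found")
    features = dataset.drop(columns=[label_column])
    labels = dataset[label_column]
    return features, labels


# Made-up stand-in data; with the real files, pass a path such as
# "winequality.csv" instead of this in-memory buffer.
sample_csv = io.StringIO("alcohol,pH,recommend\n10.5,3.2,1\n9.1,3.4,0\n")
features, labels = split_features_and_label(
    sample_csv, LABEL_COLUMNS["winequality.csv"]
)
print(features.shape, labels.tolist())
```

Note that label names are case-sensitive (`Class` and `Gender` are capitalized), so looking them up from a single mapping avoids typos when switching between files.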