Data Complexity Library (DCoL) Metafeatures #91

Open
bjschoenfeld opened this issue Jun 13, 2018 · 5 comments

@bjschoenfeld
Member

Taken from the README:

The data complexity library (DCoL) is a library implemented in C++ that provides the implementation of a set of measures designed to characterize the apparent complexity of data sets for supervised learning, which were originally proposed by Ho and Basu (2002).

https://github.com/nmacia/dcol

Should we wrap this C++ library, re-implement the measures in Python, or not use them at all?

This library and its metafeatures were suggested by @MichaelMMeskhi.

@joaquinvanschoren
Collaborator

Cool. If possible, it would be best to reimplement them in Python to allow future extensions.
I know a PhD student working on data complexity; I can ask whether he is interested.

@MichaelMMeskhi

@joaquinvanschoren That would be better, yes. There shouldn't be any problem; the reimplementation would just need good speed optimization to compensate for what C++ offers, that's all.
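To make the speed point concrete, here is a minimal sketch of how one of the Ho and Basu (2002) measures, the maximum Fisher's discriminant ratio (F1), might look in vectorized NumPy. It assumes the two-class form, where each feature's ratio is the squared difference of the class means divided by the sum of the class variances. The function name `fisher_ratio_f1` and the handling of zero-variance features are illustrative assumptions, not taken from DCoL, so results may differ from the C++ library in edge cases.

```python
import numpy as np


def fisher_ratio_f1(X, y):
    """Maximum Fisher's discriminant ratio (F1) for a two-class dataset.

    Hypothetical helper for illustration; not part of DCoL or any wrapper.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)

    classes = np.unique(y)
    if classes.size != 2:
        raise ValueError("this sketch handles exactly two classes")

    a = X[y == classes[0]]
    b = X[y == classes[1]]

    # Per-feature numerator and denominator, computed for all features at once.
    num = (a.mean(axis=0) - b.mean(axis=0)) ** 2
    den = a.var(axis=0) + b.var(axis=0)  # population variance (ddof=0)

    with np.errstate(divide="ignore", invalid="ignore"):
        ratios = num / den

    # Assumption: a feature that is constant within both classes counts as
    # perfectly separating (inf) if the means differ, and as useless (0) otherwise.
    ratios = np.where(den == 0, np.where(num > 0, np.inf, 0.0), ratios)

    return float(ratios.max())
```

All per-feature work happens inside NumPy array operations rather than Python loops, which is the kind of optimization that would help close the gap with C++ for measures like this one. Measures based on nearest neighbors or minimum spanning trees would likely need more care.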

@bjschoenfeld
Member Author

There is also an R implementation called ECoL by Luis Paulo, which fixes some bugs in the original implementation.

@bjschoenfeld
Member Author

Here is the link to ECoL. Luis Paulo, one of the authors, is willing to discuss any bug fixes or updates.

It will be up to us to wrap or re-implement these metafeatures, and it is not yet clear to me which we should do. Re-implementing is preferable in many ways; it is simply a matter of dedicating the hours to it.

@MichaelMMeskhi

Re-implementing is the way to go. Think long term.
