Dealing with bipartite/monopartite datasets can be improved #7

pedroilidio · 2023-04-15T17:27:14Z

Depending on the scenario, we deal with data in one of two formats. In the bipartite format, X = [X[0], X[1]], a pair of feature matrices. In the monopartite format, the bipartite dataset is converted into a single X whose rows are the pairwise concatenations of every row of X[0] with every row of X[1].
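To make the two formats concrete, here is a minimal sketch of the conversion using plain nested lists. The function name `bipartite_to_monopartite`, the toy data, and the row-major pair ordering are assumptions for illustration, not necessarily what bipartite_learn does internally.

```python
from itertools import product


def bipartite_to_monopartite(X):
    """Concatenate every row of X[0] with every row of X[1]."""
    X_rows, X_cols = X
    # product() iterates X_rows in the outer loop and X_cols in the inner one,
    # so pair (i, j) ends up at index i * len(X_cols) + j (assumed ordering).
    return [list(r) + list(c) for r, c in product(X_rows, X_cols)]


# Hypothetical toy data: 2 row samples with 2 features each,
# 3 column samples with 1 feature each.
X_bipartite = [
    [[0.1, 0.2], [0.3, 0.4]],
    [[1.0], [2.0], [3.0]],
]
X_monopartite = bipartite_to_monopartite(X_bipartite)
```

With this toy input the monopartite X has 2 * 3 = 6 rows of 2 + 1 = 3 features each.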

As mentioned in #5 (comment), the way we distinguish between these two formats deserves a more careful solution than the current one:

bipartite_learn/bipartite_learn/utils/__init__.py

Lines 40 to 42 in 8499867

    def _X_is_multipartite(X):
        # TODO: find a better way of deciding.
        return isinstance(X, (tuple, list))
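One way the current check can misfire: a monopartite X passed as a nested list is itself a `list`, so it is reported as multipartite. The sketch below reproduces that case and contrasts it with a stricter variant. Both `_is_2d` and `x_is_multipartite_strict` are hypothetical names introduced here, and the "every part must be 2D" rule is only one possible criterion.

```python
def _X_is_multipartite(X):
    # Current check, copied from the snippet above.
    return isinstance(X, (tuple, list))


def _is_2d(part):
    """Hypothetical helper: True if `part` looks like a 2D sample matrix."""
    ndim = getattr(part, "ndim", None)  # array-likes exposing .ndim
    if ndim is not None:
        return ndim == 2
    return (
        isinstance(part, (tuple, list))
        and len(part) > 0
        and all(isinstance(row, (tuple, list)) for row in part)
        and not any(isinstance(v, (tuple, list)) for row in part for v in row)
    )


def x_is_multipartite_strict(X):
    """Hypothetical stricter variant: a tuple/list whose items are all 2D."""
    return isinstance(X, (tuple, list)) and len(X) > 0 and all(_is_2d(p) for p in X)


# A single monopartite matrix given as nested lists.
X_mono_as_list = [[0.1, 0.2, 1.0], [0.3, 0.4, 2.0]]
# A bipartite pair of matrices.
X_bipartite = ([[0.1, 0.2], [0.3, 0.4]], [[1.0], [2.0], [3.0]])

current_on_mono = _X_is_multipartite(X_mono_as_list)       # misclassified
strict_on_mono = x_is_multipartite_strict(X_mono_as_list)
strict_on_bipartite = x_is_multipartite_strict(X_bipartite)
```

Even the stricter rule stays ambiguous for some inputs, which is part of why an explicit signal may be preferable to inspecting X.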

This matters even more because some estimators accept both types of input in predict() (tree-based models in general), while others accept only the bipartite format (the matrix factorization ones, for instance). All of them, however, should yield flattened predictions for better integration with scikit-learn scoring utilities. I reckon this mix can be quite confusing.
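As an illustration of what "flattened predictions" could mean, the sketch below flattens a (n_rows, n_cols) prediction matrix row by row, so that each entry lines up with one pair of samples. The name `flatten_predictions`, the toy values, and the row-major order are assumptions of this example.

```python
def flatten_predictions(Y):
    """Flatten a (n_rows, n_cols) nested list into a 1D list, row by row."""
    return [value for row in Y for value in row]


# Hypothetical predictions for 2 row samples and 3 column samples.
Y_pred = [
    [0.9, 0.1, 0.4],
    [0.2, 0.8, 0.6],
]
y_flat = flatten_predictions(Y_pred)

# Under row-major order, pair (i, j) sits at index i * n_cols + j.
n_cols = len(Y_pred[0])
i, j = 1, 2
pair_value = y_flat[i * n_cols + j]
```

A 1D output like `y_flat` has the shape that scoring functions expecting one prediction per sample can work with, provided the true labels are flattened in the same order.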

I suppose an estimator tag would be an appropriate way of signaling that.
Maybe a whole Dataset class would facilitate maintenance in the long term.
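Both suggestions can be sketched together. Everything below is hypothetical: the `BipartiteDataset` class, the `accepts_monopartite` attribute, and the two toy estimator classes are illustrative names. The tag is a plain class attribute rather than scikit-learn's own estimator tag mechanism, whose API differs between versions.

```python
from dataclasses import dataclass
from itertools import product
from typing import List

Matrix = List[List[float]]


@dataclass
class BipartiteDataset:
    """Hypothetical container that can emit either format on request."""

    X_rows: Matrix
    X_cols: Matrix

    def as_bipartite(self) -> List[Matrix]:
        return [self.X_rows, self.X_cols]

    def as_monopartite(self) -> Matrix:
        # Row-major pair order is an assumption of this sketch.
        return [list(r) + list(c) for r, c in product(self.X_rows, self.X_cols)]


class TreeLikeEstimator:
    # Hypothetical tag: predict() can take either format.
    accepts_monopartite = True


class FactorizationLikeEstimator:
    # Hypothetical tag: predict() takes only the bipartite format.
    accepts_monopartite = False


def prepare_predict_input(estimator, data: BipartiteDataset):
    """Pick the input format from the estimator's (hypothetical) tag."""
    # Choosing monopartite whenever it is accepted is an arbitrary choice here.
    if getattr(estimator, "accepts_monopartite", False):
        return data.as_monopartite()
    return data.as_bipartite()


data = BipartiteDataset(
    X_rows=[[0.1, 0.2], [0.3, 0.4]],
    X_cols=[[1.0], [2.0], [3.0]],
)
X_for_tree = prepare_predict_input(TreeLikeEstimator(), data)
X_for_factorization = prepare_predict_input(FactorizationLikeEstimator(), data)
```

With the format carried by the dataset object and the capability declared on the estimator, no code has to guess the format from the Python type of X.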


pedroilidio added the maintenance, refactor, and discussion labels on Apr 15, 2023
