Dealing with bipartite/monopartite datasets can be improved #7
Labels
discussion: There are points open to discussion
maintenance: Facilitates long-term maintenance of the project
refactor: Enhances code design without breaking performance
In different scenarios, we sometimes deal with bipartite-formatted data, while at other times the bipartite datasets are converted to the monopartite form. Monopartite means that `X` is formed by pairwise concatenations of all possible `X[0]` and `X[1]` rows, and bipartite means `X = [X[0], X[1]]`. As mentioned in #5 (comment), the way of distinguishing between these two formats deserves a more careful solution than what we currently do:
bipartite_learn/bipartite_learn/utils/__init__.py
Lines 40 to 42 in 8499867
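For concreteness, here is a minimal NumPy sketch of the two formats described above. The helper name `to_monopartite` and the row-major pair ordering are assumptions made for illustration. They are not taken from the `bipartite_learn` code base, whose actual conversion may differ.

```python
import numpy as np

# Bipartite format: one feature matrix per sample domain.
X_rows = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # 3 samples, 2 features
X_cols = np.array([[10.0], [20.0]])                      # 2 samples, 1 feature
X_bipartite = [X_rows, X_cols]


def to_monopartite(X):
    """Hypothetical helper: concatenate every X[0] row with every X[1] row.

    Pairs are ordered row-major (all X[1] rows for the first X[0] row,
    then all X[1] rows for the second, ...), which is an assumption here.
    """
    X0, X1 = X
    n0, n1 = X0.shape[0], X1.shape[0]
    left = np.repeat(X0, n1, axis=0)   # each X[0] row repeated n1 times
    right = np.tile(X1, (n0, 1))       # the whole X[1] block repeated n0 times
    return np.hstack([left, right])


X_mono = to_monopartite(X_bipartite)
print(X_mono.shape)  # one row per (row sample, column sample) pair
```

With this ordering, row `i * n1 + j` of the monopartite matrix corresponds to the pair `(X[0][i], X[1][j])`, which lines up with flattening an `(n0, n1)` target matrix via `reshape(-1)`.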
The distinction matters even more because some estimators accept both types of input for `predict()` (tree-based models in general) while others only accept the bipartite format (the matrix factorization ones, for instance). All of them should nevertheless yield flattened predictions for better integration with `scikit-learn` scoring utilities, which I reckon can be quite confusing. A `Dataset` class would facilitate maintenance in the long term.
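As a starting point for discussion, a rough sketch of what such a class could look like follows. Everything in it is hypothetical: the class name `BipartiteDataset`, its attributes, and its methods are invented for illustration and are not an existing `bipartite_learn` API. The idea is only that the format is carried explicitly by the object instead of being inferred from the shape or type of `X`.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True, eq=False)
class BipartiteDataset:
    """Hypothetical container that makes the data format explicit."""

    X_rows: np.ndarray  # features of the row samples, i.e. X[0]
    X_cols: np.ndarray  # features of the column samples, i.e. X[1]

    def __post_init__(self):
        if self.X_rows.ndim != 2 or self.X_cols.ndim != 2:
            raise ValueError("X_rows and X_cols must both be 2D arrays.")

    @property
    def n_pairs(self):
        return self.X_rows.shape[0] * self.X_cols.shape[0]

    def as_bipartite(self):
        """Return X = [X[0], X[1]]."""
        return [self.X_rows, self.X_cols]

    def as_monopartite(self):
        """Return all pairwise row concatenations (row-major order assumed)."""
        n0, n1 = self.X_rows.shape[0], self.X_cols.shape[0]
        return np.hstack([
            np.repeat(self.X_rows, n1, axis=0),
            np.tile(self.X_cols, (n0, 1)),
        ])

    def flatten_predictions(self, y_pred):
        """Flatten predictions to 1D, matching the as_monopartite() order."""
        y_pred = np.asarray(y_pred)
        if y_pred.size != self.n_pairs:
            raise ValueError(
                f"Expected {self.n_pairs} predictions, got {y_pred.size}."
            )
        return y_pred.reshape(-1)


ds = BipartiteDataset(np.ones((3, 2)), np.zeros((2, 1)))
X_mono = ds.as_monopartite()                       # for estimators taking either format
y_flat = ds.flatten_predictions(np.zeros((3, 2)))  # for scoring utilities
```

An estimator that only supports the bipartite format could then call `as_bipartite()`, while the flattening needed by the scoring utilities would live in a single place.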