Team Formation as Bundle Recommendation #227

Open
hosseinfani opened this issue Jan 2, 2024 · 9 comments
Labels
experiment Running a study or baseline for results

Comments

@hosseinfani
Member

No description provided.

@hosseinfani hosseinfani added the experiment Running a study or baseline for results label Jan 2, 2024
@rkenny

rkenny commented Jan 2, 2024

(Copied from an email sent from [email protected] to [email protected] on 12/28/2023 at 11:56 am)

I've had a chance to go through papers on DBLP to find bundle recommendation algorithms published in top-tier venues that also have source code and datasets available. I found five, three of which I have been able to clone and run. I cloned the other two as well, but their code is written to use CUDA on NVIDIA cards and I do not yet have a GPU, so I have not run them yet.

I was able to get in touch with Zhang Zhenning about SUGER. Zhang mentioned that no additional work has been done on the repo since publication, so the code on GitHub is still the latest available. It ran with minimal modifications: I only needed to update the hard-coded paths to the right locations on my machine, and it seemed OK.

I have attached a spreadsheet with the details of the papers that I have reviewed and the repos. I think you had mentioned that the first steps would be to find papers, review them, clone and run the repos, then check in with you before going further?

papers review 1.xlsx

@rkenny

rkenny commented Feb 13, 2024

I've been having tons of trouble getting some of these to run. I've tried more and more RAM, CPUs, nodes and GPUs. Eventually I was able to get some of them to run, but so far only one program has completed on two datasets: BGCN with the imdb and github data.

BundleGT is saying there's an issue with the Top20 results, and CrossCBR and MIDGN are giving me index issues.

I'm not 100% sure which area to focus on: getting one or two methods to work on all of the datasets, getting an entire dataset to run, or continuing to get a slice of a dataset to run on all the algorithms. I'm also not sure how deep I should dig into the code/data issues.
Details are in the attached file.
work status.xlsx

@hosseinfani
Member Author

Hi Richard, @rkenny
Thanks for the update. Let's focus on those that were run on imdb and github.

1- Foremost, we need to create a repo to include these methods in a clean/readable codebase/pipeline. You can create it in your GitHub and I can transfer it to fani-lab, or I can create an empty repo here and you push your code there.

2- We need to save the predictions of such methods and then evaluate them based on our own metrics. We can meet during this week or on Friday and I can explain more.

3- For dblp, the problem is with its large number of experts and skills. We can filter them more. I will also explain this more in our meeting.
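
As a rough illustration of step 2, the sketch below saves ranked predictions to a file and scores them afterwards. Everything in it is an assumption made for illustration: the JSON layout, the function names, the toy ids, and the use of recall@k as a stand-in metric. It is not OpeNTF's evaluation code and not how any of the bundle recommendation repos write their output.

```python
import json
import tempfile
from pathlib import Path


def save_predictions(path, predictions):
    """Write {team_id: [expert_id, ...]} ranked lists to a JSON file."""
    Path(path).write_text(json.dumps(predictions))


def load_predictions(path):
    """Read the ranked lists back from the JSON file."""
    return json.loads(Path(path).read_text())


def recall_at_k(ranked, relevant, k):
    """Fraction of the true members that appear in the top-k of the ranking."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & set(relevant)) / len(relevant)


def mean_recall_at_k(predictions, ground_truth, k):
    """Average recall@k over every team in the ground truth."""
    scores = [recall_at_k(predictions[t], ground_truth[t], k) for t in ground_truth]
    return sum(scores) / len(scores)


# Toy data (hypothetical ids): model output and the true team members.
predictions = {"t1": ["e3", "e1", "e7"], "t2": ["e2", "e9", "e4"]}
ground_truth = {"t1": ["e1", "e5"], "t2": ["e2", "e4"]}

# Save once after the model run, then evaluate from the file.
path = Path(tempfile.mkdtemp()) / "predictions.json"
save_predictions(path, predictions)
loaded = load_predictions(path)

r2 = mean_recall_at_k(loaded, ground_truth, k=2)
r3 = mean_recall_at_k(loaded, ground_truth, k=3)
print(r2, r3)
```

Keeping the saved file separate from the scoring step means each method only has to be run once, and the same predictions can be re-scored with whichever metrics are agreed on later.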

@rkenny

rkenny commented Feb 14, 2024

I have created a repo at https://github.com/rkenny/4960A
I guessed at what should go in there. It contains local copies of the cloned repos, the mapping utility, and a slice of the OpeNTF data used for development.

@rkenny

rkenny commented Mar 1, 2024

I have been having all kinds of trouble with dblp. I've been trying to resolve an issue with the bundle_item dataset (which is papers - authors, if my notes are accurate). When I try to load the entire dataset, or a subset of it, into the models, I get an error from scipy saying that the row index exceeds the matrix dimensions. I'm working on narrowing down the cause... I suspect my mapper has a bug or two. The dataset is huge, so I'm mostly working on drawing samples from it more quickly to cut down debugging time. I'll be working on this over the weekend and early Monday morning.
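
One way to narrow down an error like this is to check the index pairs against the declared matrix shape before building the sparse matrix. The sketch below is only a guess at a possible cause, not a diagnosis of the actual mapper: it assumes the sample keeps the original ids while the shape is derived from the sample size. The function names and the toy pairs are hypothetical, and it uses plain Python rather than scipy so it can run anywhere.

```python
def find_out_of_range(pairs, n_rows, n_cols):
    """Return the (row, col) pairs that do not fit an n_rows x n_cols matrix."""
    return [(r, c) for r, c in pairs
            if not (0 <= r < n_rows and 0 <= c < n_cols)]


def reindex(pairs):
    """Map raw ids to contiguous 0-based indices, in order of first appearance."""
    row_ids, col_ids = {}, {}
    remapped = []
    for r, c in pairs:
        ri = row_ids.setdefault(r, len(row_ids))
        ci = col_ids.setdefault(c, len(col_ids))
        remapped.append((ri, ci))
    return remapped, len(row_ids), len(col_ids)


# Toy (bundle, item) sample that kept the ids of the full dataset.
sample = [(0, 2), (5, 1), (9, 7)]

# Shape derived from the sample size: 3 bundles x 3 items.
bad = find_out_of_range(sample, n_rows=3, n_cols=3)
print(bad)  # pairs that would not fit the declared shape

# After remapping, every pair fits the shape implied by the sample.
remapped, n_rows, n_cols = reindex(sample)
print(remapped, n_rows, n_cols)
print(find_out_of_range(remapped, n_rows, n_cols))
```

If the check reports offending pairs on the full dataset as well, the mismatch is more likely in how the mapper assigns ids or counts rows than in the sampling step.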

Does it make sense for me to spend more of my time at this point working on improving the performance of the mapper/ETL tools for DBLP? Just want to make sure I'm going in a direction that makes sense.

@hosseinfani
Member Author

@rkenny
Thanks for the update.
Let's put dblp away for now and do all the steps for imdb, that is, saving the predictions and evaluating the results based on opentf's metrics.

@rkenny

rkenny commented Mar 13, 2024

I have been able to get IMDB to run on BundleGT, BGCN and CrossCBR. I'm working on MIDGN still.
CrossCBR and MIDGN were failing because the tensor indices were split: some were on the CPU and some were on the GPU. I changed both to use only the CPU, but that causes a huge slowdown, as expected. MIDGN can't complete quickly enough and is being killed on Narval. I'll adjust the settings a bit and see if it runs fast enough. Since it takes so long to complete, while that is running I will see if I can get all the indices onto the GPU so it runs properly.
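
For the second approach (keeping everything on the GPU), the usual PyTorch pattern is to move the index tensor onto the device of the tensor it indexes before the operation. The sketch below shows that pattern on toy data; `gather_rows`, `embeddings`, and `indices` are hypothetical names and are not taken from the CrossCBR or MIDGN code. It falls back to the CPU when no GPU is present.

```python
import torch

# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# A small stand-in for an embedding table that lives on the chosen device.
embeddings = torch.arange(12, dtype=torch.float32).reshape(4, 3).to(device)

# Index tensors built with torch.tensor(...) are created on the CPU by default.
indices = torch.tensor([2, 0])


def gather_rows(table, idx):
    """Select rows of `table`, first moving `idx` onto the table's device."""
    return torch.index_select(table, 0, idx.to(table.device))


rows = gather_rows(embeddings, indices)
print(rows.device, rows.cpu().tolist())
```

Moving only the small index tensor is cheap compared with running the whole model on the CPU, which is why this is generally preferred when the mismatch can be located.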

@hosseinfani
Member Author

@rkenny
Thanks for the update. May I see the results for BundleGT and BGCN? I just wanted to see if the results make sense.

@rkenny

rkenny commented Mar 15, 2024

I thought I was closer than I turned out to be... the data really doesn't make a lot of sense. Do you have time early in the week of the 17th-24th to discuss it? I'm concerned that I might be completely lost and not realizing it.
