Team Formation as Bundle Recommendation #227

hosseinfani · 2024-01-02T04:16:30Z

No description provided.

rkenny · 2024-01-02T16:29:11Z

(Copied from an email sent from [email protected] to [email protected] on 12/28/2023 at 11:56 am)

I've had a chance to go through some papers on DBLP to find implementations of bundle recommendation algorithms that were published in top-tier venues and also have source code and datasets available. I was able to find five, three of which I have been able to clone and run. I was able to clone the other two, but their code was written to use CUDA on Nvidia cards and I do not yet have a GPU, so I have not run those yet.

I was able to get in touch with Zhang Zhenning about SUGER. Zhang mentioned that there had been no additional work done on the repo since the publication, so the code on GitHub is still the latest available. It ran with minimal modifications: I only needed to update the hard-coded paths to the right locations on my machine, and the run seemed OK.

I have attached a spreadsheet with the details of the papers that I have reviewed and the repos. I think you had mentioned that the first steps would be to find papers, review them, clone and run the repos, then check in with you before going further?

papers review 1.xlsx

rkenny · 2024-02-13T16:41:20Z

I've been having tons of trouble getting some of these to run. I've tried more and more RAM, CPUs, nodes and GPUs. Eventually I was able to get some of them to run, but I have only had two data sets complete on one program - BGCN with imdb and github data.

BundleGT is saying there's an issue with the Top20 results, and CrossCBR and MIDGN are giving me index issues.

I'm not 100% sure which area to focus on: getting one or two methods to work on all of the datasets, getting an entire dataset to run, or continuing to get a slice of each dataset to run on all the algorithms. I'm also not sure how deep I should dig into the code/data issues.
Details are in the attached file:
work status.xlsx

hosseinfani · 2024-02-13T16:55:27Z

Hi Richard, @rkenny
Thanks for the update. Let's focus on those that were run on imdb and github.

1- Foremost, we need to create a repo to include these methods in a clean/readable codebase/pipeline. You can create it in your github and I can transfer it to fani-lab, or I can create an empty repo here and you push your code there.

2- We need to save the prediction of such methods and then evaluate them based on our own metrics. We can meet during this week or on Friday and I can explain more.

3- For dblp, the problem is with its large number of experts and skills. We can filter them more. I will also explain this more in our meeting.
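
As a rough illustration of the filtering mentioned in point 3, here is a minimal sketch that drops experts who appear in too few teams and then drops teams that become too small. The data layout (dicts with `members` and `skills`), the function name `filter_teams`, and both thresholds are hypothetical and are not taken from OpeNTF.

```python
from collections import Counter


def filter_teams(teams, min_teams_per_expert=2, min_team_size=2):
    """Drop rare experts, then drop teams that became too small.

    Hypothetical helper: a single pass, not repeated until stable.
    """
    # Count how many teams each expert appears in.
    counts = Counter(m for t in teams for m in t["members"])
    kept = []
    for t in teams:
        # Keep only experts that appear in enough teams.
        members = {m for m in t["members"] if counts[m] >= min_teams_per_expert}
        if len(members) >= min_team_size:
            kept.append({"members": members, "skills": set(t["skills"])})
    return kept


# Toy data, invented for the example.
teams = [
    {"members": {"a", "b", "c"}, "skills": {"ml", "db"}},
    {"members": {"a", "b"}, "skills": {"ml"}},
    {"members": {"c", "d"}, "skills": {"ir"}},
]
filtered = filter_teams(teams)
print(len(teams), "teams before,", len(filtered), "teams after")
```

The same counting idea could be applied to skills. Because the filter runs once, an expert can end up below the threshold after some teams are dropped; repeating the pass until nothing changes would be one way to handle that.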

rkenny · 2024-02-14T19:05:58Z

I have created a repo at https://github.com/rkenny/4960A
I guessed at what should go in there. It contains a local copy of the cloned repos and the mapping utility, along with a slice of the OpeNTF data used for development.

rkenny · 2024-03-01T13:51:07Z

I have been having all kinds of trouble with dblp. I've been trying to resolve an issue with the bundle_item dataset (which is papers - authors, if my notes are accurate). When I try to load the entire dataset, or a subset of the dataset, into the models, I'm getting an error from scipy saying that the row index exceeds the matrix dimensions. I'm working on narrowing down the cause... I suspect my mapper has a bug or two. The dataset is huge, so, I'm mostly working on getting samples from it more quickly to speed up debugging time in the actual program. I'll be working on this over the weekend, and early Monday morning.
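
One possible cause of this kind of error is that a sample keeps the ids of the full dataset while the matrix is sized for the sample, so some row or column index is greater than or equal to the declared shape. The sketch below is an assumption about how a mapper could guard against that, not the actual mapper in the repo: it remaps raw ids to contiguous indices and reports any pair that would fall outside the matrix before the pairs are handed to scipy. All function names and the toy ids are hypothetical.

```python
def build_index(ids):
    """Map arbitrary ids to contiguous 0..n-1 indices in first-seen order."""
    index = {}
    for i in ids:
        if i not in index:
            index[i] = len(index)
    return index


def map_pairs(pairs, bundle_index, item_index):
    """Translate (bundle_id, item_id) pairs into (row, col) indices."""
    return [(bundle_index[b], item_index[i]) for b, i in pairs]


def check_bounds(rows_cols, n_rows, n_cols):
    """Return the pairs that fall outside an n_rows x n_cols matrix."""
    return [
        (r, c)
        for r, c in rows_cols
        if not (0 <= r < n_rows and 0 <= c < n_cols)
    ]


# Toy sample of (paper_id, author_id) pairs with ids from a larger dataset.
raw_pairs = [(1007, 42), (1007, 99), (5012, 42), (9000, 7)]

bundle_index = build_index(b for b, _ in raw_pairs)
item_index = build_index(i for _, i in raw_pairs)
mapped = map_pairs(raw_pairs, bundle_index, item_index)

n_rows, n_cols = len(bundle_index), len(item_index)
bad_raw = check_bounds(raw_pairs, n_rows, n_cols)  # raw ids used directly
bad_mapped = check_bounds(mapped, n_rows, n_cols)  # after remapping
print(len(bad_raw), "out-of-range pairs before,", len(bad_mapped), "after")
```

Running `check_bounds` on a small sample right before the sparse matrix is built should show quickly whether the mapper or the declared matrix size is at fault, without loading the full dataset.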

Does it make sense for me to spend more of my time at this point working on improving the performance of the mapper/ETL tools for DBLP? Just want to make sure I'm going in a direction that makes sense.

hosseinfani · 2024-03-01T19:21:05Z

@rkenny
Thanks for the update.
Let's put dblp away for now and do all the steps for imdb, that is, saving the predictions and evaluating the results based on opentf's metrics.
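
A minimal sketch of that save-then-evaluate flow, under stated assumptions: predictions are stored as a JSON file mapping each test team to a ranked list of expert ids, and the two metric functions are simple stand-ins written for this example. They are not OpeNTF's evaluation code, and the file name and ids are invented.

```python
import json
import math
import os
import tempfile


def save_predictions(path, preds):
    """Write {team_id: [ranked expert ids]} to a JSON file."""
    with open(path, "w") as f:
        json.dump(preds, f)


def load_predictions(path):
    """Read predictions written by save_predictions."""
    with open(path) as f:
        return json.load(f)


def recall_at_k(ranked, relevant, k):
    """Fraction of the true members found in the top-k ranked experts."""
    return len(set(ranked[:k]) & relevant) / len(relevant)


def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG over the top-k ranked experts."""
    dcg = sum(1 / math.log2(i + 2) for i, e in enumerate(ranked[:k]) if e in relevant)
    idcg = sum(1 / math.log2(i + 2) for i in range(min(k, len(relevant))))
    return dcg / idcg if idcg else 0.0


# Toy predictions and ground truth, invented for the example.
preds = {"team1": ["e3", "e1", "e7", "e2"], "team2": ["e5", "e9", "e4", "e6"]}
truth = {"team1": {"e1", "e2"}, "team2": {"e4"}}

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "preds.json")
    save_predictions(path, preds)
    loaded = load_predictions(path)

k = 2
mean_recall = sum(recall_at_k(loaded[t], truth[t], k) for t in truth) / len(truth)
mean_ndcg = sum(ndcg_at_k(loaded[t], truth[t], k) for t in truth) / len(truth)
print(f"recall@{k}={mean_recall:.3f} ndcg@{k}={mean_ndcg:.3f}")
```

Keeping the predictions on disk separates the slow model runs from the evaluation, so the same saved file can be scored again with different metrics without rerunning the model.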

rkenny · 2024-03-13T11:01:10Z

I have been able to get IMDB to run on BundleGT, BGCN and CrossCBR. I'm working on MIDGN still.
CrossCBR and MIDGN were failing because the tensor indices were split - some were on the CPU and some were on the GPU. I changed it to just use the CPU, but that is causing a huge slowdown as expected. MIDGN can't complete quickly enough, and is being killed on Narval. I'll adjust the settings a bit, and will see if it runs fast enough. Since it takes so long to complete, while that is running, I will see if I can get all the indexes on the GPU to run it properly.

hosseinfani · 2024-03-14T07:59:07Z

@rkenny
Thanks for the update. May I see the results for BundleGT and BGCN? I just wanted to see if the results make sense.

rkenny · 2024-03-15T16:15:45Z

I thought I was closer than I turned out to be... the data really doesn't make a lot of sense. Do you have time early in the week of the 17th-24th to discuss it? I'm concerned that I might be completely lost and not realizing it.

hosseinfani added the experiment Running a study or baseline for results label Jan 2, 2024

hosseinfani assigned rkenny Jan 3, 2024
