
Feed pre-trained embeddings to NVTabular #124

Open
MelissaKR opened this issue Mar 30, 2022 · 6 comments

@MelissaKR

What is your question?
I have a dataset that includes a column of pre-trained embeddings. I couldn't find any documentation or examples on how this column should be passed to NVTabular. Is it treated as a continuous feature?

@MelissaKR MelissaKR added the question Further information is requested label Mar 30, 2022

rnyak commented Apr 12, 2022

@MelissaKR thanks for the question. Can you tell us more about your use case?

Basically, if you want to do some feature pre-processing on the column of pre-trained embeddings, yes, you can feed them as continuous features to NVTabular.

Let us know if you have further questions.
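One way to hand the vectors to a tabular pipeline as continuous columns is to expand each embedding vector into scalar columns first. A minimal sketch with pandas, where the column names, embedding size, and data are all made up for illustration (nothing here is NVTabular-specific):

```python
import numpy as np
import pandas as pd

dim = 4  # toy embedding size; 64 in this thread's example
df = pd.DataFrame({
    "movie_id": [1, 2, 3],
    "movie_embedding": [list(np.random.rand(dim)) for _ in range(3)],
})

# Expand the list column into `dim` scalar columns emb_0 ... emb_3,
# which could then be declared as continuous ("conts") features.
emb_cols = [f"emb_{i}" for i in range(dim)]
emb_df = pd.DataFrame(df["movie_embedding"].tolist(),
                      columns=emb_cols, index=df.index)
df = pd.concat([df.drop(columns=["movie_embedding"]), emb_df], axis=1)

print(df.columns.tolist())
```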

@MelissaKR (Author)

@rnyak Thank you for your response. I basically have another model that outputs embeddings for a given set of features, and I want to replace those features in the original model with the embeddings I have obtained.
Should I simply pass these new feature columns as conts in TorchAsyncItr? It would be great to see example code showing how pre-trained embeddings are passed to NVTabular's TorchAsyncItr.

@karlhigley karlhigley transferred this issue from NVIDIA-Merlin/NVTabular Apr 4, 2023
@karlhigley karlhigley added this to the Merlin 23.04 milestone Apr 4, 2023
@viswa-nvidia

@rnyak, to follow up on this.


rnyak commented Apr 13, 2023

@MelissaKR this issue has been open for a while. Do you mind giving a bit more detail about what you want to do with the embeddings you are getting from the other model, and what your original model is? We currently support feeding pre-trained embeddings into an embedding layer; you can see that in the TensorFlow example. Let us know whether that is what you were looking for, or something else. Thanks.

@MelissaKR (Author)

@rnyak Thank you for getting back to me on this! In my main model (which uses PyTorch), suppose I have a feature for different movies. I could pass it as a regular categorical feature to be fed into an embedding layer. However, I have trained a separate collaborative-filtering model that learns much better embeddings for these movies. So now, for each movie in the main model's training and validation sets, I have a learned embedding vector of size n, and I no longer need to pass the movie feature through an embedding layer. Instead, I want to drop that feature from my dataset and use the learned embeddings from the second model. Is there a straightforward way to do this, rather than manually defining n new numeric features for each element of the movie embedding and passing those to NVTabular? In other words, how can I pass pre-trained embeddings as-is to my model?
I hope this clarifies my question and use case.
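The replacement described above, swapping categorical movie ids for their pre-trained vectors, can be sketched as a single vectorized lookup rather than n hand-made numeric columns. Toy data throughout; the embedding matrix here stands in for whatever the collaborative-filtering model produced:

```python
import numpy as np

n = 8            # embedding size from the second model (assumed)
num_movies = 100

# Hypothetical pre-trained embeddings: one n-dim vector per movie id,
# stacked into a contiguous lookup matrix indexed by movie id.
rng = np.random.default_rng(0)
lookup = rng.random((num_movies, n))

# Replace a batch of categorical movie ids with their embeddings in
# one vectorized gather; the result can be fed as continuous input.
batch_movie_ids = np.array([3, 17, 42])
batch_embeddings = lookup[batch_movie_ids]

print(batch_embeddings.shape)  # (3, 8)
```

In a PyTorch model the same matrix could back a frozen embedding layer, so the lookup happens inside the model instead of in preprocessing.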


rnyak commented Apr 20, 2023

@MelissaKR thanks for the clarification. We are currently working on that and will be publishing an example shortly. The example might not be in PyTorch, but I believe you can adapt it to your framework :) Can you tell me the architecture of your main model? Is it an MLP, or something more complicated? Also, can you share a simple sample of what your data looks like? Does it contain nested 3D arrays, or is it something like the table below?

movie_id | movie_embedding
-------- | -------------------------------
1        | [float1, float2, ..., float64]
2        | [float1, float2, ..., float64]
...      | ...
n        | [float1, float2, ..., float64]

or more like this:

movie_id | movie_genres_id | movie_genres_embeddings
-------- | --------------- | -------------------------------------------------------------
1        | [1, 2, 3]       | [[float1, float2, ..., float64], [float1, float2, ..., float64], ...]
2        | [3, 5]          | [[float1, float2, ..., float64], [float1, float2, ..., float64]]
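The two layouts above can be built as toy DataFrames like this (3-dim vectors stand in for the 64-dim ones; the flat case has one fixed-length vector per row, the nested case a ragged list of vectors per row):

```python
import pandas as pd

# Flat case: one fixed-length embedding vector per movie.
flat = pd.DataFrame({
    "movie_id": [1, 2],
    "movie_embedding": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
})

# Nested case: a variable-length list of genre ids per movie, with
# one embedding vector per genre id (a ragged 3D structure).
nested = pd.DataFrame({
    "movie_id": [1, 2],
    "movie_genres_id": [[1, 2, 3], [3, 5]],
    "movie_genres_embeddings": [
        [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]],
        [[0.2, 0.3, 0.4], [0.5, 0.6, 0.7]],
    ],
})

print(flat.shape, nested.shape)
```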
