add usecase with pretrained embeddings #508
@radekosmulski thanks for the PR. Aren't we already capable of doing this?
Don't we have such functionality right now? If not, can you please identify what's missing? Thanks.
@rnyak I am not fully sure I am understanding the functionality correctly, but if I currently do the following:
whatever embeddings would be automatically created for my If I do the following:
there is no
or can I? 🤔 Will try that out right now 😄
Just tried that
and I believe only a single embedding matrix can be associated with a This would be a super useful feature to have (and would allow us to train multimodal models). I think this example might be good to go as is, since it demonstrates the use of pretrained embeddings. There is also, I believe, a workaround for this by creating duplicated Maybe we could merge this use case and continue the conversation in NVIDIA-Merlin/Merlin#211?
I think it is expected to have one embedding table per input column. What I meant is: if you have, say, image embeddings per movie_id, you can create an extra column in the input dataframe by copying the
It works, but I was not sure whether this is the solution we want. That was my entire point. If we are happy with this approach of duplicating the ID column, then that's perfect. We have everything we need here 🙂 And I think you might be right: this might be the preferred way to do this, as opposed to significantly increasing code complexity just to address it more elegantly (by being able to tie multiple embeddings to the same ID column).
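A minimal sketch of the duplicated-ID-column workaround, assuming a plain column-oriented table (a dict mapping column names to lists) as a stand-in for the input dataframe; the helper `duplicate_id_column` and the column names `item_product_image` and `item_product_text` are hypothetical. With pandas, the same step is simply `df["item_product_image"] = df["item_id"]`. Each copy then becomes an independent categorical column, so it can be given its own embedding table.

```python
from typing import Dict, List


def duplicate_id_column(
    table: Dict[str, List[int]], source: str, copies: List[str]
) -> Dict[str, List[int]]:
    """Return a new table where `source` is copied under each name in `copies`.

    The original table is left unchanged (hypothetical helper for illustration).
    """
    if source not in table:
        raise KeyError(f"column {source!r} not found")
    result = dict(table)
    for name in copies:
        if name in result:
            raise ValueError(f"column {name!r} already exists")
        # Copy the list so later edits to one column do not leak into another
        result[name] = list(table[source])
    return result


interactions = {"user_id": [1, 2, 2], "item_id": [10, 11, 10]}
augmented = duplicate_id_column(
    interactions, "item_id", ["item_product_image", "item_product_text"]
)
print(sorted(augmented))
# ['item_id', 'item_product_image', 'item_product_text', 'user_id']
```

The cost of this approach is one extra column per pretrained embedding source, in exchange for needing no changes to how columns map to embedding tables.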
@gabrielspmoreira can you provide your input, please? Thanks.
Let's loop Gabriel in and get his opinion as well.
Yes @radekosmulski and @rnyak. Currently there is a many-to-one mapping between categorical features and embedding tables, meaning that one embedding table can be shared by multiple categorical features in the schema.pbtxt, but not the opposite.
Allowing a many-to-many relationship between categorical features and embedding tables would add complexity to the schema, requiring changes in both NVTabular and Merlin Models.
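To make the many-to-one relationship concrete, here is a small framework-free sketch, assuming hypothetical feature and table names (it is not the NVTabular or Merlin Models API). Every categorical feature points to exactly one embedding table, while several features may point to the same table (here `user_genres` and `item_genres` share a `genres` table). Because the mapping is a plain dict keyed by feature name, a single feature cannot be attached to two tables.

```python
import random
from typing import Dict, List

Table = List[List[float]]

# Each feature maps to exactly one table name; a table may serve many features.
FEATURE_TO_TABLE: Dict[str, str] = {
    "item_id": "item_id",
    "user_genres": "genres",
    "item_genres": "genres",
}


def build_tables(cardinalities: Dict[str, int], dim: int, seed: int = 0) -> Dict[str, Table]:
    """Create one randomly initialized embedding table per table name."""
    rng = random.Random(seed)
    return {
        name: [[rng.uniform(-0.05, 0.05) for _ in range(dim)] for _ in range(card)]
        for name, card in cardinalities.items()
    }


def lookup(tables: Dict[str, Table], feature: str, index: int) -> List[float]:
    """Resolve a feature to its (possibly shared) table, then fetch one row."""
    return tables[FEATURE_TO_TABLE[feature]][index]


tables = build_tables({"item_id": 100, "genres": 20}, dim=4)
# Shared table: the same genre index yields the very same row for both features
assert lookup(tables, "user_genres", 3) is lookup(tables, "item_genres", 3)
print(len(tables))  # 2 tables serve 3 features
```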
Another possibility would be "cloning" features on the model side. For example, we could have a CloneFeatures block that copies a feature under new names and can also produce a schema that includes the cloned columns:

clone_features_block = CloneFeatures(cloning_dict={"item_id": ["item_product_image", "item_product_text"]})
new_schema = clone_features_block.get_schema_with_cloned_features(schema)

Then you could define the DLRM model using the low-level API like in this example, so that you can create your own embedding_block like this, with the pretrained embeddings as initializers of the cloned features:

embedding_block = mm.EmbeddingFeatures.from_schema(
    new_schema,
    pre=clone_features_block,
    embedding_options=mm.EmbeddingOptions(
        embedding_dims={"item_id": 128, "item_product_image": 256, "item_product_text": 512},
        embeddings_initializers={
            "item_product_image": mm.TensorInitializer(pretrained_product_image_embs),
            "item_product_text": mm.TensorInitializer(pretrained_product_text_embs),
        },
    ),
)

Here is a prototype of how such a CloneFeatures block could be implemented:

# TabularBlock and TabularData are assumed to be imported from merlin.models.tf;
# the exact import path depends on the Merlin Models version.
import tensorflow as tf
from merlin.schema import Schema


@tf.keras.utils.register_keras_serializable(package="merlin.models")
class CloneFeatures(TabularBlock):
    """Copies input features under new names, e.g. item_id -> item_product_image."""

    def __init__(self, cloning_dict, name=None, **kwargs):
        # Initialize the Keras layer before assigning tracked attributes
        super().__init__(name=name, **kwargs)
        # Maps an original feature name to the list of names it is cloned into
        self.cloning_dict = cloning_dict

    def call(self, inputs: TabularData, **kwargs) -> TabularData:
        # Copy the dict so we neither mutate the inputs nor add keys
        # to the dict we are iterating over
        outputs = dict(inputs)
        for feat_name, new_feat_names in self.cloning_dict.items():
            if feat_name in inputs:
                for new_feat_name in new_feat_names:
                    outputs[new_feat_name] = inputs[feat_name]
        return outputs

    def compute_call_output_shape(self, input_shape):
        output_shape = dict(input_shape)
        for original_feat_name, new_feat_names in self.cloning_dict.items():
            for new_feat_name in new_feat_names:
                output_shape[new_feat_name] = input_shape[original_feat_name]
        return output_shape

    def get_config(self):
        config = super().get_config()
        config["cloning_dict"] = self.cloning_dict
        return config

    def get_schema_with_cloned_features(self, schema):
        new_column_schemas = []
        for column_schema in schema:
            for new_feat_name in self.cloning_dict.get(column_schema.name, []):
                # Derive a renamed copy instead of mutating the original ColumnSchema
                new_column_schemas.append(column_schema.with_name(new_feat_name))
        return Schema(list(schema) + new_column_schemas)

Would that design make sense @marcromeyn @sararb?
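The cloning idea itself does not depend on TensorFlow. The framework-free sketch below, assuming plain dicts in place of TabularData and nested lists in place of tensors, mirrors what the proposed CloneFeatures.call does: it copies item_id under a new name without mutating the input dict, and each resulting name then gets its own embedding table, with the cloned one initialized from a hypothetical pretrained matrix after a shape check. It is illustrative only and is not the Merlin Models API; `clone_features`, `init_table`, and the data are made up for this example.

```python
import random
from typing import Dict, List, Optional, Sequence

Table = List[List[float]]


def clone_features(
    inputs: Dict[str, Sequence[int]], cloning_dict: Dict[str, List[str]]
) -> Dict[str, Sequence[int]]:
    """Copy each feature in `cloning_dict` under its new names (inputs untouched)."""
    outputs = dict(inputs)  # shallow copy: never mutate the dict being read
    for feat_name, new_names in cloning_dict.items():
        if feat_name in inputs:
            for new_name in new_names:
                outputs[new_name] = inputs[feat_name]
    return outputs


def init_table(cardinality: int, dim: int, pretrained: Optional[Table] = None, seed: int = 0) -> Table:
    """Use a pretrained matrix if given (after a shape check), else random init."""
    if pretrained is not None:
        if len(pretrained) != cardinality or any(len(row) != dim for row in pretrained):
            raise ValueError("pretrained matrix must have shape (cardinality, dim)")
        return [list(row) for row in pretrained]
    rng = random.Random(seed)
    return [[rng.uniform(-0.05, 0.05) for _ in range(dim)] for _ in range(cardinality)]


# Hypothetical pretrained image embeddings: 3 items, 2 dimensions each
pretrained_product_image_embs = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

batch = {"item_id": [2, 0]}
features = clone_features(batch, {"item_id": ["item_product_image"]})

tables = {
    "item_id": init_table(3, 4),  # trained from scratch
    "item_product_image": init_table(3, 2, pretrained=pretrained_product_image_embs),
}
embeddings = {name: [tables[name][i] for i in ids] for name, ids in features.items()}
print(embeddings["item_product_image"])  # [[0.5, 0.6], [0.1, 0.2]]
```

Note that the same item IDs index both tables, which is exactly what cloning buys: two embeddings per item without a many-to-many schema.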
In such cases, you might prefer setting the pre-trained embedding to
That's a good point. Why do we need to tag
Click to view CI Results: GitHub pull request #508 of commit 0153eb7ec5c89bf6ee66be86f4ecc8bf26d6c813, no merge conflicts. Commit message: "Merge branch 'main' into pretrained_embeddings". Jenkins job merlin_models/477; test session: Python 3.8.10, pytest-7.1.2, collected 443 items / 3 skipped.
I think we can iterate on the example and merge a simpler version, just using I have multiple comments on the current notebook:
You are right @rnyak. Only the item id feature should be tagged with
Click to view CI Results: GitHub pull request #508 of commit f1760e6ff2f7644a2a6be89ba5e187b5a415da76, no merge conflicts. Commit message: "edit example". Jenkins job merlin_models/490; test session: Python 3.8.10, pytest-7.1.2, collected 443 items / 3 skipped.
Click to view CI Results: GitHub pull request #508 of commit 9d9896cc1e553d74e7eb4edbf13ccb2039fada6e, no merge conflicts. Commit message: "further edits". Jenkins job merlin_models/491; test session: Python 3.8.10, pytest-7.1.2, collected 443 items / 3 skipped.
It seems the CI is not too happy about the label. I wonder what we should do here? I can create the
Click to view CI Results: GitHub pull request #508 of commit 47d79470fddf683a429c9d4327848031d83d7497, no merge conflicts. Commit message: "switch to synthetic data". Jenkins job merlin_models/492; test session: Python 3.8.10, pytest-7.1.2, collected 443 items / 3 skipped.
Click to view CI Results: GitHub pull request #508 of commit 68e83a977835c4a5c08a5081cef5078d0446dba7, no merge conflicts. Commit message: "Merge branch 'main' into pretrained_embeddings". Jenkins job merlin_models/493; test session: Python 3.8.10, pytest-7.1.2, collected 451 items / 3 skipped.
Click to view CI Results: GitHub pull request #508 of commit b78b7c13df03e3672406c9756740e92cf2719f83, no merge conflicts. Commit message: "add unit test". Jenkins job merlin_models/494; test session: Python 3.8.10, pytest-7.1.2, collected 452 items / 3 skipped.
What do you mean by CI not happening? |
@bschifferer the failing test is one that checks the labels on the PR; I think it might not like the label
I also see another test failing on code formatting; I will fix that now. |
I don't think this is strictly necessary before this work can be merged (we could add it later), but one thing I attempted here, which could be very helpful to our users, was freezing the embeddings. I located the embedding weights after the model was constructed and tried setting their `trainable` attribute to `False`. I tried
Maybe this is something worth considering as part of Merlin#211? |
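For reference, a framework-free sketch of what "freezing" means: the pretrained table is simply excluded from optimizer updates while the learned ID table keeps training. The table names, sizes, and the `sgd_step` helper are hypothetical and are not Merlin Models API. In plain Keras the usual route is to set `trainable = False` on the layer that owns the weights, because a `tf.Variable`'s `trainable` attribute is read-only once the variable exists; whether Merlin Models exposes that layer conveniently is the open question here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tables: a learned item-ID embedding and a pretrained one.
tables = {
    "movie_id": rng.normal(size=(5, 4)),             # trained from scratch
    "movie_id_pretrained": rng.normal(size=(5, 8)),  # stand-in for loaded vectors
}
# Freezing = excluding a table from optimizer updates.
trainable = {"movie_id": True, "movie_id_pretrained": False}


def sgd_step(tables, grads, trainable, lr=0.1):
    """Return updated tables, applying plain SGD only to trainable ones."""
    updated = {}
    for name, table in tables.items():
        if trainable[name]:
            updated[name] = table - lr * grads[name]
        else:
            updated[name] = table  # frozen: weights are left untouched
    return updated


# Dummy gradients with the same shapes as the tables.
grads = {name: np.ones_like(t) for name, t in tables.items()}
new_tables = sgd_step(tables, grads, trainable)
```

The same split (trainable vs. non-trainable weights) is what a Keras layer with `trainable=False` gives you: its variables move out of the list the optimizer updates.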
We need support for non-trainable embedding tables. |
A really good day of meetings yesterday 🙂 I would like to thank @bschifferer and @oliverholworthy for their invaluable feedback! 🙂 Apart from working on this PR, I spent some time modifying the example for
I will push a new version of this example in a few minutes; I just wanted to leave one comment first. I tried to add more explanation in the areas @bschifferer identified. One thing we discussed that I was not able to do was show how the embedding table changes after training. The blocker is that the embedding table doesn't seem to exist before calling
The embedding matrix should be accessible here:
BTW the fact that all this happens in
I felt that training for a single epoch, showing the embedding table, and only then training for more epochs was a little inelegant, so I opted not to include this comparison. But I'm happy to add it if that's the call! |
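The missing table is consistent with how Keras layers generally work: weights are created in `build()`, which runs on the layer's first call, so there is nothing to inspect until the model has seen input. The framework-free sketch below mimics that pattern; `LazyEmbedding` is a hypothetical toy class, not Keras or Merlin code. One possible way around the extra training epoch is to call (or explicitly build) the model once on a single batch before training, which creates the weights without updating them; that is an assumption about the notebook's setup, not something verified here.

```python
import numpy as np


class LazyEmbedding:
    """Toy layer that, like a Keras layer, creates its weights on first call."""

    def __init__(self, num_rows, dim, seed=0):
        self.num_rows = num_rows
        self.dim = dim
        self.seed = seed
        self.table = None  # nothing to inspect before the first call

    def build(self):
        rng = np.random.default_rng(self.seed)
        self.table = rng.normal(scale=0.05, size=(self.num_rows, self.dim))

    def __call__(self, ids):
        if self.table is None:
            self.build()  # weights are created lazily, on first use
        return self.table[np.asarray(ids)]


layer = LazyEmbedding(num_rows=10, dim=4)
before = layer.table          # None: the table does not exist yet
out = layer([1, 2, 3])        # first call triggers build()
after = layer.table.copy()    # snapshot to compare with post-training values
```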
It looks good to me. I like this version as a first step.
We can update it to freeze the pretrained embeddings once that feature is available.
@radekosmulski did you create a bug ticket for |
@rnyak apologies, I didn't realize this was a bug; thank you for catching it! I've raised one now. |
Add usecase with pretrained embeddings
Following the great advice from @rnyak (thank you!!!), I was able to create a use case around pretrained embeddings.
I use `TensorInitializer` to provide the model with pretrained embeddings for the movies. But while this works, I am not sure this is what the functionality should look like (I am using preexisting pieces that were probably developed for another purpose).
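To make the mechanism concrete: initializing an embedding table with a fixed tensor amounts to building a matrix whose row `i` holds the pretrained vector for the item encoded as `i`, then looking rows up by ID. The numpy sketch below assumes the pretrained vectors arrive as a dict keyed by raw movie ID and that IDs were mapped to contiguous integers; `build_initial_table` and the toy data are hypothetical, and this is not how `TensorInitializer` is implemented. Row alignment is the detail most likely to go wrong, since the pretrained rows must follow the same encoding the model uses.

```python
import numpy as np


def build_initial_table(pretrained, raw_to_encoded, dim, seed=0):
    """Place each pretrained vector at the row of its encoded ID.

    Rows with no pretrained vector get small random values instead.
    """
    num_rows = max(raw_to_encoded.values()) + 1
    rng = np.random.default_rng(seed)
    table = rng.normal(scale=0.01, size=(num_rows, dim))
    for raw_id, vector in pretrained.items():
        if raw_id in raw_to_encoded:
            table[raw_to_encoded[raw_id]] = vector
    return table


# Hypothetical data: raw movie IDs -> 3-dim pretrained vectors.
pretrained = {101: [0.1, 0.2, 0.3], 205: [0.4, 0.5, 0.6]}
# Hypothetical ID encoding; row 0 is left unused (e.g. for unknown IDs).
raw_to_encoded = {101: 1, 205: 2, 999: 3}

table = build_initial_table(pretrained, raw_to_encoded, dim=3)
batch_ids = np.array([2, 1, 2])   # encoded movie IDs in a batch
batch_vectors = table[batch_ids]  # embedding lookup is a row gather
```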
The full-blown scenario IMHO should look a little bit more like this:
For instance, in an extreme scenario where you have 3 different shots of each product, you should be able to train a regular embedding for that product and also feed in each of its 3 pretrained image embeddings.
Something like this
where the model would still be able to tie each of the embeddings to the `product_id` (to know which row to fetch for a given example).
This is by no means finished, just another step in the conversation.
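The proposed setup can be sketched as several tables that all share the `product_id` index: one learned table plus one pretrained table per image shot, each gathered with the same IDs and concatenated per example. This is a numpy illustration under assumed sizes (3 shots, toy dimensions), not a proposed Merlin API; the `lookup_all` helper is hypothetical. Until something like this exists, a similar effect can be had by duplicating the ID column once per pretrained table, as suggested in the discussion on this PR.

```python
import numpy as np

rng = np.random.default_rng(0)
num_products, id_dim, img_dim, num_shots = 6, 4, 8, 3

# One learned table for the product itself...
id_table = rng.normal(size=(num_products, id_dim))
# ...and one table per image shot (random stand-ins for pretrained vectors),
# all indexed by the same product_id.
shot_tables = [rng.normal(size=(num_products, img_dim)) for _ in range(num_shots)]


def lookup_all(product_ids):
    """Gather the same rows from every table and concatenate per example."""
    ids = np.asarray(product_ids)
    parts = [id_table[ids]] + [table[ids] for table in shot_tables]
    return np.concatenate(parts, axis=1)


features = lookup_all([0, 3, 5])  # one row of features per example
```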
resolves #421