Task Parsing Part for Pytorch Implementation #552

Draft: ZhengTang1120 wants to merge 134 commits into master
Conversation

ZhengTang1120 (Author):

@MihaiSurdeanu @bethard @kwalcock
Here is my current code for the MTL in torch. Thanks to Steve, I have already fixed a few bugs in the code.

This is just the task manager and file reader part; I will open another pull request after I get the NER task implemented.

@MihaiSurdeanu (Contributor):

Looks nice so far!

@kwalcock (Member) left a comment:

How in the world can I delete this comment?

Comment on lines +15 to +16:

    w2i = {}
    i = 0
@kwalcock (Member):

I think that this might help. In the previous version, with the head start of i = 1, it seems like the wrong vectors might have been used: if one looked up "," in w2i, it might have been mapped to 2 instead of 1.
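A toy illustration of the misalignment described above (the vectors and ids are made up; only the one-position shift matters):

```python
import numpy as np

# Rows of a hypothetical embedding matrix, in file order.
vectors = np.array([[0.1, 0.2],   # row 0
                    [0.3, 0.4],   # row 1: the vector for ","
                    [0.5, 0.6]])  # row 2

# With the old head start of i = 1, ids were shifted by one
# relative to the matrix rows, so "," resolved to row 2.
w2i_shifted = {"<UNK>": 1, ",": 2}
print(vectors[w2i_shifted[","]])  # [0.5 0.6] -- the wrong vector

# With i = 0, ids line up with the rows.
w2i_aligned = {"<UNK>": 0, ",": 1}
print(vectors[w2i_aligned[","]])  # [0.3 0.4] -- the intended vector
```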

@ZhengTang1120 (Author):

This is because we treated the empty string "" and the unknown token "<UNK>" differently in the previous version: id 0 was taken by "<UNK>", and i started from 1. In the current version, "" and "<UNK>" share the same embedding, so we do not need an extra id for ""/"<UNK>".
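A small sketch of the scheme described above, where the empty string is folded into "<UNK>" so both share one id, and therefore one embedding row (the word list is illustrative):

```python
w2i = {}
i = 0
for word in ["", ",", "the"]:  # words as they might come from the file
    word = "<UNK>" if word == "" else word  # fold "" into "<UNK>"
    if word not in w2i:
        w2i[word] = i
        i += 1

# Both spellings resolve to the same id, hence the same embedding;
# no extra id needs to be reserved up front.
assert w2i["<UNK>"] == 0 and "" not in w2i
```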

@@ -21,15 +21,13 @@ def load(config):
             else:
                 delimiter = " "
             word, *rest = line.rstrip().split(delimiter)
+            word = "<UNK>" if word == "" else word
@kwalcock (Member):

If Python is OK with using an empty string as a key, this should not be necessary.

@ZhengTang1120 (Author):

It is easier to change the key here instead of changing all tokens throughout the code...

emb_dict["<UNK>"] = vector
else:
emb_dict[word] = vector
emb_dict[word] = vector
@kwalcock (Member):

Are two copies of the arrays being kept temporarily: one in emb_dict and another in weights? If memory is an issue, it seems like one could record this vector right away in weights.

@ZhengTang1120 (Author):

You are right, I will refine this later. Thanks!
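A minimal sketch of the single-copy idea suggested above: append each vector straight into weights while reading, instead of staging it in emb_dict first. The file name and loop shape are assumptions, not the PR's exact code:

```python
import torch
import torch.nn as nn

w2i = {}
weights = []  # the only copy of the vectors

with open("embeddings.txt") as f:  # hypothetical embeddings file
    for line in f:
        word, *rest = line.rstrip().split(" ")
        word = "<UNK>" if word == "" else word
        if word not in w2i:
            w2i[word] = len(weights)
            weights.append([float(x) for x in rest])

# Build the embedding layer directly from the streamed rows.
embedding = nn.Embedding.from_pretrained(torch.tensor(weights))
```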

kwalcock marked this pull request as draft on February 15, 2023 at 17:03.