uspt gender labels and experiments #87

Open
Hamedloghmani opened this issue Nov 28, 2023 · 4 comments · Fixed by #88
@Hamedloghmani
Member

Hi @edwinpaul121 and @gabrielrueda
Please log the process for extracting gender labels for the uspt dataset on this issue page, and let me know if you have any questions.
Thank you.

@gabrielrueda
Member

Hi @Hamedloghmani,

@edwinpaul121 and I started working on the gender mappings for uspt, and we were able to generate the gender.csv file.
Here is the code:

import pickle as pkl

import pandas as pd

# Run from the OpeNTF repo: unpickling teams.pkl needs patent.py and inventor.py from the cmn folder.
base = "data/preprocessed/uspt/patent.tsv.filtered.mt75.ts3"

with open(f"{base}/teams.pkl", "rb") as f, open(f"{base}/indexes.pkl", "rb") as f_2:
    teams_pkl = pkl.load(f)
    indexes_pkl = pkl.load(f_2)

# c2i is looked up with the key "<member id>_<member name>" and returns that member's index
c2i = indexes_pkl['c2i']

# Map each member index to its gender, keeping the first value seen for that index
mappings = {}
for patent in teams_pkl:
    for member in patent.members:
        ind = c2i[member.id + "_" + member.name]
        if ind not in mappings:
            mappings[ind] = member.gender

# One row per member index, with a single "gender" column
df = pd.DataFrame.from_dict(mappings, orient="index", columns=["gender"])
df.to_csv(f"{base}/gender.csv")

However, we have a few concerns about our code:

  1. We have to run our code in the OpeNTF repo since it needs access to the patent.py and inventor.py files in the cmn folder.
  2. Some of the gender results were null. Should I assume these to be True, or just leave them as False?
  3. Also, a large number of the results are True (male). I will check the results to confirm if this is intentional.
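
Concerns 2 and 3 can both be inspected by counting every distinct value in the gender column, nulls included. The following is only a minimal sketch: `summarize_gender` is a hypothetical helper, not part of OpeNTF, and the toy frame stands in for the real gender.csv.

```python
import pandas as pd


def summarize_gender(df: pd.DataFrame) -> pd.Series:
    """Count each distinct value in the 'gender' column, with nulls as their own bucket."""
    # dropna=False keeps missing values in the tally instead of silently dropping them
    return df["gender"].value_counts(dropna=False)


# Toy stand-in for gender.csv (assumed values, not real data)
toy = pd.DataFrame({"gender": [True, True, False, None, True]})
counts = summarize_gender(toy)
print(counts)
```

For the real file, the same helper could be applied to `pd.read_csv(".../gender.csv", index_col=0)`; empty cells written by `to_csv` are read back as NaN and land in the null bucket.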

@Hamedloghmani
Member Author

Thank you so much @gabrielrueda and @edwinpaul121
1) We will discuss this on Friday.
2) Please leave them empty; I'll handle it in my own code.
3) Thanks a lot, please let me know.

@gabrielrueda
Member

Hi @Hamedloghmani, I just wanted to let you know that I checked some of the gender values against those in the inventor.tsv file in the USPT dataset and can confirm that the gender values were valid. Also, I'll upload the resulting gender.csv file in the Adila teams channel -> USPT Labelling Files.
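
A spot check like this could also be scripted. The sketch below is an assumption-heavy illustration, not the check that was actually run: `gender_mismatches` is a hypothetical helper, and the `inventor_id` and `gender` column names are made up for the example. In particular, gender.csv as generated above is indexed by the c2i index, so it would first need to be mapped back to inventor ids.

```python
import pandas as pd


def gender_mismatches(labels: pd.DataFrame, reference: pd.DataFrame) -> pd.DataFrame:
    """Return rows where both tables have a gender value and the two values disagree."""
    # Align the two tables on the (hypothetical) shared inventor_id column
    merged = labels.merge(reference, on="inventor_id", how="inner",
                          suffixes=("_label", "_ref"))
    # Nulls are not treated as disagreements; only compare rows with both values present
    both_present = merged["gender_label"].notna() & merged["gender_ref"].notna()
    differs = merged["gender_label"] != merged["gender_ref"]
    return merged[both_present & differs]


# Toy data (assumed, not taken from the USPT dataset)
labels = pd.DataFrame({"inventor_id": ["a", "b", "c"], "gender": [True, False, None]})
reference = pd.DataFrame({"inventor_id": ["a", "b", "c"], "gender": [True, True, False]})
mismatches = gender_mismatches(labels, reference)
print(mismatches)
```

An empty result on a sample of rows would be consistent with the labels matching the source file.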

@Hamedloghmani
Member Author

Hi @gabrielrueda. Thanks a lot for the update and confirmation.
