What's the meanings of dataset? #2

Open
zzbzzb1413 opened this issue Nov 11, 2018 · 5 comments

Comments

@zzbzzb1413

Hello!
Thanks for sharing. Could you explain what the two input files, name_to_pubs_train and name_to_pubs_test, contain?

@zfjsail
Collaborator

zfjsail commented Nov 13, 2018

name_to_pubs_train contains person-to-paper matchings and is used to train the global metric learning model and the cluster size estimation model. name_to_pubs_test is used for evaluation. Please see our paper for details.

@zzbzzb1413
Author

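Thank you for your reply!
I could not phrase this question clearly in English, so I am asking in Chinese, haha.
Looking at the two name_to_pubs files, the outermost level is a dictionary whose keys are person names (authors) and whose values are themselves dictionaries (the inner dictionaries). In each inner dictionary, a key is an encoded string and a value is a list of encoded strings.

The inner dictionary is what confuses me. What do its keys and values represent? Is the key a conference, and is each element of the value list (such as XXX-1) a paper from that conference (with XXX-1 meaning the first author of paper XXX)? Also, how are these encoded ids obtained? Could I use the paper and conference names directly instead?

Many thanks!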

@zfjsail
Collaborator

zfjsail commented Nov 13, 2018

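The keys of the inner dictionary are person ids, and each value is the list of ids of the papers published by that person. In a paper id such as XXX-1, the suffix gives this author's position in the paper's author list, counting from zero.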

name_to_pubs_train_500.json: This file can be used as training data. It contains the name-person-paper mapping relations.

Data schema: The file is a dictionary (denoted dic1) saved as a JSON object. The keys of dic1 are author names, and its values are person dictionaries (denoted dic2). The keys of dic2 are person ids, and its values are lists of ids of the papers authored by that person.

name_to_pubs_test_100.json: This file can be used as test data. It contains the same kind of name-person-paper mapping relations, and its data schema is the same as that of name_to_pubs_train_500.json.
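
To make the schema concrete, here is a minimal Python sketch of reading such a file and walking the two dictionary levels. The helper names (`load_name_to_pubs`, `split_paper_id`, `summarize`) and the sample names and ids are made up for illustration and are not part of the repository. Splitting on the last hyphen assumes that the author position always follows the final `-` in a paper id.

```python
import json
from typing import Dict, List, Tuple

# dic1: author name -> dic2; dic2: person id -> list of paper ids
NameToPubs = Dict[str, Dict[str, List[str]]]


def load_name_to_pubs(path: str) -> NameToPubs:
    """Read a name_to_pubs_*.json file into nested dictionaries."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def split_paper_id(raw_id: str) -> Tuple[str, int]:
    """Split 'XXX-1' into ('XXX', 1); the suffix is the zero-based author position."""
    paper_id, _, position = raw_id.rpartition("-")
    return paper_id, int(position)


def summarize(name_to_pubs: NameToPubs) -> Dict[str, Tuple[int, int]]:
    """For each author name, return (number of persons, total number of papers)."""
    return {
        name: (len(persons), sum(len(pubs) for pubs in persons.values()))
        for name, persons in name_to_pubs.items()
    }


# Hypothetical sample with the same shape as the real files.
sample: NameToPubs = {
    "li_ming": {
        "person_a": ["paperX-0", "paperY-2"],
        "person_b": ["paperZ-1"],
    }
}

for name, persons in sample.items():
    for person_id, pubs in persons.items():
        for raw_id in pubs:
            paper_id, position = split_paper_id(raw_id)
            print(name, person_id, paper_id, position)
```

With the real data, `summarize(load_name_to_pubs("name_to_pubs_train_500.json"))` would show how many distinct persons share each name, which is the ground truth that disambiguation tries to recover.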

@zzbzzb1413
Author

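One more question: do I need to save the final disambiguation clustering results myself? (I saw a line in train.py that calls clustering; its output is the clustering result, right?)
Thanks!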

@zfjsail
Collaborator

zfjsail commented Nov 13, 2018

Yes. The disambiguation results are obtained by clustering (in train.py).
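
Since the results have to be saved by the user, one possible way to do it is sketched below. This is not the code in train.py. It assumes the clustering step gives one integer label per paper id for a given name, and `group_by_label` and `save_clusters` are hypothetical helpers.

```python
import json
from collections import defaultdict
from typing import Dict, List, Sequence


def group_by_label(paper_ids: Sequence[str], labels: Sequence[int]) -> List[List[str]]:
    """Group paper ids that share a cluster label, ordered by label."""
    clusters = defaultdict(list)
    for paper_id, label in zip(paper_ids, labels):
        clusters[int(label)].append(paper_id)
    return [clusters[label] for label in sorted(clusters)]


def save_clusters(result: Dict[str, List[List[str]]], path: str) -> None:
    """Write {author name: [[paper ids of cluster 0], ...]} to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(result, f, ensure_ascii=False, indent=2)


# Assumed example: three papers under one name, clustered into two persons.
example = {"li_ming": group_by_label(["p1-0", "p2-1", "p3-0"], [1, 0, 1])}
```

Calling `save_clusters(example, "clusters.json")` would then write the grouped ids to disk, in a shape that can be compared with the person dictionaries in name_to_pubs_test.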
