What is the meaning of the dataset? #2
Comments
name_to_pubs_train contains the mapping between persons and papers, and is used to train the global metric learning model and the cluster size estimation model. name_to_pubs_test is for evaluation. Please see our paper for details.
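Thank you for your reply! The inner dictionary is what confuses me. What do the keys and values of the inner dictionary represent? Is the key of the inner dictionary a conference, and is each element of the value list (e.g. XXX-1) a paper from that conference (does XXX-1 mean the first author of paper XXX)? Also, how are these IDs obtained? Could I use the paper and conference names directly instead? Thank you very much!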
The keys of the inner dictionary are person IDs, and each value is the list of IDs of the papers published by that person. In a paper ID such as XXX-1, the suffix indicates the author's position in the paper's author list, counting from zero.
name_to_pubs_train_500.json: This file can be used as training data. It contains the name-person-paper mapping relations. Data schema: the file is a dictionary (denoted dic1) saved as a JSON object. The keys of dic1 are author names. The values of dic1 are person dictionaries (denoted dic2). The keys of dic2 are person IDs. The values of dic2 are lists of IDs of the papers authored by that person.
name_to_pubs_test_100.json: This file can be used as testing data. It contains the name-person-paper mapping relations, and its data schema is the same as that of name_to_pubs_train_500.json.
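As an illustration of the nested schema described above (author name, then person ID, then a list of paper IDs with a zero-based author-position suffix), here is a minimal Python sketch. The sample names and IDs (`j_smith`, `person_A`, `paper01-0`, and so on) and the helper functions `split_paper_id` and `iter_records` are made up for illustration and are not part of the repository. It also assumes the author position is whatever follows the last hyphen in each entry.

```python
import json


def split_paper_id(entry):
    """Split an entry such as 'XXX-1' into (paper ID, author position).

    Assumption: the position is the integer after the LAST hyphen,
    and it is zero-based, as stated in the thread.
    """
    paper_id, _, position = entry.rpartition("-")
    return paper_id, int(position)


def iter_records(name_to_pubs):
    """Yield (author name, person ID, paper ID, author position) tuples."""
    for author_name, persons in name_to_pubs.items():   # dic1
        for person_id, entries in persons.items():      # dic2
            for entry in entries:
                paper_id, position = split_paper_id(entry)
                yield author_name, person_id, paper_id, position


# Hypothetical sample with the same shape as name_to_pubs_train_500.json.
sample = json.loads("""
{
  "j_smith": {
    "person_A": ["paper01-0", "paper02-2"],
    "person_B": ["paper03-1"]
  }
}
""")

records = list(iter_records(sample))
for record in records:
    print(record)
```

To read the real file instead of the sample, replace the `json.loads(...)` call with `json.load(open("name_to_pubs_train_500.json"))`, adjusting the path to wherever the file is stored.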
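One more question: do I need to save the final disambiguation clustering results myself? (I saw a line in train.py that calls clustering; its output is the clustering result, right?)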
Yes. The disambiguation results are obtained by clustering (in train.py).
Hello!
Thanks for sharing. Could I know the meaning of the two input files, name_to_pubs_train and name_to_pubs_test?