
Compare statistics between PWC and LPWC #4

Open
VladimirAlexiev opened this issue Dec 10, 2023 · 3 comments


@VladimirAlexiev

VladimirAlexiev commented Dec 10, 2023

Here are some stats coming from

| Site    | papers                                      | tasks | methods | models | datasets      | evaluations | repos  | repoRefs |
| PWC     | 114210                                      |  4571 |    2166 |        | 8922 or 15539 |       11906 |        |          |
| LPWC v1 | 99367 (with code) or 376557 (without code)  |  4267 |    2101 |  24598 |          8322 |       52519 | 153476 |   192235 |
| Ratio   | 1.15 or 0.30                                |  1.07 |    1.03 |   0.00 | 1.07 or 1.87  |        0.23 |   0.00 |     0.00 |

(The line below is an Emacs org-mode formula that computes the ratios.
A ratio of 0.00 doesn't mean PWC has no such entities, only that their count is not shown on the PWC homepage.)

#+TBLFM: @4=@2/@3;%.2f
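For readers without Emacs, the `#+TBLFM: @4=@2/@3;%.2f` formula (row 4 = row 2 divided by row 3, printed with two decimals) can be reproduced with a short Python sketch. It uses the counts from the table above. Treating a blank PWC cell as 0 is a choice made here so that the output matches the 0.00 entries in the table; the dictionary layout and function name are only for illustration.

```python
# Counts copied from the table above; None marks a blank PWC cell
# (the count is not shown on the PWC homepage).
PWC = {
    "papers": 114210,
    "tasks": 4571,
    "methods": 2166,
    "models": None,
    "datasets": 8922,
    "evaluations": 11906,
    "repos": None,
    "repoRefs": None,
}
LPWC = {
    "papers": 99367,  # papers with code
    "tasks": 4267,
    "methods": 2101,
    "models": 24598,
    "datasets": 8322,
    "evaluations": 52519,
    "repos": 153476,
    "repoRefs": 192235,
}


def ratio_row(pwc: dict, lpwc: dict) -> dict:
    """Mimic @4=@2/@3;%.2f: PWC count / LPWC count, blanks counted as 0."""
    return {k: f"{(pwc[k] or 0) / lpwc[k]:.2f}" for k in lpwc}


if __name__ == "__main__":
    for key, value in ratio_row(PWC, LPWC).items():
        print(f"{key:12} {value}")
```

Substituting the alternative values gives the other entries in the Ratio row: `114210 / 376557` formats as 0.30 (papers without code), and `15539 / 8322` formats as 1.87 (the larger PWC dataset count).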

The results are puzzling:

  • LPWC lists 3.3x more papers (counting those without code) and 4.4x more evaluations than PWC. Yet LPWC is 5.5 months behind PWC?
  • PWC lists more tasks and many more datasets than LPWC?

Perhaps the PWC stats are badly outdated? But how can we explain the "crossed" ratios, with PWC ahead on some kinds of objects and LPWC ahead on others?

@VladimirAlexiev
Author

@davidlamprecht
Collaborator

Papers with Code:
"Papers with Code" refers to papers that provide code (PWC and LPWC also contain papers without code). For LPWC, these are the papers that have at least one repository linked to them.
The SPARQL query for this key figure is:

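PREFIX lpwc: <https://linkedpaperswithcode.com/property/>

# Count each paper once, whether it is linked via lpwc:hasOfficialRepository,
# lpwc:hasRepository, or both.
SELECT (COUNT(DISTINCT ?paper) AS ?count)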
WHERE {
    ?paper a <https://linkedpaperswithcode.com/class/paper> .
    {
        {?paper lpwc:hasOfficialRepository ?code .}
        UNION
        {?paper lpwc:hasRepository ?code .}
    }
}

Result: 99367
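To illustrate why the query combines a `UNION` with `COUNT(DISTINCT ?paper)`: a paper linked through both properties matches both branches and produces two solutions, but should be counted once. The sketch below applies the same logic in plain Python to a few made-up triples. The class and property IRIs are the ones used in the query; the paper and repository identifiers are hypothetical, not real LPWC data.

```python
# Full IRIs as in the query above; everything else here is hypothetical sample data.
PAPER_CLASS = "https://linkedpaperswithcode.com/class/paper"
HAS_OFFICIAL = "https://linkedpaperswithcode.com/property/hasOfficialRepository"
HAS_REPO = "https://linkedpaperswithcode.com/property/hasRepository"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"  # what `a` abbreviates

# (subject, predicate, object) triples; paper and repo IDs are made up.
triples = [
    ("paper/a", RDF_TYPE, PAPER_CLASS),
    ("paper/a", HAS_OFFICIAL, "repo/1"),
    ("paper/a", HAS_REPO, "repo/2"),  # paper/a matches both UNION branches
    ("paper/b", RDF_TYPE, PAPER_CLASS),
    ("paper/b", HAS_REPO, "repo/3"),
    ("paper/c", RDF_TYPE, PAPER_CLASS),  # no repository: not a "paper with code"
]


def count_papers_with_code(triples):
    """Return (solution count, distinct paper count) for the UNION pattern."""
    papers = {s for s, p, o in triples if p == RDF_TYPE and o == PAPER_CLASS}
    # Each UNION branch yields one solution per (paper, repository) pair...
    rows = [s for s, p, _ in triples if s in papers and p in (HAS_OFFICIAL, HAS_REPO)]
    # ...so a plain COUNT(?paper) would count paper/a twice; DISTINCT avoids that.
    return len(rows), len(set(rows))


print(count_papers_with_code(triples))  # (3, 2)
```

Three solutions come back, but only two distinct papers have code, which is the number the query reports.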

Datasets:
The official paperswithcode website also shows some key figures. E.g. 8,919 datasets. See: https://paperswithcode.com/datasets

Evaluations:
I think the difference in the number of evaluations comes from evaluations being defined differently. In LPWC, an evaluation is determined by the task, model and dataset used (all three must be identical for two results to count as the same evaluation). Note: 13,289 papers in LPWC have an evaluation.
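A minimal sketch of how the choice of key changes the count. It assumes, for illustration only, that PWC counts one evaluation per (task, dataset) leaderboard, while LPWC, as described above, distinguishes evaluations by (task, model, dataset). The `Result` record, its field names and the sample results are invented and are not the LPWC schema.

```python
from typing import NamedTuple


class Result(NamedTuple):
    """One reported result; field names are illustrative, not the LPWC schema."""
    paper: str
    task: str
    model: str
    dataset: str


# Invented sample: two leaderboards (datasets), two models reported on each.
results = [
    Result("p1", "Image Classification", "ResNet-50", "ImageNet"),
    Result("p2", "Image Classification", "ViT-B/16", "ImageNet"),
    Result("p3", "Image Classification", "ResNet-50", "ImageNet"),  # same triple as p1
    Result("p3", "Image Classification", "ResNet-50", "CIFAR-10"),
    Result("p4", "Image Classification", "ViT-B/16", "CIFAR-10"),
]


def count_evaluations(results, key_fields):
    """Count distinct evaluations when an evaluation is identified by key_fields."""
    return len({tuple(getattr(r, f) for f in key_fields) for r in results})


per_leaderboard = count_evaluations(results, ("task", "dataset"))     # assumed PWC-style
per_model = count_evaluations(results, ("task", "model", "dataset"))  # LPWC-style
print(per_leaderboard, per_model)  # 2 4
```

Adding the model to the key multiplies the count by roughly the number of distinct models per leaderboard. An effect of this kind could account for LPWC reporting about 4x more evaluations than PWC.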

I will have a closer look at the data in the coming weeks when I regenerate and update the LPWC Knowledge Graph.

@VladimirAlexiev
Author

Hi @davidlamprecht!

  • I edited the first comment and added your alternative numbers and alternative PWC numbers to the table
  • I notice that the stats at https://portal.paperswithcode.com/ do update
  • I think the count at https://paperswithcode.com/datasets is right, the one at "portal" is wrong
  • Now the ratios start to make sense: LPWC is 3-15% behind PWC, depending on the kind of entity
  • "coming weeks when I regenerate and update": thanks a lot!! ONTO and SINTEF may be able to help with this task in January in relation to the "InnoGraph" project (part of "enrichMyData")
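The percentage range for how far LPWC is behind comes from the Ratio row of the table: values from 1.03 to 1.15 mean PWC has 3% to 15% more entities than LPWC. Measured the other way, as the share of PWC's count that LPWC lacks, the gap is slightly smaller. The sketch below computes both readings. It restricts itself to the four entity kinds that have counts on both sites and uses papers with code; that restriction and the function name are choices made here for illustration.

```python
# (PWC count, LPWC count) for the entity kinds reported by both sites,
# taken from the table in the first comment (papers = papers with code).
COUNTS = {
    "papers": (114210, 99367),
    "tasks": (4571, 4267),
    "methods": (2166, 2101),
    "datasets": (8922, 8322),
}


def gaps(counts):
    """Return two readings of 'LPWC is x% behind PWC' for each entity kind."""
    out = {}
    for kind, (pwc, lpwc) in counts.items():
        excess = (pwc / lpwc - 1) * 100   # how much more PWC has, relative to LPWC
        missing = (1 - lpwc / pwc) * 100  # share of PWC's count that LPWC lacks
        out[kind] = (round(excess, 1), round(missing, 1))
    return out


for kind, (excess, missing) in gaps(COUNTS).items():
    print(f"{kind:9} PWC +{excess}%  LPWC -{missing}%")
```

For papers, for example, PWC has 14.9% more than LPWC, while LPWC lacks 13.0% of PWC's papers. Either reading puts the gap in the single-digit to mid-teens range quoted above.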
