Compare statistics between PWC and LPWC #4

VladimirAlexiev · 2023-12-10T13:23:18Z

Here are some stats coming from:

PWC: https://portal.paperswithcode.com/ (only the site https://paperswithcode.com/, devoted to Machine Learning)
LPWC: the image at https://linkedpaperswithcode.com/ (confirmed by this query: https://api.triplydb.com/s/ZGUbEvE87)

| Site | papers | tasks | methods | models | datasets | evaluations | repos | repoRefs |
|---|---|---|---|---|---|---|---|---|
| PWC | 114210 | 4571 | 2166 | | 8922 or 15539 | 11906 | | |
| LPWC v1 | 99367 (w code) or 376557 (w/o code) | 4267 | 2101 | 24598 | 8322 | 52519 | 153476 | 192235 |
| Ratio | 1.15 or 0.30 | 1.07 | 1.03 | 0.00 | 1.07 or 1.87 | 0.23 | 0.00 | 0.00 |

(The line below is an Emacs org-mode formula that computes the ratios.
A ratio of 0.00 doesn't mean that PWC has no such entities, just that their counts are not shown on the homepage.)

#+TBLFM: @4=@2/@3;%.2f
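
For readers who don't use org-mode, the same Ratio row can be reproduced with a few lines of Python. This is only a sketch: the counts are copied from the table, and treating a blank PWC cell as 0 is an assumption made to match the 0.00 entries the formula produced.

```python
# Counts copied from the table; None marks a cell that is blank for PWC.
columns = ["papers", "tasks", "methods", "models",
           "datasets", "evaluations", "repos", "repoRefs"]
pwc = [114210, 4571, 2166, None, 8922, 11906, None, None]
lpwc = [99367, 4267, 2101, 24598, 8322, 52519, 153476, 192235]


def ratio(pwc_count, lpwc_count):
    """Return PWC/LPWC rounded to two decimals.

    A blank PWC cell is treated as 0 (assumption, to mirror the 0.00 cells).
    """
    return round((pwc_count or 0) / lpwc_count, 2)


ratios = {name: ratio(p, l) for name, p, l in zip(columns, pwc, lpwc)}

# The two "or" alternatives from the table.
alt_papers = ratio(114210, 376557)   # PWC papers vs. all LPWC papers
alt_datasets = ratio(15539, 8322)    # alternative PWC dataset count

for name, value in ratios.items():
    print(f"{name:12s} {value:.2f}")
print(f"papers (alt)   {alt_papers:.2f}")
print(f"datasets (alt) {alt_datasets:.2f}")
```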

The results are puzzling:

LPWC lists 3.3x more papers and 4x more evaluations than PWC. But LPWC is 5.5m behind PWC?
PWC lists more tasks, and a lot more datasets than LPWC?

I think that the PWC stats are very much outdated? But how to explain the "crossed" ratios on different kinds of objects?


VladimirAlexiev · 2023-12-10T14:07:14Z

Posted paperswithcode/paperswithcode-data#29
and asked on X: https://twitter.com/valexiev1/status/1733850721194250249

davidlamprecht · 2023-12-10T14:41:47Z

Papers with Code:
"Papers with Code" refers to papers that have provided code (there are also papers on PWC/LPWC that do not have code). I.e. for LPWC these are papers that have a repository.
The SPARQL query for this key figure is:

PREFIX lpwc: <https://linkedpaperswithcode.com/property/>

SELECT (COUNT(DISTINCT ?paper) AS ?count)
WHERE {
    ?paper a <https://linkedpaperswithcode.com/class/paper> .
    {
        {?paper lpwc:hasOfficialRepository ?code .}
        UNION
        {?paper lpwc:hasRepository ?code .}
    }
}

Result: 99367
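
To make the counting logic of the query explicit: a paper is counted once if it has an official repository, any repository, or both. Here is a minimal Python sketch of that union-then-count-distinct step. The paper and repository identifiers are hypothetical toy data, not taken from LPWC.

```python
# Hypothetical toy data: paper id -> repositories linked via each property.
has_official_repository = {
    "paper/1": {"repo/a"},
    "paper/2": {"repo/b"},
}
has_repository = {
    "paper/2": {"repo/b", "repo/c"},
    "paper/3": {"repo/d"},
}
all_papers = {"paper/1", "paper/2", "paper/3", "paper/4"}  # paper/4 has no code

# UNION of the two patterns, then COUNT(DISTINCT ?paper):
# paper/2 matches both branches (and has several repos) but is counted once.
papers_with_code = set(has_official_repository) | set(has_repository)
count = len(papers_with_code)

print(count, "of", len(all_papers), "papers have code")
```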

Datasets:
The official paperswithcode website also shows some key figures. E.g. 8,919 datasets. See: https://paperswithcode.com/datasets

Evaluations:
I think the difference in the number of the evaluations comes from the fact that evaluations are defined differently. In LPWC, an evaluation depends on the task, model and dataset used (these must be identical). Note: there are 13,289 papers in LPWC that have an evaluation.
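
A small sketch of why the definition matters: the same result rows give different "evaluation" counts depending on which fields make up the key. The rows below are hypothetical, and the coarser (task, dataset) grouping is only an assumed alternative for comparison. It is not confirmed to be how PWC counts evaluations.

```python
# Hypothetical result rows: (task, dataset, model, metric).
rows = [
    ("image-classification", "ImageNet", "ResNet-50", "top-1"),
    ("image-classification", "ImageNet", "ResNet-50", "top-5"),
    ("image-classification", "ImageNet", "ViT-B/16", "top-1"),
    ("image-classification", "CIFAR-10", "ResNet-50", "accuracy"),
]

# LPWC-style key as described above: task, model and dataset must be identical.
by_task_dataset_model = {(task, dataset, model) for task, dataset, model, _ in rows}

# Assumed coarser key for comparison: one evaluation per (task, dataset) pair.
by_task_dataset = {(task, dataset) for task, dataset, _, _ in rows}

print(len(by_task_dataset_model), "evaluations keyed by (task, dataset, model)")
print(len(by_task_dataset), "evaluations keyed by (task, dataset)")
```

With a finer key the count grows quickly, which would be consistent with LPWC reporting several times more evaluations than PWC.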

I will have a closer look at the data in the coming weeks when I regenerate and update the LPWC Knowledge Graph.

VladimirAlexiev · 2023-12-11T11:19:26Z

Hi @davidlamprecht!

I edited the first comment and added your alternative numbers and alternative PWC numbers to the table
I notice that the stats at https://portal.paperswithcode.com/ do update
I think the count at https://paperswithcode.com/datasets is right, the one at "portal" is wrong
Now the ratios start to make sense: LPWC is 3-15% behind PWC, depending on the kind of entity
"coming weeks when I regenerate and update": thanks a lot!! ONTO and SINTEF may be able to help with this task in January in relation to the "InnoGraph" project (part of "enrichMyData")

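The percentage range quoted in the list above can be checked from the table counts. This sketch assumes "behind" means how much larger the PWC count is relative to LPWC (ratio minus one), and it uses the "with code" paper count and the 8922 dataset count.

```python
# (PWC count, LPWC count) for the entity types that both sites report.
counts = {
    "papers": (114210, 99367),    # LPWC papers with code
    "tasks": (4571, 4267),
    "methods": (2166, 2101),
    "datasets": (8922, 8322),     # PWC figure from the datasets page
}

# Percentage by which PWC exceeds LPWC, rounded to one decimal.
gap_percent = {
    name: round((pwc / lpwc - 1) * 100, 1)
    for name, (pwc, lpwc) in counts.items()
}

for name, gap in gap_percent.items():
    print(f"{name:9s} {gap:5.1f}%")
print("range:", min(gap_percent.values()), "to", max(gap_percent.values()))
```
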