feat: consolidate author remote_ids and wikidata identifiers #10092
base: master
New file `scripts/populate_author_identifiers.py` (`@@ -0,0 +1,40 @@`):

```python
#!/usr/bin/env python
"""
Copies all author identifiers from the author's stored Wikidata info into their remote_ids.

To Run:

PYTHONPATH=. python ./scripts/populate_author_identifiers.py /olsystem/etc/openlibrary.yml

(If testing locally, run inside `docker compose exec web bash` and use ./conf/openlibrary.yml)
"""

import web

import infogami
from openlibrary.config import load_config
from openlibrary.core import db
from openlibrary.core.wikidata import get_wikidata_entity
from scripts.solr_builder.solr_builder.fn_to_cli import FnToCLI


def main(ol_config: str):
    """
    :param str ol_config: Path to openlibrary.yml file
    """
    load_config(ol_config)
    infogami._setup()

    # When run as a script (e.g. from within Docker) there is no HTTP request,
    # so web.ctx has no client IP; set a loopback placeholder instead.
    web.ctx.ip = '127.0.0.1'

    # Consolidate remote_ids for every Wikidata entity stored locally.
    for row in db.query("select id from wikidata"):
        e = get_wikidata_entity(row.id)
        if e is not None:
            e.consolidate_remote_author_ids()


# TODO: get Wikidata for authors who don't have it stored yet?

if __name__ == "__main__":
    FnToCLI(main).run()
```

Review thread on the docstring line ("Copies all author identifiers..."):

Question for later: should this also scrape Wikidata for authors that have an OL ID on their side, but whose Wikidata JSON we don't have on our side? Not sure if any of these actually exist.

Reply: Pretty sure they exist, but I'd suggest leaving that out of this one and considering it for a later PR. Better to keep commits/PRs as atomic as possible. :)
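The consolidation step itself (`consolidate_remote_author_ids`) is not part of this diff. As a rough illustration of what copying identifiers out of stored Wikidata JSON involves, here is a minimal sketch assuming the standard Wikidata entity JSON layout (`claims` → property ID → list of statements, each with a `mainsnak`). The property-to-key table, the OL key names (`isni`, `viaf`, `lc_naf`), the helpers `extract_remote_ids` / `merge_remote_ids`, and the policy of keeping existing values and reporting conflicts rather than overwriting them are all hypothetical, not the PR's actual behavior.

```python
from typing import Any

# Hypothetical table: real Wikidata property IDs -> assumed OL remote_ids keys.
WIKIDATA_TO_REMOTE_ID = {
    "P213": "isni",
    "P214": "viaf",
    "P244": "lc_naf",
}


def extract_remote_ids(entity_json: dict[str, Any]) -> dict[str, str]:
    """Return the first usable string value of each mapped property."""
    claims = entity_json.get("claims", {})
    found: dict[str, str] = {}
    for prop, ol_key in WIKIDATA_TO_REMOTE_ID.items():
        for statement in claims.get(prop, []):
            snak = statement.get("mainsnak", {})
            # "novalue"/"somevalue" snaks carry no datavalue, so skip them.
            if snak.get("snaktype") != "value":
                continue
            value = snak.get("datavalue", {}).get("value")
            if isinstance(value, str):
                found[ol_key] = value
                break  # keep only the first usable statement per property
    return found


def merge_remote_ids(
    remote_ids: dict[str, str], from_wikidata: dict[str, str]
) -> tuple[dict[str, str], list[str]]:
    """Merge Wikidata-derived IDs into existing remote_ids.

    Assumed policy: existing values win; disagreements are returned as
    conflicts instead of being overwritten.
    """
    merged = dict(remote_ids)
    conflicts: list[str] = []
    for key, value in from_wikidata.items():
        if key not in merged:
            merged[key] = value
        elif merged[key] != value:
            conflicts.append(key)
    return merged, conflicts
```

For example, an entity whose `P214` statement holds `"12345"` and whose `P213` statement is a `somevalue` snak yields `{"viaf": "12345"}`; merging that into remote_ids that already hold a different `viaf` leaves the stored value in place and reports `viaf` as a conflict.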
Review comment:

Unless we plan to do anything with the ordering, it'd probably be better to just use alphabetical ordering so it's easy to add new identifiers to this.

Also, an alternative approach could be to add a `wikidata` item to `identifiers.yml`, which could be read here. Otherwise this approach means there are more places to edit when adding or editing identifiers (e.g., #9982 (pending) and #10052 (merged and live on prod, but the identifier is not included here)). This would also mean we wouldn't need to maintain and handle separate `REMOTE_IDS` lists for authors, editions, and works (e.g., `musicbrainz` and `bookbrainz` have different Wikidata properties depending on whether it's an Author, Edition, or Work, which can't be handled with the current structure).
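The `identifiers.yml` suggestion could look roughly like this. A minimal sketch, assuming a hypothetical `wikidata` field on each identifier entry that maps entity type to a Wikidata property, so one entry can give MusicBrainz P434 (MusicBrainz artist ID) for authors and P5813 (MusicBrainz release ID) for editions; that edition mapping is itself an assumption to verify. The real file layout and loading code may differ, and the parsed YAML is inlined as a Python list so the example needs no YAML parser. `wikidata_properties_for` is a hypothetical helper.

```python
from typing import Any

# Hypothetical parsed identifiers.yml: each entry may carry a per-entity-type
# "wikidata" property. Layout and mappings are assumptions for illustration.
IDENTIFIERS: list[dict[str, Any]] = [
    {"name": "viaf", "wikidata": {"author": "P214"}},
    {"name": "isni", "wikidata": {"author": "P213"}},
    {"name": "musicbrainz", "wikidata": {"author": "P434", "edition": "P5813"}},
    {"name": "amazon"},  # no Wikidata property configured
]


def wikidata_properties_for(
    entity_type: str, identifiers: list[dict[str, Any]]
) -> dict[str, str]:
    """Return {ol_identifier_name: wikidata_property} for one entity type,
    sorted alphabetically by identifier name."""
    mapping = {
        ident["name"]: ident["wikidata"][entity_type]
        for ident in identifiers
        if entity_type in ident.get("wikidata", {})
    }
    return dict(sorted(mapping.items()))
```

With this shape, adding an identifier means editing a single YAML entry, the per-type lookup replaces separate `REMOTE_IDS` lists, and sorting by name makes the declaration order irrelevant.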