Decide on a consistent scheme for the xml:id #53

Open
alexwlchan opened this issue Oct 5, 2022 · 3 comments

Comments

@alexwlchan
Contributor

At the top of every TEI file is a <TEI> element with an xml:id, e.g.

<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:id="Hebrew_B_21">

Currently they're a bit of an inconsistent mess:

List of all xml:id values
manuscript_15808
manuscript_15653
manuscript_16448
manuscript_16456
manuscript_16464
manuscript_16465
manuscript_16479
manuscript_16499
manuscript_16502
manuscript_16504
manuscript_16505
manuscript_16506
manuscript_16508
manuscript_16509
manuscript_16511
manuscript_16512
manuscript_16513
manuscript_16514
manuscript_16515
manuscript_16519
manuscript_16520
manuscript_16523
manuscript_16525
manuscript_16526
manuscript_16527
manuscript_16528
manuscript_16530
manuscript_16531
manuscript_16534
manuscript_16538
manuscript_15651
manuscript_15660
manuscript_15745
manuscript_15747
manuscript_15752
manuscript_15661
manuscript_15761
manuscript_15762
manuscript_15662
manuscript_15766
manuscript_15663
manuscript_15664
manuscript_15792
manuscript_15793
manuscript_15665
manuscript_15666
manuscript_15806
manuscript_15809
manuscript_15810
manuscript_15811
manuscript_15667
manuscript_15816
manuscript_15819
manuscript_15820
manuscript_15823
manuscript_15824
manuscript_15668
manuscript_15827
manuscript_15828
manuscript_15652
manuscript_15670
manuscript_15846
manuscript_15847
manuscript_15672
manuscript_15876
manuscript_15879
manuscript_15674
manuscript_15890
manuscript_15893
manuscript_15895
manuscript_15897
manuscript_15898
manuscript_15899
manuscript_15900
manuscript_15676
manuscript_15901
manuscript_15908
manuscript_15914
manuscript_15915
manuscript_15918
manuscript_15919
manuscript_15922
manuscript_15924
manuscript_15925
manuscript_15927
manuscript_15930
manuscript_15679
manuscript_15931
manuscript_15932
manuscript_15933
manuscript_15938
manuscript_15939
manuscript_15946
manuscript_15681
manuscript_15954
manuscript_15684
manuscript_15983
manuscript_15990
manuscript_15991
manuscript_15685
manuscript_15994
manuscript_15998
manuscript_16002
manuscript_16004
manuscript_16014
manuscript_16015
manuscript_16016
manuscript_16018
manuscript_16019
manuscript_16020
manuscript_16022
manuscript_16025
manuscript_16026
manuscript_16027
manuscript_16028
manuscript_16040
manuscript_16044
manuscript_16045
manuscript_16046
manuscript_16046
manuscript_16048
manuscript_16049
manuscript_16050
manuscript_16051
manuscript_16052
manuscript_16053
manuscript_16052
manuscript_16055
manuscript_16056
manuscript_16057
manuscript_16058
manuscript_16059
manuscript_16060
manuscript_16061
manuscript_16062
manuscript_16063
manuscript_16064
manuscript_16065
manuscript_16066
manuscript_16067
manuscript_16060
manuscript_16069
manuscript_16070
manuscript_16071
manuscript_16072
manuscript_16073
manuscript_16074
manuscript_16075
manuscript_16076
manuscript_16077
manuscript_16078
manuscript_16079
manuscript_16080
manuscript_16081
manuscript_16082
manuscript_16083
manuscript_16084
manuscript_16085
manuscript_16086
manuscript_16087
manuscript_16088
manuscript_16089
manuscript_16090
manuscript_16091
manuscript_16092
manuscript_16093
manuscript_16094
manuscript_16095
manuscript_16096
manuscript_16097
manuscript_16098
manuscript_16099
manuscript_16100
manuscript_16101
manuscript_16102
manuscript_15695
manuscript_16103
manuscript_16104
manuscript_16105
manuscript_16106
manuscript_16107
manuscript_16108
manuscript_16109
manuscript_16110
manuscript_16111
manuscript_16112
manuscript_16113
manuscript_16114
manuscript_16115
manuscript_16116
manuscript_16117
manuscript_16118
manuscript_16119
manuscript_16120
manuscript_16121
manuscript_16122
manuscript_16123
manuscript_16124
manuscript_16125
manuscript_16126
manuscript_16127
manuscript_16128
manuscript_16129
manuscript_16130
manuscript_16132
manuscript_15698
manuscript_16133
manuscript_16138
manuscript_16140
manuscript_16141
manuscript_16146
manuscript_16149
manuscript_16150
manuscript_16151
manuscript_16154
manuscript_16155
manuscript_16156
manuscript_16158
manuscript_16164
manuscript_16170
manuscript_16171
manuscript_16172
manuscript_16173
manuscript_16174
manuscript_16176
manuscript_16177
manuscript_16200
manuscript_16201
manuscript_16203
manuscript_16204
manuscript_16209
manuscript_16212
manuscript_16216
manuscript_16217
manuscript_16221
manuscript_16222
manuscript_16224
manuscript_16227
manuscript_16229
manuscript_16231
manuscript_16232
manuscript_16236
manuscript_16237
manuscript_16238
manuscript_16241
manuscript_16242
manuscript_15656
manuscript_16250
manuscript_16251
manuscript_16253
manuscript_16261
manuscript_16262
manuscript_15710
manuscript_16266
manuscript_16268
manuscript_16272
manuscript_16274
manuscript_16276
manuscript_16277
manuscript_16281
manuscript_16290
manuscript_15713
manuscript_16297
manuscript_16298
manuscript_16303
manuscript_15714
manuscript_16304
manuscript_16306
manuscript_16308
manuscript_16312
manuscript_16317
manuscript_16319
manuscript_16322
manuscript_16323
manuscript_15716
manuscript_16324
manuscript_16326
manuscript_16333
manuscript_16334
manuscript_16335
manuscript_16337
manuscript_16338
manuscript_16341
manuscript_16342
manuscript_16343
manuscript_16353
manuscript_15719
manuscript_16355
manuscript_16358
manuscript_16360
manuscript_15720
manuscript_16367
manuscript_16368
manuscript_16369
manuscript_16370
manuscript_16372
manuscript_16373
manuscript_15721
manuscript_16374
manuscript_16377
manuscript_16378
manuscript_16379
manuscript_16382
manuscript_16383
manuscript_16386
manuscript_16390
manuscript_16392
manuscript_16415
manuscript_16417
manuscript_16420
manuscript_16421
manuscript_16425
manuscript_16427
manuscript_16432
manuscript_16435
manuscript_16437
manuscript_16441
manuscript_16442
manuscript_15658
manuscript_15727
manuscript_16450
manuscript_16452
manuscript_16453
manuscript_15728
manuscript_16457
manuscript_16459
manuscript_16460
manuscript_16461
manuscript_15729
manuscript_16466
manuscript_16467
manuscript_16470
manuscript_15730
manuscript_16474
manuscript_16475
manuscript_16480
manuscript_16481
manuscript_16483
manuscript_15731
manuscript_16484
manuscript_16485
manuscript_16487
manuscript_16488
manuscript_16491
manuscript_16492
manuscript_16497
manuscript_15744
Batak_330889
Wellcome_Batak_330890
Wellcome_Batak_36801
Batak_36863
Wellcome_Batak_36960
Wellcome_Batak_56303
Wellcome_Batak_56330
Wellcome_Batak_63570
Wellcome_Batak_66485
Wellcome_Batak_66486
Wellcome_Batak_91548
Wellcome_Batak_91624
Batak_330894
Egyptian_MS_1
Egyptian_MS_2
Egyptian_3
Egyptian_4
Egyptian_MS_5
Egyptian_MS_6
Egyptian_MS_7
Egyptian_MS_8
Ethiopian_1
Ethiopian_10
Ethiopian_12
Ethiopian_13
Ethiopian_14
Ethiopian_15
Ethiopian_16
Ethiopian_17
Ethiopian_18
Ethiopian_19
Ethiopian_20
Ethiopian_21
Ethiopian_20
Ethiopian_23
Ethiopian_24
Ethiopian_25
Ethiopian_26
Ethiopian_27
Ethiopian_3
Ethiopian_4
Ethiopian_5
Ethiopian_6
Ethiopian_7
Ethiopian_8
Ethiopian_9
Hebrew_A_1
Hebrew_A_10
Hebrew_A_11
Hebrew_A_12
Hebrew_A_13
Hebrew_A_14
Hebrew_A_15
Hebrew_A_16
Hebrew_A_17
Hebrew_A_18
Hebrew_A_19
Hebrew_A_2
Hebrew_A_20
Hebrew_A_21
Hebrew_A_22
Hebrew_A_23
Hebrew_A_24
Hebrew_A_25
Hebrew_A_26
Hebrew_A_27
Hebrew_A_28
Hebrew_A_29
Hebrew_A_3
Hebrew_A_30
Hebrew_A_31
Hebrew_A_32
Hebrew_A_33
Hebrew_A_34
Hebrew_A_35
Hebrew_A_36
Hebrew_A_4
Hebrew_A_5
Hebrew_A_6
Hebrew_A_7
Hebrew_A_8
Hebrew_A_9
Hebrew_B_1
Hebrew_B_10
Hebrew_B_11
Hebrew_B_12
Hebrew_B_13
Hebrew_B_14
Hebrew_B_15
Hebrew_B_16
Hebrew_B_17
Hebrew_B_18
Hebrew_B_19
Hebrew_B_2
Hebrew_B_20
Hebrew_B_21
Hebrew_B_22
Hebrew_B_23
Hebrew_B_24
Hebrew_B_25
Hebrew_B_26
Hebrew_B_27
Hebrew_B_28
Hebrew_B_29
Hebrew_B_3
Hebrew_B_30
Hebrew_B_31
Hebrew_B_32
Hebrew_B_33
Hebrew_B_34
Hebrew_B_35
Hebrew_B_36
Hebrew_B_37
Hebrew_B_38
Hebrew_B_39
Hebrew_B_4
Hebrew_B_40
Hebrew_B_41
Hebrew_B_42
Hebrew_B_43
Hebrew_B_44
Hebrew_B_45
Hebrew_B_46
Hebrew_B_47
Hebrew_B_48
Hebrew_B_49
Hebrew_B_5
Hebrew_B_50
Hebrew_B_51
Hebrew_B_52
Hebrew_B_53
Hebrew_B_54
Hebrew_B_55
Hebrew_B_56
Hebrew_B_57
Hebrew_B_58
Hebrew_B_5
Hebrew_B_7
Hebrew_B_8
Hebrew_B_9
MS_Japanese_17
MS_Japanese_100
MS_Japanese_116
MS_Japanese_119
MS_Japanese_125
MS_Japanese_16
MS_Japanese_27
MS_Japanese_28
MS_Japanese_29
MS_Japanese_52
MS_Japanese_58
MS_Japanese_59
MS_Japanese_61
MS_Japanese_63
MS_Japanese_79
MS_Japanese_82
MS_Japanese_85
MS_Japanese_86
MS_Japanese_88
MS_Japanese_94
MS_Japanese_98
MS_Japanese_99
Well.Jav.1
Javanese_10
Well.Jav.11
Javanese_2
Wellcome_Jav_3
Wellcome_Javanese_4
Well.Jav.5
Javanese_6
Wellcome_Jav_7
Well.Jav.8
Wellcome_Javanese_9
Karshuni_1
Karshuni_2
Karshuni_3
Wellcome_Malay_1
Wellcome_Malay_10
Wellcome_Malay_2
Wellcome_Malay_3
Wellcome_Malay_4
Wellcome_Malay_5
Wellcome_Malay_6
Wellcome_Malay_7
Wellcome_Malay_8
Wellcome_Malay_9
Syriac_1
Syriac_2
Tamil_1
Tamil_10
Tamil_11
Tamil_12
Tamil_13
Tamil_14
Tamil_15
Tamil 17
Tamil_18
Tamil_19
Tamil_2
Tamil_20
Tamil_21
Tamil 22
Tamil_23
Tamil_24
Tamil_25
Tamil 26
Tamil 27
Tamil_28
Tamil_29
Tamil_3
Tamil_30
Tamil_32
Tamil_33
Tamil_34
Tamil_35
Tamil_36
Tamil_37
Tamil_38
Tamil_4
Tamil_42
Tamil_5
Tamil 6
Tamil 7
Tamil_8
Tamil_9
MS_Tibetan_133
MS_Tibetan_134

Now that we've made the display labels consistent, would it be useful to make these identifiers consistent too, e.g. MS_$language_$number? That's something we could add to the XML checker.
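
As a rough illustration of what such a check might look like, here is a minimal Python sketch. The regular expression is an assumption about what MS_$language_$number would mean in practice (in particular, it allows an optional single-letter part so that shelfmarks like Hebrew_B_21 could become MS_Hebrew_B_21), and the function name is hypothetical; it is not taken from the existing XML checker.

```python
import re

# Hypothetical pattern for the proposed MS_$language_$number scheme.
# The optional single-letter group (e.g. "_B") is an assumption made so
# that lettered series such as Hebrew A/B can fit the scheme.
ID_PATTERN = re.compile(r"MS_[A-Za-z]+(?:_[A-Z])?_[0-9]+")


def is_consistent_id(xml_id: str) -> bool:
    """Return True if xml_id matches the assumed MS_<language>_<number> form."""
    return ID_PATTERN.fullmatch(xml_id) is not None


if __name__ == "__main__":
    # A few values copied from the list above.
    for candidate in ["MS_Japanese_17", "Hebrew_B_21", "Well.Jav.1", "Tamil 17"]:
        status = "ok" if is_consistent_id(candidate) else "does not match"
        print(f"{candidate}: {status}")
```

Using `fullmatch` rather than `search` means values with stray characters, such as the space in "Tamil 17", are rejected rather than partially matched.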

It would have caught a recent error – both MS_Hebrew_B_5.xml and MS_Hebrew_B_6.xml had the same xml:id value, which was causing conflicts.
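
A duplicate check of this kind could be sketched as follows. This is only an illustration using the Python standard library: the function names are hypothetical, and the inline TEI snippets are stripped-down stand-ins for real files rather than actual repository content.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# xml:id belongs to the reserved XML namespace, so ElementTree exposes it
# under this qualified attribute name.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def read_xml_id(tei_source: str):
    """Return the xml:id on the root element of a TEI document, or None."""
    root = ET.fromstring(tei_source)
    return root.get(XML_ID)


def find_duplicate_ids(documents: dict) -> dict:
    """Map each xml:id used by more than one file to its sorted file names."""
    files_by_id = defaultdict(list)
    for name, source in documents.items():
        xml_id = read_xml_id(source)
        if xml_id is not None:  # a missing xml:id is a separate problem
            files_by_id[xml_id].append(name)
    return {
        xml_id: sorted(names)
        for xml_id, names in files_by_id.items()
        if len(names) > 1
    }


if __name__ == "__main__":
    # Minimal stand-ins for the two files mentioned above.
    documents = {
        "MS_Hebrew_B_5.xml": '<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:id="Hebrew_B_5"/>',
        "MS_Hebrew_B_6.xml": '<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:id="Hebrew_B_5"/>',
        "MS_Hebrew_B_7.xml": '<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:id="Hebrew_B_7"/>',
    }
    for xml_id, names in find_duplicate_ids(documents).items():
        print(f"{xml_id} is used by: {', '.join(names)}")
```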

Note: please don't change these IDs without talking to the platform team first. We'll need to do some work on our side to make sure existing URLs to manuscripts don't change when these IDs change. It's possible, we just need to schedule it.

@amme2
Contributor

amme2 commented Oct 5, 2022

Yes, I think so, with the usual caveat that we check and allow for how this might impact files which are derived from or shared with external aggregators (I'm thinking primarily of FIHRIST in this instance).

@alexwlchan
Contributor Author

Branwen has also flagged Fihrist as an issue; so what if we enforced consistency for everything except Fihrist?

(And even there, we can do some validation that, e.g., the same ID hasn't been used twice.)

@adrianplau
Contributor

Thank you everybody! This sounds like a great plan, Alex. The only instance of external aggregators is indeed Fihrist (which essentially means the Arabic files plus whatever goes in the Fihrist folder of stuff that is not to go on the front end). Others might come in the future, but, as you say, this is only something we would look into with the platform team's input.
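
To make the plan discussed in this thread concrete, here is a hedged Python sketch of the combined rule: every file is checked for duplicate IDs, but only files outside the Fihrist-related folders are held to the naming scheme. The folder names `Arabic` and `Fihrist`, the regular expression, the function names, and the example paths are all assumptions for illustration; the real repository layout and checker may differ.

```python
import re
from collections import Counter
from pathlib import PurePosixPath

# Assumed form of the MS_$language_$number scheme (hypothetical).
ID_PATTERN = re.compile(r"MS_[A-Za-z]+(?:_[A-Z])?_[0-9]+")

# Assumption: files shared with Fihrist can be recognised by their
# top-level folder name.
EXEMPT_FOLDERS = {"Arabic", "Fihrist"}


def is_exempt(path: str) -> bool:
    """Return True if the file sits under a folder exempt from the scheme."""
    parts = PurePosixPath(path).parts
    return bool(parts) and parts[0] in EXEMPT_FOLDERS


def check_ids(ids_by_path: dict) -> list:
    """Return error messages for duplicate or badly formed xml:id values."""
    errors = []
    counts = Counter(ids_by_path.values())
    for path, xml_id in sorted(ids_by_path.items()):
        # Uniqueness applies to every file, including the exempt ones.
        if counts[xml_id] > 1:
            errors.append(f"{path}: xml:id {xml_id!r} is used by {counts[xml_id]} files")
        # The naming scheme applies only outside the exempt folders.
        if not is_exempt(path) and ID_PATTERN.fullmatch(xml_id) is None:
            errors.append(f"{path}: xml:id {xml_id!r} does not match MS_<language>_<number>")
    return errors


if __name__ == "__main__":
    # Hypothetical paths and IDs, for illustration only.
    example = {
        "Arabic/file_one.xml": "manuscript_1",
        "Arabic/file_two.xml": "manuscript_1",
        "Hebrew/MS_Hebrew_B_21.xml": "Hebrew_B_21",
        "Japanese/MS_Japanese_17.xml": "MS_Japanese_17",
    }
    for message in check_ids(example):
        print(message)
```

Keeping the exemption in one small function means a future aggregator could be added by changing a single set, once the platform team has agreed to it.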
