Identify Base Dewey Call Numbers #318

Closed
2 tasks done
Tracked by #285
awead opened this issue Jul 26, 2021 · 10 comments · Fixed by #353 or #362

@awead
Contributor

awead commented Jul 26, 2021

To-Do

  • Lop Dewey call numbers to identify the base
  • Map the base call number to call_number_dewey_ssm field

Examples

| Record | Dewey Call Number Base | All Dewey Call Numbers |
|---|---|---|
| 988650 | 808.8K74a T | 808.8K74a T |
| 122553 | 170M366l 1844 | 170M366l 1844 |
| 124958 | 590.52An7 | 590.52An7 ser.5 v.2 July-Dec.1878 |
| 820540 | 929.4F719s 1958 | 929.4F719s 1958 |
| 24789 | 341F914c 1964 | 341F914c 1964 |
| 54791 | 338.1M567 | 338.1M567 v.17-21 1957-61; 338.1M567 v.11-13 1951-53; 338.1M567 v.8-10 1947-50 |
| 55304 | 336.774M575f | 336.774M575f 1951/52; 336.774M575f 1951/52 pt.1 |
| 56152 | 634.906M582b no.12 | 634.906M582b no.12 |
| 56156 | 634.906M582c no.1-6 1937-1942 | 634.906M582c no.1-6 1937-1942 |
| 60164 | 338.1N206p 2nd 1920-14th 1932 [incomp.] | 338.1N206p 2nd 1920-14th 1932 [incomp.] |
| 68492 | 813St6t | 813St6t |
| 74938 | 338.1P19c | 338.1P19c no.9 1943; 338.1P19c no.23 1959; 338.1P19c no.24 1960; 338.1P19c no.26 1962; 338.1P19c no.30 1966; 338.1P19c no.35 1971; 338.1P19c no.25 1961 |
| 75582 | 677.7As35a | 677.7As35a |
| 77330 | 332.7P384 | 332.7P384 v.1-2 1951/52; 332.7P384 v.3-4 1953/54; 332.7P384 v.5 no.3-v.7 1955-57; 332.7P384 v.8-10 1958-60; 332.7P384 v.11-13 1961-64; 332.7P384 v.14 1964/65 |
| 78157 | 336.06P384r | 336.06P384r no.8 1959; 336.06P384r no.9 1960; 336.06P384r no.7 1958; 336.06P384r no.6 1958; 336.06P384r no.5 1958; 336.06P384r no.4 1958 |
| 81692 | 913P38 | 913P38 v.1; 913P38 v.2 |
| 100581 | 914.7D93i1 | 914.7D93i1 |
| 101432 | 610.6Su75t | 610.6Su75t no.127-130; 610.6Su75t no.131-135; 610.6Su75t no.135-140; 610.6Su75t no.141-145; 610.6Su75t no.146-149; 610.6Su75t no.150-159; 610.6Su75t no.1-6; 610.6Su75t no.7-16; 610.6Su75t no.17-19; 610.6Su75t no.20-28; 610.6Su75t no.29-38; 610.6Su75t no.39-56; 610.6Su75t no.57-63; 610.6Su75t no.64-68; 610.6Su75t no.76-85; 610.6Su75t no.86-100; 610.6Su75t no.101-106; 610.6Su75t no.107-115; 610.6Su75t no.116-120 |
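
One way to read the expectations above is as a lopping rule: cut each call number at its first volume-style designator and keep the distinct results. The Python sketch below illustrates that reading only. The names `lop` and `base_call_numbers`, the designator list, and the regex are all assumptions made for illustration, not the project's implementation. It also assumes, based on records such as 56152, that a record with a single call number keeps it whole.

```python
import re
from typing import List

# Hypothetical designator list (v., no., ser., pt.); the real rules may differ.
DESIGNATOR = re.compile(r"\s+(?:v|no|ser|pt)\.\s*\d.*$")

def lop(call_number: str) -> str:
    """Cut a call number at its first volume-style designator, if any."""
    return DESIGNATOR.sub("", call_number).strip()

def base_call_numbers(all_call_numbers: List[str]) -> List[str]:
    """Return the distinct base call numbers for one record, in first-seen order."""
    if len(all_call_numbers) == 1:
        # Assumption drawn from the examples: a lone call number is kept whole.
        return list(all_call_numbers)
    return list(dict.fromkeys(lop(cn) for cn in all_call_numbers))

print(base_call_numbers(["913P38 v.1", "913P38 v.2"]))  # ['913P38']
```

This sketch does not cover every row: record 55304 expects the "1951/52" suffix to be dropped as well, which would need an additional rule.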
awead mentioned this issue Jul 26, 2021 (2 tasks)
banukutlu added this to the 1.1.x milestone Aug 4, 2021
banukutlu self-assigned this Aug 4, 2021
@banukutlu
Contributor

@ruthtillman Only 4 catkeys in our sample had Dewey call numbers:

988650       access_facet              In the Library
988650       call_number_dewey_ssm     808.8K74a T
988650       format                    Book
988650       id                        988650
988650       library_facet             Special Collections Library
988650       location_facet            Rare Books & Mss, 1st Floor Paterno, American Gift Books
988650       title_tsim                The Atlantic souvenir for 1859 :

122553       access_facet              In the Library
122553       call_number_dewey_ssm     170M366l 1844
122553       format                    Book
122553       id                        122553
122553       library_facet             Annex | Penn State Harrisburg
122553       location_facet            Harrisburg - Alice Marshall - Special Collections - 3rd Floor
122553       title_tsim                Life in the sick-room :

124958       access_facet              In the Library
124958       call_number_dewey_ssm     590.52An7 ser.5 v.2 July-Dec.1878 | 590.52An7 ser.5 v.3 Jan.-June 1879 | 590.52An7 ser.7 v.15 Jan.-June 1905 | 590.52An7 ser.7 v.16 July-Dec.1905 | 590.52An7 ser.7 v.17 Jan.-June 1906 | 590.52An7 ser.7 v.18 July-Dec.1906 | 590.52An7 ser.7 v.19 Jan.-June 1907 | 590.52An7 ser.7 v.20 July-Dec.1907 | 590.52An7 ser.3 v.20 no.115-120 | 590.52An7 ser.3 v.20 July 1867-Jan.1868 | 590.52An7 ser.4 v.11 Jan.-June 1873 | 590.52An7 ser.4 v.13 Jan.-June 1874 | 590.52An7 ser.4 v.14 July-Dec.1874 | 590.52An7 ser.4 v.15 Jan.-June 1875 | 590.52An7 ser.4 v.16 July-Dec.1875 | 590.52An7 ser.4 v.17 Jan.-June 1876 | 590.52An7 ser.4 v.18 July-Dec.1876 | 590.52An7 ser.4 v.19 Jan.-June 1877 | 590.52An7 ser.4 v.20 July-Dec.1877 | 590.52An7 ser.3 v.19 no.109-114 | 590.52An7 ser.4 v.1 Jan.-June 1868 | 590.52An7 ser.2 v.9 no.49-no.54 1852 | 590.52An7 ser.4 v.12 July.-Dec. 1873 | 590.52An7 ser.5 v.1 Jan-June.1878
124958       format                    Microfilm/Microfiche | Journal/Periodical
124958       id                        124958
124958       library_facet             Annex
124958       title_tsim                The Annals and magazine of natural history

820540       access_facet              In the Library
820540       call_number_dewey_ssm     929.4F719s 1958
820540       format                    Book
820540       id                        820540
820540       library_facet             Annex
820540       title_tsim                Slovarʹ i︠a︡ponskikh imën i familiĭ. / | Словарь японских имён и фамилий /
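
The listing above prints one id, field, value triple per line, with multi-valued fields joined by " | ". The Python sketch below turns lines in that layout into per-record dictionaries. It assumes the columns are separated by runs of two or more spaces; `parse_dump` is a hypothetical helper, not part of the project.

```python
import re
from collections import defaultdict

def parse_dump(text):
    """Group 'id  field  value' lines by id, splitting values on ' | '."""
    records = defaultdict(dict)
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines separate records
        # Assumed layout: columns separated by runs of 2+ spaces.
        rec_id, field, value = re.split(r"\s{2,}", line.strip(), maxsplit=2)
        records[rec_id][field] = value.split(" | ")
    return dict(records)

sample = """\
122553       call_number_dewey_ssm     170M366l 1844
122553       library_facet             Annex | Penn State Harrisburg
"""
parsed = parse_dump(sample)
print(parsed["122553"]["library_facet"])  # ['Annex', 'Penn State Harrisburg']
```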

@ruthtillman
Collaborator

ruthtillman commented Aug 4, 2021

Got it! I can work from these, but a slightly bigger sample would be helpful if possible. @mkutch, do you know if you can export 500 records that include a 949w == DEWEY?

(Update: I put a table of expectations at the top of the issue, but it will help to have more.)

@banukutlu
Contributor

banukutlu commented Aug 4, 2021

@ruthtillman There are more in dewey_more.txt. Note that the sample I used is not from an up-to-date extract, but I thought it would be good enough to give an idea. No cutting has been done on the Dewey numbers.

@ruthtillman
Collaborator

@banukutlu Perfect! They don't need to be 100% current; call numbers rarely change.

banukutlu changed the title from "Handle Dewey Call Numbers" to "Indentify Base Dewey Call Numbers" Aug 9, 2021
banukutlu changed the title from "Indentify Base Dewey Call Numbers" to "Identify Base Dewey Call Numbers" Aug 9, 2021
@banukutlu
Contributor

@ruthtillman Could you please review these results? dewey_lopped.txt

@mkutch
Contributor

mkutch commented Aug 9, 2021

@ruthtillman I emailed you the sample requested.

@ruthtillman
Collaborator

ruthtillman commented Aug 11, 2021

Only a few issues identified:

  • 160947: expected 860.9C891 but got 860.9C891 t.6-8 | 860.9C891 t.4-5 | 860.9C891 t.2-3 | 860.9C891 t.1, because we aren't accounting for "t.\d". I guess we should handle that in LC as well, with the same fix we used for "h" and "k", to be sure we're not cutting off part of a call number.
    • 285749: same issue, expecting 901H629 1955
    • 289504: same issue, expecting 914.6B64b S1921
    • 335268: same issue, expecting 843V65x
    • 338631: same issue, expecting 849.9L969b S1929
    • 717859: expected 915.1An24n
    • 788834: expected 843C993x
    • 815304: expected 848M294x
    • 818522: expected 849.109B181h
    • 898981: expected 891.78L839x
    • 1004907: expected 910.4Sa74v
    • 1192482: expected 843F844x
    • 1587251: expected 861G5885p
    • 2169149: expected 843D26x
  • 928964: getting "1st" in the result because of "1st May": 336.2P3864t 1st Mar.1953, scheme: DEWEY | 336.2P3864t 1st May 1953, which I assume are dates. If this can't be handled, it might have to be a really weird edge case we tackle later. This is not good practice, but the material is very old and in the annex, so updating it would be a PITA. (Edited: will be followed up by "Base call numbers should handle dates like 1st May yyyy", #364.)
  • I don't have an answer for 423287 but want to note it could be an issue later
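
The "t.\d" cases above can be sketched as a lopping pattern that only fires when "t." stands alone after whitespace and is followed by a digit, so that a "t" belonging to the call number itself is left intact. This is a minimal Python sketch under those assumptions; `T_DESIGNATOR` and `lop_t` are hypothetical names, and the regex is not the fix that was merged.

```python
import re

# Assumed pattern: whitespace, "t.", optional space, then a digit.
# Requiring the leading whitespace and the digit keeps a "t" that is part
# of the call number itself (e.g. "813St6t") from being treated as a designator.
T_DESIGNATOR = re.compile(r"\s+t\.\s*\d.*$")

def lop_t(call_number):
    """Drop a 't.<number>' designator and everything after it."""
    return T_DESIGNATOR.sub("", call_number)

holdings = ["860.9C891 t.6-8", "860.9C891 t.4-5", "860.9C891 t.2-3", "860.9C891 t.1"]
bases = sorted({lop_t(cn) for cn in holdings})
print(bases)  # ['860.9C891']
```

Call numbers without a standalone "t." followed by a digit, such as "813St6t" or "336.2P3864t 1st May 1953", pass through this pattern unchanged.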

@ruthtillman
Collaborator

Also of note: per testing on Adam's browse, the "t.\d" thing also affects LC, so we should apply whatever fix works here to that regex as well.

@banukutlu
Contributor

banukutlu commented Sep 1, 2021

@ruthtillman Sample results with "t." cut off: dewey_base_cn.txt

@ruthtillman
Collaborator

t. lopping looks good!
