This repository has been archived by the owner on Feb 4, 2020. It is now read-only.

YAZ - collecting data and printing them with PYMARC #115

Open
zurek11 opened this issue Apr 6, 2018 · 1 comment
zurek11 commented Apr 6, 2018

Hello. I have some simple data collected with YAZ commands:

yaz-client -m catalogue.dat

I am connecting to a library that serves records in MARC21 format with UTF-8 encoding, and I am saving the records to the file catalogue.dat. It's a Czech library, so the titles contain special characters such as Ř or Ě. When I run this code:

from django.http import HttpResponseRedirect
from pymarc import MARCReader

def get_books(request):
    with open('catalogue.dat', 'rb') as fh:
        reader = MARCReader(fh)
        for record in reader:
            print(str(record.title()))
    return HttpResponseRedirect('/')

the console prints this:

couldn't find 0xbe in g0=66 g1=69
Zelen©Ł kniha /
couldn't find 0xbe in g0=66 g1=69
Kniha p¿©Łtel /
Kniha ¿©Ưkadel /
Kniha poezie /
Kniha dn©Ư /
Kniha ¿©Ưkadel /
Kniha definic /
Kniha cest /
Kniha Frenesis /
Smoln©Ł kniha /
couldn't find 0xbe in g0=66 g1=69
couldn't find 0xbe in g0=66 g1=69
couldn't find 0xbe in g0=66 g1=69
couldn't find 0xaf in g0=66 g1=69

So basically there are two issues. First, why does it print the "couldn't find" errors? Second, why does it print the titles without the special characters? Thank you so much.

josephalway commented Dec 13, 2018

I believe it defaults to MARC-8 encoding. Leave your with open line as it is and try passing the encoding options to MARCReader instead (they are arguments of the reader, not of open()):
reader = MARCReader(fh, to_unicode=True, force_utf8=True)
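A possible explanation for the byte values in the error messages: if UTF-8 data is decoded as MARC-8, each byte of a multi-byte character is looked up on its own, and some of those bytes have no MARC-8 mapping. The sketch below uses only the standard library to show the UTF-8 bytes of a few Czech letters. The helper name utf8_hex is made up for illustration, and the letters are guesses at what the affected titles contain (for example "Zelená" and "dnů").

```python
def utf8_hex(text):
    """Return the UTF-8 bytes of text as a list of hex strings."""
    return [f"0x{b:02x}" for b in text.encode("utf-8")]

# Letters assumed to occur in the affected titles.
for ch in "áůž":
    print(ch, utf8_hex(ch))
```

The second byte of "ž" is 0xbe and the second byte of "ů" is 0xaf, which matches the values reported in the "couldn't find" messages. The two bytes of "á" (0xc3 0xa1) being decoded separately would also be consistent with "Zelená" showing up as "Zelen©Ł".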

From the MARCReader class docstring in the reader.py file:

If you find yourself in the unfortunate position of having data that
is utf-8 encoded without the leader set appropriately you can use
the force_utf8 parameter:

reader = MARCReader(file('file.dat'), to_unicode=True,
    force_utf8=True)
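The leader the docstring refers to can be checked without pymarc. In MARC21, the leader is the first 24 bytes of a record, and position 09 is the character coding scheme: "a" means UCS/Unicode, a blank means MARC-8. The following is a minimal sketch that looks only at the first record in the file; the function name declares_utf8 is hypothetical.

```python
def declares_utf8(path):
    """Return True if the first record's leader marks it as Unicode."""
    with open(path, "rb") as fh:
        leader = fh.read(24)  # a MARC21 leader is always 24 bytes long
    if len(leader) < 24:
        raise ValueError("file is too short to contain a MARC leader")
    # Leader position 09: b"a" = UCS/Unicode, b" " = MARC-8
    return leader[9:10] == b"a"
```

If this returns False for catalogue.dat even though the data is UTF-8, the file is in the situation the docstring describes, and force_utf8=True is the relevant option.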

I'm not sure that's the particular problem you're having, but it might help. You might need to remove the to_unicode=True argument that I recommended, though.
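Put together, reading the titles might look like the sketch below. It assumes pymarc is installed and that the file really contains UTF-8 data. The function names read_titles and join_title are made up for illustration, and the title is taken from subfields a and b of field 245 rather than from the record's title helper, since that helper's form differs between pymarc releases.

```python
def join_title(parts):
    """Join 245 subfield values into one display string."""
    return " ".join(p.strip() for p in parts if p and p.strip())

def read_titles(path):
    """Return the 245 $a/$b titles of all records in a MARC file."""
    # pymarc is a third-party package; it is imported here so that
    # join_title can be used without it.
    from pymarc import MARCReader

    titles = []
    with open(path, "rb") as fh:
        # The encoding options belong to MARCReader, not to open().
        reader = MARCReader(fh, to_unicode=True, force_utf8=True)
        for record in reader:
            if record is None:
                # Newer pymarc releases can yield None for records
                # they cannot parse; skip those.
                continue
            for field in record.get_fields("245"):
                titles.append(join_title(field.get_subfields("a", "b")))
    return titles
```

Calling read_titles('catalogue.dat') and printing each entry would replace the loop in the original view.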
