kosha texts mangled upon export #503
Comments
AFAIK line breaks (without an empty line in between) are not considered significant, and the proofreader instructions (probably/hopefully) also say that. So the behaviour (line breaks ignored, and everything wrapped into a paragraph) is kind of expected. (Consider: we don't want the other line breaks to be reflected in the text.) So IMO the bug here is not quite “kosha texts mangled upon export” but rather something like “a particular proofreader came up with an ad-hoc convention in which the first line break in each paragraph is significant, because it indicates the headword in this text, and the rest of the line breaks are not, and the backend was not aware of this convention”. But going even further, rather than saying this is user error / blaming it on the user, IMO the fix is for the proofreader UI to have a “rendered” (“preview”) mode (maybe even shown by default), so that any proofreader, as they work through a text, will always see the effect of their conventions: how the text they prepared will be seen by readers eventually. (Of course I think of this issue as another point in favour of the rich-editor / ProseMirror idea :P, though I admit that this could be hacked around even without it….)
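To make the two readings concrete, here is a minimal Python sketch. It is not the project's actual export code: the function names `split_paragraphs` and `split_entries` and the sample text are hypothetical, made up for illustration. The first function applies the expected rule (blank lines separate paragraphs, single line breaks are ignored); the second applies the proofreader's ad-hoc convention (the first line of each paragraph is the headword).

```python
import re


def _blocks(text: str) -> list[list[str]]:
    """Split text on blank lines; return the non-empty, stripped lines of each block."""
    blocks = []
    for raw in re.split(r"\n\s*\n", text.strip()):
        lines = [line.strip() for line in raw.splitlines() if line.strip()]
        if lines:
            blocks.append(lines)
    return blocks


def split_paragraphs(text: str) -> list[str]:
    """Expected behaviour: single line breaks are not significant."""
    return [" ".join(lines) for lines in _blocks(text)]


def split_entries(text: str) -> list[tuple[str, str]]:
    """Assumed convention: the first line of each block is the headword,
    and the remaining lines are joined into the entry body."""
    return [(lines[0], " ".join(lines[1:])) for lines in _blocks(text)]


# Made-up sample, not taken from the kosha text in this issue.
sample = "agni\nfire, the god\nof fire\n\nvayu\nwind"

print(split_paragraphs(sample))
print(split_entries(sample))
```

With the first rule the headword is folded into the running paragraph, which matches the export described in this issue; with the second it stays separate. A preview mode would show the proofreader which of the two the backend actually applies.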
I bet that the proofreader just followed @suhasm's wise instruction (given his experience with similar dict files and the need for headwords without vibhakti-pratyaya)! So the real fix would be to come up with better support (conventions and markup) for dictionary books, given the importance @suhasm (rightly) gives to domain-specific dicts.
Right, that too :) The issues I see:
Observe how the proofreader has marked headwords in this dict:
Neither the txt nor the TEI-XML dump has that. Rather, we get an amorphous blob like:
cc @suhasm