Releases: cu-mkp/manuscript-object
v1.0-ronikaufman 2021-03-16
This branch was created from the "context" branch of Roni Kaufman's fork of manuscript-object (https://github.com/ronikaufman/manuscript-object/tree/context). Last commit was on Aug 3, 2020: ronikaufman@3a0aeab.
In spring 2020, Roni joined the Making and Knowing Project as an intern. This work was prepared to explore visualizations of data from https://github.com/cu-mkp/m-k-manuscript-data.
v1.0-danachaillard 2021-03-16
This branch was created from the "master" branch of Dana Chaillard's fork of manuscript-object (https://github.com/danachaillard/manuscript-object). Last commit was on Aug 21, 2020: danachaillard@5fa06a4.
In spring 2020, Dana joined the Making and Knowing Project as an intern. This work was prepared to explore visualizations of data from https://github.com/cu-mkp/m-k-manuscript-data.
v2.0.1 2021-02-16
This release contains some bug fixes and usability improvements to the core derivative-generating code.
Most importantly:
- Improved documentation and comments
- Manuscript class constructors are more flexible in the inputs they accept (arbitrary versions)
- Manuscript object can be mutated to include extra folios and entries after being initially created
- Update outdated property tag names
- Add tag to list of properties
- Fix bug where derivative files were named incorrectly
- Fix bug where divs with no id attribute would cause error
- Fix major bug in update.py logic
- Fix major bug where existing derivatives were not removed before writing new ones
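The last fix in the list above is about stale output: if derivatives from a previous run are not deleted, a file that should no longer exist can linger next to the freshly generated ones. A minimal sketch of the clear-then-write pattern, assuming derivatives are flat `.txt` files in a single folder; `write_derivatives` and the entry names are hypothetical, not the repository's actual code:

```python
import tempfile
from pathlib import Path
from typing import Dict


def write_derivatives(out_dir: Path, derivatives: Dict[str, str], suffix: str = ".txt") -> None:
    """Hypothetical helper: replace every *suffix file in out_dir with the given derivatives.

    derivatives maps a filename stem (e.g. an entry id) to that file's text.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    # Remove every existing file with this suffix first, so output from a
    # previous run cannot linger alongside the new files
    for old in out_dir.glob(f"*{suffix}"):
        old.unlink()
    # Then write the freshly generated set
    for stem, text in derivatives.items():
        (out_dir / f"{stem}{suffix}").write_text(text, encoding="utf-8")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp)
        write_derivatives(out, {"entry_a": "old", "entry_b": "old"})
        write_derivatives(out, {"entry_a": "new"})  # entry_b is no longer generated
        print(sorted(p.name for p in out.iterdir()))  # ['entry_a.txt']
```

Deleting before writing (rather than only overwriting) is what guarantees that the folder reflects exactly one generation run.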
Next steps/to do:
- improved automated testing (see #51)
- revisiting high-level design (see #76)
- generally, moving on from derivative generation to data analysis and visualization
See also any open issues: https://github.com/cu-mkp/manuscript-object/issues
v2.0.0 2020-10-29
Second major release of manuscript-object with significant changes to the core object code.
- All of the central code is now in two files, with one auxiliary file (utils.py).
- update.py usage remains unchanged
- The titular "manuscript object" is now a class called Manuscript inside a file called manuscript.py. Each entry is turned into an object of the Entry class inside a file called entry.py.
Some highlights:
- much faster (update_entries() is, as before, the longest step)
- increased verbosity during generation
- manuscript and entry modules are importable and interactable
- Manuscript and Entry classes control their own behavior
  - e.g., generating and updating derivatives happen inside the Manuscript class
  - update.py works as before, but simply calls the update methods inside Manuscript
  - this means that to generate derivative output in a Python shell and work with it as a string or table, you can import manuscript and call one of the derivative generation methods
- derivative generation takes place in two steps, generation and then writing, so output can be checked for correctness before it is written to disk
- All XML is converted to lxml.etree objects for easier and more consistent parsing
- text renditions of editorial tags are created using an XSLT stylesheet
  - this stylesheet takes parameters; if you don't want del tags rendered as <-TEXT->, for example, set the corresponding parameter to "false()"
- Where possible, functions are reused rather than duplicated so that bugs only need to be found and fixed in one place; e.g., there is only one function that converts a string to an lxml.etree Element.
- the Entry class is very flexible:
  - separate methods accept a valid lxml.etree Element, a string of well-formed XML, or a filepath to a valid XML file
  - folio and identity arguments are optional
  - only one version of each entry is given at a time (the tc, tcn, and tl versions are handled by the Manuscript object, not the Entry)
  - to test or inspect the contents of a txt or XML file without opening it manually, you can load it as an Entry object in a Python shell and look at its text and properties there
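The two-step design (generate, then write) means output can be checked while it is still a string in memory. A minimal stdlib sketch of that idea; `generate_xml_derivatives` and `find_malformed` are hypothetical names, and the well-formedness check is only an assumed example of what a "check for correctness" might be, not the project's actual validation:

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def generate_xml_derivatives(entries: Dict[str, str]) -> Dict[str, str]:
    """Step 1 (hypothetical): build every derivative in memory, keyed by filename."""
    return {f"{entry_id}.xml": f'<entry id="{entry_id}">{body}</entry>'
            for entry_id, body in entries.items()}


def find_malformed(derivatives: Dict[str, str]) -> List[str]:
    """Check between the steps: list filenames whose content is not well-formed XML."""
    bad = []
    for name, text in derivatives.items():
        try:
            ET.fromstring(text)
        except ET.ParseError:
            bad.append(name)
    return bad


derivatives = generate_xml_derivatives({"e1": "<ab>gold</ab>", "e2": "<ab>silver"})
problems = find_malformed(derivatives)
print(problems)  # ['e2.xml']
# Step 2 (writing to disk) would run only if problems is empty
```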
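What the del-tag stylesheet parameter controls can be pictured with a plain-Python stand-in. This is not XSLT and not the project's stylesheet: it is a minimal sketch using `xml.etree.ElementTree`, where the hypothetical `render_del` flag plays the role of the parameter, and it assumes that switching the flag off keeps deleted text but leaves it unmarked:

```python
import xml.etree.ElementTree as ET


def render_text(elem: ET.Element, render_del: bool = True) -> str:
    """Flatten elem to plain text; when render_del is True, wrap <del> content as <-TEXT->."""
    parts = [elem.text or ""]
    for child in elem:
        inner = render_text(child, render_del)
        if child.tag == "del":
            # Assumption: with the switch off, deleted text is kept but left unmarked
            parts.append(f"<-{inner}->" if render_del else inner)
        else:
            parts.append(inner)
        # A child's tail is text that follows it inside elem, so it is added here
        parts.append(child.tail or "")
    return "".join(parts)


ab = ET.fromstring("<ab>take <del>gold</del> silver</ab>")
print(render_text(ab))                    # take <-gold-> silver
print(render_text(ab, render_del=False))  # take gold silver
```

In the real pipeline the same choice is made by passing a parameter to the XSLT stylesheet, so the rendering rules stay in one file instead of in Python branches.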
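Accepting an Element, an XML string, or a file path, with a single shared string-to-Element function, is a common Python pattern of alternate constructors built from classmethods. A minimal sketch under stated assumptions: `EntrySketch`, `str_to_element`, `from_string`, and `from_file` are hypothetical names, and `xml.etree.ElementTree` stands in for lxml; this is not the module's actual API:

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Optional, Union


def str_to_element(xml: str) -> ET.Element:
    """The one place that turns a string into an Element, so parsing fixes happen once."""
    return ET.fromstring(xml)


class EntrySketch:
    """Hypothetical stand-in for Entry: holds one version of one entry."""

    def __init__(self, elem: ET.Element, folio: Optional[str] = None,
                 identity: Optional[str] = None):
        self.elem = elem
        self.folio = folio        # optional
        self.identity = identity  # optional

    @classmethod
    def from_string(cls, xml: str, **kwargs) -> "EntrySketch":
        return cls(str_to_element(xml), **kwargs)

    @classmethod
    def from_file(cls, path: Union[str, Path], **kwargs) -> "EntrySketch":
        # Read the file, then reuse the string path so there is one parsing route
        return cls.from_string(Path(path).read_text(encoding="utf-8"), **kwargs)

    @property
    def text(self) -> str:
        # Plain text of the entry, including the text of nested tags
        return "".join(self.elem.itertext())


entry = EntrySketch.from_string("<div><ab>Take <m>silver</m></ab></div>")
print(entry.text)   # Take silver
print(entry.folio)  # None
```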
To do:
- implementing more automated spot- and unit-tests
- sophisticated search function for Manuscript
- ensure type annotations are useful and correct (e.g., specificity of "xml"); see their use in https://github.com/cu-mkp/manuscript-object/blob/94d158d814bf9a62071a11845a9b2938d561ab3e/entry.py#L10
- optional arguments to Manuscript specifying which entries you want to generate
- function to inspect the context around a particular term
- visualization engine
- thesaurus
see also any open issues: https://github.com/cu-mkp/manuscript-object/issues
v1.0.1 2020-09-28
Patch release fixing bug described in issue #42 "tail text in entry-metadata.csv" (see also: cu-mkp/m-k-manuscript-data#1909, "Exclude lxml tails from find_tagged_terms outputs")
# et is lxml.etree; with_tail=False keeps each tag's trailing text out of the result
return [et.tostring(tag, method="text", encoding="utf-8", with_tail=False).decode().replace("\n", " ") for tag in tags]
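The bug comes from how ElementTree-style trees store text: anything after a closing tag belongs to that tag's `tail`, so serializing a tagged term together with its tail drags in the rest of the sentence. A stdlib-only sketch with an invented sample; `"".join(tag.itertext())` is used here as an assumed substitute for lxml's `with_tail=False`, since `xml.etree.ElementTree.tostring` has no such argument:

```python
import xml.etree.ElementTree as ET

# Invented sample: " and heat it" is stored as the tail of <m>, not as part of it
entry = ET.fromstring("<ab>Take <m>silver</m> and heat it</ab>")
tags = entry.findall(".//m")

# Roughly what the bug produced: the term plus whatever text followed its closing tag
with_tail = ["".join(tag.itertext()) + (tag.tail or "") for tag in tags]
# What the fix produces: itertext() never yields the element's own tail
without_tail = ["".join(tag.itertext()).replace("\n", " ") for tag in tags]

print(with_tail)     # ['silver and heat it']
print(without_tail)  # ['silver']
```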
v1.0.0 2020-09-14
Major release of manuscript-object repository.
The BnF class represents a Python object version of BnF Ms 640. It contains a list of Recipe objects, which hold the raw XML data from each entry along with some other data such as length and properties.
When BnF is instantiated, it loads every folio in ms-xml and processes it into its component entries, each of which becomes a Recipe object. ms-xml is a folder in m-k-manuscript-data, the repository containing the data.
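Loading folios and splitting them into entries can be sketched as follows. This is a minimal illustration under loudly stated assumptions, not the BnF class itself: it assumes each folio is an `.xml` file whose entries are `<div id="...">` elements, uses `xml.etree.ElementTree` instead of whatever parser v1.0.0 used, and the names `RecipeSketch`, `element_to_string`, and `load_recipes` (and "length" as a plain-text character count) are hypothetical:

```python
import copy
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass
class RecipeSketch:
    """Hypothetical stand-in for Recipe: one entry's raw XML plus derived data."""
    identity: str
    folio: str
    xml: str
    length: int  # assumed here to be the character count of the entry's plain text


def element_to_string(elem: ET.Element) -> str:
    # ElementTree has no with_tail=False, so serialize a copy with its tail cleared
    clone = copy.copy(elem)
    clone.tail = None
    return ET.tostring(clone, encoding="unicode")


def load_recipes(folio_dir: Path) -> List[RecipeSketch]:
    """Parse every folio file and split it into entries (assumed: <div id="..."> elements)."""
    recipes = []
    for path in sorted(folio_dir.glob("*.xml")):
        root = ET.parse(path).getroot()
        # The [@id] predicate skips divs without an id instead of raising an error
        for div in root.findall(".//div[@id]"):
            text = "".join(div.itertext())
            recipes.append(RecipeSketch(div.get("id"), path.stem, element_to_string(div), len(text)))
    return recipes
```

Keeping one object per entry, with the folio recorded on it, is what lets later steps write per-entry derivatives and a one-row-per-entry metadata table.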
update.py is a script that generates the BnF object and then writes derivative forms and the entry-metadata table to the m-k-manuscript-data repository.
Known issues:
- del tags are unmarked in derivative files [issue]
- see m-k-manuscript-data issue tracker for other issues related to derivative files
- the test/ folder is just a dumping ground for entry-metadata.csv files; it can be removed
See issues tracker for other ongoing issues and feature requests.