
Releases: cu-mkp/manuscript-object

v1.0-ronikaufman 2021-03-16

16 Mar 19:24
807b933

This branch was created from the "context" branch of Roni Kaufman's fork of manuscript-object (https://github.com/ronikaufman/manuscript-object/tree/context). The last commit was on Aug 3, 2020: ronikaufman@3a0aeab (https://github.com/ronikaufman/manuscript-object).

In spring 2020, Roni joined the Making and Knowing Project as an intern. This work was prepared to explore visualizations of data from https://github.com/cu-mkp/m-k-manuscript-data.

v1.0-danachaillard 2021-03-16

16 Mar 19:53
5abf324

This branch was created from the "master" branch of Dana Chaillard's fork of manuscript-object (https://github.com/danachaillard/manuscript-object). Last commit was on Aug 21, 2020: danachaillard@5fa06a4.

In spring 2020, Dana joined the Making and Knowing Project as an intern. This work was prepared to explore visualizations of data from https://github.com/cu-mkp/m-k-manuscript-data.

v2.0.1 2021-02-16

16 Feb 20:04
7093ddd

This release contains some bug fixes and usability improvements to the core derivative-generating code.

Most importantly:

  • Improved documentation and comments
  • Manuscript class constructors are more flexible in the inputs they can take (arbitrary versions)
  • Manuscript object can be mutated to include extra folios and entries after being initially created
  • Update outdated property tag names
  • Add tag to list of properties
  • Fix bug where derivative files were named incorrectly
  • Fix bug where divs with no id attribute would cause error
  • Fix major bug in update.py logic
  • Fix major bug where existing derivatives were not removed before writing new ones

Next steps/to do:

  • improved automated testing (see #51)
  • revisiting high-level design (see #76)
  • generally, moving on from derivative generation to data analysis and visualization

See also any open issues: https://github.com/cu-mkp/manuscript-object/issues

v2.0.0 2020-10-29

29 Oct 18:41
94d158d

Second major release of manuscript-object with significant changes to the core object code.

  • All of the central code is now in two files, with one auxiliary file (utils.py).
  • update.py usage remains unchanged
  • The titular "manuscript object" is now a class called Manuscript inside a file called manuscript.py. Each entry is turned into an object of the Entry class inside a file called entry.py.

Some highlights:

  • much faster (update_entries() is, as before, the longest step)
  • increased verbosity during generation
  • manuscript and entry modules are importable and interactable
  • Manuscript and Entry classes control their own behavior
    • e.g. generating and updating derivatives happens inside the Manuscript class
    • update.py works as before, but simply calls the update methods inside Manuscript
    • this means if you want to generate the derivative output in a Python shell and interact with it as a string or table, you can do so by importing manuscript and running one of the derivative generation methods
    • derivative generation takes place in 2 steps: generation and then writing. This enables checks for correctness before writing to disk
  • All xml is converted to lxml.etree objects for easier and more consistent parsing
  • text renditions of editorial tags are created using an XSLT stylesheet
    • this stylesheet takes parameters, so if you don't want to render del tags as <-TEXT->, for example, you can just set that to "false()"
  • Where possible, functions are reused rather than duplicated to make bugs easier to track down; e.g., there is only one function that converts a string to an lxml.etree Element.
  • the Entry class is very flexible:
    • there are different methods to take a valid lxml.etree Element, a string of well-formed XML, or a filepath to a valid XML file
    • folio and identity arguments are optional
    • only one version of each entry is given at a time (handling tc, tcn, and tl versions is done by the Manuscript object, not the Entry)
    • to test or inspect the contents of a txt or xml file without opening it manually, you can load it as an Entry object in a Python shell and look at its text and properties there
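
The ideas in the list above (flexible construction from a string or a file, and generating a derivative in memory before writing it) can be sketched roughly as follows. This is a minimal illustration, not the repository's code: `EntrySketch`, `from_string`, `from_file`, `generate_txt`, `write_derivatives`, and the sample id `p001r_1` are all hypothetical names, and the standard library's `xml.etree.ElementTree` stands in for lxml so that the sketch runs without extra dependencies.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


class EntrySketch:
    """Hypothetical stand-in for an entry; NOT the repository's Entry class."""

    def __init__(self, element, folio="", identity=""):
        self.element = element
        self.folio = folio        # optional, as in the release notes
        self.identity = identity  # optional, as in the release notes

    @classmethod
    def from_string(cls, xml_string, **kwargs):
        # The single place where a string is turned into an element.
        return cls(ET.fromstring(xml_string), **kwargs)

    @classmethod
    def from_file(cls, path, **kwargs):
        # Reuses from_string instead of duplicating the parsing logic.
        return cls.from_string(Path(path).read_text(encoding="utf-8"), **kwargs)

    @property
    def text(self):
        # Plain-text content of the entry, with all markup dropped.
        return "".join(self.element.itertext())


def generate_txt(entries):
    """Step 1: build the derivative in memory, keyed by output filename."""
    return {f"{e.identity or 'unknown'}.txt": e.text for e in entries}


def write_derivatives(derivatives, out_dir):
    """Step 2: write to disk, after the in-memory result has been checked."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, content in derivatives.items():
        (out_dir / name).write_text(content, encoding="utf-8")


# Hypothetical sample entry.
entry = EntrySketch.from_string(
    '<div id="p001r_1"><ab>Sand <m>molds</m></ab></div>', identity="p001r_1"
)
derivatives = generate_txt([entry])

# A correctness check between the two steps: refuse to write empty output.
assert all(derivatives.values())

with tempfile.TemporaryDirectory() as tmp:
    write_derivatives(derivatives, tmp)
    written = sorted(p.name for p in Path(tmp).iterdir())

print(entry.text)  # Sand molds
print(written)     # ['p001r_1.txt']
```

Keeping generation separate from writing means the in-memory result can be inspected in a shell, or rejected, before anything on disk changes.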

To do:

see also any open issues: https://github.com/cu-mkp/manuscript-object/issues

2020-09-28 v1.0.1

28 Sep 20:45
efd4b22

Patch release fixing bug described in issue #42 "tail text in entry-metadata.csv" (see also: cu-mkp/m-k-manuscript-data#1909, "Exclude lxml tails from find_tagged_terms outputs")

https://github.com/cu-mkp/manuscript-object/blob/efd4b221edccc5e7b01e0c494dd02b71bb8f1025/recipe.py#L106

        return [et.tostring(tag, method="text", encoding="utf-8", with_tail=False).decode().replace("\n", " ") for tag in tags]
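
To show what "tail" text is and why excluding it matters, here is a minimal sketch. It uses the standard library's `xml.etree.ElementTree` as a stand-in for lxml so that it runs without extra dependencies; the sample XML and the tag name `m` are assumptions made for illustration. In ElementTree, serializing an element with `method="text"` includes the text that follows its closing tag, while joining `itertext()` does not, which mirrors the effect of passing `with_tail=False` to lxml's `tostring`.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the text after </m> is the "tail" of the <m> element.
sample = "<ab>Take <m>sand</m> from the river.</ab>"
root = ET.fromstring(sample)
tags = root.findall(".//m")

# Serializing as text includes the tail that follows the closing tag...
with_tail = [ET.tostring(tag, method="text", encoding="unicode") for tag in tags]

# ...whereas itertext() covers only the element's own content.
without_tail = ["".join(tag.itertext()) for tag in tags]

print(with_tail)     # ['sand from the river.']
print(without_tail)  # ['sand']
```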

2020-09-14 v1.0.0

14 Sep 21:50

Major release of manuscript-object repository.

The BnF class represents a Python object version of BnF Ms 640. It contains a list of Recipe objects, which hold the raw XML data from each entry along with some other data such as length and properties.

When BnF is instantiated, it loads every folio in ms-xml and processes it into its component entries, each of which becomes a Recipe object. ms-xml is a folder in the data repository, m-k-manuscript-data.

update.py is a script that generates the BnF object and then writes derivative forms and the entry-metadata table to the m-k-manuscript-data repository.
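
As a rough illustration of the folio-to-entries step described above, the sketch below splits one folio's XML into per-entry objects. Everything here is hypothetical: `RecipeSketch`, `recipes_from_folio`, the sample ids, and the assumption that each entry is a `div` element carrying an `id` attribute are not taken from the repository's code, and the standard library's `xml.etree.ElementTree` is used in place of lxml.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class RecipeSketch:
    """Hypothetical stand-in for a Recipe: raw XML plus a derived value."""
    identity: str
    raw_xml: str
    length: int


def recipes_from_folio(folio_xml):
    """Split one folio into entries, assuming each entry is a <div> with an id."""
    root = ET.fromstring(folio_xml)
    recipes = []
    for div in root.iter("div"):
        identity = div.get("id")
        if identity is None:
            continue  # assumption of this sketch: divs without an id are skipped
        text = "".join(div.itertext())
        raw = ET.tostring(div, encoding="unicode")
        recipes.append(RecipeSketch(identity, raw, len(text)))
    return recipes


# Hypothetical folio with two entries.
folio = (
    '<root><div id="a_1"><ab>Sand</ab></div>'
    '<div id="a_2"><ab>Molds</ab></div></root>'
)
recipes = recipes_from_folio(folio)
print([(r.identity, r.length) for r in recipes])  # [('a_1', 4), ('a_2', 5)]
```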

Known issues:

  • del tags are unmarked in derivative files [issue]
  • test/ folder is just a dumping ground for entry-metadata.csv files; it can be removed

See issues tracker for other ongoing issues and feature requests.