OT logs as XML #4340

kirkilj · 2023-12-08T16:13:44Z

kirkilj
Dec 8, 2023

This might be as good a place as any to mention that a while ago I tried to write a god-awful Python script, parse_ot_log.py.txt, with a god-awful regex to parse OT error messages into distinct fields. My latest attempt parses the messages.xml file and uses it to build a regex pattern for parsing a log's actual messages. It's a work in progress, to say the least: it works on some message patterns and not on others, such as messages in which a variable %n is repeated. It's fixable, but I don't have time to invest in it any further.
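A minimal sketch of the template-to-regex idea, assuming message templates use numbered placeholders such as %1 and %2. The layout of messages.xml is not shown in this thread, so reading the templates out of that file is left out. The repeated-%n case is handled by turning the first occurrence of each placeholder into a named group and every later occurrence into a backreference. `template_to_regex` and `parse_message` are hypothetical helper names.

```python
import re
from typing import Dict, Optional

# Numbered placeholders such as %1, %2, ... (assumed template convention).
PLACEHOLDER = re.compile(r"%(\d+)")


def template_to_regex(template: str) -> re.Pattern:
    """Compile a message template into a regex with one named group per placeholder.

    The first %n becomes a lazy named group (?P<pn>.+?); every later %n with the
    same number becomes a backreference (?P=pn), so a repeated variable must
    match the same text each time.
    """
    parts = []
    seen = set()
    pos = 0
    for m in PLACEHOLDER.finditer(template):
        parts.append(re.escape(template[pos:m.start()]))  # literal text before %n
        name = f"p{m.group(1)}"
        if name in seen:
            parts.append(f"(?P={name})")  # repeated variable: must match the same text
        else:
            parts.append(f"(?P<{name}>.+?)")
            seen.add(name)
        pos = m.end()
    parts.append(re.escape(template[pos:]))  # trailing literal text
    return re.compile("".join(parts))


def parse_message(template: str, text: str) -> Optional[Dict[str, str]]:
    """Return the placeholder values if `text` is an instance of `template`, else None."""
    m = template_to_regex(template).fullmatch(text)
    return m.groupdict() if m else None
```

With a made-up template, `parse_message("File %1 references %2, but %1 is not in the map.", "File a.dita references b.dita, but a.dita is not in the map.")` returns `{'p1': 'a.dita', 'p2': 'b.dita'}`, and the same call returns `None` if the second file name differs from the first. Two placeholders with no literal text between them stay ambiguous: the lazy groups give the first one as little text as possible.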

Some messages mention no file paths, while others mention one or possibly two. Has there been any discussion of also providing an XML version of the OT log, so we don't have to infer structure from raw text that had a structure to begin with? Or is an OT log schema out of the question as a separate effort? My intention is to save a structured representation of these logs in our CI/CD environment so our Information Architecture group can ingest them into a datastore and run analytics on them, if not some ML experiments at some point.

Is it worth adding a log option to OT to generate a structured OT log file?

chrispy-snps · 2023-12-22T13:46:32Z

chrispy-snps
Dec 22, 2023

@kirkilj - interestingly enough, my company (a software company) is having this same internal discussion for the software products we produce.

Our software tools can produce hundreds of thousands of messages, and there are thousands of unique message types. Parsing is difficult because some messages are single-line, some are multi-line, the termination of each message block is not consistent, etc. Fortunately, we already have a robust message subsystem implemented in our tools, so it's not too difficult to dump the messages in a different format (such as XML or JSON) to allow for machine processing.

In terms of the UI, I see we have the following options:

$ ~/dita-ot/bin/dita --help | rg -i log
  -d, --debug                                      Enable debug logging
  -l <file>, --logfile=<file>                      Write log messages to file
  -v, --verbose                                    Enable verbose logging

Originally I thought we could have an additional option to specify the format of the --logfile log file. But --logfile causes the terminal output to become quiet, and the behavior I would want (I think?) would be to have regular logging to the terminal, plus structured logging to some additional file.
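The behavior described here (human-readable output on the terminal plus a structured copy in a separate file) is the usual "two handlers" logging pattern. A minimal sketch using Python's standard logging module, as an illustration of the idea rather than anything DITA-OT or Ant provides; the JSON Lines output format and the field names msg_id, file, line and column are assumptions.

```python
import json
import logging


class JsonLinesFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line (JSON Lines)."""

    # Optional structured fields callers may pass via `extra=` (hypothetical names).
    EXTRA_FIELDS = ("msg_id", "file", "line", "column")

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "message": record.getMessage(),
        }
        for key in self.EXTRA_FIELDS:
            if hasattr(record, key):
                entry[key] = getattr(record, key)
        return json.dumps(entry)


def make_logger(structured_path: str, name: str = "build") -> logging.Logger:
    """Logger that writes plain text to the terminal and JSON Lines to a file."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate output via the root logger
    if not logger.handlers:  # don't stack handlers on repeated calls
        console = logging.StreamHandler()  # writes to stderr by default
        console.setFormatter(logging.Formatter("[%(levelname)s] %(message)s"))
        structured = logging.FileHandler(structured_path, encoding="utf-8")
        structured.setFormatter(JsonLinesFormatter())
        logger.addHandler(console)
        logger.addHandler(structured)
    return logger
```

For example, `log = make_logger("build-log.jsonl")` followed by `log.warning("Cannot find %s", "a.dita", extra={"msg_id": "EXAMPLE001W", "file": "a.dita"})` prints `[WARNING] Cannot find a.dita` to the terminal and appends one JSON object with time, level, message, msg_id and file to build-log.jsonl. The terminal output stays unchanged, which is the point: the structured file is an extra sink, not a replacement.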


raducoravu · 2024-01-03T07:52:59Z

raducoravu
Jan 3, 2024
Collaborator

We use a similar approach (regular expressions applied to console lines) to extract relevant errors (message plus line/column/resource info) from the DITA OT console output and display them in a separate Problems list. I would also like an easier way to extract individual problems from the DITA OT console output, but I do not have a proposal for how that could be achieved.
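A sketch of that kind of console-line regex, assuming lines shaped like `[pipeline] file:/C:/docs/topic.dita:8:63: [DOTX023W][WARN]: message text`: an optional Ant task prefix, an optional resource:line:column: location, then a bracketed message ID and severity. That line shape and the four-letters, three-digits, one-letter ID pattern are assumptions; check them against the lines your DITA OT version actually prints. `parse_problem` and `iter_problems` are hypothetical names.

```python
import re
from typing import Iterable, Iterator, Optional

# Assumed console-line shape (verify against real DITA OT output):
#   [pipeline] file:/C:/docs/topic.dita:8:63: [DOTX023W][WARN]: message text
PROBLEM = re.compile(
    r"^\s*(?:\[[\w-]+\]\s+)?"                                      # optional Ant task prefix, e.g. [pipeline]
    r"(?:(?P<resource>\S+?):(?P<line>\d+):(?P<column>\d+):\s+)?"   # optional resource:line:column:
    r"\[(?P<id>[A-Z]{4}\d{3}[A-Z])\]"                              # message ID, e.g. DOTX023W
    r"\[(?P<severity>[A-Z]+)\]:?\s*"                               # severity, e.g. WARN or ERROR
    r"(?P<message>.*)$"
)


def parse_problem(line: str) -> Optional[dict]:
    """Split one console line into fields, or return None if it is not a problem line."""
    m = PROBLEM.match(line.rstrip("\r\n"))
    if m is None:
        return None
    fields = m.groupdict()
    # Convert line/column to integers when a location was present.
    for key in ("line", "column"):
        if fields[key] is not None:
            fields[key] = int(fields[key])
    return fields


def iter_problems(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed problems, skipping lines that do not match."""
    for line in lines:
        problem = parse_problem(line)
        if problem is not None:
            yield problem
```

Lines that do not match, such as progress messages or the continuation lines of a multi-line message, come back as `None`. Attaching continuation lines to the preceding problem needs a rule for where a message ends, which the console text does not reliably provide.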


kirkilj · 2024-01-03T16:14:40Z

kirkilj
Jan 3, 2024
Author

@jelovirt, thoughts?

1 reply

jelovirt Jan 5, 2024
Maintainer

Ant lets you add extra build listeners and loggers, and those can generate output in other formats, e.g. XML. You can use

dita -logger org.apache.tools.ant.XmlLogger -v …

To get log output in XML format:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="log.xsl"?>

<build time="1 second">
  …
  <target name="gen-list" time="0 seconds">
    <task location="/Users/jelovirt/dita-ot/plugins/org.dita.base/build_preprocess.xml:91: " name="pipeline" time="0 seconds">
      <message priority="info"><![CDATA[Processing file:/Volumes/tmp/src/root.ditamap]]></message>
      <message priority="info"><![CDATA[Processing file:/Volumes/tmp/src/topic.md]]></message>
    </task>
  </target>
  …
</build>

Note that the XML logger that ships with Ant performs poorly: it builds the whole log as a DOM in memory and only serializes the DOM to the output stream at the end of the build. I would not use it in production.
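On the reading side, the XmlLogger output can at least be consumed without loading it all at once. A sketch using `xml.etree.ElementTree.iterparse`, based on the element structure shown in the sample (build, target, task, message, with `name`, `location` and `priority` attributes). It assumes Ant writes `error`, `warn`, `info` and `debug` as priority values, which I believe is what XmlLogger does; verify against your own log. This does nothing about the logger's own memory use while writing. `iter_messages` and `LogMessage` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, List, NamedTuple, Optional, Set


class LogMessage(NamedTuple):
    priority: str            # priority attribute, e.g. "info"
    target: Optional[str]    # name of the enclosing <target>, if any
    location: Optional[str]  # location attribute of the innermost enclosing <task>, if any
    text: str                # message text (the CDATA content)


def iter_messages(source, priorities: Optional[Set[str]] = None) -> Iterator[LogMessage]:
    """Stream <message> elements from an Ant XmlLogger file.

    `source` is a file name or binary file object. If `priorities` is given,
    only messages whose priority attribute is in that set are yielded.
    """
    targets: List[Optional[str]] = []  # stack of enclosing target names
    tasks: List[Optional[str]] = []    # stack of enclosing task locations (tasks can nest)
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            # Attributes are already available on "start" events; text may not be.
            if elem.tag == "target":
                targets.append(elem.get("name"))
            elif elem.tag == "task":
                tasks.append(elem.get("location"))
        elif elem.tag == "message":
            priority = elem.get("priority", "")
            if priorities is None or priority in priorities:
                yield LogMessage(
                    priority,
                    targets[-1] if targets else None,
                    tasks[-1] if tasks else None,
                    (elem.text or "").strip(),
                )
            elem.clear()  # free the message text once it has been read
        elif elem.tag == "task":
            tasks.pop()
            elem.clear()
        elif elem.tag == "target":
            targets.pop()
            elem.clear()
```

Given the path of a saved XML log, `for m in iter_messages(path, {"warn", "error"}): print(m.priority, m.location, m.text)` lists only warnings and errors together with the build file location of the task that produced them.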

stefan-jung · 2024-01-19T15:16:07Z

stefan-jung
Jan 19, 2024

@kirkilj we are now using another approach. We build with GitHub Actions, which has a nice feature called job summaries. You can style the summary with Markdown, and it's quite a nice way to make the log something that is "nice to read and analyze" for a technical writer.

NOTE: We only run preprocessing on the server and skip the HTML/PDF part completely, because the interesting errors occur in preprocessing, and this way we can stick to the "normal" GitHub runners. In our case, PDF builds fail on those and require GitHub Large Runners.

During the build, we tee the verbose log.

      - name: 🚀 Preprocess all deliverables
        run: dita-ot/bin/dita --project=my-docs.xml --processing-mode=strict --verbose | tee my-docs.log

Then we use sed to flush away everything that is not interesting.

      - name: 🚽 Flush away unnecessary logs
        run: |
          # Keep only lines which contain certain keywords.
          sed -i '/deliverable\|FATAL\|ERROR\|WARNING\|DOTA\|DOTJ\|DOTX\|PDFX/!d' my-docs.log

We do some more stuff, but the principle is the same. Afterwards, we load this file into the summary.

      - name: 🏁 Summarize build
        run: |
          echo "### Summary :rocket:" >> "$GITHUB_STEP_SUMMARY"
          cat my-docs.log >> "$GITHUB_STEP_SUMMARY"
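Plain consecutive lines can end up joined into a single paragraph when the summary is rendered as Markdown, so it may help to post-process the filtered log first. A sketch of such a step in Python that counts lines per severity keyword (the keywords come from the sed filter in the workflow) and emits one bullet per log line. The script name summarize_log.py is hypothetical; GITHUB_STEP_SUMMARY is the environment variable GitHub Actions sets to the file that backs the job summary.

```python
import os
import sys
from collections import Counter
from typing import Iterable

# Severity keywords, taken from the sed filter in the workflow.
SEVERITIES = ("FATAL", "ERROR", "WARNING")


def summarize(lines: Iterable[str]) -> str:
    """Build a Markdown job summary: a count table followed by one bullet per log line."""
    counts: Counter = Counter()
    bullets = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        # Count each line once, under the first keyword it contains.
        severity = next((s for s in SEVERITIES if s in line), None)
        if severity is not None:
            counts[severity] += 1
        # Wrap in backticks so paths and IDs render verbatim; replace any
        # backticks in the line so they cannot close the code span early.
        bullets.append("- `" + line.replace("`", "'") + "`")
    table = ["### Summary", "", "| Severity | Count |", "| --- | --- |"]
    table += [f"| {s} | {counts[s]} |" for s in SEVERITIES]
    return "\n".join(table + [""] + bullets) + "\n"


if __name__ == "__main__":
    log_path = sys.argv[1] if len(sys.argv) > 1 else "my-docs.log"
    with open(log_path, encoding="utf-8") as f:
        markdown = summarize(f)
    # Inside GitHub Actions, append to the job summary; elsewhere, print to stdout.
    summary_path = os.environ.get("GITHUB_STEP_SUMMARY")
    if summary_path:
        with open(summary_path, "a", encoding="utf-8") as out:
            out.write(markdown)
    else:
        sys.stdout.write(markdown)
```

It could replace the two echo lines with a single step such as `run: python3 summarize_log.py my-docs.log`, after the sed filter has run.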

The result looks like this: [screenshot of the job summary not preserved]

1 reply

raducoravu Jan 24, 2024
Collaborator

@stefan-jung, thanks for posting about the use of GITHUB_STEP_SUMMARY. I created three GitHub Actions for the Oxygen XML Blog: one that publishes it to Netlify, one that validates it and checks it for completeness, and one that is an attempt to use OpenAI for grammar checking:
https://blog.oxygenxml.com/topics/building_validating_and_publishing_using_github_actions.html
