-
@kirkilj - interestingly enough, my company (a software company) is having this same internal discussion for the software products we produce. Our software tools can produce hundreds of thousands of messages, and there are thousands of unique message types. Parsing is difficult because some messages are single-line, some are multi-line, the termination of each message block is not consistent, etc. Fortunately we already have a robust message subsystem implemented in our tools, so it's not too difficult to dump the messages in a different format (such as XML or JSON) to allow for machine processing. For UI, I see we have the following options:
Originally I thought we could have an additional option to specify the format of the
-
We use a similar approach (regular expressions applied to console lines) to extract the relevant errors (message plus line/column/resource info) from the DITA OT console output and display them in a separate Problems list. I would also like an easier way to extract individual problems from the DITA OT console output, but I do not have a proposal for how that could be achieved.
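A rough sketch of the regex-on-console-lines approach, in Python. The line shape it expects (an optional `resource:line:column:` prefix, then a bracketed message ID and severity, then free text) is an assumption made for illustration, not the documented DITA OT console format, and `Problem` and `parse_problems` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Assumed line shape (illustrative only, not the documented DITA OT format):
#   [<resource>:<line>:<column>: ][<ID>][<SEVERITY>] <text>
PROBLEM_RE = re.compile(
    r"^(?:(?P<resource>\S+?):(?P<line>\d+):(?P<column>\d+):\s+)?"
    r"\[(?P<id>[A-Z]+\d+[A-Z])\]\[(?P<severity>[A-Z]+)\]\s+(?P<text>.*)$"
)


class Problem(NamedTuple):
    id: str
    severity: str
    text: str
    resource: Optional[str]
    line: Optional[int]
    column: Optional[int]


def parse_problems(console_output: str) -> list[Problem]:
    """Return one Problem per console line that matches the assumed shape."""
    problems = []
    for raw in console_output.splitlines():
        m = PROBLEM_RE.match(raw.strip())
        if m is None:
            continue  # progress output, stack traces, continuation lines
        problems.append(
            Problem(
                id=m["id"],
                severity=m["severity"],
                text=m["text"],
                resource=m["resource"],
                line=int(m["line"]) if m["line"] else None,
                column=int(m["column"]) if m["column"] else None,
            )
        )
    return problems
```

Anything that does not match is silently skipped, which is exactly the weakness of this approach: multi-line messages and unexpected prefixes fall through unless the pattern is extended for each case.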
-
@jelovirt, thoughts?
-
@kirkilj we are now using another approach. We build with GitHub Actions, which has a nice feature called job summaries. You can style the summary with Markdown, which makes the log "nice to read and analyze" for a technical writer.
During the build, we
Then we flush everything away, which is not interesting with
We do some more stuff, but the principle is the same. Afterwards, we load this file into the summary.
The result looks like this:
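For anyone wanting to try the job-summary approach, here is a minimal sketch in Python. It relies on the fact that GitHub Actions exposes a `GITHUB_STEP_SUMMARY` environment variable holding the path of a file to which a step can append Markdown. The table layout and the function names `render_summary` and `append_to_job_summary` are assumptions for illustration, not the build steps described above.

```python
import os


def render_summary(rows):
    """Render (severity, message_id, text) tuples as a Markdown table."""
    lines = [
        "## Build messages",
        "",
        "| Severity | ID | Message |",
        "| --- | --- | --- |",
    ]
    for severity, message_id, text in rows:
        safe = text.replace("|", "\\|")  # keep pipes from breaking the table
        lines.append(f"| {severity} | {message_id} | {safe} |")
    return "\n".join(lines) + "\n"


def append_to_job_summary(markdown, path=None):
    """Append Markdown to the job summary file; return False outside Actions."""
    path = path or os.environ.get("GITHUB_STEP_SUMMARY")
    if not path:
        return False
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(markdown)
    return True
```

A build step could then call `append_to_job_summary(render_summary(rows))` after filtering the log down to the rows worth showing.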
-
This might be as good a place as any to mention that I tried a while ago to create a god-awful Python script parse_ot_log.py.txt with a god-awful regex to parse OT error messages into distinct fields. My latest attempt tried to parse the messages.xml file and use it to compute a regex pattern to parse a log's actual messages. It's a work in progress, to say the least. It works on some message patterns and not others, such as messages where a variable %n is repeated. It's fixable, but I don't have time to invest any more in it.
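One way the repeated-variable case could be handled, as a sketch rather than a fix to the attached script: turn the first occurrence of each `%n` into a named capture group and every later occurrence into a backreference to that group. This assumes templates use `%1`, `%2`, ... placeholders and that the rest of the template is literal text; `template_to_regex` is a hypothetical helper.

```python
import re

PLACEHOLDER_RE = re.compile(r"%(\d+)")


def template_to_regex(template: str) -> re.Pattern:
    """Build a regex from a message template with %1, %2, ... placeholders."""
    parts = []
    seen = set()
    pos = 0
    for m in PLACEHOLDER_RE.finditer(template):
        # Literal text between placeholders must match exactly.
        parts.append(re.escape(template[pos:m.start()]))
        name = f"p{m.group(1)}"
        if name in seen:
            # Repeated placeholder: require the same text as the first capture.
            parts.append(f"(?P={name})")
        else:
            parts.append(f"(?P<{name}>.+?)")
            seen.add(name)
        pos = m.end()
    parts.append(re.escape(template[pos:]))
    return re.compile("^" + "".join(parts) + "$")
```

For a template such as `File '%1' was not found; '%1' is referenced from '%2'.`, the resulting pattern captures `p1` once and only matches when the second occurrence carries the same value. Templates where two adjacent placeholders have no literal text between them would still be ambiguous.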
In some messages, no file paths are mentioned, while in others there are one or possibly two. Has there been any discussion of also providing an XML version of an OT log so we don't have to infer structure from raw text that already has a structure to begin with? Is an OT-log schema out of the question to pursue separately? My intention is to save a structured representation of these logs in our CI/CD environment so our Information Architecture group can ingest them into a datastore and run analytics on them, if not some ML experiments at some point.
Is it worth adding a log option to OT to generate a structured OT log file?
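To make the question concrete, here is a sketch of what one structured record might look like, written as JSON Lines from Python rather than XML purely for brevity. The field names (`message_id`, `severity`, `text`, `files`) are hypothetical and do not correspond to any existing OT option or schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class LogRecord:
    # Hypothetical fields; a real OT-log schema would need to be agreed on.
    message_id: str
    severity: str
    text: str
    files: List[str] = field(default_factory=list)  # zero, one, or two paths


def to_json_lines(records):
    """Serialize records as one JSON object per line."""
    return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in records)
```

A datastore can ingest output like this line by line without any regex, and the `files` list covers messages that mention no path as well as those that mention several.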