Pipeline Monitoring Example

This example discusses ways in which the continuous integration and delivery pipeline can be monitored using Eiffel events.

Introduction

A crucial concern in any continuous integration and delivery pipeline is transparency: the ability to see what is going on, find areas of improvement and detect any issues sooner rather than later. An important capability to allow that type of transparency is real time traceability - the kind of traceability the Eiffel protocol has been designed for. But having the information isn't enough: one must be able to analyze it, make sense of it and present it in a coherent and conducive manner.

The image below shows an event graph from such a pipeline. In it, source changes are created, submitted, included in compositions and built into artifacts. These artifacts are then tested, and depending on the outcome of those tests a confidence level is issued. The image shows several iterations in this pipeline: the first source change is built into a new artifact, while the second and third source changes are batched into a single artifact.

In this pipeline, numerous relevant questions can be posed: what is the lead time from a source change being committed until it is submitted, how long do builds and tests take, what is the lead time from a source change being submitted until it is included in a successfully tested artifact, how many source changes are submitted over time, where in the pipeline is a particular bug fix right now, et cetera. All of these and more are discussed below.

One might object that such reports can be generated from e.g. the CI server: what's the need for Eiffel? This is true in some cases, but many of the questions take a holistic end-to-end view, where multiple tools are involved. In these cases it's not enough to query a single tool. Rather, a method to collect information from multiple sources, tying them all together, is needed, and that is what Eiffel provides.

In addition, what if one doesn't have just one SCM system, but several? Or not one issue tracking system, but several? Or not just one type of CI server? Particularly in large scale contexts, this is often the case. And what happens when one inevitably decides to replace one of those tools with another, or move them from one location or cloud provider to another? How does one do that without wrecking the metrics reporting system? Again, Eiffel as a tool agnostic communication protocol helps mitigate these problems.

Event Graph

[Figure: event graph of the example pipeline]

Event-by-Event Explanation

SCC1, SCC2, SCC3, SCS1, SCS2, SCS3

The EiffelSourceChangeCreatedEvents declare that changes have been made and describe what they entail, by referencing work items, requirements et cetera. This does not mean that the change has been merged onto the project mainline (or other relevant branch) - this is instead declared by the EiffelSourceChangeSubmittedEvent. The distinction between the two is important when working with review processes, private repositories and/or pull requests. If none of that is applicable, the two events are simply sent at once.

The structure of events shown in this example represents a common development branch, where changes are represented by SCS1, SCS2 and SCS3. Each of these submitted changes references an EiffelSourceChangeCreatedEvent via a CHANGE link, and also points to the latest previously submitted version(s) via PREVIOUS_VERSION links. This establishes an unbroken chain of source revisions along with a record of the process leading up to that submission.
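As an illustration of that chain, the sketch below walks PREVIOUS_VERSION links backwards from the latest submission. The events are heavily trimmed, hypothetical Python dicts (real events carry more fields), and the helper name `submission_history` is an assumption made for this example, not part of any Eiffel tooling.

```python
# Hypothetical, trimmed-down events keyed by meta.id.
scs_events = {
    "scs1": {"meta": {"id": "scs1", "type": "EiffelSourceChangeSubmittedEvent"},
             "links": [{"type": "CHANGE", "target": "scc1"}]},
    "scs2": {"meta": {"id": "scs2", "type": "EiffelSourceChangeSubmittedEvent"},
             "links": [{"type": "CHANGE", "target": "scc2"},
                       {"type": "PREVIOUS_VERSION", "target": "scs1"}]},
    "scs3": {"meta": {"id": "scs3", "type": "EiffelSourceChangeSubmittedEvent"},
             "links": [{"type": "CHANGE", "target": "scc3"},
                       {"type": "PREVIOUS_VERSION", "target": "scs2"}]},
}


def submission_history(events_by_id, start_id):
    """Walk PREVIOUS_VERSION links back from start_id, newest first."""
    history, seen, queue = [], set(), [start_id]
    while queue:
        current = queue.pop(0)
        if current in seen or current not in events_by_id:
            continue
        seen.add(current)
        history.append(current)
        for link in events_by_id[current]["links"]:
            if link["type"] == "PREVIOUS_VERSION":
                queue.append(link["target"])
    return history


print(submission_history(scs_events, "scs3"))  # ['scs3', 'scs2', 'scs1']
```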

CDef1, CDef2, CDef3

EiffelCompositionDefinedEvents declaring that new compositions are available to be built. Note that in this example not every composition leads to the creation of a new artifact. In industrial practice this is a common phenomenon, for which there may be a number of reasons - often there simply isn't sufficient time or resources to build each individual change.

Note that EiffelCompositionDefinedEvents may reference any number of elements: often a composition doesn't just consist of the one source revision, but a large collection of sources, binaries and third party libraries.

ActT1, ActT2, ActS1, ActS2, ActF1, ActF2

EiffelActivityTriggeredEvents, EiffelActivityStartedEvents and EiffelActivityFinishedEvents, in this example representing build activities. Using their CONTEXT links, the EiffelArtifactCreatedEvents ArtC1 and ArtC2 declare that they are part of the respective activity.

ArtC1, ArtC2

The EiffelArtifactCreatedEvents representing new versions of the built software.

TCT1, TCT2, TCS1, TCS2, TCF1, TCF2

EiffelTestCaseTriggeredEvents, EiffelTestCaseStartedEvents and EiffelTestCaseFinishedEvents representing one test execution per artifact (ArtC1 and ArtC2, respectively). Note that management of test cases per se is not within the scope of Eiffel, but like many events EiffelTestCaseTriggeredEvent is able to reference external entities. Furthermore, it is assumed in this example that these externally managed test case descriptions in turn are able to reference any requirements they verify (which is arguably good practice in any context). With those references in place, these events can be used to answer the question "Which requirements have been verified in which version of the product, and what was the outcome?". This can in turn be explicitly represented via EiffelIssueVerifiedEvents.

CLM1, CLM2

EiffelConfidenceLevelModifiedEvents signaling that a new version of this component or part of the system is deemed ready for delivery. In this example, this is the event that the next tier of the system hierarchy reacts to, and proceeds to pick up the referenced artifact (ArtC1 and ArtC2, respectively) to integrate it.

Metrics Examples

There's a multitude of metrics that are relevant to measure in a continuous integration and delivery pipeline, for various purposes and for various stakeholders. An exhaustive list is impossible, but a few examples and how they may be collected using Eiffel events are presented below.

Lead Time from Source Change Creation to Submission

How long does it take for a source change to be submitted? In many cases this is instantaneous, but in scenarios with extensive pre-testing and/or reviewing of any change pushed to the shared development branch or mainline, it's important to monitor how long this takes to ensure it doesn't get out of hand.

Using Eiffel, this can be done as follows:

  1. For every EiffelSourceChangeSubmittedEvent, follow its CHANGE link to the corresponding EiffelSourceChangeCreatedEvent.
  2. Compare meta.time of the two events.

This provides the lead time from the final version of the source change to its submission. If one would rather analyze the time from the first version, this can be done by following any PREVIOUS_VERSION link in EiffelSourceChangeCreatedEvents.
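A minimal sketch of the two steps above, assuming the events have already been collected into an in-memory list of dicts that follow the usual meta/data/links layout, with meta.time given in milliseconds. The function names and the sample events are hypothetical.

```python
def link_targets(event, link_type):
    """Return the target ids of all links of the given type on an event."""
    return [l["target"] for l in event.get("links", []) if l["type"] == link_type]


def creation_to_submission_lead_times(events):
    """Map each submitted-event id to its lead time in milliseconds."""
    by_id = {e["meta"]["id"]: e for e in events}
    lead_times = {}
    for e in events:
        if e["meta"]["type"] != "EiffelSourceChangeSubmittedEvent":
            continue
        # Step 1: follow the CHANGE link to the created event.
        for target in link_targets(e, "CHANGE"):
            created = by_id.get(target)
            if created is not None:
                # Step 2: compare meta.time of the two events.
                lead_times[e["meta"]["id"]] = e["meta"]["time"] - created["meta"]["time"]
    return lead_times


# Hypothetical sample data with toy timestamps.
events = [
    {"meta": {"id": "scc1", "type": "EiffelSourceChangeCreatedEvent", "time": 1000},
     "links": []},
    {"meta": {"id": "scs1", "type": "EiffelSourceChangeSubmittedEvent", "time": 4500},
     "links": [{"type": "CHANGE", "target": "scc1"}]},
]
print(creation_to_submission_lead_times(events))  # {'scs1': 3500}
```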

Build Duration

In the example above, the artifacts (represented by ArtC1 and ArtC2) are built in activities, represented by sets of EiffelActivityTriggeredEvents, EiffelActivityStartedEvents and EiffelActivityFinishedEvents. It is often important to study how long such build activities take, and study the trends of such execution times.

Using Eiffel, this can be done as follows:

  1. For every EiffelActivityFinishedEvent, search for its corresponding EiffelActivityStartedEvent having the same ACTIVITY_EXECUTION link.
  2. Compare meta.time of the two events.
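The sketch below pairs started and finished events through their shared ACTIVITY_EXECUTION target and returns the durations ordered by finish time, which is a convenient shape for trend plots. It assumes an in-memory list of event dicts with meta.time in milliseconds; the function name and sample data are hypothetical.

```python
def activity_durations(events):
    """Return (finish_time_ms, duration_ms) pairs ordered by finish time."""

    def execution_target(event):
        # Both started and finished events point at the triggered event.
        for link in event.get("links", []):
            if link["type"] == "ACTIVITY_EXECUTION":
                return link["target"]
        return None

    started_at = {}
    for e in events:
        if e["meta"]["type"] == "EiffelActivityStartedEvent":
            target = execution_target(e)
            if target is not None:
                started_at[target] = e["meta"]["time"]

    durations = []
    for e in events:
        if e["meta"]["type"] == "EiffelActivityFinishedEvent":
            start = started_at.get(execution_target(e))
            if start is not None:
                durations.append((e["meta"]["time"], e["meta"]["time"] - start))
    return sorted(durations)


def _ev(event_id, event_type, time, target=None):
    """Build a trimmed, hypothetical event dict for the example."""
    links = [{"type": "ACTIVITY_EXECUTION", "target": target}] if target else []
    return {"meta": {"id": event_id, "type": event_type, "time": time}, "links": links}


builds = [
    _ev("actt1", "EiffelActivityTriggeredEvent", 0),
    _ev("acts1", "EiffelActivityStartedEvent", 2000, "actt1"),
    _ev("actf1", "EiffelActivityFinishedEvent", 62000, "actt1"),
    _ev("actt2", "EiffelActivityTriggeredEvent", 10000),
    _ev("acts2", "EiffelActivityStartedEvent", 15000, "actt2"),
    _ev("actf2", "EiffelActivityFinishedEvent", 105000, "actt2"),
]
print(activity_durations(builds))  # [(62000, 60000), (105000, 90000)]
```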

Test Duration

Measuring test duration is similar to measuring build duration, and driven by similar needs. Indeed, if one is interested in the duration of a set of tests wrapped by a set of EiffelActivityTriggeredEvent, EiffelActivityStartedEvent and EiffelActivityFinishedEvent, one can employ the exact same method. Assuming one is interested in studying the execution time of a particular test case, however, one can use the following method:

  1. For every EiffelTestCaseFinishedEvent, find any EiffelTestCaseStartedEvent sharing the same TEST_CASE_EXECUTION link target.
  2. Compare meta.time of the two events.
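To study a particular test case, the durations can be grouped per test case. The sketch below assumes that the triggered event identifies the test case in data.testCase.id and that events are available as an in-memory list of dicts; the function name and the sample identifiers are hypothetical.

```python
def durations_per_test_case(events):
    """Map each test case id to the list of its execution durations in ms."""
    by_id = {e["meta"]["id"]: e for e in events}

    def execution_target(event):
        for link in event.get("links", []):
            if link["type"] == "TEST_CASE_EXECUTION":
                return link["target"]
        return None

    started_at = {}
    for e in events:
        if e["meta"]["type"] == "EiffelTestCaseStartedEvent":
            target = execution_target(e)
            if target is not None:
                started_at[target] = e["meta"]["time"]

    result = {}
    for e in events:
        if e["meta"]["type"] != "EiffelTestCaseFinishedEvent":
            continue
        target = execution_target(e)
        triggered = by_id.get(target)
        if triggered is None or target not in started_at:
            continue
        # Assumption: the triggered event names the test case in data.testCase.id.
        case_id = triggered["data"]["testCase"]["id"]
        result.setdefault(case_id, []).append(e["meta"]["time"] - started_at[target])
    return result


def _tc(event_id, suffix, time, target=None):
    """Build a trimmed, hypothetical test case event for the example."""
    event = {"meta": {"id": event_id, "type": "EiffelTestCase" + suffix, "time": time},
             "links": [], "data": {}}
    if target:
        event["links"].append({"type": "TEST_CASE_EXECUTION", "target": target})
    else:
        event["data"]["testCase"] = {"id": "TC-login"}
    return event


test_events = [
    _tc("tct1", "TriggeredEvent", 0),
    _tc("tcs1", "StartedEvent", 1000, "tct1"),
    _tc("tcf1", "FinishedEvent", 4000, "tct1"),
    _tc("tct2", "TriggeredEvent", 4500),
    _tc("tcs2", "StartedEvent", 5000, "tct2"),
    _tc("tcf2", "FinishedEvent", 10000, "tct2"),
]
print(durations_per_test_case(test_events))  # {'TC-login': [3000, 5000]}
```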

Lead Time from Source Change Submission to Successfully Tested Artifact

Rather than investigating how long it takes to get a source change submitted, a pertinent question to ask is how long it takes for that source change to end up in a product revision ready to be delivered. In the simple example depicted above, that corresponds to an artifact with an EiffelConfidenceLevelModifiedEvent having the data.value property set to SUCCESS.

Using Eiffel, this can be done as follows:

  1. For every EiffelSourceChangeSubmittedEvent, find any relevant EiffelCompositionDefinedEvents linking to it using the ELEMENT link type.
  2. Find any relevant EiffelArtifactCreatedEvents linking to them using the COMPOSITION link type.
  3. Find any relevant EiffelConfidenceLevelModifiedEvents with data.value set to SUCCESS and linking to them using the SUBJECT link type.
  4. In case of no matches, find any relevant EiffelArtifactCreatedEvents linking to them using the PREVIOUS_VERSION link type. Repeat previous step.
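The four steps above can be sketched as a traversal over inbound links. This is a simplified illustration assuming an in-memory list of event dicts with meta.time in milliseconds; the helper names and sample events are hypothetical, and a production system would more likely query an event repository than scan a list.

```python
def build_inbound_index(events):
    """Index events by (link type, target id) so inbound links can be followed."""
    index = {}
    for e in events:
        for link in e.get("links", []):
            index.setdefault((link["type"], link["target"]), []).append(e)
    return index


def inbound(index, target_id, link_type, event_type):
    """Return events of event_type that link to target_id with link_type."""
    return [e for e in index.get((link_type, target_id), [])
            if e["meta"]["type"] == event_type]


def lead_time_to_tested_artifact(events, scs):
    """Return ms from submission to the earliest SUCCESS confidence level, or None."""
    index = build_inbound_index(events)
    artifacts = []
    # Steps 1 and 2: compositions including the change, then artifacts built from them.
    for cdef in inbound(index, scs["meta"]["id"], "ELEMENT", "EiffelCompositionDefinedEvent"):
        artifacts += inbound(index, cdef["meta"]["id"], "COMPOSITION",
                             "EiffelArtifactCreatedEvent")
    seen = set()
    while artifacts:
        # Step 3: look for a SUCCESS confidence level on any of the artifacts.
        successes = [clm["meta"]["time"]
                     for art in artifacts
                     for clm in inbound(index, art["meta"]["id"], "SUBJECT",
                                        "EiffelConfidenceLevelModifiedEvent")
                     if clm.get("data", {}).get("value") == "SUCCESS"]
        if successes:
            return min(successes) - scs["meta"]["time"]
        # Step 4: move on to later artifact versions and repeat.
        seen.update(art["meta"]["id"] for art in artifacts)
        artifacts = [nxt for art in artifacts
                     for nxt in inbound(index, art["meta"]["id"], "PREVIOUS_VERSION",
                                        "EiffelArtifactCreatedEvent")
                     if nxt["meta"]["id"] not in seen]
    return None


def _e(event_id, event_type, time, links=(), data=None):
    """Build a trimmed, hypothetical event dict for the example."""
    return {"meta": {"id": event_id, "type": event_type, "time": time},
            "links": [{"type": t, "target": g} for t, g in links],
            "data": data or {}}


pipeline = [
    _e("scs1", "EiffelSourceChangeSubmittedEvent", 1000),
    _e("cdef1", "EiffelCompositionDefinedEvent", 1500, [("ELEMENT", "scs1")]),
    _e("artc1", "EiffelArtifactCreatedEvent", 3000, [("COMPOSITION", "cdef1")]),
    _e("clm1", "EiffelConfidenceLevelModifiedEvent", 5000, [("SUBJECT", "artc1")],
       {"value": "FAILURE"}),
    _e("artc2", "EiffelArtifactCreatedEvent", 7000, [("PREVIOUS_VERSION", "artc1")]),
    _e("clm2", "EiffelConfidenceLevelModifiedEvent", 9000, [("SUBJECT", "artc2")],
       {"value": "SUCCESS"}),
]
print(lead_time_to_tested_artifact(pipeline, pipeline[0]))  # 8000
```

In this toy data the first artifact fails its tests, so the lead time is measured to the confidence level of its successor.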

Related questions, such as the frequency of such artifacts, number of source changes included per such artifact, or the proportion of successfully tested artifacts can be answered in a similar fashion.

Source Change Frequency

A relevant question in any continuous integration and delivery pipeline is the frequency at which source changes are being integrated: in general, in a particular part of the product or submitted by a particular individual or group of individuals.

Using Eiffel, this can be done as follows:

  1. Search for relevant EiffelSourceChangeSubmittedEvents, e.g. filtering on data.submitter.
  2. Count the number of hits over time.
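A minimal sketch of such a count, bucketed per UTC day. It assumes an in-memory list of event dicts, meta.time in milliseconds, and a group field inside data.submitter; the function name, the group names and the toy timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone


def submissions_per_day(events, group=None):
    """Count submitted source changes per UTC day, optionally for one submitter group."""
    counts = Counter()
    for e in events:
        if e["meta"]["type"] != "EiffelSourceChangeSubmittedEvent":
            continue
        submitter = e.get("data", {}).get("submitter", {})
        if group is not None and submitter.get("group") != group:
            continue
        # meta.time is assumed to be milliseconds since the UNIX epoch.
        day = datetime.fromtimestamp(e["meta"]["time"] / 1000, tz=timezone.utc).date()
        counts[day.isoformat()] += 1
    return dict(counts)


def _scs(event_id, time, group):
    """Build a trimmed, hypothetical submitted event for the example."""
    return {"meta": {"id": event_id, "type": "EiffelSourceChangeSubmittedEvent",
                     "time": time},
            "data": {"submitter": {"group": group}}, "links": []}


submissions = [
    _scs("scs1", 0, "team-a"),
    _scs("scs2", 3_600_000, "team-a"),
    _scs("scs3", 86_400_000, "team-b"),
]
print(submissions_per_day(submissions))            # {'1970-01-01': 2, '1970-01-02': 1}
print(submissions_per_day(submissions, "team-a"))  # {'1970-01-01': 2}
```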

Real Time Bug Fix Status

It can be important not just to track source changes, but also what those source changes entail, such as bug fixes. Since EiffelSourceChangeCreatedEvents can identify issues via their data.issues property, the status of the bug fix can be monitored in real time. To exemplify, let us assume that one wants to know whether the bug fix has been included in an artifact.

Using Eiffel, this can be done as follows:

  1. For any EiffelSourceChangeCreatedEvents containing the bug fix in data.issues, traverse any EiffelSourceChangeCreatedEvent(s) linking to them using the PREVIOUS_VERSION link type.
  2. Find any EiffelSourceChangeSubmittedEvents linking to them using the CHANGE link type.
  3. Traverse any subsequent EiffelSourceChangeSubmittedEvents linking to them using the PREVIOUS_VERSION link type.
  4. Find any EiffelCompositionDefinedEvents linking to them using the ELEMENT link type.
  5. Find any EiffelArtifactCreatedEvents linking to them using the COMPOSITION link type.
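The five steps above amount to repeatedly following inbound links, as in the sketch below. It assumes an in-memory list of event dicts and that each entry in data.issues carries an id field; the function name, the issue id and the sample events are hypothetical.

```python
def artifacts_containing_fix(events, issue_id):
    """Return sorted ids of events reached by steps 1-5 for the given issue."""
    inbound = {}
    for e in events:
        for link in e.get("links", []):
            inbound.setdefault((link["type"], link["target"]), []).append(e["meta"]["id"])

    def step(ids, link_type):
        """Ids of events linking to any of ids with link_type."""
        return {src for i in ids for src in inbound.get((link_type, i), [])}

    def closure(ids, link_type):
        """ids plus everything reachable by repeatedly applying step."""
        result, frontier = set(ids), set(ids)
        while frontier:
            frontier = step(frontier, link_type) - result
            result |= frontier
        return result

    # Step 1: created events naming the issue, plus their later versions.
    created = {e["meta"]["id"] for e in events
               if e["meta"]["type"] == "EiffelSourceChangeCreatedEvent"
               and any(i.get("id") == issue_id
                       for i in e.get("data", {}).get("issues", []))}
    created = closure(created, "PREVIOUS_VERSION")
    # Steps 2 and 3: submissions of those changes and all subsequent submissions.
    submitted = closure(step(created, "CHANGE"), "PREVIOUS_VERSION")
    # Steps 4 and 5: compositions including them, then artifacts built from those.
    compositions = step(submitted, "ELEMENT")
    return sorted(step(compositions, "COMPOSITION"))


def _e(event_id, event_type, links=(), data=None):
    """Build a trimmed, hypothetical event dict for the example."""
    return {"meta": {"id": event_id, "type": event_type},
            "links": [{"type": t, "target": g} for t, g in links],
            "data": data or {}}


graph = [
    _e("scc2", "EiffelSourceChangeCreatedEvent", data={"issues": [{"id": "BUG-42"}]}),
    _e("scc3", "EiffelSourceChangeCreatedEvent"),
    _e("scs2", "EiffelSourceChangeSubmittedEvent",
       [("CHANGE", "scc2"), ("PREVIOUS_VERSION", "scs1")]),
    _e("scs3", "EiffelSourceChangeSubmittedEvent",
       [("CHANGE", "scc3"), ("PREVIOUS_VERSION", "scs2")]),
    _e("cdef3", "EiffelCompositionDefinedEvent", [("ELEMENT", "scs3")]),
    _e("artc2", "EiffelArtifactCreatedEvent", [("COMPOSITION", "cdef3")]),
]
print(artifacts_containing_fix(graph, "BUG-42"))  # ['artc2']
```

The sample mirrors the batching in the event graph: the fix is submitted in SCS2 but only reaches an artifact through the composition that references SCS3.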

Activity Queuing Times

Related to the question above of build and test durations, it is sometimes important to monitor queuing times. Where resources are scarce, activities may end up in queue for long periods of time before they can be executed.

Using Eiffel, this can be done as follows:

  1. For every relevant EiffelActivityStartedEvent, follow its ACTIVITY_EXECUTION link to its corresponding EiffelActivityTriggeredEvent.
  2. Compare meta.time of the two events.
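Besides the queuing time of activities that have started, it may be useful to see which activities are still waiting. The sketch below reports both, assuming an in-memory list of event dicts with meta.time in milliseconds and a caller-supplied current time. The function name and sample data are hypothetical, and canceled activities are ignored for brevity.

```python
def activity_queue_report(events, now_ms):
    """Return (queuing time per started activity, waiting time per queued activity)."""
    triggered = {e["meta"]["id"]: e["meta"]["time"] for e in events
                 if e["meta"]["type"] == "EiffelActivityTriggeredEvent"}
    queued = {}
    for e in events:
        if e["meta"]["type"] != "EiffelActivityStartedEvent":
            continue
        for link in e.get("links", []):
            if link["type"] == "ACTIVITY_EXECUTION" and link["target"] in triggered:
                # Time between the triggered and started events.
                queued[link["target"]] = e["meta"]["time"] - triggered[link["target"]]
    # Triggered activities without a started event are treated as still in queue.
    waiting = {i: now_ms - t for i, t in triggered.items() if i not in queued}
    return queued, waiting


activities = [
    {"meta": {"id": "actt1", "type": "EiffelActivityTriggeredEvent", "time": 0},
     "links": []},
    {"meta": {"id": "acts1", "type": "EiffelActivityStartedEvent", "time": 2000},
     "links": [{"type": "ACTIVITY_EXECUTION", "target": "actt1"}]},
    {"meta": {"id": "actt2", "type": "EiffelActivityTriggeredEvent", "time": 10000},
     "links": []},
]
print(activity_queue_report(activities, now_ms=25000))
# ({'actt1': 2000}, {'actt2': 15000})
```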

Test Queuing Times

Measuring the time from when a test execution is triggered until it commences provides an important metric for monitoring test efficiency and resource availability.

Using Eiffel, this can be done as follows:

  1. For every EiffelTestCaseStartedEvent, follow its TEST_CASE_EXECUTION link to its corresponding EiffelTestCaseTriggeredEvent.
  2. Compare meta.time of the two events.
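For monitoring purposes, the individual queuing times are often summarized. The sketch below computes the count, median and maximum over a batch of events, assuming an in-memory list of event dicts with meta.time in milliseconds; the function name, the choice of summary statistics and the sample data are assumptions made for illustration.

```python
from statistics import median


def queuing_summary(events):
    """Summarize test case queuing times in ms, or return None if there are none."""
    by_id = {e["meta"]["id"]: e for e in events}
    waits = []
    for e in events:
        if e["meta"]["type"] != "EiffelTestCaseStartedEvent":
            continue
        for link in e.get("links", []):
            triggered = by_id.get(link["target"])
            if link["type"] == "TEST_CASE_EXECUTION" and triggered is not None:
                waits.append(e["meta"]["time"] - triggered["meta"]["time"])
    if not waits:
        return None
    return {"count": len(waits), "median_ms": median(waits), "max_ms": max(waits)}


def _pair(n, triggered_at, started_at):
    """Build a hypothetical triggered/started event pair for the example."""
    return [
        {"meta": {"id": f"tct{n}", "type": "EiffelTestCaseTriggeredEvent",
                  "time": triggered_at}, "links": []},
        {"meta": {"id": f"tcs{n}", "type": "EiffelTestCaseStartedEvent",
                  "time": started_at},
         "links": [{"type": "TEST_CASE_EXECUTION", "target": f"tct{n}"}]},
    ]


queue_events = _pair(1, 0, 2000) + _pair(2, 1000, 7000) + _pair(3, 2000, 5000)
print(queuing_summary(queue_events))
# {'count': 3, 'median_ms': 3000, 'max_ms': 6000}
```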

A Note on Levels of Abstraction

The more holistic examples above, covering a larger portion of the continuous integration and delivery pipeline, include multiple steps where events linking events linking events must be searched for. This is a consequence of the fact that Eiffel events operate on a low level of abstraction: they represent atomic events, and to paint a larger picture, sometimes a large number of events must first be collected.

For this reason, dedicated services that raise the level of abstraction to concepts of greater interest are highly useful. To exemplify, a service providing a real time state of source changes, hiding the nitty gritty details of the individual events, can turn several of the queries described here into single queries by simply listening to events and aggregating them into stateful, higher abstraction level objects.