Allow for tests nested within tests #126

Closed
Krinkle opened this issue Nov 12, 2020 · 4 comments · Fixed by #129

Krinkle commented Nov 12, 2020

  • Allow tests to directly contain both assertions and other tests.

Per #117 (comment) this is required for node-tap and tape. This does limit a bit how reporters can visualise and lay out the data, but seems worth doing. Especially as it aligns us closer with TAP.

For HTML reporters like QUnit that provide a collapsible list of assertions for each test, they may need to buffer each test and then render the list of assertions first, followed by the sub tests. This is a small price to pay for widening the range of test frameworks and reporters that can participate. It also wouldn't negatively affect any reporters that exist today, since those are currently specific to frameworks that would never exercise that need for buffering, so it's a win-win to allow the reporter to be used more widely.

  • Consider phasing out the "Suite" concept. Most likely obsolete after this.

Krinkle commented Nov 12, 2020

Requiring consumers to support assertions and nested tests under the same concept does limit the choices a reporter can make for its visual or textual layout, since assertions may appear between tests, and tests between assertions.

I realize this may be insignificant from a TAP perspective. From what I've seen in the TAP community, frameworks are generally exception-based, runners stop after the first error, and reporters only communicate tests and failed assertions. By not rendering multiple errors at different levels, and by not making passed assertions accessible, one conveniently sidesteps this issue. QUnit is not exception-based, may provide multiple errors from a single run, and does make its passed assertions accessible via a click or flag. Below is an example of its HTML report, with two of the tests expanded to show their assertions:

[Screenshot: QUnit HTML report with two tests expanded to show their assertions]

I suppose individual consumers wishing to render things this way could work around this limitation by buffering any nested tests, and rendering the direct assertions first. This would slightly misrepresent the true data, but is no worse than what they can do today. It just means they can be used by a wider range of test frameworks.
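
The buffering workaround could look roughly like the sketch below. It assumes a hypothetical event shape (`testStart` and `testEnd` objects with `name`, `status`, and an `assertions` array) and tests that start and end in strict stack order; `BufferedTreeRenderer` and its method names are made up for illustration and are not part of js-reporters.

```js
// Hypothetical sketch: buffer nested tests so that each test's own
// assertions are rendered before its child tests, regardless of the
// order in which they actually ran.
class BufferedTreeRenderer {
  constructor() {
    this.stack = []; // currently open tests, outermost first
    this.lines = []; // rendered output, filled when a top-level test ends
  }

  onTestStart(test) {
    this.stack.push({ name: test.name, assertions: [], children: [] });
  }

  onTestEnd(test) {
    const node = this.stack.pop();
    node.status = test.status;
    node.assertions = test.assertions || [];
    const parent = this.stack[this.stack.length - 1];
    if (parent) {
      parent.children.push(node); // hold back until the parent has ended
    } else {
      this.render(node, 0);
    }
  }

  render(node, depth) {
    const pad = '  '.repeat(depth);
    this.lines.push(`${pad}${node.status}: ${node.name}`);
    // Own assertions first, then the buffered children.
    for (const a of node.assertions) {
      this.lines.push(`${pad}  - ${a.passed ? 'ok' : 'not ok'} ${a.message}`);
    }
    for (const child of node.children) {
      this.render(child, depth + 1);
    }
  }
}

const renderer = new BufferedTreeRenderer();
renderer.onTestStart({ name: 'foo' });
renderer.onTestStart({ name: 'bar' });
renderer.onTestEnd({ name: 'bar', status: 'passed', assertions: [{ passed: true, message: 'inner' }] });
renderer.onTestEnd({ name: 'foo', status: 'passed', assertions: [{ passed: true, message: 'outer' }] });
console.log(renderer.lines.join('\n'));
```

The trade-off is the one described above: nothing is rendered for a top-level test until it has ended, and the rendered order may differ from the execution order.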

@Krinkle Krinkle pinned this issue Jan 2, 2021
@Krinkle Krinkle self-assigned this Jan 2, 2021

Krinkle commented Jan 13, 2021

I'm running into a few issues when trying to merge the "suite" and "test" concepts.

The simplest approach, I thought, would be to simply convert suites to tests and call it a day. However, this is proving to be more difficult than I thought. Our TAP reporter ignores "suite" events. It only acknowledges suites as a concept by prefixing the test names (and those prefixes are provided as part of the test name, so technically it's entirely unaware of suites).

Before events:

  • runStart: (total: 2)
  • suiteStart: "some suite"
  • testStart: "test A" (parent: "some suite")
  • testEnd: "test A" (parent: "some suite", status: passing)
  • testStart: "test B" (parent: "some suite")
  • testEnd: "test B" (parent: "some suite", status: passing)
  • suiteEnd: "some suite" (status: passing)
  • runEnd: (total: 2, passing: 2, failing: 0, skipped: 0, todo: 0)

Before TAP:

  • ok 1 some suite > test A
  • ok 2 some suite > test B
  • 1..2

When we allow nesting in tests, and emit existing suites as tests, we see two potentially unexpected side-effects. Firstly, each suite now counts as a test, so the number of tests is higher, even if all these former suites don't and can't have assertions. That's fine I suppose. Secondly, it means we are outputting them after their children, because TAP is based on when a test result has come in, and naturally children end before their parent.

Possible future, events:

  • runStart: (total: 3)
  • testStart: "some suite"
  • testStart: "test A" (parent: "some suite")
  • testEnd: "test A" (parent: "some suite", status: passing)
  • testStart: "test B" (parent: "some suite")
  • testEnd: "test B" (parent: "some suite", status: passing)
  • testEnd: "some suite" (status: passing)
  • runEnd: (total: 3, passing: 3, failing: 0, skipped: 0, todo: 0)

Possible future, TAP:

  • ok 1 some suite > test A
  • ok 2 some suite > test B
  • ok 3 some suite
  • 1..3
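
To illustrate why the parent ends up last, here is a minimal sketch of a TAP-style reporter that writes one line per `testEnd` event. The factory `createTapReporter`, the `fullName` array, and the `'passed'`/`'failed'` status strings are assumptions for this example; it is not the reporter shipped with js-reporters, and it ignores skip and todo directives.

```js
// Hypothetical minimal TAP-style reporter: one output line per testEnd,
// numbered in the order in which results arrive.
function createTapReporter() {
  const lines = [];
  let count = 0;
  return {
    lines,
    onTestEnd(test) {
      count++;
      const ok = test.status === 'failed' ? 'not ok' : 'ok';
      // Flatten the parent-child relationship into the test name.
      lines.push(`${ok} ${count} ${test.fullName.join(' > ')}`);
    },
    onRunEnd() {
      lines.push(`1..${count}`);
    }
  };
}

const tap = createTapReporter();
// Children end before their parent, so the parent is reported last.
tap.onTestEnd({ fullName: ['some suite', 'test A'], status: 'passed' });
tap.onTestEnd({ fullName: ['some suite', 'test B'], status: 'passed' });
tap.onTestEnd({ fullName: ['some suite'], status: 'passed' });
tap.onRunEnd();
console.log(tap.lines.join('\n'));
```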

We already knew that the TAP spec doesn't (yet) have a standard for sub tests (ref TestAnything/Specification#2), but there are a couple of (mostly back-compat) ways this is done today by node-tap, and its tap-parser. It provides a child event, which consumers can use to mark sub tests in some way, e.g. by indenting, prefixing or otherwise wrapping the inner tests.

This seems analogous to the "suite" events js-reporters have today, so maybe we shouldn't be merging these concepts after all. Rather we just need to add support for nesting tests.

We can continue to provide "suite" as a way of transparently grouping tests. Individual test frameworks and adapters don't have to use these of course. If their grouping unit is likely to have assertions directly in it (not bail outs, but regular failures) then it might want to use "test" for both the group and the unit, but for transparent grouping of tests we can continue to provide "suite".

Would that make sense?

Krinkle commented Jan 13, 2021

node-tap, for comparison:

```js
const tap = require('tap');

tap.test('foo', (t) => {
	t.test('bar', (t) => {
		t.end();
	});
	t.end();
});
```

```
$ node_modules/.bin/tap -R tap tmp.js
TAP version 13
ok 1 - tmp.js # time=28.006ms {
    # Subtest: foo
        # Subtest: bar
            1..0
        ok 1 - bar # time=2.908ms

        1..1
    ok 1 - foo # time=19.055ms

    1..1
    # time=28.006ms
}

1..1
# time=5334.686ms
```

Krinkle commented Jan 13, 2021

Would that make sense?

I'm still interested in hearing other thoughts, but, now that I've done most of the code changes required for this, I'm coming around (once again) to the idea that we don't need suites. We really should just represent suites as tests, I think.

The next thing I'm running into is the status and errors fields for TestEnd. Once a child test has failed, I'm not sure how the enclosing test should behave. It seems intuitive to propagate the error status, surely a parent should not succeed if one of its children is failing, that seems clear enough. But, what about errors? On the one hand it seems odd for producers to have to copy around and propagate these and reporting the same error multiple times. On the other hand, it also seems odd for consumers to deal with a test that has status: failed and not have an error object to explain the error.

Should we consider it normal for a test to be failed and yet have no errors? We would trust the producer (test framework/adapter) to have emitted it before if it was from a child, and thus consumers need to get used to that and present it in a way that makes sense. E.g. they can no longer say "it failed and here is why". Even some kind of rich IDE wouldn't be able to correlate the two with high confidence, if e.g. one clicks on the parent test, beyond e.g. falling back to showing its child test errors and hoping they are indeed the reason. I guess that makes sense. We might want to formalise it in the spec (possibly non-normative) that producers should only ever omit errors on a failing test if it failed due to a child test.
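
A consumer-side fallback along those lines might be sketched as follows. This assumes `testEnd` objects with a `fullName` array, a `status` string, and an `errors` array; `createFailureExplainer` is a hypothetical helper, and the link it draws between a parent and its descendants' errors is a best guess, as noted above.

```js
// Hypothetical consumer-side fallback: explain a failed parent that has no
// errors of its own by pointing at errors already reported by descendants.
function createFailureExplainer() {
  const seen = []; // { fullName, errors } for every ended test with errors

  function isDescendant(child, parent) {
    return child.length > parent.length &&
      parent.every((name, i) => child[i] === name);
  }

  return function explain(testEnd) {
    if (testEnd.status !== 'failed') {
      return [];
    }
    if (testEnd.errors.length > 0) {
      seen.push({ fullName: testEnd.fullName, errors: testEnd.errors });
      return testEnd.errors;
    }
    // No own errors: assume the failure came from a child test.
    return seen
      .filter((entry) => isDescendant(entry.fullName, testEnd.fullName))
      .flatMap((entry) => entry.errors);
  };
}

const explain = createFailureExplainer();
const childErrors = explain({
  fullName: ['foo', 'bar'],
  status: 'failed',
  errors: [{ message: 'expected 1, got 2' }]
});
const parentErrors = explain({ fullName: ['foo'], status: 'failed', errors: [] });
console.log(parentErrors.map((e) => e.message));
```

This relies on children ending before their parent, so that the child's errors have already been seen by the time the parent's `testEnd` arrives.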

Krinkle added a commit that referenced this issue Jan 17, 2021
This is to accommodate node-tap and tape, which allow for child
tests to be associated directly with other assertion-holding tests
(as opposed to having tests only contain assertions, and suites
contain only tests and other suites).
Ref #126.

It also allows for future compatibility with TAP 14, which currently
has no concept of test groups or test suites, but is considering
the addition of "sub tests".
Ref TestAnything/testanything.github.io#36.

Also:

- Define "Adapter" and "Producer" terms.

- Refer mostly to producers and reporters, instead of frameworks,
  runners, or adapters.

- Remove mention that the spec is for reporting information about
  JavaScript test frameworks, it can report information about any
  kind of test that can be represented in its structure of JSON
  messages.
  Instead, do clarify that the spec defines a JavaScript-based
  API of producers and reporters.

Thought dump:

In aggregation, simplify status to failed/passed only,
if something has only todo or skipped children, don't
propagate this like we did with suites, but cast it down
to only failed/passed, as we did with "run" before.

This is because, with the "suite" concept gone, we can't
assume that test parents only contained other tests, they
may have their own assertions. As such, a parent with only
two skipped children doesn't mean the parent can therefore
be marked as skipped, rather it will be marked as passed,
assuming no errors/failures reported.

This affects the adapters for QUnit/Mocha/Jasmine, but when
frameworks implement this themselves, they can of course
know if an entire suite was explicitly skipped,
in which case they can mark it accordingly.
Krinkle added a commit that referenced this issue Jan 22, 2021
== Status quo ==

The TAP 13 specification does not standardise a way of describing parent-child
relationships between tests, nor does it standardise how to group tests.

Yet, all major test frameworks have a way to group tests (e.g. QUnit module,
and Mocha suite) and/or allow nesting tests inside of other tests (like tape,
and node-tap). While the CRI draft provided a way to group tests, it did not
accommodate TAP-based frameworks. They would either need to flatten the tests
with a separator symbol in the test name, or to create an implied "Suite" for
every test that has non-zero children and then come up with an ad-hoc naming
scheme for it.

Note that the TAP 13 reporter we ship, even after this change, still ends up
flattening the tests by default using the greater than `>` symbol, but at least
the event model itself recognises the relationships so that other output formats
can make use of it, and in the future TAP 14 hopefully will recognise it as
well, which we can then make use of.

Ref TestAnything/testanything.github.io#36.

== Summary of changes ==

See the diff of `test/integration/reference-data.js` for the concrete changes
this makes to the consumable events.

- Remove `suiteStart` and `suiteEnd` events.

  Instead, the spec now says that tests are permitted to have children.

  The link from child to parent remains the same as before, using the `fullName`
  field which is now a stack of test names. Previously, it was a stack of suite
  names with a test name at the end.

- Remove all "downward" links from parent to child. Tests don't describe
  their children upfront in detail, and neither does `runStart`. This
  information was very repetitive and tedious to satisfy for implementors, and
  encouraged or required inefficient use of memory.

  I do recognise that a common use case might be to generate a single output
  file or stream where real-time updates are not needed, in which case you
  may want a convenient tree that is ready to traverse without needing to
  listen for async events and put it together. For this purpose, I have added a
  built-in reporter that simply listens to the new events and outputs a "summary"
  event with an object that is similar to the old "runEnd" event object where
  the entire run is described in a single large object.

- New "SummaryReporter" for simple use cases of non-realtime traversing of
  a single structure after the test has completed.
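
As a rough illustration of the summary idea (not the actual SummaryReporter implementation), a collector can rebuild the tree purely from the `fullName` stack on each `testEnd` event. The names `createSummaryCollector`, `onTestEnd`, and `onRunEnd`, and the shape of the resulting object, are assumptions made for this sketch.

```js
// Hypothetical sketch of a summary-style collector: gather testEnd events
// and rebuild the tree from each event's `fullName` stack.
function createSummaryCollector() {
  const root = { tests: [] };

  function findOrCreate(parent, name) {
    let node = parent.tests.find((t) => t.name === name);
    if (!node) {
      node = { name, status: null, tests: [] };
      parent.tests.push(node);
    }
    return node;
  }

  return {
    onTestEnd(test) {
      // Walk down from the root, creating placeholders for parents that
      // have not ended yet (children end before their parent).
      let node = root;
      for (const name of test.fullName) {
        node = findOrCreate(node, name);
      }
      node.status = test.status;
    },
    onRunEnd() {
      return root;
    }
  };
}

const collector = createSummaryCollector();
collector.onTestEnd({ fullName: ['some suite', 'test A'], status: 'passed' });
collector.onTestEnd({ fullName: ['some suite', 'test B'], status: 'passed' });
collector.onTestEnd({ fullName: ['some suite'], status: 'passed' });
const summary = collector.onRunEnd();
console.log(JSON.stringify(summary, null, 2));
```

One limitation of keying on names alone is that two sibling tests with the same name would be merged into one node.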

== Caveats ==

- A test with the "failed" status is no longer expected to always have
  an error directly associated with it.

  Now that tests aggregate into other tests rather than into suites,
  this means tests that merely have other tests as children do still
  have to send a full testEnd event, and thus an `errors` and `assertions`
  array.

  I considered specifying that errors have to propagate but this seemed
  messy and could lead to duplicate diagnostic output in reporters, as well
  as ambiguity or uncertainty over where errors originated.

- A suite containing only "skipped" tests now aggregates as "passed"
  instead of "skipped". Given we can't know whether a suite is its own
  test with its own assertions, we also can't assume that if a test parent
  has only "skipped" children that the parent was also skipped.

  This applies to our built-in adapters, but individual frameworks, if they
  know that a suite was skipped in its entirety, can of course still set the
  status of parents however they see fit.

- Graphical reporters (such as QUnit and Mocha's HTML reporters) may no
  longer assume that a test parent has either assertions/errors or other
  tests. A test parent can now have both its own assertions/errors, as well
  as other tests beneath it.

  This restricts the freedom and possibilities for visualisation.
  My recommendation is that, if a visual reporter wants to keep using different
  visual shapes for "group of assertions" and "group of tests", that they
  buffer the information internally such that they can first render all the
  test's own assertions, and then render the children, even if they originally
  ran interleaved and/or the other way around.
  Ref #126.
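
The aggregation caveats above (a failed child fails its parent, while skipped children no longer make the parent "skipped") can be sketched as a small helper. The function name `aggregateStatus` and the status strings are assumptions for illustration; the built-in adapters may be structured differently.

```js
// Hypothetical aggregation helper for adapters: derive a parent's status
// from its own errors and its children's statuses, collapsing the result
// to "failed" or "passed" only.
function aggregateStatus(ownErrors, childStatuses) {
  if (ownErrors.length > 0 || childStatuses.includes('failed')) {
    return 'failed';
  }
  // "skipped" and "todo" children do not propagate upward, because the
  // parent may also be a test with assertions of its own.
  return 'passed';
}

console.log(aggregateStatus([], ['skipped', 'skipped']));
console.log(aggregateStatus([], ['passed', 'failed']));
console.log(aggregateStatus([{ message: 'boom' }], []));
```

Note that in the second call the parent is "failed" while its own errors array is empty, which is the situation described in the first caveat.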

== Misc ==

- Add definitions for the "Adapter" and "Producer" terms.

- Use terms "producer" and "reporter" consistently, instead of
  "framework", "runner", or "adapter".

- Remove mention that the spec is for reporting information from
  "JavaScript test frameworks". CRI can be used to report information
  about any kind of test that can be represented in CRI's event model,
  including linting and end-to-end tests for JS programs, as well as
  non-JS programs. It describes a JS interface for reporters, but the
  information can come from anywhere.

  This further solidifies that CRI is not meant to be used for "hooking"
  into a framework, and sets no expectation about timing or run-time
  environment being shared with whatever is executing tests in some
  form or another. This was already the intent originally, since it could
  be used to report information from other processes or from a cloud-based
  test runner like BrowserStack, but this removes any remaining confusion
  or doubt there may have been.

Fixes #126.
Krinkle added a commit that referenced this issue Jan 22, 2021
== Status quo ==

The TAP 13 specification does not standardise a way of describing parent-child
relationships between tests, nor does it standardise how to group tests.

Yet, all major test frameworks have a way to group tests (e.g. QUnit module,
and Mocha suite) and/or allow nesting tests inside of other tests (like tape,
and node-tap). While the CRI draft provided a way to group tests, it did not
accommodate TAP-based frameworks. They would either need to flatten the tests
with a separator symbol in the test name, or to create an implied "Suite" for
every test that has non-zero children and then come up with an ad-hoc naming
scheme for it.

Note that the TAP 13 reporter we ship, even after this change, still ends up
flattening the tests by default using the greater than `>` symbol, but at least
the event model itself recognises the relationships so that other output formats
can make use of it, and in the future TAP 14 hopefully will recognise it as
well, which we can then make use of.

Ref TestAnything/testanything.github.io#36.

== Summary of changes ==

See the diff of `test/integration/reference-data.js` for the concrete changes
this makes to the consumable events.

- Remove `suiteStart` and `suiteEnd` events.

  Instead, the spec now says that tests are permitted to have children.

  The link from child to parent remains the same as before, using the `fullName`
  field which is now a stack of test names. Previously, it was a stack of suite
  names with a test name at the end.

- Remove all "downward" links from parent to child. Tests don't describe
  their children upfront in detail, and neither does `runStart`. This
  information was very repetitive and tedious to satisfy for implementors, and
  encouraged or required inefficient use of memory.

  I do recognise that a common use case might be to generate a single output
  file or stream where real-time updates are not needed, in which case you
  may want a convenient tree that is ready to traverse without needing to
  listen for async events and put it together. For this purpose, I have added a
  built-in reporter that simply listens to the new events and outputs a "summary"
  event with an object that is similar to the old "runEnd" event object where
  the entire run is described in a single large object.

- New "SummaryReporter" for simple use cases of non-realtime traversing of
  a single structure after the test has completed.

== Caveats ==

- A test with the "failed" status is no longer expected to always have
  an error directly associated with it.

  Now that tests aggregate into other tests rather than into suites,
  this means tests that merely have other tests as children do still
  have to send a full testEnd event, and thus an `errors` and `assertions`
  array.

  I considered specifying that errors have to propagate but this seemed
  messy and could lead to duplicate diagnostic output in reporters, as well
  as ambiguity or uncertainty over where errors originated.

- A suite containing only "skipped" tests now aggregates as "passed"
  instead of "skipped". Given we can't know whether a suite is its own
  test with its own assertions, we also can't assume that if a test parent
  has only "skipped" children that the parent was also skipped.

  This applies to our built-in adapters, but individual frameworks, if they
  know that a suite was skipped in its entirety, can of course still set the
  status of parents however they see fit.

- Graphical reporters (such as QUnit and Mocha's HTML reporters) may no
  longer assume that a test parent has either assertions/errors or other
  tests. A test parent can now have both its own assertions/errors, as well
  as other tests beneath it.

  This restricts the freedom and possibilities for visualisation.
  My recommendation is that, if a visual reporter wants to keep using different
  visual shapes for "group of assertions" and "group of tests", that they
  buffer the information internally such that they can first render all the
  test's own assertions, and then render the children, even if they originally
  ran interleaved and/or the other way around.
  Ref #126.

- The "Console" reporter that comes with js-reporters no longer
  uses `console.group()` for collapsing nested tests.

== Misc ==

- Add definitions for the "Adapter" and "Producer" terms.

- Use terms "producer" and "reporter" consistently, instead of
  "framework", "runner", or "adapter".

- Remove mention that the spec is for reporting information from
  "JavaScript test frameworks". CRI can be used to report information
  about any kind of test that can be represented in CRI's event model,
  including linting and end-to-end tests for JS programs, as well as
  non-JS programs. It describes a JS interface for reporters, but the
  information can come from anywhere.

  This further solidifies that CRI is not meant to be used for "hooking"
  into a framework, and sets no expectation about timing or run-time
  environment being shared with whatever is executing tests in some
  form or another. This was already the intent originally, since it could
  be used to report information from other processes or from a cloud-based
  test runner like BrowserStack, but this removes any remaining confusion
  or doubt there may have been.

Fixes #126.
Krinkle added a commit that referenced this issue Jan 22, 2021
== Status quo ==

The TAP 13 specification does not standardise a way of describing parent-child
relationships between tests, nor does it standardise how to group tests.

Yet, all major test frameworks have a way to group tests (e.g. QUnit module,
and Mocha suite) and/or allow nesting tests inside of other tests (like tape,
and node-tap). While the CRI draft provided a way to group tests, it did not
accomodate Tap. They would either need to flatten the tests with a separator
symbol in the test name, or to create an implied "Suite" for every test that
has non-zero children and then come up with an ad-hoc naming scheme for it.

Note that the TAP 13 reporter we ship, even after this change, still ends up
flattening the tests by defaut using the greater than `>` symbol, but at least
the event model itself recognises the relationships so that other output formats
can make use of it, and in the future TAP 14 hopefully will recognise it as
well, which we can then make use of.

Ref TestAnything/testanything.github.io#36.
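
As a rough illustration of the flattening described above, the sketch below
joins a test's `fullName` stack with ` > ` to produce a single TAP 13 test
line. The helper name `formatTapLine` and the shape of the event object are
assumptions made for this example; this is not the code of the shipped TAP
reporter.

```javascript
// Hypothetical helper: flatten a nested test into a single TAP 13 line.
// Assumes `test.fullName` is an array of names from the outermost parent
// down to the test itself, and that `test.status` is 'passed' or 'failed'
// (other statuses are not handled in this sketch).
function formatTapLine(test, index) {
  const prefix = test.status === 'passed' ? 'ok' : 'not ok';
  return `${prefix} ${index} ${test.fullName.join(' > ')}`;
}

const line = formatTapLine(
  { status: 'passed', fullName: ['math', 'addition', 'adds integers'] },
  1
);
console.log(line); // ok 1 math > addition > adds integers
```

The parent-child relationship is lost in the output line, but it stays
available in the event itself for reporters that can express it.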

== Summary of changes ==

See the diff of `test/integration/reference-data.js` for the concrete changes
this makes to the consumable events.

- Remove `suiteStart` and `suiteEnd` events.

  Instead, the spec now says that tests are permitted to have children.

  The link from child to parent remains the same as before, using the `fullName`
  field which is now a stack of test names. Previously, it was a stack of suite
  names with a test name at the end.

- Remove all "downward" links from parent to child. Tests don't describe
  their children upfront in detail, and neither does `runStart`. This
  information was very repetitive and tedious for implementors to satisfy, and
  encouraged or required inefficient use of memory.

  I do recognise that a common use case might be to generate a single output
  file or stream where real-time updates are not needed, in which case you
  may want a convenient tree that is ready to traverse without needing to
  listen for async events and put it together. For this purpose, I have added a
  built-in reporter that simply listens to the new events and outputs a "summary"
  event with an object that is similar to the old "runEnd" event object where
  the entire run is described in a single large object.

- New "SummaryReporter" for simple use cases of non-realtime traversal of
  a single structure after the test run has completed.
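
To make the idea of a ready-to-traverse tree more concrete, here is a minimal
sketch of how a consumer could rebuild one from `testEnd` events using only
the upward `fullName` link. The names `createSummaryCollector`, `onTestEnd`
and `getSummary`, and the exact node shape, are hypothetical; the built-in
SummaryReporter may work differently.

```javascript
// Hypothetical sketch of a summary collector; not the SummaryReporter itself.
// Assumes each testEnd event carries `name`, `status`, and `fullName`
// (an array of names from the outermost parent down to the test itself).
function createSummaryCollector() {
  const nodes = new Map(); // key: JSON-encoded fullName, value: tree node

  return {
    // Call for every testEnd event, in whatever order they arrive.
    onTestEnd(test) {
      nodes.set(JSON.stringify(test.fullName), {
        name: test.name,
        fullName: test.fullName,
        status: test.status,
        tests: []
      });
    },
    // Call once the run has ended to link children to their parents.
    getSummary() {
      const roots = [];
      // Reset links so that calling getSummary() twice does not duplicate children.
      for (const node of nodes.values()) {
        node.tests = [];
      }
      for (const node of nodes.values()) {
        const parentKey = JSON.stringify(node.fullName.slice(0, -1));
        const parent = nodes.get(parentKey);
        (parent ? parent.tests : roots).push(node);
      }
      return { tests: roots };
    }
  };
}

const collector = createSummaryCollector();
collector.onTestEnd({ name: 'child', fullName: ['parent', 'child'], status: 'passed' });
collector.onTestEnd({ name: 'parent', fullName: ['parent'], status: 'passed' });
const summary = collector.getSummary();
console.log(summary.tests[0].tests[0].name); // child
```

Keying nodes by the JSON-encoded `fullName` avoids collisions that a plain
separator string could cause when a test name contains the separator.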

== Caveats ==

- A test with the "failed" status is no longer expected to always have
  an error directly associated with it.

  Now that tests aggregate into other tests rather than into suites,
  this means tests that merely have other tests as children do still
  have to send a full testEnd event, and thus an `errors` and `assertions`
  array.

  I considered specifying that errors have to propagate but this seemed
  messy and could lead to duplicate diagnostic output in reporters, as well as
  ambiguity or uncertainty over where errors originated.

- A suite containing only "skipped" tests now aggregates as "passed"
  instead of "skipped". Given we can't know whether a suite is its own
  test with its own assertions, we also can't assume that if a test parent
  has only "skipped" children that the parent was also skipped.

  This applies to our built-in adapters, but individual frameworks, if they
  know that a suite was skipped in its entirety, can of course still set the
  status of parents however they see fit.

- Graphical reporters (such as QUnit and Mocha's HTML reporters) may no
  longer assume that a test parent has either assertions/errors or other
  tests. A test parent can now have both its own assertions/errors, as well
  as other tests beneath it.

  This restricts the freedom and possibilities for visualisation.
  My recommendation for a visual reporter that wants to keep using different
  visual shapes for "group of assertions" and "group of tests" is to
  buffer the information internally, so that it can first render all of the
  test's own assertions and then render the children, even if they originally
  ran interleaved and/or the other way around.
  Ref #126.

- The "Console" reporter that comes with js-reporter no longer
  uses `console.group()` for collapsing nested tests.
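
The buffering recommendation for graphical reporters could look roughly like
the sketch below: once a test and its children have been collected, the
reporter renders the test's own assertions first and the child tests after
them. The function name `renderTest` and the buffered node shape are
assumptions for illustration only.

```javascript
// Hypothetical sketch: render a buffered test so that its own assertions
// come first and its child tests follow, regardless of execution order.
// Assumes a node shape of { name, assertions: [{ passed, message }], tests: [...] }.
function renderTest(node, depth = 0) {
  const indent = '  '.repeat(depth);
  const lines = [`${indent}${node.name}`];
  for (const assertion of node.assertions) {
    const mark = assertion.passed ? 'pass' : 'fail';
    lines.push(`${indent}  [${mark}] ${assertion.message}`);
  }
  for (const child of node.tests) {
    lines.push(...renderTest(child, depth + 1));
  }
  return lines;
}

const rendered = renderTest({
  name: 'parent',
  assertions: [{ passed: true, message: 'setup ok' }],
  tests: [
    { name: 'child', assertions: [{ passed: false, message: 'values differ' }], tests: [] }
  ]
});
console.log(rendered.join('\n'));
```

A real HTML reporter would produce DOM nodes instead of text lines, but the
ordering decision is the same.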

== Misc ==

- Add definitions for the "Adapter" and "Producer" terms.

- Use terms "producer" and "reporter" consistently, instead of
  "framework", "runner", or "adapter".

- Remove mention that the spec is for reporting information from
  "JavaScript test frameworks". CRI can be used to report information
  about any kind of test that can be represented in CRI's event model,
  including linting and end-to-end tests for JS programs, as well as
  non-JS programs. It describes a JS interface for reporters, but the
  information can come from anywhere.

  This further solidifies that CRI is not meant to be used for "hooking"
  into a framework, and sets no expectation about timing or run-time
  environment being shared with whatever is executing tests in some
  form or another. This was already the intent originally, since it could
  be used to report information from other processes or from a cloud-based
  test runner like BrowserStack, but this removes any remaining confusion
  or doubt there may have been.

Fixes #126.
Krinkle added a commit that referenced this issue Feb 14, 2021
@Krinkle Krinkle unpinned this issue Feb 14, 2021
Krinkle added a commit that referenced this issue Feb 21, 2021
In light of the shift in direction per #133,
I'm reverting (most of) cce0e4d so as
to allow the next release to be more similar to the previous one, and to make
upgrading easy, allowing most reporters to keep working with very minimal
changes (if any).

Instead, I'll focus on migrating consumers of js-reporters to use
TAP tools directly where available, and to otherwise reduce use of
js-reporters to purely the adapting and piping to TapReporter.

* Revert `RunStart.testCounts` > `RunStart.counts` (idem RunEnd).
* Revert `TestStart.suiteName` > `TestStart.parentName` (idem TestEnd).
* Revert Test allowing Test as child, restore Suite.

This un-fixes #126,
which will be declined. Frameworks adapted to TAP by js-reporters will
not support nested tests.

Frameworks directly providing TAP 13 can use one of several strategies
to express relationships in a backwards-compatible manner, e.g. like
we do in js-reporters by flattening with '>' symbol, or through
indentation or through other manners proposed in
TestAnything/testanything.github.io#36.
Refer to #133 for
questions about how to support TAP.