Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Standarize the notation of subtests in TAP #2

Open
kinow opened this issue Jan 6, 2015 · 88 comments
Open

Standarize the notation of subtests in TAP #2

kinow opened this issue Jan 6, 2015 · 88 comments

Comments

@kinow
Copy link
Member

kinow commented Jan 6, 2015

This issue was originally reported in TestAnything/testanything.github.io#20, and the conversation has been moved here since it is related to the TAP specification.

It concerns the subtests in TAP and a formal standard to be followed by TAP consumer and producers. In the latest TAP specification (TAP 13) there is no formal definition of subtests. There were proposals and draft documents on this, some tools like Test::More and tap4j support with limitations.

Please feel free to chime in and suggest enhancements for this standard. A message is being written to the mailing list pointing to this issue too. The mailing list has several users that participated in the IETF draft and have lots of experience with TAP and its implementations.

/cc @Leont @xmojmr

@jonathanKingston
Copy link
Member

One thing that did not seem to be mentioned on the TAP mailing list was that the potential for a sub test to look like this:

ok 1 - something
ok 1 -     something
ok 2 - something

or

ok 1 - something
ok 1 -- something
ok 2 - something

I believe most libraries would support a failure just fine for both of those, its simpler that requiring an update to the libraries and forces stricter reasoning of tests which is a bonus.

@Leont
Copy link

Leont commented Jan 6, 2015

We really want to have this. I'm not fond of the current mechanism, but given it's out there and there's a lot of stuff outputting it already it may be a passed station.

@Leont
Copy link

Leont commented Jan 6, 2015

I believe most libraries would support a failure just fine for both of those, its simpler that requiring an update to the libraries and forces stricter reasoning of tests which is a bonus.

Both parsers I've worked on would consider those outputs an error. Subtests really should be something legacy parsers consider comments, which fortunately still gives a lot of space.

@kinow
Copy link
Member Author

kinow commented Jan 6, 2015

@jonathanKingston I'm happy with the old proposal for subtests

TAP 13
1..1
ok 1 - Subtest 1
    ok 1 - subtest 2a
    ok 2 - subtest 2b

I'd be fine with other formats, but since there are tools in Perl, Java, Node, which output something similar to that, I'd be more keen to simply formalize it in a TAP 14 specification.

@Leont

We really want to have this. I'm not fond of the current mechanism, but given it's out there and there's a lot of stuff outputting it already it may be a passed station.

+1 I believe we have the formal TAP 13 specification, and a meta TAP 13.5, or TAP 13 + proposals/custom code. Considering the number of users in different programming languages outputting subtests, I think a proposal similar to what most tools produce (and that got consensus here and in the mailing list) would be a good approach.

As a side note, there is an open issue in tap4j where the TAP producer emits the subtests first. The current state machine in tap4j is not able to consume a subtest before. Quoting the example in that issue below:

    ok 1 - subtest 1a
    ok 2 - subtest 1b
    1..2
ok 1 - Subtest 1
    ok 1 - subtest 2a
    ok 2 - subtest 2b
    1..2
ok 2 - Subtest 2
1..2
ok

A perl test that will produce this output is as follows:

use Test::More;

subtest 'Subtest 1' => sub {
    ok(1,'subtest 1a');
    ok(1,'subtest 1b');
};
subtest 'Subtest 2' => sub {
    ok(1,'subtest 2a');
    ok(1,'subtest 2b');
};
done_testing();

It'd be nice to have the specification in cases like this to guide the developer :-)

@jonathanKingston
Copy link
Member

We should not be afraid to break old implementations and if anything that should be the correct behaviour. Old implementations should not ever ignore sub tests. This is a testing setup, ignoring tests would be a mistake. Due to the non standard nature of all the current use of subtests it is possible subtests could fail and the parent succeed - this should be caught or be a parse error; either for me is fine.
This brings me to another ticket I need to make about how TAP software should respond if it can't be parsed.

I'm a little against looking to satisfy older implementations which we might end up calling 13.5 erroneously. This is a similar situation to what Markdown has been through and where possible I would prefer to go for breaking behaviour and errors will be spotted sooner rather than skipped over.

This was my theory around the formats above which should be treated as safe in most parsers where extra white space would be ignored around the comment.

I would therefore aim for:

  • False negative changes only
  • Try not to risk being interpreted by any parser as YAML
  • Discover the most TAP consumers which can cope with the two points above

I'm actually a fan of the tap4j as it would aid streaming output of the tests before they have completed running.

The other issue with the old proposal was that there is no acknowledgement of sub test count which I think is an issue.

  • Do sub tests count as the main count?
  • Sub tests counts help consumers fail earlier from a streaming output of tests

One last point, the number of main consumers and producers is still actually small, patching all of them to support a newer version shouldn't be a large piece of work.

@Leont
Copy link

Leont commented Jan 7, 2015

We should not be afraid to break old implementations and if anything that should be the correct behaviour. Old implementations should not ever ignore sub tests. This is a testing setup, ignoring tests would be a mistake.

The way it was designed means that even if the subtests are ignored the correct conclusion can still be made. It leads to a data-loss, not not to incorrect operation.

Due to the non standard nature of all the current use of subtests it is possible subtests could fail and the parent succeed - this should be caught or be a parse error; either for me is fine.

In the parser I wrote a discrepancy generates a non-fatal parse error.

I'm a little against looking to satisfy older implementations which we might end up calling 13.5 erroneously. This is a similar situation to what Markdown has been through and where possible I would prefer to go for breaking behaviour and errors will be spotted sooner rather than skipped over.

Do we really have that problem?

This was my theory around the formats above which should be treated as safe in most parsers where extra white space would be ignored around the comment.

I don't follow you. They don't have extra whitespace at the start of the line, so they won't be ignored even on legacy parsers.

@jonathanKingston
Copy link
Member

The point I am making is we are expecting the producer to be error free by always returning a not ok parent test. I would prefer if both producer and consumer were more fragile rather than the 'Just cope' attitude the philosophy suggests.

This was my theory around the formats above which should be treated as safe in most parsers where extra white space would be ignored around the comment.

I don't follow you. They don't have extra whitespace at the start of the line, so they won't be ignored even on legacy parsers.

That is my point, why would you want them ignored. I want a testing framework to do one of two things when it doesn't know what to do:

  • Fatal error
  • Add a parse error as if it were a failed test it encountered

Older parsers may or may not be able to cope with invalid subtests. The following will not fail with some of the parsers I have seen:

ok 1 - test
  not ok 1 - test
ok 2 - test

However both these would:

ok 1 - test
not ok 1 -    test
ok 2 - test

and

ok 1 - test
not ok 1 -- test
ok 2 - test

The way it was designed means that even if the subtests are ignored the correct conclusion can still be made. It leads to a data-loss, not not to incorrect operation.

This is unlikely to be true for all producers but I take your point that the parent test should fail however I would prefer the correct behaviour for parsers is that they fail.

In the parser I wrote a discrepancy generates a non-fatal parse error.

This is unlikely to be true for all consumers, in fact the TAP philosophy (which likely needs addressing) suggests any such issue should be skipped.


The only way I would support this with the subtests being before like @kinow mentioned. Even then the format isn't as easy to read.

@Leont
Copy link

Leont commented Jan 9, 2015

Older parsers may or may not be able to cope with invalid subtests. The following will not fail with some of the parsers I have seen:

Even as OKs they would fail because the numbering is incorrect.

This is unlikely to be true for all producers but I take your point that the parent test should fail however I would prefer the correct behaviour for parsers is that they fail.

Agreed. Parsing only the parent test should clearly be considered legacy behavior, but legacy has its uses.

This is unlikely to be true for all consumers, in fact the TAP philosophy (which likely needs addressing) suggests any such issue should be skipped.

My take on the philosophy would be: "report errors, but when possibly make them non-fatal", as that way the user can be presented with as much information as possible.

@jonathanKingston
Copy link
Member

Very true however subtests could be part of the same numbering potentially.

Is there other useful meta data that subtests have?

Yeah I will raise an issue also with that as I certainly don't think it is clear enough.

@jonathanKingston
Copy link
Member

@Ovid I saw some interesting comments on the mailing list of yours. (http://www.ietf.org/mail-archive/web/tap/current/msg00530.html) Would you be able to mention your concerns here?

I believe a way of demarcating subtests is needed and it should satisfy these conditions:

  • Subtests be tested by older parsers as if it was a parent or fail in most cases
  • YAML doesn't become a nightmare to parse
  • Continue to be as human readable as possible
  • Allow for test numbering
    • Allow for sub test counts 1..20
    • Allow for sub test numbering 1.1.1 and 1

@Leont
Copy link

Leont commented Jan 15, 2015

Allow for sub test numbering 1.1.1 and 1

I'm not sure what you mean with that exactly.

@jonathanKingston
Copy link
Member

ok 1 - something
ok 1.1 -- something
ok 1.1.1 --- something
ok 2 - something

and

ok 1 - something
ok 1 -- something
ok 1 --- something
ok 2 - something

Obviously numbering itself should be optional also.

@Ovid
Copy link

Ovid commented Jan 18, 2015

@jonathanKingston: The approach has been suggested before and always shot down. The problem with this approach is that it is not backwards-compatible. Maintaining this backwards compatibility is a key design goal of TAP.

@jonathanKingston
Copy link
Member

@Ovid it depends how you define backwards compatibility, therein lies my issues with the TAP philosophy.
Where the new format includes further information about the tests, TAP consumers should of course always succeed.

However in this case we are transferring further tests, we should be doing our best to make all previous consumers to assume the tests are top level tests.

Subtests should in older consumers either:

  • Cause a parse error (Getting the consumer to upgrade their consumer)
  • Add in top level tests (Allowing test failures to be treated as failures if they are)

It would not be acceptable for the following to be treated as an overall pass in older consumers:

ok - test
    not ok - test
ok - test

What I think this means is that the choice between any subtest format is:

  • Complexity in future parsers surrounding YAML blocks and lack of ability of extra meta data
  • Risk of parse errors in older formats

Ultimately if backwards compatibility were a priority then the TAP version would not have been put in place in the first instance. In my opinion truly backwards compatible structures would not put in version strings.

The problem around backwards compatibility here restricts subtest numbering completely (However looking at the specification numbering is very loosely defined itself, it doesn't even state test numbers are incremental), by adding in numbering when we have the tests any numbering policy would be problematic also.

I think the benefit here is that we can define subtests without the risk of breaking older consumers - false negatives are better than false positives when it comes to tests themselves.

@Leont
Copy link

Leont commented Jan 20, 2015

Cause a parse error (Getting the consumer to upgrade their consumer)

This is not acceptable at all. Specially not in situations where the producer and the consumer are distributed separately. If the consequence of upgrading your producer is breakage, the most sensible decision is to not upgrade that producer. This sort of thing means people will refrain from upgrading to TAP 14.

Add in top level tests (Allowing test failures to be treated as failures if they are)

Both of your suggestions will cause a test harness to fail despite the tests all being succeeding. Again, if upgrading a producer causes the toolchain to fail when the tests pass, the sensible thing is to not upgrade the producer.

It would not be acceptable for the following to be treated as an overall pass in older consumers:

That is already the situation today. And it's not a major issues because it only happens on a faulty producer. In a green-fields implementation this would make sense, but given the reality we can't enforce this for legacy consumers.

@jonathanKingston
Copy link
Member

I'm sorry but I really am not seeing the adverse reaction to a testing framework that requires you to upgrade your consumer when you decide to want subtests.

All ideas have their pitfalls so far. Its a testing framework not a GUI tool or phone system we should not be trying to remedy old frameworks. Testing frameworks need to report errors not be ignored accuracy should be the key.

It is not really greenfield at all as some have gone with the suggested wiki post / mailing list. Some have gone with the tap4j backwards format. Some have invented YAML subdocuments and etc. The best we can do is cause errors and explain why by incrementing the version number.

The assertions made are that the TAP output is read by one consumer for the one producer and that won't always be the case.

I'm not precious to any of the suggested structures I have given at all however I am precious to allow the safest failures for older frameworks. You mention that test harness will fail and yes ideally we should look for a structure that has the least issues in the most popular consumers however as I keep saying we should not be worried about causing false positives, false negatives would be far worse.

Ideally I would prefer an implementation that breaks all old implementations to force the consumer upgrade if subtests are used.

If you were upgrading a framework of any kind in a corporate setting, the best is to assume that everything interacting with the framework will break. If we make TAP 14 compelling enough then slowly people will upgrade. Upgrades in all software needs to be compelling enough as likely enough there will be issues.

What I suggest is we keep looking, I really don't think it is ok that the best worst of the indentation is taken up. This causes YAML parsing issues for simplistic parsers.

I'm back to lets throw as many examples at consumers until we find the most that either:

  • can cope with the suggested format
  • blow up and cause upgrades

Can we at least keep looking/trying for a solution here, I think it is pretty obvious why no solution was ever chosen. Slumping for the first suggestion we know seems a little lacking.

@Ovid
Copy link

Ovid commented Jan 20, 2015

@jonathanKingston: I know this must be frustrating for you and I'm sorry about that. We're very, very cautious about this because TAP is the default test protocol used in the core Perl language. Perl is distributed by default with just about every major operating system except windows. This means that in just about every city on the planet with modern computers, Perl is there and thus TAP is there.

There is no way I can estimate the entire scope of breakage (many of those default installations will never be touched), but there have been serious issues in the past with accidental toolchain breakage causing an uproar. And then there's the ripple effect of this change spreading out to other language test harnesses which produce and consume TAP and there would be yet another uproar (and I'm pretty sure it would kill TAP or cause a fork).

I see no compelling reason for this change, but the downside is that it would be a disaster and devs in countries we've never thought of would be staring at their monitors, trying to figure out what happened. The first rule of the toolchain is that you do not break the toolchain (we Perl toolchain devs have been bitten enough times that we're learned our lessons).

As for how to handle YAML diagnostic information, I thought it was to be indented four spaces from the current line, but there's a potential problem there (the following is completely made up and just used as an example):

ok 1 - some test
ok 2 - some test
    1..3
    ok 1 - some subtest
    not ok 2 - some subtest
        ---
        have: 7
        want: 8
        line: 7
        file: t/foo.t
        extra:
               - |
                   This is a YAML here document and maybe it's
                   not ok to embed this in test diagnostics
    ok 3 - bummer
not ok - 3 - subtest summary line
ok 4
1..4

In the above example, I could easily see a poorly-written parser choking on the YAML heredoc. That will need to be specified carefully and I don't see how the suggested changes which break the parser would mitigate the above case.

To help us move forward on this, please show us concrete examples of:

  • what you want
  • why you want it
  • why the current parser(s) don't handle that case
  • how a proposed solution would do that.

Without seeing specific examples, we'll get this wrong.

As an aside, is anyone reading this who was also at the Oslo QA hackathon in 2008? We hashed out a lot of this there, but more importantly, there was discussion of creating "spec tests" in YAML or XML format that parsers written in any language should be able to parse and validate. That would allow TAP consumer authors to have a standard against which they could test. Does anyone recall what happened with that? I wasn't working on the spec tests at that event.

@kinow
Copy link
Member Author

kinow commented Jan 20, 2015

@Ovid

As for how to handle YAML diagnostic information, I thought it was to be indented four spaces from the current line

I also recall reading it somewhere, probably tap4j expects YAMLish in a indented block of text, starting with --- and ending with ...

@kinow
Copy link
Member Author

kinow commented Jan 20, 2015

Since the specification has been frozen for a while, a TAP 14 which breaks backward compatibility could be dangerous and reduce the number of adopters, IMO.

Though I agree with @jonathanKingston that we need to move further and not be afraid of changes, at this time maybe it would be more sensible to release a specification that could be easily adopted by implementations, and would let us measure its level of adoption by implementations and gather new contributors and users.

@jonathanKingston in TAP 13 old website, I think there was a section Proposals. Maybe we could create similar page in the website with sections for TAP 14, TAP 15 and Backlog (or others) and write what we would be aiming to release with each new specification. Kind like a roadmap. What do you think?

@xmojmr
Copy link

xmojmr commented Jan 20, 2015

@kinow 👍 public product backlog spreadsheet prioritized for the TAP 14 draft sprint looks like useful deliverable

This need was already expressed using different words in Leont's comment, my comment, Ovid's comment

Where it should be hosted (GitHub Wiki, Google Docs, GitHub Issues, Trello, ...) and how would the review and voting processes look like is an unknown for me. I don't have corresponding positive open source experience and The IETF Process: An Informal Guide is not very digestible guideline

@jonathanKingston ?

@kinow
Copy link
Member Author

kinow commented Jan 20, 2015

Thanks @xmojmr! Just created issue #10 for it, so that we can move this discussion there.

@jonathanKingston
Copy link
Member

@Ovid to be clear it is not me who actually wants this feature, however it would be a useful separation of the TAP output to see categorisation of the issues.

What I am trying to prevent is causing more issues for the future of TAP by adding in the simplest option we can pick from which is whitespace indenting. Yes YAMLish formats have the indicators but I would prefer parsers to be able to ignore all indented text unless it spots either bail out or not ok.

The example code you gave explains my distaste in standardising the old proposal of indenting the subtests, I want to keep parsing rules as simple as possible.

I would be interested in seeing your idea around "spec tests" even if it is from memory and not exactly how it was suggested. As I said I'm not really suggesting anything besides keeping the TAP format simple whilst allowing flexibility.

@jonathanKingston: I know this must be frustrating for you and I'm sorry about that. We're very, very cautious about this because TAP is the default test protocol used in the core Perl language. Perl is distributed by default with just about every major operating system except windows. This means that in just about every city on the planet with modern computers, Perl is there and thus TAP is there.

I really don't buy this statement sorry, software can be updated carefully. Updating specs is possible even in the most widely used languages. I get that breaking changes will happen but that is the risk of updating the producer. This is the same reason why old browsers are prevalent and PHP 4 in enterprise setups.

A fork has ultimately happened already happened with TAP-Y etc, without progressing the format to include the latest user requirements adoption will slowly die out. If TAP is so ingrained into the Perl toolchain that it can't be extended then ultimately it shouldn't have been opened up outside of the libraries that consumed it. If the original tools that created TAP want to remain using the older version, that is completely fine also. I think the only way to progress is as @xmojmr suggested creating guidelines for extending TAP and a framework where people can vote on these issues.

@kinow I agree TAP 14 should just be a clean up of the spec where possible breaking backwards compatibility should be 15 (unless we opt for point increases which isn't a bad idea either).

@xmojmr yep moved all talks of a roadmap to #10 thanks for all the previous comments on the previous thread also 👍

@Leont
Copy link

Leont commented Jan 21, 2015

The example code you gave explains my distaste in standardising the old proposal of indenting the subtests, I want to keep parsing rules as simple as possible.

I think that is a very reasonable POV, I just don't think your suggested resolutions are workable. I'm sure there are syntaxes that would be ignored by a TAP12 consumer but are easier to parse for a TAP14 consumer than the current mechanism. This is a solvable problem. For example.

case A subtest {
    ok 1 - A
    ok 2 - B
}
ok 1 - A subtest

or

ok 1 - A subtest {
    ok 1 - A
    ok 2 - B
}

@jonathanKingston
Copy link
Member

@Leont again I am back to thinking it doesn't need to be ignored so long as there is a failure but lets not get into that.

Both those examples then need the parser to be a node based parser rather than a simple one test per line that starts with /^(not|ok)/. If I am building a parser that complex then I can filter out the YAMLish start and end.

Prefixing with a character might be easier (despite not meeting my desire to have subtests parsed by older consumers):

ok 1 - parent
> ok 1 - child
> ok 2 - child
> > ok 1 - child

I'm not a massive fan of that either but it's certainly simpler than context based parsing.

@exodist
Copy link

exodist commented Feb 3, 2015

I know I am late to the party, but @Leont just looped me in (kind of). Before I knew this discussion was going on, in fact it was early into my adoption of Test-Simple, I decided to build something like this in as an option into Test::Builder. As it stands the dev versions have the following as a subtest syntax you can turn on (default is the classic)

ok 1 - not a subtest
ok 2 - a passing subtest {
    ok 1 - inside a subtest
    ok 2 - inside a subtest
}
not ok 3 - a failing subtest {
    not ok 1 - failure inside
    # diag stuff
}

This doesn't break any of the harnesses/parsers I have used it on.

Please do not take this available syntax as me going cowboy or trying to preempt anyone here. I did this before I knew there was an ongoing process or other people to discuss it with. The main takeaway should be that the dev releases now make it possible to support a new subtest syntax if one is decided. Also it is now possible to put the subtest line before the results from inside the subtest if that is necessary.

@exodist
Copy link

exodist commented Feb 28, 2022

It has been a long time sine this thread was last pinged. I just want to add that Test2 for perl, which is now the heart of most perl testing tools, supports the 2 forms of subtests I mentioned all those years ago:

ok 1 - foo {
    ok 1 - bar
    ok 2 - baz
    0..2
}

and

    ok 1 - bar
    ok 2 - baz
    0..2
ok 1 - foo

But when concurrency is in play only the parentheses form is used to avoid subtests overlapping, which is impossible to handle if the parser tries.

I will also add that Test2::Harness, aka yath, supports parsing both forms of subtests.

@isaacs
Copy link
Contributor

isaacs commented Feb 28, 2022

Yeah, node-tap has supported both forms for years, as well.

If JS and Perl both do it, can we just call it a spec? 😂 I think the "spec" has been just "whatever the implementations do" for quite a long time now.

EDIT: The node-tap docs show it as:

# Subtest: child test name
    ok 1 - foo
    ok 2 - bar
    1..2
ok 1 - child test name

but in practice, it will parse as a child test if the leading comment is missing. That's just a helpful hint so we can bail out with the failed test name earlier if tap --bail is being run.

@Krinkle
Copy link

Krinkle commented Feb 28, 2022

Note that QUnit has always reported nested tests by printing the full name of a test, with parent names joined using the > (greater-than) delimiter. This afaik avoids the compromise in developer usability of not having the parent name printed in context, regardless of whether children are buffered or unbuffered.

I personally also find this:

  • .. convenient to read in long CI output,
  • .. convenient to copy failing test names into bug trackers or share in chat,
  • .. easily pipe into TAP tooling to display the currently executing test (and doesn't require subtest awareness to re-construct the "full name").
TAP version 13
ok 1 add > two numbers
1..1
# pass 1
# skip 0
# todo 0
# fail 0

We first shipped this in 2017, but didn't consider it a compromise afaik. We render our HTML UI in much the same way.

I'm not proposing this as an alternative by any means. Subtests are by now established in Perl and node-tap and it's worth spec'ing for future TAP parsers just for that alone.

I'm curious though what we gain in added value (or avoided complexity) through TAP subtest syntax. I've only used a relatively small corner of the TAP universe, so apologies if this is obvious, but I just thought I'd ask. It might inspire me to adopt the syntax in QUnit if it is applicable. Thanks ❤️ !

@mblayman
Copy link
Member

If there is a syntax for subtests that both Perl and Node.js have supported for years, then that makes sense to me to put it in the spec as those are probably the biggest TAP communities that I'm aware of.

In spite of best intentions, IMO, the TAP 14 draft failed because it tried to do too much at once. Subsequently, it has gone unmerged for 7 years. I think a small change to the standard and a bump to revision 14 bringing the spec closer to real world usage would be a good thing.

As the author of tappy for TAP in Python, I've mostly punted on the whole subtest problem because I was only interested in implementing what's in the spec.

@Leont
Copy link

Leont commented Mar 1, 2022

Raku currently only supports the unbuffered syntax, but I'm fine with adding support for adding the buffered syntax if that helps move this forward.

I do wonder if it's also reusable for YAML blocks (and comments?). Can we make the block mean "this thing should be parsed as all belonging together", instead of using it exclusively for subtests.

@isaacs
Copy link
Contributor

isaacs commented Mar 1, 2022

I do wonder if it's also reusable for YAML blocks (and comments?). Can we make the block mean "this thing should be parsed as all belonging together", instead of using it exclusively for subtests.

Node-tap uses a 2-space indented yaml blocks delimited by --- and ..., which are always attached to the previous test point. For buffered tests, you can do this to attach diagnostics:

ok 1 - child test
  ---
  this: is diagnostics
  ...
{
    ok 1
    1..1
}

Comments/directives can exist between the test point and the {

ok 1 - child test # TODO do the child test later {
  1..1
  ok
}

I think it's best to just say that diagnostics must be actual yaml (not ish) indented 2 characters and delimited by --- and ..., since that's what extant popular implementations are doing.

@Krinkle #2 (comment)

The thing is, I really don't want to go back to having to count test points and update plan expectations like the bad old days before protocol-level child tests. Ie, this kind of code:

// using node-tap
const t = require('tap')
t.plan(2)
t.test('child 1', async t => { t.pass('this is fine'); t.pass('another one') })
t.test('child 2', async t => { t.pass('also fine') })

today outputs:

TAP version 13
1..2
# Subtest: child 1
    ok 1 - this is fine
    ok 2 - another one
    1..2
ok 1 - child 1

# Subtest: child 2
    ok 1 - also fine
    1..1
ok 2 - child 2

I don't want to have to say "ok, you did t.plan(2), but actually the first one of those was two things, so I have to change the plan to 3". This way, I can say "you did t.plan(2), that means I output 1..2 right now, no math involved".

Reporters can of course do whatever they like with it, including counting it up and flattening it:

$ node t.js | npx tap-parser -t -f
TAP version 13
ok 1 - child 1 > this is fine
ok 2 - child 1 > another one
ok 3 - child 2 > also fine
# time=15.2ms
1..3

@mblayman #2 (comment)

In spite of best intentions, IMO, the TestAnything/testanything.github.io#36 failed because it tried to do too much at once. Subsequently, it has gone unmerged for 7 years. I think a small change to the standard and a bump to revision 14 bringing the spec closer to real world usage would be a good thing.

I agree.

So if there's some appetite for doing this, should we try again? If so, I'm down. Here's my minimal list of must-haves for a spec:

  • s/yamlish/actual yaml/, formalize that it must be 2 space indentation from the test point it's attached to.
  • specify comments (with the provision that tap comments aren't parsed in yaml -- everything in the diag block is yaml, and yaml has comments).
  • child tests (at a bare minimum, indented style, but ideally buffered style as well), formalized as 4 space indentation.

We can tackle these one by one so it's not too big of a drop at once.

We should have a hard line against including anything in the spec that doesn't have at least 3 "relevant" implementations (ie, the most used implementation in a major programming language community), and only accept objections if a relevant implementation doesn't support it and doesn't intend to.

Between me, @exodist, and @mblayman I think we've got quorum, but it'd be nice to bring in someone from Java and/or PHP if TAP is being used widely there.

@isaacs
Copy link
Contributor

isaacs commented Mar 1, 2022

I think it's best to just say that diagnostics must be actual yaml (not ish) indented 2 characters and delimited by --- and ..., since that's what extant popular implementations are doing.

That is, at least in the case of node-tap, we've been forced to do it by virtue of the fact that there are parsers in the wild which will crash unhelpfully on invalid yaml in diagnostic sections. (I think the TAP parser built into Jenkins does this? I could be misremembering.)

@Leont
Copy link

Leont commented Mar 1, 2022

Node-tap uses a 2-space indented yaml blocks delimited by --- and ..., which are always attached to the previous test point.

The problem is, that means you either have two separate events (which is undesirable if you want to process it in a sensible way; together), or you have to wait for the next line until you can emit the most recent test result (which is undesirable if you want to show updates as they come in). We need some kind of marker of there's more coming up if we don't want to lose TAP's streaming nature.

@isaacs
Copy link
Contributor

isaacs commented Mar 1, 2022

The problem is, that means you either have two separate events (which is undesirable if you want to process it in a sensible way; together)

Great suggestion for TAP15 ;)

I mean, having written parsers that have to handle this and maintaining node-tap for 11 years now, I can say yes, you're 100% right, that sucks. And also, it's the kind of thing you do once, and isn't really that hard from then on.

But we all do this now, undesirable or not, and there's value in having a spec that is minimal, descriptive, and accurate, at least to start. Then there'll be a firmer foundation upon which to build new ideas, which has been the biggest obstacle to progress in this space thus far.

@Leont
Copy link

Leont commented Mar 1, 2022

Great suggestion for TAP15 ;)

Fair enough

@isaacs
Copy link
Contributor

isaacs commented Mar 1, 2022

@exodist @mblayman @Leont Please go take a look: #25

@exodist
Copy link

exodist commented Mar 1, 2022

For the yaml blocks, under a subtest would a yaml block have 6 space indentation for the --- and ...? To indicate the yaml block belongs to an event inside the subtest?

@isaacs
Copy link
Contributor

isaacs commented Mar 1, 2022

@exodist Yes, when we add child tests to this specification, we'll have to specify that the YAML indentation is 2 spaces more than the indentation of the test point to which it belongs. Similarly, a child test inside of a child test would be indented 8 spaces (4 + the parent), one inside of that would be 12, a yaml block for the 12-space indented test point would be 14 spaces, etc.

@ExE-Boss
Copy link

ExE-Boss commented Mar 1, 2022

If subtests will be indented, then they should use the more accessible tabs1234, because unlike with spaces, it’s exactly one tab per level.


That said, I personally prefer the 1.2.3 format combined with including the supertest name(s) delimited with >.

Footnotes

  1. https://github.com/prettier/prettier/issues/7475

  2. https://github.com/yaml/yaml-spec/issues/40

  3. https://www.reddit.com/r/javascript/comments/c8drjo/nobody_talks_about_the_real_reason_to_use_tabs/

  4. https://alexandersandberg.com/tabs-for-accessibility/

@isaacs
Copy link
Contributor

isaacs commented Mar 1, 2022

@ExE-Boss Current implementations use 4 spaces for indenting child tests. The goal of this exercise is to describe what is, so that we can make sensible progress towards what ought to be. Or we spend another 7 years debating everything and going nowhere.

@exodist
Copy link

exodist commented Mar 1, 2022

I should point out that the Test2::Harness TAP parser actually does not require 4 space indentation. Any indentation at all with proper TAP indented counts, and it considers a change in indentation one way or the other to indicate an additional subtest or and end of a subtest. it uses perls regex '\s' wildcard to capture space, that includes tabs or spaces.

I am not likely to change this any time soon, it does work for the 4-space standard this wants to set, but it is more relaxed on what it accepts.

@isaacs
Copy link
Contributor

isaacs commented Mar 1, 2022

@exodist Parsers rarely get in trouble by following Postel's rule, I think that's fine. ;) That being said, though, when we get to that part, we should specify "Harnesses may accept any level of indentation / should only parse subtests indented exactly 4 spaces", but also "producers must indent subtests 4 spaces", or something like that.

@isaacs
Copy link
Contributor

isaacs commented Mar 1, 2022

@exodist I'm curious what would it do with something like this?

TAP version 13
1..2
    1..1
    ok
  1..1
  ok
ok
ok

I seem to recall edge cases like that are why I flipped the table and just said it had to be 4 always in node-tap.

@exodist
Copy link

exodist commented Mar 2, 2022

The way the parser works last I modified it (a while ago to be sure) was that it had an array indexed by indentation level, every time indentation gets deeper it adds one to the right element (4 for 4 spaces, 3 for 3 spaces, etc), every time it gets more shallow it will close out all deeper subtests until the shallow level is closed. So it would handle this case just fine, and I believe I may even have a unit test for this specific edge case.

Something I am less sure of from memory is if my parser considers a tab to be 4 spaces, or a single space (IE does it convert tabs to spaces, or does it just count it as a single space character?) I have to look into that.

@exodist
Copy link

exodist commented Mar 2, 2022

I should also say my parser is a couple-layered thing.
https://metacpan.org/dist/Test2-Harness/source/lib/Test2/Harness/Collector/TapParser.pm parses line at a time, and returns what it parsed as well as the indentation. This part of the parser does not care about subtests, it is simply "What does this TAP line say, and how much indentation came first)
This basically turns the TAP stream into the harnesses internal event stream format which then goes to https://metacpan.org/dist/Test2-Harness/source/lib/Test2/Harness/Auditor/Watcher.pm which does the work of validating the event stream, including verifying subtests and indentations are all sane.

Looking at this again I see the "current" implementation does not actually index into an array, I have a feature branch somewhere that does, but this is still set up such that it should handle your edge case.

Anyway, the point is that I did not find it difficult to handle edge cases like yours, and I found that the parsing is much easier if the lines are first parsed into objects, then a separate watcher handles the object stream, the watcher then spawns a child-watcher for each subtest, etc. This simplifies the process greatly..

@exodist
Copy link

exodist commented Mar 2, 2022

oh, also my parser ignores the YAML stuff, but if we get it standardized I can add it in.

@isaacs
Copy link
Contributor

isaacs commented Mar 2, 2022

Anyway, the point is that I did not find it difficult to handle edge cases like yours, and I found that the parsing is much easier if the lines are first parsed into objects, then a separate watcher handles the object stream, the watcher then spawns a child-watcher for each subtest, etc. This simplifies the process greatly..

Yeah, I can see how that'd simplify things. http://npm.im/tap-parser creates a child Parser object for each indented block, and then when it sees a test point at the parent level, it closes out that child and continues on with the parent, iirc. It was also a while ago since I touched it last though, I could be completely wrong 😅 It does have to not emit the test point until the next line, in case it's yaml, which is sometimes a bit weird (that case @Leont is talking about).

I don't think we'd need diagnostics to come before the test point, but it'd be nice if there was a marker on the test point line to indicate whether yaml is coming or not. Something like:

TAP version 15
1..2
not ok 1 - this has more info # diagnostics
  ---
  found: 1
  wanted: 2
  ...
ok 2

Then it could be a little less stateful and still fully interruptible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests