Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(cbor/unstable): introduce @std/cbor #5909

Merged
merged 60 commits into from
Oct 9, 2024
Merged

Conversation

BlackAsLight
Copy link
Contributor

@BlackAsLight BlackAsLight commented Sep 4, 2024

Closes #5479

This pull request introduces a CBOR implementation based off the RFC 8949 spec from scratch. It introduces functions like encodeCbor, encodeCborSequence, decodeCbor, and decodeCborSequence, and a CborTag class to provide additional semantic information.

It also introduces streaming versions called:

  • CborSequenceEncoderStream
  • CborByteEncoderStream
  • CborTextEncoderStream
  • CborArrayEncoderStream
  • CborMapEncoderStream
  • CborSequenceDecoderStream
  • CborByteDecodedStream
  • CborTextDecodedStream
  • CborArrayDecodedStream
  • CborMapDecodedStream

It should be noted the different naming convention used between the "encoder", "decoder" and "decoded". The "encoder" and "decoder" classes are TransformStreams, while the "decoded" classes are ReadableStreams and act merely as a way for the user to figure out what type they have.

Due to the way streams work, if one of the "decoded" streams are yielded then they'll need to either be entirely consumed or cancelled before the next value will be yielded. This should be noted when using things like Array.fromAsync. Such a function would work only if you can guarantee that no value yielded will be one of these "decoded" streams. If such a value is yield in such a function then it will hang indefinitely.

Example 1

import { assert, assertEquals } from "@std/assert";
import { decodeCbor, encodeCbor } from "@std/cbor";

const rawMessage = [
  "Hello World",
  35,
  0.5,
  false,
  -1,
  null,
  Uint8Array.from([0, 1, 2, 3]),
];

const encodedMessage = encodeCbor(rawMessage);
const decodedMessage = decodeCbor(encodedMessage);

assert(decodedMessage instanceof Array);
assertEquals(decodedMessage, rawMessage);

Example 2

import { assert, assertEquals } from "@std/assert";
import { CborTextEncoderStream, CborTextDecodedStream, CborSequenceDecoderStream } from "@std/cbor";

const reader = CborTextEncoderStream.from(["Hello World!"])
  .readable
  .pipeThrough(new CborSequenceDecoderStream()).getReader();

const { done, value } = await reader.read();
assert(done === false);
assert(value instanceof CborTextDecodedStream);
assertEquals((await Array.fromAsync(value)).join(''), "Hello World!")

assert((await reader.read()).done === true);
reader.releaseLock();

Example 3

import { encodeBase64Url } from "@std/encoding";
import {
  CborArrayDecodedStream,
  CborArrayEncoderStream,
  CborByteDecodedStream,
  CborByteEncoderStream,
  CborMapDecodedStream,
  CborMapEncoderStream,
  type CborOutputStream,
  CborSequenceDecoderStream,
  CborSequenceEncoderStream,
  CborTag,
  CborTextDecodedStream,
  CborTextEncoderStream,
} from "@std/cbor";

const rawMessage = [
  undefined,
  null,
  true,
  false,
  3.14,
  5,
  2n ** 32n,
  "Hello World",
  new Uint8Array(25),
  new Date(),
  new CborTag(33, encodeBase64Url(new Uint8Array(7))),
  ["cake", "carrot"],
  { a: 3, b: "d" },
  CborByteEncoderStream.from([new Uint8Array(7)]),
  CborTextEncoderStream.from(["Bye!"]),
  CborArrayEncoderStream.from([
    "Hey!",
    CborByteEncoderStream.from([new Uint8Array(18)]),
  ]),
  CborMapEncoderStream.from([
    ["a", 0],
    ["b", "potato"],
  ]),
];

async function logValue(value: CborOutputStream) {
  if (
    value instanceof CborByteDecodedStream ||
    value instanceof CborTextDecodedStream
  ) {
    for await (const x of value) console.log(x);
  } else if (value instanceof CborArrayDecodedStream) {
    for await (const x of value) logValue(x);
  } else if (value instanceof CborMapDecodedStream) {
    for await (const [k, v] of value) {
      console.log(k);
      logValue(v);
    }
  } else if (value instanceof CborTag) {
    console.log(value);
    logValue(value.tagContent);
  } else console.log(value);
}

for await (
  const value of ReadableStream.from(rawMessage)
    .pipeThrough(new CborSequenceEncoderStream())
    .pipeThrough(new CborSequenceDecoderStream())
) {
  logValue(value);
}

Copy link

codecov bot commented Sep 4, 2024

Codecov Report

Attention: Patch coverage is 90.26275% with 126 lines in your changes missing coverage. Please review.

Project coverage is 96.55%. Comparing base (065296c) to head (ec68c5c).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
cbor/sequence_decoder_stream.ts 75.78% 83 Missing and 2 partials ⚠️
cbor/sequence_encoder_stream.ts 85.82% 18 Missing ⚠️
cbor/_common_decode.ts 95.81% 10 Missing ⚠️
cbor/_common.ts 90.66% 7 Missing ⚠️
cbor/_common_encode.ts 96.63% 4 Missing ⚠️
cbor/byte_encoder_stream.ts 97.95% 1 Missing ⚠️
cbor/text_encoder_stream.ts 98.03% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5909      +/-   ##
==========================================
- Coverage   96.75%   96.55%   -0.21%     
==========================================
  Files         509      530      +21     
  Lines       39175    40483    +1308     
  Branches     5795     6067     +272     
==========================================
+ Hits        37905    39087    +1182     
- Misses       1228     1352     +124     
- Partials       42       44       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@github-actions github-actions bot added the cbor label Sep 5, 2024
@kt3k
Copy link
Member

kt3k commented Sep 5, 2024

I wonder if these need to be classes when they both don't have any instance states.

How about making them just deocde and encode functions? That mirrors the structure of std/msgpack

@BlackAsLight
Copy link
Contributor Author

I wonder if these need to be classes when they both don't have any instance states.

How about making them just deocde and encode functions? That mirrors the structure of std/msgpack

Sure. I was trying to mirror the structure of TextEncoder/TextDecoder, but this other structure seems more common in the std.

@kt3k
Copy link
Member

kt3k commented Sep 5, 2024

Side note: TextDecoder stores some chunk from the previous decode when stream option is specified. Probably that is why it's implemented as a class

@iuioiua
Copy link
Contributor

iuioiua commented Sep 9, 2024

@BlackAsLight, could you please draft this PR and un-draft once ready for us to review?

@BlackAsLight BlackAsLight reopened this Sep 9, 2024
@BlackAsLight BlackAsLight marked this pull request as ready for review September 22, 2024 01:57
@BlackAsLight
Copy link
Contributor Author

Is there a way I can get the Sentry bot to update or create a new report on what lines are missing?

cbor/deno.json Outdated Show resolved Hide resolved
cbor/deno.json Outdated Show resolved Hide resolved
@kt3k
Copy link
Member

kt3k commented Sep 24, 2024

@BlackAsLight

Coverage diff of this PR should be visible in this page https://app.codecov.io/gh/denoland/std/pull/5909

CborSequenceDecoderStream
CborByteDecodedStream
CborTextDecodedStream
CborArrayDecodedStream
CborMapDecodedStream

There are mixed naming styles, FooDecoderStream and FooDecodedStream. Are these differences intentional?


We'd like to export a single API from a single file. Can you split files based on exported APIs

  • encode_cbor.ts exports encodeCbor
  • encode_cbor_sequence.ts exports encodeCborSequence
  • tag.ts exports CborTag
  • sequence_encoder.ts exports CborSequenceEncoderStream
  • byte_encoder_stream.ts exports CborByteEncoderStream
  • text_encoder_stream.ts exports CborTextEncoderStream
  • array_encoder_stream.ts exports CborArrayEncoderStream
  • map_encoder_stream.ts exports CborMapEncoderStream
  • decode_cbor.ts exports decodeCbor
  • decode_cbor_sequence.ts exports decodeCborSequence
  • sequence_decoder_stream.ts exports CborSequenceDecoderStream
  • byte_decoded_stream.ts exports CborByteDecodedStream
  • text_decoded_stream exports CborTextDecodedStream
  • array_decoded_stream exports CborArrayDecodedStream
  • map_decoded_stream exports CborMapDecodedStream

@BlackAsLight
Copy link
Contributor Author

@BlackAsLight

Coverage diff of this PR should be visible in this page https://app.codecov.io/gh/denoland/std/pull/5909

That page hasn't gotten updated in like 5 days now, which is why I asked.

CborSequenceDecoderStream
CborByteDecodedStream
CborTextDecodedStream
CborArrayDecodedStream
CborMapDecodedStream

There are mixed naming styles, FooDecoderStream and FooDecodedStream. Are these differences intentional?

The difference in naming style is intentional. The "Encoder" and "Decoder" ones are TransformStreams that actually do the work of converting, while the "Decoded" ones are ReadableStreams that act as a simple wrapper so the user of the lib is able to know what type of chunks to expect.

I did explore the idea of having the "Decoded" ones also be TransformStreams and handling the logic of conversion, but the logic there seemed too complicated as you don't know how far to go until you've actually decoded it.

I also explored the idea of having the other "Encoder" ones (i.e. CborByteEncodedStream) act as a mere wrapper for CborSequenceEncoderStream, but that then seemed redundant if all you wanted to send for example a byte string, as more checks would essentially need to be done for no reason.

We'd like to export a single API from a single file. Can you split files based on exported APIs

* `encode_cbor.ts` exports `encodeCbor`

* `encode_cbor_sequence.ts` exports `encodeCborSequence`

* `tag.ts` exports `CborTag`

* `sequence_encoder.ts` exports `CborSequenceEncoderStream`

* `byte_encoder_stream.ts` exports `CborByteEncoderStream`

* `text_encoder_stream.ts` exports `CborTextEncoderStream`

* `array_encoder_stream.ts` exports `CborArrayEncoderStream`

* `map_encoder_stream.ts` exports `CborMapEncoderStream`

* `decode_cbor.ts` exports `decodeCbor`

* `decode_cbor_sequence.ts` exports `decodeCborSequence`

* `sequence_decoder_stream.ts` exports `CborSequenceDecoderStream`

* `byte_decoded_stream.ts` exports `CborByteDecodedStream`

* `text_decoded_stream` exports `CborTextDecodedStream`

* `array_decoded_stream` exports `CborArrayDecodedStream`

* `map_decoded_stream` exports `CborMapDecodedStream`

That is a lot of files, but I can do that.

cbor/types.ts Outdated
* Specifies the encodable value types for the {@link CborSequenceEncoderStream}
* and {@link CborArrayEncoderStream}.
*/
export type CborInputStream =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about renaming this to CborStreamInput as this itself is not always a stream, but an input for the stream?

The same suggestion could also apply to:

  • CborMapInputStream -> CborMapStreamInput
  • CborOutputStream -> CborStreamOutput
  • CborMapOutputStream -> CborMapStreamOutput

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll also work on this change

@kt3k
Copy link
Member

kt3k commented Oct 8, 2024

I'll move the files *_decoded_stream.ts to the private paths (_*_decoded_stream.ts) and remove them from exports entries in cbor/deno.json as they are not intended to be used directly from the users. Let me know if you disagree with this change.

@BlackAsLight
Copy link
Contributor Author

BlackAsLight commented Oct 8, 2024

I'll move the files *_decoded_stream.ts to the private paths (_*_decoded_stream.ts) and remove them from exports entries in cbor/deno.json as they are not intended to be used directly from the users. Let me know if you disagree with this change.

The user isn't meant to create instances of the decoded streams themselves, but they do need access to them to be able to figure out what type they have. For example with an entry instanceof ByteDecodedStream.

Edit: maybe a Boolean function instead could be used instead by the user? isByteDecodedStream(entry). That's way there is no risk of incorrect use of the classes?

@kt3k
Copy link
Member

kt3k commented Oct 8, 2024

The user isn't meant to create instances of the decoded streams themselves, but they do need access to them to be able to figure out what type they have. For example with an entry instanceof ByteDecodedStream.

instanceof check makes sense. Currently they are still exported from mod.ts. Maybe it's also good to export them from relevant decoder streams' endpoints.

@kt3k
Copy link
Member

kt3k commented Oct 8, 2024

@BlackAsLight
I added the exports of CborArrayDecodedStream, CborByteDecodedStream, CborMapDecodedStream, CborTextDecodedStream from cbor/sequence-decoder-stream. What do you think?

@BlackAsLight
Copy link
Contributor Author

@BlackAsLight

I added the exports of CborArrayDecodedStream, CborByteDecodedStream, CborMapDecodedStream, CborTextDecodedStream from cbor/sequence-decoder-stream. What do you think?

Looks good to me

Copy link
Member

@kt3k kt3k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kt3k kt3k changed the title feat(cbor/unstable): Introduce @std/cbor feat(cbor/unstable): introduce @std/cbor Oct 9, 2024
@kt3k kt3k merged commit 0f9a23c into denoland:main Oct 9, 2024
17 checks passed
@BlackAsLight BlackAsLight deleted the cbor branch October 11, 2024 07:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

suggestion: add CBOR encoding/decoding
3 participants