
Replace tinygltf #106

Merged: 69 commits into master from the gltf branch on Feb 2, 2021

Conversation

@kring (Member) commented Jan 20, 2021

This PR removes tinygltf and replaces it with our own glTF representation (CesiumGltf) as well as a glTF reader library (CesiumGltfReader). To be clear, this isn't because anything is wrong with tinygltf, and in fact the interface of CesiumGltf is certainly inspired by tinygltf. The main motivation was to avoid baking a third-party library into such a critical aspect of cesium-native's public interface. I started out doing that by wrapping tinygltf, but that turned out to be a lot of trouble. More importantly, it made the interface harder to use.

Instead of wrapping, I ended up going down the road of generating glTF classes and a glTF reader from the glTF JSON Schema files. See tools/generate-gltf-classes. As a result, it conforms quite strictly to the glTF spec. The generated reader classes use RapidJSON's SAX mode, so parsing should be faster and use less memory. The glTF representation is completely separated from the reader classes, so we can swap out RapidJSON in the future with minimal fuss, especially since most of the RapidJSON parsing code is generated. Support for more glTF extensions can be easily added just by editing tools/generate-gltf-classes/glTF.json to reference the schema(s) of the extension and regenerating.
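
For anyone unfamiliar with RapidJSON's SAX mode, here's a minimal, generic sketch of the handler interface involved. This is not the generated CesiumGltfReader code; the class and member names below are made up purely for illustration.

// Generic illustration of RapidJSON's SAX (event-based) interface; not the
// generated CesiumGltfReader code.
#include <rapidjson/reader.h>
#include <cstdio>
#include <string>

struct AssetVersionHandler
    : public rapidjson::BaseReaderHandler<rapidjson::UTF8<>, AssetVersionHandler> {
  std::string lastKey;
  std::string version;

  bool Key(const char* str, rapidjson::SizeType length, bool /*copy*/) {
    lastKey.assign(str, length);
    return true;
  }

  bool String(const char* str, rapidjson::SizeType length, bool /*copy*/) {
    if (lastKey == "version") {
      version.assign(str, length);
    }
    return true;
  }
};

int main() {
  const char json[] = "{\"asset\":{\"version\":\"2.0\"}}";
  AssetVersionHandler handler;
  rapidjson::Reader reader;
  rapidjson::StringStream stream(json);
  reader.Parse(stream, handler);
  std::printf("asset.version = %s\n", handler.version.c_str());
  return 0;
}

The generated readers are built on this event model, with one handler class per glTF schema type.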

So, was it worth it? Probably not: this ended up being a lot more work than expected, and in the end we have something that isn't drastically different from tinygltf. But if we didn't do it now, we probably wouldn't realistically ever be able to do it.

Fixes #67
Fixes #31

@javagl (Contributor) commented Jan 21, 2021

Apart from the two minor inlined comments above, here is a summary of the notes that I took locally during the review:

Short comments

  • Large parts of the changes are mechanical tinygltf-to-CesiumGltf replacements.

  • Beyond that, it's hard (for me) to say exactly what is "new", in that it warrants a deeper review. E.g. the decodeDraco part probably (?) existed somewhere before, so it's hard to compare the changes in all detail.

  • The CesiumGltfReader classes (i.e. the handlers) seem like a nice approach to separate the reading logic from the actual
    glTF structure. But I haven't reviewed (as in "tried to understand all details of") the auto-generated classes thoroughly, or how
    the "hierarchical dispatching" from the RapidJSON "SAX" stream actually works...

  • As for the testing (e.g. the current TestReader class): Of course, one could go many extra miles here. A "batch run" over a clone of the glTF-Sample-Models repo, just to see whether it throws up at some point, might be worthwhile. (Not as part of the automated tests, of course, but as some sort of "one-time" validation...)

  • Going full pedantic mode: There are several headers that do not have a newline at the end of the file, but it "shall" be
    there according to the C standard... ( https://gcc.gnu.org/legacy-ml/gcc/2003-11/msg01568.html ) (Saw it in CesiumUtility/include/CesiumUtility/joinToString.h, but didn't check systematically...)

  • I glanced at the generate-gltf-classes contents, but of course, did not "review" this part.


About the "project structure"....

I'm skeptical about the git submodule approach in several ways, but omitting some details: The fact that the (>100MB) glTF repo is now part of the cesium-native repo is a bit odd, IFF this is only for the schema, to generate classes that are ... part of the repo in the first place (and hardly ever re-generated anyhow). One could download the schema on the fly, with (SchemaCache.js) :

var request = require('sync-request');
...
var res = request('GET', new URL(name, this.schemaPath).href);
const result = JSON.parse(res.getBody('utf-8'));

and (package.json)

"generate": "node index.js --schema https://raw.githubusercontent.com/KhronosGroup/glTF/master/specification/2.0/schema/ --output ../../CesiumGltf --readerOutput ../../CesiumGltfReader --extensions https://raw.githubusercontent.com/KhronosGroup/glTF/master/extensions/2.0/ --config glTF.json"

(and it's still possible to refer to certain commits there, FWIW). This is not really "important" in terms of the end result, but maybe for people who want to use cesium-native, and wonder why they have to pull a 100MB repo with the glTF specification for that.

(Taken one step further, one could think about options like making the generator a standalone project, or making CesiumGltf a standalone project (which then might or might not contain the generator...), but I assume this will be considered more thoroughly at some point).

About the structure of the generated classes

The way that "custom data" is attached to the resulting classes may cause some confusion at first glance. (Essentially, what is done with the toBeInherited part of the code generator). I see the point/goal:

  • ImageSpec: The "plain" glTF image structure
  • ImageCesium : The "derived"/resolved data from the glTF image
  • Image = ImageSpec + ImageCesium

(similarly for other classes, particularly Accessor with its utility methods)

Some thoughts about details of the naming:

  • Image could always be the original glTF structure (i.e. the Spec part could be omitted, just for consistency)
  • The ...Cesium suffix could be renamed to something like Data or Decoded or Resolved or so...

But maybe (slightly) more importantly: What should the "top-level" elements in the Model be? I've given some thought to the options here. These essentially vary in their "degree of coupling":


Right now, the top-level element is this one

struct Image : public ImageSpec {
    ImageCesium cesium;
};

Alternatively, the model could contain elements like

struct Image_AND_DecodedImage {
    ImageSpec plainImage;
    ImageCesium cesium;
};

(Somewhat along the lines of "aggregation over inheritance", but with a grain of salt).


Alternatively, the model could contain the plain ImageSpec and the ImageCesium as completely separate entities. So instead of having a

std::vector<Image> images; // Image=ImageSpec+ImageCesium

it could contain

std::vector<ImageSpec> images;
std::vector<ImageCesium> imageCesiums;

Alternatively, one could completely remove the "custom, decoded stuff" from the model:

class Model {
    std::vector<ImageSpec> images;
};
class ModelCesium {
    std::vector<ImageCesium> imageCesiums;
};

All these are just options. None should be considered as a "suggestion" (and even less a "proposal"). Just things where one
could think deeply about the pros and cons...


An aside: The Reader currently does not support "standard" glTFs where the buffer.uri is given and points to an external file,
right? At least, I didn't find the place where the buffer.uri is resolved and filled into the buffer.cesium.data - maybe I just
overlooked this? Maybe it would be somewhere near Reader::postprocess(), i.e. https://github.com/CesiumGS/cesium-native/pull/106/files#diff-947305810bed1195d20d39604d06e6252843b61240b0db0122f1915003f22d09R274 ?

@kring (Member, Author) commented Jan 25, 2021

Thanks for the review @javagl!

the decodeDraco part probably (?) already had been somewhere before, so it's hard to compare the changes in all detail.

No, that's new. tinygltf has similar functionality, of course.

The CesiumGltfReader classes (i.e. the handlers) seem like a nice approach to separate the reading logic from the actual
glTF structure.

Thanks. I'm just glad I didn't ask @jtorresfabra to review or he would have given me a hard time about the virtual dispatch! 😆

As for the testing (e.g. the current TestReader class): Of course, one could go many extra miles here. A "batch run" over a clone of the glTF-Sample-Models repo, just to see whether it throws up at some point, might be worthwhile. (Not as part of the automated tests, of course, but as some sort of "one-time" validation...)

It's par for the course with me right now, but yeah, the testing is pretty lackluster. I love the idea of automated testing against the sample models. Perhaps it could be automated so it's easy to run frequently, but not run as part of CI or a normal test run. Do you think that's something you would be interested in working on? Feel free to say no.

The fact that the (>100MB) glTF repo is now part of the cesium-native repo is a bit odd, IFF this is only for the schema, to generate classes that are ... part of the repo in the first place (and hardly ever re-generated anyhow). One could download the schema on the fly, with (SchemaCache.js) :

I didn't realize it's 100MB. That seems a little excessive for a repo with a spec and a few images. I guess there's just a lot of history. But in any case, I like your idea of adding it to SchemaCache.js. I think I can't justify taking the time to do that right now, but let's make an issue for it.

Taken one step further, one could think about options like making the generator a standalone project, or making CesiumGltf a standalone project (which then might or might not contain the generator...), but I assume this will be considered more thoroughly at some point

Yeah conceptually all of the cesium-native sub-libraries are candidates to be separate repos. At the moment that probably adds more overhead to the development process than is justified, but I do think we should do that in the future.

I like the generator being close to the code it is generating, though, so it's easy and obvious how to regenerate. We don't want people to be tempted to start making hand-edited changes in the future because there's too much friction in finding and running the generator.

But maybe (slightly) more importantly: What should the "top-level" elements in the Model be? I've given some thought to the options here. These essentially vary in their "degree of coupling":

Yeah I've thought a lot about this, and I don't love any of the options. But the current version is what I eventually settled on.

There are really two semi-independent parts to this. There are cases where we need to add data to the generated classes. These are the *Cesium classes. Then there are cases where we want to add new helper functions to the generated classes.

In my original design, I had a simple flag to the generator that told it whether the generated class has a (hand-written) *Cesium field with additional fields. Buffer and Image did, all others did not. And meanwhile the helpers weren't embedded in the classes at all, they were just global free functions meant to be used with the generated classes.

I was reasonably happy with how the *Cesium bit worked, but the helper functions were awkward: unclear where to put them and how to organize them, and hard for users to discover.

What I really want is something like C#'s partial classes to make it easy to have a single class be a combination of a generated and a hand-written part with extra helper methods. But lacking that, all of the options are annoying in different ways. I considered teaching the generator how to embed custom code directly in the generated classes, either by listing all the code in configuration or by providing the name of a file to #include in the body of the class. Both were ugly.

I considered having separate "parallel arrays" as in your last two examples. This makes good sense when adding new fields (i.e. *Cesium), but less so when we just want to add helper methods. And while I like it in some ways, it also seemed to make the classes substantially more difficult to use. Not only does a user have to discover those parallel arrays, but they need to worry about them being out of sync.

So I eventually settled on using inheritance as a poor substitute for partial classes. And once that was in place for the helpers, it seemed reasonable to use it to add the *Cesium fields, too, rather than custom logic in the generator. We could at this point eliminate those classes entirely, and just add a data field directly to Buffer, for example. That is how tinygltf does it. I think it's nice to keep the fields separate, though, and it doesn't add much burden to use. I also considered putting these fields in an extension, so that we could claim the classes are perfectly spec-compliant. But again, that made it a lot harder to use. I like the name cesium for this field with custom stuff because it makes it clear that these are fields added by CesiumGltf, rather than being part of the spec.
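
As a rough sketch of that pattern (illustrative names only; the real generated and hand-written headers differ in detail):

#include <cstdint>
#include <vector>

// Generated from the glTF JSON Schema; never edited by hand.
struct BufferSpec {
  int64_t byteLength = 0;
};

// Hand-written data that is not part of the glTF spec.
struct BufferCesium {
  std::vector<uint8_t> data; // the resolved buffer contents
};

// Hand-written class the rest of cesium-native uses: the generated fields via
// inheritance, plus the extra "cesium" field and any helper methods.
struct Buffer : public BufferSpec {
  BufferCesium cesium;

  // Illustrative helper; not an actual CesiumGltf method.
  bool isFullyLoaded() const {
    return static_cast<int64_t>(cesium.data.size()) >= byteLength;
  }
};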

Long story short, I think I'm still happy with where this ended up, even though there are lots of other possible approaches and all of them are better in some ways. I think this strikes the best balance overall. My biggest complaint is that ImageSpec shows up in the doc. Ideally I'd like that to be transparent from a doc perspective, and look like all the ImageSpec members just exist directly on Image.

An aside: The Reader currently does not support "standard" glTFs where the buffer.uri is given and points to an external file,
right? At least, I didn't find the place where the buffer.uri is resolved and filled into the buffer.cesium.data - maybe I just
overlooked this? Maybe it would be somewhere near Reader::postprocess(), i.e. https://github.com/CesiumGS/cesium-native/pull/106/files#diff-947305810bed1195d20d39604d06e6252843b61240b0db0122f1915003f22d09R274 ?

Yes you are very right. buffer.uri support is a missing feature at the moment, and a regression from tinygltf (it just happens to be not widely used in 3D Tiles). We also need to add support for async loading buffers and images by URI. Neither of those latter two were supported in our tinygltf-based implementation either, though, so I'm happy for that to be a future feature.

@javagl (Contributor) commented Jan 25, 2021

I had a short look at the Draco part as well, and didn't spot anything "obviously questionable" there. It's a large PR, and even though large parts are the auto-generated ones, which only have to be reviewed for the "patterns" that they are using, I hope that I didn't overlook anything. (I saw that you started additional changes, maybe I can have a closer look at some parts in the meantime).

he would have given me a hard time about the virtual dispatch!
...
... the idea of automated testing against the sample models.

I was also a bit surprised to see that you used virtual functions "excessively" there, despite previous comments regarding "low-level performance". But I think that 1. SAX-based parsing is hard to accomplish without dynamic dispatching, 2. if this turns out to be a bottleneck, the reader part could (according to my understanding) be replaced with a "more efficient" reader, and 3., most importantly: it's very unlikely to be a bottleneck (even though the functions are called in a tight "loop" (or rather recursion)).

Out of a mix of curiosity about the performance and the idea of testing against the sample models, I just ran a quick test. I hacked the code for this quickly into a test project that I have locally, but some of it might be reused. It reads the sample model index file (from a local SSD, of course), just calls the reader on each file, and prints whether the model could be read.
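
In (heavily simplified) form, the loop looks roughly like this; the actual call into the reader is commented out because I'm not quoting the exact CesiumGltfReader entry point here, only where it would go:

#include <cstdint>
#include <filesystem>
#include <fstream>
#include <iostream>
#include <iterator>
#include <vector>

int main(int argc, char** argv) {
  namespace fs = std::filesystem;
  const fs::path root = argc > 1 ? argv[1] : "glTF-Sample-Models/2.0";
  for (const auto& entry : fs::recursive_directory_iterator(root)) {
    const auto extension = entry.path().extension();
    if (extension != ".gltf" && extension != ".glb") {
      continue;
    }
    std::ifstream in(entry.path(), std::ios::binary);
    std::vector<uint8_t> data(
        (std::istreambuf_iterator<char>(in)), std::istreambuf_iterator<char>());
    // Hypothetical reader call - the actual CesiumGltfReader entry point and
    // result type may be named differently:
    //   auto result = reader.readModel(gsl::span<const uint8_t>(data.data(), data.size()));
    //   std::cout << entry.path() << ": " << (result.model ? "ok" : "FAILED") << "\n";
    std::cout << entry.path() << ": read " << data.size() << " bytes\n";
  }
  return 0;
}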

The first (important) good news is: All glTF-Binary and glTF-Draco models can be read without errors. (That does not mean that they are in a "valid, useable state", but is a first litmus test).

Now the big disclaimer: I'm not a C++ profiling expert. There are likely pitfalls that I'm not aware of. But I also ran a test of the glTF variants (i.e. the ones without any binary data - only the JSON part, as the external files/uris are not resolved yet). And just naively hitting the "Profile this for me"-button in Visual Studio (on the RelWithDebInfo binary) generated this:

[Screenshot: profiler results for the JSON-only glTF variants (GltfPerformanceOnly)]

The whole parsing takes only a fraction of the time that is required for reading the data from an SSD in the first place, and this already shows that this should not be an issue. And of course, for the case of reading binary data, specifically, the glTF-Binary and glTF-Draco variants, everything that is related to JSON becomes negligible anyhow:

[Screenshot: profiler results for the glTF-Binary and glTF-Draco variants (GltfPerformance)]

So if we want to tune something, we should probably have a look at (supposedly) faster PNG decoders (https://libspng.org/ or so...)


Regarding the class structure:

I know that it's hard to make a decision here that does not have any (at least potential) drawbacks. This touches many aspects, like the usability and "discoverability" of helper functions, as well as the "uniformity/consistency" of the generated classes. And I think that attaching the decoded data to the "owner" classes, but still keeping it as a dedicated entity (!), is a reasonable middle ground.

BTW: I went through similar issues in my glTF implementation. The "basis" always had been the purely spec-based classes. But the approaches for resolving the additional data underwent some refactorings: Handling binary, external buffers, and data URIs "transparently" is a bit more tricky than it looks at first glance - all this with the caveat that it also had to support glTF 1.0 (ouch!), and for some obscure reason, I wanted to offer some fancy progress UI for the asynchronous downloads...

I thought that things could have been a bit simpler here, because the loading could have been synchronous. But now you mentioned that async loading might be added at some point, so this may also not be so straightforward. But for now, resolving the data synchronously in Reader::postprocess would probably make sense.

@kring (Member, Author) commented Jan 27, 2021

a SAX-based parsing is hard to accomplish without dynamic dispatching

Yeah I couldn't see any way to avoid it with RapidJSON.

All glTF-Binary and glTF-Draco models can be read without errors.

Awesome! Thanks for trying this!

The whole parsing takes only a fraction of the time that is required for reading the data from an SSD in the first place, and this already shows that this should not be an issue.

Yes, I definitely agree: given that glTFs usually don't have a lot of JSON (the bulk of their bytes are in geometry stored separately from the JSON), the JSON parsing performance is not particularly important for glTF. Even when not counting the I/O time, I found that JSON parsing time is also dwarfed by Draco and image decoding. See #31 (comment)

So if we want to tune something, we should probably have a look at (supposedly) faster PNG decoders

I don't think we can make that a priority right now, but it may be very worthwhile in the future. Mind writing an issue for it?

I'm going to finish implementing synchronous data URI decoding to bring us to par with tinygltf, and then - if I haven't missed something - this is ready to merge.

@kring (Member, Author) commented Jan 28, 2021

@javagl support for data urls (apparently they're not called data uris anymore?!) is in. Since we already have a customer using this (via the cmpt branch), unless you see any major problems, let's get this merged and write issues for any additional work we ought to do.
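
For reference, the gist of it is just stripping the data:<mediatype>;base64, prefix and base64-decoding the rest into the buffer's bytes. A minimal illustrative sketch (not the actual code added in this PR):

#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Illustrative only - not the implementation added in this PR.
std::optional<std::vector<uint8_t>> decodeBase64DataUrl(const std::string& url) {
  const std::string prefix = "data:";
  if (url.rfind(prefix, 0) != 0) return std::nullopt;
  const std::size_t comma = url.find(',');
  if (comma == std::string::npos) return std::nullopt;
  const std::string header = url.substr(prefix.size(), comma - prefix.size());
  if (header.find(";base64") == std::string::npos) return std::nullopt;

  static const std::string alphabet =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  std::vector<uint8_t> bytes;
  uint32_t accumulator = 0;
  int bits = 0;
  for (std::size_t i = comma + 1; i < url.size(); ++i) {
    const char c = url[i];
    if (c == '=') break; // padding marks the end of the payload
    const std::size_t value = alphabet.find(c);
    if (value == std::string::npos) return std::nullopt;
    accumulator = (accumulator << 6) | static_cast<uint32_t>(value);
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      bytes.push_back(static_cast<uint8_t>((accumulator >> bits) & 0xFF));
    }
  }
  return bytes;
}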

@javagl (Contributor) commented Jan 28, 2021

Just a note: With the generator update, the output matches the repo state (i.e. the _API part is there again)

And during a quick test running over all glTF-Embedded sample models, they all could be read.

(Again, this is not a deeply reliable test, but ... the reader does not throw up, at least. A more meaningful test would/will be possible if/when there also is a "Writer", so that tests could do the roundtrip of assert(write(read("input")).equals("input")), but that's not on the radar right now)

@kring (Member, Author) commented Jan 29, 2021

And during a quick test running over all glTF-Embedded sample models, they all could be read.

That's great! (and I understand the caveats)
If you don't have any remaining concerns that need to be addressed prior to merging, can you please merge this?

@javagl (Contributor) commented Jan 29, 2021

The generated classes look clean, the reader interface is clear and small (particularly, the SAX-virtual-dispatch-complexity is completely hidden), and it can read the sample models, apparently.

One could think deeply about the ImageSpec/ImageCesium relationship, and I assume that some API changes might become necessary there when asynchronous loading is added, but that's not something that can be sorted out now.

So I don't see a reason to not merge it.

/**
* @brief The buffer's data.
*/
std::vector<uint8_t> data;
Contributor:

Perhaps this can be std::vector<std::byte> if we're embracing C++17?

Contributor:

The reasoning for using uint8_t instead of std::byte was (according to a mail from Nov 3, 2020, 8:01 AM) that std::byte may be harder to interface with other libraries (probably referring to those which aren't on C++17 yet - among them being tinygltf...).

Considering that the data here is going to be "the source", changing it here might allow replacing some of the gsl::span<uint8_t> that are used subsequently in cesium-native ... iff that doesn't make it noticeably harder to pass this data forward to Unreal.

(But from a quick glance, the data is copied via raw void* pointers on Unreal side anyhow).
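
(To illustrate the interfacing point with a generic sketch, not code from this repo: std::byte has no implicit conversions, so handing the data to a pre-C++17-style API needs an explicit cast, whereas uint8_t does not.)

#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for a pre-C++17 third-party API that takes plain byte pointers.
void thirdPartyApi(const uint8_t* /*bytes*/, std::size_t /*size*/) {}

void passAlong(const std::vector<std::byte>& data) {
  // std::byte requires an explicit cast here; with std::vector<uint8_t> the
  // pointer could be passed directly.
  thirdPartyApi(reinterpret_cast<const uint8_t*>(data.data()), data.size());
}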

Member Author:

It could be reasonable to use std::byte at this point. Let's get this merged as-is and we can try switching to std::byte in a separate PR.

Member Author:

I wrote #117 so we don't forget.

/**
* The stride, in bytes, between vertex attributes. When this is not defined, data is tightly packed. When two or more accessors use the same bufferView, this field must be defined.
*/
std::optional<int64_t> byteStride;
Contributor:

Why is this std::optional? Shouldn't it match the pattern of using a negative signed integer as a sentinel value to indicate "unused"? The standard says a negative stride isn't possible here: https://github.com/KhronosGroup/glTF/blob/master/specification/2.0/README.md#bufferviewbytestride

(btw I don't think this comment or any of the other future ones I'll make on this PR should hold it up. I've been working on a sibling CesiumGltfWriter using the rapidjson sax api + these classes and I'm just commenting on things as they come up).

Member Author:

The "-1 as sentinel" thing is only used for references to other glTF objects, not for arbitrary numbers. The rule is currently:

  1. If it's a reference to another glTF object, it does not need to be std::optional; use -1 to indicate "no reference".
  2. If it's a vector or map, it does not need to be std::optional; an undefined value is expressed as an empty collection.
  3. If it has a default value, make the default value the initial value and the property is not std::optional.
  4. Otherwise, it's std::optional.

We could add a 3.5: "If the schema says the value can't be negative, use a negative value to indicate undefined rather than making it std::optional". I think that would just muddy the already-kinda-muddy waters further. But I could be swayed if you're finding that it negatively impacts your client code.
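
To illustrate rules 1-4 with hypothetical declarations (the field names follow the glTF schema, but these lines aren't copied from the generated headers):

#include <cstdint>
#include <optional>
#include <vector>

struct ExampleBufferView {
  int32_t buffer = -1;               // rule 1: index of another glTF object; -1 means "no reference"
  int64_t byteOffset = 0;            // rule 3: the schema default (0) becomes the initial value
  std::optional<int64_t> byteStride; // rule 4: no default, not a reference, so std::optional
};

struct ExampleNode {
  std::vector<int32_t> children;     // rule 2: undefined is simply an empty collection
};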

@kring (Member, Author) commented Feb 2, 2021

Based on @javagl's comment above (and my own feeling that it's ready), I'm going to go ahead and merge this. BTW @javagl in the future please don't hesitate to merge a PR like this yourself when you're the nominated reviewer and you've deemed it ready.

@kring kring merged commit c3d35cf into master Feb 2, 2021
@kring kring deleted the gltf branch February 2, 2021 00:17