
Replace tinygltf #106

Merged: 69 commits into master from the gltf branch on Feb 2, 2021

Conversation

@kring (Member) commented Jan 20, 2021

This PR removes tinygltf and replaces it with our own glTF representation (CesiumGltf) as well as a glTF reader library (CesiumGltfReader). To be clear, this isn't because anything is wrong with tinygltf, and in fact the interface of CesiumGltf is certainly inspired by tinygltf. The main motivation was to avoid baking a third-party library into such a critical aspect of cesium-native's public interface. I started out doing that by wrapping tinygltf, but that turned out to be a lot of trouble. More importantly, it made the interface harder to use.

Instead of wrapping, I ended up going down the road of generating glTF classes and a glTF reader from the glTF JSON Schema files. See tools/generate-gltf-classes. As a result, it conforms quite strictly to the glTF spec. The generated reader classes use RapidJSON's SAX mode, so parsing should be faster and use less memory. The glTF representation is completely separated from the reader classes, so we can swap out RapidJSON in the future with minimal fuss, especially since most of the RapidJSON parsing code is generated. Support for more glTF extensions can be easily added just by editing tools/generate-gltf-classes/glTF.json to reference the schema(s) of the extension and regenerating.
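
For anyone unfamiliar with RapidJSON's SAX mode, here's a minimal, generic sketch of the handler interface involved. This is not the generated CesiumGltfReader code; the class and member names below are made up purely for illustration.

// Generic illustration of RapidJSON's SAX (event-based) interface; not the
// generated CesiumGltfReader code.
#include <rapidjson/reader.h>
#include <cstdio>
#include <string>

struct AssetVersionHandler
    : public rapidjson::BaseReaderHandler<rapidjson::UTF8<>, AssetVersionHandler> {
  std::string lastKey;
  std::string version;

  bool Key(const char* str, rapidjson::SizeType length, bool /*copy*/) {
    lastKey.assign(str, length);
    return true;
  }

  bool String(const char* str, rapidjson::SizeType length, bool /*copy*/) {
    if (lastKey == "version") {
      version.assign(str, length);
    }
    return true;
  }
};

int main() {
  const char json[] = "{\"asset\":{\"version\":\"2.0\"}}";
  AssetVersionHandler handler;
  rapidjson::Reader reader;
  rapidjson::StringStream stream(json);
  reader.Parse(stream, handler);
  std::printf("asset.version = %s\n", handler.version.c_str());
  return 0;
}

The generated readers are built on this event model, with one handler class per glTF schema type.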

So, was it worth it? Probably not: this ended up being a lot more work than expected, and in the end we have something that isn't drastically different from tinygltf. But if we didn't do it now, we probably wouldn't realistically ever be able to do it.

Fixes #67
Fixes #31

@javagl (Contributor) commented Jan 21, 2021

Apart from the two minor inlined comments above, here is a summary of the notes that I took locally during the review:

Short comments

  • Large parts of the changes are mechanical tinygltf-to-CesiumGltf replacements.

  • Beyond that, it's hard (for me) to say exactly what is "new", in that it warrants a deeper review. E.g. the decodeDraco part probably (?) existed somewhere before, so it's hard to compare the changes in all detail.

  • The CesiumGltfReader classes (i.e. the handlers) seem like a nice approach to separate the reading logic from the actual
    glTF structure. But I haven't reviewed (as in "tried to understand all details of") the auto-generated classes thoroughly, or how
    the "hierarchical dispatching" from the RapidJSON "SAX" stream actually works...

  • As for the testing (e.g. the current TestReader class): Of course, one could go many extra miles here. A "batch run" over a clone of the glTF-Sample-Models repo, just to see whether it throws up at some point, might be worthwhile. (Not as part of the automated tests, of course, but as some sort of "one-time" validation...)

  • Going full pedantic mode: There are several headers that do not have a newline at the end of the file, but it "shall" be
    there according to the C standard... ( https://gcc.gnu.org/legacy-ml/gcc/2003-11/msg01568.html ) (Saw it in CesiumUtility/include/CesiumUtility/joinToString.h, but didn't check systematically...)

  • I glanced at the generate-gltf-classes contents, but of course, did not "review" this part.


About the "project structure"....

I'm skeptical about the git submodule approach in several ways, but omitting some details: The fact that the (>100MB) glTF repo is now part of the cesium-native repo is a bit odd, IFF this is only for the schema, to generate classes that are ... part of the repo in the first place (and hardly ever re-generated anyhow). One could download the schema on the fly, with (SchemaCache.js) :

var request = require('sync-request');
...
var res = request('GET', new URL(name, this.schemaPath).href);
const result = JSON.parse(res.getBody('utf-8'));

and (package.json)

"generate": "node index.js --schema https://raw.githubusercontent.com/KhronosGroup/glTF/master/specification/2.0/schema/ --output ../../CesiumGltf --readerOutput ../../CesiumGltfReader --extensions https://raw.githubusercontent.com/KhronosGroup/glTF/master/extensions/2.0/ --config glTF.json"

(and it's still possible to refer to certain commits there, FWIW). This is not really "important" in terms of the end result, but maybe for people who want to use cesium-native, and wonder why they have to pull a 100MB repo with the glTF specification for that.

(Taken one step further, one could think about options like making the generator a standalone project, or making CesiumGltf a standalone project (which then might or might not contain the generator...), but I assume this will be considered more thoroughly at some point).

About the structure of the generated classes

The way that "custom data" is attached to the resulting classes may cause some confusion at first glance. (Essentially, what is done with the toBeInherited part of the code generator). I see the point/goal:

  • ImageSpec: The "plain" glTF image structure
  • ImageCesium : The "derived"/resolved data from the glTF image
  • Image = ImageSpec + ImageCesium

(similarly for other classes, particularly Accessor with its utility methods)

Some thoughts about details of the naming:

  • Image could always be the original glTF structure (i.e. the Spec part could be omitted, just for consistency)
  • The ...Cesium suffix could be renamed to something like Data or Decoded or Resolved or so...

But maybe (slightly) more importantly: What should the "top-level" elements in the Model be? I've given some thought to the options here. These essentially vary in their "degree of coupling":


Right now, the top-level element is this one

struct Image : public ImageSpec {
    ImageCesium cesium;
};

Alternatively, the model could contain elements like

struct Image_AND_DecodedImage {
    ImageSpec plainImage;
    ImageCesium cesium;
};

(Somewhat along the lines of "aggregation over inheritance", but with a grain of salt).


Alternatively, the model could contain the plain ImageSpec and the ImageCesium as completely separate entities. So instead of having a

std::vector<Image> images; // Image=ImageSpec+ImageCesium

it could contain

std::vector<ImageSpec> images;
std::vector<ImageCesium> imageCesiums;

Alternatively, one could completely remove the "custom, decoded stuff" from the model:

class Model {
    std::vector<ImageSpec> images;
};
class ModelCesium {
    std::vector<ImageCesium> imageCesiums;
};

All these are just options. None should be considered as a "suggestion" (and even less a "proposal"). Just things where one
could think deeply about the pros and cons...


An aside: The Reader currently does not support "standard" glTFs where the buffer.uri is given and points to an external file,
right? At least, I didn't find the place where the buffer.uri is resolved and filled into the buffer.cesium.data - maybe I just
overlooked this? Maybe it would be somewhere near Reader::postprocess(), i.e. https://github.com/CesiumGS/cesium-native/pull/106/files#diff-947305810bed1195d20d39604d06e6252843b61240b0db0122f1915003f22d09R274 ?

@kring (Member, Author) commented Jan 25, 2021

Thanks for the review @javagl!

the decodeDraco part probably (?) already had been somewhere before, so it's hard to compare the changes in all detail.

No, that's new. tinygltf has similar functionality, of course.

The CesiumGltfReader classes (i.e. the handlers) seem like a nice approach to separate the reading logic from the actual
glTF structure.

Thanks. I'm just glad I didn't ask @jtorresfabra to review or he would have given me a hard time about the virtual dispatch! 😆

As for the testing (e.g. the current TestReader class): Of course, one could go many extra miles here. A "batch run" over a clone of the glTF-Sample-Models repo, just to see whether it throws up at some point, might be worthwhile. (Not as part of the automated tests, of course, but as some sort of "one-time" validation...)

It's par for the course with me right now, but yeah, the testing is pretty lackluster. I love the idea of automated testing against the sample models. Perhaps it could be automated so it's easy to run frequently, but not run as part of CI or a normal test run. Do you think that's something you would be interested in working on? Feel free to say no.

The fact that the (>100MB) glTF repo is now part of the cesium-native repo is a bit odd, IFF this is only for the schema, to generate classes that are ... part of the repo in the first place (and hardly ever re-generated anyhow). One could download the schema on the fly, with (SchemaCache.js) :

I didn't realize it's 100MB. That seems a little excessive for a repo with a spec and a few images. I guess there's just a lot of history. But in any case, I like your idea of adding it to SchemaCache.js. I think I can't justify taking the time to do that right now, but let's make an issue for it.

Taken one step further, one could think about options like making the generator a standalone project, or making CesiumGltf a standalone project (which then might or might not contain the generator...), but I assume this will be considered more thoroughly at some point

Yeah conceptually all of the cesium-native sub-libraries are candidates to be separate repos. At the moment that probably adds more overhead to the development process than is justified, but I do think we should do that in the future.

I like the generator being close to the code it is generating, though, so it's easy and obvious how to regenerate. We don't want people to be tempted to start making hand-edited changes in the future because there's too much friction in finding and running the generator.

But maybe (slightly) more importantly: What should the "top-level" elements in the Model be? I've given some thought to the options here. These essentially vary in their "degree of coupling":

Yeah I've thought a lot about this, and I don't love any of the options. But the current version is what I eventually settled on.

There are really two semi-independent parts to this. There are cases where we need to add data to the generated classes. These are the *Cesium classes. Then there are cases where we want to add new helper functions to the generated classes.

In my original design, I had a simple flag to the generator that told it whether the generated class has a (hand-written) *Cesium field with additional fields. Buffer and Image did, all others did not. And meanwhile the helpers weren't embedded in the classes at all, they were just global free functions meant to be used with the generated classes.

I was reasonably happy with how the *Cesium bit worked, but the helper functions were awkward: unclear where to put them and how to organize them, and hard for users to discover.

What I really want is something like C#'s partial classes to make it easy to have a single class be a combination of a generated and a hand-written part with extra helper methods. But lacking that, all of the options are annoying in different ways. I considered teaching the generator how to embed custom code directly in the generated classes, either by listing all the code in configuration or by providing the name of a file to #include in the body of the class. Both were ugly.

I considered having separate "parallel arrays" as in your last two examples. This makes good sense when adding new fields (i.e. *Cesium), but less so when we just want to add helper methods. And while I like it in some ways, it also seemed to make the classes substantially more difficult to use. Not only does a user have to discover those parallel arrays, but they need to worry about them being out of sync.

So I eventually settled on using inheritance as a poor substitute for partial classes. And once that was in place for the helpers, it seemed reasonable to use it to add the *Cesium fields, too, rather than custom logic in the generator. We could at this point eliminate those classes entirely, and just add a data field directly to Buffer, for example. That is how tinygltf does it. I think it's nice to keep the fields separate, though, and it doesn't add much burden to use. I also considered putting these fields in an extension, so that we could claim the classes are perfectly spec-compliant. But again, that made it a lot harder to use. I like the name cesium for this field with custom stuff because it makes it clear that these are fields added by CesiumGltf, rather than being part of the spec.
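
As a rough sketch of that pattern (illustrative names only; the real generated and hand-written headers differ in detail):

#include <cstdint>
#include <vector>

// Generated from the glTF JSON Schema; never edited by hand.
struct BufferSpec {
  int64_t byteLength = 0;
};

// Hand-written data that is not part of the glTF spec.
struct BufferCesium {
  std::vector<uint8_t> data; // the resolved buffer contents
};

// Hand-written class the rest of cesium-native uses: the generated fields via
// inheritance, plus the extra "cesium" field and any helper methods.
struct Buffer : public BufferSpec {
  BufferCesium cesium;

  // Illustrative helper; not an actual CesiumGltf method.
  bool isFullyLoaded() const {
    return static_cast<int64_t>(cesium.data.size()) >= byteLength;
  }
};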

Long story short, I think I'm still happy with where this ended up, even though there are lots of other possible approaches and all of them are better in some ways. I think this strikes the best balance overall. My biggest complaint is that ImageSpec shows up in the doc. Ideally I'd like that to be transparent from a doc perspective, and look like all the ImageSpec members just exist directly on Image.

An aside: The Reader currently does not support "standard" glTFs where the buffer.uri is given and points to an external file,
right? At least, I didn't find the place where the buffer.uri is resolved and filled into the buffer.cesium.data - maybe I just
overlooked this? Maybe it would be somewhere near Reader::postprocess(), i.e. https://github.com/CesiumGS/cesium-native/pull/106/files#diff-947305810bed1195d20d39604d06e6252843b61240b0db0122f1915003f22d09R274 ?

Yes you are very right. buffer.uri support is a missing feature at the moment, and a regression from tinygltf (it just happens to be not widely used in 3D Tiles). We also need to add support for async loading buffers and images by URI. Neither of those latter two were supported in our tinygltf-based implementation either, though, so I'm happy for that to be a future feature.

@javagl (Contributor) commented Jan 25, 2021

I had a short look at the Draco part as well, and didn't spot anything "obviously questionable" there. It's a large PR, and even though large parts are the auto-generated ones, which only have to be reviewed for the "patterns" that they are using, I hope that I didn't overlook anything. (I saw that you started additional changes, maybe I can have a closer look at some parts in the meantime).

he would have given me a hard time about the virtual dispatch!
...
... the idea of automated testing against the sample models.

I was also a bit surprised to see that you used virtual functions "excessively" there, despite previous comments regarding "low-level performance". But I think that 1. SAX-based parsing is hard to accomplish without dynamic dispatching, 2. if this turns out to be a bottleneck, the reader part could (according to my understanding) be replaced with a "more efficient" reader, and 3., most importantly: it's very unlikely to be a bottleneck (even though the functions are called in a tight "loop" (or rather recursion)).

Out of a mix of curiosity about the performance and the idea of testing against the sample models, I just ran a quick test. I hacked the code for this quickly into a test project that I have locally, but some of it might be reused. It reads the sample model index file (from a local SSD, of course), just calls the reader on each file, and prints whether the model could be read.
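
In (heavily simplified) form, the loop looks roughly like this; the actual call into the reader is commented out because I'm not quoting the exact CesiumGltfReader entry point here, only where it would go:

#include <cstdint>
#include <filesystem>
#include <fstream>
#include <iostream>
#include <iterator>
#include <vector>

int main(int argc, char** argv) {
  namespace fs = std::filesystem;
  const fs::path root = argc > 1 ? argv[1] : "glTF-Sample-Models/2.0";
  for (const auto& entry : fs::recursive_directory_iterator(root)) {
    const auto extension = entry.path().extension();
    if (extension != ".gltf" && extension != ".glb") {
      continue;
    }
    std::ifstream in(entry.path(), std::ios::binary);
    std::vector<uint8_t> data(
        (std::istreambuf_iterator<char>(in)), std::istreambuf_iterator<char>());
    // Hypothetical reader call - the actual CesiumGltfReader entry point and
    // result type may be named differently:
    //   auto result = reader.readModel(gsl::span<const uint8_t>(data.data(), data.size()));
    //   std::cout << entry.path() << ": " << (result.model ? "ok" : "FAILED") << "\n";
    std::cout << entry.path() << ": read " << data.size() << " bytes\n";
  }
  return 0;
}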

The first (important) good news is: All glTF-Binary and glTF-Draco models can be read without errors. (That does not mean that they are in a "valid, useable state", but is a first litmus test).

Now the big disclaimer: I'm not a C++ profiling expert. There are likely pitfalls that I'm not aware of. But I also ran a test of the glTF variants (i.e. the ones without any binary data - only the JSON part, as the external files/uris are not resolved yet). And just naively hitting the "Profile this for me"-button in Visual Studio (on the RelWithDebInfo binary) generated this:

[Screenshot: profiler results for the JSON-only glTF variants (GltfPerformanceOnly)]

The whole parsing takes only a fraction of the time that is required for reading the data from an SSD in the first place, and this already shows that this should not be an issue. And of course, for the case of reading binary data, specifically, the glTF-Binary and glTF-Draco variants, everything that is related to JSON becomes negligible anyhow:

[Screenshot: profiler results for the glTF-Binary and glTF-Draco variants (GltfPerformance)]

So if we want to tune something, we should probably have a look at (supposedly) faster PNG decoders (https://libspng.org/ or so...)


Regarding the class structure:

I know that it's hard to make a decision here that does not have any (at least potential) drawbacks. This touches many aspects, like the usability and "discoverability" of helper functions, as well as the "uniformity/consistency" of the generated classes. And I think that attaching the decoded data to the "owner" classes, but still keeping it as a dedicated entity (!), is a reasonable middle ground.

BTW: I went through similar issues in my glTF implementation. The "basis" always had been the purely spec-based classes. But the approaches for resolving the additional data underwent some refactorings: Handling binary, external buffers, and data URIs "transparently" is a bit more tricky than it looks at first glance - all this with the caveat that it also had to support glTF 1.0 (ouch!), and for some obscure reason, I wanted to offer some fancy progress UI for the asynchronous downloads...

I thought that things could have been a bit simpler here, because the loading could have been synchronous. But now you mentioned that async loading might be added at some point, so this may also not be so straightforward. But for now, resolving the data synchronously in Reader::postprocess would probably make sense.

@kring (Member, Author) commented Jan 27, 2021

a SAX-based parsing is hard to accomplish without dynamic dispatching

Yeah I couldn't see any way to avoid it with RapidJSON.

All glTF-Binary and glTF-Draco models can be read without errors.

Awesome! Thanks for trying this!

The whole parsing takes only a fraction of the time that is required for reading the data from an SSD in the first place, and this already shows that this should not be an issue.

Yes, I definitely agree: given that glTFs usually don't have a lot of JSON (the bulk of their bytes are in geometry stored separately from the JSON), the JSON parsing performance is not particularly important for glTF. Even when not counting the I/O time, I found that JSON parsing time is also dwarfed by Draco and image decoding. See #31 (comment)

So if we want to tune something, we should probably have a look at (supposedly) faster PNG decoders

I don't think we can make that a priority right now, but it may be very worthwhile in the future. Mind writing an issue for it?

I'm going to finish implementing synchronous data URI decoding to bring us to par with tinygltf, and then - if I haven't missed something - this is ready to merge.

@kring (Member, Author) commented Jan 28, 2021

@javagl support for data urls (apparently they're not called data uris anymore?!) is in. Since we already have a customer using this (via the cmpt branch), unless you see any major problems, let's get this merged and write issues for any additional work we ought to do.
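
For reference, the gist of it is just stripping the data:<mediatype>;base64, prefix and base64-decoding the rest into the buffer's bytes. A minimal illustrative sketch (not the actual code added in this PR):

#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Illustrative only - not the implementation added in this PR.
std::optional<std::vector<uint8_t>> decodeBase64DataUrl(const std::string& url) {
  const std::string prefix = "data:";
  if (url.rfind(prefix, 0) != 0) return std::nullopt;
  const std::size_t comma = url.find(',');
  if (comma == std::string::npos) return std::nullopt;
  const std::string header = url.substr(prefix.size(), comma - prefix.size());
  if (header.find(";base64") == std::string::npos) return std::nullopt;

  static const std::string alphabet =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  std::vector<uint8_t> bytes;
  uint32_t accumulator = 0;
  int bits = 0;
  for (std::size_t i = comma + 1; i < url.size(); ++i) {
    const char c = url[i];
    if (c == '=') break; // padding marks the end of the payload
    const std::size_t value = alphabet.find(c);
    if (value == std::string::npos) return std::nullopt;
    accumulator = (accumulator << 6) | static_cast<uint32_t>(value);
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      bytes.push_back(static_cast<uint8_t>((accumulator >> bits) & 0xFF));
    }
  }
  return bytes;
}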

@javagl (Contributor) commented Jan 28, 2021

Just a note: With the generator update, the output matches the repo state (i.e. the _API part is there again)

And during a quick test running over all glTF-Embedded sample models, they all could be read.

(Again, this is not a deeply reliable test, but ... the reader does not throw up, at least. A more meaningful test would/will be possible if/when there also is a "Writer", so that tests could do the roundtrip of assert(write(read("input")).equals("input")), but that's not on the radar right now)

@kring (Member, Author) commented Jan 29, 2021

And during a quick test running over all glTF-Embedded sample models, they all could be read.

That's great! (and I understand the caveats)
If you don't have any remaining concerns that need to be addressed prior to merging, can you please merge this?

@javagl (Contributor) commented Jan 29, 2021

The generated classes look clean, the reader interface is clear and small (particularly, the SAX-virtual-dispatch-complexity is completely hidden), and it can read the sample models, apparently.

One could think deeply about the ImageSpec/ImageCesium relationship, and I assume that some API changes might become necessary there when asynchronous loading is added, but that's not something that can be sorted out now.

So I don't see a reason to not merge it.

/**
* @brief The buffer's data.
*/
std::vector<uint8_t> data;
Contributor:

Perhaps this can be std::vector<std::byte> if we're embracing C++17?

Contributor:

The reasoning for using uint8_t instead of std::byte was (according to a mail from Nov 3, 2020, 8:01 AM) that std::byte may be harder to interface with other libraries (probably referring to those which aren't on C++17 yet - among them being tinygltf...).

Considering that the data here is going to be "the source", changing it here might allow replacing some of the gsl::span<uint8_t> that are used subsequently in cesium-native ... iff that doesn't make it noticeably harder to pass this data forward to Unreal.

(But from a quick glance, the data is copied via raw void* pointers on Unreal side anyhow).
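
(To illustrate the interfacing point with a generic sketch, not code from this repo: std::byte has no implicit conversions, so handing the data to a pre-C++17-style API needs an explicit cast, whereas uint8_t does not.)

#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for a pre-C++17 third-party API that takes plain byte pointers.
void thirdPartyApi(const uint8_t* /*bytes*/, std::size_t /*size*/) {}

void passAlong(const std::vector<std::byte>& data) {
  // std::byte requires an explicit cast here; with std::vector<uint8_t> the
  // pointer could be passed directly.
  thirdPartyApi(reinterpret_cast<const uint8_t*>(data.data()), data.size());
}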

Member Author:

It could be reasonable to use std::byte at this point. Let's get this merged as-is and we can try switching to std::byte in a separate PR.

Member Author:

I wrote #117 so we don't forget.

/**
* The stride, in bytes, between vertex attributes. When this is not defined, data is tightly packed. When two or more accessors use the same bufferView, this field must be defined.
*/
std::optional<int64_t> byteStride;
Contributor:

Why is this std::optional? Shouldn't it match the pattern of using a negative signed integer as a sentinel value to indicate "unused"? The standard says a negative stride isn't possible here: https://github.com/KhronosGroup/glTF/blob/master/specification/2.0/README.md#bufferviewbytestride

(btw I don't think this comment or any of the other future ones I'll make on this PR should hold it up. I've been working on a sibling CesiumGltfWriter using the rapidjson sax api + these classes and I'm just commenting on things as they come up).

Member Author:

The "-1 as sentinel" thing is only used for references to other glTF objects, not for arbitrary numbers. The rule is currently:

  1. If it's a reference to another glTF object, it does not need to be std::optional; use -1 to indicate "no reference".
  2. If it's a vector or map, it does not need to be std::optional; an undefined value is expressed as an empty collection.
  3. If it has a default value, make the default value the initial value and the property is not std::optional.
  4. Otherwise, it's std::optional.

We could add a 3.5: "If the schema says the value can't be negative, use a negative value to indicate undefined rather than making it std::optional". I think that would just muddy the already-kinda-muddy waters further. But I could be swayed if you're finding that it negatively impacts your client code.
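
To illustrate rules 1-4 with hypothetical declarations (the field names follow the glTF schema, but these lines aren't copied from the generated headers):

#include <cstdint>
#include <optional>
#include <vector>

struct ExampleBufferView {
  int32_t buffer = -1;               // rule 1: index of another glTF object; -1 means "no reference"
  int64_t byteOffset = 0;            // rule 3: the schema default (0) becomes the initial value
  std::optional<int64_t> byteStride; // rule 4: no default, not a reference, so std::optional
};

struct ExampleNode {
  std::vector<int32_t> children;     // rule 2: undefined is simply an empty collection
};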

@kring (Member, Author) commented Feb 2, 2021

Based on @javagl's comment above (and my own feeling that it's ready), I'm going to go ahead and merge this. BTW @javagl in the future please don't hesitate to merge a PR like this yourself when you're the nominated reviewer and you've deemed it ready.

@kring kring merged commit c3d35cf into master Feb 2, 2021
@kring kring deleted the gltf branch February 2, 2021 00:17