Replace tinygltf #106
Conversation
Apart from the two minor inlined comments above, here is a summary of the notes that I took locally during the review:

Short comments

About the "project structure": I'm skeptical about the git submodule approach in several ways, but omitting some details: the fact that the (>100MB) glTF repo is now part of the cesium-native repo is a bit odd, IFF this is only for the schema, to generate classes that are ... part of the repo in the first place (and hardly ever re-generated anyhow). One could download the schema on the fly, with … and … (and it's still possible to refer to certain commits there, FWIW). This is not really "important" in terms of the end result, but maybe for people who want to use … Taken one step further, one could think about options like making the generator a standalone project, or making …

About the structure of the generated classes: The way in which "custom data" is attached to the resulting classes may cause some confusion at first glance. (Essentially, what is done with the …, similarly for other classes, particularly …)

Some thoughts about details of the naming: …

But maybe (slightly) more importantly: What should be the "top-level" elements in the …? Right now, the top-level element is this one: …

Alternatively, the model could contain elements like … (somewhat along the lines of "aggregation over inheritance", but with a grain of salt). Alternatively, the model could contain the plain …, or it could contain …

Alternatively, one could completely remove the "custom, decoded stuff" from the model: …

All these are just options. None should be considered a "suggestion" (and even less a "proposal"). Just things where one … An aside: The …
Thanks for the review @javagl!
No, that's new. tinygltf has similar functionality, of course.
Thanks. I'm just glad I didn't ask @jtorresfabra to review or he would have given me a hard time about the virtual dispatch! 😆
It's par for the course with me right now, but yeah, the testing is pretty lackluster. I love the idea of automated testing against the sample models. Perhaps it could be automated so it's easy to run frequently, but not run as part of CI or a normal test run. Do you think that's something you would be interested in working on? Feel free to say no.
I didn't realize it's 100MB. That seems a little excessive for a repo with a spec and a few images. I guess there's just a lot of history. But in any case, I like your idea of adding it to SchemaCache.js. I think I can't justify taking the time to do that right now, but let's make an issue for it.
Yeah conceptually all of the cesium-native sub-libraries are candidates to be separate repos. At the moment that probably adds more overhead to the development process than is justified, but I do think we should do that in the future. I like the generator being close to the code it is generating, though, so it's easy and obvious how to regenerate. We don't want people to be tempted to start making hand-edited changes in the future because there's too much friction in finding and running the generator.
Yeah I've thought a lot about this, and I don't love any of the options. But the current version is what I eventually settled on. There are really two semi-independent parts to this. There are cases where we need to add data to the generated classes. These are the …

In my original design, I had a simple flag to the generator that told it whether the generated class has a (hand-written) … I was reasonably happy with how the … worked out.

What I really want is something like C#'s partial classes to make it easy to have a single class be a combination of a generated and a hand-written part with extra helper methods. But lacking that, all of the options are annoying in different ways. I considered teaching the generator how to embed custom code directly in the generated classes, either by listing all the code in configuration or by providing the name of a file to … I considered having separate "parallel arrays" as in your last two examples. This makes good sense when adding new fields (i.e. …).

So I eventually settled on using inheritance as a poor substitute for partial classes. And once that was in place for the helpers, it seemed reasonable to use it to add the …

Long story short, I think I'm still happy with where this ended up, even though there are lots of other possible approaches and all of them are better in some ways. I think this strikes the best balance overall. My biggest complaint is that …
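To make the trade-off concrete, here is a minimal sketch of the inheritance-as-partial-classes pattern described above; the class and field names are hypothetical, not the actual generated code:

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Generated strictly from the glTF JSON Schema; never edited by hand.
struct BufferSpec {
  // URI of the external buffer data, if any.
  std::optional<std::string> uri;
  // Length of the buffer in bytes.
  int64_t byteLength = 0;
};

// Hand-written "partial class" substitute: inherits the generated fields
// and adds decoded data plus helper methods.
struct Buffer : public BufferSpec {
  // Bytes resolved from `uri`, a data URL, or the GLB binary chunk.
  std::vector<uint8_t> data;

  bool isResolved() const noexcept { return !data.empty(); }
};
```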
Yes, you are very right. buffer.uri support is a missing feature at the moment, and a regression from tinygltf (it just happens to not be widely used in 3D Tiles). We also need to add support for async loading of buffers and images by URI. Neither of those latter two was supported in our tinygltf-based implementation either, though, so I'm happy for that to be a future feature.
I had a short look at the Draco part as well, and didn't spot anything "obviously questionable" there. It's a large PR, and even though large parts are the auto-generated ones, which only have to be reviewed for the "patterns" that they are using, I hope that I didn't overlook anything. (I saw that you started additional changes, maybe I can have a closer look at some parts in the meantime).
I was also a bit surprised to see that you used virtual functions "excessively" there, despite previous comments regarding "low-level performance". But I think that 1. SAX-based parsing is hard to accomplish without dynamic dispatching, 2. if this turns out to be a bottleneck, the reader part could (according to my understanding) be replaced with a "more efficient" reader, and 3., most importantly: it's very unlikely to be a bottleneck (even though the functions are called in a tight "loop", or rather recursion).

Out of a mix of curiosity about the performance and the idea of testing against the sample models, I just ran a quick test. I hacked the code for this quickly into a test project that I have locally, but some of it might be reused. It reads the sample model index file (from a local SSD, of course), just calls the reader on each file, and prints whether the model could be read. The first (important) good news is: all …

Now the big disclaimer: I'm not a C++ profiling expert. There are likely pitfalls that I'm not aware of. But I also ran a test of the … The whole parsing takes only a fraction of the time that is required for reading the data from an SSD in the first place, and this already shows that this should not be an issue. And of course, for the case of reading binary data specifically, the … So if we want to tune something, we should probably have a look at (supposedly) faster PNG decoders (https://libspng.org/ or so...).

Regarding the class structure: I know that it's hard to make a decision here that does not have any (at least potential) drawbacks. This touches many aspects, like the usability and "discoverability" of helper functions, as well as the "uniformity/consistency" of the generated classes. And I think that attaching the decoded data to the "owner" classes, but still keeping it as a dedicated entity (!), is a reasonable middle ground.

BTW: I went through similar issues in my glTF implementation. The "basis" had always been the purely spec-based classes. But the approaches for resolving the additional data underwent some refactorings: handling binary, external buffers, and data URIs "transparently" is a bit more tricky than it looks at first glance - all this with the caveat that it also had to support glTF 1.0 (ouch!), and for some obscure reason, I wanted to offer some fancy progress UI for the asynchronous downloads... I thought that things could have been a bit simpler here, because the loading could have been synchronous. But now you mentioned that async loading might be something that may be added at some point, so this may also not be so straightforward. But for now, resolving the data synchronously in …
Yeah I couldn't see any way to avoid it with RapidJSON.
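For readers following along, here is a rough sketch, not the actual CesiumGltfReader code, of why RapidJSON's SAX API tends to lead to dynamic dispatch: the parser drives a single statically-typed handler, so routing events to whatever object is currently being parsed naturally goes through a virtual interface. `IJsonHandler`, `Dispatcher`, and the member names here are hypothetical:

```cpp
#include <cstdint>
#include <string>

#include <rapidjson/reader.h>

// Per-object handler interface; the "current" handler changes as the parser
// descends into nested objects, which is where the virtual calls come from.
class IJsonHandler {
public:
  virtual ~IJsonHandler() = default;
  // Each event returns the handler that should receive the next event.
  virtual IJsonHandler* Key(const std::string& key) = 0;
  virtual IJsonHandler* Int64(int64_t value) = 0;
  // ... String, Double, Bool, StartObject, EndObject, etc.
};

// Adapter driven by rapidjson::Reader; it forwards each SAX event to the
// current IJsonHandler.
struct Dispatcher
    : public rapidjson::BaseReaderHandler<rapidjson::UTF8<>, Dispatcher> {
  IJsonHandler* current = nullptr;

  bool Key(const char* str, rapidjson::SizeType length, bool /*copy*/) {
    current = current->Key(std::string(str, length));
    return current != nullptr;
  }
  bool Int64(int64_t value) {
    current = current->Int64(value);
    return current != nullptr;
  }
  // The remaining events are forwarded the same way.
};

// Usage sketch:
//   Dispatcher dispatcher{/* root handler */};
//   rapidjson::Reader reader;
//   reader.Parse(stream, dispatcher);
```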
Awesome! Thanks for trying this!
Yes, I definitely agree: given that glTFs usually don't have a lot of JSON (the bulk of their bytes are in geometry stored separately from the JSON), JSON parsing performance is not particularly important for glTF. Even without counting the I/O time, I found that JSON parsing time is also dwarfed by Draco and image decoding. See #31 (comment)
I don't think we can make that a priority right now, but it may be very worthwhile in the future. Mind writing an issue for it? I'm going to finish implementing synchronous data URI decoding to bring us to par with tinygltf, and then - if I haven't missed something - this is ready to merge.
@javagl support for data URLs (apparently they're not called data URIs anymore?!) is in. Since we already have a customer using this (via the …
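As an illustration of what data URL support for buffer.uri involves (a hypothetical standalone sketch, not the code added in this PR): the URL has the form `data:<media type>;base64,<payload>`, and the payload just needs a base64 decode.

```cpp
#include <cstdint>
#include <optional>
#include <string_view>
#include <vector>

// Decode an RFC 2397 data URL such as
// "data:application/octet-stream;base64,AAAA..." into raw bytes.
// Returns std::nullopt for anything that is not a base64 data URL.
std::optional<std::vector<uint8_t>> decodeDataUrl(std::string_view url) {
  constexpr std::string_view prefix = "data:";
  if (url.substr(0, prefix.size()) != prefix) return std::nullopt;

  const size_t comma = url.find(',');
  if (comma == std::string_view::npos) return std::nullopt;

  const std::string_view header = url.substr(prefix.size(), comma - prefix.size());
  if (header.find(";base64") == std::string_view::npos) return std::nullopt;

  const std::string_view payload = url.substr(comma + 1);

  // Minimal base64 decode (whitespace handling omitted for brevity).
  const auto value = [](char c) -> int {
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1; // '=' padding or an invalid character
  };

  std::vector<uint8_t> bytes;
  bytes.reserve(payload.size() / 4 * 3);
  uint32_t accumulator = 0;
  int bits = 0;
  for (char c : payload) {
    if (c == '=') break; // padding: done
    const int v = value(c);
    if (v < 0) return std::nullopt;
    accumulator = (accumulator << 6) | static_cast<uint32_t>(v);
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      bytes.push_back(static_cast<uint8_t>((accumulator >> bits) & 0xFF));
    }
  }
  return bytes;
}
```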
Just a note: With the generator update, the output matches the repo state (i.e. the …). And during a quick test with running over all … (Again, this is not a deeply reliable test, but ... the reader does not throw up, at least. A more meaningful test would/will be possible if/when there also is a "Writer", so that tests could do the roundtrip of …)
That's great! (and I understand the caveats)
The generated classes look clean, the reader interface is clear and small (particularly, the SAX-virtual-dispatch complexity is completely hidden), and it can read the sample models, apparently. One could think deeply about the … So I don't see a reason to not merge it.
/**
 * @brief The buffer's data.
 */
std::vector<uint8_t> data;
Perhaps this can be std::vector<std::byte> if we're embracing C++17?
The reasoning for using uint8_t instead of std::byte was (according to a mail from Nov 3, 2020, 8:01 AM) that std::byte may be harder to interface with other libraries (probably referring to those which aren't on C++17 yet - among them being tinygltf...).

Considering that the data here is going to be "the source", changing it here might allow replacing some of the gsl::span<uint8_t> that are used subsequently in cesium-native ... iff that doesn't make it noticeably harder to pass this data forward to Unreal. (But from a quick glance, the data is copied via raw void* pointers on the Unreal side anyhow.)
It could be reasonable to use std::byte at this point. Let's get this merged as-is and we can try switching to std::byte in a separate PR.
I wrote #117 so we don't forget.
 *
 * The stride, in bytes, between vertex attributes. When this is not defined, data is tightly packed. When two or more accessors use the same bufferView, this field must be defined.
 */
std::optional<int64_t> byteStride;
Why is this std::optional? Shouldn't it match the pattern of using a negative signed integer as a sentinel value to indicate "unused"? The standard says a negative stride isn't possible here: https://github.com/KhronosGroup/glTF/blob/master/specification/2.0/README.md#bufferviewbytestride

(btw I don't think this comment or any of the other future ones I'll make on this PR should hold it up. I've been working on a sibling CesiumGltfWriter using the rapidjson SAX API + these classes and I'm just commenting on things as they come up).
The "-1 as sentinel" thing is only used for references to other glTF objects, not for arbitrary numbers. The rule is currently:
- If it's a reference to another glTF object, it does not need to be std::optional; use -1 to indicate "no reference".
- If it's a vector or map, it does not need to be std::optional; An undefined value is expressed as an empty collection.
- If it has a default value, make the default value the initial value and the property is not std::optional.
- Otherwise, it's std::optional.
We could add a 3.5: "If the schema says the value can't be negative, use a negative value to indicate undefined rather than making it std::optional". I think that would just muddy the already-kinda-muddy waters further. But I could be swayed if you're finding that it negatively impacts your client code.
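A small sketch of how those rules play out in a generated class; the field names here are illustrative, not the actual generated code:

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

struct ExampleAccessor {
  // Rule 1: a reference to another glTF object is a plain index; -1 means "no reference".
  int32_t bufferView = -1;

  // Rule 2: a vector or map is never optional; empty means "undefined".
  std::vector<double> min;

  // Rule 3: a schema default value becomes the initial value, so no optional is needed.
  int64_t byteOffset = 0;

  // Rule 4 (otherwise): no sentinel and no default, so std::optional.
  std::optional<std::string> name;
};
```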
This PR removes tinygltf and replaces it with our own glTF representation (CesiumGltf) as well as glTF reader library (CesiumGltfReader). To be clear, this isn't because anything is wrong with tinygltf, and in fact the interface of CesiumGltf is certainly inspired by tinygltf. The main motivation was to avoid baking a third-party library into such a critical aspect of cesium-native's public interface. I started out doing that by wrapping tinygltf, but that turned out to be a lot of trouble. More importantly, it made the interface harder to use.
Instead of wrapping, I ended up going down the road of generating glTF classes and a glTF reader from the glTF JSON Schema files. See tools/generate-gltf-classes. As a result, it conforms quite strictly to the glTF spec. The generated reader classes use RapidJSON's SAX mode, so parsing should be faster and use less memory. The glTF representation is completely separated from the reader classes, so we can swap out RapidJSON in the future with minimal fuss, especially since most of the RapidJSON parsing code is generated. Support for more glTF extensions can be easily added just by editing tools/generate-gltf-classes/glTF.json to reference the schema(s) of the extension and regenerating.

So, was it worth it? Probably not; this ended up being a lot more work than expected, and in the end we have something that isn't drastically different from tinygltf. But if we didn't do it now, we probably wouldn't realistically ever be able to do it.
Fixes #67
Fixes #31