Ground truth for proto specification docs #111
Replies: 3 comments 8 replies
-
I am in favor of keeping as much documentation as close to the code as possible, otherwise the documentation goes stale very fast. I would like to be able to understand the proto files in a self-contained way (referencing other proto files is fine). I don't want to hunt around a bunch of markdown looking for the relevant prose, it's just too hard to search without enforcing some specific structure to the prose, and if we're going to do that we should just put it in the protobufs. |
Beta Was this translation helpful? Give feedback.
-
The intention of the project was to have two representations of data: human readable and proto. As part of that, the goal was to have a spec that the proto implements. Implementation requires additional definition that the spec can only define abstractly. For example, a specification for how to represent date literals isn't defined in the spec. The spec only defines the range of allowed values and the meaning, not a specific representation. In proto, this may be a integer of days since epoch. In human-readable representation, this may be an iso date string. I think that generic things should be defined in the spec. Serialization specific details should be defined in the serialization specific area of the spec or the serialization idl (e.g. .proto files). In some cases, it is a good idea to embed high-level concepts in the spec along with embedded proto such as here. I also think that a way to resolve some of this is to have more proto embedded in the spec in other places like here. The vision was to have all of the spec contain the corresponding representations embedded. By doing so, we can probably be less verbose in the spec and leverage content in the embedded idl boxes. I appreciate the desire to make the proto "self-contained" but I think we should also avoid inverting the relationship between the spec and a specific representation. To me, the spec (markdown) comes first, the representation details come second. That being said, we clearly need to do a better job of keeping both up to date or it will be very hard for new users to adopt Substrait. We should review each patch with this in mind. I also think that most people right now are thinking of the proto as the spec, as opposed to an implementation of the spec. That wasn't the initial intention. We can, of course, also reevaluate the initial intention. |
Beta Was this translation helpful? Give feedback.
-
I agree with @jacques-n. I think that there isn't really an option other than to have the markdown spec be the source of truth. There are a few good reasons for that:
Going a bit further, I also don't think that it is a good idea to couple protobuf and substrait too closely. It's really important to have a reference implementation, prove out ideas with real code, and make sure the spec can be implemented cleanly. But the idea for this project goes beyond a single serialization format. In other words, this is not a set of protobuf objects, it is a way to exchange plans. There are probably things that require extra specification that isn't enforced by protobuf and it's good to have times when you're not thinking in terms of the protobuf, but in terms of the spec. (Hopefully that made sense.) |
Beta Was this translation helpful? Give feedback.
-
This discussion was inspired by this comment: #109 (comment)
The question, if I understand it correctly, is whether documentation should go in the markdown files or in the proto files themselves. We currently seem to be rather split on this. There are many places where the .proto files are not documented at all or the documentation is much sparser than whats in the markdown. On the other hand, plan.proto is more thoroughly documented and there is no corresponding markdown document.
What should be the guideline going forwards? Is there a way we can reuse what we put in the proto files so we don't have to type everything twice?
Beta Was this translation helpful? Give feedback.
All reactions