Implement vectorized base image augmentation layer w/ Grayscale() example #1331
Conversation
What about the product decision at tensorflow/tensorflow#55639 (comment)?
Curious about @fchollet's thoughts on this. It's impossible to vectorize this one in TensorFlow while keeping the transformations independent from one another. Do you think a slowdown is acceptable in that case?
The bottleneck is any API that doesn't have a batch dimension on a param that you want to randomize or vary per element, so you end up looping over the batch, calling the same op with a different parameter each time. I don't know whether this could change in the near future with the new XLA/MLIR bridge design and its generated kernels, but right now it is not possible with vectorized map, and I think not with the current bridge either. Edit (our old thread about this): tensorflow/tensorflow#55335 (comment)
@jpienaar Can ops like this one be fused in a loop without breaking the raw ops API?
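To make the per-parameter bottleneck concrete, here is a small sketch (not code from this PR; the shapes, the deltas, and the use of `tf.image.adjust_brightness` are illustrative). `adjust_brightness` only accepts a scalar delta, so applying a different delta per image either needs a loop over the batch or a manually vectorized rewrite that broadcasts the per-image parameter:

```python
import tensorflow as tf

images = tf.random.uniform((4, 8, 8, 3), seed=1)
deltas = tf.constant([0.1, -0.2, 0.0, 0.3])  # one brightness delta per image

# Looping approach: one op call per image, scalar parameter each time.
looped = tf.stack(
    [tf.image.adjust_brightness(img, d) for img, d in zip(images, deltas)]
)

# Manually vectorized approach: for float images adjust_brightness is just
# an add, so the per-image parameter can be broadcast across the batch
# and applied in a single op.
vectorized = images + deltas[:, None, None, None]
```

The two produce the same result for float images; only the second avoids per-element op dispatch.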
keras_cv/layers/preprocessing/vectorized_base_image_augmentation_layer.py
@jbischof @ianstenbit this is ready for a real review now. We can incrementally migrate layers to vectorized layers using this approach.
Before we merge, I hope we can clarify the point about the product decision linked above. Since the first keras-cv commits I have been flagging the performance overhead of this design.
What are we hoping to have clarified? This change solves the performance issues of Grayscale with no regression in functionality. Item-wise augmentation is still deemed important.
/gcbrun
/gcbrun
It is not only about Grayscale, as this is related to many ops. A list from August (not updated since) is at #291 (comment).
My goal in this PR is to provide an abstraction that allows us to optimize the performance of layers that are not being converted by pfor: #291 (comment). Does that answer the question?
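As a concrete illustration of why Grayscale is a good first candidate for this abstraction (a sketch, not the layer's actual code): `tf.image.rgb_to_grayscale` already accepts a batch dimension, so a vectorized layer can transform the whole batch in one op instead of mapping over individual images:

```python
import tensorflow as tf

images = tf.random.uniform((8, 32, 32, 3))

# One op call over the whole batch -- no per-image mapping and no pfor.
gray = tf.image.rgb_to_grayscale(images)    # shape (8, 32, 32, 1)

# Layers that must preserve the channel count can broadcast back to RGB.
gray3 = tf.image.grayscale_to_rgb(gray)     # shape (8, 32, 32, 3)
```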
/gcbrun
This was the same topic one year ago, but we didn't address it even though we were in the design phase. So what has changed in a year, given that back then we chose not to address it during the design phase despite all the explicit feedback and benchmarks I sent?
As you can read in many places, and also in that comment, missing batch support is not the only cause of the failure/fallback of pfor conversion; we also have missing converters. As an extra note:
We do care about performance, and as you say there is no tf.image API owner or pfor API owner, so instead of waiting for these issues to be fixed we'd like to offer vectorized implementations of these layers in the meantime. Does that answer your question? And to be clear, it was never that we didn't care about performance; it was simply that getting RandAugment working, OD augmentations working, and other product goals took priority. Now that these are working, I'm attempting to revisit these layers and optimize them without major regressions in functionality.
Honestly, the impression for a year was that we did not care about performance. I never saw a comment acknowledging the problem, even though it was reported and benchmarked in multiple tickets, or stating that we simply needed to postpone the design work due to resource constraints. In any case, better late than never; we will see the residual fallback after this migration. About the PR:
Thanks Luke!
Lots of comments because this is a foundational piece and I want to make sure we get it really right.
keras_cv/layers/preprocessing/batched_base_image_augmentation_layer_test.py
```python
inputs = {IMAGES: inputs}
metadata[BATCHED] = inputs["images"].shape.rank == 4
if inputs["images"].shape.rank == 3:
```
@ianstenbit see here: we expand dims if the layer is operating in unbatched mode.
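A minimal sketch of the unbatched-input handling discussed here, assuming a standalone helper (the `format_inputs` name and the lowercase `"batched"` key are illustrative, not the layer's exact code): a rank-3 image is wrapped in a dict, recorded as unbatched, and expanded to a rank-4 batch of one before augmentation.

```python
import tensorflow as tf

IMAGES = "images"

def format_inputs(inputs):
    # Wrap a bare image tensor in the dict format the layer works with.
    if not isinstance(inputs, dict):
        inputs = {IMAGES: inputs}
    # Record whether the caller passed a batch, so the output can be
    # squeezed back to rank 3 after augmentation if it was unbatched.
    metadata = {"batched": inputs[IMAGES].shape.rank == 4}
    if inputs[IMAGES].shape.rank == 3:
        inputs[IMAGES] = tf.expand_dims(inputs[IMAGES], axis=0)
    return inputs, metadata

image = tf.zeros((32, 32, 3))
formatted, metadata = format_inputs(image)
# formatted[IMAGES] now has shape (1, 32, 32, 3); metadata["batched"] is False
```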
@ianstenbit all comments addressed. Closed in favor of #1373
Resource constraints are real and we were hoping pfor would either be:
Neither turned out to be the case, so we are now prioritizing this. Please keep in mind that resource constraints are a real issue; they don't mean issues aren't important, they simply mean we have other things that are more pressing at the time.
Generally I agree, but it is a little different when we were still in the design phase and had real-time feedback predicting how it would fail. Pfor cannot be fixed if the underlying API design of the ops we want to use in this library is not batch-ready; I think our approach was really out of scope for pfor from the early days. Missing converters, on the other hand, could still be fixed. We committed to something (the within-batch randomization policy) with some theoretical validity but without empirical verification of faster convergence versus the per-epoch computational overhead. Also, as we have seen in another recent PR, torchvision does not randomize within the batch, so many reference models that we also train/replicate, and the related reference papers, were not trained with the within-batch policy at all.
This already matters in contrastive learning and in any case where samples are compared within a batch. This is a case where a silent error could really waste a user's time someday, and we should avoid that if possible. With the current structure we will likely be able to have both performance and numerical correctness. I think this is the best of all worlds.
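One way to keep transformations independent per sample while staying vectorized (an illustrative sketch, not KerasCV code; the scaling transform and shapes are made up): sample one random factor per image up front, then apply all of them in a single broadcasted op. Each image still receives its own independent transformation, which is what within-batch comparisons like contrastive learning rely on.

```python
import tensorflow as tf

batch = tf.random.uniform((4, 8, 8, 3), seed=0)

# One independent random factor per sample, drawn in a single call.
factors = tf.random.uniform((4,), minval=0.5, maxval=1.5, seed=1)

# Single broadcasted op over the whole batch; no per-image loop,
# yet sample i is scaled only by factors[i].
scaled = batch * factors[:, None, None, None]
```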
Yes but:
We have a ton of performance issues with KPLs due to auto vectorization not working with tf.image ops.
Basically my proposed approach is as follows:
Fixes: #581
Caveats of this approach: