Add a padding operator. #95

Merged (11 commits, Jul 17, 2021)

Conversation

@mrbeann (Contributor) commented Jul 12, 2021

The implemented operator supports padding in all dimensions as well as asymmetric padding. It is implemented in software only; no accelerated version is available. The code is tested through both the C++ and Python APIs. The C++ tests cover different input sizes (2D, 4D) and different padding patterns (asymmetric, symmetric). The Python API test mainly verifies that the operator works in a graph.

Example Python code.

      # input_tensor is a 4D tensor; we pad the last 2 dimensions with 1 on each side.
      out = array_ops.padding(input_tensor, [0, 0, 0, 0, 1, 1, 1, 1], "padding")
      # input_tensor is a 4D tensor; we pad the 2nd dimension with symmetric
      # padding of size 1 and the 3rd dimension with asymmetric padding of sizes 1 and 2.
      out = array_ops.padding(input_tensor, [0, 0, 0, 0, 1, 1, 1, 2], "padding")
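
The resulting output shape is simply each dimension grown by its begin plus end padding. A minimal standalone sketch of that relationship (plain C++, independent of the SMAUG API; the input shape here is made up for illustration):

    #include <cassert>
    #include <iostream>
    #include <vector>

    // padding is laid out as {dim0_begin, dim0_end, dim1_begin, dim1_end, ...},
    // so each output dimension grows by the sum of its begin and end padding.
    std::vector<int> paddedShape(const std::vector<int>& dims,
                                 const std::vector<int>& padding) {
        assert(padding.size() == 2 * dims.size());
        std::vector<int> out(dims);
        for (size_t i = 0; i < dims.size(); i++)
            out[i] += padding[2 * i] + padding[2 * i + 1];
        return out;
    }

    int main() {
        // The second example above: a hypothetical 1x2x4x4 tensor padded with
        // {0, 0, 0, 0, 1, 1, 1, 2} becomes 1x2x6x7.
        for (int d : paddedShape({ 1, 2, 4, 4 }, { 0, 0, 0, 0, 1, 1, 1, 2 }))
            std::cout << d << " ";  // prints: 1 2 6 7
        std::cout << "\n";
        return 0;
    }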

Fixes issue #94.

@yaoyuannnn (Member):

@mrbeann Is this PR ready for code review? You can add either Sam or me (or both) as reviewers.

@mrbeann (Contributor, Author) commented Jul 13, 2021

I think this is ready for review, but it seems I do not have permission to add reviewers. Can you or Sam review it? @xyzsam @yaoyuannnn

@yaoyuannnn (Member):

> I think this is ready for review, but it seems I do not have permission to add reviewers. Can you or Sam review it? @xyzsam @yaoyuannnn

You should be able to add reviewers since you already created the PR.

@yaoyuannnn (Member):

Also, can you resolve the issue in the CI build?

@mrbeann (Contributor, Author) commented Jul 13, 2021

@yaoyuannnn Hi Yao, I resolved the CI problem. But adding a reviewer can only be done by the owner of this repo, according to the following answer:
https://stackoverflow.com/questions/32262295/adding-a-reviewer-to-a-github-pull-request

@yaoyuannnn (Member):

> @yaoyuannnn Hi Yao, I resolved the CI problem. But adding a reviewer can only be done by the owner of this repo, according to the following answer:
> https://stackoverflow.com/questions/32262295/adding-a-reviewer-to-a-github-pull-request

Thanks! That's interesting. In GitLab, the author can select reviewers, so I thought GitHub would be the same. Maybe assignee would work.

I will review this PR tonight. Thanks for contributing to the SMAUG project :)

.gitignore Outdated
@@ -0,0 +1,3 @@
build/
experiments/
smaug/operators/padding_op_test.h
Member:

Why put this file here?

Member:

Yes, please remove this file from this PR. You are welcome to send a separate PR with a .gitignore if you like, but if so, it should only contain build products, not other code (like padding_op_test.h).

Contributor Author:

Mainly to facilitate my development, but I think adding the first two lines would benefit this repo (the third line is meaningless now). I can delete the file if you think it doesn't make sense.

Member:

See #96

@@ -1,38 +1,39 @@
#include "smaug/core/backend.h"
#include "smaug/operators/batch_norm_op.h"
#include "smaug/operators/concat_op.h"
Member:

Code rearrangement is generally confusing :) Moving forward, please move this to a separate PR.

Member:

I definitely appreciate you sorting all the includes in alphabetical order! It's certainly cleaner than before. But yes, refactoring and code cleanups should be done separately if it's not related to the main PR purpose.

Contributor Author:

I'll fix this

const unsigned kEltwiseOpHw = 0x0003;
const unsigned kBatchNormHw = 0x0004;
const unsigned kPoolingHw = 0x0005;
const unsigned kConvolutionHw = 0x0001; // 0x0001;
Member:

I can't remember exactly, but we do want these accelerator blocks to be traced with different accelerator IDs. cc @xyzsam for his perspective.

Member:

Yes, leave this unchanged here; there is an argument for both sides but regardless, it should not be done in this PR.

Contributor Author:

This is based on advice @xyzsam gave for another problem, but it is not relevant to this PR. I'll delete it.

@@ -1,56 +1,57 @@
#include <iostream>
Member:

Again, it'd be better to put the code rearrangement into a separate PR.

Contributor Author:

I'll fix this.

}

/** Set the number of padders of the Tensor along each dimension. */
void setPadder(const int& val) {
Member:

Why not make this support different padding sizes on different dimensions, like TensorFlow (https://www.tensorflow.org/api_docs/python/tf/keras/layers/ZeroPadding2D)? Judging from the implementation, I think this is for a 2D input tensor? If so, it's better to rename it to Padding2DOp.

Member:

I think it would be relatively straightforward to make this support tensors with any number of dimensions and arbitrary padding size along all of them, so we should do that instead. The copyTensorRegion API should work regardless.
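
As a standalone illustration of that generalization (a sketch that mirrors the shape of the eventual diff later in this thread, not SMAUG's actual code): the destination origin is the per-dimension begin padding, the source origin is all zeros, and the region size is just the input dims, for any number of dimensions.

    #include <vector>

    // Arguments for copying the input into a zero-filled, padded output:
    // destOrigin = per-dimension begin padding, srcOrigin = all zeros,
    // regionSize = the input dims themselves.
    struct CopyArgs {
        std::vector<int> destOrigin, srcOrigin, regionSize;
    };

    CopyArgs padCopyArgs(const std::vector<int>& inputDims,
                         const std::vector<int>& paddingSize) {
        CopyArgs args;
        for (size_t i = 0; i < inputDims.size(); i++) {
            args.destOrigin.push_back(paddingSize.at(2 * i));
            args.srcOrigin.push_back(0);
        }
        args.regionSize = inputDims;
        return args;
    }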

Contributor Author:

Good idea. I'll rename the operation.

namespace smaug {

/** \ingroup Operators
* \brief Pad a given tensor in different dimension.
Member:

I think this is meant to pad on H and W dimensions. Let's make this more explicit.

Member:

Also, mention that the padding on either side of all dimensions is assumed to be equal (e.g. 1 element on both left and right sides).

Contributor Author:

Okay!

int getPadder() { return padder; }

void run() override {
Tensor* input = getInput(0);
Member:

clang-format?

std::vector<int> destOrigin;
if (input->getShape().getLayout() == DataLayout::NCHW){
destOrigin = std::vector<int>({0, 0, padder, padder});
}
Member:

clang-format

}
std::vector<int> srcOrigin = std::vector<int>({0, 0, 0, 0});
std::vector<int> regionSize = inputDims;
copyTensorRegion(output, input, destOrigin, srcOrigin, regionSize);
Member:

Glad this API gets used here :)

Contributor Author:

Yeah, thanks for your efforts. Hope I used it correctly. :)

// Optional but recommended function to verify operator parameters.
bool validate() override {
if (padder < 0){
return false;
Member:

clang-format.

@xyzsam (Member) left a comment:

Thank you for your PR! A few high-level comments:

  1. Run clang-format on your code. For all the files you changed, run clang-format -i -style=file filename.cpp.
  2. Move unrelated code out of this PR (refactoring, re-ordering includes, depthwise conv). You can put them in a separate PR.

I will add some more information about contributing code to the repo for future users. As you can see, you are the first external user to contribute to smaug. We really appreciate it :)



#include "smaug/operators/smv/smv_less_op.h"
#include "smaug/operators/smv/smv_greater_op.h"
#include "smaug/operators/smv/smv_tanh_op.h"
#include "smaug/operators/softmax_op.h"
Member:

(Also, if this was meant to sort the header includes, these three should not be at the bottom :] )


@@ -3,79 +3,80 @@ syntax = "proto3";
package smaug;

enum DataType {
UnknownDataType = 0;
Member:

+1

// set output size?
}

auto getPadder() { return padder; }
Member:

Two issues here:

  1. Avoid auto return types; it makes the API unnecessarily hard to read and understand. The reader should not need to look for where padder is defined to know what this returns.
  2. The name "padder" suggests that it is an object doing padding, when in fact it is just your padding size. So this API (and the class member) should be named "padding_size".

Contributor Author:

Thank you for your advice.


Tensor* input = getInput(0);
Tensor* output = getOutput(0);
int ndims = input->ndims();
std::vector<int> inputDims = input->getShape().dims();
Member:

All these vectors should be const ref since you don't need to modify them.


name: Name of the operator.

Returns:
A paded version of the input tensor.
Member:

padded

@mrbeann (Contributor, Author) commented Jul 14, 2021

Hi, I just fixed all the problems. BTW, some includes were re-sorted by clang-format. Please check if everything is okay now.

* ,dimk_backward>
*/
void setPaddingSize(RepeatedField<google::protobuf::int32> const& val) {
std::vector<double> paddingSize(val.begin(), val.end());
Member:

Why convert to double here?

Contributor Author:

I changed it to int.


void setPaddingSize(std::vector<int> const& val) { paddingSize = val; }

std::vector<int> getPaddingSize() { return paddingSize; }
Member:

This should be a const member function.

auto paddingOp =
new PaddingOp<ReferenceBackend>("padding", workspace());
paddingOp->setInput(input, 0);
paddingOp->setPaddingSize({ 0, 0, 0, 0, 1, 1, 1, 1 });
Member:

Can we have a test for asymmetric padding, like a padding size of 1 on the left and 2 on the right, since it's supported?
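
For example, a hypothetical asymmetric variant of the test quoted above (an illustration, not necessarily the exact case that was added):

    // Pad dim 2 with 1 element before and 2 after; dim 3 stays symmetric.
    paddingOp->setPaddingSize({ 0, 0, 0, 0, 1, 2, 1, 1 });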

Contributor Author:

I added it now.

Member:

Thanks!


Args:
input_tensor: Input tensor.
padding_size: A list that contains number of values padded to each dimension.
Member:

I think this docstring needs to be more explicit: padding_size is in the format of {dim0_begin, dim0_end, dim1_begin, dim1_end, ...}.

* ,dimk_backward>
*/
void setPaddingSize(RepeatedField<google::protobuf::int32> const& val) {
std::vector<int> paddingSize(val.begin(), val.end());
Member:

Sorry I didn't make it clear in my comment. This defines a new variable. Can you copy from the repeated field like:
paddingSize.assign(val.begin(), val.end());
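
Applied to the setter from the diff, that copies into the member instead of shadowing it (a sketch; the member paddingSize is a std::vector<int> per the rest of the thread):

    void setPaddingSize(RepeatedField<google::protobuf::int32> const& val) {
        // Assign into the existing member; the original line declared a brand
        // new local vector that shadowed it and was immediately discarded.
        paddingSize.assign(val.begin(), val.end());
    }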


void setPaddingSize(std::vector<int> const& val) { paddingSize = val; }

const std::vector<int> getPaddingSize() { return paddingSize; }
Member:

I meant to make this a const member function, not a const return type:
std::vector<int> getPaddingSize() const { return paddingSize; }


@yaoyuannnn (Member) left a comment:

Thanks!

@xyzsam (Member) left a comment:

Please update the PR description to include more detail about this operator. For example:

  1. Mention that the operator supports padding in all dimensions and asymmetric padding.
  2. Implemented in software only, no accelerated version available.
  3. Example Python code.
  4. How this was tested.

* The paddingSize is orgainized as <dim1_forward, dim1_backward, ...
* ,dimk_backward>
*/
void setPaddingSize(RepeatedField<google::protobuf::int32> const& val) {
Member:

This should be const RepeatedField&, not RepeatedField const&.

Also, what do "forward" and "backward" mean?

Contributor Author:

I'll update this.


int getPadder() { return padder; }
std::vector<int> getPaddingSize() const { return paddingSize; }
Member:

Return a const std::vector<int>& to avoid making a copy of paddingSize.

Tensor* input = getInput(0);
Tensor* output = getOutput(0);
int ndims = input->ndims();
const std::vector<int> inputDims = input->getShape().dims();
Member:

both inputDims and outputDims should be const-ref.

output->fillData(vf.data(), vf.size());
std::vector<int> destOrigin, paddingBegin, srcOrigin;
for (int i = 0; i < ndims; i++) {
paddingBegin.push_back(paddingSize[2 * i]);
Member:

Use paddingSize.at(2 * i) instead; it will throw an exception if you go out of bounds, whereas an out-of-bounds paddingSize[...] is undefined behavior (it neither grows the vector nor throws).
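
A standalone demo of the difference:

    #include <iostream>
    #include <stdexcept>
    #include <vector>

    int main() {
        std::vector<int> paddingSize = { 1, 1 };
        std::cout << paddingSize.at(1) << "\n";  // bounds-checked: prints 1
        try {
            paddingSize.at(2);  // out of range: throws std::out_of_range
        } catch (const std::out_of_range& e) {
            std::cout << "caught: " << e.what() << "\n";
        }
        // paddingSize[2] would be undefined behavior: operator[] neither
        // grows the vector nor throws.
        return 0;
    }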

paddingBegin.push_back(paddingSize[2 * i]);
srcOrigin.push_back(0);
}
destOrigin = std::vector<int>(paddingBegin);
Member:

There's no need to declare destOrigin earlier and then re-initialize it here - that causes the vector to be constructed twice. Just declare it here directly: std::vector<int> destOrigin = ....

dims[1] += 2*padder;
dims[2] += 2*padder;
for (int i = 0; i < ndims; i++) {
dims[i] += (paddingSize[2 * i] + paddingSize[2 * i + 1]);
}
TensorShape shape(
dims, input->getShape().getLayout(), Backend::Alignment);
Tensor* output = new Tensor(name, shape);
workspace->addTensor(output);
outputs.at(0) = output;
Member:

Same here - use enums instead of hardcoded constants.

return Operator::validate();
}

enum { kInputs, kNumInputs };
Member:

nit: kInput instead of kInputs (you only have one), and likewise for outputs.
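
i.e., something like (a sketch; the output-side enum is assumed by symmetry, not quoted from the diff):

    enum { kInput, kNumInputs };    // singular: the operator has one input
    enum { kOutput, kNumOutputs };  // and one output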

std::vector<float> expected_output{
0, 0, 0, // input 0, chan 0, row -1
0, 0, 0, // input 0, chan 0, row 0
0, 0, 0, // input 0, chan 1, row 3
Member:

Do you mean row 1?

Contributor Author:

Yes, I modified it.

"""Construct a tensor by padding a given tensor.

Args:
input_tensor: Input tensor.
padder: A int value that represents the padding dimension
padding_size: A list in the format of {dim0_begin, dim0_end, dim1_begin, dim1_end, ...} that
Member:

Add a note that the order of dimensions must align with the data layout of input_tensor.

Also, ensure that this line doesn't exceed 80 characters in length.

else:
raise ValueError("Only support layout as NHWC or NCHW")
if len(padding_size) != 2 * len(src_dims):
raise ValueError("The padding_size's dimension must be two times as the input_tensor")
Member:

nit: "len(padding_size) should be 2x input_tensor.shape.dims" is clearer.

@mrbeann (Contributor, Author) commented Jul 16, 2021

I just updated the PR description. Please check if it is okay now. Thanks!

@xyzsam (Member) left a comment:

Almost done, just two more small fixes, then we can merge. Thank you for your patience and your contributions!!!

srcOrigin.push_back(0);
}
destOrigin = std::vector<int>(paddingBegin);
std::vector<int> destOrigin = std::vector<int>(paddingBegin);
Member:

Sorry, I should have caught this earlier. Why build paddingBegin and then copy it into destOrigin? Why not just use paddingBegin directly?

srcOrigin.push_back(0);
}
destOrigin = std::vector<int>(paddingBegin);
std::vector<int> destOrigin = std::vector<int>(paddingBegin);
std::vector<int> regionSize = inputDims;
Member:

Same here - no need to make a copy of inputDims, just use it directly. I think you're trying to make it more clear what each vector is representing in the copyTensorRegion call but there's no need since the API documents it very clearly already.
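
Putting these last two comments together, the tail of run() collapses to something like this (a sketch; copyTensorRegion is the SMAUG API used earlier in the thread, and the other names mirror the diff):

    std::vector<int> paddingBegin, srcOrigin;
    for (int i = 0; i < ndims; i++) {
        paddingBegin.push_back(paddingSize.at(2 * i));
        srcOrigin.push_back(0);
    }
    // Use paddingBegin directly as the destination origin and inputDims
    // directly as the region size; no intermediate copies are needed.
    copyTensorRegion(output, input, paddingBegin, srcOrigin, inputDims);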

@mrbeann (Contributor, Author) commented Jul 16, 2021

Thanks for your efforts! I just updated the code.

@xyzsam changed the title from "Padding" to "Add a padding operator." on Jul 16, 2021
@xyzsam merged commit e54f53d into harvard-acc:master on Jul 17, 2021
@mrbeann deleted the padding branch on July 17, 2021 at 05:16