[web] fix rest-test issues #593

tttttangTH · 2020-10-19T21:05:35Z

This PR mainly fixes some issues in the current OTBR-REST tests and lays some groundwork for #532.

Set a timeout for rest-test in case it runs too long
Modify the rest-test principles and HTTP responses so that the test keeps working properly once more error codes are defined for the underlying APIs
Bind ReceiveDiagnosticGetCallback to the resource each time the diagnostics API is called (consistent with the rest-test modifications and more logical)

tttttangTH · 2020-10-19T21:06:51Z

codecov · 2020-10-19T21:28:07Z

Codecov Report

Merging #593 (0e7a66a) into main (5a142e7) will decrease coverage by 0.00%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #593      +/-   ##
==========================================
- Coverage   71.12%   71.11%   -0.01%     
==========================================
  Files          78       78              
  Lines        5240     5235       -5     
==========================================
- Hits         3727     3723       -4     
+ Misses       1513     1512       -1

Impacted Files	Coverage Δ
src/rest/resource.hpp	`0.00% <ø> (ø)`
src/rest/rest_web_server.cpp	`86.74% <ø> (-0.16%)`	⬇️
src/rest/resource.cpp	`90.98% <100.00%> (+0.25%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5a142e7...0e7a66a.

simonlingoogle · 2020-10-20T01:14:00Z

tests/rest/test_rest.py

+    try:    
+        response = urllib.request.urlopen(urllib.request.Request(url))
+        body = response.read()
+        data = json.loads(body)
+        result[index] = data
+
+
+    except urllib.error.HTTPError as e:
+        print(e.code)   


Why is raising the exception not good?

+1, why catch the exception?

I intend to just record the error here and then raise a single exception in the main thread (on top of all subprocesses), so each subprocess's job is either to get the data (success) or to print an error log (failure).

then raise one exception in main thread

@tttttangTH I didn't follow how this is implemented.

Let me ask a question: Will the CI fail if a urllib.error.HTTPError exception is raised in this method?

Yes, CI will fail.


@tttttangTH Could you explain how it works?
I don't see how it can make the CI fail; you are just printing a message...

There is an array, result, that records the response of each thread. If a thread gets an HTTP error, its entry in result stays None, and the main thread checks for this.
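A minimal sketch of the pattern described above, assuming one worker thread per request. The helper name `run_concurrently` and the injectable `fetch` function are hypothetical, not taken from test_rest.py. It also shows why catching matters: in Python, an exception raised inside a `threading.Thread` target is reported by `threading.excepthook` but is not re-raised by `join()`, so the main thread only fails if it inspects the results itself.

```python
import threading
import urllib.error


def run_concurrently(fetch, urls):
    """Call fetch(url) in one thread per URL and collect results by index."""
    result = [None] * len(urls)  # an entry stays None if its request fails

    def worker(index, url):
        try:
            result[index] = fetch(url)
        except urllib.error.HTTPError as e:
            # Record the failure instead of raising: an exception raised here
            # would be printed by threading.excepthook, not seen by join().
            print(f"request {index} failed with HTTP {e.code}")

    threads = [threading.Thread(target=worker, args=(i, url))
               for i, url in enumerate(urls)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    failed = [i for i, r in enumerate(result) if r is None]
    if failed:
        # One exception in the main thread is what actually fails the test run.
        raise AssertionError(f"requests failed: {failed}")
    return result
```

Because `None` doubles as the failure marker, a response whose JSON body is literally `null` would also count as a failure; a dedicated sentinel object would avoid that.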

simonlingoogle · 2020-10-20T01:17:39Z

src/rest/resource.cpp

@@ -518,31 +513,22 @@ void Resource::UpdateDiag(std::string aKey, std::vector<otNetworkDiagTlv> &aDiag

 void Resource::Diagnostic(const Request &aRequest, Response &aResponse) const
 {
-    otbrError error = OTBR_ERROR_NONE;
+    otThreadSetReceiveDiagnosticGetCallback(mInstance, &Resource::DiagnosticResponseHandler,


Why is it necessary to set the callback on every Diagnostic call?

For example, if factoryreset is called, the callback may be bound to another function. We should ensure that each time the diagnostic API in Resource is called, the response message is collected by our resource handler.

After factoryreset, won't Resource::Init run again and set up the callback? Is the original code causing real issues?

Resource::Init only runs once, when otbr-agent starts; factoryreset then sets the callback back to the default (apparently a handler for the CLI). I am not sure whether there is another way for otThreadSetReceiveDiagnosticGetCallback to be pointed at a different function. Do you think setting it each time before the diagnostics API is called is acceptable? Or is there a way to detect whether factoryreset has been called?


I think factoryreset re-execs the whole process, so the process restarts as if it had just been launched normally. We should not blame factoryreset for any issue. If multiple components of ot-br-posix are using otThreadSetReceiveDiagnosticGetCallback, maybe they should coordinate with each other.
Should we use EventEmitter so that ot-br-posix calls otThreadSetReceiveDiagnosticGetCallback just once and lets other components subscribe to the corresponding event using EventEmitter::On? @gjc13 @tttttangTH
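A rough Python illustration of the single-registration idea, assuming the stack holds exactly one diagnostic callback slot. ot-br-posix itself is C++, and `EventEmitter`, `FakeStack`, and their methods here are hypothetical stand-ins, not the project's actual classes. The stack callback is registered once and forwards every message to all subscribers, so no component can silently overwrite another's handler.

```python
from collections import defaultdict


class EventEmitter:
    """Minimal publish/subscribe helper (hypothetical, for illustration only)."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def on(self, event, handler):
        self._handlers[event].append(handler)

    def emit(self, event, *args):
        for handler in self._handlers[event]:
            handler(*args)


class FakeStack:
    """Stand-in for the OpenThread instance, modeled with a single callback slot."""

    def __init__(self):
        self.diag_callback = None

    def set_receive_diagnostic_get_callback(self, callback):
        # With a single slot, a second call replaces the first registration.
        self.diag_callback = callback


stack = FakeStack()
emitter = EventEmitter()
# Register with the stack exactly once, at startup; components subscribe to the emitter.
stack.set_receive_diagnostic_get_callback(lambda message: emitter.emit("diagnostic", message))
```

Components such as the REST resource would then call `emitter.on("diagnostic", handler)` instead of touching the stack callback directly.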

OK, I will try it this way.

wgtdkp

LGTM 👍

tests/rest/test_rest.py

wgtdkp · 2020-10-20T02:18:41Z

tests/rest/test_rest.py

+    try:    
+        response = urllib.request.urlopen(urllib.request.Request(url))
+        body = response.read()
+        data = json.loads(body)
+        result[index] = data
+
+
+    except urllib.error.HTTPError as e:
+        print(e.code)   


+1, why catch the exception?

wgtdkp · 2020-10-20T02:21:31Z

src/rest/resource.cpp

-    if (error == OTBR_ERROR_NONE)
-    {
-        aResponse.SetStartTime(steady_clock::now());
-        aResponse.SetCallback();
-    }
-    else
-    {
-        ErrorHandler(aResponse, HttpStatusCode::kStatusInternalServerError);
-    }
+    aResponse.SetStartTime(steady_clock::now());
+    aResponse.SetCallback();


For now, what happens if we fail in this function? Simply ignore the error?

I found that it may fail when the node is detached or when there is no buffer for the message.

So I think one solution is to define more HTTP status codes corresponding to these newly added error codes; I could do that after #532 is merged.

Another solution is to ignore all errors (simpler, but still reasonable).

For one thing, it addresses the no-buffer problem: even though this API call fails, any diagnostic information still held in the resource (received within the last 4 s because of another call) is valid and can be sent as the response.
For another, we could simply treat an empty response as "something went wrong; we can't get anything right now".

Simply ignore the error?

@tttttangTH Is this true for your changes? I don't see why we cannot handle the possible failures in this PR. It looks to me that keeping

else { ErrorHandler(aResponse, HttpStatusCode::kStatusInternalServerError); }

should just work. Why did you remove this error handler?

Because if we call otThreadSendDiagnosticGet too many times within a short period, the call fails with a no-buffer error (for example, when we send 10 diagnostics requests concurrently, the no-buffer error usually occurs).

But I actually prefer not to treat it as an error in the HTTP response, because the reason we call otThreadSendDiagnosticGet each time the server receives a diagnostics request is to refresh the information we maintain (a hash table records all the diagnostic information received within the last 4 seconds). A no-buffer error means we have called the API many times recently, so the information in the hash table is nearly up to date.

I know the right way to deal with this problem is to make an exception for the no-buffer case rather than remove the error handler. I plan to do this after #532, because I may need to rewrite the error handlers of all RESTful APIs according to the modifications in #532. So for now I just ignore it.
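To make the trade-off concrete, here is a sketch of the cache behaviour described above, assuming a 4-second freshness window and a send function that reports an OpenThread-style error name. `DiagnosticCache` and `send_diag_get` are hypothetical; the real logic lives in `src/rest/resource.cpp` as a C++ hash table. The refresh error is returned alongside the cached data rather than hidden, so the caller can still decide whether to surface it.

```python
import time


class DiagnosticCache:
    """Keeps the latest diagnostic payload per node for a fixed window.

    Hypothetical sketch: send_diag_get stands in for otThreadSendDiagnosticGet
    and returns an error name such as "OT_ERROR_NO_BUFS", or None on success.
    """

    def __init__(self, send_diag_get, window_s=4.0, clock=time.monotonic):
        self._send = send_diag_get
        self._window = window_s
        self._clock = clock
        self._entries = {}  # node key -> (timestamp, payload)

    def on_diag_response(self, key, payload):
        # Called from the receive callback whenever a diagnostic answer arrives.
        self._entries[key] = (self._clock(), payload)

    def handle_request(self):
        # Ask the network for a refresh; answers arrive later via the callback.
        error = self._send()
        now = self._clock()
        fresh = {k: p for k, (t, p) in self._entries.items() if now - t <= self._window}
        return error, fresh
```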

Why is a no-buffer error not appropriate here? I think a no-buffer error is exactly what happened, and it should be reflected in the HTTP response.

Maybe we can return 507 Insufficient Storage for a no-buffer error.

But if the test sends many requests and expects no-buffer errors to happen sometimes, the test client can still conclude with success even if some no-buffer errors occur.

Thoughts? @tttttangTH @wgtdkp

@tttttangTH You should not just remove the error handler. You can at least add a log for this error.
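One way to follow both suggestions, sketched under assumptions: the server maps the no-buffer error to 507 Insufficient Storage and logs every non-success error, while the test client tolerates 507 but still requires at least one successful response. The mapping table, helper names, and error-name strings (modeled on OpenThread's `OT_ERROR_*` names) are illustrative; only the 507 choice and the 500 fallback come from this discussion.

```python
import logging
from http import HTTPStatus

logger = logging.getLogger("rest-sketch")

# Hypothetical mapping from OpenThread-style error names to HTTP status codes.
OT_ERROR_TO_HTTP = {
    "OT_ERROR_NONE": HTTPStatus.OK,
    "OT_ERROR_NO_BUFS": HTTPStatus.INSUFFICIENT_STORAGE,  # 507, as suggested in review
}


def http_status_for(ot_error):
    """Server side: map an error and log it instead of dropping it silently."""
    # Unknown errors fall back to 500, like the original ErrorHandler branch.
    status = OT_ERROR_TO_HTTP.get(ot_error, HTTPStatus.INTERNAL_SERVER_ERROR)
    if status != HTTPStatus.OK:
        logger.warning("diagnostic request failed: %s -> %d", ot_error, status)
    return status


def check_concurrent_results(status_codes, tolerated=(HTTPStatus.INSUFFICIENT_STORAGE,)):
    """Client side: accept 200 and explicitly tolerated codes, but require one success."""
    allowed = {int(HTTPStatus.OK)} | {int(code) for code in tolerated}
    codes = [int(code) for code in status_codes]
    return all(code in allowed for code in codes) and int(HTTPStatus.OK) in codes
```

Keeping the tolerance on the client side means the server never hides the error, while a load test that deliberately exhausts buffers can still pass.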


I don't see why we need to wait for #532. Is there any REST feature that depends on #532?

I haven't found any other feature that depends on #532, but it seems we won't catch the no-buffer error at otThreadSetReceiveDiagnosticGetCallback until #532 is applied.

I noticed the no-buffer error when I first wrote this API; at that time it was only reported at Info level, and I chose to ignore it. But it was thoughtless of me to simply remove the error handler here.

simonlingoogle · 2020-10-21T07:53:48Z


Thanks for the fix. I merged this PR into #532 and it did pass all the checks.
But I am not clear on what exactly this PR fixes. Could you give a detailed explanation? @tttttangTH

wgtdkp · 2020-10-21T08:02:56Z

src/rest/resource.cpp

-        ErrorHandler(aResponse, HttpStatusCode::kStatusInternalServerError);
-    }
+    aResponse.SetStartTime(steady_clock::now());
+    aResponse.SetCallback();


This doesn't need to be addressed in this PR: the function name SetCallback is bad. I would expect this function to accept a callback function as an argument, but it is actually a flag indicating whether we have/enable a callback. Can we rename it to SetHasCallback?

OK, will do it in #537.

simonlingoogle · 2020-10-21T09:23:13Z

Labelled as P2 since #532 depends on this.

tttttangTH · 2020-10-21T15:31:54Z

Thanks for the fix. I merged this PR to #532 and it did pass all the checks.
But I am not clear what exactly does this PR fix. Any detailed explanation? @tttttangTH

After #532 is applied, several issues occur.

Some OpenThread API calls fail because the node is detached, so I added basic configuration for the node before rest-test starts.
One OpenThread API call still fails with no buffer, so I ignore this error for now (the OpenThread API call does fail, but I don't think it is a failure of our diagnostics API), and I will rewrite the error handler altogether after [backbone-router] add Backbone multicast routing #532 according to the new return value principle.
rest-test takes too much time and produces too many logs. It's strange; it seems to result from your PR: when I start the node, then stop otbr-agent and restart it, many logs keep appearing and never stop. This is why rest-test produces so many logs. After factoryreset, things are back to normal, so I also added a factoryreset operation on the node before rest-test, which led to our discussion about the callback.

wgtdkp · 2020-10-22T03:20:12Z

One OpenThread API call still fails with no buffer, so I ignore this error for now (the OpenThread API call does fail, but I don't think it is a failure of our diagnostics API), and I will rewrite the error handler altogether after [backbone-router] add Backbone multicast routing #532 according to the new return value principle.

@tttttangTH I think we are wrong about the principle of testing. As I understand it, you are just removing the error check in the REST tests to get #532 to pass. But the purpose of tests is to find bugs. We are not sure whether #532 has bugs, so it could be #532 that causes the existing test cases to fail. So I think you should not change the tests just to make PRs pass. Otherwise, why bother having those tests?

Before figuring out why #532 fails the tests, we should hold this PR.

tttttangTH · 2020-10-22T05:01:38Z


OK. From my side, setting aside the no-buffer issue, I think another issue is that if there is no factoryreset after restarting otbr-agent, rest-check always fails and outputs too many logs. I am not sure whether skipping factoryreset is acceptable. It seems to result from #532; I will investigate and see why.

wgtdkp · 2020-10-22T06:14:21Z


@tttttangTH OpenThread persists data across restarts, so it is expected to keep functioning after restarting. factoryreset, as its name says, resets the device to its factory state, and it is a usual action in products. The too-many-logs issue is due to a bug in otbr with the latest openthread; see the failure https://github.com/openthread/ot-br-posix/pull/594/checks?check_run_id=1290402004 in @simonlingoogle 's new PR.

simonlingoogle · 2020-10-22T08:04:33Z

The too many logs issue is due to a bug of otbr with latest openthread, see the fail https://github.com/openthread/ot-br-posix/pull/594/checks?check_run_id=1290402004 in @simonlingoogle 's new PR.

Yes. It's not an issue of otbr-rest. But we can still use this PR to enhance otbr-rest, e.g. configure test timeout.
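For the test timeout, a minimal sketch assuming the concurrent-request structure discussed earlier in this thread: each worker runs in a daemon thread, the main thread waits at most `deadline_s` seconds in total, and any request still pending is reported as a `TimeoutError`. `run_with_deadline` is a hypothetical helper; in the real test, `fetch` would presumably wrap `urllib.request.urlopen(url, timeout=...)`, whose `timeout` argument bounds blocking socket operations for a single request.

```python
import threading
import time


def run_with_deadline(fetch, urls, deadline_s):
    """Run fetch(url) in daemon threads; fail if any is still running after deadline_s."""
    result = [None] * len(urls)

    def worker(index, url):
        result[index] = fetch(url)

    threads = [threading.Thread(target=worker, args=(i, url), daemon=True)
               for i, url in enumerate(urls)]
    for t in threads:
        t.start()

    end = time.monotonic() + deadline_s
    for t in threads:
        # join() with a timeout returns even if the thread is still running,
        # so the total wait across all threads is bounded by deadline_s.
        t.join(max(0.0, end - time.monotonic()))

    timed_out = [i for i, t in enumerate(threads) if t.is_alive()]
    if timed_out:
        raise TimeoutError(f"requests still pending after {deadline_s}s: {timed_out}")
    return result
```

Daemon threads are used so that a hung request cannot keep the interpreter alive after the test has already failed.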

tttttangTH · 2020-10-22T17:03:20Z


OK, will update this PR according to our discussion.

wgtdkp · 2020-10-23T01:25:43Z


@tttttangTH I think it would be better to close this PR and create a new PR with clearer purpose.

tttttangTH · 2020-10-23T10:05:11Z


Will do!

tttttangTH · 2020-10-26T04:13:48Z

This PR will be closed and divided into several PRs with clear purposes, which aim to:

Enhance tests for OTBR_REST
Modify the error handler according to OpenThread API responses
(If possible) Deal with coordination issues among different modules of ot-br-posix on otThreadSetReceiveDiagnosticGetCallback
Others (for example, does the SetCallback change need to be done in an individual PR or in [web] refactor OTBR-WEB #537? @wgtdkp)

wgtdkp · 2020-10-26T05:40:52Z


@tttttangTH We need to make sure #537 always works and stays mergeable.

tttttangTH · 2020-10-28T14:18:03Z


👌！

… pr/593

google-cla bot added the cla: yes label Oct 19, 2020

tttttangTH force-pushed the rest_build branch from 1e2ef96 to a43127a October 19, 2020 21:11

[web] fix rest-test issues

a43127a

simonlingoogle reviewed Oct 20, 2020


simonlingoogle mentioned this pull request Oct 20, 2020

[backbone-router] add Backbone multicast routing #532

Merged

5 tasks

wgtdkp approved these changes Oct 20, 2020


resolve comment

7a4769c

wgtdkp reviewed Oct 21, 2020


wgtdkp self-requested a review October 21, 2020 08:05

simonlingoogle added the P2 label Oct 21, 2020

Base automatically changed from master to main March 8, 2021 21:50

bukepo force-pushed the rest_build branch 4 times, most recently from c4f0ba5 to 598be52 May 13, 2021 04:40

Merge branch 'main' of https://github.com/openthread/ot-br-posix into…

50baa34

… pr/593

bukepo force-pushed the rest_build branch from 598be52 to 50baa34 May 13, 2021 04:50

Merge branch 'main' of https://github.com/openthread/ot-br-posix into…

0e7a66a

… pr/593
