Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[web] fix rest-test issues #593

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

tttttangTH
Copy link
Contributor

This PR mainly fixes some issues on current tests of OTBR-REST, and makes some preparations for #532 .

  • set timeout for rest-test in case it takes too much time
  • some modifications on rest-test principles & HTTP response, so this test will work properly when more error code of underlying APIs are defined
  • bind ReceiveDiagnosticGetCallback to resource each time when diagnostics API is called(consistent with rest-test modifications and more logistic)

@google-cla google-cla bot added the cla: yes label Oct 19, 2020
@tttttangTH
Copy link
Contributor Author

@simonlingoogle @wgtdkp any comments?

@codecov
Copy link

codecov bot commented Oct 19, 2020

Codecov Report

Merging #593 (0e7a66a) into main (5a142e7) will decrease coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #593      +/-   ##
==========================================
- Coverage   71.12%   71.11%   -0.01%     
==========================================
  Files          78       78              
  Lines        5240     5235       -5     
==========================================
- Hits         3727     3723       -4     
+ Misses       1513     1512       -1     
Impacted Files Coverage Δ
src/rest/resource.hpp 0.00% <ø> (ø)
src/rest/rest_web_server.cpp 86.74% <ø> (-0.16%) ⬇️
src/rest/resource.cpp 90.98% <100.00%> (+0.25%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5a142e7...0e7a66a. Read the comment docs.

Comment on lines 41 to 49
try:
response = urllib.request.urlopen(urllib.request.Request(url))
body = response.read()
data = json.loads(body)
result[index] = data


except urllib.error.HTTPError as e:
print(e.code)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is raise exception not good?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, why catching the exception?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expect to just record the error here then raise one exception in main thread(on top of all subprocess), so for each subprocess, its job is either to get data(success) or to print error log(fail).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then raise one exception in main thread

@tttttangTH I didn't follow how this is implemented.

Let me ask a question: Will the CI fail if a urllib.error.HTTPError exception is raised in this method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, CI will fail.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, CI will fail.

@tttttangTH Could you explain how it works?
I don't see how it can make the CI fail, you are just printing a message...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an array result which is used to record the response of each thread. if one thread gets HTTP error, the value in result will still be None, we will check this in main thread.

@@ -518,31 +513,22 @@ void Resource::UpdateDiag(std::string aKey, std::vector<otNetworkDiagTlv> &aDiag

void Resource::Diagnostic(const Request &aRequest, Response &aResponse) const
{
otbrError error = OTBR_ERROR_NONE;
otThreadSetReceiveDiagnosticGetCallback(mInstance, &Resource::DiagnosticResponseHandler,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why setting callback for each Diagnostic is necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, If factoryreset is called, the callback may be bind to another function. We should ensure that each time diagnostic API in resource is called, the response message can be collected by our resource handler.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After factoryreset, Resource::Init will run again and sets up the callback ? Is the original code causing real issues?

Copy link
Contributor Author

@tttttangTH tttttangTH Oct 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resource::Init will only be run once when we start otbr-agent, then factoryreset will set the callback to the default(seems a handler for CLI), I am not sure whether there is another approach that set otThreadSetReceiveDiagnosticGetCallback to another function. do you think we do this each time before diagnostics API is called is acceptable? or is there an approach that we could detect whether factoryreset is called?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resource::Init will only be run once when we start otbr-agent, then factoryreset will set the callback to the default(seems a handler for CLI), I am not sure whether there is another approach that set otThreadSetReceiveDiagnosticGetCallback to another function. do you think we do this each time before diagnostics API is called is acceptable? or is there an approach that we could detect whether factoryreset is called?

I think factoryreset re-exec the whole process, so the process restarts as if it was just launched normally. We should not blame factoryreset for any issue. If multiple components of ot-br-posix is using otThreadSetReceiveDiagnosticGetCallback, maybe they should coordinate with each other.
Should we use EventEmitter so that ot-br-posix calls otThreadSetReceiveDiagnosticGetCallback just once and let other components subscribe to the corresponding event using EventEmitter::On? @gjc13 @tttttangTH

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, will try to do like this.

Copy link
Member

@wgtdkp wgtdkp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

tests/rest/test_rest.py Outdated Show resolved Hide resolved
Comment on lines 41 to 49
try:
response = urllib.request.urlopen(urllib.request.Request(url))
body = response.read()
data = json.loads(body)
result[index] = data


except urllib.error.HTTPError as e:
print(e.code)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, why catching the exception?

Comment on lines -537 to +531
if (error == OTBR_ERROR_NONE)
{
aResponse.SetStartTime(steady_clock::now());
aResponse.SetCallback();
}
else
{
ErrorHandler(aResponse, HttpStatusCode::kStatusInternalServerError);
}
aResponse.SetStartTime(steady_clock::now());
aResponse.SetCallback();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, what happens if we fail in this function? Simply ignore the error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found that It may fail when node is detached or no buffer for message.

So I think one solution is to define more HTTP status code according to these newly added error code, I think I could do it after #532 is merged.

Another solution is like this, we ignore all errors(more simple, but still reasonable).

For one thing, it could address the no buffer problem - although we fail in this API call, if we have diagnostic information(received in 4s but due to another call) left in resource, this information is still valid and could be sent as response.
For another thing, we could just view empty response as "some issues happen, we can't get anything now".

Copy link
Member

@wgtdkp wgtdkp Oct 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simply ignore the error?

@tttttangTH Is this true for your changes? I didn't see why we cannot handle the possible failures in this PR. It looks to me that keeping

else
    {
        ErrorHandler(aResponse, HttpStatusCode::kStatusInternalServerError);
    }

should just work. Why did you remove this error handler?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because if we call otThreadSendDiagnosticGet too many times within a specific period, we will get a no buffer error for this call( for example, when we send 10 requests for diagnostics concurrently , the no buffer error usually occurs).

But actually I prefer not viewing it as an error for HTTP response because the reason for calling otThreadSendDiagnosticGet each time when the server received a request for diagnostics is to update the information we maintained(we have a hash table to record all the diagnostic information we received within 4 seconds). If we have a 'no buffer' error, it means we have call the API many times recently so information in the hash table is almost the latest.

I know the right way to deal with this problem is to make an exception for no buffer rather than remove the error handler. I expect to do this after #532 because I think I may need to rewrite error handler of all RESTful API according to the modifications in #532 . So here I just ignore it currently.

Copy link
Member

@simonlingoogle simonlingoogle Oct 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is no buffer error not good for such situation? I think no buffer error was exactly what happened and should be reflected by the HTTP response.

Maybe we can return 507 Insufficient Storage for no buffer error.

But if the test is sending too much requests and is expecting no buffer error to happen sometimes, the test client can conclude with success even if there are some no buffer errors.

Thoughts? @tttttangTH @wgtdkp

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tttttangTH You should not just remove the error handler. You can at least add a log for this error.

I know the right way to deal with this problem is to make an exception for no buffer rather than remove the error handler. I expect to do this after #532 because I think I may need to rewrite error handler of all RESTful API according to the modifications in #532 .

I don't see why we need to wait for #532. Are there any REST feature that depends on #532?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have't found any other feature depend on #532 , but it seems we won't catch no buffer at otThreadSetReceiveDiagnosticGetCallback before #532 is applied.

I noticed no buffer before when when I wrote this API, at that time, no buffer was just an Info, and I have chosen to ignore it at that time. But It‘s thoughtless for me to just remove the error handler here.

@simonlingoogle
Copy link
Member

This PR mainly fixes some issues on current tests of OTBR-REST, and makes some preparations for #532 .

  • set timeout for rest-test in case it takes too much time
  • some modifications on rest-test principles & HTTP response, so this test will work properly when more error code of underlying APIs are defined
  • bind ReceiveDiagnosticGetCallback to resource each time when diagnostics API is called(consistent with rest-test modifications and more logistic)

Thanks for the fix. I merged this PR to #532 and it did pass all the checks.
But I am not clear what exactly does this PR fix. Any detailed explanation? @tttttangTH

ErrorHandler(aResponse, HttpStatusCode::kStatusInternalServerError);
}
aResponse.SetStartTime(steady_clock::now());
aResponse.SetCallback();
Copy link
Member

@wgtdkp wgtdkp Oct 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't need to address it in this PR: the function name SetCallback is bad... I would expect this function accepts a callback function as an argument but it is actually a flag indicates whether we have/enable callback. Can we rename to SetHasCallback?

Copy link
Contributor Author

@tttttangTH tttttangTH Oct 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, will do it in #537

@wgtdkp wgtdkp self-requested a review October 21, 2020 08:05
@simonlingoogle
Copy link
Member

Labelled as P2 since #532 depends on this.

@tttttangTH
Copy link
Contributor Author

This PR mainly fixes some issues on current tests of OTBR-REST, and makes some preparations for #532 .

  • set timeout for rest-test in case it takes too much time
  • some modifications on rest-test principles & HTTP response, so this test will work properly when more error code of underlying APIs are defined
  • bind ReceiveDiagnosticGetCallback to resource each time when diagnostics API is called(consistent with rest-test modifications and more logistic)

Thanks for the fix. I merged this PR to #532 and it did pass all the checks.
But I am not clear what exactly does this PR fix. Any detailed explanation? @tttttangTH

After #532 is applied, several issues occur.

  • some Openthread API call fail for detached. So I add basic configuration for the node before rest-test starts.

  • one Openthread API still fails for no buffer. So I ignore this error currently (It does fail for an Openthread API call but I don't think it is a failure for our diagnostics API ), and will rewrite the error handler altogether after [backbone-router] add Backbone multicast routing #532 according to the new return value principle.

  • rest-test takes too much time, gives too much log. It's strange, seems it is result from your PR, when I start the node, then stop otbr-agent and restart it again, many logs keep appearing and never stop. This is why we have too much logs in rest-test. After factoryreset, it is back to normal. so I also add a factoryreset operation to the node before rest-test which leads to our discussion on callback.

@wgtdkp
Copy link
Member

wgtdkp commented Oct 22, 2020

  • one Openthread API still fails for no buffer. So I ignore this error currently (It does fail for an Openthread API call but I don't think it is a failure for our diagnostics API ), and will rewrite the error handler altogether after [backbone-router] add Backbone multicast routing #532 according to the new return value principle.

@tttttangTH I think we are wrong about the principle of testing. To my understanding, you are just removing the error check in REST tests to get #532 passed. But the purpose of tests is to find out bugs. We are not sure if #532 has bugs or not, so it could be #532 that causes existing testcases to fail. So I think you should not change the tests to make PRs happy. Otherwise, why bother have those tests?

Before figuring out why #532 fails the tests, we should hold this PR.

@tttttangTH
Copy link
Contributor Author

tttttangTH commented Oct 22, 2020

  • one Openthread API still fails for no buffer. So I ignore this error currently (It does fail for an Openthread API call but I don't think it is a failure for our diagnostics API ), and will rewrite the error handler altogether after [backbone-router] add Backbone multicast routing #532 according to the new return value principle.

@tttttangTH I think we are wrong about the principle of testing. To my understanding, you are just removing the error check in REST tests to get #532 passed. But the purpose of tests is to find out bugs. We are not sure if #532 has bugs or not, so it could be #532 that causes existing testcases to fail. So I think you should not change the tests to make PRs happy. Otherwise, why bother have those tests?

Before figuring out why #532 fails the tests, we should hold this PR.

Ok, from my side, if we put aside the no buffer issue, I think another issue is if there is no factoryreset after restarting otbr-agent, rest-check always fails and output too many logs. I am not sure whether no factoryreset is acceptable, Seems it's result from #532, I will investigate on this and see why.

@wgtdkp
Copy link
Member

wgtdkp commented Oct 22, 2020

  • one Openthread API still fails for no buffer. So I ignore this error currently (It does fail for an Openthread API call but I don't think it is a failure for our diagnostics API ), and will rewrite the error handler altogether after [backbone-router] add Backbone multicast routing #532 according to the new return value principle.

@tttttangTH I think we are wrong about the principle of testing. To my understanding, you are just removing the error check in REST tests to get #532 passed. But the purpose of tests is to find out bugs. We are not sure if #532 has bugs or not, so it could be #532 that causes existing testcases to fail. So I think you should not change the tests to make PRs happy. Otherwise, why bother have those tests?
Before figuring out why #532 fails the tests, we should hold this PR.

Ok, from my side, if we put aside the no buffer issue, I think another issue is if there is no factoryreset after restarting otbr-agent, rest-check always fails and output too many logs. I am not sure whether no factoryreset is acceptable, Seems it's result from #532, I will investigate on this and see why.

@tttttangTH OpenThread persists data across restarts. It is expected that OpenThread will continue its function after resatrting. factoryreset, by its name, is to reset the device to the factory mode and its an usual action in products. The too many logs issue is due to a bug of otbr with latest openthread, see the fail https://github.com/openthread/ot-br-posix/pull/594/checks?check_run_id=1290402004 in @simonlingoogle 's new PR.

@simonlingoogle
Copy link
Member

The too many logs issue is due to a bug of otbr with latest openthread, see the fail https://github.com/openthread/ot-br-posix/pull/594/checks?check_run_id=1290402004 in @simonlingoogle 's new PR.

Yes. It's not an issue of otbr-rest. But we can still use this PR to enhance otbr-rest, e.g. configure test timeout.

@tttttangTH
Copy link
Contributor Author

The too many logs issue is due to a bug of otbr with latest openthread, see the fail https://github.com/openthread/ot-br-posix/pull/594/checks?check_run_id=1290402004 in @simonlingoogle 's new PR.

Yes. It's not an issue of otbr-rest. But we can still use this PR to enhance otbr-rest, e.g. configure test timeout.

ok, will update this PR according to our discussion.

@wgtdkp
Copy link
Member

wgtdkp commented Oct 23, 2020

The too many logs issue is due to a bug of otbr with latest openthread, see the fail https://github.com/openthread/ot-br-posix/pull/594/checks?check_run_id=1290402004 in @simonlingoogle 's new PR.

Yes. It's not an issue of otbr-rest. But we can still use this PR to enhance otbr-rest, e.g. configure test timeout.

ok, will update this PR according to our discussion.

@tttttangTH I think it would be better to close this PR and create a new PR with clearer purpose.

@tttttangTH
Copy link
Contributor Author

The too many logs issue is due to a bug of otbr with latest openthread, see the fail https://github.com/openthread/ot-br-posix/pull/594/checks?check_run_id=1290402004 in @simonlingoogle 's new PR.

Yes. It's not an issue of otbr-rest. But we can still use this PR to enhance otbr-rest, e.g. configure test timeout.

ok, will update this PR according to our discussion.

@tttttangTH I think it would be better to close this PR and create a new PR with clearer purpose.

Will do!

@tttttangTH
Copy link
Contributor Author

tttttangTH commented Oct 26, 2020

This PR will be closed and will be divided into several PRs with clear purpose in which they aim to:

  • Enhance tests for OTBR_REST
  • Modify error handler for according to openthread API response
  • (If possible)Deal with collaboration issues within different modules of ot-br-posix on otThreadSetReceiveDiagnosticGetCallback
  • Others( For example, does the change SetCallback need to be done in an individual PR or in [web] refactor OTBR-WEB #537 ? @wgtdkp )

@wgtdkp
Copy link
Member

wgtdkp commented Oct 26, 2020

This PR will be closed and will be divided into several PRs with clear purpose in which they aim to:

  • Enhance tests for OTBR_REST
  • Modify error handler for according to openthread API response
  • (If possible)Deal with collaboration issues within different modules of ot-br-posix on otThreadSetReceiveDiagnosticGetCallback
  • Others( For example, does the change SetCallback need to be done in an individual PR or in [web] refactor OTBR-WEB #537 ? @wgtdkp )

@tttttangTH We need to make sure #537 always work and mergeable.

@tttttangTH
Copy link
Contributor Author

This PR will be closed and will be divided into several PRs with clear purpose in which they aim to:

  • Enhance tests for OTBR_REST
  • Modify error handler for according to openthread API response
  • (If possible)Deal with collaboration issues within different modules of ot-br-posix on otThreadSetReceiveDiagnosticGetCallback
  • Others( For example, does the change SetCallback need to be done in an individual PR or in [web] refactor OTBR-WEB #537 ? @wgtdkp )

@tttttangTH We need to make sure #537 always work and mergeable.

👌!

Base automatically changed from master to main March 8, 2021 21:50
@bukepo bukepo force-pushed the rest_build branch 4 times, most recently from c4f0ba5 to 598be52 Compare May 13, 2021 04:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants