improve robustness #182

Merged
merged 9 commits into from
Nov 28, 2023

@qzhuyan qzhuyan commented Nov 20, 2023

No description provided.

qzhuyan commented Nov 20, 2023

Overall, the listener supervision tree looks like this:

v5.3.1-alpha.1-g42fa1289([email protected])34> esockd:listeners().
[{{'tcp:default',{{0,0,0,0},1883}},<0.2626.0>},
 {{'ssl:default',{{0,0,0,0},8883}},<0.2606.0>}]
v5.3.1-alpha.1-g42fa1289([email protected])35> supervisor:which_children( <0.2626.0>).
[{listener,<0.2629.0>,worker,[esockd_listener]},
 {acceptor_sup,<0.2628.0>,supervisor,[esockd_acceptor_sup]},
 {connection_sup,<0.2627.0>,supervisor,
                 [esockd_connection_sup]}]

@qzhuyan qzhuyan force-pushed the dev/william/improve-robustness branch from 4b4127f to 43af862 Compare November 20, 2023 20:20
@@ -210,8 +211,9 @@ handle_event(Type, Content, StateName, _) ->
    ),
    keep_state_and_data.

terminate(_Reason, _StateName, #state{lsock = LSock}) ->
    close(LSock).

Contributor Author

Removed it because I don't want errors to spread to the entire supervision tree.

@qzhuyan qzhuyan force-pushed the dev/william/improve-robustness branch 2 times, most recently from 7750b3e to 9b0f0fc Compare November 21, 2023 10:40
zmstone commented Nov 21, 2023

econnaborted - Software caused connection abort

econnaborted may be an error that happens while the socket is being closed.

  • If it's a transient error, the accept could be retried, e.g. PR #28 ("Fix issue#27 - {accept_error,econnaborted} crasher") treated it as transient.
  • If it's a permanent error, maybe the best strategy is to:
    • close the socket, and stop with reason normal
    • closing the socket will trigger the listener process to restart, which will in turn restart the acceptors due to rest_for_one

If we cannot count on retrying, then closing the socket and stopping with reason normal is perhaps the better choice.
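
The two options above can be sketched as a small decision function. This is only an illustration, not esockd's code: the module name `accept_policy_sketch`, the functions `classify/1` and `decide/1`, and the returned tuples are all hypothetical, and treating `econnaborted` as transient is an assumption taken from the first bullet.

```erlang
-module(accept_policy_sketch).
-export([classify/1, decide/1]).

%% Hypothetical classification of accept errors.
%% ASSUMPTION: econnaborted is treated as transient, as in the first
%% bullet above; every other reason is treated as permanent.
-spec classify(term()) -> transient | permanent.
classify(econnaborted) -> transient;
classify(_Other) -> permanent.

%% Decide what an acceptor should do with the result of an accept call.
%% The return values are illustrative tags, not gen_statem return tuples.
-spec decide({ok, term()} | {error, term()}) ->
          continue | retry | {close_socket, {stop, normal}}.
decide({ok, _}) ->
    continue;
decide({error, Reason}) ->
    case classify(Reason) of
        %% Transient: issue the accept again.
        transient -> retry;
        %% Permanent: close the socket, then stop with reason normal.
        permanent -> {close_socket, {stop, normal}}
    end.
```

A caller would translate `retry` into another accept attempt and `{close_socket, {stop, normal}}` into closing the socket before returning `{stop, normal, State}`.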

@qzhuyan qzhuyan force-pushed the dev/william/improve-robustness branch from 3380bb9 to cbf63b5 Compare November 22, 2023 13:25
        {error, closed} ->
            {stop, normal, State};
        {error, Reason} ->
            error_logger:error_msg("~p async_accept error: ~p", [?MODULE, Reason]),

Member

Close the socket before stopping,
so that the listener process is sure to get an EXIT signal?

Contributor Author

Just recording the outcome of our discussion:
We don't close the listen port here, to avoid spreading the error to the other acceptors (which would then get a closed error), since we don't know whether the error comes from the listen port or is some other unknown POSIX error.
AND we have a listener process that handles the EXIT signal from the listen port.
AND we trust OTP to trigger port signals if the listen port becomes unusable or can no longer accept.
AND we checked that inet_tcp:accept does not close the port when it gets an error, so we follow it.
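
As a rough illustration of the second point, a listener process that traps exits could react to the listen port's EXIT message along these lines. This is a minimal sketch under stated assumptions, not the code merged in this PR: the module `listener_exit_sketch`, the function `on_info/2`, the map-based state, and the `lsock_closed` stop reason are all hypothetical, and the return tuples simply follow the `gen_server` `handle_info` convention.

```erlang
-module(listener_exit_sketch).
-export([on_info/2]).

%% ASSUMPTION: the state is a map holding the listen socket under lsock,
%% and the owning process has called process_flag(trap_exit, true), so
%% exit signals from the linked port arrive as {'EXIT', Port, Reason}.
-spec on_info(term(), map()) ->
          {stop, {lsock_closed, term()}, map()} | {noreply, map()}.
on_info({'EXIT', LSock, Reason}, #{lsock := LSock} = State) ->
    %% The listen port is gone: stop with a non-normal reason so that
    %% the supervisor can apply its restart strategy.
    {stop, {lsock_closed, Reason}, State};
on_info(_Other, State) ->
    %% EXIT messages from anything else (or unrelated messages) are ignored.
    {noreply, State}.
```

With this split, an acceptor that hits an unknown error only stops itself, and the decision to tear down the rest of the tree stays with the process that owns the listen port.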

@qzhuyan qzhuyan merged commit 5cb22a8 into master Nov 28, 2023
8 checks passed
@zmstone zmstone deleted the dev/william/improve-robustness branch December 6, 2023 18:18
2 participants