Control SD sync/async behaviour with env var on QNX #710

Draft: kheaactua wants to merge 1 commit into master

Contributor

@kheaactua kheaactua commented May 27, 2024

Description

This change is intended for systems like QNX, which generally do not have a service like netlink monitoring the network status that can easily be tied into. It is not set up to be used or enabled on Linux/Android.

This lets us launch and use a routing manager with local UDS communication before remote networking is available, which simplifies our startup graph and saves time during startup. In contrast, with the current version SD fails to load if networking is not available at process start, leaving the routing manager in an error state.

I have been using this change for 1-2 years now and it has been very beneficial.

Usage

This behaviour can be enabled by exporting the following environment variables:

# Env var that, if it exists, causes SD setup to be performed asynchronously and
# to wait on network availability.  This wait also makes the routing_manager
# wait until a network interface is available before issuing OFFERs
export VSOMEIP_USE_ASYNCHRONOUS_SD=1

# Iff SD is running synchronously, the existence of this env var will cause the
# SD setup to still block on network availability. (mostly a testing scenario)
export VSOMEIP_WAIT_FOR_INTERFACE=1

# The current waiting mechanism is to block until the file specified by this
# variable exists
export VSOMEIP_NETWORK_INT_READY_FILE=<file path of file created when network available>
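
To make the precedence between the first two variables concrete, here is a minimal sketch of how the mode could be selected. It is not the code in this PR: `sd_start_mode`, `env_exists` and `select_sd_start_mode` are hypothetical names, and the only assumption about the real implementation is that it tests for the existence of each variable rather than its value.

```cpp
#include <cstdlib>

// Hypothetical names for illustration only; not taken from vsomeip.
enum class sd_start_mode {
    async_wait,   // VSOMEIP_USE_ASYNCHRONOUS_SD exists
    sync_wait,    // only VSOMEIP_WAIT_FOR_INTERFACE exists
    sync_no_wait  // neither exists: closest to upstream behaviour
};

// Only the existence of the variable matters, not its value.
inline bool env_exists(const char* name) {
    return std::getenv(name) != nullptr;
}

inline sd_start_mode select_sd_start_mode() {
    if (env_exists("VSOMEIP_USE_ASYNCHRONOUS_SD")) {
        return sd_start_mode::async_wait;
    }
    if (env_exists("VSOMEIP_WAIT_FOR_INTERFACE")) {
        return sd_start_mode::sync_wait;
    }
    return sd_start_mode::sync_no_wait;
}
```

Note that `VSOMEIP_WAIT_FOR_INTERFACE` is only consulted when the asynchronous variable is absent, which matches the "iff SD is running synchronously" wording above.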

This is implemented by modifying the signature of service_discovery::start() so that it accepts a callback sent from routing_manager_impl.

Notes:

  • mutexes
    • sd_impl::endpoint_ is now mutex protected
    • rm_impl::pending_sd_offers_mutex_ is now a recursive mutex, as it can now be locked both from its own thread and from the new thread in SD
  • There is no timeout on the waitfor. The original implementation had a configurable timeout; however, because timing out left us in an error state anyway, this timeout was removed (raised to numeric_limits<int>::max() = ~45 days, give or take.)

@kheaactua kheaactua force-pushed the async_sd branch 2 times, most recently from 7522cf0 to 9855bca Compare May 28, 2024 15:45
@kheaactua kheaactua marked this pull request as draft May 28, 2024 15:46
This is a change intended for systems like QNX which do not generally
have a service like netlink monitoring the network status that can
easily be tied into.  This is not set up to be used/enabled on Linux/Android.

SD will use the new asynchronous behaviour if the env var
VSOMEIP_USE_ASYNCHRONOUS_SD is set.  If not set, SD will use a
synchronous behaviour and only wait for the interface if the env var
VSOMEIP_WAIT_FOR_INTERFACE exists.  The latter (without waiting) is very
close to the upstream behaviour, the difference being that a callback is
still sent to sd::start() from routing_manager_impl, rather than
executing that code directly in routing_manager_impl.

Notes:
- mutexes
  - sd_impl::endpoint_ is now mutex protected
  - rm_impl::pending_sd_offers_mutex_ is now a recursive mutex, as it
    can now be locked both from its own thread and from the new thread in SD
- There is no timeout on the waitfor.  The original implementation had a
  configurable timeout; however, because timing out left us in an error
  state anyway, this timeout was removed (raised to
  numeric_limits<int>::max() = ~45 days, give or take.)
@@ -71,7 +82,7 @@ service_discovery_impl::service_discovery_impl(
 find_debounce_time_(VSOMEIP_SD_DEFAULT_FIND_DEBOUNCE_TIME),
 find_debounce_timer_(_host->get_io()),
 main_phase_timer_(_host->get_io()),
-is_suspended_(false),
+is_suspended_(true), // Start suspended: this is different than upstream as we start before a network interface is available
Contributor Author

This introduced a bug where SD tries to start multicast a second time, leading to an error in the log.
