[WIP] Centralized management of edge devices and services #121

Open
1 of 4 tasks
tiokim opened this issue Aug 20, 2020 · 17 comments
Assignees
Labels
enhancement New feature or request

Comments

@tiokim
Contributor

tiokim commented Aug 20, 2020

Is your feature request related to a problem? Please describe.
The current method of selecting an edge device for service execution is decentralized and complex. The edge orchestrator on the requesting node collects the final scores from the other edge devices and offloads the service to the highest-scoring device.

Describe the solution you'd like
A single edge orchestrator controls all other edge nodes and services.
The following features are required:

  • Collect resource information from other devices
  • Metrics for selecting primary/secondary
  • Forward service requests from the requester to the edge orchestrator
  • Notify the requester of the service offloading result
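
A minimal sketch of how these features could fit together, not the project's actual design. The `ResourceReport` fields, the `CentralOrchestrator` class and the weights in `score` are all hypothetical names and values chosen for illustration. Metrics for selecting primary/secondary are left out here.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceReport:
    """Resource snapshot a device sends to the orchestrator (hypothetical fields)."""
    device_id: str
    cpu_free: float        # fraction of CPU idle, 0.0 to 1.0
    mem_free_mb: float     # free memory in MB
    bandwidth_mbps: float  # available network bandwidth
    services: set = field(default_factory=set)


class CentralOrchestrator:
    """Single orchestrator that tracks every device and places services."""

    def __init__(self):
        self.reports = {}

    def collect(self, report):
        # Collect resource information from other devices.
        self.reports[report.device_id] = report

    def score(self, r):
        # Hypothetical weighted sum; not the project's actual formula.
        return 50 * r.cpu_free + r.mem_free_mb / 100 + r.bandwidth_mbps / 10

    def handle_request(self, requester_id, service):
        # The requester forwards its service request to this orchestrator.
        candidates = [r for r in self.reports.values()
                      if service in r.services and r.device_id != requester_id]
        target = max(candidates, key=self.score).device_id if candidates else None
        # The returned dict is the offloading result notified to the requester.
        return {"requester": requester_id, "service": service, "target": target}
```

With reports from two devices that both offer a service, `handle_request` returns the one with the higher hypothetical score, or `target=None` when no device offers the service.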
@MoonkiHong
Contributor

@suresh-lc @Karthikeyan-Samsung PTAL. The current scoring mechanism seems a bit inefficient to me, because the edge candidates each calculate their own capacity as a score and just send that result to the master. Instead, the master could calculate those scores itself from the capacity factors the candidates transmit. That seems a better fit for the master. What do you think?

@suresh-lc
Contributor

suresh-lc commented Aug 21, 2020

Our current approach is not a primary-secondary kind. The device details (memory/CPU/bandwidth) are exchanged with the other devices during initial discovery. Hence, when device 'A' wants to offload a service, device 'A' itself calculates the scores of all the other devices (whose details it has stored), and the offloading happens. So, just to clarify: the scores are not fetched from other devices when an application asks edge orchestration to offload a service. Hence, making this scoring more dynamic (real-time), based on current memory/CPU/bandwidth, would be a good point of improvement.

As pointed out, @t25kim wants a centralized approach (master).
The following are some of the considerations in that case:

  1. Selecting a primary node
  2. Managing the scenario where the primary node goes down, say with a secondary master, if we would like to.
  3. Any request from Device A to B needs to go through the primary first, to identify Device B.
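
A rough sketch of these three considerations, under assumptions of my own: devices are plain dicts with hypothetical keys `id`, `alive`, `uptime_s` and `mem_free_mb`, and the election metric (longest uptime, then most free memory) is only a placeholder for whatever metric is eventually chosen.

```python
def elect(devices):
    """Return (primary, secondary) device ids ranked by a hypothetical metric."""
    alive = [d for d in devices if d["alive"]]
    # Rank by uptime first, then free memory; ties broken by id for determinism.
    ranked = sorted(alive, key=lambda d: (-d["uptime_s"], -d["mem_free_mb"], d["id"]))
    primary = ranked[0]["id"] if ranked else None
    secondary = ranked[1]["id"] if len(ranked) > 1 else None
    return primary, secondary


def route(devices, primary, target_id):
    """Resolve target_id through the primary, re-electing if the primary is down."""
    by_id = {d["id"]: d for d in devices}
    if primary is None or not by_id[primary]["alive"]:
        # Primary is down: re-run the election among the devices still alive.
        # With unchanged metrics this promotes the old secondary.
        primary, _ = elect(devices)
    # The lookup of the target device goes through the (possibly new) primary.
    target = by_id.get(target_id)
    reachable = target is not None and target["alive"]
    return primary, (target_id if reachable else None)
```

`route` returns the primary that handled the lookup and the target id, or `None` for the target if it is unknown or down.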

@MoonkiHong
Contributor

MoonkiHong commented Aug 24, 2020

@suresh-lc The point is not the primary/secondary topology. The current scoring mechanism is as follows (at least at the source code level, a.k.a. scoringmgr):

  1. Each candidate calculates its score and does all the required processing by themselves.
  2. Actually, the offloading is required when the edge-orchestrator decides to do so, so the score calculation should be done in that orchestrator instead of the candidate itself.
  3. The suggestion here starts by looking at that scenario.

If you look at a truly disaggregated topology, a scenario like "primary node goes down" should be handled by another delegate, which only takes care of the status DB and so on in the Home Edge scenario. I am not sure the current scoring mechanism is the best option.

@MoonkiHong
Contributor

MoonkiHong commented Aug 24, 2020

@tiokim
Contributor Author

tiokim commented Aug 24, 2020

I like the idea of changing the term master/slave to primary/secondary.
Additionally, I'd like to suggest the term blocklist/allowlist instead of blacklist/whitelist.

@suresh-lc
Contributor

BTW, let us call master-slave by alternative terminologies. What about Primary/Secondary?

Reference:
https://www.zdnet.com/article/linux-team-approves-new-terminology-bans-terms-like-blacklist-and-slave/
https://www.theserverside.com/opinion/Master-slave-terminology-alternatives-you-can-use-right-now
https://tools.ietf.org/id/draft-knodel-terminology-00.html#rfc.section.1.1.1

In one of the TSC calls, one of the members suggested using Superior/Inferior instead of Master/Slave. I kept master/slave here for ease of understanding.

@suresh-lc
Contributor

I like the idea of changing the term master/slave to primary/secondary.
Additionally, I'd like to suggest the term blocklist/allowlist instead of blacklist/whitelist.

For blacklist: in the TSC call it was suggested to use the term "approved", as in "Approved containers" and so on.

@MoonkiHong
Contributor

BTW, let us call master-slave by alternative terminologies. What about Primary/Secondary?
Reference:
https://www.zdnet.com/article/linux-team-approves-new-terminology-bans-terms-like-blacklist-and-slave/
https://www.theserverside.com/opinion/Master-slave-terminology-alternatives-you-can-use-right-now
https://tools.ietf.org/id/draft-knodel-terminology-00.html#rfc.section.1.1.1

In one of the TSC , one of the member suggested to use Superior/Inferior term instead of Master/Slave. I kept it as master/slave for our understanding.

I like the idea of changing the term master/slave to primary/secondary.
Additionally, I'd like to suggest the term blocklist/allowlist instead of blacklist/whitelist.

For Blacklist: in the TSC call it was suggested to use the term "approved" like in "Approved containers " and so.

@suresh-lc That must be the TSC of LF Edge. Is there any guideline from it?

@suresh-lc
Contributor

@suresh-lc The point is not the master/slave topology. The current scoring mechanism is as follows (at least at the source code level, a.k.a. scoringmgr):

  1. Each candidate calculates its score and does all the required processing by themselves.
  2. Actually, the offloading is required when the edge-orchestrator decides to do so, so the score calculation should be done in that orchestrator instead of the candidate itself.
  3. The suggestion here starts by looking at that scenario.

If you look at a truly disaggregated topology, a scenario like "master node goes down" should be handled by another delegate, which only takes care of the status DB and so on in the Home Edge scenario. I am not sure the current scoring mechanism is the best option.

Devices A, B and C are three devices connected to the same network. All device and service details are exchanged among the devices, along with their compute and memory capabilities.

Device A: App X asks the Edge-Orchestrator on the same device to offload a service.
Device A: The Edge-Orchestrator on device A finds that both B and C have the service.
Device A: A requests a score from B and from C.
Device A: Based on the scores, A offloads the service to device C, which had the better score.
Device C: Performs the requested service.

The reason for doing the scoring this way is that it makes the scoring more real-time. If the requesting device did the scoring instead, it could not take real-time values into account. During discovery a device's free memory could be 1 GB, whereas at scoring time it could be 512 MB because other processes are running. So this way of calculating the score makes it near real-time. In any case, the score on each candidate is calculated by the Edge Orchestration running on that device.
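
To illustrate the difference, here is a small sketch that contrasts asking candidates for a score at request time with scoring from the values stored at discovery. The `Device` class and both functions are hypothetical, and free memory alone stands in for the full score.

```python
class Device:
    def __init__(self, name, mem_at_discovery_mb, services):
        self.name = name
        self.mem_at_discovery_mb = mem_at_discovery_mb  # shared once, at discovery
        self.mem_now_mb = mem_at_discovery_mb           # changes as processes run
        self.services = set(services)

    def live_score(self):
        # Candidate-side scoring: uses the value at the time of the request.
        # Free memory stands in for the full score (an assumption).
        return self.mem_now_mb


def offload(requester, neighbours, service):
    """Find candidates, ask each one for its score, pick the highest."""
    candidates = [d for d in neighbours if service in d.services and d is not requester]
    scores = {d.name: d.live_score() for d in candidates}
    return max(scores, key=scores.get) if scores else None


def offload_from_snapshot(requester, neighbours, service):
    """Alternative: the requester scores from the values stored at discovery."""
    candidates = [d for d in neighbours if service in d.services and d is not requester]
    scores = {d.name: d.mem_at_discovery_mb for d in candidates}
    return max(scores, key=scores.get) if scores else None
```

If B advertised 1024 MB at discovery but has dropped to 512 MB, while C still has 768 MB, `offload` picks C and `offload_from_snapshot` still picks B.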

We should think about enhancing the scoring mechanism by basing it on more real-time parameters and by considering a larger number of parameters. Even time-based analysis could be done: for example, if we know in advance from the user's usage pattern that a device will become busy, that could be factored in before a service is offloaded to it.

@tiokim
Contributor Author

tiokim commented Aug 24, 2020

@suresh-lc I disagree that the current scenario is more real-time, because the resources are no different at the time each edge receives the request.
Assume the request arrives at B or C at time T. In the current scenario, each device calculates and sends its score based on its resources at time T. In the proposed scenario, each device sends the same resource information at time T, and the primary (master) edge calculates the scores from that information.

Your concern is important, but I think the current problem is that Device A doesn't know what advantages B and C each have.
If C is significantly out of memory, then A should choose B even if C scores better than B.
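
As a sketch of that point: once the orchestrator receives raw resource details rather than a single number, it can apply a hard constraint before ranking. The `pick` function, the report keys, the weights and the 256 MB floor below are all hypothetical.

```python
def pick(candidates, min_mem_mb=256):
    """Choose a device from raw resource reports, applying a memory floor first.

    `candidates` maps device name -> dict with hypothetical keys
    'cpu_free' (0..1) and 'mem_free_mb'. Weights and floor are assumptions.
    """
    if not candidates:
        return None

    def score(r):
        # CPU-heavy weighting, so a device can score well despite little memory.
        return 100 * r["cpu_free"] + r["mem_free_mb"] / 100

    eligible = {n: r for n, r in candidates.items() if r["mem_free_mb"] >= min_mem_mb}
    pool = eligible or candidates  # fall back if every device is below the floor
    return max(pool, key=lambda n: score(pool[n]))
```

With B at 40% free CPU and 1024 MB, and C at 95% free CPU and 64 MB, C has the higher composite score, yet `pick` returns B because C falls below the memory floor.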

@suresh-lc
Contributor

suresh-lc commented Aug 24, 2020

@t25kim: What you propose, sharing all the device information instead of a score, means every device's resource details get shared with the other devices.

Assume that the time the request arrives to B or C is T. In the current scenario each device calculates and sends the score based on the resource at time T. In the proposed scenario each device sends the same resource information at time T and primary(master) edge calculates scores based on the information.

Even in this approach, the resources at T+1, when the score calculation happens at the edge, could be different from those at T. So this will not lead to the actual score value either. After a device sends its resource details to the edge, the resource values can change (for B and C); this is what I mean to say.

If C is significantly out of memory, then A should choose B even if C scores better than B.

In this case the calculated score would naturally be lower, since the score takes memory into consideration too. Hence A naturally chooses B instead of C.

Hence there is no single foolproof approach. But to strengthen the system we can think of other methods of score calculation, for example a time-based, empirical approach instead of an a priori one.

@MoonkiHong
Contributor

@suresh-lc Um.. it is interesting, because your point about an "empirical way" rather than an "a priori way" implies that the edge orchestrator should make the final decision on the best primary to offload the given services to, based on the collected data (with historical patterns and so on, eventually employing AI/ML). So to me the current mechanism, in which each neighbor device completes its own calculation, adds a bit of overhead.
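
One simple way to picture an "empirical" decision at the orchestrator is to smooth each device's reported history instead of trusting the latest sample. This is only a sketch: the exponential moving average, the `alpha` value and the use of free memory as the only metric are my assumptions, not anything proposed in detail here.

```python
def ema(samples, alpha=0.3):
    """Exponential moving average of a non-empty sequence, oldest sample first."""
    it = iter(samples)
    value = next(it)
    for s in it:
        value = alpha * s + (1 - alpha) * value
    return value


def choose_by_history(history):
    """Pick the device whose smoothed free-memory history is highest.

    `history` maps device name -> list of free-memory samples in MB.
    """
    return max(history, key=lambda name: ema(history[name]))
```

A device that is usually short on memory but reports one large spike does not win over a device that has been steadily well provisioned.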

@MoonkiHong
Contributor

@suresh-lc @t25kim I think @t25kim has a basic idea in mind for presenting his proposal. To bring this iterative discussion to a close, what about creating a separate branch to check the feasibility of the proposal? Any opinions?

@suresh-lc
Contributor

@suresh-lc @t25kim I think that @t25kim has some basic idea to present his proposal. In order to end the iterating discussions for this matter, what about creating a separate branch to check out the feasibility for this proposal? Any opinion?

Yes, doing a feasibility check is a good point. Implementation-wise, we should be able to do it. But to compare the existing and the new approach, we need to calculate metrics to see which method fares better in cases such as:

  1. Devices B and C send their scores to A, and then one device's resources go down.
  2. Devices B and C send their resource details to A, and then one device's resources go down.
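
A possible starting point for such a comparison is a tiny replay harness that records, for each case, whether the chosen device is still the best one after the drop. Everything below is hypothetical: the `run_case` name, free memory as the only resource, and the assumption that both sides apply the same formula to a snapshot taken at the same time.

```python
def run_case(send_scores, before, after):
    """Replay one case and report whether the choice survives a resource drop.

    before/after: dict device -> free memory (MB) at decision time and just after.
    send_scores=True  -> candidates send a precomputed score (case 1).
    send_scores=False -> candidates send raw details and A scores them (case 2).
    """
    def score(mem_mb):
        return mem_mb / 10  # hypothetical formula, identical on either side

    if send_scores:
        received = {d: score(m) for d, m in before.items()}  # computed on B and C
        chosen = max(received, key=received.get)
    else:
        received = dict(before)                               # raw details sent to A
        chosen = max(received, key=lambda d: score(received[d]))
    best_after = max(after, key=after.get)
    return {"chosen": chosen, "still_best": chosen == best_after}
```

Running both cases over the same recorded traces would give a "stale decision" count per method; different formulas or snapshot times on each side can be plugged in to see where the two diverge.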

@tiokim
Contributor Author

tiokim commented Aug 25, 2020

@suresh-lc @MoonkiHong That is an important issue, and I can't think of how to solve that scenario at the moment. Is there any code in Home Edge that handles the case where B or C sends a high score to A and then goes down abruptly?

tiokim changed the title from "Centralized management of edge devices and services" to "[WIP] Centralized management of edge devices and services" on Aug 26, 2020
@MoonkiHong
Contributor

We might need to find an existing implementation or algorithm for reconfiguring the primary/secondary device in the given local network.

@MoonkiHong
Contributor

MoonkiHong commented Sep 17, 2020

This should also take into consideration the existing report in #26.
