Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added latency injection functionality #2230

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

gareth-johnston
Copy link
Contributor

@gareth-johnston gareth-johnston commented Oct 13, 2024

Sometimes you may wish to configure machines with varying latencies between them. For example, you may need to simulate multiple DCs, or a cluster with a latency profile specific to a use case.

This commit adds the ability to easily configure latencies between different nodes and clients created in your testing environment, eliminating the need to do this manually per machine.

There are two methods to leverage this functionality. In both cases you must edit the generated inventory.yaml and add a group: groupId property to relevant hosts.

i.e.

loadgenerators:
  hosts:
    3.121.207.133:
      ansible_ssh_private_key_file: key
      ansible_user: ec2-user
      private_ip: 10.0.55.38
      group: group1

nodes:
  hosts:
    3.122.199.101:
      ansible_ssh_private_key_file: key
      ansible_user: ec2-user
      private_ip: 10.0.44.25
      group: group2

Option 1: For a simple configuration when you just want a steady latency between two or more nodes, use command

inventory inject_latencies --latency <l> --interface <ni>

add --rtt if you wish for the latency to be the total round trip time between groups.

Option 2: For a more nuanced set up with group -> group having different latencies, create a yaml file defining the relationships as follows

latency_profiles:
  relationships:
    - source: group1
      target: group2
      latency: 5
    - source: group2
      target: group1
      latency: 2
  default_latency: 0

In this example, we have two hosts, each belonging to a different group (group1 and group2). Latencies will only be applied between different groups.
If a host is not assigned a group, no direction of latency will be applied to that host.

Note that the latency is applied to the outbound traffic from a source group to the target group.

Add this file to the working directory of your project, then run

inventory inject_latencies --interface <ni> --profiles my_profile.yaml

This will apply the defined latencies between groups. Note that latency values are defaulted to ms so there is no need to specify the unit when defining the profile.yaml or running the command.

Adds documentation to readme for more information and usage.

print(stderr.read().decode())

# apply the filter command for each IP individually
for ip in target_ips:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be easier and cleaner with ansible (similar to inventory tune). why not use it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't consider it, wasn't really aware of the option if I am honest : )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not much of a functional difference, definitely cleaner but they both achieve the same result and at the moment I think in the current form this commit adds value by reducing setup time/ human error enough that it is worth it to go ahead and migrate it later. If you think it wouldn't take too much time to change it to use ansible I am more than happy to look into it during a cool down (along with your other comment)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using Ansible will make this more consistent with rest of simulator, more robust and easier to maintain. Ansible has some learning curve, but IMO it is worth learning.
with ansible you will have some challenges with translating custom yaml to node-specific configs/scripts, but Ansible has some extensive templating and programming capabilities.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dependency installation should be very easy with ansible (we already have playbooks that do it).
the configuration commands are a simple loop over some config: should be rather easy in ansible

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Let's leave this here for now, I will look into ansible and translate it across, thank you foir the valuable input 👍

Once your hosts are grouped, you can inject latencies by running the following command:

```bash
inventory inject_latencies --latency 10 --interface eth0 --rtt
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are 2 possible approaches to defining the latencies: globally (independent on test) and as part of test setup (in tests.yaml / test case definition). Both have pros and cons. I am curious why did you choose this approach?

Copy link
Contributor Author

@gareth-johnston gareth-johnston Oct 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mainly because it was the simplest approach, in most cases if we are testing a use case with latencies applied I feel like they are going to be fairly static arrangements. If you want to test with different latencies you can just run another test suite with machines having different latencies applied.

That said, I thought about the benefit of having it per test case definition, it is a superior implementation if we can do both - but don't have the time to allocate to this as it isn't scheduled work. It is definitely a possible improvement though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

simplicity of implementation and limited time allocated are perfectly good explanation :)

I was wondering which way is more convenient in practice. Definition in test case makes it easily repeatable, but will we actually use it, or will that become something we only copy-paste over and over again?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we can extend it in the future if we such value? In practice I see myself using this per test class, not case. But fore example it might be more useful in the future with the automated performance tests we that work commences.

Copy link
Contributor

@JamesHazelcast JamesHazelcast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work on this @gareth-johnston, this is really useful to have built into Simulator. One minor question, and I agree with @k-jamroz regarding Ansible usage, but I'll leave him to check over that when it's ready so you can have my approval in advance ✅

In this example, we have two groups (group1 and group2) with different latencies between them.
Communication from group1 to group2 will have a 5ms latency, while communication from group2 to group1 will have a 2ms latency.

The default_latency serves as a fallback value in case no specific latency is defined between two groups.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is default_latency required, or can it be optionally omitted?

Once your hosts are grouped, you can inject latencies by running the following command:

```bash
inventory inject_latencies --latency 10 --interface eth0 --rtt
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how can I disable the latency injection when no longer needed without recreating the environment? inventory inject_latencies --latency 0 or something more complicated? Maybe it is worth adding dedicated command or option?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you can just do as you say. Same to overwrite existing form. But I agree, a command would be useful to just wipe the state.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants