
Support 64-bit bandwidths #395

Merged: 14 commits into tbarbette:main from u64-rates, Oct 1, 2024

Conversation

@mihaibrodschi (Contributor) commented Aug 21, 2024

Based on #378 by gkatsikas.

Updates TokenBucket, GapRate and bw argument parsing.
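
For context, a rate beyond the old 32-bit range would look like this in a configuration (an illustrative sketch in the style of the test files quoted later, not a config from the PR; 100Gbps is 12.5 GB/s, more than a 32-bit bytes-per-second counter can hold):

InfiniteSource(LENGTH 1500)
	-> Queue(10)
	-> BandwidthRatedUnqueue(RATE 100Gbps)
	-> Discard;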

@mihaibrodschi mihaibrodschi marked this pull request as draft August 21, 2024 16:05
@tbarbette (Owner)

Thanks! Can you take a look at the failing tests? I can help if you can't figure them out.

@mihaibrodschi (Contributor, Author)

Looks like I used a symbol defined by DPDK, so non-DPDK builds are failing.

@tbarbette (Owner)

Can you add an #ifndef guard then? Probably in glue.hh if the symbol is used in multiple places, or just where it is used if it appears only once.
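
A minimal sketch of such a guard (hypothetical: the thread doesn't name the DPDK symbol involved, so RTE_SOMETHING below is only a placeholder):

#ifndef RTE_SOMETHING
# define RTE_SOMETHING(x) (x)   /* no-op fallback for non-DPDK builds */
#endif

Placed in glue.hh, a fallback like this keeps non-DPDK builds compiling, while DPDK builds pick up the real definition from the DPDK headers.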

@tbarbette (Owner)

That one looks functional. You can use make check locally to debug. I saw a MS value changed to US; maybe that's the problem. You should try to keep the configuration compatible with the previous behavior.

@mihaibrodschi (Contributor, Author)

Some BigInt tests are broken. They were written to compare 64-bit multiplication and division against the same operations performed with 2-limb BigInts, where each limb was 32-bit. Now, with 64-bit limbs, testing the same way would require comparing against 128-bit operations, which aren't necessarily supported on all platforms.

@tbarbette (Owner)

Can you test with shifts and masks? Or propose any other resolution that's acceptable :)
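
To make the shift-and-mask suggestion concrete, here is one way to compute a full 128-bit product using only 64-bit arithmetic, giving the tests a portable reference without __int128 (an illustrative sketch, not code from the PR):

#include <cstdint>

// Compute the 128-bit product of a and b as (hi, lo), splitting each
// operand into 32-bit halves and summing the four partial products with
// explicit carry handling.
static void mul64x64_128(uint64_t a, uint64_t b, uint64_t &hi, uint64_t &lo)
{
    uint64_t a_lo = a & 0xFFFFFFFFu, a_hi = a >> 32;
    uint64_t b_lo = b & 0xFFFFFFFFu, b_hi = b >> 32;

    uint64_t p0 = a_lo * b_lo;   // contributes to bits 0..63
    uint64_t p1 = a_lo * b_hi;   // contributes to bits 32..95
    uint64_t p2 = a_hi * b_lo;   // contributes to bits 32..95
    uint64_t p3 = a_hi * b_hi;   // contributes to bits 64..127

    uint64_t mid = (p0 >> 32) + (p1 & 0xFFFFFFFFu) + (p2 & 0xFFFFFFFFu);
    lo = (p0 & 0xFFFFFFFFu) | (mid << 32);
    hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
}

A 64-bit-limb BigInt multiply could then be checked against (hi, lo) on any platform.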

@tbarbette (Owner)

Do you actually need bigint? We might keep this problem/discussion separate.

@mihaibrodschi (Contributor, Author) commented Aug 22, 2024

BigInt is used by TokenBucket in a few places. Not sure if replacing it is the best solution. The problem is the test code.
Actually, all of the BigintTest code is more or less broken in #378. The easiest solution would be to revert this file to only check 32-bit BigInts, even though TokenBucket uses 64-bit ones.

@tbarbette (Owner)

Yes, the tests being broken is why I never merged #378. Can you try to revert the testing to 32-bit, and maybe add a few minimal 64-bit ones? At least invoke them to verify they compile?

(Review thread on include/click/glue.hh — outdated, resolved.)
@mihaibrodschi (Contributor, Author)

The RatedUnqueue test fails because it expects an inaccurate result (909|910), but receives the accurate one (1000). Is it OK to update the test file?

@tbarbette (Owner)

Weird... I'll take a look.

@mihaibrodschi mihaibrodschi marked this pull request as ready for review August 30, 2024 16:32
@tbarbette (Owner)

Seems good now, thanks! Sorry for the delay.

@tbarbette tbarbette merged commit 0727933 into tbarbette:main Oct 1, 2024
12 checks passed
@mihaibrodschi (Contributor, Author) commented Oct 2, 2024

The latest merged PR (#398) undid all of the changes made in this one.

tbarbette added a commit that referenced this pull request on Oct 2, 2024:
"The last merge removed some changes. Sorry about that, two huge merges
at the same time with different bases..."
@mihaibrodschi mihaibrodschi deleted the u64-rates branch October 2, 2024 12:58
@tbarbette (Owner)

@mihaibrodschi there are two failures in the advanced test (see https://forge.uclouvain.be/ensg/fastclick/-/pipelines/53850). Not sure if you have access, but basically, I am building with
./configure CXXFLAGS="-std=gnu++11" --disable-batch --enable-simtime && make clean && make -j 16 && make check

and then I get:

./test/standard/BandwidthRatedSplitter-01.clicktest:41: standard error has unexpected value starting at line 3
./test/standard/BandwidthRatedSplitter-01.clicktest:41: stderr:3: expected '20'
./test/standard/BandwidthRatedSplitter-01.clicktest:41: stderr:3: but got  '21'
./test/standard/BandwidthRatedUnqueue-01.clicktest:39: standard error has unexpected value starting at line 5
./test/standard/BandwidthRatedUnqueue-01.clicktest:39: stderr:5: expected '71'
./test/standard/BandwidthRatedUnqueue-01.clicktest:39: stderr:5: but got  '70'

I would say it's because you added better precision, but it does not happen in all cases. Any idea?

@mihaibrodschi (Contributor, Author) commented Oct 24, 2024

@tbarbette I think the results are correct.
For the first test, the setup is:

BandwidthRatedSplitter-01.clicktest:
InfiniteSource(LENGTH 100)
	-> Queue(10)
	-> u1 :: BandwidthRatedUnqueue(RATE 2000Bps)
	-> s1 :: BandwidthRatedSplitter(RATE 200Bps)
	-> c1 :: Counter
	-> Discard;
Script(wait 10, read c1.count, read c2.count, read c3.count, read c4.count, write stop);
Expected 20, got 21.

The BandwidthRatedSplitter uses a token bucket with an initial amount of tokens equal to tb_bandwidth_thresh = 131072 (see elements/standard/ratedunqueue.cc:77).
To check whether a packet can go through, the RatedSplitter compares the token bucket's contents to tb_bandwidth_thresh. Since there are enough tokens in the bucket at the beginning, one packet goes through immediately.
Over the next 10 seconds, another 20 packets go through at the specified rate, so the total is 21.
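(Back-of-the-envelope check, not from the test harness: at 200 Bps with 100-byte packets, the steady-state rate is 200 / 100 = 2 packets per second, so the 10-second window passes 10 × 2 = 20 packets, plus the 1 packet admitted against the initial tb_bandwidth_thresh tokens, for 21 in total.)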

For the second test, the setup is:

BandwidthRatedUnqueue-01.clicktest:
InfiniteSource(LENGTH 100)
	-> Queue(10)
	-> u2 :: BandwidthRatedUnqueue(RATE 200Bps, BURST_BYTES 5000)
	-> c2 :: Counter
	-> Discard;
Script(wait 10, read c1.count, read c2.count, read c3.count, read c4.count, write stop);
Expected 71, got 70.

In this case, the BWRatedUnqueue has a token bucket with an initial token count of 5000 + tb_bandwidth_thresh.
Thus, 51 packets are immediately allowed through. After this, the BWRatedUnqueue task only wakes up after enough time has passed to replenish its tokens back to tb_bandwidth_thresh (see elements/standard/bwratedunqueue.cc:88).
Here the problem seems to be a slight inaccuracy in the timing: in 10 seconds, the task should wake up 20 times, but it only wakes up 19 times. If I extend the time by 2 microseconds:
Script(wait 10000002us, read c1.count, read c2.count, read c3.count, read c4.count, write stop);
the test passes with the expected result of 71.
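(Again as a back-of-the-envelope check: 5000 burst bytes cover 5000 / 100 = 50 packets, plus 1 admitted against the threshold, giving 51 up front; at 200 Bps the bucket refills one 100-byte packet's worth every 0.5 s, so 10 s should yield 20 wakeups and 51 + 20 = 71 packets, while 19 wakeups give 70.)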

@tbarbette (Owner) commented Oct 28, 2024

So there was another problem. First, with batching enabled, the default burst still reads 32 packets, which leads to very large imprecision in terms of token overcommitment.

With a BURST of 1, all tests now fail for this 20/21 reason.

The timing before your patch is this:

1000000000.000000008:  100 | 52616e64 6f6d2062 756c6c73 68697420 696e2061 20706163
1000000000.530000076:  100 | 52616e64 6f6d2062 756c6c73 68697420 696e2061 20706163
1000000001.030000138:  100 | 52616e64 6f6d2062 756c6c73 68697420 696e2061 20706163
1000000001.530000200:  100 | 52616e64 6f6d2062 756c6c73 68697420 696e2061 20706163
1000000002.030000262:  100 | 52616e64 6f6d2062 756c6c73 68697420 696e2061 20706163
1000000002.530000324:  100 | 52616e64 6f6d2062 756c6c73 68697420 696e2061 20706163
1000000003.030000386:  100 | 52616e64 6f6d2062 756c6c73 68697420 696e2061 20706163
1000000003.530000448:  100 | 52616e64 6f6d2062 756c6c73 68697420 696e2061 20706163
1000000004.030000510:  100 | 52616e64 6f6d2062 756c6c73 68697420 696e2061 20706163
1000000004.530000572:  100 | 52616e64 6f6d2062 756c6c73 68697420 696e2061 20706163
1000000005.030000634:  100 | 52616e64 6f6d2062 756c6c73 68697420 696e2061 20706163
1000000005.530000696:  100 | 52616e64 6f6d2062 756c6c73 68697420 696e2061 20706163
1000000006.030000758:  100 | 52616e64 6f6d2062 756c6c73 68697420 696e2061 20706163
1000000006.530000820:  100 | 52616e64 6f6d2062 756c6c73 68697420 696e2061 20706163
1000000007.030000882:  100 | 52616e64 6f6d2062 756c6c73 68697420 696e2061 20706163
1000000007.530000944:  100 | 52616e64 6f6d2062 756c6c73 68697420 696e2061 20706163
1000000008.030001006:  100 | 52616e64 6f6d2062 756c6c73 68697420 696e2061 20706163
1000000008.530001068:  100 | 52616e64 6f6d2062 756c6c73 68697420 696e2061 20706163
1000000009.030001130:  100 | 52616e64 6f6d2062 756c6c73 68697420 696e2061 20706163
1000000009.530001192:  100 | 52616e64 6f6d2062 756c6c73 68697420 696e2061 20706163
c1.count:
20

After your patch it is:

1000000000.000000008:  100 | 52616e64 6f6d2062 756c6c73 68697420 696e2061 20706163
1000000000.480001070:  100 | 52616e64 6f6d2062 756c6c73 68697420 696e2061 20706163
1000000000.980001132:  100 | 52616e64 6f6d2062 756c6c73 68697420 696e2061 20706163
1000000001.480001194:  100 | 52616e64 6f6d2062 756c6c73 68697420 696e2061 20706163
1000000001.980001256:  100 | 52616e64 6f6d2062 756c6c73 68697420 696e2061 20706163
1000000002.480001318:  100 | 52616e64 6f6d2062 756c6c73 68697420 696e2061 20706163
1000000002.980001380:  100 | 52616e64 6f6d2062 756c6c73 68697420 696e2061 20706163
1000000003.480001442:  100 | 52616e64 6f6d2062 756c6c73 68697420 696e2061 20706163
1000000003.980001504:  100 | 52616e64 6f6d2062 756c6c73 68697420 696e2061 20706163
1000000004.480001566:  100 | 52616e64 6f6d2062 756c6c73 68697420 696e2061 20706163
1000000004.980001628:  100 | 52616e64 6f6d2062 756c6c73 68697420 696e2061 20706163
1000000005.480001690:  100 | 52616e64 6f6d2062 756c6c73 68697420 696e2061 20706163
1000000005.980001752:  100 | 52616e64 6f6d2062 756c6c73 68697420 696e2061 20706163
1000000006.480001814:  100 | 52616e64 6f6d2062 756c6c73 68697420 696e2061 20706163
1000000006.980001876:  100 | 52616e64 6f6d2062 756c6c73 68697420 696e2061 20706163
1000000007.480001938:  100 | 52616e64 6f6d2062 756c6c73 68697420 696e2061 20706163
1000000007.980002:  100 | 52616e64 6f6d2062 756c6c73 68697420 696e2061 20706163
1000000008.480001062:  100 | 52616e64 6f6d2062 756c6c73 68697420 696e2061 20706163
1000000008.980001124:  100 | 52616e64 6f6d2062 756c6c73 68697420 696e2061 20706163
1000000009.480001186:  100 | 52616e64 6f6d2062 756c6c73 68697420 696e2061 20706163
1000000009.980001248:  100 | 52616e64 6f6d2062 756c6c73 68697420 696e2061 20706163
c1.count:
21

We can see the bucket is refilled a bit sooner, so 3 packets go through in the first second.

This only happens with --simtime; with real time it does not happen.

@mihaibrodschi (Contributor, Author) commented Oct 28, 2024

> So there was another problem. First, with batching enabled, the default burst still reads 32 packets, which leads to very large imprecision in terms of token overcommitment.

This is because of this code in bwratedunqueue.cc, correct?

    if (_tb.contains(tb_bandwidth_thresh)) {
#if HAVE_BATCH
        if (in_batch_mode) {
            PacketBatch* batch = input(0).pull_batch(_burst);
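            // ... (snippet truncated; the pulled batch is then pushed
            // without re-checking that the bucket holds tokens for every
            // packet in it)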

It does not check if there are enough tokens for the entire batch, only for one packet.
This seems to be by design, but I'm not sure how my patch would affect it.

As for the slight timing difference, is it an incorrect result? If so, I'll see what I can do to fix it.

@mihaibrodschi (Contributor, Author)

Perhaps a cleaner solution for batching in BWRatedUnqueue would be to store the packets which can't be pushed due to insufficient tokens in a small queue (of size _burst).
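
A minimal sketch of that idea, as a self-contained toy model rather than FastClick code (the name BwUnqueueModel, the parked queue, and the byte-count types are all illustrative):

#include <deque>
#include <vector>

struct BwUnqueueModel {
    long tokens = 0;             // bucket contents, in bytes
    std::deque<long> parked;     // packet lengths pulled but not yet sent

    void refill(long bytes) { tokens += bytes; }

    // Queue the newly pulled burst behind any parked packets, then emit
    // packets only while the bucket can pay for them in full.
    std::vector<long> emit(const std::vector<long> &burst) {
        std::vector<long> sent;
        for (long len : burst)
            parked.push_back(len);
        while (!parked.empty() && tokens >= parked.front()) {
            tokens -= parked.front();
            sent.push_back(parked.front());
            parked.pop_front();
        }
        return sent;
    }
};

Draining the parked queue first means tokens are never overcommitted, at the cost of the extra queue check per wakeup that tbarbette mentions below.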

@tbarbette (Owner)

Yes indeed. It's not really a bug, it's a different "default" behavior (which is not ideal).
I thought of the "internal queue" solution, but maybe that should be another element on top of BWUnqueue, because checking the queue might impact performance a bit.
I pushed the update on the PRfixbw branch. The behavior is at least constant now.
Now the right behavior depends on what to expect for the default amount of tokens at t=0...
