
Introduce finite test runs, repeatability, reporting, and fine-tuned controls #4

Conversation

adesjardins-stripe
Contributor

Preamble

This PR introduces some substantial changes to McTester's core logic. The intention is to make test configurations more intuitive, repeatable, and informative. Some of the logic may have been changed without my fully appreciating its original intent, so please let me know if I have undermined an important feature and I'll work to make these changes more harmonious.

Overview

The most significant change surrounds request cadence. Instead of using a sleep-burst cycle, I made use of a rate limiter to govern request timings precisely. This rate is unbounded by default but can be configured using the rps option. When specified, each thread will send requests at the given rate. For instance, setting -conncount=8 and -rps=1000 will result in a total rate of 8000 requests per second.

I will briefly summarize each of the other changes.

Finite Test Runs

The -duration option tells the application to run for a precise period of time and then terminate on its own.

Value Comparisons

Values are still "random" but are now deterministic, seeded by their key before being generated. This allows the value returned from a Get to be validated against what should have been Set. If any invalid data is returned, implying a key collision, it is logged.
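As a sketch of the idea (not the PR's exact code, which seeds the RNG from the key bytes via big.Int; an FNV hash is used here just to keep the sketch short, and `valueForKey` is an illustrative name):

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/rand"
)

// valueForKey deterministically derives a value from its key: the key
// hashes to an RNG seed, so the expected value can be regenerated at
// Get time and compared against what the server returns.
func valueForKey(key string, size int) []byte {
	h := fnv.New64a()
	h.Write([]byte(key))
	r := rand.New(rand.NewSource(int64(h.Sum64())))
	buf := make([]byte, size)
	for i := range buf {
		buf[i] = byte('a' + r.Intn(26))
	}
	return buf
}

func main() {
	set := valueForKey("mctester:abc123", 16) // generated at Set time
	got := valueForKey("mctester:abc123", 16) // regenerated at Get time
	fmt.Printf("set=%s got=%s match=%v\n", set, got, string(set) == string(got))
}
```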

Run Reports

For finite runs, a JSON report is emitted to STDOUT detailing the run configuration used, timing metrics, and some basic stats such as "Get Hits".

Repeatability

The RNG seed is now configurable using the -rngseed option. By default, it is set using the same method as before. When specified, a test run will send the exact same sequence of requests, with the same keys and values in each thread, as any previous run with that seed (synchronization between threads cannot be controlled, of course). The seed value is included in the config portion of the run report.
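A minimal sketch of what a fixed seed buys: the same seed reproduces the same per-thread key sequence across runs (`keysForRun` is an illustrative helper, not code from the PR).

```go
package main

import (
	"fmt"
	"math/rand"
)

// keysForRun generates the key indices a worker would draw for a run.
// With the same seed the sequence is identical on every run, which is
// what the -rngseed flag provides.
func keysForRun(seed int64, n, keySpace int) []int {
	r := rand.New(rand.NewSource(seed))
	out := make([]int, n)
	for i := range out {
		out[i] = r.Intn(keySpace)
	}
	return out
}

func main() {
	a := keysForRun(42, 5, 1000)
	b := keysForRun(42, 5, 1000)
	fmt.Println(a, b) // identical sequences
}
```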

Cache Warming

When the -warm option is provided, the given percent of keys will be set prior to beginning the core test loop.

Get-Set-Delete Ratio

Before, every failed 'Get' would result in a subsequent 'Set', and a 'Delete' percentage could be configured separately. Now, statistically, all requests will adhere to the specified Get-Set-Delete proportions.

Misc

I incorporated the changes made by @feihu-stripe (here), which include key prefix stripping, unix socket support, custom IP targets, and pipelined Gets (for now, just repeating the same query n times). I also reformatted some things, like reordering argument lists lexicographically as they were starting to get a bit long and removing some comments that no longer seemed applicable.

Limitations

These changes were, for the most part, only applied to the "basic" test protocol and were not ported to the "server" variant. Ideally, I think we would abstract the core logic so that both protocols would remain in sync without needing to copy anything.

I added two third party dependencies, but both are OSS with MIT licenses.

Let me know what you think. We can iterate on this work to settle on something that appeals to both of us. There is always room for further development, of course!

@dormando
Member

dormando commented Jun 3, 2022

Thanks! I'll have to take some time with this, I think most of the changes are easy acceptances.

For future reference, it's much easier to review sets of commits that build into something rather than blobs like this; probably 80% of the changes I could accept without argument, then we'd iterate on the rest. Instead I have to sort of mentally map what's what while looking through the bigger change.

I haven't looked at it closely yet, but offhand; an earlier version of this tested the value based on the key (or maybe the server binary still does that? I'm forgetting) like you have here, but the performance cratered. While this doesn't need to reach mc-crusher levels of performance I do need it to run in the low millions of rps, or high hundreds of thousands. Were you measuring peak performance of the tester while working on this?

edit: by "tested the value based on the key" - I mean used the key as the random seed then re-generated the value to test against.

@@ -17,6 +17,7 @@ import (
"io"
"net"
"strconv"
"strings"
Contributor Author

I believe all changes in protocol.go were pulled in from the PR made by @feihu-stripe.

@@ -135,7 +135,7 @@ func main() {
log.Fatal(err)
}

resp, err := http.Post(*delAddr + "/delete", "Content-Type: application/json", bytes.NewReader(data))
resp, err := http.Post(*delAddr+"/delete", "Content-Type: application/json", bytes.NewReader(data))
Contributor Author

These are all formatting changes enforced by the IDE. They seem reasonable but I can drop them.

Member

I just wish it were a separate commit; a "gofmt" cleanup that I could just merge straight away and make the rest easier to page through. I know you folks are busy but normally I would've asked for at minimum formatting changes to be split out on their own.

"math/rand"
"time"

"github.com/dgryski/go-pcgr"
mct "github.com/memcached/mctester"
Contributor Author

These changes are related to Fei's PR as well. I did not make further changes to the server application.

Comment on lines 29 to 33
clientFlags := flag.Uint("clientflags", 0, "(32bit unsigned) client flag bits to set on miss")
connCount := flag.Int("conncount", 1, "number of client connections to establish")
reqPerSleep := flag.Int("reqpersleep", 1, "number of requests to issue when client wakes up")
reqBundlePerConn := flag.Int("reqbundles", 1, "number of times to wake up and send requests before disconnecting (-1 for unlimited)")
sleepPerBundle := flag.Duration("sleepperbundle", time.Millisecond*1, "time to sleep between request bundles (accepts Ns, Nms, etc)")
deletePercent := flag.Int("deletepercent", 0, "percentage of queries to issue as deletes instead of gets (0-1000)")
duration := flag.Duration("duration", 0, "length of time that the test will run (0 for unlimited)")
keyLength := flag.Int("keylength", 10, "number of random characters to append to key")
keyPrefix := flag.String("keyprefix", "mctester:", "prefix to append to all generated keys")
keySpace := flag.Int("keyspace", 1000, "number of unique keys to generate")
keyLength := flag.Int("keylength", 10, "number of random characters to append to key")
pipelines := flag.Uint("pipelines", 1, "(32bit unsigned) number of GET requests to stack within the same syscall")
delRatio := flag.Int("ratiodel", 0, "proportion of requests that should be sent as `deletes`")
getRatio := flag.Int("ratioget", 90, "proportion of requests that should be sent as `gets`")
setRatio := flag.Int("ratioset", 10, "proportion of requests that should be sent as `sets`")
rngSeed := flag.Int64("rngseed", time.Now().UnixNano(), "seed value used when initializing RNG")
rps := flag.Int("rps", 0, "target number of requests per second (0 for unlimited)")
server := flag.String("server", "127.0.0.1:11211", "`ip:port` for Memcached instance under test")
socket := flag.String("socket", "", "domain socket used for connections")
stripKeyPrefix := flag.Bool("stripkeyprefix", false, "remove key prefix before comparing with response")
keyTTL := flag.Uint("ttl", 180, "TTL to set with new items")
valueSize := flag.Uint("valuesize", 1000, "size of value (in bytes) to store on miss")
warmPercent := flag.Int("warm", 90, "percent of keys to `set` in Memcached before testing begins")
useZipf := flag.Bool("zipf", false, "use Zipf instead of uniform randomness (slow)")
zipfS := flag.Float64("zipfS", 1.01, "zipf S value (general pull toward zero) must be > 1.0")
zipfV := flag.Float64("zipfV", float64(*keySpace/2), "zipf V value (pull below this number")
valueSize := flag.Uint("valuesize", 1000, "size of value (in bytes) to store on miss")
clientFlags := flag.Uint("clientflags", 0, "(32bit unsigned) client flag bits to set on miss")
Contributor Author

Added flags:

  • duration
  • pipelines
  • del-get-set ratios
  • rngSeed
  • rps
  • server
  • socket
  • stripKeyPrefix
  • warmPercent

Removed flags:

  • reqPerSleep
  • reqBundlePerConn
  • sleepPerBundle
  • deletePercent

Also, sorted all flags lexicographically based on the flag name (not the variable assigned). This should match what a user sees when running ./basic -h.

Comment on lines 90 to 112
// Basic persistent load test, using text protocol:
// - list of servers to connect to, pct of each.
// - zipf or uniform random
// - requests per connect (-1 for unlim)
// - gets per etc
// - multiget or not
// - set or add to replace
// - delete frequency
// - set size range
// - variances: peak/antipeak load
// - variances: how often to change item sizes
type BasicLoader struct {
servers []string
stopAfter time.Time
desiredConnCount int
requestsPerSleep int
requestBundlesPerConn int
sleepPerBundle time.Duration
setValueSizes []int
deletePercent int
keyLength int
keyPrefix string
keySpace int
keyTTL uint
useZipf bool
zipfS float64 // (> 1, generally 1.01-2) pulls the power curve toward 0)
zipfV float64 // v (< keySpace) puts the main part of the curve before this number
valueSize uint
clientFlags uint
type Config struct {
ClientFlags uint
ConnCount int
DelRatio int
Duration time.Duration
GetRatio int
KeyLength int
KeyPrefix string
KeySpace int
KeyTTL uint
Pipelines uint
RngSeed int64
RPS int
Servers []string
SetRatio int
Socket string
StripKeyPrefix bool
UseZipf bool
ValueSize uint
WarmPercent int
ZipfS float64 // (> 1, generally 1.01-2) pulls the power curve toward 0)
ZipfV float64 // v (< keySpace) puts the main part of the curve before this number
tachymeter *tachymeter.Tachymeter
Contributor Author

Removed the comment as it no longer reflected the code accurately and the features are documented in the flags.

Renamed BasicLoader to Config.

Fields are sorted and exported so that they appear in the JSON report.

iterStart := time.Now()
if iterStart.Sub(start) > conf.Duration {
break
}
Contributor Author

A single for loop now which terminates after the specified duration has elapsed.

if conf.UseZipf {
subRS.Seed(int64(zipRS.Uint64()))
} else {
subRS.Seed(conf.RngSeed + int64(randR.Intn(conf.KeySpace)))
Contributor Author

Added the global seed here so that every run no longer has the exact same keys (unless the same global seed is used).


key := mct.RandString(&subRS, conf.KeyLength, conf.KeyPrefix)
valSeed := new(big.Int).SetBytes([]byte(key)).Int64()
subRS.Seed(valSeed)
Contributor Author

A bit ugly, but this just seeds each value by its key.

valSeed := new(big.Int).SetBytes([]byte(key)).Int64()
subRS.Seed(valSeed)

switch rng := randR.Intn(conf.DelRatio + conf.SetRatio + conf.GetRatio); {
Contributor Author

Core worker logic simply branches probabilistically according to the delete-set-get ratios (respectively).

}
case rng < (conf.DelRatio + conf.SetRatio):
value := mct.RandBytes(&subRS, int(conf.ValueSize))
rl.Take()
Contributor Author

rl.Take() is where the rate limiter pauses execution for the prescribed amount of time.

@adesjardins-stripe
Contributor Author

For future reference, it's much easier to review sets of commits that build into something rather than blobs like this; probably 80% of the changes I could accept without argument, then we'd iterate on the rest.

Very understandable, and I apologize. This work was completed as a small project with this feature set in mind. I will try to add several comments to help clarify the different aspects of the PR. If it remains intractable, I might be able to break it up.

In terms of performance, I benchmarked the original code against my changes—with and without value comparisons. Value size has a clear effect on performance when comparisons are included. At a size of 5, I see a roughly 3% drop in performance which is due to my overall changes rather than the comparison itself. However, at a size of 1000, the performance impact approaches 20%, which is definitely substantial.

To mitigate this, I will add a flag to enable/disable the comparisons, so they won't have an impact if they're not desired. Something else to consider is using a partial comparison—only check the first n digits instead of the whole string. This should still catch any key collisions without the dramatic performance impact.
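A sketch of that partial-comparison idea (illustrative names, not code from the PR): only the first n bytes are compared, which is still enough to catch a key collision since a colliding key would regenerate an entirely different value.

```go
package main

import (
	"bytes"
	"fmt"
)

// prefixMatches checks only the first n bytes of the returned value
// against the expected value, avoiding a full comparison on large
// values. A length mismatch always fails.
func prefixMatches(expected, got []byte, n int) bool {
	if len(got) != len(expected) {
		return false
	}
	if n > len(expected) {
		n = len(expected)
	}
	return bytes.Equal(expected[:n], got[:n])
}

func main() {
	want := []byte("abcdefgh12345678")
	fmt.Println(prefixMatches(want, want, 8))                           // true
	fmt.Println(prefixMatches(want, []byte("zzzzzzzz12345678"), 8))     // false
}
```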

I didn't reach millions of RPS (100,000 on one machine and 300,000 on another), but I expect this is just due to hardware differences.

@adesjardins-stripe
Contributor Author

The flag has been implemented and performance is as expected when validation is disabled.

While testing it, I noticed that the value comparisons themselves aren't causing the performance impact; it's generating the expected value on every Get that slows things down.

For an overall performance uplift, it may be worth pre-generating all keys and values and storing them rather than generating them on each loop.
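A sketch of what pre-generation might look like (illustrative names, not the code in the linked branch): all key/value pairs are built once up front, turning the hot loop's RNG work into a slice index.

```go
package main

import (
	"fmt"
	"math/rand"
)

// pair holds one pre-built key and its deterministic value.
type pair struct {
	key   string
	value []byte
}

// pregenerate builds every key/value pair once, trading memory for
// speed: per-request work becomes a lookup instead of RNG generation.
func pregenerate(seed int64, keySpace, keyLen, valSize int) []pair {
	r := rand.New(rand.NewSource(seed))
	const letters = "abcdefghijklmnopqrstuvwxyz"
	pairs := make([]pair, keySpace)
	for i := range pairs {
		k := make([]byte, keyLen)
		for j := range k {
			k[j] = letters[r.Intn(len(letters))]
		}
		v := make([]byte, valSize)
		r.Read(v)
		pairs[i] = pair{key: "mctester:" + string(k), value: v}
	}
	return pairs
}

func main() {
	pairs := pregenerate(42, 1000, 10, 1000)
	p := pairs[rand.Intn(len(pairs))] // hot loop: just an index
	fmt.Println(p.key, len(p.value))
}
```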

@dormando
Member

dormando commented Jun 7, 2022

For an overall performance uplift, it may be worth pre-generating all keys and values and storing them rather than generating them on each loop.

I swear I did exactly this in this repo already. I need to double check... I went through the same path of "oh long values take a while to generate", "a local HT lookup isn't too bad", etc. Hopefully I didn't leave that sitting in a branch somewhere.

@adesjardins-stripe
Contributor Author

I added key/value pre-generation in a branch as a test. There's a moderate performance improvement (~10%) when value comparisons are enabled and values are long (1000+). Otherwise, the difference isn't really meaningful.

https://github.com/adesjardins-stripe/mctester/pull/1/files

@dormando
Member

dormando commented Jun 9, 2022

I can't find the branch that I thought I had, and the server mode doesn't seem to have a value checker either.

I do have some notes about testing other methods, ie key is the checksum of the value, but it doesn't look like I tried any of it.

re: the rate limiter... while my variables were extremely poorly named (reqspersleep, bundles, etc), I was actually using the prescriptive pacing in specific test scenarios. Also the original get/set/delete pattern is deliberate since it doesn't exercise things like extstore very well if you're doing blind set/deletes as a ratio.

Given the structure of the repo has different test commands with a central library for code reuse/abstraction, what would you think about moving this whole shebang to a new command? Then the discussion can be around which parts to pull into the middle library (which wouldn't be much of a rush and can happen over time). The "server" structure is similar in that the pacing/etc are abstracted out of its core so if we want to port features over there that'd probably be the forcing function for moving code into the library.

@dormando
Member

dormando commented Jun 9, 2022

I'd still give the command another full readthrough/audit for issues but we could skip the whole discussion around functionality changes :)

@adesjardins-stripe
Contributor Author

Just to be clear, do you mean copying my new routine from cmd/basic into an adjacent directory (cmd/ratelimited, say) and then leaving "basic" exactly as it was (aside from some minor changes related to Fei's PR)?

That sounds perfectly reasonable to me—and probably what I should have done to begin with!

I can push that change here or I can break it up into 2 PRs:

  1. The changes made by Fei that pertain to the protocol (prefix stripping, pipelining, etc...) from his old PR, along with those minor formatting changes.
  2. Introduction of the new test command, which should just be the addition of the new main file and the report file.

@dormando
Member

dormando commented Jun 9, 2022

Yup, exactly. I went out of my way to make it easy to do this, so it'd make sense to do this :) I've been doing this so long the only consistent thing I see in people's test patterns is... they're not super consistent. There are also sometimes tests or optimizations that only make sense in some conditions or for specific features. Putting them all into the same high level tool just makes the tool unusably complicated; thus people write more tools.

So for mctester the goal is more about coverage while allowing shared code so I can test in more conditions that're important to people, yet still have bench test code that makes sense for me as a developer. IE; the request pacing specifics lets me test pipelining/batching inside of memcached purposefully, while it doesn't end up reflecting user workloads that well.

@dormando
Member

dormando commented Jun 9, 2022

Fwiw a new PR would be best, with the changes split up.

edit: though they wouldn't have to be two distinct PR's.

@adesjardins-stripe
Contributor Author

I think I will make it 2 PRs (as long as that's ok). The first will be the common changes introduced by Fei and the second will be the new test routine.
