Refactor Load and Performance Doc #2140
Changes from all commits
2616395
3391233
8a36c5f
a3beb7c
9dc4568
9e575f2
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,200 +1,75 @@ | ||
--- | ||
title: Load and Performance Testing | ||
description: Learn how to monitor internal execution performance of your Pantheon Drupal or WordPress site. | ||
tags: [performance, cache] | ||
categories: [platform, cache] | ||
tags: [performance] | ||
categories: [performance] | ||
--- | ||
We highly recommend load testing a site both before and after launch to ensure your site is optimally configured. | ||
|
||
## Before You Begin | ||
|
||
You should: | ||
|
||
- [Enable New Relic Pro](/docs/new-relic) to monitor internal execution performance without needing any additional modules or tools. | ||
- Have access to a command-line environment, preferably with administrative privileges. | ||
|
||
<div class="alert alert-info" role="alert"> | ||
<h3 class="info">Note</h3> | ||
<p><strong>Load testing should only be performed on the Live environment</strong>. Dev has much lower default caching settings than other environments to facilitate iterative development. Test has the exact same configuration as Live, but Test can only have one appserver, while Live can have as many as your plan allows. If disruptive behavior occurs outside of the Live environment, the site may be temporarily disabled to prevent disruption to other customers.</p></div> | ||
|
||
## Performance vs. Scalability | ||
|
||
There are two things to test for: | ||
|
||
1. **Performance**: the response time for an individual request | ||
2. **Scalability**: the ability to deliver with optimal response time to a larger number of concurrent requests | ||
|
||
High performance is the ability to deliver a page in under a second; scalability is the ability to deliver that page in under a second for many concurrent requests. It's important to understand the difference between these two dimensions and that there are trade-offs between them. | ||
|
||
## Verify Varnish is Working | ||
|
||
To verify that the [Varnish](/docs/varnish) cache is working, the `curl` command can be run with the `-I` flag to gather and display header information. Header information can also be obtained via [Firebug](http://en.wikipedia.org/wiki/Firebug_(software)) or [Inspect](http://en.wikipedia.org/wiki/Google_Chrome) in the browser. The results should be something like this: | ||
|
||
```nohighlight | ||
curl -I http://live-yoursite.pantheonsite.io | ||
HTTP/1.1 200 OK | ||
Server: nginx/1.0.10 | ||
Date: Fri, 17 Aug 2012 23:47:36 GMT | ||
Content-Type: text/html; charset=utf-8 | ||
Connection: keep-alive | ||
cache-control: public, max-age=300 | ||
last-modified: Fri, 17 Aug 2012 23:44:40 +0000 | ||
expires: Sun, 11 Mar 1984 12:00:00 GMT | ||
etag: "1345247080" | ||
X-Varnish: 1082592805 1082586928 | ||
Age: 176 | ||
Via: 1.1 varnish | ||
X-Pantheon-Edge-Server: 108.166.96.132 | ||
Vary: Accept-Encoding, Cookie | ||
``` | ||
The "Age" field should be greater than 0. If it remains 0 on repeated requests, review the [Drupal's Performance and Caching Settings](/docs/drupal-cache#drupal-7-performance-configuration) and [Varnish Caching for High Performance](/docs/varnish) documentation. | ||
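As a minimal sketch of the check described above, the helper below reads response headers on standard input and prints the value of the `Age` header. The function name `header_age` is hypothetical, and the sample headers are a trimmed copy of the output shown earlier rather than a live response.

```shell
# header_age: print the value of the Age header from HTTP response
# headers supplied on stdin (hypothetical helper name, for illustration).
header_age() {
  awk -F': *' 'tolower($1) == "age" { gsub(/\r/, "", $2); print $2; exit }'
}

# Offline demonstration using a trimmed copy of the sample headers above.
printf 'HTTP/1.1 200 OK\r\nAge: 176\r\nVia: 1.1 varnish\r\n' | header_age
# prints 176

# Against a real site you would pipe curl into it instead, for example:
#   curl -sI http://live-yoursite.pantheonsite.io | header_age
```

A value that stays at 0 across several requests suggests the response is not being served from cache.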
|
||
<div class="alert alert-danger" role="alert"> | ||
<h3 class="info">Warning</h3> | ||
<p>Until Varnish has been correctly configured, don't worry about further testing.</p></div> | ||
|
||
## Timing an Uncached Page Request | ||
|
||
Prefixing the `curl` command with `time` and sending a `NO_CACHE` cookie, which prevents Varnish from caching the response, tests the actual response time of the application container backend: | ||
|
||
time curl -I -H "Cookie: NO_CACHE=1;" http://live-yoursite.pantheonsite.io | ||
|
||
The command returns the following results. Note the timing summary appended at the bottom. The "real" time is the one to pay attention to: | ||
```nohighlight | ||
time curl -I -H "Cookie: NO_CACHE=1;" http://live-yoursite.pantheonsite.io | ||
HTTP/1.1 200 OK | ||
Server: nginx/1.0.10 | ||
Date: Fri, 17 Aug 2012 23:57:39 GMT | ||
Content-Type: text/html; charset=utf-8 | ||
Connection: keep-alive | ||
cache-control: public, max-age=300 | ||
last-modified: Fri, 17 Aug 2012 23:57:38 +0000 | ||
expires: Sun, 11 Mar 1984 12:00:00 GMT | ||
etag: "1345247858" | ||
Accept-Ranges: bytes | ||
X-Varnish: 1082615375 | ||
Age: 0 | ||
Via: 1.1 varnish | ||
X-Pantheon-Edge-Server: 108.166.96.132 | ||
Vary: Accept-Encoding, Cookie | ||
|
||
real 0m0.874s | ||
user 0m0.036s | ||
sys 0m0.004s | ||
``` | ||
Test specific pages of a site by passing a specific URL, and test the experience of a logged-in user by passing a PHP session ID. | ||
|
||
To get the PHP session ID, log in to your site and check the session cookie's name and value in your browser. The session ID can be passed in the following way: | ||
|
||
time curl -I -H "Cookie: SESSe6c673379860780ffbc45bdd6d9c1ab4=dKanNfIMe_0CnOMF7v1Qb5SpDN7UDvyQE8um-1Rpkcg;" http://live-yoursite.pantheonsite.io | ||
|
||
If you're not satisfied with the response time, shift your focus to optimizing the performance of the site. | ||
|
||
## Testing Scale and Throughput | ||
|
||
To test scale and throughput, we use `ab` (ApacheBench), a simple tool made available by the Apache Project. | ||
|
||
<div class="alert alert-danger" role="alert"> | ||
<h3 class="info">Warning</h3> | ||
<p>Do not raise the concurrency or total number of request values drastically. Small, measured tests should yield the proper results.</p></div> | ||
|
||
Run the following command: | ||
```nohighlight | ||
ab -n 100 -c 5 http://live-yoursite.pantheonsite.io/ | ||
``` | ||
Varnish should now be properly configured, and what you've tested should generate good response times and a high number of requests per second. | ||
|
||
As with `curl`, you can pass a `NO_CACHE` cookie to stop Varnish from caching the response; with `ab`, use the `-C NO_CACHE=1` parameter. `ab` returns the following output: | ||
```nohighlight | ||
ab -n 100 -c 5 -C NO_CACHE=1 http://live-yoursite.pantheonsite.io/ | ||
This is ApacheBench, Version 2.3 <$Revision: 655654 $> | ||
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/ | ||
Licensed to The Apache Software Foundation, http://www.apache.org/ | ||
|
||
Benchmarking http://live-yoursite.pantheonsite.io (be patient).....done | ||
|
||
Server Software: 10.176.69.43 | ||
Server Hostname: http://live-yoursite.pantheonsite.io | ||
Server Port: 80 | ||
|
||
Document Path: / | ||
Document Length: 30649 bytes | ||
|
||
Concurrency Level: 5 | ||
Time taken for tests: 12.854 seconds | ||
Complete requests: 100 | ||
Failed requests: 0 | ||
Write errors: 0 | ||
Total transferred: 3118447 bytes | ||
HTML transferred: 3064900 bytes | ||
Requests per second: 7.78 [#/sec] (mean) | ||
Time per request: 642.705 [ms] (mean) | ||
Time per request: 128.541 [ms] (mean, across all concurrent requests) | ||
Transfer rate: 236.92 [Kbytes/sec] received | ||
|
||
Connection Times (ms) | ||
min mean[+#sd] median max | ||
Connect: 60 81 32.5 73 258 | ||
Processing: 411 554 150.2 496 1213 | ||
Waiting: 82 131 100.5 109 794 | ||
Total: 471 635 162.9 574 1280 | ||
|
||
Percentage of the requests served within a certain time (ms) | ||
50% 574 | ||
66% 614 | ||
75% 646 | ||
80% 696 | ||
90% 899 | ||
95% 1010 | ||
98% 1170 | ||
99% 1280 | ||
100% 1280 (longest request) | ||
``` | ||
The output provides insight into the requests per second, the most critical element in regards to the scalability of a site. Pay attention to the 90/95% response time as well, as this gives an idea of how the site is actually performing. Check that the number of failed requests is 0; if it's not, this should be investigated. | ||
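To pull out the two figures highlighted above without reading the whole report, a small filter can help. This is only a sketch: `ab_summary` is a hypothetical helper name, it reads saved `ab` output on standard input, and it reports just the requests per second and the failed request count (the 90/95% response times still need to be read from the percentile table).

```shell
# ab_summary: extract requests per second and failed requests from
# ApacheBench output on stdin (hypothetical helper, for illustration).
ab_summary() {
  awk -F': *' '
    $1 == "Requests per second" { split($2, a, " "); rps = a[1] }
    $1 == "Failed requests"     { failed = $2 + 0 }
    END { printf "rps=%s failed=%d\n", rps, failed }
  '
}

# Offline demonstration using two lines from the sample report above.
printf 'Failed requests:        0\nRequests per second:    7.78 [#/sec] (mean)\n' | ab_summary
# prints rps=7.78 failed=0

# With a real run you would pipe ab into it, for example:
#   ab -n 100 -c 5 http://live-yoursite.pantheonsite.io/ | ab_summary
```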
|
||
<div class="alert alert-info" role="alert"> | ||
<h3 class="info">Note</h3> | ||
<p>Testing with a session cookie to emulate the experience of a logged-in user is extremely important, as the contrast between an anonymous user and a logged-in user may be drastically different.</p></div> | ||
|
||
## Performance Goals | ||
|
||
Response times vary from site to site depending on the size of your module stack, database queries, etc. Generally, anything under 1 second is considered excellent, but this is up to you. | ||
|
||
Emulating a logged-in user's experience with `ab` is a key metric, as it provides the number of pages per second your site can generate on Pantheon. This number may determine whether or not you need to add additional application containers. | ||
|
||
## Testing Tools | ||
|
||
There are a number of other tools to consider when you are planning your load testing strategy. This can vary by the need for detail, nature of your site, or requirements for quality analysis. | ||
|
||
<table class="table"> | ||
<tbody> | ||
<tr> | ||
<th>Testing Tool</th> | ||
<th>Documentation</th> | ||
<th>Acquisition</th> | ||
</tr> | ||
<tr> | ||
<td>Apache AB</td> | ||
<td><a href="http://httpd.apache.org/docs/2.2/programs/ab.html">Documentation</a></td> | ||
<td><a href="http://httpd.apache.org/download.cgi">Download</a></td> | ||
</tr> | ||
<tr class="tr_class1"> | ||
<td>JMeter</td> | ||
<td><a href="http://jmeter.apache.org/usermanual/index.html">Documentation</a></td> | ||
<td><a href="http://jmeter.apache.org/download_jmeter.cgi">Download</a></td> | ||
</tr> | ||
<tr> | ||
<td>The Grinder</td> | ||
<td><a href="http://grinder.sourceforge.net">Documentation</a></td> | ||
<td><a href="http://grinder.sourceforge.net/download.html">Download</a></td> | ||
</tr> | ||
<tr> | ||
<td>Blitz.io</td> | ||
<td><a href="http://blitz.io/docs/">Documentation</a></td> | ||
<td><a href="https://www.blitz.io/pricing#/subscriptions">Pricing</a></td> | ||
</tr> | ||
</tbody> | ||
</table> | ||
Load and performance tests are critical steps in go-live procedures, as they help expose and identify potential performance killers. These tests provide insight into how a site will perform in the wild under peak traffic spikes. | ||
|
||
## Load vs Performance Testing | ||
Before you start, it's important to understand the difference between load and performance testing and know when to use each. | ||
### Performance Testing | ||
Performance testing is the process in which you measure an application's response time to proactively expose bottlenecks. These tests should be regularly executed as part of routine maintenance. Additionally, you should run these tests before any load testing. If your application is not performing well, then you can be assured that the load test will not go well. | ||
|
||
The scope of performance tests should be limited to the application itself on a development environment (Dev or [Multidev](/docs/multidev)) without caching. This will give you an honest look into your application and show exactly how uncached requests will perform. You can bypass cache by [setting the `no-cache` HTTP headers](/docs/cache-control) in responses. | ||
Offer alternatives to bypass cache by setting a no-cache header? How about just disabling cache completely on Dev/Multidev during testing through Drupal/WordPress Admin UI?
My understanding is that the dev environment has a default time-to-live of zero for dev, which implies no caching, but that things like Pantheon Advanced Page Cache may override this to be non-zero value. While a no-cache header may help, this may depend on when this gets executed. Suggesting to disable caching via the UI is an option, with an emphasis to remember to re-enable prior to pushing to prod. |
||
|
||
### Load Testing | ||
Load testing is the process in which you apply requests to your site that will represent the most load that your site will face once it is live. This test will ensure that the site can withstand the peak traffic spikes after launch. This test should be done on the Live environment before the site has launched, after performance testing. | ||
|
||
If your site is already live, then you should run load tests on the Test environment. Keep in mind that the Test environment has one application container, while Live environments on sites with a service level of Business and above can have multiple application containers serving the site. Scale the traffic you send to Test in proportion to the number of containers currently serving your Live environment. | ||
Offer concrete example with math?
The EOM team is the best source for the algorithm we use. |
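One way to read "a proportionate amount of traffic" is to divide the expected Live load by the number of Live application containers. The sketch below assumes traffic is spread evenly across containers, which is a simplification and not necessarily the algorithm Pantheon uses; the function name `test_env_users` and the sample figures (500 users, 4 containers) are hypothetical.

```shell
# test_env_users: scale an expected Live load down to a single container,
# assuming an even split across containers (an assumption for illustration).
#   $1 = concurrent users expected on Live
#   $2 = number of application containers on Live
test_env_users() {
  live_users=$1
  live_containers=$2
  awk -v u="$live_users" -v c="$live_containers" 'BEGIN {
    if (c < 1) { print "containers must be >= 1" > "/dev/stderr"; exit 1 }
    n = u / c
    r = int(n)
    if (r < n) r++        # round up so the test does not under-load
    print r
  }'
}

test_env_users 500 4
# prints 125
```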
||
|
||
## Preparing for Tests | ||
The procedures for executing a load test and a performance test are similar: | ||
|
||
1. Enable [New Relic Pro](/docs/new-relic) within the Site Dashboard on Pantheon to ensure you have clear reporting to monitor response times. | ||
|
||
* Set your [Apdex](https://docs.newrelic.com/docs/apm/new-relic-apm/apdex/apdex-measuring-user-satisfaction#score) threshold according to your business rules (0.5 seconds is the default). Be careful not to set this too high, otherwise you will not get as many transaction traces in New Relic. | ||
* If you have particular transactions that you want to ensure are traced, set them up as [key transactions](https://docs.newrelic.com/docs/apm/transactions/key-transactions/key-transactions-tracking-important-transactions-or-events). | ||
|
||
2. Select a load testing tool: | ||
Seems like verifying Varnish is working is still important before doing a load test? Maybe this can be more concise?
Is this still the case now that Global CDN is in place? |
||
|
||
* SaaS Solutions | ||
* [Blazemeter](https://www.blazemeter.com) | ||
* [Load Impact](https://loadimpact.com) | ||
* Open Source tools | ||
* [JMeter](http://jmeter.apache.org) | ||
* [Locust](http://locust.io/) | ||
|
||
The Pantheon onboarding team uses Locust, an open source load testing tool. Locust makes it easy to build out test scripts, and it allows you to crawl the site instead of using predefined URLs. Crawling the site has the added benefit of loading every page that is linked to anywhere on the site. This exposes edge case performance bottlenecks that would have gone undetected under tests with predefined URLs. | ||
"makes it easy" -- link to example script?
The EOM team should be asked to update this section. |
||
|
||
Ultimately, it doesn't matter what tool you use as long as you test your site properly. Be sure to allow for any authenticated traffic as well as anonymous. | ||
"Be sure to allow for any authenticated traffic as well as anonymous" - Not sure we should just assert this in passing. Load testing authenticated users can be difficult.
I agree that authenticated user testing is a complex task and thus the generic statement should be along the lines of "It is important for Load Testing to test against the anticipated traffic patterns of the site, both in terms of traffic volume and authenticated/anonymous proportion. Note that testing authenticated workflows is considerably more complex requiring more time, skills and iterations."
I edited |
||
|
||
3. Determine how much load to apply. | ||
|
||
* **Performance Tests**: Smaller loads should suffice, as you should be able to see transactional bottlenecks with 10-20 concurrent users. | ||
Why 10-20? A single request can give you all you need, no?
IMO, you want to generate more than single request to tease out potential bottlenecks. Also, I know that we have a Quicksilver example that will use free loader.io account to automatically run this level of test on each push to Test environment. Not only does this result in automated testing procedures, it provides a standard profile that you can see in New Relic. Here's a related link, but we need better: pantheon-systems/quicksilver-examples#110
IMO, this is good to go (i.e. no edits needed). A separate issue should be created, if/when we want to include reference to the loader.io Quicksilver example. |
||
* **Load Tests**: Determine how many concurrent users the site is expected to serve based on historical analytics for the site. Identify the peak hourly sessions and the average session duration in minutes, then do some math: `hourly_sessions / (60 / average_duration) = Concurrent Users` | ||
@bentekwork How do I determine load to apply in the test after calculating concurrent users?
Let's reiterate difference between load test on Live vs non-live, and include app containers in calculation for scenario.
Load tests should not be run on Test, rather performance test can/should be run there. In terms of providing formulas, it is complicated by the fact that to run "proportionate amount of traffic" on Test involves knowing the number of appservers on Live, which clients can't determine on their own (other than asking Support, or looking at New Relic, which will include decommissioned appservers for some time). |
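The concurrent user formula in the Load Tests bullet can be worked through with a small shell function. This is a sketch only: `concurrent_users` is a hypothetical name, the average duration is assumed to be in minutes, and the sample figures (6000 peak hourly sessions, 5 minute sessions) are made up for illustration.

```shell
# concurrent_users: estimate concurrent users from analytics figures.
#   $1 = peak hourly sessions
#   $2 = average session duration in minutes (assumed unit)
# Implements hourly_sessions / (60 / average_duration), rearranged as
# hourly_sessions * average_duration / 60 and rounded up.
concurrent_users() {
  hourly_sessions=$1
  average_duration=$2
  awk -v s="$hourly_sessions" -v d="$average_duration" 'BEGIN {
    n = (s * d) / 60
    r = int(n)
    if (r < n) r++
    print r
  }'
}

concurrent_users 6000 5
# prints 500
```

With 6000 sessions in the peak hour and sessions lasting 5 minutes on average, each hour holds 12 five-minute windows, so roughly 6000 / 12 = 500 users are on the site at once.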
||
|
||
|
||
## Running the Tests | ||
If this is a performance test, be sure to run the test on a development environment (Dev or [Multidev](/docs/multidev)) without caching. Run load tests on the Live environment before launching the site. If the site is already launched, use the Test environment instead. | ||
<div markdown="1" class="alert alert-danger" role="alert"> | ||
### Warning {.info} | ||
We do not recommend load testing on the Live environment if the site has already launched because you risk overwhelming your live site and causing downtime. | ||
</div> | ||
Note the start time for the test. As the test executes, it's a good idea to keep a close eye on [log files](/docs/logs). Make note of any errors and warnings that pop up during the test so that you can fix them. | ||
|
||
Once the test is running, execute common tasks done by editors and administrators and note the time. Example tasks may include: | ||
|
||
* Clear the Drupal cache | ||
* Clear the edge cache (if this is a load test, performance tests should not be cached) | ||
* Run Drupal cron | ||
* Run any scripts that could be triggered while users are on the site. | ||
|
||
## Assess Results | ||
Now that the test is complete, examine the New Relic data. The **Overview** tab will give you an average response time for the duration of the test. Times above 750ms are good indicators of performance optimization opportunities. | ||
|
||
Next, review the **Transactions** tab in New Relic and sort by **Slowest average response time**. Click on the slowest transaction to pull up the transaction trace. Review the transaction trace to find the performance bottleneck. | ||
|
||
Finally, review the **Error analytics** tab in New Relic. PHP errors often indicate huge performance bottlenecks. If you have errors, fix them. | ||
|
||
### Calculating Load Capacity After Launch | ||
How can we highlight this scenario? And flesh it out with concrete example explaining how to collect RPM and response time from New Relic? |
||
After launch, you can establish a baseline that `X` response time will let you handle `Y` traffic. If `X` degrades in Dev/Test, that will impact how much traffic Live can handle. | ||
|
||
## See Also | ||
- [Going Live](/docs/going-live) | ||
|
||
* [Load Testing Drupal and WordPress with BlazeMeter](/docs/guides/load-testing-with-blazemeter/) |
Why should these tests be run regularly as part of routine maintenance? To ensure performance doesn't degrade with a code or configuration change?
In general, I'd favor suggesting that clients:
Added some of these notions.