Autoscaling is the process of dynamically allocating the resources required by an application to match performance requirements and satisfy service level agreements (SLAs) while minimizing runtime costs. As the volume of work grows, an application may require additional resources to enable it to perform its tasks in a timely manner. As demand slackens, resources can be de-allocated to minimize costs while still maintaining adequate performance and meeting SLAs. Autoscaling takes advantage of the elasticity of cloud-hosted environments while easing management overhead by reducing the need for an operator to continually monitor the performance of a system and make decisions about adding or removing resources.
Autoscaling applies to all of the resources used by an application, not just the compute resources. For example, if your system uses message queues to send and receive information, it could create additional queues as it scales.
Scaling typically takes one of two forms—vertical and horizontal scaling:
- Vertical Scaling (often referred to as scaling up and down) requires that you modify the hardware (expand or reduce its capacity and performance), or redeploy the solution using alternative hardware that has the appropriate capacity and performance. In a cloud environment, the hardware platform is typically a virtualized environment. Unless the original hardware was substantially overprovisioned, with the consequent upfront capital expense, vertically scaling up in this environment involves provisioning more powerful resources, and then moving the system onto these new resources. Vertical scaling is often a disruptive process that requires making the system temporarily unavailable while it is being redeployed. It may be possible to keep the original system running while the new hardware is provisioned and brought online, but there will likely be some interruption while the processing transitions from the old environment to the new one. It is uncommon to use autoscaling to implement a vertical scaling strategy.
- Horizontal Scaling (often referred to as scaling out and in) requires deploying the solution on additional or fewer resources, which are typically commodity resources rather than high-powered systems. The solution can continue running without interruption while these resources are provisioned. When the provisioning process is complete, copies of the elements that comprise the solution can be deployed on these additional resources and made available. If demand drops, the additional resources can be reclaimed after the elements using them have been shut down cleanly. Many cloud-based systems, including Microsoft Azure, support automation of this form of scaling.
Implementing an autoscaling strategy typically involves the following components and processes:
- Instrumentation and monitoring systems at the application, service, and infrastructure levels that capture key metrics such as response times, queue lengths, CPU utilization, and memory usage.
- Decision-making logic that evaluates the monitored scaling factors against predefined thresholds or schedules and decides whether to scale (a minimal sketch of this loop appears after this list).
- Components that are responsible for carrying out tasks associated with scaling the system, such as provisioning or de-provisioning resources.
- Testing, monitoring, and tuning of the autoscaling strategy to ensure that it functions as expected.
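To make these moving parts concrete, the following sketch shows a minimal monitor/decide/act loop in Python. The thresholds, the simulated metric reading, and the `set_instance_count` call are illustrative assumptions for this sketch; they do not correspond to any specific Azure API.

```python
import random
import time

# Illustrative thresholds and bounds; real values come from load testing and tuning.
SCALE_OUT_CPU = 75.0   # scale out when average CPU exceeds this (percent)
SCALE_IN_CPU = 25.0    # scale in when average CPU falls below this (percent)
MIN_INSTANCES = 2
MAX_INSTANCES = 10

instance_count = MIN_INSTANCES

def get_average_cpu() -> float:
    """Stand-in for real instrumentation; returns a simulated reading."""
    return random.uniform(0.0, 100.0)

def set_instance_count(count: int) -> None:
    """Stand-in for a real provisioning call (a management API in practice)."""
    global instance_count
    print(f"scaling from {instance_count} to {count} instances")
    instance_count = count

def autoscale_once() -> None:
    cpu = get_average_cpu()
    if cpu > SCALE_OUT_CPU and instance_count < MAX_INSTANCES:
        set_instance_count(instance_count + 1)   # demand high: add capacity
    elif cpu < SCALE_IN_CPU and instance_count > MIN_INSTANCES:
        set_instance_count(instance_count - 1)   # demand low: release capacity

for _ in range(10):    # a real autoscaler would loop indefinitely
    autoscale_once()
    time.sleep(1)      # a real evaluation interval would be minutes, not seconds
```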
Most cloud-based environments, such as Microsoft Azure, provide built-in autoscaling mechanisms that address common scenarios. If the environment or service you use does not provide the necessary automated scaling functionality, or if you have extreme autoscaling requirements beyond its capabilities, a custom implementation may be necessary to collect operational and system metrics, analyze them to identify relevant data, and then scale resources accordingly.
Autoscaling is not an instant solution. Simply adding resources to a system or running more instances of a process does not guarantee that the performance of the system will improve. Consider the following points when designing an autoscaling strategy:
- The system must be designed to be horizontally scalable. Avoid making assumptions about instance affinity; do not design solutions that require that the code is always running in a specific instance of a process. When scaling a cloud service or web site horizontally, do not assume that a series of requests from the same source will always be routed to the same instance. For the same reason, design services to be stateless to avoid requiring a series of requests from an application to always be routed to the same instance of a service. When designing a service that reads messages from a queue and processes them, do not make any assumptions about which instance of the service handles a specific message because autoscaling could start additional instances of a service as the queue length grows. The Competing Consumers pattern describes how to handle this scenario.
- If the solution implements a long-running task, design this task to support both scaling out and scaling in. Without due care, such a task could prevent an instance of a process from being shut down cleanly when the system scales in, or it could lose data if the process is forcibly terminated. Ideally, refactor a long-running task and break up the processing that it performs into smaller, discrete chunks. The Pipes and Filters pattern provides an example of how you can achieve this. Alternatively, you can implement a checkpoint mechanism that records state information about the task at regular intervals, and save this state in durable storage that can be accessed by any instance of the process running the task. In this way, if the process is shut down, the work that it was performing can be resumed from the last checkpoint by using another instance (see the checkpoint sketch after this list).
- When background tasks run on separate compute instances, such as in worker roles of a Cloud Services hosted application, you may need to scale different parts of the application using different scaling policies. For example, you may need to deploy additional UI compute instances without increasing the number of background compute instances, or the reverse. If you offer different levels of service (such as basic and premium service packages), you may need to scale out the compute resources for premium service packages more aggressively than those for basic service packages in order to meet SLAs.
- Consider using the length of the queue over which UI and background compute instances communicate as a driver for your autoscaling strategy. It is the best indicator of an imbalance between the current load and the processing capacity of the background task (see the queue-length sketch after this list).
- If you base your autoscaling strategy on counters that measure business processes, such as the number of orders placed per hour or the average execution time of a complex transaction, ensure that you fully understand the relationship between the results from these types of counters and the actual compute capacity requirements. It may be necessary to scale more than one component or compute unit in response to changes in business process counters.
- To prevent a system from attempting to scale out excessively, and to avoid the costs associated with running many thousands of instances, consider limiting the maximum number of instances that can be automatically added. Most autoscaling mechanisms allow you to specify the minimum and maximum number of instances for a rule. In addition, consider gracefully degrading the functionality that the system provides if the maximum number of instances has been deployed and the system is still overloaded.
- Keep in mind that autoscaling might not be the most appropriate mechanism to handle a sudden burst in workload. It takes time to provision and start new instances of a service or add resources to a system, and the peak may have passed by the time these additional resources have been made available. In this scenario, it may be better to throttle the service. For more information, see the Throttling pattern.
- Conversely, if you do need the capacity to process all requests when the volume fluctuates rapidly, and cost is not a major contributing factor, consider using an aggressive autoscaling strategy that starts additional instances more quickly, or a scheduled policy that starts a sufficient number of instances to meet the maximum load before that load is expected.
- The autoscaling mechanism should monitor the autoscaling process and log the details of each autoscaling event (what triggered it, what resources were added or removed, and when). If you create a custom autoscaling mechanism, ensure that it incorporates this capability. The information can be analyzed to help measure the effectiveness of the autoscaling strategy, and tune it if necessary—both in the short term as the usage patterns become more obvious, and over the long term as the business expands or the requirements of the application evolve. If an application reaches the upper limit defined for autoscaling, the mechanism might also alert an operator who could manually start additional resources if the situation warrants this. Note that, under these circumstances, the operator may also be responsible for manually removing these resources after the workload eases.
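The checkpoint mechanism described in the long-running task item above can be illustrated with a short sketch. This is a minimal example: a local file stands in for the durable, shared store, and the `process` function is a hypothetical unit of work.

```python
import json
import os

CHECKPOINT_FILE = "task_checkpoint.json"  # stand-in for shared durable storage

def load_checkpoint() -> int:
    """Return the index of the next unprocessed chunk (0 if starting fresh)."""
    if os.path.exists(CHECKPOINT_FILE):
        with open(CHECKPOINT_FILE) as f:
            return json.load(f)["next_chunk"]
    return 0

def save_checkpoint(next_chunk: int) -> None:
    """Record progress so another instance can resume if this one is stopped."""
    with open(CHECKPOINT_FILE, "w") as f:
        json.dump({"next_chunk": next_chunk}, f)

def process(chunk: int) -> None:
    """Hypothetical unit of work; in practice one discrete step of the task."""
    print(f"processing chunk {chunk}")

def run_long_task(total_chunks: int) -> None:
    # Resume from wherever the previous instance left off.
    for chunk in range(load_checkpoint(), total_chunks):
        process(chunk)
        save_checkpoint(chunk + 1)   # checkpoint after each completed chunk

run_long_task(total_chunks=100)
```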
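The queue-length driver mentioned above reduces to a simple calculation: compare the backlog with the throughput a single instance can sustain. The target figure and the instance bounds below are assumptions for illustration.

```python
import math

TARGET_MESSAGES_PER_INSTANCE = 500   # assumed sustainable backlog per instance
MIN_INSTANCES = 1
MAX_INSTANCES = 20

def desired_instances(queue_length: int) -> int:
    """Derive an instance count from the current backlog."""
    wanted = math.ceil(queue_length / TARGET_MESSAGES_PER_INSTANCE)
    return max(MIN_INSTANCES, min(MAX_INSTANCES, wanted))

# A backlog of 2,600 messages calls for six instances; a huge backlog is capped.
print(desired_instances(2600))    # -> 6
print(desired_instances(50000))   # -> 20 (clamped to MAX_INSTANCES)
```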
There are several options for configuring autoscaling for your Azure solutions:
- Azure Autoscaling. This feature supports the most common scaling scenarios based on a schedule and, optionally, triggered scaling operations based on runtime metrics (such as processor utilization, queue length, or built-in and custom counters). You can configure simple autoscaling policies for a solution quickly and easily by using the Azure Management Portal, and you can use the Azure Monitoring Services Management Library to configure autoscaling rules with a finer degree of control. For more information, see the section The Azure Monitoring Services Management Library.
- A custom solution based on the diagnostics, monitoring, and service management features of Azure. For example, you could use Azure diagnostics, custom code, or the System Center Management Pack for Windows Azure to continually monitor the performance of the application; and the Azure Service Management REST API, the Microsoft Azure Management Libraries, or the Autoscaling Application Block to scale out and in. The metrics for triggering a scaling operation can be any built-in or custom counter, or other instrumentation you implement within the application. However, a custom solution is not simple to implement, and should be considered only if none of the previous approaches can fulfil your requirements. Note that the Autoscaling Application Block is an open-source framework, and is not supported directly by Microsoft.
- Third-party services such as Paraleap AzureWatch that enable you to scale a solution based on schedules, service load and system performance indicators, custom rules, and combinations of different types of rules.
When choosing which autoscaling solution to adopt, consider the following points:
- Use the built-in autoscaling features of the platform, if they can meet your requirements. If not, carefully consider whether you really do need more complex scaling features. Examples of requirements beyond those the built-in autoscaling capability offers include finer-grained control, different ways to detect trigger events for scaling, scaling across subscriptions, and scaling other types of resources.
- Consider whether you can predict the load on the application with sufficient accuracy to depend only on scheduled autoscaling (adding and removing instances to meet anticipated peaks in demand). Where this is not possible, use reactive autoscaling based on metrics collected at runtime to allow the application to handle unpredictable changes in demand. In practice, it is often appropriate to combine these approaches. For example, create a strategy that adds resources such as compute, storage, and queues based on a schedule of the times when you know the application is most busy. This helps to ensure that capacity is available when required without the delay encountered when starting new instances. In addition, for each scheduled rule, define metrics that allow reactive autoscaling during that period to ensure that the application can handle sustained but unpredictable peaks in demand (a sketch of such a combined strategy follows this list).
- It is often difficult to understand the relationship between metrics and capacity requirements, especially when an application is initially deployed. Prefer to provision a little extra capacity at the beginning, and then monitor and tune the autoscaling rules to bring the capacity closer to the actual load.
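As a sketch of the combined approach described above, an autoscaling strategy can be expressed as scheduled periods, each with its own baseline instance count, overlaid with reactive rules that apply within the period. The periods, thresholds, and rule logic here are illustrative assumptions, not an Azure configuration format.

```python
from datetime import datetime, time

# Illustrative combined strategy: scheduled baselines plus a reactive rule.
PERIODS = [
    # (name, weekdays as weekday() values, start, end, baseline instances)
    ("business-hours", {0, 1, 2, 3, 4}, time(8, 0), time(18, 0), 6),
    ("off-peak", {0, 1, 2, 3, 4, 5, 6}, time(0, 0), time(23, 59), 2),
]

def scheduled_baseline(now: datetime) -> int:
    """Return the baseline instance count for the first matching period."""
    for _, days, start, end, baseline in PERIODS:
        if now.weekday() in days and start <= now.time() <= end:
            return baseline
    return 1

def desired_instances(now: datetime, cpu_percent: float) -> int:
    """Scheduled baseline, nudged upward by a reactive rule within the period."""
    baseline = scheduled_baseline(now)
    if cpu_percent > 70:    # reactive rule: sustained load above the baseline
        return baseline + 2
    return baseline

# Tuesday 10:00 with high CPU: business-hours baseline of 6 plus a reactive +2.
print(desired_instances(datetime(2014, 4, 1, 10, 0), cpu_percent=85.0))  # -> 8
```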
Azure Autoscaling enables you to configure scale out and scale in options for a solution. Azure Autoscaling can automatically add and remove instances of Azure Cloud Services web and worker roles, Azure Mobile Services, and Azure Web Sites applications. It can also enable automatic scaling by starting and stopping instances of Azure Virtual Machines. An Azure autoscaling strategy comprises two sets of factors:
- Schedule-based autoscaling that can ensure additional instances are available to coincide with an expected peak in usage, and can scale in once the peak time has passed. This enables you to ensure that you have sufficient instances already running without waiting for the system to react to the load.
- Metrics-based autoscaling that reacts to factors such as average CPU utilization over the last hour, or the backlog of messages that the solution is processing in an Azure storage or Service Bus queue. This allows the application to react separately from the scheduled autoscaling rules to accommodate unplanned or unforeseen changes in demand.
Consider the following points when using Azure Autoscaling:
- Your autoscaling strategy can combine scheduled and metrics-based scaling. You can specify both types of rules for a service, so that an application scales on a schedule and also in response to changes in load.
- You should configure the Azure Autoscaling rules and then monitor the performance of your application over time. Use the results of this monitoring to adjust the way in which the system scales if necessary. However, keep in mind that autoscaling is not an instantaneous process—it takes time to react to a metric such as average CPU utilization exceeding (or falling below) a specified threshold.
- Autoscaling rules that use a detection mechanism based on a measured trigger attribute (such as CPU usage or queue length) use an aggregated value over time, rather than instantaneous values, to trigger an autoscaling action. By default, the aggregate is an average of the values. This prevents the system from reacting too quickly, or causing rapid oscillation (the aggregation sketch after this list demonstrates the smoothing effect). It also allows time for new instances that are auto-started to settle into running mode, preventing additional autoscaling actions from occurring while the new instances are starting up. For Cloud Services and Virtual Machines, the default period for the aggregation is 45 minutes, so it can take up to this period of time for the metric to trigger autoscaling in response to spikes in demand. You can change the aggregation period by using the SDK, but be aware that periods of less than 25 minutes may cause unpredictable results (see Auto Scaling Cloud Services on CPU Percentage with the Windows Azure Monitoring Services Management Library for more information). For Azure Web Sites, the averaging period is much shorter, allowing new instances to be available in around five minutes after a change to the average trigger measure.
- If you configure autoscaling using the SDK rather than the web portal, you can specify a more detailed schedule during which the rules are active. You can also create your own metrics and use them with or without any of the existing ones in your autoscaling rules. For example, you may wish to use alternative counters such as the number of requests per second or the average memory availability, or use custom counters that measure specific business processes. For more information, see the section The Azure Monitoring Services Management Library.
- When autoscaling Azure Virtual Machines, you must deploy a number of instances of the virtual machine that is equal to the maximum number you will allow autoscaling to start. These instances must be part of the same Availability Set. The Virtual Machines autoscaling mechanism does not create or delete instances of the virtual machine; instead, the autoscaling rules you configure will start and stop an appropriate number of these instances. For more information, see Automatically scale an application running Web Roles, Worker Roles, or Virtual Machines.
- If new instances cannot be started, perhaps because the maximum for a subscription has been reached (such as the maximum number of cores when using the Virtual Machines service) or an error occurs during startup, the portal may show that an autoscaling operation succeeded. However, subsequent ChangeDeploymentConfiguration events displayed in the portal will show only that a service startup was requested, and there will be no event to indicate it was successfully completed.
- In Azure Autoscaling, you can use the web portal UI to link resources such as SQL Database instances and queues to a compute service instance. This allows you to more easily access the separate manual and automatic scaling configuration options for each of the linked resources. For more information, see How to: Link a resource to a cloud service in the page How to Manage Cloud Services and the page How to Scale an Application.
- When you configure multiple policies and rules, there is a possibility that they could conflict with each other. Azure Autoscaling uses the following conflict resolution rules to ensure that there is always a sufficient number of instances running (a short sketch of these rules follows the list):
- Scale out operations always take precedence over scale in operations.
- When scale out operations conflict, the rule that initiates the largest increase in the number of instances takes precedence.
- When scale in operations conflict, the rule that initiates the smallest decrease in the number of instances takes precedence.
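These precedence rules can be written down directly. The sketch below assumes that each rule that fires proposes a signed change to the instance count: positive values for scale out, negative values for scale in.

```python
def resolve(proposed_changes: list[int]) -> int:
    """Pick the winning change using the precedence rules above."""
    scale_outs = [c for c in proposed_changes if c > 0]
    scale_ins = [c for c in proposed_changes if c < 0]
    if scale_outs:
        return max(scale_outs)   # scale out wins; largest increase takes precedence
    if scale_ins:
        return max(scale_ins)    # smallest decrease, i.e. closest to zero
    return 0                     # no rule fired

print(resolve([+2, -3, +5]))   # -> +5 (scale out beats scale in; largest increase)
print(resolve([-1, -4]))       # -> -1 (smallest decrease)
```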
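The smoothing effect of aggregation is also easy to demonstrate: a brief spike that would breach an instantaneous threshold disappears once readings are averaged over the evaluation window. The sample values below are invented for illustration.

```python
# Simulated CPU samples (percent), one every 5 minutes: a short spike mid-window.
samples = [30, 32, 31, 29, 95, 33, 30, 28, 31]

THRESHOLD = 80   # an instantaneous rule would fire on the single 95% sample

window_average = sum(samples) / len(samples)
print(f"average over the window: {window_average:.1f}%")   # ~37.7%, below threshold
print("scale out?", window_average > THRESHOLD)             # False: spike smoothed away
```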
You can use the Service Management API to configure Azure Autoscaling with a finer degree of control and to access capabilities that are not available through the web portal. This API is accessed directly as a REST Web API, or through the Azure Monitoring Services Management Library.
Azure Autoscaling is configured by specifying autoscaling profiles for Cloud Services roles, Virtual Machines availability sets, Azure Web Sites (as server farms in a webspace), or Azure Mobile Services. Each profile, of which a target can have up to 20, specifies the following (an illustrative sketch follows the list):
- When it is to be applied (using a recurrence or a fixed date interval)
- The permitted number of instances (the minimum, maximum, and default number)
- Which autoscaling rules are in effect
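For illustration only, a profile carries roughly the following information. The field names below are invented for this sketch and do not reproduce the Service Management API's actual wire format.

```python
# Illustrative only: the kind of information an autoscaling profile carries.
recurrence_profile = {
    "name": "weekday-daytime",
    "recurrence": {"days": ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
                   "start_hour": 8, "end_hour": 18},
    "instances": {"minimum": 4, "maximum": 12, "default": 6},
    "rules": ["cpu-scale-out", "cpu-scale-in"],      # references to named rules
}

fixed_date_profile = {
    "name": "product-launch",
    "fixed_date": {"start": "2014-06-01T00:00:00Z", "end": "2014-06-03T00:00:00Z"},
    "instances": {"minimum": 10, "maximum": 20, "default": 12},
    "rules": ["cpu-scale-out", "queue-scale-out"],
}
```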
The web portal allows for the configuration of a fixed set of profiles, essentially distinguishing day/night and weekday/weekend profiles, with a single pair of scale rules based on CPU utilization or queue length. By using the Service Management API instead you can configure finer-grained applicability dates for profiles, and specify up to ten rules with triggers based on any metric available to the Azure Monitoring Service.
Autoscaling rules are composed of a trigger that indicates when a rule applies, and a scale action that indicates the change to perform on the configuration of the target. At the time of writing, the only supported action was an increase or decrease in the instance count.
The triggers for autoscaling rules are based on available metrics. Values for the configured metrics are sampled periodically from the appropriate sources, as defined in the autoscaling configuration. When each rule from an active profile is evaluated, the values of the metric specified on the trigger are aggregated in time and across instances (if appropriate), and this aggregate is compared against a threshold to indicate whether the rule applies. Valid aggregates over time are average (the default), minimum, maximum, last, total, and count. Valid aggregates over instances are average (the default), maximum, and minimum.
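A trigger evaluation along these lines can be sketched as two aggregation steps followed by a comparison: aggregate each instance's samples over time, aggregate those results across instances, and test the outcome against the threshold. The sample readings below are invented.

```python
from statistics import mean

# Samples per instance over the evaluation window (for example, CPU percent).
samples_by_instance = {
    "instance-0": [62, 71, 80, 77],
    "instance-1": [55, 60, 58, 65],
}

def evaluate_trigger(samples_by_instance, threshold,
                     time_agg=mean, instance_agg=mean, op=lambda v, t: v > t):
    """Aggregate over time per instance, then across instances, then compare."""
    per_instance = [time_agg(samples) for samples in samples_by_instance.values()]
    aggregate = instance_agg(per_instance)
    return op(aggregate, threshold), aggregate

fires, value = evaluate_trigger(samples_by_instance, threshold=65)
print(f"aggregate={value:.1f}, rule fires: {fires}")   # aggregate=66.0, fires: True
```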
The metrics available for triggers are Azure storage and Service Bus queue lengths, the standard performance counters published by Windows Azure Diagnostics, and any custom performance counter published by each role or virtual machine. In a Cloud Services solution, when dealing with performance counters other than those available by default, you must change the monitoring level setting in the UI from “Minimum” to “Verbose” for the service.
For more information, see:
- Monitoring SDK Class Library
- How to Configure Performance Counters
- Operations on Autoscaling
- Add Autoscale Settings
- Auto Scaling Cloud Services on CPU Percentage with the Windows Azure Monitoring Services Management Library
- How to use Windows Azure Monitoring Services Management Library to create an Autoscale Rule
The following patterns and guidance may also be relevant to your scenario when implementing autoscaling:
- Throttling Pattern. This pattern describes how an application can continue to function and meet service level agreements when an increase in demand places an extreme load on resources. Throttling can be used with autoscaling to prevent a system from being overwhelmed while the system scales out.
- Competing Consumers Pattern. This pattern describes how to implement a pool of service instances that can handle messages from any application instance. Autoscaling can be used to start and stop service instances to match the anticipated workload. This approach enables a system to process multiple messages concurrently to optimize throughput, improve scalability and availability, and balance the workload.
- Instrumentation and Telemetry Guidance. Instrumentation and telemetry are vital for gathering the information that can drive the autoscaling process.
# More information
- How to Scale an Application
- Automatically scale an application running Web Roles, Worker Roles, or Virtual Machines
- How to: Link a resource to a cloud service
- Scale linked resources
- Azure Monitoring Services Management Library
- Azure Service Management REST API
- Operations on Autoscaling
- Microsoft.WindowsAzure.Management.Monitoring.Autoscale Namespace
- The Autoscaling Application Block documentation and key scenarios on MSDN.