Azure Cosmos DB auto scaling


Cosmos DB is Microsoft’s multi-model database service, aimed at high availability and low latency across Azure regions.

It relies heavily on partitioning and geo-replication to provide high throughput at virtually any location.

One feature that is present in a lot of key Azure services but hasn’t been available in Cosmos DB is automatic scaling. With auto scaling, your provisioned maximum throughput increases or decreases, based on demand.

In this post, I’ll illustrate how you can monitor for throttling issues, and what the options are for achieving auto scaling.

Tip: check out this course to learn more about Cosmos DB.

Request Units and throttling

In Cosmos DB, throughput is measured in Request Units (RUs) per second. A Request Unit abstracts the combined system resources (CPU, memory and IOPS) an operation consumes; one RU is roughly the cost of a point read of a 1 KB item.

Throttling occurs when your requests exceed the provisioned RUs. Cosmos DB then returns an HTTP status 429 (Too Many Requests) response, indicating you should back off and retry the operation.

For the complete list of possible HTTP status codes and how to handle them, see here.

Per database or per container?

Cosmos DB supports 2 models for provisioning throughput: per database and per container.

When configured on database level, throughput is shared amongst all containers in the database. This is the easiest way to set a limit over a lot of containers. It can also handle spikes in a single container by allocating more of the available resources for it.

Setting limits per container is the way to go if you have predictable demands. You can specify a low (cheaper) throughput on less used containers, and higher throughput for containers that are very busy or need to meet a certain SLA.

A little throttling is not a problem

As mentioned, when your application hits the throughput limit, resulting in a 429 HTTP status, it should back off and try again. Chances are that after a few retries, the service is available again, and your query or update will still succeed, transparent to users or application logic depending on it.
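
To make "back off and try again" concrete, here is a minimal sketch of a retry loop with exponential back-off. It is not how the SDK implements its retries; `ThrottledException` and `ThrottleRetry` are hypothetical names used only for illustration, and the default delays are assumptions. Cosmos DB does send a retry hint with a 429 (the `x-ms-retry-after-ms` header), which is what the `RetryAfter` property stands in for.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical exception standing in for whatever your client throws on a 429.
public sealed class ThrottledException : Exception
{
    // Optional server hint for how long to wait before retrying.
    public TimeSpan? RetryAfter { get; }

    public ThrottledException(TimeSpan? retryAfter = null)
        : base("429 Too Many Requests") => RetryAfter = retryAfter;
}

public static class ThrottleRetry
{
    // Runs the operation and retries on throttling, up to maxRetries times.
    // Once the retries are used up, the last ThrottledException propagates.
    public static async Task<T> ExecuteAsync<T>(
        Func<Task<T>> operation, int maxRetries = 5, int baseDelayMs = 100)
    {
        for (var attempt = 0; ; attempt++)
        {
            try
            {
                return await operation();
            }
            catch (ThrottledException ex) when (attempt < maxRetries)
            {
                // Prefer the server's hint; otherwise back off exponentially
                // (100 ms, 200 ms, 400 ms, ... with the default base delay).
                var delay = ex.RetryAfter
                    ?? TimeSpan.FromMilliseconds(baseDelayMs * Math.Pow(2, attempt));
                await Task.Delay(delay);
            }
        }
    }
}
```

A caller would wrap a query or update in `ThrottleRetry.ExecuteAsync(...)`, so the occasional 429 stays invisible to the rest of the application.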

The .NET SDK for SQL API supports automatic retries by default, but you can tweak it with the RetryOptions property. For the .NET Standard SDK for SQL API, see the MaxRetryAttemptsOnRateLimitedRequests property.

Monitor for throttling

An easy way to get notified when throttling occurs is to use an Azure Monitor alert rule. If you’re not familiar with creating alert rules, see here.

To set up the alert, add a new rule, and under RESOURCE, choose your Cosmos DB account. Next, under CONDITION, add a new condition. In the condition properties window, search for the Total Requests signal:

After selecting the signal, under dimensions, select your database and optionally a specific container to monitor. Make sure to set the filter for the 429 HTTP status code and specify a threshold count and time window:

You’ll notice that I specified a threshold count of 5, instead of 1, because I don’t want to get notified about every occurrence. As explained, a little throttling is no big deal, if you’re using some retry logic.

Not directly related to throttling, but also good to monitor is the Service Availability signal. For example, to get notified when availability drops below 100%, configure the following rule:

Another thing I want to point out here is that Azure Monitor also has a REST API.

Query for throttling

There is another way to get information about your Cosmos DB usage: by collecting diagnostics and sending them to, for example, a Log Analytics Workspace. If you don’t know how to create one, see here. Note that there are costs involved.

It takes a little more setting up than using Azure Monitor, but it also provides a lot more details.

To be able to query the collected data, first create the Log Analytics Workspace. Next, go to your Cosmos DB account and select the Diagnostic Settings from the left pane. Add a diagnostic setting, point it to the workspace, and select the kinds of data you want to collect:

After a few minutes the first data should be available. Select Logs in the left pane and select the Query editor. Next, enter a Kusto query to view the available metrics from the AzureMetrics collection. You can see there are a bunch of interesting ones (highlighted):

Depending on your selection in the diagnostic setting you created, there might be more collections available in the left pane of the query editor. You can query any collection by just double-clicking it.

To execute a query from code, you can use the Log Analytics REST API, as described here.

Instead of (or together with) collecting the data in a Log Analytics Workspace, you can choose to store it in a Blob storage account:

You can then read the data from the desired blob container.

Auto scaling option 1: DIY

A neat feature of Azure Monitor alert rules is that you can use an action group to target an HTTP triggered Azure Function. This means the function logic is invoked whenever the alert is set off.

Combine this with the possibility of changing the Cosmos DB throughput limit programmatically, and we can figure out some ways to achieve auto scaling.

One option is to expose a couple of lightweight Azure functions to do the work, as illustrated here:

In C#, using the .NET Standard SDK, changing the maximum number of RUs takes just a few lines of code:

using (var client = new Microsoft.Azure.Cosmos.CosmosClient("connection string here..."))
{
  var database = client.GetDatabase("mydatabase");

  // get current throughput (null if no throughput is provisioned on database level)
  var currentThroughput = await database.ReadThroughputAsync();

  // increase throughput
  //   must set throughput in increments of 100, with minimum of 400
  var newThroughput = (currentThroughput ?? 400) + 100;

  var throughputResponse = await database.ReplaceThroughputAsync(newThroughput);

  if (throughputResponse.StatusCode != System.Net.HttpStatusCode.OK)
  {
    // something went wrong...
  }
}
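
The rule from the comment in the snippet above (increments of 100, minimum of 400) can be captured in a small helper, so a scaling function never requests an invalid value. This is a sketch only: `ThroughputMath`, `NextThroughput` and the `maxRus` budget cap are hypothetical names and parameters, not part of the SDK.

```csharp
using System;

public static class ThroughputMath
{
    public const int MinimumRus = 400;   // lowest throughput that can be set
    public const int IncrementRus = 100; // throughput must be a multiple of this

    // Returns current + step, rounded up to a multiple of 100 and clamped
    // between 400 and maxRus (a hypothetical cap to keep costs bounded).
    // A negative step scales down. Throws if maxRus is below 400.
    public static int NextThroughput(int? current, int step, int maxRus)
    {
        var target = (current ?? MinimumRus) + step;
        var rounded = (int)Math.Ceiling(target / (double)IncrementRus) * IncrementRus;
        return Math.Clamp(rounded, MinimumRus, maxRus);
    }
}
```

The result can be passed straight to `ReplaceThroughputAsync`. The cap matters in practice: without it, a persistent alert could keep stepping the throughput (and the bill) up indefinitely.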

Another option is to query the Cosmos DB status from Log Analytics or Blob storage (as described in a previous paragraph) every few minutes and change the throughput accordingly.
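
For this polling approach, the scaling decision itself could look something like the sketch below. The inputs are assumed to come from whatever you query (request counts, 429 counts and consumed RUs over the last interval); the type names and both thresholds are illustrative assumptions, not recommendations.

```csharp
public enum ScaleAction { None, Up, Down }

public static class ScalingPolicy
{
    // Illustrative thresholds; tune them for your own workload.
    public const double ThrottledRatioToScaleUp = 0.01; // more than 1% of requests got a 429
    public const double UtilizationToScaleDown = 0.30;  // less than 30% of provisioned RUs used

    public static ScaleAction Decide(
        long totalRequests, long throttledRequests,
        double consumedRusPerSecond, int provisionedRus)
    {
        // Scale up when a noticeable share of requests is being throttled.
        if (totalRequests > 0 &&
            (double)throttledRequests / totalRequests > ThrottledRatioToScaleUp)
        {
            return ScaleAction.Up;
        }

        // Scale down when most of the provisioned throughput sits idle.
        if (consumedRusPerSecond / provisionedRus < UtilizationToScaleDown)
        {
            return ScaleAction.Down;
        }

        return ScaleAction.None;
    }
}
```

Keeping a gap between the scale-up and scale-down conditions helps avoid flapping, where the throughput is raised and lowered again on every polling cycle.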

Predictably, there have been some attempts to implement a custom solution, some similar to the one I described:

Auto scaling option 2: Autopilot

As you might imagine, auto scale for Cosmos DB has been an often requested feature in the Azure community.

Microsoft listened and announced Autopilot for Cosmos DB in November 2019. With Autopilot, which is still in preview at the time of this writing, RUs are automatically scaled based on usage:

While this may seem ideal, there are some drawbacks. One is that it’s only available for new databases and containers. Another is that, if you specify database level throughput (as opposed to container level), the maximum number of containers in the database is limited by a calculated factor. For example, 20,000 Autopilot RUs allow only up to 20 containers.

Also keep in mind that the maximum throughput is directly related to the storage limit. Currently, these limits are:

Maximum RUs | Maximum GBs

Then there is pricing. As it stands, throughput provisioned with Autopilot seems to be 1.5 times as expensive as regular provisioned throughput. Matt Collinge did some math on this.

As I see it, one major advantage of Autopilot is that it can prevent over-provisioning, which happens when you configure a higher throughput than needed, resulting in unnecessary costs. Also, for extremely bursty or unpredictable loads, Autopilot might be more cost efficient than rolling your own scaling solution.
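
A rough way to reason about that trade-off is to compare both options in RU-hours. The sketch below rests on two assumptions: the 1.5 price factor mentioned above, and Autopilot billing each hour for the highest RU/s it scaled to in that hour. `AutopilotCost` and `RelativeCost` are hypothetical names, and actual pricing may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AutopilotCost
{
    // Assumed price factor of Autopilot RUs versus regular provisioned RUs.
    public const double AutopilotMultiplier = 1.5;

    // Returns the Autopilot cost divided by the cost of provisioning manualRus
    // for the same hours. Below 1.0 means Autopilot is the cheaper option.
    public static double RelativeCost(IReadOnlyCollection<int> hourlyPeakRus, int manualRus)
    {
        if (hourlyPeakRus.Count == 0)
        {
            throw new ArgumentException("At least one hour is required.", nameof(hourlyPeakRus));
        }

        var autopilot = AutopilotMultiplier * hourlyPeakRus.Sum(r => (double)r);
        var manual = (double)manualRus * hourlyPeakRus.Count;
        return autopilot / manual;
    }
}
```

Under these assumptions, a day with 2 hours at 4,000 RU/s and 22 hours at 400 RU/s costs about 26% of keeping 4,000 RU/s provisioned all day, while a steady 4,000 RU/s load costs 150%. The break-even lies where the average hourly peak is two thirds of the manually provisioned level.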

So, it really is a tradeoff: do you want the ease of flipping a switch and maybe paying more, or do you want to implement and maintain a possibly cheaper, custom scaling solution?

Also note that limits and pricing might change once the feature becomes generally available.


To recap, here are some best practices:

  • If you have predictable demands on your containers, configure specific throughputs for each one. Otherwise consider using a shared throughput on database level.
  • If you have data bursts at regular times, consider using scheduled code to upscale and downscale throughput around those times.
  • With unpredictable bursts, you can:
    • Trigger code to upscale or downscale by using alert rules.
    • Use code to query for load status from a scheduled interval timer and upscale or downscale as needed.
    • Find a library or SDK that does the work for you.
    • Use the Autopilot preview feature. This is much easier to implement but might be more costly.
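
For the scheduled variant in the list above, the "which throughput applies right now" part could be as small as this sketch. The class name, the window and the RU values are all hypothetical; the result would be fed to a call such as `ReplaceThroughputAsync` from a timer-triggered function.

```csharp
using System;

public static class ScheduledThroughput
{
    // Returns burstRus while timeOfDay falls inside [windowStart, windowEnd),
    // and baselineRus otherwise. Windows that wrap past midnight
    // (for example 23:00 to 01:00) are supported.
    public static int ForTime(
        TimeSpan timeOfDay, TimeSpan windowStart, TimeSpan windowEnd,
        int burstRus, int baselineRus)
    {
        var inWindow = windowStart <= windowEnd
            ? timeOfDay >= windowStart && timeOfDay < windowEnd
            : timeOfDay >= windowStart || timeOfDay < windowEnd;

        return inWindow ? burstRus : baselineRus;
    }
}
```

For example, `ScheduledThroughput.ForTime(DateTime.UtcNow.TimeOfDay, TimeSpan.FromHours(1), TimeSpan.FromHours(3), 2000, 400)` yields 2000 during a nightly import between 01:00 and 03:00 UTC, and 400 for the rest of the day.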

Finally, some more guidance on performance and optimization:

Thanks for reading!
