Cloud Design Patterns: Retry

June 07, 2016

The Problem

Cloud applications depend on many interconnected services and resources. These services are generally reliable, but transient failures are simply a reality of the Internet. While providers like Azure and AWS do their best to keep their services available, they can’t eliminate the risk entirely. Not accounting for this in the design phase can cause breakdowns and a bad user experience. Fortunately, transient failures usually rectify themselves within a few seconds, and there is a simple pattern we can use to reduce the number of breakdowns they cause.

The Solution - Retry Pattern

The idea behind the retry pattern is simple: it’s better to retry a failed connection than it is to just immediately fail. If the connection problem resolves itself after two or three retries then you don’t have to worry about it. You should implement retries any time your application communicates with a remote service such as a database or blob storage.

Many wrapper libraries around remote services already have retry capabilities built in. Some have a retry policy enabled by default (such as the Azure Storage Client Libraries). Others have retry policies that can be enabled with just a few lines of code (such as Entity Framework). The specifics of how to configure retries in these libraries are beyond the scope of this article. If you do have questions about a specific library, let me know in the comments. If you aren’t using a library with retry capabilities then you’ll have to handle it yourself. Don’t worry; it’s easy.

Implementing Retry Techniques

There are two primary techniques to implement retries in your application: linear and exponential backoff.

The Linear Technique

Linear retries are simple. A linear retry algorithm waits a fixed interval (anywhere from a few milliseconds to a few seconds) before attempting the connection again. It also usually caps the number of attempts before rethrowing the exception. The linear technique is ideal for operations that affect user experience. If a connection doesn’t succeed within a few attempts then we want to fail quickly. Users would much rather see an error after a few seconds than stare at a loading screen for several minutes.

Visualization of a linear retry

Notice the interval x between retries stays constant.

Here’s an example of using the linear retry technique to connect to a remote service. We make the remote call inside a for loop. As soon as the connection succeeds we break out of the loop and continue on. If the connection fails we wait two seconds and try again, up to five attempts in total. If the fifth attempt also fails then we rethrow the exception to be handled elsewhere.

// Requires: using System.Threading;
var resource = new MyRemoteResource();
int retryInterval = 2000; // 2 seconds
int maxAttempts = 5;

// Try to connect up to 5 times, waiting 2 seconds after each failed attempt.
// If the final attempt fails, rethrow the exception to the caller.
for (int i = 1; i <= maxAttempts; i++)
{
    try
    {
        resource.Connect();
        break;
    }
    catch
    {
        if (i == maxAttempts)
            throw;
        Thread.Sleep(retryInterval);
    }
}

It’s important to carefully consider the wait interval between retries. If you make the interval too small then you may cause more harm than good. A service that is trying to recover won’t react well to being bombarded with too many new connection attempts.

The Exponential Backoff Technique

In an exponential backoff, the interval between retries grows by a constant factor (typically doubling) after each failed attempt. This gives the service more time to recover between connection attempts. A lot of libraries use exponential backoff as their default retry policy.

Visualization of an exponential backoff retry

Notice the interval x between retries doubles every time.

Here’s an example of implementing an exponential backoff retry. It’s similar to the linear example above except the interval between retries doubles each time the connection fails.

// Requires: using System.Threading;
var resource = new MyRemoteResource();
int initialRetryInterval = 2000; // 2 seconds
int maxAttempts = 5;
int currentRetryInterval = initialRetryInterval;

// Try to connect up to 5 times. The wait after each failed attempt doubles
// (2 s, 4 s, 8 s, 16 s). If the final attempt fails, rethrow the exception.
for (int i = 1; i <= maxAttempts; i++)
{
    try
    {
        resource.Connect();
        break;
    }
    catch
    {
        if (i == maxAttempts)
            throw;
        Thread.Sleep(currentRetryInterval);
        currentRetryInterval *= 2;
    }
}
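
To get a feel for how the two techniques differ in worst-case waiting time, here is a minimal sketch that lists the delays a caller would sleep through if every attempt failed. The RetrySchedule class and its Delays method are hypothetical names made up for this illustration; they are not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only.
public static class RetrySchedule
{
    // Returns the delays (in milliseconds) slept between attempts when every
    // attempt fails. With maxAttempts attempts there are maxAttempts - 1 waits,
    // because nothing is slept after the final failure.
    // A factor of 1 models the linear technique; a factor of 2 doubles the wait.
    public static List<int> Delays(int initialInterval, int maxAttempts, int factor)
    {
        var delays = new List<int>();
        int interval = initialInterval;
        for (int i = 1; i < maxAttempts; i++)
        {
            delays.Add(interval);
            interval *= factor;
        }
        return delays;
    }
}
```

With the values used in the examples above, RetrySchedule.Delays(2000, 5, 1) gives four waits of 2000 ms (8 seconds in total), while RetrySchedule.Delays(2000, 5, 2) gives 2000, 4000, 8000 and 16000 ms (30 seconds in total). That difference is worth keeping in mind when a user is waiting on the result.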

Bonus: C# Retry Utility Functions

Wrapping every remote call inside a loop can make for some nasty-looking code. I created a utility class to make using retries easier. The class and examples of how to use it are below. Feel free to copy and/or modify this code to suit your needs. If you find it useful, let me know in the comments.

using System;
using System.Threading;

namespace Demos
{
    public static class RetryHelpers
    {
        /// <summary>
        /// Retries an action at a specified interval.
        /// </summary>
        /// <param name="action">The action to retry.</param>
        /// <param name="retryInterval">The retry interval in milliseconds.</param>
        /// <param name="maxAttempts">The maximum number of attempts before the exception is rethrown.</param>
        public static void LinearRetry( Action action, int retryInterval = 2000, int maxAttempts = 5 )
        {
            for ( int i = 1; i <= maxAttempts; i++ )
            {
                try
                {
                    action();
                    return;
                }
                catch
                {
                    if ( i == maxAttempts )
                        throw;
                    Thread.Sleep( retryInterval );
                }
            }
        }

        /// <summary>
        /// Retries an action at a specified interval and returns the result of that action (if successful).
        /// </summary>
        /// <typeparam name="T">The expected return type.</typeparam>
        /// <param name="action">The action to retry.</param>
        /// <param name="retryInterval">The retry interval in milliseconds.</param>
        /// <param name="maxAttempts">The maximum number of attempts before the exception is rethrown.</param>
        /// <returns>The result of the action (if successful).</returns>
        public static T LinearRetry<T>( Func<T> action, int retryInterval = 2000, int maxAttempts = 5 )
        {
            T result = default( T );
            LinearRetry( () => { result = action(); }, retryInterval, maxAttempts );
            return result;
        }

        /// <summary>
        /// Retries an action at an exponentially-increasing interval.
        /// </summary>
        /// <param name="action">The action to retry.</param>
        /// <param name="initialInterval">The initial retry interval in milliseconds.</param>
        /// <param name="maxAttempts">The maximum number of attempts before the exception is rethrown.</param>
        /// <param name="exponent">The factor by which the interval is multiplied after each failed attempt (2 doubles it).</param>
        public static void ExponentialBackoff( Action action, int initialInterval = 2000, int maxAttempts = 5, int exponent = 2 )
        {
            int retryInterval = initialInterval;
            for ( int i = 1; i <= maxAttempts; i++ )
            {
                try
                {
                    action();
                    return;
                }
                catch
                {
                    if ( i == maxAttempts )
                        throw;
                    Thread.Sleep( retryInterval );
                    retryInterval *= exponent;
                }
            }
        }

        /// <summary>
        /// Retries an action at an exponentially-increasing interval and returns the result of that action (if successful).
        /// </summary>
        /// <typeparam name="T">The expected return type.</typeparam>
        /// <param name="action">The action to retry.</param>
        /// <param name="initialInterval">The initial retry interval in milliseconds.</param>
        /// <param name="maxAttempts">The maximum number of attempts before the exception is rethrown.</param>
        /// <param name="exponent">The factor by which the interval is multiplied after each failed attempt (2 doubles it).</param>
        /// <returns>The result of the action (if successful).</returns>
        public static T ExponentialBackoff<T>( Func<T> action, int initialInterval = 2000, int maxAttempts = 5, int exponent = 2 )
        {
            T result = default( T );
            ExponentialBackoff( () => { result = action(); }, initialInterval, maxAttempts, exponent );
            return result;
        }
    }
}

using System;

namespace Demos
{
    class Program
    {
        static void Main( string[] args )
        {
            // MyRemoteService and MyResource stand in for your own types.
            var service = new MyRemoteService();

            // Retry simple operations that don't return anything.
            RetryHelpers.LinearRetry( service.Connect );
            RetryHelpers.ExponentialBackoff( service.Connect );

            // Retry operations that do return values.
            MyResource resource = RetryHelpers.LinearRetry( () =>
            {
                return service.GetResource( 5 );
            } );
            MyResource resource2 = RetryHelpers.ExponentialBackoff( () =>
            {
                return service.GetResource( 5 );
            } );
        }
    }
}
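
One thing to be aware of: the helpers catch every exception, so they will also retry failures that are not transient (a bad argument, for example). If that matters for your application, one possible variation is to let the caller decide which exceptions are worth retrying. The sketch below is only an illustration of that idea; SelectiveRetry, Execute and the isTransient parameter are hypothetical names, and it assumes a compiler with C# 6 exception filters.

```csharp
using System;
using System.Threading;

// Hypothetical variation for illustration only.
public static class SelectiveRetry
{
    // Runs action and returns its result. When action throws, the call is
    // retried after retryInterval milliseconds, but only while attempts remain
    // and isTransient(exception) returns true. Any other exception, or the
    // exception from the final attempt, propagates to the caller unchanged.
    // The action always runs at least once, even if maxAttempts is below 1.
    public static T Execute<T>( Func<T> action, Func<Exception, bool> isTransient,
                                int retryInterval = 2000, int maxAttempts = 5 )
    {
        for ( int i = 1; ; i++ )
        {
            try
            {
                return action();
            }
            catch ( Exception ex ) when ( i < maxAttempts && isTransient( ex ) )
            {
                Thread.Sleep( retryInterval );
            }
        }
    }
}
```

A call might look like SelectiveRetry.Execute( () => service.GetResource( 5 ), ex => ex is TimeoutException ). Which exception types count as transient depends entirely on the client library you are using, so treat TimeoutException here as a placeholder.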

Conclusion

Dealing with transient failures is something you must do when developing for a cloud environment. The retry pattern makes this easy: it reduces the breakdowns and errors these failures cause and makes your applications more resilient. Your homework for this article is to identify potential failure points in your application and add retry logic to them. As always, if you have any questions feel free to ask in the comments.


© 2020 Jesse Barocio. Built with Gatsby