Retry: Patience as Policy

What You'll Learn

  • How to configure retry policies with different backoff strategies
  • When to use fixed, exponential, linear, or jittered delays
  • How to filter retries by exception type
  • How to monitor retry attempts with RetryEvent
  • How retry integrates with VTask, IOPath, and VTaskPath

Networks are unreliable. Services restart. Databases hiccup during failover. Most of these failures are transient: the same request that failed at 14:32:07.003 would have succeeded at 14:32:07.250. A retry policy encodes the belief that patience will be rewarded, whilst also setting a limit on how much patience to exercise.

RetryPolicy

RetryPolicy is an immutable configuration object that describes how to retry: how many times, how long to wait, and which failures are worth retrying.

Factory Methods

// Fixed delay: same wait between every attempt
RetryPolicy fixed = RetryPolicy.fixed(3, Duration.ofMillis(100));
// Delays: 100ms, 100ms, 100ms

// Exponential backoff: doubling delays
RetryPolicy exponential = RetryPolicy.exponentialBackoff(5, Duration.ofSeconds(1));
// Delays: 1s, 2s, 4s, 8s, 16s (capped at maxDelay)

// Exponential with jitter: randomised to prevent thundering herd
RetryPolicy jittered = RetryPolicy.exponentialBackoffWithJitter(5, Duration.ofSeconds(1));
// Delays: ~1s, ~2s, ~4s, ~8s, ~16s (each randomised between 0 and the calculated delay)

// Linear backoff: delays increase by a fixed increment
RetryPolicy linear = RetryPolicy.linear(5, Duration.ofMillis(200));
// Delays: 200ms, 400ms, 600ms, 800ms, 1000ms

// No retry: fail immediately
RetryPolicy none = RetryPolicy.noRetry();

Choosing a Backoff Strategy

    Fixed           Exponential        Exponential         Linear
    (predictable)   (aggressive)       + Jitter            (gentle)
                                       (distributed)

    ──X──X──X──     ──X─X──X────X──    ──X─X───X──X────   ──X──X───X────X──
      │  │  │         │ │  │    │        │ │   │  │         │  │   │    │
    100 100 100     100 200 400 800    ~100 ~200 ~400 ~800  200 400 600 800
    ms  ms  ms      ms  ms  ms  ms     ms   ms   ms   ms   ms  ms  ms  ms
| Strategy             | Best for                                     | Risk                                   |
|----------------------|----------------------------------------------|----------------------------------------|
| Fixed                | Known recovery time (e.g., lock contention)  | Can overwhelm a recovering service     |
| Exponential          | Unknown recovery time                        | Slow convergence for quick recoveries  |
| Exponential + Jitter | Multiple clients retrying the same service   | Slightly less predictable              |
| Linear               | Gentle ramp-up, moderate recovery times      | Slower backoff than exponential        |
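The delay schedules listed above are simple arithmetic. As a minimal sketch, independent of RetryPolicy itself (the method names here are illustrative, not part of the library):

```java
import java.time.Duration;
import java.util.concurrent.ThreadLocalRandom;

class BackoffMath {
    // Fixed: every attempt waits the same base delay.
    static Duration fixed(Duration base, int attempt) {
        return base;
    }

    // Exponential: base * 2^(attempt-1) for attempts 1, 2, 3, ...
    static Duration exponential(Duration base, int attempt) {
        return base.multipliedBy(1L << (attempt - 1));
    }

    // Linear: base * attempt, so delays grow by a fixed increment.
    static Duration linear(Duration base, int attempt) {
        return base.multipliedBy(attempt);
    }

    // Full jitter: a random delay between 0 and the exponential value,
    // so simultaneous clients spread their retries apart.
    static Duration exponentialWithJitter(Duration base, int attempt) {
        long capMs = exponential(base, attempt).toMillis();
        return Duration.ofMillis(ThreadLocalRandom.current().nextLong(capMs + 1));
    }

    public static void main(String[] args) {
        // Reproduce the schedules from the factory-method examples above.
        System.out.println(exponential(Duration.ofSeconds(1), 3)); // PT4S
        System.out.println(linear(Duration.ofMillis(200), 4));     // PT0.8S
    }
}
```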

Configuration

Policies are immutable. Configuration methods return new instances:

RetryPolicy policy = RetryPolicy.exponentialBackoff(5, Duration.ofMillis(100))
    .withMaxDelay(Duration.ofSeconds(30))   // Cap the maximum wait
    .retryOn(IOException.class);             // Only retry I/O errors

Custom Retry Predicates

RetryPolicy selective = RetryPolicy.fixed(3, Duration.ofMillis(100))
    .retryIf(ex ->
        ex instanceof IOException
        || ex instanceof TimeoutException
        || (ex instanceof HttpException http && http.statusCode() >= 500));
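If the same filtering logic is needed elsewhere (say, in tests), it can live in a plain Predicate<Throwable>. The HttpException class below is a stand-in defined purely for illustration, not a type the library provides:

```java
import java.io.IOException;
import java.util.concurrent.TimeoutException;
import java.util.function.Predicate;

class RetryPredicates {
    // Stand-in for an HTTP client exception carrying a status code.
    static class HttpException extends RuntimeException {
        private final int statusCode;
        HttpException(int statusCode) { this.statusCode = statusCode; }
        int statusCode() { return statusCode; }
    }

    // Retry I/O errors, timeouts, and 5xx responses; fail fast on everything else.
    static final Predicate<Throwable> TRANSIENT = ex ->
        ex instanceof IOException
        || ex instanceof TimeoutException
        || (ex instanceof HttpException http && http.statusCode() >= 500);

    public static void main(String[] args) {
        System.out.println(TRANSIENT.test(new HttpException(503)));        // true
        System.out.println(TRANSIENT.test(new HttpException(404)));        // false
        System.out.println(TRANSIENT.test(new IllegalArgumentException())); // false
    }
}
```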

The Builder

For complex policies, the builder offers full control:

RetryPolicy policy = RetryPolicy.builder()
    .maxAttempts(5)
    .initialDelay(Duration.ofMillis(100))
    .backoffMultiplier(2.0)
    .maxDelay(Duration.ofSeconds(30))
    .useJitter(true)
    .retryOn(IOException.class)
    .onRetry(event -> log.warn("Retry #{}: {}",
        event.attemptNumber(), event.lastException().getMessage()))
    .build();

Monitoring with RetryEvent

The onRetry listener receives a RetryEvent before each retry attempt:

RetryPolicy monitored = RetryPolicy.exponentialBackoff(5, Duration.ofSeconds(1))
    .onRetry(event -> {
        log.warn("Attempt {} failed after {}: {}",
            event.attemptNumber(),
            event.nextDelay(),
            event.lastException().getMessage());
        metrics.incrementRetryCount(event.attemptNumber());
    });

RetryEvent contains:

| Field           | Type      | Description                                          |
|-----------------|-----------|------------------------------------------------------|
| attemptNumber() | int       | The 1-based attempt that just failed                 |
| lastException() | Throwable | The exception that triggered this retry              |
| nextDelay()     | Duration  | How long the system will wait before the next attempt |
| timestamp()     | Instant   | When this event occurred                             |
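The four accessors map naturally onto a Java record. The declaration below is an illustrative shape for the data, not the library's actual definition:

```java
import java.time.Duration;
import java.time.Instant;

// Illustrative record mirroring the accessors listed above.
record RetryEvent(int attemptNumber, Throwable lastException,
                  Duration nextDelay, Instant timestamp) {}

class RetryEventDemo {
    public static void main(String[] args) {
        RetryEvent event = new RetryEvent(
            2, new java.io.IOException("connection reset"),
            Duration.ofSeconds(4), Instant.now());
        System.out.printf("Attempt %d failed (%s); retrying in %s%n",
            event.attemptNumber(),
            event.lastException().getMessage(),
            event.nextDelay());
    }
}
```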

Using Retry

Direct Execution

The Retry utility class executes an operation immediately with retry:

String response = Retry.execute(policy, () -> httpClient.get(url));

// Convenience methods
String fast = Retry.withExponentialBackoff(3, Duration.ofMillis(100),
    () -> httpClient.get(url));

String fixed = Retry.withFixedDelay(3, Duration.ofMillis(100),
    () -> httpClient.get(url));
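To make the mechanics concrete, the loop behind a call like Retry.execute can be sketched in plain Java. This is a simplified model (fixed delay, no jitter, no listener), not the library's implementation:

```java
import java.time.Duration;
import java.util.function.Predicate;
import java.util.function.Supplier;

class RetryLoop {
    // Run the operation up to maxAttempts times, sleeping between failures.
    static <T> T execute(int maxAttempts, Duration delay,
                         Predicate<Throwable> retryable,
                         Supplier<T> operation) throws InterruptedException {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.get();
            } catch (RuntimeException ex) {
                last = ex;
                // Give up immediately on non-retryable failures or the final attempt.
                if (!retryable.test(ex) || attempt == maxAttempts) break;
                Thread.sleep(delay.toMillis());
            }
        }
        throw last;
    }

    public static void main(String[] args) throws InterruptedException {
        int[] calls = {0};
        // Fails twice, then succeeds: three attempts total.
        String result = execute(5, Duration.ofMillis(10), ex -> true, () -> {
            if (++calls[0] < 3) throw new IllegalStateException("transient");
            return "ok";
        });
        System.out.println(result + " after " + calls[0] + " attempts"); // ok after 3 attempts
    }
}
```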

VTask-Native Retry

For lazy, composable retry, use Retry.retryTask():

// Wrap any VTask with retry
VTask<String> resilient = Retry.retryTask(
    VTask.of(() -> httpClient.get(url)),
    RetryPolicy.exponentialBackoffWithJitter(3, Duration.ofMillis(200))
        .retryOn(IOException.class));

// Simple form: attempt count only, with default exponential backoff
VTask<String> resilient2 = Retry.retryTask(
    VTask.of(() -> httpClient.get(url)), 3);

Both forms return a lazy VTask. Nothing executes until you call run(), runSafe(), or runAsync().

Retry with Fallback

VTask<String> withFallback = Retry.retryTaskWithFallback(
    VTask.of(() -> httpClient.get(url)),
    RetryPolicy.exponentialBackoff(3, Duration.ofSeconds(1)),
    lastError -> "default response");

Retry with Recovery Task

VTask<String> withRecovery = Retry.retryTaskWithRecovery(
    VTask.of(() -> primaryService.get(url)),
    RetryPolicy.exponentialBackoff(3, Duration.ofSeconds(1)),
    lastError -> VTask.of(() -> backupService.get(url)));

IOPath and VTaskPath Integration

// IOPath
IOPath<Response> resilient = IOPath.delay(() -> httpClient.get(url))
    .withRetry(RetryPolicy.exponentialBackoff(3, Duration.ofSeconds(1)));

// VTaskPath (once Path API integration is complete)
VTaskPath<Response> resilientPath = Path.vtask(() -> httpClient.get(url))
    .withRetry(RetryPolicy.exponentialBackoff(3, Duration.ofSeconds(1)));

Handling Exhausted Retries

When all attempts fail, RetryExhaustedException is thrown with the last failure as its cause:

try {
    resilient.run();
} catch (RetryExhaustedException e) {
    log.error("All {} retries failed: {}", e.getAttempts(), e.getMessage());
    Throwable lastFailure = e.getCause();
    // Handle the last failure specifically
}

Composing Retry with Other Patterns

Retry composes naturally with other resilience patterns and effect combinators:

VTask<Data> robust = Retry.retryTask(
        VTask.of(() -> primarySource.fetch()),
        RetryPolicy.exponentialBackoff(3, Duration.ofSeconds(1)))
    .recover(e -> {
        log.warn("Primary exhausted, trying backup", e);
        return Retry.retryTask(
            VTask.of(() -> backupSource.fetch()),
            RetryPolicy.fixed(2, Duration.ofMillis(100))
        ).run();
    })
    .recover(e -> {
        log.error("All sources failed", e);
        return Data.empty();
    });

Quick Reference

| Pattern               | Code                                                        |
|-----------------------|-------------------------------------------------------------|
| Fixed delay           | RetryPolicy.fixed(3, Duration.ofMillis(100))                |
| Exponential backoff   | RetryPolicy.exponentialBackoff(5, Duration.ofSeconds(1))    |
| With jitter           | RetryPolicy.exponentialBackoffWithJitter(5, Duration.ofSeconds(1)) |
| Linear backoff        | RetryPolicy.linear(5, Duration.ofMillis(200))               |
| Cap max delay         | .withMaxDelay(Duration.ofSeconds(30))                       |
| Retry specific errors | .retryOn(IOException.class)                                 |
| Custom predicate      | .retryIf(ex -> ...)                                         |
| Monitor retries       | .onRetry(event -> ...)                                      |
| Apply to VTask        | Retry.retryTask(task, policy)                               |
| Apply to IOPath       | path.withRetry(policy)                                      |
| Simple retry          | Retry.retryTask(task, 3)                                    |
| Retry with fallback   | Retry.retryTaskWithFallback(task, policy, fallbackFn)       |
| Retry with recovery   | Retry.retryTaskWithRecovery(task, policy, recoveryFn)       |

Previous: Resilience Patterns Next: Circuit Breaker