Circuit Breaker: Learning When to Stop

What You'll Learn

  • How the circuit breaker state machine works (CLOSED, OPEN, HALF_OPEN)
  • How to configure failure thresholds, recovery timeouts, and call timeouts
  • How to protect VTask operations with a shared circuit breaker
  • How to use fallbacks when the circuit is open
  • How to monitor circuit breaker health via metrics

A retry policy assumes that trying again will eventually work. Sometimes it will not. If the database is down, retrying every 200 milliseconds for the next 30 seconds just generates 150 doomed requests. The service needs time to recover, and hammering it with traffic makes recovery harder.

A circuit breaker solves this by tracking recent failures and, when enough accumulate, stopping calls entirely for a cooling-off period. After the period expires, it cautiously lets a probe request through. If enough probes succeed (one by default), normal traffic resumes. If a probe fails, the circuit re-opens.

The State Machine

    Normal operation              Service failing              Probing recovery
    ┌────────────────┐           ┌────────────────┐           ┌────────────────┐
    │                │  failures │                │  timeout  │                │
    │     CLOSED     │  reach    │      OPEN      │  expires  │   HALF_OPEN    │
    │                │  threshold│                │           │                │
    │  All calls     │──────────▶│  All calls     │──────────▶│  One probe     │
    │  flow through  │           │  rejected with │           │  call allowed  │
    │                │           │  CircuitOpen-  │           │                │
    │  Failures      │           │  Exception     │           │  Success:      │
    │  counted       │           │                │           │  close circuit │
    │                │◀──────────│                │◀──────────│                │
    │  Successes     │  probes   │  No calls      │  probe    │  Failure:      │
    │  reset count   │  succeed  │  reach service │  fails    │  re-open       │
    └────────────────┘           └────────────────┘           └────────────────┘
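
| State     | Behaviour                                                  | Transitions to                               |
|-----------|------------------------------------------------------------|----------------------------------------------|
| CLOSED    | All calls allowed. Consecutive failures counted.           | OPEN (when failures reach threshold)         |
| OPEN      | All calls rejected immediately with CircuitOpenException.  | HALF_OPEN (after open duration expires)      |
| HALF_OPEN | One probe call allowed.                                    | CLOSED (probe succeeds) or OPEN (probe fails) |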
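
To make the transitions above concrete, here is a minimal, single-threaded sketch of the three-state machine. It is not the library's implementation: SketchBreaker, its call() method, and the use of IllegalStateException in place of CircuitOpenException are all hypothetical, and it assumes a success threshold of one. The current time is passed in explicitly so the behaviour is deterministic.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

// Hypothetical sketch of the CLOSED / OPEN / HALF_OPEN state machine.
// Illustrative only: not thread-safe, and not the library's code.
final class SketchBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final Duration openDuration;
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private Instant openedAt;

    SketchBreaker(int failureThreshold, Duration openDuration) {
        this.failureThreshold = failureThreshold;
        this.openDuration = openDuration;
    }

    State state() { return state; }

    // 'now' is a parameter (an assumption of this sketch) to keep it testable.
    <T> T call(Supplier<T> action, Instant now) {
        if (state == State.OPEN) {
            if (Duration.between(openedAt, now).compareTo(openDuration) < 0) {
                // Still cooling off: reject without touching the service.
                throw new IllegalStateException("circuit open");
            }
            state = State.HALF_OPEN; // open duration expired: allow a probe
        }
        try {
            T result = action.get();
            consecutiveFailures = 0;   // a success resets the count
            state = State.CLOSED;      // a successful probe closes the circuit
            return result;
        } catch (RuntimeException ex) {
            consecutiveFailures++;
            // A failed probe, or reaching the threshold, (re)opens the circuit.
            if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
                state = State.OPEN;
                openedAt = now;
            }
            throw ex;
        }
    }
}
```

A real implementation would also need to be safe under concurrent calls and would limit how many probes run at once in HALF_OPEN; the sketch leaves both out.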

Configuration

CircuitBreakerConfig config = CircuitBreakerConfig.builder()
    .failureThreshold(5)                      // 5 failures before opening
    .successThreshold(3)                      // 3 probes must succeed to close
    .openDuration(Duration.ofSeconds(30))     // Stay open for 30 seconds
    .callTimeout(Duration.ofSeconds(5))       // Each call times out after 5s
    .recordFailure(ex ->                      // Only count certain exceptions
        !(ex instanceof BusinessValidationException))
    .build();

| Setting          | Default        | Description                                    |
|------------------|----------------|------------------------------------------------|
| failureThreshold | 5              | Consecutive failures before the circuit opens  |
| successThreshold | 1              | Successful probes in HALF_OPEN before closing  |
| openDuration     | 60s            | How long the circuit stays open                |
| callTimeout      | 10s            | Timeout applied to each protected call         |
| recordFailure    | all exceptions | Predicate determining which exceptions count   |

The recordFailure predicate is important: not every exception means the service is unhealthy. A 400 Bad Request or a business validation error reflects a problem with the request, not the service. Only count failures that indicate the service itself is struggling.
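
As an illustration of that distinction, the sketch below classifies exceptions into "counts against the service" and "caller's fault". It assumes the predicate has the shape Predicate<Throwable>; the class name FailureClassification and the nested BusinessValidationException are hypothetical stand-ins, not library types.

```java
import java.util.function.Predicate;

// Hypothetical sketch: deciding which exceptions should count as failures.
final class FailureClassification {
    // Stand-in for an application-level validation error (assumed type).
    static final class BusinessValidationException extends RuntimeException {
        BusinessValidationException(String message) { super(message); }
    }

    // Count everything except errors caused by the request itself.
    static final Predicate<Throwable> RECORD_FAILURE = ex ->
        !(ex instanceof BusinessValidationException)
            && !(ex instanceof IllegalArgumentException);
}
```

With this predicate, an I/O error or a timeout moves the breaker towards OPEN, while a rejected request leaves the failure count untouched.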

Creating a Circuit Breaker

// With custom configuration
CircuitBreaker breaker = CircuitBreaker.create(config);

// Or, alternatively, with sensible defaults
CircuitBreaker defaultBreaker = CircuitBreaker.withDefaults();

Protecting VTask Operations

The protect() method is generic. A single circuit breaker instance can protect calls that return different types:

CircuitBreaker paymentBreaker = CircuitBreaker.create(
    CircuitBreakerConfig.builder()
        .failureThreshold(3)
        .openDuration(Duration.ofSeconds(30))
        .build());

// Protects a call returning String
VTask<String> getStatus = paymentBreaker.protect(
    VTask.of(() -> paymentService.getStatus(orderId)));

// Same breaker protects a call returning BigDecimal
VTask<BigDecimal> getBalance = paymentBreaker.protect(
    VTask.of(() -> paymentService.getBalance(accountId)));

// Both share state: failures from either call count towards the threshold

This sharing is intentional. A circuit breaker protects a service, not a specific return type.

Fallbacks

When the circuit is open, a task wrapped with protect() fails with CircuitOpenException. Use protectWithFallback() to provide a default value instead:

VTask<String> withFallback = paymentBreaker.protectWithFallback(
    VTask.of(() -> paymentService.getStatus(orderId)),
    ex -> "status-unavailable");

Or compose with recover() for more control:

VTask<String> resilient = paymentBreaker.protect(
        VTask.of(() -> paymentService.getStatus(orderId)))
    .recover(ex -> {
        if (ex instanceof CircuitOpenException coe) {
            log.warn("Payment service down, retry after {}", coe.retryAfter());
            return cachedStatus(orderId);
        }
        return "unknown";
    });

Metrics

CircuitBreakerMetrics m = breaker.metrics();

log.info("Circuit breaker: total={}, success={}, failed={}, rejected={}, transitions={}",
    m.totalCalls(), m.successfulCalls(), m.failedCalls(),
    m.rejectedCalls(), m.stateTransitions());

| Metric           | Description                                          |
|------------------|------------------------------------------------------|
| totalCalls       | Total calls attempted (including rejected)           |
| successfulCalls  | Calls that completed successfully                    |
| failedCalls      | Calls that failed (counted by the failure predicate) |
| rejectedCalls    | Calls rejected because the circuit was open          |
| stateTransitions | Number of state transitions                          |
| lastStateChange  | When the last transition occurred                    |
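
Raw counters are often easier to alert on once turned into ratios. The helper below is a sketch of two such derived values; BreakerHealth and its methods are hypothetical and take plain long values, so nothing here depends on the actual return types of the metrics accessors.

```java
// Hypothetical helper: derives ratios from the counters described above.
final class BreakerHealth {
    // Share of all attempted calls that were rejected by an open circuit.
    static double rejectionRate(long totalCalls, long rejectedCalls) {
        return totalCalls == 0 ? 0.0 : (double) rejectedCalls / totalCalls;
    }

    // Share of calls that reached the service and were recorded as failures.
    // Assumes executed calls = successful + failed (rejected calls excluded).
    static double failureRate(long successfulCalls, long failedCalls) {
        long executed = successfulCalls + failedCalls;
        return executed == 0 ? 0.0 : (double) failedCalls / executed;
    }
}
```

A persistently high rejection rate suggests the circuit is spending most of its time open, which is usually a stronger signal than a single state transition.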

Manual Control

// Reset to CLOSED with zeroed counters
breaker.reset();

// Manually trip to OPEN (e.g., during maintenance)
breaker.tripOpen();

// Inspect current state
CircuitBreaker.Status status = breaker.currentStatus();

Combining with Retry

A common pattern is to combine a circuit breaker with retry. The order matters:

// Circuit breaker inside retry: each retry attempt checks the circuit
VTask<String> resilient = Retry.retryTask(
    paymentBreaker.protect(VTask.of(() -> paymentService.get(url))),
    RetryPolicy.exponentialBackoff(3, Duration.ofMillis(200))
        .retryIf(ex -> !(ex instanceof CircuitOpenException)));

Note the retry predicate: CircuitOpenException should not be retried, because the circuit breaker has already determined the service is unhealthy. Use ResilienceBuilder for correct ordering without manual wiring:

VTask<String> resilient = Resilience.<String>builder(
        VTask.of(() -> paymentService.get(url)))
    .withCircuitBreaker(paymentBreaker)
    .withRetry(RetryPolicy.exponentialBackoff(3, Duration.ofMillis(200)))
    .build();
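
To show why the retry predicate matters, here is a self-contained sketch of a retry loop that gives up immediately on a circuit-open rejection. RetrySketch, its retry() method, and the nested CircuitOpen exception are hypothetical stand-ins for illustration; backoff delays are omitted to keep the example short.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical sketch: a retry loop that respects a "do not retry" predicate.
final class RetrySketch {
    // Stand-in for the exception raised when the circuit is open.
    static final class CircuitOpen extends RuntimeException {
        CircuitOpen() { super("circuit open"); }
    }

    static <T> T retry(Supplier<T> call, int maxAttempts, Predicate<Throwable> retryIf) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException ex) {
                last = ex;
                // Stop early when the failure is not worth retrying,
                // for example when the breaker has already rejected the call.
                if (!retryIf.test(ex)) break;
            }
        }
        throw last; // all attempts failed, or retrying was not allowed
    }
}
```

With the predicate `ex -> !(ex instanceof RetrySketch.CircuitOpen)`, a rejected call is attempted once and fails fast, while a transient failure still gets the full number of attempts.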

See Also

  • Retry -- backoff strategies and retry configuration
  • Bulkhead -- concurrency limiting
  • Combined Patterns -- composing all patterns with ResilienceBuilder
