Cascading Failures (Anti-Pattern) Medium

A cascading failure occurs when a failure in one component of an interconnected system triggers failures in dependent components, creating a domino effect that can bring down the entire system. This is an anti-pattern — something to recognize and prevent.

How It Happens

Service A (overloaded)
    → times out responding to Service B
        → Service B's thread pool fills up waiting on A
            → Service C can't reach B
                → System-wide outage

Example: The Problem

package main

import (
	"fmt"
	"log"
	"net/http"
)

// BAD: No timeout, no circuit breaker, no bulkhead.
// http.Get uses http.DefaultClient, which has no timeout. If Service A is
// slow, this handler holds a goroutine and a connection indefinitely,
// eventually exhausting server resources.
func handleRequest(w http.ResponseWriter, r *http.Request) {
	resp, err := http.Get("http://service-a/api/data")
	if err != nil {
		// Service A is down, but we've already waited a long time.
		// Meanwhile, hundreds of requests piled up behind us.
		http.Error(w, "service unavailable", http.StatusServiceUnavailable)
		return
	}
	defer resp.Body.Close()

	fmt.Fprintln(w, "got data from service A")
}

func main() {
	// The port is arbitrary; it is only here so the example runs.
	http.HandleFunc("/", handleRequest)
	log.Fatal(http.ListenAndServe(":8080", nil))
}

Prevention Strategies

1. Timeouts

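Always set deadlines on outbound calls. Go's http.DefaultClient, which http.Get uses, has no timeout by default, so create a client with an explicit one.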

client := &http.Client{
	Timeout: 2 * time.Second,
}
resp, err := client.Get("http://service-a/api/data")
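
A client-wide Timeout applies the same limit to every call. As a complementary sketch, a per-request deadline can be attached with context.WithTimeout, which also lets a caller's cancellation propagate to the outbound call. The names fetchWithDeadline, slowTransport, and demoTimeout are hypothetical; slowTransport is only a stand-in for a Service A that never answers, so the example runs without a network.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"io"
	"net/http"
	"time"
)

// fetchWithDeadline (hypothetical helper) issues a GET that is abandoned
// once d elapses or the parent context is cancelled, whichever comes first.
func fetchWithDeadline(parent context.Context, client *http.Client, url string, d time.Duration) (string, error) {
	ctx, cancel := context.WithTimeout(parent, d)
	defer cancel()

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return "", err
	}
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(body), nil
}

// slowTransport simulates a dependency that never responds: it returns
// only when the request's context ends.
type slowTransport struct{}

func (slowTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	<-req.Context().Done()
	return nil, req.Context().Err()
}

// demoTimeout reports whether the call was cut off by the deadline.
func demoTimeout() bool {
	client := &http.Client{Transport: slowTransport{}}
	_, err := fetchWithDeadline(context.Background(), client,
		"http://service-a/api/data", 50*time.Millisecond)
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	fmt.Println("timed out:", demoTimeout())
}
```

In a handler, passing r.Context() as the parent would cancel the outbound call as soon as the inbound client disconnects.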

2. Circuit Breaker

Stop calling a failing service to give it time to recover (see Circuit-Breaker).
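
To make the idea concrete, here is a minimal sketch of a breaker that opens after a number of consecutive failures and permits a trial call once a cooldown has passed. It is an illustration under simplifying assumptions, not the implementation described on the Circuit-Breaker page: the names Breaker, NewBreaker, and ErrCircuitOpen are hypothetical, and several trial calls may slip through concurrently once the cooldown ends.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// ErrCircuitOpen is returned while the breaker is refusing calls.
var ErrCircuitOpen = errors.New("circuit open")

// Breaker opens after maxFailures consecutive failures and allows a
// trial call again once cooldown has elapsed.
type Breaker struct {
	mu          sync.Mutex
	maxFailures int
	cooldown    time.Duration
	failures    int
	openedAt    time.Time
}

func NewBreaker(maxFailures int, cooldown time.Duration) *Breaker {
	return &Breaker{maxFailures: maxFailures, cooldown: cooldown}
}

// Call runs fn unless the breaker is open, and records the outcome.
func (b *Breaker) Call(fn func() error) error {
	b.mu.Lock()
	open := b.failures >= b.maxFailures && time.Since(b.openedAt) < b.cooldown
	b.mu.Unlock()
	if open {
		return ErrCircuitOpen // reject without touching the dependency
	}

	err := fn()

	b.mu.Lock()
	defer b.mu.Unlock()
	if err != nil {
		b.failures++
		if b.failures >= b.maxFailures {
			b.openedAt = time.Now() // (re)start the cooldown
		}
		return err
	}
	b.failures = 0 // any success closes the breaker
	return nil
}

// demoBreaker makes five calls against an always-failing dependency and
// returns how many actually reached it, plus the last error seen.
func demoBreaker() (attempts int, last error) {
	b := NewBreaker(3, time.Minute)
	failing := func() error {
		attempts++
		return errors.New("service A unavailable")
	}
	for i := 0; i < 5; i++ {
		last = b.Call(failing)
	}
	return attempts, last
}

func main() {
	attempts, last := demoBreaker()
	fmt.Println(attempts, last)
}
```

Only the first three calls reach the failing dependency; the remaining two are rejected immediately, which is what gives Service A room to recover.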

3. Bulkheads

Isolate resource pools per dependency so one slow service doesn’t consume all resources (see Bulkheads).
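
One common way to build such a pool in Go is a buffered channel used as a semaphore, with one instance per dependency. The sketch below assumes that approach; Bulkhead, NewBulkhead, and ErrBulkheadFull are hypothetical names, and it rejects excess calls immediately rather than queueing them.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrBulkheadFull is returned when every slot for a dependency is in use.
var ErrBulkheadFull = errors.New("bulkhead full")

// Bulkhead caps the number of concurrent calls to one dependency.
type Bulkhead struct{ slots chan struct{} }

func NewBulkhead(limit int) *Bulkhead {
	return &Bulkhead{slots: make(chan struct{}, limit)}
}

// Do runs fn if a slot is free and rejects immediately otherwise.
func (b *Bulkhead) Do(fn func() error) error {
	select {
	case b.slots <- struct{}{}:
		defer func() { <-b.slots }() // free the slot when fn returns
		return fn()
	default:
		return ErrBulkheadFull
	}
}

// demoBulkhead fills a two-slot bulkhead with blocked calls, checks that a
// third call is rejected, then checks that capacity returns afterwards.
func demoBulkhead() (rejected, recovered bool) {
	b := NewBulkhead(2)
	started := make(chan struct{})
	release := make(chan struct{})

	var wg sync.WaitGroup
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			b.Do(func() error {
				started <- struct{}{} // signal that a slot is held
				<-release             // simulate a slow dependency
				return nil
			})
		}()
	}
	<-started
	<-started

	rejected = errors.Is(b.Do(func() error { return nil }), ErrBulkheadFull)

	close(release)
	wg.Wait()
	recovered = b.Do(func() error { return nil }) == nil
	return rejected, recovered
}

func main() {
	rejected, recovered := demoBulkhead()
	fmt.Println("rejected while full:", rejected, "recovered:", recovered)
}
```

Because each dependency gets its own Bulkhead, a slow Service A can occupy at most its own slots, leaving calls to other services unaffected.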

4. Fail-Fast

Check dependency health before attempting expensive work (see Fail-Fast).
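
A rough sketch of that check: keep the latest health-check result in a flag and consult it before starting the work. The type HealthGate and the error ErrDependencyDown are hypothetical, and the flag is set by hand here; in a real service a background probe would presumably update it.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// ErrDependencyDown is returned when work is skipped because the
// dependency was last seen unhealthy.
var ErrDependencyDown = errors.New("dependency unhealthy, failing fast")

// HealthGate records the most recent health-check result for one dependency.
// The zero value is "unhealthy" until Set(true) is called.
type HealthGate struct{ healthy atomic.Bool }

// Set stores the latest probe result.
func (g *HealthGate) Set(ok bool) { g.healthy.Store(ok) }

// Run performs work only when the dependency was last seen healthy.
func (g *HealthGate) Run(work func() (string, error)) (string, error) {
	if !g.healthy.Load() {
		return "", ErrDependencyDown // skip the expensive work entirely
	}
	return work()
}

// demoFailFast tries the same work with the gate closed and then open,
// counting how many times the work function actually ran.
func demoFailFast() (ran int, errDown, errUp error) {
	var gate HealthGate
	expensive := func() (string, error) {
		ran++
		return "report", nil
	}
	_, errDown = gate.Run(expensive) // gate closed: work is not started
	gate.Set(true)
	_, errUp = gate.Run(expensive) // gate open: work runs
	return ran, errDown, errUp
}

func main() {
	ran, errDown, errUp := demoFailFast()
	fmt.Println(ran, errDown, errUp)
}
```

The first call returns at once without running the expensive function, so no resources are tied up on work that is likely to fail.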

5. Graceful Degradation

Return cached or default responses when a dependency is unavailable.

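// Cache is assumed to be defined elsewhere with a
// Get(key string) (string, bool) method; the io package must be imported.
func getData(client *http.Client, cache *Cache) (string, error) {
	resp, err := client.Get("http://service-a/api/data")
	if err != nil {
		// Fall back to cached data instead of failing entirely.
		if cached, ok := cache.Get("data"); ok {
			return cached, nil
		}
		return "", err
	}
	defer resp.Body.Close()

	// Placeholder processing: return the raw body as a string.
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(body), nil
}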
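
The Cache type used by getData is not defined on this page. The following is one possible shape for it, offered as an assumption rather than the actual type: a mutex-guarded map whose Get matches the call above, plus a hypothetical fetchOrFallback helper that refreshes the cache on success so there is something to fall back to later.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Cache is a hypothetical in-memory store with a Get(key) (string, bool)
// method. It has no expiry, which a real cache would likely need.
type Cache struct {
	mu   sync.RWMutex
	data map[string]string
}

func NewCache() *Cache { return &Cache{data: make(map[string]string)} }

func (c *Cache) Get(key string) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	v, ok := c.data[key]
	return v, ok
}

func (c *Cache) Set(key, value string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.data[key] = value
}

// fetchOrFallback stores fresh values on success and serves the last
// known value when fetch fails.
func fetchOrFallback(cache *Cache, key string, fetch func() (string, error)) (string, error) {
	v, err := fetch()
	if err == nil {
		cache.Set(key, v)
		return v, nil
	}
	if cached, ok := cache.Get(key); ok {
		return cached, nil // degraded but still useful
	}
	return "", err
}

// demoFallback exercises three cases: a failure with an empty cache,
// a success, and a failure served from the cache.
func demoFallback() (fresh, stale string, coldErr error) {
	cache := NewCache()
	down := func() (string, error) { return "", errors.New("service A unavailable") }
	up := func() (string, error) { return "v1", nil }

	_, coldErr = fetchOrFallback(cache, "data", down)
	fresh, _ = fetchOrFallback(cache, "data", up)
	stale, _ = fetchOrFallback(cache, "data", down)
	return fresh, stale, coldErr
}

func main() {
	fresh, stale, coldErr := demoFallback()
	fmt.Println(fresh, stale, coldErr)
}
```

Note the cold-start case: with nothing cached, the failure still surfaces to the caller, so a default response may be needed as a second fallback.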

Rules of Thumb

  • Every network call needs a timeout. No exceptions.
  • Design for failure: assume every dependency will fail and plan what happens when it does.
  • Monitor inter-service latency and error rates. Cascading failures often start with a subtle latency increase long before a hard failure.
  • Test failure scenarios with chaos engineering tools to verify that your safeguards actually work.
  • Combine multiple stability patterns (timeouts + circuit breaker + bulkhead) for defense in depth.
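
As a small-scale illustration of the failure-testing rule, faults can be injected in process by wrapping the HTTP client's transport. This is a sketch, not a substitute for a chaos engineering tool; faultyTransport, okTransport, and errInjected are hypothetical names, and okTransport stands in for a healthy Service A so no network is required.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net/http"
	"strings"
	"sync"
)

var errInjected = errors.New("injected fault")

// okTransport is a stub that always answers 200 with a tiny body.
type okTransport struct{}

func (okTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	return &http.Response{
		StatusCode: http.StatusOK,
		Header:     make(http.Header),
		Body:       io.NopCloser(strings.NewReader("ok")),
		Request:    req,
	}, nil
}

// faultyTransport fails every nth request and forwards the rest to next.
type faultyTransport struct {
	next http.RoundTripper
	nth  int

	mu   sync.Mutex
	seen int
}

func (f *faultyTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	f.mu.Lock()
	f.seen++
	fail := f.seen%f.nth == 0
	f.mu.Unlock()

	if fail {
		return nil, errInjected
	}
	return f.next.RoundTrip(req)
}

// demoFaults sends six requests through a transport that fails every
// third one and counts the outcomes.
func demoFaults() (ok, failed int) {
	client := &http.Client{Transport: &faultyTransport{next: okTransport{}, nth: 3}}
	for i := 0; i < 6; i++ {
		resp, err := client.Get("http://service-a/api/data")
		if err != nil {
			failed++
			continue
		}
		resp.Body.Close()
		ok++
	}
	return ok, failed
}

func main() {
	ok, failed := demoFaults()
	fmt.Println("ok:", ok, "failed:", failed)
}
```

A test could pass a client built this way to the code under test and assert that fallbacks, breakers, or timeouts react as intended.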