How does YESDINO handle traffic spikes?

When a sudden surge hits, YESDINO relies on a layered approach that blends edge caching, intelligent load distribution, rapid auto‑scaling, and real‑time observability to keep latency low and uptime high.

1. Global Edge Network & CDN

YESDINO operates a geographically distributed CDN comprising 120+ points of presence (PoPs) spanning North America, Europe, Asia‑Pacific, and South America. Each PoP runs a lightweight reverse proxy that caches both static assets and semi‑dynamic content with configurable TTLs.

  • Cacheable assets: images, videos, JavaScript bundles, CSS files
  • Dynamic content: API responses that are personalised but have a 30‑second stale‑while‑revalidate window
  • TTL ranges: 5 min (user‑generated thumbnails) → 24 h (brand logos) → 7 days (documentation PDFs)

| Asset Type | Typical TTL | Cache‑Hit Ratio | % of Traffic Served from Edge |
| --- | --- | --- | --- |
| Static images (PNG, JPG, WebP) | 24 h | 96.3 % | 78 % |
| Video clips (< 5 MB) | 4 h | 92.1 % | 65 % |
| API JSON (personalised) | 30 s | 71.8 % | 31 % |
| HTML pages (SSR) | 5 min | 85.0 % | 55 % |
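
To illustrate how a TTL combined with a stale‑while‑revalidate window might behave at a PoP, here is a minimal sketch. It is not YESDINO's proxy code: `EdgeCache`, `fetch_origin`, and `run_refreshes` are hypothetical names, the cache is a plain in‑memory dictionary, and the background refresh is simulated by an explicit call.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Set, Tuple


@dataclass
class Entry:
    value: str
    stored_at: float


class EdgeCache:
    """Toy TTL cache with a stale-while-revalidate (SWR) window."""

    def __init__(self, fetch_origin: Callable[[str], str],
                 clock: Callable[[], float] = time.monotonic):
        self.fetch_origin = fetch_origin   # hypothetical origin fetcher
        self.clock = clock
        self.entries: Dict[str, Entry] = {}
        self.pending_refresh: Set[str] = set()

    def get(self, key: str, ttl: float, swr: float = 0.0) -> Tuple[str, str]:
        """Return (value, status); status is 'HIT', 'STALE', or 'MISS'."""
        now = self.clock()
        entry = self.entries.get(key)
        if entry is not None:
            age = now - entry.stored_at
            if age <= ttl:
                return entry.value, "HIT"
            if age <= ttl + swr:
                # Serve the stale copy immediately and queue a refresh.
                self.pending_refresh.add(key)
                return entry.value, "STALE"
        # Missing or too old: fetch synchronously from the origin.
        value = self.fetch_origin(key)
        self.entries[key] = Entry(value, now)
        return value, "MISS"

    def run_refreshes(self) -> None:
        """Stand-in for the asynchronous revalidation a real proxy would do."""
        now = self.clock()
        for key in self.pending_refresh:
            self.entries[key] = Entry(self.fetch_origin(key), now)
        self.pending_refresh.clear()
```

With `ttl=30` and `swr=30`, a request arriving 45 seconds after the entry was stored would be answered from the stale copy while the refresh is queued, so the user does not wait on the origin.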

Because the edge absorbs most of the request volume, origin servers see only ~12 % of total traffic during a spike, which sharply reduces the risk of overload.

2. Intelligent Load Balancing

Traffic that reaches the origin is routed through a fleet of layer‑7 (HTTP/2‑aware) load balancers hosted in three availability zones. The balancers use a weighted least‑connections algorithm, taking into account each backend instance’s current CPU utilization, memory pressure, and active connection count.

  • Health checks: HTTP GET to /health every 5 seconds with a 2‑second timeout
  • Failover: Automatic removal of any node failing two consecutive checks
  • Weighted routing: New instances initially receive 10 % traffic, ramping up to full weight after a 60‑second warm‑up period
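
The selection logic described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the balancer's actual implementation: the `Backend` class and function names are hypothetical, the warm‑up ramp is assumed to be linear, and the CPU and memory inputs mentioned in the text are left out to keep the example short.

```python
from dataclasses import dataclass
from typing import List

WARMUP_SECONDS = 60.0   # from the text: full weight after a 60-second warm-up
INITIAL_WEIGHT = 0.10   # from the text: new instances start at 10 % traffic
MAX_FAILURES = 2        # from the text: removed after two consecutive failed checks


@dataclass
class Backend:
    name: str
    active_connections: int = 0
    age_seconds: float = 0.0        # time since the instance joined the pool
    consecutive_failures: int = 0   # failed health checks in a row


def weight(backend: Backend) -> float:
    """Ramp from INITIAL_WEIGHT to 1.0 (the linear shape is an assumption)."""
    progress = min(backend.age_seconds / WARMUP_SECONDS, 1.0)
    return INITIAL_WEIGHT + (1.0 - INITIAL_WEIGHT) * progress


def is_healthy(backend: Backend) -> bool:
    return backend.consecutive_failures < MAX_FAILURES


def pick_backend(pool: List[Backend]) -> Backend:
    """Weighted least-connections: lowest connections-per-weight score wins."""
    healthy = [b for b in pool if is_healthy(b)]
    if not healthy:
        raise RuntimeError("no healthy backends")
    # +1 counts the request being placed, so idle backends still compare by weight.
    return min(healthy, key=lambda b: (b.active_connections + 1) / weight(b))
```

Dividing by the weight means a freshly started instance looks ten times "busier" than it is, so it receives only a trickle of traffic until the warm‑up completes.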

“Our load balancer’s adaptive weighting lets us keep p99 latency under 150 ms even when we add hundreds of new containers mid‑spike.” – Platform Lead, YESDINO

3. Auto‑Scaling & Compute Fleet

YESDINO’s compute layer runs on a Kubernetes cluster backed by AWS Auto‑Scaling Groups (ASGs). The cluster is composed of a mix of on‑demand and spot instances, with the scheduler prioritizing spot for stateless workloads and reserving on‑demand for stateful services (e.g., Redis, PostgreSQL).

Scaling Policies

| Metric | Threshold | Action | Typical Response Time |
| --- | --- | --- | --- |
| CPU utilization | > 70 % for 45 s | Add 2 instances (c5.large) | ≈ 45 seconds |
| Memory usage | > 80 % for 30 s | Add 1 instance (r5.xlarge) | ≈ 30 seconds |
| Requests per second (RPS) | > 50 k for 20 s | Add 3 instances (c5.2xlarge) | ≈ 60 seconds |
| Error rate (5xx) | > 1 % for 15 s | Alert + add 2 instances + scale‑down lower‑priority traffic | ≈ 120 seconds |
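
Each policy fires only when its metric stays above the threshold for the stated duration. One way to model that "sustained breach" check is sketched below; `ScalingRule` and `observe` are hypothetical names, and the re‑arm behaviour after a rule fires is an assumption rather than something the policies specify.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScalingRule:
    metric: str
    threshold: float
    sustain_seconds: float
    add_instances: int
    instance_type: str
    breach_started: Optional[float] = None

    def observe(self, value: float, now: float) -> int:
        """Return the number of instances to add for this sample (0 if not fired)."""
        if value <= self.threshold:
            self.breach_started = None   # breach ended; reset the timer
            return 0
        if self.breach_started is None:
            self.breach_started = now    # first sample above the threshold
        if now - self.breach_started >= self.sustain_seconds:
            self.breach_started = None   # re-arm so the rule does not fire on every sample
            return self.add_instances
        return 0


# The first three rows of the policy table, expressed as rules.
RULES = [
    ScalingRule("cpu_percent", 70.0, 45.0, 2, "c5.large"),
    ScalingRule("memory_percent", 80.0, 30.0, 1, "r5.xlarge"),
    ScalingRule("rps", 50_000.0, 20.0, 3, "c5.2xlarge"),
]
```

A brief dip below the threshold resets the timer, which keeps short bursts from triggering a scale‑out.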

The scaling controller also incorporates a prediction model trained on historical traffic patterns (e.g., product launches, flash sales). During anticipated peaks, pre‑scaling can provision up to 30 % extra capacity 5 minutes before the event, ensuring a near‑zero cold‑start penalty.
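
As a rough illustration of the pre‑scaling arithmetic, the sketch below turns a baseline fleet size and an event start time into a provisioning plan. The function name, the dictionary keys, and the example fleet size are hypothetical; only the 30 % ceiling and the 5‑minute lead time come from the text.

```python
from datetime import datetime, timedelta

MAX_PRESCALE_PERCENT = 30          # from the text: up to 30 % extra capacity
LEAD_TIME = timedelta(minutes=5)   # from the text: provisioned 5 minutes ahead


def prescale_plan(baseline_instances: int, event_start: datetime,
                  extra_percent: int = MAX_PRESCALE_PERCENT) -> dict:
    """Compute when to start pre-scaling and how many instances to add."""
    # Clamp the request to the documented ceiling.
    percent = max(0, min(extra_percent, MAX_PRESCALE_PERCENT))
    # Integer ceiling division: partial instances round up.
    extra = (baseline_instances * percent + 99) // 100
    return {
        "start_at": event_start - LEAD_TIME,
        "extra_instances": extra,
        "target_instances": baseline_instances + extra,
    }
```

For a hypothetical baseline of 40 instances, the plan adds 12 instances and starts provisioning five minutes before the event.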

  • Scale‑in protection: Nodes are never removed if any active WebSocket connection remains
  • Cost ceiling: A hard cap of 500 additional instances per region prevents runaway spend
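
These two guardrails might be expressed as simple checks that run before any scaling decision is applied. The sketch below is an assumption about how such checks could look; `Node`, `removable_nodes`, and `clamp_scale_out` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List

MAX_EXTRA_INSTANCES_PER_REGION = 500   # from the text: hard cap per region


@dataclass
class Node:
    name: str
    active_websockets: int


def removable_nodes(nodes: List[Node]) -> List[Node]:
    """Scale-in protection: only nodes with no open WebSocket may be removed."""
    return [n for n in nodes if n.active_websockets == 0]


def clamp_scale_out(requested: int, already_added: int) -> int:
    """Cost ceiling: never exceed the per-region cap on additional instances."""
    remaining = max(MAX_EXTRA_INSTANCES_PER_REGION - already_added, 0)
    return max(min(requested, remaining), 0)
```

If a region has already added 499 instances, a request for 3 more is trimmed to 1, and further requests return 0 until capacity is released.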

4. Database & State Management

Stateful services are the most fragile component during traffic spikes. YESDINO employs a multi‑pronged strategy to keep database latency predictable:

  • Read replicas: All read‑heavy queries are routed to one of 12 read replicas distributed across three regions
  • Write sharding: Sharding key based on user_id ensures writes are spread evenly; each shard runs on a dedicated r5.4xlarge instance
  • Connection pooling: PgBouncer handles up to 10 k client connections per shard, reducing connection overhead
  • Cache‑aside pattern: A Redis cluster (3 masters + 3 replicas) caches hot data with a 5‑minute TTL, absorbing up to 70 % of read traffic
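
Two of these ideas, routing by user_id and the cache‑aside read path, can be sketched in a few lines. This is a simplified illustration: the hash‑modulo routing scheme and the shard count are assumptions (the text says only that the sharding key is based on user_id), an in‑memory dictionary stands in for Redis, and `shard_for`, `CacheAside`, and `load_from_db` are hypothetical names.

```python
import hashlib
import time
from typing import Any, Callable, Dict, Tuple


def shard_for(user_id: int, shard_count: int) -> int:
    """Map a user_id to a shard index with a stable hash (assumed scheme)."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class CacheAside:
    """Cache-aside reads: check the cache first, fall back to the database."""

    def __init__(self, load_from_db: Callable[[str], Any],
                 ttl_seconds: float = 300.0,   # from the text: 5-minute TTL
                 clock: Callable[[], float] = time.monotonic):
        self.load_from_db = load_from_db
        self.ttl = ttl_seconds
        self.clock = clock
        self.store: Dict[str, Tuple[float, Any]] = {}   # stand-in for Redis

    def get(self, key: str) -> Any:
        now = self.clock()
        cached = self.store.get(key)
        if cached is not None and cached[0] > now:
            return cached[1]                      # cache hit, database untouched
        value = self.load_from_db(key)            # miss or expired: read the database
        self.store[key] = (now + self.ttl, value) # populate the cache for later reads
        return value

    def invalidate(self, key: str) -> None:
        """Call after a write so the next read reloads fresh data."""
        self.store.pop(key, None)
```

Repeated reads of the same hot key within the TTL are served from the cache, which is how a cache layer can absorb a large share of read traffic during a spike.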

Scenario Peak RPS Avg DB Latency p99 DB Latency Auto‑Scale Action