When a sudden surge hits, YESDINO relies on a layered approach that blends edge caching, intelligent load distribution, rapid auto‑scaling, and real‑time observability to keep latency low and uptime high.
1. Global Edge Network & CDN
YESDINO operates a geographically distributed CDN comprising 120+ points of presence (PoPs) spanning North America, Europe, Asia‑Pacific, and South America. Each PoP runs a lightweight reverse proxy that caches both static assets and semi‑dynamic content with configurable TTLs.
- Cacheable assets: images, videos, JavaScript bundles, CSS files
- Semi‑dynamic content: personalised API responses served with a 30‑second stale‑while‑revalidate window
- TTL ranges: 5 min (user‑generated thumbnails) → 24 h (brand logos) → 7 days (documentation PDFs)
| Asset Type | Typical TTL | Cache‑Hit Ratio | % of Traffic Served from Edge |
|---|---|---|---|
| Static images (PNG, JPG, WebP) | 24 h | 96.3 % | 78 % |
| Video clips (< 5 MB) | 4 h | 92.1 % | 65 % |
| API JSON (personalised) | 30 s | 71.8 % | 31 % |
| HTML pages (SSR) | 5 min | 85.0 % | 55 % |
By offloading the majority of request volume to the edge, origin servers receive only about 12 % of the total traffic during a spike, which sharply reduces the risk of overload.
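The TTL and stale‑while‑revalidate behaviour described in this section can be sketched roughly as follows. This is a minimal illustration, not YESDINO's proxy code: `CacheEntry`, `classify`, and `cache_control` are hypothetical names, and the sketch assumes the stale window starts when the TTL ends, as the standard `stale-while-revalidate` Cache-Control directive defines it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheEntry:
    stored_at: float   # time the object was cached, in seconds
    ttl: float         # freshness lifetime in seconds
    swr: float = 0.0   # stale-while-revalidate window in seconds (0 = none)


def classify(entry: Optional[CacheEntry], now: float) -> str:
    """Decide how an edge node could answer a request for this entry."""
    if entry is None:
        return "miss"                 # nothing cached: fetch from origin
    age = now - entry.stored_at
    if age <= entry.ttl:
        return "fresh"                # serve straight from cache
    if age <= entry.ttl + entry.swr:
        return "stale-revalidate"     # serve stale copy, refresh in background
    return "miss"                     # too old: fetch from origin


def cache_control(ttl: int, swr: int = 0) -> str:
    """Build the Cache-Control value an origin could send for these settings."""
    value = f"max-age={ttl}"
    if swr > 0:
        value += f", stale-while-revalidate={swr}"
    return value


# Personalised API JSON: 30 s TTL plus a 30 s stale-while-revalidate window.
api_entry = CacheEntry(stored_at=0.0, ttl=30, swr=30)
# Brand logo: 24 h TTL, no stale window.
logo_entry = CacheEntry(stored_at=0.0, ttl=24 * 3600)
```

With these settings an API response that is 45 seconds old is still served from the edge while a refresh runs in the background, which is one way a short TTL can coexist with a useful hit ratio.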
2. Intelligent Load Balancing
Traffic that reaches the origin is routed through a fleet of layer‑7 (HTTP/2‑aware) load balancers hosted in three availability zones. The balancers use a weighted least‑connections algorithm, taking into account each backend instance’s current CPU utilization, memory pressure, and active connection count.
- Health checks: HTTP GET to `/health` every 5 seconds with a 2‑second timeout
- Failover: Automatic removal of any node failing two consecutive checks
- Weighted routing: New instances initially receive 10 % traffic, ramping up to full weight after a 60‑second warm‑up period
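The failover and warm‑up rules in the list above can be sketched as follows. The names `HealthTracker` and `warmup_weight` are hypothetical, and the linear ramp is an assumption: the text only gives the 10 % starting weight and the 60‑second warm‑up, not the shape of the curve in between.

```python
from typing import Dict


class HealthTracker:
    """Counts consecutive failed health checks per node."""

    def __init__(self, max_failures: int = 2) -> None:
        self.max_failures = max_failures
        self._failures: Dict[str, int] = {}

    def record(self, node: str, ok: bool) -> bool:
        """Record one check result; return True if the node stays in rotation."""
        if ok:
            self._failures[node] = 0   # any success resets the streak
        else:
            self._failures[node] = self._failures.get(node, 0) + 1
        return self._failures[node] < self.max_failures


def warmup_weight(age_s: float, start: float = 0.10, warmup_s: float = 60.0) -> float:
    """Routing weight for an instance that joined `age_s` seconds ago.

    Assumes a linear ramp from `start` to 1.0 over `warmup_s` seconds.
    """
    progress = min(max(age_s / warmup_s, 0.0), 1.0)
    return start + (1.0 - start) * progress
```

A single failed check does not remove a node; only a second consecutive failure does, and a success in between resets the count.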
“Our load balancer’s adaptive weighting lets us keep p99 latency under 150 ms even when we add hundreds of new containers mid‑spike.” – Platform Lead, YESDINO
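A weighted least‑connections choice can be illustrated with a short sketch. The article does not say how CPU utilization and memory pressure are folded into the weight, so this example assumes each backend already carries a single combined `weight` between 0 and 1; `Backend` and `pick_backend` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Backend:
    name: str
    active_connections: int
    weight: float  # 0 < weight <= 1; lower means "send less traffic here"


def pick_backend(backends: Iterable[Backend]) -> Backend:
    """Pick the backend with the fewest connections relative to its weight."""
    candidates = [b for b in backends if b.weight > 0]
    if not candidates:
        raise ValueError("no eligible backends")
    # The +1 accounts for the request about to be assigned, so an idle
    # low-weight node does not automatically win every tie at zero.
    return min(candidates, key=lambda b: (b.active_connections + 1) / b.weight)
```

A freshly added instance with a weight of 0.1 only wins once the established instances carry roughly ten times its connection count, which keeps it from being flooded during warm‑up.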
3. Auto‑Scaling & Compute Fleet
YESDINO’s compute layer runs on a Kubernetes cluster backed by AWS Auto‑Scaling Groups (ASGs). The cluster is composed of a mix of on‑demand and spot instances, with the scheduler prioritizing spot for stateless workloads and reserving on‑demand for stateful services (e.g., Redis, PostgreSQL).
Scaling Policies
| Metric | Threshold | Action | Typical Response Time |
|---|---|---|---|
| CPU utilization | > 70 % for 45 s | Add 2 instances (c5.large) | ≈ 45 seconds |
| Memory usage | > 80 % for 30 s | Add 1 instance (r5.xlarge) | ≈ 30 seconds |
| Requests per second (RPS) | > 50 k for 20 s | Add 3 instances (c5.2xlarge) | ≈ 60 seconds |
| Error rate (5xx) | > 1 % for 15 s | Alert + add 2 instances + throttle lower‑priority traffic | ≈ 120 seconds |
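The "threshold held for N seconds" rules in the table can be sketched as a small evaluator. This is an illustration only: `Rule`, `SustainedBreach`, and the metric names are hypothetical, and the error‑rate row is left out because it also involves alerting and traffic handling that the table does not detail.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    metric: str          # hypothetical metric name
    threshold: float     # trigger when the value is strictly above this
    sustain_s: float     # how long the breach must last, in seconds
    add_instances: int
    instance_type: str


RULES = [
    Rule("cpu_pct", 70, 45, 2, "c5.large"),
    Rule("mem_pct", 80, 30, 1, "r5.xlarge"),
    Rule("rps", 50_000, 20, 3, "c5.2xlarge"),
]


class SustainedBreach:
    """Fires only after a metric stays above its threshold long enough."""

    def __init__(self, rule: Rule) -> None:
        self.rule = rule
        self._breach_started: Optional[float] = None

    def observe(self, value: float, now: float) -> bool:
        """Feed one sample; return True when the scale-out should trigger."""
        if value > self.rule.threshold:
            if self._breach_started is None:
                self._breach_started = now
            return now - self._breach_started >= self.rule.sustain_s
        self._breach_started = None   # dipping below the threshold resets the timer
        return False
```

Requiring a sustained breach rather than a single sample is what keeps a momentary CPU blip from adding instances.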
The scaling controller also incorporates a prediction model trained on historical traffic patterns (e.g., product launches, flash sales). During anticipated peaks, pre‑scaling can provision up to 30 % extra capacity 5 minutes before the event, ensuring a near‑zero cold‑start penalty.
- Scale‑in protection: A node is never removed while it still holds an active WebSocket connection
- Cost ceiling: A hard cap of 500 additional instances per region prevents runaway spend
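The two guard rails above can be expressed as simple checks. This sketch assumes scale‑in protection is evaluated per node and that the cap counts instances added beyond a regional baseline; `clamp_scale_out` and `removable_nodes` are hypothetical helper names.

```python
from typing import Dict, List

MAX_EXTRA_INSTANCES_PER_REGION = 500  # hard cap quoted in the text


def clamp_scale_out(requested: int, already_added: int,
                    cap: int = MAX_EXTRA_INSTANCES_PER_REGION) -> int:
    """Return how many of the requested instances may actually be added."""
    remaining = cap - already_added
    return max(0, min(requested, remaining))


def removable_nodes(websocket_counts: Dict[str, int]) -> List[str]:
    """Nodes with zero active WebSocket connections are eligible for scale-in."""
    return sorted(node for node, count in websocket_counts.items() if count == 0)
```

For example, a request for 10 instances when 495 have already been added in the region would be trimmed to 5.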
4. Database & State Management
Stateful services are the most fragile component during traffic spikes. YESDINO employs a multi‑pronged strategy to keep database latency predictable:
- Read replicas: All read‑heavy queries are routed to one of 12 read replicas distributed across three regions
- Write sharding: A sharding key based on `user_id` ensures writes are spread evenly; each shard runs on a dedicated r5.4xlarge instance
- Connection pooling: PgBouncer handles up to 10 k client connections per shard, reducing connection overhead
- Cache‑aside pattern: A Redis cluster (3 master + 3 replica) caches hot data with a 5‑minute TTL, absorbing up to 70 % of read traffic
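Two of the techniques in the list above, sharding by `user_id` and cache‑aside reads, can be sketched together. Several details are assumptions: the article does not state the shard count or the hashing scheme, so a hash‑modulo mapping is used purely for illustration, and an in‑memory dictionary stands in for the Redis cluster. `shard_for` and `CacheAside` are hypothetical names.

```python
import hashlib
import time
from typing import Any, Callable, Dict, Tuple


def shard_for(user_id: int, num_shards: int) -> int:
    """Map a user_id to a shard index with a stable hash (assumed scheme)."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class CacheAside:
    """Read-through helper: check the cache first, fall back to the database."""

    def __init__(self, loader: Callable[[str], Any], ttl_s: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader          # function that queries the database
        self._ttl_s = ttl_s            # 5-minute TTL, as in the text
        self._clock = clock
        self._store: Dict[str, Tuple[Any, float]] = {}  # key -> (value, expiry)

    def get(self, key: str) -> Any:
        now = self._clock()
        cached = self._store.get(key)
        if cached is not None and cached[1] > now:
            return cached[0]           # cache hit: the database is not touched
        value = self._loader(key)      # cache miss or expired: load from database
        self._store[key] = (value, now + self._ttl_s)
        return value
```

Within the TTL, repeated reads of the same key reach the loader only once, which is how a cache of this kind takes read traffic off the database.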
| Scenario | Peak RPS | Avg DB Latency | p99 DB Latency | Auto‑Scale Action |
|---|---|---|---|---|
