GitHub Availability Report: April 2026 — 10 Incidents, Including a 30% Scraping Attack
On May 14, GitHub published its April 2026 availability report. It was a rough month: 10 incidents across code search, Copilot, Pages, Codespaces, Actions, and more. Here’s what happened and what GitHub is doing about it.
Incident Summary
| Date | Service | Duration | Impact |
|---|---|---|---|
| Apr 1 | Code Search | 8h 43m | 100% query failure, full re-index needed |
| Apr 1 | Audit Log | 4m | 4,297 API actors affected |
| Apr 9 | Copilot Agent | 4h 16m | ~84% of new sessions delayed, 54 min queues |
| Apr 13 | Pages | 39m | ~17.5M HTTP 500 errors (12.8% peak) |
| Apr 16 | Codespaces | 3h 22m | ~40% of VS Code starts failed |
| Apr 20 | Code Scanning / Projects | 15h 36m | New PRs not scanned; new issues missing from boards |
| Apr 22 | Copilot Chat | 3h 43m | Full unavailability, then regional recovery |
| Apr 23 | Multi-service | 1h 18m | Copilot, Webhooks, Git, Actions, and Deployments degraded; 5-7% of overall traffic affected |
| Apr 27 | Search (scraping attack) | 6h 15m | Up to 65% of searches timed out across Issues, PRs, more |
| Apr 27+ | Search continued | — | See above, same incident |
The Big Ones
April 27: The Scraping Attack That Took Down Search
The most interesting incident was on April 27. Between 16:15 and 22:46 UTC, GitHub’s search services experienced severe degradation. The cause? A massive anonymous distributed scraping attack.
The attacker used more than 600,000 unique IP addresses, and all requests included matching actor information. Because each IP stayed below the rate limit threshold, standard rate limiting was ineffective. The traffic made up 30% of the day’s total search traffic, concentrated within a 4-hour window. It saturated the load balancer tier, causing up to 65% of searches to time out across Issues, Pull Requests, Projects, Repositories, Actions, Package Registry, and Dependabot Alerts.
GitHub’s response: scale the load balancer tier, block the traffic, add better connection handling, and implement new controls to allow restricting anonymous traffic to protect registered users.
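To see why a per-IP threshold alone does not help against this kind of traffic, here is a minimal sketch that simulates many IPs each staying under their individual limit, with and without an additional aggregate budget for anonymous requests. The class, the function, and all numbers are hypothetical and scaled down for illustration; the report does not describe how GitHub's controls are actually implemented.

```python
from collections import defaultdict
from typing import Optional


class FixedWindowLimiter:
    """Counts requests per key within a single window; admits up to `limit` per key."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.counts = defaultdict(int)

    def allow(self, key: str) -> bool:
        if self.counts[key] >= self.limit:
            return False
        self.counts[key] += 1
        return True


def simulate(num_ips: int, requests_per_ip: int, per_ip_limit: int,
             anon_budget: Optional[int] = None) -> int:
    """Return how many anonymous requests are admitted in one window.

    Every request must pass the per-IP limiter. If `anon_budget` is set,
    it must also pass a shared budget that covers ALL anonymous traffic.
    """
    per_ip = FixedWindowLimiter(per_ip_limit)
    anon_total = FixedWindowLimiter(anon_budget) if anon_budget is not None else None
    admitted = 0
    for ip in range(num_ips):
        for _ in range(requests_per_ip):
            if not per_ip.allow(f"ip-{ip}"):
                continue  # this IP exceeded its own threshold
            if anon_total is not None and not anon_total.allow("anonymous"):
                continue  # the shared anonymous budget is exhausted
            admitted += 1
    return admitted
```

With 6,000 simulated IPs sending 5 requests each against a per-IP limit of 10, `simulate(6000, 5, 10)` admits all 30,000 requests, because no single IP ever trips its limit. Adding `anon_budget=5000` caps the admitted total at 5,000 regardless of how many IPs the traffic is spread across.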
April 23: DNS Degradation Cascades Across Copilot, Webhooks, Git, Actions
A single-datacenter DNS infrastructure degradation cascaded into a multi-service incident affecting 5-7% of overall traffic. A recently introduced traffic-balancing mechanism caused DNS resolvers to begin failing under a specific load pattern. The impact spread across Copilot (~7% model request failures), Webhooks (elevated latency >3s), Git Operations (1.25% errors), Actions (workflow status delays ~8s), and Deployments (temporarily blocked).
The fix: restart DNS infrastructure. The takeaway: better DNS resilience, safer rollout procedures, and self-healing mechanisms for resolution failures.
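The report does not spell out what "self-healing" looks like in practice. One common pattern, shown here purely as an assumption, is to keep serving the last known good answer when a resolver starts failing. The `FallbackResolver` class below is hypothetical; the resolver function is injected so the sketch can wrap `socket.gethostbyname` or any other lookup.

```python
from typing import Callable, Dict


class FallbackResolver:
    """Wraps a resolver and serves the last good answer if a lookup fails."""

    def __init__(self, resolve: Callable[[str], str]) -> None:
        self._resolve = resolve
        self._last_good: Dict[str, str] = {}

    def lookup(self, host: str) -> str:
        try:
            address = self._resolve(host)
        except OSError:
            # socket.gaierror is a subclass of OSError, so real DNS
            # failures land here when wrapping socket.gethostbyname.
            if host in self._last_good:
                return self._last_good[host]  # serve the stale answer
            raise  # nothing cached: surface the failure
        self._last_good[host] = address
        return address
```

A real deployment would also bound how long a stale answer may be served; that is left out here to keep the sketch short.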
April 9: Rate Limit Bug Cripples Copilot Agent
A bug in Copilot’s rate limiting logic applied a single global rate limit instead of a per-installation one. A coincidental 3-4x traffic surge from a client update accelerated exhaustion of that shared limit. 84% of new agent sessions were delayed, with queue times reaching 54 minutes (normal: 15-40 seconds). A second wave on the same day was caused by a caching bug that persisted the rate-limited state.
Fix: switch to per-installation credentials, disable the faulty caching, and improve monitoring.
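The difference between a global and a per-installation limit can be sketched in a few lines. The `QuotaTracker` class and its parameters are hypothetical and only illustrate the scoping problem; they are not Copilot's actual rate limiter.

```python
from collections import defaultdict


class QuotaTracker:
    """Tracks request quota either globally or per installation."""

    def __init__(self, limit: int, per_installation: bool) -> None:
        self.limit = limit
        self.per_installation = per_installation
        self.used = defaultdict(int)

    def try_acquire(self, installation_id: str) -> bool:
        # With global scoping, every installation shares one counter,
        # so a surge from one client can starve all the others.
        key = installation_id if self.per_installation else "__global__"
        if self.used[key] >= self.limit:
            return False
        self.used[key] += 1
        return True
```

If a "noisy" installation sends 100 requests against a limit of 50, a globally scoped tracker then rejects the first request from a "quiet" installation, while a per-installation tracker still admits it.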
Notable Recurring Themes
- Cascading failures from shared infrastructure: The April 23 DNS incident is the textbook example. One degraded datacenter component affected multiple services. GitHub is working on better isolation.
- Rate limit scoping: Both the Copilot (Apr 9) and scraping (Apr 27) incidents involved rate limits that were either too global or too easy to bypass.
- Automation causing harm: The April 1 code search outage was triggered by an automated infrastructure change applied too aggressively. The April 13 Pages outage was caused by an automated DNS tool deleting a necessary record.
- Detection gaps: Multiple incidents had detection delays of 40-53 minutes because monitoring did not flag the failure pattern as a risk (for example, the scraping attack was only discovered while responders were working on mitigation).
What GitHub Is Doing
The report lists specific follow-up actions for each incident. Common threads:
- Stronger DNS resilience and multi-datacenter failover
- More gradual rollouts with better health checks
- Faster detection through improved monitoring and alerting
- Better traffic isolation to prevent cascading impact
- Rate limit hardening — both per-installation scoping and anonymous traffic controls
- Fallback mechanisms for upstream service dependencies (Codespaces VS Code Server, Pages storage)
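As an illustration of "more gradual rollouts with better health checks", here is a minimal sketch of a staged rollout that halts as soon as a health check fails. The function name, the stage fractions, and the callbacks are all assumptions made for this example; the report does not describe GitHub's rollout tooling.

```python
from typing import Callable, List, Sequence, Tuple


def staged_rollout(
    hosts: Sequence[str],
    apply_change: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    stages: Sequence[float] = (0.01, 0.1, 0.5, 1.0),
) -> Tuple[List[str], bool]:
    """Apply a change to growing fractions of hosts, checking health between stages.

    Returns (hosts_changed, completed). The rollout stops at the first
    stage after which any changed host reports unhealthy.
    """
    changed: List[str] = []
    for fraction in stages:
        target = max(1, int(len(hosts) * fraction))
        for host in hosts[len(changed):target]:
            apply_change(host)
            changed.append(host)
        if not all(is_healthy(h) for h in changed):
            return changed, False  # halt before widening the blast radius
    return changed, True
```

With 100 hosts and the default stages, a change that breaks one host in the second stage stops after 10 hosts instead of reaching all 100.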