Incident Post Mortem: May 19, 2021

By Bryant Khau and Leonardo Zizzamia

Summary

Between 5:50am and 7:38am PT on Wednesday, May 19th, there were connectivity issues with coinbase.com, Coinbase mobile apps, and Coinbase Pro. During this time, many users experienced slow load times and errors while attempting to access Coinbase, including features like buying, selling and trading. This post will detail the outage, explain what caused it, and describe the changes we’ve made to prevent similar failures going forward.

The Outage

Leading up to this incident, a sudden price drop in the crypto market (ETH dropped 20%, BTC dropped 25%) caused a large spike in traffic as many users reacted. A group of on-call engineers convened after being paged for high error rates across several services.

The affected services were:

Logged-out web servers: This caused users who weren’t logged in to hit an error page when visiting coinbase.com.
GraphQL service: This caused parts of the mobile app to load very slowly and return errors ~10% of the time.
Coinbase Pro API: This caused Coinbase Pro to be partially unreachable.
Non-US card payment processing service: This caused card purchases of crypto by non-US customers to be rejected.

Once these issues were identified, engineers split into groups to investigate each issue in parallel and prioritize follow-up actions.

Root Cause Analysis

In the days since the outage, we have reconstructed a clear picture of what happened from the first minute.

1. The logged-out coinbase.com pages were largely unreachable as the instances started failing and took over 40 minutes to return to a healthy state. The rapid spike in requests hit a maximum threshold on Nginx router connections. We manually increased that threshold during the incident, which ultimately addressed the bottleneck.

2. We saw timeouts and increased latency on our GraphQL service, which aggregates data from underlying services. The timeouts were caused by GraphQL autoscaling up too slowly. The autoscaling eventually caught up and the errors subsided, restoring functionality to the mobile app and logged-in users.

3. We saw that the database that powers the Coinbase Pro exchange had high latency and CPU load. Additionally, the API servers that run our market data feed were under high CPU load. We increased the operation throughput configured on the database and provisioned more API servers.

4. In our Non-US card payment processing service, the number of failed payments increased as the queue to process the payments became backlogged. We increased the number of queue workers and card payments started succeeding.
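The backlog in item 4 comes down to simple arithmetic: when payments arrive faster than the workers can process them, the queue grows every minute, and it only drains once capacity exceeds the arrival rate. The sketch below illustrates this with made-up numbers. The function name, the rates, and the worker counts are all assumptions for illustration, not figures from the incident.

```python
def backlog_after(minutes, arrivals_per_min, workers,
                  jobs_per_worker_per_min, start_backlog=0):
    """Return the queue backlog at the end of each minute under constant rates."""
    backlog = start_backlog
    history = []
    capacity = workers * jobs_per_worker_per_min  # jobs processed per minute
    for _ in range(minutes):
        # Backlog grows by the arrivals and shrinks by the capacity, never below zero.
        backlog = max(0, backlog + arrivals_per_min - capacity)
        history.append(backlog)
    return history


# Hypothetical numbers: 8 workers handle 800 jobs/min against 1000 arrivals/min.
under_provisioned = backlog_after(
    5, arrivals_per_min=1000, workers=8, jobs_per_worker_per_min=100)

# Adding workers (12 -> 1200 jobs/min) lets the existing backlog drain.
after_scaling = backlog_after(
    5, arrivals_per_min=1000, workers=12, jobs_per_worker_per_min=100,
    start_backlog=under_provisioned[-1])

print(under_provisioned)  # [200, 400, 600, 800, 1000]
print(after_scaling)      # [800, 600, 400, 200, 0]
```

In this toy model the backlog keeps growing for as long as capacity stays below the arrival rate, so recovery time depends on how quickly workers are added.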

Improvements

At Coinbase, we’ve committed significant resources to improving our reliability, including regular load tests to prepare us for periods of high traffic. However, this incident has identified some blind spots to address, especially around very sudden spikes of traffic.

A common theme across several of the failures in this incident was autoscaling rules that weren’t tuned to the kind of traffic spikes that crypto markets can cause. We’re working on tailoring our load tests to better simulate real-world situations, such as sudden traffic spikes. This will help surface issues like untuned autoscaling rules during controlled testing.
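To make the autoscaling point concrete, here is a minimal simulation of a step-shaped traffic spike hitting a service whose autoscaler can only add a limited number of instances per tick. It is a toy model under stated assumptions: the `simulate` function, the per-instance capacity, and the traffic profile are hypothetical and do not describe Coinbase's actual autoscaling configuration.

```python
import math


def simulate(traffic, per_instance_rps, start_instances, max_added_per_tick):
    """Return (instances per tick, unserved requests per tick).

    New instances requested during a tick only become available on the next tick.
    """
    instances = start_instances
    instance_log, unserved_log = [], []
    for rps in traffic:
        capacity = instances * per_instance_rps
        instance_log.append(instances)
        unserved_log.append(max(0, rps - capacity))
        # Scale toward the needed instance count, limited by the step size.
        needed = math.ceil(rps / per_instance_rps)
        if needed > instances:
            instances = min(needed, instances + max_added_per_tick)
    return instance_log, unserved_log


# Hypothetical spike: traffic jumps 10x between two ticks.
spike = [100, 100, 1000, 1000, 1000, 1000, 1000]

slow_instances, slow_unserved = simulate(
    spike, per_instance_rps=50, start_instances=2, max_added_per_tick=2)
fast_instances, fast_unserved = simulate(
    spike, per_instance_rps=50, start_instances=2, max_added_per_tick=20)

print(sum(slow_unserved))  # 3500
print(sum(fast_unserved))  # 900
```

A load test that ramps up gradually would not exercise the slow rule's weakness, because the autoscaler would keep pace with the ramp. A step-shaped profile like `spike` is what exposes it.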

Another improvement that we are investing in is the implementation of kill switches for parts of the client application so that when failures happen, we can keep unaffected parts of our applications working while we work to address the failures.
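As a rough illustration of the kill-switch idea, the sketch below gates each section of a screen behind a switch, so a failing feature can be turned off without taking the rest of the screen down. The `KillSwitches` class, the feature names, and `render_home` are all hypothetical. A real client would likely fetch switch state from a remote configuration service rather than hold it in memory.

```python
class KillSwitches:
    """In-memory registry of disabled features (hypothetical sketch)."""

    def __init__(self):
        self._disabled = set()

    def disable(self, feature):
        self._disabled.add(feature)

    def enable(self, feature):
        self._disabled.discard(feature)

    def is_enabled(self, feature):
        return feature not in self._disabled


def render_home(switches, load_feature):
    """Build the home screen, skipping any feature whose switch is off."""
    sections = {}
    for feature in ("prices", "portfolio", "card_buy"):
        if switches.is_enabled(feature):
            sections[feature] = load_feature(feature)
        else:
            # The backing service is never called for a disabled feature.
            sections[feature] = "temporarily unavailable"
    return sections


calls = []


def fake_loader(feature):
    """Stand-in for a network call; records which features were requested."""
    calls.append(feature)
    return f"{feature} loaded"


switches = KillSwitches()
switches.disable("card_buy")  # e.g. the card payment service is degraded
home = render_home(switches, fake_loader)
print(home)
```

Because the disabled feature is skipped before any request is made, the switch also sheds load from the struggling backend instead of only hiding the error from the user.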

We take the uptime and performance of our infrastructure very seriously, and we’re working hard to support the millions of customers that choose Coinbase to manage their cryptocurrency. If you’re interested in solving scaling challenges like those presented here, come work with us.

Incident Post Mortem: May 19, 2021 was originally published in The Coinbase Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.
