To give you some context of how that compares to typical numbers, we normally take in more than 500 million Tweets a day which means about 5,700 Tweets a second, on average. This particular spike was around 25 times greater than our steady state.
Twitter managed to run with a single-master MySQL until 2010 and they still use it, along with their own sharding infrastructure, to store tweets.
Engineering is set up with (mostly) self-contained teams that can run independently and very quickly. This means that we bias toward teams spinning up and running their own services that can call the back end systems. This has huge implications on operations, however.
We’ve traded our fragile monolithic application for a more robust and encapsulated, but also complex, services oriented application. We had to invest in tools to make managing this beast possible. Given the speed with which we were creating new services, we needed to make it incredibly easy to gather data on how well each service was doing. By default, we wanted to make data-driven decisions, so we needed to make it trivial and frictionless to get that data.
We integrated a system we call Decider across all our services. It allows us to flip a single switch and have multiple systems across our infrastructure all react to that change in near-real time. This means software and multiple systems can go into production when teams are ready, but a particular feature doesn’t need to be “active”. Decider also allows us to have the flexibility to do binary and percentage based switching such as having a feature available for x% of traffic or users. We can deploy code in the fully “off” and safe setting, and then gradually turn it up and down until we are confident it’s operating correctly and systems can handle the new load.
- Finagle: A Protocol-Agnostic RPC System
- Snowflake - ID generation
- Gizzard: a library for creating distributed datastores
- High Scalability: How Twitter Stores 250 Million Tweets A Day Using MySQL