No time to fail: How to stop databases from damaging your game launch
Percona discusses how to choose your database and approach its implementation
Being a database ninja isn't necessarily a high priority for many game developers. But your database implementation can be the difference between a smooth and successful game launch and a bumpy ride punctuated with bottlenecks, failed logins, and unhappy players.
As indie developer Donkey Crew demonstrated when it recently launched its post-apocalyptic survival MMO Last Oasis into Steam Early Access, it's a recurring problem every game developer must tackle.
So why do both AAA and indie games so often suffer crushing bottlenecks and performance problems at launch? The answer usually involves ancillary services -- the features a developer might spend less time on internally, or outsource to a third-party company -- such as matchmaking, leaderboards, and social components.
Typically, each of these features will have its own database install to gather and manage data. Most game companies will do significant testing on the core game, and the game itself will have reasonable scalability. However, many developers rely on services like global logins for accessing the game, which may not be tested at all, or not tested at the same scale. The result? A working game that no one can access.
Databases play a vital role in game design and development. They store player data, game states, and performance information, and they maintain the environments that development teams have put so much effort into. Without a good database, games can't function properly. It's therefore worth spending time to understand how databases work. Without them, even the biggest companies can't succeed.
Services: trust but verify
To avoid a bumpy launch, it's important to recognise that although many providers supply code and tools that are great for generic workloads, they might not be suitable for your workload or for how components interact in your game. Typically, there is a lot of optimisation that should occur, but that doesn't always happen. When you start to test with unique workloads, you may find limitations or things that were never tested thoroughly in the first place.
This is one reason why using open source software components is so beneficial. If you know what you are doing, you can modify the database code to overcome some of those bottlenecks.
What database to use?
The right database will vary depending on what game you are making. You will want to look at the game companies that have already blazed the trail. The more successful your game is, the more traffic it will receive and the more you will push the limits of a lot of the database technologies available. Looking at the technology used by other games can help you learn.
We found that MongoDB, for instance, is very popular with mobile developers and mobile gaming apps for its flexibility and extensibility. MongoDB enables developers to add and remove information very easily, without the overhead of defining and migrating a rigid schema every time the data model changes.
MongoDB also stores its data in BSON (Binary JSON), a format closely related to JSON and easy to work with. From an application development perspective, it's almost a native format, as most developers already know JSON, and that level of integration makes it popular for mobile games.
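As a rough illustration of that schema flexibility, here's a minimal sketch in Python using the pymongo driver. The connection string and field names are invented for the example rather than taken from any particular game.

```python
# A minimal sketch using pymongo (connection string and field names are
# illustrative). New fields can be added per document without a schema
# migration.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
players = client["game"]["players"]

# First document: a basic profile
players.insert_one({"player_id": 42, "name": "Ada", "level": 3})

# Later, a new feature adds fields -- no ALTER TABLE, just richer documents
players.insert_one({
    "player_id": 43,
    "name": "Grace",
    "level": 7,
    "inventory": [{"item": "rope", "qty": 2}],
})

# Older documents simply lack the newer fields
for doc in players.find({}, {"_id": 0}):
    print(doc)
```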
When developing large persistent multiplayer games, there are all kinds of technologies and tools to assess. A lot of your decisions will be driven by the need to access data at speed, which often means using in-memory databases.
As an example, one of the world's biggest multiplayer online games was originally built on MySQL NDB Cluster, which used hundreds, if not thousands, of machines clustered together. All the data was stored in memory as the game demanded real-time updates. Nowadays, there are plenty of in-memory databases available, and which one you choose will depend on your needs.
For very large games with millions of players, storing huge volumes of data could mean looking at Apache Cassandra or Redis depending on the workload. You may see both being used at once.
For instance, Redis is more suitable for use cases such as caching, leaderboards and scoring, while Cassandra lends itself to inventories as it's great for fast reads on large datasets with predefined indexes (making it the silent friend to open world games with vast crafting systems and gamers who like to hoard everything).
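To make the leaderboard case concrete, here's a minimal sketch in Python with the redis-py client. Redis sorted sets handle score updates and top-N queries in a couple of calls; the key and player names are purely illustrative.

```python
# A minimal leaderboard sketch using Redis sorted sets (key and player
# names are illustrative). ZINCRBY updates a score; ZREVRANGE returns
# the current top entries, highest score first.
import redis

r = redis.Redis(host="localhost", port=6379)

# Record scores as matches finish
r.zincrby("leaderboard:weekly", 120, "player:42")
r.zincrby("leaderboard:weekly", 85, "player:43")
r.zincrby("leaderboard:weekly", 40, "player:42")   # player 42 now at 160

# Fetch the top 10, highest score first
top = r.zrevrange("leaderboard:weekly", 0, 9, withscores=True)
for rank, (player, score) in enumerate(top, start=1):
    print(f"{rank}. {player.decode()}: {int(score)}")
```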
Other databases are in use for player bases in the millions -- we've already mentioned MySQL, but the same applies to PostgreSQL, MongoDB, and a few others. In terms of back-end data and analytics, the data may go into something such as Elasticsearch and you may have an SQL component for reporting and query analysis of data.
Battle testing
No matter which technology you choose, you will have to do thorough testing to ensure all the components, particularly those ancillary services, are battle-tested across the board. It's important to remember that testing in a silo only works if you can replicate your expected launch conditions; if you only test for 500, 1,000 or 10,000 concurrent users but then receive one million users at launch, your databases will fail. Load testing at the appropriate level is, therefore, an absolute must.
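Load testing doesn't have to start with heavyweight tooling. The sketch below is a deliberately minimal Python example using asyncio and aiohttp; it hammers a hypothetical login endpoint with concurrent requests and reports p95 latency and error counts. A real launch rehearsal would use a purpose-built tool such as k6, Locust, or JMeter at far higher concurrency, but the shape of the exercise is the same.

```python
# A minimal load-test sketch (the endpoint and payload are hypothetical).
# Fires CONCURRENT_USERS login requests at once and reports p95 latency
# and the error count.
import asyncio
import time
import aiohttp

LOGIN_URL = "https://example.com/api/login"   # hypothetical endpoint
CONCURRENT_USERS = 1000

async def login(session, results):
    start = time.perf_counter()
    try:
        async with session.post(LOGIN_URL, json={"user": "test", "pw": "test"}) as resp:
            await resp.read()
            results.append((resp.status, time.perf_counter() - start))
    except (aiohttp.ClientError, asyncio.TimeoutError):
        results.append(("error", time.perf_counter() - start))

async def main():
    results = []
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(*(login(session, results) for _ in range(CONCURRENT_USERS)))
    latencies = sorted(t for _, t in results)
    p95 = latencies[int(len(latencies) * 0.95)]
    errors = sum(1 for status, _ in results if status != 200)
    print(f"p95 latency: {p95:.3f}s, errors: {errors}/{len(results)}")

asyncio.run(main())
```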
Here's a useful checklist for your next game launch, covering areas you will need to consider before, during and after launch:
Before launch
- Set up monitoring -- if you can't see what's going on, how can you measure your success?
- Find bottlenecks before launch -- load test your applications and test how you will scale under normal and peak loads.
- Test failover -- understand how quickly you can recover now, not on your launch day.
- Code freeze -- it's really hard to ensure performance if the game application is growing and code and configurations are changing.
- Get a second opinion -- trust but verify things are ready; it's worth it.
- Check your backups -- make sure you have reliable and consistent backups of your databases.
During launch
- Failing over is the last resort -- failing over moves traffic to a new server, and most systems are slower immediately afterwards because it takes time to warm up the cache (see the warm-up sketch after this list).
- All-hands event -- have the right people standing by to monitor, tweak and fix issues before they get out of hand.
- Eye on the prize -- a temporary fix enables people to play the game. Time to make an impact is finite, but remember that people under pressure are more prone to mistakes. Just don't forget to make it a permanent fix eventually.
- The road to hell -- don't make the problem worse and know the impact of the changes before you make them. The road to hell is paved with good intentions.
- Collect and store -- get the data you need when things are going badly so you can analyse and improve for the next update or launch.
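On the cache warm-up point above, the idea is simple enough to sketch. The Python example below (the key names and data source are invented for illustration) pre-loads the hottest player records into Redis before traffic is pointed at a freshly failed-over node, so the first wave of requests doesn't all miss the cache and hit the database at once.

```python
# A minimal cache warm-up sketch (illustrative key names and data).
# In practice, hot_players would come from a query against the primary
# database for the most recently active players.
import json
import redis

hot_players = [
    {"player_id": 42, "name": "Ada", "level": 3},
    {"player_id": 43, "name": "Grace", "level": 7},
]

r = redis.Redis(host="localhost", port=6379)

with r.pipeline() as pipe:
    for player in hot_players:
        # Cache each profile for an hour; the TTL stops stale entries lingering
        pipe.setex(f"player:{player['player_id']}", 3600, json.dumps(player))
    pipe.execute()

print(r.get("player:42"))
```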
After the launch
- Analyse your data -- use your data to plan, enhance and tweak your strategy for future launches.
- Remove the quick fix -- take the time and expense to make permanent fixes during slow periods.
- Learn from your mistakes -- build a plan to mitigate problems and risks in the future.
- Update your systems -- take advantage of the slower times to get the latest builds and security fixes.
- Don't be complacent -- each application and user base is a living, breathing entity, and what worked for the last launch may not work next time. Analyse, plan, and review regularly.
Monitoring matters
There's a reason we put monitoring first on the game launch checklist. It's important to understand where your database performance edge is, but it's critical to know when you are getting close to falling off the edge too.
Monitoring is a vital component, particularly for those ancillary services that are not part of the core game, as this is where many launches fall down. Having monitoring tools in place to pinpoint a performance problem is crucial.
There are open source monitoring and management tools available for databases such as MySQL, MariaDB, PostgreSQL and MongoDB, but what tool you need will depend on what's being monitored, such as network traffic and game application usage. If you are using more than one database, then consolidating your management tools using something like our own Percona Monitoring and Management can really help.
Alongside open source, there are a lot of proprietary tools too. For instance, we work with many companies that use New Relic for their application stack. However, it's not just about the toolset. It's also about making sure you have the right level of logging and debugging set up so you can quickly find and understand those issues.
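Whatever tools you settle on, application-side query timing costs very little and pays for itself. The sketch below is a minimal, illustrative Python example; it uses SQLite purely so the snippet is self-contained, but the same wrapper applies to any database driver.

```python
# A minimal sketch of application-side query timing (SQLite is used only
# to keep the example self-contained). Slow queries are logged so they can
# be spotted before they become a launch-day bottleneck.
import logging
import sqlite3
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("db")

SLOW_QUERY_THRESHOLD = 0.1  # seconds; tune to your latency budget

def timed_query(conn, sql, params=()):
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    elapsed = time.perf_counter() - start
    if elapsed > SLOW_QUERY_THRESHOLD:
        log.warning("slow query (%.3fs): %s", elapsed, sql)
    else:
        log.info("query ok (%.3fs): %s", elapsed, sql)
    return rows

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (id INTEGER, name TEXT)")
conn.execute("INSERT INTO players VALUES (1, 'Ada')")
print(timed_query(conn, "SELECT * FROM players WHERE id = ?", (1,)))
```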
Considering scale
Planning your approach to scalability involves knowing what your game's lifecycle is likely to look like. Different types of games follow different patterns. AAA showcase games typically have a big, upfront launch period where traffic rockets and then interest gradually settles down. Not building in scalability for multiplayer match-making services can be particularly painful at launch. You should, therefore, consider overprovisioning for the first month or two.
Mobile games, especially smaller indie titles, tend to have a more prolonged lifecycle, where the increase may be slower. The key difference seen with mobile games is very jagged traffic. You might, for example, have low usage for several months before a high-profile review or update makes the game go viral for a time.
As traffic increases or as new features or expansions get released, it's going to change the workload on your systems and the underlying infrastructure. The change in workload can drastically alter where your bottlenecks are and this will almost certainly require iterative testing to ensure that everything is properly scaling.
If you don't know whether your databases are ready for your next game update, follow this checklist to avoid the common pitfalls:
- Add 10% to the top -- measure your database environment's performance at last year's traffic levels and increase by 10%. This approach not only provides a benchmark, but you'll also discover if any problems you encountered previously have been resolved. Once you have run through this initial scenario, you can repeat it for 25% or 50%, for example.
- Plan for disasters -- if a database goes down you need to ensure that you have a good data backup and recovery plan in place. Document the process and share it to avoid a single point of failure when you find a key person is unavailable during the outage.
- Solid monitoring -- you will need to be set up to monitor your database environment before, during and after game launch. Pay particular attention to the number of queries, any increase in query response times, as well as disk and CPU use and saturation.
- Game launch = People -- make sure you have adequate staff in place, or on-call, around the clock to cover all time zones, and with the expert knowledge to quickly diagnose problems and implement solutions.
- Figure out failures -- increase loads to find your database's breaking point. Run several failure scenarios and measure the time until restoration (see the sketch after this list).
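For the last point, "measure the time until restoration" can be as simple as a stopwatch around a reconnect loop. The sketch below (host, port and timeout are illustrative) only checks that the database port accepts connections again; a fuller test would also confirm that reads and writes succeed.

```python
# A minimal sketch for timing recovery after a simulated failure (host,
# port and timeout are illustrative). Fail the database over, run this,
# and record how long until connections succeed again.
import socket
import time

DB_HOST, DB_PORT = "127.0.0.1", 3306   # e.g. a MySQL replica under test

def wait_for_recovery(host, port, timeout=600.0, interval=0.5):
    start = time.perf_counter()
    while time.perf_counter() - start < timeout:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return time.perf_counter() - start   # seconds until reachable
        except OSError:
            time.sleep(interval)
    raise TimeoutError(f"{host}:{port} did not recover within {timeout}s")

print(f"Recovered in {wait_for_recovery(DB_HOST, DB_PORT):.1f}s")
```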
Cloud and cost factors
While most developers consider how they can scale up their game, they don't always consider scaling down. You have to be careful of the costs, especially in the cloud space, where a lot of game back-ends are hosted now. Do you know what it will cost to run your game once it's time to scale down? This is what we dub 'The Hotel California' effect: you can check in and it's easy to scale up, but leaving and downgrading is a lot harder.
The cloud is also attractive because not many game companies and game developers are experts in infrastructure or in setting up and maintaining servers -- they are experts in developing new games. Although cloud services are designed to be as resilient as possible, cloud providers do have outages, which is why you will see bigger game companies adopting a multi-cloud strategy. Sometimes, companies will roll their own in-house cloud infrastructure over the top of a cloud provider's infrastructure. This can involve using Kubernetes (the container orchestration platform) across multiple cloud providers.
Many game developers look into how to use managed services rather than running their own database instances. This is a good option, but it is important to understand your own responsibilities too. While many services are badged as 'fully managed', that doesn't always mean what you assume. While your instance may be fully managed in the cloud, there remains the principle of shared responsibility for security, where you, as the game developer, are still responsible for taking basic precautions around data protection.
We have found that when people adopt new cloud services, they aren't aware of all the best practices that exist, or assume those are being carried out for them. A lot of databases, for instance, have default set-ups that don't require authentication or passwords. This leaves the door wide open to anyone who stumbles across them on the Internet. It's a silly mistake, but an easy one to make. It's therefore important to verify that everything is set up the way your third-party providers said it would be.
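A quick way to catch that open-door problem is to try connecting without credentials and see what happens. The sketch below assumes MongoDB and the pymongo driver, with a hypothetical hostname; if an unauthenticated client can list database names, the instance is effectively open to anyone who can reach it.

```python
# A minimal check for unauthenticated access (hostname is hypothetical).
from pymongo import MongoClient
from pymongo.errors import OperationFailure, ServerSelectionTimeoutError

HOST = "your-db-host.example.com"   # hypothetical endpoint

try:
    client = MongoClient(HOST, 27017, serverSelectionTimeoutMS=3000)
    names = client.list_database_names()   # should fail without credentials
    print(f"WARNING: unauthenticated access allowed, databases: {names}")
except OperationFailure:
    print("OK: server rejected the unauthenticated request")
except ServerSelectionTimeoutError:
    print("Could not reach the server (firewalled or down)")
```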
Matt Yonkovit is Percona's chief experience officer, overseeing company strategy and marketing functions. Before joining Percona in 2009, Matt worked at MySQL AB and Sun Microsystems as a solution architect, building out and optimizing high performance infrastructure for Fortune 500 and countless other web properties.