AI moderation will cause more harm than good | Opinion
The problem of in-game abuse can only be solved through a proper commitment of resources; magical thinking about AI solutions will only make things worse
Creating a game with a large, highly engaged online player base and an active community is, for many companies, right at the top of the wishlist. When they’re really well managed, these games are a licence to print money, to the extent that a single game can become a primary commercial driver of a pretty large company.
Games like Fortnite, World of Warcraft, Call of Duty, Grand Theft Auto V, and Final Fantasy XIV, to name but a few, have become central to the ongoing success of the publishers who created and operate them. Their importance rests on the fact that while many popular franchises can rely on a huge launch for each new instalment, these games never actually stop being played and making money. It’s no wonder that executives around the industry get dollar signs in their eyes when anyone starts talking about service-based games with high engagement.
There are, of course, downsides. Turning a development project that once had a clear end-point into an open-ended process is easier said than done, for a start, and relatively few development teams have turned out to be adept at it.
Unfortunately, some of those who were initially adept turned out to have no idea how to manage team burnout in the context of this new never-ending development cycle, resulting in a loss of key talent and, ultimately, a degraded ability to keep up the game’s quality. Underestimating the resourcing that’s required to keep an online game popular for years on end is a very common problem.
And then there’s the question of moderation. You’ve got all these players engaged with your game, and that’s great; now how do you prevent a minority of them from being awful to the rest? Whether that’s by cheating, or in-game behaviour, or through abuse or harassment on game-related communication channels, there’s always the potential for some players to make life miserable for others in what’s meant to be a fun, entertaining activity.
Many companies are loath to allocate resources to their moderation efforts or to ensure support is properly in place for those staff, resulting in rapid burnout
Left unchecked, this can turn the entire community around a game into a seriously negative, hostile place – which, aside from being unpleasant for all concerned, is also a major commercial problem. Hostile online environments and communities impact your ability to attract new players: if their first experience of an online game involves being subjected to torrents of abuse or behaviour like team-killing from other players, they’ll probably never come back.
Moreover, they make it hard to retain players you already have – and that’s a problem, because the network effects that make online games so commercially powerful (people encourage their friends to play) can also work in reverse, with a few people being chased out of a game by harassment or abuse ultimately triggering their friends to also move elsewhere.
Consequently, game companies have generally come to take the moderation and policing of behaviour in the online spaces they control more seriously in recent years – albeit in rather uneven fits and starts, which can often feel like there are as many steps backwards as forwards.
Some companies have moved towards making communication between players outright harder – switching off voice comms by default (or entirely), limiting in-game chat in various ways, or designing interaction systems to prevent negative or harassing behaviour as much as possible.
Even with this, however, there still remains a fundamental requirement, at least on some occasions, to pull on a long pair of rubber gloves and get elbow-deep in the cesspit – stepping in to monitor and review player behaviour, make judgement calls, and hand out bans or penalties for behaviour that’s over the line.
The problem here, of course, goes back to that fundamental issue about running these sorts of games – it takes a whole ton of resources. Hiring and training people to deal with player reports and complaints in an effective, consistent, and fair way is not cheap, and while some companies prefer to try to outsource this work, that hasn’t always been a great option.
These moderation staff end up being on the front-line of your company’s interactions with its players, and their actions and judgements reflect very directly on the company’s own values and priorities. Moreover, it’s a tough job in many regards.
While moderators of game communities generally only have to deal with text and voice chat logs, not the horrific deluge of humanity’s worst and darkest nature to which social media moderators are exposed on a daily basis, spending your entire working day immersed in logs and recordings of people spewing racist, misogynist, and bigoted invective at each other, or graphically threatening rape and murder, is something that takes a toll.
Despite this, many companies are loath to allocate a lot of resources to their moderation efforts or to ensure that HR support is properly in place for those staff, often resulting in very rapid burnout and turnover.
I’d absolutely encourage other companies to pool resources and knowledge on issues of abuse. But the fact that this cooperation focuses on AI is a red flag
One reason why companies don’t want to focus a lot of resources on this problem – despite a growing understanding of how commercially damaging it is to let toxic behaviour go unchecked in a game’s community – is that there’s quite a widespread belief among industry executives and decision-makers that a better solution is around the corner.
You can see the outline of this belief in this week’s announcement from Ubisoft and Riot that they are going to collaborate on improving their systems for policing in-game behaviour – a partnership which will focus not on an exchange of best practices for their moderation teams, or a federation of resources and reporting systems to weed out persistent bad actors across their games, but rather on the development of AI systems to monitor the games.
Look, overall it’s an extremely good thing that Ubisoft and Riot – two companies whose games, Rainbow Six and League of Legends respectively, have had significant problems with toxic and abusive groups within their communities – are working together on tackling this problem, and I’d absolutely encourage other companies around the industry to pool resources and knowledge on issues of harassment and abuse. The fact that this cooperation focuses on AI, though, is a red flag, because it smacks of a fallacy that I’ve heard from executives for more than a decade – the notion that automated systems are going to solve in-game behaviour problems any day now.
There’s nothing intrinsically wrong with the pursuit of this Holy Grail, an AI system that can monitor in-game behaviour and communications in real-time and take steps to prevent abuse and harassment from escalating – muting, kicking, or banning players, for example.
The problem arises if that idea is being used, either explicitly or implicitly, as an excuse for not investing in conventional moderation resources. That’s not necessarily the case for Ubisoft or Riot (they’re just getting dragged into this argument because of their recent announcement), but at other companies, especially in the social media space, a long-term unwillingness to invest in proper moderation tools and resources has gone hand-in-hand with an almost messianic belief that an AI system capable of fixing everything is just around the corner.
It’s easy to see why such a belief is appealing. An AI system is, on paper, the ideal solution – a system that can make real-time judgements, preventing harassment and abuse before it gets really serious; a system that scales automatically with the popularity of your game, that doesn’t get burned out or traumatised, and that isn’t at risk of being targeted, doxxed, or harassed, as human moderators of online spaces often have been.
The problem, however, is that right now that idea is science fiction at best – and systems that can reliably make those kinds of decisions may not exist for decades to come, let alone be "just around the corner." The belief that AI systems will be capable of this kind of feat is founded on a misconception, either accidental or wilful, of how AI works right now, what it’s capable of doing, and what the direction of travel in that research actually is.
It goes without saying that AI systems have been doing some really impressive stuff in the past few years, especially in the field of generative AI, where large, extensively trained models synthesise images and pages of text from simple prompts in a way that can seem startlingly human-like. This impressive functionality is, however, leading a lot of people to believe that AI is capable of vastly more effective judgement and reasoning than is actually the case.
An AI system is, on paper, the ideal solution. [...] The problem, however, is that right now that idea is science fiction at best
Seeing a computer system turn out a page of convincing, human-like prose about a topic, or deliver you an original oil painting of a dog playing guitar on the moon in a matter of seconds, would easily lead you to believe that such a system must be capable of some pretty effective judgements about complex topics. Unless you’re well-versed in what’s happening under the hood, it’s not easy to understand why a technological system capable of such human-like creations wouldn’t be able to make the judgement calls required for the moderation of online behaviour.
Unfortunately, that’s exactly the kind of task at which AI remains resolutely terrible – and is not very likely to get better in the near future, because the direction of travel of research in this field has tended towards AI systems that are generative (creating new material that can pass for human-created, at least to a first glance) rather than AI systems that can understand, classify, and judge complex situations. And that’s for good reason: the former problem is much easier and has proved to have some significant commercial applications to boot. The latter problem is a poor fit for AI systems for two interconnected reasons.
Firstly, training AI models is intrinsically about finding shortcuts through problems, applying a simple heuristic to find a probable answer that cuts through much of the complexity. Secondly, a trained AI model is at heart a pattern recognition engine – no matter how complex the algorithms and technologies underlying it, every AI system is ultimately trying to match patterns and sub-patterns in its input to a huge arsenal of examples upon which it was trained. This means that AI, in its current forms, can always be gamed; figure out what patterns it’s looking for, and you can find a way around the system.
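To make that concrete, here’s a deliberately simplified sketch in Python – with entirely invented token weights, not a description of any real vendor’s model – of why a pattern-matching classifier is so easy to route around: once a player works out which surface patterns the system has learned to penalise, trivially obfuscating them drops the score below the threshold.

```python
import re

# Invented stand-in weights for the kinds of surface patterns a trained
# model might learn to associate with abuse reports. This is a toy sketch,
# not any real moderation system.
LEARNED_WEIGHTS = {
    "trash": 0.6,
    "idiot": 0.7,
    "uninstall": 0.5,
    "gg": -0.4,      # tokens that mostly co-occurred with benign chat
}
FLAG_THRESHOLD = 1.0

def abuse_score(message: str) -> float:
    """Sum the learned weights of whichever known patterns appear."""
    tokens = re.findall(r"[\w@]+", message.lower())
    return sum(LEARNED_WEIGHTS.get(tok, 0.0) for tok in tokens)

# The model catches the phrasing it was trained on...
print(abuse_score("you absolute idiot, uninstall, trash player"))   # ~1.8 -> flagged

# ...but anyone who figures out the patterns can simply route around them.
print(abuse_score("you absolute 1diot, un1nstall, tr@sh player"))   # 0.0 -> sails through
```

Real systems are vastly more sophisticated than this, of course, but the underlying dynamic – weights attached to recognisable patterns, and therefore avoidable by anyone who learns the patterns – is the same.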
Taken together, these are fatal flaws for any content moderation system based on a trained AI model – no matter how smart the system or how much training data you throw at it. Machine learning models’ affinity for shortcuts and heuristics means that pretty much every content moderation system ever built on this technology (and there have been quite a few!) has ended up basically being a fancy swear-word detector, because the training input teaches it that swear words are often associated with abusive behaviour. Consequently, the weighting given to swearing (and other specific "bad" phrases) becomes dominant, as it’s such an effective shortcut; focusing on specific words simplifies the problem space, and the algorithms don’t mind a few false positives or negatives as the price to pay for such efficiency.
Online gamers are practically defined by their tendency to learn not only how to get around systems designed to check and control their behaviour, but actually how to turn them to their advantage
Just about any such system ever created would look at an interaction in which one player was saying egregiously awful things – racist, misogynist, homophobic – but doing so in polite language, while the target of their abuse eventually told them to fuck off in response, and judge the victim to be the one breaking the rules. Even trained on enormous amounts of data and tuned incredibly carefully, few of these systems significantly outperform a simple fuzzy matching test against a large library of swear words.
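If you want a feel for how that plays out, here’s a minimal, hypothetical example using nothing but Python’s standard library – the sort of fuzzy swear-word matching these systems tend to collapse into. The word list and chat messages are invented for illustration.

```python
from difflib import get_close_matches

# Stand-in for a much larger library of banned words and their variants.
SWEAR_LIST = ["fuck", "shit", "asshole"]

def looks_abusive(message: str) -> bool:
    """Flag a message if any token fuzzily matches a known swear word."""
    for token in message.lower().split():
        if get_close_matches(token, SWEAR_LIST, n=1, cutoff=0.8):
            return True
    return False

# Politely worded bigotry sails straight through...
print(looks_abusive("people like you simply should not be playing this game"))  # False

# ...while the exasperated target of the abuse is the one who gets flagged.
print(looks_abusive("just fuck off and leave me alone"))  # True
```

A human moderator reading the same exchange would get the call right in seconds; the matcher, by design, can only see the surface.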
This is bad in any moderation context – but it also invites people to learn how to game the system, and that’s exactly what gamers are more than willing to put time and effort into figuring out. Online gamers are practically defined by their tendency to learn not only how to get around systems designed to check and control their behaviour, but actually how to turn them to their advantage.
At best, that tendency produces fantastically fun emergent behaviour in online games, but when it’s a content moderation system being gamed, a tool designed with the best of intentions ends up making the game even more of a hellscape for ordinary users. Baiting and taunting someone until they explode with anger, so that they’re the one who gets into trouble, is a standard schoolyard bully tactic all of us no doubt saw in childhood.
AI in content moderation stands to make things much worse rather than doing any good
It should come as no surprise when online abusers and harassers take the time to learn exactly how to avoid falling foul of the moderation system while still being as unpleasant as possible, only for the targets of their harassment (who don’t know how to avoid the system’s wrath because they’re not the kind of sociopaths who spend their time figuring out how to game an anti-harassment AI program) to be baited into responding and summarily kicked out of the game by an over-zealous and incompetent AI.
It’s always nice to imagine, when you hit a frustrating and intractable problem, that there’s a technological solution just around the corner: a silver bullet that will fix everything without the enormous costs and responsibilities a conventional solution would demand. For community moderation, however, such a solution is resolutely not around the corner.
There are uses for AI in this field, no doubt, and working on systems that can help to support human moderators in their work – flagging up potential issues, providing rapid summaries of player activity, and so on – is a worthy task, but the idea of an omniscient and benevolent AI sitting over every game and keeping us all playing nice is a pipe dream. Any such system based on existing AI technologies would become a weapon in the arsenal of persistently toxic players, not a shield against them. Like so many ill-considered implementations of algorithms around human behaviours in recent years, AI in content moderation stands to make things much worse rather than doing any good.
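If AI has a defensible role here, it looks less like a judge and more like a triage assistant. As a rough sketch of that division of labour – the names and the scoring signal below are hypothetical placeholders, not a description of any existing product – the system surfaces suspicious exchanges with their context and leaves the judgement call to a person:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ReviewItem:
    player_id: str
    excerpt: str
    score: float         # whatever abuse-likelihood signal a model produces
    context: List[str]   # surrounding chat, so a human sees the full exchange

@dataclass
class ModerationQueue:
    items: List[ReviewItem] = field(default_factory=list)

    def flag_for_human(self, item: ReviewItem) -> None:
        """Queue an exchange for a human decision – the system never auto-bans."""
        self.items.append(item)

def triage(player_id: str, chat_log: List[str], score: float,
           queue: ModerationQueue, threshold: float = 0.7) -> None:
    # Surface high-scoring exchanges with their context; leave everything else alone.
    if score >= threshold:
        queue.flag_for_human(ReviewItem(player_id, chat_log[-1], score, chat_log))

queue = ModerationQueue()
triage("player_123",
       ["gg", "people like you don't belong here", "just leave me alone"],
       score=0.82, queue=queue)
print(len(queue.items))  # 1 exchange waiting on a human judgement call
```

That kind of support tooling still demands the thing executives keep hoping to avoid: paying for, training, and looking after the humans who make the final call.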