Warren's Mailbag: Publishers using Metacritic as a "weapon"

Warren Spector answers your questions about the problems with the review aggregation site and the industry's reliance on a flawed system

Feature by Warren Spector, Contributor

Published on June 19, 2013

Thanks to all of you who responded to my second column so thoughtfully and gave me so much to think about. I got more than enough comments, on the GamesIndustry International site, on Facebook and elsewhere, that another “Warren's Mailbag” seems justified.

As in the last mailbag, I've aggregated similar comments and tried not to quote anyone specifically. If you notice your exact words, well, congratulations, I guess?

Anyway, here goes...

People who develop and publish games really want some sort of assessment of their work, maybe even validation from some presumably “objective” party - whether that's critics or players or both.

Hm. I guess some developers want or need validation of their work from some outside party. Maybe as a guy who doesn't read reviews I'm an anomaly. (However, to be fair, plenty of people tell me what reviewers and fans are saying!)

The more interesting part of this sort of comment is the use of the word “objective.” Several commenters used the word and I found it troubling. Fact is, there's no such thing as objective criticism or evaluation. Pretending there is may be part of the problem with the current state of the art in games criticism. Reviewers HAVE to acknowledge their own biases and make sure readers, listeners or viewers understand them.

But where the word “objective” really worries me is in regard to Metacritic and other attempts to quantify a game's quality. What aggregation sites do is take subjective data (reviewers' opinions), weed some of it out by running the reviews through an unknown inclusion algorithm, translate each remaining review's score - “out of 10,” “out of 100,” “out of 5 stars,” “A-F” or “we don't give scores” - into a single scale somehow, weight the now-invalid scores according to another unknown algorithm and then average them out... I mean, when you look at it that way, isn't it ridiculous?
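To make that chain of hidden choices concrete, here's a rough Python sketch of the kind of pipeline described above. Metacritic doesn't publish its inclusion rules, conversion tables or weights, so everything in it (the letter-grade table, the outlet names, the scores, the weights) is a hypothetical assumption for illustration, not how the site actually works. What it shows is that the same reviews, run through two equally arbitrary weightings, produce two different “objective” numbers.

```python
from typing import Dict, List, Optional, Tuple

# HYPOTHETICAL: Metacritic's real conversion tables and weights are not public.
# This letter-grade table is one arbitrary choice among many.
LETTER_TO_100 = {"A": 100, "B": 83, "C": 67, "D": 50, "F": 0}

Review = Tuple[str, object, str]  # (outlet, score, scale)


def normalize(score: object, scale: str) -> Optional[float]:
    """Map a review score onto 0-100, or return None if it can't be mapped."""
    if scale == "out of 10":
        return float(score) * 10
    if scale == "out of 100":
        return float(score)
    if scale == "out of 5 stars":
        return float(score) * 20
    if scale == "A-F":
        return float(LETTER_TO_100[score])
    return None  # e.g. "we don't give scores": the review is silently dropped


def aggregate(reviews: List[Review], weights: Dict[str, float]) -> float:
    """Weighted mean of every review that survives normalization.

    Outlets missing from `weights` count with weight 1.0 (an assumption).
    """
    total = 0.0
    weight_sum = 0.0
    for outlet, score, scale in reviews:
        value = normalize(score, scale)
        if value is None:
            continue
        w = weights.get(outlet, 1.0)
        total += w * value
        weight_sum += w
    if weight_sum == 0:
        raise ValueError("no reviews survived inclusion and normalization")
    return total / weight_sum


# Hypothetical reviews of one hypothetical game.
reviews: List[Review] = [
    ("Outlet A", 9, "out of 10"),
    ("Outlet B", 4, "out of 5 stars"),
    ("Outlet C", "B", "A-F"),
    ("Outlet D", 62, "out of 100"),
    ("Outlet E", None, "we don't give scores"),
]

even = aggregate(reviews, {})                   # every outlet counts equally
tilted = aggregate(reviews, {"Outlet D": 3.0})  # one outlet quietly counts triple
print(f"equal weights: {even:.2f}, tilted weights: {tilted:.2f}")
```

With every outlet weighted equally, the four surviving scores average 78.75; tripling the weight of just one outlet drags the same reviews down to about 73.2, and the unscored review never counts at all. Neither number is more “correct” than the other.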

All developers control is the quality of their games and, therefore, need a definition and a measurement of quality.

Couple of comments here. I've already spoken to the idea that developers need a definition and measurement of quality. Some of us don't particularly want it, let alone need it. Or, more to the point, there are ways of getting such measurements that are useful and some that are not.

"The fact is it's a tool - a weapon - wielded with a heavy hand by publishers"

The second point that often went along with this sort of comment is that developers control the quality of their games. That's certainly true, to an extent. But it's important to remember that making a game is a team effort and, often, there are many people working on and making decisions about a game who are outside the developer's purview - think marketing people, PR people, publisher-side producers, executives... I could go on and on. I've always said that the determinant of a game's success (in terms of achieving the developer's goals!) is how much time for tuning and debugging the publisher gives you from the time you hit Alpha (i.e., fully playable, not-fun game). Control of quality falls fully on the developer's shoulders only in the indie world (and I bet a bunch of indie developers would argue I'm even wrong about that!).

To ignore reviewers and sites that aggregate review scores is to risk falling prey to uncontrolled ego and arrogance. It's important to believe in a vision, but not to the extent that you ignore feedback revealing flaws in that vision. You need balance.

Yeah, well, okay, maybe... I certainly know a lot of egomaniacs (myself probably included) but I haven't been accused of arrogance (today...). Seriously, there's no doubt that you have to find a balance in staying true to your vision and allowing feedback to reveal flaws that lead to reevaluation. But there are ways to get that feedback that are useful and ways that are not.

Any developer who doesn't get his or her game in front of actual players - early, often and continuously - is on a very dangerous path. Watching people play your game is critical. (Heck, even the kind of focus tests publishers love to pay for can have value.)

Any developer who doesn't get trusted developer friends playing his or her game is equally foolish. There's huge value in the thoughts of people who truly understand the development process and can evaluate (and eviscerate) your game with sensitivity to where you are in that process.

What's useless, if not outright dangerous, is allowing backward-looking data to divert you from your course. Anyone who charges thousands of dollars to predict review scores of work in development is offering nothing of value. Anyone who drops in for a “deep dive” of a day or two can't have the context to provide useful input. Anyone who thinks you can look at the schedule of past releases, the review scores of comparable games or the aggregation thereof and predict how your game will fare is pulling your leg. Laugh and move on.

Reviewers should explain how they reached their conclusions about a game and recognize that their opinions aren't absolute and may not match those of readers, listeners or viewers.

Several readers commented on the need for reviewers to explain not just what they think about a game but why they think it. That's certainly true. But the related idea that opinions aren't absolute and may not reflect those of other people? There's gold there. That's one of the key points I was trying to make (in way more words!). As a reviewer, your job is to help a potential buyer determine whether he or she will like a game - not whether you did. Put those two ideas together and you get something useful: Explain yourself so others can decide for themselves.

In the age of Big Data (or even small data) there's a tendency to believe that only that which can be quantified has any value. Game reviews, by their nature, are opinions and therefore not quantifiable.

Several people commented on the 21st-century need to quantify everything - and to discount anything that can't be quantified. I have a funny feeling people are going to look back on this idea, fifty years from now, and laugh. A lot. But even today, look at the results of all the testing and aggregating and data-munging: Most games fail commercially at a rate that should embarrass anyone in the game-testing business. Many that succeed do so in the face of significant obstacles. Speaking personally, the worst-reviewed games I've worked on have been the best-selling. Several of the most prominent proponents of data-driven design are in serious trouble. You can keep your data, thank you very much.

Clinging to genre conventions - and reviewing conventions institutionalized by review aggregation sites - limits developers to what came before and stunts our growth as a medium. Or... there's no limit on game creativity, a fact reflected in the scores of indie games and commercial games that fly in the face of convention.

Reading some of the comments was pretty funny. A lot of folks said that review aggregation held us back, creatively, by institutionalizing the known conventions that lead to high scores, at the expense of risky ideas that might lead to lower ones. I kind of took that position in my column. But a number of people pointed to the high scores of several hugely risky and innovative games as proof that I was wrong.

Mea maxima culpa. I think I was wrong. Anecdotally, there really doesn't seem to be much of a connection between review aggregation and a shortage of innovative games. (Which isn't to say we couldn't use more innovative games!) Anyway, remove one sin from Metacritic's list. There are still plenty left...

The relationship between publishers and reviewers represents a conflict of interest. Things like junkets and ads and schwag essentially “buy” review scores.

Whew. This is a tough one. I've always wondered whether junkets and ad buys and flashy E3 booths and so on actually influence game reviewers. Honestly, I've never seen any evidence of that. Doesn't mean it isn't happening, but until there's some evidence (anyone got any?) I'm going to think the best of people - even game reviewers...

Contracts use Metacritic thresholds to determine developer bonuses? If your threshold is 85 and you get an 84 - no bonus?!

I was surprised how many people didn't know this or didn't want to believe it. It's all too true and all too common. You can say until you're blue in the face that Metacritic is designed to help players find games they want to play. It may have started out that way. The people who run it may think it's still that way. The fact is it's a tool - a weapon - wielded with a heavy hand by publishers. Developers sign contracts tying their bonuses to Metacritic scores for games that don't yet exist even as a single-page concept doc. If you believe in the validity of review aggregation, that may not sound like such a bad deal - the people who pay for a game should base bonuses on some measurable standard. But if you (a) don't believe in Metacritic's validity or (b) know how freakin' hard a team has to work to make even a “bad” game, that one-point Metacritic score difference means a lot.

The variety of games being rated on Metacritic is too broad to result in valid, useful data.

Frankly, I'm a little ashamed that I didn't think of this myself. We would never compare Woody Allen's movies with Michael Bay's. Why does it make sense to compare Journey, The Walking Dead, Call of Duty 72 and the latest Madden? Does anyone really care that one got a 7.6 while another got a 9.2 or whatever? The silliness is self-evident. Given that players and publishers take aggregation seriously, it's also damaging.

"Simply liking games and being able to construct a sentence doesn't prepare you for a job as a reviewer and certainly not for a deeper critical role"

How are reviews selected for inclusion and how are the numbers used?

Great question. We should all insist on knowing how a review gets included in or excluded from Metacritic's rankings and how much each review counts for. It's ridiculous that this information isn't available. It affects people's pay, players' ability to determine the validity of the data they're getting, and the future of game franchises. If we're not going to be told how the numbers are generated, why should we pay any attention?

I know of games where bad scores were included instantly - even if they came from a random-guy-with-a-website - while good scores from major media outlets were nowhere to be found. One commenter opined that outlier reviews would be balanced - or revealed to be irrelevant - by being averaged with higher scores. That's only true when the inclusion decisions are reasonable. With Metacritic we just don't know.

Film criticism is relatively simple: a movie runs just a couple of hours, making even multiple viewings possible before a review gets written and certainly before a deeper critical piece comes out. Games take anywhere from five to 100 hours, with most in the 15-20 hour range, making even one playthrough difficult before a review appears.

There's certainly some truth to this - films ARE easier to analyze than games. Having said that, most of the game reviewers I know DO play games to completion before writing about them. I don't think the problem is completion of games so much as the lack of training most reviewers have and a lack of anything like a coherent critical vocabulary for discussing games.

As I said in the column, the job of a reviewer isn't to say, “This is good” or “This is bad.” It's to convey to a reader/listener/viewer whether he or she will like a game. It's to express a consistent critical (dare I say it?) philosophy so readers can decide for themselves how they feel about the game elements singled out by a given writer. Simply liking games and being able to construct a sentence doesn't prepare you for a job as a reviewer and certainly not for a deeper critical role. We'll get there, eventually. I just don't think we're there yet.

Game designers spend too much time pontificating on stuff. The people with power, on the other hand, pat them on the heads and tell them to get along. Game designers should form a guild and make demands if they really want to change things.

I had to run this comment verbatim!

The fact that game designers spend too much time pontificating is undeniable. We like communicating. A lot. No excuses. If it bothers anyone, just ignore us. We're used to it.

The comment about people with power was great. The only problem with it is the idea that we get patted on the head - swatted on the butt is more like it.

"If Metacritic is the best we have, we're in a world of trouble. Again, though, it's not so much that 'Metacritic sucks'... We just have to acknowledge what it is, push it to be better and then use it appropriately"

Starting a guild? I'd love that and there have been talks about it over the years. Maybe I'll do a column about that idea some time. Yeah. That'd be interesting...

Since it combines reviewer and gamer scores as well as links to individual reviews, Metacritic allows people to make up their minds. It's the best we have.

Here's the deal. I get that Metacritic can be used by people to decide what games to buy or not buy. And in this fast-paced, helter-skelter world we live in, shortcuts that let us make decisions without thinking too hard are appealing, even seductive. If you don't want to think, by all means check the scores on Metacritic and get on with your life. Just don't be surprised when the games you buy don't end up working for you.

As far as player reviews go, it's certainly good to have them - good in the abstract, at least. However, some commenters, you'll recall, complained that reviewers didn't complete games before reviewing them (which I choose not to believe) - what do you think players are doing? I bet most of them haven't even tried the games they're scoring! At the risk of letting loose the hounds of hell, a lot of people on the Internet are just interested in attacking stuff, usually with copious cursing. (Oh, you haven't noticed that? Well, it's true.) I'm not sure player ratings are any more accurate than professional review scores.

And if Metacritic is the best we have, we're in a world of trouble. Again, though, it's not so much that “Metacritic sucks,” though it may sound like that's what I'm saying. We just have to acknowledge what it is, push it to be better and then use it appropriately. Right now, I don't think we're doing any of those things.

The level of “games literacy” is low, something that can only be addressed through criticism rather than consumption. If gamers demanded more nuanced evaluations of games, reviews would get better and Metacritic would be less important.

Oh my, do I agree with this. I agree, to an extent, with those of you who said criticism can and should play a part in addressing this. However, while reviews-as-criticism are important, I think the more academic sort of criticism is just as important, maybe more so.

Criticism doesn't have to be only a consumer service. It can explain how games work. It can change the way we think about games. It can, ultimately, change the kind of games that get made. There are some universities that get this and are making sincere efforts to address the problem. And I can tell you that, as we develop the curriculum for the Denius-Sams Game Design Academy at the University of Texas, I'm going to be working hard to ensure that we include serious critical analysis along with hands-on game development.

Rather than think of all games as “commercial art,” acknowledge that there are some that are purely commercial (designed to generate a profit) and others that are purely art (designed to express a personal vision).

I'd never deny that games are made by all sorts of people and institutions for all sorts of reasons. I sometimes wish I lived in a world of pure indie creativity and I bet a lot of my peers do, too. I was talking only about games that are commercial in some way. I applaud the artists and wish them well. 'Nuff said.

I don't think there is a way to predict or guarantee success.

Can I get an amen, brothers and sisters?!
