August 20, 2013

That Was Good? The Execution of All Things

Everyone's favorite avuncular analyst and UTRS expert (so, gymnecologist?) recently recalled a post comparing execution scores between the US and international judges that I wrote in June and then promptly forgot about. What, am I supposed to remember everything I say?

Basically, it amounts to the idea that we often think of international judging as some paragon of strictness that would never be as lax and charitable as the US judges, but over the last few years the international judges have been within a believable range of the national judges in execution scores. So, I began to wonder whether that will continue this year and whether the World judges will mimic what we have seen so far in 2013, which brings us to an analysis of execution scores at this year's Nationals.

You probably had a lot of thoughts during last weekend's P&G Championships, ranging from "Hey, fewer of these hairstyles look like shanty towns" to "Hey, that's not a switch 1/2" to "Hey, so did Nastia kill Elfi?" and all of them are completely understandable. I bet you weren't thinking, "Hey, this is some historically excellent execution." But you know who was thinking that? The judges. Yeah. Deal with it.

The average execution score across the whole senior competition was an 8.515 this year. Guess what that's higher than? 2012 Nationals. And 2011 Nationals. And 2010 Nationals. And 2009 Nationals. In fact, the only recent competition that beats that number is 2012 Olympic Trials, which is to be expected. Trials should contain only the very best athletes at the peak of their Olympic preparation and not these barely qualified, happy-to-be-there types who are getting 8.1s for hit routines.

Let's also take a deeper look by event. (Numbers in parentheses indicate rank)

Aside from sucking the light fantastic on bars (and being the worst bars year in the United States is quite an accomplishment), 2013 Nationals saw remarkably high execution scores compared to recent competitions. The vault scores in particular are interesting. The vaults this year were okay, but two and three tenths better than recent years? Really? Would the new code alone justify such a bump?

It should be noted that I included all routines in these averages, even calamities in the 6s, which occurred at least a few times in every competition. But lest you think the other years are being dragged down misleadingly by falls, the differences exist quite clearly at the top of the scoring range. Let's take beam as an example. In 2013, 18% of beam routines received an execution score in the 9s. Compare that to 9.5% in 2012 (Nationals), 10% in 2011, 5% in 2010, and 16% in 2009 (the only really comparable year). Is this truly the best beam group we've seen in the last five years? I don't think so.
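For anyone who wants to run the same two numbers on their own score sheets, here is a minimal sketch of the arithmetic, assuming you have typed one event's E scores into a plain list. The function name `execution_summary` and the sample scores are made up for illustration; they are not real results from any meet, and this is not necessarily how the figures above were tallied.

```python
from statistics import mean


def execution_summary(e_scores, threshold=9.0):
    """Return (average E score, share of routines at or above threshold).

    Every routine counts, falls included, matching the "all routines"
    approach described above.
    """
    if not e_scores:
        raise ValueError("need at least one score")
    average = mean(e_scores)
    share = sum(1 for score in e_scores if score >= threshold) / len(e_scores)
    return average, share


# Hypothetical beam E scores, invented for illustration only.
sample_beam = [9.1, 8.7, 8.4, 9.0, 7.9, 6.5, 8.2, 8.8, 8.6, 8.3]

avg, share = execution_summary(sample_beam)
print(f"average E: {avg:.3f}, in the 9s: {share:.0%}")
```

Running it once per event and per year gives both the averages and the "percent in the 9s" comparison, and passing a different `threshold` checks other parts of the scoring range.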

So, what's up? Why are these execution scores significantly higher than in other years? I think we can all agree that the standard of performance was not necessarily higher than in, say, 2011, when the average execution score was more than .250 lower. This is normally the part where I would provide conclusions, but I don't have an answer to the question. I'm just compiling the data and inviting analysis. I'm all Tycho Brahe up in this piece. I'm legitimately curious as to why this is.

As mentioned, we have a new code, so we can certainly expect the evaluation of routines to change, as it always does. However, the major changes to the code came in the D-Score department. Were there enough significant changes to execution evaluation to account for these multiple-tenth increases in execution average? Have the judges been instructed to make a point of going softer this year across the events, or did it just happen?

And how will this increase affect the comparison between US and World evaluation that has become rather consistent?

Clearly, I have a lot of questions. 


  1. I noticed the judges were being quite generous with execution scores throughout the competition and at first wondered if it might somehow be an attempt to avoid boxing E scores...but I suppose that argument falls apart if execution scores are higher across the board, even for sloppy routines.

    It definitely will be interesting to see how this plays out at international meets...

  2. To give you a few points of comparison…

    2013 European Championships: Averages for the Top 60 during Qualifications

    Vault: 8.662
    Bars: 7.342
    Beam: 7.048
    Floor: 7.340

    To be honest, it's not a very fair comparison. There were a LOT of 5s and 6s on bars, beam, and floor.

    2013 European Championships: Event Finals

    Vault: 8.560
    Bars: 8.079
    Beam: 8.112
    Floor: 8.319

    2013 American Cup

    Vault: 8.991
    Bars: 8.337
    Beam: 7.887
    Floor: 8.129

  3. There really were a lot of HOT dty's (Ross, Ernst, etc) this year at Nats, and no chucked Amanars (i.e. the absence of Raisman and Ross only doing a DTY), which might account for the D score bump. Skinner's Cheng score makes things look like Vault at least is being evaluated fairly. But total LOL at the Beam and Floor scores, they were

  4. I was thinking about this a bit more, and there's a problem comparing the scores from Nationals with the scores from Trials. Though there's a qualifying process for Nationals, supposedly only the best gymnasts attend Trials. So, theoretically speaking, the scores from Trials should be higher than the scores at Nationals.

    Operating under that assumption, I calculated the averages from the 2012 Nationals using only the scores of the girls who attended Trials:

    Vault: 9.100
    Bars: 8.464
    Beam: 8.309
    Floor: 8.734

    Floor at Trials was still quite a bit higher, but the other numbers even out a bit.

    If only we had the data from the verifications from the Ranch……………

  5. Yeah, that does help the numbers even out for 2012. Oh scores from the ranch, don't even joke about that. The day we get D and E breakdowns from ranch verification is the day I turn inside out with joy.

    I've been thinking more about these beam scores. It's a little crazy that the beam E scores for everybody in 2013 were so close to (or better than) the cream of the crop scores in 2012. I have a hard time imagining that some of those scores will translate to the world stage, even though beam has been the more consistent of the four events these last few years between US and international E scores.

  6. Judges cheat. The end

  7. The average of everyone means nothing. Do the averages of the top teams and get back to me.

  8. Hm ... I'd be interested to read a companion piece to your US vs. World judging comparison with a China vs. World. I was just looking at the score breakdowns for TQ at the National Games and literally no EF qualifier had an e-score in the 9s. I can see that there might generally be lower e-scores seeing as China sucks at vault and that's where the high scores are coming from usually, but beam was the lowest-scoring event execution-wise and I suspect that in most other countries ONE of those girls would be able to get above an 8.300 ...

  9. I was literally just thinking the same thing about the Chinese National Games. It's pretty well canon among the gymternet that internal meets in China have some insanely heavy low-balling on the scores, but I don't know that anyone has backed up, or even tried to back up, that particular bit of conventional wisdom. I'd be really interested to know how much internal Chinese scoring actually deviates from international scoring.