June 2, 2013

On Home Scoring, the Elite Kind

The official motto of the American gymternet is Illos numeros inter gentes numquam accipiet. (She'll never receive those scores internationally.)

Over the years, we've all seen any number of hilarious scores showered upon US gymnasts at domestic competitions, and this storied history of ridiculing hyper-American judging has cultivated the widespread assumption that less biased international judges would never succumb to such silliness.

At times in the past, this has been the case, but in the last few years, the international judges have seemed willing to evaluate the execution of routines with what we would normally consider an American lens. Has the "she will never receive those scores internationally" argument become a knee-jerk response to perceived overscoring, without a strong correlation to fact? Is she actually likely to receive those scores internationally as well?

Those are my questions, at least. So, as a way of reintroducing myself to the elite world, which at this point has been [scene missing] ever since the Olympics, I compared the execution scores given to the US team members in 2011 and 2012 at Classic/Championships/Trials with the scores they would later receive at Worlds/Olympics. (I went back no further than 2011 because I don't have the D/E breakdown for 2010 TF and AA and 2009 AA.) I threw out routines with falls and major mistakes, since they would skew the execution score in a misleading direction. This is about the evaluation of essentially equivalent routines, not comparing falls to hits. Certainly, there are differences in the actual quality of all routines (a wobble or two here, no wobbles there), but those issues should even out between the two sets of competitions, providing a reliable overall sense of how the judges are evaluating American performances.

At Worlds in 2011, things were fairly regular and predictable over the four events, with the average US execution scores falling somewhere between a tenth and a tenth and a half lower than the scores received domestically for hit routines. It's a significant but not overwhelming or decisive difference. At the Olympics, things got a little funkier. The execution scores on bars were significantly lower than what was received at Championships and Trials (three or four tenths), and the floor scores were somewhat lower as well (more in line with what we saw in 2011). However, the beam execution scores were quite constant throughout all competitions, and the vault scores were much higher at the Olympics than in the US. The outlying increase in numbers on vault in 2012 can, at least in part, be attributed to legitimate improvement in execution leading up to the Olympics from the likes of Douglas and Raisman, whose Y2.5s were far stronger by that point.


I would contend, though, that on a number of occasions the vault scoring at the Olympics went a little Florida @ Utah. 9.400? 

That's my shallow overall impression of the numbers. There is enough of a difference between domestic and international scoring to remain significant, but it is not dramatic and can easily be overstated. In almost all cases recently (and we'll get to that "almost" in a moment), the overall execution scores received in the US have stayed within a believable range of the international scores. If we go back to 2009, the Worlds infamous for harsher execution scoring, the difference would have been greater, but in the years since, the scores have adjusted to a more forgiving place.

While those are the larger trends, they are far from consistent from gymnast to gymnast, and that's where things get interesting. The average difference in 2011 may have rested in that 1-2 tenth range, but some gymnasts were consistently on the low end (or below the range) while others were always above it. This in and of itself is not necessarily surprising, but I had assumed that the difference would be greater for gymnasts with more questionable form issues, ones that might be ignored domestically and caught internationally. Translation: I thought it would be Aly Raisman.

Raisman received more "she'll never get those scores internationally" comments than any other gymnast of the quad, and yet of everyone, she was the gymnast least susceptible to knock-down execution scores at Worlds/Olympics. She did get those scores internationally, and she got them every single time. The justification of those scores is another question, but they happened. Raisman's execution scores were almost always within a tenth either way. Her worst differential was on vault in 2011, where her US execution average was +.125 over her Worlds average (if we throw out the Amanar attempt from Classic, which is not comparable to the execution scores for her DTYs at Worlds; if we put it back in, her Worlds average is better). On floor in 2011 and vault in 2012, her international averages were far stronger than her domestic ones. Even on her much-maligned bars routine, there was no appreciable difference between US and international evaluation. The international judges felt pretty good about her bars work.


8.200

While Aly Raisman was a steady little tugboat, Jordyn Wieber was quite the opposite. On both the 2011 and 2012 teams, she had by far the most disparate execution scores. If Raisman's difference rarely exceeded a tenth, Wieber's rarely dipped below two tenths and was usually greater than that. It was upwards of five tenths on both bars and floor in 2012 (where Raisman's was closer to one tenth and Douglas's closer to two). Even on vault in 2012, the scoring jackpot, her increase was the smallest on the team (along with Maroney's, which I would attribute more to hitting a ceiling and having nowhere higher to go). Over two years of senior (major) international competition on bars, Wieber received consistent E scores between 8.700 and 9.000 at home and never broke 8.500 internationally. On floor, she was almost always at 9.000 or higher at home yet reached that plateau only once on the big stage, for that rather strong 2012 TF floor routine. Even that routine scored significantly lower in execution than all of her domestic routines in the lead-up to the Olympics, none of which went below 9.150.


9.400

The only event where Wieber's execution evaluation was fairly constant was beam, but I would certainly make the argument that home scoring did her the greatest disservice on beam in 2012 by pretending that her 6.3 composition existed in the real world. Home scoring with regard to D score is certainly an issue, but it comes into play less frequently. D score issues have not usually had the kind of significant impact they had with Wieber.

So, what do we make of this? As with all quantitative assessments, it should exist alongside qualitative assessments. An argument can certainly be made that Wieber peaked early and simply showed better gymnastics domestically, accounting for the major difference in scores. That is absolutely part of it, but does that account for the whole difference? In 2011, the bars difference was three tenths even when throwing out the routines with mistakes like the AA performance, and the evidence of 2012 FX seems to show that even strong international performances were never going to match the domestic scores. That just didn't happen to the same degree with the rest of the team, particularly Raisman. It wasn't a whole-country thing. It wasn't even a just-the-famous-ones thing. 

These last few years, it appears that home scoring is alive and, while perhaps a little more feeble than it used to be, still kicking. However, "she'll never receive those scores internationally" needs to be tempered as a credo. Lately it amounts to only a tenth or two of difference for most gymnasts rather than a dramatic break that blows up potential scoring, and it is far from consistent, even among the various chosen ones. It has been quite person-specific.

Two years is an admittedly small sample, but let's keep an eye on it in 2013.

8 comments:

  1. I think it's more of a problem with specific gymnasts than an across the board problem. The US seems to name favorites domestically, and these chosen athletes are judged based on name rather than performance. Wieber was a prime example of that. Shawn Johnson benefited from this as well. More recently, Priessman and Hundley have scored very well domestically but were hammered on the European tour.

    I have no explanation for Aly other than some really good drugs.

  2. Great article. All I can say about Jordyn's scores on floor in 2012 domestically vs. at the Olympics is that even her best floor at the Olympics, during the TF, was not as solid as the Olympic Trials video you included. Not once at the Olympics did Jordyn complete a tour jete full. She had major trouble with that leap, and I'd be surprised if she was given credit for it in any of the three routines she did in London. It wasn't fantastic at Trials, but it was not nearly as troublesome for her as it was in London. There's also the jump out of the triple full, which I believe she omitted during TF in London but, IIRC, did every time during the Trials process. Those two areas alone are enough to drop her down by about .5.

    You are right on when comparing many of the others, though. Domestic judging really needs to be much more harsh. It's a great disservice to the gymnast when they are getting credit for things domestically and then shocked all to heck when they compete internationally (Jordyn's beam being the best example).

  3. It's an even greater injustice when one year a gymnast's form errors are forgiven by the US judges and then the next year their execution scores are decimated. Jana Bieger and Natasha Kelley in the 05-08 quad come to mind. The other problem is that the US judges are known for counting elements and connections that would not be counted anywhere else in the world. The scoring system is confusing enough to the fair-weather viewer that watching a competition where your favorite gymnast scores .7 less for the exact same routine is disheartening.

    Plus, judges everywhere need to use the E score for what it is: a way to differentiate the gymnasts who deserve the higher scores, rather than having EVERY single gymnast score between 8.7 and 8.9.

  4. ea10 is correct in that Jordyn performed much better on FX at Nationals and Trials than at the Olympics, so I am not surprised that she received lower execution scores at the Olympics. Her overall beam score was pretty much the same at Nationals, Trials, and the Olympics; the difference was her difficulty score, which the US judges bombed on. Her bars execution is probably about 2 tenths higher domestically, but she would have broken 15 in prelims if she hadn't broken rhythm in the Weiler. She received approx. 14.866 in prelims at the Olympics with the hesitation.

  5. I think the US judges count elements that should be downgraded. E.g., any split should hit 180 degrees, or receive a deduction if it is within 15 degrees of 180. The US judges seem to be quite generous with crediting the US girls when it is obvious they don't make it within 15 degrees of 180. The international judges follow the COP downgrade requirements, so many girls, such as Jordyn Wieber, lose the difficulty for the skill.

  6. The US judges hammered the girls at Secret Classic. They were scoring higher on the European tour. Peyton Ernst scored over 15 on her vaults in Germany & Japan. I think her vaults at Championships were better, and she never hit 15. I think the scoring difference is really exaggerated. Not to mention, if you watched juniors, 95 percent of the girls were doing beautiful gymnastics, not just good execution. So many girls had awesome FX routines.

  7. I think the home advantage in scoring is always there - the Russians have even formalised this into a bonus which, while acknowledged, is not made available to the public, at least to my knowledge. Come to the London Open in September (the qualifying competition for the British men) and you will see more generous scores being handed out than will likely be awarded in Antwerp - it is global. The Code is not yet an absolute measure in the same way that times can measure athletic performance - do not forget that we judge gymnastics; that will always involve a subjective consideration, though hopefully the scores will be fair.

    Insofar as the matching of scores between home and international competitions, and whether the US gives itself more of a bonus than other countries, is concerned - well, I cannot say, and I do not know if the examples given here really prove anything final. Certainly this did occur during the 1980s and 1990s, and I have seen it said that it is happening less and less as America becomes recognised as the leading nation in gymnastics.

    I do not contend this point, but I think that the question involves more insidious factors than the merely obvious statement that as the US continues to assert its dominant position the top scores more naturally go to their gymnasts and thus confirm the perception of domestic scores as becoming more and more realistic. What goes hand in hand with this position is that the American paradigm of gymnastics - as a measurable construct, as an athletic, rather than artistic/creative pursuit where quantity, height etc is calculated as opposed to performances judged - has gradually been accepted as the norm.

    So the question is no longer so much one of bias as that the sport and the Code has adjusted to embrace the American norms more fully - and with that, international marks now reflect more closely the American view.

  8. International judges are just as biased as National ones. Gabby missed connection after connection at Nationals and the Olympics and she always got credit for them.

    At Nationals her bar start value was 6.4 but when she got to the Olympics they gave her 6.6.

    Judges suck
