The AI Keeps the Score



When Simone Biles saluted the judges and stepped onto the mat to vault at the Sportpaleis in Antwerp, Belgium, it seemed like every camera in the packed arena was trained on her. People in the audience pulled out their smartphones to record. The photographers zoomed in from their media perches. One TV camera tracked her run on a high-speed dolly, all the way down the runway, as she hurdled into a roundoff onto the springboard. The spider cam, swinging above, caught the upward trajectory of her body as she turned towards the table and blocked up and off, twisting one and a half times before landing on the blue mat and raising her arms above her head. The apex of human athleticism and kinesthetic beauty had been captured.

But there were other cameras that few people watching in the arena were thinking about as they took in Biles’ prowess on the event: the four placed at the corners of the mat where the vault was situated. These cameras also caught the occasion but not with the purpose of transmitting it to the rest of the world. These were set up by the Japanese technology giant Fujitsu, which, since 2017, has been collaborating with the International Gymnastics Federation (FIG) to create an AI gymnastics judging system.

In its early days, the system used lidar (light detection and ranging) technology to create 3D composites of gymnasts in action. These days, it uses an even more sophisticated system, drawing from four to eight strategically placed hi-def cameras to capture the movement of the athletes, make 3D models, and identify whether the elements they are performing fall into the parameters established by the judging bodies inside the federation.

But the computer system doesn’t make judgments itself. Instead, it is deployed when there is an inquiry from the gymnast or coaches or a dispute within the judging panel itself. The Judging Support System (JSS) can be consulted to calculate the difficulty score of an athlete’s exercise — a second opinion, rather than an initial prognosis. Currently, it is mostly used for edge cases.

The JSS wasn’t necessary to evaluate the value of Biles’ vault in Antwerp. Her performance on that vault was too emphatic to be borderline. Still, the cameras positioned at the corners of the vault podium captured her 3D likeness as they did for all of the other athletes who competed through the 2023 World Gymnastics Championships. The technology distilled the legendary athlete and her performance down to straight lines and sharp angles; it showed the distance and height she traveled in numbers. The awe and wonder one feels when watching Biles perform could now be recognized by a computer — understood, though not exactly appreciated.


Fujitsu and FIG announced JSS back in 2017 with the goal of having the system up and running by the Summer Olympics in 2021. A home Games in Tokyo would have been an ideal opportunity for the Japan-based tech conglomerate to showcase this kind of technology, and it would’ve been a noteworthy achievement for Morinari Watanabe, the first Japanese president of the Lausanne-based FIG. But the JSS wasn’t ready; in fact, it would take another two years of work. At the 2023 world championships in Antwerp, the JSS was finally ready to go on all 10 artistic gymnastics apparatuses: six for the men and four for the women.

This was all part of the “dream,” as Watanabe put it in the joint press conference hosted by FIG and Fujitsu heralding the technological breakthrough. “Today is a day of liberation in sports,” he proclaimed to the media and other gymnastics officials who showed up for the explainer that was held shortly before the start of the men’s all-around final. “The day has come when all athletes, not just gymnasts, will receive fair and transparent scoring.”

This proclamation was a bit hyperbolic, especially given that this is not AI’s first foray into judging athletic competition. It has already been successfully applied in sporting contexts, often with approval from athletes and coaches themselves. Hawk-Eye Live, the electronic line-calling system, is used in lieu of line judges in tennis at two of the majors, and its calls are generally considered reliable.

But in tennis, Hawk-Eye is being tasked with answering a yes / no question — is the ball in, or is it out? The JSS is being asked to perform a much more complicated task: it needs to be able to identify hundreds of skills in the Code of Points, and the ranges in which they’re done, across the whole span of gymnast body types — a complex undertaking, and one that changes regularly, as the FIG is updating its rules every four years. In a sport where the difference between first and fifth can be a mere tenth of a point, and when global rankings can mean the difference between being funded by your national federation or not, getting the score right is very important.

The appeal to a technological solution to judging feels practically inevitable. Humans are fallible. That’s why deductions exist in the first place: to quantify the mistakes that the gymnasts make. But we’d never replace the human athletes with machines, regardless of how advanced Boston Dynamics’ back-flipping robot gets. The draw of gymnastics is watching mere mortals push the limits of athleticism. But the performance of the judges is a means to an end, not the end itself. For more than a century, human judgment was the only option, no matter how much this might’ve discomfited us, given the stakes. Now, there’s a potential technological solution that shows promise. But can AI judge human excellence better than a human?


The JSS started, according to Watanabe and Hidenori Fujiwara, as a joke. It was late in 2015, about a year before Watanabe won his first FIG presidential election, making him the first non-European to helm the international federation since its inception in 1881. He suggested that Fujitsu should develop robots to judge gymnastics.

Fujiwara, head of Fujitsu’s sports business development division, took the challenge seriously. “We developed a prototype system,” Fujiwara said, which he then showed to Watanabe, who was surprised by the progress. Watanabe clarified that what he’d said about robots had only been a joke, and yet here they were.

This origin story for the JSS was emphasized during the press conference I attended in Antwerp shortly before the start of the men’s all-around final. There was, of course, a PowerPoint. An early slide in the presentation showed a comic with robots holding up score placards, as a male gymnast swings into a scissor-like movement on the pommel horse. The caption above the image read: “Joke come true!” (I didn’t get why it was funny; I guess you had to be there.)

It’s a “joke” that Fujitsu has spent untold amounts of money, time, and energy on. Though the company wouldn’t disclose the cost of this whole undertaking, it’s hard to fathom, after strolling through their offices in the bowels of the Sportpaleis and seeing the arena setup of the technology in the field of play (and off to the side), that it was anything short of a tremendously expensive and resource-intensive endeavor. But I couldn’t help but feel like it was a lot of effort for technology that, at least as pitched by Watanabe, would only ever amount to a slightly better version of judge-assisted video replay.

Even ignoring the years of investing in R&D, the physical footprint of JSS appears expensive. During the competition, I glimpsed the backroom where there was a row of servers and another of monitors, a cluster of power packs, and tons of cable. Like so much of AI, its “magic” obscures copious amounts of energy-intensive hardware.

Out on the floor, the JSS cameras were subtle, but a lot of human effort went into calibrating them. Before the start of the day’s competition and frequently in between sessions, you could watch as technicians took to the floor, placing large orange balls similar to exercise balls you’d find at the gym, mounted to tripod-like devices, at strategic spots on or near the equipment to make sure that the cameras were properly aligned. Sometimes, they waved these balls like wands around the apparatuses. And throughout the competition, several technicians monitored the event from behind six computer screens near the media box. Nothing about this can be done cheaply.

The entire history of judging had created tragedies, Watanabe explained somewhat dramatically. But even if his remark to Fujiwara had been made in jest, the fact that FIG has doggedly pursued this venture with Fujitsu going on six years suggests that the joke hinted at something critical and true (as jokes often do): that he felt that there was something amiss in judging in the sport of gymnastics, and maybe technology could fix it.

Watanabe didn’t specify any particular instance of judging malfeasance or error that created these personal tragedies. But he didn’t really have to. The conventional wisdom around the judged aesthetic sports, such as gymnastics and figure skating, is that there are and always have been issues with the scoring. During the Cold War, when both the US and the Soviet Union fought for the top spot in the Olympic medal rankings, there was fairly widespread cheating and collusion in gymnastics judging. Back in 1988, after former University of Utah gymnastics head coach Greg Marsden’s brief foray into international elite gymnastics, he let slip to the media that, at the previous year’s world championships, there was judging collusion between the US and Romania, with the coaches exchanging scores before their athletes took to the mat. And in the years since the Cold War ended and the old judging alliances started to break down, the issues became more mundane but no less consequential. It was mostly human error, confusing rules and processes, with a dash of bias — racial, national, or both — that created most of the problems.

Elements of subjectivity can be found in most sports, and these judgment calls can end up having major consequences when it comes to competitive outcomes. In basketball, for example, a referee might make a bad call that affects the outcome of the entire game, like this year’s Women’s Final Four matchup between UConn and Iowa that featured a controversial offensive foul call in the final seconds of the game. But in general, the way of amassing points is fairly straightforward and has remained consistent over many years. The lines on the court, except in the case of the free throw, determine the point value of any given shot, and this didn’t change when Stephen Curry started nailing deep three-pointers. A shot from well behind the three-point line is objectively more difficult — and impressive — than one made closer to the basket. But the NBA hasn’t painted another line on the court to reward the higher difficulty level of shots taken from well behind the arc. Nor did the league change the rules to make Curry’s threes harder. Players simply learned to shoot from further back.


This is not how gymnastics operates. As gymnasts introduce new elements, the FIG has to assess them for their difficulty value, and there is no upward limit, at least in theory, as there is with basketball shots. In gymnastics, a half-court shot isn’t worth the same as one from right behind the arc. Skill valuations can change from one Olympic cycle to the next; requirement groups can be added or removed. A bad score in one cycle might be a good one in the next. The rules are not stable as they are in other sports, and it can be baffling, without highly specialized knowledge, to understand the difference in difficulty from one skill to the next.

The most significant change to the rules came in 2006 when the FIG scrapped the Perfect 10 scoring paradigm in favor of an open-ended approach that gives the gymnasts two marks that are added together — the difficulty score, which starts at zero and builds, depending on the fulfillment of requirements and the skills the athlete performs; and an execution one that starts at 10 and is reduced as the judges apply deductions for mistakes the athlete makes.
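As a rough illustration of that arithmetic, the sketch below adds a difficulty score that builds up from zero to an execution score that starts at 10 and shrinks with each deduction. The function name, the skill values, and the deductions are made-up assumptions for the example, not figures taken from the Code of Points.

```python
def final_score(skill_values, deductions):
    """Add a difficulty score (built up from 0) to an execution score (reduced from 10)."""
    d_score = sum(skill_values)                  # difficulty starts at zero and builds
    e_score = max(0.0, 10.0 - sum(deductions))   # execution starts at 10 and is reduced
    return round(d_score + e_score, 3)


# Hypothetical routine: every number here is illustrative only.
score = final_score(skill_values=[0.5, 0.4, 0.6, 2.0], deductions=[0.1, 0.3, 0.1])
print(score)
```

Because the difficulty side has no ceiling, the total is open-ended in a way the old Perfect 10 never was.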

The immediate catalyst for this particular change was the scoring controversies of the 2004 Olympics, particularly the miscalculated start value of Yang Tae-young. The South Korean gymnast was erroneously docked a tenth of a point, which led to him missing out on the gold medal in the men’s all-around. This mistake has meant that Yang, who is now a coach, doesn’t receive a gold medalist’s pension from the South Korean government. Watanabe was not wrong about how errors in judging can have serious ramifications for athletes, even years after the fact.

Judges still make mistakes on the D-score, which is the updated name for start value. But, unlike with the execution mark (aka the “how well you did it” score), a gymnast or coach has the right to appeal the difficulty calculation. This is where JSS can help. Under the earlier iterations of Hawk-Eye in tennis (still in use at Wimbledon), players can challenge the call of a line judge, and the computer will override any human error. Fujitsu’s system enables something similar, albeit much slower and more bureaucratic.

Several times over the course of the world championships in Antwerp, I heard an announcement over the PA that an inquiry had been submitted for one gymnast’s beam score or a different athlete’s bars mark. The large scoreboard to my back would show the athlete’s name and “under review” right next to it. Judges would consult video replay and the new JSS system, though it was unclear under which conditions the JSS, rather than video review, was used. Often, inquiries were only a few minutes, though, in an already long competition, it felt like a drag waiting for the eventual resolution to be announced. In most cases, the gymnast’s score remained unchanged. If AI was used in these inquiries, it functioned solely to validate the work of the human judges.


When I sat down with the Fujitsu technicians in Antwerp in a room somewhere in the bowels of the Sportpaleis, I got to see just how precise the JSS can be. I was shown recordings of the switch ring leap, a skill that was also highlighted during the press conference the day before. This element is notoriously tricky to perform and to judge. The gymnast has a lot of boxes to tick: split of the legs, the position of the back leg relative to the crown of the head (they have to be at roughly the same level), the arch of the back, and the head release. The judge has to be able to register all of that in the split second the skill appears before them on the balance beam.

JSS looked a lot like video replay, except that the gymnast is transformed into an unclothed mannequin performing the elements. The apparatus is there, but all of the trappings of the gymnasium are gone; the rendering is set against what looks like the holodeck set on Star Trek before the computer program fills in the details, a black space, with white lines running parallel and perpendicular. To the side, you can see key measurements, such as angles, to help determine whether the gymnast met the demands of the element, with all of the color and flair stripped out, down to the nuts and bolts.

In the first clip, the gymnast did not fulfill the requirements. At the apex of the leap, her back foot didn’t line up with the crown of her head. The technician applied one tool, a blue horizontal plane, which made it quite clear that her back leg wasn’t high enough. “It’s minus 40 centimeters,” she said, pointing her cursor at the upper right corner of the screen.

Next, she played a recording of another switch ring, at normal speed. “What do you think?” she asked. I responded that I thought it was performed within acceptable parameters. Turns out I was right. Don’t give me too much credit here, though; the reason I could see it easily is because the gymnast had performed it exceptionally well. Her split was oversplit; her back foot went so high that it was well above the crown of her head.

As much fun as I had playing around with the system — and talking about the finer points of gymnastics with the experts — I wasn’t entirely convinced that the JSS, at its current stage of development, had made a compelling case for its necessity as a decision support system. It felt like a solution in search of a problem.

Steve Butcher, former head of the men’s technical committee and technical coordinator for FIG, said initially he shared my skepticism. He knows better than most people how hard judging can be, having spent 40 years doing it. But Butcher said he was won over quickly. All it took was a short demonstration showing a gymnast doing an iron cross, a static strength hold, gripping the rings with their arms extended to the sides so they’re completely parallel to the floor. Ideally, the athlete will create a perfectly straight line across, from wrist to wrist.

“They showed me one arm, he has three degrees of deviation. And the other arm, he has one degree of deviation,” Butcher said, noting it was not perceptible by the human eye. Since that demo, he has worked with Fujitsu on behalf of FIG to help the company address the gymnastics needs and has remained a consultant on the project even though he left his full-time position with the gymnastics federation in 2022.
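To make that kind of measurement concrete, here is a minimal sketch of how a deviation like the ones Butcher describes could be computed from 3D joint positions. It assumes coordinates in meters with the z axis pointing up; the function name and the sample coordinates are hypothetical, and nothing here is drawn from Fujitsu’s actual implementation.

```python
import math


def arm_deviation_degrees(shoulder, wrist):
    """Angle between the shoulder-to-wrist line and the horizontal plane, in degrees."""
    dx, dy, dz = (w - s for s, w in zip(shoulder, wrist))
    horizontal = math.hypot(dx, dy)  # length of the arm's projection onto the floor plane
    return math.degrees(math.atan2(abs(dz), horizontal))


# Hypothetical joints: a 0.6 m arm whose wrist sits slightly above shoulder height.
shoulder = (0.0, 0.0, 2.8)
wrist = (0.6, 0.0, 2.8 + 0.6 * math.tan(math.radians(3)))
deviation = arm_deviation_degrees(shoulder, wrist)
print(f"{deviation:.1f} degrees off horizontal")
```

A perfectly level arm returns zero, and the same function can be run on each arm separately, which is how one side can be off by more than the other.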

But was this really an improvement over plain ol’ video review? How would seeing the angles of someone’s arms to this degree — a difference of two degrees, to be specific — actually improve the judging? In the example that Butcher cited, the knowledge that the JSS provided was interesting, but it wouldn’t have changed the valuation for the gymnast: he would’ve been credited the skill because he had performed it very close to the platonic ideal. At the top end of the performance, those minute flaws, if they rise to the deductible level, would be sorted out by the execution judges. The JSS isn’t up to that particular task yet.


To provide an example where the JSS could’ve potentially outperformed the judges (and certainly video review), Butcher brought me back to 2012, to the moment when the men’s team finals in London had medals on the line. It was the final rotation, last routine, last gymnast up. Kohei Uchimura, then the three-time world all-around champion, was on the pommel horse, the event where Butcher was the apparatus supervisor. Uchimura’s routine went off as planned, clean and smooth, until the dismount. As he swung up from the pommel to the handstand, his arms seemed to buckle, legs akimbo; he spun wildly and slipped off the apparatus, somehow landing on the mat on his feet, albeit chest down. He walked off the podium, seemingly bemused and confused as to what just happened.

This last mistake created a dilemma for the D judges: did Uchimura successfully reach a handstand — or get close enough to it — in order to receive credit for doing a dismount? If the judges didn’t give him credit for it, he would lose the value of the skill and miss a requirement group. The hit to his — and, by extension, Team Japan’s — overall score would be massive.

Butcher did not give credit for the handstand, nor did the other two D judges. Uchimura’s mark put Japan in fourth place, behind Great Britain and Ukraine. Both teams started celebrating medals they thought they had just won. The Japanese team, however, immediately submitted an inquiry.

The superior jury watched the video replay several times, in slow motion, frame by frame. The TV cameras hovered by the shoulders of the judges as they studied Uchimura’s routine. The action in the North Greenwich Arena had shifted from the athletes to a bunch of men in gray blazers, staring at a laptop.

Finally, the superior jury decided Uchimura was close enough to a handstand. The reversal of the D panel’s original call added seven-tenths to Uchimura’s score. Japan shot from fourth to second. Great Britain ended up with the bronze, and Ukraine, to their utter devastation, was bumped off the podium.

Butcher, however, still stands by what he and the two other D judges decided over 10 years ago. “We have to remember, they’re not looking at any exact angles. They’re looking at a foot here, a leg there, and looking in a video, freezing it, with no true measurements being applied,” Butcher pointed out. The decision to award credit or to withhold it was something of a very educated coin flip. “In that situation, I would have loved to have been able to have the Fujitsu system and be able to have that as the primary decision-maker,” he said.

When I watched the video of Uchimura’s London performance, I found myself agreeing with the original call. That was not a handstand. He never even managed to straighten his arms completely. But like the judges of the superior jury, I wasn’t working with any precise measurements. I was basing this strictly off of my gut. It was an aesthetic judgment as much as a technical one. But in gymnastics, there’s long been a feedback loop between the technical and aesthetic; what is technically sound is often most aesthetically pleasing, and vice versa.

Of course, none of this matters to AI. It doesn’t “know” things in the way that humans do. Facial and object recognition technology doesn’t recognize what a “labrador” is; it’s been shown millions of photos of that dog and has been told that this is, in fact, a labrador, or at least the sum average of a labrador.

Apply the same logic of what an AI “knows” to a handstand in gymnastics, and it recognizes what a handstand is based on a series of rules and parameters of what a handstand is supposed to be. At the same time, it knows when the articulations of a body aren’t doing a handstand. That distinction may seem trite, but it also turns the sport into the color-negative version of itself.

Which presents the weird irony of AI-assisted judging, a system that cannot understand or appreciate the beauty of the sport: Butcher and his panel could have used a system like JSS to back an aesthetic opinion with hard numbers.


In many industries, AI has been used as an excuse to cut down on labor expenses. That’s not the case here with JSS since its implementation is strictly to support human judges. Besides, judging gymnastics isn’t a full-time career for anyone, not even at the very highest levels, so that particular objection to AI doesn’t play. But the fact that judging gymnastics events is a sporadic activity points to another issue with the JSS’s application: there isn’t a lot of opportunity to use this expensive system. It will judge even less frequently than humans do. The majority of gymnastics events are decidedly low-tech affairs. Not every competition venue will have the necessary infrastructure to support the JSS. And all meets, except the biggest ones, are a couple of days long, if that, hardly worth the time, energy, and costs that go into the setup. Fujitsu said that it took about a dozen people to set up and run the JSS in Antwerp. When asked about the next competition this much-ballyhooed system will be used at, Fujitsu didn’t answer. They said it would be decided jointly by them and FIG.

Of course, it would be foolish to assume that it will always be this costly or difficult to set up the JSS in a competition format. The technology should improve over time and get cheaper, too. That opens up the possibility for what Butcher believes is its best use case: as a training aid. He told me that this was his first thought when Fujitsu first presented the JSS to him.

“Somebody’s doing a triple back off the high bar but you can see that their body’s slightly skewed in the air and you can measure that angle, you can see that they [are] landing heavier on one side of their body than the other.” Being slightly off like this in the air doesn’t change the valuation of the skill. It will still be regarded as a triple-back. But in the hands of the athlete and the coach, this kind of information can prevent an almost imperceptible defect from blooming into an injury. In this example, the JSS is merely a sophisticated measuring tool. Butcher said that some national federations have expressed interest in aligning the JSS with their pre-existing video systems, which Fujitsu confirmed, adding that they plan to unveil a version specifically for training in July. Throughout the week in Antwerp, and in follow-up calls with experts, this was the most persuasive use case that I came across.

Right after the Fujitsu press conference, I encountered Donatella Sacchi, the president of the women’s technical committee, who had been on the panel, along with her counterpart on the men’s side. She’s a compact woman, on the short side — but who isn’t in gymnastics? — with cropped hair, and speaks exuberantly, often standing to make her point and to demonstrate what she means by using her whole body.

Sacchi was very excited at the potential of the JSS but raised the specific issue that AI couldn’t intuitively understand things the way a person with gymnastics experience could.

A lot of work needed to be done — and continues to be done — to “parameterize” everything just so JSS could “see” things like a human, though not make errors like one.

Sacchi pointed to a couple of issues that the system has not yet been able to overcome. When we spoke again about a month after the world championships, Sacchi told me that the JSS cannot determine whether two skills done consecutively on the beam are actually connected in one continuous movement. This is one of the ways that gymnasts rack up tenths, linking different skills for connection bonus or value (CV). This is one of the most challenging aspects for human judges to evaluate since not all credited connections feature the transfer of speed and momentum from one skill into the next, which would make the connection easy to perceive. This is especially true if you change direction in a series or if you’re combining dance and acrobatic skills. There’s usually some sort of pause or hesitation, however slight. It’s up to the gymnast to move briskly between elements, even if the skills don’t lend themselves to seamless connections. If you’re going to have a system like the JSS around to help determine difficulty scores, it needs to be able to handle connections, especially since on an event like beam, they are the most contested part of the D-score, and isn’t that what the JSS is there to address, after all?

I asked Ayako Kawahito, a former gymnast and current judge who is working as a manager in the Human Digital Twin division of Fujitsu, about the beam connection problem. The issue, she said, is not about movement but about stillness. Kawahito pointed out that a person can appear to be completely still, according to the human eye, but if you subjected them to an MRI, their “joint coordinates are always moving around.” In order for the JSS to be able to assess connection value, Fujitsu and the FIG have to agree on the “(amount of) movement that can be considered a stop by a human judge,” she said.

Movement that can be considered a stop. Sounds a bit like an oxymoron, but it’s the kind of question that must be answered if the JSS will be able to help the judges in the places they need it the most.
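One way to picture the question Kawahito raises is as a threshold on joint movement between video frames. The sketch below treats a “stop” as several consecutive frame-to-frame transitions in which no joint moves more than a small distance. Both numbers, `max_move` and `min_frames`, are placeholder assumptions, since the article makes clear that Fujitsu and the FIG have not yet agreed on what the real values should be.

```python
import math


def is_stop(frames, max_move=0.01, min_frames=3):
    """Return True if every joint moves less than max_move (meters) between
    consecutive frames for at least min_frames transitions in a row."""
    still_run = 0
    for prev, curr in zip(frames, frames[1:]):
        # Largest displacement of any single joint between the two frames.
        biggest = max(math.dist(p, c) for p, c in zip(prev, curr))
        still_run = still_run + 1 if biggest < max_move else 0
        if still_run >= min_frames:
            return True
    return False


# Hypothetical data: each frame is a list of (x, y, z) joint positions.
jitter = [[(0.0, 0.0, 1.0 + 0.001 * i)] for i in range(5)]  # tiny tremor only
travel = [[(0.05 * i, 0.0, 1.0)] for i in range(5)]         # clearly moving
print(is_stop(jitter), is_stop(travel))
```

Loosen the threshold and more pauses count as stops, which would cost gymnasts connection bonus; tighten it and almost nothing does. The code is easy; choosing the numbers is the hard part.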


If you were in Antwerp at the world championships and wandered into the Fujitsu booth, you’d be forgiven for temporarily forgetting you were at a gymnastics competition. There was very little inside to suggest that you were even at a sporting event of any kind. Monitors were hung on the bare white walls, but they didn’t show videos of gymnasts performing routines or even single elements, overlaid by JSS analysis. Instead, they showed how the technology behind the JSS could be used for fraud and theft prevention.

Though this might come as something of a surprise, it’s not really the left turn that some might imagine it to be. There’s a long tradition of the Games being used as a showcase for new surveillance and security technology. “The Olympics are often used to be kind of a showroom,” Dennis Pauschinger, a researcher at the University of Neuchâtel, told me in 2019 when I was working on a story about the global anti-Olympic movement.

The Fujitsu booth experience began with a simplified version of the JSS that you could play around with. I stood in front of a camera, which projected my movements onto a large screen and labeled them appropriately. It would say which hand you raised and what it was doing. “The judging system is based on what we call ‘pose estimation,’” Mike Fournigault, a Fujitsu AI architect, explained to me. “With cameras, we are able to reconstruct the pose of the body of people and to understand where are the hands, where are the arms, what are they doing with their hands, with their arms, with their legs?”

This is the kind of technology that is used for self-driving cars, with incredibly mixed results. In 2018, Uber’s self-driving car could distinguish between a person walking and a person riding a bike but could not reconcile the existence of a 49-year-old woman walking her bike in Tempe, Arizona; the vehicle struck and killed her. At least the stakes for JSS aren’t life and death, though to the athletes it can sometimes feel that way.


I was shocked how much of Fujitsu’s booth was dedicated to crimes — not of the sports judging variety, but actual chargeable offenses. The monitors showed how this pose estimation might be applied to situations outside of sports. One showed how it could help prevent car theft; the other demonstrated how it can discern whether people were getting up to no good in the self-checkout line, such as putting an item in their bag without first scanning it. In the press conference, there was also mention of its applications in healthcare and rehab settings, which is not hard to imagine with a technology that can measure body movements and angles as precisely as the JSS can.

“There has been increasingly this sense that we can’t just end with gymnastics because, you know, obviously it was a very expensive process to develop JSS,” Andrew Kane, then Fujitsu’s deputy head of international public relations, told me in Antwerp. Fujitsu’s end goal was never gymnastics.

Later, I followed up with Fujitsu and received a somewhat evasive answer. “We demonstrated different solutions related to Human Motion Analytics (HMA), which were for more than just gymnastics/sports,” Yuka Hatagaki of Fujitsu’s global PR wrote in an email about the booth’s contents. “The HMA technology that can analyze human movement with high precision cultivated through JSS can be applied to various industries, such as healthcare, ergonomics, and entertainment besides monitoring and theft prevention.”

JSS was being developed as a means of capturing the body, to synthesize the great range of human motion into something that could be understood by a computer. What gymnastics offered was a massive set of training data to help train the AI. Fujitsu mentioned additional uses in follow-up correspondence, including applications for physical therapists to develop hyper-specific programs for patients and using gait analysis to detect early signs of dementia in the elderly, which sounds very promising, especially as someone with a mother in cognitive decline.

All of this technology is built on the back of what I was witnessing around me in Antwerp. The heights of athleticism — and the competition as a whole — were used to feed a system that is repurposed and resold as a tool of surveillance. A solution in search of profit.


On the morning of the final day of competition in Antwerp, I was allowed to sit in the beam judge’s seat while the JSS was being calibrated and the arena was being set up for the evening’s competition. The field of play was clean, not yet covered in a white, chalky film, as it would be later when the gymnasts arrived to warm up. Some athletes mark the beam with chalk as a cue for where to start their acrobatic series. All of them douse themselves in the white stuff to mop up sweat on their feet and hands, both of which they need to grip the apparatus. It’s even worse over at the uneven bars where the whole apparatus is covered in the stuff. At a gymnastics meet, magnesium is always in the air.

In person, the beam seems smaller than it does on TV. When you’re watching on television, the camera zooms in on the apparatus and athlete. It’s practically all you see. Live, the equipment and the gymnast are set against the massive arena. You don’t get a sense of that scale on your screen. Still, the action seems more impressive in person, even if everything and everyone appears smaller. The added dimension really makes a difference. And in some cases, so does the massive arena. There are gymnasts out there, like Simone Biles, who, despite their diminutive stature, seem to be able to truly fill the space.

As an exercise, I tried to imagine what it would be like to actually rigorously evaluate a routine, to look at it piece by piece, and find favor or fault with it when medals are on the line. Imagining that burden left me with a queasy anxiety. Years of watching and analyzing the sport, mostly from the comfort of my couch, qualified me to do exactly what I was in Antwerp to do — report on a gymnastics competition — and little more, my success at identifying the credited switch ring notwithstanding.

“You cannot duplicate [that pressure] when you sit in your chair and in front of you are the best gymnasts, maybe trying to qualify for the Olympic Games,” Sacchi told me. She said that even after all of her years as a judge, she is still nervous before big events. At least the JSS can’t experience anxiety.

I get why, with so much on the line, you’d reach for a technology that promises to overcome human limitations. What the JSS offers is not only the promise of accuracy but also consistency, across rounds of competition, across several days of competition. It will not tire after a 12-hour judging day the way that human judges are wont to do. Gymnasts and coaches don’t like competing in the earliest subdivisions for a reason: the judges are fresh, and their figurative pencils — they actually use tablets — are sharp, and as a result, the execution scores tend to be lower. (The JSS doesn’t yet address the execution score, but I imagine that this is the eventual goal for the technology and would make the system more useful in the long term.)

Some of the hopes that are being pinned on the JSS, such as increased transparency, which Watanabe mentioned in his opening remarks at the conference, seem misplaced. Yes, the JSS can provide a lot of detailed information, but that is not the same thing as transparency. The FBI collects lots of information on US citizens, often through high-tech means, but no one would accuse it of being transparent. (Any journalist who has tried to get info from the FBI knows that it’s actually a black hole.) The fact that the JSS is collecting all this data doesn’t mean it will be shared with the gymnastics community. Ultimately, transparency is not a question of technology but of policy.


The yearslong process that it took to create the JSS illuminated the complexity of the judging task, which simultaneously calls for technological intervention and impedes it at every turn. Some of that complexity is unavoidable, even desirable. It shows a sport that is constantly evolving, its athletes always innovating. And some of it points to opportunities to streamline and improve the rules.

Later that day, when I was back in the media section where I belonged, I watched the eight women who qualified for the beam final. Biles won the gold there, her performance clean and surefooted. Her pace was brisk, moving from one element to the next with only the most minor of adjustments. She competed with the nonchalance of someone who has been there many times before. In second was Chinese gymnast Zhou Yaqin, a newcomer who showed a lot of style and precision in her world championship debut. She was rewarded with a 14.7 for her efforts, just a tenth behind Biles. Zhou’s coach immediately filed an inquiry because they had been anticipating a higher D-score, based on what she had been previously awarded. It would all come down to the question of those frustrating connections, the ones that the JSS is not yet able to adjudicate.

After a few minutes, the announcer told the audience that there had been no change. Biles would remain in first, Zhou in second. From my seat, a few rows above the judges, this result seemed fair, though if it had gone the other way and Zhou had received the additional tenth, tying Biles, I might’ve felt the same way. With so little separating gymnasts, who wins and who loses can, at times, feel more like a judgment call. Everything can be endlessly debated on social media, which can make it feel like no results are ever truly final. One of the hopes for the JSS is to offer finality to the outcomes so that when athletes look back on their careers, the counterfactuals they might spin have nothing to do with the competency of the judges evaluating them that day.

“When I speak to coaches, judges, administrators, [I] say the job of the judge is to separate gymnasts,” Butcher said. The judges’ job is to slice finely, to find the difference between gymnasts, and rank them accordingly.

Judging and scoring in gymnastics can certainly be improved, and perhaps the JSS can help along that trajectory. But we’ll never escape human judgment altogether, no matter how discomfiting that thought might be.

Written by Dvora Meyers
This news first appeared on https://www.theverge.com/c/24182327/olympics-gymnastics-ai-judging-fujitsu-jss-fig under the title “The AI Keeps the Score”. Bolchha Nepal is not responsible for or affiliated with the opinions expressed in this news article.