The Men’s 100 meters: a story in 5(ish) parts and more than a few graphs

INTRODUCTION

This series of shorts is the answer to the question: “What happens when a sports fan with a modelling (mathematical rather than photographic) background comes across a phenomenal data source?” As it turns out, the answer involves a few late nights, a smattering of simple statistics, an aesthetically pleasing graph or two, and of course a healthy dose of armchair analysis.

The data consists of roughly 4500 (of which 3000 are legal) sub-10.1 second times for the Men’s 100m retrieved from alltime-athletics.com. The data set is incredibly rich, allowing for several analyses to be done. How many separate times can one evaluate the same set of data? It turns out, many! For the series, I have planned (roughly) the following analyses over the next few weeks:

1.     1968-2017: “Are times slumping?” and other post-Bolt questions

2.     Sub-10 by the numbers…

3.     Age analysis I & II: ‘Automatic Average Asafa (AAA)’ & ‘Fan Robberies’

4.     Age analysis III: Justin Gatlin and other Suspiciously Late Bloomers (SLBs)*

5.     TrayBrom, C_Cole, Andre De Grasse (Tyson) & the stars of the future

6.     Bonus round: Geography of the sub-10

* Featuring Linford’s Lunchbox!

Part I: The 100m 1968-2017: “Are times slumping?” and other post-Bolt questions

August was a difficult month for many athletics fans. Usain Bolt finished third behind Justin Gatlin and Christian Coleman in the Men’s 100m Final at the World Championships in London, and then retired after an injury during the 4x100m relay. It was a sad end to a legendary career. Maybe not as sad as the current political situation in America** or various humanitarian crises around the world, but sad in a sporting kind of way. Like when the Springboks lose to the All Blacks by 2 points. (Oh, how I miss those days.)

** [Ed – I refer you back to the disclaimer, see: “Personal Bias and Humour”]

Bolt is undoubtedly the greatest sprinter of all time. His numbers are astonishing and already well covered, but here is just one gem that caught my eye:

“The average of Bolt’s 10 fastest 100m races is 9.73. Only Gay, Blake, Powell and Bolt himself have ever gone quicker than that.”

And I checked the calculation: the average of his top 10 times really is 9.73! Running 9.73s would be the 9th fastest 100m of all time! And the average of his top 20 times is 9.76!!! Running 9.76s would make you the 6th fastest man in history!!!! [I am running out of exclamation points here]
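The “average of the top 10” arithmetic above is easy to reproduce. Below is a minimal Python sketch; the list of times is made up purely for illustration (it is not Bolt’s data), and the ranking helper uses one assumed convention, namely that tied times share a rank.

```python
from statistics import mean


def top_n_average(times, n):
    """Average of the n fastest (i.e. smallest) times, in seconds."""
    if n > len(times):
        raise ValueError("need at least n times")
    return mean(sorted(times)[:n])


def rank_of(time_s, all_times):
    """Position a time would take in a list; tied times share a rank (assumed convention)."""
    return 1 + sum(t < time_s for t in all_times)


# Hypothetical times for an imaginary sprinter, NOT real data
times = [9.90, 9.85, 9.95, 9.88, 9.92, 10.01, 9.87, 9.93, 9.89, 9.91, 9.99, 10.05]

print(round(top_n_average(times, 10), 3))  # 9.909
print(rank_of(9.90, times))                 # 5
```

The same two helpers, fed an all-time list instead of one athlete’s times, give the “would be the Nth fastest” type of statement.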

But this is after all not a story about Usain, he has left us. It is over. Just let it go total diet, Francois, and move on. This is a story about how much fun one over-sized kid-engineer can have with a data set. Naturally the first step is to plonk all the data onto one graph and see what it looks like (Figure 1).

FIGURE 1: All sub-10.1s times since 1968 plonked onto one graph

The first trend is obvious: from the 1970s up to 2012 there is an almost linear improvement (green arrow) in times. The density (or rather frequency) of sub-10.1s times has also increased. The improvement could be partly due to there being more professional athletes, more big races and better timing equipment, but it is likely still representative of an overall improvement in the sport. Since 2012, however, there has been an apparent slump (orange arrow).

[Two drug cheats are also shown on the graph, mainly just to emphasise the obscenity of Ben Johnson’s 1988 run.]
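Counting how many sub-10.1s times fall in each year is one way to put a number on the “density” seen in Figure 1. This sketch assumes each result is stored as a hypothetical (year, time_s, is_legal) tuple; the actual layout of the alltime-athletics.com data will differ, and the sample rows are invented.

```python
from collections import Counter


def sub_threshold_counts(results, threshold=10.10, legal_only=True):
    """Count times strictly under `threshold` seconds, per year.

    `results` is an iterable of (year, time_s, is_legal) tuples
    (a hypothetical layout chosen for this example).
    """
    counts = Counter()
    for year, time_s, is_legal in results:
        if time_s < threshold and (is_legal or not legal_only):
            counts[year] += 1
    return dict(sorted(counts.items()))


# Invented sample rows, for illustration only
results = [
    (2015, 9.79, True),
    (2015, 10.05, True),
    (2016, 9.81, True),
    (2016, 9.99, False),   # e.g. a time later disallowed
    (2016, 10.12, True),   # too slow to count
]

print(sub_threshold_counts(results))                    # {2015: 2, 2016: 1}
print(sub_threshold_counts(results, legal_only=False))  # {2015: 2, 2016: 2}
```

Keeping the legal flag alongside each time means the same data can produce both the legal-only picture and the one that includes disallowed runs.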

But as fans we aren’t interested in the sub-10s density. We become attached to big-name stars and WRs (world records, not wide receivers!). Let’s zoom in a little and see how times developed (Figure 2). For context: although I was aware of Carl Lewis and Linford Christie, my interest in sprinting began at age 9 with the 1996 Atlanta Olympics. While Donovan Bailey took the gold in 9.84s, Frankie Fredericks grabbed silver for Namibia (or is it Nambia?) and Africa. From then on, I didn’t miss an Olympic 100m Final.

FIGURE 2: Pretty much every (legal and illegal) 100m time since 1988

Figure 2 reveals a few insights:

  • Around the early 2000s, (legal) times dropped off badly following the bans of several drug cheats involved in scandals such as BALCO***
  • Since 2006, there have been far fewer disallowed drug times, but seemingly a higher proportion of the disallowed times are “fast” ones (i.e. sub-9.9s rather than sub-10.1s drug times)
  • The 10th fastest time metric has been relatively consistent over the decade from 2006 to 2016
  • The World Leading and 10th fastest yearly times track each other relatively well over the entire period [well spotted, Captain Obvious] [except for 2008, 2009 & 2012, when Usain Bolt was breaking the science of human biomechanics]

*** The BALCO scandal is of course a whole saga unto itself involving many sports, but in terms of men’s sprinting, then-WR-holder Tim Montgomery was the most notable participant. The fact that none of the BALCO group ever tested positive, and were only caught via other evidence, is worrisome. Somewhat related to this, if you are interested in the latest scandal involving the Russian Athletics Federation in the lead-up to the Rio Games, watch Icarus on Netflix (8.5/10).
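The two yearly metrics in Figure 2, the World Leading time and the 10th fastest time, can be pulled out of the same kind of data. A minimal sketch, again assuming hypothetical (year, time_s, is_legal) tuples and treating the “10th fastest time” as the 10th fastest performance rather than the 10th fastest athlete (so one sprinter can fill several slots, as Bolt does in 2009). The data is invented.

```python
from collections import defaultdict


def leader_and_tenth(results):
    """Return {year: (fastest legal time, 10th fastest legal time or None)}.

    Counts performances, not athletes, so one sprinter can fill several slots.
    """
    by_year = defaultdict(list)
    for year, time_s, is_legal in results:
        if is_legal:
            by_year[year].append(time_s)
    summary = {}
    for year, times in sorted(by_year.items()):
        times.sort()
        tenth = times[9] if len(times) >= 10 else None
        summary[year] = (times[0], tenth)
    return summary


# Invented data: twelve legal times in one year, two legal and one disallowed in another
results = [(2020, 9.80 + 0.01 * i, True) for i in range(12)]
results += [(2021, 9.95, True), (2021, 9.90, True), (2021, 9.70, False)]

for year, (leader, tenth) in leader_and_tenth(results).items():
    print(year, round(leader, 2), None if tenth is None else round(tenth, 2))
```

Returning None for thin years (fewer than 10 legal times) avoids silently plotting a misleading “10th fastest” value.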

Finally, let’s look at the trend of the Average Yearly Top 10 times (Figure 3) and do some very basic analysis on it.

FIGURE 3: Average of the yearly Top 10 100m Men’s times (1984-2017)

Figure 3 requires a brief introductory discussion, especially for the statistically uninitiated:

  • In statistics, “μ” is used to represent an average – in this case the average of the 10 fastest legal times recorded in a given year
  • “σ” is called the standard deviation – the calculation is unimportant here, but this value indicates the spread of the top 10 times each year, i.e. when σ is small, all 10 of the times were quite close together, and vice versa
  • The two dotted lines “μ-2σ” and “μ+2σ” give an indication of the degree of spread in the times for each year, e.g. in 2009 there was a big difference between Bolt’s top time (9.58s) and the 10th fastest time that year (9.86s, also by Bolt)
  • In this figure I have used two standard deviations, which corresponds to the familiar 95% level: if the times were normally distributed, roughly 95% of them would lie within μ±2σ – be careful though, this is not the same as saying, “The average will be between the two dotted lines 95% of the time”
  • Finally, a yellow linear trend (straight line) is fitted to the Average of the Yearly Top 10 times from 1984-2016
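For readers who prefer code to Greek letters, the Figure 3 quantities for a single year can be computed as follows. This is a sketch under two assumptions the text does not spell out: σ is taken as the sample standard deviation (statistics.stdev) rather than the population one (statistics.pstdev), and the input is one year’s list of legal times with at least 10 entries. The example times are invented.

```python
from statistics import mean, stdev


def top10_band(times, k=2.0):
    """Return (mu, sigma, mu - k*sigma, mu + k*sigma) for the 10 fastest times.

    sigma is the sample standard deviation (an assumption; pstdev is the
    population alternative). k=2 gives the "mu-2sigma"/"mu+2sigma" lines.
    """
    if len(times) < 10:
        raise ValueError("need at least 10 times")
    top10 = sorted(times)[:10]
    mu = mean(top10)
    sigma = stdev(top10)
    return mu, sigma, mu - k * sigma, mu + k * sigma


# Invented year: five 9.90s, five 10.00s and one slower time outside the top 10
year_times = [9.90] * 5 + [10.00] * 5 + [10.08]
mu, sigma, lower, upper = top10_band(year_times)
print(f"mu={mu:.3f}s sigma={sigma:.4f}s band=[{lower:.3f}, {upper:.3f}]")
```

With only 10 values per year, σ itself is a fairly noisy estimate, which is one more reason to read the dotted lines as a rough indication rather than a hard boundary.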

The linear trend is displayed for illustrative purposes only (to highlight points in the discussion below) and is otherwise a somewhat daft idea (especially if you are thinking about forecasting!), because:

  • Male sprinting performance will not improve linearly in the long term [projecting the trend into the future predicts that we will be able to run the 100m in exactly zero seconds somewhere around the year 3277, and I am assuming we will make use of time travel thereafter in order to achieve negative times]
  • Haven’t we already seen the impacts of once-in-a-generation athletes like Bolt?!
  • Just go read any book by Nassim Nicholas Taleb!

So with ALL the technospeak out of the way, what does Figure 3 show us?

  • The data is cyclic – it (generally) stays above or below the trendline for several years at a time, which would be unlikely if the behaviour were random
  • The data is not cyclic in the way we might expect: athletes build towards their “Olympic Dream” which would make you think that Olympic years might be the peak in the cycle where athletes are “going all-out for the gold”
  • But only 5 out of 9 (which is no better than random) Olympic years occur above the trendline
  • And, 6 out of 9 (slightly less random, but still) Olympic years show improvement from the preceding year
  • On only three occasions (2003, 2016, 2017) does the linear trendline fall outside the confidence band (dotted lines) of the average
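The “no better than random” remarks can be made a little more concrete. The sketch below treats each Olympic year as a fair coin flip (p = 0.5), which is an assumption: the long-term downward trend arguably makes year-on-year improvement more likely than 50/50. It also counts runs above/below the trendline, the quantity a Wald–Wolfowitz runs test uses to check whether the “cyclic” clustering is more than chance; the above/below pattern shown is invented, not read off Figure 3.

```python
from math import comb


def binomial_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def count_runs(above):
    """Number of runs: maximal blocks of consecutive years on the same side of the trend."""
    if not above:
        return 0
    return 1 + sum(a != b for a, b in zip(above, above[1:]))


def expected_runs(above):
    """Expected number of runs if the above/below pattern were random (Wald-Wolfowitz)."""
    n1 = sum(above)
    n2 = len(above) - n1
    return 2 * n1 * n2 / (n1 + n2) + 1


# 5 of 9 Olympic years above the trendline; 6 of 9 improving on the previous year
print(binomial_tail(5, 9))            # 0.5
print(round(binomial_tail(6, 9), 3))  # 0.254

# Invented above (True) / below (False) pattern for ten consecutive years
pattern = [True, True, True, False, False, True, True, False, False, False]
print(count_runs(pattern), expected_runs(pattern))  # 4 6.0
```

In practice the pattern would come from comparing each yearly average with slope * year + intercept from the fitted line. Fewer runs than expected is what clustering looks like; a proper test would also need the variance of the run count, which is left out to keep the sketch short.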

The last few points can be interpreted in two or three different ways, depending on whether you are a glass half empty or glass half full type of person. But since it is late, I will go to bed and leave you with your glass of water and your interpretation. My opinion is that we are in a slump but are also due for an uptick in times – I just hope that we don’t have to wait for Doha ’19 or Tokyo ’20!

In the next few sections I will discuss some of the cool numbers I have found in the data analysis, poke some fun at Asafa Powell and also ponder on what might have been. I hope you will join me for that.
