Fun With Numbers

by Jason Anthony · Published November 29, 2016 · Updated November 29, 2016

This is the time of the year when every rally and motorsport news outlet puts out its obligatory ranking of the top drivers in the WRC. In some ways, this seems like a bit of a pointless exercise. That’s why we have a championship table with points… right? However, while the championship points system does a good job of measuring success by rally results, it doesn’t quite tell the story of how those rallies, and the season as a whole, played out. For example, Kris Meeke won two rallies this year, and had he not been unlucky, he could have won more. However, if you look at points alone, he is ranked 9th in the championship, because he only competed in 7 events. Meeke’s championship position doesn’t tell the story of how often he was right up there challenging Sebastien Ogier. It is for reasons like this that everyone loves to debate how the drivers should be ranked at the end of a season. Usually, it is done quite subjectively, by referring to anecdotal memories of how events played out over the course of the season. I could have done this as well, but then it would have been my (very unqualified) opinion up against everyone else’s. Instead, I decided to brew a pot of tea and spend a rainy afternoon trying to figure out a way to objectively determine the success (or lack thereof) of each WRC driver. Before I go any further, I need to thank the people at eWRC-results.com. Without all of their work, I wouldn’t have been able to burn through several hours happily poring over data from the 2016 WRC season. I’m going to warn you right now… if you don’t like numbers, you might not want to read the rest of this article! However, if you’re one of the “Gary Boyds” of this world, you might find this stuff pretty interesting.

So, where to begin when diving into an endeavor like this? I started by jotting down some ideas of what constitutes a “good” or “bad” rally. The good included things like wins, podiums, and points-scoring finishes. The bad included things like crashes and non-points-scoring finishes. I took the raw counts of the bad stuff and subtracted them from the raw counts of the good stuff to form a provisional ranking. Not surprisingly, the results were quite skewed, because this system didn’t account for the number of rallies that each driver participated in. For example, it somehow put Eric Camilli ahead of Craig Breen, even though Craig had scored his first ever WRC podium in Finland, and Eric’s best finish this season was 5th. In addition, my data didn’t account for power stage points. Because of this, Jari-Matti Latvala, who had a torrid weekend in Germany where he eventually finished 48th, got credit for having a “good” rally because he scored 2 points on the power stage. Clearly, it was time to make another mug of tea and head back to the drawing board.
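To make that first attempt concrete, here is a minimal sketch of a “good minus bad” tally. The field names, the equal weighting of every category, and the two sets of season counts are my own illustrative assumptions, not the actual spreadsheet or real 2016 results.

```python
from dataclasses import dataclass


@dataclass
class RawTally:
    """Season counts for one driver. All values used below are made up."""
    wins: int
    podiums: int
    points_finishes: int
    crashes: int
    non_points_finishes: int


def raw_score(t: RawTally) -> int:
    """Raw count of 'good' results minus raw count of 'bad' results."""
    good = t.wins + t.podiums + t.points_finishes
    bad = t.crashes + t.non_points_finishes
    return good - bad


# Hypothetical full-season driver: many starts, no podiums.
full_season = RawTally(wins=0, podiums=0, points_finishes=9,
                       crashes=2, non_points_finishes=3)
# Hypothetical part-season driver: few starts, but one podium.
part_season = RawTally(wins=0, podiums=1, points_finishes=4,
                       crashes=1, non_points_finishes=1)

print(raw_score(full_season), raw_score(part_season))  # prints: 4 3
```

With these made-up numbers, the full-season driver outscores the part-season driver despite never reaching the podium, simply because more starts means more chances to pile up raw counts. That is the same kind of skew described above.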

For my second attempt, I decided to use the same criteria, but convert the raw numbers into percentages of rallies started. This would solve the problem that some drivers (such as Meeke and Breen), who participated in only a few rallies this year, were being disadvantaged. I also removed power stage points from my data by changing “points-scoring percentage” to “top 10 percentage”. This would get rid of the issue that I had with Latvala in Germany. When I ran those numbers, the data seemed to make more sense, and it rewarded guys like Martin Prokop, who participated in only 4 rallies this season but scored points on 3 of them. At this point, it had already been about 2 hours, and I was almost ready to be done with the exercise, but my rally-driven obsessive-compulsive disorder kept bugging me. Something still didn’t seem quite right. My results were too heavily weighted by the final results of rallies, and for that reason, they were basically the same as the championship standings. They didn’t capture how well each driver performed over the course of a rally regardless of how they finished. For example, until he was unlucky and clipped a rock in Monte Carlo, Kris Meeke was trading fastest stage times with Sebastien Ogier. With this in mind, I decided it was time for another go.
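Here is a rough sketch of the percentage idea. The function name, the equal weighting of each category, and every count except the “4 starts, 3 top 10s” pattern are assumptions for illustration only.

```python
def percentage_score(starts, wins, podiums, top10s, crashes, outside_top10):
    """Good-minus-bad score, in percentage points of rallies started."""
    if starts <= 0:
        raise ValueError("a driver needs at least one start")
    good = wins + podiums + top10s
    bad = crashes + outside_top10
    return 100.0 * (good - bad) / starts


# Hypothetical part-timer: 4 starts, 3 top-10 finishes.
part_timer = percentage_score(starts=4, wins=0, podiums=0, top10s=3,
                              crashes=0, outside_top10=1)
# Hypothetical full-season driver: more top 10s in total, but a lower rate.
full_timer = percentage_score(starts=13, wins=0, podiums=0, top10s=8,
                              crashes=2, outside_top10=5)

print(round(part_timer, 1), round(full_timer, 1))  # prints: 50.0 7.7
```

Dividing by starts means a driver with only a handful of entries is judged on how often things went well, not on how many times he showed up.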

In an effort to capture individual rally performance with my algorithm, I decided to add stage wins and stages spent leading the rally to my data. Stage wins are the best statistical indicator we currently have of outright speed. They recognize the talent of guys like Jari-Matti Latvala and Ott Tanak who, when the conditions are right, can blow away the rest of the field. Stages led, on the other hand, capture the ability of a driver to manage a lead once he gets it. This statistic rewards guys like Dani Sordo, who won only 2 stages on Rally Catalunya this year, but held off Sebastien Ogier’s challenge and retained the lead of the rally for 9 stages. The ability to manage pressure from behind is just as important as outright speed, so together these two statistics do a good job of telling the story of how a particular driver performed over the course of an entire rally. When this data was entered into the algorithm, guys like Meeke, Tanak, and Latvala, who spent a lot of time going fast this season, saw a big jump in my rankings.
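As a sketch of how these two statistics could be counted from stage-by-stage data: the input format (one pair of stage winner and overall leader per stage) and the driver labels are hypothetical, and the five-stage rally below is invented.

```python
from collections import Counter


def stage_stats(stages):
    """Count stage wins and stages led per driver.

    `stages` is a list of (stage_winner, overall_leader_after_stage) pairs.
    """
    wins = Counter(winner for winner, _ in stages)
    led = Counter(leader for _, leader in stages)
    return wins, led


# Invented rally: driver "A" wins fewer stages than "B" but leads for longer.
rally = [("A", "A"), ("B", "A"), ("B", "A"), ("A", "A"), ("B", "B")]
wins, led = stage_stats(rally)

print(wins["A"], wins["B"])  # prints: 2 3
print(led["A"], led["B"])    # prints: 4 1
```

Keeping the two counts separate matters: in this made-up rally “B” looks faster on stage wins alone, while “A” controlled the event for most of its length.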

At this point, my bladder was about to burst from all the tea, so it was time for a quick bathroom break. It was in the bathroom (where many good ideas are born) that I realized my formula was still missing one thing. While it captured the final rally results and performance within individual rallies, it didn’t account for the concept of “form”. What I mean by this is that it didn’t show the ability of a driver to string together a run of good results or snap out of a streak of poor finishes. To achieve this, I decided to figure out the longest consecutive streaks of wins, podiums, top 10s, and non-points finishes for each driver. This statistic rewarded guys like Thierry Neuville, who strung together an impressive run of 5 podium finishes to close out the season. It penalized drivers such as Latvala and Tanak, who had some amazing results that were followed immediately by a retirement or non-points-scoring finish. It also captured the difference in form between Andreas Mikkelsen and Thierry Neuville. While they finished quite close together in the final championship standings, Thierry’s ability to get onto a hot streak at the end of the season was what gave him 2nd place in the championship despite Andreas winning the final rally of the season in Australia.
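The streak idea can be sketched like this. The list of finishing positions is invented, `None` is my own convention for a retirement, and treating “non-points” as “retired or outside the top 10” is an assumption that follows the top 10 definition used earlier.

```python
def longest_streak(finishes, condition):
    """Length of the longest run of consecutive results satisfying `condition`."""
    best = current = 0
    for position in finishes:
        if condition(position):
            current += 1
            best = max(best, current)
        else:
            current = 0
    return best


def is_win(p):
    return p is not None and p == 1


def is_podium(p):
    return p is not None and p <= 3


def is_top10(p):
    return p is not None and p <= 10


def is_non_points(p):
    return p is None or p > 10


# Invented season, in calendar order; None marks a retirement.
season = [3, None, 14, 6, 4, 3, 3, 2, 3, 2]

print(longest_streak(season, is_win))         # prints: 0
print(longest_streak(season, is_podium))      # prints: 5
print(longest_streak(season, is_top10))       # prints: 7
print(longest_streak(season, is_non_points))  # prints: 2
```

The first three streaks would count in a driver’s favor and the last one against him; how heavily each should weigh against the other statistics is a judgment call that this sketch leaves open.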

So, after several hours, countless mugs of tea, and a few enlightening trips to the bathroom, here’s what I came up with:

What do you think? How does this compare to your personal ranking of the WRC’s drivers this season? Yes, at the end of the day, statistics can be quite arbitrary, but overall, I’m pretty pleased with these results. First of all, it shows just how dominant Sebastien Ogier was this season despite having to deal with road sweeping on the gravel rallies. He still managed to finish on the podium 92% of the time! Secondly, it helps differentiate some of the “mid-pack” drivers. Despite winning a rally, by his own admission, Jari-Matti Latvala had a pretty lousy season. This is reflected in the table, where his percentage of finishes outside of the top 10 really brought him down in the rankings. This data also rewards consistency, which means that both Dani Sordo and Mads Ostberg were ranked above drivers who won rallies this season. I’m OK with this because I do believe that, as a whole, they both had better seasons than Hayden Paddon and Jari-Matti Latvala. However, the fact that Craig Breen finished above Hayden Paddon based on his run of 5 points finishes in a row seems a bit fishy. Don’t get me wrong, I’m a huge Craig Breen fan, but I’m not sure that his consistency over 5 rallies is enough to eclipse Paddon’s amazing victory in Argentina. Lastly, I like that this system recognizes how well some of the privateers did this season. It’s easy to forget because we don’t see him in the WRC very often, but Martin Prokop has really become a very good rally driver. He is by far the best privateer, and he definitely showed better performance this season than guys like Camilli, Abbring, and Lefebvre, who were running with factory teams. With all the talk about the factory teams and their new cars in 2017, it’s data like this that helps us remember how important the privateers are to the championship. Let’s hope the WRC realizes this as well.

So, what did we learn with this exercise? Probably nothing, but if you’re a statistics nerd like me, I hope you enjoyed reading these ramblings. At the very least, it was an enjoyable way to spend a rainy afternoon. Shoot me a comment to let me know how you would tweak this algorithm to improve the results. I’m really curious!