Patrik Laine is playing his first season in the NHL and currently leads the league in scoring with 12 goals. With 51 shots on goal, his shooting percentage is 23.5 %. How does this number compare to great goal scorers over the years? I downloaded NHL player statistics for each season from 1967–1968 onwards, the first season in which shots on goal were recorded. I then calculated career summaries for each player. But if we simply look for the players with the highest shooting percentages, the top 12 all scored a single goal on a single shot. These are obviously not the best shooters, just random flukes. In its official leaderboard for all-time career shooting percentage (S%), the NHL only counts players with at least 800 shots. This is what the top 10 looks like:
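To make the fluke problem concrete, here is a minimal Python sketch (the post's own analysis is in R). It computes career S% as goals divided by shots and applies an NHL-style minimum-shot threshold. The `Career` record, the function names, and every player except Laine are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Career:
    """Career totals for one player (hypothetical record layout)."""
    name: str
    goals: int
    shots: int


def shooting_pct(goals: int, shots: int) -> float:
    """Shooting percentage (S%): goals per shot, expressed as a percentage."""
    if shots <= 0:
        raise ValueError("S% is undefined without any shots")
    return 100.0 * goals / shots


def leaderboard(careers, min_shots=0, top=10):
    """Rank players by raw S%, keeping only those with at least min_shots shots."""
    eligible = [c for c in careers if c.shots >= min_shots]
    return sorted(eligible, key=lambda c: shooting_pct(c.goals, c.shots), reverse=True)[:top]


careers = [
    Career("Laine", 12, 51),          # numbers from the text
    Career("One-shot fluke", 1, 1),   # hypothetical
    Career("Veteran", 400, 2800),     # hypothetical
]

print(round(shooting_pct(12, 51), 1))                         # 23.5
print([c.name for c in leaderboard(careers)])                 # ['One-shot fluke', 'Laine', 'Veteran']
print([c.name for c in leaderboard(careers, min_shots=800)])  # ['Veteran']
```

Without a threshold the one-shot player tops the list at 100 %; with an 800-shot cutoff both the fluke and the rookie disappear, which is exactly the trade-off discussed next.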
Requiring a minimum number of shots (or goals) does get rid of the flukes, but it leaves us no way to compare a rookie who is still far below the threshold. What method could take the scarcity of evidence into account until the player reaches the threshold? David Robinson has written a terrific series of articles for situations like this, using baseball statistics as an example. I’ll follow one of his tutorials and use empirical Bayes estimation to obtain a more reliable picture. In short, we’ll first use all players’ data to estimate a beta prior, and then update that prior with each player’s own data. Put another way, we start by assuming everyone is average, and only as a player shows more and more evidence to the contrary do we gradually start to consider them special. For a much better description, please see the original blog post. All R code is also adapted from that post.
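In symbols, the standard beta-binomial setup behind this idea can be written as follows. This assumes each of player $i$’s $s_i$ shots is an independent trial with the player’s own scoring probability $p_i$, and $g_i$ is the number of goals; $\alpha_0$ and $\beta_0$ are the prior parameters estimated from all players:

```latex
\begin{aligned}
p_i &\sim \mathrm{Beta}(\alpha_0, \beta_0) \\
g_i \mid p_i &\sim \mathrm{Binomial}(s_i, p_i) \\
p_i \mid g_i &\sim \mathrm{Beta}(\alpha_0 + g_i,\ \beta_0 + s_i - g_i) \\
\hat{p}_i^{\mathrm{EB}} &= \frac{g_i + \alpha_0}{s_i + \alpha_0 + \beta_0}
\end{aligned}
```

With few shots, $\hat{p}_i^{\mathrm{EB}}$ stays close to the prior mean $\alpha_0 / (\alpha_0 + \beta_0)$; as $s_i$ grows, it approaches the player’s raw rate $g_i / s_i$.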
Before estimating the beta prior, let’s first check whether we should use all of the available data or only a subset. Since we are estimating only one prior, we would like all players to come from a single distribution. As gameplay has surely changed over the years, let’s look at the overall shooting percentages across the 49 seasons. Also, since defensemen normally play further away from the opponent’s net than forwards, player position is likely to have an effect as well. Let’s look at shooting percentages separately for each position (excluding players with fewer than ten goals).
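The check above comes down to pooling goals and shots within groups. The following is a minimal Python sketch with made-up rows and an assumed `(season, position, goals, shots)` layout. Pooled totals (total goals over total shots) are one reasonable way to summarize a group, and applying the ten-goal cutoff per player-season is an assumption; the post does not specify either choice.

```python
from collections import defaultdict

# Hypothetical per-player season rows: (season, position, goals, shots).
# "F" = forward, "D" = defenseman; the layout is an assumption for illustration.
rows = [
    ("1985-1986", "F", 40, 200),
    ("1985-1986", "D", 12, 180),
    ("1985-1986", "F", 5, 60),    # fewer than ten goals: excluded
    ("2015-2016", "F", 30, 250),
    ("2015-2016", "D", 10, 200),
]


def pooled_pct_by_group(rows, min_goals=10):
    """Pooled S% (total goals / total shots * 100) per (season, position),
    skipping player-seasons with fewer than min_goals goals."""
    goals = defaultdict(int)
    shots = defaultdict(int)
    for season, position, g, s in rows:
        if g < min_goals:
            continue
        goals[(season, position)] += g
        shots[(season, position)] += s
    return {key: 100.0 * goals[key] / shots[key] for key in goals}


for (season, position), pct in sorted(pooled_pct_by_group(rows).items()):
    print(f"{season} {position}: {pct:.1f} %")
```

Plotting these per-group values over the seasons is what reveals era and position effects like the ones described next.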
As we can see, shooting percentages used to be much higher around the 1980s. For this simple analysis, I’ll only include data from the 1996–1997 season onwards. I’ll also leave out the defensemen, as they tend to have lower shooting percentages. (I hope to write follow-up posts later that include all of the data and handle it properly, using one of the other empirical Bayes approaches David has described, a standard Bayesian analysis, or maybe even both.)
Overall, the average shooting percentage for all forwards over the last 20 seasons is 11.0 %. Next, let’s estimate a beta prior from the data and see how it fits:
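The post fits the prior with R code adapted from David’s tutorial. As a self-contained alternative, the Python sketch below fits a beta distribution by the method of moments, a swapped-in technique that is not necessarily the fitting method used in the post. It matches the sample mean and variance of career shooting percentages expressed as proportions. The sample values are hypothetical.

```python
from statistics import mean, pvariance


def fit_beta_moments(proportions):
    """Method-of-moments fit of Beta(alpha, beta) to values in (0, 1).

    With sample mean m and variance v:
        alpha + beta = m * (1 - m) / v - 1
        alpha = m * (alpha + beta),  beta = (1 - m) * (alpha + beta)
    """
    m = mean(proportions)
    v = pvariance(proportions, mu=m)
    if not 0 < v < m * (1 - m):
        raise ValueError("variance must be positive and below m * (1 - m)")
    total = m * (1 - m) / v - 1
    return m * total, (1 - m) * total


# Hypothetical career shooting percentages (as proportions) for a few forwards.
sample = [0.08, 0.10, 0.11, 0.12, 0.14]
alpha0, beta0 = fit_beta_moments(sample)
print(f"alpha0 = {alpha0:.1f}, beta0 = {beta0:.1f}")   # alpha0 = 26.8, beta0 = 216.9
print(f"prior mean = {alpha0 / (alpha0 + beta0):.2f}")  # prior mean = 0.11
```

Because one-shot flukes would inflate the variance (and so weaken the prior), it is usually sensible to fit the prior only to players with a fair number of shots.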
Shooting percentages can now be adjusted using this prior, which shrinks each player’s estimate towards the horizontal dashed line. The more evidence there is for an individual (the brighter the blue dot), the more we trust their own data. The darker dots show a lot of shrinkage, whereas the light ones are much closer to the diagonal red line, which marks the case of no shrinkage at all.
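The adjustment itself is just the posterior mean of the beta distribution. Here is a minimal Python sketch; the prior `ALPHA0, BETA0 = 11, 89` (mean 0.11) is made up for illustration and is not the prior fitted in the post, and all players except Laine are hypothetical.

```python
def eb_estimate(goals: int, shots: int, alpha0: float, beta0: float) -> float:
    """Empirical Bayes S% (as a proportion): the mean of the beta posterior
    Beta(alpha0 + goals, beta0 + shots - goals)."""
    return (goals + alpha0) / (shots + alpha0 + beta0)


# Hypothetical prior with mean 11 / (11 + 89) = 0.11; NOT the fitted values from the post.
ALPHA0, BETA0 = 11.0, 89.0

careers = {  # name: (goals, shots); only Laine's numbers come from the text
    "Laine": (12, 51),
    "One-shot fluke": (1, 1),
    "Veteran": (400, 2800),
}

# Shrunken estimate for every player, then rank by it (highest first).
eb = {name: eb_estimate(g, s, ALPHA0, BETA0) for name, (g, s) in careers.items()}
ranking = sorted(eb, key=eb.get, reverse=True)

for name in ranking:
    g, s = careers[name]
    print(f"{name}: raw {100 * g / s:.1f} %, EB {100 * eb[name]:.1f} %")
```

With this weak, made-up prior, the one-shot fluke drops from 100 % to about 11.9 %, the high-volume veteran barely moves (14.3 % to 14.2 %), and Laine lands at about 15.2 %. A stronger prior (larger `ALPHA0 + BETA0`) would pull the low-volume players even closer to the average.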
Finally, let’s look at the ranking (from the 1996–1997 season onwards) for shooting percentage estimated with empirical Bayes (EB). Patrik Laine currently sits at number 40, and only time will tell where he moves on that list. But what we do know today is that he is one of only four 18-year-olds to score two hat tricks in the NHL (the others being Jack Hamilton, Dale Hawerchuk, and Trevor Linden), and he still has the rest of the regular season to hunt for a third one before his 19th birthday on April 19th, 2017.