<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>sports &#8211; ilari.scheinin.fi</title>
	<atom:link href="https://ilari.scheinin.fi/category/sports/feed/" rel="self" type="application/rss+xml" />
	<link>https://ilari.scheinin.fi</link>
	<description></description>
	<lastBuildDate>Fri, 18 May 2018 12:37:48 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.0.9</generator>
	<item>
		<title>Ranking NHL&#8217;s best shooters – 2018 update</title>
		<link>https://ilari.scheinin.fi/ranking-nhls-best-shooters-2018-update/</link>
		
		<dc:creator><![CDATA[Ilari Scheinin]]></dc:creator>
		<pubDate>Thu, 12 Apr 2018 07:29:09 +0000</pubDate>
				<category><![CDATA[R]]></category>
		<category><![CDATA[sports]]></category>
		<guid isPermaLink="false">https://ilari.scheinin.fi/?p=817</guid>

					<description><![CDATA[Last year, I published an analysis that ranks NHL&#8217;s best shooters with Bayesian multilevel modeling. Now that the 2017–2018 regular season is over, I have repeated the analysis with an additional year&#8217;s worth of statistics (it starts from season 1967–1968). &#8230; <a href="https://ilari.scheinin.fi/ranking-nhls-best-shooters-2018-update/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
										<content:encoded><![CDATA[<p>Last year, I published an analysis that ranks <a href="https://ilari.scheinin.fi/ranking-nhls-best-shooters/">NHL&#8217;s best shooters</a> with Bayesian multilevel modeling. Now that the 2017–2018 regular season is over, I have repeated the analysis with an additional year&#8217;s worth of statistics (it starts from season 1967–1968). Some highlights are shown below, and the full results table is available <a href="https://cdn.ilari.scheinin.fi/wp-content/uploads/2018/04/sharp-shooter-2018.html">here</a>.</p>
<p>I&#8217;ll start with a brief recap of the methodology. More details are available in the <a href="https://ilari.scheinin.fi/ranking-nhls-best-shooters/">original blog post</a>.</p>
<p>There are multiple ways to model players&#8217; shooting ability. The simplest is to use the <em>shooting percentage</em>, which is the number of <em>goals scored</em> divided by the number of <em>shots on goal</em>. The metric is simple to define and straightforward to use, but it has two shortcomings.</p>
<p>The first shortcoming is that while it works well for players with a large number of shots on goal, there&#8217;s more room for random chance for players with few shots. For this reason, the official <a href="https://en.wikipedia.org/wiki/List_of_NHL_statistical_leaders#Regular_season:_Shooting_percentage">career shooting percentage leaderboard</a> includes only players with at least 800 shots. What if one wants to evaluate the shot of a young player who has only played for a couple of seasons?</p>
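<p>To put a rough number on that extra room for chance, here is a minimal Python sketch (the analysis itself uses R and Stan). It computes a raw shooting percentage and the binomial standard error of an observed percentage for a small and a large shot count. The 10% true rate and the 20-shot sample are illustrative assumptions; the 800-shot figure is the leaderboard threshold mentioned above.</p>

```python
import math


def shooting_percentage(goals, shots):
    """Raw shooting percentage: goals scored divided by shots on goal."""
    return goals / shots


def standard_error(true_rate, shots):
    """Binomial standard error of an observed rate after `shots` attempts."""
    return math.sqrt(true_rate * (1 - true_rate) / shots)


# Career totals for Alex Tanguay, taken from the results table in this post.
tanguay = shooting_percentage(goals=283, shots=1525)

# Assumed true rate of 10%, observed over few and over many shots.
se_few = standard_error(0.10, shots=20)
se_many = standard_error(0.10, shots=800)

print(f"Tanguay raw: {tanguay:.2%}")
print(f"standard error with 20 shots:  {se_few:.3f}")
print(f"standard error with 800 shots: {se_many:.3f}")
```

<p>Under these assumptions the uncertainty with 20 shots is more than six times that with 800 shots, which is why a raw percentage from a short career says little on its own.</p>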
<p>The second point is not as clear a shortcoming, but many have argued that the gameplay and level of goaltending have changed over the years. As a result, so has the difficulty of scoring goals, and this change is not reflected in shooting percentages of players from different eras.</p>
<p>To take these two issues into account, I have analyzed shooting percentages with Bayesian multilevel modeling. As always with modeling, there are assumptions that are simplifications of reality. The most important one here is the assumption that each player has an innate level of <em>skill</em>, or shooting ability, that does not change throughout their career. While this is not entirely realistic, it gives us a single metric that can be used to rank players. And since players&#8217; careers typically span multiple seasons, it also allows us to account for scoring difficulty varying between seasons.</p>
<p>If you&#8217;re interested in more details, please read the <a href="https://ilari.scheinin.fi/ranking-nhls-best-shooters/">original blog post</a>. It also contains full R and Stan code to perform the analysis, and some plots to evaluate its performance.</p>
<h2 id="results">Results</h2>
<p>After updating the model with data from season 2017–2018, here are the overall top 10 shooters. Any changes in ranking from last year are highlighted with arrows and color.</p>
<div style="overflow: scroll;">
<table style="font-size: smaller;">
<thead>
<tr class="header">
<th style="text-align:left;">   </th>
<th style="text-align:left;"> player </th>
<th style="text-align:left;"> position </th>
<th style="text-align:left;"> career </th>
<th style="text-align:right;"> shots </th>
<th style="text-align:right;"> goals </th>
<th style="text-align:right;"> raw </th>
<th style="text-align:right;"> model </th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align:left;"> 1 </td>
<td style="text-align:left;"> Alex Tanguay </td>
<td style="text-align:left;"> forward </td>
<td style="text-align:left;"> 1999–2016 </td>
<td style="text-align:right;"> 1,525 </td>
<td style="text-align:right;"> 283 </td>
<td style="text-align:right;"> 18.56% </td>
<td style="text-align:right;"> 19.93% </td>
</tr>
<tr class="even">
<td style="text-align:left;"> 2 </td>
<td style="text-align:left;"> Craig Simpson </td>
<td style="text-align:left;"> forward </td>
<td style="text-align:left;"> 1985–1995 </td>
<td style="text-align:right;"> 1,044 </td>
<td style="text-align:right;"> 247 </td>
<td style="text-align:right;"> 23.66% </td>
<td style="text-align:right;"> 19.85% </td>
</tr>
<tr class="odd">
<td style="text-align:left;"> 3 <span style="color: blue;">↑ 1</span> </td>
<td style="text-align:left;"> Andrew Brunette </td>
<td style="text-align:left;"> forward </td>
<td style="text-align:left;"> 1995–2012 </td>
<td style="text-align:right;"> 1,516 </td>
<td style="text-align:right;"> 268 </td>
<td style="text-align:right;"> 17.68% </td>
<td style="text-align:right;"> 18.84% </td>
</tr>
<tr class="even">
<td style="text-align:left;"> 4 <span style="color: red;">↓ 1</span> </td>
<td style="text-align:left;"> Steven Stamkos </td>
<td style="text-align:left;"> forward </td>
<td style="text-align:left;"> 2008–current </td>
<td style="text-align:right;"> 2,088 </td>
<td style="text-align:right;"> 348 </td>
<td style="text-align:right;"> 16.67% </td>
<td style="text-align:right;"> 18.81% </td>
</tr>
<tr class="odd">
<td style="text-align:left;"> 5 </td>
<td style="text-align:left;"> Sergei Makarov </td>
<td style="text-align:left;"> forward </td>
<td style="text-align:left;"> 1989–1997 </td>
<td style="text-align:right;"> 610 </td>
<td style="text-align:right;"> 134 </td>
<td style="text-align:right;"> 21.97% </td>
<td style="text-align:right;"> 18.64% </td>
</tr>
<tr class="even">
<td style="text-align:left;"> 6 </td>
<td style="text-align:left;"> John Bucyk </td>
<td style="text-align:left;"> forward </td>
<td style="text-align:left;"> 1967–1978 </td>
<td style="text-align:right;"> 1,723 </td>
<td style="text-align:right;"> 329 </td>
<td style="text-align:right;"> 19.09% </td>
<td style="text-align:right;"> 18.38% </td>
</tr>
<tr class="odd">
<td style="text-align:left;"> 7 </td>
<td style="text-align:left;"> Mark Parrish </td>
<td style="text-align:left;"> forward </td>
<td style="text-align:left;"> 1998–2011 </td>
<td style="text-align:right;"> 1,247 </td>
<td style="text-align:right;"> 216 </td>
<td style="text-align:right;"> 17.32% </td>
<td style="text-align:right;"> 18.14% </td>
</tr>
<tr class="even">
<td style="text-align:left;"> 8 </td>
<td style="text-align:left;"> Charlie Simmer </td>
<td style="text-align:left;"> forward </td>
<td style="text-align:left;"> 1974–1988 </td>
<td style="text-align:right;"> 1,531 </td>
<td style="text-align:right;"> 342 </td>
<td style="text-align:right;"> 22.34% </td>
<td style="text-align:right;"> 18.09% </td>
</tr>
<tr class="odd">
<td style="text-align:left;"> 9 <span style="color: blue;">↑ 48</span> </td>
<td style="text-align:left;"> Patrik Laine </td>
<td style="text-align:left;"> forward </td>
<td style="text-align:left;"> 2016–current </td>
<td style="text-align:right;"> 445 </td>
<td style="text-align:right;"> 80 </td>
<td style="text-align:right;"> 17.98% </td>
<td style="text-align:right;"> 18.06% </td>
</tr>
<tr class="even">
<td style="text-align:left;"> 10 <span style="color: blue;">↑ 7</span> </td>
<td style="text-align:left;"> Paul Byron </td>
<td style="text-align:left;"> forward </td>
<td style="text-align:left;"> 2010–current </td>
<td style="text-align:right;"> 388 </td>
<td style="text-align:right;"> 70 </td>
<td style="text-align:right;"> 18.04% </td>
<td style="text-align:right;"> 17.96% </td>
</tr>
</tbody>
</table>
</div>
<p>As the most interesting changes from last year are for active players, here is also their top 10.</p>
<div style="overflow: scroll;">
<table style="font-size: smaller;">
<thead>
<tr class="header">
<th style="text-align:left;">   </th>
<th style="text-align:left;"> player </th>
<th style="text-align:left;"> position </th>
<th style="text-align:left;"> career </th>
<th style="text-align:right;"> shots </th>
<th style="text-align:right;"> goals </th>
<th style="text-align:right;"> raw </th>
<th style="text-align:right;"> model </th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align:left;"> 1 </td>
<td style="text-align:left;"> Steven Stamkos </td>
<td style="text-align:left;"> forward </td>
<td style="text-align:left;"> 2008–current </td>
<td style="text-align:right;"> 2,088 </td>
<td style="text-align:right;"> 348 </td>
<td style="text-align:right;"> 16.67% </td>
<td style="text-align:right;"> 18.81% </td>
</tr>
<tr class="even">
<td style="text-align:left;"> 2 <span style="color: blue;">↑ 10</span> </td>
<td style="text-align:left;"> Patrik Laine </td>
<td style="text-align:left;"> forward </td>
<td style="text-align:left;"> 2016–current </td>
<td style="text-align:right;"> 445 </td>
<td style="text-align:right;"> 80 </td>
<td style="text-align:right;"> 17.98% </td>
<td style="text-align:right;"> 18.06% </td>
</tr>
<tr class="odd">
<td style="text-align:left;"> 3 <span style="color: red;">↓ 1</span> </td>
<td style="text-align:left;"> Paul Byron </td>
<td style="text-align:left;"> forward </td>
<td style="text-align:left;"> 2010–current </td>
<td style="text-align:right;"> 388 </td>
<td style="text-align:right;"> 70 </td>
<td style="text-align:right;"> 18.04% </td>
<td style="text-align:right;"> 17.96% </td>
</tr>
<tr class="even">
<td style="text-align:left;"> 4 <span style="color: red;">↓ 1</span> </td>
<td style="text-align:left;"> Brad Marchand </td>
<td style="text-align:left;"> forward </td>
<td style="text-align:left;"> 2009–current </td>
<td style="text-align:right;"> 1,426 </td>
<td style="text-align:right;"> 226 </td>
<td style="text-align:right;"> 15.85% </td>
<td style="text-align:right;"> 17.77% </td>
</tr>
<tr class="odd">
<td style="text-align:left;"> 5 <span style="color: red;">↓ 1</span> </td>
<td style="text-align:left;"> Adam Henrique </td>
<td style="text-align:left;"> forward </td>
<td style="text-align:left;"> 2010–current </td>
<td style="text-align:right;"> 1,068 </td>
<td style="text-align:right;"> 166 </td>
<td style="text-align:right;"> 15.54% </td>
<td style="text-align:right;"> 17.20% </td>
</tr>
<tr class="even">
<td style="text-align:left;"> 6 <span style="color: blue;">↑ 2</span> </td>
<td style="text-align:left;"> Mark Stone </td>
<td style="text-align:left;"> forward </td>
<td style="text-align:left;"> 2012–current </td>
<td style="text-align:right;"> 613 </td>
<td style="text-align:right;"> 95 </td>
<td style="text-align:right;"> 15.50% </td>
<td style="text-align:right;"> 16.55% </td>
</tr>
<tr class="odd">
<td style="text-align:left;"> 7 <span style="color: blue;">↑ 4</span> </td>
<td style="text-align:left;"> Sean Monahan </td>
<td style="text-align:left;"> forward </td>
<td style="text-align:left;"> 2013–current </td>
<td style="text-align:right;"> 929 </td>
<td style="text-align:right;"> 138 </td>
<td style="text-align:right;"> 14.85% </td>
<td style="text-align:right;"> 16.42% </td>
</tr>
<tr class="even">
<td style="text-align:left;"> 8 <span style="color: blue;">↑ 44</span> </td>
<td style="text-align:left;"> Auston Matthews </td>
<td style="text-align:left;"> forward </td>
<td style="text-align:left;"> 2016–current </td>
<td style="text-align:right;"> 466 </td>
<td style="text-align:right;"> 74 </td>
<td style="text-align:right;"> 15.88% </td>
<td style="text-align:right;"> 16.42% </td>
</tr>
<tr class="odd">
<td style="text-align:left;"> 9 <span style="color: red;">↓ 4</span> </td>
<td style="text-align:left;"> Sidney Crosby </td>
<td style="text-align:left;"> forward </td>
<td style="text-align:left;"> 2005–current </td>
<td style="text-align:right;"> 2,843 </td>
<td style="text-align:right;"> 411 </td>
<td style="text-align:right;"> 14.46% </td>
<td style="text-align:right;"> 16.30% </td>
</tr>
<tr class="even">
<td style="text-align:left;"> 10 <span style="color: blue;">↑ 8</span> </td>
<td style="text-align:left;"> Mark Scheifele </td>
<td style="text-align:left;"> forward </td>
<td style="text-align:left;"> 2011–current </td>
<td style="text-align:right;"> 760 </td>
<td style="text-align:right;"> 113 </td>
<td style="text-align:right;"> 14.87% </td>
<td style="text-align:right;"> 16.24% </td>
</tr>
</tbody>
</table>
</div>
<p>The full results table with 5,573 players is available <a href="https://cdn.ilari.scheinin.fi/wp-content/uploads/2018/04/sharp-shooter-2018.html">here</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Ranking NHL&#8217;s best shooters with Bayesian multilevel modeling</title>
		<link>https://ilari.scheinin.fi/ranking-nhls-best-shooters/</link>
		
		<dc:creator><![CDATA[Ilari Scheinin]]></dc:creator>
		<pubDate>Mon, 17 Apr 2017 17:19:34 +0000</pubDate>
				<category><![CDATA[R]]></category>
		<category><![CDATA[R-bloggers]]></category>
		<category><![CDATA[sports]]></category>
		<guid isPermaLink="false">https://ilari.scheinin.fi/?p=777</guid>

					<description><![CDATA[Update April 12, 2018: This post is from spring 2017, so the results reflect statistics up to season 2016–2017. I have now re-run the analysis including data from season 2017–2018, and updated results are available here. Methodology is unchanged, so &#8230; <a href="https://ilari.scheinin.fi/ranking-nhls-best-shooters/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
										<content:encoded><![CDATA[<p><em><strong>Update April 12, 2018</strong>: This post is from spring 2017, so the results reflect statistics up to season 2016–2017. I have now re-run the analysis including data from season 2017–2018, and updated results are available <a href="https://ilari.scheinin.fi/ranking-nhls-best-shooters-2018-update/">here</a>. Methodology is unchanged, so everything below is still an accurate description of how the analysis was performed.</em></p>
<p>In this post I will look at the question of who have been the best shooters in the NHL. The metric I will use is the <em>shooting percentage</em>, which is the number of <i>goals scored</i> divided by the number of <i>shots on goal</i>. To deal with two issues that will be explained in the next section, I will use a technique called Bayesian multilevel modeling. If that sounds complicated, fear not; it works in much the same way as human intuition.</p>
<p>After Jake Guentzel scored <a href="https://www.nhl.com/video/guentzel-impresses-in-debut/t-277350912/c-46596703">his first NHL career goal</a> during his first shift in his first game with his first shot, his <i>shooting percentage</i> was 100%. But no reasonable human being would say, based on that flawless record alone, that he was the best shooter of all time. Instead, to evaluate a new player, one tends to start with a vague assumption that their shooting ability is probably somewhat average, but can also be higher or lower. Then, as more and more evidence builds up, that assumption is updated, which leads to a more and more precise picture. This is essentially what &#8220;Bayesian modeling&#8221; means: start with a prior expectation and update it with the data you have.</p>
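<p>The &#8220;start with a prior and update it&#8221; idea can be illustrated with a small Python sketch. It uses a conjugate Beta prior as a simple stand-in, which is not the Stan model fitted later in this post. The prior of 11 goals in 100 shots and the later total of 30 goals from 150 shots are made-up numbers chosen only for illustration.</p>

```python
def update_beta(prior_a, prior_b, goals, shots):
    """Conjugate update: a Beta(a, b) prior plus binomial data gives a Beta posterior."""
    return prior_a + goals, prior_b + (shots - goals)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical prior belief, worth about "11 goals in 100 shots".
prior_a, prior_b = 11.0, 89.0

# One shot, one goal: the raw percentage is 100%, but the estimate barely moves.
a1, b1 = update_beta(prior_a, prior_b, goals=1, shots=1)
after_first_goal = beta_mean(a1, b1)

# Hypothetical later total of 30 goals from 150 shots: the data now carries more weight.
a2, b2 = update_beta(prior_a, prior_b, goals=30, shots=150)
after_many_shots = beta_mean(a2, b2)

print(f"prior mean:       {beta_mean(prior_a, prior_b):.3f}")
print(f"after first goal: {after_first_goal:.3f}")
print(f"after 150 shots:  {after_many_shots:.3f}")
```

<p>With a single shot the estimate stays close to the prior; only as shots accumulate does it move towards the player&#8217;s own record.</p>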
<p>There are many ways we can come up with that prior expectation. It could be the average <i>shooting percentage</i> of all players in the league, or defined more narrowly. Depending on the situation, we could for example look only at players who shoot from the right. Or players who are first-round draft picks. Or use the information we have on player position, and assess forwards and defencemen separately. After all, on average forwards do score more goals and in general shoot from much closer to the opponent&#8217;s net than defencemen. Here, the use of such a hierarchy, players grouped according to their position, is the &#8220;multilevel modeling&#8221; part.</p>
<p>When these two are put together, what Bayesian multilevel modeling means for this post is that I will use historical player statistics to estimate average performance separately for forwards and defencemen, and use this average as the starting point to evaluate each individual player. The more evidence there is for any given player, the further away we are ready to move from the average.</p>
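<p>As a rough illustration of how a position-specific starting point and the amount of evidence interact, the following Python sketch blends a group average with a player&#8217;s own record. The prior means are loosely based on the position averages reported in the data section below, and the prior strength of 200 &#8220;pseudo-shots&#8221; is purely an assumption; the actual model estimates these quantities from the data.</p>

```python
# Hypothetical prior means per position and a hypothetical prior strength,
# expressed as a number of "pseudo-shots" taken at the prior mean.
PRIOR_MEAN = {"forward": 0.11, "defenceman": 0.045}
PRIOR_STRENGTH = 200


def shrunk_estimate(position, goals, shots):
    """Weighted average of the position's prior mean and the player's raw record."""
    prior_goals = PRIOR_MEAN[position] * PRIOR_STRENGTH
    return (prior_goals + goals) / (PRIOR_STRENGTH + shots)


# Three made-up players who all have a raw shooting percentage of 20%.
rookie = shrunk_estimate("forward", goals=4, shots=20)
veteran = shrunk_estimate("forward", goals=400, shots=2000)
blueliner = shrunk_estimate("defenceman", goals=4, shots=20)

print(f"rookie forward:    {rookie:.3f}")
print(f"veteran forward:   {veteran:.3f}")
print(f"rookie defenceman: {blueliner:.3f}")
```

<p>All three players have the same raw percentage, yet the veteran&#8217;s estimate stays near 20% while the two rookies are pulled towards the averages of their respective positions.</p>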
<p>This post contains all the <a href="https://gist.github.com/ilarischeinin/7ef934c00033d3fc0803e27324eb110a">code</a> needed to perform the analysis with R and Stan, but the code sections can be freely skipped over if one is not interested in them. Impatient readers can also skip directly to the <a href="#results">results section</a>, or just look at the <a href="https://cdn.ilari.scheinin.fi/wp-content/uploads/2017/04/sharp-shooter.html" target="_blank">full result table</a>.</p>
<p><script src="https://gist.github.com/ilarischeinin/7ef934c00033d3fc0803e27324eb110a.js?file=01-chunk.R"></script></p>
<h2 id="data">Data</h2>
<p>Our data consists of the number of <i>goals scored</i> and <i>shots on goal</i> for each player for each regular season. The NHL started to count <i>shots on goal</i> in season 1967–1968, so our data spans from that season to the just-finished 2016–2017 season. Here&#8217;s an excerpt of the first ten data rows. <a href="#footnote-1-777" id="note-1-777" rel="footnote">1</a></p>
<p><script src="https://gist.github.com/ilarischeinin/7ef934c00033d3fc0803e27324eb110a.js?file=02-chunk.R"></script></p>
<div style="overflow: scroll;">
<table style="font-size: smaller;">
<thead>
<tr class="header">
<th style="text-align:left;">   </th>
<th style="text-align:left;"> player </th>
<th style="text-align:left;"> position </th>
<th style="text-align:left;"> season </th>
<th style="text-align:right;"> shots </th>
<th style="text-align:right;"> goals </th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align:left;"> 1 </td>
<td style="text-align:left;"> Antti Aalto </td>
<td style="text-align:left;"> forward </td>
<td style="text-align:left;"> 1997-1998 </td>
<td style="text-align:right;"> 1 </td>
<td style="text-align:right;"> 0 </td>
</tr>
<tr class="even">
<td style="text-align:left;"> 2 </td>
<td style="text-align:left;"> Antti Aalto </td>
<td style="text-align:left;"> forward </td>
<td style="text-align:left;"> 1998-1999 </td>
<td style="text-align:right;"> 61 </td>
<td style="text-align:right;"> 3 </td>
</tr>
<tr class="odd">
<td style="text-align:left;"> 3 </td>
<td style="text-align:left;"> Antti Aalto </td>
<td style="text-align:left;"> forward </td>
<td style="text-align:left;"> 1999-2000 </td>
<td style="text-align:right;"> 102 </td>
<td style="text-align:right;"> 7 </td>
</tr>
<tr class="even">
<td style="text-align:left;"> 4 </td>
<td style="text-align:left;"> Antti Aalto </td>
<td style="text-align:left;"> forward </td>
<td style="text-align:left;"> 2000-2001 </td>
<td style="text-align:right;"> 18 </td>
<td style="text-align:right;"> 1 </td>
</tr>
<tr class="odd">
<td style="text-align:left;"> 5 </td>
<td style="text-align:left;"> Spencer Abbott </td>
<td style="text-align:left;"> forward </td>
<td style="text-align:left;"> 2013-2014 </td>
<td style="text-align:right;"> 2 </td>
<td style="text-align:right;"> 0 </td>
</tr>
<tr class="even">
<td style="text-align:left;"> 6 </td>
<td style="text-align:left;"> Spencer Abbott </td>
<td style="text-align:left;"> forward </td>
<td style="text-align:left;"> 2016-2017 </td>
<td style="text-align:right;"> 1 </td>
<td style="text-align:right;"> 0 </td>
</tr>
<tr class="odd">
<td style="text-align:left;"> 7 </td>
<td style="text-align:left;"> Justin Abdelkader </td>
<td style="text-align:left;"> forward </td>
<td style="text-align:left;"> 2007-2008 </td>
<td style="text-align:right;"> 6 </td>
<td style="text-align:right;"> 0 </td>
</tr>
<tr class="even">
<td style="text-align:left;"> 8 </td>
<td style="text-align:left;"> Justin Abdelkader </td>
<td style="text-align:left;"> forward </td>
<td style="text-align:left;"> 2008-2009 </td>
<td style="text-align:right;"> 2 </td>
<td style="text-align:right;"> 0 </td>
</tr>
<tr class="odd">
<td style="text-align:left;"> 9 </td>
<td style="text-align:left;"> Justin Abdelkader </td>
<td style="text-align:left;"> forward </td>
<td style="text-align:left;"> 2009-2010 </td>
<td style="text-align:right;"> 79 </td>
<td style="text-align:right;"> 3 </td>
</tr>
<tr class="even">
<td style="text-align:left;"> 10 </td>
<td style="text-align:left;"> Justin Abdelkader </td>
<td style="text-align:left;"> forward </td>
<td style="text-align:left;"> 2010-2011 </td>
<td style="text-align:right;"> 129 </td>
<td style="text-align:right;"> 7 </td>
</tr>
</tbody>
</table>
</div>
<p>We can aggregate this by player to get raw career <i>shooting percentages</i>.</p>
<p><script src="https://gist.github.com/ilarischeinin/7ef934c00033d3fc0803e27324eb110a.js?file=03-chunk.R"></script></p>
<div style="overflow: scroll;">
<table style="font-size: smaller;">
<thead>
<tr class="header">
<th style="text-align:left;">   </th>
<th style="text-align:left;"> player </th>
<th style="text-align:left;"> position </th>
<th style="text-align:left;"> career </th>
<th style="text-align:right;"> shots </th>
<th style="text-align:right;"> goals </th>
<th style="text-align:right;"> raw </th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align:left;"> 1 </td>
<td style="text-align:left;"> Antti Aalto </td>
<td style="text-align:left;"> forward </td>
<td style="text-align:left;"> 1997–2001 </td>
<td style="text-align:right;"> 182 </td>
<td style="text-align:right;"> 11 </td>
<td style="text-align:right;"> 6.04% </td>
</tr>
<tr class="even">
<td style="text-align:left;"> 2 </td>
<td style="text-align:left;"> Spencer Abbott </td>
<td style="text-align:left;"> forward </td>
<td style="text-align:left;"> 2013– </td>
<td style="text-align:right;"> 3 </td>
<td style="text-align:right;"> 0 </td>
<td style="text-align:right;"> 0.00% </td>
</tr>
<tr class="odd">
<td style="text-align:left;"> 3 </td>
<td style="text-align:left;"> Justin Abdelkader </td>
<td style="text-align:left;"> forward </td>
<td style="text-align:left;"> 2007– </td>
<td style="text-align:right;"> 964 </td>
<td style="text-align:right;"> 85 </td>
<td style="text-align:right;"> 8.82% </td>
</tr>
<tr class="even">
<td style="text-align:left;"> 4 </td>
<td style="text-align:left;"> Pontus Aberg </td>
<td style="text-align:left;"> forward </td>
<td style="text-align:left;"> 2016– </td>
<td style="text-align:right;"> 12 </td>
<td style="text-align:right;"> 1 </td>
<td style="text-align:right;"> 8.33% </td>
</tr>
<tr class="odd">
<td style="text-align:left;"> 5 </td>
<td style="text-align:left;"> Dennis Abgrall </td>
<td style="text-align:left;"> forward </td>
<td style="text-align:left;"> 1975–1976 </td>
<td style="text-align:right;"> 9 </td>
<td style="text-align:right;"> 0 </td>
<td style="text-align:right;"> 0.00% </td>
</tr>
<tr class="even">
<td style="text-align:left;"> 6 </td>
<td style="text-align:left;"> Ramzi Abid </td>
<td style="text-align:left;"> forward </td>
<td style="text-align:left;"> 2002–2007 </td>
<td style="text-align:right;"> 112 </td>
<td style="text-align:right;"> 14 </td>
<td style="text-align:right;"> 12.50% </td>
</tr>
<tr class="odd">
<td style="text-align:left;"> 7 </td>
<td style="text-align:left;"> Thommy Abrahamsson </td>
<td style="text-align:left;"> defenceman </td>
<td style="text-align:left;"> 1980–1981 </td>
<td style="text-align:right;"> 66 </td>
<td style="text-align:right;"> 6 </td>
<td style="text-align:right;"> 9.09% </td>
</tr>
<tr class="even">
<td style="text-align:left;"> 8 </td>
<td style="text-align:left;"> Noel Acciari </td>
<td style="text-align:left;"> forward </td>
<td style="text-align:left;"> 2015– </td>
<td style="text-align:right;"> 33 </td>
<td style="text-align:right;"> 0 </td>
<td style="text-align:right;"> 0.00% </td>
</tr>
<tr class="odd">
<td style="text-align:left;"> 9 </td>
<td style="text-align:left;"> Doug Acomb </td>
<td style="text-align:left;"> forward </td>
<td style="text-align:left;"> 1969–1970 </td>
<td style="text-align:right;"> 0 </td>
<td style="text-align:right;"> 0 </td>
<td style="text-align:right;">  </td>
</tr>
<tr class="even">
<td style="text-align:left;"> 10 </td>
<td style="text-align:left;"> Keith Acton </td>
<td style="text-align:left;"> forward </td>
<td style="text-align:left;"> 1979–1994 </td>
<td style="text-align:right;"> 1,690 </td>
<td style="text-align:right;"> 226 </td>
<td style="text-align:right;"> 13.37% </td>
</tr>
</tbody>
</table>
</div>
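<p>For readers who do not use R, the aggregation step can be sketched in plain Python. The rows are copied from the per-season excerpt above; the function names are illustrative and not taken from the embedded code.</p>

```python
from collections import defaultdict

# (player, season, shots, goals) rows copied from the per-season excerpt.
rows = [
    ("Antti Aalto", "1997-1998", 1, 0),
    ("Antti Aalto", "1998-1999", 61, 3),
    ("Antti Aalto", "1999-2000", 102, 7),
    ("Antti Aalto", "2000-2001", 18, 1),
    ("Spencer Abbott", "2013-2014", 2, 0),
    ("Spencer Abbott", "2016-2017", 1, 0),
]


def career_totals(season_rows):
    """Sum shots and goals per player across all of their seasons."""
    totals = defaultdict(lambda: [0, 0])
    for player, _season, shots, goals in season_rows:
        totals[player][0] += shots
        totals[player][1] += goals
    return totals


def raw_percentage(shots, goals):
    """Raw shooting percentage, or None for a player without any shots on goal."""
    return goals / shots if shots else None


careers = {
    player: (shots, goals, raw_percentage(shots, goals))
    for player, (shots, goals) in career_totals(rows).items()
}

for player, (shots, goals, raw) in careers.items():
    print(f"{player}: {goals} goals from {shots} shots, raw {raw:.2%}")
```

<p>Returning <code>None</code> for zero shots mirrors the empty raw value shown for Doug Acomb in the table, since a percentage is undefined without any shots.</p>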
<p>To get an idea of common values for raw career <i>shooting percentage</i>, we make a density plot and separate between forwards and defencemen.</p>
<p><script src="https://gist.github.com/ilarischeinin/7ef934c00033d3fc0803e27324eb110a.js?file=04-chunk.R"></script></p>
<p><a href="https://cdn.ilari.scheinin.fi/wp-content/uploads/2017/04/sharp-shooter-1.png"><img loading="lazy" src="https://cdn.ilari.scheinin.fi/wp-content/uploads/2017/04/sharp-shooter-1-1024x722.png" alt="" width="800" height="564" class="aligncenter size-large wp-image-725" /></a></p>
<p>From the plot we can see that:<br />
1. Defencemen tend to have lower <i>shooting percentages</i> than forwards (averages of about 4.5% and 11%).<br />
2. There are big peaks at zero, which represent players who have never scored a goal (and who might also have a very small number of <i>shots on goal</i>).<br />
3. There are also small peaks at 100%, 50%, 33%, etc., which represent players who have a small number of <i>shots on goal</i>, but who got lucky and scored.</p>
<p>Points 2 and 3 above are the first reason why we use the modeling approach. It will give us a way to assess players for whom we have little data. The second reason is the fact that the gameplay in the NHL has changed over the years. Nowadays the game is faster, and players have less time and less space to maneuver with the puck and to score. Also, a lot more attention is paid to <a href="https://www.theatlantic.com/magazine/archive/2014/03/the-puck-stops-here/357579/">goaltending</a>, and goalies receive more training and coaching on their technique than in the old days. To assess this change over time, we can plot the overall league <i>shooting percentages</i> per season.</p>
<p><script src="https://gist.github.com/ilarischeinin/7ef934c00033d3fc0803e27324eb110a.js?file=05-chunk.R"></script></p>
<p><a href="https://cdn.ilari.scheinin.fi/wp-content/uploads/2017/04/sharp-shooter-2.png"><img loading="lazy" src="https://cdn.ilari.scheinin.fi/wp-content/uploads/2017/04/sharp-shooter-2-1024x722.png" alt="" width="800" height="564" class="aligncenter size-large wp-image-726" /></a></p>
<p>From the plot we can see that the average <i>shooting percentage</i> has indeed changed over time, and was the highest in the 1980s.</p>
<h2 id="model">Model</h2>
<p>For every <i>shot on goal</i>, there are two possible outcomes: a goal or no goal. When we count the number of <i>goals scored</i> (successes) from some number of <i>shots on goal</i> (trials), in statistics this is represented with the <a href="https://en.wikipedia.org/wiki/Binomial_distribution">binomial distribution</a>. In addition to the number of trials, the number of successes depends on the probability of success for each trial, which here represents the player&#8217;s shooting ability, or <i>skill</i>. This probability is assumed to be the same for each trial, which is naturally not really true here. In reality, the probability varies from play to play, and is affected by factors such as distance, proximity of other players, positioning of the goalie, and so forth. But here we make an oversimplification and assume that each player has a constant probability of success that depends on their <i>skill</i>. For any given player on any given season, the number of <i>goals scored</i> is therefore distributed as:</p>
<p style="text-align: center;">goals scored ~ binomial(shots on goal, skill of player)</p>
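<p>To make the binomial assumption concrete, here is a small Python sketch that evaluates the binomial probabilities and simulates one season shot by shot. The 200 shots and the 11% success probability are hypothetical values, and the seed is fixed only so that the sketch is reproducible.</p>

```python
import math
import random


def binomial_pmf(goals, shots, skill):
    """Probability of exactly `goals` successes in `shots` independent trials."""
    return math.comb(shots, goals) * skill**goals * (1 - skill) ** (shots - goals)


def simulate_goals(shots, skill, rng):
    """Draw one binomial outcome by treating each shot as a Bernoulli trial."""
    return sum(1 for _ in range(shots) if rng.random() < skill)


rng = random.Random(2017)  # fixed seed for reproducibility
shots, skill = 200, 0.11   # hypothetical season: 200 shots at an 11% success rate

expected_goals = shots * skill
one_season = simulate_goals(shots, skill, rng)
total_probability = sum(binomial_pmf(k, shots, skill) for k in range(shots + 1))

print(f"expected goals: {expected_goals:.1f}")
print(f"one simulated season: {one_season} goals")
print(f"probabilities sum to: {total_probability:.6f}")
```

<p>Re-running the simulation with different seeds gives different goal totals around the expected value, which is the season-to-season randomness the model has to see through.</p>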
<p>Another oversimplification we are going to make is that we assume players&#8217; <i>skills</i> to stay constant not only within a season, but all through their careers. Again, in real life young players develop and get better, and before older players retire, their performance generally shows some decline. But here we are interested in ranking the best shooters, and want to be able to compare players across time, for example current players to those who played in the 1980s. Therefore, we will define the probability of success as the player&#8217;s <i>skill</i> minus the <i>difficulty</i> of the season.</p>
<p style="text-align: center;">goals scored ~ binomial(shots on goal, skill of player &#8211; difficulty of season)</p>
<p>The seasonal <i>difficulty</i> represents the combined effect of all other factors besides the player&#8217;s <i>skill</i>, such as goaltending, overall gameplay, and so on. We are able to combine these effects, because most players&#8217; careers have spanned multiple seasons. This allows us to fit a model that finds an innate <i>skill</i> for each player (that stays constant throughout their career) and separately captures an estimate for seasonal <i>difficulty</i>.</p>
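<p>Because a probability has to stay between 0 and 1, a difference like &#8220;skill minus difficulty&#8221; is typically taken on a transformed scale. The Python sketch below assumes a log-odds (logit) scale; this is an assumption made for illustration, and the Stan code embedded below remains the authoritative definition of the model. All numeric values are hypothetical.</p>

```python
import math


def inv_logit(x):
    """Map a log-odds value to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    """Map a probability to its log-odds."""
    return math.log(p / (1.0 - p))


def goal_probability(skill, difficulty):
    """Per-shot scoring probability for a player in a given season (assumed logit scale)."""
    return inv_logit(skill - difficulty)


# Hypothetical player who would score on 15% of shots in a neutral season.
skill = logit(0.15)

# Hypothetical difficulties for an "easy" and a "hard" season.
easy, hard = -0.2, 0.2

p_neutral = goal_probability(skill, 0.0)
p_easy = goal_probability(skill, easy)
p_hard = goal_probability(skill, hard)

print(f"neutral season: {p_neutral:.3f}")
print(f"easy season:    {p_easy:.3f}")
print(f"hard season:    {p_hard:.3f}")
```

<p>The same <i>skill</i> yields a higher scoring probability in the easy season and a lower one in the hard season, which is what lets the model separate the player from the era.</p>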
<p>We fit this model with Stan.</p>
<p><script src="https://gist.github.com/ilarischeinin/7ef934c00033d3fc0803e27324eb110a.js?file=06-model.stan"></script></p>
<p><script src="https://gist.github.com/ilarischeinin/7ef934c00033d3fc0803e27324eb110a.js?file=07-chunk.R"></script></p>
<h2 id="evaluation">Evaluation</h2>
<p>Now that we have fitted our model, we would like to evaluate if it makes sense. One way to approach this is with a simulation. We can use the model (players&#8217; <i>skills</i>, and seasons&#8217; <i>difficulties</i>) and the historical number of <i>shots on goal</i> to generate a simulated set of <i>goals scored</i>. Then we can visualize the actual and simulated numbers, and see if they behave similarly. Let&#8217;s start with the same density plot we used above to evaluate typical raw career <i>shooting percentages</i> for forwards and defencemen. Actual results are shown with a solid line, and the simulation results with a dashed line. Ideally, they should be close to each other.</p>
<p><script src="https://gist.github.com/ilarischeinin/7ef934c00033d3fc0803e27324eb110a.js?file=08-chunk.R"></script></p>
<p><a href="https://cdn.ilari.scheinin.fi/wp-content/uploads/2017/04/sharp-shooter-3.png"><img loading="lazy" src="https://cdn.ilari.scheinin.fi/wp-content/uploads/2017/04/sharp-shooter-3-1024x722.png" alt="" width="800" height="564" class="aligncenter size-large wp-image-727" /></a></p>
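<p>The simulation idea itself can be sketched in a few lines of Python. Everything in this example is hypothetical: the player names, the fitted <i>skill</i> and <i>difficulty</i> values, the shot and goal counts, and the log-odds scale. It also uses single point estimates, whereas a fuller check would repeat the simulation over many posterior draws.</p>

```python
import math
import random


def inv_logit(x):
    """Map a log-odds value to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_season(shots, skill, difficulty, rng):
    """Simulate goals for one player-season as independent Bernoulli shots."""
    p = inv_logit(skill - difficulty)
    return sum(1 for _ in range(shots) if rng.random() < p)


# Hypothetical fitted values on an assumed log-odds scale.
skills = {"player_a": -1.7, "player_b": -2.4}
difficulties = {"1985-1986": -0.1, "2015-2016": 0.1}

# Hypothetical history: (player, season, shots on goal, actual goals).
history = [
    ("player_a", "1985-1986", 210, 36),
    ("player_a", "2015-2016", 190, 27),
    ("player_b", "1985-1986", 150, 14),
    ("player_b", "2015-2016", 160, 12),
]

rng = random.Random(42)  # fixed seed for reproducibility
simulated = [
    (player, season, goals,
     simulate_season(shots, skills[player], difficulties[season], rng))
    for player, season, shots, goals in history
]

actual_total = sum(row[2] for row in simulated)
simulated_total = sum(row[3] for row in simulated)

for player, season, actual, sim in simulated:
    print(f"{player} {season}: actual {actual}, simulated {sim}")
print(f"totals: actual {actual_total}, simulated {simulated_total}")
```

<p>The actual and simulated columns are the kind of paired values that the plots in this section compare.</p>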
<p>We will also re-create the plot we used above for overall league <i>shooting percentages</i> per season. Again, ideally the actual and simulated data points should be close to each other.</p>
<p><script src="https://gist.github.com/ilarischeinin/7ef934c00033d3fc0803e27324eb110a.js?file=09-chunk.R"></script></p>
<p><a href="https://cdn.ilari.scheinin.fi/wp-content/uploads/2017/04/sharp-shooter-4.png"><img loading="lazy" src="https://cdn.ilari.scheinin.fi/wp-content/uploads/2017/04/sharp-shooter-4-1024x722.png" alt="" width="800" height="564" class="aligncenter size-large wp-image-728" /></a></p>
<p>And finally, we will make a scatter plot of actual and simulated goals scored per player per season. Ideally, the cloud of points should be symmetric around the diagonal line.</p>
<p><script src="https://gist.github.com/ilarischeinin/7ef934c00033d3fc0803e27324eb110a.js?file=10-chunk.R"></script></p>
<p><a href="https://cdn.ilari.scheinin.fi/wp-content/uploads/2017/04/sharp-shooter-5.png"><img loading="lazy" src="https://cdn.ilari.scheinin.fi/wp-content/uploads/2017/04/sharp-shooter-5-1024x722.png" alt="" width="800" height="564" class="aligncenter size-large wp-image-729" /></a></p>
<p>In the density plot, the fit looks good for defencemen, but for forwards the model seems to slightly overestimate the number of forwards with an average raw <i>shooting percentage</i> (around 11%), and underestimate the number of forwards with low raw <i>shooting percentages</i>. Otherwise the fits seem to be reasonably close. So, let&#8217;s look at the results.</p>
<h2 id="results">Results</h2>
<p>When we fit the model, two things happen to the original raw career <i>shooting percentages</i>. First, they are shrunk towards the averages defined separately for forwards or defencemen. Second, they are adjusted for the seasonal <i>difficulty</i>. For players with careers during the &#8220;easier&#8221; seasons, such as in the 1980s, this will reduce their estimated <i>skill</i>. And for players who played during seasons with a higher estimated <i>difficulty</i>, their <i>skill</i> values will see an increase.</p>
<p>The resulting values, players&#8217; <i>skills</i>, are visualized here together with their raw career <i>shooting percentages</i>.</p>
<p><script src="https://gist.github.com/ilarischeinin/7ef934c00033d3fc0803e27324eb110a.js?file=11-chunk.R"></script></p>
<p><a href="https://cdn.ilari.scheinin.fi/wp-content/uploads/2017/04/sharp-shooter-6.png"><img loading="lazy" src="https://cdn.ilari.scheinin.fi/wp-content/uploads/2017/04/sharp-shooter-6-1024x722.png" alt="" width="800" height="564" class="aligncenter size-large wp-image-730" /></a></p>
<p>We can see that among the raw <i>shooting percentages</i> on the x-axis, there are extreme values, such as 0%, 100%, 50%, etc. But on the modeled <i>skill</i> on the y-axis, they have been shrunk to all be between about 2% and 20%. Also visible are the two separate clusters for forwards (around 11%–12%), and defencemen (4%–5%). The shades of blue show how the same raw career <i>shooting percentage</i> results in a higher estimated <i>skill</i> for more recent players compared to the 1970s and 80s.</p>
<p>Finally, below are tables of the top 10 forwards and defencemen, ranked according to their modeled shooting <i>skills</i>.</p>
<p><script src="https://gist.github.com/ilarischeinin/7ef934c00033d3fc0803e27324eb110a.js?file=12-chunk.R"></script></p>
<div style="overflow: scroll;">
<table style="font-size: smaller;">
<thead>
<tr class="header">
<th style="text-align:left;">   </th>
<th style="text-align:left;"> player </th>
<th style="text-align:left;"> position </th>
<th style="text-align:left;"> career </th>
<th style="text-align:right;"> shots </th>
<th style="text-align:right;"> goals </th>
<th style="text-align:right;"> raw </th>
<th style="text-align:right;"> model </th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align:left;"> 1 </td>
<td style="text-align:left;"> Alex Tanguay </td>
<td style="text-align:left;"> forward </td>
<td style="text-align:left;"> 1999–2016 </td>
<td style="text-align:right;"> 1,525 </td>
<td style="text-align:right;"> 283 </td>
<td style="text-align:right;"> 18.56% </td>
<td style="text-align:right;"> 19.96% </td>
</tr>
<tr class="even">
<td style="text-align:left;"> 2 </td>
<td style="text-align:left;"> Craig Simpson </td>
<td style="text-align:left;"> forward </td>
<td style="text-align:left;"> 1985–1995 </td>
<td style="text-align:right;"> 1,044 </td>
<td style="text-align:right;"> 247 </td>
<td style="text-align:right;"> 23.66% </td>
<td style="text-align:right;"> 19.91% </td>
</tr>
<tr class="odd">
<td style="text-align:left;"> 3 </td>
<td style="text-align:left;"> Steven Stamkos </td>
<td style="text-align:left;"> forward </td>
<td style="text-align:left;"> 2008– </td>
<td style="text-align:right;"> 1,876 </td>
<td style="text-align:right;"> 321 </td>
<td style="text-align:right;"> 17.11% </td>
<td style="text-align:right;"> 19.32% </td>
</tr>
<tr class="even">
<td style="text-align:left;"> 4 </td>
<td style="text-align:left;"> Andrew Brunette </td>
<td style="text-align:left;"> forward </td>
<td style="text-align:left;"> 1995–2012 </td>
<td style="text-align:right;"> 1,516 </td>
<td style="text-align:right;"> 268 </td>
<td style="text-align:right;"> 17.68% </td>
<td style="text-align:right;"> 18.92% </td>
</tr>
<tr class="odd">
<td style="text-align:left;"> 5 </td>
<td style="text-align:left;"> Sergei Makarov </td>
<td style="text-align:left;"> forward </td>
<td style="text-align:left;"> 1989–1997 </td>
<td style="text-align:right;"> 610 </td>
<td style="text-align:right;"> 134 </td>
<td style="text-align:right;"> 21.97% </td>
<td style="text-align:right;"> 18.66% </td>
</tr>
<tr class="even">
<td style="text-align:left;"> 6 </td>
<td style="text-align:left;"> John Bucyk </td>
<td style="text-align:left;"> forward </td>
<td style="text-align:left;"> 1967–1978 </td>
<td style="text-align:right;"> 1,723 </td>
<td style="text-align:right;"> 329 </td>
<td style="text-align:right;"> 19.09% </td>
<td style="text-align:right;"> 18.39% </td>
</tr>
<tr class="odd">
<td style="text-align:left;"> 7 </td>
<td style="text-align:left;"> Mark Parrish </td>
<td style="text-align:left;"> forward </td>
<td style="text-align:left;"> 1998–2011 </td>
<td style="text-align:right;"> 1,247 </td>
<td style="text-align:right;"> 216 </td>
<td style="text-align:right;"> 17.32% </td>
<td style="text-align:right;"> 18.17% </td>
</tr>
<tr class="even">
<td style="text-align:left;"> 8 </td>
<td style="text-align:left;"> Charlie Simmer </td>
<td style="text-align:left;"> forward </td>
<td style="text-align:left;"> 1974–1988 </td>
<td style="text-align:right;"> 1,531 </td>
<td style="text-align:right;"> 342 </td>
<td style="text-align:right;"> 22.34% </td>
<td style="text-align:right;"> 18.14% </td>
</tr>
<tr class="odd">
<td style="text-align:left;"> 9 </td>
<td style="text-align:left;"> Gary Roberts </td>
<td style="text-align:left;"> forward </td>
<td style="text-align:left;"> 1986–2009 </td>
<td style="text-align:right;"> 2,374 </td>
<td style="text-align:right;"> 438 </td>
<td style="text-align:right;"> 18.45% </td>
<td style="text-align:right;"> 17.92% </td>
</tr>
<tr class="even">
<td style="text-align:left;"> 10 </td>
<td style="text-align:left;"> Ray Ferraro </td>
<td style="text-align:left;"> forward </td>
<td style="text-align:left;"> 1984–2002 </td>
<td style="text-align:right;"> 2,164 </td>
<td style="text-align:right;"> 408 </td>
<td style="text-align:right;"> 18.85% </td>
<td style="text-align:right;"> 17.68% </td>
</tr>
</tbody>
</table>
</div>
<div style="overflow: scroll;">
<table style="font-size: smaller;">
<thead>
<tr class="header">
<th style="text-align:left;">   </th>
<th style="text-align:left;"> player </th>
<th style="text-align:left;"> position </th>
<th style="text-align:left;"> career </th>
<th style="text-align:right;"> shots </th>
<th style="text-align:right;"> goals </th>
<th style="text-align:right;"> raw </th>
<th style="text-align:right;"> model </th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align:left;"> 1 </td>
<td style="text-align:left;"> Sandis Ozolinsh </td>
<td style="text-align:left;"> defenceman </td>
<td style="text-align:left;"> 1992–2008 </td>
<td style="text-align:right;"> 1,771 </td>
<td style="text-align:right;"> 167 </td>
<td style="text-align:right;"> 9.43% </td>
<td style="text-align:right;"> 9.55% </td>
</tr>
<tr class="even">
<td style="text-align:left;"> 2 </td>
<td style="text-align:left;"> Shea Weber </td>
<td style="text-align:left;"> defenceman </td>
<td style="text-align:left;"> 2005– </td>
<td style="text-align:right;"> 2,235 </td>
<td style="text-align:right;"> 183 </td>
<td style="text-align:right;"> 8.19% </td>
<td style="text-align:right;"> 9.24% </td>
</tr>
<tr class="odd">
<td style="text-align:left;"> 3 </td>
<td style="text-align:left;"> Mike Green </td>
<td style="text-align:left;"> defenceman </td>
<td style="text-align:left;"> 2005– </td>
<td style="text-align:right;"> 1,618 </td>
<td style="text-align:right;"> 134 </td>
<td style="text-align:right;"> 8.28% </td>
<td style="text-align:right;"> 9.15% </td>
</tr>
<tr class="even">
<td style="text-align:left;"> 4 </td>
<td style="text-align:left;"> Lubomir Visnovsky </td>
<td style="text-align:left;"> defenceman </td>
<td style="text-align:left;"> 2000–2015 </td>
<td style="text-align:right;"> 1,532 </td>
<td style="text-align:right;"> 128 </td>
<td style="text-align:right;"> 8.36% </td>
<td style="text-align:right;"> 9.00% </td>
</tr>
<tr class="odd">
<td style="text-align:left;"> 5 </td>
<td style="text-align:left;"> Bobby Orr </td>
<td style="text-align:left;"> defenceman </td>
<td style="text-align:left;"> 1967–1979 </td>
<td style="text-align:right;"> 2,795 </td>
<td style="text-align:right;"> 257 </td>
<td style="text-align:right;"> 9.19% </td>
<td style="text-align:right;"> 8.91% </td>
</tr>
<tr class="even">
<td style="text-align:left;"> 6 </td>
<td style="text-align:left;"> Marc-Andre Bergeron </td>
<td style="text-align:left;"> defenceman </td>
<td style="text-align:left;"> 2002–2013 </td>
<td style="text-align:right;"> 951 </td>
<td style="text-align:right;"> 82 </td>
<td style="text-align:right;"> 8.62% </td>
<td style="text-align:right;"> 8.88% </td>
</tr>
<tr class="odd">
<td style="text-align:left;"> 7 </td>
<td style="text-align:left;"> Oliver Ekman-Larsson </td>
<td style="text-align:left;"> defenceman </td>
<td style="text-align:left;"> 2010– </td>
<td style="text-align:right;"> 1,134 </td>
<td style="text-align:right;"> 88 </td>
<td style="text-align:right;"> 7.76% </td>
<td style="text-align:right;"> 8.53% </td>
</tr>
<tr class="even">
<td style="text-align:left;"> 8 </td>
<td style="text-align:left;"> Nick Holden </td>
<td style="text-align:left;"> defenceman </td>
<td style="text-align:left;"> 2010– </td>
<td style="text-align:right;"> 350 </td>
<td style="text-align:right;"> 32 </td>
<td style="text-align:right;"> 9.14% </td>
<td style="text-align:right;"> 8.45% </td>
</tr>
<tr class="odd">
<td style="text-align:left;"> 9 </td>
<td style="text-align:left;"> Tyler Myers </td>
<td style="text-align:left;"> defenceman </td>
<td style="text-align:left;"> 2009– </td>
<td style="text-align:right;"> 737 </td>
<td style="text-align:right;"> 59 </td>
<td style="text-align:right;"> 8.01% </td>
<td style="text-align:right;"> 8.45% </td>
</tr>
<tr class="even">
<td style="text-align:left;"> 10 </td>
<td style="text-align:left;"> Mark Giordano </td>
<td style="text-align:left;"> defenceman </td>
<td style="text-align:left;"> 2005– </td>
<td style="text-align:right;"> 1,295 </td>
<td style="text-align:right;"> 99 </td>
<td style="text-align:right;"> 7.64% </td>
<td style="text-align:right;"> 8.44% </td>
</tr>
</tbody>
</table>
</div>
<p>Craig Simpson holds the <a href="http://www.hockey-reference.com/leaders/shot_pct_career.html">official record</a> for best career <i>shooting percentage</i> at 23.66% (the official list only counts players with at least 800 <i>shots on goal</i>). But here he has lost the number one spot to Alex Tanguay, who originally ranked 22nd. Their modeled <i>shooting skills</i> are 19.91% vs. 19.96%. This is because Simpson&#8217;s career was in 1985–1995, which according to the model was a less <i>difficult</i> era for goal scoring than Tanguay&#8217;s 1999–2016.</p>
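<p>To make the reordering concrete, here is a small Python sketch (not part of the original R analysis) that recomputes the raw percentages from the shots and goals in the forwards table and compares the two orderings. The modeled values are copied from the table, not recomputed; the variable and function names are my own.</p>

```python
# (player, shots, goals, modeled skill in percent), copied from the table above.
rows = [
    ("Alex Tanguay", 1525, 283, 19.96),
    ("Craig Simpson", 1044, 247, 19.91),
    ("Steven Stamkos", 1876, 321, 19.32),
    ("Sergei Makarov", 610, 134, 18.66),
]


def raw_percentage(goals, shots):
    """Raw career shooting percentage: goals per shot on goal, in percent."""
    return 100.0 * goals / shots


# Rank the same players once by raw percentage and once by modeled skill.
by_raw = sorted(rows, key=lambda r: raw_percentage(r[2], r[1]), reverse=True)
by_model = sorted(rows, key=lambda r: r[3], reverse=True)

for name, shots, goals, model in rows:
    print(f"{name}: raw {raw_percentage(goals, shots):.2f}%, model {model:.2f}%")

print("best by raw:", by_raw[0][0])
print("best by model:", by_model[0][0])
```

<p>Among these four players, Simpson leads on the raw percentage while Tanguay leads on the modeled skill.</p>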
<p>There are many such differences between the official career <i>shooting percentage</i> ranking and our modeled one. They can be explored from the <a href="https://cdn.ilari.scheinin.fi/wp-content/uploads/2017/04/sharp-shooter.html" target="_blank">full table</a> with 5,574 players. It is by no means the <i>right</i> ranking; it is simply a plausible one given the model and assumptions described above. But compared to the official ranking for career <i>shooting percentage</i>, it does have two benefits. First, it does not omit players with fewer than 800 career <i>shots on goal</i>. And second, it provides one way to gauge changes in gameplay, and thus facilitate comparisons between players whose careers do not overlap. So, while the model&#8217;s assumptions are not exactly realistic (an innate <i>shooting skill</i> that stays constant for the duration of a player&#8217;s career), the results can be a useful complement to the official career <i>shooting percentage</i> statistics in some situations.</p>
<div class="footnotes"><hr /><ol><li id="footnote-1-777" class="footnote"><p>Centers and left/right wingers are all counted simply as &#8220;forwards&#8221;. This is because multiple players have played both as centers and wingers. The small number who have played both as forwards and defencemen were excluded from the analysis.<a href="#note-1-777" class="footnote-return">&#8617;</a></p></li><!--/#footnote-1.footnote--></ol></div><!--/#footnotes-->]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>How good is Patrik Laine&#8217;s shot?</title>
		<link>https://ilari.scheinin.fi/how-good-is-patrik-laines-shot/</link>
		
		<dc:creator><![CDATA[Ilari Scheinin]]></dc:creator>
		<pubDate>Wed, 16 Nov 2016 20:08:59 +0000</pubDate>
				<category><![CDATA[R]]></category>
		<category><![CDATA[sports]]></category>
		<guid isPermaLink="false">http://ilari.scheinin.fi/?p=612</guid>

					<description><![CDATA[Patrik Laine is playing his first season in the NHL and currently leads the league in scoring with 12 goals. With 51 shots on goal, his shooting percentage is 23.5 %. How does this number compare to great goal scorers over &#8230; <a href="https://ilari.scheinin.fi/how-good-is-patrik-laines-shot/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
										<content:encoded><![CDATA[<p>Patrik Laine is playing his first season in the NHL and currently leads the league in scoring with 12 goals. With 51 shots on goal, his shooting percentage is 23.5 %. How does this number compare to great goal scorers over the years? I downloaded NHL player statistics for each season from 1967–1968 onwards, which was the first year the number of shots was recorded. I then calculated career summaries for each player. But if we simply look for players with the highest shooting percentages, the first 12 all scored one goal with just one shot. Obviously these are not the best shooters, just some random flukes. In its official leaderboard for all-time career shooting percentage (S%), the NHL only counts players with at least 800 shots. This is what the top 10 looks like:</p>
<div style="overflow: scroll;">
<table style="font-size: smaller;">
<thead>
<tr class="header">
<th></th>
<th align="left">Name</th>
<th align="left">Pos</th>
<th align="right">GP</th>
<th align="right">G</th>
<th align="right">A</th>
<th align="right">PTS</th>
<th align="right">S</th>
<th align="right">S%</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>1</td>
<td align="left">Craig Simpson</td>
<td align="left">LW</td>
<td align="right">634</td>
<td align="right">247</td>
<td align="right">250</td>
<td align="right">497</td>
<td align="right">1,044</td>
<td align="right">23.7</td>
</tr>
<tr class="even">
<td>2</td>
<td align="left">Charlie Simmer</td>
<td align="left">LW</td>
<td align="right">712</td>
<td align="right">342</td>
<td align="right">369</td>
<td align="right">711</td>
<td align="right">1,531</td>
<td align="right">22.3</td>
</tr>
<tr class="odd">
<td>3</td>
<td align="left">Paul MacLean</td>
<td align="left">RW</td>
<td align="right">719</td>
<td align="right">324</td>
<td align="right">349</td>
<td align="right">673</td>
<td align="right">1,513</td>
<td align="right">21.4</td>
</tr>
<tr class="even">
<td>4</td>
<td align="left">Mike Bossy</td>
<td align="left">RW</td>
<td align="right">752</td>
<td align="right">573</td>
<td align="right">553</td>
<td align="right">1,126</td>
<td align="right">2,705</td>
<td align="right">21.2</td>
</tr>
<tr class="odd">
<td>5</td>
<td align="left">Yvon Lambert</td>
<td align="left">LW</td>
<td align="right">683</td>
<td align="right">206</td>
<td align="right">273</td>
<td align="right">479</td>
<td align="right">1,038</td>
<td align="right">19.8</td>
</tr>
<tr class="even">
<td>6</td>
<td align="left">Rick Middleton</td>
<td align="left">RW</td>
<td align="right">1,005</td>
<td align="right">448</td>
<td align="right">540</td>
<td align="right">988</td>
<td align="right">2,275</td>
<td align="right">19.7</td>
</tr>
<tr class="odd">
<td>7</td>
<td align="left">Blaine Stoughton</td>
<td align="left">RW</td>
<td align="right">526</td>
<td align="right">258</td>
<td align="right">191</td>
<td align="right">449</td>
<td align="right">1,322</td>
<td align="right">19.5</td>
</tr>
<tr class="even">
<td>8</td>
<td align="left">Darryl Sutter</td>
<td align="left">LW</td>
<td align="right">406</td>
<td align="right">161</td>
<td align="right">118</td>
<td align="right">279</td>
<td align="right">829</td>
<td align="right">19.4</td>
</tr>
<tr class="odd">
<td>9</td>
<td align="left">Rob Brown</td>
<td align="left">RW</td>
<td align="right">543</td>
<td align="right">190</td>
<td align="right">248</td>
<td align="right">438</td>
<td align="right">979</td>
<td align="right">19.4</td>
</tr>
<tr class="even">
<td>10</td>
<td align="left">Mike Ridley</td>
<td align="left">C</td>
<td align="right">866</td>
<td align="right">292</td>
<td align="right">466</td>
<td align="right">758</td>
<td align="right">1,513</td>
<td align="right">19.3</td>
</tr>
</tbody>
</table>
</div>
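<p>The effect of the 800-shot threshold can be sketched in a few lines of Python (the actual analysis was done in R). The numbers for Simpson, Sutter, and Laine come from this post; the one-shot career is a hypothetical stand-in for the flukes mentioned above, and the function names are my own.</p>

```python
def shooting_percentage(goals, shots):
    """Shooting percentage: goals per shot on goal, in percent."""
    return 100.0 * goals / shots


def leaderboard(careers, min_shots=0):
    """Rank careers by shooting percentage, keeping only those with enough shots."""
    eligible = [c for c in careers if c["shots"] >= min_shots]
    return sorted(
        eligible,
        key=lambda c: shooting_percentage(c["goals"], c["shots"]),
        reverse=True,
    )


careers = [
    {"name": "Craig Simpson", "goals": 247, "shots": 1044},
    {"name": "Darryl Sutter", "goals": 161, "shots": 829},
    {"name": "Patrik Laine", "goals": 12, "shots": 51},
    # Hypothetical stand-in for the one-goal, one-shot careers.
    {"name": "One-shot player", "goals": 1, "shots": 1},
]

unfiltered = leaderboard(careers)
official_style = leaderboard(careers, min_shots=800)

print([c["name"] for c in unfiltered])
print([c["name"] for c in official_style])
```

<p>Without the threshold the one-shot career tops the list at 100%; with it, both the fluke and the rookie disappear, which is exactly the problem the rest of this post tries to address.</p>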
<p>Requiring a minimum number of shots (or goals) does get rid of the flukes, but how can you compare a rookie player? What kind of method could be used to take the scarcity of evidence into account, until the player catches up with the threshold? <a href="https://twitter.com/drob">David Robinson</a> has written a terrific series of articles for situations like this, using baseball statistics as an example. I’ll follow one of his tutorials and use <a href="http://varianceexplained.org/r/empirical_bayes_baseball/">empirical Bayes estimation</a> to obtain a more reliable picture. In short, we’ll first use all players’ data to obtain an estimate for a beta prior, and then use each player’s own data to update the prior based on individual evidence. Put another way, we start by assuming everyone is average, and if and only if they show more and more evidence to the contrary, we start to gradually consider them as special. For a much better description, please see the original blog post. All R code is also adapted from that post.</p>
<p>Before we get to estimation of the beta prior, let’s first check if we should use all of the available data or only a subset. Since in this case we are estimating only one prior, we would like all players to come from a single distribution. As the gameplay has surely changed a bit over the years, let’s look at the overall shooting percentages over the 49 seasons. Also, since defensemen normally play further away from the opponent’s net than forwards, player position is likely to have an effect as well. Let’s look at shooting percentages separately for each position (excluding players with fewer than ten goals).</p>
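<p>The update step has a simple closed form: with a Beta(α₀, β₀) prior, the estimate after seeing a player’s goals and shots is (goals + α₀) / (shots + α₀ + β₀). Here is a minimal Python sketch of that formula (the post itself uses R). The prior values below are assumptions picked only so that the prior mean is 11%; they are not the values fitted in this analysis, so the outputs will not match the EB column exactly.</p>

```python
def eb_estimate(goals, shots, alpha0, beta0):
    """Posterior mean of a beta-binomial model: the prior acts like extra shots."""
    return (goals + alpha0) / (shots + alpha0 + beta0)


# Hypothetical prior with mean 11 / (11 + 89) = 0.11 (not the fitted one).
ALPHA0, BETA0 = 11.0, 89.0

# (player, goals, shots) taken from this post.
players = [
    ("Mike Ridley", 20, 79),
    ("Alex Tanguay", 283, 1525),
    ("Patrik Laine", 12, 51),
]

for name, goals, shots in players:
    raw = goals / shots
    shrunk = eb_estimate(goals, shots, ALPHA0, BETA0)
    print(f"{name}: raw {raw:.3f}, shrunk {shrunk:.3f}")
```

<p>A player with few shots, such as Ridley’s 79, is pulled strongly towards the prior mean, while Tanguay’s 1,525 shots leave his estimate close to the raw figure.</p>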
<p><a href="https://cdn.ilari.scheinin.fi/wp-content/uploads/2016/11/unnamed-chunk-3-1.png"><img loading="lazy" src="https://cdn.ilari.scheinin.fi/wp-content/uploads/2016/11/unnamed-chunk-3-1-1024x731.png" alt="unnamed-chunk-3-1" width="800" height="571" class="aligncenter size-large wp-image-618" srcset="https://cdn.ilari.scheinin.fi/wp-content/uploads/2016/11/unnamed-chunk-3-1-1024x731.png 1024w, https://cdn.ilari.scheinin.fi/wp-content/uploads/2016/11/unnamed-chunk-3-1-300x214.png 300w, https://cdn.ilari.scheinin.fi/wp-content/uploads/2016/11/unnamed-chunk-3-1-768x549.png 768w, https://cdn.ilari.scheinin.fi/wp-content/uploads/2016/11/unnamed-chunk-3-1.png 1344w" sizes="(max-width: 800px) 100vw, 800px" /></a></p>
<p><a href="https://cdn.ilari.scheinin.fi/wp-content/uploads/2016/11/unnamed-chunk-4-1.png"><img loading="lazy" src="https://cdn.ilari.scheinin.fi/wp-content/uploads/2016/11/unnamed-chunk-4-1-1024x731.png" alt="unnamed-chunk-4-1" width="800" height="571" class="aligncenter size-large wp-image-619" srcset="https://cdn.ilari.scheinin.fi/wp-content/uploads/2016/11/unnamed-chunk-4-1-1024x731.png 1024w, https://cdn.ilari.scheinin.fi/wp-content/uploads/2016/11/unnamed-chunk-4-1-300x214.png 300w, https://cdn.ilari.scheinin.fi/wp-content/uploads/2016/11/unnamed-chunk-4-1-768x549.png 768w, https://cdn.ilari.scheinin.fi/wp-content/uploads/2016/11/unnamed-chunk-4-1.png 1344w" sizes="(max-width: 800px) 100vw, 800px" /></a></p>
<p>As we can see, shooting percentages used to be much higher around the 1980s. For this simple analysis, I’ll only include data from season 1996–1997 onwards. I’ll also leave out the defensemen, as they tend to have lower shooting percentages. (I hope to write follow-up posts later with all of the data included and handled properly, either using some of the other approaches David has described for empirical Bayes, with a standard Bayesian analysis, or maybe even both.)</p>
<p>Overall, the average shooting percentage for all forwards over the last 20 seasons is 11.0 %. Next, let’s estimate a beta prior from the data and see how it fits:</p>
<p><a href="https://cdn.ilari.scheinin.fi/wp-content/uploads/2016/11/unnamed-chunk-8-1.png"><img loading="lazy" src="https://cdn.ilari.scheinin.fi/wp-content/uploads/2016/11/unnamed-chunk-8-1-1024x731.png" alt="unnamed-chunk-8-1" width="800" height="571" class="aligncenter size-large wp-image-620" srcset="https://cdn.ilari.scheinin.fi/wp-content/uploads/2016/11/unnamed-chunk-8-1-1024x731.png 1024w, https://cdn.ilari.scheinin.fi/wp-content/uploads/2016/11/unnamed-chunk-8-1-300x214.png 300w, https://cdn.ilari.scheinin.fi/wp-content/uploads/2016/11/unnamed-chunk-8-1-768x549.png 768w, https://cdn.ilari.scheinin.fi/wp-content/uploads/2016/11/unnamed-chunk-8-1.png 1344w" sizes="(max-width: 800px) 100vw, 800px" /></a></p>
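<p>One simple way to get a beta prior from a set of career shooting percentages is the method of moments, sketched here in Python. This is a swapped-in technique chosen because it needs only the standard library; the fit shown in the plot may well have been obtained differently (for example by maximum likelihood). The input rates are made-up values, not the actual player data.</p>

```python
import statistics


def beta_method_of_moments(rates):
    """Estimate Beta(alpha, beta) parameters by matching sample mean and variance."""
    m = statistics.mean(rates)
    v = statistics.variance(rates)  # sample variance
    common = m * (1.0 - m) / v - 1.0
    return m * common, (1.0 - m) * common


# Hypothetical career shooting rates for a handful of forwards.
rates = [0.06, 0.08, 0.10, 0.11, 0.12, 0.14, 0.16]

alpha0, beta0 = beta_method_of_moments(rates)
print(f"alpha0 = {alpha0:.2f}, beta0 = {beta0:.2f}")
print(f"prior mean = {alpha0 / (alpha0 + beta0):.3f}")
```

<p>By construction the fitted prior has the same mean as the input rates; how wide it is depends on how much the rates vary between players.</p>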
<p>Shooting percentages can now be adjusted using this prior. This will shrink individual players’ estimates towards the horizontal dashed line. The more evidence there is for an individual (the brighter the blue dot), the more we trust it. The darker dots show a lot of shrinkage, whereas the light ones are much closer to the diagonal red line, which marks the case of no shrinkage at all.</p>
<p><a href="https://cdn.ilari.scheinin.fi/wp-content/uploads/2016/11/unnamed-chunk-10-1.png"><img loading="lazy" src="https://cdn.ilari.scheinin.fi/wp-content/uploads/2016/11/unnamed-chunk-10-1-1024x731.png" alt="unnamed-chunk-10-1" width="800" height="571" class="aligncenter size-large wp-image-621" srcset="https://cdn.ilari.scheinin.fi/wp-content/uploads/2016/11/unnamed-chunk-10-1-1024x731.png 1024w, https://cdn.ilari.scheinin.fi/wp-content/uploads/2016/11/unnamed-chunk-10-1-300x214.png 300w, https://cdn.ilari.scheinin.fi/wp-content/uploads/2016/11/unnamed-chunk-10-1-768x549.png 768w, https://cdn.ilari.scheinin.fi/wp-content/uploads/2016/11/unnamed-chunk-10-1.png 1344w" sizes="(max-width: 800px) 100vw, 800px" /></a></p>
<p>Finally, let’s look at the ranking (from season 1996–1997 onwards) for shooting percentage estimated with empirical Bayes (EB). Patrik Laine currently sits at number 40, and only time will tell where he moves on that list. But what we do know today is that he is one of only four 18-year-olds to score two hat tricks in the NHL (the others being Jack Hamilton, Dale Hawerchuk, and Trevor Linden), and he still has the rest of the regular season to hunt for a third one before his 19th birthday on April 19th, 2017.</p>
<div style="overflow: scroll;">
<table style="font-size: smaller;">
<thead>
<tr class="header">
<th></th>
<th align="left">Name</th>
<th align="left">Pos</th>
<th align="right">GP</th>
<th align="right">G</th>
<th align="right">A</th>
<th align="right">PTS</th>
<th align="right">S</th>
<th align="right">S%</th>
<th align="right">EB</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>1</td>
<td align="left">Alex Tanguay</td>
<td align="left">LW</td>
<td align="right">1,088</td>
<td align="right">283</td>
<td align="right">580</td>
<td align="right">863</td>
<td align="right">1,525</td>
<td align="right">18.6</td>
<td align="right">17.9</td>
</tr>
<tr class="even">
<td>2</td>
<td align="left">Andrew Brunette</td>
<td align="left">LW</td>
<td align="right">1,099</td>
<td align="right">265</td>
<td align="right">462</td>
<td align="right">727</td>
<td align="right">1,500</td>
<td align="right">17.7</td>
<td align="right">17.1</td>
</tr>
<tr class="odd">
<td>3</td>
<td align="left">Steven Stamkos</td>
<td align="left">C</td>
<td align="right">586</td>
<td align="right">321</td>
<td align="right">261</td>
<td align="right">582</td>
<td align="right">1,876</td>
<td align="right">17.1</td>
<td align="right">16.7</td>
</tr>
<tr class="even">
<td>4</td>
<td align="left">Mark Parrish</td>
<td align="left">RW</td>
<td align="right">722</td>
<td align="right">216</td>
<td align="right">171</td>
<td align="right">387</td>
<td align="right">1,247</td>
<td align="right">17.3</td>
<td align="right">16.7</td>
</tr>
<tr class="odd">
<td>5</td>
<td align="left">Dmitri Khristich</td>
<td align="left">C</td>
<td align="right">420</td>
<td align="right">111</td>
<td align="right">171</td>
<td align="right">282</td>
<td align="right">633</td>
<td align="right">17.5</td>
<td align="right">16.3</td>
</tr>
<tr class="even">
<td>6</td>
<td align="left">Mike Ridley</td>
<td align="left">C</td>
<td align="right">75</td>
<td align="right">20</td>
<td align="right">32</td>
<td align="right">52</td>
<td align="right">79</td>
<td align="right">25.3</td>
<td align="right">16.1</td>
</tr>
<tr class="odd">
<td>7</td>
<td align="left">Tomas Holmstrom</td>
<td align="left">LW</td>
<td align="right">1,026</td>
<td align="right">243</td>
<td align="right">287</td>
<td align="right">530</td>
<td align="right">1,489</td>
<td align="right">16.3</td>
<td align="right">15.9</td>
</tr>
<tr class="even">
<td>8</td>
<td align="left">Gary Roberts</td>
<td align="left">LW</td>
<td align="right">639</td>
<td align="right">181</td>
<td align="right">224</td>
<td align="right">405</td>
<td align="right">1,120</td>
<td align="right">16.2</td>
<td align="right">15.6</td>
</tr>
<tr class="odd">
<td>9</td>
<td align="left">Brenden Morrow</td>
<td align="left">LW</td>
<td align="right">991</td>
<td align="right">265</td>
<td align="right">310</td>
<td align="right">575</td>
<td align="right">1,670</td>
<td align="right">15.9</td>
<td align="right">15.5</td>
</tr>
<tr class="even">
<td>10</td>
<td align="left">Jan Hrdina</td>
<td align="left">C</td>
<td align="right">513</td>
<td align="right">101</td>
<td align="right">196</td>
<td align="right">297</td>
<td align="right">619</td>
<td align="right">16.3</td>
<td align="right">15.3</td>
</tr>
<tr class="odd">
<td>11</td>
<td align="left">Jason Allison</td>
<td align="left">C</td>
<td align="right">519</td>
<td align="right">152</td>
<td align="right">326</td>
<td align="right">478</td>
<td align="right">962</td>
<td align="right">15.8</td>
<td align="right">15.2</td>
</tr>
<tr class="even">
<td>12</td>
<td align="left">Ziggy Palffy</td>
<td align="left">RW</td>
<td align="right">565</td>
<td align="right">276</td>
<td align="right">333</td>
<td align="right">609</td>
<td align="right">1,799</td>
<td align="right">15.3</td>
<td align="right">15.0</td>
</tr>
<tr class="odd">
<td>13</td>
<td align="left">John LeClair</td>
<td align="left">LW</td>
<td align="right">624</td>
<td align="right">281</td>
<td align="right">274</td>
<td align="right">555</td>
<td align="right">1,833</td>
<td align="right">15.3</td>
<td align="right">15.0</td>
</tr>
<tr class="even">
<td>14</td>
<td align="left">Alexander Mogilny</td>
<td align="left">RW</td>
<td align="right">530</td>
<td align="right">207</td>
<td align="right">274</td>
<td align="right">481</td>
<td align="right">1,341</td>
<td align="right">15.4</td>
<td align="right">15.0</td>
</tr>
<tr class="odd">
<td>15</td>
<td align="left">Pierre Turgeon</td>
<td align="left">C</td>
<td align="right">622</td>
<td align="right">197</td>
<td align="right">351</td>
<td align="right">548</td>
<td align="right">1,285</td>
<td align="right">15.3</td>
<td align="right">14.9</td>
</tr>
<tr class="even">
<td>16</td>
<td align="left">Tyler Bozak</td>
<td align="left">C</td>
<td align="right">451</td>
<td align="right">112</td>
<td align="right">169</td>
<td align="right">281</td>
<td align="right">717</td>
<td align="right">15.6</td>
<td align="right">14.8</td>
</tr>
<tr class="odd">
<td>17</td>
<td align="left">Sergei Kostitsyn</td>
<td align="left">LW</td>
<td align="right">353</td>
<td align="right">67</td>
<td align="right">109</td>
<td align="right">176</td>
<td align="right">414</td>
<td align="right">16.2</td>
<td align="right">14.8</td>
</tr>
<tr class="even">
<td>18</td>
<td align="left">Joe Nieuwendyk</td>
<td align="left">C</td>
<td align="right">628</td>
<td align="right">236</td>
<td align="right">242</td>
<td align="right">478</td>
<td align="right">1,555</td>
<td align="right">15.2</td>
<td align="right">14.8</td>
</tr>
<tr class="odd">
<td>19</td>
<td align="left">Anson Carter</td>
<td align="left">RW</td>
<td align="right">674</td>
<td align="right">202</td>
<td align="right">219</td>
<td align="right">421</td>
<td align="right">1,331</td>
<td align="right">15.2</td>
<td align="right">14.8</td>
</tr>
<tr class="even">
<td>20</td>
<td align="left">Tony Hrkac</td>
<td align="left">C</td>
<td align="right">425</td>
<td align="right">70</td>
<td align="right">105</td>
<td align="right">175</td>
<td align="right">438</td>
<td align="right">16.0</td>
<td align="right">14.7</td>
</tr>
<tr class="odd">
<td>21</td>
<td align="left">Mark Messier</td>
<td align="left">C</td>
<td align="right">555</td>
<td align="right">155</td>
<td align="right">264</td>
<td align="right">419</td>
<td align="right">1,015</td>
<td align="right">15.3</td>
<td align="right">14.7</td>
</tr>
<tr class="even">
<td>22</td>
<td align="left">Yanic Perreault</td>
<td align="left">C</td>
<td align="right">742</td>
<td align="right">217</td>
<td align="right">237</td>
<td align="right">454</td>
<td align="right">1,436</td>
<td align="right">15.1</td>
<td align="right">14.7</td>
</tr>
<tr class="odd">
<td>23</td>
<td align="left">Adam Deadmarsh</td>
<td align="left">RW</td>
<td align="right">441</td>
<td align="right">154</td>
<td align="right">154</td>
<td align="right">308</td>
<td align="right">1,010</td>
<td align="right">15.2</td>
<td align="right">14.7</td>
</tr>
<tr class="even">
<td>24</td>
<td align="left">Adam Henrique</td>
<td align="left">C</td>
<td align="right">364</td>
<td align="right">101</td>
<td align="right">109</td>
<td align="right">210</td>
<td align="right">650</td>
<td align="right">15.5</td>
<td align="right">14.7</td>
</tr>
<tr class="odd">
<td>25</td>
<td align="left">Teemu Selanne</td>
<td align="left">RW</td>
<td align="right">1,192</td>
<td align="right">521</td>
<td align="right">594</td>
<td align="right">1,115</td>
<td align="right">3,528</td>
<td align="right">14.8</td>
<td align="right">14.6</td>
</tr>
<tr class="even">
<td>26</td>
<td align="left">Jonathan Toews</td>
<td align="left">C</td>
<td align="right">662</td>
<td align="right">255</td>
<td align="right">321</td>
<td align="right">576</td>
<td align="right">1,710</td>
<td align="right">14.9</td>
<td align="right">14.6</td>
</tr>
<tr class="odd">
<td>27</td>
<td align="left">Jiri Hudler</td>
<td align="left">C</td>
<td align="right">680</td>
<td align="right">161</td>
<td align="right">256</td>
<td align="right">417</td>
<td align="right">1,068</td>
<td align="right">15.1</td>
<td align="right">14.6</td>
</tr>
<tr class="even">
<td>28</td>
<td align="left">Paul Byron</td>
<td align="left">C</td>
<td align="right">217</td>
<td align="right">34</td>
<td align="right">43</td>
<td align="right">77</td>
<td align="right">198</td>
<td align="right">17.2</td>
<td align="right">14.5</td>
</tr>
<tr class="odd">
<td>29</td>
<td align="left">Brad Marchand</td>
<td align="left">C</td>
<td align="right">470</td>
<td align="right">158</td>
<td align="right">147</td>
<td align="right">305</td>
<td align="right">1,057</td>
<td align="right">14.9</td>
<td align="right">14.5</td>
</tr>
<tr class="even">
<td>30</td>
<td align="left">Sidney Crosby</td>
<td align="left">C</td>
<td align="right">716</td>
<td align="right">348</td>
<td align="right">603</td>
<td align="right">951</td>
<td align="right">2,376</td>
<td align="right">14.6</td>
<td align="right">14.4</td>
</tr>
<tr class="odd">
<td>31</td>
<td align="left">Mike Sillinger</td>
<td align="left">C</td>
<td align="right">831</td>
<td align="right">210</td>
<td align="right">234</td>
<td align="right">444</td>
<td align="right">1,420</td>
<td align="right">14.8</td>
<td align="right">14.4</td>
</tr>
<tr class="even">
<td>32</td>
<td align="left">Keith Tkachuk</td>
<td align="left">LW</td>
<td align="right">893</td>
<td align="right">394</td>
<td align="right">382</td>
<td align="right">776</td>
<td align="right">2,713</td>
<td align="right">14.5</td>
<td align="right">14.3</td>
</tr>
<tr class="odd">
<td>33</td>
<td align="left">Peter Forsberg</td>
<td align="left">C</td>
<td align="right">579</td>
<td align="right">204</td>
<td align="right">515</td>
<td align="right">719</td>
<td align="right">1,390</td>
<td align="right">14.7</td>
<td align="right">14.3</td>
</tr>
<tr class="even">
<td>34</td>
<td align="left">Milan Lucic</td>
<td align="left">LW</td>
<td align="right">664</td>
<td align="right">164</td>
<td align="right">242</td>
<td align="right">406</td>
<td align="right">1,111</td>
<td align="right">14.8</td>
<td align="right">14.3</td>
</tr>
<tr class="odd">
<td>35</td>
<td align="left">Dany Heatley</td>
<td align="left">RW</td>
<td align="right">869</td>
<td align="right">372</td>
<td align="right">419</td>
<td align="right">791</td>
<td align="right">2,565</td>
<td align="right">14.5</td>
<td align="right">14.3</td>
</tr>
<tr class="even">
<td>36</td>
<td align="left">David Desharnais</td>
<td align="left">C</td>
<td align="right">420</td>
<td align="right">78</td>
<td align="right">168</td>
<td align="right">246</td>
<td align="right">511</td>
<td align="right">15.3</td>
<td align="right">14.3</td>
</tr>
<tr class="odd">
<td>37</td>
<td align="left">Martin Straka</td>
<td align="left">LW</td>
<td align="right">714</td>
<td align="right">206</td>
<td align="right">370</td>
<td align="right">576</td>
<td align="right">1,408</td>
<td align="right">14.6</td>
<td align="right">14.3</td>
</tr>
<tr class="even">
<td>38</td>
<td align="left">Thomas Vanek</td>
<td align="left">LW</td>
<td align="right">824</td>
<td align="right">320</td>
<td align="right">337</td>
<td align="right">657</td>
<td align="right">2,213</td>
<td align="right">14.5</td>
<td align="right">14.2</td>
</tr>
<tr class="odd">
<td>39</td>
<td align="left">Stephane Matteau</td>
<td align="left">LW</td>
<td align="right">471</td>
<td align="right">75</td>
<td align="right">88</td>
<td align="right">163</td>
<td align="right">493</td>
<td align="right">15.2</td>
<td align="right">14.2</td>
</tr>
<tr class="even">
<td>40</td>
<td align="left">Patrik Laine</td>
<td align="left">RW</td>
<td align="right">18</td>
<td align="right">12</td>
<td align="right">5</td>
<td align="right">17</td>
<td align="right">51</td>
<td align="right">23.5</td>
<td align="right">14.2</td>
</tr>
</tbody>
</table>
</div>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Predicting the Stanley Cup Champion</title>
		<link>https://ilari.scheinin.fi/predicting-the-stanley-cup-champion/</link>
		
		<dc:creator><![CDATA[Ilari Scheinin]]></dc:creator>
		<pubDate>Wed, 08 Jul 2015 12:25:07 +0000</pubDate>
				<category><![CDATA[R]]></category>
		<category><![CDATA[sports]]></category>
		<guid isPermaLink="false">http://ilari.scheinin.fi/?p=392</guid>

					<description><![CDATA[When I was at the Recurse Center, I wanted to try the caret package for R. It provides a unified interface for training various types of classification and regression models, and parameter tuning through resampling. I needed a project to &#8230; <a href="https://ilari.scheinin.fi/predicting-the-stanley-cup-champion/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
										<content:encoded><![CDATA[<p>When I was at the <a href="https://www.recurse.com">Recurse Center</a>, I wanted to try the <a href="https://github.com/topepo/caret">caret</a> package for R. It provides a unified interface for training various types of classification and regression models, and parameter tuning through resampling. I needed a project to work on, and since I love hockey and the Stanley Cup playoffs were just starting, it was a natural choice.</p>
<p>The source code is all on <a href="https://github.com/ilarischeinin/stanley">GitHub</a>, and is split into four R Markdown documents: <a href="https://cdn.rawgit.com/ilarischeinin/stanley/0126954/scrape.html">scrape raw data</a>, <a href="https://cdn.rawgit.com/ilarischeinin/stanley/0126954/process.html">process data</a>, <a href="https://cdn.rawgit.com/ilarischeinin/stanley/0126954/model.html">train models</a>, and <a href="https://cdn.rawgit.com/ilarischeinin/stanley/0126954/predict.html">make predictions</a>. I&#8217;ll present a short summary here, and more details can be found behind those links. The repository also contains a Makefile to replicate the analysis. Random seeds are specified in the code to make it fully reproducible.</p>
<p>First, I used the <a href="https://github.com/acthomasca/nhlscrapr">nhlscrapr</a> package to scrape play-by-play data from <a href="http://www.nhl.com">NHL.com</a> starting from the 2002-2003 season. Then, I used <a href="https://github.com/hadley/dplyr">dplyr</a> to calculate some summary statistics. For each game, I calculated the following statistics for both the home and away teams:</p>
<ul>
<li>the proportion of goals scored, i.e. &#8220;goals scored / (goals scored + goals against)&#8221;</li>
<li>the proportion of shots</li>
<li>the proportion of faceoffs won</li>
<li>the proportion of penalties</li>
<li>power play, i.e. &#8220;power play goals scored / penalties for the other team&#8221;</li>
<li>penalty kill, i.e. &#8220;power play goals against / penalties for own team&#8221;</li>
</ul>
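<p>As a rough illustration of these ratios, here is a minimal Python sketch. It is not the code used for the analysis (that is in R with dplyr, linked above). The function name, the dictionary keys, and the example counts are made up. Reading &#8220;proportion of penalties&#8221; as the team&#8217;s own share of all penalties is an assumption, as are the fallback values for a zero denominator.</p>

```python
def game_stats(own, other):
    """Return the six per-game statistics for one team.

    `own` and `other` hold raw counts for the team and its opponent,
    using hypothetical keys: goals, shots, faceoffs_won, penalties, pp_goals.
    """
    def share(key):
        total = own[key] + other[key]
        # Assumption: call it an even split when there were no events at all
        return own[key] / total if total else 0.5

    def rate(goals, chances):
        # Assumption: report 0 when there was no power play opportunity
        return goals / chances if chances else 0.0

    return {
        "goals": share("goals"),
        "shots": share("shots"),
        "faceoffs": share("faceoffs_won"),
        "penalties": share("penalties"),  # assumed: own penalties / all penalties
        "pp": rate(own["pp_goals"], other["penalties"]),
        "pk": rate(other["pp_goals"], own["penalties"]),
    }


# Made-up counts for a single game
home = {"goals": 3, "shots": 30, "faceoffs_won": 33, "penalties": 4, "pp_goals": 1}
away = {"goals": 2, "shots": 20, "faceoffs_won": 27, "penalties": 5, "pp_goals": 0}

stats = game_stats(home, away)
for name, value in stats.items():
    print(name, round(value, 3))
```

<p>Calling the same function with the arguments swapped gives the away team&#8217;s numbers for the game.</p>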
<p>I&#8217;m sure many more useful predictor variables could be derived from the play-by-play data, which in turn would result in more accurate predictions. But since this was mainly an exercise to try out caret, these variables will suffice for now.</p>
<p>For each season, I then calculated the average performance of each team, separately for when they were playing at home and on the road. Here&#8217;s an example of away performance for six teams from the 2002-2003 season:</p>
<table style="font-size: smaller;">
<thead>
<tr class="header">
<th align="left">season</th>
<th align="left">team</th>
<th align="right">goals</th>
<th align="right">shots</th>
<th align="right">faceoffs</th>
<th align="right">penalties</th>
<th align="right">pp</th>
<th align="right">pk</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td align="left">20022003</td>
<td align="left">ANA</td>
<td align="right">0.480</td>
<td align="right">0.477</td>
<td align="right">0.544</td>
<td align="right">0.517</td>
<td align="right">0.159</td>
<td align="right">0.095</td>
</tr>
<tr class="even">
<td align="left">20022003</td>
<td align="left">ATL</td>
<td align="right">0.433</td>
<td align="right">0.434</td>
<td align="right">0.467</td>
<td align="right">0.504</td>
<td align="right">0.147</td>
<td align="right">0.139</td>
</tr>
<tr class="odd">
<td align="left">20022003</td>
<td align="left">BOS</td>
<td align="right">0.422</td>
<td align="right">0.503</td>
<td align="right">0.481</td>
<td align="right">0.555</td>
<td align="right">0.137</td>
<td align="right">0.103</td>
</tr>
<tr class="even">
<td align="left">20022003</td>
<td align="left">BUF</td>
<td align="right">0.416</td>
<td align="right">0.482</td>
<td align="right">0.493</td>
<td align="right">0.526</td>
<td align="right">0.118</td>
<td align="right">0.119</td>
</tr>
<tr class="odd">
<td align="left">20022003</td>
<td align="left">CAR</td>
<td align="right">0.384</td>
<td align="right">0.492</td>
<td align="right">0.512</td>
<td align="right">0.514</td>
<td align="right">0.105</td>
<td align="right">0.169</td>
</tr>
<tr class="even">
<td align="left">20022003</td>
<td align="left">CBJ</td>
<td align="right">0.376</td>
<td align="right">0.432</td>
<td align="right">0.472</td>
<td align="right">0.511</td>
<td align="right">0.110</td>
<td align="right">0.104</td>
</tr>
</tbody>
</table>
<p>Next, I took the outcomes of all playoff series from the past 11 seasons, and calculated two deltas to be used as explanatory variables. I calculated the difference between the home team&#8217;s home performance and the away team&#8217;s away performance, and also the home team&#8217;s away performance and the away team&#8217;s home performance. This was to capture how the two teams would perform at the two arenas for the series.</p>
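<p>The two deltas can be sketched as follows in Python. This is an illustration rather than the original R code, and the data layout, function name, and numbers are hypothetical. Only two of the six statistics are shown to keep it short.</p>

```python
def series_deltas(home_team, away_team):
    """Return the two sets of deltas for one playoff series.

    Each team is a dict holding its season averages at "home" and "away"
    (a made-up layout for this sketch).
    """
    features = {}
    for stat in home_team["home"]:
        # Games at the home team's arena: home team at home vs. away team on the road
        features["arena1_" + stat] = home_team["home"][stat] - away_team["away"][stat]
        # Games at the away team's arena: home team on the road vs. away team at home
        features["arena2_" + stat] = home_team["away"][stat] - away_team["home"][stat]
    return features


# Hypothetical season averages for two teams
team_a = {"home": {"goals": 0.56, "shots": 0.53}, "away": {"goals": 0.50, "shots": 0.49}}
team_b = {"home": {"goals": 0.54, "shots": 0.52}, "away": {"goals": 0.48, "shots": 0.47}}

deltas = series_deltas(home_team=team_a, away_team=team_b)
for name, value in deltas.items():
    print(name, round(value, 3))
```

<p>A positive value means the team with home advantage in the series had the higher season average for that statistic in that setting.</p>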
<p>I then used caret to train five different types of statistical models on this training data. The methods I included were generalized linear model, linear discriminant analysis, neural network, random forest, and support vector machine with a linear kernel. For each, model parameters were tuned with 10-fold cross-validation, which was repeated 10 times. Parameter values with the best overall accuracy were used to fit the final model with all of the training data.</p>
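<p>To make the resampling scheme concrete: 10-fold cross-validation repeated 10 times gives 100 train/test splits for each candidate parameter value. The plain-Python sketch below only generates such splits. It is not how caret implements them (caret has its own fold creation), and the sample count of 165 assumes 15 series for each of the 11 seasons.</p>

```python
import random


def repeated_kfold(n_samples, k=10, repeats=10, seed=42):
    """Yield (train_idx, test_idx) pairs for repeated k-fold cross-validation."""
    rng = random.Random(seed)  # fixed seed so the splits are reproducible
    for _ in range(repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        # Deal the shuffled indices into k folds of nearly equal size
        folds = [indices[i::k] for i in range(k)]
        for held_out in folds:
            held = set(held_out)
            train = [i for i in indices if i not in held]
            yield train, held_out


# Assumed sample count: 11 seasons x 15 playoff series
splits = list(repeated_kfold(n_samples=165))
print(len(splits))  # 100 (10 folds x 10 repeats)
```

<p>Each candidate parameter value would be scored on all 100 held-out folds, and the value with the best average accuracy kept for the final fit.</p>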
<p>For my predictions, instead of picking just one of the five fitted models, I used all of them. For each playoff series, I used a majority vote from all five models to pick the winner. (That&#8217;s why I fitted an odd number of models.) The predictions are below, with the predicted winner in bold:</p>
<h3>Round 1</h3>
<ul>
<li>Pittsburgh Penguins at <strong>New York Rangers</strong></li>
<li>Ottawa Senators at <strong>Montreal Canadiens</strong></li>
<li>Detroit Red Wings at <strong>Tampa Bay Lightning</strong></li>
<li><strong>New York Islanders</strong> at Washington Capitals</li>
<li>Winnipeg Jets at <strong>Anaheim Ducks</strong></li>
<li>Minnesota Wild at <strong>St. Louis Blues</strong></li>
<li><strong>Chicago Blackhawks</strong> at Nashville Predators</li>
<li><strong>Calgary Flames</strong> at Vancouver Canucks</li>
</ul>
<h3>Round 2</h3>
<ul>
<li>New York Islanders at <strong>New York Rangers</strong></li>
<li>Tampa Bay Lightning at <strong>Montreal Canadiens</strong></li>
<li>Calgary Flames at <strong>Anaheim Ducks</strong></li>
<li><strong>Chicago Blackhawks</strong> at St. Louis Blues</li>
</ul>
<h3>Round 3 &#8211; Conference Finals</h3>
<ul>
<li>Montreal Canadiens at <strong>New York Rangers</strong></li>
<li><strong>Chicago Blackhawks</strong> at Anaheim Ducks</li>
</ul>
<h3>Round 4 &#8211; Stanley Cup Finals</h3>
<ul>
<li><strong>Chicago Blackhawks</strong> at New York Rangers</li>
</ul>
<p>My prediction for the 2015 Stanley Cup Champion was the Chicago Blackhawks.</p>
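<p>The majority vote behind each pick can be sketched in a few lines of Python. This is illustrative only: the model labels and votes below are made up, and the original analysis was done in R.</p>

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the most models."""
    winner, _ = Counter(predictions).most_common(1)[0]
    return winner


# Hypothetical votes from the five models for a single series
votes = {"glm": "home", "lda": "home", "nnet": "away", "rf": "home", "svm": "away"}
print(majority_vote(votes.values()))  # home
```

<p>With five models and only two possible winners per series, a tie cannot occur.</p>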
<p><em>To be clear, this blog entry was posted after the playoffs were already over. The explanatory text in the R Markdown documents was also written during the playoffs. But the same prediction as presented above can be seen in <a href="https://github.com/ilarischeinin/stanley/blob/30c3838f629261796895ecd271d327fb7cabd543/predict.html">this GitHub commit</a> (and the same HTML document on <a href="https://cdn.rawgit.com/ilarischeinin/stanley/30c3838/predict.html">RawGit</a>) from April 23rd. This was not before the playoffs started (April 15th), but when the first round was 3-4 games in, depending on the series.</em></p>
<h2 id="validation">Validation Set</h2>
<p>And since the playoffs are in fact already over, the natural validation set is also available. The Chicago Blackhawks did end up winning the Cup, but how did I do otherwise? Below are the predictions again, now together with the real outcomes. Since an incorrect prediction in one round leads to wrong pairings in the subsequent rounds, I have added in the series that actually ended up happening. (I made a prediction for every series that could possibly happen, but only presented the resulting bracket here.) These added ones are in italics.</p>
<p style="text-align: center;"><a href="https://cdn.ilari.scheinin.fi/wp-content/uploads/2015/07/bracket.png"><img loading="lazy" src="https://cdn.ilari.scheinin.fi/wp-content/uploads/2015/07/bracket.png" alt="bracket" width="973" height="675" class="alignnone size-full wp-image-432" /></a></p>
<h3>Round 1</h3>
<ul>
<li>Pittsburgh Penguins at <strong>New York Rangers</strong> &#8211; correct</li>
<li>Ottawa Senators at <strong>Montreal Canadiens</strong> &#8211; correct</li>
<li>Detroit Red Wings at <strong>Tampa Bay Lightning</strong> &#8211; correct</li>
<li><strong>New York Islanders</strong> at Washington Capitals &#8211; INCORRECT</li>
<li>Winnipeg Jets at <strong>Anaheim Ducks</strong> &#8211; correct</li>
<li>Minnesota Wild at <strong>St. Louis Blues</strong> &#8211; INCORRECT</li>
<li><strong>Chicago Blackhawks</strong> at Nashville Predators &#8211; correct</li>
<li><strong>Calgary Flames</strong> at Vancouver Canucks &#8211; correct</li>
</ul>
<h3>Round 2</h3>
<ul>
<li><em>Washington Capitals at <strong>New York Rangers</strong></em> &#8211; correct</li>
<li>Tampa Bay Lightning at <strong>Montreal Canadiens</strong> &#8211; INCORRECT</li>
<li>Calgary Flames at <strong>Anaheim Ducks</strong> &#8211; correct</li>
<li><em>Minnesota Wild at <strong>Chicago Blackhawks</strong></em> &#8211; correct</li>
</ul>
<h3>Round 3 &#8211; Conference Finals</h3>
<ul>
<li><em>Tampa Bay Lightning at <strong>New York Rangers</strong></em> &#8211; INCORRECT</li>
<li><strong>Chicago Blackhawks</strong> at Anaheim Ducks &#8211; correct</li>
</ul>
<h3>Round 4 &#8211; Stanley Cup Finals</h3>
<ul>
<li><em><strong>Chicago Blackhawks</strong> at Tampa Bay Lightning</em> &#8211; correct</li>
</ul>
<p>Overall, my accuracy was 11 out of 15, which is 73%.</p>
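<p>The tally behind that figure, written out as a small Python sketch using the correct/incorrect marks listed above (the variable names are arbitrary):</p>

```python
# True = correct pick, False = incorrect pick, in the order listed above
results = {
    "Round 1": [True, True, True, False, True, False, True, True],
    "Round 2": [True, False, True, True],
    "Round 3": [False, True],
    "Round 4": [True],
}

for name, picks in results.items():
    print(name, sum(picks), "of", len(picks))

correct = sum(sum(picks) for picks in results.values())
total = sum(len(picks) for picks in results.values())
print(f"{correct} of {total} = {correct / total:.0%}")  # 11 of 15 = 73%
```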
<p>An obvious follow-up from here could be to look at each of the five different models (generalized linear model, linear discriminant analysis, neural network, random forest, and support vector machine with a linear kernel) and compare their accuracies against each other.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Fight Back</title>
		<link>https://ilari.scheinin.fi/fight-back/</link>
		
		<dc:creator><![CDATA[Ilari Scheinin]]></dc:creator>
		<pubDate>Mon, 04 Nov 2013 10:13:55 +0000</pubDate>
				<category><![CDATA[inspiration]]></category>
		<category><![CDATA[sports]]></category>
		<guid isPermaLink="false">http://ilari.scheinin.fi/?p=112</guid>

					<description><![CDATA[This guy was injured pretty badly in a skiing accident, but he&#8217;s fighting back.]]></description>
										<content:encoded><![CDATA[<p><iframe loading="lazy" src="//player.vimeo.com/video/78405837" width="500" height="281" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen></iframe></p>
<p>This guy was injured pretty badly in a skiing accident, but he&#8217;s fighting back.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>I broke in on the left wing</title>
		<link>https://ilari.scheinin.fi/i-broke-in-on-the-left-wing/</link>
		
		<dc:creator><![CDATA[Ilari Scheinin]]></dc:creator>
		<pubDate>Mon, 25 Jun 2012 20:18:24 +0000</pubDate>
				<category><![CDATA[sports]]></category>
		<guid isPermaLink="false">http://ilari.scheinin.fi/?p=275</guid>

					<description><![CDATA[I broke in on the left wing, and as soon as it was on my stick, I snapped it and caught Patrick Roy off guard. It went right between his legs. I will never forget that feeling. It was like &#8230; <a href="https://ilari.scheinin.fi/i-broke-in-on-the-left-wing/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
										<content:encoded><![CDATA[<blockquote><p>I broke in on the left wing, and as soon as it was on my stick, I snapped it and caught Patrick Roy off guard. It went right between his legs. I will never forget that feeling. It was like watching a girl take her clothes off in front of you for the first time.<br />
&#8211; <a href="http://www.amazon.com/gp/product/B003A023LQ/ref=as_li_ss_tl?ie=UTF8&#038;camp=1789&#038;creative=390957&#038;creativeASIN=B003A023LQ&#038;linkCode=as2&#038;tag=ilarischeinin-20">Theo Fleury in his biography Playing with Fire, on scoring a game-winning goal in the 1989 Stanley Cup finals.</a></p></blockquote>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>A message from Nick and Annika Lidström</title>
		<link>https://ilari.scheinin.fi/a-message-from-nick-and-annika-lidstrom/</link>
		
		<dc:creator><![CDATA[Ilari Scheinin]]></dc:creator>
		<pubDate>Mon, 04 Jun 2012 10:22:18 +0000</pubDate>
				<category><![CDATA[sports]]></category>
		<guid isPermaLink="false">http://ilari.scheinin.fi/?p=271</guid>

					<description><![CDATA[A message from Nick and Annika Lidström Nicklas Lidström thanks the people of Detroit with a full-page ad in the local paper. Classy.]]></description>
										<content:encoded><![CDATA[<p><a href="http://www.aftonbladet.se/sportbladet/hockey/internationellt/nhl/article14924413.ab">A message from Nick and Annika Lidström</a></p>
<p>Nicklas Lidström thanks the people of Detroit with a full-page ad in the local paper. Classy.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Our Way of Life</title>
		<link>https://ilari.scheinin.fi/our-way-of-life/</link>
		
		<dc:creator><![CDATA[Ilari Scheinin]]></dc:creator>
		<pubDate>Wed, 28 Dec 2011 09:30:40 +0000</pubDate>
				<category><![CDATA[inspiration]]></category>
		<category><![CDATA[sports]]></category>
		<guid isPermaLink="false">http://ilari.scheinin.fi/?p=242</guid>

					<description><![CDATA[Hockey.]]></description>
										<content:encoded><![CDATA[<p><iframe loading="lazy" width="560" height="315" src="//www.youtube-nocookie.com/embed/OohqTf5uvkY" frameborder="0" allowfullscreen></iframe></p>
<p>Hockey.</p>
]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
