thrice fortnight sports blog: Pre-Tourney Thoughts

Earlier today I posted some bubble thoughts over at Triangle Offense, and bit my tongue about my thoughts on RPI. In truth, it is a blunt tool that measures only wins and losses and ignores how well a team actually played. Now, to be fair, the NCAA selection committee does a fine job sorting through all the available information (and ignoring conference affiliation; no, we're serious folks, we don't count conference bids). They use useful information like best and worst wins, average win RPI, and average loss RPI (hmmm, I wonder if they compare it with the median...).

Anyways, I pulled the numbers off KenPom and compared them with RPI, just to get a short list of the teams the RPI is overvaluing and undervaluing. Since the committee is partially seeding by RPI, look for these teams to underperform or overperform their seed expectations, or "PASE" (see previous post for more PASE fun).

RPI>KenPom (Number is actual difference)
-52 Dayton
-43 Siena
-33 George Mason
-29 Florida St.
-25 Louisville
-21 Mississippi
-20 Tennessee
-20 Utah St.
-18 Illinois St.
-17 Oklahoma

These are teams the RPI loves while KenPom is a bit more doubtful. Since I trust KenPom, I'd say these are teams you should think twice about before blindly filling them in to match their seed line.

KenPom>RPI (Number is difference)
26 Gonzaga
27 UCLA
29 Kentucky
35 Houston
36 Kansas St.
38 Notre Dame
39 New Mexico
42 North Dakota St.
49 Stanford
61 Washington St.

Ditto from above, but in the opposite direction. These teams are better than their RPIs indicate. Not all of them will make the NCAAs, but the ones that do may be a seed line or two lower than they should be. A team that should be a #3 playing as a #5 or #6 seed (UCLA) is a pretty big change.

More importantly, what causes this discrepancy in the rating systems? Well, my theory is that it is the one thing the RPI ignores: scoring margin. The RPI sees a win as a win and a loss as a loss, when in reality there are good losses and bad wins. Virginia Tech taking Duke to the wire today was a good loss; it was likely a better performance than expected. Similarly, Butler only defeating Cleveland St. by two points at home was a closer game than expected, and should be counted as such.

To determine if my hypothesis is correct, I've taken the top 100 teams and determined how overrated or underrated they are in RPI (a "Delta" value), then calculated the correlation and r-squared values between each of the front-page KenPom values (Pyth, OE, DE, Cons, etc.) and that Delta value. This should show which KenPom values, if any, track the difference between the two ranking systems.
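For anyone who wants to repeat the Delta calculation described above, here is a minimal sketch in Python. It assumes Delta is RPI rank minus KenPom rank (so a negative number means the RPI likes the team more), which matches the signs in the lists above but is my reading rather than a stated formula. The team rows are made-up placeholders, not real rankings, and `pearson_r` is just a hand-rolled helper for the standard Pearson correlation.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up placeholder rows, NOT real data: (team, rpi_rank, kenpom_rank, luck)
teams = [
    ("Team A", 20, 60, 0.080),
    ("Team B", 35, 50, 0.030),
    ("Team C", 50, 48, 0.000),
    ("Team D", 70, 40, -0.040),
    ("Team E", 90, 45, -0.070),
]

# Assumed convention: Delta = RPI rank - KenPom rank.
# Negative Delta means the RPI ranks the team better than KenPom does.
deltas = [rpi - kenpom for _, rpi, kenpom, _ in teams]
luck_values = [luck for _, _, _, luck in teams]

r = pearson_r(luck_values, deltas)
r_squared = r ** 2  # always between 0 and 1, whatever the sign of r

print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")
```

Swapping `luck_values` for any other KenPom column (Pyth, OE, DE, Cons) gives the correlation for that stat. Note that r carries the direction of the relationship while r squared only carries its strength.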

Once I did this, several KenPom values stood out with a better-than-random correlation (.300 or greater in magnitude).

The stats with the most significant correlations are Luck, Consistency, Wins, and Losses. Wins and Losses are obvious and easily explainable: since the RPI is strictly based on wins and losses, it is tied to them. KenPom, meanwhile, often sees through the good losses and bad wins to discredit fraudulent teams eking by easy competition and to credit the good ones losing heartbreakers against strong competition.

Consistency isn't as obvious. The stats are saying that a higher consistency rating correlates well with a higher Delta value (higher being better in KenPom, worse in RPI; think of it as "higher up in Ryan's opinion"). My guess would be that inconsistency is penalized because it often results in losses. Inconsistency can also win games a team shouldn't win, and perhaps KenPom's system rewards teams for that more than RPI does. This correlation doesn't make as much sense as the Luck correlation.

Luck is pretty obvious, and I'm glad it is by far the strongest correlation. Luck amounts to winning close games or, on the reverse side, losing them. Luck can be considered KenPom's way of wiping away close results, since they come down to random chance anyways (today, ABC proved the point by repeatedly showing VA Tech's record in close games in the Seth Greenberg era: 33-35 I believe, 9-9 in 1 point games). RPI is essentially luck blind. Beating a team by 1 point is the same as beating a team by 41, and we all know that simply shouldn't be the case. Appropriately, Luck has a correlation of -.602 with Delta, meaning that bad luck (lower values) correlates with higher Delta values (underrated by RPI). Teams that play in and lose close games end up being underrated by RPI, while those that win too many close games are overrated.
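To make the "luck blind" point concrete, here is a rough sketch comparing plain winning percentage (all a win/loss system can see) with a points-based expectation. This is not KenPom's actual Luck formula. It uses a simple Pythagorean-style expectation with an assumed exponent of 10, and the two four-game "seasons" are invented for illustration.

```python
def win_pct(results):
    """Share of games won; margin of victory is ignored entirely."""
    wins = sum(1 for pts_for, pts_against in results if pts_for > pts_against)
    return wins / len(results)

def pythagorean_pct(results, exponent=10.0):
    """Expected win share from total points scored and allowed.

    The exponent is an assumed illustrative value, not KenPom's.
    """
    scored = sum(pts_for for pts_for, _ in results)
    allowed = sum(pts_against for _, pts_against in results)
    return scored ** exponent / (scored ** exponent + allowed ** exponent)

def luck(results, exponent=10.0):
    """Simplified luck: actual win share minus margin-based expected share."""
    return win_pct(results) - pythagorean_pct(results, exponent)

# Hypothetical seasons as (points_for, points_against); both teams go 3-1.
close_winner = [(70, 69), (65, 64), (72, 71), (60, 75)]    # three 1-point wins
blowout_winner = [(85, 60), (80, 62), (78, 59), (70, 71)]  # three easy wins

for name, games in [("close winner", close_winner),
                    ("blowout winner", blowout_winner)]:
    print(f"{name}: win% = {win_pct(games):.3f}, luck = {luck(games):+.3f}")
```

Both teams look identical by record, but the close winner comes out with positive luck and the blowout winner with negative luck, which is the gap a margin-aware system picks up and a record-only system cannot.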

In summary (and I commend you if you've made it through this quantitatively heavy post; I can't post stuff like this at TO, so while the week-to-week Four Factor stuff will end up there, the math lectures will remain a TFSB item), you can't rely on past performance in close games as an indicator of future performance. RPI is blind to margin while KenPom takes it into account. Use this to your advantage next weekend when filling out brackets, unless you're in my bracket pool of course...

Sunday, March 1
