I have noticed a few posts in recent weeks that have begun with something along the lines of "I'm not sure how DVOA works, but...". (There are also a couple of posters in this forum who seem to find the stat personally offensive, for reasons that I have not quite been able to fathom.) So I am going to take out my recent football frustrations by opening up a discussion of this stat, which I hope will supplement the official Football Outsiders write-up (available here) and the Aaron Schatz chat from a couple of years ago by making it more universally accessible. And I hope that others will chime in with thoughts on DVOA or on other topics in football analytics.
In a nutshell, DVOA (Defense-adjusted Value Over Average) is basically just yards per play, adjusted for the most significant elements of game context. This adjustment is a three-step process corresponding to the elements of the acronym. FO does a good job of explaining this, so I will only briefly touch on each:
Step 1: Value. Not all yards are created equal, due to differing implications for the probability of picking up a first down. A four-yard gain is much more valuable on 3rd and 4 than it is on 3rd and 15. So DVOA begins by assigning a success value to each play based on the down/distance context in which it occurred.
Step 2: Over Average. As other elements of game context change, so does the importance of achieving plays that are nominally successful by the standard established in Step 1. When trailing by 17 points with 5 minutes left, picking up a first down is of little consequence. DVOA compares the success value of plays to the average value to be expected, based on a database of plays in comparable game situations, in order to account for this.
These first two steps generate VOA, which can be a useful stat in its own right.
Step 3: Defense-adjusted. This is actually something of a misnomer: "opponent-adjusted" would be more accurate. Given the small number of games in an NFL season and the division-heavy schedule, some teams will find it much easier than others to achieve (context-adjusted) per-play success. So the final step is to adjust VOA in comparison with the baseline implied by the opponents faced.
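For anyone who finds arithmetic easier to follow than prose, here is a rough Python sketch of the three steps. To be clear, this is my own toy version and not FO's actual formula: the situation labels, the success values, and the single `opponent_factor` multiplier are all assumptions made up for illustration.

```python
from collections import defaultdict

def baseline_by_situation(league_plays):
    """Average success value per game situation across all league plays.

    Each play is a (situation, success_value) pair. The situation labels
    are hypothetical stand-ins for down/distance/score/time buckets.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for situation, value in league_plays:
        totals[situation][0] += value
        totals[situation][1] += 1
    return {s: total / count for s, (total, count) in totals.items()}

def voa(team_plays, baseline):
    """Steps 1 and 2: fraction by which a team's value beats the average."""
    actual = sum(value for _, value in team_plays)
    expected = sum(baseline[situation] for situation, _ in team_plays)
    return (actual - expected) / expected

def opponent_adjusted_voa(team_plays, baseline, opponent_factor):
    """Step 3 (assumed form): rescale the baseline for opponent strength.

    opponent_factor < 1 means the opponents faced allowed less success
    than average, so the same raw output earns a higher rating.
    """
    actual = sum(value for _, value in team_plays)
    expected = opponent_factor * sum(baseline[s] for s, _ in team_plays)
    return (actual - expected) / expected

# Made-up numbers, purely to show the mechanics.
league = [("3rd-short", 1.0), ("3rd-short", 0.0),
          ("1st-10", 1.0), ("1st-10", 1.0), ("1st-10", 0.0), ("1st-10", 0.0)]
team = [("3rd-short", 1.0), ("1st-10", 1.0), ("1st-10", 0.0), ("1st-10", 1.0)]

base = baseline_by_situation(league)
print(voa(team, base))
print(opponent_adjusted_voa(team, base, opponent_factor=0.8))
```

The point of the sketch is just that the same pile of plays rates higher once you account for having earned it against tougher-than-average opposition.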
Again, all of this is well explained in the FO write-up, which also offers empirical confirmation that DVOA is better than unadjusted yards per play in terms of both autocorrelation and correlation with game results. The interesting question is why this is the case - in particular why, in contrast with baseball, autocorrelation improves when we include elements of game context. I think it can be most succinctly explained as follows: DVOA is based on the understanding that in football, the game state provides the players with actionable information. When it is 3rd and 4, both offense and defense know that the fourth yard is the make-or-break yard, and can (should) adjust their actions accordingly. And when defending a 17-point lead with five minutes remaining, the defense knows that a 12-yard gain for a first down is all but irrelevant, and will play the situation much differently than they would if they were trailing by 3 with three minutes remaining. As such, we can see that context-adjusted success, not raw yards per se, is the currency in which per-play performance should be measured in football.
So it is interesting to note that the sabermetric revolution has, to a large extent, proceeded in opposite directions in baseball and football. In baseball, most traditional statistics simply assign game results to the player most obviously connected to them (e.g. runs scored, RBI, W-L, and even ERA). But we now know that this practice is sub-optimal, as in baseball, game context does not (to first approximation) provide any such actionable information. The batter is always trying to hit the ball as hard as he can, or get on base via a walk, while the pitcher is always trying to prevent him from doing either.* So we need to remove exogenous game context factors from player stats such as RBI, in order to avoid crediting/debiting players for factors over which they have (virtually) no control. In football, on the other hand, responding to game context is a significant element of the player's performance on each snap, and as such needs to be added on to the raw yardage figures that have traditionally been used.
Reading through this forum, one finds various criticisms of DVOA, which are of varying levels of interest. We can begin by addressing those who seem almost to feel threatened by the stat, and noisily object whenever it is introduced into discussion (e.g. dcmissle's oh-so-eloquent "Fuck DVOA" post from last week). I can only assume that this attitude is based on an assumption that proponents of DVOA somehow view it as the be-all and end-all of football analysis. If so, this assumption is mistaken in virtually all cases - I don't know of anyone, including FO writers themselves, who attempts to use the stat in this manner. (See Schatz's response to dcmissle in the chat, for example.) As discussed above, DVOA is probably the best stat we have for measuring the per-play performance of a given team at a given point in the season; as such, it forms a useful starting point for evaluation of various questions, such as an upcoming playoff match-up. But of course, a fully robust analysis will go well beyond this, and attempt to show how the particular game may differ due to individual player match-ups, game plans, etc. In the old days, we might have begun such a discussion by pointing out that Team A averaged, say, 8.0 yards per pass attempt over the course of a season, while their opponent gave up 7.0 yards per pass attempt on defense. While it would be perfectly correct to insist that further analysis remained to be done, would it really occur to anyone to respond to this with "Fuck YPA"?
Much more interesting are specific quibbles with the results that DVOA generates, which are occasionally counter-intuitive in the extreme. One example of this occurred in Week 1 of this season, when the Patriots did not score well by VOA in their narrow win over Buffalo, despite what seemed like a clear superiority on a per-play basis. Discussion in this forum centered on the possibility that FO's approach to the value-adjustment component (Step 1 above) is sub-optimal, and I do believe that this is likely to be the case. As FO's write-up makes clear, they assign "success points" to a play based on the yards gained relative to the down and distance. 1st down plays are successful if they gain 45% of the required yards, 2nd down plays if they gain 60%, and 3rd/4th down plays only if they actually gain a first down. This is based on research from The Hidden Game of Football, where the authors show that a team is approximately as likely to achieve a new first down when facing 2nd and 6 as on 1st and 10; therefore, gaining 40% of the yards required on 1st down has kept the team "on schedule" toward their next first down.
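To make the thresholds concrete, here is a minimal sketch of the baseline success test as I have described it (45% / 60% / 100%). The function name and the all-or-nothing return value are my own simplifications; FO's real system awards fractional points and other bonuses that are not modeled here.

```python
# Percent of the yards-to-go that a play must gain to count as a success,
# keyed by down. Integers are used to avoid floating-point edge cases.
REQUIRED_PERCENT = {1: 45, 2: 60, 3: 100, 4: 100}

def is_success(down, yards_to_go, yards_gained):
    """True if the play meets the baseline success threshold for its down."""
    return yards_gained * 100 >= REQUIRED_PERCENT[down] * yards_to_go

print(is_success(1, 10, 5))   # 5 of 10 yards on 1st down
print(is_success(2, 6, 3))    # 3 of 6 yards on 2nd down
print(is_success(3, 4, 4))    # 4 of 4 yards on 3rd down
print(is_success(3, 15, 4))   # 4 of 15 yards on 3rd down
```

Note how the same four-yard gain passes on 3rd and 4 but fails on 3rd and 15, which is the whole idea of Step 1.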
When I first read about the DVOA methodology around ten years ago, I thought that this made good intuitive sense: staying on schedule in terms of first-down probability should confer a disproportionate degree of benefit, as it seemingly keeps the entire play-book open, and avoids ending up in obvious passing situations. So I was a little surprised a few years later when I found this post by Brian Burke at AdvancedNFLStats.com, where he shows that first-down probability is more or less linear with respect to yards required. There is no obvious inflection point at the more moderate distances on second and third down that would seem to lend themselves to greater offensive flexibility (at least, not until you get to around 3rd and 1). As such, while keeping on schedule for the next first down is perhaps mildly interesting as a benchmark for play success, there is no obvious reason to assign it disproportionate importance in the success points system.
It might be suggested here that FO handles this objection through their system of fractional success points - a success value of 1 is simply a benchmark on a continuous value scale, not an inflection point. But the insurmountable problem (or so I see it, anyway) remains: given what we see in Brian Burke's graphs, there is simply no way that a five-yard gain on 1st and 10 should be treated as equal in value to a three-yard gain on 3rd and 2. The former represents a minuscule rise in first-down probability; the latter, a gain of around 40%. Or, to look at it another way: a series starting on 1st and 10 that gains 5, 3, and 0 yards will receive two success points (the first and second-down plays). A series that gains 0, 0, and 10 yards will receive one success point (the third-down play). But it is the latter series that gains a new first down. I see no way around the conclusion that their success point system is fundamentally miscalibrated, due to excessive reliance on the research from THGOF.
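The two series above can be walked through mechanically. This is only a sketch under the simplified one-point-per-successful-play rule (45% / 60% / 100% thresholds); `score_series` is a hypothetical helper of mine, and FO's fractional points are deliberately left out.

```python
# Percent of the yards-to-go needed for a success on each down.
REQUIRED_PERCENT = {1: 45, 2: 60, 3: 100, 4: 100}

def score_series(gains, yards_to_go=10):
    """Walk a series of gains starting on 1st down.

    Returns (success_points, converted), where converted is True if the
    series reached a new first down. Accepts at most four gains.
    """
    points = 0
    down = 1
    for gained in gains:
        if gained * 100 >= REQUIRED_PERCENT[down] * yards_to_go:
            points += 1
        yards_to_go -= gained
        if yards_to_go <= 0:
            return points, True
        down += 1
    return points, False

print(score_series([5, 3, 0]))    # (2, False): two points, no first down
print(score_series([0, 0, 10]))   # (1, True): one point, drive continues
```

So the series that stalls out-scores the series that moves the chains, which is exactly the calibration problem I am complaining about.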
This would seem to explain the strange DVOA outcome of the Week 1 NE-BUF game, as the Bills had a lot of moderate gains on first and second down that DVOA might tend to overrate. And it is worth noting that after another such anomalous game two years ago, Aaron Schatz actually acknowledged the possibility that their system gives too much credit to these partial successes on first and second down.
Nonetheless, even if this criticism turns out to be accurate, this is a comparatively minor flaw. And it should not detract from DVOA's demonstrated track record of superiority over unadjusted yardage stats, or from the conclusion that it represents a superior approach for measuring true performance on the football field, as discussed above.
Any other thoughts on DVOA? Other stats? Brian Burke's Expected Points Added is also interesting - I might discuss that in a later post.
--------------------
* There are exceptions to this, of course, even beyond the existence of small amounts of clutch ability that have been detected in recent years: when batting with a runner on 3rd, 1 out, and Lugo and Varitek due up next, the optimal approach will be different than it would be leading off the ninth inning down by two runs. But over the course of a season, these differences are very slight, and an attempt to explicitly account for them is likely to cause more problems than it solves. We don't see the marked change in incentives, depending on game situation, that we do in football.