It occurred to me that if PADE were well done (and not all BP stats are), it could be used to figure out whether UZR was accurately park adjusted.
It turns out that PADE, though expressed terribly in their report, is really well constructed. They calculate H and A DE for every team, then calculate a raw park factor for each park, then calculate the expected DE based on each club's schedule (that's the key extra step), then divide actual DE by expected. What they should do is translate this back into a DE by multiplying it by MLB average DE, but instead they just list the percentage difference.
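Here is a rough sketch of those steps in Python, just to make the pipeline concrete. Everything in it is my own illustration: the function names are hypothetical, the home/away ratio used for the raw park factor is an assumption (the report's exact formula isn't reproduced here), and the numbers in the example are made up.

```python
def raw_park_factor(home_de, away_de):
    """ASSUMPTION: raw park factor taken as DE in home games over DE in road games."""
    return home_de / away_de


def expected_de(league_de, schedule, park_factors):
    """Expected DE for a club, given where it actually played.

    schedule:     dict of park name -> games the club played there
    park_factors: dict of park name -> raw park factor for that park
    """
    total_games = sum(schedule.values())
    weighted_pf = sum(park_factors[park] * games
                      for park, games in schedule.items()) / total_games
    return league_de * weighted_pf


def pade_ratio(actual_de, exp_de):
    """Actual DE divided by schedule-expected DE (above 1.0 beats expectation)."""
    return actual_de / exp_de


def pade_as_de(ratio, league_de):
    """Translate the ratio back into a DE by multiplying by MLB average DE."""
    return ratio * league_de


# Made-up numbers, purely to show the flow of the calculation.
league_de = 0.700
factors = {"HomePark": 0.98, "Elsewhere": 1.00}
games = {"HomePark": 81, "Elsewhere": 81}
exp = expected_de(league_de, games, factors)
ratio = pade_ratio(0.693, exp)
print(exp, ratio, pade_as_de(ratio, league_de))
```

The last function is the translation step suggested above: a club that exactly matches its schedule-expected DE comes back out at the league-average DE.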
Important edit: it is BP's PADE which is the source of the error I attributed to UZR.
I took PADE on faith because the methodology is straightforward and they described it well. But it turns out that PADE is wildly overcorrected, by precisely the factor of 4 that I ascribed to UZR. And it is not consistent about it; my preliminary figures for 2009 correlate with theirs only decently.
Lesson: if there is a discrepancy between two metrics, don't assume that it's the simple one that must be accurate! (I refrain from making any generalization about the reliability of any one source of analysis ...)
I am in the process of generating my own set of park adjustments for DE, for the 8 years that we've had UZR data, at which point I'll redo this. It might take me another month or more, though.
See the post of 11/24 for some interim revised findings.
It's very easy from their data to calculate the number of raw plays made by each team's defense as well as the plays that were given to or taken away from the defense by the park. For instance, Fenway has been the second worst park for team defense (only Coors is worse), averaging 35 plays or 32 runs per year (the average play not made costs .92 runs; that figure assumes that a play not made has the observed, average chance of being a 1B, 2B, 3B, ROE, or GDP. It may be a very slight overestimate in that a disproportionate number of plays not made may well be singles, but I'm not even sure if that's the case).
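The plays-to-runs conversion is simple enough to check by hand, but here it is as a tiny Python sketch. The constant and function names are mine; the .92 and the 35 plays are the figures quoted above.

```python
RUNS_PER_MISSED_PLAY = 0.92  # average cost of a play not made, per the post


def plays_to_runs(plays):
    """Convert plays given to or taken from the defense into runs."""
    return plays * RUNS_PER_MISSED_PLAY


fenway_runs = plays_to_runs(35)
print(round(fenway_runs))  # 32
```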
Our next step is to think about the data we have at hand.
Team UZR purports to be a measure of defense only, properly adjusted for park. It may actually be a measure of team defense, tainted by park factor. What we can be fairly certain of is that it is not greatly tainted by the pitching staff's BABIP skill (yes, skill, as we'll demonstrate several different ways before we're done).
PADE, converted into plays and then runs, measures Team Defense + Staff BABIP Skill. This number is as accurate as we can make it.
We also have the original DE, which is PADE + Ballpark Factor.
And finally, we have that Ballpark Factor, and that's damn accurate, too.
Well, the first thing we think of is this. If UZR is correctly park adjusted, it should correlate much better to PADE than it does to the unadjusted DE.
It doesn't. In fact, it correlates more strongly to DE (.61) than it does to PADE (.57), which indicates that it's leaving a majority of the park adjustment on the table.
How much, exactly?
What we can do is calculate what UZR thinks is the Staff BABIP Skill. We know that PADE = Defense + BABIP Skill. So what UZR thinks is the true value of the team's BABIP skill is PADE - UZR.
If UZR had no park error, this estimate of staff BABIP skill would not correlate with our very reliably calculated Park Factors. But it does so, enormously (r = .47, p = 10^-15). In fact, the best predictor of what UZR thinks is staff BABIP skill is .751 * Park Factor. Which is an awful lot.
If we know that 75% of what UZR thinks is the team BABIP skill (which is to say, all of DE that isn't fielding) is actually the park adjustment it's missing, we can use the park factor to correct its estimate of staff BABIP skill, and then put the pieces back together to get a park-adjusted UZR.
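A minimal Python sketch of that bookkeeping follows. The names are hypothetical, and two things are assumptions on my part: the sign convention (the park figure is positive when the park takes runs away from the defense, as with Fenway) and fitting the slope through the origin, since the post only says the best predictor is .751 * Park Factor. The three data points are toy values, not real team seasons.

```python
PARK_SHARE = 0.751  # slope reported above


def implied_staff_skill(pade_runs, uzr_runs):
    """What UZR implicitly treats as staff BABIP skill: PADE - UZR."""
    return pade_runs - uzr_runs


def slope_through_origin(xs, ys):
    """Least-squares slope b for y = b * x (ASSUMPTION: no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / sxx


def park_adjusted_uzr(uzr_runs, park_cost_runs, share=PARK_SHARE):
    """Give back the share of the park effect that UZR charged to the fielders.

    ASSUMPTION: park_cost_runs > 0 means the park costs the defense runs.
    """
    return uzr_runs + share * park_cost_runs


def staff_babip_skill(pade_runs, uzr_runs, park_cost_runs, share=PARK_SHARE):
    """PADE = Defense + Skill, so Skill = PADE - park-adjusted UZR."""
    return pade_runs - park_adjusted_uzr(uzr_runs, park_cost_runs, share)


# Toy data only: implied skill that is exactly 75% of the park figure.
toy_park = [10.0, -20.0, 30.0]
toy_implied = [7.5, -15.0, 22.5]
print(slope_through_origin(toy_park, toy_implied))  # 0.75
```

With real data you would feed every team-season's park figure and PADE - UZR into the slope function, then apply the resulting share team by team.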
As you would expect from the numbers so far, it turns out that UZR for the Sox has been low by an average of 32 * .75 = 24 runs per year. That is, 24 runs of what UZR thinks is bad fielding is actually the impact of Fenway that it was failing to measure. And UZR's park adjustment is exactly 25% of what it ought to be.
Since team UZR is the work of 7 fielders, you can convert this last number to UZR/150 per fielder by dividing by 7 and multiplying by 150/162. That works out to 3.1 runs per fielder.
Not every ballpark is as consistent as Fenway. Here are the UZR PAF errors for each team since the dawn of UZR, which, by a miracle of mathematical coincidence, are almost exactly one tenth of the park factor (.75 / 7 * 150 / 162 = .099). So it is also a table of the park factors -- just move the decimal point.
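The per-fielder conversion in the last two paragraphs can be checked with a few lines of Python. The function name is mine; the 7 fielders, 150 games, 162 games, and .75 share all come from the text above.

```python
FIELDERS = 7
GAMES_PER_SEASON = 162


def team_runs_to_uzr150(team_runs):
    """Spread a team-season run total over 7 fielders, scaled to 150 games."""
    return team_runs / FIELDERS * 150 / GAMES_PER_SEASON


fenway_per_fielder = team_runs_to_uzr150(32 * 0.75)  # about 3.2 with these rounded inputs
park_to_uzr150 = team_runs_to_uzr150(0.75)           # about 0.099, i.e. roughly one tenth

print(fenway_per_fielder, park_to_uzr150)
```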
The extra, cool data.
Oh, yeah. Along the way we also calculated the actual team defense for each year and the team BABIP skill, using the assumption that UZR is correct except for the missing park adjustment. Here those are.
How Accurate is UZR Even After the Missing Park Adjustment?
Now, the first thing that jumps out at you is that there's no way the 2005-6 New York Yankees were both the worst fielding and best BABIP-pitching team in recent memory. They were certainly bad at the former and good at the latter, but the size of the numbers suggests that their UZR for those years was low, maybe way too low, and thus the data is giving their pitchers undeserved credit and
Equally suspicious are the '06-'07 Royals, who are the opposite. The '03 A's, another crazy good-fielding, bad-pitching team, are also suspect.
In fact, if UZR were doing a perfect job of separating fielding from BABIP skill (which is precisely what it is attempting to do), these two tables would not correlate at all. In fact, they have a mild inverse correlation (-.18); you can predict the numbers in the second table to a mild but very significant degree by multiplying the first table by .16 and flipping the sign.
The correlation tells us that UZR is, on the whole, doing 82% of the job it claims to. However, if you remove the five cases already mentioned (they really are honking outliers on the chart of PA-UZR vs. BABIP), the correlation drops to -.05, so it might be fair to say that UZR, once adjusted for park, is 95% accurate at separating fielding from pitching except for 1 team in about 50, which it gets very wrong for some unknown reason. Obviously, looking at the 5 exceptions in further detail might give some insight into the system's weaknesses.
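For anyone who wants to run the same check on the two tables, here is a small Python sketch. The Pearson function is the standard formula; `separation_share` is just my hypothetical name for the reading used above (1 minus the size of the correlation), not a standard statistic.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def separation_share(r):
    """The post's reading: share of the separating job done is 1 - |r|."""
    return 1 - abs(r)


print(round(separation_share(-0.18), 2))  # 0.82
print(round(separation_share(-0.05), 2))  # 0.95
```

Feed `pearson_r` the park-adjusted UZR column and the staff BABIP skill column, with and without the five outlier seasons, to compare against the -.18 and -.05 quoted above.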
One Last Thing
The year-to-year correlation of PA-UZR is .447. The year-to-year correlation of staff BABIP skill is .440. They are absolutely as reliable as one another, and that means that staff BABIP skill is indeed skill.
The standard deviation of PA-UZR is 38 runs, while staff BABIP skill is 34 runs and the park factor is 24. If you want to assume the park, it's fair to say that BABIP is 55% fielding and 45% pitching (it's actually 53 / 47 but we love round numbers, and the 5 outliers pull the ratio down a tad). Observed BABIP is 40% fielding, 35% pitching, and 25% the park, on the nose. Think of BABIP as Barbie, and fielding is the bust, pitching is the hips, and the ballpark is the waistline.
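The percentages above are each component's share of the summed standard deviations, which is easy to reproduce. A short Python sketch, with a function name of my own choosing; note that it splits by standard deviation, as the post does, rather than by variance.

```python
def sd_shares(sds):
    """Each component's percentage of the summed standard deviations."""
    total = sum(sds.values())
    return {name: 100 * sd / total for name, sd in sds.items()}


park_assumed = sd_shares({"fielding": 38, "pitching": 34})
observed = sd_shares({"fielding": 38, "pitching": 34, "park": 24})

print({k: round(v) for k, v in park_assumed.items()})  # {'fielding': 53, 'pitching': 47}
print({k: round(v) for k, v in observed.items()})      # {'fielding': 40, 'pitching': 35, 'park': 25}
```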
Edited by Eric Van, 24 November 2009 - 09:54 AM.