



Fixing UZR With Park Adjustments? Probably Not


11 replies to this topic

#1 Eric Van


  • fails often, thus succeeds


  • 10,844 posts

Posted 31 October 2009 - 05:01 AM

Baseball Prospectus has a stat they call Park Adjusted Defensive Efficiency (PADE). DE, as you probably know, is almost 1-BABIP; more specifically, it is 1-BABIP if you count a ROE as a H. PADE attempts to adjust DE for the impact of each club's ballpark on DE.

It occurred to me that if PADE were well done (and not all BP stats are), it could be used to figure out whether UZR was accurately park adjusted.

It turns out that PADE, though expressed terribly in their report, is really well constructed. They calculate home and away DE for every team, then calculate a raw park factor for each park, then calculate the expected DE based on each club's schedule (that's the key extra step), then divide actual DE by expected. What they should do is translate this back into a DE by multiplying it by MLB average DE, but instead they just list the percentage difference.
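Here is a minimal Python sketch of the steps just described, mainly to make the order of operations concrete. It is not BP's code: the function names are hypothetical, the games-weighted average over a club's schedule is an assumed weighting, and the example inputs are made up. The second value returned by `park_adjusted_de` is the translation back onto the DE scale suggested above.

```python
# Hypothetical sketch of a schedule-adjusted defensive efficiency (DE),
# following the steps described in the post. Not Baseball Prospectus's code;
# the weighting and all inputs are illustrative assumptions.


def defensive_efficiency(balls_in_play: int, hits: int, roe: int) -> float:
    """DE = 1 - BABIP, counting reached-on-error (ROE) as a hit."""
    return 1.0 - (hits + roe) / balls_in_play


def raw_park_factor(home_de: float, road_de: float) -> float:
    """Home DE over road DE; below 1 means the park is tough on fielders."""
    return home_de / road_de


def schedule_factor(games_by_park: dict[str, int],
                    park_factors: dict[str, float]) -> float:
    """Games-weighted average park factor over one club's schedule."""
    total_games = sum(games_by_park.values())
    return sum(g * park_factors[park] for park, g in games_by_park.items()) / total_games


def park_adjusted_de(actual_de: float, sched_factor: float,
                     mlb_avg_de: float) -> tuple[float, float]:
    """Return (actual / expected DE, that ratio restated as a DE)."""
    expected_de = mlb_avg_de * sched_factor
    ratio = actual_de / expected_de
    return ratio, ratio * mlb_avg_de


# Made-up example: a club with a tough home park and neutral road parks.
home_pf = raw_park_factor(home_de=0.690, road_de=0.711)
factor = schedule_factor({"home": 81, "road": 81}, {"home": home_pf, "road": 1.0})
ratio, pade = park_adjusted_de(actual_de=0.700, sched_factor=factor, mlb_avg_de=0.705)
print(f"schedule factor {factor:.3f}, ratio {ratio:.3f}, PADE {pade:.3f}")
```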

Important edit: it is BP’s PADE which is the source of the error I attributed to UZR.

I took PADE on faith because the methodology is straightforward and they described it well. But it turns out that PADE is wildly overcorrected, by precisely the factor of 4 that I ascribed to UZR. And it is not consistent about it; my preliminary figures for 2009 correlate with theirs only decently.

Lesson: if there is a discrepancy between two metrics, don’t assume that it’s the simple one that must be accurate! (I refrain from making any generalization about the reliability of any one source of analysis ...)

I am in the process of generating my own set of park adjustments for DE, for the 8 years that we’ve had UZR data, at which point I'll redo this. It might take me another month or more, though.

See the post of 11/24 for some interim revised findings.

It's very easy from their data to calculate the number of raw plays made by each team's defense, as well as the plays that were given to or taken away from the defense by the park. For instance, Fenway has been the second-worst park for team defense (only Coors is worse), averaging 35 plays or 32 runs per year. The average play not made costs .92 runs; that figure assumes that a play not made has the observed, average chance of being a 1B, 2B, 3B, ROE, or GDP. It may be a very slight overestimate, in that a disproportionate number of plays not made may well be singles, but I'm not even sure that's the case.
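A small sketch of the plays-to-runs conversion, assuming the .92 runs per play quoted above. The per-outcome probabilities and run values behind that .92 are not given in the post, so the weighted-average helper (a hypothetical name) takes them as inputs rather than guessing them.

```python
# Sketch: converting plays gained or lost into runs.
# RUNS_PER_PLAY is the post's figure; the helper shows how such a figure is
# a probability-weighted average over outcomes, with inputs left to the
# reader (they are not given in the post).

RUNS_PER_PLAY = 0.92


def run_value_of_missed_play(outcome_probs: dict[str, float],
                             outcome_runs: dict[str, float]) -> float:
    """Weighted average run cost of a play not made."""
    if abs(sum(outcome_probs.values()) - 1.0) > 1e-6:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(p * outcome_runs[event] for event, p in outcome_probs.items())


def plays_to_runs(plays: float, runs_per_play: float = RUNS_PER_PLAY) -> float:
    """Runs corresponding to a number of plays made or not made."""
    return plays * runs_per_play


# Fenway: about 35 plays per year taken away from the defense.
print(round(plays_to_runs(35)))  # 32
```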

Our next step is to think about the data we have at hand.

Team UZR purports to be a measure of defense only, properly adjusted for park. It may actually be a measure of team defense, tainted by park factor. What we can be fairly certain of is that it is not greatly tainted by the pitching staff's BABIP skill (yes, skill, as we'll demonstrate several different ways before we're done).

PADE, converted into plays and then runs, measures Team Defense + Staff BABIP Skill. This number is as accurate as we can make it.

We also have the original DE which is PADE + Ballpark Factor.

And finally, we have that Ballpark Factor, and that's damn accurate, too.

Well, the first thing we think of is this. If UZR is correctly park adjusted, it should correlate much better to PADE than it does to the unadjusted DE.

It doesn't. In fact, it correlates more strongly to DE (.61) than it does to PADE (.57), which indicates that it's leaving a majority of the park adjustment on the table.

How much, exactly?

What we can do is calculate what UZR thinks is the Staff BABIP Skill. We know that PADE = Defense + BABIP Skill. So what UZR thinks is the true value of the team's BABIP skill is PADE - UZR.

If UZR had no park error, this estimate of staff BABIP skill would not correlate with our very reliably calculated Park Factors. But it does so, enormously (r = .47, p = 10^-15). In fact, the best predictor of what UZR thinks is staff BABIP skill is .751 * Park Factor. Which is an awful lot.
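The two steps above (implied skill = PADE - UZR, then a fit against park factor) can be sketched as follows. This assumes the .751 comes from a least-squares fit with no intercept, which is one reading of "the best predictor ... is .751 * Park Factor"; all values are in runs per team-season, and the sample series are hypothetical placeholders rather than the post's data.

```python
# Sketch: what UZR implies about staff BABIP skill, and how much of it
# tracks the park. One value per team-season, in runs.
# The sample numbers are hypothetical placeholders, not the post's data.


def implied_staff_skill(pade_runs: list[float], uzr_runs: list[float]) -> list[float]:
    """PADE = Defense + Staff BABIP Skill, so UZR implies skill = PADE - UZR."""
    return [p - u for p, u in zip(pade_runs, uzr_runs)]


def slope_through_origin(x: list[float], y: list[float]) -> float:
    """Least-squares b minimizing sum((y - b*x)**2), with no intercept."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


pade = [20.0, -35.0, 10.0, 5.0]
uzr = [5.0, -20.0, -12.0, 8.0]
park = [18.0, -22.0, 30.0, -4.0]  # positive = park costs the defense (assumed sign)
skill = implied_staff_skill(pade, uzr)
print("implied skill:", skill)
print("slope vs park factor:", round(slope_through_origin(park, skill), 3))
```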

If we know that 75% of what UZR thinks is the team BABIP skill (which is to say, all of DE that isn't fielding) is actually the park adjustment it's missing, we can use the park factor to correct its estimate of staff BABIP skill, and then put the pieces back together to get a park-adjusted UZR.

As you would expect from the numbers so far, it turns out that UZR for the Sox has been low by an average of 32 * .75 = 24 runs per year. That is, 24 runs of what UZR thinks is bad fielding is actually the impact of Fenway that it was failing to measure. And UZR's park adjustment is exactly 25% of what it ought to be.

Since team UZR is the work of 7 fielders, you can convert this last number to UZR/150 per fielder by dividing by 7 and multiplying by 150/162. That works out to 3.1 runs per fielder.
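The correction and the per-fielder conversion reduce to a few lines of arithmetic. A sketch, assuming park factors are expressed in runs per season with positive values meaning the park costs the defense (Fenway is about +32); `MISSING_SHARE` is the post's .751, rounded, and the function names are hypothetical.

```python
# Sketch of the park correction to team UZR. The sign convention (park
# factor in runs, positive when the park costs the defense) is an assumption.

MISSING_SHARE = 0.75  # share of the park factor that UZR fails to remove
FIELDERS = 7
GAMES = 162


def missing_park_runs(park_factor_runs: float, share: float = MISSING_SHARE) -> float:
    """Runs of park effect that team UZR fails to remove."""
    return share * park_factor_runs


def park_adjusted_uzr(team_uzr: float, park_factor_runs: float,
                      share: float = MISSING_SHARE) -> float:
    """Team UZR with the missing park adjustment added back."""
    return team_uzr + missing_park_runs(park_factor_runs, share)


def per_fielder_uzr150(team_runs: float, fielders: int = FIELDERS) -> float:
    """Spread team runs over the fielders and rescale to 150 games."""
    return team_runs / fielders * 150 / GAMES


fenway_missing = missing_park_runs(32)  # Fenway: 32 runs per year
print(fenway_missing)  # 24.0
print(round(per_fielder_uzr150(fenway_missing), 2))
# Factor that turns a park factor in runs into a per-fielder UZR/150 error:
print(round(MISSING_SHARE / FIELDERS * 150 / GAMES, 3))  # 0.099
```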

Not every ballpark is as consistent as Fenway. Here are the UZR PAF errors (per fielder, in UZR/150 terms) for each team since the dawn of UZR, which, by a miracle of mathematical coincidence, are almost exactly one tenth of the park factor (.75 / 7 * 150 / 162 = .099). So it is also a table of the park factors -- just move the decimal point.

UZR Missing Park Adjustments (UZR/150 runs per fielder)
Tm 2002 2003 2004 2005 2006 2007 2008 2009
BAL -4.4 -4.7 -2.5 -1.8 -0.6 -0.5 0.5 1.6
BOS 6.8 3.2 3.5 0.6 1.5 2.9 4.2 3.1
CHA -1.4 -1.6 0.2 1.3 1.4 1.8 0.2 0.7
CLE 1.5 1.8 1.9 0.1 0.2 1.8 3.8 4.7
DET 1.5 -0.4 -1.3 -0.9 1.3 2.0 2.3 -0.1
KCA 6.5 5.3 1.4 -0.2 0.2 1.3 1.2 3.1
LAA 2.7 3.0 1.8 0.8 0.7 0.2 0.7 -0.3
MIN 3.3 2.7 1.8 2.3 -0.3 -2.5 -4.2 -2.3
NYA -2.2 -1.8 -0.7 1.5 -0.1 0.6 -1.8 -1.9
OAK -2.4 -3.2 -3.4 -3.4 -1.2 -2.0 -3.2 -3.3
SEA -1.8 -1.9 -1.8 -2.5 -2.1 -1.4 -1.9 -0.3
TBA -0.5 -0.2 -2.4 -1.5 -3.2 -1.8 -2.7 -1.9
TEX 1.1 3.6 3.0 1.1 -0.7 -1.2 0.3 0.9
TOR 0.8 2.6 3.0 4.3 1.3 -2.4 -2.6 -3.3
ARI 2.4 2.2 1.0 0.2 1.6 2.2 1.2 1.2
ATL 1.8 0.0 -1.1 -0.9 0.5 0.0 1.2 0.0
CHN -5.2 -1.8 -0.6 0.7 -0.5 1.3 2.4 3.5
CIN 1.7 0.8 -1.6 -1.8 -1.9 -0.9 -2.0 -2.4
COL 5.9 3.2 4.3 5.0 5.8 4.4 3.2 3.5
FLO -2.2 -2.2 -2.4 -2.4 -0.9 1.1 3.1 4.4
HOU 1.5 -0.2 -0.9 -2.4 -1.0 -1.0 0.7 0.6
LAN -4.1 -2.9 -2.3 -2.8 -1.1 0.5 0.0 -0.7
MIL -1.3 -1.8 -2.4 -3.5 -2.7 -4.2 -1.4 -1.7
NYN -0.7 0.7 2.4 2.9 1.7 -0.1 -2.4 -4.2
PHI -2.5 -3.5 -2.8 1.3 2.1 2.2 0.9 1.1
PIT -0.9 1.7 2.6 3.0 1.5 0.5 -1.1 -2.4
SDN -2.8 -1.8 -1.5 -1.9 -1.9 -3.3 -2.2 -4.3
SFN -1.3 1.5 4.2 3.8 2.4 1.8 2.9 3.0
SLN -2.0 -2.7 -2.0 0.3 0.1 0.2 -1.9 -0.3
MO/WA -1.2 -0.1 -1.0 -2.7 -3.9 -3.3 -1.7 -2.8


The extra, cool data.

Oh, yeah. Along the way we also calculated the actual team defense for each year and the team BABIP skill, using the assumption that UZR is correct except for the missing park adjustment. Here they are.

True Team Defense (PA UZR)
Tm 2002 2003 2004 2005 2006 2007 2008 2009
BAL -19 -29 -26 -17 -4 -10 11 -14
BOS 65 1 -8 10 -2 46 74 7
CHA -22 9 33 49 -19 -32 -22 -27
CLE -25 42 13 36 -27 3 41 2
DET -32 -46 -37 9 48 47 -22 44
KCA 31 7 -11 -54 60 83 19 -26
LAA 62 61 6 30 -9 0 6 9
MIN 75 40 9 47 17 -12 -49 -54
NYA -65 -76 -82 -126 -74 -4 -58 -32
OAK -36 -59 -27 3 -9 -19 18 -22
SEA 0 48 10 -17 12 -54 -35 83
TBA 21 50 49 -17 -25 -71 53 55
TEX -6 22 33 -28 -32 -16 -49 40
TOR 12 -41 61 79 47 13 1 -59
ARI 18 -16 -10 -42 -1 23 -18 31
ATL 34 20 20 48 14 40 -6 -23
CHN -70 -20 22 17 16 53 29 7
CIN 12 -10 -37 -81 -40 -48 -58 35
COL 72 30 13 26 50 45 -11 14
FLO -43 -24 -28 -59 -18 -30 21 14
HOU 3 42 5 2 36 -43 40 -12
LAN -2 -28 42 -17 -51 -44 -48 -4
MIL -27 -85 -34 -27 -19 -11 3 -5
NYN -4 -29 -5 -3 23 6 9 -80
PHI -1 11 -5 67 28 72 80 38
PIT 15 14 -9 26 -42 -5 -35 11
SDN -60 1 -21 -41 19 -53 -47 -46
SFN 18 70 32 53 43 35 38 76
SLN 28 -2 4 19 15 11 15 -19
MO/WA -48 3 -7 12 -53 -22 -4 -48


Staff BABIP Skill (According to UZR)
Tm 2002 2003 2004 2005 2006 2007 2008 2009
BAL 6 -61 -21 -5 -37 3 -15 -10
BOS 29 -11 46 -46 -19 52 2 -24
CHA 47 20 -20 39 55 24 6 29
CLE -43 -9 -28 26 -28 9 -21 2
DET -19 -4 -7 -5 19 -12 23 -31
KCA 3 28 -45 -67 -112 -84 -9 -8
LAA 57 1 -10 6 29 -31 7 -22
MIN -6 7 -12 9 -31 -15 -8 29
NYA 13 -2 53 116 100 17 7 34
OAK 43 98 13 42 -2 18 -13 -45
SEA 13 35 3 21 -31 -20 -19 -3
TBA -62 1 -37 -45 -77 -73 -4 -56
TEX -13 -67 -24 -17 -11 -23 -35 0
TOR -22 11 -27 -11 -17 30 26 -11
ARI 9 22 -25 -2 2 21 13 -37
ATL 46 -4 -44 -60 -16 3 32 5
CHN -20 -11 -10 9 26 27 51 64
CIN -9 -9 8 -10 -6 -1 -31 -4
COL -31 -68 -22 -72 -11 57 -9 12
FLO -8 -11 42 -25 -2 -68 21 5
HOU -25 -6 -38 17 3 23 -2 -42
LAN 41 35 22 10 8 36 50 83
MIL 16 19 8 0 -14 -73 13 -26
NYN -9 10 41 56 52 48 -3 44
PHI 15 -16 25 -14 -34 -55 -48 -16
PIT -38 -11 10 4 -3 -47 -43 -36
SDN -47 -17 13 6 60 76 48 12
SFN 7 5 21 -2 32 23 -30 13
SLN 6 -5 51 21 20 -10 -14 27
MO/WA -10 -5 8 -26 28 34 -18 -2


How Accurate is UZR Even After the Missing Park Adjustment?

Now, the first thing that jumps out at you is that there's no way the 2005-6 New York Yankees were both the worst-fielding and the best BABIP-pitching team in recent memory. They were certainly bad at the former and good at the latter, but the size of the numbers suggests that their UZR for those years was low, maybe way too low, and thus the data is giving their pitchers undeserved credit and their fielders too much blame.

Equally suspicious are the '06-'07 Royals, who are the opposite (crazy good fielding, bad BABIP pitching). The '03 A's, another crazy bad-fielding, good-pitching team like the Yankees, are also suspect.

If UZR were doing a perfect job of separating fielding from BABIP skill (which is precisely what it is attempting to do), these two tables would not correlate at all. In fact, they have a mild inverse correlation (-.18); you can predict the numbers in the second table to a mild but very significant degree by multiplying the first table by .16 and flipping the sign.

The correlation tells us that UZR is, on the whole, doing 82% of the job it claims to. However, if you remove the five cases already mentioned (they really are honking outliers on the chart of PA-UZR vs. BABIP), the correlation drops to -.05, so it might be fair to say that UZR, once adjusted for park, is 95% accurate at separating fielding from pitching except for 1 team in about 50, which it gets very wrong for some unknown reason. Obviously, looking at the 5 exceptions in further detail might give some insight into the system's weaknesses.
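For readers who want to rerun this kind of check, here is a plain-Python sketch of a pooled correlation, the least-squares slope, and outlier removal by index. The data in the usage lines is made up, and the helper names are hypothetical. One consistency check on the numbers in the post: a least-squares slope equals r times the ratio of the standard deviations, and -.18 * 34 / 38 = -.16, using the standard deviations quoted below for staff BABIP skill (34 runs) and PA-UZR (38 runs).

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Plain Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ols_slope(x: list[float], y: list[float]) -> float:
    """Slope of the ordinary least-squares line y = a + b*x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def drop(values: list[float], indices: set[int]) -> list[float]:
    """Remove flagged team-seasons (e.g. suspected outliers) by position."""
    return [v for i, v in enumerate(values) if i not in indices]


# Made-up pooled team-seasons: PA-UZR and UZR-implied staff BABIP skill (runs).
defense = [-40.0, 30.0, 10.0, -5.0, 15.0, -60.0]
skill = [35.0, -25.0, 5.0, 0.0, -10.0, 20.0]
outliers = {0, 5}  # team-seasons flagged as suspect
print(round(pearson_r(defense, skill), 2), round(ols_slope(defense, skill), 2))
print(round(pearson_r(drop(defense, outliers), drop(skill, outliers)), 2))
```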

One Last Thing

The year-to-year correlation of PA-UZR is .447. The year-to-year correlation of staff BABIP skill is .440. They are absolutely as reliable as one another, and that means that staff BABIP skill is indeed skill.
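Year-to-year correlations like these come from pairing each team-season with the same team's following season. A sketch, assuming values are keyed by (team, year) and that Montreal and Washington share one key (MO/WA), as in the tables; the helper names and sample values are hypothetical.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Plain Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def consecutive_pairs(values: dict[tuple[str, int], float]) -> list[tuple[float, float]]:
    """Pair each team-season with the same team's next season, when it exists."""
    pairs = []
    for (team, year), value in sorted(values.items()):
        next_value = values.get((team, year + 1))
        if next_value is not None:
            pairs.append((value, next_value))
    return pairs


def year_to_year_r(values: dict[tuple[str, int], float]) -> float:
    """Correlation between season t and season t + 1 across all teams."""
    pairs = consecutive_pairs(values)
    return pearson_r([a for a, _ in pairs], [b for _, b in pairs])


# Hypothetical usage with made-up values (runs):
pa_uzr = {("AAA", 2002): 10.0, ("AAA", 2003): 15.0, ("AAA", 2004): 5.0,
          ("BBB", 2002): -20.0, ("BBB", 2003): -10.0, ("BBB", 2004): -12.0}
print(round(year_to_year_r(pa_uzr), 3))
```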

The standard deviation of PA-UZR is 38 runs, while that of staff BABIP skill is 34 runs and that of the park factor is 24. If you set the park aside, it's fair to say that BABIP is 55% fielding and 45% pitching (it's actually 53 / 47, but we love round numbers, and the 5 outliers pull the ratio down a tad). Observed BABIP is 40% fielding, 35% pitching, and 25% the park, on the nose (38, 34, and 24 out of a total of 96). Think of BABIP as Barbie, and fielding is the bust, pitching is the hips, and the ballpark is the waistline.
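The percentages above follow from splitting in proportion to the standard deviations themselves, not the variances (splitting by variance would square each term and give different shares). A sketch that reproduces the figures; `sd_shares` is a hypothetical helper name.

```python
def sd_shares(sds: dict[str, float]) -> dict[str, float]:
    """Split in proportion to standard deviations, as the figures above do."""
    total = sum(sds.values())
    return {name: sd / total for name, sd in sds.items()}


observed = sd_shares({"fielding": 38, "pitching": 34, "park": 24})
no_park = sd_shares({"fielding": 38, "pitching": 34})
print({k: round(v, 2) for k, v in observed.items()})
# {'fielding': 0.4, 'pitching': 0.35, 'park': 0.25}
print({k: round(v, 2) for k, v in no_park.items()})
# {'fielding': 0.53, 'pitching': 0.47}
```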

Edited by Eric Van, 24 November 2009 - 09:54 AM.


#2 Eric Van


  • fails often, thus succeeds


  • 10,844 posts

Posted 31 October 2009 - 07:29 PM

The year to year correlation of UZR/150 for players who stayed with their team and played 800 innings or more in both seasons (you can go that low without any additional noise) is .53. For players who changed teams, it's .22. With a minimum of 400 innings in both seasons, it's .50 for the players who stayed with their team and .15 for players who changed teams. That's a pretty firm confirmation that UZR is still heavily biased by park effects.
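The grouping behind these numbers (keeping players above an innings minimum in both seasons, then splitting by whether they changed teams) might look like this. The `SeasonPair` record layout and the function names are hypothetical, not from any particular data source.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class SeasonPair:
    """One player's UZR/150 in two seasons (hypothetical record layout)."""
    uzr150_first: float
    uzr150_second: float
    innings_first: float
    innings_second: float
    changed_team: bool


def pearson_r(x: list[float], y: list[float]) -> float:
    """Plain Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def group_r(pairs: list[SeasonPair], min_innings: float,
            changed_team: bool) -> Optional[float]:
    """Correlation for one group, keeping players above the minimum in BOTH seasons."""
    kept = [p for p in pairs
            if p.changed_team == changed_team
            and min(p.innings_first, p.innings_second) >= min_innings]
    if len(kept) < 3:
        return None  # too few players to say anything
    return pearson_r([p.uzr150_first for p in kept],
                     [p.uzr150_second for p in kept])
```

Comparing `group_r(pairs, 800, changed_team=False)` with `group_r(pairs, 800, changed_team=True)`, and then the same at 400 innings, gives the four kinds of numbers quoted above.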

The next step is seeing if adding my park adjustments improves the correlation for the players who changed teams (it should) and for the players who didn't (if it does, it means that 1-year park factors are meaningful; if it doesn't, it means they're noisy and we should try 3-year factors). And I'm going to see if giving more of the PAF to the OF than the INF produces a better year-to-year correlation.

#3 SumnerH


  • Malt Liquor Picker


  • 9,177 posts

Posted 31 October 2009 - 07:42 PM

QUOTE (Eric Van @ Oct 31 2009, 08:29 PM)
The year to year correlation of UZR/150 for players who stayed with their team and played 800 innings or more in both seasons (you can go that low without any additional noise) is .53. For players who changed teams, it's .22. With a minimum of 400 innings in both seasons, it's .50 for the players who stayed with their team and .15 for players who changed teams. That's a pretty firm confirmation that UZR is still heavily biased by park effects.


Is there a need to distinguish park effects vs. team defense effects? For instance, would a CF who went from one stadium of very similar dimensions/effects to another possibly see his numbers skewed if he was playing next to Manny and Adam Dunn on the first team and next to Willie Mays and Andruw Jones on the second?

#4 Eric Van


  • fails often, thus succeeds


  • 10,844 posts

Posted 01 November 2009 - 03:30 AM

QUOTE (SumnerH @ Oct 31 2009, 07:42 PM)
Is there a need to distinguish park effects vs. team defense effects? For instance, would a CF who went from one stadium of very similar dimensions/effects to another possibly see his numbers skewed if he was playing next to Manny and Adam Dunn on the first team and next to Willie Mays and Andruw Jones on the second?

Really good point -- I was already worrying about the extent to which non-park aspects of changing teams might add further variance -- but I think the bigger factor would be the scouting and coaching of positioning.

Looking at the correlations of the park-adjusted data will help separate the park and non-park factors, and we may also be able to identify teams that are good and bad at positioning their fielders. That will be a real challenge, though, with just 150 guys who changed teams from one year to the next and had 400+ innings with both. (There are another 75 guys traded mid-season with 800+ innings -- I've just decided to look each of them up and put their data in manually if they have 400 innings with both clubs.)

I'm also about to look at the players who stayed with their team but moved to a newly constructed park. The problem with that, though, is that a lot of the new stadiums have been built similar to their predecessors. And again it's a very small sample.

#5 Eric Van


  • fails often, thus succeeds


  • 10,844 posts

Posted 02 November 2009 - 08:16 AM

I'm continuing to work on this. What I've found in the last few hours is:

1) Adding in the PAFs I calculated does improve the year-to-year correlation of UZR, but only for players who did not change parks. The improvement is modest -- only 3% -- but I think it's very unlikely to be random. So the PAFs smooth the year-to-year variation for guys staying in the same park.

2) The correlations for guys changing parks (in the same year or in consecutive years) are much worse to begin with. For instance, OF with a minimum of 600 innings played in each park have an inverse correlation! There are only 235 seasons with a minimum of 400 innings in each park, and the PAFs actually make no difference for OF (correlation .15 either way) and make the INF somewhat worse (.19 vs. .23). While I'm disappointed by this finding, the evidence is that changing parks really does scramble fielding performance, especially in the OF.

3) While I thought at first that you could assign all the PAF to OF play, that's not true. Assigning the PAF other than equally tends to make the year-to-year correlations weaker no matter where you assign it. It is true that assigning it all to the OF makes the correlation stronger than assigning it all to the INF, and this may suggest that the PAF is more due to OF play, but I think the safest bet is to stick with an equal distribution.

So I no longer think that Jason Bay was -1.4 UZR last year, but I still have him at -5.6 rather than -8.7.

4) I thought another interesting approach would be to look at players who came to or left Fenway and what happened to them. The average INF who switched clubs to or from the Sox (minimum 400 innings for both teams) was 1.0 R/150 better with the Sox unadjusted, 3.7 adjusted, while the average OF (only 6 of 'em: Damon, Crisp, Manny, Drew, Bay, Nixon) was -2.9 without the adjustment and -0.6 with. The average player in general was 2.5 runs better with the Sox, but that's biased a bit high because there are more INF than OF. It turns out that if 85% of the Sox PAF is the OF and 15% is the INF, you get an equal improvement for INF and OF and an average improvement of 1.7 runs.

If this is correct, Bay was -2.4 last year, not -5.6. I'm not sure it is correct; it's based on tiny sample sizes and highly irregular data.

Note that the coaching staff deserves credit for the average 1.7 improvement, if it isn't all noise. Multiplied by 7, it suggests that defensive positioning has been worth about +12 runs or 1.2 wins per season. It's based on 34,000 total innings, but there's huge variation among players; the odds of getting the average (unweighted) improvement per player randomly are 54%. Believe it if you want!
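One way to reconstruct the 85 / 15 split is to solve for the OF share that equalizes the INF and OF improvements. This is a sketch under stated assumptions: "adjusted" means adjusted with the PAF split equally among the 7 fielders, and moving a fraction f of the team PAF to the 3 outfielders scales each OF's adjustment by 7f/3 (and each INF's by 7(1 - f)/4). Under those assumptions it lands on roughly 85% and 1.7 runs; the 10 runs per win is simply the ratio implied by "+12 runs or 1.2 wins", and the function name is hypothetical.

```python
RUNS_PER_WIN = 10.0  # implied by "+12 runs or 1.2 wins"


def solve_of_share(inf_raw: float, inf_adj: float, of_raw: float, of_adj: float,
                   n_inf: int = 4, n_of: int = 3) -> tuple[float, float]:
    """Return (OF share of the PAF, resulting per-player improvement).

    *_raw / *_adj: average R/150 improvement with the Sox, unadjusted and
    adjusted with the PAF split equally among all fielders (assumption).
    """
    n = n_inf + n_of
    inf_full = (inf_adj - inf_raw) * n / n_inf  # INF gain if the INF got all the PAF
    of_full = (of_adj - of_raw) * n / n_of      # OF gain if the OF got all the PAF
    # Solve inf_raw + inf_full * (1 - f) == of_raw + of_full * f for f.
    f = (inf_raw + inf_full - of_raw) / (inf_full + of_full)
    return f, of_raw + of_full * f


share, gain = solve_of_share(inf_raw=1.0, inf_adj=3.7, of_raw=-2.9, of_adj=-0.6)
team_runs = gain * 7
print(f"OF share {share:.0%}, gain {gain:.1f} R/150 per player, "
      f"{team_runs:.0f} runs = {team_runs / RUNS_PER_WIN:.1f} wins per season")
# OF share 85%, gain 1.7 R/150 per player, 12 runs = 1.2 wins per season
```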

Edited by Eric Van, 02 November 2009 - 01:33 PM.


#6 Eric Van


  • fails often, thus succeeds


  • 10,844 posts

Posted 02 November 2009 - 02:03 PM

QUOTE (SumnerH @ Oct 31 2009, 07:42 PM)
Is there a need to distinguish park effects vs. team defense effects? For instance, would a CF who went from one stadium of very similar dimensions/effects to another possibly see his numbers skewed if he was playing next to Manny and Adam Dunn on the first team and next to Willie Mays and Andruw Jones on the second?

Now that I've got the numbers, there's no evidence that changing teams has any effect greater than changing parks.

Here are the year-to-year UZR correlations for players doing different things:

Year-to-Year UZR Correlation
What                                750 Inn  400 Inn
Same team & park, consec years         0.54     0.50
Same team, new park, consec years     -0.27    -0.02
Different team, same year               N/A     0.43
Different team, consec yrs             0.23     0.13


You can see that players traded mid-season have a decent correlation (.43), even with a low innings minimum, despite the change in parks. Players traded in the off-season have a dramatically lower correlation. That tells us that the year-to-year variation is large, even larger than the effect of changing parks.

The very small sample of guys who moved from one park to a new park while with the same team (including the Expos who moved to Washington) have inverse correlations! That is, they correlated less well than guys who changed both park and team. So there is no evidence for a separate team effect. The overall correlation for guys changing parks from one year to the next (whether or not they change teams) is .15 at 750 innings, .12 at 400.

Oh, and the park factors actually improve / smooth the data for guys traded mid-season, by 6%. It's just the already really uncorrelated data of the guys who have changed both park and year that they fail to improve.


#7 Eric Van


  • fails often, thus succeeds


  • 10,844 posts

Posted 03 November 2009 - 10:40 AM

I've satisfied myself that the missing PAF is equally distributed among all fielders.* Which would make sense if it is a result of a systematic, general miscalculation in the metric.

(*Adding in UZR of INF to the estimate of BABIP makes it correlate somewhat more strongly with Park Factor, and adding in UZR of OF makes it correlate somewhat more weakly, which is what you'd expect if the 4 infielders were somewhat more responsible for the park adjustment error than the 3 OF. If the error were disproportionately with the OF, the opposite would have happened.)

Here are year to year correlations of UZR, raw and adjusted:

Year-to-Year UZR Correlations (raw and park-adjusted)
OF                           750 Inn  400 Inn  750 Adj  400 Adj
Same park, next year             .61      .54      .62      .55
Different park, same year        N/A      .40      N/A      .48
Different park, next year        .13      .06      .10      .06
INF                          750 Inn  400 Inn  750 Adj  400 Adj
Same park, next year             .45      .45      .46      .47
Different park, same year        N/A      .59      N/A      .58
Different park, next year        .17      .17      .09      .14


It does bother me that the park factors make the correlations for the players who changed parks from one year to the next worse, not better. But the unadjusted data makes even less sense! It's clear from the much larger sample of players who did not change teams that OF data is more consistent than INF data (which is a surprise at first, but not if you think about it. In general, the numbers for OF are higher, and really bad defensive INF tend to lose their jobs while bad OF get to keep theirs if they hit. So the data set for OF includes more very high or very low pairs). But for players who change teams, the INF show more consistency.

We can estimate the separate impact of changing park and changing year by subtracting relevant correlations. For instance, to estimate the importance of park, subtract "different park, next year" from "same park, next year." Here's what you get (using the minimum-400 inning data since there's no 750-inning data for players traded mid-season):

Estimates of Effect Sizes on UZR Consistency
Effect of      Raw   Adj
Park-OF        .48   .49
Park-INF       .28   .33
Year-OF        .34   .42
Year-INF       .42   .44


Changing parks scrambles OF data much more than INF data, which makes sense.
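The subtraction behind that table can be written out directly. The post spells out the park effect (same park, next year minus different park, next year); the year effect appears to be different park, same year minus different park, next year, since that reproduces every entry. The inputs are the 400-inning correlations from the post; the names are hypothetical.

```python
# 400-inning year-to-year UZR correlations from the post: (raw, adjusted).
CORRELATIONS: dict[str, dict[str, tuple[float, float]]] = {
    "OF": {"same_park_next_year": (0.54, 0.55),
           "diff_park_same_year": (0.40, 0.48),
           "diff_park_next_year": (0.06, 0.06)},
    "INF": {"same_park_next_year": (0.45, 0.47),
            "diff_park_same_year": (0.59, 0.58),
            "diff_park_next_year": (0.17, 0.14)},
}


def effect_sizes(corr: dict[str, dict[str, tuple[float, float]]]) -> dict[tuple[str, str], float]:
    """Park and year effects as differences from 'different park, next year'."""
    effects = {}
    for pos, c in corr.items():
        for i, label in enumerate(("Raw", "Adj")):
            baseline = c["diff_park_next_year"][i]
            effects[(f"Park-{pos}", label)] = round(c["same_park_next_year"][i] - baseline, 2)
            effects[(f"Year-{pos}", label)] = round(c["diff_park_same_year"][i] - baseline, 2)
    return effects


for (effect, label), value in sorted(effect_sizes(CORRELATIONS).items()):
    print(effect, label, value)
```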

The estimates of the impact of changing years are really interesting. It's an important finding that guys who are traded mid-season have a much stronger correlation than guys traded in the off-season. That makes our estimate of the impact of changing years quite large. In the raw data, it was larger for INF than OF, which I can't make sense of. The adjusted data equalizes this, which may well be good. In general, the disruption caused by changing years is about equal to that of changing parks.

This finding of the real variation (not just statistical noise) between fielders from season to season is consistent with another very interesting finding which has been implicit in this whole thread. Players with as few as 750 innings per year (in the year with fewer innings) correlate just as strongly from year-to-year as players with 1200 or more. And you can go down to 400 innings before you start to see a significant degradation of the correlation.

This flies in the face of the conventional wisdom that you need three years of fielding data to get a handle on a player -- in fact, I'm not sure that your projection based on 3 years of data would be much better than a projection based on 400 innings! That's because extreme changes in performance are real and not uncommon and will muck up any projection based on any amount of data. (I'm going to redo my projection algorithm, but the last time I looked there was no correlation going back more than 2 years, anyway.) All these assertions about fielding stats appear to have been based on comparing the UZR year-to-year correlation with that of other stats, on the assumption that the amount of real variation was similar.* It appears as if this is not the case. There appear to be many more fielders who really do collapse like Lowell this year or really do have career years like Coco in '07 than there are hitters who have comparable swings in performance. Fielders are tough to project not so much because the data is noisy but because they (not just their numbers) really are unpredictable.

Edit: *There have also been comparisons of the number of fielding chances in a season with the number of PAs, but that's a meaningless, eggs to oranges comparison. The easiest possible ball to hit (fastball right down the middle) actually gets hit for a HR a surprisingly low percentage of the time. The easiest possible ball to field gets turned into an out 99%+ of the time. The odds of success in fielding vary predictably enough with apparent degree of difficulty that Dave Pinto has had nice success modeling fielding probabilistically rather than with discrete zones; the relationship of success in hitting in a given PA with apparent degree of difficulty is profoundly more complex and vastly more unpredictable (this dichotomy is one of the things that makes the game fascinating; an extraordinarily unpredictable event is immediately followed by a much more predictable one). One fielding chance tells us vastly more about a fielder than one PA does for a hitter.

Edited by Eric Van, 03 November 2009 - 10:00 PM.


#8 Eric Van


  • fails often, thus succeeds


  • 10,844 posts

Posted 03 November 2009 - 09:36 PM

The final clincher on this is also the final clincher (as if we need another) that BABIP is a skill.

Something I discovered a few years ago but have only alluded to is that team BABIP allowed is not as random as it seems (that is, purely a matter of defense, ballpark, and luck); it is a function of other pitching rates. The better the K and BB rates, the lower the BABIP allowed. But the correlations, though significant, were not profoundly so.

The correlation of straight DE to pitching stats (K% and BB%) is .26, p = .0003. BB% is 18% more important than K rate in predicting DE. There is a trend towards HR / Contact being predictive, but it's not close to being significant.

There is a trend (p=.11) for staffs to have higher K rates in easy defensive parks. I'll have to mull that one over. But it may be the reason why PADE correlates more weakly to pitching stats (r =.20, p =.008) than unadjusted DE does.

What we're really looking to do, though, is to subtract UZR from DE to get an estimate of true staff BABIP skill, and see how that correlates to pitching stats.

Using original, unadjusted UZR to estimate true BABIP, we get a very nice improvement in the correlation, and HRC (HR per contact) suddenly becomes significant. The correlation is now .30, p = 0.00004. The only odd thing is that the significance of the K% factor is weak, at p = .11 (BB% is .006, HRC is .004).

The big test: will using my Park Adjusted-UZR to get a True BABIP Skill improve this correlation? Yes, it does, in spades. The correlation is now up to .36, p = 0.0000004 (1 chance in 2.532 million of being random). And all three factors are terrifically significant with separate p's of .006, .002, and .002.

That tells me that my park adjustments to UZR really do work.

The formula to estimate the number of runs saved by staff BABIP skill is 324.6 * K% - 685.5 * BB% - 1431.3 * HRC + 51.2. I'll next convert this to actual BABIP -- which will yield a formula that will tell us the BABIP you'd expect from an individual pitcher given his K and BB rates and his rate of HR allowed. Which will be cool beans.
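The fitted formula can be written as a function. A sketch, assuming K%, BB%, and HRC (HR per contact) are entered as fractions (0.18, not 18), which is the scale that keeps the output in the tens of runs seen in the Staff BABIP Skill table; the post doesn't define the exact denominators, so treat those as assumptions too. The function name and the sample staff are hypothetical.

```python
def staff_babip_runs_saved(k_rate: float, bb_rate: float, hr_per_contact: float) -> float:
    """Runs saved by staff BABIP skill, per the post's fitted formula.

    Assumes all three rates are fractions (e.g. 0.18, not 18).
    """
    return 324.6 * k_rate - 685.5 * bb_rate - 1431.3 * hr_per_contact + 51.2


# Hypothetical staff: 18% K, 8% BB, 3.5% HR per contact.
print(round(staff_babip_runs_saved(0.18, 0.08, 0.035), 1))  # 4.7
```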

You will note that the profound correlation between staff HR rate allowed and our highly sophisticated estimate of staff BABIP allowed completely flies in the face of various trendy pitching theories. If HR/FB were a fixed constant, you'd never see this, because that would mean that fly ball pitchers give up a higher BABIP when, in fact, the batting average on fly balls is much lower than the BA on ground balls. What's actually being demonstrated by this correlation is that bad pitchers give up both a higher BABIP and a higher HR per Contact (as well as striking out fewer and walking more batters).





#9 philly sox fan


  • SoSH Member


  • 9,741 posts

Posted 03 November 2009 - 09:53 PM

I have to say that this thread has been entertaining on a few different levels.

My first impression - and one that subsequent posts haven't really changed - is that everything you (and excel) are spitting out is forced from a foundation that we know not to be true. One year park factors, one year BABIP data, one year UZR data - none of these things would be used individually. There are strong reasons that they would not. But, but, but if you ignore those things and assume some others and keep grinding through excel functions you can generate r values and p values that make it seem like you're onto something.

Over at the Book blog there's a link to a post combining UZR and the Fan Scouting Report. So when mgl ranted thusly:

QUOTE
As I have always said, if you EVER see me quote a one year number in the context of discussing someone’s current true talent or future value/performance, just shoot me on the spot. I’d rather quote nothing to tell you the truth (just in principle). And, at the very least, if you absolutely must quote a one-year number for anything, it is meaningless unless it is regressed given that one year sample. That is the thing that doubly pisses me off when they do that (quote one year numbers).


I couldn't help but think of this thread.

You should run this work by mgl. Did we ever have a chat with him? Not sure if he can post here or not, but certainly would be fun to read the exchange. For one thing, I'm pretty sure that mgl would strongly disagree with the way you think you've derived BABIP skill from one year of UZR data. And that connection - simplifying assumption really - gets repeated a lot in this thread.

In general, I always think it's hugely important to spend more time thinking about why you're wrong than why you (and those frisky excel r values) are unlocking pent up secrets. I'd like to see some more of that before moving onto the next great discovery.

Are you waiting to get someplace down the road before opening things up to wider peer review?

#10 Eric Van


  • fails often, thus succeeds


  • 10,844 posts

Posted 03 November 2009 - 10:58 PM

QUOTE (philly sox fan @ Nov 3 2009, 09:53 PM)
I have to say that this thread has been entertaining on a few different levels.

My first impression - and one that subsequent posts haven't really changed - is that everything you (and excel) are spitting out is forced from a foundation that we know not to be true. One year park factors, one year BABIP data, one year UZR data - none of these things would be used individually.

But I'm not. All of this work is based on the sum of an entire team's UZR performance, and it is, much more importantly, based on 240 such seasons. The conclusions are all based on hugely significant patterns you see across the 240 seasons.

Absolutely the one-year adjustments I've calculated are not "correct" the way a BA is correct, but just as certainly they are more correct than the unadjusted data. I agree that any one-year UZR figure must be taken with three grains of salt. What I've done here is identify and remove one grain. And, again, I have to stress that I've done it by manipulating the largest possible data sets.

QUOTE
Over at the Book blog there's a link to a post combining UZR and the Fan Scouting Report. So when mgl ranted thusly:

Well, every fielding projection I've posted here is based on my own heavily regressed formulas (.5 * last year + .25 * 2 years ago is a great quick-and-dirty one). So in that sense I agree wholeheartedly with MGL. But he is vastly overstating the case for the amount of noise in UZR. The plain fact is that there is no significant drop in year-to-year correlation as you lower the bar for minimum innings played to way below the levels that MGL or Tango would consider viable. In fact, if I can find it again, there's one big data set where the correlation was stronger for guys who played less. I am very open to arguments as to why this might not disprove the CW about how big defensive samples need to be, but I can't think of one. And the argument that real year-to-year differences are behind the low correlation (rather than SSS noise) is driven home by the dramatically better correlation for guys traded mid-season than guys traded off-season -- despite the samples of the former being half the size!
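The quick-and-dirty projection mentioned above fits in a one-line function. A minimal sketch; the function name is hypothetical, and reading the missing .25 of weight as regression toward zero (league average) assumes UZR is centered on zero.

```python
def quick_uzr_projection(last_year: float, two_years_ago: float) -> float:
    """Regressed fielding projection: .5 * last year + .25 * two years ago.

    The weights sum to .75, so the remaining .25 of weight effectively goes
    to 0, i.e. league-average UZR (assuming UZR is centered on zero).
    """
    return 0.5 * last_year + 0.25 * two_years_ago


# A hypothetical fielder at +10 last year and +6 the year before:
print(quick_uzr_projection(10.0, 6.0))  # 6.5
```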

QUOTE
You should run this work by mgl. Did we ever have a chat with him? Not sure if he can post here or not, but certainly would be fun to read the exchange. For one thing, I'm pretty sure that mgl would strongly disagree with the way you think you've derived BABIP skill from one year of UZR data.

What are you talking about? It's every UZR number that exists. Eight seasons, 30 teams a year. And that BABIP skill, as so derived, correlates profoundly with other pitching skill, more so than any other BABIP measure you could come up with.

QUOTE
In general, I always think it's hugely important to spend more time thinking about why you're wrong than why you (and those frisky excel r values) are unlocking pent up secrets. I'd like to see some more of that before moving on to the next great discovery.

I would counter by saying that it's hugely important to set aside what you think everybody already knows when you read a study that attempts to demonstrate that the CW is in fact wrong. Your counter-argument here is entirely circular in that it uses as its only ammunition precisely what I think I've disproven (that UZR data is incredibly noisy): "My first impression - is that everything you (and excel) are spitting out is forced from a foundation that we know not to be true." (Emphasis mine.) That's a horrendous way to approach any scientific paper. We don't know anything in the sense that you mean it. That's not the way science works. (I will cut you some slack, though: the key fact about the unexpected reliability of UZR is in the second post, even though it was in fact the first thing I looked at and it's the first thing I plan to present when I write this all up for publication.) Note that I'm not assuming anything about the reliability of that data, one way or the other; I'm just examining it to see what it tells us. Your argument is, "you can't do that with UZR data, it's too noisy." My rebuttal is, "umm, I just did it."*

*Quick edit: I can't stress this strongly enough. You're saying, "you can't do that with this data, because you can't do anything with this data, it's notoriously noisy. We know that!" But I've taken this supposedly incorrigible, useless data and found that it correlates, at p = 0.0000004, with something I expected it to correlate with when I started the study. (One of the main reasons I did this was to cook up a better K/BB/HR-to-true-BABIP formula, since I already had a crude one that used raw BABIP unadjusted for defense.)

QUOTE
Are you waiting to get someplace down the road before opening things up to wider peer review?

Yeah, this is a work in progress. I actually want to derive projection formulas and publish the whole thing as "Fun With UZR" or the like.

I think you're a very sharp guy, philly. I don't think your skepticism was at all unwarranted given that I buried the lead (that when you eliminate guys who changed ballparks, UZR correlates really well all the way down to a minimum of 400 innings, which basically blows the entire CW about its unreliability out of the water) and that I didn't make it clear that I was using all the UZR data that exists (although that can absolutely be inferred from my reporting the findings for all 8 seasons). Try reading all this again. I don't think the findings are easily explainable as anything other than what I pass them off for. MGL would be the first person to admit that even Team UZR, let alone individual UZR, is noisy. Well, DE is not; it's a real stat like BA, and so are the PAFs that are used to turn it into PADE. They are a real measurement of the difference in play-making rate, home and away, for MLB teams. And I have essentially found a hugely significant relationship between MGL's calculation of Team UZR and this home/road difference, which shouldn't be there if UZR were completely park-adjusted. I would welcome an alternative explanation, but I sure can't think of one.

Edited by Eric Van, 03 November 2009 - 11:11 PM.


#11 absintheofmalaise

Posted 23 November 2009 - 04:01 PM

It looks like this topic is up for discussion over on Tango's site.

#12 Eric Van

Posted 24 November 2009 - 09:58 AM

Some interim revised findings (see the edit of the first post for why revision was needed!) ...

You can add (unadjusted) UZR Rng and Err to get the fielding component of DE. (I should have done that to begin with rather than using total UZR.) After converting DE to runs above / below average, you can subtract the fielding from DE to give you the "staff BABIP skill and luck" component.
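A sketch of that decomposition, assuming a league-average DE baseline and a hypothetical `RUNS_PER_PLAY` constant of 0.8 runs for the swing between an out and a hit on a ball in play. The post does not say what run conversion was actually used.

```python
# ASSUMPTION: approximate run swing between a hit and an out on a ball in
# play; the actual conversion used in the study is not stated.
RUNS_PER_PLAY = 0.8


def de_runs(team_de, league_de, bip, runs_per_play=RUNS_PER_PLAY):
    """Convert team DE to runs above (+) or below (-) league average."""
    return (team_de - league_de) * bip * runs_per_play


def decompose(team_de, league_de, bip, uzr_rng, uzr_err):
    """Split DE runs into a fielding part (unadjusted UZR Rng + Err) and the
    leftover 'staff BABIP skill and luck' part, which still contains the park."""
    total = de_runs(team_de, league_de, bip)
    fielding = uzr_rng + uzr_err
    staff = total - fielding
    return fielding, staff


print(decompose(0.705, 0.700, 4000, 8.0, 2.0))
```

Because both UZR and DE runs are positive for good defense, the subtraction leaves a residual where positive means the staff (plus park and luck) suppressed hits.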

The two should not correlate, but they do: r = -0.17, p < .01. However, this correlation is entirely caused by the five outlier teams I mentioned in the first post. If you remove them, the correlation is now r = .02, p = .8. This suggests that UZR has no systematic problem but just got those teams screwy for some reason.

You can also compare the year-to-year change in fielding for each club with the year-to-year change in BABIP skill+luck, as another test of UZR's robustness. Again, if UZR is grabbing all of the real fielding performance, the two should not correlate. And they do not (r = .02, p = .75), even with the outlier teams included.

The standard deviation of the fielding component is 30.6 runs; that of the BABIP skill+luck component is 30.3. So the two are about equal in magnitude.

Here are the Y2Y correlations of the team fielding component (all 30 teams vs. their previous-year performance):

.48, .47, .53, .33, .41, .32, -.017.

You can see that over the last four years teams have been doing more revamping of their defenses, and that last year so many teams overhauled their defense that it wiped out the correlation completely.

Any DIPSophile would tell you that the leftover part, the team BABIP skill+luck component, would have a lot less robust Y2Y correlation, even though it includes the park effect. Here they are:

.50, .32, .52, .59, .60, .30, .47.

We know that there is some luck included here. So the question is: how much of this is driven by the park, and how much by a persistence of staff BABIP skill? Based on the work I've done so far, I think the latter component will be surprisingly large. There is, after all, a robust correlation between staff BABIP and other pitching rates -- which is made more robust if you use this defense-subtracted measure of BABIP rather than the raw number.

I'm eager to complete this work. When I first looked at player Y2Y correlation with the original UZR data that MGL made available as a spreadsheet, I couldn't find any predictive value at all for the season three years previous (which again flies in the face of the CW); the best projection was .48 * (Year-1) + .25 * (Year-2). I am very curious to see what comes out in the wash now that we have much more data to play with.
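One way to arrive at weights like .48 and .25 is an ordinary least-squares fit of current-year UZR on the two prior years. The sketch below assumes no intercept (UZR is already centered on league average) and a hypothetical `(year_minus_1, year_minus_2, actual)` row layout; the post does not say exactly how the weights were fit.

```python
def fit_two_year_weights(rows):
    """Least-squares (w1, w2) for actual ~ w1*year_minus_1 + w2*year_minus_2,
    no intercept, solved from the 2x2 normal equations by Cramer's rule."""
    s11 = sum(a * a for a, _, _ in rows)
    s22 = sum(b * b for _, b, _ in rows)
    s12 = sum(a * b for a, b, _ in rows)
    s1y = sum(a * y for a, _, y in rows)
    s2y = sum(b * y for _, b, y in rows)
    det = s11 * s22 - s12 * s12  # zero if the two predictors are collinear
    w1 = (s1y * s22 - s2y * s12) / det
    w2 = (s2y * s11 - s1y * s12) / det
    return w1, w2


def project(year_minus_1, year_minus_2, w1=0.48, w2=0.25):
    """Projection using the weights quoted in the post by default."""
    return w1 * year_minus_1 + w2 * year_minus_2


rows = [(10, 0, 5), (0, 8, 2), (4, 4, 3)]  # toy data built from 0.5 / 0.25
print(fit_two_year_weights(rows))
print(project(10, 4))
```

Weights summing to well under 1 are what build regression to the mean into the projection.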




