I thought I would take a stab at focusing things a bit, in an attempt to put into perspective what the data abs has posted actually says. Some of this will be overly descriptive for some of you quant-jocks, but hopefully it will be instructive for others.
Specifically, I think there is reason to believe that Buchholz’s poor performance this season is not a product of bad luck or mere chance.
To this end, we can take the spin data and use the 2500RPM breaking point to construct a binomial, which lets us estimate the probability that Buchholz’s recent small proportion of balls rotating above 2500RPM is the product of chance. I believe this is an acceptable approach because, for starters, we know that pitchers attempt to put more spin on the ball*, and that more spin means more movement and more movement is good (and the strong positive correlation between changes in spin and outcomes per pitch from 2010 to 2012 bears this out). Also, comparing 2010 to 2012, the data shows a strong positive correlation between deviations in spin by pitch and the usage rate for that pitch. Finally, while four games is a small sample, 392 pitches is a reasonably big sample** (although the fact that they occur on only four different days is an obvious potential confounding variable). The 66-some-odd four-seamers he has thrown is a smaller sample, but still reasonably big, especially if we are operating on the assumption that Buchholz is intentionally trying to put high rates of spin on the ball.
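For anyone who wants to see the bookkeeping, here is a rough Python sketch of how a pile of spin readings turns into a binomial count. The function name `hrpm_share` and the sample readings are made up purely for illustration; they are not Buchholz's actual pitches.

```python
def hrpm_share(spins, threshold=2500):
    """Return (high-spin count, total pitches, proportion) for a list of RPM readings."""
    total = len(spins)
    # A pitch counts as a "success" only if it rotates above the threshold.
    high = sum(1 for rpm in spins if rpm > threshold)
    return high, total, high / total


# Placeholder readings for illustration only, NOT real pitch data.
example_spins = [2400, 2600, 2550, 2300]
print(hrpm_share(example_spins))
```

Each pitch is just a yes/no on clearing 2500RPM, which is what makes the binomial a natural fit.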
I decided to test his high-RPM proportion (HRPM%) for both total pitches thrown and just for four-seam fastballs. Although it might seem like only the four-seamers are relevant, the fact that he’s throwing them less frequently (probably because of the lower spin rates he’s achieving) makes looking at them as a function of total pitches interesting. Part of the strength of abs's approach of looking at spin is that it is not only fielding independent, but largely batting independent except for the matter of pitch selection. In other words, we're looking as directly as possible at only what the pitcher is doing.
To do this, then, I set his HRPM% of 15.5% for his total pitches in 2010 as the null hypothesis for overall pitches, and the HRPM% of 55.2%(!) for his four-seamer in 2010 as the null hypothesis for fastballs. (Note: This excludes data from 4/17/10 and 9/27/10, which for some reason won’t come up on BrooksBaseball.) The reason for this is that I am trying to test whether or not the 2010 version of Buchholz “still exists,” in the sense that Buchholz can still throw today like he did in 2010. So I want to measure the probability that Buchholz2010 would produce a “sample” of pitches with the proportion of HRPM pitches that he has in 2012. The lower the probability, the more safely we may reject the null hypothesis and conclude that he ain’t that guy anymore.
Buchholz’s HRPM% for 2012 has been 3.1% for total pitches and 18% for just four-seam fastballs.
OK, not looking good so far. (As abs has pointed out, the data is such that this really isn't necessary, but whatever--it beats working.)
Anyway, here is the key data and results for those who are interested in such things.
Spin-Rate Significance Test
| HRPM%: Total Pitches | | | HRPM%: Four-Seamer | |
| H(null): p = | 15.5% | | H(null): p = | 55.2% |
| p(TP) | 3.1% | | p(FF) | 18.1% |
| Deviation | 12.4% | | Deviation | 37.2% |
| n | 391 | | n | 66.47 |
| s.e. | 0.0183 | | s.e. | 0.0610 |
| test-stat | -6.7768 | | test-stat | -6.0918 |
| Sig. | p < .0001 | | Sig. | p < .0001 |
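For those who want to check the arithmetic, here is a minimal Python sketch of the one-proportion z-test behind the table. The function name `one_prop_z` is just a label I picked; the inputs are the rounded percentages and n values from the table, so the test statistics come out very close to, but not exactly matching, the ones listed.

```python
import math


def one_prop_z(p_null, p_obs, n):
    """Standard error under the null, and the z statistic for an observed proportion."""
    se = math.sqrt(p_null * (1 - p_null) / n)
    z = (p_obs - p_null) / se
    return se, z


# Inputs copied from the table (rounded percentages, so expect small rounding drift).
se_tp, z_tp = one_prop_z(0.155, 0.031, 391)    # total pitches
se_ff, z_ff = one_prop_z(0.552, 0.181, 66.47)  # four-seamers

print(f"total pitches: s.e. = {se_tp:.4f}, z = {z_tp:.2f}")
print(f"four-seamers:  s.e. = {se_ff:.4f}, z = {z_ff:.2f}")
```

Note that the standard error uses the null proportion (the 2010 rate), not the 2012 rate, since the question is what Buchholz2010 would be expected to produce.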
I lost my bookmark for an on-line p-calculator that can go beyond 6 standard errors deviation from the mean, but the probability is below .0001. Like, a lot lower.
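Failing a calculator, the tail area can be worked out directly from the complementary error function in Python's standard library. This is only a sketch, and it assumes a one-sided test (we only care whether the HRPM% is lower than in 2010); `lower_tail` is a name I made up.

```python
import math


def lower_tail(z):
    """P(Z <= z) for a standard normal, via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


# Test statistics taken from the table above.
for label, z in [("total pitches", -6.7768), ("four-seamers", -6.0918)]:
    print(f"{label}: z = {z}, one-sided p = {lower_tail(z):.3e}")
```

Either way, both results land many orders of magnitude below .0001.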
This would mean that it is incredibly unlikely that the HRPM% we have witnessed would occur if Buchholz were, from a spin standpoint, the kind of pitcher he was in 2010.
Except for one thing: after doing this, I realized that the conditions for using these statistics to analyze a binomial distribution have not been met, because he hasn’t yet managed to throw the requisite 15 HRPM pitches needed to safely consider the sampling distribution as approaching normal. In other words, he’s hit 2500RPM too few times to even analyze it in this way. Which is probably way worse than I had initially thought.
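One way around the normal-approximation problem is to skip it and add up the binomial probabilities exactly. A rough Python sketch follows. Big caveat: the counts are assumptions, back-calculated from the percentages (about 12 high-spin pitches out of 391 total, and 12 out of the four-seamers with n rounded to 66), so treat the output as a ballpark rather than gospel. The name `binom_lower_tail` is also just mine.

```python
from math import comb


def binom_lower_tail(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# ASSUMED counts, back-calculated from the rounded percentages, not read off the raw data.
cases = {
    "total pitches": (12, 391, 0.155),  # (high-spin count, n, 2010 null rate)
    "four-seamers": (12, 66, 0.552),
}

for label, (k, n, p_null) in cases.items():
    # Rule of thumb from above: want at least 15 high-spin pitches for the normal curve.
    normal_ok = k >= 15
    p_exact = binom_lower_tail(k, n, p_null)
    print(f"{label}: normal approx OK? {normal_ok}, exact P(X <= {k}) = {p_exact:.3e}")
```

The exact sum needs no minimum count of high-spin pitches, which is the whole point of using it here.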
*Those who are not druid freaks like Wakefield anyway.
**There appears to be a discrepancy on BrooksBaseball between the total pitches thrown in the outcome data on Buchholz’s Player Card and the totals in the game logs, so I am using the lower estimates for both total pitches and four-seam fastballs, which is the more conservative approach.
Edited by Reverend, 28 April 2012 - 06:52 PM.