Yeah it would be totally sweet to have some kind of statistical test that would compare two distributions' kurtosis (and, if we're going to be really rigorous about it, we should test for skewness too, shouldn't we?). But of course, we know that R/G can't be normal since it's a counting stat, so we can't just do some stupid F-test to compare the variances. Yes, as you have well explained, what we really need is some non-parametric test to compare the kurtosis (and, just to make me feel better, skewness) of two distributions. I wish some really smart Russian had actually put some thought into how to do that, because it could be a really useful test, not just for baseball but for a bunch of different fields, including politics and business. I bet if some really smart Russian - or, actually, this happens a lot, TWO really smart Russians - had come up with a test like that, it would be taught in statistics classes!
Oh well, maybe if you have some free time you can come up with something off the top of your head.
Completely missing the point much?
No, I didn't say we needed to compare the kurtosis of two distributions. I explained in some detail that we don't know jack about the kurtosis (and skewness) of MLB run distributions, in terms of whether their difference from one another is random or meaningful. Is there any tendency for teams with a very peaked distribution to have a peaked distribution the next season? I don't believe anyone's ever studied that.
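For what it's worth, the persistence question is easy enough to sketch, assuming you had per-game runs scored for each team in two consecutive seasons. Everything here is hypothetical: the function names, the dict-of-lists layout, and the choice of a plain Pearson correlation across teams are my assumptions, not something anyone has actually run. It computes each team's excess kurtosis in both seasons and correlates the two sets.

```python
import math

def excess_kurtosis(runs):
    """Population excess kurtosis: fourth standardized moment minus 3."""
    n = len(runs)
    mean = sum(runs) / n
    m2 = sum((r - mean) ** 2 for r in runs) / n
    m4 = sum((r - mean) ** 4 for r in runs) / n
    if m2 == 0:
        raise ValueError("no variation in runs; kurtosis undefined")
    return m4 / m2 ** 2 - 3.0

def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def kurtosis_persistence(season_a, season_b):
    """Correlate team kurtosis in one season with the next.

    season_a, season_b: hypothetical dicts mapping team name to a
    list of runs scored per game. Only teams in both are used.
    """
    teams = sorted(set(season_a) & set(season_b))
    k_a = [excess_kurtosis(season_a[t]) for t in teams]
    k_b = [excess_kurtosis(season_b[t]) for t in teams]
    return pearson_r(k_a, k_b)
```

A correlation near zero across all the teams would suggest the shape differences are mostly noise; a clearly positive one would suggest peakedness is a real team trait. The same idea works for skewness by swapping in the third moment.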
Yes, your smart Russians tell us that the Sox run distribution was not unusual when compared to other MLB run distributions. That's not the question I was asking.
And, yes, the run distribution has regularized. It also seems as if they've had some good offensive games against good pitchers (Blanton), and some bad offensive games against bad pitchers (Matusz), neither of which I remember seeing earlier in the season, although of course that memory may be highly subjective. Which is why I wanted to actually look at the numbers.
I started this thread because I was curious whether the strange run distribution the Sox had early in the season came from consistently beating up on crap pitching while not hitting good pitching, which was the simplest explanation for the wide spread in values. If that is the case, that's an interesting phenomenon whether or not it has predictive value, because we are interested in the game of baseball.
(I mean, does that ever happen, over a short period of time, to a degree that looks like it might not be completely random? I don't know.) And it is certainly an interesting phenomenon even if the same distribution of RS per game could have been produced at random without the phenomenon. That the latter is true does not tell you that the phenomenon is not present.
And your reference distribution may itself include the phenomenon.
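Here's roughly the kind of look at the numbers I had in mind, as a minimal sketch rather than a finished analysis. It assumes a hypothetical list of games, each a pair of (opposing starter's ERA, runs scored), splits them at an ERA cutoff I'd have to pick by hand, and uses a label-shuffling permutation test to ask how often a gap that big shows up when runs are unrelated to the starter. The names, the cutoff, and the use of ERA as the quality measure are all my assumptions.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def split_gap(games, era_cutoff):
    """Mean runs vs. weak starters minus mean runs vs. strong starters."""
    weak = [runs for era, runs in games if era >= era_cutoff]
    strong = [runs for era, runs in games if era < era_cutoff]
    if not weak or not strong:
        raise ValueError("cutoff leaves one group empty")
    return mean(weak) - mean(strong)

def permutation_p_value(games, era_cutoff, n_shuffles=10000, seed=0):
    """One-sided permutation test on the weak-minus-strong gap.

    games: hypothetical list of (opposing_starter_era, runs_scored).
    Shuffling runs across games breaks any link to starter quality
    while keeping the overall run distribution exactly the same.
    """
    rng = random.Random(seed)
    observed = split_gap(games, era_cutoff)
    eras = [era for era, _ in games]
    runs = [r for _, r in games]
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(runs)
        if split_gap(list(zip(eras, runs)), era_cutoff) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_shuffles + 1)
```

Note that the shuffled samples have the identical distribution of runs per game as the real one, which is the whole point: the shape of the distribution alone can't tell you whether the pitcher effect is there.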
Here's an analogy. A band goes on tour, playing small clubs, and they play to either empty houses or packed houses, nothing in between. Very strange. And we're curious why. You grab some tour attendance figures for some bands of equal stature and tell us that there's absolutely nothing unusual about that sort of distribution, it happens all the time in a tour that short. Tour a bit longer and you're bound to see some half-filled houses.
But this doesn't address any of the questions we're curious about. Why is it that some nights the house is empty, and some nights packed? Is it the amount of college radio airplay? The presence or absence of a preview article in the local paper? Or is it much closer to what you'd call "random," e.g., the weather, whether local sports teams have an important game, or a host of other factors unrelated to the music biz? These are all factors that influence the reference distribution. And we don't know jack about that. We don't know whether there are bands that get consistent airplay in every city and hence don't see many empty or packed houses, and other bands that get ignored in some cities while getting massive airplay in others. We don't know whether some bands are very reliant on press previews while others have fans that are largely illiterate. All interesting questions that are not addressed at all by simply reporting that the observed pattern isn't the least bit unusual in a sample of the given size. Is it not unusual because of the weather? Or because there are lots of bands dependent on college radio airplay and we're looking at one of them, and in a tour this short you might hit three terrible cities and three great ones, just at random?
I want to know the answer to the baseball equivalents of these questions. I really don't give a shit about statistics except as a tool to answer those questions. And I can't think of anything more wrong-headed than running a single stat test and then concluding that we should cease our curiosity about those questions because the one stat test tells us that the questions are uninteresting. Maybe to you they are. My condolences, really.