Probability and Baseball: 90% of this is true, the other half is false

Rasputin

Will outlive SeanBerry
Lifetime Member
SoSH Member
Oct 4, 2001
29,507
Not here
As someone who used to play 16 poker games at once when online poker was a thing, people also drastically underrate just how often something happens when it's only supposed to happen 3-4% of the time. If you see 1 million hands, you are going to see a lot of crazy things happening. You are also going to see some crazy stuff happen over the course of 190,000 PA. I've lost with 4 of a kind Kings, with pocket kings.
To build off this, I got 7 royal flushes before I stopped counting, lost with straight flushes, and once had trip 3s v trip 2s on the flop only to turn a 2 and river a 3.

Everything that can possibly happen will happen eventually.
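
A back-of-the-envelope way to put numbers on this, assuming every trial is independent (which poker hands and plate appearances only roughly are): the chance of seeing an event with per-trial probability p at least once in n trials is 1 - (1 - p)^n. A minimal Python sketch; the function name and the sample inputs are just for illustration.

```python
def prob_at_least_once(p: float, n: int) -> float:
    """Chance that an event with per-trial probability p shows up
    at least once in n independent trials."""
    return 1.0 - (1.0 - p) ** n

# A "3% of the time" event over a couple dozen tries.
print(prob_at_least_once(0.03, 23))

# A one-in-a-million event over a million hands.
print(prob_at_least_once(1e-6, 1_000_000))
```

The second case comes out well above a coin flip, which is the point: given enough hands, "one in a million" is something you should expect to see.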

I mean, this is a team that did something that had literally never happened before in a hundred years of the game.
 

lexrageorge

Member
SoSH Member
Jul 31, 2007
18,226
I think there are a couple of factors that are causing some of these projections to be misunderstood.

First, it's not clear that all of the probabilities are indeed "accurate". For example, the 0.4% chance of the Sox missing the playoffs in 2011. It's possible that the tail ends of the distribution could be a bit larger than what the normal distribution (or whatever distribution was used to make that estimate) would predict.
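
To give a rough sense of how much the choice of distribution matters out in the tails, here is a small sketch comparing a normal distribution with a Laplace distribution of the same standard deviation. The Laplace is only my stand-in for "something with fatter tails"; I have no idea what distribution any particular playoff-odds system actually uses.

```python
import math

def normal_tail(k: float) -> float:
    """P(X > mean + k*sd) for a normal distribution."""
    return 0.5 * math.erfc(k / math.sqrt(2))

def laplace_tail(k: float) -> float:
    """P(X > mean + k*sd) for a Laplace distribution with the same sd
    (scale b = sd / sqrt(2)), for k >= 0."""
    return 0.5 * math.exp(-k * math.sqrt(2))

# Compare the chance of landing k standard deviations above the mean.
for k in (2, 3, 4):
    print(k, normal_tail(k), laplace_tail(k))
```

The further out you go, the bigger the ratio between the two gets, so a tiny probability like 0.4% is exactly where the modeling assumption does the most work.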

The other issue has to do with the projection systems reporting a number (e.g., projected wins, or projected OPS) that is presented as being far more precise than what the inherent error bars would indicate. Some system projects the Sox to win 89 games, and the media reports that as the most likely outcome, and Shank writes a column recapping all of the Sox free agent mistakes. What's not reported is that the most likely range is somewhere between 83 and 95 wins, with significant chances of the team being outside that range. And if the error bars are that large (I honestly don't know), then it does call into question whether the projection system is really meaningful or useful; an 83 win season would have a very different feeling from a 95 win season.

The above problems are exacerbated, IMO, by proprietary projection systems, which are sometimes so opaque that there is no good way for outsiders to critique what went wrong and therefore find potentially useful info about the projection in question.
 

williams_482

Member
SoSH Member
Jul 1, 2011
391
This is a really tricky thing to work out, especially because "what actually happened" isn't a reliable indicator either. Hell, right now there's a very good case to be made that the Rays (60-58, BaseRuns record of 66-52) are a substantially stronger team than the Mariners (69-51, BaseRuns record of 60-60), but the Mariners have a strong shot at a wildcard berth while the Rays are hopelessly buried. You get a handful of significant over- and under-performers every year, and that's strictly backwards looking, making no attempt to project how each and every player is going to perform in the future.

Forget the projection system itself; trying to figure out how player performance is going to translate into wins is highly imprecise.

As a case in point, these are the Fangraphs projected standings on March 28th 2018. The Yankees are projected to win 94.4 games (an obviously impossible number), but more accurately, 50% of the time they would win between 90 and 99 games. There's a not-quite-perfect bell curve of possible outcomes which clearly indicates that it was seen as possible but improbable that the Yankees could finish with 80 wins, or 110. This isn't even taking into account the inherent volatility of the projections themselves, merely the most likely outcomes for a true talent .583 baseball team over the course of 162 games.
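
For anyone who wants to poke at that last sentence, here is a minimal sketch that treats each of the 162 games as an independent coin flip with a .583 chance of winning. That is a simplification on my part (no schedule strength, no roster changes), and I'm not claiming it is how Fangraphs builds its curve; the function names are made up for the example.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k wins in n independent games with win probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def prob_wins_between(lo: int, hi: int, n: int = 162, p: float = 0.583) -> float:
    """Probability of finishing with between lo and hi wins, inclusive."""
    return sum(binom_pmf(k, n, p) for k in range(lo, hi + 1))

print(prob_wins_between(90, 99))    # the middle of the curve
print(prob_wins_between(0, 80))     # 80 wins or fewer
print(prob_wins_between(110, 162))  # 110 wins or more
```

Under these assumptions the 90-to-99 band holds a bit over half of the outcomes, and both extremes are unlikely but not impossible, which lines up with the shape described above.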
 

tims4wins

PN23's replacement
SoSH Member
Jul 15, 2005
37,461
Hingham, MA
OMG that Orioles distribution. They are in like the bottom 0.1% of outcomes or something like that.
 

williams_482

Member
SoSH Member
Jul 1, 2011
391
As an example of variance within the projections, here is an article from March about the most and least confident projections made by Steamer. Even for the best and most consistent players, there are very large gaps between 90th and 10th percentile projected performance, indicating not that the projections are too stupid to figure out how good these guys are, but that the projections understand that there is a long history of players running into good or bad luck on balls in play, suffering major or minor injuries, forgetting how to throw to 1st base, etc, which makes any player a candidate to take a noticeable step forward or fall off a cliff.
 

charlieoscar

Member
Sep 28, 2014
1,339
You can't just say that
Of course I can say it, I just did.

Whether I should have said it is open to discussion, and I don't think a lot of people understand that when someone talks about the law of large numbers, the number of trials is approaching infinity. And I don't think people realize that most of the standard statistical math functions are based on the normal distribution and not everything in baseball is normally distributed (it might be close enough so you get a reasonable approximation, but not where you can say it's 100%). There is too much thinking that something should be "n" when it really is "n plus or minus something."
 

lexrageorge

Member
SoSH Member
Jul 31, 2007
18,226
I think where we would disagree is when you said "a lot of people" on this forum don't understand probability theory. I can tell you that on this forum, more than any other I frequent, there is a very solid understanding of probability. And statistics. And there is a HUGE gap between here and 2nd place.

Sure, there is a distribution of understanding among posters here, and lots of discussion (and disagreement) on how to apply these tools. But this is one of the "go to" hobby forums when it comes to having a discussion on stats and probs.
 

pokey_reese

Member
SoSH Member
Jun 25, 2008
16,315
Boston, MA
I think that it's also disingenuous to fault projection systems because of the assertion that no one goes back and evaluates them for accuracy after the fact. They certainly do, and one of the things I look forward to most from Fangraphs and BP each year are the articles that look at what the projection systems got right and wrong. I mean, you can say that ERA is a better measurement of what actually happened than FIP or xFIP, but it also does something different on purpose. If a system like xFIP is adjusted to normalize a pitcher's home run rate, it doesn't mean that the system is pretending that those home runs didn't happen (and thus show up in the ERA), it's saying that pitchers have a limited ability to impact HR/FB rates within a normal range and therefore a guy who gave up 6 home runs over two starts shouldn't be expected to continue doing so.

Data people do better when we are forced to account for things that go wrong in our projections, but it doesn't mean that the approaches are worthless because they sometimes go wrong. If you want to say 'no one should ever try to predict who will do well over the course of a baseball season,' fine, but it isn't necessarily an indictment of the methodology or accuracy of those who do choose to try and predict things.
 

Reverend

for king and country
Lifetime Member
SoSH Member
Jan 20, 2007
64,533
As an example of variance within the projections, here is an article from March about the most and least confident projections made by Steamer. Even for the best and most consistent players, there are very large gaps between 90th and 10th percentile projected performance, indicating not that the projections are too stupid to figure out how good these guys are, but that the projections understand that there is a long history of players running into good or bad luck on balls in play, suffering major or minor injuries, forgetting how to throw to 1st base, etc, which makes any player a candidate to take a noticeable step forward or fall off a cliff.
I don't suppose you have a good primer on the radically different meanings for statistical reality of the small differences in semantics in English that people fuck up all the time?

Like, I'm 85% confident in this result ≠ this explains 85% of the variance ≠ it's 85% likely this will happen ≠ it's 85% that this is accurate ≠ &tc?

I think a baseball focused primer on the specific meaning of statistical expressions (no pun intended, math jerks) could be invaluable. I also know that I can't even keep Type I and Type II error straight without materials in front of me half the time, so I'm not the guy to do it.

But now that we have this thread... that might be a really useful thing to go through, if anyone else is interested.

Like, a lot of people who are interested in the systems don't know what the syntax means, which is just an informational asymmetry that we can correct, yeah?
 

pokey_reese

Member
SoSH Member
Jun 25, 2008
16,315
Boston, MA
Useful reference:

https://www.stat.berkeley.edu/~stark/SticiGui/Text/gloss.htm

Like, I'm 85% confident in this result ≠ this explains 85% of the variance ≠ it's 85% likely this will happen ≠ it's 85% that this is accurate ≠ &tc?
I think this is a great example of the kind of things stats people take for granted. So using it:

an r-squared value of .85 commonly means that A (or combined set of 'A's in a multi-variate situation) explains 85% of the variance in B, in a relational sense
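
For the simple one-variable case, r-squared is just the squared correlation between A and B. A small sketch with made-up numbers (the run differential and win totals below are invented for illustration, not real team data):

```python
def r_squared(xs: list[float], ys: list[float]) -> float:
    """Squared Pearson correlation: the share of variance in ys
    explained by a straight-line fit on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy**2 / (sxx * syy)

# Invented example: run differential vs. wins for six hypothetical teams.
run_diff = [-120, -60, -10, 30, 75, 140]
wins = [68, 77, 79, 86, 87, 97]
print(r_squared(run_diff, wins))
```

A value near 1 means the points sit close to a line; it says nothing by itself about how confident you should be in any single prediction.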

a confidence interval generally has a few components: a predicted value, an upper bound, a lower bound, and a percentage (the confidence level); as the confidence level increases, the bounds naturally move apart, because they mark out the range the value is expected to fall in. It works logically, in that if I want to predict the value of the stock market tomorrow, I could say I'm 95% certain that the Dow will be between 24,500 and 26,000 because I know where the value is today, and I know that it almost never moves more than 500-1000 points in one day. If I want to predict the value of the stock market a year from now, either my bounds (24,500 and 26,000) need to move much further apart, or my confidence level (95%) needs to shrink significantly. I could say that I'm 5% certain that the Dow will still be within my original bounds, or that I'm 95% certain that it will be between 16,000 and 30,000 because the market can move a lot in a year. But the important thing to realize is that your confidence level and your bounds are related, and changing one should always involve changing the other, unless you have gotten some new information (i.e., your population/sample data set has changed).
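
Here is a minimal sketch of that trade-off between confidence level and bounds, assuming the uncertainty is roughly normal. The center (25,250, the midpoint of the bounds above) and the 380-point standard deviation are hypothetical numbers picked to roughly match the Dow example, not estimates of anything.

```python
from statistics import NormalDist

def interval(center: float, sd: float, level: float) -> tuple[float, float]:
    """Symmetric interval around center covering `level` of a normal distribution."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level is 0.95
    return center - z * sd, center + z * sd

# Same center and spread, different confidence levels: the bounds move apart.
for level in (0.50, 0.80, 0.95, 0.99):
    low, high = interval(25_250, 380, level)
    print(f"{level:.0%}: {low:,.0f} to {high:,.0f}")
```

Forecasting a year out instead of a day out amounts to using a much larger sd, which widens every one of those intervals at once.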

If someone says 'it's 85% likely this will happen' they are almost certainly referring to Bayesian priors (also the basis for most Markov systems), which basically means that out of 100 previous observations, the specified outcome occurred 85 times, and we have no reason to think that the pattern won't continue to hold. This is usually an over-simplification and something to watch out for unless it is a very stable system, like a manufacturing process where they know that the incidence of bad outcomes per batch tends to fall into a very narrow range.
 

DJnVa

Dorito Dawg
SoSH Member
Dec 16, 2010
54,125
Look, can I still say the weatherman on TV was wrong if he said there was only a 10% chance of rain and it rained?
 

dcmissle

Deflatigator
Lifetime Member
SoSH Member
Aug 4, 2005
28,269
Look, can I still say the weatherman on TV was wrong if he said there was only a 10% chance of rain and it rained?
Yes, if on the morning of the forecast you were worried about rain but were told you were a moron for doing so.

And that’s what happens many times when these projections are wielded as a sword. The assertion is not, *everything that can happen will happen eventually.*

It’s — “Listen, cretin’, don’t come into this thread ruinin’ my summah because PECOTA has us at 99.78% on September 4, 2011. So go back into your hole.”

And if your response had been, “well PECOTA had us only at 92-70 back in February” — only two games off the actual record ...

(http://www.espn.com/mlb/hotstove10/insider/news/story?id=6120583)

The rejoinder would have been, “that was then, this is now, now go back in your hole ...”

The smartest-kid-in-the-class snark jumps out of the historical record demonstrating the value of archived threads.
 

Sampo Gida

Member
SoSH Member
Aug 7, 2010
5,044
I read somewhere that the Standard Error of a team's preseason projection is +/- 6 Wins. A projection of 94 wins with 95% confidence (2xSD) is considered accurate if the team wins between 82 and 106 games.
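
A quick check of that arithmetic, assuming projection errors are normally distributed with a standard error of 6 wins (the 6 is the half-remembered figure above, not something I have verified): two standard errors either side of 94 gives 82 to 106, and the sketch below computes how much of the distribution that range covers.

```python
from statistics import NormalDist

def prob_within(low: float, high: float, mean: float = 94, se: float = 6) -> float:
    """Probability that a normally distributed win total lands between low and high."""
    dist = NormalDist(mu=mean, sigma=se)
    return dist.cdf(high) - dist.cdf(low)

print(prob_within(82, 106))  # plus or minus 2 standard errors
print(prob_within(88, 100))  # plus or minus 1 standard error
```

The two-standard-error band covers roughly 95% and the one-standard-error band roughly two thirds, so missing by 6 wins in either direction is an ordinary result.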

That's why they play the games
 

slamminsammya

Member
SoSH Member
Jul 31, 2006
9,423
San Francisco
I can only speak to this most recent argument about projection.

There are good and interesting arguments to be had about projection in baseball, and forecasting the future, the meaning of probability, and statistics, and all that stuff. None of this is settled.

Just like there are interesting arguments to be had about the existence of god. But the skeptical side in this debate is more "tide goes in - tide goes out. You can't explain that" than it is Critique of Pure Reason.
 

jaytftwofive

New Member
Jan 20, 2013
1,182
Drexel Hill Pa.
Don't forget DeRosa, Billy Ripken, Reynolds, Plesac and others on MLB saying that if the Yankees get a good starting pitcher at the trade deadline the division is over. Well they got 2 and it looks like the division's over, lol.
 

dcmissle

Deflatigator
Lifetime Member
SoSH Member
Aug 4, 2005
28,269
Ohhhhh, ok, I get it now. Someone was mean to you.
The downside of being admonished that you worry too much is exceeded by the amusement of watching people Ordway articles of faith like so many abandoned children.

I regard all this stuff as a useful data point, but nothing more. How useful depends on the sport and the system.
 

Al Zarilla

Member
SoSH Member
Dec 8, 2005
59,323
San Andreas Fault
Don't forget DeRosa, Billy Ripken, Reynolds, Plesac and others on MLB saying that if the Yankees get a good starting pitcher at the trade deadline the division is over. Well they got 2 and it looks like the division's over, lol.
When did they say that? Yesterday, DeRosa said “what do the Red Sox have, 120 wins?” Kiddingly, of course.
 

lexrageorge

Member
SoSH Member
Jul 31, 2007
18,226
Ohhhhh, ok, I get it now. Someone was mean to you.
Some of us are still scarred by a poster here declaring Kyle Weiland one of the best pitchers ever, once you removed all of his bad innings. It was one of many bad things to happen to us in 2011.
 

JohntheBaptist

Member
SoSH Member
Jul 13, 2005
11,410
Yoknapatawpha County
I regard all this stuff as a useful data point, but nothing more.
No one posts these as anything other than, again, data points and context for a current reality.
In any event, in my experience, they're always "thrown around" here as a basis for discussion; they tell you what they see based on a set of data and are fodder for consideration.
I guess I don't see your point. You bellyached about the 2011 team collapsing, someone responded with that number, and because you ended up being right, seven years later, you needed to pull a discussion about whether the concept of probability itself is worthy of pitchforks whenever broached off track to some navel-gazing?

They're a data point, yes. That was my point.
 

Reverend

for king and country
Lifetime Member
SoSH Member
Jan 20, 2007
64,533
Well, as do most questions here on SoSH, it comes down to exactly how many monkeys you've got.
Certainly, but surely it must have happened at least once, even if it was a long time ago in a galaxy far, far away, yeah?
 

Buzzkill Pauley

Member
SoSH Member
Jun 30, 2006
10,569
Certainly, but surely it must have happened at least once, even if it was a long time ago in a galaxy far, far away, yeah?
I can think of one example of monkeys writing the complete works of William Shakespeare happening.

Closer by to here than that, too.
 

Reverend

for king and country
Lifetime Member
SoSH Member
Jan 20, 2007
64,533
I can think of one example of monkeys writing the complete works of William Shakespeare happening.

Closer by to here than that, too.
I take your meaning, but the idea that even the authorship, much less the species of authorship, would not also be points of contention on SoSH strikes me as imaginative.
 

EdRalphRomero

wooderson
SoSH Member
Oct 3, 2007
4,481
deep in the hole
Certainly, but surely it must have happened at least once, even if it was a long time ago in a galaxy far, far away, yeah?
You know, I kinda feel the same way about The Phantom Menace and Attack of the Clones as I do about Coriolanus and Timon Of Athens. In both cases I used to know a lot about the subject matter, it brought me no joy, and now I do my best to forget they exist.