Probability and Baseball: 90% of this is true, the other half is false

Adrian's Dome · Aug 12, 2018

Sam Ray Not said:
The Ringer's pre-season top six teams, and their current games behind the Red Sox:

1. HOU -11.5
2. NYY -9.5
3. LAD -20.5
4. CLE -17.5
5. WAS -23.5
6. CHI -16

"Science"

slamminsammya · Aug 12, 2018

Adrian's Dome said:
"Science"

I know this is your thing but those rankings were old fashioned humans picking whichever teams felt right.

NoXInNixon · Aug 12, 2018

Sam Ray Not said:
The Ringer's pre-season top six teams, and their current games behind the Red Sox:

1. HOU -11.5
2. NYY -9.5
3. LAD -20.5
4. CLE -17.5
5. WAS -23.5
6. CHI -16

To be fair, they did say there was very little separation between #5 and #7.

Rasputin · Aug 13, 2018

NoXInNixon said:
To be fair, they did say there was very little separation between #5 and #7.

Turns out the separation is between 1 and everyone else.

JohntheBaptist · Aug 13, 2018

Adrian's Dome said:
"Science"

So not only do you still not get what you're mocking here, you didn't read the article and it isn't an example of the thing you're trying to mock here.

I think coming off last year, around 5-7 as an educated guess is extremely fair. This team had a bunch of question marks--it speaks to how great they've played.

Reverend · Aug 13, 2018

JohntheBaptist said:
So not only do you still not get what you're mocking here, you didn't read the article and it isn't an example of the thing you're trying to mock here.

I think coming off last year, around 5-7 as an educated guess is extremely fair. This team had a bunch of question marks--it speaks to how great they've played.

He also mocked the idea of anyone hooking up with a ten which, while a kinda crappy way of making his point, also ignores things like that a board member is literally married to Gwyneth Paltrow.

teddywingman · Aug 13, 2018

Is Mookie a 10 or just a 9? That patchy beard has me leaning towards a 9. Or maybe a 9.5.

Edit: I'll go with a soft 10.

brandonchristensen · Aug 13, 2018

Reverend said:
also ignores things like that a board member is literally married to Gwyneth Paltrow.

What

JohntheBaptist · Aug 13, 2018

brandonchristensen said:
What

It's me, he means me.

Adrian's Dome · Aug 13, 2018

JohntheBaptist said:
So not only do you still not get what you're mocking here, you didn't read the article and it isn't an example of the thing you're trying to mock here.

I think coming off last year, around 5-7 as an educated guess is extremely fair. This team had a bunch of question marks--it speaks to how great they've played.

Computers, people, whatever. The point is the same: projections and rankings are BS. There's literally no point to any of them.

tims4wins · Aug 13, 2018

Adrian's Dome said:
Computers, people, whatever. The point is the same: projections and rankings are BS. There's literally no point to any of them.

Fun. They are fun. They are something to talk about. Apparently you hate fun.

JohntheBaptist · Aug 13, 2018

Adrian's Dome said:
Computers, people, whatever. The point is the same: projections and rankings are BS. There's literally no point to any of them.

I wonder if you even realize what you're saying here.

Reverend · Aug 13, 2018

JohntheBaptist said:
I wonder if you even realize what you're saying here.

If you think about it, the games themselves don’t actually prove who is better—they’re just sims we run to get some data on the issue.

tonyarmasjr · Aug 13, 2018

Adrian's Dome said:
Computers, people, whatever. The point is the same: projections and rankings are BS. There's literally no point to any of them.

Curious, are you meaning all types of projection models and rankings? Or just baseball? Or just a certain kind of baseball ones?

Adrian's Dome · Aug 13, 2018

tonyarmasjr said:
Curious, are you meaning all types of projection models and rankings? Or just baseball? Or just a certain kind of baseball ones?

Baseball. And it's not that I don't think they can be "fun", I'm just sick of people quoting them (mostly the computer-generated ones) as anything factual or substantive, like the playoff odds or predicted future winning percentages. Those numbers are thrown around often and they're meaningless. They're inaccurate and you're better off doing your own analysis based off actual records relative to run differentials and divisional quality.

dcmissle · Aug 13, 2018

Adrian's Dome said:
Baseball. And it's not that I don't think they can be "fun", I'm just sick of people quoting them (mostly the computer-generated ones) as anything factual or substantive, like the playoff odds or predicted future winning percentages. Those numbers are thrown around often and they're meaningless. They're inaccurate and you're better off doing your own analysis based off actual records relative to run differentials and divisional quality.

Well, we’ll always have 2011.

But if that were not enough, that bloody mess in the corner is named ELO. Its missing limbs are basketball tournament, soccer match and football prognostications gone awry. The blushing fellow in the corner opposite is named DVOA, who has yet to account, in a prognosticating sense, for fellows who have hit injured reserve and for the wonderful concept of “matchups.”

You typically get what you pay for.

JohntheBaptist · Aug 13, 2018

Adrian's Dome said:
Baseball. And it's not that I don't think they can be "fun", I'm just sick of people quoting them (mostly the computer-generated ones) as anything factual or substantive, like the playoff odds or predicted future winning percentages. Those numbers are thrown around often and they're meaningless. They're inaccurate and you're better off doing your own analysis based off actual records relative to run differentials and divisional quality.

So one projection system is ok but the other isn't? Why?

In any event, in my experience, they're always "thrown around" here as a basis for discussion; they tell you what they see based on a set of data and are fodder for consideration.

When the models said on Sept 1 2011 that the Red Sox had, whatever, a 4% chance of collapsing, didn't it feel like the longest possible odds playing out in real time? It sure did to me. That's all the 4% is saying. Not that it can't happen, or won't.

dcmissle · Aug 13, 2018

JohntheBaptist said:
So one projection system is ok but the other isn't? Why?

In any event, in my experience, they're always "thrown around" here as a basis for discussion; they tell you what they see based on a set of data and are fodder for consideration.

When the models said on Sept 1 2011 that the Red Sox had, whatever, a 4% chance of collapsing, didn't it feel like the longest possible odds playing out in real time? It sure did to me. That's all the 4% is saying. Not that it can't happen, or won't.

99.78% chance of making the postseason, Sept. 4, 2011. And the way rounding works …

https://www.si.com/more-sports/2011/09/29/greatest-collapsesever

JohntheBaptist · Aug 13, 2018

dcmissle said:
99.78% chance of making the postseason, Sept. 4, 2011. And the way rounding works …

https://www.si.com/more-sports/2011/09/29/greatest-collapsesever

And?

dcmissle · Aug 13, 2018

JohntheBaptist said:
And?

The '64 Phillies were the gold standard when I was a kid. They are in the lowest quartile, on a percentage basis, of that SI list of collapses. More notably, there have been 13 epic collapses since, as of the publication date of the article.

If your team is breaking bad and there are tangible reasons for it, PECOTA & Co. are fairly regularly cold comfort.

Adrian's Dome · Aug 13, 2018

JohntheBaptist said:
So one projection system is ok but the other isn't? Why?

In any event, in my experience, they're always "thrown around" here as a basis for discussion; they tell you what they see based on a set of data and are fodder for consideration.

When the models said on Sept 1 2011 that the Red Sox had, whatever, a 4% chance of collapsing, didn't it feel like the longest possible odds playing out in real time? It sure did to me. That's all the 4% is saying. Not that it can't happen, or won't.

And what I'm saying is that that "4%" is meaningless whereas you're saying it's a real number. Carl Crawford doesn't drop that easy soft liner against the Orioles that I could've freaking caught and all of a sudden the system was correct with its 96% estimate? It literally cannot factor in any one of a million real, actual variables that happen on and off baseball fields that have real, actual effects on what happens. That's what I've been saying all along. BP says this, Fangraphs says this, and it's all completely irrelevant guesswork.

On-field results. Standings. Run differentials. Those are measures of what has happened (and include said variables,) and are factual.

Long story short, sorry I'm skeptical of theories that literally go against everything on-field results have been telling us, and were both brutally incorrect pre and mid-season.

Oh, they tell us now that the Sox have a 99.8% chance of making the playoffs? Great. That's reassuring. I watched 2004, 2011, and 2013. I watched the 2001 Mariners (and Yankees, for that matter) lose. I know it drives the numbers-driven insane, but things routinely happen, and often times they are completely unexpected and unexplainable.

JohntheBaptist · Aug 13, 2018

dcmissle said:
The '64 Phillies were the gold standard when I was a kid. They are in the lowest quartile, on a percentage basis, of that SI list of collapses. More notably, there have been 13 epic collapses since, as of the publication date of the article.

If your team is breaking bad and there are tangible reasons for it, PECOTA & Co. are fairly regularly cold comfort.

Cold comfort? They're approximating the chances something like that could happen. It didn't mean it couldn't happen.

Adrian's Dome said:
And what I'm saying is that that "4%" is meaningless whereas you're saying it's a real number. Carl Crawford doesn't drop that easy soft liner against the Orioles that I could've freaking caught and all of a sudden the system was correct with its 96% estimate? It literally cannot factor in any one of a million real, actual variables that happen on and off baseball fields that have real, actual effects on what happens. That's what I've been saying all along. BP says this, Fangraphs says this, and it's all completely irrelevant guesswork.

On-field results. Standings. Run differentials. Those are measures of what has happened (and include said variables,) and are factual.

Long story short, sorry I'm skeptical of theories that literally go against everything on-field results have been telling us, and were both brutally incorrect pre and mid-season.

Oh, they tell us now that the Sox have a 99.8% chance of making the playoffs? Great. That's reassuring. I watched 2004, 2011, and 2013. I watched the 2001 Mariners (and Yankees, for that matter) lose. I know it drives the numbers-driven insane, but things routinely happen, and often times they are completely unexpected and unexplainable.

This doesn't make any sense.

The projections make an educated guess on the chances of "x" happening and "y" happening. Of course it isn't perfect and cannot account for unforeseen events--it doesn't claim to.

It was a less than 1% chance that the 2011 Red Sox collapsed. So, not impossible. They were long odds, it felt like it, and they came through. Where are you getting "real number" and that I'm "saying" that? Go back and read the post you've quoted again. No one posts these as anything other than, again, data points and context for a current reality.

I genuinely do not get anyone having an issue with that outside fundamentally not understanding what they're claiming to "tell" you.

Of course unlikely things often come true? Calling them "unlikely" is the exact same thing as saying "it has a 3% chance of happening."

Adrian's Dome · Aug 13, 2018

JohntheBaptist said:
Calling them "unlikely" is the exact same thing as saying "it has a 3% chance of happening."

No, because saying "this has a 3% chance of happening" implies that you know what could or will happen and what all the potentials are and that's my issue. It's like saying "I give Gonzalez a 4% shot of getting a base hit against Rivera here". Say something like that and you'd get laughed at, how are the projections different?

Especially given the systems are all "predicting" different things, and nobody seems to have any interest in the actual past accuracy (or lack thereof) when it comes to them, but they still look at the numbers as if they're meaningful.

They aren't.

Unpredictable things happen in baseball all the time for reasons that often can't be quantified or explained.

InstaFace · Aug 14, 2018

Adrian's Dome said:
Computers, people, whatever. The point is the same: projections and rankings are BS. There's literally no point to any of them.

All models are wrong. Some models are useful.

Adrian's Dome said:
And what I'm saying is that that "4%" is meaningless whereas you're saying it's a real number. Carl Crawford doesn't drop that easy soft liner against the Orioles that I could've freaking caught and all of a sudden the system was correct with its 96% estimate? It literally cannot factor in any one of a million real, actual variables that happen on and off baseball fields that have real, actual effects on what happens. That's what I've been saying all along. BP says this, Fangraphs says this, and it's all completely irrelevant guesswork.

...I know it drives the numbers-driven insane, but things routinely happen, and often times they are completely unexpected and unexplainable.

I think you have a big void in your education where statistics should have gone. The mere notion of what it means to say, "that event has a 4% probability of occurring" seems to be whooshing right past you. The collection of discrete events that we lump into an outcome (like, you know, "a game") is far from guesswork, and aggregation, while it eliminates some precision, does not leave only noise behind in its wake.

Now, look, that's OK that you're not statistics-literate. I mean that sincerely, not in a patronizing way. There's shit I don't get at fucking all, like colors that "match" or "clash" in an outfit. Or, say, manga. Or when to type less. But I'm aware it's a thing, and universally regarded as a thing, and I don't mind if people are talking about it (even if I find it banal, even unlistenable). I've recognized that the "fault" is my own, or my own tastes, and that the other people aren't "wrong". You may want to adopt the same pose as regards statistics, because they're just not clicking for you, and it appears you lack the patience to seek an understanding of them - but continuing to blame others for that is, well, unseemly.

Lose Remerswaal said:
The Dentist tried to put together a ceremony [for the 1918 centennial] but none of the players RSVP'D

If SoSH were still in its prime, someone would make a burner account called Zombie Tris Speaker, all just to reply talking about how he ate Harry Frazee's braaaaaiiinnnsss.

jon abbey said:
@Dahabenzapple2 can testify that I was totally calm as he called me right after it ended. Winning four times in five seasons really makes it hard to get upset about anything baseball-related for the next 15 or 20 years after that, at least if you are a somewhat rational fan (contradiction in terms, I know). (Some) Patriots fans can maybe understand this at this point, although I'm not holding myself up as typical in any way.

I totally get you. I was mildly disappointed after we lost the Super Bowl this year. What was there to be upset about? Better team played a better game, exposed our well-known weaknesses, Brady dropped the fucking ball and got strip-sacked. It wasn't against any sort of a rival or team we had history with. I suppose it was a close game, but nothing stung like 2007-08 (or the 2006 AFCCG, or the 1996-97 SB for that matter, and the farce that followed).

I'm not sure any of those mitigants would have applied to being on the wrong side of the 2004 ALCS though. I'd give you a Zen Master award, but you'd refuse it anyway.

Reverend · Aug 14, 2018

Adrian's Dome said:
No, because saying "this has a 3% chance of happening" implies that you know what could or will happen and what all the potentials are and that's my issue. It's like saying "I give Gonzalez a 4% shot of getting a base hit against Rivera here". Say something like that and you'd get laughed at, how are the projections different?

Especially given the systems are all "predicting" different things, and nobody seems to have any interest in the actual past accuracy (or lack thereof) when it comes to them, but they still look at the numbers as if they're meaningful.

They aren't.

Unpredictable things happen in baseball all the time for reasons that often can't be quantified or explained.

You don't understand probability. Like, as a basic concept.

Loudly.

Devizier · Aug 14, 2018

caveat: People assume normal distributions *far too often* when computing statistics

With that out of the way, they are a very useful way to think about what we mean by probability. There are incredibly intuitive ways to demonstrate them. I always liked the pachinko/bean machine way of making normal distributions real. There's a great example in Boston's own Museum of Science:

Cesar Crespo · Aug 14, 2018

Devizier said:
caveat: People assume normal distributions *far too often* when computing statistics

With that out of the way, they are a very useful way to think about what we mean by probability. There are incredibly intuitive ways to demonstrate them. I always liked the pachinko/bean machine way of making normal distributions real. There's a great example in Boston's own Museum of Science:

As someone who used to play 16 poker games at once when online poker was a thing, people also drastically underrate just how often something happens when it's only supposed to happen 3-4% of the time. If you see 1 million hands, you are going to see a lot of crazy things happening. You are also going to see some crazy stuff happen over the course of 190,000 PA. I've lost with 4 of a kind Kings, with pocket kings.
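
To put rough numbers on that, here is a quick Python sketch. It assumes every trial is independent with the same probability, which real poker hands and plate appearances only approximate, and the function names are just ones picked for illustration.

```python
def prob_at_least_once(p: float, n: int) -> float:
    """Chance that an event with per-trial probability p shows up
    at least once in n independent trials (independence is an assumption)."""
    return 1.0 - (1.0 - p) ** n


def expected_count(p: float, n: int) -> float:
    """Average number of times the event occurs across n trials."""
    return p * n


if __name__ == "__main__":
    # A "3% shot" over 100 tries, and over a million hands.
    print(prob_at_least_once(0.03, 100))
    print(expected_count(0.03, 1_000_000))
```

The point is only that "3%" multiplied by a big enough sample stops being rare at all.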

BuellMiller · Aug 14, 2018

InstaFace said:
If SoSH were still in its prime, someone would make a burner account called Zombie Tris Speaker, all just to reply talking about how he ate Harry Frazee's braaaaaiiinnnsss.

But why would Tris Speaker go for Harry Frazee's brains? He was traded before Frazee bought the team. OTOH Zombie Harry Hooper or Zombie Duffy Lewis would work, though.

uk_sox_fan · Aug 14, 2018

bosox79 said:
As someone who used to play 16 poker games at once when online poker was a thing, people also drastically underrate just how often something happens when it's only supposed to happen 3-4% of the time. If you see 1 million hands, you are going to see a lot of crazy things happening. You are also going to see some crazy stuff happen over the course of 190,000 PA. I've lost with 4 of a kind Kings, with pocket kings.

To quad aces or a straight flush?

Average Reds · Aug 14, 2018

InstaFace said:
Now, look, that's OK that you're not statistics-literate. I mean that sincerely, not in a patronizing way. There's shit I don't get at fucking all, like colors that "match" or "clash" in an outfit. Or, say, manga. Or when to type less. But I'm aware it's a thing, and universally regarded as a thing, and I don't mind if people are talking about it (even if I find it banal, even unlistenable). I've recognized that the "fault" is my own, or my own tastes, and that the other people aren't "wrong". You may want to adopt the same pose as regards statistics, because they're just not clicking for you, and it appears you lack the patience to seek an understanding of them - but continuing to blame others for that is, well, unseemly.

Your entire response was great, but this paragraph, with an emphasis on the bolded, was sublime.

Cesar Crespo · Aug 14, 2018

uk_sox_fan said:
To quad aces or a straight flush?

straight flush.

charlieoscar · Aug 14, 2018

Reverend said:
You don't understand probability. Like, as a basic concept.

TaDa! There are a lot of people on this board who don't understand probability theory.

Savin Hillbilly · Aug 14, 2018

Adrian's Dome said:
No, because saying "this has a 3% chance of happening" implies that you know what could or will happen and what all the potentials are and that's my issue.

As one non-math-geek to another, I'd offer that one way to understand the bit in quotes is as a statement not about what will happen, but rather about how surprised we should be if it does happen. Surprising stuff does happen. Just not as often as unsurprising stuff, which is why it's surprising. So "Event X has a 3% chance of happening" is not in any way a denial that Event X could happen--in fact it's a positive affirmation that it could happen (otherwise the probability would be 0%). But it would be very surprising if it did.

Lose Remerswaal · Aug 14, 2018

charlieoscar said:
TaDa! There are a lot of people on this board who don't understand probability theory.

You can't just say that

NobodyInteresting · Aug 14, 2018

pk1627 said:
Here’s the timeline.

One month ago
Models: So sorry. The Yankees will win the division. SoS, organizational depth (and they’re only a few games back in the loss column).

Now
Models: you know nothing about probability.

Of course, it's thinking that this is what the models are saying that is causing the people who understand what the models are actually saying to suggest that the people who don't understand what the models are saying don't understand what the models are saying.

If you see what I'm saying.

tims4wins · Aug 14, 2018

I have a question about the playoff odds. Like the 99.78% figure quoted for 2011. Is that figure saying that, historically, teams up that many games with that many games to go have made the playoffs at a 99.78% rate? (like how the run expectancy matrix is based on history). Or is it saying that based on projections for the rest of the season, the Sox make the playoffs in 99.78% of the simulations?

InstaFace · Aug 14, 2018

BuellMiller said:
But why would Tris Speaker go for Harry Frazee's brains? He was traded before Frazee bought the team. OTOH Zombie Harry Hooper or Zombie Duffy Lewis would work, though.

It's the only plausible explanation for his subsequent business dealings. YOU CANNOT DISPROVE THIS THEORY.

SirPsychoSquints · Aug 14, 2018

tims4wins said:
I have a question about the playoff odds. Like the 99.78% figure quoted for 2011. Is that figure saying that, historically, teams up that many games with that many games to go have made the playoffs at a 99.78% rate? (like how the run expectancy matrix is based on history). Or is it saying that based on projections for the rest of the season, the Sox make the playoffs in 99.78% of the simulations?

The latter. There haven't been 10,000 of the same scenarios in the past.

BuellMiller · Aug 14, 2018

InstaFace said:
It's the only plausible explanation for his subsequent business dealings. YOU CANNOT DISPROVE THIS THEORY.

But the models said it only had a 4% chance of being true.

Reverend · Aug 14, 2018

InstaFace said:
It's the only plausible explanation for his subsequent business dealings. YOU CANNOT DISPROVE THIS THEORY.

Can we model it?

InstaFace · Aug 14, 2018

NobodyInteresting said:
Of course, it's thinking that this is what the models are saying that is causing the people who understand what the models are actually saying to suggest that the people who don't understand what the models are saying don't understand what the models are saying.

If you see what I'm saying.

Give this man a membership. He knows how to say 100 words when 10 will do. He'll fit in great around here.

tims4wins said:
I have a question about the playoff odds. Like the 99.78% figure quoted for 2011. Is that figure saying that, historically, teams up that many games with that many games to go have made the playoffs at a 99.78% rate? (like how the run expectancy matrix is based on history). Or is it saying that based on projections for the rest of the season, the Sox make the playoffs in 99.78% of the simulations?

It's saying the latter, because we simply don't have enough sample size of seasons in each relative league position to draw statistically useful predictions for the former. So they run a simulation of the rest of the season 10,000 times, estimating the probability of each team winning each game, and see where things end up at the end of each run. It's a subset of the Monte Carlo techniques, if you really want the full theory behind it.
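
For anyone who wants to see the idea in miniature, here is a rough Python sketch. It is not how BP or FanGraphs actually do it: it's a toy two-team race where every remaining game is an independent coin flip with a fixed win probability, instead of a full schedule with game-by-game estimates. The name `simulate_race`, its parameters, and the tie handling (a tie counts as a miss, since there's no tiebreaker logic here) are all made up for illustration.

```python
import random


def simulate_race(lead_wins: int, games_left: int,
                  p_leader: float, p_chaser: float,
                  n_sims: int = 10_000, seed: int = 0) -> float:
    """Estimate how often the leader finishes strictly ahead of the chaser.

    lead_wins  -- leader's current edge in wins (hypothetical input)
    games_left -- remaining games for each team (assumed equal)
    p_leader, p_chaser -- assumed per-game win probabilities
    """
    rng = random.Random(seed)  # seeded so repeated runs agree
    held = 0
    for _ in range(n_sims):
        # Play out the rest of the season once for each team.
        leader = sum(rng.random() < p_leader for _ in range(games_left))
        chaser = sum(rng.random() < p_chaser for _ in range(games_left))
        if lead_wins + leader > chaser:
            held += 1
    # The "playoff odds" are just the share of simulated seasons that held up.
    return held / n_sims


if __name__ == "__main__":
    print(simulate_race(lead_wins=9, games_left=25, p_leader=0.5, p_chaser=0.5))
```

The published number is that final fraction, which is why it comes with decimals like 99.78% rather than looking like a historical win-loss record.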

Whereas, with the run expectancy matrix, there are far fewer input states (base/out situations, vs standings and remaining schedule vs near or far-placed competitors), fewer output states (runs scored in the inning, vs final standings and tiebreakers), and orders-of-magnitude more data (18 data points per game played in the majors, vs one data point per team per season). So you can build a historical expectation off of it. Accounting for more variables makes it harder (e.g., is it a late inning, and will that make it more likely that specialist relievers come in?), but with enough data you can do so in a way that improves the predictions' accuracy.
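
A bare-bones Python sketch of that historical approach, assuming you already have one record per observed situation in the form (bases, outs, runs scored in the rest of the inning). The record layout, the base-state strings, and the function name are illustrative choices, and the sample rows are made-up toy values, not real play-by-play data.

```python
from collections import defaultdict


def run_expectancy(observations):
    """Average runs scored from each (bases, outs) state to the end of the inning.

    observations -- iterable of (bases, outs, runs_rest_of_inning) tuples;
                    this layout is an assumption for the sketch.
    """
    totals = defaultdict(lambda: [0.0, 0])  # state -> [run total, count]
    for bases, outs, runs in observations:
        cell = totals[(bases, outs)]
        cell[0] += runs
        cell[1] += 1
    return {state: total / count for state, (total, count) in totals.items()}


if __name__ == "__main__":
    # Toy rows only: "---" means bases empty, "1--" means a runner on first.
    toy = [("---", 0, 0), ("---", 0, 1), ("1--", 1, 2)]
    print(run_expectancy(toy))
```

With 24 base/out states and a huge pile of innings, every cell gets enough observations for a plain average to mean something, which is exactly what the standings problem lacks.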

The number of situations in the real world where we have enough homogeneous data to do historical modeling is very small (mostly because context changes so often that the noise swamps the signal), so you see lots of other approaches (like Monte Carlo simulations) tried for lack of a better alternative.

Devizier · Aug 14, 2018

My main issue with projections is that variance is not (directly) included in them. Since variance is essential for understanding precision, leaving it out is unfortunate at best (suspicious at worst). After all, the variance and the distribution model are how you determine % likelihood in the first place.
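
As a rough illustration of how much spread hides behind a single projected win total, here is a Python sketch that treats a season as independent games with one fixed win probability. That binomial model is a simplifying assumption, not what any projection system publishes, and the function name is invented for the example.

```python
import math


def win_total_spread(p: float, games: int = 162):
    """Mean and standard deviation of season wins under a simple
    binomial model (independent games, constant win probability p)."""
    mean = games * p
    sd = math.sqrt(games * p * (1.0 - p))
    return mean, sd


if __name__ == "__main__":
    mean, sd = win_total_spread(0.5)
    print(f"projected wins: {mean:.0f}, give or take about {sd:.1f}")
```

Even with the true talent level known exactly, the model says a .500 team lands several wins away from 81 fairly often, so a projection quoted without that spread hides most of the story.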

Cuzittt · Aug 14, 2018

Split out from the Best Red Sox team thread.

Reverend · Aug 14, 2018

Cuzittt said:
Split out from the Best Red Sox team thread.

Great. Now we’ll never know if this team is any good.

Max Power · Aug 14, 2018

Reverend said:
Great. Now we’ll never know if this team is any good.

I think there's a pretty good chance that they are.

slamminsammya · Aug 14, 2018

Thank god.

Is anyone here familiar with the term "not even wrong"? There have been some paradigmatic examples of that in this discussion.

I thought the "an improbable event happened, so the math is invalid" line of argument was retired circa 2008 among baseball fans.

Buzzkill Pauley · Aug 14, 2018

slamminsammya said:
I thought the "an improbable event happened, so the math is invalid" line of argument was retired circa 2008 among baseball fans.

I thought that it was put to rest rather late in the evening, on 10/20/2004.

tims4wins · Aug 14, 2018

SirPsychoSquints said:
The latter. There haven't been 10,000 of the same scenarios in the past.

InstaFace said:
Give this man a membership. He knows how to say 100 words when 10 will do. He'll fit in great around here.

It's saying the latter, because we simply don't have enough sample size of seasons in each relative league position to draw statistically useful predictions for the former. So they run a simulation of the rest of the season 10,000 times, estimating the probability of each team winning each game, and see where things end up at the end of each run. It's a subset of the Monte Carlo techniques, if you really want the full theory behind it.

Whereas, with the run expectancy matrix, there are far fewer input states (base/out situations, vs standings and remaining schedule vs near or far-placed competitors), fewer output states (runs scored in the inning, vs final standings and tiebreakers), and orders-of-magnitude more data (18 data points per game played in the majors, vs one data point per team per season). So you can build a historical expectation off of it. Accounting for more variables makes it harder (e.g., is it a late inning, and will that make it more likely that specialist relievers come in?), but with enough data you can do so in a way that improves the predictions' accuracy.

The number of situations in the real world where we have enough homogeneous data to do historical modeling is very small (mostly because context changes so often that the noise swamps the signal), so you see lots of other approaches (like Monte Carlo simulations) tried for lack of a better alternative.

Thanks. I figured as much, but wanted to be sure.

nighthob · Aug 14, 2018

charlieoscar said:
TaDa! There are a lot of people on this board who don't understand probability theory.

There is a 72.9% chance that this is true.
