
Why is Baseball Bounded by .400/.600?



#1 MentalDisabldLst


  • Prefers spit over lube


  • 13427 posts

Posted 18 June 2009 - 12:00 AM

It's tangential to the larger point of EV's analysis, but:

QUOTE
I'm not cognizant of such a structural (vs an emotional, psychological) advantage in the other sports you mentioned. But, in pro baseball, almost all teams win between 40% and 60% of their games...while in the other sports the gaps between the top and bottom teams are huge. Some NFL teams win/lose 1 or 2 games in an entire season!

I've always thought it would be interesting to get the STDEV of the distribution of WPCT of teams across various leagues. In football and basketball, you're right, winning percentages of .700 or even .800 are not uncommon for the best teams in the league, and conversely .300 down to .200 isn't that uncommon either, whereas it's an extraordinary year in baseball when even a single team is outside of .350-.650.

What's more interesting to me about that factoid is why that is. My hypothesis would be that a team's superiority or inferiority to another team is more "repeatable" in other sports (well, at least MLB/NHL vs NFL/NBA), in that a top-5 team in the league will beat a bottom-5 team in the league 9 times out of 10 in basketball (and probably 95 times out of 100 in football), whereas it's rare that a season series for two teams in the same division is more lopsided than 13-5 (72%-28%) either way. A basketball or football team is more likely, it seems, to have better talent win out over the course of a single game - whereas it takes a 162-game season in baseball for the cream to rise to the top.

I have to assume that's either due to (A) a significant talent disparity between the best and worst coaching of NFL/NBA teams, (B) more discrete events (possessions/plays) over which to measure a team's skill, (C) better achievement of parity in MLB (doubtful considering the STDEV of WPCT probably hasn't changed much from the game's early days), or (D) a narrower gap in MLB/NHL between the merely-average players at the major league level vs the stars, and a greater gap between average/stars in the NFL/NBA.

Maybe this is a discussion better suited to General Sports, but knowledge of that disparity (assuming it exists) probably should affect how one feels about season-long performance - i.e., there's a lot more noise in baseball's results, so don't take a game or two difference in the season as seriously, or something along those lines.

#2 Eric Van


  • Kid-tested, mother-approved


  • 10990 posts

Posted 18 June 2009 - 01:10 AM

QUOTE (MentalDisabldLst @ Jun 17 2009, 11:00 PM)
I have to assume that's either due to (A) a significant talent disparity between the best and worst coaching of NFL/NBA teams, (B) more discrete events (possessions/plays) over which to measure a team's skill, (C) better achievement of parity in MLB (doubtful considering the STDEV of WPCT probably hasn't changed much from the game's early days), or (D) a narrower gap in MLB/NHL between the merely-average players at the major league level vs the stars, and a greater gap between average/stars in the NFL/NBA.

What would be the StDEv of Win% in the NFL if playing QB were so stressful that it required four weeks of rest, and every team had 5 of them that worked in rotation? And if (for some crazy reason) the matchups got scrambled so that you were essentially equally as likely to be facing [best QB on average team] with your 5th stringer as Tom Brady?

It is entirely about the pitching rotation. Every team, no matter how good or bad, is much better with their ace starting and much worse with their #5.
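
A quick back-of-the-envelope sketch of this dilution effect (a Python illustration, not anything from the thread's data; the 68.4% figure is Pedro's rate quoted later in the thread, and the rest is assumed):

# How much can one dominant starter move a season record
# if he only starts 1 game in 5? All probabilities illustrative.
ace_p = 0.684              # ace's game win probability (Pedro's figure)
avg_p = 0.500              # the other four starters are league average
team_p = (ace_p + 4 * avg_p) / 5
print(f"blended: {team_p:.3f} -> about {team_p * 162:.0f} wins per 162 games")
# If the 'ace' played every game, NFL-QB style:
print(f"every game: {ace_p:.3f} -> about {ace_p * 16:.1f} wins per 16 games")

One great starter lifts a .500 team only to about .537 (roughly 87 wins), while a great QB's effect shows up in all 16 games.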


#3 Myt1


  • thinks tim thomas is a dick-fil-a


  • 18963 posts

Posted 18 June 2009 - 01:16 AM

MDL, you raise a really, really cool issue for discussion. You can tell I think so because I used 2 "reallys."

There are a few things about baseball that are very different from most other team sports.

IMHO, the most important is the individual nature of baseball, and the discreteness of players and their impact. In basketball, the best player in a game can more easily dominate the outcome because he is more likely to be involved in the vast majority of the game, and because he can influence every possession, and has to dominate fewer players. There are also only 5 players on the court at any given time, and no more than (I can't remember the rule) 16 active per game, with the vast majority of playing time going to a relative few. In baseball you have 8 starters and 5 starting pitchers, and any number of relievers.

In hockey the most dominant player is active for less time than in basketball, but may still dominate the opposition. However, goaltender play is a remarkable influence on a team's W-L record. Goaltenders are basically the anti-pitchers. Instead of being the center of attention every 5 days, they are the center of attention all the time. A strong season performance by a tender is incredibly valuable.

Football has a few positions that are often more relevant to a team's chances to win. However, it is also the sport likely most susceptible to the influence of one man, the head coach. While the players certainly play (the QB tries to kill your team, the DE tries to kill him, the LT tries to stop it), none of the other big-4 sports relies on disparate gameplans and preparation the way football does.

Further, baseball is a streaky game that is deliberately normalized over the huge 162-game season. The other three sports are much less so, especially football.

Edited by Myt1, 18 June 2009 - 01:18 AM.


#4 drleather2001


  • given himself a skunk spot


  • 13924 posts

Posted 18 June 2009 - 09:26 AM

I think the time factor has something to do with it as well.

A mammoth performance during any single part of the game in Basketball (a 24-3 run, say), Football (going up 24-0 in the 1st Quarter), or Hockey (scoring 3 goals in 10 minutes) is harder to overcome because there is a time limit, and teams can tailor their strategies to preserving that advantage with an eye on the clock. In baseball, there is no time factor working against the disadvantaged team. Obviously, outs are a concern, but while it is very difficult, and highly unlikely, to come back from a moderate deficit such as 4-0, or 6-1, late in a baseball game, it is virtually impossible to come back from a proportional deficit in any of the other sports.

Therefore, it seems to me, that superior teams in the other sports can rest more easily on a short period of dominance (and often will, creating lopsided records) while in baseball (as others have pointed out), a solid performance is never safe because it can be undone by a single inning of ineptitude.

Edited by drleather2001, 18 June 2009 - 09:30 AM.


#5 Morgan's Magic Snowplow


  • SoSH Member


  • 8704 posts

Posted 18 June 2009 - 09:34 AM

The standard deviation of winning percentage at the team level is not as high in baseball because the standard deviation of "ability to influence the game" at the individual level is not as high either. Even the best players are worth only 5-10 wins over the course of a 162-game season. LeBron James might be worth 30 wins over the course of an 82-game season.

Also, I think myt's point is very important. It's not just that no baseball player is that much more likely to be better in any given game than most of his peers, it's also that one player can't have as big an impact by design.

Edited by Morgan's Magic Snowplow, 18 June 2009 - 09:37 AM.


#6 OilCanShotTupac


  • Not Clowning Around


  • 8270 posts

Posted 18 June 2009 - 09:39 AM

I think it has a lot to do with the elusiveness of peak performance in baseball.

Hitting and pitching are both extraordinarily difficult skills, for which it's almost impossible to maintain top performance over a long period of time. Even excellent hitters and pitchers are prone to underperform their level of skill quite often. Johan Santana just got absolutely slaughtered by the Yankees. That's not rare. An equivalent in basketball would be LeBron James putting up a 2-for-27. Just doesn't happen as much.

In other sports, the team with the better athletes is going to win most of the time, because athletic ability translates directly into results on the field more readily than it does in baseball, IMO. The "better" baseball team is just not going to win as often, because it's so easy to fail at baseball compared to other sports.



#7 PrometheusWakefield


  • SoSH Member


  • 6522 posts

Posted 18 June 2009 - 09:43 AM

QUOTE (Myt1 @ Jun 18 2009, 02:16 AM)
There are also only 5 players on the court at any given time, and no more than (I can't remember the rule) 16 active per game, with the vast majority of playing time going to a relative few.

Twelve. I don't think that's really a relevant criterion though. In football obviously the numbers are huge, but football has the greatest differential between the best team WP% and the worst. That could be because of small sample size, although my suspicion is that it is not - I suspect if you matched the 2004 New England Patriots against the 2008 Detroit Lions 162 times, the Patriots would go something reasonably close to 162-0. While close football games often come down to whether a single fumble ends up bouncing towards a defensive or offensive player, a clearly superior football team can dominate like nothing else.

I think the reason baseball tends towards the middle is directly related to everything we've been discovering about BABIP. There is just so much luck in the result of the batter-pitcher confrontation; the hitter who hits a line drive double and the hitter who hits the line drive out are really only distinguished by result, rather than ability. The difference between a .250 hitter and a .300 hitter, after all, only matters in 1 out of 20 at bats (or, as Crash Davis said, "just one extra flare a week - just one - a gorp… you get a groundball, you get a groundball with eyes… you get a dying quail, just one more dying quail a week"). There is certainly luck in basketball and football as well, of course. But ultimately, either your big man can outmuscle the other guy's big man for the rebound or he can't; either your starting corner can read the WR's move and keep up with his route or he can't, and if the answer to those questions is no, it's probably going to be no fairly consistently.

#8 sachilles


  • Rudy-in-training


  • 630 posts

Posted 18 June 2009 - 09:44 AM

I'm not sure if football is worth comparing. Their number of games is minuscule compared to other sports. I would think if you compared the first 20 games of any pro sport you would find a pretty wide variation in winning percentage.

#9 Max Power


  • thai good. you like shirt?


  • 2126 posts

Posted 18 June 2009 - 09:48 AM

QUOTE (Myt1 @ Jun 18 2009, 02:16 AM)
IMHO, the most important is the individual nature of baseball, and the discreteness of players and their impact. In basketball, the best player in a game can more easily dominate the outcome because he is more likely to be involved in the vast majority of the game, and because he can influence every possession, and has to dominate fewer players. There are also only 5 players on the court at any given time, and no more than (I can't remember the rule) 16 active per game, with the vast majority of playing time going to a relative few. In baseball you have 8 starters and 5 starting pitchers, and any number of relievers.


This is exactly Eric's point. There's only one player who can dominate a baseball game, the starting pitcher. Everyone else has almost the exact same number of chances to impact the game (lineup position notwithstanding). The reverse is also true. In many sports the defense can focus on a particular player and take him out of the game. In baseball you have to pitch to everyone or give up a base.

I think luck is a much, much larger factor in baseball than other sports. If a quarterback throws a perfect pass, 99 out of 100 times the receiver will catch it and something positive will happen. If a pitcher makes a perfect pitch, a great hitter can still hammer it. A batter can put a perfect swing on a ball and hit a screaming line drive right at a fielder. When BABIP for line drives, the optimal thing for a batter to do, is only .700, it tells you everything you need to know about why the best teams only win 70% of their games.

#10 Hendu for Kutch

  • 3491 posts

Posted 18 June 2009 - 10:04 AM

I think there's a degree of randomness and luck in baseball that isn't quite as prevalent in the other sports. For example, a pitcher can throw the exact same pitch to the exact same batter 10 times and get 10 different results.

Let's say Josh Beckett is pitching with 1 out and the bases loaded. He throws a 3-2 fastball on the inside corner. We'll consider this as established.

Now, Beckett's role in this situation is done, and for the sake of this argument we'll say the Red Sox' role in this situation is done (let's ignore fielding for right now).

In one case that pitch is called strike 3. In another, the umpire doesn't give him the call and it's Ball 4, scoring a run. Or perhaps the batter turns on the pitch and screams a groundball down the 3rd base line, scoring 3 runs. Maybe he does this, but hits it 4 feet to the right and it's an easy double play for the 3rd baseman, ending the inning. Or any number of different scenarios that range from a double play to 4 runs scored.

This scenario of random chance exists on literally every pitch thrown during a major league season. There's just no way for anyone, no matter how good, to escape from it unscathed. It's the same reason pitchers don't go 30-0 or have an ERA of 0.00, no matter how good they are. The random distribution of luck is like a black hole, pulling the very best teams and the very worst teams back towards the .500 mark.

Certainly there's elements of luck in any sport. Baseball is just on a different level though. It's the same reason that a community college can beat the Tigers in a spring training game, but Duke would get creamed by the Celtics and Florida State would lose by over 50 to the Patriots.

Edit: I need to learn to type faster, Prometheus and Max Power covered a lot of this.

Edited by Hendu for Kutch, 18 June 2009 - 10:07 AM.


#11 dcmissle


  • SoSH Member


  • 11750 posts

Posted 18 June 2009 - 10:04 AM

QUOTE (sachilles @ Jun 18 2009, 10:44 AM)
I'm not sure if football is worth comparing. Their number of games is minuscule compared to other sports. I would think if you compared the first 20 games of any pro sport you would find a pretty wide variation in winning percentage.


But even if you run the football numbers over several seasons, they seem to hold up at the extremes if not to the same extent. The Colts have won at least 75% of their games over the past 6 seasons; on the other side of the ledger, the Lions have won a little over 25%. In each case, you're talking about 96 games. And this is in a sport lauded/derided for its competitiveness/mediocrity, where a hard salary cap, scheduling and the draft are designed to keep as many teams in the playoff hunt as long as possible and thus nudge every team to 8 and 8.

#12 PrometheusWakefield


  • SoSH Member


  • 6522 posts

Posted 18 June 2009 - 10:21 AM

QUOTE (Eric Van @ Jun 18 2009, 02:10 AM)
It is entirely about the pitching rotation. Every team, no matter how good or bad, is much better with their ace starting and much worse with their #5.

But even Pedro Martinez only wins 68.4% of his decisions. That's damned good for baseball - a team that had a rotation full of Pedro Martinez would win an average of 111 games per season, a historically great figure. But the Patriots won 68% of their decisions last year and missed the playoffs. A basketball team that wins 68.4% of its games wins 56 - good enough for the #4 seed in the East or the #2 seed in the West, but nothing remarkable. And that's the most dominant pitcher in recent baseball history.

#13 sachilles


  • Rudy-in-training


  • 630 posts

Posted 18 June 2009 - 10:37 AM

Is there any significance to how MLB orients its playing schedule? MLB typically plays 3-game series (with exceptions) throughout the season, where the other major sports only have series play in the playoffs.

#14 glennhoffmania


  • Miracle Whipper


  • 8382989 posts

Posted 18 June 2009 - 10:45 AM

QUOTE (dcmissle @ Jun 18 2009, 11:04 AM)
But even if you run the football numbers over several seasons, they seem to hold up at the extremes if not to the same extent. The Colts have won at least 75% of their games over the past 6 seasons; on the other side of the ledger, the Lions have won a little over 25%. In each case, you're talking about 96 games. And this is in a sport lauded/derided for its competitiveness/mediocrity, where a hard salary cap, scheduling and the draft are designed to keep as many teams in the playoff hunt as long as possible and thus nudge every team to 8 and 8.


My first thought was sample size. Then I read your post and thought it made sense and I was wrong. But don't you have to look at every season individually? There is a lot of player movement in the NFL and teams turn over fairly quickly. So by taking one team and looking at their 6 year record, are you really eliminating the sample size issue?

I think it's more about the extent to which different positions contribute to wins. Take Manning off the Colts and they could easily be a .500 team. How many baseball players account for a 25% difference in winning percentage?

#15 dcmissle


  • SoSH Member


  • 11750 posts

Posted 18 June 2009 - 10:53 AM

I think the QB can have an awful lot to do with it. Somebody touched on it above -- if you have a great one, it's akin to your staff ace starting every game.

#16 sachilles


  • Rudy-in-training


  • 630 posts

Posted 18 June 2009 - 11:07 AM

QUOTE (dcmissle @ Jun 18 2009, 11:04 AM)
But even if you run the football numbers over several seasons, they seem to hold up at the extremes if not to the same extent. The Colts have won at least 75% of their games over the past 6 seasons; on the other side of the ledger, the Lions have won a little over 25%. In each case, you're talking about 96 games. And this is in a sport lauded/derided for its competitiveness/mediocrity, where a hard salary cap, scheduling and the draft are designed to keep as many teams in the playoff hunt as long as possible and thus nudge every team to 8 and 8.

Given that their regular season is 16 games, it is a very small sample size. Extrapolating it over several years creates a few challenges.
1) In the NFL, you play your division mates twice per season. This leaves you 10 games to face the remaining 28 teams. The result is you do not play every team every year.
2) The rest of the sports mentioned for comparison have a long enough schedule that they can play every other opponent in the league. The admitted flaw with that is the AL vs the NL in MLB.
This is why I think it is tough to include the NFL into this comparison. I think the more appropriate comparison if you want to include the NFL is to take a random sample of 16 games from the other sports, but I can see where there might be flaws there as well.

#17 Harry Hooper


  • SoSH Member


  • 13760 posts

Posted 18 June 2009 - 11:08 AM

In baseball the defense has the ball most of the time. Each team gets 27 outs, so the ability of the superior team to dominate possession/field position is much more limited than in other sports. Working within those constraints, the performance of pitchers on a given day and plain old luck have a heavy influence on the game results.


#18 Toe Nash

  • 2937 posts

Posted 18 June 2009 - 11:09 AM

QUOTE (PrometheusWakefield @ Jun 18 2009, 11:21 AM)
But even Pedro Martinez only wins 68.4% of his decisions. That's damned good for baseball - a team that had a rotation full of Pedro Martinez would win an average of 111 games per season, a historically great figure. But the Patriots won 68% of their decisions last year and missed the playoffs. A basketball team who wins 68.4% of their decisions wins 56 games - good enough for the #4 seed in the East or the #2 seed in the West, but nothing remarkable. And that's the most dominant pitcher in recent baseball history.

And Pedro Martinez often had to have Nomar Garciaparra fielding behind him and Wilton Veras and Darren Lewis providing offense for him. The Bulls won 72 games with Michael Jordan, Dennis Rodman and Scottie Pippen playing almost every game -- and providing over half the minutes of the team.

Also, you're taking the sample size of "all of Pedro's career" instead of just one season -- and he played for some lousy teams. In 1999, the Sox were 24-5 when he started - an .828 winning percentage -- even with Darren Lewis and Wilton Veras playing in a bunch of his games.

Another thing about baseball is the batting order. If you need a last-second shot in basketball, you can give the ball to your best player. Tom Brady is always going to be the one leading the team down the field for a last-second comeback, if necessary. But in baseball, your guys at bat in the ninth inning might be your three worst hitters, and there's nothing you can do about that besides put in a pinch hitter (who probably isn't your best hitter, either).

#19 bakahump

  • 4732 posts

Posted 18 June 2009 - 11:13 AM

I don't know the best way to put it, but in baseball:

Gm1 you pin a lot of your chances for victory on Pitcher A
Gm2 you depend on Pitcher B
Gm3 you depend on Pitcher C
Gm4 Pitcher D
Gm5 Pitcher E

This doesn't even begin to address the inconsistency issues that A, B, C, D and E suffer from start to start.

Factor in bullpens and it emphasizes the point even more. Anyone who saw a brilliant Pedro start torpedoed by the pen can relate.

Whereas in hockey the "primary player" (goalie) may play 90% of his team's games, with what I would imagine is more consistent play.
Football's "primary player" (QB) also would be playing in 90%+ of his team's games.
Basketball would have 5 players playing 75% of the time, with 2-3 more playing almost all of the other 25%.

I don't know how to begin to scientifically prove this.....but it was my first thought when reading your post.

There just seem to be too many players you need to depend upon in baseball to be good and consistent to do much better than .600.

Conversely, the weaker teams are able to steal more games because of "elite" teams' inconsistency, and able to stay around .400.

Edited by bakahump, 18 June 2009 - 11:15 AM.


#20 LoweTek

  • 758 posts

Posted 18 June 2009 - 11:18 AM

QUOTE (drleather2001 @ Jun 18 2009, 10:26 AM)
Therefore, it seems to me, that superior teams in the other sports can rest more easily on a short period of dominance (and often will, creating lopsided records) while in baseball (as others have pointed out), a solid performance is never safe because it can be undone by a single inning of ineptitude.
While I see the logic of the luck factor, I really think the clock is the key, maybe more so than length of season. In all three other games, even with a shot clock in BB, once a largish lead is established, a team can undertake a number of strategies to slow the game down, eat the clock, etc. In baseball, you still have to throw the pitches and make the plays for the remaining 21 outs. The pitcher isn't going to strike out swinging 21 consecutive batters. He'll need skilled help from everybody out there to hold the lead. I think also in baseball, unlike the other sports, you have the spectre of runners on base. The result of the same basic successful act, say a single, could be a relatively harmless thing or drive in three runs. There is nothing comparable in the other sports. OBP league average is .346, so pitchers with a four run lead and 21 outs to go are likely to face multiple situations where the lead can be cut in half or more by one swing. In all the other sports, the value of a discrete act of offensive success is constant.

#21 Saints Rest

  • 3654 posts

Posted 18 June 2009 - 12:38 PM

I think if you look at the range of winning pcts of starting pitchers, say with >=16 starts in a season, the curve would start to look like that of teams' winning pcts in the NFL. Of course, the low-end tails might not match as well, since SPs with poor winning pcts eventually lose their jobs.

#22 MentalDisabldLst


  • Prefers spit over lube


  • 13427 posts

Posted 18 June 2009 - 12:45 PM

Lurker Alpha Bat contributes some further reading:

QUOTE
There is work in the 'physics'/economics literature on the competitiveness of different sports leagues.
This paper is the one I've seen before:
(Full study with explanation text): http://arxiv.org/PS_...2/0512143v1.pdf

also these slides contain more background material
(PPT version of above study): http://cnls.lanl.gov...sports-mich.pdf

Bottom line: you can quantify competitiveness by upset probability: given a stronger team playing a weaker team, what is the probability the weaker team wins?
The time evolution figure is interesting: that baseball is getting more competitive (i.e. less parity) over the past few decades.


I'm not sure if I equate "more competitive" with "less parity" as he does - I might suggest a league is more 'competitive' among its teams if every team is more likely to win closer to 50% of the time against every other team - but the articles and studies are fascinating.

Edited by MentalDisabldLst, 18 June 2009 - 12:58 PM.


#23 Kevin Youkulele


  • wishes Claude Makelele was a Red Sox


  • 1574 posts

Posted 18 June 2009 - 12:50 PM

This idea was implicit in several of the posts above that mentioned outs in baseball vs clock in the other sports, but I think the factor that each team gets 27 outs (if it needs them in the case of the home team) no matter how well the other team does is huge. In the other sports, being able to stay on offense/keep the ball/puck away and eat clock is huge. It is like positive feedback in a circuit, or a force multiplier--success breeding success. Doing well in a game amplifies your chances of winning through the score and lets you keep the ball longer, reducing the opponents' opportunity to score. The second part of this just doesn't happen in a structural way in baseball--offense cannot take away opportunities from the other offense (discounting psychological factors, like discouragement due to a big score differential).

The other big reason which has been explained pretty well already is the pitching rotation. A team is expected to have 2.5 below average starting pitchers (unless the prevalence of injuries and the usage of depth means it's even higher? It depends on how an average pitcher is defined). Anyway, the other sports don't make the best players at probably the single most influential position in the game (since the pitcher is involved in every at bat) unavailable for 80% of the time (or more, if you consider that even aces do not throw a CG *every* 5th day).

#24 Eric Van


  • Kid-tested, mother-approved


  • 10990 posts

Posted 18 June 2009 - 01:35 PM

Here's the historical study that needs to be done.

For each MLB season, identify the teams that were in the top 20% or 25% in offense. Identify the ace pitchers of those teams, those that are within the top 20% or 25% in league ERA+.

Now find all the games when those pitchers faced a team in the bottom 20% or 25% of offense and whose starter was in the bottom 20-25% of ERA+ of full-time starters*, or was a scrub starter with a worse ERA+.

*You set the IP bar somewhat lower than ERA qualifiers, low enough to get 5 starters per team. That's because sucky pitchers don't get to qualify.

(You could go further and eliminate from the good teams those whose bullpen was below league average, and from the bad teams those whose pen was better.)

What we've done here is find "teams" with matched offense and defense.

What is the winning percentage in these games? Because these and only these games are the equivalent of, say, an 11-3 NFL team playing a 3-11 one in week 15.

If it's comparable to the percentages of excellent NBA or NFL teams, then pitching rotation explains everything.

You'd need to go back many years because these games are rare.
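
If one were to code this up, the bookkeeping might look like the sketch below (Python with pandas). Everything here is hypothetical: the file name and column names are invented, and a real version would have to build the quartile ranks and ERA+ percentiles from Retrosheet game logs and season pitching lines, and mirror the filter for the away side.

import pandas as pd

# Hypothetical game log, one row per game, with season-level percentile
# ranks (0-1) already joined on. All column names are invented.
games = pd.read_csv("games_with_ranks.csv")

stacked = (
    (games["home_off_pctile"] >= 0.75)           # top-quartile offense...
    & (games["home_sp_eraplus_pctile"] >= 0.75)  # ...starting its ace
    & (games["away_off_pctile"] <= 0.25)         # bottom-quartile offense...
    & (games["away_sp_eraplus_pctile"] <= 0.25)  # ...starting a bad #5
)
matched = games[stacked]
print(len(matched), "qualifying games")
print("win pct of the stacked side:", matched["home_won"].mean())

If that winning percentage approaches NFL/NBA blowout levels, the rotation explanation carries the day; if it stays near .700, something else is compressing baseball records.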

#25 Eric Van


  • Kid-tested, mother-approved


  • 10990 posts

Posted 18 June 2009 - 01:42 PM

QUOTE (Max Power @ Jun 18 2009, 08:48 AM)
If a quarterback throws a perfect pass, 99 out of 100 times the receiver will catch it and something positive will happen.

Oh great! So Asante Samuel made that interception on the sideline!

Actually, you do have a good point, but I think it's more like 85%, with 10% great defense and 5% oops.

#26 mjswarner

  • 466 posts

Posted 18 June 2009 - 01:48 PM

QUOTE (Myt1 @ Jun 18 2009, 08:16 AM)
Further, baseball is a streaky game that is really deliberately normalized over the huge, 162 game season. The other three sports are much less so, especially football.

I'm not sure why I did this outside of curiosity or what it shows, but here is the top & bottom win percentage for the last few years in the AL in two-week intervals:

[chart image not preserved in the archive]

I fudged April 1st a little bit since some years there hasn't been a game played yet.

Interesting how quickly the top and bottom teams approach 0.700 and 0.300, and how it takes ~3 months (80-85 games) to stabilize around 0.600-0.400.


#27 BucketOBalls


  • SoSH Member


  • 5644 posts

Posted 18 June 2009 - 02:30 PM

I think another reason (that hasn't been mentioned) is that the central contest of baseball (pitcher vs. batter) is opposed. Picture a game like this.

For each round A and B each roll two dice and pick one. The two dice are added to make a sum and player A gets max(0, sum-7) points and player B gets max(0, 7 - sum) points. Highest point total after a fixed number of rounds wins.

Obviously, the strategy for A is to pick his highest die and the strategy for B is to pick his lowest die. The result is that the sum will rarely hit the extremes of the possible range (2-12) and will tend to stay around 7. You get the same effect in baseball, where the pitcher is trying to get the batter out, and vice versa. Other sports don't really have this direct oppositional dynamic.
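
The pull toward 7 is easy to see in a quick simulation of the game as described (a Python sketch, nothing more):

import random
from collections import Counter

def round_sum():
    # A wants a high sum, so keeps the higher of his two dice;
    # B wants a low sum, so keeps the lower of his.
    a = max(random.randint(1, 6), random.randint(1, 6))
    b = min(random.randint(1, 6), random.randint(1, 6))
    return a + b

counts = Counter(round_sum() for _ in range(100_000))
for s in range(2, 13):
    print(f"{s:2d} {'#' * (counts[s] // 1500)}")

The opposed picks cancel: the mean sum stays right around 7, and the spread is noticeably tighter than for two dice rolled with no choices, so extreme scores (and extreme "records") get rarer.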

#28 zenter


  • slumdog idol


  • 4661 posts

Posted 18 June 2009 - 02:34 PM

QUOTE (Hendu for Kutch @ Jun 18 2009, 11:04 AM)
I think there's a degree of randomness and luck in baseball that isn't quite as prevalent in the other sports. For example, a pitcher can throw the exact same pitch to the exact same batter 10 times and get 10 different results...


My football fan friends always ask me why I like baseball so much. They focus on length of the season, slowness of games, dependence on luck and umpires, etc. I tell them, in my nerdy way, "because baseball is noisy." You can tell in one or two games which teams are superior in football, soccer, hockey, etc. One guy can fundamentally change a game (KG). Not in baseball. Four factors:

Expectations Vs. Execution: In basketball, soccer, hockey, etc. (versus baseball), what you did last time has much less to do with what happens now. In football, the major strategy ends at the snap, and it's all about skill and execution. The best teams improvise a little, but it's usually about building full contingency plans into plays. Even if the opponent knows what's coming (call it the Larry Bird scenario), the offense may still succeed, and change the game. In baseball, it's much more about what the other guy expects you to do. Taking HfK's scenario, there are several in-game tactical questions - what was the last pitch? what does the batter expect? what was the sequence last time? does that batter have tendencies? does Josh? does the batter know Josh's tendencies?, etc - that affect action. This is not invariably a difference-maker (Dave Roberts, Gm 4), but in aggregate, I bet it reduces individual influence over the game.

Teams Taking Turns: No other team sport discussed is so orderly with turns. No single play (in a vacuum) can change who is on offense - you can't steal the ball/puck. Imagine if every football team were allowed EXACTLY 4 (or 8) downs, no matter what. Records would probably be a lot less lopsided. Or in baseball, if every walk also gave the offensive team an additional out, I bet that would make records more lopsided.

Parallel Vs. Series: In the other team sports, all players on offense work together at the same time (in parallel) to reach success. In baseball, offense works in series, but defense works in parallel. (Series is like the old X-Mas lights, where if one bulb went out, all the ones after it in the string go out. Three bulbs for baseball, but same idea.) To have consistent success takes a string of sequential events.

One Vs. Nine: The inherent lopsided nine-on-one nature of baseball gives defense a numbers advantage. You need one guy to succeed against one other guy, and then eight others (in very quick succession). The nine are all focused on disabling your success. It would be as if, in basketball, teams took deliberate turns on offense and each team had the opportunity to set up against the one offensive player. LeBron may still succeed against five guys once in a while, but the chances go down because all their efforts are oriented towards preventing his, and only his, success. On offense, you show up 11% of the time. On defense, you may be a total non-factor. As a pitcher, you are very dependent on the eight behind you, as well as the game theory of the batter/pitcher relationship.

In the end, these all make baseball significantly more "equal" versus other sports. You can accumulate a whole lot of talent, but still perform badly. There are other "luck" factors (park, umps, errors) which contribute, but the structure of the rules, and the way to deal with them, actually enforce a bit of parity.

NOTE: I don't have data to back up how each factor further equalizes or dis-equalizes a game, but it would be interesting to develop a means to quantify the value of each factor.

NOTE 2: Sorry for length, but I'm irrationally exuberant about this, my first post as a full member. Wah-hoo!

Edited by zenter, 18 June 2009 - 02:42 PM.


#29 Timmeh49

  • 1752 posts

Posted 18 June 2009 - 04:21 PM

QUOTE (Eric Van @ Jun 18 2009, 02:35 PM)
Here's the historical study that needs to be done.

For each MLB season, identify the teams that were in the top 20% or 25% in offense. Identify the ace pitchers of those teams, those that are within the top 20% or 25% in league ERA+.

Now find all the games when those pitchers faced a team in the bottom 20% or 25% of offense and whose starter was in the bottom 20-25% of ERA+ of full-time starters*, or was a scrub starter with a worse ERA+.

*You set the IP bar somewhat lower than ERA qualifiers, low enough to get 5 starters per team. That's because sucky pitchers don't get to qualify.

(You could go further and eliminate from the good teams those whose bullpen was below league average, and from the bad teams those whose pen was better.)

What we've done here is find "teams" with matched offense and defense.

What is the winning percentage in these games? Because these and only these games are the equivalent of, say, an 11-3 NFL team playing a 3-11 one in week 15.

If it's comparable to the percentages of excellent NBA or NFL teams, then pitching rotation explains everything.

You'd need to go back many years because these games are rare.
I don't know if this is the right study. If you look for games where a team with a good offense and a good pitcher faces a team with a crappy offense and a crappy pitcher, then, sure, the "good" team will win a large majority of the games (75-80% seems about right). I think the apt analogy with football would be that teams only play one game per week, so that there are only #1 starters. So in a historical study, you would look for games between #1 starters. And I doubt that the best #1 starter would have a winning percentage of 75%.

In any case, another factor that smooths out the differences between baseball teams is the relative lack of specialization, especially compared with football. In baseball, every player in the lineup has to have some amount of hitting skill. In football, there are, what, 75 plays in a game? What if each player could only take a maximum of 7 snaps (75 / 11)? Even fat offensive linemen.

#30 inoffensiv philosophy

  • 137 posts

Posted 19 June 2009 - 12:01 AM

QUOTE (Timmeh49 @ Jun 18 2009, 10:21 PM)
I don't know if this is the right study. If you look for games where a team with a good offense and a good pitcher faces a team with a crappy offense and a crappy pitcher, then, sure, the "good" team will win a large majority of the games (75-80% seems about right). I think the apt analogy with football would be that teams only play one game per week, so that there are only #1 starters. So in a historical study, you would look for games between #1 starters. And I doubt that the best #1 starter would have a winning percentage of 75%.


I suppose it depends on how you feel about the relative skill distribution of these two sets of sportsmen. There are, obviously, thirty-two starting quarterbacks compared to one hundred and fifty starting pitchers active at any one time. So both the worst "ace" and the worst starting QB are theoretically ~30th in skill level amongst those sets.

But it might be that a "good" quarterback is simply harder to find than a "good" starting pitcher, so that the distribution of talent across the entire sets is still roughly the same. Basically, is (say) Derek Anderson (a "bad" starting QB who nevertheless started the majority of his team's games) the equivalent of (the 2008 incarnations of) Jesse Litsch or John Lackey (who were amongst those tied for 29th place in ERA+ amongst starters)? Or is he more like 2008 Nate Robertson (a guy who qualified for the ERA title, but sucked)?

Personally, I have no idea since I know basically nothing about football (I chose Anderson via football-reference's "passer rating" leaderboard for 2008, incidentally, so if he's a forty-seven time pro bowler who was playing with a broken head last year, I apologize). But I think this is the issue at hand.

Edited by inoffensiv philosophy, 19 June 2009 - 12:04 AM.


#31 gregl

  • 158 posts

Posted 19 June 2009 - 08:27 PM

After seeing the chart above I came up with some numbers that seem interesting enough to share. Consider this: in a perfectly competitive league every team would have a 50% chance of winning any given game (call this their theoretical or potential win%). A team, and therefore that league, can be replicated with a coin toss/binomial model. I worked through a pretty simple analysis looking at this question using a normal approximation for a binomial distribution. This is a fair approximation when n is large, which in football it really isn't, but it's close enough to get a glimpse of something meaningful. Using that approximation, after 16 games a team's expected range of win percentage would be .125 to .875 using a plus/minus 3 standard deviation range. 3 standard deviations include ~99.7% of outcomes, so most observed results should fall within the range. After 82 games the range narrows to .334 to .666. After 162 games it's .382 to .618. Pretty much mirrors the chart posted above by mjswarner and ties nicely to the premise of baseball being bounded by .400/.600.

Now for a league with less-than-perfect competition, to establish the outer limits. For a "good" team you might assume a 60% expected win probability. There's also a bad team out there with a 40% expected win percentage. Using the 3-sigma upper bound win% for the good team and the 3-sigma lower bound win% for the bad team, the ranges for an unbalanced league look like this:

After 16 games: .033 to .967 (.5 wins to 15.5 wins). NFL actual historical range is 0 to 16 wins.
After 82 games: .238 to .762 (19 wins to 62 wins). NBA actual range is 9 to 72 wins (72-73 Sixers and the 95-96 Bulls)
After 162 games: .285 to .715 (46 wins to 116 wins). Records from 162-game seasons: the 2003 Tigers had 43 wins while the 2001 Mariners had 116.

A caveat here is that the expected win totals of good teams versus bad teams is probably different in each sport. That's where the comments made throughout this thread come into play. The nature of the rotation in baseball, for example. These factors enable one team to have a higher or lower chance of success over a very large number of trials and therefore for any given individual contest. For basketball, for example, the actual historical range suggests something like a 70/30 or 75/25 win expectation for the good and bad teams, respectively. In baseball 60/40 seems about right.
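
The windows quoted above are easy to reproduce (a quick sketch of the same normal-approximation arithmetic, nothing beyond what's in the post):

import math

def window(p, n, sigmas=3):
    # Normal approximation to the binomial: win pct has mean p and
    # SD sqrt(p*(1-p)/n); +/- 3 SD covers ~99.7% of outcomes.
    sd = math.sqrt(p * (1 - p) / n)
    return p - sigmas * sd, p + sigmas * sd

for n in (16, 82, 162):
    lo, hi = window(0.5, n)
    print(f"coin-flip league, {n:3d} games: {lo:.3f} to {hi:.3f}")

for n in (16, 82, 162):
    lo = window(0.4, n)[0]   # 3-sigma floor for the .400 team
    hi = window(0.6, n)[1]   # 3-sigma ceiling for the .600 team
    print(f"60/40 league,     {n:3d} games: {lo:.3f} to {hi:.3f}")

The output reproduces the windows quoted above for both the coin-flip league and the 60/40 one.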

In other words, the .400-.600 phenomenon is just the tendency of life to trend toward normal distributions and a function of games played. If football played the same number of games, it's very likely their records would converge to approximately the same range (or at least something much, much smaller than we've seen in 16 games). I'm not trying to explain this the same way as the rest of you or even disagree with anyone. Just adding some context to the competitiveness/parity of each league and sport in a way that backs out the impact of the length of each league's season.

If you think a league allows for a baseball team to be a better than 60% favorite, you'll have to ask yourself why, out of pure chance alone, a 65% team hasn't come along and won 120 games. Or at the very least you'll have to be willing to expect that at some point.

#32 Bellhorn


  • Lumiere


  • 2000 posts

Posted 24 June 2009 - 09:57 AM

QUOTE (gregl @ Jun 19 2009, 09:27 PM)
After seeing the chart above I came up with some numbers that seem interesting enough to share. Consider this: in a perfectly competitive league every team would have a 50% chance of winning any given game (call this their theoretical or potential win%). A team, and therefore that league, can be replicated with a coin toss/binomial model. I worked through a pretty simple analysis looking at this question using a normal approximation for a binomial distribution. This is a fair approximation when n is large, which in football it really isn't, but it's close enough to get a glimpse of something meaningful. Using that approximation, after 16 games a team's expected range of win percentage would be .125 to .875 using a plus/minus 3 standard deviation range. 3 standard deviations include ~99.7% of outcomes, so most observed results should fall within the range. After 82 games the range narrows to .334 to .666. After 162 games it's .382 to .618. Pretty much mirrors the chart posted above by mjswarner and ties nicely to the premise of baseball being bounded by .400/.600.


Very nicely done. Another way of stating this is that observed variance = true variance + noise; the noise component from a 162-game sample will necessarily tend to be smaller than from a 16-game sample, so even if true variance were equal in baseball and football, observed variance would be smaller in baseball.

I have to wonder, though, doesn't the .118 differential in either direction from the binomial distribution seem a little high, even at the 3-sigma level? This implies that a true .500 team should win 100 games (or lose 100 games) 0.3 percent of the time - of course, I can't prove that this doesn't correspond to baseball reality, but it certainly doesn't feel right. It would imply either a 19-game delta between 3rd-order winning percentage and actual (which I'm pretty sure has never been seen over a full season*) or a substantial difference between 3rd-order winning percentage and "true" winning percentage - I'm at a loss as to what could generate such a discrepancy.

The issue may be that the aggregate of a team's true win probability over the course of the season is in fact an average of a quantity that fluctuates quite a bit from day to day. A team that has, on average, a 50% chance of winning a game over the course of the season doesn't necessarily have a 50% chance of winning each individual game - they may be a 70% favorite when they have a favorable pitching match-up against a poor team, a 70% underdog when the roles are reversed. The more the total deviation from the average value of 50% over the course of the season, the greater the reduction in variance obtained from the binomial model. This is true even if you start from a baseline other than 50%: a team with an average 65% win probability has its variance decreased more by its 80% games than it is increased by its 50% games, for example.
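
A small simulation makes the size of this effect concrete (the .70/.30 split is an illustrative extreme, not an estimate):

import random
import statistics

def season(p_schedule):
    # One simulated season: a win whenever the uniform draw lands under p.
    return sum(random.random() < p for p in p_schedule)

N = 20_000
flat = [season([0.50] * 162) for _ in range(N)]          # constant .500
swingy = [season([0.70, 0.30] * 81) for _ in range(N)]   # averages .500

print("SD of season wins, constant p:   ", round(statistics.stdev(flat), 2))
print("SD of season wins, fluctuating p:", round(statistics.stdev(swingy), 2))

The constant-p team shows the binomial SD of about 6.4 wins; the team whose true probability swings between .70 and .30 comes in around 5.8, because each game's variance p(1-p) is largest at p = .5.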

So two points arise from this:

1) the absence of a 120-game winner does not necessarily preclude the existence of true .650 teams in baseball as something other than a complete anomaly - though of course I agree that they are more rare than would be concluded by simply counting up the number of 105-win teams

2) It might, in principle, be possible to reverse-engineer the amount of fluctuation in day-to-day win probability by taking the observed variance between actual wpct and 3rd-order (if we accept this as a reasonable proxy for "true" wpct) and comparing it to the amount predicted from a binomial calculation based on the team's 3rd-order wpct.

* I know that BP tends to underestimate the opponent adjustment component of this, but I don't think this is enough to get the delta anywhere close to 19 games.

#33 John DiFool

  • 1096 posts

Posted 24 June 2009 - 11:21 AM

Inspired by an article by Bill James in the 1990 Baseball Abstract (yeah, the one published on a shoestring after Bill passed the wand; that didn't mean he couldn't write a few things), I basically replicated his simulation of baseball seasons via a computer program similar to the one he used. Some of the things he (and I) discovered:

Teams which reach the postseason tend to be c. 5 games better than their true quality*. Just about every year you have a team which wins the division, league, or even the WS even though it should have had no business doing so (think D-Backs '07); likewise almost every year a good team, even the best in the league, doesn't make the postseason (Sox '02).

If you attempt to match all teams' true quality to the actual (typical) winning percentages seen in real baseball, you end up with teams winning 110+ games a year and losing 110+ as well most every year (add in some luck on top of a .650 true quality team and this easily happens). Thus to properly model a real baseball season you must reduce the standard deviation of true qualities below what we actually see.

If you set everybody to .500 TQ, you still get some teams winning 90 and some losing 90 (and yes on rare occasions even 100).
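
A coin-flip sketch of that claim (it ignores the constraint that a real league's wins must sum to .500 overall because teams play each other, which a fuller simulation would enforce):

import random

random.seed(1)
seasons = 1_000
best = []
for _ in range(seasons):
    # 30 teams, each a true .500 team playing 162 independent coin flips
    league = [sum(random.random() < 0.5 for _ in range(162)) for _ in range(30)]
    best.append(max(league))

print("average best record in an all-.500 league:",
      round(sum(best) / seasons), "wins")
print("share of seasons with a 90+ win team:",
      sum(b >= 90 for b in best) / seasons)

With 30 teams drawing from a distribution whose SD is about 6.4 wins, the luckiest team typically lands in the low 90s, so a 90-win "paper .500" team is the norm rather than the exception.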

Talent starts to outweigh pure luck around the 100th game of the season - after that it takes hold. Not an absolute postulate by any means, but with the low SD we see in baseball you do need the longer season to sort out the good teams from the bad (and even then, as I said, it often doesn't work out that way).

I've been itching to do this again actually (isn't there something called WinBasic?).


[* "True quality" is the winning percentage that the simulation used in its calculations.]

Edited by John DiFool, 24 June 2009 - 11:22 AM.


#34 Bellhorn


  • Lumiere


  • 2000 posts

Posted 24 June 2009 - 02:21 PM

QUOTE (John DiFool @ Jun 24 2009, 12:21 PM)
If you set everybody to .500 TQ, you still get some teams winning 90 and some losing 90 (and yes on rare occasions even 100).

I know that this happens if you assume that the season can be modeled using IID weighted coin flips. But does this really correspond to reality? Even if you set the win probability at .650 or .350, 3 SD is still over 18 games, so the large variance does not depend on a team being close to .500. So basically, every five seasons a team should overplay or underplay its true win probability by 18 games. Is there really a candidate for such a team at any point in recent baseball history? As I recall, it seems that by any sensible system of measurement, the extreme outliers tend to be closer to 12-14 games.

The point is that I think the estimates of variance derived from the binomial model need to be reduced somewhat (not a lot, but some) in order to account for the fact that the individual win probabilities are not IID - something which is obvious when you consider that teams do not face the same caliber of opponent or have the same quality of pitching match-up from day to day.

Edited by Bellhorn, 24 June 2009 - 02:25 PM.


#35 John DiFool

  • 1096 posts

Posted 25 June 2009 - 01:32 PM

James' point on the "coin flip" thing was that, in the simulation, we KNOW how good each team actually is, and he saw that as a strength of his simulations. Sure, you could give the Sox a "true quality" of .700 when Beckett pitches and .500 when Dice-K '09 pitches, but by simplifying things you actually get closer to the heart of the matter. I guess you could model individual pitchers in that way; not sure if it will substantially change any conclusions you draw.

I'm not entirely sure what you mean by "3 SD is still over 18 games." Please elaborate.


#36 gregl

  • 158 posts

Posted 26 June 2009 - 10:01 AM

QUOTE (John DiFool @ Jun 25 2009, 02:32 PM)
I'm not entirely sure what you mean by "3 SD is still over 18 games." Please elaborate.

At a .500 expected win percentage the team expects to win 81 games, plus or minus some standard deviation. One standard deviation is about 6 games for expected win percentages anywhere in the .350-.650 range, so 3 SDs is about 18 games. The significance is that teams should over- or underperform their "true" quality by 18 games with some degree of frequency, which happens to be about .3%.

QUOTE (Bellhorn @ Jun 25 2009, 10:57 AM)
This implies that a true .500 team should win 100 games (or lose 100 games) 0.3 percent of the time - of course, I can't prove that this doesn't correspond to baseball reality, but it certainly doesn't feel right.

I had the same reaction, but it's not entirely unreasonable. The figure is actually .26% for the two-tailed distribution, so .13% on each end. That means that in one of every 770 seasons played (with one year representing 30 seasons played) a true .500 team should win 99 games through what could be considered sustained good luck, karma, etc. The same would be true on the other end.

QUOTE (Bellhorn @ Jun 25 2009, 10:57 AM)
1) the absence of a 120-game winner does not necessarily preclude the existence of true .650 teams in baseball as something other than a complete anomaly - though of course I agree that they are more rare than would be concluded by simply counting up the number of 105-win teams

True. The analysis concludes - most likely incorrectly - that .650 is impossible. This is largely because of a conditional probability component which is being ignored for simplicity. That is, if this model holds up, the probability of a 120-game winner as a 3SD outlier from a .650 "true" quality team is not simply .13% (which is large enough to think that it should have happened already). It's actually .13% given the existence of a true .650 team, which presumably is also a relatively rare creature, making the overall probability of 120 wins extremely low.

#37 Bellhorn


  • Lumiere


  • 2000 posts

Posted 26 June 2009 - 10:07 AM

QUOTE (John DiFool @ Jun 25 2009, 02:32 PM)
James' point on the "coin flip" thing was that, in the simulation, we KNOW how good each team actually is, and he saw that as a strength of his simulations. Sure you could give the Sox a "true quality" of .700 when Beckett pitches, and .500 when Dice-K '09 pitches. but by simplifying things you actually get closer to the heart of the matter. I guess you could model individual pitchers in that way, not sure if it will substantially change any conclusions you draw.

Well, that's my point - I think it will give a more realistic SD for team season wins. It's not just pitching match-ups, it's strength of opponent (I'm not familiar with this James study, but I assume he at least used his "log5" method in the simulations to give the probability that a team with TQ x would beat a team with TQ y, right?) home-field advantage, and perhaps, if the data indicate that an adjustment is still necessary, even team "hotness" or "coldness" at various points over the course of a season. I agree that it's important to know what a team's TQ is in the sense of an average value of individual win probability (and to make sure that any deviations from this in the model balance out on both sides), but I can't see why this average should be imposed on every single trial in the simulation - it doesn't make sense intuitively, and the results it generates don't make much sense either, as far as I can see.
QUOTE
I'm not entirely sure what you mean by "3 SD is still over 18 games." Please elaborate.

If you take a team whose TQ is .500, and you hold this win probability constant for every game on the schedule, then their number of wins on the season is a binomial RV with mean 81 (.5 x 162) and variance 40.5 (.5 x .5 x 162). So the SD is approximately 6.36. As gregl pointed out in his post, you expect that the RV will fall within 3 SD about 99.7% of the time, so for teams whose TQ is set at .500, 0.3 percent of them should be 19 or more games away from .500. My contention is that this doesn't pass the smell test when it comes to a comparison with actual baseball history.

The point of my second post is that it's not the TQ = .500 assumption that is responsible for the large SD. Set TQ = .650, hold it constant for every game on the schedule, and you get a mean for season wins of 105, and a variance of 36.86 (162 x .65 x .35). Here, the SD is still over 6 (6.07), so 0.3 percent of TQ = .650 teams should still be 18 games above or below this. Since .650 is something of an outer limit for TQ, and the same principle works as you go away from .500 in the other direction, 6.07 can be taken as a lower bound for a team's SD in season wins, if you hold their probability constant for each game on the schedule. So I think it is this assumption that needs to be relaxed if we are to model baseball reality more closely.

If I'm wrong about a SD of 6 being too high, it's probably because there's a significant difference between TQ and BP's W3 that I'm not seeing. (W3, of course, normalizes the conversion of events into runs and of runs into wins, and adjusts for strength of schedule.) Is deviation in player performance another factor that needs to be considered? Is a TQ = .500 team one whose players do achieve league-average results on a per-event basis, or one whose players were projected to do so? It seems to me that the former makes considerably more sense, but if a case can be made for the latter, this somewhat circumvents my objection to the whole thing.

#38 Bellhorn


  • Lumiere


  • 2000 posts

Posted 26 June 2009 - 10:18 AM

QUOTE (gregl @ Jun 26 2009, 11:01 AM)
I had the same reaction, but it's not entirely unreasonable. The figure is actually .26% for the two-tailed distribution, so .13% on each end. That means that in one of every 770 seasons played (with one year representing 30 seasons played) a true .500 team should win 99 games through what could be considered sustained good luck, karma, etc. The same would be true on the other end.

I get this, I'm just having trouble reconciling this magnitude of good luck/karma with what is measured by BP's W3. Leaving aside the possibility that there really has been an 18-game outlier at some point in baseball history, with SD = 6, we should see almost 5% of teams over/underperforming by 12. Unless I'm much mistaken, this is not the case. So is there something else, other than event-run efficiency or run-win efficiency that could play into the TQ/actual win gap? Good luck/karma in terms of the events themselves, perhaps? But then, wouldn't that already be factored into TQ to begin with?


#39 Alternate34

  • 2461 posts

Posted 26 June 2009 - 10:24 AM

.13% of teams winning 99 games on actual 81-win quality sounds about right to me. That is 13 out of 10,000 team seasons. I don't think there have been 10,000 team seasons yet in baseball. There have been 108 seasons of AL/NL baseball since 1901, for about 2,296 team seasons (not including this season). If there is a .13% chance of a 99-win team actually being an 81-win team, then it would have happened about 3 times (a little under that).

I don't know what the math is for a lower number of games per season. With 162-game seasons having been played for 48 years, that's 1,248 162-game team seasons. You can subtract the strike years yourself if you want to be more accurate. That works out to about 1-2 such overachieving seasons during the 162-game era.
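Quick sanity check on those expected counts - a throwaway sketch using the totals above (the .0013 one-tail figure is gregl's):

CODE
P_ONE_TAIL = 0.0013    # chance a true-.500 team finishes 3+ SD above expectation
ALL_SEASONS = 2226     # AL/NL team seasons since 1901, as counted above
SEASONS_162 = 1248     # 162-game team seasons, 1961 onward

print(P_ONE_TAIL * ALL_SEASONS)   # ~2.9 such seasons across all of AL/NL history
print(P_ONE_TAIL * SEASONS_162)   # ~1.6 in the 162-game era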

Also, how many of those 2,226 team seasons are actually .500 quality teams? Trying to find the 3 standard deviation teams is nearly impossible considering the variables involved.

Edited by Alternate34, 26 June 2009 - 11:27 AM.


#40 gregl

  • 158 posts

Posted 26 June 2009 - 11:14 AM

QUOTE (Bellhorn @ Jun 26 2009, 11:18 AM) <{POST_SNAPBACK}>
I get this, I'm just having trouble reconciling this magnitude of good luck/karma with what is measured by BP's W3. Leaving aside the possibility that there really has been an 18-game outlier at some point in baseball history, with SD = 6, we should see almost 5% of teams over/underperforming by 12. Unless I'm much mistaken, this is not the case. So is there something else, other than event-run efficiency or run-win efficiency that could play into the TQ/actual win gap? Good luck/karma in terms of the events themselves, perhaps? But then, wouldn't that already be factored into TQ to begin with?


One factor is how you think about something like injuries. At the beginning of the season a team is expected to win 81 games, has a lot of injuries and wins 69. Did they underperform by two standard deviations or do we have to say that given the injuries they are really a 69 win team and therefore performed the way they should? In other words, are injuries bad luck or a fundamental change to the team and therefore to our expectations for their performance?

The model I have in mind measures deviations between the quality of the players on the roster and the outcome at the end of the season. If a couple of guys get injured, other models implicitly take that into account and lower expectations for the team, and therefore show a smaller deviation from what would be expected. For example, some great pitcher gets injured and the team ends up giving up more runs, so they lose more games - but not more than expected given the fact that the pitcher is injured. Expectations in the model I put up don't take the pitcher's injury as a given, and therefore don't lower expectations for the team, showing a larger deviation.

Put differently, one model measures differences between the quality of the team's roster and the actual outcome over the length of the season. The other model measures differences between the actual outcome and the expectations for the team given the guys on the roster that actually make it onto the field that day.

#41 John DiFool

  • 1096 posts

Posted 26 June 2009 - 11:58 AM

QUOTE (Bellhorn @ Jun 26 2009, 11:18 AM) <{POST_SNAPBACK}>
I get this, I'm just having trouble reconciling this magnitude of good luck/karma with what is measured by BP's W3. Leaving aside the possibility that there really has been an 18-game outlier at some point in baseball history, with SD = 6, we should see almost 5% of teams over/underperforming by 12. Unless I'm much mistaken, this is not the case. So is there something else, other than event-run efficiency or run-win efficiency that could play into the TQ/actual win gap? Good luck/karma in terms of the events themselves, perhaps? But then, wouldn't that already be factored into TQ to begin with?


I think ~12+ game swings happen most or all of the time, the D-Backs of '07 and LAA of '08 being only the most notorious recent examples. The Angels last year turned a meager advantage in OPS (.743 to .729) into 100 wins. Heck, even this year the Rays are -8.2 in 74 games, the Nats -10, the Giants +7.2 (W3 from BBPro); I'd be perfectly willing to bet that at least one of these teams is +/- 12 by October. Long before BBPro became the big thing I was doing this same general kind of stuff on Usenet, and noticed the same things.

5% of 30 is 1.5 teams a season. (I think the 5% figure is too high, but not that far off the mark; 2-3% is my guess, or about 1 team a season more or less.) It does happen. In many such cases it may be disguised to our eyes (even us saberheads) to one extent or another.

There are (at least) 3 ways in which luck can manifest:

1. Good/poor performance in arrangements of runs scored vs. runs allowed (win all the close ones and lose the blowouts, for ex.). Covered with Pythag & other related methods.

2. Good/poor performance in arrangements of run-scoring elements (stranding tons of runners vs. driving most of them in, home runs with nobody on vs. many people on, etc.). "Clutch" hitting (or choking), IOW. Covered by BBPro's W2 et al.

3. Good/poor performance with respect to ability vs. results (liners right at people, bloops dropping in, long blasts held up by the wind vs. pop flies blown into Wrigley's bleachers).

That last one is the trickiest to tease out, and is only partly accounted for by park effects and even PECOTA-type projections. If a hitter has a true ability to hit .330/.400/.500 in a neutral park, and instead only hits .310/.370/.450 because his liners find gloves and his fly balls get hit to the deepest parts of the park, well, how do you know how good he really is? Put that on top of the effects from 1 & 2, along with some probably even "deeper" fluke effects not already covered (manager biases and whatnot, and, in recent years, schedule differences), and it is not at all outside the realm of possibility that you occasionally get teams which are +/- 18 games over a full season.

Feel free to craft your own simulation and see what happens - make it as detailed as you like. I suspect it would take a lot of deliberate "fudging" to tamp down all the extreme outliers. It actually would be a nice weekend challenge to find the biggest outliers in baseball history - we can provisionally compare OPS for a quick and dirty study (OPS of course leaves a lot of stuff out and distorts the stuff that's "in", but it's a start).
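In that spirit, here's one crude starting point - a Monte Carlo sketch where every team is a true .500 club and the only luck layer is an independent per-game jitter in win probability (the 0.06 jitter SD is a number I made up purely for illustration):

CODE
import random

SEASONS = 20_000
GAMES = 162
MATCHUP_SD = 0.06   # per-game jitter in win probability; a made-up number

big12 = big18 = 0
for _ in range(SEASONS):
    wins = 0
    for _ in range(GAMES):
        # a true-.500 team whose single-game chances wobble with the matchup
        p = min(max(random.gauss(0.500, MATCHUP_SD), 0.0), 1.0)
        wins += random.random() < p
    big12 += abs(wins - 81) >= 12
    big18 += abs(wins - 81) >= 18

print(f"12+ game outliers: {big12 / SEASONS:.2%}")
print(f"18+ game outliers: {big18 / SEASONS:.2%}")

One caveat worth noting: independent per-game jitter like this actually shrinks the spread a touch rather than widening it, so to fatten the outlier tails you would need correlated effects - injury stretches, hot and cold months - which is where the discussion below ends up going.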

#42 Bellhorn


  • Lumiere


  • 2000 posts

Posted 26 June 2009 - 12:46 PM

QUOTE (Alternate34 @ Jun 26 2009, 11:24 AM) <{POST_SNAPBACK}>
A .13% chance of a 99-win season from a team of true 81-win quality sounds about right to me. That is 13 out of 10,000 team seasons, and I don't think there have been 10,000 team seasons yet in baseball. There have been 108 seasons of AL/NL baseball since 1901, which works out to 2,226 team seasons (actually 2,296 not including this season). If there is a .13% chance of a 99-win season from what is actually an 81-win team, then it should have happened about 3 times (a little under that).

When you consider deviation in both directions (i.e. a team being 3 SD above/below TQ) you find that it should happen about once every thirteen seasons. (I don't know where on earth I got five from in my previous post - my bad on that one.) You should also get teams being above/below by 17, 16, 15 etc. games with increasing frequency, which reaches the neighborhood of 5% (i.e. 1.5 teams per season, as John DiFool notes) at around 12. gregl's idea concerning injuries is definitely intriguing - I'm going to want to think about that a bit more before responding.
QUOTE
Also, how many of those 2,226 team seasons are actually .500 quality teams?

That's the point, though - this doesn't matter. 3 SD is highest (19) for .500 teams, but is still over 18 for teams of any realistic level of quality.
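To put exact numbers on the "increasing frequency" point, a quick sketch for a true-.500 team (per-season rates assume a 30-team league; note the exact binomial tails run a shade fatter than the normal-approximation figures used above):

CODE
from math import comb

N = 162
pmf = [comb(N, k) * 0.5 ** N for k in range(N + 1)]   # exact Binomial(162, .5)

for d in range(12, 20):
    p_two_tail = sum(pmf[k] for k in range(N + 1) if abs(k - 81) >= d)
    print(f"+/-{d} wins: {p_two_tail:.4f} of teams, "
          f"~{30 * p_two_tail:.2f} per 30-team season")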

#43 Alternate34

  • 2461 posts

Posted 26 June 2009 - 12:57 PM

QUOTE (Bellhorn @ Jun 26 2009, 12:46 PM) <{POST_SNAPBACK}>
When you consider deviation in both directions (i.e. a team being 3 SD above/below TQ) you find that it should happen about once every thirteen seasons. (I don't know where on earth I got five from in my previous post - my bad on that one.) You should also get teams being above/below by 17, 16, 15 etc. games with increasing frequency, which reaches the neighborhood of 5% (i.e. 1.5 teams per season, as John DiFool notes) at around 12. gregl's idea concerning injuries is definitely intriguing - I'm going to want to think about that a bit more before responding.

That's the point, though - this doesn't matter. 3 SD is highest (19) for .500 teams, but is still over 18 for teams of any realistic level of quality.


I want to understand this a little better. A team would be 3 SD above the norm in luck .15% of the time (it's really .13%? whatever, the math is easier with .15), from what I understand. For a .500 team, a 3-SD-lucky season would mean 99 wins. That would mean .15 out of every 100 teams in a sample would be at 3 SD above the norm for luck. Or 1.5 out of 1,000. Or 1 out of 667. Wouldn't that be more like 1 out of every 22 seasons you would see this in a 30-team league? Where is my brain fudging, if anywhere?

Edited by Alternate34, 26 June 2009 - 12:57 PM.


#44 gregl

  • 158 posts

Posted 26 June 2009 - 01:54 PM

QUOTE (Alternate34 @ Jun 26 2009, 01:57 PM) <{POST_SNAPBACK}>
I want to understand this a little better. A team would be 3 SD above the norm in luck .15% of the time (it's really .13%? whatever, the math is easier with .15), from what I understand. For a .500 team, a 3-SD-lucky season would mean 99 wins. That would mean .15 out of every 100 teams in a sample would be at 3 SD above the norm for luck. Or 1.5 out of 1,000. Or 1 out of 667. Wouldn't that be more like 1 out of every 22 seasons you would see this in a 30-team league? Where is my brain fudging, if anywhere?


Both are correct - just different ways of looking at it, plus some rounding error. Your math says once every 22 seasons a team should finish with at least 18 wins more than expected. Bellhorn's number says once every 13 seasons a team should be at least 18 games away from expected, and this could be either 18 more or 18 fewer than expected. It's a symmetrical distribution, which is why his period is about half of yours; the rest of the difference is attributable to the .0002 rounding. Your math at .0013 would show 1 in 26 instead of 1 in 22.

#45 gregl

  • 158 posts

Posted 26 June 2009 - 03:37 PM

QUOTE (Bellhorn @ Jun 26 2009, 01:46 PM) <{POST_SNAPBACK}>
gregl's idea concerning injuries is definitely intriguing - I'm going to want to think about that a bit more before responding.

One more thought on pitching matchups to add while you're thinking. I believe these could also be considered luck, but they are not factored that way by BP (although I could stand to update my understanding of the math behind that model, so I could be missing something).

Consider again the perfectly competitive league. Every team is exactly equal and should win 81 games, but every team also has above-average and below-average pitchers. If one team faces a disproportionate number of 1/2/3 pitchers and another faces 3/4/5 pitchers, the first team should score fewer runs and win fewer games. That will look normal in the BP world, won't it? They'll just appear to be a bad team that doesn't score a lot. But given the construct of this hypothetical world, we know this isn't the case. We know the two teams are of equal quality. The difference in outcomes therefore has to be considered some form of chance outcome, and the deviation from 81 wins isn't a movement in the quality of the team but rather a movement along the curve due to randomness.

If my understanding of BP is adequate this should be one more reason that the model I put forth earlier shows a wider distribution than what I think you have in mind given BP historical data.

#46 Bellhorn


  • Lumiere


  • 2000 posts

Posted 26 June 2009 - 09:22 PM

QUOTE (gregl @ Jun 26 2009, 12:14 PM) <{POST_SNAPBACK}>
Put differently, one model measures differences between the quality of the team's roster and the actual outcome over the length of the season. The other model measures differences between the actual outcome and the expectations for the team given the guys on the roster that actually make it onto the field that day.

It seems to me that the particular example of injuries doesn't fit into the binomial approach, because injuries get in the way of the independence of the trials. This is actually similar to the point I was making regarding pitching match-ups. Setting a team's TQ at .500 (prospectively or retrospectively) is the same as saying that a team is as likely to be a > 50% winner on a given day due to a superior match-up as they are to be a < 50% winner due to an inferior match-up. The degree to which the probability fluctuates on either side of 50% doesn't affect the mean, but it does affect the variance. Same with injuries: suppose that a projected .500 TQ team has no luck impact other than injuries all through the season, but has perfect team health in the first half, going 50-31, and a run of injuries in the second half, going 31-50. Clearly the variance in this team's wins (if you assume the season is played in multiple universes) is different from that of another .500 team that has a precisely average injury impact throughout the season. So you can certainly say that injury impact adds to the variance that will be observed when actual results are compared with projected TQ. But I think that if you insist on using a TQ that doesn't change with injury information, the binomial approach ceases to be a valid method of computing variance.
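The variance half of this is easy to verify directly: with independent games, the variance of total wins is the sum of p*(1-p) over the schedule, and that sum is maximized when every p is .500. A small sketch, treating the two halves of the injury example as p = 50/81 and p = 31/81 (toy numbers, obviously):

CODE
def season_sd(probs):
    # SD of total wins when game i is an independent coin flip won with probability probs[i]
    return sum(p * (1 - p) for p in probs) ** 0.5

flat = [0.5] * 162                       # constant .500 all season
split = [50 / 81] * 81 + [31 / 81] * 81  # healthy first half, hurt second half; same mean of 81

print(season_sd(flat))    # ~6.36
print(season_sd(split))   # ~6.19 - same mean, strictly smaller SD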

QUOTE (gregl @ Jun 26 2009, 04:37 PM) <{POST_SNAPBACK}>
Consider again the perfectly competitive league. Every team is exactly equal and should win 81 games, but every team also has above-average and below-average pitchers. If one team faces a disproportionate number of 1/2/3 pitchers and another faces 3/4/5 pitchers, the first team should score fewer runs and win fewer games. That will look normal in the BP world, won't it? They'll just appear to be a bad team that doesn't score a lot. But given the construct of this hypothetical world, we know this isn't the case. We know the two teams are of equal quality. The difference in outcomes therefore has to be considered some form of chance outcome, and the deviation from 81 wins isn't a movement in the quality of the team but rather a movement along the curve due to randomness.

You're correct in your understanding of BP's adjusted standings - the third adjustment is for general quality of opponent, but not for quality of pitching staff faced within each opponent. So yes, something that we could identify as luck if we had knowledge of perfect competition would show up as difference in quality on BP's page. And there are other types of player-based luck that wouldn't interfere with the independence of the trials, as in the previous paragraph. So I will concede one general point here: that it makes sense for there to be a wider distribution of results due to luck (as understood by some omniscient being) than is seen in BP's adjusted standings. But I guess my main remaining objection is derived from the parenthetical: in the absence of such omniscience regarding the separation of luck and skill/performance, a method of basing TQ on a projection of game events risks confusing the one with the other. So I would think it far more productive to base estimates of TQ retrospectively on objective measurements of game events (though, as suggested by John DiFool, this could be extended to include batted ball types). Of course, you're free to differ on that - and if you base your model on an alternate premise involving projections, I can't object to a SD of six games due to luck, by this definition. I continue to maintain that 1) 6 games is too high for a SD that only defines luck in run or win efficiency terms, 2) it should be intuitively apparent that win probabilities are not in fact identically distributed throughout the games of a season, and 3) this fact means that calculations of variance based on a binomial distribution need to be adjusted downward. As long as you agree on those, we have no substantial disagreement.

#47 Bellhorn


  • Lumiere


  • 2000 posts

Posted 27 June 2009 - 09:50 AM

Here's another way of putting it:

the binomial approach suggests that for some notion of a team's underlying quality (TQ) we can assume that this value holds as the probability of a win for each game of the season, and then see that we expect the standard deviation of actual wins to be somewhere between 6.07 and 6.36, depending on the value of TQ. This is a powerful and potentially useful insight. But leaving aside the question of whether or not it makes sense to assume that TQ is the probability of winning each game on the schedule without making a substantial adjustment, let's deal with a more basic question: what does TQ actually correspond to? It seems to me that there are several candidates:

TQ1: the quality of the team based on a pre-season projection of playing time, assuming average luck due to injuries.

No doubt the deviation between actual wins and the expected win total based on this estimate can be fairly large. But the binomial approach doesn't work here, so we would have to resort to other methods in order to quantify it.

TQ2: the average projected quality of results based on players who actually are on the 25 man roster, or who are actually in the lineup on a given day.

Now the binomial approach begins to become coherent. But we run into difficulties when it comes to being sure that this measures what we want it to: is variance due to quality of play properly chalked up entirely to bad luck, or can it also reflect on the quality of the projection itself? e.g. Team A is projected to hit 20% LD, to hit 40% FB with a certain run value per FB, etc. etc. If they end up hitting only 15% LD, and have a lower run value per FB than projected, that is consistent with bad luck, but it's also consistent with an inaccuracy in projection, or simply with play that is of a lower quality than expected. How do we separate these possibilities?

TQ3: normalized win pct based on run values of batted ball types on both sides of the ball.

Seems to me to be the most promising candidate. This is an objective record of what a team actually did on the field, and any deviation in results seems to fit the criterion of being IID from day to day.

TQ4: normalized win pct based on PA outcomes and other run-creation events. (i.e. BP's W2 or W3, depending on whether opponent adjustment is added)

Obviously fits the binomial approach, but the data doesn't seem to support an SD > 6.

So it seems to me that it's either TQ2 or TQ3 for which we should expect to see a SD of 6 games. It has to be one or the other... which is it?

#48 gregl

  • 158 posts

Posted 27 June 2009 - 05:23 PM

QUOTE (Bellhorn @ Jun 26 2009, 10:22 PM) <{POST_SNAPBACK}>
As long as you agree on those, we have no substantial disagreement.

Generally yes, although a few caveats and further thoughts.

QUOTE (Bellhorn @ Jun 26 2009, 10:22 PM) <{POST_SNAPBACK}>
The degree to which the probability fluctuates on either side of 50% doesn't affect the mean, but it does affect the variance.

I think you're making an assumption here that I'm not. Sounds like you're saying this: take these two teams you propose and take the injury as a given, then re-run the season multiple times, and the distribution of their wins would be different. That is true* but what I'm saying is that the injury itself isn't a given, and that in each of those trials the injury distribution is a random variable itself. With that framework the two teams would have the same distribution of wins because both would end up with the same injury distribution. A tough difference to deal with would be injuries that are truly chance and clearly independently distributed (e.g., I twist my ankle stepping on the bag awkwardly) and those that might be predictable (e.g., I had surgery in the offseason and it's just a matter of time before I go to the DL). Teams with more broken-down players might be considered less good, so perhaps this type of injury could be factored into TQ, and as long as the distribution of DL time during the season is random there should be no bias. I'm thinking about the idea of the Yankees this season, for example: if we could re-run this season 1,000 times in a theoretical world, how persistent would A-Rod's games-played pattern be? If it would be the same in most cases then you're right. If his games-played total in those 1,000 trials follows a distribution, and this season is an unfavorable (for the Yankees) outlier, I think the binomial approach still holds.

* For a quick test to gauge the impact I ran 2,000 trials where a team has a xxx TQ for 81 games and a 1-xxx TQ for the next 81 games. At 200 TQ/800 TQ the standard deviation falls to about 5.00. At 300/700 it's about 5.7, and it converges from there to 6.4 for the season-long .500 TQ.
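In case anyone wants to replicate that footnote test, a quick version looks something like this (a sketch of the same idea, not my exact code):

CODE
import random

def season_sd(p_first, p_second, trials=2000, games=81):
    # empirical SD of season wins: 81 games at p_first, then 81 at p_second
    wins = []
    for _ in range(trials):
        w = sum(random.random() < p_first for _ in range(games))
        w += sum(random.random() < p_second for _ in range(games))
        wins.append(w)
    m = sum(wins) / trials
    return (sum((w - m) ** 2 for w in wins) / trials) ** 0.5

for p in (0.200, 0.300, 0.500):
    print(f"{p:.3f}/{1 - p:.3f} split: SD ~ {season_sd(p, 1 - p):.2f}")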

I also think we're now talking about two slightly different points. My attempt was to address the initial question about .400/.600 and see what we can infer about the level of competitiveness in baseball, and perhaps compare it to other sports.

To that end, today I took a quick look at some data from 1970 through 2008. There are 776 162-game team seasons in that period. The average win total is 81.3 and the standard deviation is 11. How might that result come about? If you run trials with two distributions you can get pretty close... one distribution establishes the teams: I use an average TQ of .500 and an SD of .050 to come up with 2,000 teams of varying TQ. With that assumption the best and worst come in at .314 and .686. The second distribution is binomial for each team and predicts a win total using a constant TQ for all 162 games. The result is a predicted distribution that looks a lot like the actual history, although with a slightly lower standard deviation (~10.3 versus 11) and a taller predicted peak at the .500 level (actual results show slight negative kurtosis). In-season trades of good players from bad teams to good teams could support that outcome.
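In sketch form, the two-distribution trial looks something like this (numpy for convenience; not my exact code, but the same parameters):

CODE
import numpy as np

rng = np.random.default_rng(0)
N_TEAMS, GAMES = 2000, 162

tq = rng.normal(0.500, 0.050, N_TEAMS).clip(0.0, 1.0)  # distribution 1: true team quality
wins = rng.binomial(GAMES, tq)                          # distribution 2: one season per team

print(tq.min(), tq.max())        # extremes of drawn TQ (my run gave .314 and .686)
print(wins.mean(), wins.std())   # ~81 and ~10.3, vs. 81.3 and 11 in the 1970-2008 data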

Now the problem... for these insights to be valuable as a predictive tool you would have to be omniscient and know the team's TQ, which is what you're pointing out. My analysis, I think, gets at this question: what is the range and distribution of TQ? Getting from this information to something you can use for one team, in season, to make predictions is another issue. Your last post raises some interesting points about "what TQ actually corresponds to" which I will spend some time considering.



#49 Bellhorn


  • Lumiere


  • 2000 posts

Posted 28 June 2009 - 02:03 PM

QUOTE (gregl @ Jun 27 2009, 06:23 PM) <{POST_SNAPBACK}>
* For a quick test to gauge the impact I ran 2,000 trials where a team has a xxx TQ for 81 games and a 1-xxx TQ for the next 81 games. At 200 TQ/800 TQ the standard deviation falls to about 5.00. At 300/700 it's about 5.7, and it converges from there to 6.4 for the season-long .500 TQ.

I also think we're now talking to two slightly different points. My attempt was to address the initial question about 400/600 and see what we can infer about the level of competitiveness in baseball and perhaps compare it to other sports.

Yes, that's true. To the extent that I was objecting to taking the binomial SD at face value, I'll admit that I was haggling over what I knew would be a fairly small difference in the grand scheme of things - and your test above confirms that it is small. As an approximation for the SD in wins in different sports based on the length of the season schedule, I am in complete agreement with the method and its conclusions.
QUOTE
To that end, today I took a quick look at some data from 1970 through 2008. There are 776 162 game seasons in that period. The average win total is 81.3 and the standard deviation is 11. How might that result come about? If you run trials with two distributions you can get pretty close... one distribution establishes the teams: I use an average TQ of .500 and an SD of .050 to come up with 2,000 teams of varying TQ. With that assumption the best and worst come in at .314 and .686. The second distribution is binomial for each team and predicts a win total using a constant TQ for all 162 games. The result is a predicted distribution that looks a lot like the actual history although a slightly lower standard deviation (~10.3 versus 11) and a taller predicted peak at the 500 level (actual results show slight negative kurtosis). In-season trades of good players from bad teams to good teams could support that outcome.

This might be another productive approach for examining the intuition of how this actually applies to baseball results. If we assume that there is no correlation between TQ and the delta variable (actual wins - 162 * TQ), we can use the fact that the variances of these two variables add up to the observed variance - 121, according to your data. If we approximate the binomial variance as 40, this leaves us with 81 resulting from differences in team quality, i.e. the SD in team quality, measured in wins over a full season, is about 9. Part of what bugs me about this is the thought that team quality beats the delta factor by such a comparatively small margin. But I can't see anything seriously wrong with what you did here (it doesn't make much difference, but shouldn't the SD of TQ be .055?) so I guess I'll just have to reconcile myself to the idea.
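Spelled out, the arithmetic is just (with the delta defined as a signed quantity, so that the variances really do add):

CODE
observed_var = 11 ** 2      # variance of actual win totals, from the 1970-2008 data
binomial_var = 40           # approximate luck variance from the binomial model
tq_var = observed_var - binomial_var

print(tq_var ** 0.5)        # ~9 wins: SD of team quality over a full season
print(tq_var ** 0.5 / 162)  # ~.0556 in win-pct terms, i.e. the .055 in question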

#50 Bellhorn


  • Lumiere


  • 2000 posts

Posted 28 June 2009 - 07:15 PM

QUOTE (Bellhorn @ Jun 28 2009, 03:03 PM) <{POST_SNAPBACK}>
(it doesn't make much difference, but shouldn't the SD of TQ be .055?).

To be clear on this: the .055 is what arises from the 9-game SD in TQ, which seems to be implied by the observed variance combined with the assumption of binomial variance in the delta factor. And on second thought, I wonder if it does mess up your first distribution enough that the results no longer fit the actual data as nicely (or I suppose this could actually help raise the SD from 10.3 to 11 - I really don't know).

Anyway, it seems that we can use this approach to settle the question. Basically, we're looking for some approximate factor <= 1 by which to weight the binomial variance, in order to account for the possibility that the non-identical distribution of win probabilities lowers this quantity in reality. The "null hypothesis" is that this factor (let's call it x) = 1, as in the test you already ran. This gives us a SD in TQ of about .055, which may or may not lead to a distribution that's a best fit for the data. But what happens if we repeat the process for other values of x? If we subtract, say, (0.95 * binomial variance) from the observed variance of 121, the variance in TQ will be slightly higher - using my approximation of the original binomial variance = 40, the SD in TQ is now .056. If you repeat the two distributions under these conditions, with TQ SD = .056 in the first distribution and a binomial distribution with variance compressed by the factor of 0.95 in the second, do the results match actual data better or worse than the original experiment?

I have no idea if it's worth the time or effort to do this - I'm hopeless with this stuff myself, so I can't judge what goes into it. But in principle this seems like a brute-force method for pinning down the right answer.
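For anyone more fluent than I am, the recipe I have in mind looks roughly like this - a hypothetical, untested sketch (it grids over x, rebuilds the two-distribution trial with the implied TQ spread, and shrinks the luck component by sqrt(x); since the total variance is pinned near 121 by construction, the fit against the real 1970-2008 histogram has to be judged on shape, not on SD):

CODE
import numpy as np

rng = np.random.default_rng(1)
GAMES, N_TEAMS, OBS_VAR, BINOM_VAR = 162, 200_000, 121.0, 40.5

def simulate(x):
    # season win totals when the binomial luck variance is compressed by factor x
    tq_sd = (OBS_VAR - x * BINOM_VAR) ** 0.5 / GAMES      # implied SD of TQ, in win pct
    tq = rng.normal(0.500, tq_sd, N_TEAMS).clip(0.0, 1.0)
    raw = rng.binomial(GAMES, tq).astype(float)
    # shrink each team's luck deviation by sqrt(x), leaving its expectation alone
    # (non-integer win totals are fine for shape statistics)
    return GAMES * tq + (raw - GAMES * tq) * x ** 0.5

for x in (1.00, 0.95, 0.90):
    w = simulate(x)
    m2 = ((w - w.mean()) ** 2).mean()
    m4 = ((w - w.mean()) ** 4).mean()
    print(f"x={x:.2f}: SD={m2 ** 0.5:.2f}, excess kurtosis={m4 / m2 ** 2 - 3:+.3f}")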

Another point to keep in mind is that the variance in TQ seems to get alarmingly large here - with an SD of .056, a sample of 2,000 should occasionally generate teams above .700 and below .300. So it's possible that there are some hidden assumptions concerning the TQ distribution that need to be addressed. Perhaps it falls off faster than a normal distribution at the outer edges, for some reason.