Live Nude Girls! (team offense consistency)


This topic has been archived. This means that you cannot reply to this topic.
28 replies to this topic

#1 Worst Trade Evah


  • SoSH Member


  • 10834 posts

Posted 09 September 2005 - 01:32 PM

Our offense this year has been fantastic. We lead the Yankees by a slender margin in EQA, in addition to the simple lead in runs scored.

But it's even better than it looks: one thing that's struck me is how consistent our offense has been. Later I'll try to do some breakdowns by hitter, but it occurred to me that one way to express consistency on a team level is simply by looking at the standard deviation of runs scored, or by expressing standard deviation as a percent of the average.

Anyway, here is a chart for all AL teams through 2005 of runs scored by run increment. 0 starts at the bottom, then 1, etc. A general downward push of the colors is a good thing. Note also the relative evenness of the sizes in the middle-upper range, esp. light blue down to dark purple. These are through the games of 9/8/05.
[Chart not preserved: games by runs scored, stacked by run increment, for each AL team]

Here's a summary of results:

Team  Games  Runs   Avg  stdev  %dev  Mode
ANA     138   634  4.59   3.08  0.67     3
BAL     139   625  4.50   2.84  0.63     3
BOS     138   794  5.75   3.27  0.57     6
CHA     138   649  4.70   2.93  0.62     2
CLE     138   663  4.80   3.19  0.66     4
DET     137   628  4.58   3.21  0.70     3
KCA     137   579  4.23   3.08  0.73     1
MIN     139   600  4.32   2.80  0.65     1
NYA     138   739  5.36   3.52  0.66     5
OAK     139   670  4.82   3.54  0.73     5
SEA     138   610  4.42   2.91  0.66     3
TBA     140   643  4.59   2.77  0.60     4
TEX     140   743  5.31   3.36  0.63     2
TOR     139   654  4.71   3.19  0.68     2

Normally there's a roughly linear relation between the number of runs scored and the standard deviation: teams that score more also vary more. But here the Sox have both the highest run average and the lowest standard deviation as a fraction of that average. (Amazing!) I think this is a measure of the consistency of team run production.

The mode helps illustrate that. Our mode is an amazing 6, which we've scored 22 times (!). We've also scored 7 runs 16 times, and 8 runs 12 times.

We lead the league in 6-run games (22), are tied with Minnesota for the most 7-run games (16), and trail only Texas in 8-run games (12 to their 13). We also lead in total games scoring 4-8 runs. The White Sox and Texas are relatively close there also.

Some questions:
-Are there attributes of offenses that make them more consistent?
-Is it a matter of collecting consistent hitters?
-What is consistency, and how do we look at it?
-Month by month breakdowns are accessible, but are they adequate?
-Do consistent teams perform better overall?

edit: is there a better way to do tables? does this even make sense? is stdev useful in this context?
edit2: feeble effort to drum up some interest.
edit3: tidied up writing

Edited by Worst Trade Evah, 10 September 2005 - 06:50 AM.


#2 smastroyin


  • simpering whimperer


  • 16569 posts

Posted 09 September 2005 - 01:48 PM

Beyond the consistency, I think the real key stat here is the number of times the Sox have scored 5+ runs, which is significantly more than anyone but the Yankees and Rangers.

#3 Worst Trade Evah


  • SoSH Member


  • 10834 posts

Posted 09 September 2005 - 02:04 PM

Here's some of the raw data, through 9/8/05

TEAM	ANA	BAL	BOS	CHA	CLE	DET	KCA	MIN	NYA	OAK	SEA	TBA	TEX	TOR
0	6	5	3	5	9	9	9	9	2	12	5	4	7	12
1	12	15	9	11	10	13	22	19	16	12	16	11	9	6
2	17	17	8	25	19	16	16	14	9	19	19	20	18	22
3	25	20	16	7	16	26	18	16	17	14	22	15	9	16
4	18	20	16	18	21	14	16	19	17	12	20	24	14	19
5	20	18	14	24	14	11	13	16	19	20	18	20	17	14
6	8	11	22	19	18	11	10	12	19	17	10	19	18	13
7	9	13	16	8	5	10	11	16	10	4	8	6	15	10
8	6	9	12	8	4	12	10	6	9	7	6	6	13	9
9	7	5	5	5	10	4	7	6	6	5	5	4	5	7
10	3	1	6	2	4	5	1	4	2	5	5	3	6	3
11	1	1	5	1	3	2	1	1	3	5	3	4	1	2
12	1	2	2	3	3	2	2	1	4	3	0	2	3	5
13	4	1	1	0	3	1	0	0	2	2	0	1	1	0
14	1	1	0	1	0	0	0	0	0	0	2	1	1	0
15	0	0	0	1	0	0	0	0	1	0	0	0	0	1
16	0	0	1	0	0	0	0	0	0	2	0	0	1	0
17	0	0	2	0	0	1	1	0	0	0	0	0	0	0
18	0	0	0	0	0	0	0	0	0	0	0	0	2	0
19	0	0	0	0	0	0	0	0	1	0	0	0	0	0
20	0	0	0	0	0	0	0	0	1	0	0	0	0	0
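
For anyone who wants to reproduce the summary table from these counts, here is a minimal Python sketch using the BOS column above. It assumes the stdev in the summary is the sample standard deviation (n - 1 denominator, which is what Excel's STDEV computes); the function and variable names are only illustrative.

```python
from math import sqrt

# Games in which Boston scored 0, 1, 2, ... 20 runs (BOS column above).
bos_counts = [3, 9, 8, 16, 16, 14, 22, 16, 12, 5, 6, 5, 2, 1, 0, 0, 1, 2, 0, 0, 0]

def summarize(counts):
    """Return (games, runs, avg, stdev, stdev/avg, mode) for a run-frequency list."""
    games = sum(counts)
    runs = sum(r * n for r, n in enumerate(counts))
    avg = runs / games
    # Sample variance (n - 1 denominator); an assumption about how stdev was computed.
    var = sum(n * (r - avg) ** 2 for r, n in enumerate(counts)) / (games - 1)
    stdev = sqrt(var)
    # Most frequent run total (first one wins if there is a tie).
    mode = max(range(len(counts)), key=lambda r: counts[r])
    return games, runs, avg, stdev, stdev / avg, mode

games, runs, avg, stdev, pct_dev, mode = summarize(bos_counts)
print(f"BOS {games} {runs} {avg:.2f} {stdev:.2f} {pct_dev:.2f} {mode}")
```

With these counts it prints `BOS 138 794 5.75 3.27 0.57 6`, matching the BOS row of the summary in the first post. Swapping in another team's column works the same way.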


*meh, how do you format this stuff? I'm copying and pasting from excel, and using the code tag.

Edited by Worst Trade Evah, 09 September 2005 - 02:05 PM.


#4 Worst Trade Evah


  • SoSH Member


  • 10834 posts

Posted 09 September 2005 - 06:23 PM

I guess this interests no one but me. I'll just go ahead and stick other things in here as they come.

If %stdev (stdev divided by average runs per game) is a team run consistency metric, then the rankings would be:

BOS 0.57
TBA 0.60
CHA 0.62
BAL 0.63
TEX 0.63
MIN 0.65
NYA 0.66
SEA 0.66
CLE 0.66
ANA 0.67
TOR 0.68
DET 0.70
KCA 0.73
OAK 0.73

Edited by Worst Trade Evah, 09 September 2005 - 06:24 PM.


#5 joyofsox


  • empty, bleak


  • 6338 posts

Posted 09 September 2005 - 10:31 PM

  ANA BAL BOS CHA CLE DET KCA MIN NYA OAK SEA TBA TEX TOR

0   6   5   3   5   9   9   9   9   2  12   5   4   7  12

1  12  15   9  11  10  13  22  19  16  12  16  11   9   6

2  17  17   8  25  19  16  16  14   9  19  19  20  18  22

3  25  20  16   7  16  26  18  16  17  14  22  15   9  16

4  18  20  16  18  21  14  16  19  17  12  20  24  14  19

5  20  18  14  24  14  11  13  16  19  20  18  20  17  14

6   8  11  22  19  18  11  10  12  19  17  10  19  18  13

7   9  13  16   8   5  10  11  16  10   4   8   6  15  10

8   6   9  12   8   4  12  10   6   9   7   6   6  13   9

9   7   5   5   5  10   4   7   6   6   5   5   4   5   7

10  3   1   6   2   4   5   1   4   2   5   5   3   6   3

11  1   1   5   1   3   2   1   1   3   5   3   4   1   2

12  1   2   2   3   3   2   2   1   4   3   0   2   3   5

13  4   1   1   0   3   1   0   0   2   2   0   1   1   0

14  1   1   0   1   0   0   0   0   0   0   2   1   1   0

15  0   0   0   1   0   0   0   0   1   0   0   0   0   1

16  0   0   1   0   0   0   0   0   0   2   0   0   1   0

17  0   0   2   0   0   1   1   0   0   0   0   0   0   0

18  0   0   0   0   0   0   0   0   0   0   0   0   2   0

19  0   0   0   0   0   0   0   0   1   0   0   0   0   0

20  0   0   0   0   0   0   0   0   1   0   0   0   0   0


#6 Eric Van


  • Kid-tested, mother-approved


  • 10990 posts

Posted 09 September 2005 - 11:38 PM

I wish I could remember which SoSHer explored the relationship of RS and RA standard deviation to Pythagorean performance last year (stand up and take a bow!). The short version is that you get a more accurate version of the Pyth formula if you subtract some percentage of the SD from both RS and RA (and change the exponent). I chimed in and explained the logic of why offensive consistency yields extra wins and why high defensive inconsistency does the same. (The extra runs scored or given up in a blowout boost your RS or RA total and your SD, without gaining or costing you an extra win or loss. Gain the same number of RS (or RA) but spread them out evenly across a number of games, and you lower your SD and have a relatively large effect on wins and losses.)
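
A toy illustration of that logic, as a Python sketch. The two four-game scoring lines and the opponent's constant 4 runs per game are made up for the example; they are not real data.

```python
from statistics import pstdev

# Hypothetical four-game stretches: both teams score 20 runs in total.
steady = [5, 5, 5, 5]      # runs spread evenly
streaky = [17, 1, 1, 1]    # one blowout, three duds
opponent = [4, 4, 4, 4]    # assumed runs allowed in each game

def wins(scored, allowed):
    """Count the games in which runs scored exceed runs allowed."""
    return sum(s > a for s, a in zip(scored, allowed))

for name, line in (("steady", steady), ("streaky", streaky)):
    print(name, "runs:", sum(line), "SD:", round(pstdev(line), 2),
          "wins:", wins(line, opponent))
```

Same 20 runs either way, but the steady line wins all four games and the streaky line wins one. The surplus runs in the blowout raise the total and the SD without adding a win.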

The Sox' extremely low SD of RS is certainly a very big reason why they have "overperformed" Pyth by 4.3 wins going into tonight's game. So this is a very cool discovery.

The next step is to try and figure out what sorts of lineups have a low SD. The obvious guess is that top-to-bottom strength lowers SD, while focusing all the offense on a couple of guys would raise it, but there may be other factors as well.

#7 Worst Trade Evah


  • SoSH Member


  • 10834 posts

Posted 10 September 2005 - 12:25 AM

I wish I could remember which SoSHer explored the relationship of RS and RA standard deviation to Pythagorean performance last year (stand up and take a bow!). The short version is that you get a more accurate version of the Pyth formula if you subtract some percentage of the SD from both RS and RA (and change the exponent). I chimed in and explained the logic of why offensive consistency yields extra wins and why high defensive inconsistency does the same. (The extra runs scored or given up in a blowout boost your RS or RA total and your SD, without gaining or costing you an extra win or loss. Gain the same number of RS (or RA) but spread them out evenly across a number of games, and you lower your SD and have a relatively large effect on wins and losses.)

The Sox' extremely low SD of RS is certainly a very big reason why they have "overperformed" Pyth by 4.3 wins going into tonight's game.  So this is a very cool discovery.

The next step is to try and figure out what sorts of lineups have a low SD. The obvious guess is that top-to-bottom strength lowers SD, while focusing all the offense on a couple of guys would raise it, but there may be other factors as well.

<{POST_SNAPBACK}>


I've started doinking around with a database of basically all major league games ever played. If I can make it work, I'll try to isolate offenses with a similar run per game average, and see how their variances correspond, and what sorts of offenses are most similar.

Thanks for the feedback!

And thank you joyofsox for the edit. Was there a trick, or you just typed out spaces until it worked?

Edited by Worst Trade Evah, 10 September 2005 - 01:10 AM.


#8 philly sox fan


  • SoSH Member


  • 9748 posts

Posted 10 September 2005 - 08:08 AM

I guess this interests no one but me. I'll just go ahead and stick other things in here as they come.

<{POST_SNAPBACK}>


No, this is a really good thread. I just didn't have time to look up and post some stuff yesterday.

The issue of run consistency was the subject of some work by Dave Studeman over at The Hardball Times earlier this year. There was a lot of talk about the White Sox "smartball" offense and whether or not that approach was helping them to be more consistent.

By your numbers, the White Sox' small-ball-plus-lots-of-HR approach has produced the 3rd most consistent offense this year. That their approach is so different from Boston's, and yet they've been nearly as consistent, suggests that it'll be hard to find good evidence that any one approach is consistent with, well, consistency.

This is a link to the followup article which contains a link to the original:

Runs Per Game

To me the table in the center is the most important nugget of information.

RS    Win%   Diff
    0    .000
    1    .077   .077
    2    .208   .131
    3    .339   .131
    4    .471   .132
    5    .593   .122
    6    .686   .092
    7    .776   .090
    8    .840   .064
    9    .874   .034
   10    .921   .047
   11    .939   .018
   12    .963   .025
   13    .987   .024
   14    .978  -.009
   15    .976  -.001
   16    .983   .007
   17   1.000   .017
In terms of winning ballgames, the second through the fifth runs have the most impact, followed by the sixth and seventh runs, and then the first run.

As an offense scores marginal runs the team Wpct goes up dramatically through 5 runs and then starts to trail off.
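
One way to use that table, sketched in Python: weight each run total's Win% by how often a team scored that many runs. This uses the BOS counts from the raw data earlier in the thread and implicitly assumes league-average run prevention, so treat the result as illustrative rather than as a prediction.

```python
# Win% by runs scored, copied from the table above (index = runs scored, 0-17).
win_pct = [.000, .077, .208, .339, .471, .593, .686, .776, .840,
           .874, .921, .939, .963, .987, .978, .976, .983, 1.000]

# Games in which Boston scored 0..17 runs, from the raw data earlier in the thread.
bos_counts = [3, 9, 8, 16, 16, 14, 22, 16, 12, 5, 6, 5, 2, 1, 0, 0, 1, 2]

def expected_win_pct(counts, table):
    """Average the table's Win% over a team's distribution of runs scored."""
    games = sum(counts)
    return sum(n * table[r] for r, n in enumerate(counts)) / games

bos_expected = expected_win_pct(bos_counts, win_pct)
print(f"BOS expected Win% from run distribution: {bos_expected:.3f}")
```

Two teams with the same average can get different answers here, because shifting games out of the 0-2 and 10+ buckets and into the 4-7 range picks up more Win% than it gives back.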

#9 philly sox fan


  • SoSH Member


  • 9748 posts

Posted 10 September 2005 - 08:42 AM

This post had me thinking about the Sox approach and I looked at their team HR totals and was pretty surprised at how the Sox HR distribution compared to other AL teams.

Note: all stats are prior to last night’s games.

First let's look at team HR rank.

Tex   232
NYY*  192
CWS*  178
CLE*  171
BAL   168
Bos*  168
TB    140
DET   135
OAK*  132
LAA*  122
TOR   119
SEA   117
MINN  116
KC    104

* denotes playoff contender just to see if there’s anything interesting about the good teams in the AL this year.

Sox are tied for 5th in the AL which is probably a bit low for a team with such a good offense. Four of the playoff contenders are amongst the top 6 teams in HR. Hitting lots of HR is good. Oak and LAA lag well behind, but have very good pitching.

When I saw the Sox total of 168 the first thing that dawned on me is that a huge chunk of that comes from just Ramirez and Ortiz. Let’s next look at the number of double digit HR hitters per team.

TEX   9
CWS*  9
CLE*  9
NYY*  8
BAL   7
TB    7
DET   7
OAK*  5
TOR   5
MINN  5
BOS*  4
LAA*  4
SEA   4
KC    4

There’s a natural split in the AL between teams with 7-9 double digit HR hitters (including 3 playoff contenders) and teams with 4-5 double digit HR hitters (including 3 playoff contenders).
We spent so much time making fun of the epic 200 AB home run droughts for various infielders, but I hadn’t realized that the Sox had so few hitters who cleared the 10 HR mark. The Sox big four are: Ortiz 38, Ramirez 33, Varitek 21 and Nixon 12. And after that there are a bunch of guys between 6-9 HR. The Sox have good to great OBP up and down the lineup, but the HR power is really heavily concentrated in just three players. For example, in last night’s lineup (basically the regular one these days) the big 3 of Ortiz-Ramirez-Varitek had 92 HR and the other six players combined for 50 HR.

Finally let’s look to see if Ortiz and Ramirez make a disproportionate percentage of the Sox team total. This table compares the percentage of HR by each team’s top 2 HR hitters.

SEA   43.6
BOS*  42.3
NYY*  35.4
CWS*  35.4
TOR   34.5
LAA*  33.6
MINN  31.9
KC    30.8
OAK*  30.3
TB    30.0
TEX   29.3
DET   28.1
BAL   27.4
CLE*  26.3

The Sox duo is definitely outside the norm. Seattle tops the Sox on a combination of one great HR hitter, Sexson, and an overall very low total, 12th in the league.
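
For reference, the percentage in that table is just the top two hitters' combined HR over the team total. A quick check in Python with the Sox numbers quoted in this post (the variable names are mine):

```python
# Home run totals quoted in this post.
team_hr = 168                      # Red Sox team total
ortiz_hr, ramirez_hr = 38, 33      # top two Sox hitters

top2_share = 100 * (ortiz_hr + ramirez_hr) / team_hr
print(f"Top-2 share of team HR: {top2_share:.1f}%")
```

That works out to 71 of 168, or 42.3%, which is the BOS entry in the table.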

Most teams are in the 30-35% range. Amongst the playoff contenders the Sox are the most heavily concentrated team at 42% and the Indians are the most evenly distributed team at a league low 26%.

Again, most teams, good and bad, behave similarly, but there are also good teams at both extremes.

During the Yankees' late 90s dynasty, there was a lot made of how poorly teams with single big HR hitters performed in the post-season. That was misconstrued to mean that HRs were overrated, but in fact the Yankees were a high HR team; they just happened to distribute them more evenly throughout the lineup. They won lots of post-season games with HRs, but they often got those HRs from guys with modest HR power, i.e. Brosius, O’Neill, Jeter.

From that example – and not much more – I’ve always thought a more even distribution was preferable. If the main couple guys are held down, then there are several other players who can hit a big home run. The Sox, at least in the regular season, have been dependent on two guys to hit most of their home runs.

Good thing (he says, crossing his fingers and making typing really, really hard) the Sox duo are unstoppable clutch gods.

#10 Worst Trade Evah


  • SoSH Member


  • 10834 posts

Posted 10 September 2005 - 09:02 AM

As an offense scores marginal runs the team Wpct goes up dramatically through 5 runs and then starts to trail off.

<{POST_SNAPBACK}>


Very interesting, philly -- thank you. Along these lines, it's worth noting that the White Sox have a relatively large mode at 5 runs, which they've done 24 times. They have an oddly bimodal run distribution, with another chunk of 25 games with 2 runs -- but only 7 times with 3 runs this year, so 2, 3, 4, 5 runs account for 25, 7, 18, and 24 games, respectively. Just an odd fluke I guess -- an aspect of their fluky year. Most, but not all, teams have a more normal looking run distribution.

In my initial playing with correlation matrices, team OBP seems to correlate better than either SLG or OPS, but I don't think what I have so far means anything and I don't have any confidence in what I'm doing there.

I'd like to explore the impact of variance in lineups: is there a place to go to find stats for teams' lineup positions, e.g., what Angel number 6 hitters have done all year?

#11 Worst Trade Evah


  • SoSH Member


  • 10834 posts

Posted 10 September 2005 - 09:16 AM

From a link from philly's link, I can see that Sean Ehrlich did something like this in BP in June. He uses stdev of runs scored, notes the linear relation to average runs scored, and then employs multiple regression to test component significance, not finding any.

I'm not sure he could find it with his method though, since he's not exploring the impact of variance distribution within a team, which is probably where the issue is. There might be attributes of team-level offense that are more predictable (an even distribution of an OBP-weighted OPS?), but it also might be an attribute of *people*. Is consistency an attribute of hitters? If so, a team-level analysis might just blur everything.

Edited by Worst Trade Evah, 10 September 2005 - 09:19 AM.


#12 URI


  • stands for life, liberty and the uturian way of life


  • 10038 posts

Posted 10 September 2005 - 10:37 AM

One thing to note with the Red Sox HRs is that a disproportionate number of them are hit with runners on base.

On my other computer, I have the exact number, but it's a fair amount of HR above what could normally be expected.

I'll edit this post in a few minutes with the Sox, and Yankees (the others will have to wait).

#13 Stuffy McInnis

  • 693 posts

Posted 10 September 2005 - 10:52 AM

One thing to note with the Red Sox HRs is that a disproportionate number of them are hit with runners on base.

On my other computer, I have the exact number, but it's a fair amount of HR above what could normally be expected.

I'll edit this post in a few minutes with the Sox, and Yankees (the others will have to wait).

<{POST_SNAPBACK}>


That may very well be a direct result of the organization's OBP-heavy offensive philosophy and when it applies. Without runners on base, patience is encouraged, but in situations with multiple runners on base they can swing away like any other club. Of course all teams will generally have that approach (swing for the fences more with runners on base), but the Sox's OBP focus might be depressing HR in other situations (and distorting the ratio).

#14 67WasBest


  • Concierge


  • 1967 posts

Posted 10 September 2005 - 11:42 AM

That may very well be a direct result of the organization's OBP-heavy offensive philosophy and when it applies. Without runners on base, patience is encouraged, but in situations with multiple runners on base they can swing away like any other club. Of course all teams will generally have that approach (swing for the fences more with runners on base), but the Sox's OBP focus might be depressing HR in other situations (and distorting the ratio).

<{POST_SNAPBACK}>


I think it's simply a case where we have more guys on base than any other team. If we hit HRs at a normal frequency, we will have more opportunities with runners on base.

#15 Eric Van


  • Kid-tested, mother-approved


  • 10990 posts

Posted 10 September 2005 - 11:49 AM

Folks should know that the Sox have scored 798 R versus 789.7 according to Contextual Runs, 793 according to EqA (which is less accurate).

#16 philly sox fan


  • SoSH Member


  • 9748 posts

Posted 10 September 2005 - 11:56 AM

Folks should know that the Sox have scored 798 R versus 789.7 according to Contextual Runs, 793 according to EqA (which is less accurate).

<{POST_SNAPBACK}>


I don't think accurate is quite the right word. Both methods are about 1% off of the Sox actual run total. That doesn't leave much room for meaningful accuracy differences.
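
The "about 1%" figure is easy to check. A small Python sketch using only the three numbers from the quoted post (the dictionary layout is just for illustration):

```python
actual = 798                                   # Sox runs scored, per the quoted post
estimates = {"Contextual Runs": 789.7, "EqA": 793}

# Percent difference between each estimate and the actual run total.
errors = {name: 100 * abs(actual - est) / actual for name, est in estimates.items()}

for name, pct_off in errors.items():
    print(f"{name}: {pct_off:.2f}% off")
```

Both estimates land within roughly one percent of the actual total, and within half a percentage point of each other.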

#17 anaxamandr


  • Unleashed the Brent


  • 746 posts

Posted 10 September 2005 - 04:41 PM

oops
my table solution didn't work.
does html not come out on this new board?

Edited by anaxamandr, 10 September 2005 - 04:42 PM.


#18 Eric Van


  • Kid-tested, mother-approved


  • 10990 posts

Posted 10 September 2005 - 05:51 PM

I don't think accurate is quite the right word.  Both methods are about 1% off of the Sox actual run total.  That doesn't leave much room for meaningful accuracy differences.

<{POST_SNAPBACK}>

Umm, more accurate when you look at hundreds of teams over many years? That's the claim to fame of CR.

#19 philly sox fan


  • SoSH Member


  • 9748 posts

Posted 10 September 2005 - 08:20 PM

Umm, more accurate when you look at hundreds of teams over many years?  That's the claim to fame of CR.

<{POST_SNAPBACK}>


You don't think CR and EQR are within each other's margin of error when looking at a specific team?

#20 Eric Van


  • Kid-tested, mother-approved


  • 10990 posts

Posted 10 September 2005 - 10:13 PM

You don't think CR and EQR are within each other's margin of error when looking at a specific team?

<{POST_SNAPBACK}>

Yeah, but which one is more likely to be more accurate?

CR includes ROE and OOB, which can sometimes skew EqA.

#21 philly sox fan


  • SoSH Member


  • 9748 posts

Posted 11 September 2005 - 11:26 PM

Yeah, but which one is more likely to be more accurate?

CR includes ROE and OOB, which can sometimes skew EqA.

<{POST_SNAPBACK}>


The one with the significant digit to the right of the decimal point? :lol:

#22 URI


  • stands for life, liberty and the uturian way of life


  • 10038 posts

Posted 12 September 2005 - 10:12 AM

That may very well be a direct result of the organization's OBP-heavy offensive philosophy and when it applies. Without runners on base, patience is encouraged, but in situations with multiple runners on base they can swing away like any other club. Of course all teams will generally have that approach (swing for the fences more with runners on base), but the Sox's OBP focus might be depressing HR in other situations (and distorting the ratio).


Sorry I didn't get to this the last few days...

Let's date the OBP approach you mentioned to 2003, because that's when Theo started, and that's what I have readily available data for.

Home runs with men on base vs. expectation:
2003: -22.520
2004: -10.742
2005: 2.669

Which is, of course, not what I originally expected, but also shows that over the last three years, the Sox OBP-approach did nothing to help push across runs at a higher rate than they should have, given team norms.

At least as it comes to HR with men on base.

Hitting in scoring position is probably where the bulk of the run scoring I mentioned before came.

The Sox raw number of hits with runners in scoring position vs. expectation:
2003: 0.358
2004: 20.501
2005: 17.788

#23 Timmeh49

  • 1752 posts

Posted 12 September 2005 - 12:39 PM

From a link from philly's link, I can see that Sean Ehrlich did something like this in BP in June. He uses stdev of runs scored, notes the linear relation to average runs scored, and then employs multiple regression to test component significance, not finding any.

<{POST_SNAPBACK}>

WTE... Could you post a little more of the Ehrlich article for those of us without a BP subscription? Thanks.

#24 Worst Trade Evah


  • SoSH Member


  • 10834 posts

Posted 12 September 2005 - 12:53 PM

My thought was that "consistency" is not an attribute that implies either more or fewer runs, but that it moves the run profile to a more valuable shape (per philly's link/table above). If that were true, then you'd expect "consistent" teams (as indicated by the %stdev metric here) to tend to outperform their Pythagorean records.

Sadly, I can't really see that's true:

Team   %st  BP_Diff   dif1   dif2   dif3
BOS   0.57       12    3.3    3.2    2.2
TBA   0.60      -18    2.4   -2.8   -5.4
CHA   0.62        7    6.7   10.2   10.5
BAL   0.63      -45    1.6   -4.8   -5.8
TEX   0.63       -2   -2.1   -6.6   -8.1
MIN   0.65       -9   -1     -0.1   -0.5
NYA   0.66      -16    2.9    0.9   -0.4
SEA   0.66       31   -6.1   -0.8   -3.8
CLE   0.66      -23   -0.7   -3.4   -2
ANA   0.67        8   -0.9    3.5    2.6
TOR   0.68       14   -5.5    0.1   -1.2
DET   0.70      -11   -1     -4.6   -4.9
KCA   0.73       17   -4.5   -4.1   -6.2
OAK   0.73       27   -3.3   -1.2   -1.9

correl bp 0.327504361
correl dif1 -0.64534997
correl dif2 -0.294767694
correl dif3 -0.258967357

http://www.baseballp...s/standings.php

%st = the %stdev metric employed earlier, or stdev/average runs
BP Diff = the difference between actual runs and BP's expected runs
Dif1, Dif2, and Dif3 are the deltas between actual wins and W1, W2, and W3 (which are various and increasingly complicated ways of estimating expected wins).

At bottom are the correlations between each of the last 4 columns and the %stdev column.
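
For anyone who wants to check or extend those correlations, here is a minimal Python sketch of the Pearson calculation for the dif1 column. It uses the two-decimal %st values as printed in the table, so it should land near the dif1 figure quoted above rather than exactly on it; the function name is my own.

```python
from math import sqrt

# %st and dif1 columns from the table above, in the same team order (BOS ... OAK).
pct_st = [0.57, 0.60, 0.62, 0.63, 0.63, 0.65, 0.66,
          0.66, 0.66, 0.67, 0.68, 0.70, 0.73, 0.73]
dif1 = [3.3, 2.4, 6.7, 1.6, -2.1, -1, 2.9,
        -6.1, -0.7, -0.9, -5.5, -1, -4.5, -3.3]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

r_dif1 = pearson(pct_st, dif1)
print(f"correl dif1 = {r_dif1:.3f}")
```

A negative value here means teams with a lower %st (more consistent by this metric) tended to beat W1 by more. With only 14 teams the estimate is noisy.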

I don't see anything of value here. Does anyone else? Maybe the %stdev metric doesn't measure consistency meaningfully, or maybe consistency really just doesn't have much value? Or is so evenly distributed among teams that none of it really matters. I'm not very good with stats, so am wide open to suggestions here.

If anything it's a negative correlation -- but that might be messed up by the fact that there's interleague play, and the AL teams have underperformed relative to the NL teams (even while beating them more often overall). At least, in W3, which adjusts for strength of schedule, AL teams are down -24. Probably NL teams should be mixed in here, but I didn't want to combine them originally. Should go back and do other years, but this is pretty time consuming.

Edited by Worst Trade Evah, 12 September 2005 - 01:05 PM.


#25 URI


  • stands for life, liberty and the uturian way of life


  • 10038 posts

Posted 12 September 2005 - 12:57 PM

I'm starting to think that Adam Smith is the only thing that is a good predictor of Pythag records.

I mean, you would think things like offensive and defensive consistency would have an effect on over/under performance.

#26 Worst Trade Evah


  • SoSH Member


  • 10834 posts

Posted 12 September 2005 - 01:09 PM

nm

need more data -- not sure what the impact of games lost to the NL is here

Edited by Worst Trade Evah, 12 September 2005 - 01:17 PM.


#27 Worst Trade Evah


  • SoSH Member


  • 10834 posts

Posted 12 September 2005 - 01:13 PM

WTE... Could you post a little more of the Ehrlich article for those of us without a BP subscription?  Thanks.

<{POST_SNAPBACK}>


Can do some excerpts to try to get the gist of it (from http://www.baseballp...rticleid=4125):

The crux of this defense is that small ball will lead to more consistent run scoring than the take-and-rake approach of walks and homers. Waiting for a home run, it is argued, only leads to runs when the home runs come. On the other hand, runs can always be manufactured if you have people good at manufacturing runs. Quoted in a St. Paul Pioneer Press story about how the White Sox are trying to emulate the Twins approach that defeated them year after year, Willie Harris, a good small-baller himself forced out of a starting job by the off-season moves to bring in better small-ballers, nicely summed up the argument. "To me," he said, "a small-ball team is more consistent. You're going to have small ball every day as opposed to the home run, which won't be there every time."

To measure small ball, I added together the total number of stolen base attempts and sacrifice hits to create the Small-Ball Index (SBI). Admittedly, there is more to small ball than stolen bases and bunts, like hitting and running or moving the batter over, but SBI is easy to measure and should capture the essence of the approach. In 2003, for instance, the Marlins lapped the Majors with an SBI of 306 with Anaheim coming in second with an SBI of 240. Oakland and Toronto brought up the rear with SBIs of 84 and 73 respectively. This shouldn't be surprising, given the GMs in Oakland and Toronto and the emphasis on speed and putting the ball in play in Florida and Anaheim.

The essence of the take-and-rake approach is drawing walks and hitting home runs. I thus add isolated power (slugging percentage minus batting average) and “isolated patience” (on base percentage minus batting average) to create the Take-and-Rake Index (TRI). Boston and the Yankees led the Majors in TRI in 2003, with scores of .273 and .268 respectively, while the Dodgers and Tampa Bay came in last. Again, none of this should be surprising, although coming in last in SBI seems to be by design while coming in last in TRI might be more through overall offensive incompetence.

If the performance analysts are right that walks and home runs are the road to runs scored, then teams with a high TRI should score more and should, for that reason, have higher variance in runs scored. Looking only at TRI and SBI, therefore, may be giving credit to the manner in which the runs are scored when the effect is really driven by the number of runs scored. Therefore, I turn to a multiple regression framework to assess the effect.

A multiple regression framework enables you to see the effects of one variable while controlling for the effects of other variables. In other words, it enables you to hold constant the number of runs scored while examining the influence of hitting approach on variance in runs scored. With this framework, I can answer the following question: if two teams scored an identical number of runs, would the team playing small-ball have lower variance than the team taking and raking?

The results strongly suggest that the answer to this question is no.

Hope that's not too much. Apologies to bp if so.

Edited by Worst Trade Evah, 12 September 2005 - 01:14 PM.


#28 philly sox fan


  • SoSH Member


  • 9748 posts

Posted 12 September 2005 - 01:32 PM

Willie Harris, a good small-baller himself forced out of a starting job by the off-season moves to bring in better small-ballers, nicely summed up the argument. "To me," he said, "a small-ball team is more consistent. You're going to have small ball every day as opposed to the home run, which won't be there every time."


I think that attitude (that small ball or speed or whatever doesn't slump) is part of the noise in this kind of data. While Willie Harris or whoever may be able to run fast or lay down a bunt each and every day, the big things that drive run production are always going to have a lot of variability for every player and every team.

Every team is going to have stretches when key players simply aren't hitting enough to help their team. The only real way to prevent those normal random variation slumps from leading to team wide inconsistency is to simply have as many good players as possible so that you give your team the best chance possible of getting up to 4/5 runs each game.

A team that is consistent around 4 R/G isn't likely to win more than a team that is inconsistent around 5 R/G (though I know you can make extreme examples where it may be true for stretches).

Total talent always has to be more important than the philosophy or usage guiding that talent. I make that same point in any bullpen usage debates that come up as well. Knock yourself out with maximum efficiency, maximum leverage Jamesian bullpen models, but when it comes right down to it an inefficient pen with better talent will be more effective.

It's a flat and uninteresting thing to say, but when you get right down to it the answer is usually - more and better talent is the way to go.

#29 Timmeh49

  • 1752 posts

Posted 12 September 2005 - 11:25 PM

I wish I could remember which SoSHer explored the relationship of RS and RA standard deviation to Pythagorean performance last year (stand up and take a bow!).  The short version is that you get a more accurate version of the Pyth formula if you subtract some percentage of the SD from both RS and RA (and change the exponent).

<{POST_SNAPBACK}>

Maybe this was me.

Here's what I did. I looked at runs scored and allowed frequency data for every year from 1985-2004 (558 team-seasons; data from Baseball Prospectus). For each team I looked at a number of different linear regressions. They were all of the form:

ln(W/L) = A + B*ln(scored/allowed) + C*ln(sig_scored/sig_allowed)+D*ln(k_scored/k_allowed)

where
* scored and allowed are obviously runs scored and runs allowed
* sig_scored and sig_allowed are the standard deviations of runs scored and allowed for a team
* k_scored and k_allowed are (sort of) the skewnesses of the distributions

A brief digression... "sig", i.e. the standard deviation is simply a normalized second moment:

sig_x = sqrt[ avg( (x - avg(x))^2 ) ], i.e. the square root of the average squared deviation. My "k" is defined as

k = [ avg( (x - avg(x))^3 ) ]^(1/3), i.e. the cube root of the average cubed deviation
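
In code, those two definitions might look like the Python sketch below. It divides by n (a plain average), as the definitions above read; the sample list is made up purely to show the calculation.

```python
from math import copysign, sqrt

def sig(xs):
    """Square root of the average squared deviation from the mean."""
    m = sum(xs) / len(xs)
    return sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def k(xs):
    """Signed cube root of the average cubed deviation from the mean."""
    m = sum(xs) / len(xs)
    m3 = sum((x - m) ** 3 for x in xs) / len(xs)
    # abs() then copysign: a negative float raised to 1/3 is complex in Python.
    return copysign(abs(m3) ** (1 / 3), m3)

runs = [1, 2, 3, 10]   # made-up game scores, skewed to the right
print(round(sig(runs), 3), round(k(runs), 3))
```

A positive k means the long tail is on the high-scoring side, which is the usual shape for runs per game.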

Anyway, I'll simply provide the highlights...

* The intercept (the A term) was not statistically different than zero

* When fixing A = C = D = 0 (i.e. fit the runs only)
*** Adj r-squared = 0.884
*** B = 1.938, std_err = 0.0295, t-stat = 65.6

* When fixing A = D = 0 (i.e. include runs and standard deviation of runs):
*** Adj r-squared = 0.905
*** B = 2.208, std_err = .0359, t-stat = 61.44
*** C = -0.5054, std_err = 0.0450, t-stat = 11.23

* The coefficient on the k-ratio was not statistically different from 0

Thus, it appears to me (and by no means am I any kind of statistics expert) that a good bit of the data can be explained by including only runs scored and allowed, but the explanation can be improved by including standard deviation of the distribution.

The "Pythagorean" formula, including standard deviations, would therefore be

                      scored^2.208 * sig_allowed^0.5054
pct = ---------------------------------------------------------------------
      scored^2.208 * sig_allowed^0.5054 + allowed^2.208 * sig_scored^0.5054

which is consistent with what Eric wrote above: that inconsistency on defense (high sig_allowed) and consistency on offense (low sig_scored) help to win games.
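
A small Python sketch of that formula, for anyone who wants to plug in numbers. The exponents default to the fitted B and C from the regression above (C = -0.5054, so the sig terms carry exponent 0.5054); the function name and the sample inputs are hypothetical.

```python
def pyth_pct(scored, allowed, sig_scored, sig_allowed, b=2.208, c=0.5054):
    """Win fraction from the fitted form ln(W/L) = b*ln(RS/RA) - c*ln(sig_RS/sig_RA)."""
    num = scored ** b * sig_allowed ** c
    den = num + allowed ** b * sig_scored ** c
    return num / den

# Made-up inputs: identical run totals, but a steadier offense in the second call.
even = pyth_pct(750, 750, 3.2, 3.2)
steady_offense = pyth_pct(750, 750, 2.9, 3.2)
print(round(even, 3), round(steady_offense, 3))
```

With equal runs and equal deviations the function returns .500. Lowering sig_scored alone nudges the expected percentage above .500, which is the offensive-consistency effect described above.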

Edited by Timmeh49, 13 September 2005 - 08:02 AM.