OBP and Run Scoring in 2014

Status
Not open for further replies.

Snodgrass'Muff

oppresses WARmongers
SoSH Member
Mar 11, 2008
27,644
Roanoke, VA
There is a lot of interesting discussion about run scoring and roster construction in the new run scoring environment going on all over the site right now. Check out the excellent bullpen usage thread on the main board for an example. With the Royals and the Giants in the World Series and the general sentiment around baseball that the run scoring environment being down is not a blip on the radar, there are more questions than answers when it comes to building an offense going into this winter, especially for the Red Sox. Ben Cherington was recently quoted as saying...
 
 
“Offense has changed. Power, in terms of at least home runs, is down, and even on-base is down across the league. There are all sorts of reasons. We know we need to build a better offense to produce more. I think we have to be careful not to throw out the baby with the bath water. If we can see pitches, grind at-bats, get on base and still hit with power and hit with runners in scoring position, then I still think that’s a formula to score runs, and more runs than our opposition.”
 
 
The front office is clearly trying to bring back some of the diversity and flexibility the offense displayed in 2013 with a signing like Castillo (speed), or by trading for Cespedes (power) but Ben seems committed to getting on base a lot as the foundation for the offense. So how did high OBP lineups correlate with run scoring this year versus SLG? Edit: This is a graph showing runs scored (X axis) and how OBP and SLG correlate with them (Y axis). The red line is SLG and the blue line is OBP.  I'm sorry the Y axis is not labeled with three decimal places, but I can't figure out how to make Open Office Calc do that. I'll keep digging around tomorrow to see if there's a way to fix that and replace the images again if I can find it.
 

 
It's pretty obvious that OBP correlates with run scoring better than SLG, even as run scoring in general is going down. How does this compare to previous years? Looking at it since the most recent expansion through last year (1998-2013) we see... Edit: This is a graph showing all runs scored between the start of 1998 and the end of 2013 (X axis). The red line is SLG and the blue line is OBP.
 

 
And if we cut it off at the start of the new PED testing in 2006... Edit: This graph shows all runs scored between 1998 and 2005 with runs as the X axis and OBP and SLG on the Y axis again.
 

 
It looks like there is more (as one of my professors used to say about evolution) "variation on the theme" in the 2014 chart than either of the previous two, but that might also be a matter of there being more noise in the smaller sample. The 1998-2005 chart looks to have more variation along the downward trend than 1998-2013, but less than the one from 2014 alone. What remains consistent is that the OBP lines are smoother in all three of them.
 
Run scoring is down, but I think Cherington is probably correct in not wanting to waver from the high OBP, grinding lineups that have worked for the club in the past. It remains to be seen whether the additions of Cespedes, Castillo and Craig plus a full season of Mookie and some improvement from Bogaerts will be enough to catapult them back to the top of the league in runs scored, but he has certainly set the team up to have more speed and power than they had in 2014. I'm not sure if this tells us anything new about how to look at offenses across the league or whether having the number 8, 12, 14 and 23 run scoring teams in the majors as the last four standing means anything for long term planning by front offices, but it doesn't seem like 2014 was all that different from previous years in how OBP correlates to run scoring versus SLG.
 

Snodgrass'Muff

oppresses WARmongers
SoSH Member
Mar 11, 2008
27,644
Roanoke, VA
Runs scored. I didn't include it in the graph because I was mentioning it in the post, but I probably should have labeled the axis. I'll go back and add it. Thanks for bringing it to my attention.
 
Edit: More precisely, it's the order of runs scored from 1 to 30.
 
Edit 2: Updated the charts.
 

StupendousMan

Member
SoSH Member
Jul 20, 2005
1,932
I have an easier time understanding scatter plots, so here's an example for the year 2014.  If this is useful, let me know and I'll provide similar examples for other recent years.
 
The 'r' value mentioned on the graph is the correlation coefficient.   It suggests that SLG is a slightly better predictor of runs than OBP, but by a small margin.
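
For anyone who wants to reproduce an 'r' value like the one on the graph, here is a minimal sketch of the Pearson correlation coefficient in Python. The team values are made-up placeholders, not real 2014 numbers, and the function name `pearson_r` is just an illustrative choice; swap in the actual team OBP, SLG and run totals.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Placeholder values for five hypothetical teams (NOT real 2014 data).
obp = [0.300, 0.310, 0.315, 0.320, 0.330]
slg = [0.360, 0.400, 0.380, 0.390, 0.420]
runs = [600, 640, 650, 680, 720]

print("r(OBP, runs) =", round(pearson_r(obp, runs), 3))
print("r(SLG, runs) =", round(pearson_r(slg, runs), 3))
```

Note that r only measures how tightly the points hug a straight line. It says nothing about how many runs a point of OBP or SLG is worth, which is a question about slope.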
 
 
 

Morgan's Magic Snowplow

Member
SoSH Member
Jul 2, 2006
22,519
Philadelphia
Thanks Stupendous Man. I find scatter plots much more intuitive as well.

I'm not sure what big conclusions we might draw. The correlations are close enough that I wouldn't make a big deal of the difference. And of course OBP and SLG are themselves highly correlated in practice.

The simplest answer to the question of how to build a team in this run scoring environment is probably just "Acquire good players, whatever their skill sets."
 

Snodgrass'Muff

oppresses WARmongers
SoSH Member
Mar 11, 2008
27,644
Roanoke, VA
Morgan's Magic Snowplow said:
Intuitively, I agree with your conclusions but I really can't interpret those graphs.  What are the axes, what are the units of analysis, and for what years?
 
I added some extra detail in the descriptions of each graph. Hopefully that clarifies it for you.
 
 
StupendousMan said:
I have an easier time understanding scatter plots, so here's an example for the year 2014.  If this is useful, let me know and I'll provide similar examples for other recent years.
 
The 'r' value mentioned on the graph is the correlation coefficient.   It suggests that SLG is a slightly better predictor of runs than OBP, but by a small margin.
 
 
Thank you for this. It's interesting that the correlation coefficient suggests a better correlation with slugging. When I look at that scatter plot I see a greater degree of volatility in the SLG plot and concluded that OBP is a more stable predictor. Is the reason why SLG is coming up with the higher r value that there is a greater difference between the low end values of slugging and the high end values as compared to the difference between the low end values of OBP and the high end?
 
Or is it just that there are more higher slugging percentages in the higher end of the runs axis, even if just slightly? I see eight values for each that are greater than 700 runs, and three for each less than 600. So the difference seems to be in the middle of the pack. If that's the case, I would suggest that we are probably more concerned with how the best offenses correlate, but that starts to whittle down the sample size for a full season to a small enough number of teams that I'm not sure we could trust the data. I'd be curious to see what the correlation coefficient is in the span from 1998 through 2013.
 

williams_482

Member
SoSH Member
Jul 1, 2011
391
Batting average is a huge part of both OBP and SLG, and most of the benefit of getting hits is that it puts people on base, as opposed to possibly advancing them more than one base. I would be interested to see what the charts would look like if you compared ISO and OBP to get a clearer picture of power vs on base skills.
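
For reference, ISO (isolated power) is slugging minus batting average, which works out to extra bases per at-bat. A minimal Python sketch follows; the function names and the sample stat line are hypothetical, chosen only for illustration.

```python
def iso_from_rates(slg, avg):
    """ISO as slugging percentage minus batting average."""
    return slg - avg

def iso_from_counts(doubles, triples, home_runs, at_bats):
    """ISO as extra bases per at-bat: (2B + 2*3B + 3*HR) / AB."""
    return (doubles + 2 * triples + 3 * home_runs) / at_bats

def slg_from_counts(hits, doubles, triples, home_runs, at_bats):
    """Slugging percentage: total bases divided by at-bats."""
    singles = hits - doubles - triples - home_runs
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats

# Hypothetical stat line, not a real player or team.
hits, doubles, triples, home_runs, at_bats = 150, 30, 3, 20, 550

avg = hits / at_bats
slg = slg_from_counts(hits, doubles, triples, home_runs, at_bats)
print("ISO from rates: ", round(iso_from_rates(slg, avg), 3))
print("ISO from counts:", round(iso_from_counts(doubles, triples, home_runs, at_bats), 3))
```

Both routes give the same number, so either the rate stats or the raw counts will do.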
 
EDIT: if the red line is SLG and the blue line is OBP, then it looks like SLG is the one that has always correlated better with scoring runs. Am I still not understanding the meaning of those charts, or are the labels wrong?
 

StupendousMan

Member
SoSH Member
Jul 20, 2005
1,932
Here are the league totals -- not team-by-team, but one datum per league per year -- for 1998 - 2014.   I've modified the axes so that it's runs per game, rather than runs, for reasons that aren't clear to me :)
 
The correlations are tighter on a league basis than team-by-team basis.
 
[Edit] Remind me how to compute ISO from the basic stats and I'll give it a shot tomorrow.
 
 

Snodgrass'Muff

oppresses WARmongers
SoSH Member
Mar 11, 2008
27,644
Roanoke, VA
williams_482 said:
EDIT: if the red line is SLG and the blue line is OBP, then it looks like SLG is the one that has always correlated better with scoring runs. Am I still not understanding the meaning of those charts, or are the labels wrong?
 
I think where people are getting mixed up is that the X axis is runs scored from 1st in rank to 30th, meaning the higher run scoring totals would be to the left and lower would be to the right, which is not intuitive, I admit. I'll see if I can either label the axis in a way that makes this clear, or just reverse the X axis and re-post the charts. I was trying to avoid cluttering the graph. Sorry for the confusion.
 

Savin Hillbilly

loves the secret sauce
SoSH Member
Jul 10, 2007
18,783
The wrong side of the bridge....
Now that I understand how the graphs work, it seems the import is that (as CW already holds) OBP is more strongly correlated to run production than SLG, because it takes a smaller increase in OBP than in SLG to produce a given increase in R. Is that correct?
 

Jnai

is not worried about sex with goats
SoSH Member
Sep 15, 2007
16,162
Savin Hillbilly said:
Now that I understand how the graphs work, it seems the import is that (as CW already holds) OBP is more strongly correlated to run production than SLG, because it takes a smaller increase in OBP than in SLG to produce a given increase in R. Is that correct?
 
OBP and SLG are on completely different scales, though.
 

StupendousMan

Member
SoSH Member
Jul 20, 2005
1,932
Here you go: league-average comparison of ISO vs. runs, and OBP vs. runs.   It seems that OBP correlates better than ISO, but not by a huge margin.
 
 

Savin Hillbilly

loves the secret sauce
SoSH Member
Jul 10, 2007
18,783
The wrong side of the bridge....
StupendousMan said:
Here you go: league-average comparison of ISO vs. runs, and OBP vs. runs.   It seems that OBP correlates better than ISO, but not by a huge margin.
 
Interesting, but that wasn't what I meant by "league average." I meant that in order to remove the difference of scale that Jnai pointed out from the comparison, you'd have to present each OBP and SLG value not as the raw percentage, but as a number scaled to league average in the same way that OPS+ is. I.e., you'd need to compare "SLG+" and "OBP+", not SLG and OBP.
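
A rough sketch of that scaling in Python, assuming the simplest possible index (100 times the team value divided by the league average). The name `plus_stat` and all the numbers are placeholders. This ignores park adjustments, so it is only in the spirit of OPS+, not a reproduction of it.

```python
def plus_stat(value, league_average):
    """Index a rate stat to league average, where 100 means league average."""
    return 100.0 * value / league_average

# Placeholder league averages and team lines (NOT real data).
league_obp = 0.315
league_slg = 0.390
teams = {
    "Team A": (0.330, 0.400),
    "Team B": (0.305, 0.410),
}

for name, (obp, slg) in teams.items():
    obp_plus = plus_stat(obp, league_obp)
    slg_plus = plus_stat(slg, league_slg)
    print(name, "OBP+ =", round(obp_plus), "SLG+ =", round(slg_plus))
```

One caveat: within a single league and season this just divides every value by the same constant, which leaves the correlation coefficient unchanged and only rescales the slopes. The indexing starts to matter when seasons with different league averages are pooled together.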
 

crystalline

Member
SoSH Member
Oct 12, 2009
5,771
JP
Savin Hillbilly said:
Now that I understand how the graphs work, it seems the import is that (as CW already holds) OBP is more strongly correlated to run production than SLG, because it takes a smaller increase in OBP than in SLG to produce a given increase in R. Is that correct?
Yes. Hence Theo's quote that OBP should be weighted ~3x more heavily than SLG to properly predict run impact.

The key is the slopes of the best-fit lines on the scatterplots. The x and y axes would normally be swapped as Runs is the dependent variable, so for these plots SLG slope looks higher, but OBP has a higher slope when the plots are flipped about the main diagonal. The OBP line slope should be about 3x the SLG slope, if Theo's statement holds in this run scoring environment.
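
To make the slope comparison concrete, here is a minimal least-squares sketch in Python with runs as the dependent variable. The data are made-up placeholders and `ols_slope` is an illustrative name. It also fits each stat on its own, which is not the same thing as weighting OBP and SLG together in a single model, so treat the ratio it prints as a rough look rather than a test of the 3x figure.

```python
def ols_slope(xs, ys):
    """Slope of the least-squares line that predicts ys from xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Placeholder values for five hypothetical teams (NOT real data).
obp = [0.300, 0.310, 0.315, 0.320, 0.330]
slg = [0.360, 0.400, 0.380, 0.390, 0.420]
runs = [600, 640, 650, 680, 720]

# Runs gained per one point (.001) of each stat, with runs on the y axis.
runs_per_obp_point = ols_slope(obp, runs) / 1000
runs_per_slg_point = ols_slope(slg, runs) / 1000
print("runs per point of OBP:", round(runs_per_obp_point, 3))
print("runs per point of SLG:", round(runs_per_slg_point, 3))
print("OBP/SLG slope ratio:  ", round(runs_per_obp_point / runs_per_slg_point, 2))

# Swapping the axes (stat on y, runs on x) gives a different line, and its
# slope is not simply the reciprocal unless the fit is perfect.
print("OBP per run (axes swapped):", ols_slope(runs, obp))
```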
 

williams_482

Member
SoSH Member
Jul 1, 2011
391
I believe the actual ratio is closer to 1.7 to 1 in favor of OBP, but I don't have a good source on that.
 

smastroyin

simpering whimperer
Lifetime Member
SoSH Member
Jul 31, 2002
20,684
I apologize for this post in advance because I don't have the time to do the actual analysis right now.  But, the one thing we need to understand is that the most important part of statistical modeling is understanding the question you are trying to answer.
 
We have loads of data crunched by loads of people that model team runs versus component offense.  This is the very backbone of things like linear weights.  I urge people to learn them if they are interested in this kind of modeling.  
 
Beyond that, I'm not quite sure which question we are trying to answer.  Is it whether lower run scoring environments put different relative value on components?  You can look that up, too, but it's an interesting question.
 
Correlation and prediction are two different things, and the importance of underlying statistics cannot be determined by correlation alone. The graphs above merely state that the variance in SLG predicts the variance in RS better than the variance in OBP does, but this may be because SLG has a wider variance of its own (*may* be; I have not done the analysis).  In general, it is well understood that the marginal utility of a point of OBP is higher than that of a point of SLG.
 
But this analysis is just the first part of figuring out how to assemble a team.  And I'm not sure what the next steps that are being attempted in this thread are.
 
One way this could go is to look at marginal utility in more detail.  Let's just start with some basic assumptions that 10 points of OBP over 600 PA is worth 1 run, and that 17 points of SLG are worth the same run, in the overall model.  You could then look at all of the groups across the range and see if that value stays the same.  In that, let's just say that above a .400 OBP you see diminishing returns and 10 points of OBP are only worth .8 runs from .400-.450 and .5 runs from .450-.500.  Now let's say that for SLG you maintain your marginal utility no matter what.  So the jump from .400 to .417 gives you a run, and the jump from .600 to .617 gives you the same run.  Then you could say that above a certain threshold, you are better off looking for SLG.  (I don't believe any of this, it is an example).
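
The made-up numbers in that example can be written down as a small piecewise calculation. This Python sketch only encodes the hypothetical rates above (10 points of OBP per run below .400, then 0.8 and 0.5 runs per 10 points, and a flat 17 points of SLG per run). The function names are illustrative, and since the example says nothing about OBP above .500, the sketch refuses inputs beyond that.

```python
# (low, high, runs per 10 points of OBP), with OBP in points (400 means .400).
OBP_BANDS = [(0, 400, 1.0), (400, 450, 0.8), (450, 500, 0.5)]
SLG_POINTS_PER_RUN = 17

def obp_runs(start, end):
    """Runs added by raising OBP from start to end, both given in points."""
    if not 0 <= start <= end <= 500:
        raise ValueError("this example only covers OBP from .000 to .500")
    total = 0.0
    for low, high, runs_per_10 in OBP_BANDS:
        # Number of points of the increase that fall inside this band.
        overlap = min(end, high) - max(start, low)
        if overlap > 0:
            total += overlap / 10 * runs_per_10
    return total

def slg_runs(start, end):
    """Runs added by raising SLG from start to end, at a constant rate."""
    return (end - start) / SLG_POINTS_PER_RUN

print("OBP .390 -> .410:", obp_runs(390, 410))
print("OBP .450 -> .470:", obp_runs(450, 470))
print("SLG .400 -> .417:", slg_runs(400, 417))
print("SLG .600 -> .617:", slg_runs(600, 617))
```

With these invented rates, a point of SLG is worth 1/17 of a run (about 0.059), which only beats a point of OBP once OBP passes .450 (0.05 runs per point). That crossover is the kind of threshold the example describes.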
 
Another way would be to look at efficiency of various offenses.  You could probably do this through simulation.  String together a bunch of high OBP/low SLG guys and mix it up by inserting high SLG guys, etc.  
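
A toy version of that simulation might look like the Python sketch below. Everything in it is an assumption made for illustration: the batter profiles are invented, runners always advance exactly as many bases as the hit, and there are no double plays, steals, errors or sacrifice flies. It is meant to show the shape of the experiment, not to produce trustworthy run totals.

```python
import random

def make_batter(walk, single, double, triple, homer):
    """Event probabilities per plate appearance; whatever is left is an out."""
    return {"BB": walk, "1B": single, "2B": double, "3B": triple, "HR": homer}

def plate_appearance(batter, rng):
    """Draw one outcome for a batter."""
    roll = rng.random()
    cumulative = 0.0
    for event, prob in batter.items():
        cumulative += prob
        if roll < cumulative:
            return event
    return "OUT"

def advance(bases, event):
    """bases is [first, second, third] as booleans. Returns (new_bases, runs)."""
    first, second, third = bases
    if event == "BB":
        # Only forced runners move on a walk.
        runs = 1 if (first and second and third) else 0
        return [True, second or first, third or (first and second)], runs
    hit_bases = {"1B": 1, "2B": 2, "3B": 3, "HR": 4}[event]
    runs = 0
    new_bases = [False, False, False]
    for index, occupied in enumerate(bases):
        if occupied:
            destination = index + 1 + hit_bases
            if destination >= 4:
                runs += 1
            else:
                new_bases[destination - 1] = True
    if hit_bases == 4:
        runs += 1  # the batter scores on a home run
    else:
        new_bases[hit_bases - 1] = True
    return new_bases, runs

def simulate_game(lineup, rng, innings=9):
    """Runs scored by one lineup over a fixed number of innings."""
    runs = 0
    spot = 0
    for _ in range(innings):
        outs = 0
        bases = [False, False, False]
        while outs < 3:
            event = plate_appearance(lineup[spot % len(lineup)], rng)
            spot += 1
            if event == "OUT":
                outs += 1
            else:
                bases, scored = advance(bases, event)
                runs += scored
    return runs

def average_runs(lineup, games=2000, seed=0):
    """Average runs per game over many simulated games."""
    rng = random.Random(seed)
    return sum(simulate_game(lineup, rng) for _ in range(games)) / games

# Invented profiles: one on-base type, one slugger type (NOT real players).
obp_guy = make_batter(walk=0.12, single=0.19, double=0.03, triple=0.00, homer=0.01)
slg_guy = make_batter(walk=0.06, single=0.13, double=0.05, triple=0.005, homer=0.05)

for sluggers in (0, 3, 6, 9):
    lineup = [slg_guy] * sluggers + [obp_guy] * (9 - sluggers)
    print(sluggers, "sluggers:", round(average_runs(lineup), 2), "runs per game")
```

Changing the mix and the batting order, and replacing the invented profiles with real player lines, would be the obvious next step.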
 
But at the end of the day the most important thing in determining the way to go is replacement value, which is why despite the difficulty in determining replacement value, it remains important enough to try.  If the replacement value for OBP is consistently higher, then SLG can become more important on an individual basis.
 
All of that said, as frustrating as DPs and LOB are, I take OBP every time, without running any of the numbers.
 