Statistical Reference Page
From SoSH
Offense
- ABR (Adjusted Batting Runs)
- From Total Baseball by Pete Palmer & John Thorn. A measure of the number of runs a batter generates beyond what an average batter generates in the same number of plate appearances, park-adjusted. The formula is:
- ABR = (.47)1B + (.78)2B + (1.09)3B + (1.40)HR + (.33)(BB + HB) - (.25)(AB - H) - (.50)OOB
- where OOB is Outs On Base.
- The -0.25 value for an out is modified depending on the league and season in question such that the league ABR total is zero (thus an ABR of zero means the batter was exactly average).
- BPA (Bases per Plate Appearance)
- Created in 1992 by Bill Gilbert, a SABR member. He's been a salary arbitration specialist for Tal Smith Enterprises in Texas since 1991.
- BPA = (TB+BB+HBP+SB-CS-GIDP) / (AB+BB+HBP+SF)
- BABIP (Batting Average on Balls in Play)
- A player's batting average on the balls he puts into play, which excludes strikeouts and HRs that leave the park. It's designed to illustrate the effects of luck and the opposing team's defense on a batter's performance. The most common formula is:
- BABIP = (H - HR) / (AB - HR - SO + SF)
- Eric Van has started a probably quixotic movement to call BABIP "BPA" (Ball in Play Average) for the simple reason that it takes up 60% as much space as a spreadsheet column label.
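- As a minimal sketch of the most common BABIP formula above, the Python function below works from season totals. The function name and the sample stat line are hypothetical, invented only for illustration.

```python
def babip(h, hr, ab, so, sf):
    """Batting average on balls in play: (H - HR) / (AB - HR - SO + SF)."""
    balls_in_play = ab - hr - so + sf
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (h - hr) / balls_in_play


# Hypothetical season line, not a real player.
example_babip = babip(h=150, hr=20, ab=500, so=100, sf=5)
print(round(example_babip, 3))
```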
- BOP (Base Out Percentage)
- Created by Barry Codell, it was published in the Baseball Research Journal in 1979. Similar to Tom Boswell's Total Average, it's the ratio of bases gained to outs made. The difference is that Codell includes SH as bases gained in the numerator.
- BOP = (TB + BB + HBP + SB + SH + SF) / (AB - H + CS + GIDP + SH + SF)
- BR (Batting Runs)
- From Total Baseball by Pete Palmer & John Thorn. A linear weights formula used to determine the value of a player's offensive play above or below average:
- .47*1B + .78*2B + 1.09*3B + 1.4*HR + .33*(BB+HBP) + .3*SB - .6*CS - .27*(AB-H)
- BRAA (Batting Runs Above Average)
- The number of runs better than a hitter with a .260 EQA in the same number of outs.
- BRAR (Batting Runs Above Replacement player)
- The number of runs better than a hitter with a .230 EQA and the same number of outs.
- BRARP (Batting Runs Above Replacement player of same Position)
- Simply BRAR calculated to each position.
- BRC (Batting Runs Created)
- Jim Bennett's (Vermonter-At-Large) version of Pete Palmer and John Thorn's Batting Runs, using a more extensive formula (based in part on work published by Tom Tango) that calculates theoretical batting runs from the sums of a batter's core batting rates. The basic elements of BRC are actually calculated at the per-plate-appearance level to allow for more modular use of BRC in projections, comparisons and other analytic study.
- The formula for BRC/PA is:
- BRC/PA = .32*BB/PA + .48*1B/PA + .77*2B/PA + 1.07*3B/PA + 1.48*HR/PA - .333*(.29*K/PA + .31*GO/PA + .28*AO/PA)
- C# (Clutch Number)
- Created by Bill James as a part of his runs created formula, and co-opted by URISoxFan (Jeff Kuhn), this is a component of Runs Created. It measures home runs hit with men on base (relative to normal level), plus hitting with runners in scoring position (relative to normal level). The formula is:
- C# = [HRmob - (ABmob * (HR/AB))] + [Hrisp - (BA * ABrisp)]
- where mob = men on base, and risp = runners in scoring position.
- CBA (Contact Batting Average)
- Created by Eric Van. CBA = H / (AB - K + SF)
- CON (Contact Percentage)
- Created by Julien Headley, it measures the percentage of non-BB PA that aren't SO. CON = (AB + SF - K) / (AB + SF)
- CR (Contextual Runs)
- Created by Eric Van in an attempt to create a Runs Created-type formula more logical and more accurate than any of the alternatives (Linear Weights, EqA, RC itself, etc.). The form of the metric is extremely similar to David Smyth's BaseRuns; BaseRuns is even more logically constructed, but the less rigorous form of CR allows for better estimation of the coefficients. Contrary to the belief of some, Van invented CR before his employment by the Red Sox and has published the formula several times on SoSH, at Baseball Think Factory, and (an earlier version) on Usenet.
- CR = HR + [WROB] * [ .038 + (.514 * H + .403 * 2B + 1.413 * 3B + .534 * HR + .627 * SB + .093 * (BB - IBB + HBP) + .064 * (AB – H – SO + SH + SF) ) / OUTS]
- Where WROB ("Waiting Runners on Base") = H – HR + BB + HB – CS – GDP + [ROE – OOB]
- And OUTS = AB – H + SF + SH + CS + GDP – [ROE – OOB]
- [ROE - OOB], Reached on Error less Outs on Base, can be counted by hand, or divined magically by R + LOB - (H + BB + HBP - CS - GDP)
- There is no term for SH in the linear expression because there is no significant correlation of SH/Out to baserunner scoring. The metric can be made more accurate by including SF in the linear expression, but it is omitted for philosophical reasons, since SF is a measure of situational hitting.
- The coefficients in the linear expression are not fixed for all time but rather reflect aspects of contemporary play, especially the aggressiveness of baserunning. While CR with the above coefficients is dramatically better than any other metric for estimating team scoring in the 1990s, it was less accurate than EqA in 2006, suggesting that the coefficients need to be tweaked. Van hopes to derive and publish a new version of CR this year (possibly with a term for WP / PB / BK).
- While CR can be used on an individual batting line, Van's preferred method for hitters is to take the batting line of an average team (for that season and league) and substitute the rate stats of the player being evaluated, scaled to equal 1/9 of the total team PA. The resulting difference from average is then multiplied by 9 in order to make the result scale with stats which assume a lineup of 9 of the hitter in question. To evaluate the impact of a player on a specific team, the opposite is done: take the actual team totals and substitute an average hitting line (or a replacement-level hitting line) for the actual PA of the hitter in question. Note that in both cases the new team totals must be scaled to the same number of outs in order to determine the revised run total.
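- The CR formula, together with its WROB, OUTS and [ROE - OOB] terms, can be sketched in Python as below. This is an illustration under stated assumptions, not Van's own implementation: the function and parameter names are hypothetical, "HB" and "HBP" are treated as the same thing, and the team line is invented.

```python
def roe_minus_oob(r, lob, h, bb, hbp, cs, gdp):
    """Estimate [ROE - OOB] as R + LOB - (H + BB + HBP - CS - GDP)."""
    return r + lob - (h + bb + hbp - cs - gdp)


def contextual_runs(ab, h, doubles, triples, hr, bb, ibb, hbp, so,
                    sb, cs, gdp, sh, sf, roe_oob=0):
    """Contextual Runs using the coefficients quoted in this entry."""
    # Assumption: "HB" in the WROB definition means hit by pitch (HBP).
    wrob = h - hr + bb + hbp - cs - gdp + roe_oob
    outs = ab - h + sf + sh + cs + gdp - roe_oob
    linear = (0.514 * h + 0.403 * doubles + 1.413 * triples + 0.534 * hr
              + 0.627 * sb + 0.093 * (bb - ibb + hbp)
              + 0.064 * (ab - h - so + sh + sf))
    return hr + wrob * (0.038 + linear / outs)


# Hypothetical team season line, invented for illustration only.
team = dict(ab=5500, h=1500, doubles=300, triples=30, hr=180, bb=550,
            ibb=40, hbp=50, so=1000, sb=100, cs=40, gdp=130, sh=40, sf=50)
adjustment = roe_minus_oob(r=800, lob=1150, h=1500, bb=550, hbp=50,
                           cs=40, gdp=130)
team_cr = contextual_runs(**team, roe_oob=adjustment)
print(round(team_cr, 1))
```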
- CV (Contact Value)
- Created by Jim Bennett (Vermonter At Large), CV is a vast improvement on BABIP in evaluating the quality of the contact. CV incorporates the linear weighted value of the contact events and therefore expresses contact quality three-dimensionally in terms of runs. The formula for CV is:
- CV = (.48*(1B) + .77*(2B) + 1.07*(3B) + 1.48*(HR) - .333*(.314*(GO) + .28*(AO))) / (AB - SO)
- EP (Effective Power)
- Created by Eric Van in conjunction with OBE. Together they make up OPE. Eric says EP is SLG modified as follows: "I included SF in the numerator, and changed the denominator to PA instead of AB. Both of these changes improved its accuracy for predicting scoring. Well, it turns out that 2.12 * OBE + this slugging stat gives a terrific estimation of runs scored. So, the last tweak I make to the slugging stat is to divide it by 2, since it's half as important as OBE." In addition, "I make no adjustment for SB and CS in my formulas for On Base Efficiency, Effective Power, etc. I've not found a fair way to do so." That makes his formula:
- EP = (Hits + 2B + (3B x 2) + (HR x 3) + SF) / (2 x PA)
- EqA (Equivalent Average)
- Created by Clay Davenport, EqA is a key component of Clay's "Davenport Translations", or DTs, for hitters. DTs are a variation on Major League Equivalent (MLE) and were originally published by Baseball Prospectus. EqA is used to compare minor league statistics at different levels, and to project player performance at higher levels including the majors. The data is normalized to a number that can be judged much the same as batting average, where .265 is considered to be the average major league reference point. A breakdown of EqA is available here.
- EqR/27 (Equivalent Runs per 27 Outs)
- An adjusted metric used by Nate Silver in his PECOTA calculations to calibrate a hitter's batting and baserunning outcomes (2B, HR, SB, etc.) with his overall offensive value. The correlation between EqR/27 and EqA is extremely high.
- ERP (Estimated Runs Produced)
- Created by Paul Johnson, this is his alternative to Runs Created. Johnson's original work along with Bill James' commentary was published in the 1985 Baseball Abstract and is archived here. Additional comments are on Stephen Tomlinson's site.
- ERP provides a linear measure, meaning that if you add up all the ERP's of each hitter on a team, it will equal the ERP from the team's batting stats. Runs Created, by James' own admission, is non-linear because it overestimates run production from players with high OBP and/or SLG.
- GPA (Gross Production Average)
- Created by Aaron Gleeman, who terms it a variation of OPS, but more accurate and easier to interpret. The formula, which is usually adjusted for park factor, is:
- GPA = [(OBP*1.8) + SLG] / 4
- The interpretive scale is similar to BA: .200 is lousy, .265 is around average and .300 is a good hitter.
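- A quick sketch of the GPA formula in Python, without the park adjustment. The function name and the sample OBP and SLG are hypothetical.

```python
def gpa(obp, slg):
    """Gross Production Average: (1.8 * OBP + SLG) / 4, no park adjustment."""
    return (1.8 * obp + slg) / 4


# Hypothetical hitter with a .350 OBP and a .450 SLG.
example_gpa = gpa(obp=0.350, slg=0.450)
print(round(example_gpa, 3))
```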
- IsoP (Isolated Power), a.k.a. IsoSLG
- Isolated power is simply a player's SLG (Slugging Percentage) minus his batting average. Baseball Prospectus uses a slightly different formula that weights doubles and triples the same because triples are considered a speed stat rather than a power stat.
- IsoD (Isolated Discipline), a.k.a. IsoOBP
- Isolated Discipline is a player's OBP (On Base Percentage) minus his batting average.
- MLV (Marginal Lineup Value)
- Created by David Tate and modified by Keith Woolner. Like ERP, MLV seeks to improve on Bill James' Runs Created formula by leveling the productivity shown by players with high SLG and/or OBP. Woolner's presentation of the statistic is here.
- OBP (On Base Percentage)
- A measure of how often a batter gets to first base (or beyond) for any reason other than a fielding error or a fielder's choice. The official formula used by MLB is:
- OBP = (H + BB + HBP) / (AB + BB + HBP + SF)
- OBE (On-Base Effectiveness)
- Created by Eric Van in conjunction with EP. It's a modified version of OBP with GIDP subtracted from the numerator. Eric says: "I make no adjustment for SB and CS in my formulas for On Base Efficiency, Effective Power, etc. I've not found a fair way to do so." Therefore, his formula is:
- OBE = (H + BB + HBP - GIDP) / (AB + BB + HBP + SF)
- OPE (On-Base plus Power Effectiveness)
- Eric Van created this, calling it a more accurate metric than OPS. "There is a very big overlap between OBP and SLG. They measure a lot of the same stuff." So he first created OBE and EP, which replace OBP and SLG and are defined earlier on this page:
- OPE = OBE + EP
- He also uses OBE as a run estimation tool. Says Eric: "OPE correlates to RC/27 much better than OPS does. The average OBE, in a typical year, is about .315. The average EP is about .070."
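- The OBE, EP and OPE definitions on this page combine as in the Python sketch below. The function names and the stat line are hypothetical, and because the page does not spell out PA, the sample value of 570 assumes PA = AB + BB + HBP + SF.

```python
def obe(h, bb, hbp, gidp, ab, sf):
    """On-Base Effectiveness: OBP with GIDP subtracted from the numerator."""
    return (h + bb + hbp - gidp) / (ab + bb + hbp + sf)


def ep(h, doubles, triples, hr, sf, pa):
    """Effective Power: (total bases + SF) / (2 * PA)."""
    return (h + doubles + 2 * triples + 3 * hr + sf) / (2 * pa)


# Hypothetical season line; pa=570 is an assumed AB + BB + HBP + SF.
obe_value = obe(h=150, bb=60, hbp=5, gidp=10, ab=500, sf=5)
ep_value = ep(h=150, doubles=30, triples=3, hr=20, sf=5, pa=570)
ope_value = obe_value + ep_value
print(round(obe_value, 3), round(ep_value, 3), round(ope_value, 3))
```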
- OPS (On-base plus slugging)
- Batting statistic that adds OBP (On Base Percentage) and SLG (slugging percentage).
- OPS+ (Adjusted On-base plus slugging)
- OPS normalized to league and park the player played in. 100 is average.
- OPS Wins
- Created by Tom Tango and used at Fangraphs.com. OPS Wins yields a good approximation of WPA (wins above or below average) from OBP and SA. OPS Wins = .025 * (1.7 * OBP + SA - 1) * PA. For a theoretical full 162 games, OPS Wins = .27 * (expression)
- By multiplying by 10.7, OPS Wins becomes a very quick estimator of run impact (an even quicker estimator is Eric Van's 4 OPS points = 1 run). Per 162 games, this is 186 * (expression).
- POW (Power Percentage)
- Created by Julien Headley. POW = (HR + 2B + 3B) / (AB + SF - K)
- PROa (Adjusted Production)
- From Total Baseball by Pete Palmer & John Thorn. A park- and league-adjusted version of OPS to compare players from different eras. Defined as:
- PROa = (OBP / LgOBP) + (SLG / LgSLG) - 1
- where OBP and SLG have been adjusted for the player's home park, and LgOBP and LgSLG are the league average OBP and SLG, respectively. With the result multiplied by 100, a PROa (PRO+) of 100 is league average.
- PSN (Power-Speed Number)
- Created by Bill James, it's the same formula used by Sean Forman at Baseball-Reference.com.
- PSN = (HR * SB * 2) / (HR + SB)
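- The PSN formula is the harmonic mean of HR and SB, so a player needs both to score well. A minimal Python sketch follows; the function name is hypothetical, and returning 0 when HR + SB is zero is an assumption made here to avoid dividing by zero.

```python
def power_speed_number(hr, sb):
    """Power-Speed Number: (HR * SB * 2) / (HR + SB)."""
    if hr + sb == 0:
        return 0.0  # assumption: define PSN as 0 when both totals are 0
    return (hr * sb * 2) / (hr + sb)


# Two hypothetical players with the same HR + SB total of 50.
balanced = power_speed_number(hr=25, sb=25)
lopsided = power_speed_number(hr=40, sb=10)
print(balanced, lopsided)
```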
- RC (Runs Created)
- Created by Bill James, there are now numerous variations. James' most basic definition is:
- RCb = Basic Runs Created = OBP * SLG * AB, or RCb = OBP * TB
- His more advanced formula is:
- RC = A * B /C
- where
- A = H + BB + HB - CS – GDP
- B = TB + .52 * (SB + SH + SF) + .26 * (BB + HB - IBB)
- C = AB + BB + HB + SH + SF
- A quick and dirty alternate formula is:
- RC = [(H + BB) * TB] / (AB + BB)
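- The advanced and quick-and-dirty RC formulas can be compared with the Python sketch below. The function names and the stat line are hypothetical, and "HB" is assumed to mean hit by pitch.

```python
def rc_advanced(h, bb, hb, cs, gdp, tb, sb, sh, sf, ibb, ab):
    """Runs Created as A * B / C, using the A, B and C terms listed above."""
    a = h + bb + hb - cs - gdp
    b = tb + 0.52 * (sb + sh + sf) + 0.26 * (bb + hb - ibb)
    c = ab + bb + hb + sh + sf
    return a * b / c


def rc_quick(h, bb, tb, ab):
    """Quick and dirty Runs Created: [(H + BB) * TB] / (AB + BB)."""
    return (h + bb) * tb / (ab + bb)


# Hypothetical season line, invented for illustration only.
advanced = rc_advanced(h=150, bb=60, hb=5, cs=4, gdp=10, tb=246,
                       sb=10, sh=2, sf=5, ibb=5, ab=500)
quick = rc_quick(h=150, bb=60, tb=246, ab=500)
print(round(advanced, 1), round(quick, 1))
```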
- RCAA (Runs Created Above Average)
- Created by Lee Sinins, author of the Sabermetric Baseball Encyclopedia. Lee calculates each player’s Runs Created, and then compares it to the league average, given that player’s number of plate appearances. Lee uses a different version of RC than James, though the two are very similar.
- RPG (Normalized Runs Per Game)
- Created by MIT mathematician Jeff Sagarin of USA Today. Jeff uses Markov Chain analysis to project how many runs per game would be scored by an entire lineup of clones of a certain player. He also uses Markov's method to evaluate pitchers by NPERA. Sagarin posts his 2006 data at USA Today.com for AL and NL hitters.
- SecA (Secondary Average)
- A measure of extra bases gained by a player, independent of Batting Average. It's designed to track contributions of power, speed and plate patience/eye:
- SecA = (TB - H + BB + SB - CS) / AB
- SBR (Stolen Base Runs)
- From Total Baseball by Pete Palmer & John Thorn. The number of runs produced by base stealing.
- SBR = (0.3 * SB) – (0.6 * CS)
- or
- 0.3 * [SB – (2 * CS)]
- SLG (Slugging Percentage)
- A player's TB (Total Bases) divided by the number of his at-bats.
- TA (Total Average)
- Created by Tom Boswell, a Washington Post sportswriter. Essentially it's bases achieved divided by outs made. One negative is that it gives a stolen base the same weight as a base gained by a hit or walk, which Eric Van will tell you simply ain't the case. Another is that it accounts for advancing baserunners through SF, but not by SH or in any other scenario.
- TA = (TB + BB + HBP + SF + SB) / (AB - H + SH + SF + CS + GDP)
- TB (Total Bases)
- TB = 1B + 2x2B + 3x3B + 4xHR
- However, since singles aren't typically broken out in most stats listings, Total Bases can be also calculated using Hits:
- TB = Hits + 2B + (3B x 2) + (HR x 3)
- TTO (Three True Outcomes)
- Strikeouts, walks, and HRs.
- wOBA (Weighted On Base Average)
- wOBA = (.72xNIBB + .75xHBP + .90x1B + .92xRBOE + 1.24x2B + 1.56x3B + 1.95xHR) / PA
- From The Book, wOBA is an attempt to take the best of OBP and SLG and combine them into one number. The formula begins with the event values relative to making an out, then adds .15 to those values. The idea is to come up with a number that "looks" like OBP.
- XR (Extrapolated Runs)
- Created by Jim Furtado of Baseball Think Factory. Jim based this linear regression stat on Paul Johnson's ERP (Estimated Runs Produced), and his full derivation is here. Jim's formula is:
- XR = .5(1B) + .72(2B) + 1.04(3B) + 1.44(HR) + .34(BB+HBP-IBB) + .25(IBB) + .18(SB) - .32(CS) - .09(AB-H-SO) - .098(SO) - .37(GIDP) + .37(SF) + .04(SH)
- Jim also has formulas for Extrapolated Runs Reduced (XRR) and Extrapolated Runs Basic (XRB), the latter of which omits HBP:
- XRR = .50(1B) + .72(2B) + 1.04(3B) + 1.44(HR) + .33(HBP + BB) + .18(SB) - .32(CS) - .098(AB - H)
- XRB = .50(1B) + .72(2B) + 1.04(3B) + 1.44(HR) + .34(BB) + .18(SB) - .32(CS) - .098(AB - H)
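- The full XR formula is a straight weighted sum, as in the Python sketch below. The function name and the stat line are hypothetical; singles are derived as H - 2B - 3B - HR because they are rarely listed separately.

```python
def extrapolated_runs(ab, h, doubles, triples, hr, bb, hbp, ibb,
                      sb, cs, so, gidp, sf, sh):
    """Full Extrapolated Runs (XR) with the weights quoted above."""
    singles = h - doubles - triples - hr
    return (0.50 * singles + 0.72 * doubles + 1.04 * triples + 1.44 * hr
            + 0.34 * (bb + hbp - ibb) + 0.25 * ibb
            + 0.18 * sb - 0.32 * cs
            - 0.09 * (ab - h - so) - 0.098 * so
            - 0.37 * gidp + 0.37 * sf + 0.04 * sh)


# Hypothetical season line, invented for illustration only.
example_xr = extrapolated_runs(ab=500, h=150, doubles=30, triples=3, hr=20,
                               bb=60, hbp=5, ibb=5, sb=10, cs=4, so=100,
                               gidp=10, sf=5, sh=2)
print(round(example_xr, 2))
```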
Pitching
- APW (Adjusted Pitcher Wins)
- From Total Baseball by Pete Palmer & John Thorn. A method for calculating a pitcher's value in wins. Baseball Prospectus uses a similar metric based on runs instead of earned runs. It is typically used to evaluate relievers based on the number of wins they produced, and is used by several Baseball Prospectus writers when they compare starters to relievers, since Support-Neutral Value-Added (SNVA) statistics are only tabulated for starters.
- APR (Adjusted Pitching Runs)
- From Total Baseball by Pete Palmer & John Thorn. The number of runs a pitcher prevents from scoring compared to a league average pitcher in a neutral park over the same number of innings. The quantitative counterpart to the ERA+ rate stat. The formula is:
- IP/9 * (LgERA – ERA)
- where ERA has been park adjusted.
- BAABIP (Batting Average Against on Balls in Play)
- Similar to the BABIP stat used for hitters, this measures how a pitcher does when opposing batters put balls in play. It excludes strikeouts and HRs that leave the park. It's designed to illustrate the effects of luck and team defense on a pitcher's performance. The most common formula for pitchers is:
- BAABIP = (H - HR) / (BFP - HR - BB - HBP - SO)
- One problem that arises for stat geeks is that accurate BFP data is not available for all eras. Using Voros McCracken's formula to estimate BFP, the equation for BAABIP (which he calls $H) becomes:
- BAABIP = (H - HR) / (2.9*IP - .966*SO + H - HR)
- Using Eric Van's simpler estimation for BFP, we get:
- BAABIP = (H - HR) / (3*IP - SO + H - HR)
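- When BFP is missing, the two estimated forms of BAABIP above can be compared side by side. The Python sketch below uses hypothetical function names and an invented pitching line, and assumes IP is given as a true decimal (200.333 rather than the box-score "200.1").

```python
def baabip_mccracken(h, hr, ip, so):
    """BAABIP using Voros McCracken's BFP estimate ($H)."""
    return (h - hr) / (2.9 * ip - 0.966 * so + h - hr)


def baabip_van(h, hr, ip, so):
    """BAABIP using Eric Van's simpler BFP estimate."""
    return (h - hr) / (3 * ip - so + h - hr)


# Hypothetical pitching line, invented for illustration only.
by_mccracken = baabip_mccracken(h=180, hr=20, ip=200, so=150)
by_van = baabip_van(h=180, hr=20, ip=200, so=150)
print(round(by_mccracken, 3), round(by_van, 3))
```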
- BBpct (Unintentional Walk Rate)
- Used by Eric Van in many of his posts, it's denoted as BB%. A pitcher’s unintentional walks divided by the number of batters faced (excluding sacrifice hits and intentional walks):
- BB% = (BB – IBB) / (BFP – IBB – SH)
- $BFP (Estimated Batters Faced by Pitcher)
- Created by Voros McCracken. He estimates BFP as follows:
- {[(IP*3) - SO] * .966} + SO + H + BB
- This comes in handy when analyzing or comparing pitchers from eras where no BFP data is available.
- In the past, Eric Van has used a simpler version where BFP = (IP*3) + H + BB
- BQR (Bequeathed Runs Prevented)
- Measures how many more or fewer of the bequeathed baserunners subsequent relievers allowed to score than would be expected from league average performance.
- CEE (Contextual ERA Extrapolation)
- Eric Van's name for the ERA generated by using his Contextual Runs formula on a pitcher's pitching line and then multiplying by a league average ER / R. SH and IBB are ignored. Van notes that CEE "adjusts ERA for what I call 'situational karma'. That is, it's the ERA you'd expect if everything the pitcher had given up had been distributed randomly or normally across all his outings, rather than clustered or scattered as it actually was."
- CEE* (League-Adjusted Contextual ERA Extrapolation)
- Created by Eric Van. A pitcher’s CEE recalculated using the league-average H+/BIP.
- CEEplus (Adjusted Contextual ERA Extrapolation)
- Created by Eric Van, denoted in his posts as CEE+. A pitcher’s CEE recalculated using his own career rate of H+/BIP, relative to his teammates. The assumption is that any career difference in BABIP from that of a pitcher's teammates represents the best estimation of his true BABIP skill. The shorter the career, the more this number needs to be regressed to CEE*, so choosing between CEE* and CEE+ is always somewhat of an art form (although one could probably devise a formula for weighting the two based on career BFP).
- dERA (Defense-Independent Earned Run Average)
- Created by Voros McCracken, dERA is an example of defense-independent pitching statistics (DIPS). Voros' theory is that pitchers do not have an inherent ability to control hits resulting from balls batted in play. Further, a pitcher's performance can best be measured by outcomes not controlled by fielders: strikeouts, walks, hit batsmen and home runs hit out of the park. His initial formula (version 1.0) is:
- {(IP*2.4) + (H*.83) + (HR*11.05) + (BB*2.81) - (SO*1.59)} / {(IP*0.71) + (H*.244) + (SO*.097) - (HR*.244)}
- (Voros' extensive rundown of the formula has been archived with permission by Jay Jaffe at Futility Infielder.)
- McCracken later amended his formula (Version 2.0) to account for the ability of knuckleballers and other trick pitchers to influence balls batted in play. His initial essay on the topic is here.
- This stat is alternately referred to as DIPS ERA.
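- The version 1.0 formula quoted above is a ratio of two weighted sums, which the Python sketch below computes directly. The function name and pitching line are hypothetical, and IP is assumed to be a true decimal.

```python
def dera_v1(ip, h, hr, bb, so):
    """dERA version 1.0 as quoted on this page: numerator / denominator."""
    numerator = ip * 2.4 + h * 0.83 + hr * 11.05 + bb * 2.81 - so * 1.59
    denominator = ip * 0.71 + h * 0.244 + so * 0.097 - hr * 0.244
    return numerator / denominator


# Hypothetical pitching line, invented for illustration only.
example_dera = dera_v1(ip=200, h=180, hr=20, bb=60, so=150)
print(round(example_dera, 2))
```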
- EIP (Equivalent Innings Pitched)
- An Eric Van joint. A pitcher’s BFP divided by the league-average rate of BFP per IP. "It's your sample-size statement. I've used it instead of IP because the sample is how many guys you face, not how many you retire. But I've translated it into IP at a league-average rate because IP is instantly meaningful to folks and BFP takes some thought."
- EqERA (Equivalent Earned Run Average)
- An adjusted Earned Run Average which is calibrated to an ideal major league where the overall leaguewide EqERA = 4.50. This stat is the basis for Nate Silver's four primary rate-based PECOTA projections for both major and minor league pitchers. EqERA is adjusted for park effects and the quality of a pitcher's defense.
- ERA (Earned Run Average)
- The number of earned runs yielded by a pitcher normalized over a 9-inning game.
- ERA = ER * 9 / IP
- ERA+ (Adjusted Earned Run Average)
- ERA normalized to league and park the player played in. 100 is average.
- FIP (Fielding Independent Pitching)
- Created by Tom Tango (aka TangoTiger), this is a defense-independent pitching statistic (DIPS).
- FIP = (13*HR + 3*BB - 2*K) / IP. To make this number look more like an ERA, add 3.20 to the result.
- Tom also calculates a pitcher's gross total of Fielding Independent Runs as:
- FIR = (league FIP - player FIP) x IP / 9
- TangoTiger's baseball research page is here
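- A short Python sketch of FIP (with the 3.20 constant added) and FIR as defined above. The function names, the pitching line and the league FIP of 4.30 are all hypothetical.

```python
def fip(hr, bb, k, ip, constant=3.20):
    """FIP scaled to look like an ERA: (13*HR + 3*BB - 2*K) / IP + 3.20."""
    return (13 * hr + 3 * bb - 2 * k) / ip + constant


def fir(league_fip, player_fip, ip):
    """Fielding Independent Runs: (league FIP - player FIP) * IP / 9."""
    return (league_fip - player_fip) * ip / 9


# Hypothetical pitcher and an assumed league FIP of 4.30.
player_fip = fip(hr=20, bb=60, k=150, ip=200)
player_fir = fir(league_fip=4.30, player_fip=player_fip, ip=200)
print(round(player_fip, 2), round(player_fir, 1))
```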
- GSC (Game Score)
- Start with 50 points. Add 1 point for each out recorded (3 points per inning). Add 2 points for each inning completed after the 4th. Add 1 point for each strikeout. Subtract 2 points for each hit allowed. Subtract 4 points for each earned run allowed. Subtract 2 points for each unearned run allowed. Subtract 1 point for each walk.
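- The Game Score steps above can be sketched in Python as follows. The function name and the sample start are hypothetical; the pitcher's work is passed in as outs recorded so that partial innings are handled cleanly.

```python
def game_score(outs, so, h, er, uer, bb):
    """Game Score, following the steps listed in this entry."""
    # Innings completed after the 4th: full innings beyond four, never negative.
    innings_after_fourth = max(0, outs // 3 - 4)
    return (50 + outs + 2 * innings_after_fourth + so
            - 2 * h - 4 * er - 2 * uer - bb)


# Hypothetical complete game: 27 outs, 10 K, 3 H, 1 ER, 0 unearned runs, 2 BB.
example_score = game_score(outs=27, so=10, h=3, er=1, uer=0, bb=2)
print(example_score)
```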
- $H
- Voros McCracken's abbreviation for a pitcher's BAABIP. Voros believes that $H tells you a lot less about a pitcher than his HR, BB, or SO rates. In 2004, Mitchel Lichtman published a study of $H that's still posted over at Baseball Think Factory.
- Eric Van did a number of studies on it during his pre-SoSH days at the old alt.rec.sports usenet board. The threads include responses from Voros and others:
- Hplus/BIP
- Created by Eric Van and denoted by H+/BIP, he uses this rate stat when calculating CEE+ and CEE*. The formula is:
- (H + ROE – OOB) / BIP
- where ROE = Reached on Error, OOB = Outs on Base, and BIP = Balls in Play.
- HRpct (Home Run Rate)
- A rate stat denoted by HR% and used by Eric Van in many of his SaberPosts. The number of home runs given up by a pitcher divided by the number of batters faced excluding sacrifice hits and intentional walks:
- HR% = HR / (BFP – IBB – SH)
- Kpct (Strikeout Rate)
- A rate stat denoted by K% and used by Eric Van in many of his SaberPosts. A pitcher’s strikeouts divided by the number of batters faced excluding sacrifice hits and intentional walks:
- K% = K / (BFP – IBB – SH)
- KWH = .75 * (K*K) / (H*BB)
- Pitchers with a KWH ratio above 1.00 will generally be more successful than those below the 1.00 mark. Many of the game's best pitchers have a KWH approaching (or above) 2.00.
- Mates
- Created by Eric Van, this establishes a comparative team baseline for H+/BIP by calculating it for a pitcher’s teammates. According to Eric, "it is almost entirely a measure of team defense, ballpark factors, etc. rather than measure of his pitching teammates' collective ability (which will tend greatly to be average for any decently sized number of pitchers)".
- NPERA (Normalized Predicted ERA)
- Created by MIT mathematician Jeff Sagarin of USA Today. Jeff uses Markov Chain analysis method to project what a pitcher's ERA would be if he pitched in a hypothetical composite major league of all players from 1946 through 1999. In such a scenario, an average pitcher would have an ERA of 3.87. He also uses Markov's method to evaluate hitters through RPG. Sagarin posts his 2006 data at USA Today.com for AL and NL pitchers.
- PAP (Pitcher Abuse Points)
- Created by Rany Jazayerli at Baseball Prospectus. Rany uses cumulative pitch counts and also groups single-game counts into five "stress categories": less than 100, 101-109, 110-121, 122-132, and more than 133 pitches. He then weights each group to come up with an overall stress level, which is separate from PAP.
- Abuse Points are assigned based on single-game pitch counts. No points are assigned for a start where a pitcher throws 100 pitches or less. For all other starts, abuse points for each game are determined by cubing the number of pitches in excess of 100:
- PAP = (PC - 100)^3
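- The cubing rule means abuse points climb very quickly past 100 pitches. A minimal Python sketch, with a hypothetical function name and invented pitch counts:

```python
def pitcher_abuse_points(pitch_count):
    """PAP for one start: (PC - 100) cubed, or 0 at 100 pitches or fewer."""
    excess = max(0, pitch_count - 100)
    return excess ** 3


# Hypothetical pitch counts for four starts.
starts = [95, 100, 110, 125]
per_start = [pitcher_abuse_points(pc) for pc in starts]
total_pap = sum(per_start)
print(per_start, total_pap)
```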
- PERA (Peripheral ERA)
- A pitcher's expected Earned Run Average based on his park-adjusted hits (EqH9), walks (EqBB9), strikeouts (EqK9) and home runs allowed (EqHR9). PERA is considered to be a better predictor than traditional ERA, since it is less prone to random effects.
- PRAA (Pitcher Only Runs Above Average)
- The individual pitching + defense total is compared to a league average pitcher + team average defense, and the difference is win-adjusted.
- PRAR (Pitcher Only Runs Above Replacement)
- Similar to PRAA, except that the comparison is made to a replacement level player instead of average.
- RAA (Runs Above Average)
- It's simple: the number of runs a pitcher is better or worse than average. It is park and league adjusted.
- ReSu (Relief Success)
- Created by Eric Van. His formula is (W + Sv + .85 * Hold) / ( W + L + Sv + BS + .85 * Hold). The coefficient of .85 for holds creates a correlation of ReSu to ERA that is not biased in favor of closers or set-up men.
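- As a sketch of the ReSu formula, the Python function below weights holds at .85 in both numerator and denominator. The function name and the two relief lines are hypothetical.

```python
def relief_success(w, l, sv, bs, hold, hold_weight=0.85):
    """ReSu: (W + Sv + .85*Hold) / (W + L + Sv + BS + .85*Hold)."""
    good = w + sv + hold_weight * hold
    chances = w + l + sv + bs + hold_weight * hold
    return good / chances


# Hypothetical closer and hypothetical set-up man.
closer = relief_success(w=4, l=3, sv=30, bs=5, hold=0)
setup_man = relief_success(w=5, l=3, sv=2, bs=4, hold=20)
print(round(closer, 3), round(setup_man, 3))
```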
- ReSuC (Relief Success Correlated)
- Created by Eric Van, who admits he has little memory of devising it or idea of what it means. The formula is:
- .581 + .051 R/G - .067 ERAbp
- Where R/G is the team’s runs scored per game, and ERAbp is the team’s bullpen ERA.
- RSV (Runs Saved)
- Devised by URISoxFan (Jeff Kuhn). He takes the theoretical replacement level of the league (one run better than 50% worse than the league average). Using innings pitched, Jeff determines the expected runs allowed by a league replacement level pitcher in that environment (using league/era to figure the replacement level, and applying park factors to the expectation). Next, he subtracts the actual amount of runs allowed. The formula is:
- RSV = [PF * (IP * Rlgrplv / 9 )] - ER
- where Rlgrplv = League Replacement Level Runs.
- During the off-season, Jeff uses a more complex formula to adjust for defense, and is more exact in figuring out the adjustments (league and park).
- RSAA (Runs Saved Above Average)
- Created by Lee Sinins, author of the Sabermetric Baseball Encyclopedia. A measure of a pitcher’s effectiveness and contribution to a win. The formula is
- RSAA = [(RA/IP) - (league-average RA/IP)] x total IP
- where RA = Runs Allowed
- S/C (Scatter/Cluster)
- An Eric Van stat. A pitcher’s True ERA (TRA) divided by a version of his CEE that includes SH and IBB. Says Eric: "It used to be based on actual RA / CEE (Theoretical RA), so it included inherited runner support (good or bad) and defensive support, including the number and degree of damage of errors made behind him. It is now based on True ERA / Theoretical ERA (TRA / CEE), so it simply reflects whether the pitcher has been pitching out of jams ( < 1.00) or clustering his hits."
- Eric continues: "An S/C of 1.00 (or 0.98?) means the pitcher's situational karma has been neutral. Below 1.00, he's been scattering his hits and giving up fewer runs than his CEE would predict. Above 1.00, he's been prone to the big inning and thus has been giving up more."
- S/C = TRA / CEE
- SNL (Support-Neutral Losses)
- Created by Michael Wolverton. The expected number of losses a starting pitcher would have if he got average support from his offense and his bullpen. See SNVA.
- SNVA (Support-Neutral Value Added)
- Created by Michael Wolverton, this measures a starting pitcher's value independent of the support he received, either from his team's offense or from his team's relievers. It measures the total number of games that an average team would win given the pitcher's starts, over the number of games they'd win with a league average starter. By nature, the stat is park-adjusted.
- Related stats include Support-Neutral Wins (SNW) and Support-Neutral Losses (SNL), which measure the expected number of wins and losses a starting pitcher would have if he got average support from his offense and his bullpen. SNW, SNL and SNVA are calculated for each individual start, and then summed to get seasonal totals. The summation limits the impact of a single bad outing (for instance, giving up 12 R in 2 IP) on a pitcher's season or career value, since he can only cost his team a single game in a single start.
- Wolverton's original piece on the Support-Neutral statistics, which SABR published in 1993, is available here.
- SNW (Support-Neutral Wins)
- Created by Michael Wolverton. The expected number of wins a starting pitcher would have if he got average support from his offense and his bullpen. See SNVA.
- TRA (True Earned Run Average)
- Created by Eric Van. "It starts with ER and adjusts for inherited runners and for errors made behind the pitcher. You get credit for pitching out of jams caused by errors made behind you and get penalized for letting a minor error (e.g., boot with no on and two outs) blow up in your face by following it with a rain of hits. It can be negative if you keep on pitching out of jams (either as a reliever with inherited runners or as a victim of shoddy defense)."
- WHIP (Walks + Hits / Innings Pitched)
- Walks and Hits Per Inning Pitched, a variant of OBP for pitchers. This is a good stat when judging how efficient a pitcher is and a good substitute for ERA when judging relief pitchers.
- XNH (Expected No Hitters)
- How cool is this? Bill James created an estimation of the number of no hitters a pitcher would be expected to pitch in his career.
- XNH = GS * [(3 * IP) / ((3 * IP) + H)]^26
- Yup, you read that right. Raise everything inside the brackets to the 26th power, which signifies the probability of the pitcher getting 26 outs. Bill bases this on the assumption that most no-hitters involve a baserunning out or a double play. GS here is career games started. Bill has a write-up on the stat at Rob Neyer's site.
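- In code, the bracketed term is the share of outs among outs plus hits, raised to the 26th power and scaled by career starts. The Python sketch below uses a hypothetical function name and invented career totals, with IP as a true decimal.

```python
def expected_no_hitters(gs, ip, h):
    """XNH = GS * [(3*IP) / ((3*IP) + H)] ** 26."""
    outs = 3 * ip
    out_rate = outs / (outs + h)  # chance a given out-or-hit event is an out
    return gs * out_rate ** 26


# Hypothetical career: 400 starts, 2800 IP, 2400 hits allowed.
example_xnh = expected_no_hitters(gs=400, ip=2800, h=2400)
print(round(example_xnh, 2))
```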
Defense
- CERA (Catcher's earned-run average)
- Earned-run average of club's pitchers with a particular catcher behind the plate
- DWARP (Defensive Wins Above Replacement Player)
- It's WARP, but for defense only. It also accounts for the difficulty of the position played.
- FR (Fielding Runs)
- From Total Baseball by Pete Palmer & John Thorn. Palmer sought to create a fielding component as part of his Total Player measurement. Fielding Runs are a linear weights measure of runs saved beyond what a league-average player at that position might have saved, defined as zero. The stat is adjusted for each position, for which Palmer drew criticism in stats circles since his weighting was subjectively assumptive. For outfielders his model failed to account for the three positions separately, nor did it account for ballpark geometry. Mike Emeigh has a primer at Baseball Think Factory, and a thread at Baseball-Fever.com provides some different info.
- FRAA (Fielding Runs Above Average)
- A simple stat that measures a player's defense.
- FRAR (Fielding Runs Above Replacement Player)
- It's the same as FRAA except a replacement level fielder is used instead of an average fielder.
- Plus/Minus
- Created by John Dewan of Baseball Info Solutions. He sought to use improved accuracy in charting the locations of batted balls and determining a player's performance within his Relative Range. It was published in The Fielding Bible, and an excerpt is available here in Adobe Acrobat format. John discussed his system in an interview with Joe Hamrahi.
- In April 2006 the Boston Globe's Chris Snow wrote about the book and what Dewan concluded about the Sox' fielders. Here's how Chris explains the system:
- Baseball Info Solutions records each batted ball's specific direction, distance, speed (soft, medium, hard), and type (grounder, liner, fly, bunt). Direction and distance are computed by clicking a location on a baseball diamond on a computer. The computer then determines how often each type of ball hit to each location at each speed is converted into an out.
- If a ball is converted into an out only 25 percent of the time, the expectation that the play will be made is 0.25. If a player makes the play, he is scored a 1.00, minus 0.25 (the expectation that the play will be made), resulting in a score of plus-0.75. If he does not make the play, he is scored a 0, minus 0.25 (the expectation that the play will be made), resulting in a score of minus-0.25.
- By adding up all of the credit a player receives or loses for plays he makes or doesn't make, the result is a player's plus-minus.
- The plus-minus system as a comprehensive measure of a player's defensive ability does have flaws, or limitations. For example, it doesn't account for a first baseman's ability to handle throws, which J.T. Snow should do much better than Kevin Millar. It doesn't account for a hit-and-run play that forces a middle infielder to vacate his position. It does not account for an outfielder's arm (meaning it doesn't care for Manny Ramírez's league-leading outfield assist total). And it does not include how well a player handles bunts, which Mike Lowell has done better than anyone in baseball the past three years. And, the plus-minus system doesn't account at all for catchers.
- But it effectively does explain, in a rather mathematically sound manner, whether a guy can field his position or not.
- A graphic comparing Sox' fielders to the best and worst in the league by position is here.
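- The scoring rule Chris describes can be sketched in a few lines. This is only an illustration of the arithmetic in the quoted passage, not Baseball Info Solutions' actual software; the function name and the sample chances are made up for the example.

```python
def plus_minus(plays):
    """Sum plus/minus credit over a fielder's chances.

    plays: iterable of (out_probability, made_play) pairs, where
    out_probability is how often that kind of batted ball is converted
    into an out, and made_play is True if the fielder made the play.
    """
    total = 0.0
    for out_probability, made_play in plays:
        result = 1.0 if made_play else 0.0
        # Credit is the result minus the expectation that the play is made.
        total += result - out_probability
    return total


# Hypothetical chances: a tough play made, a tough play missed, an easy play made.
sample_chances = [(0.25, True), (0.25, False), (0.90, True)]
print(round(plus_minus(sample_chances), 2))
```

- A made play on a ball that is converted 25 percent of the time scores plus-0.75 and a miss scores minus-0.25, matching the example in the quote.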
- PMR (Probabilistic Model of Range)
- Created by David Pinto at Baseball Musings. Culled from interviews by David Laurila and Joe Hamrahi, here's how Pinto explains it:
- [It measures] range by looking at the ability to turn a ball into an out compared to the probability of turning a ball into an out. Ground balls hit right at the normal shortstop position should be easy to field. Ground balls hit into the hole should be difficult to field. If you get a lot of the tough balls, you'll have a good number; if you miss a lot of easy balls, you'll have a poor number.
- Range isn't all about the ability to move. It's also about positioning, arm strength and throwing accuracy. All of those contribute to being able to turn a batted ball into an out. Eventually, we'll add run expectancies to these, and measure the number in terms of runs allowed above or below average.
- PMR attempts to measure range based on the ease or difficulty in fielding a specific ball in play. Easy plays made shouldn’t count much toward determining a fielder’s range. Difficult plays made should. What the system does is use examples of balls in play to determine what plays are difficult and what plays are easy.
- PMR uses six factors; the batted ball type (fly, ground, liner, etc), how hard the ball was hit, the direction the ball was hit, the handedness of the batter, the handedness of the pitcher, and the park.
- RAA/F (Runs Above Average for Fielders)
- Runs above average at a certain position.
- RF (Range Factor)
- The average number of defensive plays successfully made per game by a given fielder. The formula is:
- RF = (PO + A) / G
- RR (Relative Range)
- Created by Bill James. His latest development forms the basis for John Dewan's Plus/Minus fielding system. Both are explained in their book The Fielding Bible. An excerpt is available here in Adobe Acrobat format.
- UZR (Ultimate Zone Rating)
- Created by Mitchel Lichtman, UZR is designed to measure and quantify only that skill which enables a fielder to turn batted balls into outs. The baseline shows the number of outs an average fielder would have had if he'd received the same number of balls in play for each sub-zone that our specific fielder received. UZR means essentially the same thing as a simple ZR - namely the number of balls fielded (turned into at least one out) divided by the number of chances; however, UZR rate is a weighted average of a player's ZR in each of several zones. ZR does not address an outfielder's arm or an infielder's skill at turning the double play. Any player with an average defensive performance will, by definition, have exactly zero UZR runs.
- Mitchel's UZR data has been proprietary since 2004, when he went to work for the St. Louis Cardinals. Data from 2000-2003 is still available at Baseball Think Factory, where Lichtman also has a two-part explanation of the stat (Part 1, Part 2).
- ZR (Zone rating)
- Created by John Dewan. The percentage of balls fielded by a player in his typical defensive "zone," as measured by STATS, Inc.
Value
- Elias Player Rankings
- Tabulated for MLB by the Elias Sports Bureau. A set of player ratings used by the league to gauge free agent compensation for teams losing players. Compensation is in the form of amateur draft picks in June following the off-season. Elias Player Rankings as of 10/31/06 are available here.
As of the 2006 CBA, draft pick compensation has been changed significantly, including the elimination of "Type C" free agents. Details on the changes to the system can be found here. The current compensation system is as follows:
- Type A - Players whose rankings place them in the top 20% at their position. A team losing a Type A free agent gets a sandwich pick between the 1st and 2nd rounds. Any team signing a Type A player forfeits their 1st round pick to the team that lost the player unless:
- * The signing team had one of the 15 worst records in MLB the previous season, or
- * The team losing the player signs another free agent rated higher than the player lost.
- Type B - Players whose rankings place them among the next 20% at their position (between 20% and 40%). A team losing a Type B free agent receives a sandwich pick between the 1st and 2nd rounds.
- MORP (Marginal Value Above Replacement Player)
- MORP is modeled on the actual behavior of recent free agent markets, and accounts for non-linearity in the market price of baseball talent.
- VORP (Value over Replacement Player)
- Created by Baseball Prospectus' Keith Woolner. The number of runs contributed beyond what a replacement-level player at the same position would contribute. A replacement-level player is defined as a player of the caliber which is freely available to the average major league team, either through call-ups from the minor leagues, or through waiver claims from other organizations. Sortable rankings are available here.
- WARP-1 (Wins Above Replacement Player)
- The number of wins a player contributed above what a replacement-level hitter, fielder, and pitcher would have done, adjusted only for conditions within that season.
- WARP-2 (Wins Above Replacement Player, Level 2)
- Wins Above Replacement Player, with difficulty added into the mix.
- WARP-3 (Wins Above Replacement Player, Level 3)
- WARP-2, expanded to 162 games to compensate for shortened seasons.
- Win Probability
- A measure of a team's chance of winning at any point in a game. For each half-inning, there are 24 states which cover all possible out/runner situations. Tables have been compiled which provide the win probability for the 24 states in each half-inning over a range of run differentials. These tables are produced either using statistical methods and the run scoring environment, or from historical data.
- WPA (Win Probability Added)
- The difference in Win Probability resulting from a particular play during a game. Although 2006 has seen an explosion of web sites devoted to tracking teams' games and individual players using WPA (such as FanGraphs), the concept behind WPA is actually quite old, having been introduced by the Mills brothers in their 1970 book entitled Player Win Averages. Dave Studeman has written a nice summary of WPA at The Hardball Times, which includes its history, use, and related measures.
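- As a rough sketch of the bookkeeping, WPA for a play is just the team's win probability after the play minus its win probability before it, and a player's total is the sum over his plays. The win probabilities below are hypothetical placeholders, not values from any published table, and the function names are invented for the example.

```python
def win_probability_added(wp_before, wp_after):
    """WPA for one play: the change in the team's win probability."""
    return wp_after - wp_before


def wpa_by_player(events):
    """Total WPA per player.

    events: iterable of (player, wp_before, wp_after), where both
    probabilities are for that player's team.
    """
    totals = {}
    for player, wp_before, wp_after in events:
        change = win_probability_added(wp_before, wp_after)
        totals[player] = totals.get(player, 0.0) + change
    return totals


# Hypothetical plays from a single game.
game_events = [
    ("Hitter A", 0.50, 0.62),
    ("Hitter B", 0.62, 0.55),
    ("Hitter A", 0.55, 0.70),
]
for player, total in wpa_by_player(game_events).items():
    print(player, round(total, 2))
```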
- Win Shares
- Created by Bill James. A measure combining a player’s contributions in four facets (batting, baserunning, defense, pitching) toward his team’s wins. More info on Win Shares is available here, and a summary of James’ original methodology is posted by Dave Studeman at BaseballGraphs.com. Dave also does Win Shares work for Hardball Times, where he's tweaked James’ original formula a bit.
Team
- AEqR (Adjusted Equivalent Runs)
- The number of equivalent runs scored by a team, adjusted for the quality of its opponents' pitching and defense.
- AEqRA (Adjusted Equivalent Runs Allowed)
- The number of equivalent runs allowed by a team, adjusted for the quality of its opponents' offense.
- Log5 (Log5 Expected Winning Percentage)
- Created by Bill James. Team A can be expected to have the following winning % vs Team B:
- Log5 = (A - (A * B)) / (A + B - (2 * A * B))
- where A is Team A's winning percentage and B is Team B's winning percentage.
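- A minimal sketch of the Log5 calculation, for anyone who wants to plug in numbers. The function name and the error handling for the degenerate cases are choices made for this example.

```python
def log5(a, b):
    """Expected winning percentage of Team A against Team B.

    a, b: each team's overall winning percentage, between 0 and 1.
    """
    denominator = a + b - 2 * a * b
    if denominator == 0:
        # Happens only when both teams are .000 or both are 1.000.
        raise ValueError("Log5 is undefined for these winning percentages")
    return (a - a * b) / denominator


# A .600 team against a .400 team
print(round(log5(0.600, 0.400), 3))
```

- A handy sanity check: against a .500 opponent, the formula returns Team A's own winning percentage, and two evenly matched teams come out at .500.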
- Pythagorean Record
- Created by Bill James, with a nod to The Greek God of Numbers. A formula for converting a team’s run differential into a projected Won/Loss record. The formula is:
- RS^2 / (RS^2 + RA^2)
- Where RS = Runs Scored and RA = Runs Allowed. Teams’ actual won/loss records are usually similar to their Pythagorean Record.
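- Here is a short sketch of the Pythagorean calculation, with runs scored and runs allowed each squared. The run totals are hypothetical, the 162-game default is an assumption for a full modern season, and the exponent is left as a parameter only for convenience.

```python
def pythagorean_pct(runs_scored, runs_allowed, exponent=2.0):
    """Projected winning percentage from runs scored and runs allowed."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def pythagorean_record(runs_scored, runs_allowed, games=162):
    """Projected (wins, losses) over the given number of games."""
    wins = round(pythagorean_pct(runs_scored, runs_allowed) * games)
    return wins, games - wins


# Hypothetical team: 800 runs scored, 700 runs allowed
print(round(pythagorean_pct(800, 700), 3))
print(pythagorean_record(800, 700))
```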
Miscellaneous
- ANOVA (Analysis of Variance)
- A collection of statistical models and their associated procedures which compare means by splitting the overall observed variance into different parts. A more thorough explanation is here.
- Arvin Hsu, a grad student at Cal-Irvine, used ANOVA in a baseball context in 2000 while working to refine Voros McCracken's $H data. The original thread was on the alt.rec.sports usenet, where Eric Van posted before he was sucked in by the SoSH gravity field. Anyway, Voros asked Arvin to explain ANOVA, and this was his somewhat technical reply:
- The basic idea is to have 2 groups of data and see if their mean differs significantly. e.g. 2.68+-.20 may be significantly different from 2.80 +-.15 even though both their error bars will overlap. The significance will depend on the data points available. For the second factor, you can imagine that 2.68+-.30 ERA for lefties in the NL and 2.80 +- .25 for righties is not significant, and then we have 3.50 +- .30 for AL lefties, and 3.75 +-.30 for AL righties which is not significant. Well lump them together, and you can take advantage of having more data points.
- This wouldn't work without specifying that you have a second factor, the league, since simply doing a straight mean comparison would shoot the s-dev's for lefties and righties through the roof, as the mean AL ERA is so much higher than the mean NL ERA. But the ANOVA will compensate and actually check the difference between each item, and then do the test, revealing significance between lefties and righties. You can then check for interaction, and in this case see that AL lefties are better than AL righties to a greater extent than NL lefties over their NL rightie brethren.
- The whole reason I went into this was because I saw that your analyses revealed a ton of variation, and it certainly looked like random noise. But that correlation techniques didn't reveal any significance. Thus, I felt that a different statistical approach utilizing differences at the extremes might reveal something that might cut through the random noise, and reveal any differences that were occluded by the noise in the correlative approach. It did not.
- CRA (Composite Rate Analysis)
- Created by Jim Bennett (Vermonter At Large). A linear weights analysis system that examines the holistic relationship between core rate statistics and run production on the player, team and league level.
- DT (Davenport Translations)
- Created by Clay Davenport of Baseball Prospectus. DTs are a variation on Major League Equivalent (MLE) and were originally published by Baseball Prospectus. The data generated is used to compare minor league statistics at different levels, and to project player performance at higher levels including the majors. A piece written by Clay comparing DTs to MLE is available here. Also see EqA.
- Linear Regression Model
- A common mathematical tool used to derive the relative value of each element within a data set, and produce formulas that can accurately reflect those values. Essentially, it involves estimating the value of a dependent variable from one or more independent variables. For example, in the Batting Runs formula, each statistic (1B, 2B, 3B, etc) is multiplied by a number, or factor (.47, .78, 1.09, etc). Those factors weren't pulled out of thin air -- they were determined using linear regression, and then plugged into the batting runs formula. A more extensive and technical explanation is available here.
- Marcels (Marcel the Monkey Forecasting System)
- A simplistic player projection system developed by Tom Tango (aka Tangotiger), which performs surprisingly well as compared to more sophisticated projection systems. It is described as "the most basic forecasting system you can have, that uses as little intelligence as possible. So, that's the allusion to the monkey. It uses 3 years of MLB data, with the most recent data weighted heavier. It regresses towards the mean. And it has an age factor." Tangotiger provides the annual projections for free download at his site.
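- The three ingredients in that description (recent seasons weighted more heavily, regression toward the mean, an age factor) can be sketched as below. This is not Tango's published code. The 5/4/3 weights, the 1,200 plate appearances of league-average ballast, and the age-29 adjustment steps are all assumptions chosen for illustration, as are the sample numbers.

```python
def marcel_style_rate(seasons, league_rate, age,
                      weights=(5, 4, 3),   # assumed: most recent year weighted heaviest
                      regression_pa=1200,  # assumed amount of league-average ballast
                      peak_age=29,         # assumed pivot age for the age factor
                      young_step=0.006,    # assumed bump per year under peak_age
                      old_step=0.003):     # assumed penalty per year over peak_age
    """Project a per-plate-appearance rate from up to three seasons.

    seasons: list of (events, plate_appearances), most recent season first.
    """
    weighted_events = sum(w * ev for w, (ev, pa) in zip(weights, seasons))
    weighted_pa = sum(w * pa for w, (ev, pa) in zip(weights, seasons))

    # Regress toward the mean by mixing in league-average plate appearances.
    regressed = ((weighted_events + regression_pa * league_rate)
                 / (weighted_pa + regression_pa))

    # Age factor: a small boost for young players, a small penalty for older ones.
    step = young_step if age < peak_age else old_step
    age_factor = 1 + (peak_age - age) * step
    return regressed * age_factor


# Hypothetical hitter: (home runs, plate appearances) for the last three seasons
recent = [(30, 600), (25, 580), (20, 550)]
print(round(marcel_style_rate(recent, league_rate=0.03, age=27), 4))
```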
- Park Factors or Park Indices
- A calculation of how much an individual park affects the numbers of Runs Scored, Hits, Home Runs, etc, by comparing how a team hits/scores at home vs. how they perform on the road. PF = ((homeRS + homeRA)/(homeG)) / ((roadRS + roadRA)/(roadG)). During the season, espn.com keeps a running tally of park factors as well as several years of history. A factor of greater than 1.0 favors the hitter. Bill James also publishes park factors by LHB and RHB.
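- The PF formula above, written out as a small function. The argument names and the sample season totals are hypothetical.

```python
def park_factor(home_rs, home_ra, home_g, road_rs, road_ra, road_g):
    """Run park factor: total runs per game at home over total runs per game on the road."""
    home_rate = (home_rs + home_ra) / home_g
    road_rate = (road_rs + road_ra) / road_g
    return home_rate / road_rate


# Hypothetical team: 420 scored / 400 allowed at home, 380 / 360 on the road, 81 games each
pf = park_factor(420, 400, 81, 380, 360, 81)
print(round(pf, 3))
```

- A result above 1.0, as in this example, means more runs were scored in the team's home games than in its road games, which points to a hitter-friendly park.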
- PECOTA
- A player projection system created by Nate Silver of Baseball Prospectus. The acronym stands for Player Empirical Comparison & Optimization Test Algorithm. What Nate does is project a range of possible outcomes for a player in several different value metrics. He then provides the likelihood of each outcome. In some cases, these forecasts are displayed as finite percentage data, and in others they're depicted graphically. Among the metrics Nate works with are EqA, WARP (5-year projections), MLV and VORP.
- Nate also summarizes his projections in four groupings:
- Breakout Rate - The percent chance that a batter's EqR/27 (or a pitcher's EqERA) will improve by at least 20% relative to the weighted average of his EqR/27 (or EqERA) in his three previous seasons of performance. High breakout rates are indicative of upside risk.
- Improvement Rate - The percent chance that a batter's EqR/27 (or a pitcher's EqERA) will improve at all relative to the weighted average of his EqR/27 (or EqERA) in his three previous seasons of performance. A player who is expected to perform just the same as he has in the past will have an Improvement Rating of 50%.
- Collapse Rate - For batters, the percent chance that his EqR/27 will decrease by at least 20% relative to the weighted average of his EqR/27 in his three previous seasons of performance. For pitchers, the percent chance that his EqERA will increase by at least 25% relative to his baseline EqERA over his past three seasons. High Collapse Rates are indicative of downside risk.
- Attrition Rate - The percent chance that a batter's plate appearances (or a pitcher's opposing batters faced) will decrease by at least 50% relative to his baseline playing time forecast. Although it is generally a good indicator of the risk of injury, Attrition Rate will also capture seasons in which his playing time decreases due to poor performance or managerial decisions.
- A full breakdown of the PECOTA system was published in the 2004 Baseball Prospectus annual. PECOTA data is only available to Baseball Prospectus subscribers, but the site has a free article by Nate where he summarizes his system, and another where he compares it to other forecasting systems. A glossary of terms used in PECOTA is available here.
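- To make the batter definitions of Breakout, Improvement and Collapse Rate concrete, here is a sketch that computes them from a list of equally likely projected EqR/27 outcomes. It says nothing about how PECOTA actually builds its range of outcomes; the function name, the baseline and the outcome list are all hypothetical.

```python
def batter_outcome_rates(baseline, outcomes, breakout=0.20, collapse=0.20):
    """Share of projected outcomes that meet each batter threshold.

    baseline: weighted average EqR/27 over the three previous seasons.
    outcomes: equally likely projected EqR/27 values (hypothetical input).
    """
    n = len(outcomes)
    return {
        # Improves by at least 20% over the baseline.
        "breakout": sum(o >= baseline * (1 + breakout) for o in outcomes) / n,
        # Improves at all over the baseline.
        "improvement": sum(o > baseline for o in outcomes) / n,
        # Declines by at least 20% from the baseline.
        "collapse": sum(o <= baseline * (1 - collapse) for o in outcomes) / n,
    }


# Hypothetical batter with a 5.0 EqR/27 baseline and ten projected outcomes
projected = [3.8, 4.5, 4.9, 5.1, 5.4, 6.1, 6.3, 3.9, 5.2, 5.6]
print(batter_outcome_rates(5.0, projected))
```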
- ZiPS
- A computer-based projection system created by Dan Szymborski, editor-in-chief of Baseball Think Factory (formerly Baseball Primer). The projections are available for free download at the site in Microsoft Excel (.xls) format. According to the disclaimer at the bottom of each team's ZiPS charts:
- "Performances have not been allocated to predicted playing time in the majors - many of the players listed are unlikely to play in the majors at all in 2006. ZiPS is projecting equivalent production - a .240 ZiPS projection may end up being .280 in AAA or .300 in AA, for example. Whether or not a player will play is one of many non-statistical factors one has to take into account when predicting the future."


