Chat with John Dewan
#21
Posted 13 July 2009 - 12:34 PM
"To get better this year, the Sox need to get production from the guys they already have. Going forward, they'll probably have to be a bit more aggressive and a lot more successful in the free agent market, or else be willing to take a few steps back while praying. " - The Rudy Pemberton
#22
Posted 13 July 2009 - 01:07 PM
"To get better this year, the Sox need to get production from the guys they already have. Going forward, they'll probably have to be a bit more aggressive and a lot more successful in the free agent market, or else be willing to take a few steps back while praying. " - The Rudy Pemberton
#23
Posted 13 July 2009 - 01:09 PM
1) Does BIS* compute things like inter-rater reliability for all of their personnel? Are unreliable people discarded or weighted appropriately? I wonder about the effectiveness of using human judgments for many of these measures.
2) Should defensive stats report both an average value and a measure of variability? So, for example, players with fewer (or more) plays would have larger (or smaller) variability scores? The point is: could defensive evals do a better job communicating the amount of variability in each player's (or team's) measurement, so that fans could get a good sense of whether or not a difference is really a significant difference?
*Question originally said STATS, not BIS (apologies)
This post has been edited by Jnai: 15 July 2009 - 10:57 AM
"Quantitatively, you're right -- he's been a stud. Qualitatively, IMO, he has not." - mabrowndog
"Most of the internet makes Imgran look like Mark Twain." -NomarRS05
#24
Posted 13 July 2009 - 01:15 PM
I'll give a concrete example: Jacoby Ellsbury's defense has fallen off a cliff this year, according to most advanced metrics. I find it hard to believe that he was an elite defender in 2008 and is now a poor one. Can you make an educated guess whether one year's rating is more likely to be an aberration than the other? If so, what factors would you look for as signs that a particular player's rating is an aberration (or, inversely, is especially likely to be accurate)?
"It aint where you from or how smart you are or how dumb you are or what you look like or where you're from or any of that shit. It's what you do." -- Rasputin
#25
Posted 13 July 2009 - 01:22 PM
- Adam Wheeler
"I wonder if he's any relation to Blake Wheeler, who was recently convicted of impersonating a hockey player."
- boston.com reader's comment on the Adam Wheeler story
#26
Posted 14 July 2009 - 09:55 AM
Please submit your questions for John here, and he will answer them, along with any follow-up questions, starting next Tuesday, July 14. If any non-members have questions, please feel free to PM or email me (frisbetarian@gmail.com) with them.
Thank you, John. This should be an interesting and informative chat.
Thank you to the good guys at Sons of Sam Horn for the opportunity!
#27
Posted 14 July 2009 - 09:57 AM
Could you please explain the differences, and the similarities, between your system and UZR?
Thanks.
First, let me give you a little background. I developed Zone Ratings in the 1990’s during my days at STATS, Inc. The concept was based on the coding we were doing of the distance and location of batted balls. Each defensive position was assigned a zone where, based on the data, a majority of plays could be expected to be made. Zone Rating data was published annually in the Baseball Scoreboard beginning in 1990. The last edition of the Baseball Scoreboard was in 2001.
My last year at STATS was the year 2000 and in the Baseball Scoreboard 2000, I developed a new system. I called it Ultimate Zone Ratings, abbreviated UZR. This was essentially a forerunner of the system I developed in 2002 at Baseball Info Solutions which I named the Plus/Minus System. Ultimate Zone Ratings in the Scoreboard was also a system based on plusses and minuses. But with the new Plus/Minus System, I used more detailed data from Baseball Info Solutions (BIS) and included adjustments in many areas that I hadn’t done in the first version of UZR.
Getting back to your question, what’s the difference between my system and UZR? While I don’t know for sure if the current version of UZR is an extension of my original UZR, or if it was independently developed, the bottom line is that they are based on the exact same concept. Both systems break the field into small areas and look at the probabilities of plays being made in those areas. The differences lie in the various adjustments that are made.
My research assistant, Ben Jedlovec, prepared the following:
Based on my understanding of both systems,
Similarities
- Both use BIS Data. UZR started with STATS data, but the most commonly referenced version uses BIS data.
- Both have the same idea: break down balls in play by type, location, and velocity.
- Both are measured on an above/below average scale.
- Both have runs saved systems with components for GDP, OF Arms, Range.
- We use similar run value multipliers at each position.
- Both are available online (Fangraphs or Bill James Online).
Technical Differences
- UZR uses multi-year samples, while Plus/Minus adjusts for year-to-year league changes. As teams are increasingly recognizing the importance of a strong defense, the league as a whole will be stronger defensively. It is important to handle this trend appropriately.
- Plus/Minus uses smaller, more precise zones, or “buckets” of plays.
- UZR has several minute adjustments, such as batter hand, pitcher hand, base/out state, and pitcher groundball/flyball tendencies. We remain focused on the value contributed to the team in the player’s specific context.
- Park adjustments are handled differently: I believe UZR applies a blanket adjustment across all buckets, while Plus/Minus has park factors in the form of more precise buckets. A ball hit 395 feet to Vector 190 that stays in the park is only compared to all other balls hit 395 feet to Vector 190 that stay in the park. If it leaves the park, it neither helps nor hurts the fielder. Also, we added the “Manny Adjustment”, which removes fly balls hit unreachably high off a wall. We named the adjustment after the Green Monster’s most notable victim, who went from being by far the worst left fielder in baseball before the adjustment to being only arguably the worst left fielder after the adjustment.
- Plus/Minus accommodates plays where the first baseman holds the runner and middle infielders are covering second on hit-and-run plays. UZR adjusts for all base/out states.
- The two systems apply the run values at different stages in the calculations. UZR applies runs right away, while we convert to Enhanced PM then apply the Run Factors.
- Plus/Minus is a little more aggressive in awarding credit/penalty. An example: 100 balls in a ‘bucket’ (specified type, velocity, location), 30 fielded by the 2B, 20 by the 1B, 50 go through for singles. On a groundout to the second baseman, we give +50/(50+30) = 5/8 = +.625. UZR gives +50/100 = +.50. On a single through both fielders, Plus/Minus gives -30/80 = -.375 to the 2B, and -20/70 = -.29 to the 1B. UZR gives -30/100 = -.3 to the 2B, and -20/100 = -.2 to the 1B. You could make an argument for either method of accounting, but neither one is better than the other. The differences are the greatest at the middle infield positions, where overlap between fielders is the highest.
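The credit/penalty arithmetic in the last bullet can be sketched in a few lines of Python. This is only an illustration of the worked example (100 balls in a bucket, 30 outs by the 2B, 20 by the 1B, 50 singles); the function and variable names are hypothetical, and neither function is the actual Plus/Minus or UZR implementation.

```python
def plus_minus_credit(outs_by_fielder: int, hits: int, made_play: bool) -> float:
    """Credit under the Plus/Minus-style accounting from the example.

    The denominator counts only balls this fielder turned into outs plus
    balls that went for hits; outs recorded by other fielders are left out.
    """
    chances = outs_by_fielder + hits
    if made_play:
        return hits / chances
    return -outs_by_fielder / chances


def uzr_credit(outs_by_fielder: int, hits: int, total_balls: int, made_play: bool) -> float:
    """Credit under the UZR-style accounting from the example.

    The denominator is every ball in the bucket, whoever fielded it.
    """
    if made_play:
        return hits / total_balls
    return -outs_by_fielder / total_balls


# Bucket from the example: 100 balls, 30 outs by the 2B, 20 by the 1B, 50 singles.
TOTAL, OUTS_2B, OUTS_1B, HITS = 100, 30, 20, 50

if __name__ == "__main__":
    print("2B groundout, Plus/Minus:", plus_minus_credit(OUTS_2B, HITS, True))
    print("2B groundout, UZR:       ", uzr_credit(OUTS_2B, HITS, TOTAL, True))
    print("Single, 2B, Plus/Minus:  ", plus_minus_credit(OUTS_2B, HITS, False))
    print("Single, 1B, Plus/Minus:  ", round(plus_minus_credit(OUTS_1B, HITS, False), 2))
    print("Single, 2B, UZR:         ", uzr_credit(OUTS_2B, HITS, TOTAL, False))
    print("Single, 1B, UZR:         ", uzr_credit(OUTS_1B, HITS, TOTAL, False))
```

The only difference between the two functions is the denominator, which is why the gap is largest where neighboring fielders share many of the same balls.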
Fundamental Differences
- Runs Saved includes Bunt Runs Saved for corner infielders, pitcher fielding (Plus/Minus and holding runners), and catcher fielding (handling the pitching staff and the running game).
- Runs Saved measures the extra impact of HR Saving Catches. Runs Saved will add other Defensive Misplay/Good Fielding Play runs in the future.
There are some large similarities, but the bottom line is we’re not measuring exactly the same pieces of the puzzle, and we’re accounting for them differently.
#28
Posted 15 July 2009 - 11:04 AM
Thank you for your patience, and please hold off on asking any follow-ups until John has answered all of the original questions.
Ted Williams
#29
Posted 15 July 2009 - 07:07 PM
What do you think is the appropriate sample size in order for +/- to have utility? Do you think volatility in players' defensive numbers is more attributable to performance variation, or the metrics themselves still being fairly new ground?
What do you think is the next step in baseball evaluation?
Over time, we have all developed a feel for what baseball data means. For example, flipping through my Bill James Handbook for a player with a long career, I randomly picked Juan Pierre. In 2004 he hit .326 for the Marlins. One year later with the same team, he hit exactly 50 points lower (.276). With the wisdom of hindsight, and even at the time, we know his real ability is somewhere in between.
So it is, for the most part, with our plus/minus numbers. They can still vary from year to year, and a player's true ability generally lies between the fluctuations.
Another example: if a player has a plus/minus of +3 after five games, he has played well in those five games. It's like going, say, 10-for-20 in those five games. There's no question that he played well. But the sample size is small and, in that limited timeframe, provides only a minuscule amount of insight into the player's true ability.
Like other numbers in baseball, a small sample size tells you what a player is doing, but the larger the sample size gets, the more you know about what he is really capable of doing.
When it comes down to it, I give our overall plus/minus numbers similar credibility as other baseball numbers, like batting average or on-base percentage. In my new book, The Fielding Bible—Volume II, we developed Runs Saved. I think of Runs Saved as the Runs Created of defense in that it encompasses a wide variety of methods. I give Runs Saved similar credibility to Runs Created.
#30
Posted 15 July 2009 - 07:10 PM
What do you think is the appropriate sample size in order for +/- to have utility? Do you think volatility in players' defensive numbers is more attributable to performance variation, or the metrics themselves still being fairly new ground?
What do you think is the next step in baseball evaluation?
On offense I believe we’re measuring 80-90 percent of the true ability of players. On defense, I believe we’re at about the 60 percent level. But we’re still at the tip of the iceberg in terms of precision and a ton more can be done, especially defensively. As new forms of data become available, we’ll be able to enhance our defensive systems. One example: BIS has now developed a batted ball timer, which we believe will greatly improve the accuracy of our system.
#31
Posted 22 July 2009 - 06:53 AM
Would you please detail how the data used for your defensive evaluations is compiled. Also, what are the differences, if any, in the method used to compile defensive data between BIS and STATS, Inc.? Finally, what are your thoughts on the strengths and weaknesses of these methods of acquiring data, and what do you anticipate being done to improve them?
Let me refer you to www.fieldingbible.com to get a better overview of the Plus/Minus System and my other defensive systems. We added new techniques, including Defensive Runs Saved, when we published The Fielding Bible—Volume II this spring. Daily updates are being posted on Bill James Online (www.billjamesonline.com). It's a subscription website, but it's only $3 per month.
Having been involved with the set-up of both the STATS and BIS data tracking systems, I believe that both organizations do an excellent job overall. For defensive data, if STATS still does what they were doing back when I was there, they rely on a scorer in the press box marking the location of batted balls onto a grid system that breaks the field into 26 vectors emanating from home plate on one axis and 10-foot increments on the other. BIS utilizes a video review by its Video Scouts to pinpoint the batted ball onto a replica of the field on their video screen. In theory, BIS data allows for greater precision, as each pixel on the computer screen can represent a location.
As technology moves forward, we will be able to get better and better precision in our data. As I mentioned above, I'm very excited about looking at the batted ball timer data we've been starting to collect.
#32
Posted 22 July 2009 - 07:05 AM
1. At what point will some of the same techniques that evaluate defense (type, velocity, and direction of the hit) be used to adjust our understanding of player offense?
2. mgl has argued that at some level our understanding of defense is better than that of offense -- it's just that offense has neater bins to categorize event results. What would a neater set of defensive result categories look like? If you had the power to completely change our terminology of defense, what if anything would you change?
3. How much longer until little GPS units are in the cleats of every fielder, or some equivalent, so that assessments of reaction and range can be made better? Will any team do this, and who will do it first? Sort of Fielder/Fx.
4. How do teams these days think about defense? Do they all subscribe to services like yours, or have their own, bigger ones?
1. While offense is a different beast, and has a different set of variables to consider, I believe there is a lot that can be done with the type of data that we are collecting that we haven't done yet. We're using data for analyzing defense that we don't use much to analyze offense. For example, Bill James did a study that showed there is some consistency among hitters regarding how they hit grounders and line drives, but that hitting flyballs is a significant factor that separates hitters from one another.
2. I don't think we understand defense better than offense. We've worked hard at it, but we're still getting there. As far as categories, I'd like to see people start to talk about defense in terms of defensive runs saved broken down into categories like OF Arm runs saved and double play runs saved while pushing errors and fielding percentage more into the background.
3. Sportvision is teaming with Major League Baseball to attempt to do just that. There was a recent article in the Wall Street Journal on this topic. Measuring range and reaction time would be great, but the ultimate goal is to combine them and measure a player's skill at turning a ball in play into an out, which current systems are already getting at.
4. Each team handles defense in its own way. Many subscribe to our system and/or refer to The Fielding Bible frequently. I think teams have been catching on to the importance of defense over the past few seasons, and the media really started catching on this offseason. The lower-than-expected contracts signed by free agent defensive liabilities this past offseason (Burrell, Abreu, and Dunn, to name a few) indicate that the league as a whole has made an adjustment.
#33
Posted 22 July 2009 - 12:25 PM
http://www.insidethebook.com/ee/index.php/...ssistant_speak/
"Quantitatively, you're right -- he's been a stud. Qualitatively, IMO, he has not." - mabrowndog
"Most of the internet makes Imgran look like Mark Twain." -NomarRS05
#34
Posted 22 July 2009 - 05:35 PM
http://www.insidethebook.com/ee/index.php/...ssistant_speak/
Thanks for that link, Dan. There is some great stuff, as always, over there on this subject.
Please bear with John as he wades through our questions. I think we all would agree that with the tremendous responses he has provided it is well worth the wait.
I do ask that you refrain from any additional questions or follow-ups until John has finished.
Ted Williams
#35
Posted 26 July 2009 - 09:14 AM
Thanks to Fenway, Red Sox fans may be unusually interested in how the various fielding systems try to correct (or ignore) park effects on fielding. In light not just of the Green Monster, but Fenway's jutout behind 3b (which may reduce the run-scoring impact of some shots down the third-base line), its huge right field, its smaller foul territory, the Triangle, etc., how can fans best account for park effects on the currently available fielding ratings/metrics? Is there a reasonable way to normalize defensive metrics, or should we look just at "away" splits over a longer period of seasons (in order to avoid small sample size error)?
Many thanks.
Our system handles the major park effects well.
First off, a 360-foot fly ball to right in Fenway is only being compared to identical fly balls in stadiums where the 360-foot fly stays in the park. The fact that the same fly ball might be out in other parks doesn't affect how much credit or penalty is assigned for the play. Because our zones are small (i.e. more precise), a park adjustment is already built into the system.
Secondly, after the first Fielding Bible we added the "Manny Adjustment" for balls hit off the wall, which eliminates balls hit too high to handle off a wall.
Lastly, foul fly balls don't impact a player's plus/minus number. As for the Fenway "jutout", Mike Lowell may get a slight benefit when a ball goes for a single at Fenway that might otherwise go for a double, but the effect on the total is probably minuscule.
#36
Posted 26 July 2009 - 09:20 AM
How does +/- compensate for this problem and why does it compensate for this better than other defensive evaluations? Was this problem one that you particularly thought of while working on +/-?
Thanks a lot for doing this chat.
That's exactly what Plus/Minus does. Both Plus/Minus and UZR factor in defensive positioning and give credit for it. Both systems account for both components of good defense – having good range and positioning well. In this way, both systems are complete. What still can be done is to break down each fielder's performance into separate components for range and positioning.
We explain our system at great length in The Fielding Bible and at www.fieldingbible.com. In 2008, a hard grounder to Vector 197 (slightly to a normal shortstop's right) was converted to an out by the shortstop 86% of the time. The average shortstop will make the play most of the time, but not always. If the shortstop makes this particular play, we award him +0.14 (1 - .86) plays above average. If he fails to get the out, we penalize him -.86 plays. We do this for every play and every position and add them up to get a player's plus/minus score, which we later convert to Runs Saved.
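As a rough illustration of the per-play scoring just described, here is a minimal Python sketch. It assumes each play arrives as a pair of (league out rate for that bucket and position, whether the fielder made the play); the function names are hypothetical, and the later conversion to Runs Saved is not shown.

```python
from typing import Iterable, Tuple


def play_credit(out_rate: float, made_play: bool) -> float:
    """Plays above average for one ball in play.

    A made play earns (1 - out_rate); a missed play costs out_rate.
    """
    return 1.0 - out_rate if made_play else -out_rate


def plus_minus_total(plays: Iterable[Tuple[float, bool]]) -> float:
    """Sum the per-play credits to get a raw plus/minus score."""
    return sum(play_credit(rate, made) for rate, made in plays)


if __name__ == "__main__":
    # The hard grounder to Vector 197 from the text: outs 86% of the time.
    print(round(play_credit(0.86, True), 2))   # credit for making the play
    print(round(play_credit(0.86, False), 2))  # penalty for missing it

    # A fielder who converts exactly 86 of 100 such grounders nets out to zero.
    average = [(0.86, True)] * 86 + [(0.86, False)] * 14
    print(round(plus_minus_total(average), 6))
```

One consequence of this scoring is that a fielder who converts plays at exactly the league rate for each bucket nets out to zero, which is what makes the total an above/below average measure.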
As mentioned in the first question, Plus/Minus uses more precise zones to determine the difficulty of a particular play, which in theory should give us more accurate results.
#37
Posted 26 July 2009 - 10:19 AM
I am interested in the weighting of throwing arms when one is considering defensive evaluations for outfielders, specifically in regards to Jacoby Ellsbury. Does his weak throwing arm set him apart that much negatively when considering votes for the Gold Glove? How are throwing values calculated? Also, how does Jacoby compare to someone like Ichiro considering arm and range? Observationally, it appears to me that Ellsbury makes more "spectacular" catches than Ichiro, and I'm wondering how well that relates to actual statistics and evaluations.
Thanks.
In The Fielding Bible—Volume II, we tackle the issue of combining a player's throwing arm with his range in the outfield by converting the systems we use for throwing (OF Arms) and range (Plus/Minus) into Runs Saved. A baserunner kill at home plate is the most valuable defensive play. In terms of range, making a more difficult play will earn the fielder more Plus/Minus Runs than a routine play.
Ellsbury's weaker arm has only cost the Red Sox about three runs in his career. He's made up for it with six defensive runs saved with his range. Ichiro is in a completely different universe. Ichiro has saved 30 runs with his throwing arm since 2003 and 58 runs with his range. Ichiro has established himself as one of the best outfielders in baseball, while Ellsbury seems to be an all-around average centerfielder. In left and right, Ellsbury's range would stand out, but in center he's nothing special. In center field, that's partially true for Ichiro as well; his range is average for a center fielder but his throwing arm more than makes up for it.
#38
Posted 29 July 2009 - 03:28 PM
Can you talk a little bit more about the difficulties of analyzing what catchers do behind the plate, and provide a bit more detail on how these numbers are derived? How far apart are the error bars? One thing I noticed was that in the rankings quite a few catchers were ranked #6, suggesting a huge bottleneck at the middle of pack. Does that point to the data being too hard to parse at that level of detail with any level of certainty?
I also notice (without too much surprise) that the Red Sox's own Jason Varitek does not do all that well in areas that the press usually fawn over him for (blocking balls and handling the pitching staff). Can his poor showing at blocking balls be attributed at all to advancing age? How far back does your analysis go along these lines; was he potentially better when younger, or have Sox fans had their own eerie parallel to the Yankee captain's defense all this time?
We explain our method of rating catchers' ability to control the running game in The Fielding Bible—Volume II. We have also taken a first stab at measuring a catcher's handling of the pitching staff. We use an example comparing Brandon Inge, Ivan Rodriguez, and Jose Molina to illustrate how we calculate Catcher Earned Runs Saved. I have to refer you to the essay in the book, because to explain the system I'd have to copy the whole essay here.
You mentioned that our Catcher Runs Saved system seems to rely heavily on CERA (Catcher ERA) but that's a huge oversimplification. In the system, we use the earned runs in the CERA, but only as it relates to catchers catching the same pitchers. If Pudge has a 4.40 CERA with Joe Smith pitching and Molina has only caught him for one inning with a 9.00 CERA, there is almost no effect. There might be credit for Pudge for one earned run saved, but once we use our credibility factor the Adjusted Earned Runs Saved is 0.
Also, we use our Enhanced Fielding System of Defensive Misplays and Good Fielding Plays to evaluate catchers. Jason Kendall blocked more pitches than anyone, while Bengie Molina allowed the most balls to get by him. This is more of a scouting-based approach and contains very valuable information that is otherwise unrecorded.
Regarding Jason Varitek, the numbers suggest that he has been better than other catchers at handling the pitching staff in the last couple of years. Or, more specifically, his pitchers have posted a better ERA with Varitek calling the game than with all other catchers who have caught those pitchers. We give him credit for nine Adjusted Earned Runs Saved – three in 2008 and six so far in 2009.
There is still a lot to do in this area, but we've gotten a good start on evaluating catchers.
#39
Posted 29 July 2009 - 03:29 PM
Given the advances in Pitch/FX analysis, and the forthcoming usage of Hit/FX, how might these trajectory-based technologies be better used in fielding evaluations?
In theory, PITCHfx and HITfx data are very useful tools for data collection and analysis, but both have their limitations. To their credit, MLBAM and Sportvision have invested a lot of time and money in both projects, and we're starting to see the benefits of this type of information.
Neither system in its present form adds much to our current fielding analysis. I expect that utilizing the new batted ball timer data collected by Baseball Info Solutions will be a huge advance not only for fielding analysis but for pitching and hitting evaluation as well.
#40
Posted 29 July 2009 - 03:33 PM
| Name | Batting | Fielding | Replacement | Positional | RAR | WAR | Dollars (Millions) |
| Matt Kemp | 13.6 | 10.1 | 11.4 | 1.2 | 36.3 | 3.6 | $16.00 |
| Torii Hunter | 20.6 | -2.8 | 10.8 | 0.7 | 29.3 | 2.9 | $13.20 |
| Franklin Gutierrez | 5.7 | 12.0 | 9.9 | 1.2 | 28.7 | 2.9 | $13.00 |
| Carlos Beltran | 20 | -3.8 | 9.3 | 0.6 | 26.1 | 2.6 | $11.80 |
| Mike Cameron | 8.3 | 5.1 | 10.7 | 1.2 | 25.3 | 2.5 | $11.40 |
| Curtis Granderson | 7.8 | 2.1 | 12.5 | 1.3 | 23.7 | 2.4 | $10.70 |
| Nyjer Morgan | -2.8 | 17.7 | 11.5 | -2.7 | 23.7 | 2.4 | $10.70 |
| B.J. Upton | 0.8 | 5.4 | 11.8 | 1.2 | 19.1 | 1.9 | $8.60 |
| Nate McLouth | 10.5 | -3.3 | 10.5 | 1.1 | 18.8 | 1.9 | $8.00 |
| Aaron Rowand | 8.3 | -1.1 | 10.6 | 1.2 | 19 | 1.9 | $8.00 |
| Adam Jones | 11.4 | -6.6 | 11.1 | 1.2 | 17.1 | 1.7 | $7.70 |
| Shane Victorino | 10.2 | -7.8 | 12.3 | 1.2 | 16 | 1.6 | $7.20 |
| Jacoby Ellsbury | 5.3 | -6.4 | 11.6 | 1.2 | 11.7 | 1.2 | $5.30 |
| Melky Cabrera | 1.3 | 0.6 | 8.7 | -1.6 | 9 | 0.9 | $4.00 |
| Grady Sizemore | 1.7 | -4.1 | 9.6 | -0.5 | 6.6 | 0.7 | $3.00 |
Dave reaches the conclusion that Franklin Gutierrez's defense is so valuable that his overall value exceeds that of nearly all major league CFs (Beltran, Granderson, etc.). He has used similar data to support his claim that Adrian Beltre has been well worth his contract with the Mariners.
Are defensive metrics so evolved and reliable that we can credibly make such claims when the bulk of a player's value is wrapped up in defense? Take Nyjer Morgan as an extreme example. When calculated this way, his entire value is based on defense, and yet he rivals Beltran, Granderson, Upton, McLouth, and Rowand.
How do you respond to these conclusions? Shouldn't an adjustment be made for reliability of the data, where offensive metrics are weighted higher and defense indicators discounted, in order to render a credible judgement?
As mentioned earlier, we use caution in small samples of defensive data. However, in Gutierrez's case, his defensive value is far from a small-sample fluke. He led all right fielders in Plus/Minus Runs Saved in 2007 in only 579 innings, and he repeated the feat in 2008 in just over a half-season's worth of innings. After last season, Fielding Bible Award voters were convinced of his ability and gave him the award over Nick Markakis, Ichiro Suzuki, and everyone else, despite the fact that Gutierrez has a sub-par arm in right field.
When new Seattle GM Jack Zduriencik brought in Gutierrez to play center and Endy Chavez to play left, we touted the Mariners as having the best defensive outfield in baseball. Sure enough, Seattle's outfield has 29 Defensive Runs Saved through July 28, second only to Oakland's 30.
Gutierrez has handled the transition to center field well. While his Plus/Minus numbers are down at the tougher position as you would expect, he still leads the league in Plus/Minus Runs Saved, and his arm is less of an issue. On top of his defensive prowess, Gutierrez is having his best season at the plate. In The Fielding Bible--Volume II, we combined offense, defense, baserunning, and positional value into Total Runs. If Gutierrez keeps playing like this in the second half, he could find himself among the top players in baseball on our 2009 Total Runs leaderboard.
Nyjer Morgan is a different story because the sample size is smaller. He's played as an above average outfielder, but he's only logged 1300 innings across three seasons counting all three outfield positions. He rates above average with a total of 21 runs saved on defense in his limited time, but we don't consider him in Gutierrez' class (yet).












