First, methods: I took the raw Gameday data provided by MLB and mined it for a few variables: the starting pitch speed, the break in X and Z, and the spin (a computed statistic from Fast and Nathan's work). I then ran a TwoStep Cluster analysis on these data for each pitch, looking for 5 pitch types and hoping to isolate a curve, 2seam fastball, 4seam fastball, changeup, and cutter.
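If you want to try the clustering idea without a stats package, here is a rough sketch. To be clear, this is NOT TwoStep Cluster: I've swapped in plain k-means with a deterministic farthest-first start, which is a different and simpler algorithm. The pitch values are made up for illustration (they are not Beckett's data), the function names are my own, and the toy example uses 3 groups rather than the 5 pitch types above. Spin could be added as one more column.

```python
import math

def standardize(rows):
    """Rescale each column to mean 0 and standard deviation 1,
    so mph and inches of break count equally in the distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

def kmeans(rows, k, iterations=50):
    """Plain k-means (an assumption, not TwoStep). Returns one label per row."""
    # Farthest-first start: begin with the first pitch, then keep adding
    # the pitch that is farthest from every center chosen so far.
    centers = [rows[0]]
    while len(centers) < k:
        centers.append(max(rows, key=lambda r: min(math.dist(r, c) for c in centers)))
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assign each pitch to its nearest center.
        labels = [min(range(k), key=lambda j: math.dist(row, centers[j]))
                  for row in rows]
        # Move each center to the mean of the pitches assigned to it.
        for j in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels

# Hypothetical (speed, breakX, breakZ) rows, for illustration only.
pitches = [
    (95.0, -5.0, 10.0), (94.5, -5.5, 9.5), (95.5, -4.5, 10.5),  # fastball-like
    (77.0, 6.0, -8.0), (76.5, 6.5, -7.5), (77.5, 5.5, -8.5),    # curve-like
    (86.0, -8.0, 3.0), (85.5, -8.5, 3.5), (86.5, -7.5, 2.5),    # changeup-like
]
labels = kmeans(standardize(pitches), k=3)
```

The cluster numbers that come back are arbitrary, so you still have to look at each group's speed and break to decide which one is the curve, the cutter, and so on.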
There are two ways to visualize this data that make sense. The first is to look at breakX by breakZ - these are the usual graphs that SoxScout uses (the ones that are common to those looking at PitchFX data).
Here is Josh Beckett's stuff from last night:

First, the location on the pitch tracking system seems a little bit screwy tonight. But, ignoring the absolute value of the X axis, what you can see in that graph is 5 pitch types (sorry about the yellow!): a 4seam fastball, a 2seamer that breaks slightly more to the catcher's left, a cutter that breaks slightly more to the catcher's right, a changeup that primarily breaks down, and a curve that breaks hard down and right.
Compare that to the Gameday classifications for the same data:

First of all, we can see that Gameday is misclassifying pitches, and second, we get no differentiation between his fastball types (even though codes for Fourseam, Cutter, and 2Seam exist).
A second way to visualize this data is to look at Spin, which works backwards from the data to attempt to isolate what the pitcher actually did to the ball to get it to move the way it did.

Here, the graph includes speed on the Y axis, which gives us a very nice separation between the action of the change and the fastballs. The X axis of this graph is actually the opposite direction of the X axis of the above graph (think of it as the pitcher's right and left), so his curve is breaking down and to his left, etc.
One thing that really stands out if we look at the data this way is that even though there seems to be clear separation in the action of the ball (from the previous graphs), the speeds of the cutter, 4seam, and 2seam are almost equivalent. This is nearly identical to the conclusion that Mike Fast reached last year, but here it shows up clearly within a single game: not only is the ball moving very fast, but knowing that it is a fastball gives hitters little indication of the actual movement on the pitch. He can throw the fastball at virtually the same velocity and have it move in 3 slightly different ways.
Finally, I thought I would share some new thoughts on Pitch Sequencing. I broke the game down by first half and second half (Beckett threw 8 innings, so 4 in each).

The above graph shows the likelihood (y-axis) of a particular pitch (category variables) following a particular previous pitch, broken down by first and second half of the game. If you're having trouble reading the graph: X axis shows what the PREVIOUS pitch was, and the bars show what the CURRENT pitch is. For example, let's look at the 2Seam. After you saw the 2Seamer in the first half of the game, 50% of the time you saw another 2Seamer and 38% of the time you saw a curve. This is in stark contrast to the second half of the game, where he almost always followed the 2Seam fastball with a 4Seam fastball.
The most striking pitch sequence information is in the Curve. During the first half of the game, he never backs up a curveball with a second curveball, always going to either a Fourseamer or Cutter. In the second half of the game, it's totally different: 50% of the time he throws a curve, it's immediately followed by another curve.
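For anyone who wants to build the same kind of sequencing table, here is a minimal sketch of the counting. The function name and the pitch list are made up for illustration (this is not Beckett's actual sequence), and it assumes the whole list is one continuous run of pitches, so it ignores at-bat and inning boundaries.

```python
from collections import Counter, defaultdict

def transition_rates(pitches):
    """pitches: pitch-type labels in the order thrown.
    Returns {previous: {current: share of pitches thrown after `previous`}}."""
    counts = defaultdict(Counter)
    # Pair every pitch with the one thrown immediately after it.
    for prev, cur in zip(pitches, pitches[1:]):
        counts[prev][cur] += 1
    return {prev: {cur: n / sum(c.values()) for cur, n in c.items()}
            for prev, c in counts.items()}

# Hypothetical first-half sequence, for illustration only.
first_half = ["4Seam", "2Seam", "2Seam", "Curve", "4Seam",
              "Cutter", "2Seam", "Curve", "Cutter"]
rates = transition_rates(first_half)
```

In this toy list, `rates["Curve"]` has entries only for 4Seam and Cutter, which is the "never doubles up on the curve" pattern. To compare halves, run the function once on the first-half pitches and once on the second-half pitches.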
There are plenty of other things to do with this kind of analysis, and hopefully this can be expanded to other pitchers. How successful are pitches followed by the same pitch? What's the best first pitch to throw? What kind of sequence generates the most swings and misses? I'm honestly still learning how to do lots of this stuff, and would appreciate feedback (positive or negative) on any / all of it.
Thanks for taking the time to read. =)
Edited by Jnai, 18 April 2008 - 12:14 AM.












