Josh Beckett: The PitchFX Breakdown


9 replies to this topic

#1 Jnai


  • is not worried about sex with goats


  • 9,616 posts

Posted 18 April 2008 - 12:10 AM

We all know that Josh Beckett is the ace of the staff - but what's he using to do it? Mike Fast broke down Josh Beckett's stuff in an article last year; toward the end of his start at the stadium, I took the time to do the same.

First, methods: To do this, I took the raw Gameday data provided by MLB and mined it for a few variables: the starting pitch speed, the break in X and Z, and the spin (a computed statistic from Fast and Nathan's work). I then took these data for each pitch and ran a TwoStep Cluster analysis looking for 5 pitch types, hoping to isolate a curve, 2seam fastball, 4seam fastball, changeup, and cutter.
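For anyone who wants to try the clustering step without SPSS, here is a rough sketch. It is not the TwoStep procedure used above: it swaps in a plain k-means with a deterministic farthest-first start, and the pitch values are made up for illustration (speed in mph, break in inches). It uses 3 clusters rather than 5 only to keep the toy data small.

```python
import math

# Made-up pitches: (start speed, break X, break Z). Not real Gameday data.
raw = [
    (95.8, -7.5, 10.2), (96.1, -6.9, 9.8), (95.5, -8.1, 10.5),   # fastball-like
    (88.9, -8.8, 4.1), (89.3, -9.2, 3.6),                        # changeup-like
    (77.2, 5.9, -7.8), (76.8, 6.4, -8.3), (77.5, 5.5, -7.2),     # curve-like
]

def standardize(rows):
    """Z-score each column so mph and inches are on comparable scales."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

def farthest_first(points, k):
    """Deterministic start: take the first point, then repeatedly the point
    farthest from every center chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, iters=25):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    centers = farthest_first(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each pitch to its nearest center.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        # Move each center to the mean of the pitches assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels

labels = kmeans(standardize(raw), k=3)
for pitch, lab in zip(raw, labels):
    print(lab, pitch)
```

The cluster numbers are arbitrary, so naming them (curve, change, and so on) still has to be done by eye from the speed and break of each group.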

There are two ways to visualize this data that make sense. The first is to look at breakX by breakZ - these are the usual graphs that SoxScout uses (the ones that are common to those looking at PitchFX data).

Here is Josh Beckett's stuff from last night:
Posted Image

First, the location on the Pitch tracking system seems a little bit screwy tonight. But, ignoring the absolute value of the X axis, what you can see in that graph is 5 pitch types (sorry about the yellow!): a 4seam fastball, a 2seamer that breaks slightly more to the catcher's left, a cutter that breaks slightly more to the catcher's right, a changeup that primarily breaks down, and a curve that breaks hard down and right.

Compare that to the Gameday classifications for the same data:
Posted Image

First of all, we can see that Gameday is misclassifying pitches, and second, we see that we get no differentiation (even though codes for Fourseam, Cutter, and 2Seam exist) between his fastball types.

A second way to visualize this data is to look at Spin, which works backwards from the data to attempt to isolate what the pitcher actually did to the ball to get it to move the way it did.

Posted Image

Here, the graph includes speed on the Y axis, which gives us a very nice separation between the action of the change and the fastballs. The X axis of this graph is actually the opposite direction of the X axis of the above graph (think of it as the pitcher's right and left), so his curve is breaking down and to his left, etc.

One thing that really stands out if we look at the data this way is that even though the action of the ball separates clearly (in the previous graphs), the speeds of the cutter, 4seam, and 2seam are almost identical. This is nearly the same conclusion that Mike Fast reached last year, but even within a single game the picture is clear: not only is the ball moving very fast, but knowing that a fastball is coming gives hitters little indication of the actual movement on the pitch. He can throw the fastball at virtually the same velocity and have it move in 3 slightly different ways.

Finally, I thought I would share some new thoughts on Pitch Sequencing. I broke the game down by first half and second half (Beckett threw 8 innings, so 4 in each).

Posted Image

The above graph shows the likelihood (y-axis) of a particular pitch (category variables) following a particular previous pitch, broken down by first and second half of the game. If you're having trouble reading the graph: X axis shows what the PREVIOUS pitch was, and the bars show what the CURRENT pitch is. For example, let's look at the 2Seam. After you saw the 2Seamer in the first half of the game, 50% of the time you saw another 2Seamer and 38% of the time you saw a curve. This is in stark contrast to the second half of the game, where he almost always followed the 2Seam fastball with a 4Seam fastball.

The most striking pitch sequence information is in the Curveball. Check out the Curve - during the first half of the game, he never backs up a curveball with a second curveball, always going to either a Fourseamer or Cutter. In the second half of the game, it's totally different: 50% of the time he throws a curve, it's immediately followed by another curve.
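The tabulation behind a chart like this can be sketched in a few lines. This is only an illustration: the pitch records are made up, the field layout (inning, at-bat id, pitch type) is hypothetical, and dropping pairs that straddle two at-bats is an assumption of the sketch.

```python
from collections import Counter, defaultdict

# Made-up records in game order: (inning, at_bat_id, pitch_type).
pitches = [
    (1, 1, "4Seam"), (1, 1, "2Seam"), (1, 1, "Curve"),
    (1, 2, "2Seam"), (1, 2, "2Seam"), (1, 2, "4Seam"),
    (5, 20, "Curve"), (5, 20, "Curve"), (5, 20, "4Seam"),
    (5, 21, "2Seam"), (5, 21, "4Seam"),
]

def transition_rates(pitches, split_inning=4):
    """Share of each CURRENT pitch type given the PREVIOUS pitch type,
    split into innings 1..split_inning ("first") and later ("second")."""
    counts = {"first": defaultdict(Counter), "second": defaultdict(Counter)}
    for prev, cur in zip(pitches, pitches[1:]):
        if prev[1] != cur[1]:
            continue  # previous pitch belonged to a different at-bat
        half = "first" if cur[0] <= split_inning else "second"
        counts[half][prev[2]][cur[2]] += 1
    rates = {}
    for half, by_prev in counts.items():
        rates[half] = {
            prev_type: {cur_type: n / sum(followers.values())
                        for cur_type, n in followers.items()}
            for prev_type, followers in by_prev.items()
        }
    return rates

rates = transition_rates(pitches)
for half in ("first", "second"):
    for prev_type, followers in rates[half].items():
        print(half, prev_type, followers)
```

Each inner dictionary corresponds to one group of bars in the chart: the key is the previous pitch, and the values are the shares of each pitch that followed it.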

There are plenty of other things to do with this kind of analysis, and hopefully this can be expanded to other pitchers. How successful are pitches followed by the same pitch? What's the best first pitch to throw? What kind of sequence generates the most swings and misses? I'm honestly still learning how to do lots of this stuff, and would appreciate feedback (positive or negative) on any / all of it.

Thanks for taking the time to read. =)

Edited by Jnai, 18 April 2008 - 12:14 AM.


#2 jayhoz


  • browndog's marshmallow bitch


  • 12,290 posts

Posted 18 April 2008 - 06:21 AM

Finally, I thought I would share some new thoughts on Pitch Sequencing. I broke the game down by first half and second half (Beckett threw 8 innings, so 4 in each).

Posted Image

The above graph shows the likelihood (y-axis) of a particular pitch (category variables) following a particular previous pitch, broken down by first and second half of the game. If you're having trouble reading the graph: X axis shows what the PREVIOUS pitch was, and the bars show what the CURRENT pitch is. For example, let's look at the 2Seam. After you saw the 2Seamer in the first half of the game, 50% of the time you saw another 2Seamer and 38% of the time you saw a curve. This is in stark contrast to the second half of the game, where he almost always followed the 2Seam fastball with a 4Seam fastball.

The most striking pitch sequence information is in the Curveball. Check out the Curve - during the first half of the game, he never backs up a curveball with a second curveball, always going to either a Fourseamer or Cutter. In the second half of the game, it's totally different: 50% of the time he throws a curve, it's immediately followed by another curve.


Really great stuff Jnai. Thank you for putting this together. At the risk of overcomplicating things I have a couple of questions about the above chart. Does it take into account which pitches were first in an at bat? If the data includes things like a curve to finish a batter off followed by a 4 seamer to start off the next batter I think that may muddy the waters a little. Maybe add a category for first pitch in an at bat?

It might also be interesting to look at Beckett's (and Tek's) pitch selection depending on the count and compare that to the data in the chart above. It seemed to me that Beckett was less sharp in the later innings and we might see that his pitch selection had more to do with whether he was ahead of the batter or not rather than showing the Yanks a different sequence in the later innings.

#3 BCsMightyJoeYoung

  • 2,867 posts

Posted 18 April 2008 - 08:04 AM

Excellent work as well.

The one thing that jumps out is that in the second half Beckett was doubling up on his off-speed stuff quite often - changeups followed by changeups, curves followed by curves. As previously mentioned his command seemed a bit off in the latter stages so he had to do more real pitching.

On an observational level it seemed as if the Yanks were hacking the first few innings - the first FB they saw they hit.

#4 Jnai


  • is not worried about sex with goats


  • 9,616 posts

Posted 18 April 2008 - 08:25 AM

Really great stuff Jnai. Thank you for putting this together. At the risk of overcomplicating things I have a couple of questions about the above chart. Does it take into account which pitches were first in an at bat? If the data includes things like a curve to finish a batter off followed by a 4 seamer to start off the next batter I think that may muddy the waters a little. Maybe add a category for first pitch in an at bat?


It doesn't have this problem; first pitches are excluded.

It might also be interesting to look at Beckett's (and Tek's) pitch selection depending on the count and compare that to the data in the chart above. It seemed to me that Beckett was less sharp in the later innings and we might see that his pitch selection had more to do with whether he was ahead of the batter or not rather than showing the yanks a different sequence in the later innings.


Yeah. I'd be interested as well to use pitch count, but I think I'd need a better or bigger sample. I could do ahead in the count vs. behind in the count. It might also be interesting to scrape or calculate leverage (does anyone know the formula?) and compare pitches in high lev vs. low lev situations.
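A rough sketch of the ahead/behind split mentioned here, with made-up pitches. The definition is an assumption: the pitcher is "ahead" when there are more strikes than balls before the pitch, "behind" when there are more balls than strikes, and "even" otherwise.

```python
from collections import Counter, defaultdict

def count_state(balls, strikes):
    """Classify the count from the pitcher's side (assumed definition)."""
    if strikes > balls:
        return "ahead"
    if balls > strikes:
        return "behind"
    return "even"

# Made-up records: (balls, strikes, pitch_type), count taken BEFORE the pitch.
pitches = [
    (0, 0, "4Seam"), (0, 1, "Curve"), (1, 1, "4Seam"),
    (2, 1, "4Seam"), (0, 2, "Curve"), (3, 1, "2Seam"),
]

# Tally pitch types thrown in each count state.
mix = defaultdict(Counter)
for balls, strikes, pitch_type in pitches:
    mix[count_state(balls, strikes)][pitch_type] += 1

for state in ("ahead", "even", "behind"):
    print(state, dict(mix[state]))
```

With a single game the per-state tallies get thin quickly, which is the sample-size concern raised above.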


On an observational level it seemed as if the Yanks were hacking the first few innings - the first FB they saw they hit.


Interesting idea; I will go back in the data and check for their first swings later.

Thanks for the feedback. =)

Edited by Jnai, 18 April 2008 - 08:26 AM.


#5 Stuffy McInnis

  • 735 posts

Posted 18 April 2008 - 08:41 AM

Posted Image


I don't think it's clear at all that the cutter is a distinct pitch. The location variability of the entire cluster of 4seam and Cutter is approximately the same area as the cluster of 2seam fastballs. That may just be how much variability there is in the final location of a ball thrown at that speed by Josh Beckett. If the 4seam and Cutter are really the same pitch, then creating two categories really muddies the analysis further down the post.

For example, if you treat the whole cluster as a single 4seam pitch, then Josh followed his curveball by a 4seamer 100% of the time in the first half of the game, which is more interesting than knowing he followed it up with a 4seamer 50% of the time and a slightly different cutter 50% of the time.
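The effect of merging the two labels is easy to check with a small relabeling step. A minimal sketch, using made-up (previous, current) pitch pairs chosen to mirror the 50/50 split described above; the `MERGE` mapping and function name are hypothetical:

```python
from collections import Counter

# Made-up (previous pitch, current pitch) pairs from within at-bats.
pairs = [
    ("Curve", "4Seam"), ("Curve", "Cutter"),
    ("Curve", "4Seam"), ("Curve", "Cutter"),
]

# Hypothetical relabeling: treat every Cutter as a 4Seam.
MERGE = {"Cutter": "4Seam"}

def follow_rates(pairs, prev_type, merge=None):
    """Share of each pitch type thrown right after prev_type,
    optionally collapsing labels through the merge mapping first."""
    merge = merge or {}
    followers = Counter(
        merge.get(cur, cur)
        for prev, cur in pairs
        if merge.get(prev, prev) == prev_type
    )
    total = sum(followers.values())
    return {p: n / total for p, n in followers.items()} if total else {}

split = follow_rates(pairs, "Curve")
merged = follow_rates(pairs, "Curve", merge=MERGE)
print("separate labels:", split)
print("merged labels:  ", merged)
```

Running the sequencing both ways shows how much of an apparent pattern depends on the cluster boundary rather than on pitch selection.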

#6 Jnai


  • is not worried about sex with goats


  • 9,616 posts

Posted 18 April 2008 - 08:55 AM

I don't think it's clear at all that the cutter is a distinct pitch. The location variability of the entire cluster of 4seam and Cutter is approximately the same area as the cluster of 2seam fastballs. That may just be how much variability there is in the final location of a ball thrown at that speed by Josh Beckett. If the 4seam and Cutter are really the same pitch, then creating two categories really muddies the analysis further down the post.

For example, if you treat the whole cluster as a single 4seam pitch, then Josh followed his curveball by a 4seamer 100% of the time in the first half of the game, which is more interesting than knowing he followed it up with a 4seamer 50% of the time and a slightly different cutter 50% of the time.


It's true, and it's a good point.

When I first showed these graphs to someone (Sprowl), I had only done the Cluster Analysis using 4 clusters, and the ones that come up are 4seam, 2seam, curve, and change. He mentioned that he seemed to remember that Beckett threw a cutter according to someone, so I went back to the original Beckett article. So blame those guys! =)

If you include 5 clusters in the cluster analysis, it *does* pull the 4seam cluster apart into two clusters, and they do roughly correspond to the 4seam/cutter distinction that had been previously made. So it is the weakest cluster in that sense, but it is also appropriately found. So it might exist, it might not. It's probably not a completely different pitch than the 4seam, to be honest, so you're right, it might be interesting to redo sequencing with 4Seam and Cutter grouped together.

#7 Snodgrass'Muff


  • smarter as Lucen


  • 21,190 posts

Posted 18 April 2008 - 01:34 PM

Yeah. I'd be interested as well to use pitch count, but I think I'd need a better or bigger sample. I could do ahead in the count vs. behind in the count. It might also be interesting to scrape or calculate leverage (does anyone know the formula?) and compare pitches in high lev vs. low lev situations.


A link for you. Leverage index as calculated by fangraphs.com: http://www.fangraphs...2...ankees&dh=0

That won't give you pitch by pitch leverage indexes, but it will give you every at bat. Hopefully that's helpful to you.

I was going to link to their glossary page for leverage index, but they don't post the formula so it probably won't be of any use to you. I'd like to say that I've learned a lot, especially from the first post, so keep up the good work. I hope you'll continue to do similar things in the Daisuke thread as that would be a fascinating complement to the work I'm doing there.

Edited by Snodgrass'Muff, 18 April 2008 - 01:39 PM.


#8 Eric Van


  • Kid-tested, mother-approved


  • 10,990 posts

Posted 18 April 2008 - 05:41 PM

It's true, and it's a good point.

When I first showed these graphs to someone (Sprowl), I had only done the Cluster Analysis using 4 clusters, and the ones that come up are 4seam, 2seam, curve, and change. He mentioned that he seemed to remember that Beckett threw a cutter according to someone, so I went back to the original Beckett article. So blame those guys! =)

Beckett does throw a rare cutter but it's not subtle like the ones you've tentatively identified. He threw three of them in ALDS game 1, a game where his 4-seamer ranged from -3.7 to -10.5 on the x axis (but mostly -5.7 to -9.2), and they were -1.4 to -1.9, and they averaged 94.6 vs. 96.0 for his 4-seamer. I think my plot of that game is somewhere here.

One thing I've taken to doing is calculating not just the average spin axis and rotation (NB: spin axis vs. rotation is a wicked pissa chart, even more informative than axis vs. velocity) but their standard deviation. Once I've classified the pitches I look to see if there's any overlap between mean ± 2 SD for each pitch. Minimal or no overlap indicates the pitches can be treated as distinct. Again, this is best done game-by-game for a starter.
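A minimal sketch of that overlap check, reading it as mean plus or minus 2 SD on a single variable. The spin-axis values are made up, and the sketch ignores angle wraparound at 0/360 degrees, which real spin-axis data would need handled.

```python
from statistics import mean, stdev

# Made-up spin-axis readings (degrees) per classified pitch type.
spin_axis = {
    "4Seam": [200, 204, 198, 206, 202],
    "Cutter": [190, 196, 193, 199, 192],
    "Curve": [30, 36, 33, 27, 34],
}

def band(values, n_sd=2.0):
    """Return (mean - n_sd * SD, mean + n_sd * SD) using the sample SD."""
    m, s = mean(values), stdev(values)
    return (m - n_sd * s, m + n_sd * s)

def bands_overlap(a, b):
    """True when two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

bands = {name: band(vals) for name, vals in spin_axis.items()}

names = sorted(bands)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        verdict = "overlap" if bands_overlap(bands[first], bands[second]) else "distinct"
        print(first, "vs", second, "->", verdict)
```

In this toy data the 4Seam and Cutter bands overlap while the Curve stands apart, which is the kind of result that would argue for grouping the first two.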

This is in general great work, though.

#9 Hairps

  • 1,738 posts

Posted 21 April 2008 - 02:25 PM

For those of you who dare not tread beyond the Friendly Confines of The Main Board, Jnai has been doing some awesome work with PitchFX:

wiki

For those interested in lending a hand or providing additional insights, check out this thread:

A PitchFX Project

Great stuff, Jnai.

#10 Tangotiger

  • 447 posts

Posted 22 April 2008 - 10:04 AM

Leverage Index:

http://www.InsideTheBook.com/li.shtml

Click on ARTICLES if you want to see how it's calculated.