Building a better pitch classifier?


39 replies to this topic

#1 Kevin Youkulele


  • wishes Claude Makelele was a Red Sox


  • 1,335 posts

Posted 01 May 2008 - 01:18 PM

Some of the discussion in the Lester thread touched on the idea that with better/less error-prone pitch classification, we could do things like chart how much movement a pitch such as Lester's cutter has from day to day. Right now, some things like 2-seamers are misclassified as cutters and vice versa, meaning that the average would be thrown off.

A simple solution for the time being would be to use median movement instead; the median is much less sensitive to outliers (i.e., errors).
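To illustrate the point about outliers, here is a minimal sketch. The h-break numbers are invented for illustration and are not real PITCHf/x data; the last three values stand in for 2-seamers that got labeled as cutters.

```python
from statistics import mean, median

# Hypothetical h-break values (inches) for pitches labeled "cutter".
true_cutters = [2.1, 1.8, 2.4, 2.0, 1.9, 2.2, 2.3]
mislabeled = [-7.5, -8.0, -6.9]  # pretend these are misclassified 2-seamers
labeled_cutters = true_cutters + mislabeled

mean_break = mean(labeled_cutters)
median_break = median(labeled_cutters)

# The mean is dragged toward the mislabeled pitches; the median barely moves.
print(f"mean   h-break: {mean_break:.2f}")
print(f"median h-break: {median_break:.2f}")
```

With only a handful of misclassified pitches the median stays close to the value for the correctly labeled ones, which is the whole appeal of the quick fix.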

Building a better pitch classifier is also an interesting challenge, though, and I thought some discussion of what kind of algorithm would be best may be worthwhile. I might not be averse to implementing such an algorithm if we reach a consensus on an approach that would likely be better.

Here are my thoughts about how I would build it. One big question is how much foreknowledge is available about what types of pitches a guy throws. I'm going to try to stick to a model where such information is usable but not strictly required. I'm also going to make a few very broad assumptions that will be useful in the majority of cases, e.g., everyone throws a fastball.

The basic idea is going to be to figure out what a guy throws (skip this step if it is provided as input), dynamically construct a median behavior for each type of pitch, and then categorize each thrown pitch into its best-fit model. For each type of pitch, the algorithm will see if there is a cluster of pitches close to the general expectation, and then modify the general parameters to fit that pitcher's particularities. I think it would be best to define speeds and other attributes of various pitch types as relative to the 4-seam fastball (4SFB), which I believe will reduce the effect of pitcher to pitcher variation.

The variables I plan to consider for each pitch are: speed, spin, h-break, and v-break. (h and v stand for horizontal and vertical, respectively). Should anything else be considered?

Enough generalities; here's an outline of what the algorithm would actually do, starting with a data table with the above quantities for each pitch.

Estimate 4SFB parameters: Take the mean (median?) of the 10 fastest pitches. (exact number used subject to optimization). This will probably give a speed value for the 4SFB that needs to be adjusted downward since we're selecting for the fastest ones.

From this, develop expected general profiles for each pitch type. h-break values are as for RHP; invert sign for LHP. Example profiles:
2SFB: speed = 4SFB - 2, h-break = 4SFB - 4, spin = 4SFB - 30, v-break = 4SFB - 2
cutter: speed = 4SFB - 5, h-break = 4SFB + 6, spin = 4SFB - ??, v-break = 4SFB - 4 (I haven't looked up comprehensive spin data)
slider: speed = 4SFB - 9, h-break = 4SFB + 8, spin = 4SFB - ??, v-break = 4SFB - 9
And so on.

Determine which profile each pitch fits best, probably using root-mean-square deviation for the four attributes. RMSDs may need to be weighted to account for different degrees of variability for each attribute. For now, exclude ambiguous pitches from consideration (ambiguous means that the two best RMSDs are closer than some threshold, to be optimized later). This gives a list of pitches something like this, assuming a set of ~100 pitches:
4SFB 35
2SFB 20
cutter 18
curveball 15
changeup 4

In cases where the repertoire is not known in advance, it will be tricky to decide whether to include sparsely represented pitch types. For now let's say we need 8 hits to move forward with a type; again, this is subject to optimization.

Recalculate the profiles for each pitch type based on the hits, potentially with some regression for rare pitches.

Re-fit all pitches to the new profiles; this is the final, classified set. (Alternatively, the process could be iterated until convergence is achieved.)
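The outline above could be sketched roughly as follows. This is only an illustration under several assumptions: pitches are `(speed, h_break, v_break)` tuples for a RHP, spin is left out because most of the spin offsets in the example profiles are still "??", the RMSD weights are all equal, and the ambiguity threshold of 0.5 is a placeholder. The function and constant names are made up for this sketch.

```python
from statistics import median

# Offsets relative to the 4SFB, copied from the example profiles above
# (speed, h-break, v-break; RHP sign convention). Spin is omitted.
OFFSETS = {
    "4SFB":   (0.0, 0.0, 0.0),
    "2SFB":   (-2.0, -4.0, -2.0),
    "cutter": (-5.0, 6.0, -4.0),
    "slider": (-9.0, 8.0, -9.0),
}
WEIGHTS = (1.0, 1.0, 1.0)  # assumed equal; would need tuning per attribute
N_FASTEST = 10             # number of fastest pitches used for the 4SFB estimate
AMBIGUITY = 0.5            # assumed minimum gap between the two best RMSDs
MIN_HITS = 8               # hits needed to keep a pitch type


def rmsd(pitch, profile):
    """Weighted root-mean-square deviation between a pitch and a profile."""
    total = sum(w * (p - q) ** 2 for w, p, q in zip(WEIGHTS, pitch, profile))
    return (total / sum(WEIGHTS)) ** 0.5


def best_fit(pitch, profiles):
    """Return (label, is_ambiguous) for the closest profile."""
    ranked = sorted(profiles, key=lambda name: rmsd(pitch, profiles[name]))
    if len(ranked) == 1:
        return ranked[0], False
    gap = rmsd(pitch, profiles[ranked[1]]) - rmsd(pitch, profiles[ranked[0]])
    return ranked[0], gap < AMBIGUITY


def classify(pitches):
    """Label each (speed, h_break, v_break) tuple with a pitch type."""
    # Estimate the 4SFB from the fastest pitches (median of each attribute).
    fastest = sorted(pitches, key=lambda p: p[0], reverse=True)[:N_FASTEST]
    base = tuple(median(p[i] for p in fastest) for i in range(3))

    # General profiles, expressed relative to the 4SFB estimate.
    profiles = {name: tuple(b + o for b, o in zip(base, off))
                for name, off in OFFSETS.items()}

    # First pass: collect hits, leaving ambiguous pitches out.
    hits = {name: [] for name in profiles}
    for pitch in pitches:
        label, ambiguous = best_fit(pitch, profiles)
        if not ambiguous:
            hits[label].append(pitch)

    # Keep types with enough hits and re-center them on the hit medians.
    refined = {name: tuple(median(p[i] for p in group) for i in range(3))
               for name, group in hits.items() if len(group) >= MIN_HITS}
    refined = refined or profiles  # fall back if nothing has enough hits

    # Final pass: re-fit every pitch to the refined profiles.
    return [best_fit(pitch, refined)[0] for pitch in pitches]
```

Calling `classify` on one start's worth of pitches returns one label per pitch. Iterating the last two steps until the labels stop changing would be the "until convergence" variant.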

Comments?

#2 absintheofmalaise


  • too many flowers


  • 8,749 posts

Posted 01 May 2008 - 01:25 PM

One variable to plan for is that some pitchers will throw the same type pitch from a different arm angle/slot and it will move/break at a different angle than before.

#3 finnVT

  • 943 posts

Posted 01 May 2008 - 01:41 PM

From this, develop expected general profiles for each pitch type. h-break values are as for RHP; invert sign for LHP. Example profiles:
2SFB: speed = 4SFB - 2, h-break = 4SFB - 4, spin = 4SFB - 30, v-break = 4SFB - 2
cutter: speed = 4SFB - 5, h-break = 4SFB + 6, spin = 4SFB - ??, v-break = 4SFB - 4 (I haven't looked up comprehensive spin data)
slider: speed = 4SFB - 9, h-break = 4SFB + 8, spin = 4SFB - ??, v-break = 4SFB - 9
And so on.


This is a fun idea. But, I'm not sure that measuring everything relative to the fastball is necessarily the best strategy. It works fine for typical pitches, but what about something that's an outlier--say, has the speed of a cutter, but the break of a slider? Really, this is a perfect application of k-means clustering algorithms (or really any type of clustering analysis). For any given game, we've got about 100 samples of 4-dimensional data (speed, h-break, v-break, spin). If we combine over a bunch of starts, that's actually a pretty big data set, and once you've got your clusters defined, you can see how the cluster moves on a given start compared to the multi-start cluster (e.g., are the curveballs breaking more than usual). It also allows some flexibility in terms of the number of clusters.
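For anyone who wants to try the k-means idea without a stats package, a bare-bones sketch might look like this. It is plain Lloyd's algorithm with a deterministic farthest-point seeding (a simple stand-in for random or k-means++ initialization, chosen here so the result is repeatable). The function names are made up, and the features are assumed to be on comparable scales already; in practice speed, break, and spin angle would probably need standardizing first.

```python
import math


def farthest_point_init(points, k):
    """Seed centroids: start at the first point, then keep adding the point
    that is farthest from every centroid chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(
            points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, iterations=50):
    """Cluster equal-length numeric tuples; returns (centroids, labels)."""
    centroids = farthest_point_init(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each pitch goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster alone
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, labels
```

Running it once on a multi-start data set and once on a single start, then comparing matching centroids, would be one way to look at how a cluster moves from start to start.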

#4 Jnai


  • is not worried about sex with goats


  • 8,169 posts

Posted 01 May 2008 - 02:00 PM

This is a fun idea. But, I'm not sure that measuring everything relative to the fastball is necessarily the best strategy. It works fine for typical pitches, but what about something that's an outlier--say, has the speed of a cutter, but the break of a slider? Really, this is a perfect application of k-means clustering algorithms (or really any type of clustering analysis). For any given game, we've got about 100 samples of 4-dimensional data (speed, h-break, v-break, spin). If we combine over a bunch of starts, that's actually a pretty big data set, and once you've got your clusters defined, you can see how the cluster moves on a given start compared to the multi-start cluster (e.g., are the curveballs breaking more than usual). It also allows some flexibility in terms of the number of clusters.


I use cluster analysis to group pitches; the problem is that the pitchfx systems are themselves subject to variability, so it is difficult to do these cross-start comparisons and be confident that you are measuring changes in the pitch rather than changes in the system. I actually use those four variables - you can see that from the Josh Beckett thread (I think this has fallen from the mainboard by now).

There are a lot of issues here. The main issue with the MLB Gameday algorithm is that it tries to solve the problem for every pitch thrown in real time, and this is a very difficult mathematical problem, especially given variability between pitchers and variability between systems on a given day.

There is another problem with KYouk's suggestion at the top: we will run into the same problem that Gameday runs into. Unless we are going to manually define the repertoire for each pitcher, it is going to be difficult to distinguish pitches like DiceK's slurve or his cutter. Will we be using the same set of expected variables for each pitcher? If so, how will we deal with crazy extremes, like Tim Wakefield's fastball (which will, by all rights, probably show up as a changeup) vs. Jonathan Papelbon's fastball (which will probably be faster than the average by quite a lot)? It's a tough problem for which there's not a great solution.

By the way, if we can get an algorithm working that would classify pitches on a per-start basis I would have no problem updating and adding it to my PitchFX tool. It'd (fuck the filters, seriously, fuck them) be a project I am interested in pursuing.

Edited by Jnai, 01 May 2008 - 02:23 PM.


#5 Noah

  • 3,142 posts

Posted 01 May 2008 - 02:03 PM

If you're only worried about doing the classification as post-processing and don't care about doing it in real time, I would first "plot" all pitches in 5-dimensional space (speed, h-break, v-break, spin, rpm). Then smooth it into a density plot. And then all you have to do is identify the local maxima (higher than some threshold) that correspond to individual pitch clusters, which I would guess would be pretty easily identifiable in 5D space. And you can of course tweak the parameters of how you do the smoothing in each dimension and the thresholding until you find something that works empirically. It's possible that you wouldn't be able to find a set of parameters that works for every pitcher, but again, with 5 attributes I'd bet you can. This would be a fantastic Matlab project.
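A rough sketch of the smooth-then-find-maxima idea, written in Python rather than Matlab. It assumes a single Gaussian bandwidth shared by all dimensions (so the attributes would need rescaling first), evaluates the density only at the observed pitches instead of on a grid, and treats a pitch as a peak if no neighbor within two bandwidths is denser. The names `density` and `find_peaks`, the neighbor radius, and the threshold are all choices made for this illustration.

```python
import math


def density(points, bandwidth):
    """Gaussian-kernel density estimate evaluated at every data point."""
    return [sum(math.exp(-math.dist(p, q) ** 2 / (2 * bandwidth ** 2))
                for q in points)
            for p in points]


def find_peaks(points, bandwidth, min_density):
    """Indices of pitches that are local density maxima above min_density."""
    dens = density(points, bandwidth)
    peaks = []
    for i, p in enumerate(points):
        if dens[i] < min_density:
            continue  # too sparse to count as a pitch cluster
        # Compare against neighbors within two bandwidths.
        # Exact ties are resolved in favor of the lower index.
        is_peak = all((dens[j], -j) <= (dens[i], -i)
                      for j, q in enumerate(points)
                      if math.dist(p, q) <= 2 * bandwidth)
        if is_peak:
            peaks.append(i)
    return peaks
```

Each returned index marks the center of one candidate cluster; the remaining pitches could then be assigned to the nearest peak. The code works for tuples of any length, so 4 or 5 attributes are handled the same way.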

edit: Jnai, I assume this is basically what you mean by "cluster analysis."

In any case, like you said, the good thing about doing it this way is that you don't have to make up attributes for each individual pitch. You identify the clusters (which is really the hard part anyway), and then worry about the actual classification later.

Edited by Noah, 01 May 2008 - 02:06 PM.


#6 Worst Trade Evah


  • SoSH Member


  • 10,824 posts

Posted 01 May 2008 - 02:10 PM

RQ-mode principal components analysis? Pretty much what's been talked about already by FinnVT and Kevin I think.

I've been thinking about trying this in some scripts I have using R, but I'm a little rusty with this stuff and don't have loads of time.

Each game should be a separate run, or each pitcher in each specific park at least.

Edited by Worst Trade Evah, 01 May 2008 - 02:13 PM.


#7 Kevin Youkulele


  • wishes Claude Makelele was a Red Sox


  • 1,335 posts

Posted 01 May 2008 - 02:25 PM

I was thinking along the lines of a post-processing approach.

As for cluster analysis: it seems like there are a couple of different flavors of it being discussed now. I've done some cluster analysis looking at gene expression data for my real job, and in that context it's been Pearson-correlation-based hierarchical clustering, although the software has some other options too (k-means, self-organizing maps, principal component analysis). I wouldn't mind playing a bit with some pitch data. Jnai (or anyone else who knows), how are you parsing the XML? I need to get it in a tab-delimited form for the clustering software.

#8 Jnai


  • is not worried about sex with goats


  • 8,169 posts

Posted 01 May 2008 - 02:28 PM

I was thinking along the lines of a post-processing approach.

As for cluster analysis: it seems like there are a couple of different flavors of it being discussed now. I've done some cluster analysis looking at gene expression data for my real job, and in that context it's been Pearson-correlation-based hierarchical clustering, although the software has some other options too (k-means, self-organizing maps, principal component analysis). I wouldn't mind playing a bit with some pitch data. Jnai (or anyone else who knows), how are you parsing the XML? I need to get it in a tab-delimited form for the clustering software.


I parse it two ways. One way is to use PHP's SimpleXML functions - that's for the graphs generated on the web. The other is just the import utility in Excel.

If you don't have access to Excel, I can write you a real quick tab-delimited parser. But Excel can export in tab-delimited format.

#9 OttoC


  • Mr. Excel


  • 6,446 posts

Posted 01 May 2008 - 02:39 PM

There is one other problem that can produce "false positives," namely pitches that don't do what they are supposed to do, such as breaking balls that don't break. Overall, this may not be a great number, but some pitchers may be more prone to things like that than others. I would think that pitches in the dirt might be hard to classify, too.

#10 mr guido

  • 3,104 posts

Posted 01 May 2008 - 04:05 PM

First: I wouldn't necessarily discount the MLB's classification of pitches. They have all the data and pay professionals to work on the problem, so unless you know what you're doing I wouldn't expect to surpass their results. Eyeballing the data and creating arbitrary rules to perform classification is not going to cut it. I don't know MLB's algorithm at all, but at the very least I can guess they are trying to do something smart, because if you look at the data you see their classifier spits out a confidence level for each pitch.

<pitch des="Foul" type="S" id="3" x="98.71" y="142.47" sv_id="080411_190704" start_speed="91.5" end_speed="82.6" sz_top="3.22" sz_bot="1.54" pfx_x="-4.136" pfx_z="11.464" px="-0.031" pz="2.549" x0="-1.211" y0="50.0" z0="6.341" vx0="4.492" vy0="-133.942" vz0="-7.686" ax="-7.341" ay="35.14" az="-11.753" break_y="23.7" break_angle="24.3" break_length="3.5" pitch_type="FA" type_confidence="1.3330829648258042" />
This is a good sign, even if someone needs to teach them the meaning of "significant digits".
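For those asking about getting this XML into tab-delimited form, a minimal sketch using Python's standard library could look like the following. The attribute names come from the `<pitch>` element quoted above; the `<game>` wrapper in the sample and the function name `pitches_to_tsv` are assumptions made only so the example is self-contained, and the real files may nest the pitches differently.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Attribute names taken from the <pitch> element quoted above.
COLUMNS = ["start_speed", "pfx_x", "pfx_z", "break_angle", "break_length",
           "pitch_type", "type_confidence"]


def pitches_to_tsv(xml_text):
    """Return tab-delimited text with one row per <pitch> element."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    # iter() walks the whole tree, so the nesting depth does not matter.
    for pitch in root.iter("pitch"):
        # Missing attributes become empty cells rather than raising.
        writer.writerow([pitch.get(col, "") for col in COLUMNS])
    return out.getvalue()


# The <game> wrapper is hypothetical; it just makes the sample well-formed.
sample = """<game>
  <pitch des="Foul" type="S" start_speed="91.5" pfx_x="-4.136" pfx_z="11.464"
         break_angle="24.3" break_length="3.5" pitch_type="FA"
         type_confidence="1.3330829648258042" />
</game>"""

print(pitches_to_tsv(sample))
```

Adding more names to `COLUMNS` pulls in more of the raw attributes, which would also be a cheap way to feed every available feature into a feature-selection step.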

Second: if you do want to try improving on their algorithm, then using machine learning algorithms is definitely the only way to go. I would suggest starting with something like WEKA, an open source data mining toolkit that can be used either as a standalone application or as a Java library in your own code. It contains plenty of clustering and bootstrapping algorithms that would probably be helpful. You could experiment using its GUI to see if you are on the right track, then write your own code to go step-by-step through the process. Of course if you understand what I'm saying here you probably already have your own go-to ML library of choice anyhow, and if not you're probably already screwed.

Also, why use just 4 features (speed, h, v, spin)? Why not toss in all the various features (see example above) into something that can tell you what the most relevant ones are?

Warning: given that there is no "ground truth", it's going to be awful hard to prove that your pitch classifier is "better". Unless you can ask a pitcher or go through video and watch a catcher's signs or something.

What would be the use of investing all this effort into a 'better pitch classifier'? Is there a reason you're not happy with what's already out there?

I would much rather spend my time building an algorithm to predict pitch sequence, or analyzing the effects of pitch sequence (what works better, fastball followed by fastball, or fastball followed by change?, etc)

Just one AI geek's 2¢

#11 gator92

  • 178 posts

Posted 01 May 2008 - 04:24 PM

It's obvious why PitchFx would want an accurate pitch classifier (to put it on Gameday), but why does a serious analyst need it? Seems to me once you've gone to all the trouble of classifying a 5-dimensional distribution into a handful of discrete categories, you're just going to turn around and start asking questions like "what is it about Papelbon's fastball that makes it different from (pitcher X)'s fastball?" Why not just analyze with the factors, and don't worry so much about the classification? I admit, it will be fun to say whose fastball is the fastest, whose curve breaks the most, etc., but considering pitchers intentionally and unintentionally don't throw every fastball the same, or every curve the same, classification is bound to be misleading anyway...

#12 Kevin Youkulele


  • wishes Claude Makelele was a Red Sox


  • 1,335 posts

Posted 01 May 2008 - 04:37 PM

It's obvious why PitchFx would want an accurate pitch classifier (to put it on Gameday), but why does a serious analyst need it? Seems to me once you've gone to all the trouble of classifying a 5-dimensional distribution into a handful of discrete categories, you're just going to turn around and start asking questions like "what is it about Papelbon's fastball that makes it different from (pitcher X)'s fastball?" Why not just analyze with the factors, and don't worry so much about the classification? I admit, it will be fun to say whose fastball is the fastest, whose curve breaks the most, etc., but considering pitchers intentionally and unintentionally don't throw every fastball the same, or every curve the same, classification is bound to be misleading anyway...

How do you ask "what is it about Papelbon's fastball that makes it different from (pitcher X)'s fastball?" without knowing, accurately, which of Papelbon's pitches are fastballs? I started the topic because the gameday info was making some mistakes that look like they should be avoidable, at least from a post-processing standpoint.

"Why not just analyze with the factors, and don't worry so much about the classification?" Are you saying that you want to take the various observations from a set of pitches (some of which are curves, some of which are fastballs) and combine them? Or what? I don't really understand this. I think you need to separate pitch types in order to make sense of them.

Another thing is that right now, pitches are defined in qualitative ways--how you throw them and the general behavior of the pitch. This is a chance to come up with, as much as is possible, a robust definition of the various pitch types based on their behavior. (The dynamic profiling I discussed in the original post, and/or various clustering approaches mentioned in subsequent posts might be useful for assessing the degree of continuity vs discreteness in the cutter-slutter-slider-slurve-curve spectrum.)

#13 gator92

  • 178 posts

Posted 01 May 2008 - 04:52 PM

How do you ask "what is it about Papelbon's fastball that makes it different from (pitcher X)'s fastball?" without knowing, accurately, which of Papelbon's pitches are fastballs? I started the topic because the gameday info was making some mistakes that look like they should be avoidable, at least from a post-processing standpoint.

"Why not just analyze with the factors, and don't worry so much about the classification?" Are you saying that you want to take the various observations from a set of pitches (some of which are curves, some of which are fastballs) and combine them? Or what? I don't really understand this. I think you need to separate pitch types in order to make sense of them.

Another thing is that right now, pitches are defined in qualitative ways--how you throw them and the general behavior of the pitch. This is a chance to come up with, as much as is possible, a robust definition of the various pitch types based on their behavior. (The dynamic profiling I discussed in the original post, and/or various clustering approaches mentioned in subsequent posts might be useful for assessing the degree of continuity vs discreteness in the cutter-slutter-slider-slurve-curve spectrum.)

I think we agree here. When you use the names we've become accustomed to as the categories, you run the risk of tripping over those associations. So if one pitcher's cutter looks like another's slider, why not consider them to be the same? I guess there's some value to associating the outcome (in terms of path) with the pitcher's intent, so maybe you can't entirely do without it...

I'm coming at this from the perspective of having done some analysis of batted balls. I can get a vertical launch angle for a hit, and a speed off the bat, and maybe some combination of those two can be called a "line drive", and some other combination can be called a "fliner", but breaking things up that way doesn't really help; I'd prefer to just consider regions of space like "hits with VLA 10-20 and SOB 90-100mph".

But maybe the linking to intent is the key point... there isn't really any corresponding intent with batted balls, or at least not any intent that is a selection like pitches are...

#14 Eric Van


  • Kid-tested, mother-approved


  • 10,901 posts

Posted 01 May 2008 - 05:37 PM

Should anything else be considered?

Too busy this moment to give a detailed reply, more later.

Spin axis and effective RPM (RPM in the plane perpendicular to the path of the ball) can be figured from the raw data and I have found them to be clearly more useful than break parameters. I do 90% of my classification from the spin charts, only checking break occasionally.

Elsewhere on this board I proposed a whole system for breaking pitches based on vertical break. That needs to be revised and re-proposed.

Many pitchers (consciously or unconsciously) throw a single pitch two or more different ways (I can make my two-seamer sink or run; I can make my curve sweep or go 12 to 6). So we want two levels of classification, one that is that fine and one that lumps them together by the catcher's probable sign.

Knowing a pitcher's repertoire is handiest for separating changes and splitters.

After I get my Mom from the airport I'll read this whole thread and make a detailed proposal!

BTW, mr guido, MLB's algorithm is pretty bad. It uses a neural net and someone has placed too much confidence in that methodology! Certain pitches by certain pitchers it nails, others it is very clueless about (most of Buchholz's changeups, e.g.).

#15 mr guido

  • 3,104 posts

Posted 01 May 2008 - 06:30 PM

BTW, mr guido, MLB's algorithm is pretty bad. It uses a neural net and someone has placed too much confidence in that methodology! Certain pitches by certain pitchers it nails, others it is very clueless about (most of Buchholz's changeups, e.g.).

Certain types of neural networks should be nearly ideal for this type of problem, so it's my guess that the training data is the issue. It's likely that they didn't train up a separate classifier for each pitcher, but rather used data across all pitchers to make one classifier. Of course this begs the question, if Wakefield's fastball is indistinguishable from someone else's (really slow) change, why call it a fastball? Does it really belong in the same category as a 95 mph heater from Felix Hernandez? (As gator argues above)

Any algorithm is only as good as the data you give it. So at the very least you'll need some sort of collaborative pitch marking effort to get some amount of meaningful data.

BTW, how do we know MLB uses a neural network, and do we know what kind? Is this published somewhere?

#16 OttoC


  • Mr. Excel


  • 6,446 posts

Posted 01 May 2008 - 08:47 PM

...Of course this begs the question, if Wakefield's fastball is indistinguishable from someone else's (really slow) change, why call it a fastball? Does it really belong in the same category as a 95 mph heater from Felix Hernandez? (As gator argues above)

Aren't we dealing here with relative speeds within a given pitcher's repertoire? One pitcher has a 70 mph change of pace and an 82 mph fastball and another pitcher has an 82 mph change of pace and a 94 mph fastball. Admittedly, there is more time to react to the 82 mph fastball but after a steady diet of junk balls, that fastball will look pretty quick.

#17 The Belly Itcher

  • 2,539 posts

Posted 01 May 2008 - 09:13 PM

Heh, I never look at the main board nowadays. I'd consider using a maximum likelihood estimate approach at first. But I can tell you that any type of analysis that requires a (re-)expression of r variables onto an r-dimensional space (e.g., PCA), followed by a categorization of some sort (e.g., cluster analyses, etc.), will likely require parameters (e.g., categorization borders) that are specific to a given player. One may be able to generalize these parameters across similar players (e.g., knuckleballers), though, which makes for an interesting question in and of itself. Hell, this is a pretty simple analysis to do in principle. I could do it very easily if one has the data available as an N x r matrix, with one row for each of the N pitches and one column for each of the r variables you are measuring (x coordinate, y coordinate, speed, etc.) during a given pitch.

Edited by The Belly Itcher, 01 May 2008 - 09:35 PM.


#18 Kevin Youkulele


  • wishes Claude Makelele was a Red Sox


  • 1,335 posts

Posted 01 May 2008 - 09:27 PM

here's some pitchF/X stuff for Wakey, just for the hell of it:
Posted Image
It's not distinguishing his "fastball" at all.

#19 Jnai


  • is not worried about sex with goats


  • 8,169 posts

Posted 01 May 2008 - 09:37 PM

Since there is obviously some interest in this, I am going to code a quick script and attach it to my pitchfx tool sometime tonight / tomorrow morning that takes the XML data and turns it into tab delimited data that you can copy/paste into your favorite stats/spreadsheet/porn-downloading program.

I use SPSS Two-Step Cluster analysis for this stuff usually, but I'm sure some stats guru really wants a shot at this problem and can solve it better.

If anyone really knows the formulas and wants to help write a script that does the analysis and then will actually plot the cluster centroids (to overlay on the graphs), that would be amazing. Some of you sound up to it!


Also: I know it wasn't the point (the point was that the thing doesn't even pick up his fastball), but Wakefield's PitchFX data is kinda iffy, because the trajectory smoothing that they use might not be so good at handling it, according to EV and Alan Nathan, who both commented on this someplace in a backwash post.

Edit: I fail at linking.

Edited by Jnai, 01 May 2008 - 11:28 PM.


#20 Noah

  • 3,142 posts

Posted 01 May 2008 - 09:51 PM

Not picking up on the curveball either. I assume those three in the lower right are curves ... but what are those other three that are slower than all the other knuckleballs? Soft knucklers or something? Hanging curves?

#21 Kevin Youkulele


  • wishes Claude Makelele was a Red Sox


  • 1,335 posts

Posted 01 May 2008 - 10:04 PM

Since there is obviously some interest in this, I am going to code a quick script and attach it to my PitchFX tool sometime tonight / tomorrow morning that takes the XML data and turns it into tab delimited data that you can copy/paste into your favorite stats/spreadsheet/porn-downloading program.

I use SPSS Two-Step Cluster analysis for this stuff usually, but I'm sure some stats guru really wants a shot at this problem and can solve it better.

If anyone really knows the formulas and wants to help write a script that does the analysis and then will actually plot the cluster centroids (to overlay on the graphs), that would be amazing. Some of you sound up to it!
Also: I know it wasn't the point (the point was that the thing doesn't even pick up his fastball), but Wakefield's PitchFX data is kinda iffy, because the trajectory smoothing that they use might not be so good at handling it, according to EV and Alan Nathan, who both commented on this someplace in a backwash post.

Great... this might turn into another way to waste more of my time thinking about baseball.

If it's not too much trouble, it would be great if your tab-delimited files included the derived parameters that are used for plotting but not necessarily explicitly present in the raw xml, like spin and rpm.

#22 alannathan

  • 202 posts

Posted 01 May 2008 - 11:05 PM

Certain types of neural networks should be nearly ideal for this type of problem, so it's my guess that the training data is the issue. It's likely that they didn't train up a separate classifier for each pitcher, but rather used data across all pitchers to make one classifier. Of course this begs the question, if Wakefield's fastball is indistinguishable from someone else's (really slow) change, why call it a fastball? Does it really belong in the same category as a 95 mph heater from Felix Hernandez? (As gator argues above)

Any algorithm is only as good as the data you give it. So at the very least you'll need some sort of collaborative pitch marking effort to get some amount of meaningful data.

BTW, how do we know MLB uses a neural network, and do we know what kind? Is this published somewhere?


Ross Paul, the MLBAM guy who created the neural network algorithm for pitch classification, will be at the upcoming PITCHf/x summit and will tell the assembled masses all about it. The plan is certainly to accumulate training data for individual pitchers. Obviously there are startup issues with that but hopefully that will improve as more data become available.

#23 alannathan

  • 202 posts

Posted 01 May 2008 - 11:13 PM

Great... this might turn into another way to waste more of my time thinking about baseball.

If it's not too much trouble, it would be great if your tab-delimited files included the derived parameters that are used for plotting but not necessarily explicitly present in the raw xml, like spin and rpm.


One word of caution: the spin direction (called "spin" by jnai) is well determined from the data (although not in the xml file, it is easily computed). It is related to the direction of the break. However, the spin magnitude (called "rpm" by jnai) is only approximately determined. I have written about this extensively (see my web site). What the data determine very well is the so-called lift coefficient, which is mainly determined by the total break. The relationship between the lift coefficient and the spin magnitude (rpm) is not perfectly well established.

One final word: looking at horizontal and vertical break is completely equivalent to looking at spin and rpm (again, using jnai's notation). That is to say, there is no new information contained in spin and rpm that is not already contained in horizontal and vertical break. The former tell us what the pitcher does to the ball whereas the latter tell us the effect on the actual trajectory.
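The equivalence is easy to see numerically: the break pair is just a 2-D vector, and rewriting it as a length and an angle loses nothing. The sketch below converts `pfx_x` and `pfx_z` to polar form and back. It only computes the direction of the break itself, measured counterclockwise from the +x axis; mapping that angle to a spin axis needs a convention-dependent offset, and mapping the total break to an actual rpm needs the lift-coefficient relationship described above, neither of which is attempted here. The function names are made up for illustration.

```python
import math


def break_to_polar(pfx_x, pfx_z):
    """Convert horizontal/vertical break to (total break, direction in degrees)."""
    total = math.hypot(pfx_x, pfx_z)
    angle = math.degrees(math.atan2(pfx_z, pfx_x)) % 360.0
    return total, angle


def polar_to_break(total, angle_deg):
    """Inverse of break_to_polar: recover (pfx_x, pfx_z)."""
    a = math.radians(angle_deg)
    return total * math.cos(a), total * math.sin(a)


# Values from the <pitch> element quoted earlier in the thread.
total, angle = break_to_polar(-4.136, 11.464)
x, z = polar_to_break(total, angle)
print(f"total break {total:.2f}, direction {angle:.1f} deg, back to ({x:.3f}, {z:.3f})")
```

Because the round trip recovers the original pair, clustering on one representation versus the other differs only in how distances between pitches are measured, not in the information available.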

#24 StupendousMan

  • 380 posts

Posted 01 May 2008 - 11:17 PM

There is an aspect to pitch classification which has not yet been mentioned (as far as I know): what did the pitcher INTEND to throw?

We look at the flight of the ball and try to draw conclusions: "Oh, that must have been a 4-seam fastball." But how often does our inference match the actual grip? How often are we fooled by different release points, or subtle changes in grip, or other factors?

The "automatic" approach to classification simply looks at the results and tries to lump together pitches with similar trajectories.

The "manual" approach might involve watching the catcher's signs for each pitch, and giving each pitch a label based on that sign.

The best way to do this, of course, would be to arrange for a special throwing session with a pitcher, in which one asks him to throw 10 4-seam fastballs, then 10 curves, etc. But I don't think that's going to happen.

Given the number of people watching games with DVRs, though, it seems to me that it might be possible to compare the result of each pitch (as shown by f/x) with the sign given by the catcher. Could someone try this?

#25 Kevin Youkulele


  • wishes Claude Makelele was a Red Sox


  • 1,335 posts

Posted 01 May 2008 - 11:26 PM

I grabbed the data from Matsuzaka's start and ran it through a hierarchical cluster analysis. It seemed to do OK, although it strangely fragmented the cutters. Here is a graphical display of the output, where yellow is for high values, black is for average values, and blue is for low values within the range of each parameter (for spin, black is 180). This would certainly need some tweaking, but for a "ram the data down its throat and see what happens" trial, it's not too bad. It seems to be picking up some differences within the fastballs with respect to V-break, and I'm not sure how relevant that is.
Posted Image

#26 Reverend


  • B.P.I.W.


  • 14,586 posts

Posted 01 May 2008 - 11:40 PM

here's some pitchF/X stuff for Wakey, just for the hell of it:
Posted Image
It's not distinguishing his "fastball" at all.

That is one killer curve-ball though.

#27 Jnai


  • is not worried about sex with goats


  • 8,169 posts

Posted 01 May 2008 - 11:52 PM

As promised, there is now a "get tabled data" link that will parse the XML into an HTML table. Just spider as usual and it is under the pitch types selector.

If people actually want it tab delimited, I can do that as well, but you can simply copy and paste an HTML table into most software without too much trouble.

Edited by Jnai, 01 May 2008 - 11:53 PM.


#28 Eric Van


  • Kid-tested, mother-approved


  • 10,901 posts

Posted 02 May 2008 - 03:40 AM

One final word: looking at horizontal and vertical break is completely equivalent to looking at spin and rpm (again, using jnai's notation). That is to say, there is no new information contained in spin and rpm that is not already contained in horizontal and vertical break. The former tell us what the pitcher does to the ball whereas the latter tell us the effect on the actual trajectory.

Am I correct in thinking that in fact there is less information in the spin parameters than in the break parameters, and that this is desirable?

The observed break has to be a function of spin and velocity, right? Two pitches released identically in terms of spin but thrown at different speeds will break slightly differently. The result is that the break charts are basically the spin charts with added noise. (The velocity info is crucial by itself, mind you, but it also adds itself to the spin information to create the break information and in that role it's noise.)

The other reason I like the spin charts better is that the pitches divide more cleanly along the axes. In a break chart, the line between the 4-seam and 2-seam clusters is usually diagonal. In the spin chart, it's usually vertical; 2-seamers from a RHP have a higher axis and in most cases the gap between the two clusters is very easy to identify on the spin chart (especially because we've taken out the noise added by velocity variation).

BTW, am I the only person using bubble charts, where you get velocity in there as a third variable? For the third variable, I find a dividing line between fast and off-speed pitches, then exaggerate the distance to that line, so that the fast pitches are various sizes of blue and off-speed pitches various sizes of white, but nothing is too tiny to notice.

#29 mr guido

  • 3,104 posts

Posted 02 May 2008 - 01:10 PM

Done some more reading up on this stuff. Here's a Mike Fast article in which he claims to establish that the MLBAM algorithm is at 75% accuracy. His methodology and conclusions are a bit shady as it's a small sample (~700 pitches), he grades the algorithm against his own judgment (which he claims is 99% accurate, though I'd love to see inter-rater reliability numbers on that one), and he doesn't seem to have ever heard of a kappa statistic. But it's a start.

Anyhow, the 75% works out to a kappa of .55 (if you assume the baseline is to classify every pitch as a fastball), which is a reasonably decent number. Apparently MLBAM has since started limiting the classifier to only produce pitches that are known to be in a pitcher's repertoire, which will only push the kappa higher. All in all it seems that the algorithm is doing reasonably well, and could be improved substantially more if they put more effort into distinguishing between fastballs and changeups (which is pretty heavily dependent on the individual pitcher). I imagine there will be a more rigorous analysis of the results presented at the pitch f/x summit by Ross Paul.
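For readers who have not met the kappa statistic: it rescales accuracy so that 0 means "no better than the baseline" and 1 means perfect. A minimal sketch of the arithmetic follows; the function names are made up for illustration, and the baseline figure is back-solved from the 75% and .55 quoted above rather than taken from Fast's article.

```python
def kappa(observed, expected):
    """Agreement beyond the baseline, as a share of the maximum possible."""
    return (observed - expected) / (1.0 - expected)

def implied_baseline(observed, k):
    """Solve the kappa formula for the baseline accuracy."""
    return (observed - k) / (1.0 - k)

# 75% accuracy and a .55 kappa together imply a baseline of about 44%,
# i.e. "call everything a fastball" would be right about 44% of the time.
baseline = implied_baseline(0.75, 0.55)
print(round(baseline, 3))
print(round(kappa(0.75, baseline), 2))
```

The same 75% accuracy would be worth less on a staff that throws 70% fastballs, which is why kappa is a fairer yardstick than raw accuracy.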

Fast provides a link to his spreadsheets if anyone wants to try building their own classifier on this stuff.

#30 Sprowl


  • mikey lowell of the sandbox


  • 16,491 posts

Posted 02 May 2008 - 01:24 PM

Also: I know it wasn't the point (the point was that the thing doesn't even pick up his fastball), but Wakefield's PitchFX data is kinda iffy, because the trajectory smoothing that they use might not be so good at handling it, according to EV and Alan Nathan, who both commented on this someplace in a backwash post.

Most pitches have a single trajectory, generated by the velocity and the spin applied by the pitcher. It makes sense to smooth those trajectories. The knuckleball could easily have multiple trajectories, since the pressure differentials caused by the passage of air over the seams will change at different points in the ball's flight. It might be necessary to break down the knuckleball's flight into several segments, each one of which could generate a different trajectory. I'm not suggesting that anybody put in the effort to do that: it would probably take at least 4 different sets of cameras to track the ball on its way to the plate, and it's just not worth the labor when so few pitchers throw knuckleballs.

Do any other pitches have multiple trajectories? In the 'all things baseball' thread, LoweTek wrote about his subjective experience as a batter seeing the cutter hook just before it got to the plate, in contrast to a slider whose movement can be picked up earlier in the arc. The tumbling action of a splitter seems to be somewhat idiosyncratic too, perhaps attributable to its lower RPM.

Ideally we would be able to analyze pitch choice and trajectory from at least 3 different perspectives: the objective trajectory of the ball, where PitchFX seems to do a pretty good job; the subjective intention of the pitcher (and catcher) in selecting a pitch type and applying the necessary spin and force to produce it (sometimes incorrectly, as in a hanging slider); and the subjective experience of the batter in identifying the pitch type (often incorrectly) and deciding whether and how to swing.

As promised, there is now a "get tabled data" link that will parse the XML into an HTML table. Just spider as usual and it is under the pitch types selector.

Dan, that's excellent. It makes life much easier for us Mactards whose Excel won't open XML properly. Many thanks!

The other reason I like the spin charts better is that the pitches divide more cleanly along the axes. In a break chart, the line between the 4-seam and 2-seam clusters is usually diagonal. In the spin chart, it's usually vertical; 2-seamers from a RHP have a higher axis and in most cases the gap between the two clusters is very easy to identify on the spin chart (especially because we've taken out the noise added by velocity variation).

A clean visual division of pitches, judged subjectively by the viewer, seems to me to be what the fan wants during the course of the game in order to evaluate pitch sequence and effectiveness. I keep returning to the speed * horizontal break because of the clear separation it produces for most pitchers. The primary exceptions seem to be young pitchers with iffy control but excellent movement who really don't know exactly what their pitches are going to do on the way to the plate (eg, Masterson and McGowan).

As Jnai pointed out in the Lester thread, the speed * spin direction chart does a very good job of separating Lester's 4-seamer and 2-seamer on one axis alone. The same separation is visible in the speed * horizontal break, but the gap is diagonal rather than vertical -- that is, Lester's 2-seamer is a few mph slower than the 4-seamer, and the slower it is, the more it moves horizontally. For the fan 'eyeballing' the chart, however, both chart types make the necessary distinctions, and speed * horizontal break can sometimes work better for distinguishing the changeup from other pitches, as in Lester's April 23 start when he was throwing 5 distinct pitches, including the changeups that he mostly discarded in his April 29 start.

#31 mr guido

  • 3,104 posts

Posted 02 May 2008 - 02:05 PM

Fast provides a link to his spreadsheets if anyone wants to try building their own classifier on this stuff.

Well, shoot.

I just tossed Fast's spreadsheet into Weka and did an evaluation of a neural network using 10-way cross validation. Still suffers from small sample size issues (here we have just 9 pitchers lumped into one model, instead of a couple hundred that MLBAM must contend with), but...

Correctly Classified Instances         690      96.7742 %
Incorrectly Classified Instances        23       3.2258 %
Kappa statistic                          0.953
Total Number of Instances              713

=== Detailed Accuracy By Class ===

TP Rate   FP Rate   Precision   Recall   F-Measure   Class
1         0.019     0.98        1        0.99        FAST
0.85      0.002     0.992       0.85     0.915       CHANGE
0.985     0.012     0.893       0.985    0.937       SLIDER
1         0.005     0.961       1        0.98        CURVE
0.988     0.006     0.952       0.988    0.97        CUTTER

=== Confusion Matrix ===

   a   b   c   d   e   <-- classified as
 350   0   0   0   0 |   a = FAST
   6 119   8   3   4 |   b = CHANGE
   0   1  67   0   0 |   c = SLIDER
   0   0   0  74   0 |   d = CURVE
   1   0   0   0  80 |   e = CUTTER
That ain't half bad. Guess we'd need more data to see why MLBAM is sucking.
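If you want to check the summary numbers against the confusion matrix by hand, this is a minimal sketch of the calculation. The matrix is copied from the Weka output above (rows are Fast's labels, columns are the network's guesses); the function name is made up for illustration.

```python
def accuracy_and_kappa(matrix):
    """Return (accuracy, Cohen's kappa) for a square confusion matrix."""
    n = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / n
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Agreement expected by chance, given how often each label is used.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return observed, (observed - expected) / (1.0 - expected)

confusion = [
    [350,   0,  0,  0,  0],  # FAST
    [  6, 119,  8,  3,  4],  # CHANGE
    [  0,   1, 67,  0,  0],  # SLIDER
    [  0,   0,  0, 74,  0],  # CURVE
    [  1,   0,  0,  0, 80],  # CUTTER
]
acc, k = accuracy_and_kappa(confusion)
print(round(acc, 4), round(k, 3))
```

Note that the baseline here is chance agreement given both sets of label frequencies, not the "everything is a fastball" baseline used for the MLBAM figure, so the two kappas are not strictly like-for-like.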

#32 mr guido

  • 3,104 posts

Posted 05 May 2008 - 10:10 AM

Whoa guys, try not to stampede down my door in all the excitement over the new pitch classifier.

If anyone is actually interested in pursuing this on a larger scale, the first step will be to have some people classify a decent number of pitches by hand. Preferably we would have multiple people classifying the same pitches so we would know what the human rater reliability is as well.

#33 Tangotiger

  • 444 posts

Posted 05 May 2008 - 10:19 AM

One word of caution: the spin direction (called "spin" by jnai) is well determined from the data (although not in the xml file, it is easily computed). It is related to the direction of the break. However, the spin magnitude (called "rpm" by jnai) is only approximately determined. I have written about this extensively (see my web site). What the data determines very well is the so-called lift coefficient, which is mainly determined by the total break. The relationship between the lift coefficient and the spin magnitude (rpm) is not perfectly well established.

One final word: looking at horizontal and vertical break is completely equivalent to looking at spin and rpm (again, using jnai's notation). That is to say, there is no new information contained in spin and rpm that is not already contained in horizontal and vertical break. The former tell us what the pitcher does to the ball whereas the latter tell us the effect on the actual trajectory.


Alan, certainly each is derivable by the other. However, it is purely a question of perspective. Do pitchers, fans, and analysts want to know the direction of the spin, and the magnitude of the spin (the input) or do they want to know the effect (over and above the effect of a no-spin pitch with gravity)?
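The "each is derivable by the other" point is essentially the difference between Cartesian and polar coordinates. Here is a minimal sketch of the round trip. It assumes the direction of the movement stands in for the spin direction (up to a fixed rotation) and that the size of the movement stands in for the spin magnitude; the exact angle convention and the movement-to-rpm conversion are not spelled out in this thread, so both are placeholders.

```python
import math

def to_polar(dx, dz):
    """Horizontal and vertical movement -> (direction in degrees, magnitude)."""
    angle = math.degrees(math.atan2(dz, dx)) % 360.0
    return angle, math.hypot(dx, dz)

def to_cartesian(angle, magnitude):
    """Direction and magnitude -> horizontal and vertical movement."""
    rad = math.radians(angle)
    return magnitude * math.cos(rad), magnitude * math.sin(rad)

# Made-up movement values in inches.
angle, size = to_polar(-5.0, 9.8)
dx, dz = to_cartesian(angle, size)
print(round(angle, 1), round(size, 2), round(dx, 2), round(dz, 2))
```

Nothing is gained or lost in the conversion, which is the equivalence claim. The caveat quoted above still applies: going from the size of the movement to an actual rpm figure is only approximate.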

Your use of "break" here, like that of many other analysts, just seems to muddle the picture. If you ask someone about the "break" of a pitch, they do not mean how much a pitch breaks, after accounting for gravity, and over-and-above the spin-less pitch. I'm not sure anyone has that kind of baseline perspective to compare to!

At the very least, a different term ("movement" I'd propose) should be used from "break".

Anyway, going back to the issue, pitchers want to know both (they want to know what they need to do, to see a particular effect). Fans really only care about the break (movement plus gravity). Analysts? I don't know yet. But, I would prefer seeing both. Beer AND Nuts, as someone once said.

#34 Jnai


  • is not worried about sex with goats


  • 8,169 posts

Posted 05 May 2008 - 12:27 PM

Guido-

This is Jon Lester from yesterday:

Clearly there is something wrong there.

Although there are codes for the 2-Seam and 4-Seam fastball, Gameday calls everything a 2-Seam fastball. It's sometimes labeling his cutter a fastball even though we know that's not the case (some of those may be a slider, see the Lester thread). The curve and the change are done well, but curves and changes are very easy to ID for a guy like Lester.

So, about 1/3 of his pitches (the 4-Seam fastball) are incorrectly classified as 2-Seamers, and the Cutter is pretty iffy.

I'm not calling for anyone's head - and I think Gameday actually does a pretty good job, given what it has to work with - but I think it clearly can be improved upon. No?

Edited by Jnai, 05 May 2008 - 12:33 PM.


#35 mr guido

  • 3,104 posts

Posted 05 May 2008 - 01:08 PM

Jnai, maybe I wasn't clear above. Let me restate.

I looked at Mike Fast's brief analysis of the MLBAM classifier and saw that it achieved 75% accuracy and a .55 kappa, which is generally considered to be pretty good. Not perfect clearly, but good.

Then I took his data (which covered ~700 pitches, including one of Lester's starts) and trained my own classifier. It's a very simple multilayer neural network, trained via backpropagation, and evaluated with a 10-way cross-validation using Fast's best guess as a gold standard. (I can explain what this means if anyone is interested.)
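Since the offer to explain is on the table, here is a minimal sketch of what 10-way cross-validation does: split the labeled pitches into 10 folds, train on 9, test on the held-out fold, and repeat so every pitch is tested exactly once by a model that never saw it. To stay dependency-free the sketch swaps in a nearest-centroid classifier for the neural network, and the data is made up; neither is what was run in Weka.

```python
import math

def make_folds(n_items, k):
    """Assign item indices to k folds round-robin (no shuffling, for repeatability)."""
    return [list(range(start, n_items, k)) for start in range(k)]

def train_centroids(rows, labels):
    """'Training' here is just averaging the feature vectors of each class."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, v in enumerate(row):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}

def predict(centroids, row):
    """Pick the class whose average pitch is closest to this one."""
    return min(centroids, key=lambda lab: math.dist(centroids[lab], row))

def cross_validate(rows, labels, k=10):
    """Fraction of pitches classified correctly when held out of training."""
    correct = 0
    for held_out in make_folds(len(rows), k):
        held = set(held_out)
        train_idx = [i for i in range(len(rows)) if i not in held]
        model = train_centroids([rows[i] for i in train_idx],
                                [labels[i] for i in train_idx])
        correct += sum(predict(model, rows[i]) == labels[i] for i in held_out)
    return correct / len(rows)

# Made-up (speed mph, vertical break in) rows with hand-assigned labels.
rows = ([(92.0 + i % 3, 9.0 + i % 2) for i in range(10)]
        + [(76.0 + i % 3, -8.0 - i % 2) for i in range(10)])
labels = ["FAST"] * 10 + ["CURVE"] * 10
score = cross_validate(rows, labels, k=10)
print(score)
```

The "gold standard" part just means the held-out labels are treated as the truth, so the score can be no better than the labels themselves.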

Anyhow, my first-attempt classifier achieved roughly 97% accuracy (a kappa of .95). So, yes, I'd say that's an improvement, albeit on small data and I would expect it to get worse as more pitchers are included in the sample.

I just retrained it now to differentiate between 4-seam & 2-seam, and it still did well.

Correctly Classified Instances         691      96.9144 %
Incorrectly Classified Instances        22       3.0856 %
Kappa statistic                          0.9577
Total Number of Instances              713

=== Detailed Accuracy By Class ===

TP Rate   FP Rate   Precision   Recall   F-Measure   Class
0.994     0.018     0.978       0.994    0.986       FOURSEAM
0.912     0.001     0.969       0.912    0.939       TWOSEAM
0.893     0.003     0.984       0.893    0.936       CHANGE
0.971     0.008     0.93        0.971    0.95        SLIDER
1         0.005     0.961       1        0.98        CURVE
1         0.006     0.953       1        0.976       CUTTER

=== Confusion Matrix ===

   a   b   c   d   e   f   <-- classified as
 314   1   0   0   0   1 |   a = FOURSEAM
   3  31   0   0   0   0 |   b = TWOSEAM
   4   0 125   5   3   3 |   c = CHANGE
   0   0   2  66   0   0 |   d = SLIDER
   0   0   0   0  74   0 |   e = CURVE
   0   0   0   0   0  81 |   f = CUTTER

What I said today is that if people want to use a custom classifier (such as the one I just made) then we will need more training data covering a wider range of pitchers. In other words, someone will need to sit around and watch guys like Maddux, Chamberlain, Wakefield, Mussina, Moyer, and Daisuke... writing down the type of pitch they throw. Ideally we would have multiple people coding each pitch, so we would know how accurate our codes are.

To bring this back to the point I was making before I trained my own classifier, if MLBAM has hired anyone minimally competent, then this will be unnecessary. They should have plenty of resources to do a good job on this project. And not to detract from the clear genius that it must be obvious I possess, the tools I used to build my classifier are freely available and don't really require more than a decent working knowledge of artificial intelligence to apply. So if they don't suck completely at their jobs, they should be able to make irrelevant any effort we can pursue.

Edited by mr guido, 05 May 2008 - 01:18 PM.


#36 Noah

  • 3,142 posts

Posted 05 May 2008 - 01:25 PM

What I said today is that if people want to use a custom classifier (such as the one I just made) then we will need more training data covering a wider range of pitchers. In other words, someone will need to sit around and watch guys like Maddux, Chamberlain, Wakefield, Mussina, Moyer, and Daisuke... writing down the type of pitch they throw. Ideally we would have multiple people coding each pitch, so we would know how accurate our codes are.


Shouldn't it be possible to do the classification without all of this manual work? I mean, I think pretty much anyone can look at some of this pitch tracking data and pretty well identify different pitch clusters, even for a pitcher you've never seen. So I would think it wouldn't be all that difficult to program a computer to identify the local maxima in the plots.

#37 Jnai


  • is not worried about sex with goats


  • 8,169 posts

Posted 05 May 2008 - 01:28 PM

Guido-

Sorry, I had misread something. =)

I agree that with enough classification data, it would be a pretty simple problem to build a backprop net to classify most of these pitches. But I'm still pretty sure that once you started to incorporate enough pitchers you'd have enough variability in the data to run into the same problems that the Gameday classifier has. For example, Mike Mussina's fastballs will all look like changeups, Tim Wakefield's fastball will always be a changeup, Papelbon's changeup will look like a fastball, etc.

I think you will still need to store some basic information about each pitcher - and rather than have units for each pitcher (which will get tedious), maybe include units that represent a pitcher's top speed or bottom speed, or median break, etc, which can then factor in to how each pitch is classified?
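A minimal sketch of that idea: before any pitch goes into the classifier, tack on a couple of inputs that describe it relative to the pitcher who threw it. The dictionary keys and the two derived features are made up for illustration, and the numbers are invented.

```python
from statistics import median

def add_pitcher_context(pitches):
    """Add per-pitcher relative features to each pitch.

    pitches: list of dicts with hypothetical keys 'pitcher', 'speed', 'h_break'.
    """
    by_pitcher = {}
    for p in pitches:
        by_pitcher.setdefault(p["pitcher"], []).append(p)
    out = []
    for thrown in by_pitcher.values():
        # max() is sensitive to one stray reading; a high percentile may be safer.
        top_speed = max(p["speed"] for p in thrown)
        med_break = median(p["h_break"] for p in thrown)
        for p in thrown:
            out.append({
                **p,
                "speed_below_top": top_speed - p["speed"],
                "break_vs_median": p["h_break"] - med_break,
            })
    return out

sample = [
    {"pitcher": "A", "speed": 96.0, "h_break": -6.0},
    {"pitcher": "A", "speed": 88.0, "h_break": -8.0},
    {"pitcher": "B", "speed": 74.0, "h_break": -3.0},
    {"pitcher": "B", "speed": 66.0, "h_break": 1.0},
]
features = add_pitcher_context(sample)
for row in features:
    print(row)
```

In this toy data the 88 mph pitch from A and the 66 mph pitch from B both sit 8 mph below their pitcher's top speed, so a classifier fed the relative number has a chance of treating them as the same kind of pitch.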

I think the issue is that it is a trivially easy problem with some labeled data and a small number of pitchers, but becomes insanely difficult because of the lack of good training data and the huge variability between pitchers.

Dan

Edited by Jnai, 05 May 2008 - 01:30 PM.


#38 alannathan

  • 202 posts

Posted 05 May 2008 - 01:48 PM

Your use of "break" here, like that of many other analysts, just seems to muddle the picture. If you ask someone about the "break" of a pitch, they do not mean how much a pitch breaks, after accounting for gravity, and over-and-above the spin-less pitch. I'm not sure anyone has that kind of baseline perspective to compare to!

At the very least, a different term ("movement" I'd propose) should be used from "break".


The question of "break" vs. "movement" has been discussed many times, so I won't repeat the discussion here. Check one of the many PITCHf/x glossaries to learn about the distinction. Most of the analyses that I have seen are using pfx values (i.e., movement) as opposed to the Gameday-reported values of "break".

On a related matter (and I don't recall if I have posted this here before), the values of pfx_x and pfx_z reported in the PITCHf/x data files are not completely accurate. I have examined this in detail and have suggested an alternate method (see http://webusers.npl.uiuc.edu/~a-nathan/pob/LiftDrag-1.pdf). To my knowledge, only Mike Fast has taken up on my suggestion. I will try to persuade SV to adopt it at the upcoming summit.

#39 mr guido

  • 3,104 posts

Posted 05 May 2008 - 01:53 PM

Shouldn't it be possible to do the classification without all of this manual work? I mean, I think pretty much anyone can look at some of this pitch tracking data and pretty well identify different pitch clusters, even for a pitcher you've never seen. So I would think it wouldn't be all that difficult to program a computer to identify the local maxima in the plots.

Having actual training data is the only way to verify that your algorithm is really working. Sure we could make 100 different algorithms that all come up with their own idea of what they're seeing, but the whole point is to find one that matches up with human judgment.

I think the issue is that it is a trivially easy problem with some labeled data and a small number of pitchers, but becomes insanely difficult because of the lack of good training data and the huge variability between pitchers.

I think 'insanely difficult' would be an overstatement given that pitchers are pretty easily separable given the diversity of repertoires and this is a simple thing to train a machine to acknowledge, but I think we're getting closer to agreement on the relative value of starting a community project to improve on the existing classifiers. MLBAM is already achieving a .55 kappa with their first effort, and they are reportedly working on improving the algorithm. All they need to do is find a couple of people who want to work in baseball so badly that they'll take minimum wage to sit around & classify pitches by hand all day long. Building an SVM or neural network using this data is dead simple, and is likely to provide good results. Assuming you guys don't come back from the Pitch F/X Summit saying "Man the dude who works for MLBAM has his head up his ass", I would expect their classifier to work just fine going forward. Of course if you want to go back & re-classify the pitch f/x data from last year, that may be a reason to develop your own.

#40 Noah

  • 3,142 posts

Posted 05 May 2008 - 02:04 PM

Having actual training data is the only way to verify that your algorithm is really working. Sure we could make 100 different algorithms that all come up with their own idea of what they're seeing, but the whole point is to find one that matches up with human judgment.


I see the problem the same way that gator92 sees it, in that there are really two steps to this problem:

1) identifying the clusters of data that correspond to different pitches
2) labeling the individual clusters with pitch names

If you are trying to accomplish both of these steps at once in real time, it's difficult and you would need extensive training information on every pitcher, like you are saying. However, I would imagine that you could accomplish #1 with a single algorithm that is applicable to nearly every pitcher. And then once you've done that, you can decide if you even care to do #2 at all.

Of course, what I am proposing here would be completely post-processing. If you wanted real-time pitch classification, then forget about it.
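The two steps can be sketched separately, which is the point of splitting them. Below is a minimal post-processing sketch: step 1 is a plain k-means with a deterministic farthest-point start, and step 2 is a deliberately crude naming rule (fastest cluster is the fastball, everything else gets a placeholder). The data, the number of clusters, and the naming rule are all assumptions; real use would also want to scale the columns first.

```python
import math

def farthest_point_seeds(points, k):
    """Pick k starting centers: the first point, then repeatedly the point farthest from all seeds."""
    seeds = [points[0]]
    while len(seeds) < k:
        seeds.append(max(points, key=lambda p: min(math.dist(p, s) for s in seeds)))
    return seeds

def kmeans(points, k, rounds=20):
    """Step 1: group pitches without any labels."""
    centers = farthest_point_seeds(points, k)
    assignment = []
    for _ in range(rounds):
        assignment = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                      for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return assignment, centers

def name_clusters(centers):
    """Step 2 (optional): call the fastest cluster the fastball, leave the rest unnamed."""
    fastest = max(range(len(centers)), key=lambda c: centers[c][0])
    return {c: ("fastball" if c == fastest else f"offspeed_{c}")
            for c in range(len(centers))}

# Made-up (speed mph, horizontal break in) pairs for one pitcher.
points = [(93.0, -5.0), (92.5, -4.5), (93.5, -5.5),
          (81.0, 4.0), (80.5, 4.5),
          (76.0, 6.5), (75.5, 7.0)]
assignment, centers = kmeans(points, 3)
names = name_clusters(centers)
print([names[a] for a in assignment])
```

Step 1 needs no training data at all; step 2 is where a human, or a labeled set like Fast's, would come in if you care about the names.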



