Using network theory to measure and predict MLB player and team experience and performance

ProfPaul

New Member
Jan 19, 2016
13
Hi all - I'm new to this forum but not to analyzing professional athlete performance. I'm a college professor who does research applying network theory to various domains. (Network theory is a branch of math that investigates situations that can be modeled as a set of points connected by links.) Most of my recent work has been with MLB data wherein I measure the links that are made between players when those players appear together on the starting roster of a regular season game. I'm writing because, although I know the math and application of it, I don't know nearly as much about the details of MLB and would like to get the insights of people like you who are much more knowledgeable about current measures and analytical tools related to player experience and performance.

The underlying premise of my research is that people who practice and work together on a task learn task-relevant tactics, techniques, and procedures that they don't have (or are worse at) but that other "co-task-performers" do have (or are better at). In the MLB example, a player with lesser ability at some part of the game will learn to improve that ability when they become teammates with (or possibly even play *against*) another player who is better than they are in that particular ability. (I generally believe this to be true for MLB based on player quotes I have found in the popular press suggesting that players do go to each other for performance improvement tips.) By tracking and measuring over time the connections between MLB player teammates, I try to calculate the correlation and possible causality between teammate connections and subsequent experience and performance.
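For readers who want to picture the link-building step, here is a minimal Python sketch. It assumes a hypothetical list of (game date, team, starters) records; the player IDs, the function names `build_links` and `teammates_of`, and the sample data are all invented for illustration and are not the actual tooling behind this research.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical input: one (game_date, team, starters) record per team per game.
# Player IDs and dates are placeholders, not real lineups.
games = [
    ("1990-04-09", "TEAM1", ["A", "B", "C"]),
    ("1990-04-10", "TEAM1", ["A", "C", "D"]),
]

def build_links(games):
    """Map each unordered teammate pair to the dates they started together."""
    links = defaultdict(list)
    for game_date, _team, starters in games:
        # sorted() gives every pair a canonical (low, high) ordering
        for a, b in combinations(sorted(set(starters)), 2):
            links[(a, b)].append(game_date)
    return links

def teammates_of(links, player):
    """Return the set of distinct players linked to `player`."""
    mates = set()
    for a, b in links:
        if a == player:
            mates.add(b)
        elif b == player:
            mates.add(a)
    return mates

links = build_links(games)
```

Calling `teammates_of(links, "A")` on this toy data returns the three other placeholder players, and `links[("A", "C")]` holds both dates on which A and C started together.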

With this short background about my work, what words of advice would you have to help me focus my research in the most useful direction? And is this even too esoteric of a topic to be of interest to anyone except me (and those I harangue at parties)?

Thanks very much for any help you can give!

Paul Beckman
 

JimBoSox9

will you be my friend?
SoSH Member
Nov 1, 2005
16,677
Mid-surburbia
Interesting stuff, glad you found us. I don't know jack or shit about most of what you're talking about, but as a person who reads things my question is what's your key target at the moment? I think I'm clear enough in the premise to say that your biggest problem is going to be noise. Baseball-focused stats nerds have a lot of experience trying to tease out one causation variable out of a lot of factors that can play up on any given day, and it doesn't go well. What discrete near-term obstacle or problem can we solve for you?

It seems logical to some extent that the players who are superduper awesome at a certain skill would be more likely to impact their teammates? We can certainly produce an exhaustive list of 'skill outliers', say Rickey Henderson or Maury Wills for stolen bases, or Wade Boggs for walks, that could be targets most likely to produce a verifiable effect for you.

All this is directly out my ass, of course.
 

StupendousMan

Member
SoSH Member
Jul 20, 2005
1,927
Welcome, ProfPaul! The world (and this board) need all the university-professors-interested-in-sports it can find :)

Let me make a few recommendations.

- it will help if you pick traits which have relatively strong correlations per player from year-to-year. That is, pick a quantity which has nearly the same rate every year for a given player. For example, OBP, or walks. That way, you can more clearly separate the difference between "this player's OBP changed greatly because he had a high-OBP teammate" and "this player's OBP changed greatly due to random chance."

[Damn it, Jimbo!]

- in the same vein, some quantity with many trials per year would be good -- AB or putouts, perhaps, but not triples

- I think it might help to pick 2 or 3 possibilities for hitters, and 2 or 3 possibilities for pitchers, and follow both tracks. I have a suspicion that pitchers might influence each other more than hitters, but that's just a guess. Even if you can't find a really strong connection for any one trait, you might find that one set of players has more influence than the others.

- perhaps you might look at infield defensive statistics: what happens when a new first-baseman arrives, for example? The defensive infield strikes me as a place in which players might most clearly influence each others' actions on the field.
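One way to check the first recommendation (year-to-year stability of a stat) is sketched below in Python. It assumes a hypothetical nested dictionary of the form `{player: {year: value}}`; the names `pearson` and `year_to_year` and the input layout are assumptions made for illustration only.

```python
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation; raises ZeroDivisionError if either list is constant."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def year_to_year(stat_by_player_year):
    """Correlate each player's value in year N with the same player's value in year N+1."""
    this_year, next_year = [], []
    for seasons in stat_by_player_year.values():
        for year, value in seasons.items():
            if year + 1 in seasons:
                this_year.append(value)
                next_year.append(seasons[year + 1])
    return pearson(this_year, next_year)
```

A stat whose `year_to_year` value is close to 1 is a better candidate for detecting a teammate effect than one whose value is close to 0, because less of its season-to-season movement is noise.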

Please feel free to share your thoughts with us as you develop them. There may be times when you realize that there are 10 things you'd like to check, but it will take an hour to gather the data for each one. There are a bunch of people on this board with little to do and plenty of time -- we might be able to lend you a hand every now and then.

Again, nice to meet you!
 

dbn

Member
SoSH Member
Feb 10, 2007
7,785
La Mancha.
Not much to add other than to ask if you use SQL? There exist amazing databases of baseball records, and using SQL to access specific information is a powerful tool.

Welcome, good luck, and keep asking questions and giving us updates if/when you find interesting things.
 

nothumb

Member
SoSH Member
Jul 27, 2006
7,065
yammer's favorite poster
This is a really interesting question. I think one of the challenges will be distinguishing a true network effect from the impact of a specific player tandem or a general strategic tendency of a team. Since we often don't know what big-picture strategic adjustments teams are imposing or what skills they are explicitly emphasizing, it may be nearly impossible to say what is a network effect vs. a more direct effect of coaching.
 

Jnai

is not worried about sex with goats
SoSH Member
Sep 15, 2007
16,151
I am going to throw out a weird one: pitch selection - fastball or offspeed pitch - (by the pitcher) to start ABs.

Why this in particular? My guess is that in order to see any effect that rises anywhere beyond noise with this kind of analysis, you need something that is highly mutable and under direct control by the player. I've previously observed (but not published) that there are systematic team-wide approaches adopted for this choice. And, although it is heavily biased (offspeed pitches to start ABs are relatively rare), I think this is one of the purest examples of a simple choice in baseball. Almost every other measurable thing in baseball, apart from pitch selection, depends very strongly on some other contextual factor.
 

ProfPaul

New Member
Jan 19, 2016
13
First, welcome and this seems like very fascinating work. Now, this is a steroid sub-forum. So, a dope should probably move this to the MLB forum.

A link to work from the Professor.
Thanks! I think this shows my lack of knowledge in this area as I couldn't even create my post in the correct forum! How do I go about moving it to a better starting point? (And exactly what *would* be the best forum?)
 

ProfPaul

New Member
Jan 19, 2016
13
Interesting stuff, glad you found us. I don't know jack or shit about most of what you're talking about, but as a person who reads things my question is what's your key target at the moment? I think I'm clear enough in the premise to say that your biggest problem is going to be noise. Baseball-focused stats nerds have a lot of experience trying to tease out one causation variable out of a lot of factors that can play up on any given day, and it doesn't go well. What discrete near-term obstacle or problem can we solve for you?

It seems logical to some extent that the players who are superduper awesome at a certain skill would be more likely to impact their teammates? We can certainly produce an exhaustive list of 'skill outliers', say Rickey Henderson or Maury Wills for stolen bases, or Wade Boggs for walks, that could be targets most likely to produce a verifiable effect for you.

All this is directly out my ass, of course.


Thanks for your input - I'm happy to hear anything about my research from someone who knows MLB (because I really don't know that much about it).

Since my approach is pretty unique (I've not seen anyone else applying network theory to pro athletes on a "macro" scale), I'm not really sure what useful results I'll find. I've suggested in other forums that using network theory to understand team sports can be on either a "micro" level, meaning for example, tracking which players touch the ball during one possession in an NBA game (and almost all research in this area is at this level), or on a "macro" level, meaning for example, tracking which MLB players have taken the field with which other players over their careers. My research has always focused on macro-level data and analysis. This is unlikely to tell a manager what to do on any particular play or day, but it could tell some decision-maker which player to seek in a trade (as that player has made enough connections to other players that their performance is about to improve because of what they've learned by playing with those other players).

My near-term goal is to collect data about individual MLB player performance, as I already have all teammate connection data for all MLB players since 1914 (from Retrosheet) and have run them through my database tools. I can tell you all kinds of weird things that no one else even thinks about, like which player has played with the greatest number of different other players (Rickey Henderson played with his 387th different player on August 24, 2003) and even which player has played *against* the greatest number of different other players (Barry Bonds played against his 1909th and 1910th other player on September 10, 2007). So I have all the teammate connection data, but it would be great if I had more precise individual player performance data (down to game-day performance would be awesome) so I could do some type of time-series analysis and see quite precisely if performance increases (or decreases) lag behind connection changes. My "Holy Grail" moment will be when I can show that players who reach a specific number of connections (thereby playing with enough other players) have "added to their skill toolbox" the skills needed to move to the next higher level of performance.

Sorry this post is so long, but I've been doing this research for years and can talk about it for hours . . .

Anyway, let me know what you think.

Paul
 

StupendousMan

Member
SoSH Member
Jul 20, 2005
1,927
How do I go about moving it to a better starting point? (And exactly what *would* be the best forum?)
Well, fortunately for all of us, there are a bunch of clever fellows who run this place, and they have moved the post to the "MLB discussion" forum. I have no idea how they do it, or who they are, but I'm told that the following incantation will please them: "Thanks, Dopes!"

Now, stop dodging the questions, and start responding to the insightful comments made by all the posters in this thread :)
 

StupendousMan

Member
SoSH Member
Jul 20, 2005
1,927
My "Holy Grail" moment will be when I can show that players who reach a specific number of connections (thereby playing with enough other players) have "added to their skill toolbox" the skills needed to move to the next higher level of performance.
If I understand you correctly, I fear that you face a tough, confusing factor. Suppose that it _is_ true that a player will grow more skilled as he plays with more and more teammates, and suppose further that when he has reached some critical threshold in this number of teammates, he moves to a higher level of performance. There will very probably be a correlation between "number of seasons (or games) played in the majors" and "number of teammates", right? That means that there will likely be a related, critical number of seasons (or games) played before the player in question moves to a higher level of performance.

Okay, but how are you going to separate that effect from the simple effect that most new rookies get better with time, simply because they acquire experience against major-league competition?

In order to prove your hypothesis, you'll need to run a multi-factor regression which examines player performance as a function of (seasons or games played) and (number of teammates encountered). Because there is (very likely) a strong, strong correlation between those two different factors, it will be difficult to separate them -- no?
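To put a number on how hard that separation could be, a rough Python sketch follows. It computes the correlation between the two predictors and the resulting variance inflation factor, which for a regression with exactly two predictors is 1 / (1 - r^2). The per-season figures below are invented purely for illustration, and the function names are assumptions.

```python
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def variance_inflation(r):
    """VIF for each predictor in a two-predictor regression: 1 / (1 - r^2)."""
    return 1.0 / (1.0 - r * r)

# Invented example: one player's career after each of five seasons.
seasons_played = [1, 2, 3, 4, 5]
cumulative_teammates = [30, 55, 70, 100, 118]

r = pearson(seasons_played, cumulative_teammates)
vif = variance_inflation(r)
```

The larger `vif` is, the more the standard errors on both coefficients balloon, which is the practical meaning of "difficult to separate". A correlation of 0.9 between the two predictors already inflates the coefficient variance by a factor of about 5.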

Perhaps you might look at young players who are traded from one team (or league) to another. Such players will accrue new teammates much faster than the average young player who remains part of his original team. If your hypothesis is correct, these traded players may reach their breakthrough moments sooner than those who stay home.

Of course, then you have to control for the reasons that the player was traded in the first place ....

Good luck!
 

iayork

Member
SoSH Member
Apr 6, 2006
639
It's an interesting concept, though I worry that there are many confounding factors. For example, being highly linked must itself correlate with performance in some ways: The journeyman player who moves from team to team has a different skill level than the franchise player who's locked up on a 10-year contract. Similarly, older players are almost inevitably going to be more highly linked than younger players, and age correlates with performance (in an inverted-U shaped curve, which complicates things further).

Then there are players like Jonny Gomes, who are brought on to teams on short contracts, even though they are not intrinsically high performers, because it's felt that they bring up overall team quality. Not sure if that supports or confounds the concept.

Defense is something that anecdotally would work with the network model, but defensive metrics are too crude and noisy to rely on for this, I think.

Standard batting measures are also risky, because they depend on relatively small sample sizes as well as on luck. However, if you can get a dataset looking at batted ball location, that's something that has a larger sample size, is less luck dependent, and anecdotally can be influenced by teammates.

I like Jnai's suggestion about looking at pitch selection. There you've got enough numbers in a season to get away from noise, and there's good anecdotal evidence about pitchers sharing their repertoire and grips, etc, that there is a logical foundation for it. Note that while PITCHf/x records pitch type, its algorithms are far from accurate, so while there's a database it's not a highly reliable one. (The gold standard for pitch selection is Brooks Baseball, which Jnai may be familiar with. However, it's more time-consuming to get large datasets that way.)
 

ProfPaul

New Member
Jan 19, 2016
13
Welcome, ProfPaul! The world (and this board) need all the university-professors-interested-in-sports it can find :)

Let me make a few recommendations.

- it will help if you pick traits which have relatively strong correlations per player from year-to-year. That is, pick a quantity which has nearly the same rate every year for a given player. For example, OBP, or walks. That way, you can more clearly separate the difference between "this player's OBP changed greatly because he had a high-OBP teammate" and "this player's OBP changed greatly due to random chance."

[Damn it, Jimbo!]

- in the same vein, some quantity with many trials per year would be good -- AB or putouts, perhaps, but not triples

- I think it might help to pick 2 or 3 possibilities for hitters, and 2 or 3 possibilities for pitchers, and follow both tracks. I have a suspicion that pitchers might influence each other more than hitters, but that's just a guess. Even if you can't find a really strong connection for any one trait, you might find that one set of players has more influence than the others.

- perhaps you might look at infield defensive statistics: what happens when a new first-baseman arrives, for example? The defensive infield strikes me as a place in which players might most clearly influence each others' actions on the field.

Please feel free to share your thoughts with us as you develop them. There may be times when you realize that there are 10 things you'd like to check, but it will take an hour to gather the data for each one. There are a bunch of people on this board with little to do and plenty of time -- we might be able to lend you a hand every now and then.

Again, nice to meet you!

Thanks for your insights - this is exactly the type of information I can use but don't have. To give you an example, my first foray into associating teammate connections and player performance was a poster at a SABR meeting where I was trying to show a relationship between the number of teammates a player had and their RBI production. I think it was the second person who looked at my poster and said something like "Why did you choose RBI production as a measure of performance - that's a terrible measure!" So I learned I can do the math but I need a lot of help on applying it to MLB.

Lately I've turned to WAR as that seems to be a more highly-accepted measure of individual player performance. With my teammate connectivity data, I can find which players are associated with the greatest increase in WAR value for their teammates in the year following playing with that player. Right now I don't have data on individual player performance at a game-day level, so I really can't associate player connections to player performance very accurately. The upside is that I DO have game-day player connections so I can find correlations between those values and pretty much any player (or team - I forgot to mention that I have calculated team-level connectivity) performance value that I can get my hands on.

Anyway, THANKS for your insights - I'm learning a ton already!

Paul
 

iayork

Member
SoSH Member
Apr 6, 2006
639
Lately I've turned to WAR as that seems to be a more highly-accepted measure of individual player performance.
WAR probably isn't the right metric. It's too luck-dependent. That is, even assuming WAR accurately measures what a player contributed in a particular year, it does not tell you how much of that contribution was actually due to the player, and how much was due to pure chance. On top of that, WAR is calculated using offensive and defensive components, and almost everyone would agree that defensive metrics are very, very crude, meaning that WAR doesn't actually even accurately measure what a player did contribute in a particular year, let alone how much of that was luck.
 

ProfPaul

New Member
Jan 19, 2016
13
Not much to add other than to ask if you use SQL? There exist amazing databases of baseball records, and using SQL to access specific information is a powerful tool.

Welcome, good luck, and keep asking questions and giving us updates if/when you find interesting things.

Thanks for your question: I currently have all of my network connectivity data (downloaded from Retrosheet) in a MySQL database running on my laptop. However, for some of my query sets, I use a similar MySQL database on a gaming-type machine, as some of the intermediary datasets run into the tens of millions of rows. That is what you get if you think of every existing set of records for (Player1, Player2, GameDate) across every regular-season MLB game since 1914. (Data before that is incomplete enough that I don't bother using it.) These large dataset manipulations sometimes took several days to run on my laptop, so I ported them over to the gaming machine.
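As a rough illustration of that (Player1, Player2, GameDate) layout, here is a small Python sketch using the standard-library `sqlite3` module as a stand-in for MySQL. The table name `links`, the column names, the placeholder player IDs, and the `distinct_teammates` helper are all assumptions, not the actual schema.

```python
import sqlite3

# In-memory SQLite database standing in for the real MySQL instance.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE links (player1 TEXT, player2 TEXT, game_date TEXT)")

# Placeholder rows: one row per unordered pair of starters per game.
rows = [
    ("A", "B", "1990-04-09"),
    ("A", "C", "1990-04-09"),
    ("B", "C", "1990-04-09"),
    ("A", "B", "1990-04-10"),
    ("A", "D", "1990-04-10"),
    ("B", "D", "1990-04-10"),
]
conn.executemany("INSERT INTO links VALUES (?, ?, ?)", rows)

def distinct_teammates(conn, player):
    """Count the different players who share at least one row with `player`."""
    sql = """
        SELECT COUNT(DISTINCT CASE WHEN player1 = ? THEN player2 ELSE player1 END)
        FROM links
        WHERE player1 = ? OR player2 = ?
    """
    return conn.execute(sql, (player, player, player)).fetchone()[0]
```

Because each pair is stored once per game, the row count grows with the square of the lineup size, which is consistent with intermediate tables reaching tens of millions of rows over a century of games. On a real table of that size, indexes on both player columns would likely matter a great deal.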

Paul
 

EricFeczko

Member
SoSH Member
Apr 26, 2014
4,852
Huh. Nice to meet you. I've done (still doing) a bit of work with network theory, studying functional brain organization and macaque social groups. I've a few recommendations for starters.
-I'd recommend avoiding defensive measures for now. While there may be a qualitative relationship between talent hubs (i.e. talented players who enhance those around them) and defense, current defensive measures are severely unreliable. Since you want to validate your hypothesis, I'd stick with reliable measures.
-For pitching measures (e.g. FIP, or ERA-), you may want to consider including catchers within those networks and excluding other position players. Catchers often work with pitchers in calling games (along with the manager), and there may be a benefit to having a strong defensive catcher working with pitchers.
-have you considered weighting your edges based on the assumed talent of the players? If the hypothesis is that more talented players improve less-talented players' ability, then a weighted network may better capture such learning than an unweighted one. One way to weight them would be to take the difference in mean performance prior to the formation of the connection.
-if you haven't previously, I recommend taking a look at work done in wild chimps on social learning - using a flow network to model the propagation of learning (as measured by increased performance here). I think Hobaiter is the first author on one of the more recent ones.
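The edge-weighting idea above could be sketched in Python roughly as follows. This is only one reading of "difference in mean performance prior to the formation of the connection": the edge is directed from the stronger player to the weaker one and weighted by the gap. The function name, the argument layout, and the direction convention are assumptions.

```python
from statistics import mean

def weighted_edge(player_a, prior_a, player_b, prior_b):
    """Build a directed edge (teacher, learner, weight).

    prior_a / prior_b are lists of each player's performance values
    (for example a rate stat per season) from BEFORE the two became teammates.
    The weight is the absolute gap between the two prior means.
    """
    gap = mean(prior_a) - mean(prior_b)
    if gap >= 0:
        return (player_a, player_b, gap)
    return (player_b, player_a, -gap)
```

For example, `weighted_edge("A", [0.30, 0.32], "B", [0.25, 0.27])` yields an edge from A to B with a weight of about 0.05. Under the learning hypothesis, larger weights mark the connections where the weaker player has the most to gain.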

I'd be happy to send a PM if you want to talk about this or if you need/want help/collaboration.
 

ProfPaul

New Member
Jan 19, 2016
13
If I understand you correctly, I fear that you face a tough, confusing factor. Suppose that it _is_ true that a player will grow more skilled as he plays with more and more teammates, and suppose further that when he has reached some critical threshold in this number of teammates, he moves to a higher level of performance. There will very probably be a correlation between "number of seasons (or games) played in the majors" and "number of teammates", right? That means that there will likely be a related, critical number of seasons (or games) played before the player in question moves to a higher level of performance.

Okay, but how are you going to separate that effect from the simple effect that most new rookies get better with time, simply because they acquire experience against major-league competition?

In order to prove your hypothesis, you'll need to run a multi-factor regression which examines player performance as a function of (seasons or games played) and (number of teammates encountered). Because there is (very likely) a strong, strong correlation between those two different factors, it will be difficult to separate them -- no?

Perhaps you might look at young players who are traded from one team (or league) to another. Such players will accrue new teammates much faster than the average young player who remains part of his original team. If your hypothesis is correct, these traded players may reach their breakthrough moments sooner than those who stay home.

Of course, then you have to control for the reasons that the player was traded in the first place ....

Good luck!

You are correct - there is likely to be a related "performance improvement" effect purely based on age. I understand the standard theory is that players (in general, not the immediate superstars who are rare and not the norm) start their careers at a relatively low level, then gain "experience" that improves their performance, and if they last long enough, eventually decline in performance as their older bodies can no longer physically produce what they were capable of at an earlier age.

There are statistical methods that would allow me to separate out the effect of age and see only the effect of connections. My biggest problem in this area is that I work for a teaching school (which I much prefer over a research school, however) and so do not have graduate students who could take over this branch of my research.

And your point of "most new rookies get better with time, simply because they acquire experience against major-league competition" is exactly what I'm getting at. How does one more accurately and precisely measure "acquire experience against major-league competition?" I'm trying to provide a completely new measure of "experience" that is much better than the old standard of "X number of games" or "Y number of years". Time-based measures have two serious flaws: 1) they are symmetric, and 2) they are linear. "Symmetric" means that they imply the same added value to every player, meaning that every player who survives another year in MLB is credited with the same increase in experience value when we know that's not true. Some players gain much more experience in their year while others don't gain much. How do we measure that difference? I propose the value of "number of new players played with." "Linear" means that every year provides the same added value to a player as every other year when we know that's not true either. Players gain different amounts of experience in different years depending on some other factor. Again, I propose a connection-based measure as it is much more accurate and precise. My goal is now to show that it is *valuable*.
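To make the contrast with time-based measures concrete, here is a minimal Python sketch of the "number of new players played with" count per season. It assumes a hypothetical list of (year, teammate) pairs for a single player; the function name and the input format are illustrative only.

```python
def new_teammates_per_season(pairs):
    """Count first-time teammates in each season for one player.

    pairs: iterable of (year, teammate_id) tuples, repeats allowed.
    Returns {year: number of teammates never seen in an earlier season
    or earlier in the same season}.
    """
    seen = set()
    counts = {}
    for year, mate in sorted(pairs):
        counts.setdefault(year, 0)
        if mate not in seen:
            seen.add(mate)
            counts[year] += 1
    return counts
```

With placeholder data such as `[(2001, "a"), (2001, "b"), (2002, "a"), (2002, "c"), (2003, "a")]`, the result is `{2001: 2, 2002: 1, 2003: 0}`. A year-based measure would credit all three seasons equally, whereas this count differs by season and would differ from player to player.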

In any case, I have a colleague who specializes in time-based statistical analysis, but I haven't been able to get the time away from my current analyses to get her inputs on the issues you've pointed out.
 

ProfPaul

New Member
Jan 19, 2016
13
It's an interesting concept, though I worry that there are many confounding factors. For example, being highly linked must itself correlate with performance in some ways: The journeyman player who moves from team to team has a different skill level than the franchise player who's locked up on a 10-year contract. Similarly, older players are almost inevitably going to be more highly linked than younger players, and age correlates with performance (in an inverted-U shaped curve, which complicates things further).

Then there are players like Jonny Gomes, who are brought on to teams on short contracts, even though they are not intrinsically high performers, because it's felt that they bring up overall team quality. Not sure if that supports or confounds the concept.

Defense is something that anecdotally would work with the network model, but defensive metrics are too crude and noisy to rely on for this, I think.

Standard batting measures are also risky, because they depend on relatively small sample sizes as well as on luck. However, if you can get a dataset looking at batted ball location, that's something that has a larger sample size, is less luck dependent, and anecdotally can be influenced by teammates.

I like Jnai's suggestion about looking at pitch selection. There you've got enough numbers in a season to get away from noise, and there's good anecdotal evidence about pitchers sharing their repertoire and grips, etc, that there is a logical foundation for it. Note that while PITCHf/x records pitch type, its algorithms are far from accurate, so while there's a database it's not a highly reliable one. (The gold standard for pitch selection is Brooks Baseball, which Jnai may be familiar with. However, it's more time-consuming to get large datasets that way.)


You are right - there is definitely a confounding factor of age that impacts my analysis. One of my current goals is to tease out the connection-based effects from the age-based effects, but I've not been able to set aside the time to crunch this one. It's also not in my area of research so I'm counting on a colleague who knows this area much better than do I, but she's also busy with other university-related tasks.

One of the hardest parts of this whole research stream is getting access to quality useful data. Once I found Retrosheet, I knew I could create and manipulate the teammate connection results I needed. I've yet to find an open-source dataset containing the performance data I need that is also at the granularity I want. My connection data is down to the individual game although I don't typically look at that level of detail.

This forum has already been incredibly useful in pointing out to me what types of performance data are relevant. I guess one of my next steps has to be to find electronic datasets that are available in a form that I can eventually get into my connectivity database. I've found that working with datasets from different sources can be quite a pain, as none of the original architects probably thought about combining their dataset with someone else's.
 

ProfPaul

New Member
Jan 19, 2016
13
WAR probably isn't the right metric. It's too luck-dependent. That is, even assuming WAR accurately measures what a player contributed in a particular year, it does not tell you how much of that contribution was actually due to the player, and how much was due to pure chance. On top of that, WAR is calculated using offensive and defensive components, and almost everyone would agree that defensive metrics are very, very crude, meaning that WAR doesn't actually even accurately measure what a player did contribute in a particular year, let alone how much of that was luck.

This is exactly what I need to know before I make my next baseball conference presentation! Coincidentally, I'm going to present some of my more recent results at a business conference in a couple of weeks where the audience will be Human Resources type people. At that point, I'm going to speak about performance in fairly general terms as Annual Performance Reviews can refer to widely varying types of "performance" in a business setting.

As long as I can get MLB player performance data, I'm getting pretty good at massaging it into the form I need for analysis. One of my objectives now is to NOT go down the road of doing good research using a variable that no one cares about or is inappropriate.
 

ProfPaul

New Member
Jan 19, 2016
13
This is a really interesting question. I think one of the challenges will be distinguishing a true network effect from the impact of a specific player tandem or a general strategic tendency of a team. Since we often don't know what big-picture strategic adjustments teams are imposing or what skills they are explicitly emphasizing, it may be nearly impossible to say what is a network effect vs. a more direct effect of coaching.

You bring up a very important point that I have not included in any of my analysis so far: the impact of coaching on performance. Certainly players can and do get better because of the impact they have on each other, but it is generally not the players' job to improve the performance of other players; that's the coaches' job. So, a great next analysis direction would be to measure the impact of a coach on the players they coached via the connections they made to each other. In fact, this past semester I had an undergraduate student work on a project measuring the connections of NCAA women's basketball coaches, but this was my first foray into this analysis direction.

And you're right that I don't have access to (and never will) the reasons why particular teams are making the decisions that lead to the teammate connections that I measure. My hope is that whatever that effect is, if players learn from each other and I can measure the impact of that learning via player connections, I won't have to worry about *why* two teammates were brought together, only that they perhaps learned from each other and I could measure the impact of the connection on the subsequent performance change.

Anyway, I'm quite happy with the discussion here so far, particularly as you started your comment with "This is a really interesting question." That's perhaps the first time anyone has said that about this research stream, and is at least a little confirmation that I'm not the only person who thinks it's interesting! (Or perhaps you were just being polite, but in any case, I'll take it!)
 

ProfPaul

New Member
Jan 19, 2016
13
I am going to throw out a weird one: pitch selection - fastball or offspeed pitch - (by the pitcher) to start ABs.

Why this in particular? My guess is that in order to see any effect that rises anywhere beyond noise with this kind of analysis, you need something that is highly mutable and under direct control by the player. I've previously observed (but not published) that there are systematic team-wide approaches adopted for this choice. And, although it is heavily biased (offspeed pitches to start ABs are relatively rare), I think this is one of the purest examples of a simple choice in baseball. Almost every other measurable thing in baseball, apart from pitch selection, depends very strongly on some other contextual factor.

Excellent! This is exactly the type of comment I was hoping to see, and I think you're getting at a point that I'm trying to find: what can a player do that is directly under their control and that improves their own performance? From the perspective of my research: what exactly are the TTPs (tactics, techniques, and procedures) that one player can teach to/learn from another player? It has to be something "teachable" and "learnable" and something of true value to the game; otherwise I either can't measure it (e.g., higher level of visual acuity in one player's eyeballs that allows them to better "see" a pitch) or it doesn't matter (e.g., how to best load chewing gum into your cheek).

In general, I'm going to be taken more seriously if I can find a relationship between player connections and the type of action you mention that a player has direct control over. So far my biggest stumbling block has been getting access to data, initially of ANY kind, but most recently, performance-related data.
 

ProfPaul

New Member
Jan 19, 2016
13
Huh. Nice to meet you. I've done (still doing) a bit of work with network theory, studying functional brain organization and macaque social groups. I've a few recommendations for starters.
-I'd recommend avoiding defensive measures for now. While there may be a qualitative relationship between talent hubs (i.e. talented players who enhance those around them) and defense, current defensive measures are severely unreliable. Since you want to validate your hypothesis, I'd stick with reliable measures.
-For pitching measures (e.g. FIP, or ERA-), you may want to consider including catchers within those networks and excluding other position players. Catchers often work with pitchers in calling games (along with the manager), and there may be a benefit to having a strong defensive catcher working with pitchers.
-have you considered weighting your edges based on the assumed talent of the players? If the hypothesis is that more talented players improve less-talented players' ability, then a weighted network may better capture such learning than an unweighted one. One way to weight them would be to take the difference in mean performance prior to the formation of the connection.
-if you haven't previously, I recommend taking a look at work done in wild chimps on social learning - using a flow network to model the propagation of learning (as measured by increased performance here). I think Hobaiter is the first author on one of the more recent ones.

I'd be happy to send a PM if you want to talk about this or if you need/want help/collaboration.
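
As a minimal sketch of the edge-weighting idea in the list above (difference in mean performance prior to the formation of the connection): the function name `edge_weight`, the `perf_by_year` layout, and the sample numbers are all assumptions made for illustration, not anything from an existing database.

```python
from statistics import mean

def edge_weight(perf_by_year, a, b, formed_year):
    """Mean prior performance of a minus that of b, before the edge formed.

    perf_by_year: {player: {year: value}} (hypothetical layout; the value
    could be wOBA, wRC+, FIP, etc.). A positive result marks a as the
    presumed stronger player on this edge.
    """
    prior_a = [v for y, v in perf_by_year[a].items() if y < formed_year]
    prior_b = [v for y, v in perf_by_year[b].items() if y < formed_year]
    if not prior_a or not prior_b:
        return None  # no baseline yet (e.g. a rookie); leave edge unweighted
    return mean(prior_a) - mean(prior_b)

# Made-up numbers, for illustration only.
perf_by_year = {
    "vet": {2012: 0.360, 2013: 0.380},
    "kid": {2013: 0.300},
    "rookie": {},
}
w = edge_weight(perf_by_year, "vet", "kid", 2014)
no_baseline = edge_weight(perf_by_year, "vet", "rookie", 2014)
```

Because the weight is signed, it also gives the edge a direction (from the stronger player to the weaker one), which is one way an undirected network could be turned into the flow network mentioned above.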

Hi Eric - thanks for your suggestions. I'm getting the sense that defensive performance is not a direction I should pursue, which is confirmed by one of my earlier analyses. I found that there was a correlation (between +0.25 and +0.48) between player connections and offensive performance measures but almost no correlation (+0.04) between player connections and defensive performance. My (ignorant) hypothesis was that offensive TTPs could be learned from almost any player but defensive TTPs could really only be learned from a player in the same position. Now it turns out that the lack of correlation may have been simply in the lack of meaning of defensive performance measures.

To your second point: I do have player position data for all of my (player1, player2, gamedate) records although I have not yet used it. So I could include the special relationship between pitchers and catchers and could in fact measure the impact of connections between specific pairs of pitchers/catchers and the subsequent performance changes in either or both.
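
A rough sketch of how the pitcher/catcher pairs could be pulled out of (player1, player2, gamedate) records, assuming each record also carries both starting positions. The five-field tuple layout, the position codes, and the function name `battery_counts` are assumptions for illustration.

```python
from collections import Counter

def battery_counts(records):
    """Count shared starts for each (pitcher, catcher) pair.

    records: iterable of (player1, pos1, player2, pos2, gamedate) tuples
    (hypothetical layout). Pairs are stored pitcher-first regardless of
    the order in which the two players appear in the record.
    """
    counts = Counter()
    for p1, pos1, p2, pos2, _gamedate in records:
        if pos1 == "P" and pos2 == "C":
            counts[(p1, p2)] += 1
        elif pos1 == "C" and pos2 == "P":
            counts[(p2, p1)] += 1
    return counts

# Made-up records, for illustration only.
records = [
    ("pitcherA", "P", "catcherX", "C", "2015-04-06"),
    ("catcherX", "C", "pitcherA", "P", "2015-04-11"),
    ("pitcherA", "P", "shortstopY", "SS", "2015-04-06"),
    ("pitcherB", "P", "catcherX", "C", "2015-04-07"),
]
pairs = battery_counts(records)
```

The resulting counts could serve as edge weights in a pitcher-catcher subnetwork, to be compared against later changes in a pitching measure for the pitcher.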

So far my network is undirected and unweighted. It's been a big enough task just to manipulate the data I have in such a simple network. A much more robust approach would be to attach a value to the various abilities of each player to see the impact on teammates in changes associated with those abilities. About the only small step I've taken in this direction was to calculate the players who were associated with the greatest WAR value increase in their teammates one year after playing with those players. I used a 5-year minimum time range to help remove the possibility of spurious results where player X's teammates randomly had better subsequent-year WAR value increases. I then looked up the top 10 players on that list and found that a couple of them did something either during their career or after it that indicated they had a propensity for "helping others." One was highly involved in a charity that helped children improve themselves and another started a business teaching high-school baseball players better playing techniques. Both of these activities suggested to me that these two players would have, during their MLB careers, had a higher likelihood of intentionally helping their teammates and thus end up on my list of players whose teammates' WAR values increased the most after playing with them.
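
The "teammates' subsequent-year change" ranking described above might look roughly like the sketch below. The dictionary layouts, the function name `teammate_lift`, and the toy values are assumptions for illustration; the metric could be WAR or any other per-season value.

```python
from collections import defaultdict
from statistics import mean

def teammate_lift(teammates, metric, min_years=5):
    """Rank players by the mean next-year change in their teammates' metric.

    teammates: {(player, year): set of teammates that year}  (hypothetical)
    metric:    {(player, year): performance value}            (hypothetical)
    Players with fewer than min_years usable seasons are dropped, to damp
    spurious results from a single lucky group of teammates.
    """
    deltas = defaultdict(dict)  # player -> {year: [teammate deltas]}
    for (player, year), mates in teammates.items():
        for mate in mates:
            before = metric.get((mate, year))
            after = metric.get((mate, year + 1))
            if before is not None and after is not None:
                deltas[player].setdefault(year, []).append(after - before)
    lift = {
        player: mean(d for ds in by_year.values() for d in ds)
        for player, by_year in deltas.items()
        if len(by_year) >= min_years
    }
    return sorted(lift.items(), key=lambda kv: kv[1], reverse=True)

# Toy example with made-up values and a lowered threshold.
teammates = {
    ("mentor", 2010): {"a"},
    ("mentor", 2011): {"a"},
    ("other", 2010): {"a"},
}
metric = {("a", 2010): 1.0, ("a", 2011): 2.0, ("a", 2012): 2.5}
ranking = teammate_lift(teammates, metric, min_years=2)
```

Nothing in this sketch separates a real teammate effect from aging, coaching, or chance; it only reproduces the bookkeeping.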

To your own work: I recall some years ago when I was first publishing my own SNA research, citing some projects related to dolphins and gorillas. It wasn't to provide any direct support of my own research, but to show that graph theory has been used in a variety of environments and domains.

I also see that you are a neuroscientist: one of my current tasks is to use a feedforward neural network to predict the career length of an MLB player based on the number of connections they make in their first 5 years. I'm working at this moment on massaging my data into the correct form for input in the NN. The gist of it is this: the input layer has 5 neurons, one each for the number of connections to unique other players a player has made in each of their first 5 years as a player. The output layer will have two neurons, one each to indicate "short career" (< 10 years) and "long career" (>= 10 years). My hope is to be able to train the NN to predict the length of a player's career using a subset of my player connectivity data as a training set. Does this make sense to you (from both a baseball and NN perspective)? My premise is obviously that the teammate connection pattern a player makes in their first 5 years can be an indicator of their career length, because, all other things being equal (or equaled out in my large training data set) a fortuitous connection pattern will lead to learning abilities that will in turn lead to a long career. Conversely, a disadvantageous connection pattern will lead to a lack of learning of abilities that would lead to a shorter career. What do you think?
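
To make the network shape above concrete, here is a minimal pure-Python sketch with 5 inputs, one small hidden layer, and 2 softmax outputs. Everything beyond the 5-in/2-out shape is an assumption: the hidden layer size, the learning rate, the scaling of the counts, and above all the synthetic data, whose label rule is invented purely so the toy has something to learn.

```python
import math
import random

random.seed(0)
N_IN, N_HID, N_OUT = 5, 4, 2  # 5 yearly connection counts -> short/long

# Small random initial weights; each row ends with a bias term.
W1 = [[random.uniform(-0.5, 0.5) for _ in range(N_IN + 1)] for _ in range(N_HID)]
W2 = [[random.uniform(-0.5, 0.5) for _ in range(N_HID + 1)] for _ in range(N_OUT)]

def forward(x):
    """Return (hidden activations, softmax probabilities) for one player."""
    h = [math.tanh(sum(w * v for w, v in zip(row, x + [1.0]))) for row in W1]
    z = [sum(w * v for w, v in zip(row, h + [1.0])) for row in W2]
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return h, [v / s for v in e]

def train_step(x, label, lr=0.1):
    """One stochastic gradient step on the cross-entropy loss."""
    h, p = forward(x)
    dz = [p[k] - (1.0 if k == label else 0.0) for k in range(N_OUT)]
    # Backpropagate to the hidden layer before W2 is modified.
    dh = [sum(dz[k] * W2[k][j] for k in range(N_OUT)) * (1 - h[j] ** 2)
          for j in range(N_HID)]
    for k in range(N_OUT):
        for j, v in enumerate(h + [1.0]):
            W2[k][j] -= lr * dz[k] * v
    for j in range(N_HID):
        for i, v in enumerate(x + [1.0]):
            W1[j][i] -= lr * dh[j] * v

def mean_loss(data):
    """Average cross-entropy over (inputs, label) pairs."""
    return -sum(math.log(max(forward(x)[1][y], 1e-12)) for x, y in data) / len(data)

def fake_player():
    """SYNTHETIC player: scaled yearly counts plus an invented label rule."""
    x = [random.randint(10, 60) / 60 for _ in range(N_IN)]
    return x, int(sum(x) > 2.9)  # 1 stands in for "long career"

data = [fake_player() for _ in range(200)]
loss_before = mean_loss(data)
for _ in range(100):
    for x, y in data:
        train_step(x, y)
loss_after = mean_loss(data)
probs = forward(data[0][0])[1]  # [P(short career), P(long career)]
```

With two mutually exclusive classes the two softmax outputs always sum to 1, so a single sigmoid output would carry the same information; the two-neuron form is kept here only to match the description. Real use would also need a held-out test set, which this toy omits.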
 

iayork

Member
SoSH Member
Apr 6, 2006
639
I then looked up the top 10 players on that list and found that a couple of them did something either during their career or after it that indicated they had a propensity for "helping others." One was highly involved in a charity that helped children improve themselves and another started a business teaching high-school baseball players better playing techniques. Both of these activities suggested to me that these two players would have, during their MLB careers, had a higher likelihood of intentionally helping their teammates and thus end up on my list of players whose teammates' WAR values increased the most after playing with them.
Did you include a negative control by searching for charities from a random group of players?

Virtually every highly-paid baseball player is associated with charities, and it says little about his personality. Some players are deeply serious about their charities, others just get their photo taken once a year, and it's very difficult to distinguish (from our side) which is which. The higher-paid a player is, the more high profile their activities are, because media report on it, but there are also fairly anonymous players who are deeply involved in charitable activities with relatively little media recognition. So I don't think that's a useful metric.
 

JimBoSox9

will you be my friend?
SoSH Member
Nov 1, 2005
16,677
Mid-surburbia
Excellent! This is exactly the type of comment I was hoping to see, and I think you're getting at a point that I'm trying to find: what can a player do that is directly under their control and that improves their own performance? From the perspective of my research: what exactly are the TTPs (tactics, techniques, and procedures) that one player can teach to/learn from another player? It has to be something "teachable" and "learnable" and something of true value to the game; otherwise I either can't measure it (e.g., higher level of visual acuity in one player's eyeballs that allows them to better "see" a pitch) or it doesn't matter (e.g., how to best load chewing gum into your cheek).

In general, I'm going to be taken more seriously if I can find a relationship between player connections and the type of action you mention that a player has direct control over. So far my biggest stumbling block has been getting access to data, initially of ANY kind, but most recently, performance-related data.
Jnai's option always works just as well in reverse, for hitters. The #1 thing most young hitters gain in MLB to improve performance is plate discipline, i.e. choosing which pitches to swing at. There are a couple ways to slice that discipline at the macro level (O-Swing % is one), but you could also hyper-focus in on a narrow slice, like watching how new MLB players change their behavior and outcomes on the first pitch of each plate appearance, based on their network - that's a spot you'll see a lot of verifiable trends, and exactly the spot you'd logically expect this kind of causation to occur from vets talking approach with the rookies.
 

Lose Remerswaal

Experiencing Furry Panic
Lifetime Member
SoSH Member
Did you include a negative control by searching for charities from a random group of players?

Virtually every highly-paid baseball player is associated with charities, and it says little about his personality. Some players are deeply serious about their charities, others just get their photo taken once a year, and it's very difficult to distinguish (from our side) which is which. The higher-paid a player is, the more high profile their activities are, because media report on it, but there are also fairly anonymous players who are deeply involved in charitable activities with relatively little media recognition. So I don't think that's a useful metric.
ian is right on here. I could tell you stories about famous athletes and their charities and what a sham some of them are. Some. Many are very legit but (not?) surprisingly, the good ones are the ones you hear the least about.
 

ProfPaul

New Member
Jan 19, 2016
13
Did you include a negative control by searching for charities from a random group of players?

Virtually every highly-paid baseball player is associated with charities, and it says little about his personality. Some players are deeply serious about their charities, others just get their photo taken once a year, and it's very difficult to distinguish (from our side) which is which. The higher-paid a player is, the more high profile their activities are, because media report on it, but there are also fairly anonymous players who are deeply involved in charitable activities with relatively little media recognition. So I don't think that's a useful metric.

Sorry for my tardy response; I got caught up in university-related tasks.

To answer your question: I did not have a control group for this particular datapoint. I included the "outside of baseball" references to placate a couple of reviewers of an academic article I had submitted to a journal. They wanted to see if there was any type of substantiation at all for my premise so I found these couple of instances.
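
For what it's worth, a negative control of the kind asked about could be sketched as a simple resampling check: how often does a random group of the same size contain at least as many "helping others" players as the top-10 list? The `flags` dictionary (hand-collected yes/no per player), the function name, and the toy pool are assumptions for illustration.

```python
import random

def control_p_value(flags, top_players, n_draws=10_000, seed=0):
    """Share of random same-size groups with at least as many flagged players.

    flags: {player: True/False}, collected for the whole comparison pool
    (hypothetical data). Returns (observed count, empirical p-value).
    """
    rng = random.Random(seed)
    pool = list(flags)
    observed = sum(flags[p] for p in top_players)
    hits = 0
    for _ in range(n_draws):
        sample = rng.sample(pool, len(top_players))
        if sum(flags[p] for p in sample) >= observed:
            hits += 1
    return observed, hits / n_draws

# Toy pool in which every player has some charity tie, as suggested above.
everyone_flagged = {f"player{i}": True for i in range(20)}
observed, p = control_p_value(
    everyone_flagged, ["player0", "player1", "player2"], n_draws=200
)
```

In the toy pool every random group matches the top list, so the returned share is 1.0, which is the situation described in the quoted objection: the flag would carry no information.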

But you bring up a valid point that I would have to address if I were to publish anything related to MLB player connections and "willingness to help others." This type of data is very difficult to gather and vet, as it is on a one-by-one basis. My connectivity data is pretty much the opposite of that, as I was able to add to my database all teammate associations for the 2015 season in a matter of a couple of minutes. As I've found out in most of my research projects, data collection is often the most difficult task, as much of the data is not yet in electronic format and what is electronic is often protected by its owners.
 

ProfPaul

New Member
Jan 19, 2016
13
Jnai's option always works just as well in reverse, for hitters. The #1 thing most young hitters gain in MLB to improve performance is plate discipline, i.e. choosing which pitches to swing at. There are a couple ways to slice that discipline at the macro level (O-Swing % is one), but you could also hyper-focus in on a narrow slice, like watching how new MLB players change their behavior and outcomes on the first pitch of each plate appearance, based on their network - that's a spot you'll see a lot of verifiable trends, and exactly the spot you'd logically expect this kind of causation to occur from vets talking approach with the rookies.

Thanks for this suggestion - it's the type of association that makes great baseball sense but that would never occur to me as a mathematician. My main goal is to find TTPs that would rise out of the noise that will be inherent in any association I attempt to find. The association also has to be for an action that has some relevance to increased performance at the individual or team level, and what you have indicated certainly fits the criteria I'm looking to satisfy.
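
A minimal sketch of how the first-pitch idea could be tested, assuming pitch-level records are available: compute each new player's first-pitch swing rate per season, then correlate the year-over-year change with the number of veteran connections. The tuple layout, function names, and all numbers below are made up for illustration.

```python
import math

def first_pitch_swing_rate(pitches):
    """pitches: (pitch_number_in_PA, swung) tuples for one player-season."""
    firsts = [swung for num, swung in pitches if num == 1]
    return sum(firsts) / len(firsts)

def pearson_r(xs, ys):
    """Pearson correlation; assumes neither input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Made-up season: four plate appearances, swings at two first pitches.
season = [(1, True), (2, False), (1, False), (1, False), (1, True)]
rate = first_pitch_swing_rate(season)

# Made-up rookies: veteran connections vs. change in first-pitch swing rate.
veteran_links = [3, 8, 5, 12, 7]
swing_rate_change = [-0.01, -0.04, -0.02, -0.06, -0.03]
r = pearson_r(veteran_links, swing_rate_change)
```

The same `pearson_r` helper would apply to a macro measure such as O-Swing % in place of the first-pitch rate.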
 

pokey_reese

Member
SoSH Member
Jun 25, 2008
16,325
Boston, MA
If you want to look at overall offensive performance/position players, I would suggest something like wRC+ or wOBA (or even just something like Fangraphs' 'Offense' metric that takes base running into account), rather than WAR, in order to prevent the defensive component from being an issue. The stationarity issue at play in a time series like this is certainly a concern, but there have been a few detailed works done on creating average career arcs for different ages/positions, which could give you at least a baseline adjustment to work with that is non-linear, though it will have some of the symmetry issues you mentioned because it is aggregated from a group of players (though some have been weeded out if I recall, to avoid survivor bias).
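
One way to read the baseline adjustment above is to subtract the change an average career arc predicts at a given age from the change actually observed, and then relate only the remainder to network connections. The curve values below are placeholders, not taken from any published aging study, and the function name is hypothetical.

```python
# PLACEHOLDER aging curve: expected year-over-year change in wRC+ by age.
expected_change = {24: 3.0, 27: 0.0, 31: -2.0}

def age_adjusted_change(observed_change, age, curve=expected_change):
    """Observed change minus what the average arc predicts at that age."""
    return observed_change - curve[age]

young = age_adjusted_change(5.0, 24)  # improved by 5, but 3 was expected
older = age_adjusted_change(0.0, 31)  # held steady when a decline was expected
```

Under these placeholder values both players end up with the same adjusted change, even though their raw changes differ by 5 points.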
 

mwonow

Member
SoSH Member
Sep 4, 2005
7,176
This is an interesting thread! Thanks, Prof...

FWIW, I think Eric's point about catchers didn't get enough love. They work closely with a large number of other players in a way that others on the diamond do not.

With respect to helping others, you have a large longitudinal data set - presumably, you could look back at the careers of players who became coaches or managers after their careers were over (and/or who were coaches or managers during their careers - there's a really interesting series of posts on sonsofsamhorn.com about player/managers). This isn't going to be a perfect representation of players who were interested in improving colleagues' performance, but it's the best proxy I can think of.
 

Jnai

is not worried about sex with goats
SoSH Member
Sep 15, 2007
16,151
<null>
For data, please send me an email at dan at brooksbaseball dot net.