MLB's long road to incorporating Negro League statistics

DJnVa

Dorito Dawg
SoSH Member
Dec 16, 2010
53,850
The Negro Leagues are major leagues — but merging their stats has been anything but seamless - The Athletic

Gibson read MLB’s press release that morning in stunned silence. The desegregation of baseball’s record books would mean his great-grandfather, Josh Gibson, a legendary Negro Leaguer, a Baseball Hall of Famer, a prolific power hitter and a Black man who died at 35 three months before Jackie Robinson broke baseball’s color barrier, would soon be rightfully listed alongside all-time MLB greats in a handful of single-season and career hitting categories.

But how soon the statistics would be incorporated was not made clear. Sean Gibson figured he’d hear an update on MLB’s progress in 2021. Or in 2022.

“Now here we are in 2023,” he told The Athletic earlier this spring. “I think we’ve been patient. We’re just hoping things happen soon.”
The league office was unable to reach an agreement with Seamheads Negro Leagues Database, the most complete set of Negro League statistics ever compiled, to use its data. The league ended its protracted negotiation with Seamheads this spring and now intends to use Retrosheet’s nascent database — a work in progress that Retrosheet president Tom Thress said likely won’t be finished for at least five years — as the basis for its records.
A lot more details in the article.

Seamheads Negro League Stats

Retrosheet's Negro League Stats
 

Phil Plantier

Member
SoSH Member
Mar 7, 2002
3,419
Thanks - this is a fascinating look at the first hurdles of implementation.

Can someone decipher this sentence for me?

According to sources familiar with the negotiations, the sticking point for Seamheads was not compensation but rather concerns about control of the data, how it would be used and who would have a say in its implementation.
 

JGray38

Member
SoSH Member
Oct 31, 2003
3,044
Rockport, MA
Good question. I can think of a couple of sticking points: Negro League data is still incomplete, a work in progress, and will certainly change over time. MLB is not uncovering new games and data; Gary Ashwill is doing that research. How would MLB deal with the numbers changing every few months? If they accept the Seamheads data now as-is, but want months or years to vet stats from more games being added to the database, you end up with MLB and Seamheads data being mismatched. So who is right? Who "owns" the truth here?

Also, records: if Negro League stats are ML stats, what if they find Oscar Charleston had a 57-game hit streak? What does MLB do with that?
 

Max Power

thai good. you like shirt?
SoSH Member
Jul 20, 2005
7,878
Boston, MA
The tough part is how you handle single-season rate stats because of the smaller sample sizes. Josh Gibson hit .466 over 302 plate appearances in 1943, but I don't know if I'd call that the highest single-season batting average ever. There may be an MLB player who hit that high over the same number of plate appearances (or maybe not, that's really, really high).
 

Mantush

Member
SoSH Member
Jul 30, 2014
408
This is actually a really interesting question. Do you give the 1943 batting title to Gibson for hitting .466 over 302 PAs? You're then taking it away from Musial, who hit .357 over 701. Do you adjust Gibson's PAs to 477 (the qualifying number for a 154-game season) and apply the adjustment rules? If you do, he hits about .274: his 116 hits over roughly 249 at-bats, plus 175 hitless at-bats to cover the shortfall.
 

santadevil

wears depends
Silver Supporter
SoSH Member
Aug 1, 2006
6,472
Saskatchestan
My initial thought would be that MLB wouldn't retroactively change award recipients.
But then again, maybe they would.

Interesting question to ponder, but it appears we are years away from even having that conversation.

Also, I need to familiarize myself with the adjustment rules.
Do you have a link that explains them? Seems relevant here.
 

Garfinvold

New Member
Dec 8, 2022
18
I think they would just have Gibson be the leader of the league he played in, treated as a third major league, like when the Federal League existed.
 

Mantush

A player needs a minimum of 3.1 PAs per team game to qualify for the batting title. This is where the 477 PAs comes from (3.1 x 154 games; it is 502 in a 162-game season). A player who falls short can still win the batting title if their batting average would still be the highest in the league after taking the difference between their PAs and the qualifying number and assuming they made an out in every one of those PAs. He had 116 hits in 302 PAs and you now assume he didn't reach base at all in his next (477 - 302) 175 PAs. Those hitless PAs get added to his at-bats, not his PAs: 116 hits at .466 works out to about 249 at-bats, so 116 / (249 + 175) = .274. Here's the Wikipedia article on it: https://en.wikipedia.org/wiki/Plate_appearance
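
For anyone who wants to check the arithmetic, here is a rough Python sketch of the adjustment described in this post. It assumes a 154-game schedule, and Gibson's at-bat total (249) is backed out of 116 hits at .466 rather than taken from a box score. The function names are made up for illustration. Note that the shortfall is charged as hitless at-bats, so the denominator is at-bats plus the shortfall rather than total PAs.

```python
PA_PER_GAME = 3.1  # qualifying standard: 3.1 plate appearances per team game


def qualifying_pa(team_games: int) -> int:
    """Plate appearances needed to qualify, rounded to the nearest whole PA."""
    return round(PA_PER_GAME * team_games)


def adjusted_average(hits: int, at_bats: int, pa: int, team_games: int) -> float:
    """Batting average after charging one hitless at-bat per missing PA."""
    shortfall = max(0, qualifying_pa(team_games) - pa)
    return hits / (at_bats + shortfall)


# Gibson, 1943: 116 hits and 302 PAs are from the thread; 249 at-bats is an
# assumption derived from 116 / .466.
gibson = adjusted_average(hits=116, at_bats=249, pa=302, team_games=154)
MUSIAL_AVG = 0.357  # Musial's 1943 average as quoted in the thread

print(f"qualifying PAs: {qualifying_pa(154)}")
print(f"Gibson adjusted: {gibson:.3f}")
print(f"still ahead of Musial: {gibson > MUSIAL_AVG}")
```

Under those assumptions the adjusted figure lands well below Musial's .357, so the adjustment rule alone would not give Gibson the combined title.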

Oh yeah! They did do that! That's a great solution for each league. But who leads the majors overall for the year? When they folded the Federal League in, it had played the same number of games as the AL and NL. If you take a year like 1914, where Kauff had the highest BA across the Federal, American, and National Leagues and they all played the same number of games, it's apples to apples. The Negro Leagues played significantly fewer games. That's not their fault and it sucks. It sucks it even had to be a separate league. But if you're trying to determine who led the majors in BA in 1943, is it Musial (.357 in 701 PAs), Gibson (.466 in 302 PAs) or Tetelo Vargas (.471 in 136 PAs)? Baseball-Reference says it's Vargas, but that's disputable in my opinion, and it's fair to ask whether single-season leaderboards should be changed.

For the record, I have no issue with Gibson's .466 being one of the all-time single-season leaders. He did that in his league's season, and that record shouldn't be punished because it's pre-integration. But I don't think the season leaders should change the way Baseball-Reference changed them when it put Vargas as the leader in 1943.
 

santadevil

Thanks