Introducing...[Name TBD] - NBA Box Score Projection Project

bowiac

Caveat: I know nothing about what I speak
Lifetime Member
SoSH Member
Dec 18, 2003
12,945
New York, NY
I have added season-long projections here. They're still a bit buggier than the daily projections (and are really just the sum of each remaining game's projected stats), but I wanted to get them out there for those of you in season-long fantasy leagues. I have a weird bug causing shooting percentages to tail off leaguewide in March and April, so some of the percentages are low, but the issue seems to affect every player, so at least it balances out.
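For anyone curious what "sum of each remaining game's projected stats" looks like in practice, here is a minimal sketch. It is not the actual DARKO code: the field names (`player`, `pts`, `fgm`, `fga`) and the numbers are made up for illustration. The one design choice worth noting is that the season-long percentage is derived from summed makes and attempts rather than by averaging per-game percentages.

```python
from collections import defaultdict


def season_totals(game_projections):
    """Sum per-game projected counting stats for each player.

    game_projections: iterable of dicts with hypothetical keys, e.g.
        {"player": "A", "pts": 20.0, "fgm": 8.0, "fga": 16.0}
    """
    totals = defaultdict(lambda: defaultdict(float))
    for game in game_projections:
        for stat, value in game.items():
            if stat != "player":
                totals[game["player"]][stat] += value

    result = {}
    for player, stats in totals.items():
        row = dict(stats)
        # Percentage comes from the summed makes and attempts,
        # not from an average of per-game percentages.
        attempts = row.get("fga", 0.0)
        row["fg_pct"] = row.get("fgm", 0.0) / attempts if attempts else None
        result[player] = row
    return result


# Made-up remaining-game projections for one player.
remaining = [
    {"player": "A", "pts": 20.0, "fgm": 8.0, "fga": 16.0},
    {"player": "A", "pts": 10.0, "fgm": 2.0, "fga": 9.0},
]
totals = season_totals(remaining)
```

Because the percentage is derived from the totals, a leaguewide drift in projected makes late in the season would pull every player's season-long percentage down together, which matches the symptom described above.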
 

slamminsammya

Member
SoSH Member
Jul 31, 2006
9,152
San Francisco
Good point about the weights, but I think the general idea could be made to work. Rather than weighting by the actual minutes you could weight by some rough prior estimate of minutes. Just spitballin.
 

bowiac

Caveat: I know nothing about what I speak
Lifetime Member
SoSH Member
Dec 18, 2003
12,945
New York, NY
The same issue applies, except in reverse. If my prior was 2 minutes for Carsen, and he plays 30, that's a huge mistake by the model. If I weight by the prior, however, then the model will mostly ignore that sort of error and not learn from it, even though it's a big deal in any real-world sense.

Sample weights make sense for rate stats (e.g., FG%) to make sure a 2/4 performance gets a different weight than a 15/30 performance, but I don't know that they make sense in a minutes context. A minute is a minute, and I want to minimize the error on minutes.
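To make the distinction concrete, here is a small sketch with made-up numbers (none of this is the actual model code, and `weighted_mean` is just a helper written for the example). It compares a weighted and an unweighted error for a rate stat, then shows what happens to a large minutes miss when errors are weighted by the prior minutes estimate.

```python
def weighted_mean(values, weights):
    """Weighted average of values; weights must not all be zero."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Rate stat: two games of 50% shooting, one on 4 attempts and one on 30.
fg_pct_actual = [2 / 4, 15 / 30]
fg_pct_pred = [0.40, 0.48]  # hypothetical projections
attempts = [4, 30]
sq_err = [(a - p) ** 2 for a, p in zip(fg_pct_actual, fg_pct_pred)]
fg_unweighted = weighted_mean(sq_err, [1, 1])
fg_weighted = weighted_mean(sq_err, attempts)  # the 15/30 game counts 7.5x as much

# Minutes: the prior said 2 minutes, the player actually played 30.
min_actual = [30, 28]
min_prior = [2, 27]  # hypothetical prior estimates
abs_err = [abs(a - p) for a, p in zip(min_actual, min_prior)]
min_unweighted = weighted_mean(abs_err, [1, 1])  # every game counts equally
min_prior_weighted = weighted_mean(abs_err, min_prior)  # the 28-minute miss gets weight 2 of 29

print(fg_unweighted, fg_weighted)
print(min_unweighted, min_prior_weighted)
```

In the minutes case the unweighted error is dominated by the 28-minute miss, while the prior-weighted error mostly reflects the game the prior already got right. That is the "model will mostly ignore that sort of error" problem in miniature.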
 

DJnVa

Dorito Dawg
SoSH Member
Dec 16, 2010
53,837
Minor addition: I have added a tab to the spreadsheet which outputs the top 50 Draftkings Lineups using these projections.
Does this update the same time your daily projections update? I like comparing it to what I would typically pick, and I see today's projections, but the DK lineups don't seem to match.
 

bowiac

Caveat: I know nothing about what I speak
Lifetime Member
SoSH Member
Dec 18, 2003
12,945
New York, NY
Does this update the same time your daily projections update? I like comparing it to what I would typically pick, and I see today's projections, but the DK lineups don't seem to match.
Eventually they'll update at the same time; I'm working through some issues with moving this process from my local computer to a cloud service right now, which is causing the delay. The daily projections will likewise update occasionally throughout the day.

DFS projections are updated now.
 

DJnVa

Dorito Dawg
SoSH Member
Dec 16, 2010
53,837
Gotcha.

One other thing--DK today, at least from what I'm looking at, doesn't have Omari Spellman available, but he's listed in some lineups. Does DK sometimes not list every player? You got a price for him to do your projections, but I'm not seeing him available.
 

bowiac

Caveat: I know nothing about what I speak
Lifetime Member
SoSH Member
Dec 18, 2003
12,945
New York, NY
I think you must be looking at a partial slate on DK. I'm seeing Spellman.
 

gingerbreadmann

Member
SoSH Member
Mar 11, 2008
750
Meant to respond to this earlier. I think that's a very good encapsulation of what you've done and gave me some more insight into things you have already gone into greater detail on in this thread.

A couple of questions that popped into my head while reading:
-Could you add maybe a paragraph explaining the machine learning aspect? I have some limited familiarity with machine learning but have never delved into it myself. If you ran DARKO two separate times on the exact same inputs (i.e. projections for tonight's games), is it possible the decision tree would produce a slightly different guess for what the outputs will be?
-I assume the model is always being trained on new data, so how does the model balance reactions to constant evolution of the overall league environment with fitting the projections to what it knows about the past?
-Finally, I'd love a little more detail on DPM. It seems this was added sometime after the thread was started and before you launched the Shiny app, and you mention that it's very much a work in progress, so understandable that you don't have much to add on that yet.
 

bowiac

Caveat: I know nothing about what I speak
Lifetime Member
SoSH Member
Dec 18, 2003
12,945
New York, NY
Good questions. Answers below in blue.
  • Could you add maybe a paragraph explaining the machine learning aspect? I have some limited familiarity with machine learning but have never delved into it myself. If you ran DARKO two separate times on the exact same inputs (i.e. projections for tonight's games), is it possible the decision tree would produce a slightly different guess for what the outputs will be?
    • No, the results would be identical with the same inputs. The model itself is deterministic. However, there is some randomization with respect to how the model itself was fit. Without getting too deep into the weeds, both the β parameter and the Kalman filter parameter are calculated via a genetic algorithm. Genetic algorithms have some elements of randomization, so if I were to refit the model from scratch (this would take about 5 days right now), I may get slightly different results. Likewise, I combine all the features of the model (the decay features and the Kalman features) via a gradient boosted decision tree. The gradient boosted decision tree implementation I use is LightGBM. LightGBM has a number of hyperparameters which impact the model's outputs. I am selecting these hyperparameters via a Bayesian process, which has some randomness. So if I run the model a dozen times today, I'll get the same result each time, but if I decide to refit the model from scratch, the results may be a bit different.
  • I assume the model is always being trained on new data, so how does the model balance reactions to constant evolution of the overall league environment with fitting the projections to what it knows about the past?
    • This is done in much the same way as the player projections actually. I use a combination of exponential decay and a Kalman filter to model changes to the league environment in response to new data. Each new day of data is added to the decay/Kalman filters, and a new projection for the strength of league environment factors is generated. You can think of the league environment as being its own player essentially, and the rest of the writeup mostly applies in much the same way.
  • Finally, I'd love a little more detail on DPM. It seems this was added sometime after the thread was started and before you launched the Shiny app, and you mention that it's very much a work in progress, so understandable that you don't have much to add on that yet.
    • DPM is an attempt to capture overall player talent. There are two flavors, the box-only model, and the on-off model. The box-only model takes each player's projected box score stats and uses them to project plus-minus, using another gradient boosted decision tree model. This model is similar to BPM in concept. The on-off model is similar, except it adds each player's projected on-off data to the model as well, to better capture defense. Think of that version as being something like xPIPM (which uses box and on-off data).
    • The target variable for DPM is raw plus-minus. This differs from other statistical plus minus models like BPM or RAPTOR, which use RAPM as the target. There are a few reasons for this, but the main reason is that daily RAPM does not exist (and would be meaningless). RAPTOR for instance is trained on 6-year RAPM data. That's nice in that you should get a pretty clean target metric, but it really reduces the amount of data you have to work with (to something like 800 rows). To avoid this issue, I'm using raw plus minus data, which is noisier than 6-year RAPM, but I get to train on 700,000+ rows of data.
    • Generally, to reinforce the caveat in the article...I have not spent much time on DPM. It was added so people could eyeball players against each other in an all-in-one stat, but it's sort of outside the core mission of DARKO (box score projections), so I haven't spent as much time there. The point of DARKO is to have a projection system which isn't tied to an all-in-one metric.
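
For readers who haven't seen these tools before, the exponential decay and Kalman filter updates mentioned above for the league environment can be sketched roughly as follows. This is a generic textbook version, not DARKO's implementation: the function names, the scalar random-walk form of the filter, and all the numbers are assumptions for illustration. In the real system the parameters are fit by a genetic algorithm, not picked by hand as they are here.

```python
def decayed_average(observations, beta):
    """Exponentially weighted mean of observations (oldest first).

    The most recent observation gets weight 1, the one before it beta,
    then beta**2, and so on, with 0 < beta <= 1.
    """
    num = 0.0
    den = 0.0
    for x in observations:
        num = beta * num + x
        den = beta * den + 1.0
    return num / den


def kalman_update(mean, var, obs, process_var, obs_var):
    """One step of a scalar random-walk Kalman filter.

    mean, var: current estimate and its variance
    obs: the new day's observed value
    process_var: how much the true value may drift per step
    obs_var: noise variance of a single observation
    """
    prior_var = var + process_var  # predict: the true value may have drifted
    gain = prior_var / (prior_var + obs_var)  # how much to trust the new data
    new_mean = mean + gain * (obs - mean)
    new_var = (1.0 - gain) * prior_var
    return new_mean, new_var


# Made-up daily leaguewide FG% values, treated like a single "player".
daily_league_fg_pct = [0.460, 0.455, 0.470, 0.465]

mean, var = 0.46, 0.01 ** 2  # hypothetical starting estimate
for obs in daily_league_fg_pct:
    mean, var = kalman_update(mean, var, obs, process_var=1e-6, obs_var=0.02 ** 2)

decayed = decayed_average(daily_league_fg_pct, beta=0.9)
print(mean, var, decayed)
```

Both functions are deterministic given their parameters, which lines up with the first answer above: rerunning on the same inputs gives the same output, and only refitting the parameters introduces randomness.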
 

lovegtm

Member
SoSH Member
Apr 30, 2013
11,996
  • The target variable for DPM is raw plus-minus. This differs from other statistical plus minus models like BPM or RAPTOR, which use RAPM as the target. There are a few reasons for this, but the main reason is that daily RAPM does not exist (and would be meaningless). RAPTOR for instance is trained on 6-year RAPM data. That's nice in that you should get a pretty clean target metric, but it really reduces the amount of data you have to work with (to something like 800 rows). To avoid this issue, I'm using raw plus minus data, which is noisier than 6-year RAPM, but I get to train on 700,000+ rows of data.
I really like this, methodologically speaking. Most basketball modeling I've seen (especially wrt on/off) seems to rely on a lot of simplifying assumptions by the authors, rather than letting an algorithm discover lots of finer black-box relationships. Interested to see how this goes.
 

bowiac

Caveat: I know nothing about what I speak
Lifetime Member
SoSH Member
Dec 18, 2003
12,945
New York, NY
I don't per se claim that that's the best approach; and I did try it the "normal" way as well: I got access to the same RAPM set that 538 used to train RAPTOR, and trained a model on that as well. This worked fine, but had more "head scratcher" results which didn't quite look right to me. It's something I need to look into more, but given I'm not very focused on the "all-in-one" metric aspect so much, it's sort of on the backburner relative to other aspects like adding more tracking data or NCAA data.
 

DJnVa

Dorito Dawg
SoSH Member
Dec 16, 2010
53,837
Any updates as to how this performed throughout the season?

I'm thinking of throwing a few DraftKings lineups in during the bubble--will that be updated?
 

DJnVa

Dorito Dawg
SoSH Member
Dec 16, 2010
53,837
Gonna bump this for opening night... @bowiac are you continuing to update these? Or has it gone behind a paywall?
 

radsoxfan

Member
SoSH Member
Aug 9, 2009
13,622
Congrats Bowiac!!

https://hoopshype.com/lists/advanced-stats-nba-real-plus-minus-rapm-win-shares-analytics/

HoopsHype received answers from nearly 30 participants, including various media members as well as individuals who have a combined experience with more than half of the teams in the NBA. Answers came from folks at every level within an organization, including those who work on a coaching staff as well as several different directors of analytics departments.

Most who answered spoke on the condition of anonymity because they are currently employed for NBA teams and felt that speaking publicly could reveal proprietary information about their teams.

The “winner” of our composite metric survey goes to DARKO. This is an application developed by Kostya Medvedovsky and hosted by Andrew Patton.

SURVEY SAYS: Among the 29 individuals who participated in our survey, eight (8) said that DPM was their preferred catch-all metric. That was the most among all metrics. Ten (10) others said that they trust DPM as an all-in-one metric while only one (1) said that they did not.
 

bowiac

Caveat: I know nothing about what I speak
Lifetime Member
SoSH Member
Dec 18, 2003
12,945
New York, NY
Thanks both. Always nice to see some adoption/acknowledgement.

I missed the question earlier, but I don't anticipate ever moving the core DARKO content behind a paywall. It is more of a labor of love for me, and a resume item. I am continuing to build out additional functionality for it, although I haven't had as much development time lately due to job/kids/work on baseball-DARKO.

DFS projections/lineups will be back this season however. I'm hoping to get some career-projection tools up as well this year.
 

benhogan

Granite Truther
SoSH Member
Nov 2, 2007
20,111
Santa Monica
Wow, very impressive. Congratulations! You should lob a call into Palantir, they are aggressively investing/helping startups that crunch analytics/data.
 

radsoxfan

Member
SoSH Member
Aug 9, 2009
13,622
Poor Hollinger's PER came in last. That alone should give the article some weight around here.

When you are looking to take DARKO's future predictions to the next level by incorporating publicly available medical/injury info as an added fudge factor, you know where to find me :)
 

bowiac

Caveat: I know nothing about what I speak
Lifetime Member
SoSH Member
Dec 18, 2003
12,945
New York, NY
Poor Hollinger's PER came in last. That alone should give the article some weight around here.

When you are looking to take DARKO's future predictions to the next level by incorporating publicly available medical/injury info as an added fudge factor, you know where to find me :)
Yeah - I'd love to add this sort of stuff, but this data is tough to get reliably/cleanly.
 

radsoxfan

Member
SoSH Member
Aug 9, 2009
13,622
Yeah - I'd love to add this sort of stuff, but this data is tough to get reliably/cleanly.
Would be interesting to include something for publicly reported medical issues (ACL tear, Achilles tear, meniscus tear, cartilage damage, impingement, arthritis, etc). But I think you're right, the information likely wouldn't be clean enough and no 2 injuries are alike. Probably would do as much harm as good to any projections.

Actually having a player's imaging and knowing the specifics (degree of cartilage loss, severity of meniscus tear, etc.) would definitely be useful for any projections. But then you run into HIPAA and you can't include that stuff anyway. Of course, I don't post on here about the scans of college and NBA guys I've actually seen; I only give my opinion on players for whom I have the same media reports as anyone else.
 

cardiacs

Admires Neville Chamberlain
SoSH Member
Jul 15, 2005
2,993
Milford, CT
Not sure where to put this, but is it too early to bet on win projections? It's always fun to bet over on your favorite team, and I imagine I will be doing so this year.
 

gingerbreadmann

Member
SoSH Member
Mar 11, 2008
750
I have a feature request for the DARKO app... and I'm sure this has been considered in some form, but I think adding the option to view historical career trajectories on a calendar-based real time scale would be the cherry on top of the current options for career games and age. I have consistently encountered use cases where all 3 x-axis types would be relevant to whatever I'm looking into. (for example: comparing the relative levels of all-time greats as of a specific season where they faced off in the playoffs). Could get wonky if players from different eras are compared together but I don't think that would get in the way too often, especially if it isn't the default scale. Of course, I owe much gratitude to the fact that the data remains free to browse in great depth already. Thank you.