Introducing...[Name TBD] - NBA Box Score Projection Project

DJnVa · Oct 21, 2019

bowiac said:
Oh, and I should add, I am very actively soliciting naming ideas for this, as every sports projection system needs to have a witty/dumb backronym around it.

This is very cool. I may throw in a DraftKings lineup for opening night using it.

As far as names--who's your favorite role player? Tacko? Carsen?

pokey_reese · Oct 21, 2019

Trend
Adjusted
Comparison (of)
Known
Offense

RorschachsMask · Oct 21, 2019

This is awesome, some damn good work.

wade boggs chicken dinner · Oct 21, 2019

Can't help on the statistics but off the top of my head: "Tracking Algorithm Calculator (with) Current Optimization".

Phil Plantier · Oct 21, 2019

For naming purposes:

1. When did you start watching basketball?
2. Who does your system rank more highly than we would expect?

Edit: just so I don't spend all afternoon trying to figure out a way to name it DUEROD

DJnVa · Oct 21, 2019

bowiac said:
Yeah - this wasn't intended to be a DFS product really, but rather something more akin to season-long projections. Due to some design decisions, though, it accidentally ended up taking on a distinctly DFS-focused flavor. I have no DFS experience, so I haven't done any true testing for how the projections would fare there.

For DFS purposes, especially for the first game, I would recommend playing with the minutes projections to account for the fact that a bunch of guys will play more/less than I project above. It's just very hard for an algorithm to do that accurately, especially with 40% of the league having changed teams.

Oh sure, more just for fun. Early season DFS can be beatable in the sense that some guys aren't priced correctly, and if your model has hit on some of them, then it helps. I was able to work LBJ and Leonard into my lineup.

luckiestman · Oct 21, 2019

Turbo

Tracking
Updating
Recency
Boosting
Optimization

HowBoutDemSox · Oct 21, 2019

Boxscore Indexed Regression Data system

Sam Ray Not · Oct 21, 2019

Bayesian Opponent-adjusted Boosting / Calculation & Optimization Updated Since Yesterday

(BOBCOUSY)

wade boggs chicken dinner · Oct 21, 2019

bowiac said:
Not a big Tacko guy, so not gonna try to force my way into a naming scheme around him. A couple of concepts which feature heavily here are 1) Exponential decay (or just decay); 2) Gradient; 3) Boosting; 4) Optimization; 5) Bayesian.

Working Tracking in there would be good too, to emphasize I'm using next-gen data.

Bayesian Optimization Of Gradient Exponentially-decayed Ranking.

Do I win?

Dollar · Oct 21, 2019

ADJUSTED
INTEGRAL
NEXT-GEN
GRADIENT
EXPECTATION

benhogan · Oct 21, 2019

Boxscore
Adjusted
Yesterday's
Numbers
Equivalency
System

HomeRunBaker · Oct 21, 2019

benhogan said:
Boxscore
Adjusted
Yesterday's
Numbers
Equivalency
System

Lock The Thread!! We have a winner!!

tmracht · Oct 21, 2019

Boxscore
Offensive
Winshare
Influenced
Adjusted
Calculator

ElUno20 · Oct 22, 2019

Don't apologize. This is great work.

Also, this is telling me LeBron's not getting a double-double tonight?

Lazy vs Crazy · Oct 22, 2019

Celtics with more wins than Lakers makes me excited.

wade boggs chicken dinner · Oct 23, 2019

Little off-color: Boosting Optimization Of Bayesian Adjusted Gradient (with) Exponential-decay.

wade boggs chicken dinner · Oct 23, 2019

Gradient-assisted Optimization of Bayesian Exponential-decay with Recency and Tracking.

benhogan · Oct 23, 2019

bowiac said:
The minutes side is always going to be a bit wonky, since it's just hard to predict coaching decisions and incorporate a ton of outside information. It's going to be especially regressed to start the season as the system learns what the rotations are for each team. The minutes projections perform well overall (better than paid fantasy sites), but you can almost certainly do better through a combination of human input and computer analysis there.

I have no real front-end experience, but I hope to eventually be able to build a tool that'll let you update the minutes projections and the other box-score projections will update accordingly.

I'm probably not adding anything. BUT during the season wouldn't a player's minutes slide up/down with his performance VS. the performance from teammates that play a similar position?

For example, when Paul George returns I'd expect Harkless's minutes to be less impacted than PatPat's, since I expect Harkless's performance to be superior. But if PatPat outplays Harkless, I could see Doc changing how he doles out minutes.

DJnVa · Oct 23, 2019

Does it use rosters from end of last season? For instance I see RJ Hunter listed for Celtics, with a set number of minutes.

DJnVa · Oct 23, 2019

What about no data for rookies and how that could affect the win projections?

djbayko · Oct 23, 2019

bowiac said:
Not a big Tacko guy, so not gonna try to force my way into a naming scheme around him.

Okay, fine. Have it your way....

King
Of all
Basketball
Estimators

djbayko · Oct 23, 2019

Awesome stuff, @bowiac ! The first tab of the Google sheet is player projections for today, correct? Do you have full season projections at a player level? I assume you do.

Devizier · Oct 23, 2019

bowiac said:
I have added win projections based on these numbers here:

View: https://docs.google.com/spreadsheets/d/1mhwOLqPu2F9026EQiVxFPIN1t9RGafGpl-dokaIsm9c/edit#gid=1432098323

Apologies for getting these out so late.

BAYNES loves Boston!

BrazilianSoxFan · Oct 24, 2019

How will your model account for injured players or players returning from injury? Old veterans getting a maintenance day?

DJnVa · Oct 24, 2019

You likely don't care, but since I'm pretty bad at DraftKings I'm using some of these numbers when I make my lineups. I make one lineup my usual way and one using some of this info. Last night my "normal" lineup was pretty bad. The one where I used some of these numbers didn't win anything, but did much better. I didn't place because Kemba was a no-show and Embiid was maybe 10-15 points under projections.

The guys at the margin are where I usually need help, and the ones I picked from your numbers--Dwayne Bacon and Goran Dragic--did really well for me.

DJnVa · Oct 24, 2019

Interesting that it has Giannis with 21 minutes tonight?

DJnVa · Oct 24, 2019

bowiac said:
And your observation that the guys at the margins are where the benefits are is also correct. The DFS sites selling projections mostly do them by hand, and can do a good job by focusing on the stars, but it's tough to put in that effort for 200 guys a night. The model isn't perfect, but it doesn't struggle for lack of effort on the fringier players.

I feel like I can give a decent prediction on how guys like Harden, Walker, Tatum, Lillard, Irving, Davis, James, etc. will play--but finding which cheap guys to play is where, for DFS, you make a difference.

DJnVa · Oct 25, 2019

I entered an inexpensive contest last night using your numbers with "crowns" and won a few bucks. I was up to 2000th out of like 36000 before the late game, and I had Draymond and Damion Lee ($3000 player)--if Green had an average game I would have finished around 750th and had he been a bit better than average I would have been in top 250, but Draymond got injured and while he returned, he didn't do much.

The upshot though, is that Green and Lee were the only 2 players I picked to not outperform their salary. In Lee's case it hardly mattered.

Tonight, it has Kemba, Hayward, and Tatum all as good values based on your projections and the DraftKings salaries.

Best value on the board based on your projections is old friend Marcus Morris.

gingerbreadmann · Oct 29, 2019

I've spent an inordinate amount of time messing around with these projections over the last week or so, and while I don't have any grand findings to share, I just want to share my kudos and thanks to the work behind them. Lots of fun to sift through and play with.

Lacking a more robust end goal at the moment, I too have been using them for daily fantasy lineup creation. I haven't played DFS in years so it's a bit of a refresher for me. None of my lineups using the projections have hit so far, but I'm sure that's a SSS issue as well as a me issue. I just wanted to think out loud a bit on the thought processes I'm stuck on.

-These projections are better suited for 50/50 games than tournaments, or am I wrong about that? The value will be on the margins no matter what game you play, but taking the numbers and translating them to an expected DFS value (relating to this, I've spent way too long fine-tuning a sigmoid function to estimate double-double and triple-double probabilities) ignores upside. Having, say, a 75th percentile outcome along with expected value would be very insightful, and is something I'm trying to produce, even in bare-bones fashion.
-On that note, I have been playing a lot with the spreadsheet you linked in the original Twitter thread @bowiac, of the game-by-game results from last year, to identify what factors are behind someone exceeding their projection. The upshot is that nothing I've found can even hold a candle to Minutes. Error in the MP projection accounts for almost half of the difference between actual and expected DraftKings points. There are a few factors that correlate, but really nothing I've tried has any value without knowing how the minutes will pan out. (In the 2018-19 sheet you have a column for whether the player started, which looks like the most valuable forecast input if available.)
-Lineup construction is also something I'm a bit of a novice at. So far I have strictly been using expected points, salary, and position inside a linear optimization function to create the lineup with the most expected points. Am I missing other important factors here? I have seen articles about football DFS that estimate the % drafted by player, which I suspect could be a useful tool to find players who are truly overlooked, but I have done zero work on estimating that. I've also tinkered with evaluating each player's value based on what position they're used at (obviously this only applies to DraftKings), but haven't decided how to approach that strategically.
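For anyone following along, here is a minimal sketch of the "expected points, salary, and position" lineup search described above. It is an illustration only: the player pool, salaries, projections, the $25,000 cap and the 2 G / 2 F / 1 C slot layout are all made up, and it brute-forces every combination rather than calling a real linear-programming solver, which only works because the pool is tiny.

```python
from itertools import combinations

# Hypothetical pool: (name, position, salary, projected fantasy points).
POOL = [
    ("Guard A", "G", 9000, 48.0),
    ("Guard B", "G", 5200, 30.5),
    ("Guard C", "G", 3000, 17.0),
    ("Forward A", "F", 8400, 44.0),
    ("Forward B", "F", 4600, 27.5),
    ("Forward C", "F", 3400, 21.0),
    ("Center A", "C", 7800, 41.0),
    ("Center B", "C", 3800, 22.5),
]


def best_lineup(pool, cap, slots):
    """Return (lineup, points) maximizing projected points.

    slots maps position -> exact number of players required at it.
    Every combination of the right size is checked against the
    salary cap and the positional requirements.
    """
    size = sum(slots.values())
    best, best_points = None, float("-inf")
    for lineup in combinations(pool, size):
        if sum(player[2] for player in lineup) > cap:
            continue  # over the salary cap
        counts = {}
        for player in lineup:
            counts[player[1]] = counts.get(player[1], 0) + 1
        if counts != slots:
            continue  # wrong positional mix
        points = sum(player[3] for player in lineup)
        if points > best_points:
            best, best_points = lineup, points
    return best, best_points


lineup, points = best_lineup(POOL, cap=25000, slots={"G": 2, "F": 2, "C": 1})
for name, position, salary, projection in lineup:
    print(f"{position} {name:<10} ${salary:>5} {projection:5.1f}")
print("total projected points:", points)
```

Because the objective is just the sum of expected points, a search like this ignores upside and ownership entirely, which is the limitation raised in the first bullet.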

Excited to hear more about the new minutes projection you have in the works. It's already impressive but any small accuracy improvements here would have a massive payoff. Any thoughts welcome towards anything I just mentioned.

slamminsammya · Oct 30, 2019

I'm curious whether you used a GBM or a neural net for the minutes model. In the case of a GBM, what were the features, and did you consider weighting the data points to make the model better for higher-minutes players?

slamminsammya · Oct 30, 2019

bowiac said:
I tried both, but settled on a gradient boosted model (LightGBM, having also tried XGBoost and CatBoost) as my base estimator. I say base estimator, because I do plan on putting together a stacked model eventually, but right now LightGBM is doing everything.

As I mentioned, the core features of the model come from an exponential decay function that is very expensive to calculate. It takes the form X^DaysAgo, where X is between 0 and 1 and varies by stat. At lower values of X, more recent performance is weighted more heavily, while at higher values, older games take on more weight. At the extreme, X == 1, the feature is just an average of every game the player has played. I apply this exponential decay function to every stat, which can itself be thought of as giving me a projection for every statistic. In other words, I have a weighted average for Marcus Smart's minutes (30.07 entering the next game), which is assembled by that X^DaysAgo weighting.
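A minimal sketch of the X^DaysAgo weighting described above, for readers who want to see it in code. The minutes log and the X values are invented for illustration, and the function name is hypothetical; how X is actually fit per stat is not covered here.

```python
def decayed_average(games, x):
    """Weighted mean of past values, each weighted by x ** days_ago.

    games: list of (days_ago, value) pairs.
    x: decay base in (0, 1]; x == 1 gives a plain average.
    """
    if not 0 < x <= 1:
        raise ValueError("x must be in (0, 1]")
    weights = [x ** days_ago for days_ago, _ in games]
    weighted_sum = sum(w * value for w, (_, value) in zip(weights, games))
    return weighted_sum / sum(weights)


# Hypothetical log: (days before the game being projected, minutes played).
minutes_log = [(2, 34.0), (4, 30.0), (7, 28.0), (30, 20.0)]

recent_heavy = decayed_average(minutes_log, 0.80)  # leans on the last few games
long_memory = decayed_average(minutes_log, 0.99)   # older games still count
plain_mean = decayed_average(minutes_log, 1.0)     # every game weighted equally

print(recent_heavy, long_memory, plain_mean)
```

With this made-up log the player's minutes have been rising, so the lower X produces the highest estimate and X == 1 the lowest.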

Separately, I also use a modified Kalman filter to generate projections for every statistic, which is likewise fit to minimize error in projecting the next game. This, together with the exponential decay function, gives me interim projections for every stat.
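To make the Kalman filter idea concrete, here is a sketch of a plain textbook one-dimensional filter with a random-walk state. It is not the modified filter mentioned above, whose details aren't given; the noise variances, starting values and minutes below are assumptions picked for illustration, where in practice they would be fit to minimize next-game error.

```python
def kalman_next_game(observations, process_var, obs_var, init_mean, init_var):
    """Run a scalar Kalman filter over past games.

    The player's "true level" is assumed to follow a random walk, so the
    projection for the next game is simply the latest filtered mean.
    Returns (mean, variance) of the level after the last observed game.
    """
    mean, var = init_mean, init_var
    for z in observations:
        # Predict: the level may have drifted since the previous game.
        var += process_var
        # Update: move toward the observation in proportion to the gain.
        gain = var / (var + obs_var)
        mean += gain * (z - mean)
        var *= 1.0 - gain
    return mean, var


# Hypothetical minutes in a player's last four games, oldest first.
recent_minutes = [30.0, 32.0, 31.0, 35.0]
projection, uncertainty = kalman_next_game(
    recent_minutes, process_var=1.0, obs_var=16.0, init_mean=28.0, init_var=25.0
)
print(projection, uncertainty)
```

A larger obs_var makes the filter trust single games less, which plays a similar role to a higher X in the decay weighting.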

These two sets of features (the exponential decay features, and the Kalman filter features) make up the bulk of the features for the model. That's 46 features in all right now, i.e., your interim projections in every stat impact every other stat's final projections.

For minutes, I also add an additional feature of how many minutes a player played last game. In principle, this shouldn't be necessary, as that should be captured by the features above, but I've found it helpful. Finally, I also add a team feature to help make the minutes add up to something close to 240 per night.

In terms of weighting, I'm not sure what you mean. What would the data points be weighted by exactly?

I am not sure I fully understand your features, because I am not totally sure I understand what the target variable of the GBM is. Do you have one GBM per statistic? Or is there a GBM purely for minutes? If the latter, is it predicting minutes per game over the course of the season for each player, or minutes in the next game?

Supposing I wanted a model to predict, for a given player, the number of minutes they would play in the next game, I would think you would do very well with only a few features: career MPG (or maybe your exponentially decayed MPG), age, team W-L, game #, whether it's a SEGABABA, team conference standing, draft position. Maybe you have already tried this. Those features should account for the weirdness of good players on good teams getting rested towards the end of the season.
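Two of the features in that list, game # and the SEGABABA flag, can be derived from nothing but a team's schedule. A rough sketch, with invented dates and hypothetical function and field names:

```python
from datetime import date


def schedule_features(game_dates):
    """Build per-game schedule features for one team.

    is_segababa is True when the team also played the previous calendar
    day, i.e. the game is the second game of a back-to-back.
    """
    ordered = sorted(game_dates)
    rows = []
    for index, game_day in enumerate(ordered):
        played_yesterday = index > 0 and (game_day - ordered[index - 1]).days == 1
        rows.append(
            {
                "date": game_day,
                "game_number": index + 1,
                "is_segababa": played_yesterday,
            }
        )
    return rows


# Hypothetical opening-week schedule.
features = schedule_features(
    [date(2019, 10, 23), date(2019, 10, 25), date(2019, 10, 26)]
)
for row in features:
    print(row)
```

Each row would then be joined onto the player-game rows for that team before training.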

If that were your output variable you have a lot of data points - one per player-game.

The weighting comes in when the GBM computes the loss function it is trying to optimize. You can assign weights to the data points that tell the GBM that minimizing loss on those points counts "more" than minimizing loss on others. So an easy first guess at how to weight them is by minutes! If a player played 40 minutes in a game, you weight that single data point by 40, etc. Maybe that weighting is too aggressive. But as it stands, the GBM will treat the error on a scrub who never plays the same as the error on Giannis's minutes at the end of the season.

I just looked in the documentation for LightGBM and it has a weight column option: https://lightgbm.readthedocs.io/en/latest/Parameters.html (go to weight_column).
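To show what weighting by minutes does to the loss, here is a small pure-Python sketch that involves no GBM at all. The four player-games and both sets of predictions are invented. As far as I know, in LightGBM's Python API the same per-row weights can also be passed through the weight argument of lightgbm.Dataset or sample_weight in the scikit-learn style fit, but that part isn't shown here.

```python
def weighted_mse(actual, predicted, weights=None):
    """Mean squared error where each row counts in proportion to its weight."""
    if weights is None:
        weights = [1.0] * len(actual)
    total = sum(w * (a - p) ** 2 for a, p, w in zip(actual, predicted, weights))
    return total / sum(weights)


# Hypothetical minutes actually played: two stars, two end-of-bench players.
actual = [38.0, 36.0, 4.0, 2.0]

# Model A is close on the stars but misses on the bench players.
model_a = [37.0, 36.0, 8.0, 6.0]
# Model B nails the bench players but is off on the stars.
model_b = [35.0, 33.0, 4.0, 2.0]

unweighted_a = weighted_mse(actual, model_a)
unweighted_b = weighted_mse(actual, model_b)

# Weight each player-game by the minutes played, as suggested above.
weighted_a = weighted_mse(actual, model_a, weights=actual)
weighted_b = weighted_mse(actual, model_b, weights=actual)

print("unweighted:", unweighted_a, unweighted_b)
print("minutes-weighted:", weighted_a, weighted_b)
```

Unweighted, model B looks better; once each row is weighted by minutes, model A does, which is the shift in priorities the weighting is meant to produce.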
