1) Does BIS* compute things like inter-rater reliability for all of its personnel? Are unreliable scorers discarded or down-weighted appropriately? I wonder about the effectiveness of using human judgments for many of these measures. (A sketch of one such reliability measure appears after these questions.)
2) Should defensive stats report both an average value and a measure of variability? Players with fewer plays would then carry larger variability scores, and players with more plays smaller ones. The point is: could defensive evaluations do a better job of communicating the uncertainty in each player's (or team's) measurement, so that fans could tell whether a difference between two players is actually significant? (See the second sketch, after the reply below.)
*Question originally said STATS, not BIS (apologies)
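To put question 1 in concrete terms: a standard inter-rater measure is Cohen's kappa, which discounts two scorers' raw agreement by the agreement they would reach by chance. Below is a minimal sketch in Python; the batted-ball labels and scorer data are hypothetical, and nothing here reflects how BIS or STATS actually audits its scorers.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement between two raters, corrected for chance."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b.get(label, 0) for label in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical batted-ball calls (ground ball / line drive / fly ball)
# from two scorers watching the same eight plays.
scorer_1 = ["GB", "GB", "LD", "FB", "LD", "GB", "FB", "FB"]
scorer_2 = ["GB", "LD", "LD", "FB", "LD", "GB", "FB", "GB"]
print(f"kappa = {cohens_kappa(scorer_1, scorer_2):.2f}")  # 1.0 = perfect, 0 = chance
```

A kappa near zero would flag a scorer whose calls are no better than chance agreement, which is the kind of evidence you would want before discarding or down-weighting anyone.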
I can only assume STATS is still using the same process it used when I left. At BIS, we rigorously train our scorers and work to minimize any potential biases. During the season, we review each scorer's performance and make corrections as necessary. At the end of each season, we review a large sample of plays to ensure our data is recorded as accurately as possible.
The key with any statistic is context. You can do your best to inform your readers of the process and reasoning behind each evaluation, and that's all you can do. We could attach a reliability score to every number we publish, but what happens when people start misinterpreting the reliability indicator? Then we just have another statistic to explain. There's a fine line between educating readers and getting so technical that you lose their attention altogether.
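On the "reliability score" point: what question 2 asks for is roughly a standard error alongside each average, an interval that shrinks as a player accumulates plays. A minimal sketch follows, again in Python; the per-play run values and sample sizes are invented purely for illustration.

```python
import math
from statistics import mean, stdev

def rate_with_interval(per_play_values, z=1.96):
    """Average per-play value plus an approximate 95% interval.
    The interval narrows as the number of plays grows."""
    m = mean(per_play_values)
    se = stdev(per_play_values) / math.sqrt(len(per_play_values))  # standard error
    return m, m - z * se, m + z * se

# Hypothetical per-play run values for two fielders with different workloads.
few_plays = [0.4, -0.2, 0.1, 0.5, -0.1]
many_plays = [0.1, 0.2, -0.1, 0.0, 0.3, 0.1, -0.2, 0.2,
              0.0, 0.1, 0.2, -0.1, 0.3, 0.0, 0.1, 0.2]

for name, vals in [("few plays", few_plays), ("many plays", many_plays)]:
    m, low, high = rate_with_interval(vals)
    print(f"{name}: {m:+.2f} runs/play, ~95% interval [{low:+.2f}, {high:+.2f}]")
```

If two players' intervals overlap heavily, the difference between them probably isn't meaningful, which is exactly the judgment the question wants fans to be able to make.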