Adjusting College Statistics



#1 Hairps


Posted 05 July 2006 - 10:18 AM

I recently completed an effort to isolate college hitting and pitching performance from the effects of parks, level of competition, and defense. The summary and full results can be found in The Sox Draft Forum.

Following is a full explanation of the metrics and methodologies used.

HITTING

Adjusting college hitting statistics for varying levels of competition faced and park effects was done the following way:

I chose to use wOBA, a metric proposed by Dolphin/Lichtman/Tango in The Book. It's a linear-weights based measure of offensive production, set to a similar scale as OBP. Since I don't know whether the exact formula they use is floating around, I'll hold off on sharing it here. But for the purposes of explanation, the important thing to note is that it is expressed in bases.

Park Effects

Our process begins with Boyd Nation's three-year Park Factors. For those masochists who might be interested, a full explanation of how these Park Factors are derived can be found here.

For the 50+ hitters analyzed for The Baseball Analysts, I calculated custom park factors for each. For the rest, I used their team's 2002-2005 total park factors. This is less accurate, but oh well. You get what you pay for, I guess.

For example:

      PF     TPF                   Team
     159     142              Air Force
     109     112                  Akron
      85      86                Alabama
     111     106            Alabama A&M
     113     105          Alabama State
      90      92     Alabama-Birmingham
      79      89                 Albany
     102     101           Alcorn State
     113     107      Appalachian State

In plain English, Air Force's 2002-2005 Total Park Factor (the average park factor of all the parks in which they played during those three seasons) was 142. Boyd's park factors are expressed as a percentage of how run-friendly those specific parks are relative to a "neutral park." So, a PF of 159 is a park that has allowed an average of 59% more runs than a neutral park over the past three seasons. A PF of 85 means that park has allowed an average of 15% fewer runs than a neutral park over the past three seasons.

The key is that those Park Factors are expressed in runs. So, when adjusting wOBA (which is expressed in bases), it's necessary to convert the park effects back to bases, as well.

Remember: (Tip of the hat to Tangotiger) Take bases, and if you square it, you pretty much get runs. (And if you square that, you pretty much get wins).

As such, here's the equation for Park-Adjusted wOBA:

Park-Adjusted wOBA = wOBA*SQRT(100/TPF)
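For anyone who wants to play along at home, here is a minimal Python sketch of that equation. The function name and the .400 wOBA hitter are made up for illustration; the TPF values (142 for Air Force, 86 for Alabama) come from the table above.

```python
import math

def park_adjusted_woba(woba: float, tpf: float) -> float:
    """Park-adjust a wOBA figure.

    Park factors are expressed in runs and wOBA in bases, so the
    run-based factor 100/TPF is converted to bases with a square root.
    """
    if tpf <= 0:
        raise ValueError("park factor must be positive")
    return woba * math.sqrt(100.0 / tpf)

# Hypothetical hitter with a raw .400 wOBA in two very different settings.
for team, tpf in [("Air Force", 142), ("Alabama", 86)]:
    print(f"{team}: {park_adjusted_woba(0.400, tpf):.3f}")
```

The same raw wOBA is marked down in a run-friendly setting (TPF above 100) and marked up in a run-suppressing one (TPF below 100).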

It's now necessary to further isolate this from the effects of different teams' varying levels of competition faced.

Strength of Schedule

If I haven't said it yet, Boyd Nation is a Great American. He also tracks a team's Strength of Schedule during the course of the college baseball season when he publishes his Iterative Strength Ratings. For our purposes, however, there is one minor issue. He tracks them as a ranking, not as a raw number. Fortunately, the numbers we are looking for can be backed into with a good degree of confidence the following way:

RANK     2005   2004    2003   2002   2001 Avg SoS   StDev
    1   114.5  116.7   115.6  114.1  116.8    115.5    1.23
    2   113.8  115.8   114.8  113.8  116.1    114.9    1.08
    3   113.6  114.5   113.9  112.7  115.8    114.1    1.15
    4   113.6  114.4   113.6  112.7  115.7    114.0    1.12
    5   113.5  114.3   113.5  112.7  115.2    113.8    0.95
    6   113.2  114.0   113.5  112.6  114.8    113.6    0.83
    7   113.1  113.8   113.3  112.5  113.5    113.2    0.49
    8   112.8  113.7   113.0  112.5  113.3    113.1    0.46
    9   112.2  113.6   112.8  112.3  113.1    112.8    0.58
   10   112.1  113.2   112.8  112.0  112.9    112.6    0.52
   11   112.1  113.1   112.7  111.9  112.8    112.5    0.50
   12   112.1  112.9   111.9  111.9  112.3    112.2    0.41
   13   111.7  112.4   111.6  111.9  111.9    111.9    0.31
   14   111.6  111.6   111.6  111.7  111.7    111.6    0.05
   15   111.6  111.3   111.3  111.7  111.5    111.5    0.18
   16   111.4  111.2   111.0  111.1  111.5    111.2    0.21
   17   111.3  111.1   110.9  111.1  111.4    111.2    0.19
   18   111.2  110.9   110.9  111.0  111.3    111.1    0.18
   19   111.2  110.8   110.8  110.9  111.3    111.0    0.23
   20   110.9  110.7   110.8  110.7  111.3    110.9    0.25
   21   110.7  110.7   110.4  110.5  111.2    110.7    0.31
   22   110.4  110.6   110.0  110.4  111.2    110.5    0.44
   23   110.4  110.5   109.8  110.2  111.1    110.4    0.47
   24   110.4  110.3   109.8  110.2  111.1    110.4    0.47
   25   110.3  110.1   109.4  110.0  110.9    110.1    0.54

If you look back over the last five seasons, the raw "score" attached to a given SoS ranking has remained pretty consistent; at least, it's good enough for me. So what I did was take the team for which a hitter played, find its SoS ranking, and convert that ranking to a "score" using the five-year average for that rank.
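The rank-to-score lookup can be sketched in Python as follows. Only the first three ranks from the table are included, and the names `SOS_BY_RANK` and `sos_score` are mine, not Boyd's. I'm also assuming the StDev column is a sample standard deviation, since that is what reproduces the table's figures.

```python
from statistics import mean, stdev

# Yearly SoS scores (2005, 2004, 2003, 2002, 2001) for each rank,
# copied from the table above. Ranks 4-25 are omitted for brevity.
SOS_BY_RANK = {
    1: [114.5, 116.7, 115.6, 114.1, 116.8],
    2: [113.8, 115.8, 114.8, 113.8, 116.1],
    3: [113.6, 114.5, 113.9, 112.7, 115.8],
}

def sos_score(rank: int) -> float:
    """Return the five-year average SoS score for a given SoS rank."""
    if rank not in SOS_BY_RANK:
        raise ValueError(f"no yearly scores recorded for rank {rank}")
    return mean(SOS_BY_RANK[rank])

for rank, scores in SOS_BY_RANK.items():
    print(rank, round(sos_score(rank), 1), round(stdev(scores), 2))
```

A small spread across seasons is what justifies using the average as a stand-in for the unpublished raw number.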

As with park factors, the important thing to note is that SoS is expressed in runs. So when adjusting wOBA (which is expressed in bases), it's necessary to convert the SoS effects back to bases as well.

As such, here's the equation for a hitter's AwOBA:

AwOBA = Park-Adjusted wOBA*SQRT(SoS/100)
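Chaining the two hitting adjustments together looks like this in Python. It's a sketch only: the function name and the sample inputs are hypothetical, with 142 taken from the park factor table and 115.5 from the SoS table.

```python
import math

def adjusted_woba(woba: float, tpf: float, sos: float) -> float:
    """Apply the park adjustment, then the strength-of-schedule adjustment.

    Both TPF and SoS are run-based and scaled so that 100 is neutral,
    hence the square roots to bring them back to bases.
    """
    if tpf <= 0 or sos <= 0:
        raise ValueError("tpf and sos must be positive")
    park_adjusted = woba * math.sqrt(100.0 / tpf)
    return park_adjusted * math.sqrt(sos / 100.0)

# Hypothetical hitter: .400 raw wOBA, TPF of 142, SoS score of 115.5.
print(f"{adjusted_woba(0.400, 142, 115.5):.3f}")
```

Note that the two steps collapse algebraically to wOBA*SQRT(SoS/TPF), so a tough schedule can partly offset a hitter-friendly park.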

I do not claim any of this to be ground-breaking. In fact, Boyd has done this for his Adjusted OPS for a number of years.

I can only hope that these efforts in some way build on his and Dolphin/Lichtman/Tango's great work and can help serve to expose them to an even broader audience.

Edited by Hairps, 17 July 2006 - 04:02 PM.


#2 Hairps


Posted 05 July 2006 - 10:39 AM

PITCHING

The process used to isolate pitching statistics was similar to the above, with a few additional steps.

Defense

I'm not sure I can add anything to the topic of defense-independent pitching, except maybe to re-emphasize that it can be done, and that it's important to do.

For the purposes of this project, I used a revised DIPS formula, tailored by Boyd Nation specifically for the college game:

dERA = 3.93 + (-2.54*K + 4.13*BB + 14.78*HR)/IP
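In Python, the formula works out to something like the sketch below. The stat line in the example is invented, and I'm assuming innings are passed as true decimals (100 1/3 innings as 100.333, not the box-score "100.1").

```python
def dera(k: int, bb: int, hr: int, ip: float) -> float:
    """College dERA from strikeouts, walks, home runs and innings pitched.

    Coefficients are the ones quoted in the formula above. `ip` is assumed
    to be decimal innings, not box-score notation.
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return 3.93 + (-2.54 * k + 4.13 * bb + 14.78 * hr) / ip

# Hypothetical pitcher: 100 K, 30 BB, 5 HR in 100 innings.
print(f"{dera(100, 30, 5, 100.0):.2f}")
```

Strikeouts pull the number down while walks and home runs push it up, with a home run costing several times what a walk does.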

From this, we can begin the process of further isolating dERA from the effects of parks and varying levels of competition.

Park Effects

For pitchers, it is important to remember that we must consider only the parks in which the pitcher has played. For example, here is a portion of the 2006 college game logs I kept for Red Sox draftee Justin Masterson:

  DATE       Opponent    PARK     SoS
   2-Feb        Hawaii       89   118.8
  11-Feb   Santa Clara      102   109.4
  18-Feb      Cal Poly      119   113.4
  24-Feb     UC Irvine       77   119.2
   3-Mar    Pepperdine      119   119.9
  10-Mar     San Diego       98   115.6
  17-Mar          UCLA      114   118.1
  24-Mar       Pacific      119   113.9
  29-Mar    New Mexico      159   106.3
   2-Apr          UNLV      119   108.9
   7-Apr    New Mexico      119   106.3
  13-Apr          Utah      119   102.9
  21-Apr          UNLV      123   108.9
  28-Apr           TCU      119   111.1
   5-May     Air Force      159    85.2
  11-May           BYU      143   106.4

Because both Park Effects and dERA are expressed in runs, here's the equation for Park-Adjusted dERA:

Park-Adjusted dERA = dERA*(100/PF)
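Here is a rough Python sketch of that step using the PARK column from the Masterson log above. One loud assumption: I'm taking PF to be a simple average over appearances. Weighting each park by innings pitched there would be a reasonable alternative. The function name and the 4.00 dERA are hypothetical.

```python
from statistics import mean
from typing import Sequence

# PARK column from the game log above, one entry per appearance.
PARK_FACTORS = [89, 102, 119, 77, 119, 98, 114, 119,
                159, 119, 119, 119, 123, 119, 159, 143]

def park_adjusted_dera(dera: float, park_factors: Sequence[float]) -> float:
    """Scale dERA by 100/PF using the pitcher's own parks.

    Assumption: PF is the unweighted mean of the per-appearance park
    factors. No square root is needed because both dERA and park
    factors are expressed in runs.
    """
    pf = mean(park_factors)
    return dera * (100.0 / pf)

print(f"average PF: {mean(PARK_FACTORS):.1f}")
print(f"adjusted dERA: {park_adjusted_dera(4.00, PARK_FACTORS):.2f}")
```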

Strength of Schedule

As was the case with park adjustments, it is important to remember that we must consider only the teams against whom the pitcher has played.

Because SoS, Park Effects, and dERA are all expressed in runs, here's the equation for SoS & Park-Adjusted dERA:

SoS & Park-Adjusted dERA = Park-Adjusted dERA*(100/AvgSoS)
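The last step can be sketched the same way, using the SoS column from the game log above. Again, treating AvgSoS as a simple per-appearance average is my assumption, and the 3.50 park-adjusted dERA is a made-up input.

```python
from statistics import mean
from typing import Sequence

# SoS column from the game log above, one entry per opponent faced.
OPPONENT_SOS = [118.8, 109.4, 113.4, 119.2, 119.9, 115.6, 118.1, 113.9,
                106.3, 108.9, 106.3, 102.9, 108.9, 111.1, 85.2, 106.4]

def sos_and_park_adjusted_dera(park_adjusted_dera: float,
                               opponent_sos: Sequence[float]) -> float:
    """Scale an already park-adjusted dERA by 100/AvgSoS.

    Assumption: AvgSoS is the unweighted mean over the opponents the
    pitcher actually faced.
    """
    avg_sos = mean(opponent_sos)
    return park_adjusted_dera * (100.0 / avg_sos)

print(f"AvgSoS: {mean(OPPONENT_SOS):.1f}")
print(f"final dERA: {sos_and_park_adjusted_dera(3.50, OPPONENT_SOS):.2f}")
```

An AvgSoS above 100 means tougher-than-neutral opposition, so the pitcher gets credit in the form of a lower adjusted dERA.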

Edited by Hairps, 05 July 2006 - 12:19 PM.