How to Calculate Sabermetrics Stats

BATTING STATS

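RC
(Bill James' Runs Created)
Runs Created estimates a hitter's run production, i.e. his contribution to scoring. Offensive events that add bases for the team carry a positive value; events that waste bases or offensive opportunities carry a negative value. In the formula, A is the on-base factor, B is the advancement factor, and C is the opportunity factor. More than 24 versions of the RC formula are known, but the basic one is (hits + walks) * total bases / plate appearances. One reasonably well-rounded version is:

A (on base) = hits + walks + hit by pitch - caught stealing - grounded into double play
B (advancement) = total bases + .26*(walks + hit by pitch) + .53*(sacrifice bunts + sacrifice flies) + .64*(stolen bases) - .03*(strikeouts)
C (opportunities) = plate appearances

RUNS CREATED = ((A+2.4*C)*(B+3*C))/(9*C)-(0.9*C)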
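The A/B/C version above can be sketched in Python as follows. This is only an illustration: the function name, the parameter names, and the sample stat line in the usage note are hypothetical and not taken from any real player.

```python
def runs_created(h, bb, hbp, cs, gidp, tb, sh, sf, sb, so, pa):
    """Runs Created from the A (on base), B (advancement), C (opportunities) factors."""
    a = h + bb + hbp - cs - gidp
    b = tb + 0.26 * (bb + hbp) + 0.53 * (sh + sf) + 0.64 * sb - 0.03 * so
    c = pa
    return ((a + 2.4 * c) * (b + 3 * c)) / (9 * c) - 0.9 * c
```

For a made-up line of 150 hits, 60 walks, 5 HBP, 3 CS, 10 GIDP, 250 total bases, 2 SH, 5 SF, 10 SB, 100 strikeouts and 620 plate appearances, the function returns roughly 88.3 runs created.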

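RC/27 (Runs Created Per 27 Outs)
RC per 27 outs. For a team, RC/27 is the number of runs it could theoretically score in one game. RC/27 can also be applied to an individual player: A-Rod's RC/27, for example, is the number of runs per game Texas would be expected to score if its lineup consisted of nine A-Rods.

RC/27 = (27*RC) / (at-bats - hits + grounded into double play + caught stealing + sacrifice flies + sacrifice bunts)

Some analysts use RC/25 instead, counting 25 outs rather than 27 to allow for outs made away from the plate, such as runners thrown out on the bases.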
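A minimal sketch of the per-27-outs conversion, with the 25-out variant exposed as a parameter. The function and parameter names are assumptions made for this example.

```python
def rc_per_game(rc, ab, h, gidp, cs, sf, sh, outs_per_game=27):
    """Scale Runs Created to a full game's worth of outs (27 by default, 25 for RC/25)."""
    outs_made = ab - h + gidp + cs + sf + sh
    return outs_per_game * rc / outs_made
```

With 100 runs created and 420 outs made, this gives about 6.43 runs per 27 outs; passing `outs_per_game=25` gives the RC/25 figure.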

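XR (Jim Furtado's eXtrapolated Runs)
A hitter's run contribution computed with a linear weights formula. The concept is similar to RC, but XR is reported to be more accurate.

XR = singles*.5 + doubles*.72 + triples*1.04 + home runs*1.44 + (hit by pitch + walks - intentional walks)*.34 + intentional walks*.25 + stolen bases*.18 - caught stealing*.32 - (at-bats - hits - strikeouts)*.09 - strikeouts*.098 - grounded into double play*.37 + sacrifice flies*.37 + sacrifice bunts*.04

XR / PA -> XR divided by plate appearances
XR / G -> XR divided by games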
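Because XR is a plain weighted sum, it translates directly into code. The sketch below uses hypothetical parameter names; the coefficients are the ones listed in the formula above.

```python
def extrapolated_runs(singles, doubles, triples, hr, bb, hbp, ibb,
                      sb, cs, ab, h, so, gidp, sf, sh):
    """Jim Furtado's XR as a weighted sum of batting events."""
    return (0.50 * singles + 0.72 * doubles + 1.04 * triples + 1.44 * hr
            + 0.34 * (hbp + bb - ibb) + 0.25 * ibb
            + 0.18 * sb - 0.32 * cs
            - 0.09 * (ab - h - so) - 0.098 * so
            - 0.37 * gidp + 0.37 * sf + 0.04 * sh)
```

XR/PA and XR/G are then just the return value divided by plate appearances or games played.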

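Base Runs
A tool devised by David Smyth that likewise estimates a hitter's run contribution.

A = hits + walks - intentional walks - caught stealing - home runs
B = 1.39*total bases - .58*hits - 2.8*home runs + .19*walks - .19*intentional walks + 1.2*stolen bases
C = at-bats - hits
D = home runs

Base Runs = A * B/(B + C) + D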
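The four factors combine as baserunners (A) times the share of them that score, B/(B+C), plus home runs (D). A small sketch, with function and parameter names chosen only for this example:

```python
def base_runs(h, bb, ibb, cs, hr, tb, sb, ab):
    """Base Runs: A * B / (B + C) + D, using the factor definitions listed above."""
    a = h + bb - ibb - cs - hr                      # baserunners
    b = (1.39 * tb - 0.58 * h - 2.8 * hr
         + 0.19 * bb - 0.19 * ibb + 1.2 * sb)       # advancement
    c = ab - h                                      # outs
    d = hr                                          # home runs always score
    return a * b / (b + c) + d
```

One property worth noting: if every hit is a home run and there are no walks, A is zero and the estimate collapses to the home run total, as it should.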

OPS (On-base Percentage Plus Slugging Percentage)
The sum of on-base percentage and slugging percentage. It is the most widely used sabermetric tool and measures a hitter's production.

OPS = OBP + SLG

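SECA (Bill James' Secondary Average)
A measure devised by Bill James in 1986 to compensate for the illusions of batting average. The biggest blind spot of batting average, hits divided by at-bats, is that it values extra-base hits and singles equally and gives no credit for walks. SECA is a modified average that combines the extra-base weighting of slugging percentage with the value of walks and stolen bases. Individual players usually fall between .100 and .600.

SECA = (doubles + 2*triples + 3*home runs + walks + stolen bases - caught stealing) / at-bats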

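ISO (Isolated Power)
Measures a hitter's pure power by removing singles, such as infield singles, from slugging percentage.

ISO = slugging percentage - batting average, or ISO = (doubles + 2*triples + 3*home runs) / at-bats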
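The two ISO expressions are equivalent because slugging percentage is total bases over at-bats and batting average is hits over at-bats, so the difference leaves only the extra bases. A quick check in Python, using hypothetical function names and a made-up stat line:

```python
def iso_from_rates(slg, avg):
    """ISO as slugging percentage minus batting average."""
    return slg - avg

def iso_from_counts(doubles, triples, hr, ab):
    """ISO as extra bases per at-bat."""
    return (doubles + 2 * triples + 3 * hr) / ab

# Made-up line: 100 singles, 30 doubles, 5 triples, 25 home runs in 550 at-bats.
singles, doubles, triples, hr, ab = 100, 30, 5, 25, 550
hits = singles + doubles + triples + hr
total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
by_rates = iso_from_rates(total_bases / ab, hits / ab)
by_counts = iso_from_counts(doubles, triples, hr, ab)
```

Both `by_rates` and `by_counts` come out to about .209 for this line.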

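Batting Runs (Pete Palmer's Linear Weights)
A linear weights formula that expresses how many runs a hitter contributed relative to a league-average hitter. By construction, a perfectly league-average hitter has a BR value of 0.

Batting Runs = .47*singles + .78*doubles + 1.09*triples + 1.40*home runs + .33*(walks + hit by pitch) + .30*stolen bases - .60*caught stealing - .25*(at-bats - hits) - .50*outs on base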
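A sketch of the Batting Runs sum in Python. The parameter names, including `outs_on_base` for runners thrown out on the bases, are assumptions for this example.

```python
def batting_runs(singles, doubles, triples, hr, bb, hbp, sb, cs, ab, h, outs_on_base):
    """Pete Palmer's linear weights: runs above (positive) or below (negative) league average."""
    return (0.47 * singles + 0.78 * doubles + 1.09 * triples + 1.40 * hr
            + 0.33 * (bb + hbp) + 0.30 * sb - 0.60 * cs
            - 0.25 * (ab - h) - 0.50 * outs_on_base)
```

The large negative term on outs, -.25*(at-bats - hits), is what centers the scale so that an average hitter lands near zero.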

OW% (Bill James' Offensive Winning Percentage)

A team's expected winning percentage derived from offense alone, using RC/27. For example, assuming Texas has league-average pitching and defense, it is the winning percentage Texas could theoretically expect from a lineup of nine A-Rods.

OW% = (A-Rod's RC/27)^1.83 / [(league average runs per game)^1.83 + (A-Rod's RC/27)^1.83]

EQA => (OW% / (1 - OW%))^0.2 * 0.26
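The OW% and EQA expressions above can be sketched as follows. The function names are hypothetical, and the EQA line is read as (OW% / (1 - OW%))^0.2 * 0.26, which is an assumption about the intended grouping.

```python
def offensive_win_pct(rc27, league_runs_per_game, exponent=1.83):
    """Expected winning percentage of a lineup of nine copies of one hitter."""
    own = rc27 ** exponent
    return own / (own + league_runs_per_game ** exponent)

def eqa_from_owp(owp):
    """EQA from OW%, assuming the grouping (OW% / (1 - OW%))^0.2 * 0.26."""
    return (owp / (1 - owp)) ** 0.2 * 0.26
```

Under this reading, a hitter whose RC/27 equals the league scoring rate has an OW% of .500 and an EQA of .260.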

GPA => (1.8 * OBP + SLG) / 4

Total Average => (TB+HBP+BB+SB)/(AB-H+CS+GIDP)

RCON => (.46)*1B + (.80)*2B + (1.02)*3B + (1.40)*HR + (.33)*(BB+HBP) + (.30)*SB - (.60)*CS - (.25)*(AB-H)



PITCHING STATS


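GS (Game Score)
A measure introduced by Bill James in the 1988 Baseball Abstract that converts the events of a pitcher's outing into a single number.

Start with 50 points, then:

(+) add 1 for each out recorded, 2 for each inning completed after the fourth, and 1 for each strikeout
(-) subtract 2 for each hit allowed, 1 for each walk, 4 for each earned run, and 2 for each unearned run.

In the 2002 season, Pedro Martinez led the AL and Randy Johnson led the NL in GS.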

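ERC (Bill James' Component ERA)
An ERA computed from the conventional ERA formula, but with the pitcher's earned runs replaced by an estimate built from hits, walks, and home runs after several adjustments.

Estimated Component Earned Runs (CER) = {[(hits - home runs)*1.255 + home runs*4]*.89 + (walks + hit by pitch - intentional walks)*.56} * {hit by pitch + hits + walks} / (batters faced)

ERC = CER*9 / innings - .56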

ERA+ (Adjusted Earned Run Average)
A pitcher's adjusted ERA, accounting for era and park effects.

ERA+ = league average ERA / (pitcher A's ERA * park factor) * 100

ERA+ sets the league average at 100. An ERA+ of 120 means pitcher A posted an ERA 20% better than the league average.
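A minimal sketch of the ERA+ calculation exactly as written above. It assumes the park factor is given as a decimal with 1.0 meaning a neutral park; the function and parameter names are made up for this example.

```python
def era_plus(league_era, pitcher_era, park_factor=1.0):
    """ERA+ per the formula above: 100 is league average, higher is better."""
    return league_era / (pitcher_era * park_factor) * 100
```

For instance, a 3.75 ERA in a 4.50 league with a neutral park works out to an ERA+ of 120.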

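Pitching Runs = innings * (league average ERA / 9) - earned runs

Pitching Runs (adjusted) = innings prorated to 162 games * (adjusted league average ERA / 9) - earned runs prorated to 162 games

Adjusted ERA = opponents' on-base percentage * opponents' slugging percentage * 31

K RATE - Strikeouts are critical for a pitcher, so this is the average number of strikeouts per nine innings, computed for each year. I treat it as a graph showing how that rate rises and falls from year to year.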



TEAM STATS

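Pythagorean win-loss -> (team runs scored)^1.83 / (team runs scored^1.83 + team runs allowed^1.83)

Park factor - I plan to keep this simple:
(runs scored and allowed at home / runs scored and allowed on the road)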


*******************

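(The following is copied from kini's blog)

※ About the stats


  • GPA stands for Gross Production Average and is calculated as (OBP × 1.8 + SLG) ÷ 4. Because OPS adds OBP and SLG at a 1:1 ratio, it overvalues slugging; GPA is a metric designed to correct that. Its main advantage is that its values fall in a range similar to batting average, which allows intuitive judgments.


  • FIP stands for Fielding Independent Pitching, a metric that shows how many of the runs allowed the pitcher himself should be held responsible for. It was created by the sabermetrician known as Tangotiger, who extracted just the mathematical principle of DIPS (Defense Independent Pitching Stat.), proposed by Voros McCracken. The formula is FIP = ( 13 × HR + 3 × (BB + HBP) - 2 × K ) ÷ IP + an adjustment constant


  • DER stands for Defense Efficiency Ratio and shows what percentage of balls in play were converted into outs. For example, if opposing batters put 10 balls in play and only 3 of them became hits, the other 7 balls, or 70%, were turned into outs. The DER in this case is .700. The formula is DER = ( batters faced - hits - strikeouts - walks and hit batters - reached on error ) ÷ ( batters faced - home runs - strikeouts - walks and hit batters )


  • ISO stands for ISOlated Power and is slugging percentage minus batting average. The subtraction accounts for the fact that batting average is built into slugging percentage. For example, a player who goes 3-for-3 with three singles has both a batting average and a slugging percentage of 1.000, even though he has no extra-base hits at all. His ISO is .000, which shows that he has displayed no extra-base power.


  • Pythagorean winning percentage is a team's predicted winning percentage calculated from its runs scored and runs allowed. It is generally calculated as runs scored^2/(runs scored^2 + runs allowed^2). Here, instead of 2, the exponent X = 0.45 + 1.5 × log10((runs scored + runs allowed)/games) was used.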
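The FIP, DER, and variable-exponent formulas in the notes above can be sketched in Python as below. All function and parameter names are assumptions for this example, and the FIP adjustment constant is left as an argument because the notes do not give its value.

```python
import math

def fip(hr, bb_plus_hbp, so, ip, constant):
    """Fielding Independent Pitching; `constant` is the league adjustment term."""
    return (13 * hr + 3 * bb_plus_hbp - 2 * so) / ip + constant

def der(batters_faced, hits, so, bb_plus_hbp, reached_on_error, hr):
    """Share of balls in play that the defense converted into outs."""
    outs_on_balls_in_play = batters_faced - hits - so - bb_plus_hbp - reached_on_error
    balls_in_play = batters_faced - hr - so - bb_plus_hbp
    return outs_on_balls_in_play / balls_in_play

def pythagorean_exponent(runs, runs_allowed, games):
    """Exponent X = 0.45 + 1.5 * log10(total runs per game)."""
    return 0.45 + 1.5 * math.log10((runs + runs_allowed) / games)
```

Feeding the DER example from the notes (10 balls in play, 3 hits, nothing else) returns .700. For the exponent, a run environment of 10 total runs per game gives X = 1.95, close to the fixed values of 2 and 1.83 used elsewhere.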




    http://www.waswatching.com/archives/stat_glossary/index.html

    One of the things that I've always wrestled with in doing this blog is the use of some of the sabermetric measures that I like to throw around. It's not me using them that's the issue - it's knowing whether or not that people understand the terms that concerns me.

    Should I use the full term or are acronyms OK? Do I need to provide the definition each time that I use them? Stuff like that.

    So, I've decided to create an entry here where I can list some of the terms that I use - and link to it at times (when I mention some of these sabermetric measures). It seems like a good Band-Aid now for this issue of mine.

    Here are some of the terms that I use here frequently and the skinny on each:

    Bases Per Plate Appearance [BPA]

    The formula is (TB+BB+HBP+SB-CS-GIDP)/(AB+BB+HBP+SF).
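As a rough illustration of the BPA formula, here is a short Python sketch. The function name and the sample stat line are hypothetical.

```python
def bases_per_plate_appearance(tb, bb, hbp, sb, cs, gidp, ab, sf):
    """Net bases gained per plate appearance."""
    return (tb + bb + hbp + sb - cs - gidp) / (ab + bb + hbp + sf)
```

A made-up line of 275 total bases, 60 walks, 5 HBP, 10 SB, 3 CS, 10 GIDP, 550 at-bats and 5 sacrifice flies comes out to about .544.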

    Baserunners Per Nine Innings [BR/9]

    The total number of batters reaching base against a pitcher divided by the number of innings pitched and multiplied by nine. It measures how many batters reach base on a per game basis against a pitcher.

    A league best figure for this category is typically between 9 and 10.

    Blown Saves [BS]

    When a relief pitcher enters a game, he may be said to have a save opportunity if his team currently has the lead and he would be awarded a save if he finished the game. If, while he is pitching, his team loses the lead, either by way of the score becoming tied or by falling behind, that pitcher is said to have "blown the save." He is charged with a blown save, even if his team should eventually win the game, because he was entrusted with the responsibility to preserve his team's lead, and he failed to accomplish that.

    Command Ratio [K/BB]

    (Strikeouts / Walks) - A measure of a pitcher's raw ability to get the ball over the plate. There is no more fundamental a skill than this, and so it is accurately used as a leading indicator to project future rises and falls in other gauges, such as ERA. Command is one of the best gauges to use to evaluate minor league performance. It is a prime component of a pitcher's base performance value.

    Benchmarks: Baseball's upper echelon of pitchers will have ratios in excess of 3.0. Pitchers with ratios under 1.0 -- indicating that they walk more batters than they strike out -- have low probability for long term success.

    Defensive Efficiency Record [DER]

    The rate at which balls put into play are converted into outs by a team's defense.

    Game Score [G Sc]

    A measure of pitching performance for starting pitchers. Developed by Bill James. The formula consists of eight parts:

    1. Start with 50.
    2. Add 1 point for each out recorded.
    3. Add 2 points for each inning the pitcher completes after the fourth inning.
    4. Add 1 point for each strikeout.
    5. Subtract 2 points for each hit allowed.
    6. Subtract 4 points for each earned run allowed.
    7. Subtract 2 points for each unearned run allowed.
    8. Subtract 1 point for each walk.

    Consider this pitching line:
    IP H R ER BB K
    8.1 5 2 1 2 7

    The game score for the performance shown would be 72 (50+25+8+7-10-4-2-2).

    An average start would score 50. One start in 300 reaches a score of 90 or better, and an all-time great performance would reach 100.
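The eight steps can be sketched in Python like this. The helper that parses box-score innings notation (where "8.1" means eight innings plus one out) and all names here are assumptions made for the example.

```python
def outs_from_ip(ip: str) -> int:
    """Convert box-score innings such as '8.1' (8 innings + 1 out) into outs."""
    whole, _, partial = ip.partition(".")
    return int(whole) * 3 + int(partial or 0)

def game_score(ip, hits, earned_runs, unearned_runs, walks, strikeouts):
    """Bill James' Game Score for a single start."""
    outs = outs_from_ip(ip)
    innings_after_fourth = max(0, outs // 3 - 4)  # completed innings beyond the 4th
    return (50 + outs + 2 * innings_after_fourth + strikeouts
            - 2 * hits - 4 * earned_runs - 2 * unearned_runs - walks)

# The pitching line from the text: 8.1 IP, 5 H, 2 R (1 earned), 2 BB, 7 K.
example = game_score("8.1", hits=5, earned_runs=1, unearned_runs=1, walks=2, strikeouts=7)
```

For the line in the text, `example` is 72, matching the hand calculation.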

    Isolated Power [ISO]

    A player's slugging average minus his batting average. Bill James provided its current name. Branch Rickey championed the stat, calling it "Power Average." A measure of a player's ability to hit for power considered apart from his ability to hit singles.

    ISO = SLG - AVG

    For an individual, ISO under .080 means he can be considered a singles hitter; ISO over .200 is very good power.

    Neutral Losses [NL]

    It is a projection of how many losses a pitcher would have if he were given average run support, considering the number of actual decisions.

    Neutral Wins [NW]

    It is a projection of how many wins a pitcher would have if he were given average run support, considering the number of actual decisions.

    Offensive Winning Percentage [OWP]

    A player's Offensive Winning Percentage equals the percentage of games a team would win with nine of that player in its lineup, given average pitching and defense. The formula is the square of Runs Created per 27 Outs, divided by the sum of the square of Runs Created per 27 Outs and the square of the league average of runs per game.

    Park Factor [PF]

    This is an estimate of a ballpark's effects on batting and pitching and is expressed as either a decimal or a whole number. A neutral ballpark has a park factor of 1.00 or 100. The park factors used in many publications are three-year averages unless a ballpark was in use for fewer than three seasons. Park factors are also adjusted to reflect the fact that a batter or pitcher does not face his own team. Thus, different park factors are provided for a team's batters and pitchers.

    Production [OPS]

    Sabermetricians (baseball statisticians) consider the ability to get on base (OBP) and the ability to hit for power (SLG) to be the two most valuable offensive abilities of a player. Thus one measure of a player's prime offensive talents, his "production" or PRO, is to simply combine OBP and SLG.

    OPS = OBP+SLG

    Pythagorean Winning Percentage [PW%]

    Developed by Bill James, this is the predicted winning percentage based on runs scored and runs allowed. The formula is as follows: Runs^2/(Runs^2+Runs Allowed^2)
    Here is the calculation for the 1999 Yankees. The Yankees scored 900 runs and allowed 731 runs:

    900^2/(900^2+731^2)=.603

    Thus, the Yankees would be predicted to have a .603 winning percentage. In actuality, the Yankees had a .605 winning percentage. A more precise calculation uses an exponent of 1.83, but an exponent of two works almost as well. From Pythagorean winning percentage it is possible to figure Pythagorean wins (PW) and Pythagorean losses (PL).
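The Yankees calculation, and the step from winning percentage to Pythagorean wins and losses, can be sketched as follows. The function names are hypothetical, and the 162-game schedule length is an assumption supplied for the example.

```python
def pythagorean_pct(runs, runs_allowed, exponent=2.0):
    """Predicted winning percentage from runs scored and runs allowed."""
    scored = runs ** exponent
    return scored / (scored + runs_allowed ** exponent)

def pythagorean_record(runs, runs_allowed, games, exponent=2.0):
    """Pythagorean wins and losses, rounded to whole games."""
    wins = round(pythagorean_pct(runs, runs_allowed, exponent) * games)
    return wins, games - wins

yankees_1999 = pythagorean_pct(900, 731)              # about .603
yankees_record = pythagorean_record(900, 731, 162)    # (PW, PL)
```

With an exponent of 2, the 900 runs scored and 731 allowed translate to 98 Pythagorean wins and 64 Pythagorean losses over 162 games. Passing `exponent=1.83` gives a slightly lower percentage for the same run totals.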

    Range [RNG]

    This is an unofficial measure of a defensive player's fielding ability. In effect, it indicates how many defensive chances a player is able to convert into outs on a per game basis. Range for 1B, C and pitchers is not a meaningful stat. It is calculated as:

    RNG = 9*SC/INN

    where SC is successful chances and INN is innings played on defense.

    Performance differs by position. Typical season range factors are: 2B-4.5 to 6.0; 3B-2.0 to 3.3; SS-4.0 to 5.3; RF- and LF-1.5 to 2.5; CF-2.3 to 3.2.

    Runs Created [RC]

    A Bill James statistic. An estimate of the number of runs that a player would produce based on his offensive statistics. Runs created is an attempt to measure total offensive contribution in terms of runs (see also Runs Contributed). Divided by the runs required per win (in professional baseball, approximately 10), runs created becomes the total wins created by this player's offensive performance.

    RC = ((H+BB+HBP-CS-GIDP) * (TB+ 0.26*(BB+HBP-IBB) + 0.52*(SB+SH+SF)))/(AB+BB+HBP+SH+SF)

    Note: The formula shown here is the modern formula in current use by sabermetricians. Bill James created many variations of the basic formula to adjust for available data and other factors in bygone eras.
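Here is a small sketch of the formula shown above, splitting it into its on-base, advancement, and opportunity parts. The function name, parameter names, and sample numbers are made up for illustration.

```python
def runs_created_modern(h, bb, hbp, cs, gidp, tb, ibb, sb, sh, sf, ab):
    """RC = (on-base factor * advancement factor) / opportunities."""
    on_base = h + bb + hbp - cs - gidp
    advancement = tb + 0.26 * (bb + hbp - ibb) + 0.52 * (sb + sh + sf)
    opportunities = ab + bb + hbp + sh + sf
    return on_base * advancement / opportunities
```

A hypothetical hitter with 160 hits, 275 total bases, 60 walks (5 intentional), 5 HBP, 10 SB, 3 CS, 10 GIDP, 2 SH and 5 SF in 550 at-bats comes out at roughly 102 runs created, or about 10 wins at 10 runs per win.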

    RC typically ranges from 0 to 120 in a 162-game season. Only players who play a lot can have a very high season total, since the number is dependent on total stats. For a team, runs created is a projected estimate of the runs the team should have scored given its number of hits (by type), walks, stolen bases, and times caught stealing. Comparing team runs created to actual runs scored gives an indication of other factors at work, factors that affect the efficiency of a team's offense. For instance, high efficiency -- consistently scoring more runs than projected -- could be explained by good clutch hitting, good baserunning, good managing, or good luck (or maybe cheating). The more consistent the two figures, the less luck is probably involved.

    Runs Created Above Average [RCAA]

    This is a Lee Sinins creation. It's the difference between a player's runs created total and the total for an average player who used the same amount of his team's outs. A negative RCAA indicates a below average player in this category.

    Runs Created Per Game [RC/G]

    Runs created is an accumulation stat; the more a player bats, the more runs he creates (assuming he makes some positive contribution). Converting runs created into runs created per game provides an indication of how valuable this player is to have in the lineup. RC/G is somewhat like ERA is for pitchers; it recasts the offensive contribution of the player in the context of a nine inning (in this case, 27 out) game. To calculate RC/G, multiply RC by 27 and divide by the number of outs the player is responsible for (OM), thus:

    RC/G = 27*RC/OM

    [Note: The formula shown here is the modern formula in current use by sabermetricians. Since data is available to account for all outs made, it is appropriate to use 27 outs as the context. In earlier periods, data on some kinds of outs (GIDP and CS are examples) are incomplete or unavailable. Consequently, applying the formula to other eras requires use of 25.5 or 26 outs per game.]

    One way to look at RC/G is to imagine a lineup with the same player batting in every spot. A team made up of nine 1992 model Barry Bonds, for example, would be expected to score 11.34 runs per game on average. (Bonds had 147 runs created in 1992.)

    Runs Saved Against Average [RSAA]

    This is a Lee Sinins creation. It is the number of runs that a pitcher saved versus what an average pitcher would have allowed. It is similar to the statistic Pitching Runs detailed in Total Baseball, except that (1) the two use different park adjustments and (2) Total Baseball added a procedure to take into account the number of decisions the pitcher had, while RSAA does not. A negative RSAA indicates a below average player in this category.

    Secondary Average [SEC]

    Developed by Bill James to measure a player's offensive contributions beyond batting average. Secondary Averages of leagues are always very similar to the league batting average, but player secondary averages run from .100 (for truly inept offensive players) to upwards of .600. The formula is: (Total Bases-Hits+Walks+Stolen Bases)/(At Bats)

    Total Average [TA]

    A ratio of the bases a player accumulates for his team and the outs he costs his team. Total Average is a Thomas Boswell statistic included in his book "How Life Imitates the World Series."

    TA = (TB+HBP+BB+SB)/(AB-H+CS+GIDP)

    If a player has a TA over 1.000, that's very good.
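Total Average is a simple ratio of bases gained to outs made, as this sketch shows. The function and parameter names are assumptions for the example.

```python
def total_average(tb, hbp, bb, sb, ab, h, cs, gidp):
    """Bases accumulated for the team divided by outs cost to the team."""
    bases = tb + hbp + bb + sb
    outs = ab - h + cs + gidp
    return bases / outs
```

A made-up hitter with 350 bases gained and 403 outs made has a TA of about .868, which is good but short of the 1.000 mark mentioned above.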

    Win Shares [WS]

    A Bill James creation that aims towards allowing player evaluation across positions, teams and eras. It measures the total sum of a player's contribution expressed as one number.

    Zone Rating [ZR]

    STATS Inc. devised their own system of zones to track locations of batted balls. They use this data to measure a fielder's range in the field. Zone Rating areas of responsibility do not span the entire field -- some areas (for example, deep in the gap between CF and RF) are considered to be a "no man's land" that is ordinarily beyond the reach of fielders, and thus a ball hit there is not considered an opportunity.
  • by 넘나드는 | 2007/05/09 00:05 | STUFF