Lately I've had several people ask me where they could find an explanation of sabermetrics in basic, plain English. So I thought it would be a good idea to start this thread.

I'll begin by posting Breaking Blue's Introduction to Sabrmetrics, by our very own JFaS.

As well, FanGraphs has a list of points at which important data becomes reliable:

Hitting:

AVG – (Batting Average) A ratio of hits/at bats for a hitter. AVG assumes that all hits are equal, but we all know a home run is worth more than a single, so there are some issues here.

OBP – (On-Base Percentage) A ratio of times on base vs. times at the plate. A more useful stat than AVG since it includes walks, but still assumes that walks and all hits are of equal value. OBP is emphasized in Moneyball.

SLG – (Slugging) Similar to AVG, but each type of hit gets a different weight: 1B = 1, 2B = 2, 3B = 3, HR = 4. This is a good measure of power, but these weights are not exact. A home run isn’t exactly twice as good as a double; in reality it is worth a bit less than that.

OPS – (On-base Plus Slugging) A statistic that tries to paint the whole picture of batting by adding OBP and SLG together. While OPS is the best standard stat that is widely used, it still has issues as it assumes OBP and SLG have the same value. In reality, OBP is almost twice as important as SLG.

wOBA – (Weighted On Base Average) A statistic created by Tom Tango, that is used for a complete picture of hitting. It assigns a linear weight to each result of hitting (BB, HBP, 1B, 2B, 3B, HR) and gives it as a ratio to PA. It also removes intentional walks. The weights change from year to year but are usually fairly constant. wOBA is on the same scale as OBP, so it is easy to know what a good and bad wOBA is.

wRAA – (Weighted Runs Above Average) A measure of the number of runs a player creates above or below league average. It is calculated from wOBA and PA.
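To make the wOBA and wRAA descriptions concrete, here is a minimal Python sketch. The weights are illustrative (roughly the 2013 values) and are an assumption, since the real weights are recalculated every season. The denominator (AB + BB - IBB + SF + HBP) and the "wOBA scale" divisor in wRAA follow the commonly published formulas rather than anything stated in this thread, so treat them as assumptions too.

```python
# Illustrative linear weights (assumed, roughly 2013 values).
# The real weights change a little from year to year.
WEIGHTS = {"BB": 0.690, "HBP": 0.722, "1B": 0.888,
           "2B": 1.271, "3B": 1.616, "HR": 2.101}


def woba(bb, ibb, hbp, singles, doubles, triples, hr, ab, sf):
    """Weighted value of each batting event per plate appearance.

    Intentional walks (ibb) are removed from both the walks and the
    plate appearances.
    """
    numerator = (WEIGHTS["BB"] * (bb - ibb)
                 + WEIGHTS["HBP"] * hbp
                 + WEIGHTS["1B"] * singles
                 + WEIGHTS["2B"] * doubles
                 + WEIGHTS["3B"] * triples
                 + WEIGHTS["HR"] * hr)
    denominator = ab + bb - ibb + sf + hbp
    return numerator / denominator


def wraa(player_woba, league_woba, woba_scale, pa):
    """Runs above (positive) or below (negative) league average."""
    return (player_woba - league_woba) / woba_scale * pa


# Hypothetical season line, not a real player.
season_woba = woba(bb=60, ibb=5, hbp=5, singles=100, doubles=30,
                   triples=3, hr=25, ab=550, sf=5)
# league_woba and woba_scale are assumed league constants.
season_wraa = wraa(season_woba, league_woba=0.314, woba_scale=1.277, pa=620)
```

A hitter whose wOBA equals the league wOBA comes out at exactly 0 wRAA, which is what "above or below league average" means here.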

wRC – (Weighted Runs Created) Similar to wRAA above, but is not a measure of runs above or below average, just the amount of runs created for the team.

wRC+ – A statistic that measures the rate of wRC and compares it to league average. It is the rate stat version of wRAA and wRC, and it also adjusts for park factors. It uses 100 as league average, so above 100 is above average and below 100 is below average. It is expressed as a percentage above or below league average. For example, Miguel Cabrera had a 192 wRC+ in 2013, meaning he was 92% better than league average. Similarly, J.P. Arencibia had a 57 wRC+ in 2013, meaning he was (100-57) 43% worse than league average.

ISO – (Isolated Power) A stat that attempts to measure only the power of a hitter. It is the ratio of extra bases to AB. It can be calculated as SLG – AVG.
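The traditional rate stats above (AVG, OBP, SLG, OPS) and ISO are all simple ratios, so a short Python sketch may help. The OBP denominator (AB + BB + HBP + SF) is the standard definition rather than something spelled out in this thread, and the season line is hypothetical.

```python
def avg(hits, ab):
    """Batting average: hits per at bat."""
    return hits / ab


def obp(hits, bb, hbp, ab, sf):
    """On-base percentage: times on base per time at the plate."""
    return (hits + bb + hbp) / (ab + bb + hbp + sf)


def slg(singles, doubles, triples, hr, ab):
    """Slugging: total bases per at bat (1B=1, 2B=2, 3B=3, HR=4)."""
    return (singles + 2 * doubles + 3 * triples + 4 * hr) / ab


def ops(on_base, slugging):
    """On-base plus slugging."""
    return on_base + slugging


def iso(doubles, triples, hr, ab):
    """Isolated power: extra bases per at bat."""
    return (doubles + 2 * triples + 3 * hr) / ab


# Hypothetical season line: 100 singles, 30 doubles, 3 triples, 25 HR.
singles, doubles, triples, hr = 100, 30, 3, 25
hits = singles + doubles + triples + hr
ab, bb, hbp, sf = 550, 60, 5, 5

player_avg = avg(hits, ab)
player_obp = obp(hits, bb, hbp, ab, sf)
player_slg = slg(singles, doubles, triples, hr, ab)
player_ops = ops(player_obp, player_slg)
player_iso = iso(doubles, triples, hr, ab)
```

Computing ISO from extra bases gives the same number as SLG minus AVG, because the "one base per hit" part of SLG is exactly AVG.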

Pitching:

ERA – (Earned Run Average) A measure of the number of earned runs a pitcher would allow if he pitched 9 innings. ERA is the main pitching statistic used in baseball. The idea is that the pitcher should not have to pay for his team’s bad defence, so it eliminates runs caused by errors from the equation. If more had been known about defence when ERA was created, it would have tried to eliminate defence as much as possible. Since it doesn’t fully remove the effect of defence, it is usually disregarded by sabermetricians.

WHIP – (Walks and Hits per Inning Pitched) Another widely used pitching stat that is used as a complement to ERA. It is similar to OBP for pitchers, as it measures how many walks and hits a pitcher gives up in an inning (3 outs).
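ERA and WHIP as described above are both "per inning" ratios. The sketch below takes outs recorded instead of innings pitched, which is my own choice to avoid the "6.1 means six and a third" notation; the numbers are hypothetical.

```python
def innings(outs):
    """Convert outs recorded into innings pitched (3 outs per inning)."""
    return outs / 3


def era(earned_runs, outs):
    """Earned runs allowed per 9 innings."""
    return 9 * earned_runs / innings(outs)


def whip(walks, hits, outs):
    """Walks plus hits allowed per inning pitched."""
    return (walks + hits) / innings(outs)


# Hypothetical pitcher: 200 innings (600 outs), 70 ER, 50 BB, 180 H.
pitcher_era = era(earned_runs=70, outs=600)
pitcher_whip = whip(walks=50, hits=180, outs=600)
```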

DIPS – (Defence Independent Pitching Statistics) DIPS is not a statistic, but an ideology. It is the idea that a measure of pitching skill should not include the effect of the team’s defence. There are quite a few DIPS statistics, and I will go over the ones we use here.

FIP – (Fielding Independent Pitching) The most commonly used and best-known DIPS stat. It is based on results that only the pitcher and batter can control: strikeouts, walks, and home runs. FIP is scaled to ERA so it is easy to tell what is good and bad. FIP is better at predicting future ERA than ERA itself.

xFIP – (Expected FIP) A version of FIP built on the idea that home runs aren’t controllable by a pitcher, but fly balls are. It replaces the HR in FIP with the number of HR a league average pitcher would give up on the pitcher’s number of FB. xFIP is better at predicting future ERA than both FIP and ERA.
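Here is a rough Python sketch of FIP and xFIP. It uses the commonly published 13/3/2 weights and includes HBP with walks, which this thread does not spell out, so treat both as assumptions. The constant that scales FIP to ERA and the league HR/FB rate change every year; the values below are placeholders, not official numbers.

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """Fielding Independent Pitching.

    `constant` scales the result to league ERA; 3.10 is an assumed,
    illustrative value.
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def xfip(fb, bb, hbp, k, ip, league_hr_per_fb=0.10, constant=3.10):
    """Expected FIP: swap actual HR for the HR a league-average
    pitcher would allow on the same number of fly balls.

    `league_hr_per_fb` is an assumed league rate.
    """
    expected_hr = fb * league_hr_per_fb
    return fip(expected_hr, bb, hbp, k, ip, constant)


# Hypothetical pitcher: 200 IP, 20 HR, 50 BB, 5 HBP, 200 K, 250 FB.
pitcher_fip = fip(hr=20, bb=50, hbp=5, k=200, ip=200)
pitcher_xfip = xfip(fb=250, bb=50, hbp=5, k=200, ip=200)
```

In this made-up example the pitcher allowed fewer home runs (20) than a league-average pitcher would on 250 fly balls (25), so his xFIP comes out higher than his FIP.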

SIERA – (Skill Interactive ERA) A more complicated DIPS statistic that uses ground ball rates as well as strikeout and walk rates. SIERA is not linear and goes on a per batter basis, instead of per inning like FIP and xFIP. This per batter basis gives SIERA a predictive edge over ERA and FIP, and a slight edge over xFIP.

TIPS – (Truly Independent Pitching Skill) TIPS is another DIPS stat created by our own Chris Carruthers. It branches on the ideology that strikeouts and walks should not be used in DIPS since the catcher and umpire play parts in each. TIPS uses three stats that only the pitcher and batter can control that correlate well to ERA. These are O-Looking% (PitchF/x ratio of pitches outside the zone that are watched), SwStr% (percent of total pitches that are swung on and missed), and Foul% (percent of contacts that are fouled off). TIPS also scales to ERA. TIPS is on a per pitch basis, and this allows it to stabilize very fast. The fast stabilization gives it a great predictive edge over SIERA, xFIP, FIP, and ERA in samples that are less than IP. SIERA and xFIP pass it at 70 IP, while FIP passes it at around 200 IP.

ERA- and DIPS- – These are stats that are calculated like wRC+, but for pitching. Each pitching stat can be put into XX- form with 100 as average. The “-” just indicates that a lower number is better (less than 100) while values above 100 are bad. This is to keep with the style of lower ERA being better. Park effects are accounted for.

Base-running:

UBR – (Ultimate Base Running) This is a measure of the runs above average (like wRAA) that a player contributes with their legs, aside from stealing. UBR takes into account advancing on hits, flyballs, throws, grounders, etc. Some players are good at advancing even if they don’t steal (Colby Rasmus).

wSB – (Weighted Stolen Bases) A measure of runs above average (like wRAA and UBR) that a player contributes from steal attempts. It uses SB and CS and weights them accordingly (about 0.25 runs for a SB and -0.5 for a CS).

BSR – (Baserunning Runs) A measure of total base running runs above average. Adds UBR and wSB together.

Spd – (Speed score) A measure of the speed of a player. Uses real events to determine and is rate based, meaning it does not accumulate unlike BSR.

Fielding:

UZR – (Ultimate Zone Rating) A relatively complicated stat that measures the runs above average at their position that a defender saves (contributes). For more complete information, click the link.

DRS – (Defensive Runs Saved) Similar to UZR in that it measures the runs above average that a defender contributes. Each play and location on the field has an assigned run value determined from average players at that position. If the player makes a play that 75% of players miss, they get 0.75 plays to their DRS. If a player misses the play they would lose (1.00-.75) 0.25 plays. Plays are then converted to runs (usually just divided by 2).
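The plus/minus bookkeeping described above can be sketched in a few lines of Python. This only mirrors the simplified description in this post (credit per play, then plays divided by 2 to get runs); the real DRS system has more components, and the function names and make rates here are hypothetical.

```python
def play_credit(made, league_make_rate):
    """Credit for one play, in 'plays' above or below average.

    league_make_rate is the share of average fielders at the position
    who make this play. Making a play that only 25% make earns +0.75;
    missing it costs 0.25.
    """
    if made:
        return 1.0 - league_make_rate
    return -league_make_rate


def plays_to_runs(plays, plays_per_run=2.0):
    """Convert plays to runs using the rough 'divide by 2' rule."""
    return plays / plays_per_run


# Hypothetical sample of chances: (was the play made?, league make rate)
chances = [(True, 0.25), (False, 0.25), (True, 0.90), (False, 0.90)]
total_plays = sum(play_credit(made, rate) for made, rate in chances)
total_runs = plays_to_runs(total_plays)
```

Note how missing a routine play (90% make rate) costs far more than missing a tough one (25% make rate).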

Batted Ball:

GB% – (Groundball rate) The rate of balls in play that a hitter makes (or pitcher gives up) that are groundballs.

FB% – (Flyball rate) Same as GB%, but for flyballs.

LD% – (Line drive rate) Same as FB% and GB% but for linedrives. GB% + FB% + LD% should always equal 100%.

HR/FB% – (Homeruns per flyball) The ratio of homeruns hit (or given up for a pitcher) to flyballs. It is used as a luck indicator for pitchers and a power indicator for hitters. A high HR/FB% for pitchers may mean they are getting unlucky.

BABIP – (Batting Average on Balls In Play) Similar to AVG for a batter, but is the ratio of hits to balls in play. HR are not counted as hits or balls in play. It is the most widely used luck indicator for pitchers and batters. Batters have more control over their BABIP than pitchers do, and batter BABIP is often compared to their career BABIP, where pitchers are compared to league average to indicate luck.
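BABIP and HR/FB% are easy to compute by hand. The sketch below uses the usual BABIP denominator (AB - K - HR + SF), which is an assumption on my part since the post only says home runs are left out; the stat line is hypothetical.

```python
def babip(hits, hr, ab, k, sf):
    """Hits on balls in play per ball in play.

    Home runs are removed from both hits and balls in play;
    strikeouts never put the ball in play.
    """
    return (hits - hr) / (ab - k - hr + sf)


def hr_per_fb(hr, fly_balls):
    """Share of fly balls that leave the park."""
    return hr / fly_balls


# Hypothetical hitter: 158 H, 25 HR, 550 AB, 110 K, 5 SF, 180 FB.
hitter_babip = babip(hits=158, hr=25, ab=550, k=110, sf=5)
hitter_hr_fb = hr_per_fb(hr=25, fly_balls=180)
```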

Plate Discipline:

There are two versions of plate discipline data: raw PitchF/x data and manually adjusted data. PitchF/x data is more consistent in values from year to year.

O-Swing%: The percentage of pitches a batter swings at outside the strike zone.

Z-Swing%: The percentage of pitches a batter swings at inside the strike zone.

Swing%: The overall percentage of pitches a batter swings at.

O-Contact%: The percentage of pitches a batter makes contact with outside the strike zone when swinging the bat.

Z-Contact%: The percentage of pitches a batter makes contact with inside the strike zone when swinging the bat.

Contact%: The overall percentage of pitches a batter makes contact with when swinging the bat.

Zone%: The overall percentage of pitches a batter sees inside the strike zone.

F-Strike%: The percentage of first pitch strikes.

SwStr%: The percentage of total pitches a batter swings and misses on.
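All of the plate discipline rates above can be derived from three facts about each pitch: was it in the zone, did the batter swing, and did he make contact. Here is a hedged Python sketch; the `Pitch` record and the sample data are made up for illustration and are not how PitchF/x actually stores things. F-Strike% is left out because it needs the count on each pitch.

```python
from dataclasses import dataclass


@dataclass
class Pitch:
    in_zone: bool   # pitch was inside the strike zone
    swing: bool     # batter swung
    contact: bool   # batter made contact (only meaningful on a swing)


def rate(numerator, denominator):
    """Safe ratio: NaN when there are no qualifying pitches."""
    return numerator / denominator if denominator else float("nan")


def plate_discipline(pitches):
    zone = [p for p in pitches if p.in_zone]
    outside = [p for p in pitches if not p.in_zone]
    swings = [p for p in pitches if p.swing]
    z_swings = [p for p in zone if p.swing]
    o_swings = [p for p in outside if p.swing]
    whiffs = [p for p in swings if not p.contact]
    return {
        "O-Swing%": rate(len(o_swings), len(outside)),
        "Z-Swing%": rate(len(z_swings), len(zone)),
        "Swing%": rate(len(swings), len(pitches)),
        "O-Contact%": rate(sum(p.contact for p in o_swings), len(o_swings)),
        "Z-Contact%": rate(sum(p.contact for p in z_swings), len(z_swings)),
        "Contact%": rate(sum(p.contact for p in swings), len(swings)),
        "Zone%": rate(len(zone), len(pitches)),
        "SwStr%": rate(len(whiffs), len(pitches)),
    }


# Ten hypothetical pitches seen by one batter.
sample = (
    [Pitch(True, True, True)] * 3      # in zone, swung, contact
    + [Pitch(True, True, False)]       # in zone, swung, missed
    + [Pitch(True, False, False)] * 2  # in zone, taken
    + [Pitch(False, True, True)]       # outside, swung, contact
    + [Pitch(False, True, False)]      # outside, swung, missed
    + [Pitch(False, False, False)] * 2 # outside, taken
)
rates = plate_discipline(sample)
```

Notice that the contact rates are per swing, while SwStr% is per total pitch, which is why the two don't simply add up to 100%.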

WAR:

fWAR – (FanGraphs Wins Above Replacement). WAR is the complete measure of a player’s contributions to their team. Position players’ WAR is calculated from batting, running, and fielding. It adds wRAA (adjusted for park), UZR, and BSR. It then adds a replacement value (see next point) and a positional adjustment (see below) for a total number of runs above replacement (RAR). Runs are then converted into wins. It is generally accepted that 10 runs = 1 win (it is usually a little less than 10). The WAR value is how many more wins the player contributed to his team than a replacement player from AAA would have produced. Pitchers’ WAR is the same concept as hitters’, except that it is based only on FIP and park factors. WAR on a team scale should correlate highly with actual wins.

Replacement – A replacement player is defined as a player who is readily available in AAA. A team made up of replacement players should theoretically win about 48 games (47.7) in a season.

Positional Adjustment – It is well known by everyone in baseball that some positions are easier to play than others. 1B is much easier to play than SS, CF is harder than LF and RF, etc. Positional adjustment accounts for the difficulty of playing certain positions and this is used in WAR calculations.
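Putting the pieces of position player WAR together, the arithmetic looks roughly like the Python sketch below. It only follows the outline given in this post (batting + fielding + baserunning + replacement + positional adjustment, then about 10 runs per win); the input run values are hypothetical and the real calculation has more detail.

```python
def runs_above_replacement(batting_runs, fielding_runs, baserunning_runs,
                           replacement_runs, positional_runs):
    """Total runs above replacement (RAR) for a position player.

    batting_runs     -- park-adjusted wRAA
    fielding_runs    -- UZR
    baserunning_runs -- BSR
    replacement_runs -- credit for playing time over a AAA replacement
    positional_runs  -- adjustment for position (negative for easy spots)
    """
    return (batting_runs + fielding_runs + baserunning_runs
            + replacement_runs + positional_runs)


def war(rar, runs_per_win=10.0):
    """Convert runs to wins; 10 runs per win is the usual rule of thumb."""
    return rar / runs_per_win


# Hypothetical season: good bat, decent glove, plays an easier position.
rar = runs_above_replacement(batting_runs=30.0, fielding_runs=5.0,
                             baserunning_runs=2.0, replacement_runs=20.0,
                             positional_runs=-3.0)
player_war = war(rar)
```

Even an exactly league-average hitter, fielder and runner ends up above 0 WAR over a full season, because the replacement runs are added on top.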

Park Factors – Not all baseball stadiums are created equal. They all have different dimensions and this affects results. A fly ball to left in Fenway usually turns into a double. The thin air in Colorado allows the ball to travel much farther. Differences in parks are accounted for and the factors can be found here.

rWAR – Baseball-Reference’s version of WAR. Click the link for information on differences from fWAR.

http://www.fangraphs.com/library/pri...s/sample-size/

Update 11/9: Note on projection systems from NJH

Stabilization Points for Offense Statistics:

- 60 PA: Strikeout rate
- 120 PA: Walk rate
- 240 PA: HBP rate
- 290 PA: Single rate
- 1610 PA: XBH rate
- 170 PA: HR rate

- 910 AB: AVG
- 460 PA: OBP
- 320 AB: SLG
- 160 AB: ISO

- 80 BIP: GB rate
- 80 BIP: FB rate
- 600 BIP: LD rate
- 50 FBs: HR per FB
- 820 BIP: BABIP

Stabilization Points for Pitching Statistics:

- 70 BF: Strikeout rate
- 170 BF: Walk rate
- 640 BF: HBP rate
- 670 BF: Single rate
- 1450 BF: XBH rate
- 1320 BF: HR rate

- 630 BF: AVG
- 540 BF: OBP
- 550 AB: SLG
- 630 AB: ISO

- 70 BIP: GB rate
- 70 BIP: FB rate
- 650 BIP: LD rate
- 400 FB: HR per FB
- 2000 BIP: BABIP
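One practical way to use these lists is a quick lookup that tells you whether a sample is big enough to start trusting a stat. Here is a small Python sketch using a few of the PA-based hitter thresholds from the offense list; the dictionary keys and the function name are my own labels.

```python
# A few hitter stabilization points from the offense list, in PA.
HITTER_STABILIZATION_PA = {
    "strikeout rate": 60,
    "walk rate": 120,
    "HR rate": 170,
    "OBP": 460,
}


def has_stabilized(stat, plate_appearances):
    """True once the sample has reached the listed stabilization point."""
    return plate_appearances >= HITTER_STABILIZATION_PA[stat]


# A hitter one month into the season with roughly 100 PA:
early_checks = {stat: has_stabilized(stat, 100)
                for stat in HITTER_STABILIZATION_PA}
```

At 100 PA only the strikeout rate has crossed its threshold, which is why early-season walk rates and OBP should be read with caution.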

Update 08/22: Here's an article on the new BIS data that's being heavily misused.

There are three points to understand:

- Projections are a single-point representation of a range of outcomes. The likelihood of those outcomes resembles a bell curve.

- A change in talent, temporary or permanent, can create arbitrage opportunities.

- Sample size affects the range of possible outcomes.

If you can internalize these three bullets, you’ll have a solid grasp on the strengths and weaknesses of projection systems. Remember, only history can possess a single truth. The present and the future can only be estimated.
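To see why sample size affects the range of outcomes, here is a small Python sketch. It treats each at bat as an independent coin flip with a fixed "true talent" hit probability, which is a simplifying assumption of mine and not how any real projection system works.

```python
import math


def one_sigma_range(true_rate, trials):
    """Range covering roughly 68% of outcomes for an observed rate.

    Uses the binomial standard deviation sqrt(p * (1 - p) / n).
    """
    sd = math.sqrt(true_rate * (1 - true_rate) / trials)
    return true_rate - sd, true_rate + sd


# A hypothetical true .300 hitter over samples of different sizes.
ranges = {ab: one_sigma_range(0.300, ab) for ab in (100, 400, 1600)}
for ab, (low, high) in ranges.items():
    print(f"{ab:>5} AB: {low:.3f} to {high:.3f}")
```

Every time the sample is quadrupled, the width of the range is cut in half, so a single-point projection is always sitting in the middle of a band that is much wider early in the season.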

http://www.hardballtimes.com/offensi...-optimal-uses/

So how can we use the BIS contact data?

- Not for BABIP. This is seriously the wrong data to use if so-and-so has a low BABIP. Don’t say, “But he’s making hard contact (Hard%).” These stats do so very little to predict BABIP—in part because “hard contact” can be deep fly balls, and fly balls have the worst BABIP of all non-infield-pop-ups. And typically, weak or medium contact results in ground balls, and those have a higher BABIP. But ground balls can be hit hard too. Just stay away from BABIP with these stats.

- For ISO and SLG variations. Is your team’s prized slugger no longer lashing doubles and homers? Check the BIS data. Major fluctuations there might indicate he’s declining. Otherwise, give it some time.

- And to a degree, wRC+ variations. But a lot goes into a total-offense metric like wRC+. I’d be more inclined to look at a contact rate than a contact strength measurement. Contact is a clearly delineated event. Contact strength has a lot of noise. But in bigger samples, it can be useful. For instance: Nobody has ever hit below 100 wRC+ when his Hard% is 35.5 percent or higher. In fact, very few hitters over 33 percent have been bad hitters; as a group, they average a 121 wRC+. Look at this:

WRC+ BY HARD-HIT RATE QUARTILE

Quartile       wRC+
Max (43.2%)    118
Q3 (31.4%)     102
Q2 (27.8%)     94
Q1 (24%)       82

- So fellas hitting under 24 percent Hard-rate are probably not doing well. But remember: there’s a lot of volatility here. The standard deviation in that bottom quartile is 13.6—meaning about 68 percent of the data lies between 68 wRC+ and 96 wRC+. It’s a wide swath.

BATTED BALL CORRELATIONS

Statistic  LD%   GB%   FB%   IFFB%  IFFB/PA
BABIP      0.15  0.11  0.20  0.36   0.43
wRC+       0.01  0.08  0.07  0.07   0.02
OBP        0.06  0.01  0.00  0.11   0.08
SLG        0.00  0.18  0.19  0.01   0.00
HR%        0.07  0.32  0.40  0.01   0.05
ISO        0.04  0.32  0.39  0.00   0.03
BA         0.11  0.04  0.09  0.11   0.07

What it tells us:

Taken together, these stats can give us a good feel for a hitter’s style, especially when it comes to groundball or flyball tendencies. Andrew Koo found a few years ago that the Oakland Athletics were leaning heavily on flyball hitters, and doing so to great effect at the time. A hitter’s GB/FB ratio might very well inform us how a hitter will perform in given stadiums or against given pitchers. The problem with these data, though, is that we are far too quick to look at line drive percentage and draw bigger conclusions.

- We can’t use LD% to rationalize a BABIP. You know, good for Dee Gordon that he is setting a career high in LD% during the 2015 season. That’s no reason to think he can keep his BABIP above .400 or above his career norms. Change “Dee Gordon” to “Starlin Castro” and “2015” to “2014” and we will see why LD% is a fickle master.
- We can’t use LD% to rationalize a wRC+. Yes DJ LeMahieu has an enormous LD%, but he had an even higher rate in 2013—back when he also had a 68 wRC+.
- We can build some strong xBABIP tools. These contact data fill out a lot of the gray area of “in play.” They help differentiate duck snort doubles from scorched near-homers. And so, unsurprisingly, they can pair nicely with other PA outcomes (walks, strikeouts and homers) to make a decent model for predicting BABIP.

This thread is not for the flaming of those who enjoy sabermetrics. This thread is for people who wish to learn about sabermetrics and discuss them. Post any questions you may have and they'll certainly be answered.