In the top of the fourth inning of the Pittsburgh at Cincinnati game
on April 30, 1996, an intentional base on balls (IBB) figured in a
nine-run inning. With two runs in, Jacob Brumfield on second and only
one out, Charlie Hayes was given the intentional pass. Brumfield stole
third and Jason Kendall was hit by a John Smiley pitch. Denny Neagle's
sacrifice fly scored the third Pirate run of the inning. A run-scoring
single by Carlos Garcia and a bases-loading walk to Al Martin resulted
in an early exit for Smiley. Jeff King greeted reliever Tim Pugh by
hitting his second home run of the inning. Orlando Merced followed
with another home run. Jay Bell was hit by a pitch and Brumfield, the
thirteenth Pirate to come to the plate in the inning, flied out to
first to finally end it. A total of nine runs were scored in the
inning, with seven of them coming after the intentional pass.
Disasters of this magnitude, while unusual, do occur a couple of
times each season (Table 8). Did the IBB cause
the seven-run outburst? Probably not. Still, Hayes had an on-base
percentage of 0.303 at Pittsburgh, so there was a better than 2 to 1
chance of his making an out. Outbursts such as this one are very
dramatic but tend to obscure the real question, which can only be
answered in a statistical sense: does the IBB, as it is used in the
Major Leagues, save the defense sufficient runs to justify its use?
Since it is not uncommon, ML managers clearly feel it is a useful
defensive strategy.
The raw data for this study is provided by the full season
play-by-play descriptions obtained from Retrosheet.
For the data in my library, both leagues for 1980-1990 and 1992-1996,
all 20638 IBBs were examined. Hitting statistics and total runs
scored during the remainder of the inning following each IBB were
also recorded. Table 1, Table
2, Table 6 and Table 8
summarize the IBB data. The use of the Designated Hitter in the
American League accounts for the most obvious differences between the
leagues in when the IBB is given (Table 1).
Table 3 summarizes total hitting and puts the
post IBB hitting in Table 2 into perspective.
Carefully examining the IBB data leads to three lines of analysis
that provide insight into the value or cost of an IBB. An Expected
Future Runs (EFR) analysis provides an estimate of the runs that
would have been scored if the IBB was not given. Linear regression,
more commonly known as linear weights in sabermetric baseball jargon,
provides an estimate of the average value in runs of an IBB.
Comparison of the observed distribution of runs following each IBB
with a simulation and the observed distribution for all runs scored
is the final piece of evidence to be considered.
Since there are 3 bases and runners can be on any or all of them,
there are 8 possible base runner combinations. A team can be in an
inning with 0, 1 or 2 outs, thus there are 24 possible configurations
of base runners and outs. As the inning proceeds, various base runner and out
configurations will occur. A common progression will be from no one
on and 0 outs to no one on and 1 out and finally no one on and 2
outs. The third out ends the inning and does not count as a separate
configuration. If runners get on base the probability of scoring
increases. Configurations can repeat. A home run with no outs causes
the no outs no one on base inning starting configuration to repeat.
Similarly, a bases loaded walk will not change the base runner and
outs configuration.
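The 24 configurations can be enumerated directly. The following is a minimal sketch, not taken from the original study; the names BASE_PATTERNS and STATES are illustrative, and the runner notation ("-2-" for a runner on second only) is borrowed from the tables at the end of this paper.

```python
# Build the 8 base runner patterns in the column order used by the tables:
# ---, 1--, -2-, 12-, --3, 1-3, -23, 123.
BASE_PATTERNS = [
    "".join(str(base + 1) if (mask >> base) & 1 else "-" for base in range(3))
    for mask in range(8)
]

# Each configuration pairs a number of outs (0, 1 or 2) with a pattern.
STATES = [(outs, bases) for outs in range(3) for bases in BASE_PATTERNS]

print(len(BASE_PATTERNS), len(STATES))
```

The third out is deliberately absent from the list, since it ends the inning rather than defining a configuration.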
Using these ideas, a particularly insightful and useful summary of
scoring can be developed (1). For every inning played, record the
number of runs already scored each time an out and base runner
configuration occurs, and subtract this number from the total runs
scored in the inning. The difference is the runs scored from that
configuration to the end of the inning. Repeated configurations are
recorded each time they occur. Finally, for each configuration, divide
the accumulated runs by the number of times the configuration
occurred. The result is the average number of runs scored for each
possible base runner and outs configuration (Table 4). This table
represents the Expected Future Runs (EFR) that will be, on the
average, scored from each base runner and outs configuration. The
number of times each base runner and outs configuration occurred and
the average runs scored from each of them, computed over all seasons
and both leagues, are given in Tables 4A and 4B.
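The tabulation described above can be sketched in a few lines. This is only an illustration under stated assumptions: the input layout (one tuple of outs, base runners and runs already scored per plate appearance) and the function name expected_future_runs are hypothetical, and the two innings are made up rather than drawn from the Retrosheet data.

```python
from collections import defaultdict


def expected_future_runs(innings):
    """Average runs scored from each (outs, bases) configuration to inning end.

    `innings` is an iterable of (states, inning_runs) pairs, where `states`
    lists one (outs, bases, runs_already_scored) tuple per plate appearance.
    """
    future_runs = defaultdict(int)
    occurrences = defaultdict(int)
    for states, inning_runs in innings:
        for outs, bases, runs_already_scored in states:
            key = (outs, bases)
            # Runs scored from this configuration to the end of the inning.
            future_runs[key] += inning_runs - runs_already_scored
            # Repeated configurations are counted each time they occur.
            occurrences[key] += 1
    return {key: future_runs[key] / occurrences[key] for key in occurrences}


# Two made-up innings: three up and three down, and one opened by a home run.
toy_innings = [
    ([(0, "---", 0), (1, "---", 0), (2, "---", 0)], 0),
    ([(0, "---", 0), (0, "---", 1), (1, "---", 1), (2, "---", 1)], 1),
]
efr = expected_future_runs(toy_innings)
print(efr[(0, "---")])
```

In the toy data the no outs, no one on configuration occurs three times (the home run repeats it) and one run scores after the first of those occurrences, so its EFR is 1/3.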
There is an elegant technique (2) due to Thomas Cover and Carroll
Keilers called Offensive Earned Run Average (OERA) for estimating the
24 EFR values from basic hitting data: numbers of at bats, singles,
doubles, triples, home runs and walks. Table 4C
shows the results of applying this calculation to the combined NL
& AL hitting for the same seasons used to generate the first two
entries in Table 4. This hitting data is given in
the TOT line in Table 3. Finally, the observed
EFR values (Table 4B) are divided by the EFR
values estimated from hitting data, Table 4C, to
obtain Table 4D, which shows the systematic
variations between the estimated and observed values. Besides
illustrating how the actual data differs from the OERA calculated
values, this table can also be used as a correction on estimates
generated from other hitting data.
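As a check on how Table 4D is formed, the sketch below divides the 0 outs row of Table 4B by the 0 outs row of Table 4C. The variable names are illustrative. Because the printed inputs are rounded to three places, the recomputed ratios agree with the printed Table 4D row only to within about 0.002.

```python
# 0 outs rows, columns in table order: ---, 1--, -2-, 12-, --3, 1-3, -23, 123.
observed_0_outs = [0.492, 0.872, 1.121, 1.487, 1.363, 1.744, 1.982, 2.310]  # Table 4B
oera_0_outs = [0.522, 0.943, 1.121, 1.557, 1.121, 1.557, 1.736, 2.256]      # Table 4C
printed_4d_0_outs = [0.943, 0.926, 1.000, 0.954, 1.216, 1.120, 1.142, 1.024]

# Ratio of observed EFR to OERA estimated EFR, one value per configuration.
ratios_0_outs = [obs / est for obs, est in zip(observed_0_outs, oera_0_outs)]


def apply_correction(oera_row, ratio_row):
    """Scale an OERA estimated EFR row by the observed/estimated ratios."""
    return [est * ratio for est, ratio in zip(oera_row, ratio_row)]


for printed, recomputed in zip(printed_4d_0_outs, ratios_0_outs):
    print(f"{printed:.3f}  {recomputed:.4f}")
```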
Each entry in Tables 4B and 4C is the average number of runs that can
be expected to score in the remainder of an inning, starting from the
given number of outs and combination of base runners, for the hitting
specified.
Total runs scored following an IBB are known: 16121 (Table
8). To put this number into perspective, an estimate is needed of
the runs that would have been scored if IBBs were not given. Hitting
stats for the OERA calculation are obtained by weighting each
batter's season hitting by the number of IBBs he received. The more
often a particular player was given the intentional pass the more his
hitting should influence the hitting statistics used in this
calculation. The EFR calculated from this composite hitting is
adjusted by the correction factor determined from the overall hitting
(Table 4D) and is given in Table
5.
The EFR values in Table 5 and the counts of when
IBBs were given, tabulated in Table 6, provide the
data needed to estimate runs if the IBBs had not been given. Consider
the case of 1 out and runners on second and third for the combined NL
& AL (TOT) data. This configuration was observed 4140 times. The
EFR data, Table 5, indicate an average of 1.33
runs were scored each time this base runner and out configuration
occurred. Multiplying these two values gives 5506 runs for this
situation. This process is repeated for each base runner and outs
configuration, and the sum of the contributions from all
configurations yields an estimate of 16160 runs that would have been
scored if the IBBs were not given. Since 16121 runs were actually
scored, slightly fewer than the no-IBB estimate, IBBs on average do
not contribute additional runs for the offense. Expressed in runs per
event (IBBs), the value of an IBB is a minuscule -0.003 runs/IBB. The
negative sign indicates that the IBB saved (a few) runs.
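The count-weighted sum described above can be reproduced from the printed tables. The sketch below uses the TOT sections of Tables 5 and 6 and the Table 4D ratios; the function name estimated_runs is illustrative. One assumption needs stating plainly: the worked example in the text (4140 x 1.33, about 5506) matches the Table 5 value as printed, and it is not clear from the printed table whether the Table 4D correction is already included. Applying the ratios on top of the printed values is therefore a guess, made here because the resulting total lands close to the 16160 quoted in the text.

```python
# Rows are 0, 1, 2 outs; columns are ---, 1--, -2-, 12-, --3, 1-3, -23, 123.
TABLE_5_TOT = [  # EFR from IBB-weighted hitting, as printed
    [0.644, 1.106, 1.270, 1.750, 1.270, 1.750, 1.914, 2.481],
    [0.353, 0.670, 0.832, 1.164, 0.832, 1.164, 1.326, 1.748],
    [0.135, 0.293, 0.412, 0.578, 0.412, 0.578, 0.698, 0.933],
]
TABLE_4D = [  # ratio of observed EFR to OERA estimated EFR
    [0.943, 0.926, 1.000, 0.954, 1.216, 1.120, 1.142, 1.024],
    [0.923, 0.914, 0.926, 0.877, 1.293, 1.128, 1.150, 0.983],
    [0.926, 0.910, 0.890, 0.855, 1.020, 0.976, 0.936, 0.905],
]
TABLE_6_TOT = [  # number of IBBs given in each configuration
    [1, 1, 220, 3, 63, 78, 406, 0],
    [1, 11, 3306, 11, 911, 176, 4140, 0],
    [13, 30, 6428, 24, 1387, 102, 2483, 0],
]


def estimated_runs(counts, efr, ratio=None):
    """Sum count * EFR over all 24 configurations, optionally ratio corrected."""
    total = 0.0
    for outs in range(3):
        for col in range(8):
            value = efr[outs][col]
            if ratio is not None:
                value *= ratio[outs][col]  # assumed correction step
            total += counts[outs][col] * value
    return total


as_printed = estimated_runs(TABLE_6_TOT, TABLE_5_TOT)
with_ratios = estimated_runs(TABLE_6_TOT, TABLE_5_TOT, TABLE_4D)
print(round(as_printed), round(with_ratios))
```

With the values as printed the sum is about 15540 runs; with the Table 4D ratios applied it is about 16154, within rounding of the 16160 estimate.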
The IBB does add a base runner which can be expected to lead to
additional runs. The effect of the additional base runner can be
estimated by applying a similar EFR analysis. In this case, the
hitting used for the OERA calculation is all observed hitting
following an IBB, and the entry used is the one in Table 5
appropriate for the configuration that includes the additional base
runner resulting from the IBB.
This procedure gives an estimate of 17186 runs that would have been
scored following the IBBs, substantially greater than both the
observed 16121 runs and estimated 16160 runs for no IBBs. It is clear
that the IBB does not contribute a significant number of additional
runs. Unfortunately, this analysis does not show that it
significantly decreases the number of runs scored either.
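The configuration that results from an IBB follows from the force rule: the batter takes first and runners advance only when forced. The sketch below maps each configuration to the one with the additional base runner and then forms a count-weighted run estimate. The function names and the small EFR dictionary are hypothetical placeholders; the post-IBB EFR values behind the 17186 figure are not tabulated in this paper.

```python
def bases_after_walk(bases):
    """Return (new_bases, runs_forced_in) after a walk from a pattern like '-2-'."""
    first, second, third = (mark != "-" for mark in bases)
    runs = 0
    if first:
        if second:
            if third:
                runs = 1  # bases loaded: the runner on third is forced home
            third = True  # runner on second is forced to third
        second = True  # runner on first is forced to second
    first = True  # the batter takes first
    pattern = ("1" if first else "-") + ("2" if second else "-") + ("3" if third else "-")
    return pattern, runs


def runs_with_added_runner(ibb_counts, efr):
    """Count-weighted runs using the EFR of the configuration after the walk."""
    total = 0.0
    for (outs, bases), count in ibb_counts.items():
        new_bases, forced = bases_after_walk(bases)
        total += count * (efr[(outs, new_bases)] + forced)
    return total


# Made-up counts and EFR values, for illustration only.
toy_counts = {(2, "-2-"): 10, (1, "-23"): 5}
toy_efr = {(2, "12-"): 0.5, (1, "123"): 1.6}
print(runs_with_added_runner(toy_counts, toy_efr))
```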
A second line of analysis is the use of linear regression to obtain a
value for the runs cost or savings of an IBB. In this example, the
linear regression procedure adjusts the weights of the 16 team
offensive events to minimize the sum of the squares of the
differences between actual team season runs and the values predicted
from the weighted event counts. The interpretation of the weights of
the events is the average number of runs due to a single instance of
the event (single, double, ... ). Table 7
summarizes the regression results for the 424 team-season records in
my data set. The second section of Table 7 gives
the standard deviations and the average number of team runs per
season for the three regressions. The 16 events used in the
regressions account for approximately 97% of the variation in the
team season runs. The variation in values between the two leagues and
the overall totals gives some feel for the reliability of this
method. For additional information on the regression analysis see
"A Survey of Baseball Player Performance Evaluation Measures",
J. F. Jarvis.
In the total regression the weight for the IBB term is 0.033 runs, a
small contribution consistent with the EFR estimates. The runs value
of unintentional walks and being hit by a pitch is 0.31, which is very
close to the generally accepted value (3). The regression analysis
also shows a runs cost for the IBB significantly less than that of any
other batting event except an out.
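The least squares fit behind linear weights can be sketched without any libraries. This is a toy illustration under several assumptions, not the procedure used to produce Table 7: it fits only three made-up event types, it assumes no intercept term, it solves the normal equations by Gaussian elimination, and the names solve and fit_linear_weights are hypothetical. The runs column is generated from known weights so that the fit can be seen to recover them.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_linear_weights(event_counts, runs):
    """Weights minimizing the sum of squared (runs - weighted event counts)."""
    k = len(event_counts[0])
    # Normal equations: (X^T X) w = X^T y.
    xtx = [[sum(row[i] * row[j] for row in event_counts) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(event_counts, runs)) for i in range(k)]
    return solve(xtx, xty)


# Made-up records of (outs, singles, home runs); not from the data set.
TRUE_WEIGHTS = (-0.10, 0.45, 1.40)
toy_counts = [(27, 8, 1), (27, 5, 2), (24, 10, 0), (30, 6, 3), (27, 9, 2)]
toy_runs = [sum(w * x for w, x in zip(TRUE_WEIGHTS, row)) for row in toy_counts]

weights = fit_linear_weights(toy_counts, toy_runs)
print([round(w, 3) for w in weights])
```

With real team-season records the runs are not an exact linear function of the event counts, so the fitted weights carry the kind of league to league variation visible in Table 7.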
The third line of evidence concerning the effect of IBBs is the
distribution of runs scored following an IBB: how often 0, 1, 2 or
more runs score in the remainder of the inning.
As a comparison, a simulation was used to generate the same kind of
distribution. The simulator does not implement game-level strategy,
so a comparison between it and the observed distribution should
provide insight into the effects of the IBB. The simulation based
runs distribution was obtained by doing a series of 24 simulations
with all innings in each simulation starting from the same
configuration, number of outs and base runners. The resulting
runs/inning distributions were weighted by the count of IBBs for the
configuration, Table 6, and the total was scaled
to the number of runs actually scored, 16121. The simulated
runs/inning distribution is given in the line "SIM" in Table
8. Also for comparison, the distribution of runs per inning for
all the hitting data is given as the line ALL. This distribution has
also been scaled to the number of runs following IBBs.
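The weighting and scaling step can be sketched as follows. This is one reading of the description above, not the simulator's actual code: each per-configuration distribution is normalized, weighted by the IBB count for that configuration, summed, and then multiplied by a single constant so that the total runs match the target. The function name and the toy distributions are hypothetical.

```python
def combine_distributions(distributions, ibb_counts, target_runs):
    """Weight per-configuration runs/inning distributions and scale to target_runs.

    distributions[state][r] is the simulated frequency of r runs in an inning
    started from `state`; ibb_counts[state] is the number of IBBs given there.
    """
    n_bins = max(len(dist) for dist in distributions.values())
    combined = [0.0] * n_bins
    for state, weight in ibb_counts.items():
        dist = distributions[state]
        total = sum(dist)
        for runs, freq in enumerate(dist):
            combined[runs] += weight * freq / total
    # One constant rescales every bin so the implied run total hits the target.
    implied_runs = sum(runs * freq for runs, freq in enumerate(combined))
    scale = target_runs / implied_runs
    return [freq * scale for freq in combined]


# Made-up distributions for two configurations, for illustration only.
toy_distributions = {
    (2, "-2-"): [0.80, 0.15, 0.05],
    (1, "-23"): [0.40, 0.30, 0.20, 0.10],
}
toy_ibb_counts = {(2, "-2-"): 60, (1, "-23"): 40}
scaled = combine_distributions(toy_distributions, toy_ibb_counts, 100)
print([round(freq, 2) for freq in scaled])
```

Scaling to a run total rather than to a count of innings means the number of innings in the scaled distribution need not equal the number of IBBs.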
In the plot of the Table 8 data, Figure 1, the blue line is from the
simulations and the red X marks are the observed totals of runs per
inning following IBBs for both leagues combined. The green diamonds
show the scaled total runs/inning distribution. This plot clearly
shows that the number of runs scored in 1 or 2 run innings following
an IBB is reduced, by 2940 runs, compared to the simulation based
estimate of what would have been scored if there was no IBB. However,
more big outbursts occur than might otherwise be predicted. Since the
reduction in the number of 1 or 2 run innings (0.12/IBB) is much
larger than the increase in the number of innings with 3 or more runs
(0.03/IBB), the IBB tactic provides help in close games.
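The comparison can be checked against the TOT and SIM rows of Table 8. In the sketch below the SIM row is padded with zeros for 11 and 12 runs and its last printed value is taken to be the 10 run count; both are assumptions about the table layout, and the helper names are illustrative.

```python
# Number of innings with 0..12 runs following an IBB (Table 8).
OBSERVED = [11885, 3762, 1853, 1194, 689, 255, 99, 31, 17, 9, 0, 0, 1]  # TOT row
SIMULATED = [11909, 5466, 2471, 968, 377, 146, 55, 20, 7, 3, 1, 0, 0]   # SIM row, zero padded


def runs_in(dist, low, high):
    """Total runs scored in innings of low..high runs."""
    return sum(runs * dist[runs] for runs in range(low, high + 1))


def innings_in(dist, low, high):
    """Number of innings with low..high runs."""
    return sum(dist[low:high + 1])


small_inning_saving = runs_in(SIMULATED, 1, 2) - runs_in(OBSERVED, 1, 2)
big_inning_cost = runs_in(OBSERVED, 3, 12) - runs_in(SIMULATED, 3, 12)
fewer_small_innings = innings_in(SIMULATED, 1, 2) - innings_in(OBSERVED, 1, 2)
more_big_innings = innings_in(OBSERVED, 3, 12) - innings_in(SIMULATED, 3, 12)
print(small_inning_saving, big_inning_cost, fewer_small_innings, more_big_innings)
```

With the rows as printed, 1 and 2 run innings account for 2940 fewer runs than the simulation, matching the figure in the text, while innings of 3 or more runs account for 2948 more. In counts, there are 2322 fewer small innings and 718 more big ones.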
The EFR and regression analyses attempted to show a direct benefit in
terms of fewer runs scored when the IBB was given. The results of the
two methods are consistent and both show a runs cost of the IBB that
is much less than that of an unintentional base on balls. They do not
show a net savings of runs. An examination of the distribution of runs
in an inning following an IBB provides the needed insight. This
distribution clearly shows that the number of 1 or 2 run innings is
less than the number to be expected based on a comparison with overall
hitting or a strategy-free simulation. It also shows that the number
of innings with 3 or more runs scored is higher than in the same
comparison data. Since there are many more occurrences of 1 or 2 run
innings than of innings with 4 or more runs, the IBB tactic buys many
savings of 1 or 2 runs for the price of an occasional big inning.
Perhaps the IBB did contribute to the nine-run inning described in the
opening paragraph.
1) The Hidden Game of Baseball, John Thorn & Pete Palmer,
Doubleday, 1984. See Table VIII, page 153.
2) "Offensive Earned Run Average", Thomas M. Cover, Carroll W.
Keilers, Operations Research, Vol. 25, No 5, Sep-Oct 1977, pp
729-740. Implementation details are given in "Implementing
The Cover-Keilers Offensive Earned Run Average", J. F. Jarvis,
which is also available from the SABR Archives.
3) See the discussion in the chapter "Sabermetrics" in Total
Baseball, Fifth Ed., John Thorn, Pete Palmer, Michael Gershman, David
Pietrusza, Viking, 1997.
4) "Chance and Intent: A Baseball Paradox", J. F. Jarvis, CHANCE Vol
11 No 3 (1998) pp 12-19. For additional information see: simulator.html
Table 1: IBBs by batting order position
        batting order position ->
        1     2     3     4     5     6     7     8     9
NL    746   378  1382  2077  1493  1283  1289  2702   366
AL    693   452  1582  1648  1458  1167  1039   684   197
---------------------------------------------------------
TOT  1439   830  2964  3725  2951  2450  2328  3386   563
Table 2: Hitting following an IBB
        BA    SLG     AB     S     D    T   HR   TBB  IBB  HBYP   RFIB
NL   0.241  0.356  18177  3155   732  149  351  1809  519   122   8356
AL   0.261  0.394  15012  2776   665   98  378  1677  324   130   7765
----------------------------------------------------------------------
TOT  0.250  0.373  33189  5931  1397  247  729  3486  843   252  16121
Table 3: Total hitting
        BA    SLG       AB       S       D      T     HR     TBB    IBB   HBYP
NL   0.257  0.383  1048080  191245   47247   6948  23760   98832  11717   5987
AL   0.264  0.405  1184055  218297   55716   7224  31916  116440   8921   7808
------------------------------------------------------------------------------
TOT  0.261  0.395  2232135  409542  102963  14172  55676  215272  20638  13795
In these tables S, D, T and HR are singles, doubles, triples and home runs, TBB is total bases on balls, HBYP is hit by pitch and RFIB is runs following an IBB.
Table 4: Expected Future Runs, NL and AL combined
4A) Observed count of inning states
      Base runners on:
outs     ---     1--     -2-     12-     --3     1-3     -23     123
0     607425  157950   47277   35542    8760   15654    8782    8676
1     434048  180209   85922   64819   29401   31766   22605   22424
2     344683  180693  102870   81988   42339   41441   24573   27006
4B) Observed Expected Future Runs
      Base runners on:
outs     ---     1--     -2-     12-     --3     1-3     -23     123
0      0.492   0.872   1.121   1.487   1.363   1.744   1.982   2.310
1      0.261   0.519   0.684   0.907   0.955   1.167   1.386   1.558
2      0.098   0.224   0.327   0.439   0.375   0.501   0.595   0.763
4C) OERA Predicted Expected Future Runs from all PA
      Base runners on:
outs     ---     1--     -2-     12-     --3     1-3     -23     123
0      0.522   0.943   1.121   1.557   1.121   1.557   1.736   2.256
1      0.283   0.568   0.738   1.035   0.738   1.035   1.205   1.585
2      0.106   0.246   0.368   0.513   0.368   0.513   0.635   0.843
4D) Ratio: Observed EFR/OERA estimated EFR
      Base runners on:
outs     ---     1--     -2-     12-     --3     1-3     -23     123
0      0.943   0.926   1.000   0.954   1.216   1.120   1.142   1.024
1      0.923   0.914   0.926   0.877   1.293   1.128   1.150   0.983
2      0.926   0.910   0.890   0.855   1.020   0.976   0.936   0.905
Table 5: Expected Future Runs (runs) from hitting weighted by IBBs received
NL    Base runners on:
outs    ---    1--    -2-    12-    --3    1-3    -23    123
0     0.617  1.070  1.238  1.701  1.238  1.710  1.877  2.436
1     0.336  0.647  0.812  1.137  0.812  1.137  1.301  1.715
2     0.127  0.282  0.402  0.564  0.402  0.564  0.685  0.914
AL    Base runners on:
outs    ---    1--    -2-    12-    --3    1-3    -23    123
0     0.677  1.149  1.308  1.799  1.308  1.799  1.959  2.536
1     0.373  0.699  0.858  1.198  0.858  1.198  1.357  1.788
2     0.143  0.307  0.425  0.596  0.425  0.596  0.714  0.954
TOT   Base runners on:
outs    ---    1--    -2-    12-    --3    1-3    -23    123
0     0.644  1.106  1.270  1.750  1.270  1.750  1.914  2.481
1     0.353  0.670  0.832  1.164  0.832  1.164  1.326  1.748
2     0.135  0.293  0.412  0.578  0.412  0.578  0.698  0.933
Table 6: Number of IBBs by base runner and outs configuration
NL    Base runners on:
outs   ---   1--   -2-   12-   --3   1-3   -23   123
0        1     0   103     1    33    48   217     0
1        0     7  1746     7   445    82  2051     0
2        6    20  3992    19   874    60  1486     0
AL    Base runners on:
outs   ---   1--   -2-   12-   --3   1-3   -23   123
0        0     1   117     2    30    30   189     0
1        1     4  1560     4   466    94  2089     0
2        7    10  2436     5   513    42   997     0
TOT   Base runners on:
outs   ---   1--   -2-   12-   --3   1-3   -23   123
0        1     1   220     3    63    78   406     0
1        1    11  3306    11   911   176  4140     0
2       13    30  6428    24  1387   102  2483     0
Table 7: Linear regression results
Parameter Weights (runs/event)
Event                NL      AL     TOT
OUTS             -0.097  -0.114  -0.101
K                -0.092  -0.087  -0.099
SING              0.408   0.473   0.439
DOUB              0.723   0.631   0.679
TRIP              0.877   0.859   0.815
HRUN              1.417   1.502   1.484
BB+HP             0.330   0.280   0.308
IBB               0.200   0.101   0.033
SB                0.087   0.095   0.087
CS               -0.361  -0.207  -0.238
ROH               0.121  -0.266  -0.114
GDP              -0.447  -0.440  -0.429
ER_BF             0.671   0.643   0.620
ER_RA             0.452   0.310   0.355
RAO               0.006   0.171   0.066
RSO               0.668   0.843   0.823
Team-Seasons        200     224     424
Std Dev            17.3    17.3    17.8
Avr Season Runs     650     710     682
Other quantities used in the regression are ROH, runners out advancing after a hit; ER_BF, errors allowing batters to get on base; and ER_RA, errors allowing runner advances, including wild pitches and balks. Not all runner advances or scoring after an out are given by the official sacrifice categories. RAO, runner advance on an out, counts all runner advances after an out, except scoring, generalizing the sacrifice hit. Similarly, RSO, runner scoring on an out, generalizes the sacrifice fly by tabulating all scoring on an out. The line OUTS does not include outs counted in other categories such as caught stealing or picked off (CS), second out in a double play (GDP) or strike out (K).
Table 8: Distribution of runs per inning following an IBB
      Runs/inning ->
          0     1     2     3    4    5    6   7   8   9  10  11  12  Tot Runs
NL     7026  1996   997   660  325  122   46  13   8   5   0   0   0      8356
AL     4859  1766   856   534  364  133   53  18   9   4   0   0   1      7765
------------------------------------------------------------------------------
TOT   11885  3762  1853  1194  689  255   99  31  17   9   0   0   1     16121
SIM   11909  5466  2471   968  377  146   55  20   7   3   1             16109
ALL   23691  4942  2245  1015  455  188   81  31  12   4   3
