An Analysis of the Intentional Base on Balls

John F. Jarvis

(Presented at SABR-29, Phoenix, AZ June 25, 1999)

 

In the top of the fourth inning of the Pittsburgh at Cincinnati game of April 30, 1996, an intentional base on balls (IBB) figured in a 9 run inning. With two runs in, Jacob Brumfield on second and only one out, Charlie Hayes was given the intentional pass. Brumfield stole third and Jason Kendall was hit by a John Smiley pitch. Denny Neagle's sacrifice fly scored the third Pirate run of the inning. A run scoring single by Carlos Garcia and a base loading walk to Al Martin resulted in an early exit for Smiley. Jeff King greeted reliever Tim Pugh by hitting his second home run of the inning. Orlando Merced followed with another home run. Jay Bell was hit by a pitch and Brumfield, the thirteenth Pirate to come to the plate in the inning, flied out to first to finally end it. A total of nine runs were scored in the inning, with seven of them coming after the intentional pass.

Disasters of this magnitude, while unusual, do occur a couple of times each season (Table 8). Did the IBB cause the seven run outburst? Probably not. Still, Hayes had an on base percentage of 0.303 at Pittsburgh, so there was a better than 2 to 1 chance of his making an out. Outbursts such as this one are very dramatic but tend to obscure the real question, which can only be answered in a statistical sense: Does the IBB, as it is used in the Major Leagues, save the defense sufficient runs to justify its use? Since it is not uncommon, ML managers clearly feel it is a useful defensive strategy.

The raw data for this study is provided by the full season play-by-play descriptions obtained from Retrosheet. For the data in my library, both leagues for 1980-1990 and 1992-1996, all 20638 IBBs were examined. Hitting statistics and total runs scored during the remainder of the inning following each IBB were also recorded. Table 1, Table 2, Table 6 and Table 8 summarize the IBB data. The use of the Designated Hitter in the American League accounts for the most obvious differences between the leagues in when the IBB is given (Table 1). Table 3 summarizes total hitting and puts the post IBB hitting in Table 2 into perspective.

Carefully examining the IBB data leads to three lines of analysis that provide insight into the value or cost of an IBB. An Expected Future Runs (EFR) analysis provides an estimate of the runs that would have been scored if the IBB was not given. Linear regression, more commonly known as linear weights in sabermetric baseball jargon, provides an estimate of the average value in runs of an IBB. Comparison of the observed distribution of runs following each IBB with a simulation and the observed distribution for all runs scored is the final piece of evidence to be considered.

Since there are 3 bases and runners can be on any or all of them, there are 8 possible combinations of base runners. A team can be batting in an inning with 0, 1 or 2 outs, thus there are 24 possible configurations of base runners and outs. As the inning proceeds, various base runner and out configurations will occur. A common progression will be from no one on and 0 outs to no one on and 1 out and finally no one on and 2 outs. The third out ends the inning and does not count as a separate configuration. If runners get on base the probability of scoring increases. Configurations can repeat. A home run with no outs and no one on base causes the inning starting configuration to repeat. Similarly, a bases loaded walk will not change the base runner and outs configuration.
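
As an illustration only (this is not code from the study), the 24 configurations can be enumerated with a short Python sketch. The function name is hypothetical, and the string labels simply follow the column headings used in Table 4.

```python
from itertools import product

BASES = ("1", "2", "3")

def base_out_states():
    """Return every (outs, runners) pair, with runners written like '1-3' or '---'."""
    states = []
    for outs in range(3):  # 0, 1 or 2 outs; the third out ends the inning
        for occupied in product((False, True), repeat=3):  # 8 runner combinations
            label = "".join(b if on else "-" for b, on in zip(BASES, occupied))
            states.append((outs, label))
    return states

states = base_out_states()
print(len(states))  # 24
```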

Using these ideas a particularly insightful and useful summary of scoring can be developed (1): For every inning played, record the number of runs already scored in the inning each time an out and base runner configuration occurs. Subtract this number from the total number of runs scored in the inning to obtain the runs scored after the configuration occurred. Repeated configurations are recorded each time they occur. Finally, for each configuration divide the total of these runs by the number of times the configuration occurred. The result is the average number of runs scored for each possible base runner and outs configuration (Table 4). This table represents the Expected Future Runs (EFR) that will be scored, on average, from each base runner and outs configuration. The number of times each base runner and outs configuration occurred and the average runs scored for each of them, computed over all seasons and both leagues, are given in Tables 4A and 4B.
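
The tabulation just described can be sketched in Python. This is a minimal illustration under assumptions, not the program used for the study: the input layout (one tuple per inning holding the inning's total runs and a list of configurations with the runs already scored when each occurred) and the function name are hypothetical, and the two sample innings are made up.

```python
from collections import defaultdict

def expected_future_runs(innings):
    """Average runs scored after each (outs, runners) configuration.

    innings: iterable of (inning_runs, events), where events is a list of
    (state, runs_so_far) pairs recorded each time a configuration occurs.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for inning_runs, events in innings:
        for state, runs_so_far in events:
            # Runs still to come in the inning from this configuration.
            totals[state] += inning_runs - runs_so_far
            counts[state] += 1  # repeated configurations are counted each time
    efr = {state: totals[state] / counts[state] for state in counts}
    return efr, dict(counts)

# Made-up data: a leadoff home run followed by three outs, then a 1-2-3 inning.
innings = [
    (1, [((0, "---"), 0), ((0, "---"), 1), ((1, "---"), 1), ((2, "---"), 1)]),
    (0, [((0, "---"), 0), ((1, "---"), 0), ((2, "---"), 0)]),
]
efr, counts = expected_future_runs(innings)
print(counts[(0, "---")], round(efr[(0, "---")], 3))  # 3 0.333
```

Note that the leadoff home run makes the starting configuration occur twice in the first inning, which is why it is counted three times over the two innings.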

There is an elegant technique (2) due to Thomas Cover and Carroll Keilers called Offensive Earned Run Average (OERA) for estimating the 24 EFR values from basic hitting data: numbers of at bats, singles, doubles, triples, home runs and walks. Table 4C shows the results of applying this calculation to the combined NL & AL hitting for the same seasons used to generate the first two entries in Table 4. This hitting data is given in the TOT line in Table 3. Finally, the observed EFR values (Table 4B) are divided by the EFR values estimated from hitting data, Table 4C, to obtain Table 4D, which shows the systematic variations between the estimated and observed values. Besides illustrating how the actual data differs from the OERA calculated values, this table can also be used as a correction on estimates generated from other hitting data.
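
The ratio step can be checked from the 0 outs rows of Tables 4B and 4C. The Python sketch below is illustrative only; the function and variable names are hypothetical, and because the inputs are the rounded table values the computed ratios agree with Table 4D only to within rounding.

```python
# 0 outs rows, columns ordered as in Table 4: --- 1-- -2- 12- --3 1-3 -23 123
EFR_OBSERVED_0_OUTS = [0.492, 0.872, 1.121, 1.487, 1.363, 1.744, 1.982, 2.310]  # Table 4B
EFR_OERA_0_OUTS = [0.522, 0.943, 1.121, 1.557, 1.121, 1.557, 1.736, 2.256]      # Table 4C
TABLE_4D_0_OUTS = [0.943, 0.926, 1.000, 0.954, 1.216, 1.120, 1.142, 1.024]      # Table 4D

def correction_ratios(observed, estimated):
    """Observed EFR divided by OERA-estimated EFR, configuration by configuration."""
    return [o / e for o, e in zip(observed, estimated)]

def apply_correction(estimated, ratios):
    """Scale OERA estimates made from other hitting data by the same ratios."""
    return [e * r for e, r in zip(estimated, ratios)]

ratios = correction_ratios(EFR_OBSERVED_0_OUTS, EFR_OERA_0_OUTS)
for computed, published in zip(ratios, TABLE_4D_0_OUTS):
    print(f"{computed:.3f}  {published:.3f}")
```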

Each entry in Tables 4B and 4C is the average number of runs that can be expected to score in the remainder of an inning from the given number of outs and combination of base runners, for the hitting specified.

Total runs scored following an IBB are known: 16121 (Table 8). To put this number into perspective, an estimate is needed of the runs that would have been scored if IBBs were not given. Hitting stats for the OERA calculation are obtained by weighting each batter's season hitting by the number of IBBs he received. The more often a particular player was given the intentional pass the more his hitting should influence the hitting statistics used in this calculation. The EFR calculated from this composite hitting is adjusted by the correction factor determined from the overall hitting (Table 4D) and is given in Table 5.

The EFR values in Table 5 and the counts of when IBBs were given, tabulated in Table 6, provide the data needed to estimate the runs that would have scored if the IBBs had not been given. Consider the case of 1 out and runners on second and third for the combined NL & AL (TOT) data. This configuration was observed 4140 times. The EFR data, Table 5, indicate that an average of 1.33 runs were scored each time this base runner and out configuration occurred. Multiplying these two values gives 5506 runs for this situation. This process is repeated for each base runner and outs configuration, and the sum of the contributions from all configurations yields an estimate of 16160 runs that would have been scored if the IBBs had not been given. Since 16121 runs were actually scored, slightly less than the no IBB estimate, IBBs on average do not contribute additional runs for the offense. Expressed in runs per event (IBBs), the value of an IBB is a minuscule -0.003 runs/IBB. The negative sign indicates that the IBB saved (a few) runs.
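
The count times EFR sum can be sketched as follows. This is a hedged illustration rather than the study's code: the function name and dictionary layout are hypothetical, and only the single configuration worked in the text is filled in, so the sketch does not attempt to reproduce the 16160 total.

```python
def estimated_runs(ibb_counts, efr):
    """Sum of (number of IBBs) * (expected future runs) over configurations."""
    return sum(n * efr[state] for state, n in ibb_counts.items())

# The worked case from the text: 1 out, runners on second and third (TOT data).
ibb_counts = {(1, "-23"): 4140}
efr = {(1, "-23"): 1.33}
print(round(estimated_runs(ibb_counts, efr)))  # 5506

# Sign convention for the comparison: observed runs minus the no IBB estimate.
observed_runs = 16121
no_ibb_estimate = 16160
print(observed_runs - no_ibb_estimate)  # -39, negative means runs were saved
```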

The IBB does add a base runner which can be expected to lead to additional runs. The effect of the additional base runner can be estimated by applying a similar EFR analysis. In this case, the hitting used for the OERA calculation is all observed hitting following an IBB but using the entry in Table 5 appropriate for the additional base runner resulting from the IBB. This procedure gives an estimate of 17186 runs that would have been scored following the IBBs, substantially greater than both the observed 16121 runs and estimated 16160 runs for no IBBs. It is clear that the IBB does not contribute a significant number of additional runs. Unfortunately, this analysis does not show that it significantly decreases the number of runs scored either.

A second line of analysis is the use of linear regression to obtain a value for the runs cost or savings of an IBB. In this example, the linear regression procedure adjusts the weights of the 16 team offensive events to minimize the sum of the squares of the differences between actual team season runs and the values predicted from the weighted event counts. The interpretation of the weights of the events is the average number of runs due to a single instance of the event (single, double, ... ). Table 7 summarizes the regression results for the 424 team-season records in my data set. The second section of Table 7 gives the standard deviations and the average number of team runs per season for the three regressions. The 16 events used in the regressions account for approximately 97% of the variation in the team season runs. The variation in values between the two leagues and the overall totals gives some feel for the reliability of this method. For additional information on the regression analysis see "A Survey of Baseball Player Performance Evaluation Measures", J. F. Jarvis.
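
A minimal sketch of such a least-squares fit follows. It assumes a model with no intercept term (the text does not say whether one was used), solves the normal equations directly, and uses three made-up event columns with made-up weights and team-seasons rather than the 16 events and 424 team-seasons of the study. All names are hypothetical.

```python
def fit_linear_weights(event_counts, runs):
    """Weights w minimizing sum((runs - predicted)^2), predicted = sum(w_i * count_i).

    event_counts: one row of event counts per team-season.
    runs: season run totals, in the same order.
    """
    k = len(event_counts[0])
    # Normal equations (X^T X) w = X^T y, held as an augmented matrix.
    a = [[sum(row[i] * row[j] for row in event_counts) for j in range(k)]
         + [sum(row[i] * y for row, y in zip(event_counts, runs))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

# Synthetic team-seasons: columns are singles, home runs, outs (made-up values).
true_weights = [0.44, 1.48, -0.10]
seasons = [[1000, 150, 4000], [950, 120, 4100], [1100, 180, 3950], [1020, 100, 4050]]
season_runs = [sum(w * c for w, c in zip(true_weights, row)) for row in seasons]

weights = fit_linear_weights(seasons, season_runs)
print([round(w, 3) for w in weights])
```

Because the synthetic run totals are built exactly from the chosen weights, the fit should recover them; with real team-season data the residuals are not zero and their spread corresponds to the standard deviation reported in Table 7.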

In the total regression the weight for the IBB term is 0.033 runs, a small contribution consistent with the EFR estimates. The runs value of unintentional walks and being hit by a pitch is 0.31 which is very close to the generally accepted value (3). The regression analysis also shows a runs cost for the IBB significantly less than any other batting event except an out.

The third line of evidence concerning the effect of IBBs is the distribution of runs scored in the inning following an IBB, that is, the number of times each number of runs occurs.

As a comparison, a simulation (4) was used to generate the same kind of distribution. The simulator does not implement game level strategy, so a comparison between it and the observed distribution should provide insight into the effects of the IBB. The simulation based runs distribution was obtained by doing a series of 24 simulations, with all innings in each simulation starting from the same configuration of outs and base runners. The resulting runs/inning distributions were weighted by the count of IBBs for the configuration, Table 6, and the total was scaled to the number of runs actually scored, 16121. The simulated runs/inning distribution is given in the line SIM in Table 8. Also for comparison, the distribution of runs per inning for all the hitting data is given in the line ALL. This distribution has also been scaled to the number of runs following IBBs.
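
The weighting and scaling step might look like the following Python sketch. It is an assumption-laden illustration, not the simulator itself: the function name, the dictionary layout and the two toy per-configuration distributions are all hypothetical.

```python
def weighted_scaled_distribution(sim_dists, ibb_counts, target_runs):
    """Combine per-configuration runs/inning distributions into one.

    sim_dists: {state: [f0, f1, f2, ...]}, the fraction of simulated innings
               scoring 0, 1, 2, ... runs when started from that configuration.
    ibb_counts: {state: number of IBBs counted for that configuration}.
    target_runs: total runs the combined distribution is scaled to.
    """
    width = max(len(d) for d in sim_dists.values())
    combined = [0.0] * width
    for state, n in ibb_counts.items():
        for runs, fraction in enumerate(sim_dists[state]):
            combined[runs] += n * fraction  # weight by the IBB count
    total_runs = sum(runs * innings for runs, innings in enumerate(combined))
    scale = target_runs / total_runs
    return [innings * scale for innings in combined]

# Toy example with two configurations and made-up distributions.
sim_dists = {"A": [0.5, 0.5], "B": [0.5, 0.0, 0.5]}
ibb_counts = {"A": 10, "B": 10}
print(weighted_scaled_distribution(sim_dists, ibb_counts, target_runs=30))
# [20.0, 10.0, 10.0]
```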

In the plot of the Table 8 data, Figure 1, the blue line is from the simulations and the red X marks are the observed totals of runs per inning following IBBs for both leagues combined. The green diamonds show the scaled total runs/inning distribution. This plot clearly shows that the number of runs from 1 or 2 run innings following an IBB is reduced, by 2940 runs, compared to the simulation based estimate of what would have been scored had there been no IBB. However, more big outbursts occur than might otherwise be predicted. Since the number of 1 or 2 run innings (0.12/IBB) is much larger than the number of 3 and greater run innings (0.03/IBB), the IBB tactic provides help in close games.
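
The 2940 run figure can be recomputed from the TOT and SIM lines of Table 8. The short Python check below is illustrative; the helper name is hypothetical, and the SIM values are taken to occupy the 0 through 10 run columns.

```python
# Innings with 0, 1, 2, ... runs following an IBB (Table 8).
TOT = [11885, 3762, 1853, 1194, 689, 255, 99, 31, 17, 9, 0, 0, 1]
SIM = [11909, 5466, 2471, 968, 377, 146, 55, 20, 7, 3, 1]

def runs_in(dist, lo, hi):
    """Total runs scored in innings of lo through hi runs."""
    return sum(runs * innings for runs, innings in enumerate(dist) if lo <= runs <= hi)

print(runs_in(SIM, 1, 2) - runs_in(TOT, 1, 2))  # 2940
print(runs_in(TOT, 0, 12))                      # 16121
print(sum(TOT[3:]), sum(SIM[3:]))               # 2295 1577
```

The last line counts innings of 3 or more runs, showing the excess of big innings in the observed data over the simulation.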

The EFR and regression analyses attempted to show a direct benefit in terms of fewer runs scored when the IBB was given. The results of the two methods are consistent and both show a runs cost of the IBB that is much less than that of an unintentional base on balls. They do not show a net savings of runs. An examination of the distribution of runs in an inning following an IBB provides the needed insight. This distribution clearly shows that the number of 1 or 2 run innings is less than the number to be expected based on a comparison with overall hitting or a strategy free simulation. It also shows that the number of innings with 3 or more runs scored is higher than in the same comparison data. Since there are many more occurrences of 1 or 2 run innings than of innings with 4 or more runs, the IBB tactic buys many savings of 1 or 2 runs for the price of an occasional big inning. Perhaps the IBB did contribute to the 9 run inning described in the opening paragraph.


Notes and Additional Information

1) The Hidden Game of Baseball, John Thorn & Pete Palmer, Doubleday, 1984. See Table VIII, page 153.

2) "Offensive Earned Run Average", Thomas M. Cover, Carroll W. Keilers, Operations Research, Vol. 25, No. 5, Sep-Oct 1977, pp 729-740. Implementation details are given in "Implementing The Cover-Keilers Offensive Earned Run Average", J. F. Jarvis, which is also available from the SABR Archives.

3) See the discussion in the chapter "Sabermetrics" in Total Baseball, Fifth Ed., John Thorn, Pete Palmer, Michael Gershman, David Pietrusza, Viking, 1997.

4) "Chance and Intent: A Baseball Paradox", J. F. Jarvis, CHANCE Vol 11 No 3 (1998) pp 12-19. For additional information see: simulator.html



Table 1: Counts of batting order position receiving IBB

    batting order position ->
         1     2     3     4     5     6     7     8     9
NL     746   378  1382  2077  1493  1283  1289  2702   366
AL     693   452  1582  1648  1458  1167  1039   684   197
----------------------------------------------------------
TOT   1439   830  2964  3725  2951  2450  2328  3386   563


Table 2: Hitting Summary for all ABs Following an IBB

        BA   SLG    AB     S     D     T    HR   TBB   IBB  HBYP  RFIB
NL   0.241 0.356 18177  3155   732   149   351  1809   519   122  8356
AL   0.261 0.394 15012  2776   665    98   378  1677   324   130  7765
----------------------------------------------------------------------
TOT  0.250 0.373 33189  5931  1397   247   729  3486   843   252 16121


Table 3: Hitting Summary for all ABs

          BA     SLG      AB       S       D       T      HR     TBB     IBB    HBYP
NL     0.257   0.383 1048080  191245   47247    6948   23760   98832   11717    5987
AL     0.264   0.405 1184055  218297   55716    7224   31916  116440    8921    7808
------------------------------------------------------------------------------------
TOT    0.261   0.395 2232135  409542  102963   14172   55676  215272   20638   13795

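In these tables BA is batting average, SLG is slugging average, AB is at bats, S, D, T and HR are singles, doubles, triples and home runs, TBB is total bases on balls, HBYP is hit by pitch and RFIB is runs following an IBB.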


Table 4: Combined NL & AL Hitting EFR data

4A) Observed count of inning states
Base runners on:
outs     ---     1--     -2-     12-     --3     1-3     -23     123
0     607425  157950   47277   35542    8760   15654    8782    8676
1     434048  180209   85922   64819   29401   31766   22605   22424
2     344683  180693  102870   81988   42339   41441   24573   27006

4B) Observed Expected Future Runs
Base runners on:
outs     ---     1--     -2-     12-     --3     1-3     -23     123
0      0.492   0.872   1.121   1.487   1.363   1.744   1.982   2.310
1      0.261   0.519   0.684   0.907   0.955   1.167   1.386   1.558
2      0.098   0.224   0.327   0.439   0.375   0.501   0.595   0.763

4C) OERA Predicted Expected Future Runs from all PA
Base runners on:
outs     ---     1--     -2-     12-     --3     1-3     -23     123
0      0.522   0.943   1.121   1.557   1.121   1.557   1.736   2.256
1      0.283   0.568   0.738   1.035   0.738   1.035   1.205   1.585
2      0.106   0.246   0.368   0.513   0.368   0.513   0.635   0.843

4D) Ratio: Observed EFR/OERA estimated EFR
Base runners on:
outs     ---     1--     -2-     12-     --3     1-3     -23     123
0      0.943   0.926   1.000   0.954   1.216   1.120   1.142   1.024
1      0.923   0.914   0.926   0.877   1.293   1.128   1.150   0.983
2      0.926   0.910   0.890   0.855   1.020   0.976   0.936   0.905


Table 5: EFR for player hitting weighted by Number of IBBs Received

NL Base runners on:
outs     ---     1--     -2-     12-     --3     1-3     -23     123
0      0.617   1.070   1.238   1.701   1.238   1.710   1.877   2.436 (runs)
1      0.336   0.647   0.812   1.137   0.812   1.137   1.301   1.715
2      0.127   0.282   0.402   0.564   0.402   0.564   0.685   0.914

AL Base runners on:
outs     ---     1--     -2-     12-     --3     1-3     -23     123
0      0.677   1.149   1.308   1.799   1.308   1.799   1.959   2.536
1      0.373   0.699   0.858   1.198   0.858   1.198   1.357   1.788
2      0.143   0.307   0.425   0.596   0.425   0.596   0.714   0.954

TOT Base runners on:
outs     ---     1--     -2-     12-     --3     1-3     -23     123
0      0.644   1.106   1.270   1.750   1.270   1.750   1.914   2.481
1      0.353   0.670   0.832   1.164   0.832   1.164   1.326   1.748
2      0.135   0.293   0.412   0.578   0.412   0.578   0.698   0.933


Table 6: Observed count of IBB inning states

NL Base runners on:
outs     ---     1--     -2-     12-     --3     1-3     -23     123
0          1       0     103       1      33      48     217       0
1          0       7    1746       7     445      82    2051       0
2          6      20    3992      19     874      60    1486       0

AL Base runners on:
outs     ---     1--     -2-     12-     --3     1-3     -23     123
0          0       1     117       2      30      30     189       0
1          1       4    1560       4     466      94    2089       0
2          7      10    2436       5     513      42     997       0

TOT Base runners on:
outs     ---     1--     -2-     12-     --3     1-3     -23     123
0          1       1     220       3      63      78     406       0
1          1      11    3306      11     911     176    4140       0
2         13      30    6428      24    1387     102    2483       0


Table 7: Linear Regression applied to the offense

                Parameter Weights (runs/event)
Event                 NL      AL     TOT
OUTS              -0.097  -0.114  -0.101
K                 -0.092  -0.087  -0.099
SING               0.408   0.473   0.439
DOUB               0.723   0.631   0.679
TRIP               0.877   0.859   0.815
HRUN               1.417   1.502   1.484

BB+HP              0.330   0.280   0.308
IBB                0.200   0.101   0.033

SB                 0.087   0.095   0.087
CS                -0.361  -0.207  -0.238
ROH                0.121  -0.266  -0.114
GDP               -0.447  -0.440  -0.429

ER_BF              0.671   0.643   0.620
ER_RA              0.452   0.310   0.355
RAO                0.006   0.171   0.066
RSO                0.668   0.843   0.823


Team-Seasons         200     224     424
Std Dev             17.3    17.3    17.8
Avr Season Runs      650     710     682

Other quantities used in the regression are ROH - runners out advancing after a hit, ER_BF - errors allowing batters to get on base and ER_RA - errors allowing runner advances, including wild pitches and balks. Not all runner advances or scoring after an out are given by the official sacrifice categories. RAO, runner advance on an out, counts all runner advances after an out, except scoring, generalizing the sacrifice hit. Similarly, RSO, runner scoring on an out, generalizes the sacrifice fly by tabulating all scoring on an out. The line OUTS does not include outs counted in other categories such as caught stealing or picked off (CS), second out in a double play (GDP) or strike out (K).


Table 8: Number of Occurrences of Runs in an Inning Following First IBB

    Runs/inning ->
         0     1     2     3     4     5     6     7     8     9    10    11    12  Tot Runs
NL    7026  1996   997   660   325   122    46    13     8     5     0     0     0      8356
AL    4859  1766   856   534   364   133    53    18     9     4     0     0     1      7765
--------------------------------------------------------------------------------------------
TOT  11885  3762  1853  1194   689   255    99    31    17     9     0     0     1     16121
SIM  11909  5466  2471   968   377   146    55    20     7     3     1                 16109
ALL  23691  4942  2245  1015   455   188    81    31    12     4     3





Copyright © 1999, John F. Jarvis