**Augusta, GA**

(September 29, 2003: updated to include 2002 season data.)

Since Interleague (IL) play began in 1997, National League (NL) teams have won 54.7% (395 - 327) of their home IL games compared to 53.9% (3712 - 3177) of their non-IL home games. The corresponding values for the American League (AL) are 54.2% (391 - 331) and 52.5% (3185 - 2886). I will present an argument that this additional disparity in home winning percentage in both leagues is due to the Designated Hitter (DH) rule.
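The percentages above, and the number of extra home wins they imply, can be recomputed from the quoted records. The following is a minimal sketch; the names `RECORDS`, `win_pct` and `excess_home_wins` are illustrative and not part of the original study. The excess is taken to be the actual IL home wins minus the wins expected at the non-IL home rate, which appears to correspond to the totals in the W/L DW columns of Table 6.

```python
# Home records quoted in the text as (wins, losses), 1997-2002.
RECORDS = {
    "NL": {"interleague": (395, 327), "intraleague": (3712, 3177)},
    "AL": {"interleague": (391, 331), "intraleague": (3185, 2886)},
}


def win_pct(wins, losses):
    """Winning percentage as a fraction of decided games."""
    return wins / (wins + losses)


def excess_home_wins(league):
    """IL home wins beyond what the non-IL home rate would predict."""
    il_wins, il_losses = RECORDS[league]["interleague"]
    il_games = il_wins + il_losses
    baseline = win_pct(*RECORDS[league]["intraleague"])
    return il_wins - baseline * il_games


for league in ("NL", "AL"):
    il = win_pct(*RECORDS[league]["interleague"])
    base = win_pct(*RECORDS[league]["intraleague"])
    print(f"{league}: IL home {il:.3f}, other home {base:.3f}, "
          f"excess wins {excess_home_wins(league):.1f}")
```

Under these definitions the excess comes to roughly 6 wins for the NL and 12 for the AL over the 722 home IL games each league has played.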

In outline form, my argument starts with the observation that hitting in both leagues is the same once DH and pitcher hitting are removed. Furthermore, this equivalence includes home team hitting in IL games in both leagues. Only when away hitting in IL games is examined does a statistically significant difference emerge. These differences appear to be entirely due to AL pitcher hitting in NL parks and NL DH hitting in AL parks. In both cases the production by those unaccustomed to such hitting is significantly less than that of the regulars in these positions in the other (home team) league. I will use a linear weights formula to estimate the excess runs for the home teams in these matchups and then estimate the additional wins due to these runs using the James Pythagorean formula. The excess wins based on differences in DH and pitcher hitting are satisfyingly close to the excess wins estimated from the observed winning percentages.

Table 1 summarizes hitting in both leagues after removing the contributions of DHs and pitchers, and it demonstrates the hitting equivalence between the leagues. Standard abbreviations are used: AB for at bats, BA for batting average, SLG for slugging average (SA in the tables) and OBP for on-base percentage (OBA in the tables). A conventional method for evaluating the significance of differences in averages of this type is Student's t-test. The t-test columns give the probability that a difference between the AL and NL as large as the one observed could arise by chance if both samples were drawn from the same distribution. Probabilities smaller than 0.01 are typically taken as significant. Given the large number of plate appearances (PA), the 0.005 difference in OBP between the two leagues (0.342 versus 0.347) appears significant, while the slightly smaller differences in BA (0.001) and SLG (0.004) are not.

Table 1. League Totals Less Pitcher and DH Hitting

| Year | AL AB | AL BA | AL SA | AL OBA | NL AB | NL BA | NL SA | NL OBA | t-test BA | t-test SA | t-test OBA |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1997 | 70086 | 0.271 | 0.427 | 0.342 | 72323 | 0.271 | 0.424 | 0.345 | 0.9867 | 0.5472 | 0.1781 |
| 1998 | 70240 | 0.271 | 0.428 | 0.341 | 83058 | 0.269 | 0.425 | 0.342 | 0.3805 | 0.4193 | 0.5182 |
| 1999 | 69984 | 0.275 | 0.436 | 0.349 | 83420 | 0.276 | 0.444 | 0.354 | 0.6819 | 0.0777 | 0.0251 |
| 2000 | 70318 | 0.276 | 0.442 | 0.351 | 83095 | 0.273 | 0.446 | 0.353 | 0.1809 | 0.3138 | 0.2080 |
| 2001 | 69967 | 0.268 | 0.427 | 0.336 | 82637 | 0.268 | 0.440 | 0.342 | 0.8759 | 0.0046 | 0.0065 |
| 2002 | 69541 | 0.264 | 0.422 | 0.333 | 82400 | 0.266 | 0.423 | 0.342 | 0.4191 | 0.8855 | 0.0001 |
| Total | 420136 | 0.271 | 0.430 | 0.342 | 486933 | 0.270 | 0.434 | 0.347 | 0.7355 | 0.0574 | 0.0000 |

Table 2 and Figure 1 summarize hitting by league and fielding position, whether the player started or entered the game as a substitute. The comparatively small number of interleague games is reflected in the small AB totals for the two roles that occur only in away IL games: AL pitchers hitting in NL parks and NL designated hitters in AL parks. Because these samples are small, the statistical strength of the pitcher and DH differences is not great, but the differences are still significant. In both cases, pitchers in NL parks and DHs in AL parks, the hitting advantage goes to the home team.

Table 2. Hitting by Fielding Position

| POS | AL AB | AL HITS | AL HR | AL BA | AL SA | NL AB | NL HITS | NL HR | NL BA | NL SA | t-test BA | t-test SA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| P | 1600 | 204 | 4 | 0.128 | 0.159 | 29909 | 4379 | 147 | 0.146 | 0.188 | 0.0366 | 0.0279 |
| C | 48238 | 12474 | 1307 | 0.259 | 0.398 | 54690 | 14182 | 1657 | 0.259 | 0.409 | 0.7915 | 0.0355 |
| 1B | 51015 | 14488 | 2475 | 0.284 | 0.493 | 57316 | 15951 | 2538 | 0.278 | 0.477 | 0.0374 | 0.0089 |
| 2B | 52242 | 14221 | 999 | 0.272 | 0.396 | 60404 | 16615 | 1243 | 0.275 | 0.405 | 0.2846 | 0.0580 |
| 3B | 51086 | 13464 | 1671 | 0.264 | 0.425 | 58064 | 15816 | 2061 | 0.272 | 0.442 | 0.0010 | 0.0017 |
| SS | 51566 | 14095 | 1264 | 0.273 | 0.412 | 57984 | 15193 | 1014 | 0.262 | 0.380 | 0.0000 | 0.0000 |
| LF | 52567 | 14353 | 1763 | 0.273 | 0.439 | 58336 | 16192 | 2522 | 0.278 | 0.475 | 0.0923 | 0.0000 |
| CF | 53206 | 14435 | 1573 | 0.271 | 0.429 | 60277 | 16319 | 1865 | 0.271 | 0.431 | 0.8292 | 0.6209 |
| RF | 51647 | 14255 | 2058 | 0.276 | 0.464 | 58611 | 16655 | 2552 | 0.284 | 0.484 | 0.0026 | 0.0005 |
| DH | 47510 | 12945 | 1968 | 0.272 | 0.459 | 2709 | 686 | 89 | 0.253 | 0.403 | 0.0285 | 0.0022 |
| PH | 8398 | 1944 | 183 | 0.231 | 0.347 | 21223 | 4775 | 485 | 0.225 | 0.345 | 0.2292 | 0.8823 |
| PR | 171 | 39 | 4 | 0.228 | 0.333 | 28 | 4 | 0 | 0.143 | 0.214 | 0.3123 | 0.4328 |
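The BA significance values for the pitcher and DH rows can be approximated directly from the hit and at bat counts in Table 2. The sketch below is one way to do it and not necessarily the author's exact procedure: it treats each at bat as a hit or no-hit trial, pools the two leagues to estimate a common variance, and replaces Student's t distribution with the normal distribution, an assumption that is reasonable for samples of thousands of at bats. The function name `batting_average_p_value` is illustrative.

```python
import math


def batting_average_p_value(hits_a, ab_a, hits_b, ab_b):
    """Two-sided p-value for a difference between two batting averages.

    Treats every at bat as a hit/no-hit trial, pools the two samples to
    estimate the common variance, and uses the normal approximation to
    Student's t (an assumption; adequate for thousands of at bats).
    """
    ba_a = hits_a / ab_a
    ba_b = hits_b / ab_b
    pooled = (hits_a + hits_b) / (ab_a + ab_b)
    std_err = math.sqrt(pooled * (1.0 - pooled) * (1.0 / ab_a + 1.0 / ab_b))
    z = abs(ba_a - ba_b) / std_err
    # erfc(z / sqrt(2)) is the two-sided tail area of a standard normal.
    return math.erfc(z / math.sqrt(2.0))


# Rows taken from Table 2: (AL hits, AL AB, NL hits, NL AB).
pitchers = (204, 1600, 4379, 29909)
designated_hitters = (12945, 47510, 686, 2709)

print(f"P  BA p-value: {batting_average_p_value(*pitchers):.4f}")
print(f"DH BA p-value: {batting_average_p_value(*designated_hitters):.4f}")
```

Both results land close to the 0.0366 and 0.0285 listed in the t-test BA column. The same approach does not carry over directly to SA, since total bases per at bat is not a two-outcome trial.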

The pitcher hitting record doesn't require much comment. AL pitchers don't get many opportunities to hit except in away IL games. Perhaps the surprise is that they perform as well as they do.

To better understand the differences in DH hitting, I plotted hitter BA against rank, with hitters ranked by their number of PA (Figure 2 and Figure 3). To build the plots, I first averaged the hitting statistics of the player with the most PA on each team in each season. I then did the same for the players with the second most PA, and so on down to the 12th most on each team. The result is shown as two plots of BA, one for each league, with the data drawn as plot symbols. I also fit a PA-weighted linear regression to the data, shown as the straight line passing through the points. Weighting is appropriate since the first ranked hitter gets about 3.5 times as many PA as the 12th ranked hitter. The regression equation was solved for the rank corresponding to the observed DH BA, and a vertical line is plotted at that point. On the BA plots the NL DH appears at an equivalent rank of 10th, while the AL DH appears at a rank of 5.5. The same exercise for SLG shows a slightly more extreme dichotomy in DH ranking: 9.8 for the NL (Figure 4) and 3.6 for the AL (Figure 5). The interpretation is that in away IL games an NL team presses the best hitter left on its bench into playing DH, since NL teams won't carry a player on their rosters whose sole purpose is hitting.
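The rank calculation described above can be sketched as a weighted least-squares line fit followed by solving the fitted line for the rank that matches a given average. The data in this example are synthetic placeholders chosen only to exercise the code (a BA that falls steadily with rank, and PA weights with roughly the 3.5 to 1 ratio mentioned in the text); they are not the values behind Figures 2 through 5, and the function names are illustrative.

```python
def weighted_line_fit(ranks, values, weights):
    """Weighted least-squares fit of value = intercept + slope * rank."""
    total_w = sum(weights)
    mean_x = sum(w * x for w, x in zip(weights, ranks)) / total_w
    mean_y = sum(w * y for w, y in zip(weights, values)) / total_w
    sxy = sum(w * (x - mean_x) * (y - mean_y)
              for w, x, y in zip(weights, ranks, values))
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, ranks))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def equivalent_rank(value, intercept, slope):
    """Rank at which the fitted line equals the given value."""
    return (value - intercept) / slope


# Synthetic placeholder data (NOT from the study): BA falls 0.004 per rank,
# and PA weights fall from 650 for rank 1 to 188 for rank 12.
ranks = list(range(1, 13))
averages = [0.300 - 0.004 * (r - 1) for r in ranks]
plate_appearances = [650 - 42 * (r - 1) for r in ranks]

intercept, slope = weighted_line_fit(ranks, averages, plate_appearances)
print(f"equivalent rank of a 0.272 hitter: "
      f"{equivalent_rank(0.272, intercept, slope):.1f}")
```

With real data the points would scatter around the line, and the weights would pull the fit toward the heavily used hitters at the low ranks.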

To determine the significance of the hitting record I need a tool that converts pitcher and DH hitting into an estimated number of runs produced. While the baseball literature abounds with methods for doing this, I will use a linear weights (linear regression) formula determined from the 118 team season records that encompass IL play. The formula uses seven terms: singles, doubles, triples, home runs, walks plus hit by pitches, at bats minus hits, and errors. Including errors is appropriate since it is team runs, not individual hitter runs, that are being discussed. The regression gave a standard deviation of 19 runs against an average of 801 runs scored per team. An alternative way to express the regression quality is the square of the correlation coefficient, which has a value of 0.95, indicating that the regression accounts for 95% of the variation in the data. The weights obtained from the regression are given in Table 3. The other columns in Table 3 are Average, the team average of the number of occurrences of each term, and Contrib, the total runs due to that term. When an outs-related term is used in such a regression, it shows a negative weight, indicating the cost of making an out. This table is included for completeness. No general significance should be given to this particular set of weights since they are determined only from the five seasons, 1997 - 2001, having IL play. More details on this topic can be found in "A Survey of Baseball Player Performance Evaluation Measures".

Table 3. Seven Parameter Linear Regression

| Param | Weight | Average | Contrib |
|---|---|---|---|
| SNGL | 0.5544 | 982.2 | 544.48 |
| DOUB | 0.6339 | 291.6 | 184.83 |
| TRIP | 1.0957 | 31.0 | 33.94 |
| HRUN | 1.3851 | 176.6 | 244.66 |
| BB+HP | 0.3900 | 577.1 | 225.06 |
| AB-HITS | -0.1254 | 4073.7 | -510.79 |
| ERRS | 0.3828 | 170.6 | 65.30 |

Tables 4 and 5 detail the application of the linear weights formula in Table 3 to pitcher and DH hitting in IL games. The RUNS column gives the estimated runs produced by the hitting of pitchers in NL parks and DHs in AL parks. A negative value indicates that the hitting cost runs rather than produced them. The difference in runs between the two leagues is shown in the EXR column in Table 6.

Table 4. Interleague Pitcher Hitting in NL parks

| Year | AL GMS | AL AB | AL S | AL D | AL T | AL HR | AL W | AL RUNS | AL R/G | NL GMS | NL AB | NL S | NL D | NL T | NL HR | NL W | NL RUNS | NL R/G |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1997 | 107 | 218 | 17 | 4 | 1 | 1 | 5 | -5.4 | -0.051 | 107 | 212 | 32 | 0 | 1 | 2 | 8 | 5.5 | 0.052 |
| 1998 | 112 | 254 | 28 | 3 | 0 | 1 | 14 | 1.6 | 0.015 | 112 | 223 | 28 | 6 | 0 | 0 | 14 | 1.3 | 0.012 |
| 1999 | 125 | 285 | 28 | 8 | 1 | 1 | 19 | 2.2 | 0.017 | 125 | 247 | 31 | 4 | 0 | 5 | 7 | 11.8 | 0.095 |
| 2000 | 126 | 279 | 27 | 5 | 0 | 1 | 12 | -5.5 | -0.043 | 126 | 256 | 38 | 10 | 0 | 2 | 9 | 11.5 | 0.092 |
| 2001 | 126 | 276 | 31 | 6 | 1 | 0 | 5 | -1.6 | -0.013 | 126 | 223 | 29 | 7 | 0 | 0 | 7 | 0.7 | 0.005 |
| 2002 | 126 | 280 | 33 | 5 | 0 | 0 | 12 | -1.7 | -0.013 | 126 | 217 | 26 | 3 | 1 | 1 | 12 | 3.5 | 0.028 |
| Total | 722 | 1592 | 164 | 31 | 3 | 4 | 67 | -10.4 | -0.014 | 722 | 1378 | 184 | 30 | 2 | 10 | 57 | 34.4 | 0.048 |

Table 5. Interleague DH Hitting in AL parks

| Year | AL GMS | AL AB | AL S | AL D | AL T | AL HR | AL W | AL RUNS | AL R/G | NL GMS | NL AB | NL S | NL D | NL T | NL HR | NL W | NL RUNS | NL R/G |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1997 | 107 | 394 | 67 | 13 | 1 | 16 | 48 | 55.4 | 0.518 | 107 | 417 | 75 | 16 | 4 | 15 | 35 | 61.2 | 0.572 |
| 1998 | 112 | 418 | 87 | 26 | 0 | 12 | 50 | 67.4 | 0.601 | 112 | 426 | 65 | 19 | 1 | 10 | 36 | 39.5 | 0.353 |
| 1999 | 126 | 456 | 82 | 26 | 1 | 16 | 63 | 75.5 | 0.599 | 126 | 484 | 86 | 23 | 0 | 15 | 64 | 66.6 | 0.528 |
| 2000 | 125 | 462 | 102 | 33 | 1 | 28 | 77 | 113.5 | 0.908 | 125 | 471 | 100 | 18 | 1 | 15 | 53 | 72.8 | 0.582 |
| 2001 | 126 | 457 | 80 | 24 | 2 | 19 | 53 | 70.5 | 0.559 | 126 | 456 | 83 | 20 | 0 | 17 | 53 | 68.2 | 0.541 |
| 2002 | 126 | 459 | 66 | 28 | 0 | 17 | 56 | 61.0 | 0.484 | 126 | 455 | 56 | 30 | 0 | 17 | 63 | 59.4 | 0.471 |
| Total | 722 | 2646 | 484 | 150 | 5 | 108 | 347 | 443.3 | 0.614 | 722 | 2709 | 465 | 126 | 6 | 89 | 304 | 367.7 | 0.509 |
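The way a linear weights formula turns a row of Table 4 or Table 5 into runs can be illustrated with the Table 3 weights. This is a partial sketch under a stated assumption: the ERRS term is omitted because errors are not listed in Tables 4 and 5, and the W column is taken to be the walks plus hit by pitches input. Because of that, the results are not expected to reproduce the published RUNS column exactly. The names `WEIGHTS` and `estimated_runs` are illustrative.

```python
# Weights from Table 3. The ERRS term is left out because Tables 4 and 5
# do not list errors, so results will differ somewhat from the RUNS column.
WEIGHTS = {
    "single": 0.5544,
    "double": 0.6339,
    "triple": 1.0957,
    "home_run": 1.3851,
    "walk_hbp": 0.3900,
    "out": -0.1254,  # applied to at bats minus hits
}


def estimated_runs(at_bats, singles, doubles, triples, home_runs, walks_hbp):
    """Linear-weights run estimate from one row of Table 4 or Table 5."""
    hits = singles + doubles + triples + home_runs
    outs = at_bats - hits
    return (WEIGHTS["single"] * singles
            + WEIGHTS["double"] * doubles
            + WEIGHTS["triple"] * triples
            + WEIGHTS["home_run"] * home_runs
            + WEIGHTS["walk_hbp"] * walks_hbp
            + WEIGHTS["out"] * outs)


# 1997 designated hitters in AL parks (Table 5): AB, S, D, T, HR, W.
al_dh_1997 = estimated_runs(394, 67, 13, 1, 16, 48)
nl_dh_1997 = estimated_runs(417, 75, 16, 4, 15, 35)
print(f"AL DH: {al_dh_1997:.1f} runs, NL DH: {nl_dh_1997:.1f} runs, "
      f"difference: {al_dh_1997 - nl_dh_1997:.1f}")
```

For these two rows the six-term estimate comes out below the published RUNS values, which is consistent with a positive error term, and possibly other inputs not shown in the tables, having been part of the original calculation.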

To estimate the number of wins represented by excess runs I used the well known James Pythagorean formula (1982 Baseball Abstract, pg 18). Specifically, I multiplied the derivative of the formula with respect to team runs scored by the excess runs to estimate the change in winning percentage, then multiplied this change by the number of IL games to estimate how many wins would result from the hitting differences. Table 6 summarizes these calculations. In AL park IL games, the DH advantage for the AL nets about 76 additional runs and 7 extra wins. In NL park IL games, better pitcher hitting results in about 45 additional runs, which translates to about 5 extra wins. These values are reasonably close to the excess wins computed from the difference between the IL home winning percentage and the intraleague home winning percentage, shown in the W/L DW column. While the number of excess runs and wins is not large, the effect appears to be real.

Table 6. Team Away Game Pitcher/Designated Hitter Summary
Estimated IL Excess Runs and Wins from P, DH Home Advantage

| Year | Games ILA | Games ILN | DH Hitting (AL) RUNS | DH Hitting (AL) EXR | DH Hitting (AL) DW | AL W/L DW | P Hitting (NL) RUNS | P Hitting (NL) EXR | P Hitting (NL) DW | NL W/L DW |
|---|---|---|---|---|---|---|---|---|---|---|
| 97 | 107 | 107 | 539 | -5.8 | -0.6 | 4.1 | 508 | 11.0 | 1.2 | 10.8 |
| 98 | 112 | 112 | 569 | 27.9 | 2.7 | -3.2 | 508 | -0.3 | -0.0 | -8.1 |
| 99 | 126 | 125 | 654 | 8.9 | 0.9 | -4.1 | 651 | 9.7 | 0.9 | 4.5 |
| 00 | 125 | 126 | 692 | 40.8 | 3.7 | 10.0 | 612 | 17.0 | 1.8 | -2.7 |
| 01 | 126 | 126 | 643 | 2.3 | 0.2 | 9.5 | 586 | 2.3 | 0.2 | 3.7 |
| 02 | 126 | 126 | 615 | 1.6 | 0.2 | -4.1 | 517 | 5.2 | 0.6 | -2.0 |
| Total | 722 | 722 | 3712 | 75.6 | 7.4 | 12.2 | 3382 | 44.8 | 4.8 | 6.0 |
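The conversion from excess runs to wins in Table 6 can be sketched as follows. The Pythagorean formula gives a winning percentage of R^2 / (R^2 + A^2) for R runs scored and A runs allowed, so its derivative with respect to R is 2 R A^2 / (R^2 + A^2)^2. The sketch assumes the derivative is evaluated with runs scored and runs allowed both equal to the RUNS column, where it reduces to 1 / (2 R). That evaluation point is my reading of the table rather than something the text states, and the function names are illustrative.

```python
def pythagorean_pct(runs_scored, runs_allowed):
    """James Pythagorean winning percentage with exponent 2."""
    rs2 = runs_scored ** 2
    return rs2 / (rs2 + runs_allowed ** 2)


def pct_per_run(runs_scored, runs_allowed):
    """Derivative of the Pythagorean formula with respect to runs scored."""
    denom = (runs_scored ** 2 + runs_allowed ** 2) ** 2
    return 2.0 * runs_scored * runs_allowed ** 2 / denom


def extra_wins(excess_runs, baseline_runs, games):
    """Wins gained from excess runs, evaluated at an even run differential.

    Assumption: runs scored and allowed are both set to the RUNS column of
    Table 6, where the derivative reduces to 1 / (2 * RUNS).
    """
    return pct_per_run(baseline_runs, baseline_runs) * excess_runs * games


# Totals rows of Table 6: (EXR, RUNS, games).
print(f"DH in AL parks: {extra_wins(75.6, 3712, 722):.1f} wins")
print(f"P in NL parks:  {extra_wins(44.8, 3382, 722):.1f} wins")
```

Under this assumption the totals come out near the 7.4 and 4.8 wins shown in the DW columns, and the yearly rows can be checked the same way.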

The compensation for these asymmetries will be partial rather than complete, since the current IL schedules don't require teams to play home-and-home series with their opponents and since the number of home and away IL games is not the same for all teams. These schedule and DH related asymmetries could result in the win or loss of an additional game for structural reasons rather than reasons of team ability.

The 1997-2000 play-by-play data used in this study is courtesy of Total Sports and Gary Gillette. I have used my own software to extract the data reported in this paper from the play-by-play files.

Copyright 2001-2003, John F. Jarvis