The best way to understand the need for and function of the events
file processing program, the "parser", is to examine the data it
processes. Following is a single game from the 1983 season (San Diego
at Atlanta, April 8, 1983) obtained from Retrosheet, Inc. It is coded
using the purely textual scoring system adopted by Retrosheet, Inc.
and the Baseball Workshop.
id,ATL8304080
version,1
info,inputprogvers,"version 7RS(19) of 07/07/92"
info,visteam,SDN
info,hometeam,ATL
info,site,ATL01
info,date,1983/04/08
info,number,0
info,starttime,0:00PM
info,daynight,unknown
info,usedh,false
info,umphome,"Quick"
info,ump1b,"Pallone"
info,ump2b,"Engel"
info,ump3b,"Runge"
info,scorer,"Braves"
info,translator,"C. Chestnut"
info,inputter,"C. Chestnut"
info,inputtime,1995/02/07 9:01PM
info,edittime,1996/09/28 9:00PM
info,howscored,park
info,pitches,count
info,temp,0
info,winddir,unknown
info,windspeed,-1
info,fieldcond,unknown
info,precip,unknown
info,sky,unknown
info,timeofgame,0
info,attendance,32737
info,wp,bedrs001
info,lp,chiff001
info,save,garbg001
info,gwrbi,
start,richg001,"Gene Richards",0,1,7
start,bonij001,"Juan Bonilla",0,2,4
start,garvs001,"Steve Garvey",0,3,3
start,kennt001,"Terry Kennedy",0,4,2
start,lezcs001,"Sixto Lezcano",0,5,9
start,joner002,"Ruppert Jones",0,6,8
start,tempg001,"Garry Templeton",0,7,6
start,salal001,"Luis Salazar",0,8,5
start,showe001,"Eric Show",0,9,1
start,butlb001,"Brett Butler",1,1,8
start,ramir001,"Rafael Ramirez",1,2,6
start,washc001,"Claudell Washington",1,3,9
start,murpd001,"Dale Murphy",1,4,7
start,hornb001,"Bob Horner",1,5,5
start,chamc001,"Chris Chambliss",1,6,3
start,hubbg001,"Glenn Hubbard",1,7,4
start,beneb001,"Bruce Benedict",1,8,2
start,campr001,"Rick Camp",1,9,1
play,1,0,richg001,12,,S9
play,1,0,bonij001,00,,S9.1-2
play,1,0,garvs001,01,,64(1)3/GDP.2-3
play,1,0,kennt001,22,,S6.3-H
play,1,0,lezcs001,12,,3/FL
play,1,1,butlb001,11,,S6
play,1,1,ramir001,00,,CS2(24)
play,1,1,ramir001,32,,7
play,1,1,washc001,11,,S7
play,1,1,murpd001,01,,S7.1-3
play,1,1,hornb001,31,,W.1-2
play,1,1,chamc001,10,,8
play,2,0,joner002,20,,63
play,2,0,tempg001,32,,K/C
play,2,0,salal001,10,,S9
play,2,0,showe001,22,,K/C
play,2,1,hubbg001,01,,8
play,2,1,beneb001,10,,63
play,2,1,campr001,11,,3
play,3,0,richg001,22,,8
play,3,0,bonij001,00,,7
play,3,0,garvs001,00,,63
play,3,1,butlb001,20,,63
play,3,1,ramir001,21,,9
play,3,1,washc001,22,,K/C
play,4,0,kennt001,00,,31
play,4,0,lezcs001,10,,5/FL
play,4,0,joner002,20,,43
play,4,1,murpd001,32,,6
play,4,1,hornb001,22,,K
play,4,1,chamc001,12,,S9.BX2(96)
play,5,0,tempg001,12,,K
play,5,0,salal001,01,,2/G
play,5,0,showe001,12,,K
play,5,1,hubbg001,00,,4
play,5,1,beneb001,10,,S8
play,5,1,campr001,01,,HP.1-2
play,5,1,butlb001,00,,S8.2-H;1-3
play,5,1,ramir001,00,,S8.3-H;1-2
play,5,1,washc001,00,,46(1)/FO.2-3
play,5,1,murpd001,21,,13
play,6,0,richg001,30,,W
play,6,0,bonij001,32,,W.1-2
play,6,0,garvs001,22,,K/C
play,6,0,kennt001,00,,NP
sub,falcp001,"Pete Falcone",1,9,1
play,6,0,kennt001,20,,E5.2-3;1-2
play,6,0,lezcs001,00,,NP
sub,mahlr001,"Ricky Mahler",1,9,1
play,6,0,lezcs001,00,,9/SF.3-H(UR);2-3
play,6,0,joner002,00,,7
play,6,1,hornb001,32,,W
play,6,1,chamc001,21,,C/E2.1-2
play,6,1,hubbg001,00,,BK.2-3;1-2
play,6,1,hubbg001,12,,K/C
play,6,1,beneb001,10,,9
play,6,1,mahlr001,00,,NP
sub,pocob001,"Biff Pocoroba",1,9,11
play,6,1,pocob001,10,,9
play,7,0,tempg001,00,,NP
sub,bedrs001,"Steve Bedrosian",1,9,1
play,7,0,tempg001,00,,63
play,7,0,salal001,02,,S8
play,7,0,showe001,00,,SB2.1-3(E2/TH2)
play,7,0,showe001,00,,NP
sub,lefej001,"Joe Lefebvre",0,9,11
play,7,0,lefej001,01,,3/FL
play,7,0,richg001,10,,63
play,7,1,butlb001,00,,NP
sub,chiff001,"Floyd Chiffer",0,9,1
play,7,1,butlb001,00,,9
play,7,1,ramir001,22,,9
play,7,1,washc001,10,,D7
play,7,1,murpd001,21,,S4.2-3
play,7,1,hornb001,02,,64(1)/FO
play,8,0,bonij001,22,,K/C
play,8,0,garvs001,12,,S7
play,8,0,kennt001,11,,S8.1-2
play,8,0,lezcs001,32,,8.2-3
play,8,0,joner002,12,,K/C
play,8,1,chamc001,31,,W
play,8,1,hubbg001,10,,S9.1-3
play,8,1,beneb001,12,,D7.3-H;1-3
play,8,1,bedrs001,00,,NP
sub,smitk102,"Ken Smith",1,9,11
play,8,1,smitk102,00,,NP
sub,lucag001,"Gary Lucas",0,9,1
play,8,1,smitk102,00,,NP
sub,watsb001,"Bob Watson",1,9,11
play,8,1,watsb001,32,,K/C
play,8,1,butlb001,22,,43.3-H
play,8,1,ramir001,00,,63
play,9,0,tempg001,00,,NP
sub,garbg001,"Gene Garber",1,9,1
play,9,0,tempg001,01,,S9
play,9,0,salal001,20,,S8.1-2
play,9,0,lucag001,00,,NP
sub,turnj101,"Jerry Turner",0,9,11
play,9,0,turnj101,00,,3(B)6(1)/GDP.2-3
play,9,0,richg001,20,,3/G
data,er,showe001,2
data,er,chiff001,2
data,er,lucag001,0
data,er,campr001,1
data,er,falcp001,0
data,er,mahlr001,0
data,er,bedrs001,0
data,er,garbg001,0
The complete set of files for a season, one for each team containing
all of its home games, describes every play that has taken place.
These complete season files total 8 to 12 megabytes. The scoring
system is entirely textual, and thus readable, but the amount of
detail is so great that help from a computer is needed if accurate
statistical summaries for a team, a player or a season are desired.
It is the function of the events file analysis program, usually
referred to as the parser, to process game or season descriptions and
produce the reports or files containing data needed for other projects
or programs. A set of programs for parsing this data is available
from Retrosheet, Inc.
I have written my own parsing program. There is no better way to
understand the intricacies of the scoring system than to write a
program to analyse these files. After completing the processing for a
season, the appropriate statistics for the simulator can be written
to a file. Likewise, complete batting, pitching and fielding
statistics can be collected for all the players that appeared during
the season. Have you ever tried to find batting data for pitchers?
Have you ever tried to obtain base running statistics such as the
fraction of the time a runner on second scores or stays at third
after a single? The reward for writing the parser is the ability to
compute exhaustively any statistical quantity that can be adequately
defined.
While I will not attempt to describe the details of the scoring
system and parser, some idea of the completeness of the record and
the problems of interpreting it can be seen in the game description
above. Each at bat is described by one or more "play" records. Each
play record consists of the word play, the inning, 0 for the visiting
team or 1 for the home team, a coded version of the player's name,
the count on the batter, an optional list of the pitches to the
batter, and the detailed description of the play itself. For example:
play,6,0,lezcs001,00,,9/SF.3-H(UR);2-3
In the visitors' sixth inning Sixto Lezcano hits a sacrifice fly to
the right fielder, scoring the runner on third, Gene Richards, with
an unearned run (UR), and advancing the runner on second, Juan
Bonilla, to third.
Starting lineup and substitution information allows each player in a
play to be identified. The detailed play description consists of
three parts, but not all parts are given on each play. The first item
is the type of play. A slash, "/", separates a modifier or additional
information field from the play type. If runners advance on the play,
this information is given following a period, ".".
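The three-part structure just described can be illustrated with a
short sketch. It handles only the forms that appear in the sample
game; the function name split_event is hypothetical, and treating ";"
as the separator between individual runner advances is inferred from
records such as S8.2-H;1-3 rather than stated in the text.

```python
def split_event(event: str):
    """Split a play description into (play type, modifiers, advances)."""
    # Everything after the first period describes runner advances.
    # Splitting on the period first keeps a slash inside an advance,
    # as in "1-3(E2/TH2)", from being mistaken for a modifier.
    head, _, advance_part = event.partition(".")
    # The first slash-separated item is the play type; the rest are modifiers.
    play_type, *modifiers = head.split("/")
    advances = advance_part.split(";") if advance_part else []
    return play_type, modifiers, advances


for text in ("9/SF.3-H(UR);2-3", "64(1)3/GDP.2-3", "S9", "SB2.1-3(E2/TH2)"):
    print(text, "->", split_event(text))
```

Interpreting each part (which fielders handled the ball, which runner
was put out) is where the real work of a parser lies and is not
attempted here.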
In addition to the play record there are also "id", "start", "sub",
"info" and "data" records. The "id" record contains a unique game
identifier based on the home team, date and which game of a
doubleheader if needed. Some additional insight into this scoring
system is provided in: "The Joy of Keeping Score", Paul Dickson,
1996, Walker, pp 44-45.
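The identifier in the sample game's "id" record, ATL8304080, can be
taken apart as sketched below. The layout assumed here (three
characters for the home team, a two-digit year, month and day, and a
final game-number digit) is inferred from that one example, and the
1900 default for the century is likewise an assumption. GameId and
parse_game_id are hypothetical names.

```python
from datetime import date
from typing import NamedTuple


class GameId(NamedTuple):
    home_team: str
    game_date: date
    game_number: int  # distinguishes the games of a doubleheader


def parse_game_id(game_id: str, century: int = 1900) -> GameId:
    # Assumed layout: TTT YY MM DD N, ten characters in all.
    if len(game_id) != 10:
        raise ValueError(f"unexpected game id length: {game_id!r}")
    team = game_id[:3]
    year = century + int(game_id[3:5])
    month = int(game_id[5:7])
    day = int(game_id[7:9])
    return GameId(team, date(year, month, day), int(game_id[9]))


print(parse_game_id("ATL8304080"))
```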
Once the investment has been made in creating a program to process
this data, creating new player-level statistics and applying them to
full major league seasons is straightforward. This is the method
used to evaluate the apportioned wins and losses pitching evaluation
statistic I presented at SABR97.
Following is a sample of the reports available, for the 1983 NL,
giving team-related statistics.
In the hitting summary, or is opponents' runs scored and hbyp is
batters hit by a pitch. The other column headings are obvious.
team hitting summary
team games runs   or   lob    ab  hits     s    d   t   hr   bb hbyp
atl    162  746  640  1155  5472  1489  1096  218  45  130  582   17
chn    162  701  719  1120  5512  1436   982  272  42  140  470   29
cin    162  623  710  1090  5333  1274   896  236  35  107  588   19
hou    162  643  646  1145  5502  1412  1016  239  60   97  517   19
lan    163  654  609  1104  5440  1358   981  197  34  146  541   22
mon    163  677  646  1213  5611  1482  1042  297  41  102  509   38
nyn    162  575  680  1041  5444  1314  1004  172  26  112  436   31
phi    163  696  635  1154  5426  1352   973  209  45  125  640   26
pit    162  659  648  1142  5531  1460  1072  238  29  121  497   19
sdn    163  653  653  1103  5527  1384  1050  207  34   93  482   20
sfn    162  687  697  1104  5369  1324   946  206  30  142  619   28
sln    162  679  710  1175  5550  1496  1088  262  63   83  543   24
--------------------------------------------------------------------
tot   1948 7993 7993 13546 65717 16781 12146 2753 484 1398 6424  292
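A report like this can be checked against itself. The sketch below
copies a few rows from the hitting summary and confirms two
identities: hits equal the sum of the s, d, t and hr columns, and
league runs scored equal league opponents' runs. Reading s, d and t
as singles, doubles and triples is an assumption, one the arithmetic
supports; the function name hits_consistent is hypothetical.

```python
# (team, runs, opponents' runs, hits, s, d, t, hr), copied from the report
rows = [
    ("atl", 746, 640, 1489, 1096, 218, 45, 130),
    ("chn", 701, 719, 1436, 982, 272, 42, 140),
    ("cin", 623, 710, 1274, 896, 236, 35, 107),
]
totals = ("tot", 7993, 7993, 16781, 12146, 2753, 484, 1398)


def hits_consistent(row) -> bool:
    # Hits should equal singles + doubles + triples + home runs.
    _, _, _, hits, s, d, t, hr = row
    return hits == s + d + t + hr


for row in rows + [totals]:
    print(row[0], "hits consistent:", hits_consistent(row))

# Every run scored is also a run allowed by some team in the league.
print("runs equal opponents' runs:", totals[1] == totals[2])
```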
input command ('?' for list): misc
The base stealing tabulation below also gives each team's double and
triple play counts, as well as counts of successes (sb) and failures
(cs) for attempts on each base. Counts of pickoffs, both as the
offensive and as the defensive team, are also given.
base stealing stats
      offensive                                                defensive
team  dpl  tpl  po1  po2  po3  sb2  cs2  sb3  cs3  sb4  cs4    dpl  tpl  po1  po2  po3
atl   159    1   10    3    0  135   75   11    7    0    3    176    0   10    0    0
chn   143    0    4    2    1   78   34    5    3    1    3    163    1   10    4    0
cin   137    0    7    1    0  145   65    8    5    1    6    121    0    4    2    0
hou   102    0   10    1    0  156   87    8    3    0    5    165    0    5    1    0
lan   140    0    7    0    0  155   65   10    4    1    5    131    0    7    4    0
mon   153    1    7    1    1  131   37    7    5    0    1    128    0    6    4    1
nyn   139    0    7    3    0  129   60   12    2    0    2    172    0    6    1    0
phi   159    0    3    4    1  133   67   10    3    0    4    116    0    7    1    0
pit   160    1    5    3    0  116   70    8    5    0    2    165    0   12    0    1
sdn   143    0    9    3    0  162   56   17    4    0    6    134    1    4    3    0
sfn   167    0    4    3    0  139   63    0    7    1    7    108    0    4    2    1
sln   150    0   10    4    0  194   76   10    3    3   10    173    1    8    6    0
-----------------------------------------------------------  -------------------------
tot  1752    3   83   28    3 1673  755  106   51    7   54   1752    3   83   28    3
The following tabulation is for the entire 1983 National League.
Runner advances under various conditions are tabulated. The first line
reports on where the lead runner ends up following a single. In the
headings read 1-3 as the number of times with the lead runner on
first that a single advances him to third. An x as a destination
indicates the runner was out while an h indicates the runner scored.
If the starting base and ending base are the same the runner didn't
advance.
The second line tabulates the advances for a runner trailing a runner
on third when a single has been hit.
The third line documents the progress of the lead runner following a
double as well as runner advances on double plays.
lead runner advance on single
  1-2  1-3  1-h   1x  2-2  2-3  2-h  2-x  3-3  3-h   3x
 1470  699   32   34   26  509 1147   71   14 1177    1

runner on third, single, next runner advances
  1-2  1-3  1-h  1-x  2-2  2-3  2-h  2-x
  754  313   30   34    3  134  295   16

lead runner advance on double          on double play
  1-3  1-h  1-x  2-3  2-h  2-x  3-h    2-2  2-3  3-3  3-h
  255  245   14    5  450    0  252      6  126   10   59
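The question raised earlier, how often a runner on second scores or
stops at third on a single, can be answered from the 2-2, 2-3, 2-h
and 2-x counts in the lead-runner-on-single tabulation. The following
is a small worked calculation using those four 1983 NL counts; the
function name outcome_fractions is hypothetical.

```python
# Lead runner on second when a single is hit, 1983 NL, from the tabulation
from_second = {"2-2": 26, "2-3": 509, "2-h": 1147, "2-x": 71}


def outcome_fractions(counts):
    """Convert raw outcome counts into fractions of all opportunities."""
    total = sum(counts.values())
    return {outcome: n / total for outcome, n in counts.items()}


fractions = outcome_fractions(from_second)
print(f"opportunities: {sum(from_second.values())}")
print(f"scores: {fractions['2-h']:.3f}")
print(f"stops at third: {fractions['2-3']:.3f}")
```

With these counts there are 1753 opportunities; the runner scores
about 65 percent of the time and stops at third about 29 percent of
the time.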