Run Scoring: About the Data

The data for this project comes from the event files provided by Retrosheet. You'll be seeing a lot of the following notice.

The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at 20 Sunset Rd., Newark, DE 19711.

Retrosheet offers game logs for the AL and NL from 1969-1992; these were all used. Logs are also available for earlier AL seasons, but I chose not to incorporate them because there is no matching NL set. They therefore remain available to anyone who wishes to test the conclusions of this study against independent data.

For each event file, it is a relatively simple matter to run the BEVENT utility to extract the desired information. The BEVENT output is saved into working files, and the working files are then processed into reports for each season, which can be manipulated as desired.
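As a rough illustration of the processing step, here is a minimal sketch that reads BEVENT output saved as comma-separated text and tallies games and events per season. It assumes the game ID is the first field of each row and follows Retrosheet's usual form (home team code, then a four-digit year, e.g. `BOS198804040`). The function names and sample rows are hypothetical; a real report would aggregate whichever fields you extracted.

```python
import csv
import io
from collections import defaultdict


def season_of(game_id):
    """Retrosheet game IDs look like BOS198804040: team code, YYYYMMDD, game number."""
    return int(game_id[3:7])


def season_report(bevent_text):
    """Tally distinct games and total events per season from BEVENT CSV text.

    Assumes the game ID is the first field of every row (hypothetical layout).
    """
    games = defaultdict(set)
    events = defaultdict(int)
    for row in csv.reader(io.StringIO(bevent_text)):
        if not row:
            continue  # skip blank lines in the working file
        season = season_of(row[0])
        games[season].add(row[0])
        events[season] += 1
    return {s: {"games": len(games[s]), "events": events[s]} for s in sorted(games)}


# Hypothetical sample rows; only the first field matters here.
sample = (
    '"BOS198804040","DET",1,0\n'
    '"BOS198804040","DET",1,0\n'
    '"NYA198904050","MIN",1,0\n'
)
print(season_report(sample))
# {1988: {'games': 1, 'events': 2}, 1989: {'games': 1, 'events': 1}}
```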

When working with event logs, I strongly advise extracting the gameId: if there is a problem you need to repair, the gameId will be useful for tracking down where to make the fix (and for checking against the Retrosheet site what fix should be made).
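A minimal sketch of why carrying the gameId pays off: a reconciliation pass can point straight at the games that need fixing. Here each event is reduced to a hypothetical `(game_id, runs_on_play)` pair, and the trusted per-game totals are an assumed input from some other source, such as box scores.

```python
from collections import defaultdict


def runs_by_game(events):
    """Sum runs per game from (game_id, runs_on_play) pairs."""
    totals = defaultdict(int)
    for game_id, runs in events:
        totals[game_id] += runs
    return totals


def find_mismatches(events, expected_runs):
    """Return game IDs whose computed run total disagrees with a trusted total.

    expected_runs maps game_id -> runs; it is a hypothetical input for illustration.
    """
    totals = runs_by_game(events)
    return sorted(
        gid
        for gid in set(totals) | set(expected_runs)
        if totals.get(gid, 0) != expected_runs.get(gid, 0)
    )


events = [("BOS198804040", 1), ("BOS198804040", 2), ("NYA198904050", 0)]
expected = {"BOS198804040": 3, "NYA198904050": 1}
print(find_mismatches(events, expected))  # ['NYA198904050']
```

The returned gameIds are exactly what you would look up on the Retrosheet site to decide how to repair the extracted data.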

For this study, there are a few gotchas lurking in the corners.

The first involves substitutions for runners: you'll need to grab columns 83-85 if you want to track runners correctly, and to eliminate the cases where a batter didn't score because he was run for. Also note that BEVENT has a problem when two substitutions occur at the same time; if you store the gameIds, it will be much easier to find the problem and fix it.
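One way to apply the pinch-runner rule, sketched under stated assumptions: the three "removed for pinch runner" columns are read at zero-based indices 83-85 (an assumption about how the columns above are numbered), a batter who reaches base opens an opportunity, and a runner taken off for a pinch runner has that opportunity cancelled. The `Event` record, its fields, and the player IDs are hypothetical simplifications, and the pinch runner himself is not charged here.

```python
from collections import Counter
from typing import NamedTuple

# Assumption: zero-based field indices for the runner removed for a pinch
# runner on first, second, and third.
PR_REMOVED_COLS = (83, 84, 85)


def removed_for_pinch_runner(row):
    """Player IDs taken off the bases for a pinch runner, per the assumed columns."""
    return tuple(row[i].strip() for i in PR_REMOVED_COLS if row[i].strip())


class Event(NamedTuple):
    batter: str
    batter_reached: bool  # hypothetical flag: batter ended the play on base or scored
    removed: tuple        # IDs from removed_for_pinch_runner(row)


def charged_opportunities(events):
    """Count scoring opportunities per player, dropping runners who were run for."""
    charged = Counter()
    for ev in events:
        # A pinch-runner swap cancels the replaced runner's open opportunity.
        # Looping over all removed IDs also covers two swaps on the same row.
        for player in ev.removed:
            charged[player] -= 1
        if ev.batter_reached:
            charged[ev.batter] += 1
    return +charged  # unary plus keeps only positive counts


events = [
    Event("smitj001", True, ()),
    Event("jonea001", True, ()),
    Event("browb001", False, ("smitj001",)),  # smitj001 is run for
]
print(dict(charged_opportunities(events)))  # {'jonea001': 1}
```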

Second, you'll want to be especially careful with how you handle baserunners in "walk off" situations. For this study, I decided that runners safe on base at the end of the game are not charged with an opportunity, but runners who were put out on the last play of the game are charged. There were some 53 occasions where the winning run scored when the defense had recorded only one out. So you'll need the outs on base field, even though most of the time you can simply watch for the inning change.
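The walk-off rule can be expressed as a small function applied to the last play of each half-inning. This is a minimal sketch, assuming BEVENT-style destination codes (0 for out, 1-3 for the base reached, 4 or more for scored) and a hypothetical game-end flag; the function names are illustrative, not BEVENT's.

```python
# Assumption: destination codes where 0 means the runner was put out,
# 1-3 mean he ended the play on that base, and 4 or more mean he scored.
OUT = 0
SCORED = 4


def final_play_opportunities(runner_dests, game_over):
    """Resolve runners on the last play of a half-inning.

    runner_dests maps runner ID -> destination code on that play.
    Returns {runner: scored?} for every runner charged with an opportunity.
    """
    charged = {}
    for runner, dest in runner_dests.items():
        if dest >= SCORED:
            charged[runner] = True
        elif dest == OUT:
            # Put out on the final play: charged, even in a walk-off.
            charged[runner] = False
        elif not game_over:
            # The third out ended the half-inning: a stranded runner failed to score.
            charged[runner] = False
        # Otherwise the game ended with him safe on base: not charged.
    return charged


def half_inning_over(outs_before, outs_on_play, game_end_flag):
    """Most half-innings end at three outs; a walk-off can end with fewer."""
    return outs_before + outs_on_play >= 3 or game_end_flag


# Walk-off single with one out: the runner from third scores, the runner on first stops at second.
print(final_play_opportunities({"r3": 4, "r1": 2}, game_over=True))  # {'r3': True}
```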

Third, you'll want to watch out for those occasions where a batter scores in his own at bat without hitting a home run. I chose to file these as home runs; I can't tell the difference in the event files between a ball that goes over the fence and an inside-the-park home run anyway. They don't come up often, but handling them is necessary to ensure that your runs scored totals check out.
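A sketch of that classification, assuming BEVENT-style codes: event code 23 for a home run and a batter destination of 4 or more when the batter scores. Both constants are assumptions, and the helper names are hypothetical. Counting runs from destinations, batter included, is what makes the runs scored totals reconcile.

```python
# Assumptions: event code 23 marks a home run, and a destination of 4 or
# more means the player scored on the play.
HOME_RUN_EVENT = 23
SCORED = 4


def is_home_run(event_code, batter_dest):
    """File any play on which the batter circles the bases as a home run."""
    return event_code == HOME_RUN_EVENT or batter_dest >= SCORED


def runs_on_play(batter_dest, runner_dests):
    """Runs scored on a play, counting the batter and every runner."""
    return sum(1 for d in (batter_dest, *runner_dests) if d >= SCORED)


# A hit (assumed code 20) on which the batter comes all the way around.
print(is_home_run(20, 4))          # True
print(runs_on_play(4, [4, 0, 2]))  # 2
```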

I didn't bother paying attention to sacrifice hits and sacrifice flies, so I will be using mock versions of AVG, OBP, and SLG when reporting those numbers.
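The exact mock formulas aren't spelled out here, so the following is only one plausible reading, offered as an assumption: with sacrifices not separated out, every plate appearance that isn't a walk or hit batsman counts as an at-bat, and OBP is taken over all plate appearances.

```python
def mock_rates(pa, h, bb, hbp, tb):
    """Mock AVG/OBP/SLG that ignore sacrifices (hypothetical definitions).

    Sacrifice hits and flies are not separated out, so they count as at-bats
    for AVG and SLG and as plate appearances in the OBP denominator.
    """
    ab = pa - bb - hbp
    avg = h / ab
    obp = (h + bb + hbp) / pa
    slg = tb / ab
    return avg, obp, slg


# 10 plate appearances: 3 hits totaling 5 bases, 1 walk.
print(tuple(round(x, 3) for x in mock_rates(10, 3, 1, 0, 5)))  # (0.333, 0.4, 0.556)
```

Because sacrifices land in the denominators, these mock figures will run slightly below the official ones for the same players.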

I didn't make any effort to control for pitchers batting, which one could do by choosing between DH and non-DH seasons, or by paying attention to the defensive position column.
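For anyone who does want that control, here is a minimal sketch of both approaches. It assumes the batter's defensive position was extracted as a column (zero-based index 32 here, a hypothetical choice) using standard scoring numbers, where 1 is the pitcher, and it relies on the AL adopting the DH in 1973 while the NL never used it during 1969-1992.

```python
# Assumptions: index of the extracted defensive-position column (hypothetical),
# scoring number for the pitcher, and the first AL season with the DH.
POS_COL = 32
PITCHER = "1"
FIRST_AL_DH_SEASON = 1973


def exclude_pitcher_batting(rows, pos_col=POS_COL):
    """Drop plate appearances in which the batter was playing pitcher."""
    return [r for r in rows if r[pos_col].strip() != PITCHER]


def uses_dh(league, season):
    """DH seasons within 1969-1992: AL from 1973 on, never the NL."""
    return league == "AL" and season >= FIRST_AL_DH_SEASON


rows = [[""] * 33 for _ in range(2)]
rows[0][POS_COL] = "1"  # pitcher batting
rows[1][POS_COL] = "8"  # center fielder batting
print(len(exclude_pitcher_batting(rows)))  # 1
print(uses_dh("AL", 1975), uses_dh("NL", 1975))  # True False
```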

May 8, 2004 5:49 PM

Comments
How much does baserunning matter to run scoring?
Excerpt: With all the recent hoo-hah about productive outs and the poo-pooing by ESPN talking heads Harold Reynolds and John Kruk...
Weblog: Off the Kuff
Tracked: May 9, 2004 7:31 PM