PITCHf/x Database Issues: Duplicate Entry for Key

If you’re trying to build a PITCHf/x database like I am, you are probably heavily leaning on Mike Fast’s work. It’s a great primer, however, that’s exactly how it should be viewed: A primer. There are numerous things that have changed since he initially wrote the page, and it’s not a simple copy/paste job to download all the data from MLBAM. Among the issues are:

  • Inadequate handling of timeouts from gd.mlb.com and gd2.mlb.com
  • Hardcoding the IP is not a valid suggestion and causes lots of problems (suggested to get around DNS resolution issues)
  • Parser script does not know how to figure out which games have been used without manually querying each at bat (expensive and unnecessary)
  • Database structure is not future-proof

I could write for hours on the first three (plus other bugs), but I didn’t document them well enough and it’s simply not that interesting to me to write about script bugs that people should learn how to fix for themselves. If used verbatim, Mike’s database structure will fail around July 2010 when inputting new pitches and at-bats into the database. You’ll get errors like the following:

PITCHf/x Errors - Parsing
Yes, this is my Windows box.

I’ve inserted a bunch of print statements to help me debug the code (ah, printf-style debugging) and saw that it was reporting duplicate or unknown key entries. Recalling the little I know about MySQL and the numbers involved (513425 and later 1900131), I was pretty sure that this is typical behavior when a memory space is overloaded. In this case, Mike uses MEDIUMINT(8) to describe the primary key of ab_id in the code.

If you have phpmyadmin (and you should), you can fix this problem rather easily by editing the structure of the table to change ab_id to INT(10) in both the pitches and atbats tables. Additionally, you need to change pitch_id in pitches to an INT(10). This will allow for larger numbers to be stored in those rows.

phpmyadmin PITCHf/x
phpmyadmin - Where to edit your PITCHf/x database (click for larger)

If you don’t use phpmyadmin, the MySQL code to execute at the command line is something like:

ALTER TABLE pitches MODIFY ab_id INT(10);

Good luck with your PITCHf/x database building!