Minor League Splits Redux Launched
I've launched yet another half-complete web application relating to baseball sabermetrics! ML Splits is a database of minor league baseball players (batters only for now) that shows their "splits" (performance against LHP and RHP) as well as park effects and major league equivalencies. The data is taken from Jeff Sackmann's old minorleaguesplits.com site where he made the CSVs available for import as open source.
I leaned on jQuery for front-end display purposes, as I'm getting more and more comfortable with using it for front-facing web applications. Mostly just dynamic div tagging and toggle() to keep the screen clear of distractions and make it easy to see what stats are really important. Wrote the entire thing in PHP 5.3.x, MySQL 5, CodeIgniter 2.0, and jQuery.
Enjoy!
EDIT: Pitchers are up as of April 4th.
Springloops is awesome, HeidiSQL is annoying, Site5 disappoints
I've started to use Springloops RC2 for my version control efforts, which is really awesome. They support SVN and git (no Mercurial yet), have a great interface with easy deployments to multiple servers so you can split them up into production/staging/development, a solid ticketing system, and a good code browser. Also, it's completely free!

HeidiSQL has been my Windows MySQL GUI of choice ever since I switched to it from Toad. It's generally very good, it's FOSS, and it handles most operations fairly well - except for CSV importing.
Site5 is my current webhost, and I'm very happy with them 99.9% of the time, but they caused me a bit of pain recently. I have a lot of CSVs to import that are about 10-15 MB in size each, and I tried importing them through the very handy phpMyAdmin tool. However, phpMyAdmin has some memory leaks and issues with importing larger CSVs, so it ended up crashing due to memory problems on the larger files (even though it theoretically can handle up to 105MB CSVs). I sent Site5 a ticket to have them import the CSVs manually (they said they would), but they responded with "there's no table structure in your DB so we can't do it." Well, uh, building the table structure is what phpMyAdmin does, and that's why I wanted to use it rather than hand-create the 7 tables I need for 20 or so CSV files. They refused again without providing me an alternative, so I had to do it myself.
I fired up HeidiSQL, manually created the tables, and imported the CSV. And... it didn't work. I ignored the first row (column headers), but it still responded with "invalid data." The fields were correctly cast for the rest of the data, so I didn't know what was up. A bit of Googling tells me that even if you ignore the first row, HeidiSQL still checks it against the data types in your table. This is idiotic and annoying for any number of reasons, all of which I leave up to you to figure out.
At any rate, deleting the first row and ignoring 0 rows ended up working just fine. Now to do this repetitive task over and over again...
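If you have a pile of CSVs that need the same treatment, the header stripping can be scripted instead of done by hand. Here's a minimal sketch: strip_csv_header() is a made-up helper name, not anything HeidiSQL provides, and it assumes the header is exactly one physical line (no quoted line breaks inside the column names).

```php
<?php
// Hypothetical helper: copy a CSV to a new file without its first line,
// so the import can then be run with "ignore 0 rows".
// Returns the number of data lines copied, or false if a file can't be opened.
function strip_csv_header($source, $destination)
{
    $in = fopen($source, 'r');
    if ($in === false) {
        return false;
    }
    $out = fopen($destination, 'w');
    if ($out === false) {
        fclose($in);
        return false;
    }

    // read and discard the header line
    fgets($in);

    // stream the rest line by line so a 10-15 MB file never sits fully in memory
    $rows = 0;
    while (($line = fgets($in)) !== false) {
        fwrite($out, $line);
        $rows++;
    }

    fclose($in);
    fclose($out);
    return $rows;
}

// Example usage (directory names are placeholders):
// foreach (glob('csv/*.csv') as $file) {
//     strip_csv_header($file, 'noheader/' . basename($file));
// }
```

Writing to a separate file rather than editing in place keeps the original CSVs around in case an import goes wrong.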
Advanced Injury Database: RESTful Web Service Launched
I've launched a RESTful web interface for my Advanced Baseball Injury Database. A major problem with PITCHf/x and injury databases is that people are building them over and over again on their local (or hosted) servers, and this is a huge amount of overhead for the sabermetric community. The way to properly do this is to have one giant amalgamated database with a few trusted caretakers that deal with the updating/maintenance/feature requests while everyone else accesses the data using RESTful web services.
This makes updating and standardizing a dataset much easier and gives end users a much easier back-end interface into the database.
To access the RESTful service, you must first authenticate with the Advanced Baseball Injury Database over Facebook. (Anti-Facebook users: Get over it. I'm not going to spam your wall or steal your info, and I made this database entirely open for you to use.) You can do that on the Detailed Injuries Service page. This stores a unique key in my database and grants you access to make RESTful requests.

Example of the key
Easy enough. Copy that key down. You will need this going forward.
Making the Request: High-Level Overview
All you have to do to get injury information about ANY player from 2002-2010 is to go to this URL:
http://injurydb.drivelinebaseball.com/index.php/injurydb/injuryservice/eliasid/key
(I've been told that "eliasID" is the wrong term and I'm supposed to use mlbAMID. However, I've already coded it like "eliasid" and that's what I call it in real life, so you'll just have to deal with it if it bothers you.)
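If you'd rather not paste URLs together by hand, the pattern above can be wrapped in a tiny function. This is just a sketch: injury_service_url() is a name I made up for illustration, and the only thing it assumes is the eliasid/key URL layout shown above.

```php
<?php
// Hypothetical helper: build the request URL from an eliasID and your key.
function injury_service_url($eliasid, $key)
{
    $base = 'http://injurydb.drivelinebaseball.com/index.php/injurydb/injuryservice';
    // rawurlencode() guards against stray characters in either segment
    return $base . '/' . rawurlencode((string) $eliasid) . '/' . rawurlencode((string) $key);
}

// Example usage:
// echo injury_service_url(450308, 'YOURKEYHERE');
```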
For example, if you use Jered Weaver's eliasID (450308) and my key (66- wait a minute, nice try), you get this in your web browser (squashed for easier reading):
Looks pretty ridiculous, right? Well, that's JavaScript Object Notation - JSON. You can easily parse that to get this:
So, how do you do that? Good question.
Making the Request: Low-Level Overview
If you've read this far, you probably want some code examples. No problem. I am a PHP/CodeIgniter/MySQL kind of guy, so those are the examples I'm going to give to you. However, I'm including both the simple way - file_get_contents() - and the tougher (but more universal) way - cURL. They should be all you need to get going.
Here's how you can use file_get_contents() in PHP to decode the JSON and echo it out to the browser:
function testinjuryservice()
{
    // fetch the JSON for Jered Weaver's eliasID from the service
    $contents = file_get_contents('http://injurydb.drivelinebaseball.com/index.php/injurydb/injuryservice/450308/YOURKEYHERE');
    if ($contents === FALSE)
    {
        echo 'Request failed.';
        return;
    }
    // decode the json returned from the service
    $info = json_decode($contents);
    // count number of injury movements
    // must cast object into an array to accurately count the number of injuries
    $injuries = count((array)$info);
    // entries are keyed starting at 1, not 0
    $x = 1;
    while ($x <= $injuries)
    {
        // echo $info->{$x}->{'DateOn'};
        // you would use the above line to get the "DateOn" value for the xth injury
        // repeat this with DateOff, injury, injury_type, etc
        // dump contents of the given injury out
        print_r($info->{$x});
        echo '<br />';
        $x++;
    }
}
And here's the cURL example:
function testinjuryservice()
{
    // open cURL
    $ch = curl_init();
    // point it to Jered Weaver's eliasID
    curl_setopt($ch, CURLOPT_URL,
        'http://injurydb.drivelinebaseball.com/index.php/injurydb/injuryservice/450308/YOURKEYHERE');
    // return the response as a string instead of printing it directly
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    // assign it to $contents
    $contents = curl_exec($ch);
    // close cURL
    curl_close($ch);
    if ($contents === FALSE)
    {
        echo 'Request failed.';
        return;
    }
    // decode the json returned from the service
    $info = json_decode($contents);
    // count number of injury movements
    // must cast object into an array to accurately count the number of injuries
    $injuries = count((array)$info);
    // entries are keyed starting at 1, not 0
    $x = 1;
    while ($x <= $injuries)
    {
        // echo $info->{$x}->{'DateOn'};
        // you would use the above line to get the "DateOn" value for the xth injury
        // repeat this with DateOff, injury, injury_type, etc
        // dump contents of the given injury out
        print_r($info->{$x});
        echo '<br />';
        $x++;
    }
}
VERY IMPORTANT: I start the JSON array at 1, not 0. Don't be a slave to default counting. Humans start at 1 when they count up.
If you screw up the eliasID or your authentication key, you will get this error in the array's first position:
{"1":{"1":"Invalid key or eliasID given."}}
Alternatively, I may have banned you from the service for too many requests, which segues well into the next point: Don't abuse this system. This is not meant for you to spider my entire database by requesting every player's information from eliasID 10 (Kris Benson: fun fact) to eliasID 9999999999. Can't we all get along?
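Two things follow from the above: you'll want to notice the error payload before you start looping over injuries, and you can go easy on the service by not re-requesting a player you already have. Here's a rough sketch of both. The function names are made up for illustration, the "not valid JSON" message is mine rather than the service's, and the error check assumes a real injury entry never has a field named "1". The cache directory must already exist.

```php
<?php
// Hypothetical helper: returns the service's error message, or null if the
// response looks like normal injury data.
function injury_service_error($json)
{
    $info = json_decode($json, true);
    if (!is_array($info)) {
        return 'Response was not valid JSON.'; // our own message, not the service's
    }
    // the error payload puts its message at position 1 inside entry 1
    if (isset($info[1]) && is_array($info[1])
        && isset($info[1][1]) && is_string($info[1][1])) {
        return $info[1][1];
    }
    return null;
}

// Hypothetical helper: fetch $url through $fetch (any callable that takes a
// URL and returns the body, or false on failure), keeping a copy on disk in
// $cache_dir for $max_age seconds so repeat lookups never hit the service.
function cached_fetch($url, $cache_dir, $max_age, $fetch)
{
    $path = rtrim($cache_dir, '/') . '/' . md5($url) . '.json';
    if (is_file($path) && (time() - filemtime($path)) < $max_age) {
        return file_get_contents($path);
    }
    $body = call_user_func($fetch, $url);
    // only cache good responses, never failures or error payloads
    if ($body !== false && injury_service_error($body) === null) {
        file_put_contents($path, $body);
    }
    return $body;
}

// Example usage (cache for one day, fetching with file_get_contents):
// $json = cached_fetch($url, '/tmp/injurycache', 86400, 'file_get_contents');
// if (($error = injury_service_error($json)) !== null) { echo $error; }
```

Passing the fetcher in as a callable means the same wrapper works with either the file_get_contents() approach or a cURL-based function.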
Where Do We Go From Here?
Well, if you like the service, drop me a line - kyle at driveline baseball dot com. I'd love to hear from you, and if you want to collaborate, that's cool too.
You can keep an eye out for my articles at The Hardball Times, where I write about PITCHf/x stuff and exercise science things. Or check out my baseball training company's site, Driveline Baseball.
I plan on developing a RESTful PITCHf/x interface in the future depending on interest, my motivational levels, free time, and how much I think this is going to wreck my bandwidth costs. Ideally a bunch of us pitch in, rent a cheap VPS, and we serve it up to all sabermetricians who are interested in this kind of stuff. We write tutorials and make it open source and grant freedom of information. Is that feasible? Who knows!
Have fun.
PITCHf/x Corrections Done on the Baseball Injury Database
Title says it all, for the most part:
http://injurydb.drivelinebaseball.com
Corrections to release point data were implemented with the generous help of Max Marchi of The Hardball Times. Go check them out! Regression sets are coming...
A Brief Rant on Neo-Sabermetrics
(taken from a message board post where I was discussing PITCHf/x uncertainty)
Discussion of these correction algorithms and uncertainty around something that is precisely measured brings up a tangential point: Physicists are rather famous for saying "Any measurement that you make without knowledge of its uncertainty is completely meaningless." (Walter Lewin, actually)
And so this is a good thing that we talk about it for PITCHf/x, because uncertainty is good. However, the move in sabermetrics to blindly accept observed data is very... bad. I'll stand behind OBP and SLG all day, since these have no uncertainties around them. Same with linear weights (for what they are). But... UZR/DRS/TZ.... no. These are based off of observed measurements from BIS/GIS stringers that have a serious uncertainty around them. Additionally, the data has been shown to have serious park biases - especially in Chavez Ravine.
This is the old PECOTA/BPro issue all over again - when you keep data proprietary and sell it piecemeal, you suffer from publisher's bias and all sorts of conflict of interest. And then this data is fitted to an equation that has some regression involved in it, further compounding the error (and worse: drawing conclusions from facts not found in evidence).
UZR and other similar concepts should have an uncertainty listed. Saying someone's UZR is +15.5 is ridiculous; the same is true for saying someone's fastball has a linear weight of +1.2 runs. The former is stupid because stringers have serious uncertainty around them (which goes unreported and unquantified) and the latter is dumb because we do not know for sure that someone's fastball is indeed a fastball (not all pitch types are characterized correctly).
And so the derivation of stuff like linear weights and objective data needs to be separated from the.... well... pseudoscience (pseudoanalysis?) that is often done with UZR/DRS and other measurements like it. Just because analysts qualify that the data is indeed "fuzzy" does not make it okay. You need to publish uncertainty measurements or error bars, otherwise the data (and especially its conclusions) are worthless.
Creating Batch Users in TankAuth (CodeIgniter)
As detailed in this Stack Overflow request, I had the need to convert an existing database of users/passwords (stored in plaintext) to our new authentication library driven by CI and TankAuth. User jondavidjohn got me on the right track, and here's the code I ended up writing to submit to Stack Overflow for sample use:
function batchReg()
{
    $this->load->model('mymodel');
    // connect to the database
    $this->mymodel->dbconnect();
    // build it
    $query = "SELECT user, email, pass FROM newusers ORDER BY user ASC";
    // ship it
    $result = mysql_query($query);
    // loop it
    while ($row = mysql_fetch_array($result))
    {
        // FALSE = no email activation step; TankAuth hashes the plaintext password itself
        $data = $this->tank_auth->create_user($row['user'], $row['email'], $row['pass'], FALSE);
        print_r($data);
        echo "<p>";
    }
}
Enjoy!
Scaffolding in CodeIgniter: Controller Code
Now that scaffolding has finally been removed from CodeIgniter, they've removed all references to it. However, I still use this function for clients, and as a result, stick with the 1.7.3 branch for now. If you're in the same situation and need the simple controller code, here it is:
<?php
class edit extends Controller {
    function edit()
    {
        parent::Controller();
        $this->load->scaffolding('tableName');
    }
}
Be sure to properly set your database configuration connections (database.php) and scaffolding trigger (routes.php) to make this work.
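For reference, here's roughly what those two settings look like in a 1.7.x install. Treat this as a sketch: the paths assume the default folder layout, and the trigger word and every database value below are placeholders you'd swap for your own.

```php
<?php
// system/application/config/routes.php
// 'abracadabra' is a placeholder; pick your own hard-to-guess trigger word
$route['scaffolding_trigger'] = 'abracadabra';

// system/application/config/database.php (placeholder credentials)
$db['default']['hostname'] = 'localhost';
$db['default']['username'] = 'dbuser';
$db['default']['password'] = 'dbpass';
$db['default']['database'] = 'dbname';
$db['default']['dbdriver'] = 'mysql';
```

With the controller above and that trigger set, the scaffolding pages should come up at a URL along the lines of example.com/index.php/edit/abracadabra/.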




