Kyle Boddy Entrepreneur, Hacker, Biomechanics Researcher, Baseball Lover.

27 Aug 2011

WiiMote, Motion Plus, Accelerometers, Gyroscopes, Baseball Pitching, and What it All Means

I am working on a much larger post (and page, and even separate website) to detail my work with modeling baseball biomechanics, but I made a post that I want to catalog here on my blog for sharing and archival purposes. This was originally written on a message board, so if the formatting is off, I apologize.

---

Here's a great video about accelerometers and gyroscopes:

http://www.youtube.com/watch?v=s19W-MG-whE

What do I really care about when I'm using the Wii parts? Well, to build a fully functioning Inertial Measurement Unit (IMU) to get 1:1 motion capture/control, I need to do what they demonstrate above. However, this is very complicated and requires six degrees of freedom (DOF). The degrees of freedom are:

Moving up and down (heaving)
Moving left and right (swaying)
Moving forward and backward (surging)
Tilting forward and backward (pitching)
Turning left and right (yawing)
Tilting side to side (rolling)

I really only care about what the forearm is doing in relation to the elbow; this eliminates the first three DOF. Fortunately for me, the first 3 DOF are handled by accelerometers and the last 3 DOF are handled by gyroscopes. What matters the most is tracking:

-Humeral internal rotation velocity rate of change (pitch)
-Forearm pronation/supination rate of change (roll)

And to a lesser extent:

-Ulnar/radial degrees of flexion rate of change (yaw)
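To make "rate of change" concrete: a gyroscope reports angular velocity, so the rate of change of that velocity (angular acceleration) can be estimated by differencing consecutive samples. Here's a minimal sketch, assuming evenly spaced samples already converted to degrees per second. The function name, the sample values, and the 100 Hz rate are all made up for illustration and are not readings from a Motion Plus.

```php
<?php
// Estimate angular acceleration (deg/s^2) from evenly spaced gyroscope
// angular-velocity samples (deg/s) using differences of neighboring samples.
// Hypothetical helper: the name and units are assumptions for illustration.
function angular_acceleration(array $ratesDegPerSec, float $sampleHz): array
{
    $accel = array();
    for ($i = 1; $i < count($ratesDegPerSec); $i++) {
        // change in velocity divided by the sample interval (1 / $sampleHz)
        $accel[] = ($ratesDegPerSec[$i] - $ratesDegPerSec[$i - 1]) * $sampleHz;
    }
    return $accel;
}

// Made-up samples at an assumed 100 Hz sampling rate.
$rates = array(0.0, 50.0, 150.0, 300.0);
print_r(angular_acceleration($rates, 100.0));
```

Real gyroscope data is noisy, so differencing it directly amplifies that noise; some smoothing before this step is usually worth considering.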

So the next step is synchronizing what I see on high-speed two-dimensional frontal plane (side view) video with what I get from the gyroscopes. By doing this, I can nearly eliminate the need for a system of four or five high-speed cameras that uses Direct Linear Transformation (DLT) to recreate a three-dimensional model of a pitcher. This is awesome, because DLT is both ****ing ridiculously time intensive and somewhat expensive due to the need for 4+ high-speed cameras ($150 each minimum with current consumer technology) and the software to handle it ($50, but it's very bare bones).
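One simple way to think about the synchronization: find a single event that shows up in both streams (say, a sharp tap that is visible on video and spikes the gyroscope), then convert frame numbers to sample numbers through elapsed time. This is only a sketch under those assumptions. The function name, the 300 fps video rate, and the 100 Hz gyroscope rate are hypothetical, and it assumes both clocks run at constant rates.

```php
<?php
// Map a video frame number to the nearest gyroscope sample index, given one
// shared sync event seen at frame $syncFrame and at sample $syncSample.
// Hypothetical helper: all names and rates are assumptions for illustration.
function gyro_index_for_frame(int $frame, int $syncFrame, int $syncSample, float $videoFps, float $gyroHz): int
{
    // seconds elapsed between the sync event and the requested frame
    $seconds = ($frame - $syncFrame) / $videoFps;

    // the same elapsed time expressed in gyroscope samples
    return $syncSample + (int) round($seconds * $gyroHz);
}

// Frame 330 of a 300 fps video, with the sync event at frame 300 / sample 120.
echo gyro_index_for_frame(330, 300, 120, 300.0, 100.0), "\n";
```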

It's cool to be the guy doing the most to push low-cost / DIY biomechanical analysis of amateur athletics, but it also means I have no peer groups to work with. The Internet helps, but very few people are working with this kind of technology to produce the stuff I want to make. It's both exciting to be a pioneer in a field and incredibly frustrating because I have no formal education in physics or mechanical engineering, so I need to read pretty much everything I can get my hands on to understand it all.

I'd be remiss if I didn't mention that it's a bit terrifying that I could very well be wasting a lot of my time from an application/technology standpoint. If this product is so good (and I believe it is), it should already exist, given that the underlying technologies have been around for some time. Then again, they've only been affordable since the Wii and smartphones gave rise to cheap, small consumer accelerometers and gyroscopes, which is not very long. But there's no proven market for what I want to sell, and it will never be huge.

Fortunately, I see this as an awesome opportunity to learn about science and to contribute - however marginally - to the field. Science and technology are two wholly separate disciplines, and as Richard Feynman famously said about his work: "I do things for the pleasure of finding things out."

1 Apr 2011

Minor League Splits Redux Launched

I've launched yet another half-complete web application relating to baseball sabermetrics! ML Splits is a database of minor league baseball players (batters only for now) that shows their "splits" (performance against LHP and RHP) as well as park effects and major league equivalencies. The data is taken from Jeff Sackmann's old minorleaguesplits.com site where he made the CSVs available for import as open source.

ML Splits

I leaned on jQuery for front-end display purposes, as I'm getting more and more comfortable with using it for front-facing web applications. Mostly just dynamic div tagging and toggle() to keep the screen clear of distractions and make it easy to see what stats are really important. Wrote the entire thing in PHP 5.3.x, MySQL 5, CodeIgniter 2.0, and jQuery.

Enjoy!

EDIT: Pitchers are up as of April 4th.

17 Mar 2011

Advanced Injury Database: RESTful Web Service Launched

Advanced Injury Database

I've launched a RESTful web interface for my Advanced Baseball Injury Database. A major problem with PITCHf/x and injury databases is that people are building them over and over again on their local (or hosted) servers, and this is a huge amount of overhead for the sabermetric community. The way to properly do this is to have one giant amalgamated database with a few trusted caretakers who deal with the updating/maintenance/feature requests while everyone else accesses the data using RESTful web services.

This makes updating and standardizing a dataset much easier and gives end users a simpler back-end interface into the database.

To access the RESTful service, you must first authenticate with the Advanced Baseball Injury Database over Facebook. (Anti-Facebook users: Get over it. I'm not going to spam your wall or steal your info, and I made this database entirely open for you to use.) You can do that on the Detailed Injuries Service page. This stores a unique key in my database and grants you access to make RESTful requests.

Example of the key

Easy enough. Copy that key down. You will need this going forward.

Making the Request: High-Level Overview

All you have to do to get injury information about ANY player from 2002-2010 is to go to this URL:

http://injurydb.drivelinebaseball.com/index.php/injurydb/injuryservice/eliasid/key

(I've been told that "eliasID" is the wrong term and I'm supposed to use mlbAMID. However, I've already coded it like "eliasid" and that's what I call it in real life, so you'll just have to deal with it if it bothers you.)
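If you're going to request more than one player, it helps to build the URL in one place. A minimal sketch, assuming the path layout is exactly the pattern shown above (eliasid, then key); the function name is hypothetical and not part of the service.

```php
<?php
// Build the injury service URL from an eliasID and an authentication key.
// The path layout follows the URL pattern given in the post; the function
// name is made up for illustration.
function injury_service_url(int $eliasId, string $key): string
{
    $base = 'http://injurydb.drivelinebaseball.com/index.php/injurydb/injuryservice';

    // rawurlencode() keeps an unexpected character in the key from breaking the path
    return $base . '/' . $eliasId . '/' . rawurlencode($key);
}

echo injury_service_url(450308, 'YOURKEYHERE'), "\n";
```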

For example, if you use Jered Weaver's eliasID (450308) and my key (66- wait a minute, nice try), you get this in your web browser (squashed for easier reading):

Jered Weaver

Looks pretty ridiculous, right? Well, that's JavaScript Object Notation - JSON. You can easily parse that to get this:

Jered Weaver - See!

So, how do you do that? Good question.

Making the Request: Low-Level Overview

If you've read this far, you probably want some code examples. No problem. I am a PHP/CodeIgniter/MySQL kind of guy, so those are the examples I'm going to give to you. However, I'm including both the simple way - file_get_contents() - and the tougher (but more universal) way - cURL. They should be all you need to get going.

Here's how you can use file_get_contents() in PHP to decode the JSON and echo it out to the browser:

function testinjuryservice()
{
	// fetch the injury JSON for Jered Weaver's eliasID (450308)
	$contents = file_get_contents('http://injurydb.drivelinebaseball.com/index.php/injurydb/injuryservice/450308/YOURKEYHERE');

	// file_get_contents() returns false if the request failed
	if ($contents === false)
	{
		echo 'Request failed.';
		return;
	}

	// decode the json returned from the service
	$info = json_decode($contents);

	// json_decode() returns null if the response was not valid JSON
	if ($info === null)
	{
		echo 'Response was not valid JSON.';
		return;
	}

	// count number of injury movements
	// must cast object into an array to accurately count the number of injuries
	$injuries = count((array)$info);

	// the service numbers injuries starting at 1, not 0
	$x = 1;

	while ($x <= $injuries)
	{
		// echo $info->{$x}->{'DateOn'};
		// you would use the above line to get the "DateOn" value for the xth injury
		// repeat this with DateOff, injury, injury_type, etc

		// dump contents of the given injury out
		print_r($info->{$x});
		echo '<br />';
		$x++;
	}
}

And here's the cURL example:

function testinjuryservice()
{
	// open cURL
	$ch = curl_init();

	// point it to Jered Weaver's eliasID (450308)
	curl_setopt($ch, CURLOPT_URL,
	'http://injurydb.drivelinebaseball.com/index.php/injurydb/injuryservice/450308/YOURKEYHERE');

	// return the response as a string instead of printing it directly
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

	// assign it to $contents
	$contents = curl_exec($ch);

	// close cURL
	curl_close($ch);

	// curl_exec() returns false if the request failed
	if ($contents === false)
	{
		echo 'Request failed.';
		return;
	}

	// decode the json returned from the service
	$info = json_decode($contents);

	// json_decode() returns null if the response was not valid JSON
	if ($info === null)
	{
		echo 'Response was not valid JSON.';
		return;
	}

	// count number of injury movements
	// must cast object into an array to accurately count the number of injuries
	$injuries = count((array)$info);

	// the service numbers injuries starting at 1, not 0
	$x = 1;

	while ($x <= $injuries)
	{
		// echo $info->{$x}->{'DateOn'};
		// you would use the above line to get the "DateOn" value for the xth injury
		// repeat this with DateOff, injury, injury_type, etc

		// dump contents of the given injury out
		print_r($info->{$x});
		echo '<br />';
		$x++;
	}
}

VERY IMPORTANT: I start the JSON numbering at 1, not 0 (it comes back as an object keyed "1", "2", and so on, not a zero-based array). Don't be a slave to default counting. Humans start at 1 when they count up.
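If the 1-based numbering bothers you, you can sidestep the counter entirely by decoding into an associative array and using foreach. A minimal sketch: the field names come from the comments in the examples above, but the response string and its values are placeholders invented for illustration, not real service output.

```php
<?php
// Hypothetical response: the field names (DateOn, DateOff, injury) come from
// the post; the values are placeholders, not real injury data.
$json = '{"1":{"DateOn":"DATE_ON_1","DateOff":"DATE_OFF_1","injury":"INJURY_1"},'
      . '"2":{"DateOn":"DATE_ON_2","DateOff":"DATE_OFF_2","injury":"INJURY_2"}}';

// Passing true decodes to nested arrays; the keys "1", "2" become integers 1, 2.
$info = json_decode($json, true);

// foreach walks the keys as they are, so the 1-based numbering needs no counter.
foreach ($info as $number => $injury) {
    echo $number, ': ', $injury['DateOn'], ' to ', $injury['DateOff'], "\n";
}
```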

If you screw up the eliasID or your authentication key, you will get this error in the array's first position:

{"1":{"1":"Invalid key or eliasID given."}}
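It's worth checking for that error before you loop over injuries. A minimal sketch, assuming errors always come back in exactly the shape shown above; the function name and the "not valid JSON" message are made up for illustration.

```php
<?php
// Return the service's error message, or null when the response looks like data.
// Assumes errors always have the shape {"1":{"1":"message"}} shown in the post.
// The function name and the invalid-JSON message are hypothetical.
function injury_service_error(string $json): ?string
{
    $info = json_decode($json, true);

    if (!is_array($info)) {
        return 'Response was not valid JSON.';
    }

    // an error is a message stored under key 1 of entry 1
    if (isset($info[1]) && is_array($info[1]) && isset($info[1][1]) && is_string($info[1][1])) {
        return $info[1][1];
    }

    return null;
}

echo injury_service_error('{"1":{"1":"Invalid key or eliasID given."}}'), "\n";
```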

Alternatively, I may have banned you from the service for too many requests, which segues well into the next point: Don't abuse this system. This is not meant for you to spider my entire database by requesting every player's information from eliasID 10 (Kris Benson: fun fact) to eliasID 9999999999. Can't we all get along?
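One easy way to stay polite is to never ask for the same URL twice in a script run. Here's a minimal sketch of a caching wrapper. Everything in it is hypothetical: the fetcher is passed in as a callable so the idea can be tried without touching the network, and in real use you could pass 'file_get_contents'.

```php
<?php
// Wrap a fetcher so each URL is requested at most once per script run.
// $fetch is any callable that takes a URL and returns the response body.
// Hypothetical helper, not part of the service.
function cached_fetcher(callable $fetch): callable
{
    $cache = array();

    return function (string $url) use (&$cache, $fetch) {
        if (!array_key_exists($url, $cache)) {
            $cache[$url] = $fetch($url);   // first request: actually fetch
        }
        return $cache[$url];               // later requests: reuse the stored body
    };
}

// Usage with a stand-in fetcher that only counts how often it is called.
$calls = 0;
$get = cached_fetcher(function ($url) use (&$calls) {
    $calls++;
    return '{}';
});

$get('http://example.invalid/a');
$get('http://example.invalid/a');
echo $calls, "\n";   // prints 1: the second call was served from the cache
```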

Where Do We Go From Here?

Well, if you like the service, drop me a line - kyle at driveline baseball dot com. I'd love to hear from you, and if you want to collaborate, that's cool too.

You can keep an eye out for my articles at The Hardball Times, where I write about PITCHf/x stuff and exercise science things. Or check out my baseball training company's site, Driveline Baseball.

I plan on developing a RESTful PITCHf/x interface in the future depending on interest, my motivational levels, free time, and how much I think this is going to wreck my bandwidth costs. Ideally a bunch of us pitch in, rent a cheap VPS, and we serve it up to all sabermetricians who are interested in this kind of stuff. We write tutorials and make it open source and grant freedom of information. Is that feasible? Who knows!

Have fun.

13 Mar 2011

PITCHf/x Corrections Done on the Baseball Injury Database

Title says it all, for the most part:

http://injurydb.drivelinebaseball.com

Corrections to release point data were implemented with the generous help of Max Marchi of The Hardball Times. Go check them out! Regression sets are coming...

16 Feb 2011

A Brief Rant on Neo-Sabermetrics

(taken from a message board post where I was discussing PITCHf/x uncertainty)

Discussion of these correction algorithms and uncertainty around something that is precisely measured brings up a tangential point: Physicists are rather famous for saying "Any measurement that you make without knowledge of its uncertainty is completely meaningless." (Walter Lewin, actually)

And so this is a good thing that we talk about it for PITCHf/x, because uncertainty is good. However, the move in sabermetrics to blindly accept observed data is very... bad. I'll stand behind OBP and SLG all day, since these have no uncertainties around them. Same with linear weights (for what they are). But... UZR/DRS/TZ.... no. These are based off of observed measurements from BIS/GIS stringers that have a serious uncertainty around them. Additionally, the data has been shown to have serious park biases - especially in Chavez Ravine.

This is the old PECOTA/BPro issue all over again - when you keep data proprietary and sell it piecemeal, you suffer from publisher's bias and all sorts of conflict of interest. And then this data is fitted to an equation that has some regression involved in it, further compounding the error (and worse: drawing conclusions from facts not found in evidence).

UZR and other similar concepts should have an uncertainty listed. Saying someone's UZR is +15.5 is ridiculous; the same is true for saying someone's fastball has a linear weight of +1.2 runs. The former is stupid because stringers have serious uncertainty around them (which goes unreported and unquantified) and the latter is dumb because we do not know for sure that someone's fastball is indeed a fastball (not all pitch types are characterized correctly).

And so the derivation of stuff like linear weights and objective data needs to be separated from the.... well... pseudoscience (pseudoanalysis?) that is often done with UZR/DRS and other measurements like it. Just because analysts qualify that the data is indeed "fuzzy" does not make it okay. You need to publish uncertainty measurements or error bars, otherwise the data (and especially its conclusions) are worthless.
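For the simplest possible picture of "publish the uncertainty": given repeated, independent measurements of the same quantity, report the mean together with its standard error instead of a bare number. This is only a sketch of that textbook case with made-up values. It is not how uncertainty for UZR or any stringer-based metric would actually be derived, and the function name is hypothetical.

```php
<?php
// Mean and standard error of the mean for repeated measurements.
// Assumes the measurements are independent; the function name is hypothetical.
function mean_with_standard_error(array $values): array
{
    $n = count($values);
    if ($n < 2) {
        throw new InvalidArgumentException('Need at least two measurements.');
    }

    $mean = array_sum($values) / $n;

    // sum of squared deviations from the mean
    $sumSq = 0.0;
    foreach ($values as $v) {
        $sumSq += ($v - $mean) ** 2;
    }

    $sd = sqrt($sumSq / ($n - 1));   // sample standard deviation

    return array('mean' => $mean, 'se' => $sd / sqrt($n));
}

// Made-up measurements, purely for illustration.
$r = mean_with_standard_error(array(1.0, 2.0, 3.0, 4.0, 5.0));
printf("%.2f +/- %.2f\n", $r['mean'], $r['se']);   // prints 3.00 +/- 0.71
```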

27 Oct 2010

PITCHf/x Database Issues: Duplicate Entry for Key

If you're trying to build a PITCHf/x database like I am, you are probably heavily leaning on Mike Fast's work. It's a great primer, but that's exactly how it should be viewed: as a primer. There are numerous things that have changed since he initially wrote the page, and it's not a simple copy/paste job to download all the data from MLBAM. Among the issues are:

  • Inadequate handling of timeouts from gd.mlb.com and gd2.mlb.com
  • Hardcoding the IP is not a valid suggestion and causes lots of problems (suggested to get around DNS resolution issues)
  • Parser script does not know how to figure out which games have been used without manually querying each at bat (expensive and unnecessary)
  • Database structure is not future-proof
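On the first bullet, one common way to cope with timeouts is to set an explicit timeout and retry a few times with a growing delay. The sketch below is in PHP and is not taken from the parser script; the function name, attempt count, and delays are assumptions chosen for illustration.

```php
<?php
// Call $fetch($url) until it returns something other than false, up to
// $maxAttempts times, waiting a little longer after each failure.
// Hypothetical helper: name and default values are assumptions.
function fetch_with_retry(callable $fetch, string $url, int $maxAttempts = 3, int $baseDelayMs = 500)
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $body = $fetch($url);
        if ($body !== false) {
            return $body;
        }
        if ($attempt < $maxAttempts) {
            // linear backoff: 1x, 2x, 3x ... the base delay (usleep takes microseconds)
            usleep($baseDelayMs * 1000 * $attempt);
        }
    }
    return false;   // every attempt failed
}

// A fetcher with a 10-second timeout; file_get_contents() returns false on failure.
$httpGet = function ($url) {
    $context = stream_context_create(array('http' => array('timeout' => 10)));
    return @file_get_contents($url, false, $context);
};

// Usage: $xml = fetch_with_retry($httpGet, $someGameUrl);
```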

I could write for hours on the first three (plus other bugs), but I didn't document them well enough and it's simply not that interesting to me to write about script bugs that people should learn how to fix for themselves. If used verbatim, Mike's database structure will fail around July 2010 when inputting new pitches and at-bats into the database. You'll get errors like the following:

PITCHf/x Errors - Parsing

Yes, this is my Windows box.

I inserted a bunch of print statements to help me debug the code (ah, printf-style debugging) and saw that it was reporting duplicate or unknown key entries. Recalling the little I know about MySQL and the numbers involved (513425 and later 1900131), I was pretty sure that this is typical behavior when a column's numeric type runs out of room. In this case, Mike uses MEDIUMINT(8) to describe the primary key of ab_id in the code.

If you have phpMyAdmin (and you should), you can fix this problem rather easily by editing the structure of the table to change ab_id to INT(10) in both the pitches and atbats tables. Additionally, you need to change pitch_id in pitches to an INT(10). This will allow for larger numbers to be stored in those columns. (The extra range comes from the INT type itself; the number in parentheses is only a display width.)

phpmyadmin PITCHf/x

phpMyAdmin - Where to edit your PITCHf/x database

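If you don't use phpMyAdmin, the MySQL code to execute at the command line is something like the following. MODIFY replaces the whole column definition, so restate any other attributes (UNSIGNED, NOT NULL, AUTO_INCREMENT) your columns already have:

ALTER TABLE pitches MODIFY ab_id INT(10), MODIFY pitch_id INT(10);
ALTER TABLE atbats MODIFY ab_id INT(10);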

Good luck with your PITCHf/x database building!

24 Oct 2010

New Web Project: Advanced Baseball Injury Database

Concept Image

I'm developing a web application that is similar in function to both PITCHf/x and injury-related databases out there. However, it will combine both types of data in hopes of performing some advanced searches not currently offered by any of the main free sources of information (Corey Dawkins' Baseball Injury Tool, Fangraphs data, Trip Somers' PITCHf/x tool, etc).

The "complete" application allows you to search for peak/average velocities, average release point, average number of pitches thrown per appearance, number of appearances per year, and other variables. Logged-in members will see an animated image of a pitcher's mechanics if it is available in the database; if it is not, you will be able to donate a small amount of money ($10-20) to have that pitcher's mechanics digitized and added to the database, along with a sponsored link with your name/group of choice on the page.
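As a rough idea of what a peak/average velocity search boils down to once the pitch rows are loaded, here's a minimal sketch. The 'start_speed' column name is an assumption about the schema, and the function name and sample rows are made up for illustration.

```php
<?php
// Peak and average velocity from a list of pitch rows.
// 'start_speed' is assumed to be the velocity column; adjust to your schema.
// Hypothetical helper, not the application's actual code.
function velocity_summary(array $pitches): array
{
    // pull the velocity value out of every row that has one
    $speeds = array_column($pitches, 'start_speed');

    if (count($speeds) === 0) {
        return array('peak' => null, 'average' => null);
    }

    return array(
        'peak'    => max($speeds),
        'average' => array_sum($speeds) / count($speeds),
    );
}

// Made-up rows, purely for illustration.
$summary = velocity_summary(array(
    array('start_speed' => 90.0),
    array('start_speed' => 92.0),
    array('start_speed' => 94.0),
));
print_r($summary);
```

In a real application this aggregation would more likely happen in the database query itself, but the arithmetic is the same.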

With luck, people will be able to come up with some interesting combinatorial analysis that might correlate some PITCHf/x values with real-world injuries in pitchers. Plug-ins using jpgraph to utilize the power of R and general statistical analysis will be available in the Advanced Combination Search.