Hello, readers! It’s been a while. I’ve taken a slight hiatus from blogging, but I have a few projects in the pipeline. To get back into the spirit, I thought I would share some useful data sources for baseball data science projects. I’ve been asked about this topic a couple of times recently, which inspired this short post.
The following table lists useful baseball data sources along with a resource for getting started with each one. It is by no means comprehensive, but if you’re looking to start some baseball data science projects, it should be a decent starting point.
Data Source | Description | Resources |
---|---|---|
Lahman Database | Aggregate year-by-year statistics dating back to the 1800s | Documentation |
Retrosheet | Over 100 data points on each play of the MLB season | Field Descriptions, Download Script |
PITCHf/x | Diagnostics on every pitch thrown by an MLB pitcher | Scraper |
Baseball Reference | Aggregated statistics summarized in every possible way | Scraper |
FanGraphs | All the sabermetric stats you could ever want | FanGraphs Leaderboard |
mlbgame Python library | Game stats that can be ingested via Python | Documentation |
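
If you want a feel for what working with one of these sources looks like, here is a minimal sketch in Python. It assumes data laid out like the Lahman Batting table, using its `playerID`, `yearID`, `AB` (at-bats), and `H` (hits) columns. The rows below are made-up placeholders rather than real statistics, and the `career_batting_average` helper is a hypothetical name of my own, not part of any library.

```python
import csv
import io
from collections import defaultdict

# Made-up rows in the layout of the Lahman Batting table.
# These are placeholder values for illustration, not real statistics.
SAMPLE_BATTING_CSV = """playerID,yearID,AB,H
player01,2001,500,150
player01,2002,400,100
player02,2001,300,90
"""


def career_batting_average(csv_text):
    """Return {playerID: hits / at-bats}, aggregated over all seasons."""
    at_bats = defaultdict(int)
    hits = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        at_bats[row["playerID"]] += int(row["AB"])
        hits[row["playerID"]] += int(row["H"])
    # Skip players with zero at-bats to avoid dividing by zero.
    return {p: hits[p] / at_bats[p] for p in at_bats if at_bats[p] > 0}


averages = career_batting_average(SAMPLE_BATTING_CSV)
for player, avg in sorted(averages.items()):
    print(f"{player}: {avg:.3f}")
```

To run this against the real data, you could read the downloaded Batting CSV file from disk and pass its text to the same function. Year-by-year aggregates like these are where the Lahman Database is most convenient; for play-level or pitch-level questions, Retrosheet or PITCHf/x would be the better fit.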