<?xml version="1.0" encoding="UTF-8"?>
<rss  xmlns:atom="http://www.w3.org/2005/Atom" 
      xmlns:media="http://search.yahoo.com/mrss/" 
      xmlns:content="http://purl.org/rss/1.0/modules/content/" 
      xmlns:dc="http://purl.org/dc/elements/1.1/" 
      version="2.0">
<channel>
<title>harperawl.net</title>
<link>https://harperawl.net/blog.html</link>
<atom:link href="https://harperawl.net/blog.xml" rel="self" type="application/rss+xml"/>
<description>A blog built with Quarto</description>
<generator>quarto-1.9.32</generator>
<lastBuildDate>Thu, 21 May 2026 07:00:00 GMT</lastBuildDate>
<item>
  <title>FFDB, my local Statcast database, is now on GitHub</title>
  <dc:creator>Harper Wicker-Lenseigne</dc:creator>
  <link>https://harperawl.net/posts/ffdb-release/</link>
  <description><![CDATA[ 





<section id="tldr" class="level3">
<h3 class="anchored" data-anchor-id="tldr">tl;dr:</h3>
<p>I’m publishing the code behind my local Statcast database, which I’m calling FFDB (because it’s Four-seam Fast), in case anyone else would like to use it as well. You can find it <a href="https://github.com/harperawl/ffdb">on GitHub</a>. Instructions for usage can be found there! Enjoy! <a href="../../contact.html">Contact me</a> if you have any questions. I’m happy to help!</p>
</section>
<section id="overview" class="level3">
<h3 class="anchored" data-anchor-id="overview">Overview</h3>
<p>One of the first things that really drew me to baseball was sabermetrics. I spent hours experimenting with the MLB API, creating rudimentary Python scripts to loop through game log JSON files that I had downloaded. Of course, this was wildly inefficient. So frustratingly inefficient, in fact, that the struggle it presented led to a year-long on-and-off journey of creating and iterating on what is now FFDB. I originally created this as an internal tool for myself to play with, but after putting some decent time into making it faster, cleaner, and more efficient, I figured I may as well put it out there, in case it helps anyone who is in the same situation I was.</p>
<p>If you have any experience with the MLB game data API, the schema I’ve defined should feel mostly familiar – the major conceptual difference is that I’m using a relational database-style setup instead of nesting JSON objects. Using SQL queries to interact with the data is much more effective, and having all the data stored locally makes it significantly faster as well. Admittedly, it does take up a lot of space – the JSON files from 2008 to 2026 currently take up more than 60 GB on my hard drive. You can delete the JSON files after generating the Parquet files (which are ~30x smaller!), but then you can’t regenerate the Parquet files after a schema change without redownloading all of the JSON files.</p>
<p>Speaking of downloading all the JSON files: be somewhat careful when pulling them from the MLB API. While I haven’t personally experienced this, you could run into rate limiting or even an IP ban.</p>
<p>Note that this project is certainly still a work in progress! Additionally, changes are made to the MLB API all the time, so I will likely be updating this continuously. I will do my best to avoid any changes that aren’t backwards-compatible, but no promises…</p>
</section>
<section id="example-queries" class="level3">
<h3 class="anchored" data-anchor-id="example-queries">Example queries</h3>
<p>Get the average velocity of Paul Skenes’ four-seam fastballs in the 2024 regular season:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode sql code-with-copy"><code class="sourceCode sql"><span id="cb1-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">SELECT</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">AVG</span>(start_speed)                    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-- average the speed column in our results</span></span>
<span id="cb1-2"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">FROM</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">events</span> e                              <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-- grab rows from the events table</span></span>
<span id="cb1-3"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">JOIN</span> games g <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">ON</span> g.game_id <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> e.game_id      <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-- add info from the corresponding game row</span></span>
<span id="cb1-4"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">JOIN</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ref</span>.players rp <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">ON</span> rp.<span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">id</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> e.pitcher   <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-- add info from the player row that matches the pitcher's ID</span></span>
<span id="cb1-5"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">WHERE</span> rp.full_name <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'Paul Skenes'</span>         <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-- match the name of the pitcher</span></span>
<span id="cb1-6">    <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">AND</span> g.game_type <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'R'</span>                      <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-- regular season only</span></span>
<span id="cb1-7">    <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">AND</span> g.season <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2024</span>                        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-- filter by season for only 2024</span></span>
<span id="cb1-8">    <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">AND</span> e.pitch_type <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'FF'</span>                    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-- filter by pitch type for four-seamers</span></span></code></pre></div></div>
<p>Get the team OPS for each team in 2025:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode sql code-with-copy"><code class="sourceCode sql"><span id="cb2-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">SELECT</span></span>
<span id="cb2-2">    rt.team_name,                                                                     <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-- label each row with the proper team</span></span>
<span id="cb2-3">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">SUM</span>(bl.hits <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> bl.base_on_balls <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> bl.hit_by_pitch)                                 <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-- OBP calculation</span></span>
<span id="cb2-4">        <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">SUM</span>(bl.at_bats <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> bl.base_on_balls <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> bl.hit_by_pitch <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> bl.sac_flies) <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">AS</span> obp,</span>
<span id="cb2-5">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">SUM</span>(bl.total_bases) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">SUM</span>(bl.at_bats) <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">AS</span> slg,                                     <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-- SLG calculation</span></span>
<span id="cb2-6">    obp <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> slg <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">AS</span> ops                                                                  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-- add 'em together</span></span>
<span id="cb2-7"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">FROM</span> batting_logs bl                                                                  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-- easiest way to aggregate offensive stats</span></span>
<span id="cb2-8"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">JOIN</span> games g <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">ON</span> bl.game_id <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> g.game_id                                                <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-- get game-level info like season</span></span>
<span id="cb2-9"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">JOIN</span> player_logs pl <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">ON</span> pl.game_id <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> bl.game_id <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">AND</span> bl.player <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pl.player              <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-- get player context info from ID, specifically their team ID</span></span>
<span id="cb2-10"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">JOIN</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ref</span>.teams rt <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">ON</span> pl.parent_team_id <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> rt.<span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">id</span>                                        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-- get team info from ID, specifically the name</span></span>
<span id="cb2-11"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">WHERE</span> g.season <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2025</span></span>
<span id="cb2-12">    <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">AND</span> g.game_type <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'R'</span></span>
<span id="cb2-13"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">GROUP</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">BY</span> rt.team_name</span></code></pre></div></div>
</section>
<section id="query-tips" class="level3">
<h3 class="anchored" data-anchor-id="query-tips">Query tips</h3>
<p>The documentation itself can be found in the GitHub repository, but I’m going to put some tips for building queries on this page as well.</p>
<section id="regular-season-game-type-filtering" class="level4">
<h4 class="anchored" data-anchor-id="regular-season-game-type-filtering">Regular season “game-type” filtering</h4>
<p>The game file downloader downloads every “major league” game each year (more precisely, every game under “sportId” 1, which corresponds to MLB). This includes spring training (which you almost never want) and the postseason (which you often don’t want). If you just want regular season data, filter on the game type:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode sql code-with-copy"><code class="sourceCode sql"><span id="cb3-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">SELECT</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> </span>
<span id="cb3-2"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">FROM</span> games g</span>
<span id="cb3-3"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">WHERE</span> g.game_type <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'R'</span></span></code></pre></div></div>
<p>To more finely control the game types that are included, you can reference the relevant MLB Stats API endpoint <a href="https://statsapi.mlb.com/api/v1/gameTypes">/gameTypes</a>.</p>
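<p>Before settling on a filter, it can help to see which game types actually show up in your local copy. This is a minimal sketch that uses only the <code>games</code> columns from the examples above (<code>game_type</code> and <code>season</code>); the codes you get back depend on what you’ve downloaded:</p>
<div class="sourceCode"><pre class="sourceCode sql"><code class="sourceCode sql">SELECT g.game_type, COUNT(*) AS games   -- number of games of each type
FROM games g
WHERE g.season = 2025                   -- swap in any season you have downloaded
GROUP BY g.game_type
ORDER BY games DESC</code></pre></div>
<p>Each code in the result can then be looked up in the /gameTypes response and, if you want more than one type, used in an <code>IN (...)</code> list in place of the single <code>= 'R'</code> comparison.</p>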
</section>
<section id="only-returning-pitches-from-events-table-queries" class="level4">
<h4 class="anchored" data-anchor-id="only-returning-pitches-from-events-table-queries">Only returning pitches from “events” table queries</h4>
<p>The “events” table represents pitch-level events, which usually means each row is a pitch, but not exactly. Batter timeouts, game advisories, and other supplemental pitch-level events are included as rows. This can cause issues with null values that might not be super obvious! The solution is as follows:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode sql code-with-copy"><code class="sourceCode sql"><span id="cb4-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">SELECT</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span></span>
<span id="cb4-2"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">FROM</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">events</span> e</span>
<span id="cb4-3"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">WHERE</span> e.is_pitch <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">TRUE</span></span></code></pre></div></div>
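<p>To see how much this matters in your own copy, you can compare the total row count with the number of rows flagged as pitches. This is a rough sketch that assumes only the <code>is_pitch</code> column shown above; I’ve used the <code>CASE</code> form because it should work in most SQL engines:</p>
<div class="sourceCode"><pre class="sourceCode sql"><code class="sourceCode sql">SELECT
    COUNT(*) AS all_rows,                                    -- every pitch-level event
    SUM(CASE WHEN e.is_pitch THEN 1 ELSE 0 END) AS pitches   -- actual pitches only
FROM events e</code></pre></div>
<p>Any gap between the two numbers comes from the supplemental rows, which is why an unfiltered <code>COUNT(*)</code> overstates pitch totals.</p>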


</section>
</section>

 ]]></description>
  <category>baseball</category>
  <category>programming</category>
  <guid>https://harperawl.net/posts/ffdb-release/</guid>
  <pubDate>Thu, 21 May 2026 07:00:00 GMT</pubDate>
</item>
</channel>
</rss>
