Federal Election Commission (FEC) Data Architecture and Workflows
Overview
Contributions by individuals to political candidates can be:
- Given to the candidates' principal campaign committees.
- Given directly to the candidates' other authorized committees.
- Donated to intermediary committees, either as earmarked contributions or as a share of contributions to joint fundraising committees.
Each of these ways to donate results in a different type of filing to the FEC. Data filed with the FEC takes some time to be processed. Until then, the unprocessed data is not available through the same API endpoints as processed data. It exists only as the raw electronic filings submitted to the FEC in the special .fec file format.
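To illustrate what "raw" means here, the following is a minimal sketch of splitting a .fec filing into records grouped by form or schedule type. It assumes that newer filings separate fields with the ASCII 28 "file separator" character and that older filings use comma-separated, quoted fields; the function names and the sample filing are made up for illustration, and this is not the platform's actual import code. Field positions after the first (type) field depend on the filing-format version, so they are deliberately not interpreted.

```python
from __future__ import annotations

import csv
from collections import defaultdict

FS = "\x1c"  # ASCII 28, assumed field delimiter in newer .fec versions


def parse_fec(text: str) -> dict[str, list[list[str]]]:
    """Group the records of a raw .fec filing by their first field (form/schedule type)."""
    # Split on "\n" explicitly: str.splitlines() would also break lines on "\x1c".
    lines = [ln.rstrip("\r") for ln in text.split("\n") if ln.strip()]
    if not lines:
        return {}
    if FS in lines[0]:
        rows = [ln.split(FS) for ln in lines]
    else:
        # Assumption: legacy filings use comma-separated, quoted fields.
        rows = list(csv.reader(lines))
    grouped: dict[str, list[list[str]]] = defaultdict(list)
    for row in rows:
        grouped[row[0].strip()].append(row)
    return dict(grouped)


def itemized_receipts(grouped: dict[str, list[list[str]]]) -> list[list[str]]:
    """Schedule A (itemized receipts) records have type codes starting with 'SA'."""
    return [row for typ, rows in grouped.items() if typ.startswith("SA") for row in rows]
```

For example, `itemized_receipts(parse_fec(raw_text))` returns the Schedule A rows that a downstream step would map into the uniform contribution format.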
For each of these reporting paths, the Elections Data Platform provides a workflow that processes the data behind the scenes. This section details the architecture and workflows of our platform.
Note: Knowing these processes is not necessary for using the Elections Data Platform, but it may help interested users fully understand the system. Our goal is to provide a transparent system without black-box components, in which every process is comprehensible and verifiable by our users.
The diagram gives a bird's-eye view of the Elections Data Platform and its data sources.
We use three ETL processes to import the different types of FEC data and filings into the Elections database:
- Processed FEC Data: Most of the data is processed FEC data. Data for previous election cycles, and older data for the current cycle, are imported from this source. We clone the FEC database because access to it is very limited: the data can only be exported in its raw form, which allows only limited filtering and no aggregation, and each user is subject to a traffic limit. We also assign earmarked contributions and shares of joint fundraising contributions to our integrated contribution table.
- Raw Electronic Filings to Candidate Committees: Before filings are processed by the FEC, we download the raw electronic filings as submitted. These use a different format, and our process imports them into a uniform format in the Elections database.
- Raw Electronic Filings by Intermediary Committees: Unprocessed filings containing earmarked contributions or shares of joint fundraising contributions require individual processing so that they can be integrated into the same contribution data format in our system.
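The three ETL paths above all end in one uniform contribution format. The sketch below shows one way such a normalization step could look. The `Contribution` record, its columns, and the input keys (`donor`, `filer_id`, `earmarked_for`, `amount_cents`, `date`) are hypothetical, as is the idea of passing joint fundraising allocation shares as a dictionary; the real table and attribution rules may differ. Earmarked money is credited to the committee it was earmarked for, and a joint fundraising receipt is split among participants by share.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Contribution:
    """One row of a hypothetical uniform contribution table."""
    donor: str
    recipient_id: str   # committee that ultimately receives the money
    amount_cents: int   # integer cents avoid floating-point drift
    date: str           # ISO 8601 date string
    source: str         # which ETL path produced the row


def normalize_direct(row: dict, source: str) -> list[Contribution]:
    """Processed data or raw candidate-committee filings: the filer is the recipient."""
    return [Contribution(row["donor"], row["filer_id"], row["amount_cents"], row["date"], source)]


def normalize_intermediary(row: dict, jfc_shares: dict[str, float] | None = None) -> list[Contribution]:
    """Raw intermediary filings: credit the earmark target or split a JFC receipt."""
    if row.get("earmarked_for"):
        # The conduit only passes the money on, so credit the earmarked committee.
        return [Contribution(row["donor"], row["earmarked_for"], row["amount_cents"],
                             row["date"], "raw_intermediary")]
    if not jfc_shares:
        raise ValueError("row is neither earmarked nor accompanied by JFC allocation shares")
    items = sorted(jfc_shares.items())
    out: list[Contribution] = []
    allocated = 0
    for i, (committee_id, share) in enumerate(items):
        if i == len(items) - 1:
            part = row["amount_cents"] - allocated  # last participant absorbs the rounding remainder
        else:
            part = round(row["amount_cents"] * share)
        allocated += part
        out.append(Contribution(row["donor"], committee_id, part, row["date"], "raw_intermediary"))
    return out
```

Working in integer cents and giving the rounding remainder to the last participant keeps the split rows summing exactly to the original receipt.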
The Elections Data Platform imports the FEC individual contribution data into a centralized database with two parts:
- An openly accessible part, whose database model is introduced in the next section
- A raw part with tables for all the ETL and processing steps that integrate the data into the accessible database model
The ETL part of the database does not change any data; it only combines the various sources into the final data model. The open data model is deliberately kept slim so that it can be queried without complex SQL. To provide fast access, we indexed the data model extensively, so users can quickly query more than 60 million individual contributions along with their large set of FEC and enriched attributes.
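As an illustration of why indexing matters for a slim model, here is a small sketch using SQLite from the Python standard library as a stand-in (the platform's actual database engine is not stated here). The `contributions` table, its columns, and the index on `(recipient_id, date)` are hypothetical; the index targets an assumed common access pattern, all contributions to one committee within a date range. `EXPLAIN QUERY PLAN` shows whether the planner uses the index.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory stand-in for a hypothetical contribution table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE contributions (
            id INTEGER PRIMARY KEY,
            donor TEXT,
            recipient_id TEXT,
            amount_cents INTEGER,
            date TEXT,
            state TEXT
        )""")
    # Composite index for an assumed frequent filter: one committee over a date range.
    conn.execute("CREATE INDEX idx_contrib_recipient_date ON contributions (recipient_id, date)")
    conn.executemany(
        "INSERT INTO contributions (donor, recipient_id, amount_cents, date, state) VALUES (?, ?, ?, ?, ?)",
        [("DOE, JANE", "C001", 5000, "2020-03-01", "CA"),
         ("ROE, RICHARD", "C001", 2500, "2020-04-15", "NY"),
         ("DOE, JANE", "C002", 1000, "2020-05-20", "CA")],
    )
    return conn


def query_plan(conn: sqlite3.Connection, sql: str, params=()) -> list:
    """Return the human-readable detail column of SQLite's query plan."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


SQL = ("SELECT donor, amount_cents FROM contributions "
       "WHERE recipient_id = ? AND date BETWEEN ? AND ?")
```

Running `query_plan(build_demo_db(), SQL, ("C001", "2020-01-01", "2020-12-31"))` should report a search using `idx_contrib_recipient_date` instead of a full table scan, which is the effect that matters at tens of millions of rows.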
In addition to FEC data, the Elections Data Platform imports data from other official sources:
- The Social Security Administration (SSA) provides historical first-name distributions that help identify the gender of a donor.
- The Centers for Disease Control and Prevention (CDC) provide historical life table data, which, together with the SSA data, lets us estimate the age of a donor. This enables us to infer the most probable age at the time of a donation, and therefore also a donor's age in past election cycles.
- The Census Bureau provides data to estimate a donor's ethnicity from their last name, and we locate a donor's address using the Census Bureau's TIGERweb GeoServices.
- We are currently integrating the historical election results provided by MIT and Harvard, as well as the Bureau of Labor Statistics' (BLS) unemployment data and the Federal Communications Commission's (FCC) reports on political candidates' media spending.
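The combination of SSA name distributions and CDC life tables can be sketched as a simple weighting: each birth year and sex is weighted by the number of babies given the donor's first name that year, times the probability of surviving to the donor's age in the reference year (for example, the donation year). The function name, input layout, and toy numbers below are hypothetical and only illustrate the idea; a single period life table is assumed for simplicity, and the platform's actual estimation method may differ.

```python
from __future__ import annotations


def estimate_from_name(
    births: dict[tuple[int, str], int],      # (birth_year, sex) -> babies given this first name
    survival: dict[tuple[int, str], float],  # (age, sex) -> probability of surviving to that age
    ref_year: int,                           # e.g. the year of the donation
) -> tuple[float, int]:
    """Return (probability the donor is female, most probable age in ref_year)."""
    weights: dict[tuple[int, str], float] = {}
    for (year, sex), count in births.items():
        age = ref_year - year
        if age < 0:
            continue  # not yet born in the reference year
        # People born that year with this name who are still alive in ref_year.
        weights[(year, sex)] = count * survival.get((age, sex), 0.0)

    total = sum(weights.values())
    if total == 0:
        raise ValueError("no living people with this name in the reference year")

    p_female = sum(w for (_, sex), w in weights.items() if sex == "F") / total

    # Marginalize over sex, then take the most likely birth year.
    by_year: dict[int, float] = {}
    for (year, _), w in weights.items():
        by_year[year] = by_year.get(year, 0.0) + w
    best_year = max(by_year, key=by_year.get)
    return p_female, ref_year - best_year
```

With made-up inputs such as `births = {(1960, "F"): 100, (1960, "M"): 10, (1990, "F"): 60, (1990, "M"): 5}` and survival probabilities of 0.9 at age 60 and 0.98 at age 30, a 2020 donation yields a most probable age of 60. Passing an earlier `ref_year` gives the donor's likely age in a past election cycle.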
The Elections Data Platform offers several interfaces to access the enriched contribution data:
- SQL Interface: This interface provides direct SQL access to the enriched contribution data; users can write the SQL queries knowing only the data model. The SQL Interface comes with a UI that includes an online SQL editor. From the website, users can download the data directly or create URLs containing a full SQL query, either for future downloads or for embedding up-to-date data. The SQL Interface's query endpoint is the most powerful way to access the data. Because complex queries can take time, query results are cached for repeated use, which shortens response times. When new FEC data arrives, the cached results of earlier queries are updated, so rerunning older queries causes no delay.
- Data + Charts API: The API provides easy access to the most frequently used queries through RESTful HTTP GET endpoints. It also offers endpoints for more advanced analytics. We intend to grow the API based on user requests and suggestions.
- e.ventures Cohort Analysis Google Sheet: We created a Google Sheet with direct access to all the data in our Elections database, so users can slice and dice the data, conduct donor and donation cohort analyses, and visualize the results.
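Two behaviors of the SQL Interface described above, shareable query URLs and a result cache that is refreshed when new FEC data arrives, can be sketched as follows. The base URL `https://example.com/sql`, the `q` parameter name, and the `RefreshingQueryCache` class are hypothetical; SQLite stands in for the real database, and the platform's actual endpoint and caching layer are not specified here.

```python
import sqlite3
from urllib.parse import urlencode


def shareable_url(base: str, sql: str) -> str:
    """Embed a full SQL query in a URL (hypothetical 'q' parameter)."""
    return f"{base}?{urlencode({'q': sql})}"


class RefreshingQueryCache:
    """Hypothetical cache: results keyed by exact SQL text, refreshed on new data."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self._cache: dict = {}

    def query(self, sql: str) -> list:
        # Serve repeated queries from the cache; run new ones once.
        if sql not in self._cache:
            self._cache[sql] = self.conn.execute(sql).fetchall()
        return self._cache[sql]

    def on_new_data(self) -> None:
        # Eagerly re-run every cached query after an import, so older
        # queries are still answered from the cache without delay.
        for sql in self._cache:
            self._cache[sql] = self.conn.execute(sql).fetchall()
```

Refreshing eagerly after each import, rather than invalidating entries, trades extra work at load time for consistently fast responses to previously seen queries.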
The next section, Database Model, introduces the underlying open data model, the interfaces, and the enrichment processes.
