For the first use case, we will visualize the donor-age distribution for the 2020 presidential election candidates Trump and Sanders. We will learn the different ways to retrieve the data from our Federal Election Commission (FEC) Data database, and transform the data into a format convenient for further use.
The e.ventures ObservableHQ collection contains a bar chart visualizing the Trump vs. Sanders donor-age distribution.
The most important part of the notebook are the SQL queries used to retrieve the donor-age distribution for both Trump and Sanders from the database. Those are the assignments to the variables
sqlSanders. Once the notebook is loaded or updated, we run the SQL queries on the FEC Data SQL Interface
/query endpoint. You can also run the SQL query by pasting the URL
https://elections.eventures.vc/query?token=<API_KEY>&sql=<sql_query> on a browser.
SQL queries can be constructed using the online SQL editor and also turned into links to the
Make sure you replace:
<API_KEY>with your API key given by an e.ventures team member.
<sql_query>with the SQL query.
The fetched results always cover the latest update on the contribution data in our database. If a new query is executed for the very first time, processing time can take several minutes or longer for very complex queries. However, the query will be cached in the SQL interface and the next run will only take seconds. Also, after a new month's filings arrive, the cache is updated with the old queries on the new data. Future requests will only require seconds.
The data for the bar chart is fetched from the database using an SQL query. The section explains how to create the SQL query to build Trump's donor-age distribution. The database model with tables and attributes is referenced above.
We split the SQL query into two queries for easier understanding:
Before we can look at the individual contributions, we need to gather the involved committees that receive the contributions and are registered with the FEC. These are the presidential candidate's committees (i.e., either principal campaign committees, or authorized committees for which a candidate reports financial activities). Since Trump and Sanders both only have one (principal campaign) committee that they file, this query returns only one committee ID.
The tables that are required from the database model are:
- This gives the information about which candidate is assigned to which committee.
- This gives the candidate ID for a partial name match.
Lastly, we have to state which cycle we need the information for.
This gives us the following SQL query:
SELECT CMTE_ID FROM CommitteeMaster ⟵ source table for committee information WHERE CYCLE = 2020 AND CAND_ID IN ( ⟵ specify cycle and candidate Id SELECT CAND_ID subquery to fetch candidate Id for name FROM CandidateMaster ⟵ source table for candidate information WHERE CYCLE = 2020 AND CAND_NAME LIKE '%Trump%' ⟵ matches "Trump, Donald J" );
In the inner query, we fetch the FEC candidate ID for any candidate with the string "Trump" somewhere in the name. The
% sign indicates that any characters are allowed before and after the name. Therefore, we also match "Trump, Donald J" (note that matches are case-insensitive). For the given candidate Id, we query the committee Ids for the 2020 election cycle. This gives us "C00580100" which is Trump's principal campaign committee.
With the information about the committee Id, we can now count the number of contributions for each age that donated to the committee in the 2020 election cycle.
The following SQL statement has an inner query for calculating the mean age of each donor and an outer query for aggregating the mean age for each age:
SELECT MEAN_AGE, COUNT(MEAN_AGE) AS NUM ⟵ defines CSV columns FROM ( subquery fetches mean age for unique individual contributors SELECT NAME_ID, ZIP_CODE, ROUND(SUM(AGE)/COUNT(*)) AS MEAN_AGE FROM ContributionsByIndividuals ⟵ source table WHERE CYCLE = 2020 AND CMTE_ID = 'C00580100' ⟵ Trump's PCC GROUP BY NAME_ID, ZIP_CODE ⟵ aggregation by unique donor ) A WHERE MEAN_AGE IS NOT NULL ⟵ exclude data with no enrichable age GROUP BY MEAN_AGE ⟵ Group for aggregation for outer mean ORDER BY MEAN_AGE; ⟵ Sort order
In the inner query,
ZIP_CODE are attributes used to define a unique donor.
NAME_IDidentifies a unique donor's first and last name (see above).
AGEis an attribute by which the FEC data has been enriched by the FEC Data data enrichment process.
Since a unique donor is aging during the two years of the election cycle, and we require a single number, we calculate the
MEAN_AGE for each donor. To do so, we aggregate the mean by
GROUP BY to calculate the rounded mean using the formula
Now, the inner query lists all of Trump's unique individual donors group by their mean age.
The outer query counts the donors for each year of age.
GROUP BY is used to define the attribute to aggregate (i.e., the attribute
MEAN_AGE that has been calculated in the inner year).
Finally, the missing (
NULL) values for the age are excluded by adding the constraint
WHERE MEAN_AGE IS NOT NULL, and the result is sorted ascending using
ORDER BY MEAN_AGE.
The SQL query from the previous section can be executed on the FEC Data database using the SQL Interface.
To do so,
- Go to
- Copy the SQL query from the previous section into the editor.
- Click on Download CSV to fetch a CSV file with the query result.
If a new query is executed, the result is not in the cache and has to be calculated from scratch. Once the query result is in the cache, it can be fetched in seconds.
Note: Make sure you replace
<API_KEY>with the API key given by an e.ventures team member.
With more than 37 million individual contributions for the 2020 election cycle (the contributions are counted until 31st May 2020), running complex SQL queries take time. Although the database is highly optimized for answering queries, calculations can take from seconds to minutes. For complex queries without restricting to subsets of data, this can take up to 15 or 30 minutes. Queries across cycles can take even longer. To optimize the speed, we store the results of a query executed in the browser cache, even if the user closes the browser window before the query execution is finished. Therefore, if the user does not want to wait, they can come back later and re-execute the query to fetch the result from the cache.
The SQL Interface offers the
/query endpoint to encode a full SQL query into a single URL and execute the query by visiting the URL (i.e. running the URL in the browser). A HTTP GET request on that URL will directly return the downloadable CSV result file. Users can also integrate the URL wherever a CSV file is expected (e.g, in the ObservableHQ notebooks). By clicking on Create Link in the SQL Interface, a link to the
/query endpoint with the encoded SQL query is created. Internally, there is no difference for the SQL query if it is run via the URL or entered manually in the online SQL editor.
In our example, the following URL is created:
Note: The URL also contains the
<API_KEY>used for creating the URL. Sharing the link also shares the API key.
The ObservableHQ notebook in our example that visualizes the Trump versus Sanders donor-age distribution embeds this link to fetch the data.
The FEC Data Data + Charts API offers an easier approach than SQL queries on the data.
As SQL queries can be complex and difficult to understand, the Data + Charts API gives endpoints for selected queries. For the donor-age distribution example, the
/committee/candidate/search endpoint fetches the committee Ids for the required candidates.
For example, to fetch the committee Ids for Biden, we run the API:
This returns a CSV file with Biden's principal campaign committee (his only committee).
CYCLE,CAND_ID,CAND_NAME,CMTE_ID,CMTE_NM 2020,P80000722,"BIDEN, JOSEPH R JR",C00703975,BIDEN FOR PRESIDENT
With the returned committee id (
CMTE_ID), the endpoint
/age_distribution/committee is used to fetch the corresponding donor-age distribution:
These two endpoints in the Data + Charts API give the same results as the queries using the SQL Interface. Under the hood, the endpoints execute the same SQL queries but hide this layer of complexity from the user. In the future, frequently used queries will be added to the Data + Charts API to give easy access to and collect important questions on the contributions data.
In the example, the
/age_distribution/committee endpoint can also fetch data from several committees or even several cycles at once. For example, to fetch the donor-age distributions for Biden and Sanders, we run the API:
As always, the result is cached once a query is executed for the first time.
Updated over 1 year ago