Donor-Age Distribution for Candidates and Committees
Visualizing the Donor-Age Distribution
For the first use case, we will visualize the donor-age distribution for the 2020 presidential election candidates Trump and Sanders. We will learn the different ways to retrieve the data from our Federal Election Commission (FEC) Data database, and transform the data into a format convenient for further use.
The e.ventures ObservableHQ collection contains a bar chart visualizing the Trump vs. Sanders donor-age distribution.
The steps to build this chart are simple and straightforward to enable users to quickly fork and modify it for their own visualization. Code on observableHQ is written in an adapted JavaScript dialect and aims at constructing SVG graphics. Most of the code in the notebook customizes the chart and reshapes the data arrays.
The most important part of the notebook are the SQL queries used to retrieve the donor-age distribution for both Trump and Sanders from the database. Those are the assignments to the variables sqlTrump
and sqlSanders
. Once the notebook is loaded or updated, we run the SQL queries on the FEC Data SQL Interface /query
endpoint. You can also run the SQL query by pasting the URL https://elections.eventures.vc/query?token=<API_KEY>&sql=<sql_query>
on a browser.
SQL queries can be constructed using the online SQL editor and also turned into links to the /query
on https://elections.eventures.vc/ui?token=<API_KEY>
.
Make sure you replace:
<API_KEY>
with your API key given by an e.ventures team member.<sql_query>
with the SQL query.
Fetching the results
The fetched results always cover the latest update on the contribution data in our database. If a new query is executed for the very first time, processing time can take several minutes or longer for very complex queries. However, the query will be cached in the SQL interface and the next run will only take seconds. Also, after a new month's filings arrive, the cache is updated with the old queries on the new data. Future requests will only require seconds.
Querying the Donor-Age Distribution with SQL
The data for the bar chart is fetched from the database using an SQL query. The section explains how to create the SQL query to build Trump's donor-age distribution. The database model with tables and attributes is referenced above.
We split the SQL query into two queries for easier understanding:
Retrieving the committee Id(s) for Trump's authorized committee(s) for the 2020 election cycle
Before we can look at the individual contributions, we need to gather the involved committees that receive the contributions and are registered with the FEC. These are the presidential candidate's committees (i.e., either principal campaign committees, or authorized committees for which a candidate reports financial activities). Since Trump and Sanders both only have one (principal campaign) committee that they file, this query returns only one committee ID.
The tables that are required from the database model are:
CommitteeMaster
- This gives the information about which candidate is assigned to which committee.
CandidateMaster
.- This gives the candidate ID for a partial name match.
Lastly, we have to state which cycle we need the information for.
This gives us the following SQL query:
SELECT CMTE_ID
FROM CommitteeMaster āµ source table for committee information
WHERE CYCLE = 2020 AND CAND_ID IN ( āµ specify cycle and candidate Id
SELECT CAND_ID subquery to fetch candidate Id for name
FROM CandidateMaster āµ source table for candidate information
WHERE CYCLE = 2020 AND
CAND_NAME LIKE '%Trump%' āµ matches "Trump, Donald J"
);
In the inner query, we fetch the FEC candidate ID for any candidate with the string "Trump" somewhere in the name. The %
sign indicates that any characters are allowed before and after the name. Therefore, we also match "Trump, Donald J" (note that matches are case-insensitive). For the given candidate Id, we query the committee Ids for the 2020 election cycle. This gives us "C00580100" which is Trump's principal campaign committee.
Retrieving the number of contributions for each age
With the information about the committee Id, we can now count the number of contributions for each age that donated to the committee in the 2020 election cycle.
The following SQL statement has an inner query for calculating the mean age of each donor and an outer query for aggregating the mean age for each age:
SELECT MEAN_AGE, COUNT(MEAN_AGE) AS NUM āµ defines CSV columns
FROM ( subquery fetches mean age for unique individual contributors
SELECT NAME_ID, ZIP_CODE, ROUND(SUM(AGE)/COUNT(*)) AS MEAN_AGE
FROM ContributionsByIndividuals āµ source table
WHERE CYCLE = 2020 AND CMTE_ID = 'C00580100' āµ Trump's PCC
GROUP BY NAME_ID, ZIP_CODE āµ aggregation by unique donor
) A
WHERE MEAN_AGE IS NOT NULL āµ exclude data with no enrichable age
GROUP BY MEAN_AGE āµ Group for aggregation for outer mean
ORDER BY MEAN_AGE; āµ Sort order
In the inner query, NAME_ID
and ZIP_CODE
are attributes used to define a unique donor.
NAME_ID
identifies a unique donor's first and last name (see above).AGE
is an attribute by which the FEC data has been enriched by the FEC Data data enrichment process.
Since a unique donor is aging during the two years of the election cycle, and we require a single number, we calculate the MEAN_AGE
for each donor. To do so, we aggregate the mean by NAME_ID
and ZIP_CODE
using GROUP BY
to calculate the rounded mean using the formula ROUND(SUM(AGE)/COUNT(*))
.
Now, the inner query lists all of Trump's unique individual donors group by their mean age.
The outer query counts the donors for each year of age. GROUP BY
is used to define the attribute to aggregate (i.e., the attribute MEAN_AGE
that has been calculated in the inner year).
Finally, the missing (NULL
) values for the age are excluded by adding the constraint WHERE MEAN_AGE IS NOT NULL
, and the result is sorted ascending using ORDER BY MEAN_AGE
.
Querying the Donor-Age Distribution from the SQL Interface
The SQL query from the previous section can be executed on the FEC Data database using the SQL Interface.
To do so,
- Go to
https://elections.eventures.vc/ui?token=<API_KEY>
. - Copy the SQL query from the previous section into the editor.
- Click on Download CSV to fetch a CSV file with the query result.
If a new query is executed, the result is not in the cache and has to be calculated from scratch. Once the query result is in the cache, it can be fetched in seconds.
Note: Make sure you replace
<API_KEY>
with the API key given by an e.ventures team member.
With more than 37 million individual contributions for the 2020 election cycle (the contributions are counted until 31st May 2020), running complex SQL queries take time. Although the database is highly optimized for answering queries, calculations can take from seconds to minutes. For complex queries without restricting to subsets of data, this can take up to 15 or 30 minutes. Queries across cycles can take even longer. To optimize the speed, we store the results of a query executed in the browser cache, even if the user closes the browser window before the query execution is finished. Therefore, if the user does not want to wait, they can come back later and re-execute the query to fetch the result from the cache.
The SQL Interface offers the /query
endpoint to encode a full SQL query into a single URL and execute the query by visiting the URL (i.e. running the URL in the browser). A HTTP GET request on that URL will directly return the downloadable CSV result file. Users can also integrate the URL wherever a CSV file is expected (e.g, in the ObservableHQ notebooks). By clicking on Create Link in the SQL Interface, a link to the /query
endpoint with the encoded SQL query is created. Internally, there is no difference for the SQL query if it is run via the URL or entered manually in the online SQL editor.
In our example, the following URL is created:
https://elections.eventures.vc/query?token=<API_KEY>&sql=SELECT%20MEAN_AGE%2C%20COUNT(MEAN_AGE)%20AS%20NUM_UNIQUE_DONORS%0AFROM%20(%20%0A%20%20%20%20SELECT%20NAME_ID%2C%20ZIP_CODE%2C%20ROUND(SUM(AGE)%2FCOUNT(*))%20AS%20MEAN_AGE%20%0A%20%20%20%20FROM%20ContributionsByIndividuals%0A%20%20%20%20WHERE%20CYCLE%20%3D%202020%20AND%20CMTE_ID%20%3D%20'C00580100'%0A%20%20%20%20GROUP%20BY%20NAME_ID%2C%20ZIP_CODE%0A)%20A%0AWHERE%20MEAN_AGE%20IS%20NOT%20NULL%0AGROUP%20BY%20MEAN_AGE%0AORDER%20BY%20MEAN_AGE%3B
Note: The URL also contains the
<API_KEY>
used for creating the URL. Sharing the link also shares the API key.
The ObservableHQ notebook in our example that visualizes the Trump versus Sanders donor-age distribution embeds this link to fetch the data.
Querying the Donor-Age Distribution from the Data+Charts API
The FEC Data Data + Charts API offers an easier approach than SQL queries on the data.
As SQL queries can be complex and difficult to understand, the Data + Charts API gives endpoints for selected queries. For the donor-age distribution example, the /committee/candidate/search
endpoint fetches the committee Ids for the required candidates.
Running the API
For example, to fetch the committee Ids for Biden, we run the API:
http://elections.eventures.vc/data-charts/v1/committee/candidate/search?name_part=Biden& cycle=2020&token=<API_KEY>
This returns a CSV file with Biden's principal campaign committee (his only committee).
CYCLE,CAND_ID,CAND_NAME,CMTE_ID,CMTE_NM
2020,P80000722,"BIDEN, JOSEPH R JR",C00703975,BIDEN FOR PRESIDENT
With the returned committee id (CMTE_ID
), the endpoint /age_distribution/committee
is used to fetch the corresponding donor-age distribution:
https://elections.eventures.vc/data-charts/v1/age_distribution/committee? committee_id=C00703975&cycle=2020&token=<API_KEY>
.
These two endpoints in the Data + Charts API give the same results as the queries using the SQL Interface. Under the hood, the endpoints execute the same SQL queries but hide this layer of complexity from the user. In the future, frequently used queries will be added to the Data + Charts API to give easy access to and collect important questions on the contributions data.
In the example, the /age_distribution/committee
endpoint can also fetch data from several committees or even several cycles at once. For example, to fetch the donor-age distributions for Biden and Sanders, we run the API:
https://elections.eventures.vc/data-charts/v1/age_distribution/committee? committee_id=C00703975&committee_id=C00696948&cycle=2020&token=<API_KEY>
As always, the result is cached once a query is executed for the first time.
Updated about 4 years ago