Database Model
Introduction to the e.ventures Data Database Model
Overview
We kept the Federal Election Commission (FEC) Data Database Model simple to allow access to the data using the SQL interface without complex queries. Nonetheless, the model contains the full complexity of the parts of the FEC data schema that is related to contributions by individuals. Our three core database table schema are oriented towards the published data schemas of the FEC bulk data which contains a detailed description of each of the attributes.
The main tables of our data model have a similar schema as the bulk download files [Candidate master](https://www.fec.gov/campaign-finance-data/all-candidates-file-description/)
, [Committee master](https://www.fec.gov/campaign-finance-data/committee-master-file-description/)
and [Contributions by individuals](https://www.fec.gov/campaign-finance-data/contributions-individuals-file-description/)
.
In this section, will give an overview of the schema and explain the schema attributes.
Note: The data model can also be downloaded through the SQL interface using the SQL statements
SHOW TABLES
andDESCRIBE <table_name>
. For example, the URLhttps://data.eventures.vc/query?token<api_key>&sql=DESCRIBE%20ContributionsByIndividuals%3B
returns the schema for the main tables with the contributions by individuals.
Schema: CandidateMaster
This schema implements the FEC bulk data schema. It clones the FEC data that is accessible through the FEC API without further processing. Only presidential candidates are included in our database for now.
Our schema adds the CYCLE
attribute for the corresponding two-year election cycle in order to include all data in one table. The further attributes describe a candidate's name, FEC Id, party affiliation, the office sought (currently it is only President), address, year of election, candidate's status and if they are an incumbent or a challenger. This schema is uniquely identified with the primary key pair (CAND_ID
, CYCLE
), and is useful to find a candidate by name.
Schema: CommitteeMaster
This schema implements the FEC bulk data schema. It clones the FEC data accessible through the FEC API without further processing. Only committees relevant to presidential candidate contributions by individuals are included for now. These are the:
- Candidate's principal campaign committee (PCC)
- Committees authorized by a candidate
- Joint fundraising committees from which the candidate receives a share
- Conduit committees forwarding an earmarked individual contribution to the candidate's PCC
These committees are all part of the flow of donations for individual contributions. We do not include PACs that are not authorized by a candidate and are funding, marketing campaigns, etc, independently.
The schema contains details for a committee: name, FEC Id, address, designation (e.g., Joint Fundraising Committee), type (e.g., presidential campaign committee), filing frequency, etc. Each committee is uniquely described by the primary key pair (CMTE_ID
, CYCLE
). The connection between a committee and a candidate is defined by the pair (CAND_ID
, CYCLE
). A candidate's committees (PCCs and authorized committees) can be selected by choosing a candidate Id and a cycle from the CommitteeMaster schema.
Schema: ContributionsByIndividuals
This schema is the main part of the e.ventures Data Platform. It combines the contributions by individuals that come in from the different data import processes. The FEC attributes in the table are oriented towards the FEC's bulk data format.
Each contribution is assigned to a committee Id which it is filed by, earmarked for, or transferred to (JFCs). This committee Id is the final recipient of the contribution or, mathematically expressed, the sink of the money flow graph. Using the key pair (CMTE_ID
, CYCLE
), the contributions can be connected to the committee data schema and then to the candidate schema. Using this chain, it is possible to for example, select all contributions for a candidate who is chosen by name.
The FEC data part of the schema contains the donation details transaction amount, transaction date, city, state, zip code, employer, occupation of the donor as disclosed to the FEC. Due to privacy reasons, we do not share the name of the donor and his or her street address. Instead, we create a unique NAME_ID
for each unique pair of first name and last name. This is sufficient to further process individual donors. If the user is interested in more details about a donor, the official FEC website provides these details. The full name, however, is used internally for data enrichment.
Each individual contribution also contains all FEC identifiers that allow our users to clearly connect our data to the official FEC data. Due to the different processes of the FEC, there are several identifiers which allow us to uniquely identify a contribution:
SUB_ID
: This is a unique identifier given to each contribution after the contribution has been processed by the FEC. Also, other transactions like expenditures receive a submission Id.FILE_NUM
: A unique file number is given to each report that is filed with the FEC. Amendments to previous reports receive a new file number. The file number alone does not identify a contribution.TRANSACTION_ID
: Each transaction filed in a report is identified by a transaction Id. These are given to contributions as well as to other transactions. The combination of file number and transaction Id uniquely identifies a contribution or other transaction. An amended report references the same transactions by the same transaction Ids as that of the report that was amended.
Please view the FEC information for more details.
There are some other identifiers, like the image_number of the PDF file that has been created from the field reports. However, the three mentioned identifiers (SUB_ID
, FILE_NUM
, TRANSACTION_ID
) are the most important ones and allow us to track each contribution. A recently filed contribution can only be identified by file number and transaction Id if it has not yet been assigned a submission Id. For contributions of older cycles when paper filings were popular, these different identifiers are not always present and correct.
The SUB_ID
can easily be used to verify single contributions in the Elections database. The FEC API allows you to directly download contributions using this Id.
The enriched data part of the schema contains the attributes that are added to the FEC source data. The enriched attributes are:
AGE
: estimated from a donor's first name and the date of donationGENDER
: estimated from a donor's first nameETHNICITY
: estimated form a donor's first name and year of transaction date (to choose census)PCT_<ETHNICITY>
: the probability for each ethnicity descriptor as given by census.govFIPS, COUNTY
,TRACT_CD
,BLOCK_CD
: the geocoded location and census details as fetched from census.govHH_INCOME_MEDIAN
: the median household income of a donor's county
The attribute IS_CALCULATED
specifies if the amount of a donation is calculated using the disclosed allocation of a candidate's share from a joint fundraising committee or reported directly by the receiving committee.
Schema: FinancialReportsCommittee
The reporting of contributions, disbursement and other financial transactions uses the FEC forms 3/3X/3P. The detailed transactions, such as the contributions by individuals, are filed in schedules to these forms. This information is available through the FEC web interface and the FEC API.
The table FinancialReportsCommittee
mirrors the contribution-related filed form data to allow the convenient connection of contributions and the reports they were filed in. Here are two use cases:
- Using the
FILE_NUMBER
in an individual contribution dataset, the original report it was filed with can be queried with information like the report type, the coverage period, or the amendment status - Using the field
TTL_IND_CONTR_PERIOD
that contains the total of the individual contributions in a reporting period, the amount of the individual contributions disclosed with itemized donors can be compared to the total of the individual contributions received by a candidate. This allows the user to project the disclosed numbers on the totals and to account for the different percentages of disclosure.
Schema: CumulativeDonors
The Schema CumulativeDonors
(see image under the section Schema: FinancialReportsCommittee) contains information beyond the raw and enriched FEC contribution data. The attribute CUMUL_DONORS
gives the number of unique donors for a candidate identified by the FEC candidate Id in the field CAND_ID
from the beginning of a two year election cycle up to REF_DATE
. Uniqueness is defined by both Name_ID
and ZIP_CODE
of a donor/donation. The data is available for the top twenty candidates receiving the most total amount of contributions.
Calculating the number of cumulative donors for each day of an election cycle is very computing-intensive. This is the reason we provide the pre-calculated data in this table, although it can be calculated by our users with queries on the contribution data. Once new data is available from forms filed with the FEC, the data in this schema is updated.
This pre-calculated schema is an example of caching more complex data in the system. This schema is just a beginning and we welcome requests for pre-calculated derived information that helps to answer complex questions on the data.
Updated about 4 years ago