Enricher API
Overview
After the individual contribution data has been imported from the Federal Election Commission (FEC) API into the FEC Data Platform, we further enrich the data with additional attributes from official sources. The enriched data is part of the data model.
Our data enrichment process is intended to be 100% transparent. We publish the Enricher API to automate the enrichment workflow where no public API (such as the Census TIGERweb API) is available, or where the enrichment process is not trivial (for example, calculating the median household income).
The Enricher API provides endpoints for enrichment for:
- Age and Gender
- Ethnicity
The API supports both:
- GET requests for use with a single input query
- POST requests for larger bulk processing
We make the individual contribution data more meaningful using the following attributes in our Enricher API:
Attribute | Description |
---|---|
AGE | This is estimated from a donor's first name and the date of donation. |
GENDER | This is estimated from a donor's first name. |
ETHNICITY | This is estimated from a donor's last name and the year of the transaction date (to choose the census). |
PCT_<ETHNICITY> | The probability for each ethnicity descriptor as given by census.gov. |
FIPS, COUNTY, TRACT_CD, BLOCK_CD | The geocoded location and census details that are fetched from census.gov. |
HH_INCOME_MEDIAN | The median household income of a donor's county. |
Transparency
Transparency of all our import and enrichment processes is extremely important to us. All the data is checkable, reproducible and involves no black-box workflows.
We guarantee transparency by:
- Using only official (.gov) data sources
- Describing all our enrichment steps in detail
- Giving you access to the Enricher API, a helpful tool for the enrichment processes
Methods and Sources for Data Enrichment
Overview
In this section, we'll share how we derive the following attributes of the donors:
- Age and Gender
- Ethnicity
- Geocoding and Census Data
- Household Income
Age and Gender
Calculating Gender of the Donor
The Social Security Administration (SSA) publishes the number of applicants for social security cards for each first name and gender, for each year starting in 1880. Note that names with fewer than 5 occurrences are excluded. From this data, the most probable gender for a first name can be derived.
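As a minimal sketch of this derivation, assume the SSA counts have already been loaded as (name, sex, count) records. The sample counts and the function name `most_probable_gender` are illustrative only; they are neither real SSA figures nor part of the Enricher API.

```python
from collections import defaultdict

# Hypothetical sample records in the shape (name, sex, count).
# The counts are made up for illustration.
SAMPLE_RECORDS = [
    ("Joe", "M", 1200),
    ("Joe", "F", 15),
    ("Alexandria", "F", 900),
    ("Alexandria", "M", 6),
]


def most_probable_gender(records, name):
    """Return 'm' or 'f' for the sex with the higher total count, or None."""
    totals = defaultdict(int)
    for rec_name, sex, count in records:
        if rec_name.lower() == name.lower():
            totals[sex.lower()] += count
    if not totals:
        # Name is not in the published data, so no gender is assigned.
        return None
    return max(totals, key=totals.get)
```

Calling `most_probable_gender(SAMPLE_RECORDS, "Joe")` picks the sex with the larger summed count across all records for that name.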
Calculating Age of the Donor
The age of a donor is calculated relative to a reference date, which is the day of the donation. For example, if a donor donated on Jan 1, 2020, we calculate the donor's age as of Jan 1, 2020.
The deduction of the age for a given first name requires two additional sources of data, namely the:
- CDC's actuarial life tables
- SSA's historic life tables
Here are the steps we take to calculate the age of a donor for a given first name:
- Get the number of births for a specific name
  - With the SSA data from above, we get the number of babies born with a specific name in each year.
- Calculate the age distribution
  - To estimate the age for a name, we need to know the age distribution. This requires taking life expectancy into account using the CDC's actuarial life tables and the SSA's historic life tables. The life tables define the number of deaths each year given the date of birth. To get a smooth distribution curve, we estimate a daily death rate. The results give us the percentage of living and dead people in the age distribution. We include only the living, since only living people can donate; this gives us the estimated age distribution.
- Use the median age
  - We then use the median age of this age distribution as the most likely age for a donor's first name at a donation date. This age is used to enrich the contribution data.
Note: Enrichment is only possible if a donor's first name is part of the published SSA name data. The remaining donors are not assigned an age or gender.
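The steps above can be sketched as a weighted median over the people estimated to be alive at the reference date. This is a simplified illustration, not the production method: it works at yearly rather than daily granularity, and the function name `median_age`, the birth counts and the survival probabilities are all made-up assumptions.

```python
def median_age(births_by_year, survival_by_age, reference_year):
    """Weighted median age of people estimated to be alive in reference_year.

    births_by_year: {birth_year: number of babies given the name}
    survival_by_age: {age: probability of still being alive at that age}
    """
    living = []  # (age, estimated number of people still alive)
    for year, births in births_by_year.items():
        age = reference_year - year
        if age < 0:
            continue  # born after the reference date
        alive = births * survival_by_age.get(age, 0.0)
        if alive > 0:
            living.append((age, alive))
    if not living:
        return None  # no data for this name, so no age is assigned

    living.sort()
    half = sum(count for _, count in living) / 2
    running = 0.0
    for age, count in living:
        running += count
        if running >= half:
            return age
    return living[-1][0]  # guard against floating-point rounding


# Made-up inputs for illustration only.
births = {1950: 100, 1960: 100, 1990: 100}
survival = {70: 0.5, 60: 0.8, 30: 1.0}
estimated_age = median_age(births, survival, 2020)
```

With these inputs, the living population is 100 people aged 30, 80 aged 60 and 50 aged 70, so the weighted median falls in the 60-year-old group.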
Ethnicity
We use the US Census Bureau data for the 2000 census and 2010 census to enrich the data with the donor's ethnicity. The Census data counts the recorded ethnicities for each last name occurring at least 100 times in the census.
For a given donor's last name, we enrich the elections data using the:
- Most frequent ethnicity in the census (using the field ETHNICITY)
- Probability for each ethnicity (as categorized by the Census) derived from the data (using the field PCT_<ETHNICITY>)
We use both data points above to estimate the ethnicity of a particular donor and the ethnicity distribution in a group of donors. We use the Census year closest to the donation date: if a donor donated in 2005 or earlier, we use the Census 2000 data; if a donor donated in 2006 or later, we use the Census 2010 data.
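The year rule can be written as a small helper. This is only a sketch of the rule as described here; the function name `census_year_for` is hypothetical.

```python
def census_year_for(donation_year):
    """Pick the Census (2000 or 2010) closest to the donation year."""
    # 2005 and earlier map to Census 2000; 2006 and later map to Census 2010.
    return 2000 if donation_year <= 2005 else 2010
```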
Geocoding and Census Data
For geocoding the donors' addresses, we use the US Census Bureau's TIGERweb services.
We use TIGERweb's RESTful API to access the US Census Bureau's TIGER database. This enables us to geocode an address and enrich the FEC data with the:
- Name and FIPS code of the county (which is useful for county-level mapping)
- Census tract and block codes to further narrow down the donor's location
The TIGERweb services give us the:
- Normalization of the donor's addresses.
  - For example, the reported addresses "123 Main ST" and "123 Main Street, APT 4" will both be normalized to "123 Main Street". This enables us to assign different donations by the same donor with different spellings of the same address to the same location.
- Ability to assign different donors at the same location to a unique address.
  - The unique address will be referenced by the field ADDRESS_ID in the data (currently a planned feature).
- Latitude and longitude pair of a donor's address.
  - However, we do not include this in the FEC Data database for privacy reasons.
If the TIGERweb services return no geocoding result, the contribution data is not enriched and the corresponding fields have NULL values in our database. This can happen for various reasons: the address is a post office box, the address is not (yet) in the TIGER database, the address is misspelled, or the address is outside the U.S.
Household Income
Our source is the US Census Bureau's median household income data, specifically the 2018 Poverty and Median Household Income Estimates for Counties, States, and National, published by the U.S. Census Bureau's Small Area Income and Poverty Estimates (SAIPE) Program in December 2020.
We include the median household income per county and enrich each contribution record with the median household income of the county geocoded using the TIGERweb services.
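A minimal sketch of this county-level join, assuming the SAIPE estimates are held in a dictionary keyed by county FIPS code. The income values below are made up, and the function name `enrich_with_income` is hypothetical; only the field names FIPS and HH_INCOME_MEDIAN come from the attribute table.

```python
# Hypothetical lookup table: county FIPS code -> median household income.
# The income values are placeholders, not real SAIPE estimates.
MEDIAN_INCOME_BY_FIPS = {"06075": 100000, "36061": 80000}


def enrich_with_income(contribution, income_by_fips):
    """Return a copy of the contribution with HH_INCOME_MEDIAN filled in.

    If geocoding produced no FIPS code, or the county is not in the table,
    the field is left as None (NULL in the database).
    """
    enriched = dict(contribution)
    fips = enriched.get("FIPS")
    enriched["HH_INCOME_MEDIAN"] = income_by_fips.get(fips) if fips else None
    return enriched


geocoded = enrich_with_income({"FIPS": "06075"}, MEDIAN_INCOME_BY_FIPS)
not_geocoded = enrich_with_income({"FIPS": None}, MEDIAN_INCOME_BY_FIPS)
```

Returning a copy keeps the original contribution record unchanged, so the enrichment step can be re-run.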
Pre-requisites
The API specification with a description of the parameters is available at our interactive Enricher API Swagger page.
To test the API on this page, make sure you authenticate your API key by completing steps 1 to 5.
Age and Gender
The age and gender endpoints give age and gender estimates for a given first name and a date in the format yyyy-mm-dd. Based on the number of newborns with each first name per year and the actuarial tables from the SSA, we estimate the age distribution at the given date and the most likely gender.
For more details on the calculation, refer to the age and gender section.
Age and gender estimates for given name(s) and reference date(s)
Description
Returns age and gender estimates for given name(s) and reference date(s).
Endpoint
POST /age_gender
curl -X POST "https://data.eventures.vc/enrich/v1/age_gender?apiKey=<API_KEY>" -H "accept: application/json" -H "Content-Type: application/json" -d "{\"queries\":[{\"date\":\"2020-08-01\",\"name\":\"Joe\"}]}"
https://data.eventures.vc/enrich/v1/age_gender?apiKey=<API_KEY>
Parameters
No parameters.
Request body
A JSON object containing names and reference dates.
{
  "queries": [
    {
      "date": "2020-08-01",
      "name": "Joe"
    }
  ]
}
You can also enter multiple dates and names.
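As a sketch of a bulk request, the snippet below builds a POST /age_gender request from several (name, date) pairs using only the Python standard library. It assumes that multiple queries are sent as additional objects in the "queries" array, as the request body format suggests. The function name `build_age_gender_request` and the placeholder key are hypothetical, and the request is built but not sent.

```python
import json
import urllib.request

BASE_URL = "https://data.eventures.vc/enrich/v1"


def build_age_gender_request(queries, api_key):
    """Build (but do not send) a POST /age_gender request.

    queries: iterable of (name, date) pairs, with date as 'yyyy-mm-dd'.
    """
    body = {"queries": [{"date": date, "name": name} for name, date in queries]}
    return urllib.request.Request(
        f"{BASE_URL}/age_gender?apiKey={api_key}",
        data=json.dumps(body).encode("utf-8"),
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )


request = build_age_gender_request(
    [("Joe", "2020-08-01"), ("Alexandria", "2020-08-01")],
    "YOUR_API_KEY",  # placeholder, replace with your own key
)
```

To send it, pass the object to `urllib.request.urlopen(request)` and parse the response body with `json.loads`.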
Response
Status: 200 OK
{
  "results": [
    {
      "age": 60,
      "gender": "m"
    }
  ]
}
access-control-allow-origin: https://data.eventures.vc
cache-control: private
content-encoding: gzip
content-length: 79
content-type: application/json
date: Sat, 01 Aug 2020 04:27:07 GMT
server: Google Frontend
status: 200
vary: Accept-Encoding, Origin
x-cloud-trace-context: cb213a104fd56aec06e44694a332415e;o=1
Gender and age distribution for a given name and a reference date
Description
Returns gender and age distribution for a given name and a reference date.
Endpoint
GET /age_gender/name/{name}/date/{date}
Note: Replace {name} and {date} with the name and date that you'd like to query for. The format for the date is yyyy-mm-dd.
curl -X GET "https://data.eventures.vc/enrich/v1/age_gender/name/Alexandria/date/2020-08-01?apiKey=<API_KEY>" -H "accept: text/html"
https://data.eventures.vc/enrich/v1/age_gender/name/Alexandria/date/2020-08-01?apiKey=<API_KEY>
Parameters
Name | Type | Description |
---|---|---|
name | string | The name to return the age and gender estimates for. |
date | string | The reference date for the estimation. The format for the date is yyyy-mm-dd . |
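A small sketch of how the two path parameters can be filled in safely from Python. The helper name `age_gender_url` is hypothetical; the URL layout follows the curl example above.

```python
from datetime import date
from urllib.parse import quote

BASE_URL = "https://data.eventures.vc/enrich/v1"


def age_gender_url(name, reference_date, api_key):
    """Build the GET URL for a single name and reference date."""
    # quote() percent-encodes characters that are not safe in a URL path,
    # and date.isoformat() yields the required yyyy-mm-dd format.
    return (
        f"{BASE_URL}/age_gender/name/{quote(name, safe='')}"
        f"/date/{reference_date.isoformat()}?apiKey={quote(api_key, safe='')}"
    )


url = age_gender_url("Alexandria", date(2020, 8, 1), "YOUR_API_KEY")
```

Using a `date` object rather than a raw string avoids malformed dates such as a missing leading zero in the month.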
Response
Status: 200 OK
Response body: An HTML file displaying the result.
{
  "html": {
    "summary": "Example result",
    "value": "<html><body><ul><li>item 1</li><li>item 2</li></ul></body></html>"
  }
}
Ethnicity
The ethnicity endpoints give the most probable ethnicity and the probability for each ethnic group. The underlying data comes from the 2000 and 2010 Census.
The ethnic groups defined in the Census data are:
- pct2prace: Percent Non-Hispanic Two or More Races
- pctaian: Percent Non-Hispanic American Indian and Alaska Native Alone
- pctapi: Percent Non-Hispanic Asian and Native Hawaiian and Other Pacific Islander Alone
- pctblack: Percent Non-Hispanic Black or African American Alone
- pcthispanic: Percent Hispanic or Latino origin
- pctwhite: Percent Non-Hispanic White Alone
Ethnicities for a surname and a year given in a JSON file
Description
Returns the ethnicities for a surname and a year given in a JSON file.
Endpoint
POST /ethnicity/
curl -X POST "https://data.eventures.vc/enrich/v1/ethnicity?apiKey=<API_KEY>" -H "accept: application/json" -H "Content-Type: application/json" -d "{\"queries\":[{\"surname\":\"cortez\",\"year\":2020}]}"
https://data.eventures.vc/enrich/v1/ethnicity?apiKey=<API_KEY>
Parameters
No parameters.
Response
Status: 200 OK
Response body: A JSON object with the estimates.
{
  "results": [
    {
      "pct2prace": 0.44,
      "pctaian": 0.29,
      "pctapi": 2.92,
      "pctblack": 0.7,
      "pcthispanic": 89.65,
      "pctwhite": 6,
      "race": "hispanic",
      "surname": "cortez"
    }
  ]
}
Note:
- "pct2prace": 0.44 means that 0.44 percent of individuals with the surname cortez are Non-Hispanic of Two or More Races. For more details, please refer to the ethnic group fields.
- "race": "hispanic" means that a majority of individuals with the surname cortez belong to the hispanic group.
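To illustrate how the fields relate, the sketch below parses the sample response and picks the group with the highest percentage. It assumes that "race" corresponds to the largest pct field, which holds for the sample responses on this page but is not stated as a rule by the API. The helper name `top_group` is hypothetical.

```python
import json

# The sample response body from this section.
SAMPLE_RESPONSE = """
{"results": [{"pct2prace": 0.44, "pctaian": 0.29, "pctapi": 2.92,
              "pctblack": 0.7, "pcthispanic": 89.65, "pctwhite": 6,
              "race": "hispanic", "surname": "cortez"}]}
"""


def top_group(result):
    """Return (group, percent) for the largest pct* field in one result."""
    # Strip the "pct" prefix so that "pcthispanic" becomes "hispanic".
    percentages = {
        key[3:]: value for key, value in result.items() if key.startswith("pct")
    }
    group = max(percentages, key=percentages.get)
    return group, percentages[group]


first = json.loads(SAMPLE_RESPONSE)["results"][0]
group, percent = top_group(first)
total = sum(value for key, value in first.items() if key.startswith("pct"))
```

In this sample the six percentages add up to roughly 100, so they can be read as a distribution over the ethnic groups for that surname.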
Ethnicities for a surname and a year
Description
Returns the ethnicities for a surname and a year.
Endpoint
GET /ethnicity/surname/{surname}/year/{year}
Note: Replace {surname} and {year} with the surname and year that you'd like to query for. The format for the year is yyyy.
curl -X GET "https://data.eventures.vc/enrich/v1/ethnicity/surname/ahmed/year/2020?apiKey=<API_KEY>" -H "accept: application/json"
https://data.eventures.vc/enrich/v1/ethnicity/surname/ahmed/year/2020?apiKey=<API_KEY>
Parameters
Name | Type | Description |
---|---|---|
surname | string | The surname to return estimates for ethnicity. |
year | integer | The reference year. The format for the year is yyyy . |
Response
Status: 200 OK
Response body: A JSON object with the estimates.
{
  "pct2prace": 3.96,
  "pctaian": 0.36,
  "pctapi": 56.54,
  "pctblack": 22.02,
  "pcthispanic": 1.44,
  "pctwhite": 15.69,
  "race": "api",
  "surname": "ahmed"
}
Note:
- "pctapi": 56.54 means that 56.54 percent of individuals with the surname ahmed are Non-Hispanic Asian and Native Hawaiian and Other Pacific Islander Alone. For more details, please refer to the ethnic group fields.
- "race": "api" means that a majority of individuals with the surname ahmed belong to the api group (Non-Hispanic Asian and Native Hawaiian and Other Pacific Islander Alone).