CBO’s analyses of the distribution of household income and federal taxes are based on administrative tax data from the Internal Revenue Service’s Statistics of Income (SOI) and on household survey data from the Census Bureau’s Current Population Survey (CPS). Those two data sources contain complementary information. The SOI data contain detailed income information for those who file taxes each year but lack information for those who do not file taxes; the data also lack information about nontaxable sources of income. The CPS data contain information about a wide range of nontaxable sources of income for all U.S. households, regardless of whether they file tax returns in a given year.
By statistically combining the information from those two sources, CBO creates a comprehensive database of income sources for all U.S. households to serve as the foundation for its distributional analyses. This presentation provides an overview of the algorithm that CBO uses to statistically match the SOI and CPS data, and it provides some summary statistics on the characteristics of nonfiling tax units.
Presentation by Kevin Perese, an analyst in CBO's Tax Analysis Division, at a Washington Center for Equitable Growth workshop on distributional national accounts.
Statistically Matching Administrative Tax Data With Household Survey Data
1. Congressional Budget Office
Statistically Matching Administrative Tax Data With Household Survey Data
Presentation at a Workshop Organized by the Washington Center for Equitable Growth
July 21, 2017
Kevin Perese
Tax Analysis Division
As developmental work for analysis for the Congress, the information in this presentation is preliminary and is being circulated to stimulate discussion and critical comment.
2. Why Is It Necessary to Match Tax and Survey Data?
SOI Data
• Tax filers
• Taxable income
CPS Data
• Nonfilers
• Transfer income
All income sources for nonfilers come from the CPS.
“SOI” is the Internal Revenue Service’s Statistics of Income. “CPS” is the Census Bureau’s Current Population Survey.
3. A Five-Step Process
Step 1: Create CPS Tax Units
Step 2: Define Demographic Cells
Step 3: Estimate Regressions
Step 4: Sort and Match Files
Step 5: Reassemble Households
4. Step 1: Create CPS Tax Units
The unit of analysis in CBO distribution reports is the CPS household.
However, there can be multiple tax units in a household.
An algorithm is used to create tax units based on CPS relationship, age, and income variables.
7. Step 2: Define Demographic Cells
[Figure: grid of demographic cells. Labels in the grid: Married, Single, Dependents; Nonelderly, Elderly, One Elderly, Two Elderly; Number of Children: 0, 1, 2, 3+ (some cells are labeled only 0 or 0 and 1+).]
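As a rough illustration of how a tax unit might be assigned to a cell, the sketch below builds a cell label from marital status, the number of elderly adults, and the number of children. The function name, its arguments, and the exact way the labels combine are assumptions made for illustration; the layout of the grid in the slide cannot be fully recovered from the text, and the Dependents cells are not modeled here.

```python
def demographic_cell(married, n_elderly, n_children):
    """Return a (marital status, age group, children) label for one tax unit.

    Hypothetical helper: the label names follow the slide, but the way they
    are combined here is an assumption.
    """
    marital = "Married" if married else "Single"
    if n_elderly == 0:
        age_group = "Nonelderly"
    elif not married:
        age_group = "Elderly"
    elif n_elderly == 1:
        age_group = "One Elderly"
    else:
        age_group = "Two Elderly"
    # Top-code the number of children at "3+", as in the slide's labels.
    children = "3+" if n_children >= 3 else str(n_children)
    return (marital, age_group, children)


print(demographic_cell(married=True, n_elderly=0, n_children=4))
print(demographic_cell(married=False, n_elderly=1, n_children=0))
```

Grouping records by such a label lets the later steps (regressions, sorting, matching) run separately within each cell.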
8. Step 3: Estimate Regressions
First, using SOI data, define total income.
\begin{align*}
\text{Total income} ={} & \text{Wages} \\
& + \text{Interest and dividends} \\
& + \text{Business income} \\
& + \text{Rental income} \\
& + \text{Unemployment insurance} \\
& + \text{Pension income} \\
& + \text{Capital gains} \\
& + \text{Social Security benefits} \\
& + \text{Other income}
\end{align*}
9. Step 3: Estimate Regressions
Then, in each year and each demographic cell, estimate the following regression (using SOI data):
\begin{align*}
\text{Total income} ={} & \beta_0 \cdot \text{Wages} \\
& + \beta_1 \cdot \text{Interest and dividends} \\
& + \beta_2 \cdot \text{Business income} \\
& + \beta_3 \cdot \text{Rental income} \\
& + \beta_4 \cdot \text{Unemployment insurance} \\
& + \beta_5 \cdot \text{Pension income} \\
& + \alpha \cdot \text{Intercept} \\
& + \text{Error term}
\end{align*}
The regressors are variables that are in both the SOI and the CPS. Capital gains, Social Security benefits, and other income are part of total income but do not appear as regressors.
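The regression for a single year and cell can be sketched as an ordinary least-squares fit. The version below is a dependency-free illustration that solves the normal equations directly; the function name `fit_ols`, the two-regressor example, and the made-up data are assumptions, not CBO's code. In practice a statistics package would normally be used.

```python
def fit_ols(rows, totals):
    """Least-squares fit of totals on the regressors in rows, plus an intercept.

    rows:   one list of regressor values per SOI record in the cell
    totals: total income for each record
    Returns [intercept, coef_1, ..., coef_k].
    """
    X = [[1.0] + [float(v) for v in r] for r in rows]  # prepend intercept column
    k = len(X[0])
    # Augmented normal equations: (X'X | X'y).
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * y for x, y in zip(X, totals))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    # Assumes X'X is nonsingular (no perfectly collinear regressors).
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Made-up records with two regressors (say, wages and pension income),
# constructed so that total = 3 + 2 * x1 + 5 * x2 exactly.
rows = [[1, 0], [0, 1], [1, 1], [2, 1]]
totals = [5, 8, 10, 12]
coefs = fit_ols(rows, totals)
print([round(c, 6) for c in coefs])
```

Running the fit once per year and per demographic cell yields a separate set of coefficients for each cell.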
10. Step 3: Estimate Regressions
Finally, calculate predicted total income in the CPS and the SOI, using the estimated regression coefficients:
\begin{align*}
\text{Predicted total income} ={} & \hat{\beta}_0 \cdot \text{Wages} \\
& + \hat{\beta}_1 \cdot \text{Interest and dividends} \\
& + \hat{\beta}_2 \cdot \text{Business income} \\
& + \hat{\beta}_3 \cdot \text{Rental income} \\
& + \hat{\beta}_4 \cdot \text{Unemployment insurance} \\
& + \hat{\beta}_5 \cdot \text{Pension income} \\
& + \hat{\alpha} \cdot \text{Intercept}
\end{align*}
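The point of this step is that the same estimated coefficients are applied to records in both files, so predicted total income is on a common scale for the CPS and the SOI. A minimal sketch, with hypothetical coefficients and records and only two regressors for brevity:

```python
def predict_total_income(coefs, regressors):
    """Apply estimated coefficients to one record.

    coefs:      [intercept, b_1, ..., b_k] estimated on SOI data for the cell
    regressors: [x_1, ..., x_k] for one CPS or SOI record
    """
    intercept, slopes = coefs[0], coefs[1:]
    return intercept + sum(b * x for b, x in zip(slopes, regressors))


# Hypothetical coefficients for one cell: intercept, wages, interest and dividends.
cell_coefs = [1000.0, 1.25, 0.5]

# Hypothetical records: [wages, interest and dividends].
cps_record = [40000.0, 500.0]
soi_record = [38000.0, 2500.0]

cps_pred = predict_total_income(cell_coefs, cps_record)
soi_pred = predict_total_income(cell_coefs, soi_record)
print(cps_pred, soi_pred)
```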
11. Step 4: Sort and Match Files
Within each demographic cell, each file is sorted from highest to lowest predicted total income.
Demographic Cell i
CPS File Record ID | SOI File Record ID
1 | A
2 | B
3 | C
4 | D
  | E
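The sort can be sketched as a group-then-sort operation. The field names (`cell`, `predicted_income`) and the sample records are assumptions for illustration.

```python
from collections import defaultdict


def sort_within_cells(records):
    """Group records by demographic cell, then sort each group from
    highest to lowest predicted total income."""
    cells = defaultdict(list)
    for rec in records:
        cells[rec["cell"]].append(rec)
    return {
        cell: sorted(group, key=lambda r: r["predicted_income"], reverse=True)
        for cell, group in cells.items()
    }


# Hypothetical CPS tax units in one cell (IDs and incomes are made up).
cps = [
    {"id": "c", "cell": "i", "predicted_income": 30000.0},
    {"id": "a", "cell": "i", "predicted_income": 90000.0},
    {"id": "b", "cell": "i", "predicted_income": 55000.0},
]
sorted_cps = sort_within_cells(cps)
print([r["id"] for r in sorted_cps["i"]])
```

The same function would be applied to the SOI file, so that both files are ranked the same way before matching.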
12. Step 4: Sort and Match Files
Demographic Cell i
CPS File
Record ID | Sample Weight
1 | 5
2 | 3
3 | 5
4 | 3
SOI File
Record ID | Sample Weight
A | 1
B | 1
C | 1
D | 3
E | 3
Merged File
Record ID | Sample Weight
1A | 1
1B | 1
1C | 1
1D | 2
2D | 1
2E | 2
3E | 1
3_ | 4
4_ | 3
• 1D: Pick up the remaining weight on the first CPS record, and split the weight on the fourth SOI record.
• 2D: Pick up the remaining weight on the fourth SOI record, and split the weight on the second CPS record.
• And so on, until all SOI records (portions of SOI sample weights) have been exhausted.
• 3_ and 4_: Nonfilers.
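The weight-splitting match in this example can be sketched as a single pass over the two sorted files. The function below is an illustration under stated assumptions rather than CBO's implementation: the name `match_cell` and the tuple layout are hypothetical, weights are treated as integers as in the slide, and any SOI weight left over after the CPS file is exhausted is simply ignored.

```python
def match_cell(cps, soi):
    """Match two files within one demographic cell by splitting weights.

    cps, soi: lists of (record_id, sample_weight), each already sorted from
              highest to lowest predicted total income.
    Returns a list of (cps_id, soi_id, weight); soi_id is None for CPS
    weight left unmatched once the SOI file is exhausted (nonfilers).
    """
    merged = []
    j = 0                                   # index of the current SOI record
    soi_left = soi[0][1] if soi else 0      # unused weight on that record
    for cps_id, cps_left in cps:
        while cps_left > 0 and j < len(soi):
            w = min(cps_left, soi_left)     # weight both records can share
            merged.append((cps_id, soi[j][0], w))
            cps_left -= w
            soi_left -= w
            if soi_left == 0:               # SOI record used up: move on
                j += 1
                soi_left = soi[j][1] if j < len(soi) else 0
        if cps_left > 0:                    # no SOI weight remains
            merged.append((cps_id, None, cps_left))
    return merged


# The example from the slide.
cps_file = [("1", 5), ("2", 3), ("3", 5), ("4", 3)]
soi_file = [("A", 1), ("B", 1), ("C", 1), ("D", 3), ("E", 3)]
merged = match_cell(cps_file, soi_file)
for cps_id, soi_id, weight in merged:
    print(cps_id + (soi_id or "_"), weight)
```

In this sketch the merged weights for each CPS record sum to that record's original sample weight, so the CPS population total is preserved.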
21. Step 5: Reassemble Households
Demographic Cell i
A household with two tax units: CPS records 1 and 2.
SOI File record IDs: A, B, C, D, E.
Merged File record IDs: 1A, 1B, 1C, 1D (CPS record 1); 2D, 2E (CPS record 2).
Household File record IDs: 1A-2D, 1A-2E, 1B-2D, 1B-2E, 1C-2D, 1C-2E, 1D-2D, 1D-2E.
The Household file has every combination of CPS-SOI matches in the Merged file, with each household record getting a scaled weight so that the sum of weights is the same as the original CPS household weight.
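The reassembly can be sketched as a Cartesian product over each tax unit's merged records. The slide does not say how the weights are scaled; the sketch below assumes each combination receives the household weight multiplied by the product of each merged record's share of its tax unit's weight, which guarantees that the weights sum to the household weight. The function name and the household weight of 15 are hypothetical.

```python
from itertools import product


def reassemble_household(unit_matches, household_weight):
    """Build household records from the merged records of its tax units.

    unit_matches:     one list per tax unit, each holding (merged_id, weight)
    household_weight: the original CPS household weight
    Returns (combined_id, scaled_weight) for every combination of matches.
    Assumption: the scaled weight is the household weight times the product
    of each match's share of its tax unit's total weight.
    """
    totals = [sum(w for _, w in matches) for matches in unit_matches]
    records = []
    for combo in product(*unit_matches):
        share = 1.0
        for (_, w), total in zip(combo, totals):
            share *= w / total
        combined_id = "-".join(merged_id for merged_id, _ in combo)
        records.append((combined_id, household_weight * share))
    return records


# Merged records for the two tax units in the slide's example household.
unit_1 = [("1A", 1), ("1B", 1), ("1C", 1), ("1D", 2)]
unit_2 = [("2D", 1), ("2E", 2)]
household = reassemble_household([unit_1, unit_2], household_weight=15)
for record_id, weight in household:
    print(record_id, round(weight, 6))
```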
25. A Taxonomy of Tax Units
In 2013, there were:
245 million tax units
• 147 million tax-filing units
• 97 million nonfiling tax units, of which 84 million were dependent tax units and 14 million were nondependent tax units
27. Some Results
[Bar chart: Number of Nondependent, Nonfiling Tax Units, by marital status (Married, Single) and age (Nonelderly, Elderly). Bar values as extracted: 4.8 million, 2.1 million, 1.1 million, 5.7 million.]
28. Some Results
[Bar chart: Average Income of Nondependent, Nonfiling Tax Units, by marital status (Married, Single) and age (Nonelderly, Elderly). Vertical axis: $5,000 to $20,000. Income categories: Wages, Other Income, Social Security Income, Other Transfers. Note: one chart symbol denotes amounts of less than $500.]