Measuring the Tax Gap

Brian Erard, B. Erard & Associates, BEandAssoc@Aol.com

Outline of Presentation • Tax Gap Overview • Measures based on random audits • Sample design and considerations • Application of design-based measures • Application of model-based measures • Measures based on operational audit data • Measures based on comparisons of surveys and administrative data • Other creative approaches

Conceptual Issues • How is the tax gap defined? • What are its components? • Why attempt to measure it? • How does it compare to the underground economy? • How broad a scope should the measure cover?

How is the tax gap defined? • Gross – the difference between the tax that taxpayers should pay and the tax they actually pay on a timely basis • Net – the difference between the gross tax gap and taxes collected through enforcement and late payments

What are its components? • Non-filing – taxes owed but not reported and paid on a timely basis by non-registrants/non-filers (and late filers) • Underreporting – taxes attributable to underreporting of actual liabilities on timely filed tax returns • Underpayment – taxes that are reported but not paid on a timely basis • This component often can be accurately assessed from administrative records
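
As a rough illustration of how these definitions fit together, the sketch below adds the three components into a gross gap and then subtracts enforcement and late collections to obtain the net gap. The function name and all monetary figures are hypothetical; they are not estimates from any tax administration.

```python
def tax_gap(non_filing, underreporting, underpayment, enforced_and_late):
    """Return (gross, net) tax gap from its components.

    gross = non-filing + underreporting + underpayment
    net   = gross - taxes later collected through enforcement and late payments
    """
    gross = non_filing + underreporting + underpayment
    net = gross - enforced_and_late
    return gross, net


# Hypothetical figures (in billions of any currency), for illustration only.
gross, net = tax_gap(
    non_filing=60.0,
    underreporting=70.0,
    underpayment=20.0,
    enforced_and_late=40.0,
)
print(f"gross gap = {gross}, net gap = {net}")
```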

How is true tax liability defined? • The liability that would be recommended based on the interpretation of a fully informed tax official? • The actual liability that is assessed following the resolution of any disputed amounts between the taxpayer and the tax agency? • The liability that would be assessed if it were to be assessed by an impartial court of law?

Why attempt to measure the tax gap? • Collection of tax revenue is the primary function of a tax administration • Accountability: Helpful for evaluating the degree to which the tax administration is successful • Disaggregation of the tax gap is helpful for understanding the sources and potential underlying causes of tax non-compliance

What is the underground economy (UE)? • Underground/black/hidden/unobserved economy • Broadest concept: Subset of all economic activity (from both legal and illegal sources/market and non-market) that goes unrecorded in official statistics • Typical concept: Difference between total market-based income (legal and illegal) and recorded GDP

How does UE differ from tax gap? • Not all unrecorded income is taxable (due to filing thresholds, exemptions, and certain deductions) • Some taxable income sources are not counted in UE measures (such as capital gains and various transfers) • A sizeable portion of the tax gap is attributable to aggressive use of tax credits, depreciation rules, transfer pricing, and other provisions rather than direct underreporting of income • The tax gap includes taxes on income that have been reported but not paid • Recorded GDP actually accounts for some sources of unreported income • Conceptually, the UE includes income from illegal activities (drugs, gambling, prostitution) that is typically excluded from tax gap measurement • The UE is even harder to measure!

Scope of tax gap measurement • Ideally, a broad monetary measure encompassing all taxes and all forms of non-compliance • As a practical matter, it may be too costly or difficult to develop a reasonably accurate broad measure • A large scale random audit programme may exhaust a large share of a tax administration’s compliance resources • Alternatives for a narrower scope include: • Focus on certain key taxes • Focus on compliance rates rather than compliance levels • Focus on indicators of non-compliance rather than direct measures

US tax gap map, TY 2006

Role of third-party reporting and withholding in U.S.

HMRC tax gap 2009-10 and 2010-11

Denmark personal income taxes TY2006

Sweden

Pre-filled returns in Sweden and Denmark

Uses and misuses of tax gap • Uses • Reasonably good indicator of the order of magnitude of tax non-compliance • Helpful for identifying key sources of non-compliance • Underlying data can be useful for risk assessment • Misuses • Short-term trend analysis • Performance evaluation

Digression on “closing the tax gap” • Public disclosure of tax gap estimates inevitably leads to demands to “close the gap” • Even under an optimal tax administration, it is important to recognise that some gap will exist • Nor is it optimal to audit until MR=MC • Heisenberg uncertainty principle • Attempts to measure the tax gap impact its size • Attempts to reduce the tax gap impact the tax base

How can we measure evasion? • Audit Data • Random • Operational • Combined operational and random • Measures based on comparisons of surveys and administrative data • Other creative approaches

Designing random audit studies • Scope • Scale • Sampling strategy • Data collection

Scope • May be interested in a particular tax or tax issue • Individual income tax, Corporate income tax, VAT • Specific credits, deductions, or income sources • May be interested in a particular taxpayer segment • Self-employed taxpayers, employers, high wealth individuals • For instance, one may want to investigate compliance by small businesses with all taxes (income tax, VAT/sales tax, employment taxes, etc.)

Scale • The appropriate scale of the programme depends on factors such as: • What is being measured (e.g., rates or dollar amounts) • Planned method of estimation: design-based or model-based • Desired precision for key estimates • Other planned uses for the data (e.g., risk scoring)

Evolution of IRS random audit programs: Taxpayer Compliance Measurement Program (TCMP) • Line-by-line audits of a stratified random sample of about 50,000 individual income tax returns • Conducted approximately every 3 years from TY 1963 until TY 1988 • Also occasional studies of other taxes (employment, small corporations, partnerships, individual non-filers) • Primary uses were: • Development of audit selection criteria • Measurement of tax gap • Research

Long dry spell

13 years later … TY 2001 National Research Program (NRP) • Stratified random sample of 45,000 individual returns for TY 2001 • Advertised as “kinder and gentler” than TCMP • About 10% of returns accepted without examination or with only a correspondence examination • Not all line items examined • Some routinely examined – e.g., self-employment returns • Some examined only at discretion of “classifier” or examiner • Case building materials provided in advance • For TY 2001, had a small “calibration sample” of returns audited in a manner similar to old TCMP program • Useful for evaluating non-compliance on line items that were not routinely examined

NRP redesign • Smaller annual studies of individual income tax • Most recently for tax years 2006, 2007, 2008 • About 14,000 returns per year • No longer a calibration sample • Some recent studies of other taxes • S-corporations (tax years 2003 and 2004, 5,000 returns) • Employment tax (2008-2010, 6,000 returns)

Design challenges • Mandatory vs. discretionary examination of line items • Intensity of probes for unreported income sources • Examination of related entities • Adjustments following disputes and appeals • If detection controlled estimation is to be employed, ensuring sufficient examiners who have each done a reasonable number of audits of the return items of interest

Some best practices for random audit studies • Non-sampling errors can plague a random audit study. The following practices help to prevent such errors: • Appropriate support and training of examiners and other staff – buy-in by examiners is crucial • Provide examiners with relevant case-building information • Design procedures to distinguish reports on the wrong line item from reports of an incorrect amount • Have good procedures for recording, validating, and correcting data • Record details on which specific line items or issues have been examined and which have not • Provide adequate supervision • It is also useful to consider what auxiliary information to collect to aid research

Random sampling: design-based estimation • Design-based estimation is very common in survey work. Under this approach: • The variables of interest in the population are treated as fixed but unknown numbers • Estimates are computed based on a randomly drawn sample from this population (typically, these estimates are the sample analogues of the population characteristics of interest) • The properties of the estimates (such as their means and variances) are derived using information only about the selection probabilities for the observations in the sample (i.e., the approach is non-parametric)

Estimating the rate of non-compliance • Canada Processing Review Programme • Approach is to contact a random sample of individual taxpayers who have claimed certain credits or deductions to request receipts to verify their claims • The results are used to measure the rates of non-compliance on these items and to develop targeting criteria for future verification work • Canada Core Audit Programme • Approach is to randomly audit various SME segments for selected tax issues to estimate rates of material non-compliance and assess risks

Simple random sampling (SRS) • One starts with a sample frame • For this example, the frame is all tax returns in a given year that claimed at least one specified credit or deduction • Under SRS, one randomly chooses returns from the sample frame in such a way that every possible sample of size n that can be drawn from the N returns in the population has an equal chance of selection
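
A minimal sketch of drawing an SRS from a sample frame, using only Python's standard library. The frame of return identifiers, the sample size, and the seed are hypothetical. `random.Random.sample` draws without replacement, so every subset of size n is equally likely to be chosen.

```python
import random


def draw_srs(frame, n, seed=None):
    """Draw a simple random sample of n units, without replacement."""
    rng = random.Random(seed)  # seeded only so the draw can be reproduced
    return rng.sample(frame, n)


# Hypothetical frame: 10,000 returns that claimed the credit or deduction.
frame = [f"return-{i:05d}" for i in range(1, 10_001)]
sample = draw_srs(frame, n=500, seed=2024)
print(len(sample), "returns selected for verification")
```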

Point and interval estimation
Let $p$ = unknown population proportion of returns with an improper claim, $n$ = sample size, and $x$ = number of sampled returns found to have an improper claim.
Then $\hat{p} = x/n$ is the point estimate of the rate of non-compliance.
The following is a confidence interval for $p$: $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$
The term $z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$ is known as the margin of error ($m$).
For a 95% confidence interval, $z_{\alpha/2} = 1.96$.
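
The point estimate and the normal-approximation confidence interval can be sketched as follows. The symbol `x` (number of sampled returns with an improper claim) and the counts used in the example are assumptions made for illustration.

```python
import math


def proportion_ci(x, n, z=1.96):
    """Point estimate, margin of error, and confidence interval for a rate.

    x: sampled returns found to have an improper claim
    n: sample size
    z: critical value (1.96 for a 95% level of confidence)
    """
    p_hat = x / n
    m = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, m, (p_hat - m, p_hat + m)


# Hypothetical result: 120 improper claims among 1,000 sampled returns.
p_hat, m, (lower, upper) = proportion_ci(x=120, n=1000)
print(f"rate = {p_hat:.3f}, margin of error = {m:.4f}, "
      f"95% CI = ({lower:.3f}, {upper:.3f})")
```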

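How large should the sample be?
Suppose we want to draw a random sample to estimate the rate of non-compliance with a margin of error $m = 0.03$ (for a 95% level of confidence, so $z_{\alpha/2} = 1.96$).
Since $m = z_{\alpha/2}\sqrt{p(1-p)/n}$, we can calculate $n$ as: $n = z_{\alpha/2}^2\, p(1-p)/m^2$
Of course, we don’t know $p$. The worst case scenario for precision is $p = 1/2$, in which case: $n = (1.96)^2 (0.25)/(0.03)^2 \approx 1{,}067.1$, so a sample of about 1,068 returns is needed.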
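
The sample-size calculation can be checked with a short sketch. The function name is hypothetical; the formula is the standard one obtained by solving the margin of error for n, rounded up to a whole return.

```python
import math


def sample_size_for_proportion(m, p=0.5, z=1.96):
    """Smallest n with margin of error at most m (no finite population correction).

    p defaults to 0.5, the worst case for precision.
    """
    return math.ceil(z ** 2 * p * (1 - p) / m ** 2)


n_worst_case = sample_size_for_proportion(m=0.03)       # p = 1/2
n_low_rate = sample_size_for_proportion(m=0.03, p=0.1)  # if p is believed to be near 10%
print(n_worst_case, n_low_rate)
```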

Some notes • If the population size N is relatively small, a somewhat smaller sample will be required. (The calculation above ignores the finite population correction (FPC) factor.) • If we are confident that the true rate p is far from ½, we can use a smaller sample
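
To show the effect of the first note, the sketch below applies the usual finite population correction, which multiplies the variance by (N − n)/(N − 1). Solving for n gives n = n0 / (1 + (n0 − 1)/N), where n0 is the uncorrected sample size. The population sizes in the example are hypothetical.

```python
import math


def sample_size_with_fpc(m, N, p=0.5, z=1.96):
    """Sample size for a rate with margin of error m in a population of N returns."""
    n0 = z ** 2 * p * (1 - p) / m ** 2           # size ignoring the FPC
    return math.ceil(n0 / (1 + (n0 - 1) / N))    # size after the correction


n_small_population = sample_size_with_fpc(m=0.03, N=5_000)
n_huge_population = sample_size_with_fpc(m=0.03, N=10 ** 9)
print(n_small_population, n_huge_population)
```

With a very large population the correction is negligible and the result matches the uncorrected calculation.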

Estimating the magnitude of non-compliance • Example: Kleven et al. (2011) • As part of this study, a random sample of Danish taxpayers were selected for rather comprehensive audits of their personal tax returns • The study was used for various purposes, including developing an estimate of overall tax underreporting

Summation notation

Point estimation
$Y_1, Y_2, \ldots, Y_N$ represent the overall magnitudes of tax underreporting on the $N$ returns in the population
$y_1, y_2, \ldots, y_n$ represent the overall magnitudes of tax underreporting on the $n$ returns in a SRS from the population
$\bar{Y} = \frac{1}{N}\sum_{i=1}^{N} Y_i$ represents the mean level of tax underreporting in the population
$Y = \sum_{i=1}^{N} Y_i = N\bar{Y}$ represents the aggregate level of tax underreporting in the population
Our respective point estimates of the mean and aggregate levels of tax underreporting in the population are: $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$ and $\hat{Y} = N\bar{y}$

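Interval estimation
The population standard deviation of tax underreporting is defined as: $\sigma = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(Y_i - \bar{Y})^2}$
The interval estimates for the mean and aggregate levels of tax underreporting are, respectively: $\bar{y} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ and $N\bar{y} \pm z_{\alpha/2}\,N\sigma/\sqrt{n}$
In practice $\sigma$ is unknown and is replaced by the sample standard deviation.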
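
A minimal sketch of the mean and aggregate estimates with their margins of error under SRS. It substitutes the sample standard deviation for the unknown population value and ignores the finite population correction; the audit adjustments and the population size are hypothetical.

```python
import math
import statistics


def srs_mean_and_total(sample, N, z=1.96):
    """Estimate the mean and aggregate underreporting from an SRS of audits."""
    n = len(sample)
    y_bar = statistics.fmean(sample)
    s = statistics.stdev(sample)      # sample standard deviation (n - 1 divisor)
    moe_mean = z * s / math.sqrt(n)   # margin of error for the mean
    return {
        "mean": y_bar,
        "mean_moe": moe_mean,
        "total": N * y_bar,
        "total_moe": N * moe_mean,    # margin of error for the aggregate
    }


# Hypothetical underreported tax found on 10 audited returns.
audits = [0, 0, 0, 100, 0, 250, 0, 0, 1200, 50]
est = srs_mean_and_total(audits, N=1_000)
print(est)
```

Ten observations are far too few for the normal approximation to be reliable; the tiny sample is used only to keep the arithmetic easy to follow.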

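How large should the sample be?
Suppose we want our margin of error for the mean level of tax underreporting to be £50 (at a 95% level of confidence, $z_{\alpha/2} = 1.96$), and we believe that the population standard deviation $\sigma$ is roughly 2,000.
Since $m = z_{\alpha/2}\,\sigma/\sqrt{n}$, we compute: $n = (z_{\alpha/2}\,\sigma/m)^2 = (1.96 \times 2{,}000/50)^2 \approx 6{,}147$
Similarly, suppose that there are 1 million taxpayers and we want our margin of error for the aggregate level of tax underreporting to be £50 million.
Since $m = z_{\alpha/2}\,N\sigma/\sqrt{n}$, we compute: $n = (z_{\alpha/2}\,N\sigma/m)^2 = (1.96 \times 1{,}000{,}000 \times 2{,}000/50{,}000{,}000)^2 \approx 6{,}147$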
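
Both calculations can be reproduced with a short sketch, assuming a 95% level of confidence (z = 1.96) and no finite population correction. The function names are hypothetical.

```python
import math


def sample_size_for_mean(sigma, m, z=1.96):
    """Smallest n giving margin of error m for the mean."""
    return math.ceil((z * sigma / m) ** 2)


def sample_size_for_total(sigma, m_total, N, z=1.96):
    """Smallest n giving margin of error m_total for the aggregate.

    A margin of m_total on the aggregate is a margin of m_total / N on the mean.
    """
    return sample_size_for_mean(sigma, m_total / N, z)


n_mean = sample_size_for_mean(sigma=2_000, m=50)
n_total = sample_size_for_total(sigma=2_000, m_total=50_000_000, N=1_000_000)
print(n_mean, n_total)
```

The two answers coincide because a £50 million margin spread over 1 million taxpayers is the same as a £50 margin on the mean.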

Stratified random sampling • So far, we have considered SRS. However, often it is preferable to use a stratified random sample. • One should do so if: • Reasonably precise estimates are desired for certain subgroups of the population; or • The mean value of the variable of interest is likely to differ substantially across different subgroups • For instance, separate sampling strata were defined for employment status (self-employed or not self-employed), return complexity, and region in the Denmark study

Summation notation, continued

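Estimation with a stratified random sample
Under stratified random sampling, we divide the population into $H$ distinct strata. The population count within the $h$th stratum is $N_h$ and the total population count is $N = \sum_{h=1}^{H} N_h$
The population mean is defined as: $\bar{Y} = \frac{1}{N}\sum_{h=1}^{H}\sum_{i=1}^{N_h} Y_{hi} = \sum_{h=1}^{H}\frac{N_h}{N}\bar{Y}_h$
A simple random sample of size $n_h$ is drawn from each stratum, and the sample mean for the $h$th stratum is $\bar{y}_h = \frac{1}{n_h}\sum_{i=1}^{n_h} y_{hi}$
This serves as an estimate of the population stratum mean $\bar{Y}_h$. The estimate of the overall population mean is computed as: $\bar{y}_{st} = \sum_{h=1}^{H}\frac{N_h}{N}\bar{y}_h$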

Sample weights • To simplify computation of sample statistics, one often constructs sample weights, which are defined as the inverse of the sampling rate within a stratum: $w_{hi} = N_h/n_h$ for all taxpayers $i$ in stratum $h$ • So, for instance, the estimate of the population mean is computed as a weighted average over the entire sample: $\bar{y}_{st} = \sum_{h=1}^{H}\sum_{i=1}^{n_h} w_{hi}\,y_{hi} \Big/ \sum_{h=1}^{H}\sum_{i=1}^{n_h} w_{hi}$
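
The sketch below computes the stratified estimate of the population mean in two ways: as a population-share-weighted sum of stratum sample means, and as a weighted average using sample weights equal to the inverse sampling rate. The strata, population counts, and audit results are hypothetical.

```python
def stratified_mean(strata):
    """Sum over strata of (N_h / N) times the stratum sample mean."""
    N = sum(N_h for N_h, _ in strata.values())
    return sum(N_h / N * (sum(ys) / len(ys)) for N_h, ys in strata.values())


def weighted_mean(strata):
    """Same estimate computed with sample weights w = N_h / n_h."""
    numerator = 0.0
    denominator = 0.0
    for N_h, ys in strata.values():
        w = N_h / len(ys)             # inverse of the sampling rate in the stratum
        numerator += w * sum(ys)
        denominator += w * len(ys)    # the weights in a stratum sum to N_h
    return numerator / denominator


# Hypothetical strata: name -> (population count N_h, audited underreporting).
strata = {
    "self_employed": (1_000, [400, 0, 800, 1_200]),
    "not_self_employed": (9_000, [0, 0, 100, 0, 50]),
}
print(stratified_mean(strata), weighted_mean(strata))
```

Both forms give the same number because the weights within stratum h sum to N_h, so the denominator equals the total population count N.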

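Stratified sampling strategies • Proportional allocation: sample each stratum in proportion to its size in the population: $n_h = n\,N_h/N$ • Optimal allocation: choose stratum sample sizes to maximise precision for a given overall sample size $n$ • Suppose the cost of examining a return in stratum $h$ is $c_h$ and the standard deviation of the variable of interest within stratum $h$ is $\sigma_h$ • Then the optimal allocation sets $n_h = n \cdot \dfrac{N_h\sigma_h/\sqrt{c_h}}{\sum_{k=1}^{H} N_k\sigma_k/\sqrt{c_k}}$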
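
The two allocation rules can be compared with a small sketch. It assumes the standard cost-adjusted optimal allocation, in which each stratum's share is proportional to N_h·σ_h/√c_h; the stratum sizes, standard deviations, and unit costs are hypothetical, and the results are left unrounded.

```python
import math


def proportional_allocation(n, N_h):
    """Allocate n in proportion to stratum population sizes."""
    N = sum(N_h)
    return [n * size / N for size in N_h]


def optimal_allocation(n, N_h, sigma_h, c_h):
    """Allocate n in proportion to N_h * sigma_h / sqrt(c_h)."""
    scores = [size * sd / math.sqrt(cost)
              for size, sd, cost in zip(N_h, sigma_h, c_h)]
    total = sum(scores)
    return [n * score / total for score in scores]


# Hypothetical strata: self-employed (small, highly variable, costly to audit)
# and everyone else (large, less variable, cheaper to audit).
N_h = [1_000, 9_000]
sigma_h = [3_000, 500]
c_h = [4, 1]

prop = proportional_allocation(1_000, N_h)
opt = optimal_allocation(1_000, N_h, sigma_h, c_h)
print(prop, opt)
```

In this example the optimal rule shifts audits toward the more variable stratum even though each audit there costs more.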

Estimating rates vs. magnitudes • Estimation of rates of non-compliance tends to require only a modestly sized random sample (1,000 observations or fewer) for reasonable precision • The distribution of the magnitude of tax non-compliance tends to be highly skewed, resulting in a large population standard deviation • As a consequence, rather large samples are typically required for adequate precision in estimating magnitudes

Model-based approaches with random audit data • Under a model-based approach, one specifies a relationship between the variable of interest (non-compliance) and its potential determinants • The model generally imposes functional form and distributional assumptions (parametric approach) • The quality of the estimates depends not only on the sample design but also the validity of the modelling assumptions

Why use a model-based approach? • To control for measurement errors, such as: • The failure to fully detect non-compliance • Conflation of deliberate and unintentional errors • To improve one’s understanding of what drives compliance behaviour and to predict future behaviour • Potentially, to improve the precision of tax gap estimates (if the underlying modelling assumptions are reasonably valid)
