
How Taxing Is Tax Filing?

Leaving Money On The Table Because Of Compliance Costs


Youssef Benzarti

University of California, Berkeley March 21, 2014

"Death, taxes and childbirth! There's never any convenient time for any of them."
Margaret Mitchell, Gone with the Wind
Abstract

Every year more than 240 million taxpayers have to file income taxes, imposing a significant cost on the economy. How large is this cost and are taxpayers willing to forego large tax benefits to avoid it? To answer this question, I focus on the choice between itemizing deductions and claiming the standard deduction. I use a non-parametric approach along with administrative tax data to show that the cost of itemizing deductions exceeds $700 on average per household and that taxpayers are willing to forego large tax benefits to avoid it. I show that this cost is mostly driven by the time spent archiving receipts rather than filling out forms. The cost also increases in income, consistent with the fact that the value of time of richer households is larger. I explain the magnitude of the cost using a model based on present-bias. I also argue that the results cannot be explained by lack of information or by audit probabilities.
I thank Alan Auerbach, Dan Benjamin, Stefano DellaVigna, Alex Gelber, Daniel Gross, Hilary Hoynes, Emiliano Huet-Vaughn, Marc Kaufmann, Henrik Kleven, Attila Lindner, Takeshi Murooka, Matthew Rabin, Jesse Rothstein, Emmanuel Saez, Josh Schwartzstein and Alisa Tazhitdinova for helpful discussions and comments.

Electronic copy available at: http://ssrn.com/abstract=2412703

1 Introduction

With the tax code getting increasingly complex, taxpayers have to file a large number of forms and keep track of numerous receipts. Every year, more than 240 million households have to file taxes. In addition, some have to spend time filing several other schedules. How large is this cost and are taxpayers willing to forego tax benefits to avoid it? I answer this question by focusing on one specific decision: the choice between itemizing deductions and claiming the standard deduction. The main difficulty with identifying the extent to which individuals fail to itemize deductions is that true deductions are not observable for taxpayers who claim the standard deduction. To address this issue, I use a non-parametric approach and administrative tax data. If individuals are truly foregoing tax benefits, there should be a missing mass in the distribution of deductions in the neighborhood of the standard deduction. I provide evidence of this missing mass for any year and any filing status. I then turn to my main identification strategy, which relies on a natural experiment. Following a reform, the standard deduction amount is exogenously increased, in some cases by 50%. This constitutes an ideal setting to observe whether individuals who were previously itemizing start claiming the standard deduction. Following the increase in the standard deduction threshold, I observe a drop in the mass of itemizers in the neighborhood of the standard deduction. By measuring the magnitude of this missing mass, I am able to construct the distribution of foregone benefits. On average, individuals forego more than $700 of tax benefits to avoid the burden of itemizing. This implies that households perceive itemizing as requiring more than 20 hours when their time is valued at the wage rate of their regular jobs. This result is striking, especially given that it is much larger than the IRS estimate (less than 5 hours).
If individuals switch to the standard deduction because they value their time more than the benefits they could potentially derive from itemizing, richer households should forego more tax benefits than poorer ones. To test for this, I break down individuals by income deciles and repeat the identification strategy outlined in the previous paragraph. The results show a significantly increasing relationship between foregone tax benefits and income, consistent with the hypothesis that richer individuals assign a higher cost to filing taxes because they have a higher marginal value of time.

I then turn to the panel dataset to further investigate the cost of itemizing. I focus on the factors that increase the likelihood of switching from itemizing to claiming the standard deduction. Consistent with the non-parametric evidence, I find that, conditional on being close to the standard deduction threshold, taxpayers are more likely to switch to the standard deduction if they have higher incomes. To provide further evidence of the value of time interpretation of the cost, I consider an exogenous shock to the time available to taxpayers by focusing on taxpayers who have a newborn. Babies are time consuming and when the tax season comes, parents are likely to try to file their taxes as fast as possible because their value of time is extremely high. The results are consistent with this assumption: households with newborns are 9% more likely to claim the standard deduction, controlling for income and other observable characteristics.

The cost of itemizing is the sum of two separate costs: the cost of record-keeping and the cost of filling out schedule A. Which of the two is higher and drives the result? To answer this question, I consider the outside option of using a tax preparer. Tax preparers can provide assistance in filling out forms but they cannot perform the record-keeping tasks. The fee charged by a tax preparer to prepare schedule A is therefore an upper bound on the cost of filling out schedule A. I find that this fee is less than $50, implying that most of the cost is driven by the archiving cost. Reasonable calibrations of the cost of itemizing suggest that it is unlikely that such a simple task requires so much time. Taxpayers have an average of 4 receipts that they need to keep track of, and schedule A is one of the easiest forms to fill out as it does not require any calculations or any tax tables. The taxpayer only needs to copy numbers from receipts and then sum them up, which requires less than an hour. This leaves more than 19 hours for the record-keeping of four receipts, i.e. more than 4 hours per receipt.
Overall, it is hard to explain such a high cost without assuming that there are other forces at play. In light of this, I construct a model that rationalizes my findings and particularly the large magnitude of the cost. The taxpayer faces two separate costs: one for record-keeping and one for filling out schedule A. The cost of filling out schedule A is constant over time. However, if receipts are not archived immediately, the cost of archiving them increases continuously. I show that a rational taxpayer archives receipts as soon as they are available, but a naive present-biased one procrastinates on the record keeping, which leads to a large cost at the time of itemizing. The predictions are consistent with my findings that the costs are much higher than any reasonable estimates. This model is contrasted with one in which the taxpayer is rational and performs the filing and record-keeping tasks whenever optimal. I calibrate this model and extrapolate the cost to filing the 1040 form using the IRS estimates for the entire US population. I find that the aggregate cost is in excess of 1.9% of GDP.

The two outlined models offer two different perspectives on the cost. The first one argues that part of the cost is not due to the collection system per se but is mostly driven by a behavioral bias. The second one argues that the cost is purely due to the collection process. This distinction is crucial from a policy perspective, as the first model calls for a policy intervention that targets the behavioral bias to fix the time inconsistency of the taxpayer, whereas the second one requires the tax collection process to be reformed.

The identification strategy that I employ allows me to rule out explanations based on lack of information about the possibility of itemizing deductions. Given that the mass goes missing following the reform, individuals are switching from being itemizers to claiming the standard deduction and should therefore be aware of the possibility of itemizing. I am also able to rule out the possibility that taxpayers switch to the standard deduction to avoid audits. The true perceived probabilities of audit are extremely small for this income group (less than 1%) and are virtually the same for taxpayers who itemize and taxpayers who do not.

The results of this paper have implications in several dimensions. First, this is to my knowledge the first paper that provides non-parametric evidence of how costly the tax collection system is. The only other paper that addresses this question using tax data is Pitt and Slemrod (1991).
However, they only use one cross section and make structural assumptions about the benefits and the costs, possibly biasing their results. They estimate a cost of itemizing of $104 (in 2013 dollars).

This paper also adds to a long tradition in public economics emphasizing the need to screen applicants for welfare benefits by imposing high transaction costs on them, such as waiting in line, filling out forms etc. Poorer individuals value their time less, possibly because they are unemployed, and such policies can successfully target them by screening out richer individuals. My results show that this effect is indeed present and that richer individuals tend to forego more benefits than poorer ones. However, given how large the cost is, such policies could be screening out too many individuals. In addition, if the cost is driven by time-inconsistency, such a screening mechanism could lead to unwanted distortions such as screening rational individuals versus naive ones rather than rich ones versus poor ones.

This paper also provides an explanation for certain documented behavior that the literature has struggled to explain. Economists are puzzled by the fact that poor households forego certain benefits such as the EITC (Earned Income Tax Credit) or SNAP (food stamps). Individuals who qualify for the EITC but have incomes so low that they are not required to file taxes could fail to claim the EITC because of the cost of filing the 1040 form. Bhargava and Manoli (2011) show that failure to claim the EITC can be explained by lack of information about the program but they never address transaction costs. This paper shows that this could also be a channel for not claiming the EITC. The literature has focused on the stigma cost to explain foregone SNAP benefits, but in light of the magnitude of my results a simpler explanation could be that administrative costs are too large. Jones (2010) shows that taxpayers fail to adjust their tax withholding, resulting in foregone interest. He explains his results with inertia, but another reasonable explanation is the cost of filling out form W4 and sending it to the IRS. Feenberg and Skinner (1989) and Rees-Jones (2013) show that taxpayers who have a balance due are more likely to reduce their balance enough so that it becomes a refund by claiming additional deductions. The cost of sending a cheque to the IRS could be the channel for the result.

2 Data and Institutional Background

2.1 The Decision to Itemize

Taxpayers can reduce their taxable income by claiming deductions. These deductions are intended as a way to make the tax system more progressive and also as an economic incentive, as some of the underlying expenditures create positive externalities. The most common deductions claimed by taxpayers are for mortgage interest payments, state and local income taxes, charitable contributions and real estate taxes. Deductions reduce the taxpayer's tax liability through her marginal tax rate. Consider a single person with an income of $150,000, putting her in the 28% marginal tax bracket in 2013, which starts at an income of $87,850. If the person spends a total of $10,000 on different expenses that she is allowed to deduct from her income, her tax liability is reduced by $2,800. If instead she decides to claim the standard deduction, which in 2013 was $6,100, her tax liability is reduced by $1,708. The decision to itemize deductions seems rather straightforward and only entails comparing two numbers. Itemizing, however, is administratively burdensome as it requires collecting several documents, working through a separate tax form and sending additional evidence to the government. The rational taxpayer is supposed to account for these costs when contemplating the itemization decision, and if her total itemized deductions only exceed the standard deduction by a small amount, she is likely to claim the standard deduction even though doing so slightly increases her tax liability. Approximately two thirds of the population claims the standard deduction. The standard deduction amount varies by filing status (single, joint, married filing separately and head of household) and by whether the person is blind or older than 65. It has also been indexed to inflation since the TRA 86 reform.
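The comparison in the example above can be written out as a small calculation. The sketch below is illustrative only: the function names are hypothetical, it assumes a single flat marginal rate applies to the whole deduction (no bracket crossing), and the $700 compliance cost is simply the average figure quoted in the abstract used as a placeholder.

```python
def tax_saving(deduction: float, marginal_rate: float) -> float:
    """Tax reduction from a deduction, assuming one flat marginal rate applies."""
    return deduction * marginal_rate


def gain_from_itemizing(itemized: float, standard: float, marginal_rate: float) -> float:
    """Extra tax saving from itemizing instead of claiming the standard deduction."""
    return tax_saving(itemized, marginal_rate) - tax_saving(standard, marginal_rate)


def should_itemize(itemized: float, standard: float,
                   marginal_rate: float, compliance_cost: float) -> bool:
    """A cost-aware taxpayer itemizes only if the gain exceeds the compliance cost."""
    return gain_from_itemizing(itemized, standard, marginal_rate) > compliance_cost


if __name__ == "__main__":
    mtr, standard = 0.28, 6_100          # figures from the example in the text
    for itemized in (10_000, 6_500):     # 6,500 is a hypothetical near-threshold case
        gain = gain_from_itemizing(itemized, standard, mtr)
        decision = should_itemize(itemized, standard, mtr, compliance_cost=700)
        print(f"itemized={itemized}: gain={gain:.0f}, itemize={decision}")
```

With $10,000 of deductions the gain from itemizing is well above the placeholder cost, while a taxpayer with deductions just above the standard deduction gains too little to justify itemizing.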

2.2 Data

The dataset used to carry out this analysis consists of annual cross-sections of individual tax returns constructed by the IRS and commonly called the Individual Public Use Tax Files. The data is available annually for the periods that I am analyzing. The number of observations per year ranges from 80,000 to 200,000. The repeated cross sections are stratified random samples where the randomization occurs over the social security number. The data oversamples high-income taxpayers as well as taxpayers with business income, but weights are provided by the IRS, allowing my analysis to reflect population averages. I focus here on joint filers as they constitute the largest group, allowing me to have more observations. I also report results for single filers and heads of household.

3 Results

If taxpayers are claiming the standard deduction even though they could benefit from itemizing, the density of deductions should be shallow in the neighborhood of the standard deduction. This shallowness is observed for any non-reform year (figures 1, 2, 3, 4 and 5). To show the causal relationship between the standard deduction and the shallow density in the neighborhood of the standard deduction, I use four reforms that increase the standard deduction amount by a large proportion in 1971, 1972, 1989 and 2003 (see table ??). I observe that the shallowness follows precisely the standard deduction threshold (figures 7, 6 and 8). I develop a method to recover the counterfactual density of deductions by using the year that precedes each reform. I use this counterfactual to estimate the cost of itemizing by measuring the magnitude of the missing mass caused by the proximity to the standard deduction.

3.1 The Density of Deductions Is Shallow In the Neighborhood of the Standard Deduction

If some taxpayers are claiming the standard deduction even though their total itemized deductions are greater than the standard deduction amount, the distribution of total itemized deductions should be shallow in the neighborhood of the standard deduction threshold. In order to verify this assertion, I graph the density of deductions for all years ranging from 1980 to 2006 using bins of $500. Notice that the closest bin to the standard deduction is only composed of itemizers whose deductions are strictly larger than the standard deduction amount. Figures 1, 2, 3, 4 and 5 show the distribution of itemized deductions for years ranging from 1980 to 2006 for joint filers (this effect is consistent across filing categories). The distribution is systematically shallow in the neighborhood of the standard deduction, matching my initial hypothesis. There are no observations on the left of the standard deduction threshold because non-itemizers are not required to report their deductions.

Is the true density of deductions discontinuous at the standard deduction? It is likely, but without observing the density on the left-hand side of the standard deduction threshold, I cannot rule out a smooth density. Approximately two-thirds of taxpayers claim the standard deduction, which means that the density below the standard deduction threshold cannot be increasing from zero onwards and then connect with the density on the right-hand side of the standard deduction, as this would fail to account for a large portion of the population. If the density is smoothly decreasing on the left-hand side of the standard deduction threshold and it is single-peaked, then it is likely that the density is discontinuous. But I cannot rule out double-peaked distributions without knowing what the true distribution of total deductions is below the standard deduction threshold. This is why I compare the density in years prior and posterior to each reform to identify the causal effect of the standard deduction.
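The binned density described in this section can be sketched as a weighted histogram that starts at the standard deduction threshold. This is a minimal illustration, not the paper's code: the function name and arguments are hypothetical, and the normalization (dividing by the total weight of all observations passed in) is an assumption.

```python
def deduction_density(deductions, weights, standard_deduction,
                      bin_width=500.0, n_bins=20):
    """Weighted share of observations in each bin above the standard deduction.

    Bin k (0-based) covers [sd + k*width, sd + (k+1)*width); observations at or
    below the standard deduction are counted in the total but fall in no bin.
    """
    counts = [0.0] * n_bins
    total_weight = 0.0
    for amount, weight in zip(deductions, weights):
        total_weight += weight
        offset = amount - standard_deduction
        if offset <= 0:
            continue  # only deductions strictly above the threshold are binned
        index = int(offset // bin_width)
        if index < n_bins:
            counts[index] += weight
    return [count / total_weight for count in counts]


if __name__ == "__main__":
    # Toy data: four weighted returns, standard deduction of 5,200.
    shares = deduction_density([5_300, 5_600, 5_800, 9_000], [1, 1, 2, 4],
                               standard_deduction=5_200, n_bins=3)
    print(shares)
```

A shallow density near the threshold would show up as small shares in the first few bins relative to the bins further away.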

3.2 Identifying the missing distribution

The government had to adjust the standard deduction amount in several instances. This resulted in large increases in the standard deduction amount. These natural experiments constitute the ideal exogenous variation to analyze the effect of the standard deduction on itemizers. Reforms happen in 1970, 1971, 1988 and 2002. Table 1 reports that the standard deduction is increased respectively by 50%, 33%, 33% and 21%. I compare the year prior to the reform to the year following the reform for the following reason. The taxpayer has to perform two separate tasks when itemizing. First, she has to keep a record of all the expenses she incurred in year t and keep track of their receipts. Second, she has to fill out schedule A in year t + 1. Therefore, the two tasks happen in two separate years: the year prior to filing and the filing year. Assume I am considering the 1988 reform. The taxpayer files her taxes in 1989 for 1988, which means that she is likely to have already collected the documents in 1988 and only has to work through schedule A. Once done, she realizes that her itemized deductions are only slightly above the standard deduction amount and that it would have been more cost efficient for her to claim the standard deduction. However, documents have been collected already and forms filled out, so she follows through with the decision to itemize. She promises herself, however, that next year she will save herself the trouble of dealing with all this administrative hassle and will claim the standard deduction instead. Accordingly, the cost calculated in 1988 should only reflect that of working through schedule A. In 1989, however, the cost should reflect the entire administrative burden of itemizing deductions. Comparing 1987 to 1988 and 2002 to 2003 in figure 9 shows that there is a lagged response precisely for this reason. Figures 7, 6 and 8 graph the density of deductions in years prior and posterior to the reform years 1970, 1971, 1988 and 2002.
The bin size is $200 for figure 7 and $500 for figures 6 and 8. The difference in bin size is due to the high inflation in the 1970s. The shape of the distribution in year t+1 mirrors that of year t-1, and the shallowness very precisely follows the new standard deduction threshold. This shows that taxpayers are claiming the standard deduction once it is increased even though their deductions are larger than the standard deduction. Notice that the missing mass is smaller in the 1970s compared to later years. Inflation was extremely high in the 1970s, suggesting that even though the nominal cost could be small, the real one is likely to be of the same magnitude.

3.3 Adjusting the distribution from year to year

Given that I am comparing two separate years, I need to adjust for the effect of inflation. Adjusting the previous year's density by multiplying by inflation is imperfect, as deductions are not guaranteed to vary with inflation. In fact, most deductions are likely to remain fixed in nominal terms. In the case of charitable contributions, it is unlikely that individuals would adjust the amount they contribute from year to year by the inflation rate. In the case of the mortgage deduction, interest rates can be fixed, indexed to inflation or indexed to other variables. In addition, even if the difference between the inflation adjustment and the actual adjustment is small, it can create a significant distortion when comparing across two years. Assume that inflation is constant across years and equal to \(\pi\), and that the inflation adjustment over-corrects the distribution by \(\epsilon\). This means that when comparing year t's distribution to year t+1, year t will be over-adjusted by \(\epsilon\). But when comparing year t to year t+2 (my main identification), year t is adjusted by \((1 + \pi + \epsilon)(1 + \pi + \epsilon) = (1 + \pi)^2 + \epsilon(2 + \epsilon + 2\pi)\). When comparing distributions that are two years apart, the corrected distribution is therefore over-corrected by \(\epsilon(2 + \epsilon + 2\pi)\). Tax data probably constitutes the best source to calculate this growth rate. I calculate the weighted average of deductions in year t-1 and compare it to that of year t+1. The ratio gives me the growth rate of deductions. In years when there is a reform, I restrict the sample so that it is far enough from the neighborhood of the new standard deduction to avoid having the effect of the reform contaminate the natural growth of deductions. Consider the 1988 reform. It increased the standard deduction from $3,760 to $5,000.
If I were to consider all taxpayers with deductions above $5,000, the growth rate would be biased upwards, since many taxpayers stop itemizing and start claiming the standard deduction when their itemized deductions are close to $5,000, resulting in a higher proportion of high itemizers (since taxpayers who claim the standard deduction are not accounted for). For this reason, in years when there is a reform, I restrict the sample that I use to calculate the growth rate to itemizers who are at least 10 bins away from the standard deduction, to ensure that they are unlikely to switch to the standard deduction because they are close to it.

Next, I need to account for the fact that the composition of deductions might be different for those that have more deductions than others. This is a reasonable concern given that deductions are correlated with income, and high-income taxpayers tend to have a lower proportion of charitable donations in their total deductions and more mortgage payments overall. Furthermore, the growth rate is also likely to be deduction specific: charitable donations are less likely than state taxes to grow at a rate that is close to inflation. To address this concern, I calculate the proportion of each type of deduction for individuals that are more than 10 bins away from the standard deduction and individuals that are less than 10 bins away from it. I find that the proportions are indeed different, although the difference is small. I then calculate the growth rate of each deduction and use it along with the proportion of deductions to find the growth rate for individuals below the threshold. I find that the two growth rates are fairly similar. The results are reported in table 4. The first column shows the growth rate of each category of deductions. The second one calculates the proportion of deductions for each group of taxpayers: those that are less than 10 bins away from the standard deduction and those that are more than 10 bins away from the standard deduction. The remaining categories of deductions, such as moving expenses or casualty or theft losses, are not included because they represent less than 1% of deductions.
To verify that this adjustment is reasonable, I carry out placebo tests on years when there was no reform and, using the same adjustment, I verify that the distributions from two separate years are indeed overlapping in figures 10 and 11. I exclude years prior to 1990 because the standard deduction was not indexed to inflation prior to 1986 and, from 1987 to 1989, reforms affected the distribution (including the standard deduction reform).
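Two pieces of the adjustment described in this section lend themselves to a short numerical sketch: the growth rate computed as a ratio of weighted average deductions on a sample restricted to itemizers far from the threshold, and the two-year compounding of a small over-correction. The code below is a hedged illustration only. The function names are hypothetical, the symbols `inflation` and `error` stand in for the constant inflation rate and the over-correction, and applying one nominal cutoff to both years is a simplifying assumption rather than the paper's exact procedure.

```python
def weighted_mean(sample):
    """Weighted average of (deduction, weight) pairs."""
    total_weight = sum(weight for _, weight in sample)
    return sum(amount * weight for amount, weight in sample) / total_weight


def deduction_growth_rate(pre_year, post_year, new_standard_deduction,
                          bin_width=500.0, min_bins_away=10):
    """Ratio of weighted mean deductions, using only itemizers far from the threshold.

    Assumption: the same cutoff (new standard deduction + 10 bins) is applied
    to both years so that switchers near the threshold do not bias the ratio.
    """
    cutoff = new_standard_deduction + min_bins_away * bin_width
    pre_far = [(a, w) for a, w in pre_year if a >= cutoff]
    post_far = [(a, w) for a, w in post_year if a >= cutoff]
    return weighted_mean(post_far) / weighted_mean(pre_far)


def two_year_overcorrection(inflation, error):
    """Excess adjustment after compounding an over-corrected rate over two years."""
    return (1 + inflation + error) ** 2 - (1 + inflation) ** 2


if __name__ == "__main__":
    pre = [(10_000, 1), (20_000, 1), (5_500, 5)]    # toy (deduction, weight) pairs
    post = [(11_000, 1), (22_000, 1), (5_600, 3)]
    print(deduction_growth_rate(pre, post, new_standard_deduction=5_000))
    # Compare the compounded error with the closed form error*(2 + error + 2*inflation).
    print(two_year_overcorrection(0.04, 0.01), 0.01 * (2 + 0.01 + 2 * 0.04))
```

In the toy data the near-threshold returns are excluded from both years, so the ratio reflects only the growth of deductions among taxpayers who are unlikely to switch.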

3.4 Recovering the Counterfactual Distribution Using the Reform Years

I focus on the 1988 reform in the rest of the paper because it provides the most precise cost estimate given that there were only three tax brackets.

To calculate the cost of itemizing deductions, I use the following approach. I first create bins of a given size. For the 1989 reform, I use a bin size of $500. I then calculate the weighted frequency of individuals located in those bins. I subtract the size of the 1989 bin from the size of the corresponding bin in 1987 after adjusting the amounts to account for inflation (see previous section). This approach allows me to measure the percentage of individuals that claim the standard deduction even though their total itemized deductions exceed the standard deduction amount by multiples of $500. Once I get those percentages, I need to adjust the 1987 distribution as it might be distorted by its proximity to the standard deduction amount. For clarity I associate each bin with a number that denotes its distance from the standard deduction amount. For example, in 1987 the standard deduction amount is $3,750. This means that bin [3750, 4250] will be called bin number 1 in 1987 and bin [4750, 5250] will be called bin number 3 in 1987. Bins in 1989 are defined in a similar way relative to the standard deduction amount of $5,200: bin [5200, 5700] is bin number 1 and bin [6200, 6700] is bin number 3. To perform the adjustment, I consider the last bin in 1989 for which the 1987 and 1989 curves are not overlapping. Figure 6 shows that these are bin number 6 in 1989 and bin 8 in 1987. The difference between 1987 and 1989 for this bin would give me the true distribution provided that bin number 8 in 1987 is not distorted. To check whether bin number 6 in 1987 gives me the true density or not, I turn to the 1989 distribution, look at the distortion for bin number 8 in 1989 and compare it to bin number 10 in 1987. I find that the two curves are overlapping in bin number 8 in 1989. This means that the standard deduction is unlikely to distort the behavior of a taxpayer who is located 6 bins away from the standard deduction.
I can therefore safely infer that the difference between 1987 and 1989 corresponds to the true distortion for this bin. I repeat this process for every single bin until I reach the very first bin. Starting with bin number 4 in 1987, the 1987 distribution does not provide me with the true counterfactual anymore. I previously calculated the true distortion that occurs at bin number 4 by comparing the 1987 and 1989 distributions at bin number 4 in 1989. Denote that distortion by d%. I can use this distortion to correct the 1987 distribution and form the counterfactual by adding d% to the 1987 distribution in bin number 4. This process allows me to reconstruct the counterfactual when the 1987 distribution is distorted.
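The bin-numbering convention above (bin 1 starts at the standard deduction amount, with $500 bins) can be made concrete with a small helper. This is an illustrative sketch with hypothetical function names; treating each bin as closed on the left and open on the right is an assumption, since the text does not say which bin a boundary value belongs to.

```python
def bin_number(deduction: float, standard_deduction: float,
               bin_width: float = 500.0) -> int:
    """1-based distance (in bins) of a deduction from the standard deduction."""
    if deduction < standard_deduction:
        raise ValueError("deduction is below the standard deduction")
    return int((deduction - standard_deduction) // bin_width) + 1


def bin_range(number: int, standard_deduction: float, bin_width: float = 500.0):
    """Lower and upper dollar bounds of the bin with the given number."""
    lower = standard_deduction + (number - 1) * bin_width
    return lower, lower + bin_width


if __name__ == "__main__":
    print(bin_range(1, 3_750), bin_range(3, 3_750))   # 1987 bins from the text
    print(bin_range(1, 5_200), bin_range(3, 5_200))   # 1989 bins from the text
    print(bin_number(5_000, 3_750))                   # a 1987 return with 5,000
```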


I explain this process using a hypothetical example for illustrative purposes. I generate an undistorted hypothetical density of deductions in figure 13. Each bin size is equal to $100. I assume that the cost distribution in the population is given by the following:

• 40% have a cost lower than 100*MTR
• 70% have a cost lower than 200*MTR
• 85% have a cost lower than 300*MTR
• 95% have a cost lower than 400*MTR

I introduce a standard deduction in the second bin in figure 15 and apply the cost outlined above to the density. The histograms labeled "distorted" are the ones that are empirically observed and reported in figures 1, 2, 3, 4 and 5. To calculate the cost distribution in this scenario I would simply compare the percentage difference between the true density and the distorted one. Unfortunately the true one is unobserved. This is why I use a reform. Figure 18 assumes that the cost distribution is the same and introduces a reform that increases the standard deduction amount by $200 (2 bins).

I denote by \(d_i\) the distortion introduced by the standard deduction in bin i. 40% of the population has a cost that is smaller than 100*MTR. This means that 1 - 40% = 60% will claim the standard deduction in the first bin. This implies that the first bin is distorted by 60%, i.e. \(d_1 = 60\%\). Similarly, \(d_2 = 30\%\), \(d_3 = 15\%\), \(d_4 = 5\%\) and \(d_i = 0\) for any \(i > 4\). I will show that using the method outlined in the previous paragraph I can recover these numbers.

Denote by \(b^t_i\) the bin density, where i is the distance (in bins) to the standard deduction and t is the year. Year t corresponds to the pre-reform year and year t + 1 to the post-reform year. When overlapping the deduction densities for year t and year t + 1, \(b^t_i\) will be on top of \(b^{t+1}_{i-2}\) because the standard deduction jumps by 2 bins when the reform happens in my hypothetical scenario. If \(b^t_i - b^{t+1}_{i-2} = 0\) then \(d_{i-2} = 0\). I then use backwards induction and start with the first undistorted bin. In the theory graph it corresponds to bin 7 in year t:

• \(b^t_7 - b^{t+1}_5 = 0\) implies that \(d_5 = 0\). This means that for both year t and t + 1, \(b_5\), \(b_6\), \(b_7\) etc. are undistorted, as \(d_5 = 0\) means that nobody has a cost greater than 500*MTR. This also means that I can use \(b^t_6\) as the counterfactual to calculate \(d_4\).

• \(b^t_6 - b^{t+1}_4 = 5\%\) implies that \(d_4 = 5\%\). Given that \(b^t_6\) is the true density, I can use \(b^t_4 / d_4\), that is \(b^t_4\) corrected for the distortion \(d_4\), as the true counterfactual to calculate \(d_2\).

• \(b^t_5 - b^{t+1}_3 = 15\%\) implies that \(d_3 = 15\%\). I can use \(b^t_3 / d_3\) as the true counterfactual to calculate \(d_1\).

• To calculate \(d_2\) I need to use \(b^t_4\). But I know from the second bullet point that \(b^t_4\) is distorted. This implies that the counterfactual density that I need to use to calculate \(d_2\) is \(b^t_4 / d_4\) rather than \(b^t_4\). Hence, \(d_2 = b^t_4 / d_4 - b^{t+1}_2 = 30\%\).

• Similarly, to calculate \(d_1\) I need to use \(b^t_3\). But I know from the third bullet point that \(b^t_3\) is distorted. This implies that the counterfactual density that I need to use to calculate \(d_1\) is \(b^t_3 / d_3\) rather than \(b^t_3\). Hence, \(d_1 = b^t_3 / d_3 - b^{t+1}_1 = 60\%\).

This example shows that I am able to recover the true (unobserved) density by using the pre-reform and post-reform densities. The distortions \(d_1\), \(d_2\) etc. allow me to calculate the distribution of the cost. The average cost is the first moment of this distribution. Using the 1988 reform, I find that it is equal to $379 in 1989 dollars ($713 in 2014 dollars).
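The backward induction in the hypothetical example can be simulated in a few lines. The sketch below is an interpretation rather than the paper's implementation: it assumes that a distortion \(d_i\) removes the share \(d_i\) of the true mass in bin i (observed = true × (1 − d_i)), so that a distorted pre-reform bin is corrected by dividing by (1 − d_i). The toy density and all function names are hypothetical.

```python
def apply_distortion(true_density, distortion, first_bin=0):
    """Observed masses when the standard deduction sits just below `first_bin`.

    `distortion` maps distance-in-bins (1-based) to the share of taxpayers
    who claim the standard deduction instead of itemizing.
    """
    return [mass * (1.0 - distortion.get(distance, 0.0))
            for distance, mass in enumerate(true_density[first_bin:], start=1)]


def recover_distortion(pre, post, shift):
    """Backward induction over bins, starting from the farthest post-reform bin.

    pre[k-1] and post[k-1] are observed masses at distance k from each year's
    standard deduction; the reform moves the threshold up by `shift` bins.
    Assumption: the last `shift` pre-reform bins are undistorted.
    """
    n = len(post)
    d_hat = [0.0] * (len(pre) + 1)       # d_hat[k] for distance k, index 0 unused
    for j in range(n, 0, -1):
        # Pre-reform bin j+shift covers the same dollars as post-reform bin j;
        # undo its own (already recovered) distortion to get the counterfactual.
        counterfactual = pre[j + shift - 1] / (1.0 - d_hat[j + shift])
        d_hat[j] = 1.0 - post[j - 1] / counterfactual
    return d_hat[1:n + 1]


if __name__ == "__main__":
    true_density = [100, 90, 80, 70, 60, 50, 40, 30, 20, 10]   # toy masses
    distortion = {1: 0.60, 2: 0.30, 3: 0.15, 4: 0.05}          # from the example
    pre = apply_distortion(true_density, distortion, first_bin=0)
    post = apply_distortion(true_density, distortion, first_bin=2)
    print([round(d, 4) for d in recover_distortion(pre, post, shift=2)])
```

Under these assumptions the procedure returns the distortions of the example (60%, 30%, 15%, 5%, then zero) using only the two observed densities, which is the point of the illustration.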

3.5 Anatomy of the Cost

Itemizing deductions is a 2-step process. First, the taxpayer has to keep a record of all the expenses she wants to deduct during the year that she is filing taxes for, call it year t. Second, she has to file a separate form when itemizing, called schedule A. The vast majority of taxpayers itemize three types of deductions:

• State and local income taxes: these are taxes paid in year t to the state or to the locality. They are reported on the W2 received in January of year t + 1.

• Mortgage interest: this is the interest paid to finance the main or second home of the taxpayer. It is reported on form 1098, which is received in January of year t + 1.

• Charitable donations: any payment made for charitable purposes, including to religious institutions. These payments are not subject to third-party reporting. The taxpayer has to keep a record of her own receipts.

In addition, some taxpayers also claim other taxes (real estate or sales taxes in some years), other interest expenses (credit-card interest in some years), casualty or theft losses, medical and dental expenses and miscellaneous deductions. Schedule A is relatively easy to fill out, especially if the taxpayer only needs to itemize the most common deductions outlined above. All she has to do is copy numbers from the 1098, W2 or charitable contribution receipts, sum them up and copy the sum onto the 1040 form. There are no complicated tax schedules nor intricate tax operations. Record keeping is more time consuming, as one has to archive the various evidence of expenses to be able to recover them when the tax season arrives. It is however easier to keep track of deductions that are third-party reported, given that taxpayers receive the W2 and 1098 in January of year t + 1.

3.6 Cost Estimates

3.6.1 The 1988 reform

The 1988 reform provides the most precise cost estimate: there were no reforms affecting deductions in 1988 or 1989, and the only reforms affecting the 1987 distribution do not have lagged effects (discussed later). I also estimate the cost using the 1971-1972 and 2003 reforms, but the 1988 reform cost estimates are the most accurate ones. My estimates show that taxpayers are willing to forego $378 in 1989. The average wage of the taxpayers whom I identify as foregoing tax benefits ranges from $8 for the lowest income group to $14 for the highest income one. A back-of-the-envelope calculation implies that taxpayers perceive that itemizing requires more than 20 hours.

Every year, the IRS provides cost estimates for each tax form, including both the time required to fill out the form and to keep track of the receipts. In 1989, the IRS estimates that the average taxpayer needs 1 hour and 1 minute to fill out schedule A, 2 hours and 47 minutes for record keeping, 26 minutes to learn about the form and 20 minutes to copy and assemble the documents before sending them to the IRS. This totals 4 hours and 34 minutes. Guyton et al. (2003) describe the methods used by the IRS to calculate the cost. They explain that the IRS uses the Individual Taxpayer Burden Model, developed jointly with IBM, to calculate the cost of filing taxes. The IRS feeds estimates from surveys into the model, which produces the cost estimates. However, the specifics of the model are unclear.
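The comparison between the IRS figure and the revealed-preference figure can be checked with a few lines of arithmetic. The sketch below is only illustrative: it assumes that the $8 and $14 wages are hourly and applies the average foregone benefit of $378 to both ends of the wage range, which is a simplification of the group-by-group calculation.

```python
# IRS 1989 time estimates for schedule A, in minutes (figures quoted in the text).
irs_minutes = {
    "filling out schedule A": 61,    # 1 hour 1 minute
    "record keeping": 167,           # 2 hours 47 minutes
    "learning about the form": 26,
    "copying and assembling": 20,
}

total_minutes = sum(irs_minutes.values())
irs_hours, irs_remaining_minutes = divmod(total_minutes, 60)

# Assumption (not stated in the text): wages are hourly and the average
# foregone benefit applies to every income group.
foregone_benefit = 378.0
implied_hours_highest_wage = foregone_benefit / 14.0
implied_hours_lowest_wage = foregone_benefit / 8.0

print(f"IRS estimate: {irs_hours} h {irs_remaining_minutes} min")
print(f"Implied hours at $14/hour: {implied_hours_highest_wage:.1f}")
print(f"Implied hours at $8/hour: {implied_hours_lowest_wage:.1f}")
```

Under these assumptions even the highest wage implies a perceived time cost several times larger than the IRS total.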

3.6.2 The 2003 reform

In 2003, the standard deduction is increased from $7,850 to $9,500. As in 1988, comparing the density of itemizers in 2002 to that in 2004 in figure 8 shows that individuals who are close to the standard deduction after the reform stop itemizing. To estimate the cost of itemizing, I use an approach similar to the one outlined above. There are however some caveats to using the 2003 reform:

• In 2003, the government allowed taxpayers to deduct the higher of state and local income taxes and state sales taxes. This is likely to bias the cost estimates downwards as deductions are likely to increase overall.
• The proportion of electronic filers increases significantly from 2002 to 2004, possibly because of the technological expansion of the early 2000s. If e-filing reduces the cost of complying with taxes then it is also likely to bias the cost estimates downwards. This is both a limitation and an opportunity: I will use it later to estimate the effect of e-filing on the cost of compliance.
• There is heterogeneity in the marginal tax rate for individuals close to the standard deduction threshold: they face a marginal tax rate of 10%, 15%, 25% or 28%. Whereas in 1988 I could use one marginal tax rate, here I have to take an average of the marginal tax rates to calculate the foregone benefits.

The estimated cost is equal to $191. There is a strong relationship between income and foregone benefits. This means that to compare it to the 1988 cost estimate, I need to adjust for the fact that individuals who are close to the standard deduction in 2003 are relatively poorer.

4 What Are the Main Drivers of the Cost?

Taxpayers are willing to forego large tax benefits to avoid having to itemize. This behavior reveals that the cost of itemizing is fairly large. What drives such a high cost? Itemizing involves two tasks: record keeping and filling out schedule A. In this section, I show that record keeping carries more weight in the decision to itemize than the task of filling out schedule A. In addition, there is a strong relationship between income and foregone benefits, which is consistent with the idea that richer individuals value their time more. Taxpayers in states with no state income taxes forego fewer benefits than taxpayers who live in states with state income taxes. Moreover, taxpayers with newborns, with a low ratio of state income taxes and mortgage payments and who use tax preparers are more likely to switch to the standard deduction.

4.1 Record Keeping

At any point in time, taxpayers have access to tax preparers. For a certain sum of money, the taxpayer can get a tax specialist to fill out her 1040 and schedule A forms. However, the tax preparer cannot perform the record keeping for her. The tax preparer fee therefore provides an upper bound on the cost of filling out schedule A for the taxpayer: if the cost of filling out schedule A is larger than the fee, she can go to a tax preparer. I can identify this fee in my dataset: individuals who itemize their deductions are allowed to deduct the tax preparer fee from their income. The average tax preparer fee for individuals who file the 1040 and schedule A but not schedule B, C, D etc. is $19. This is the fee for filling out both the 1040 form and schedule A, which means that $19 is a generous upper bound. If the foregone amount of money were driven by the cost of filling out schedule A, then taxpayers would have the outside option of paying someone to perform this task and - for some of them - save large sums of money. This suggests that any cost in excess of $19 should be attributed to record keeping. Since the estimated cost is equal to $378, most of it is indeed due to record keeping.
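The bound described above amounts to a one-line decomposition. The following sketch simply restates it with the figures quoted in the text; the variable names are illustrative.

```python
# Figures from the text, in 1989 dollars.
estimated_cost = 378.0   # average foregone benefit
preparer_fee = 19.0      # upper bound on the cost of filling out the 1040 and schedule A

# Whatever is not explained by form filling is attributed to record keeping.
record_keeping_lower_bound = estimated_cost - preparer_fee
record_keeping_share = record_keeping_lower_bound / estimated_cost

print(f"Record keeping accounts for at least ${record_keeping_lower_bound:.0f} "
      f"({record_keeping_share:.0%} of the estimated cost)")
```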

4.2 Income and Foregone Benefits?

Itemizing deductions takes time and if time is money then rich taxpayers should forego more deductions than poor ones. Using the earnings of taxpayers in the dataset I can verify this. I break down the sample into ten deciles of AGI. But because this would significantly reduce the sample size and might make my results significantly noisier, I consider a distribution around each AGI decile threshold. For example, the lowest AGI group consists of every individual with AGI below the second decile threshold, while the second group consists of individuals with an AGI between the first and the third AGI decile thresholds, etc. Notice that some individuals will simultaneously belong to two groups: for example individuals whose AGI falls in the second AGI decile will belong both to the first group (AGI below the second decile threshold) and to the second group (AGI greater than the first decile threshold but smaller than the third decile threshold). This overlap is not a concern because the goal of this breakdown is to graph the relationship between AGI and foregone benefits. The precise location of a point in the AGI/foregone benefit space is of no particular importance. It only matters in depicting the general trend of the relationship.

Once the groups are constructed, I am able to calculate the foregone benefit for every one of the groups by repeating the same procedure developed in the previous sections: I compare the distribution in 1987 to that in 1989, reconstruct the counterfactual distribution of itemized deductions and calculate the foregone benefit distribution by comparing the counterfactual distribution to the true one. The first moment of the foregone benefit distribution gives the average foregone benefit for each group. I only report results for the first six groups because deductions and AGI are positively correlated, implying that there are very few high income individuals close to the standard deduction threshold, which does not allow me to observe the distribution before and after the reform. In figure 21, the x-axis represents the average AGI and the y-axis the average foregone benefit for each income group. The relationship is increasing: as income increases taxpayers forego more benefits, consistent with the idea that they value their time relatively more. Notice that even though itemized deductions increase with AGI, this is not what is driving the result that higher incomes forego more deductions.
Comparing the 1987 distribution to the 1989 one and only focusing on percentage differences between the two distributions in given bins allows me to rule out this concern. No matter what the level is, I only calculate the percentage difference between the two distributions and this should not be affected by the levels of the deduction.

Figure 21 shows the relationship between income and the perceived required hours to itemize. I assume that taxpayers work on average forty hours a week and fifty weeks a year. I divide their wages by the number of hours worked per year and divide this number by two when considering joint filers: when filing jointly, only one person is required to file schedule A and to collect the receipts. Even though the benefits from filing are possibly larger for joint filers, the task is not necessarily more costly since most itemizers have the same types of deductions, only higher amounts. Using a revealed preference argument, this gives me a relationship between the AGI groups and the revealed time that each group thinks itemizing requires. Although of a lower magnitude and significance, I find an increasing relationship between the value of time of a given taxpayer and the AGI deciles. The estimates range between 20 hours for the lowest income group and 30 hours for the highest one. If the relationship had been constant, it would have meant that richer and poorer taxpayers perceive the decision to itemize to be equally time consuming, but here the relationship is increasing. This could be interpreted in several ways:

• It could be that richer individuals truly spend more time itemizing because they have more deductions. It is true that rich individuals have higher amounts of deductions but it is unlikely that they require more time to itemize them. The cost of itemizing is mostly fixed and does not generally increase in the amount of the deduction. If one taxpayer has $10,000 worth of mortgage interest, she will most likely spend the same amount of time itemizing it as a taxpayer who has $100,000, since they have to spend the same amount of time archiving the forms and entering the numbers on schedule A.
• It is likely that what this relationship reveals is that individuals have different preferences over filing their taxes relative to working an extra hour at their regular jobs. What I am calculating is the marginal rate of substitution (MRS) between an hour of work and an hour of filing taxes. The MRS could be different for two different individuals for two reasons: they enjoy working at their regular job equally but dislike filing taxes differently, or they enjoy working at their jobs differently but dislike filing taxes equally. The most plausible story is that better paying jobs are usually more fulfilling and that individuals equally dislike filing taxes, which explains why the revealed value of time for rich households is higher than for poor households.
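The conversion from foregone dollars to perceived hours described above can be sketched as follows. This is a minimal illustration under the stated working-time assumption (forty hours a week, fifty weeks a year); the function names and the example wage figures are hypothetical and are not taken from the dataset.

```python
HOURS_PER_WEEK = 40
WEEKS_PER_YEAR = 50
HOURS_PER_YEAR = HOURS_PER_WEEK * WEEKS_PER_YEAR  # 2,000 hours


def hourly_value_of_time(annual_wages, joint_filer):
    """Hourly wage implied by annual wages; halved for joint filers,
    since only one spouse needs to handle schedule A."""
    hourly = annual_wages / HOURS_PER_YEAR
    return hourly / 2 if joint_filer else hourly


def perceived_hours(foregone_benefit, annual_wages, joint_filer):
    """Hours of work the taxpayer is revealed to equate with itemizing."""
    return foregone_benefit / hourly_value_of_time(annual_wages, joint_filer)


# Hypothetical examples: a single filer earning $28,000 and a joint return
# reporting $56,000 both have a $14 hourly value of time.
single = perceived_hours(378.0, 28_000.0, joint_filer=False)
joint = perceived_hours(378.0, 56_000.0, joint_filer=True)
print(single, joint)
```

Halving the wage for joint filers doubles the implied hours for a given foregone benefit, which is why the filing status matters for the comparison across groups.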

4.3 Who Is More Likely to Switch to the Standard Deduction?

4.3.1 Identification Strategy

I use the panel dataset to identify the reasons that make a taxpayer more likely to switch to the standard deduction. I focus on taxpayers who itemize deductions in year t and observe their decisions in year t + 1 by creating a dummy variable that equals 1 if the taxpayer switches to the standard deduction in year t + 1. I also make sure to drop individuals who have to file other schedules (B, C etc.) as they could bias the results. I do not consider individuals who switch from claiming the standard deduction to itemizing because this decision is not as easily available as the opposite one. A person with deductions in excess of the standard deduction threshold can easily decide between itemizing or not. But a person who is claiming the standard deduction is likely to have too few deductions in total to be able to itemize. Standard errors are clustered at the individual level. Clustering at the state level yields similar results. I regress the variable that indicates that the individual is switching to the standard deduction on several variables of interest that I explain below. I also control for the level of deductions in year t, a polynomial of AGI, marital status, year fixed effects and state. The results are reported in table 2.

4.3.2 Newborns

Childbirth constitutes a good test of how time availability affects the decision to itemize. Having a newborn drastically reduces the amount of time available. To test this effect, I construct a dummy variable that is equal to 1 if a household has a newborn during year t by observing whether an additional child dependent is claimed on the tax return. I also construct a dummy variable indicating how close the taxpayer was to the standard deduction threshold. The variable is equal to one if she was less than 6 bins away from the standard deduction the previous year. This variable is important as it identifies individuals who are more likely to be treated by the natural experiment. I regress the variable indicating whether the person switches from being an itemizer to claiming the standard deduction on the dummy variables that I constructed above as well as the interaction term between those variables and the one that indicates whether the person is close to the standard deduction. The results are reported in table 2, columns 1, 2 and 9. I find a significant and positive coefficient for the interaction term of being close to the standard deduction threshold and having a newborn. Adding controls does not change the magnitude of the estimate or the standard errors. A taxpayer who is close to the standard deduction threshold and who has a newborn is 9% more likely to switch to the standard deduction. Notice that this effect accounts for the fact that income and total deductions tend to increase following birth. If itemizing were not a time consuming task, this result would be rather counter-intuitive: prior to having a child one should expect incomes to increase and, since deductions tend to be positively correlated with income, one should observe a higher probability of itemizing. Here I observe the opposite. This means that, ceteris paribus, childbirth increases the chances of switching to the standard deduction for itemizers precisely because newborns are time consuming and require a lot of attention that might be drawn from other tasks that are not as urgent. This analysis shows that an exogenous shock such as childbirth has significant effects on the decision to itemize, providing evidence that itemizing is indeed time consuming.

A complementary explanation is that childbirth introduces organizational challenges in the life of the taxpayer. A rational taxpayer should be able to foresee these challenges and schedule her time so as to be able to itemize once taxes are due. A present-biased one might not be able to do so if she is naive about her procrastination. I will return to this distinction in a later section and explain its importance from a policy perspective.

4.3.3 High Ratio of Third Party Reported Deductions

The mortgage payment deduction and the state and local income tax deduction are different from the other deductions. They are third-party reported, implying that taxpayers receive a statement in the form of a 1098 or W2 in January of year t + 1, significantly reducing the record keeping cost. It is also harder to adjust them in the short run to respond to tax incentives. I construct a dummy variable equal to 1 if more than 80% of the deductions of a given taxpayer are composed of state and local income taxes and mortgage payments. I follow the same procedure as previously outlined. The results are reported in table 2, columns 3, 4 and 9. The regression shows that a taxpayer who has a high proportion of these two deductions is 17% more likely to switch to the standard deduction when her deductions are close to the standard deduction threshold. This can be interpreted in two ways:

• It could be that the overall cost is smaller because these two types of tax deductions have a relatively lower record keeping cost, since both the W2 and the 1098 are received in January of year t + 1, closer to the tax filing season. The fact that the record keeping cost is smaller if the receipts are sent closer to the tax filing season suggests that forms are harder to find or more likely to get lost as time elapses, possibly because they are not properly archived.
• Alternatively it could be that these deductions are hard to adjust: a taxpayer cannot reduce her mortgage payments or income as readily as she can reduce her charitable donations. This suggests that the response of the treated taxpayers is real: when they start claiming the standard deduction, they also reduce their charitable donations.

4.3.4 Tax Preparers

Tax preparers are readily available and provide the taxpayer with assistance to file her return. They also provide help in choosing the best options when filing taxes and ensuring that the taxpayer is optimizing. However, they do not make the task of record keeping any easier. Are taxpayers who are using the services of tax preparers more likely to itemize deductions? To address this question, I use a similar approach to the one outlined above. I create a dummy variable that identifies whether a person switches from being an itemizer to claiming the standard deduction and regress it on various observable characteristics and on a dummy variable that is equal to 1 if the person is preparing the return herself, and interact it with a variable that determines if the individual is close to the standard deduction threshold.

Who uses tax preparers? Three types of individuals: low-income households who can get their refund faster when using tax preparers, households with complicated tax returns, and households whose value of time is larger than the fee that they have to pay to the tax preparers. The taxpayers who itemize deductions are unlikely to be from low-income households, simply because deductions are mostly constituted of items that are strongly correlated with income (home mortgage, state and local taxes and charitable contributions). To control for individuals who are visiting tax preparers because of the complexity of their tax return, I drop any person who files any other schedule but schedule A. Those include individuals who have capital gains or dividends, or individuals who have profits or losses from farming etc. These schedules are significantly more complicated and a visit to the tax preparers might be necessary even for the most tax-savvy taxpayers.

I find that taxpayers who were using tax preparers in year t are 8% more likely to claim the standard deduction in year t + 1. At first, this result can seem counterintuitive, but it is driven by the fact that tax preparers are not providing any help with record keeping and the majority of the cost is due to record keeping. In addition, taxpayers who use tax preparers are likely to be the ones who have a higher marginal disutility of filing taxes, so high that they would rather have someone else do it. This is due to their aversion to taxes or to the fact that they are generally busier. Therefore, it is not surprising that they would be more likely to claim the standard deduction, and this is consistent with my previous findings.
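The switching regressions described in this section (newborns, third-party reported deductions, tax preparers) share a common structure, which can be summarized in one equation. The linear probability form and all symbols below are notational assumptions made for illustration; the text only describes the variables in words.

```latex
% Assumed notation:
%   Switch_{i,t+1} = 1 if itemizer i in year t claims the standard deduction in t+1
%   X_{it}         = characteristic of interest (newborn, high third-party share,
%                    or self-prepared return)
%   Close_{it}     = 1 if deductions are close to the standard deduction threshold
%   Z_{it}         = controls (deduction level, AGI polynomial, marital status, state)
%   \lambda_t      = year fixed effects
\[
\mathrm{Switch}_{i,t+1}
  = \alpha
  + \theta_1 X_{it}
  + \theta_2 \mathrm{Close}_{it}
  + \theta_3 \left( X_{it} \times \mathrm{Close}_{it} \right)
  + \gamma' Z_{it}
  + \lambda_t
  + \varepsilon_{i,t+1}
\]
```

In this notation, the coefficient of interest in each subsection is the one on the interaction term, with standard errors clustered by individual.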

4.4 States With No State Income Taxes

States with no income taxes represent an exogenous variation that can further our understanding of the decision to itemize. On the one hand, not having to file state taxes means that the taxpayer is getting fewer benefits from itemizing since she cannot deduct those expenses both on the federal return and on her state return. But on the other hand, not having to file state taxes means that the taxpayer spends less time overall working on her taxes and can potentially spend more time figuring out her tax deductions: her marginal disutility from filing her taxes is lower if she does not have to file a state tax return in addition to the federal return, so she incurs less disutility from itemizing her deductions. To answer this question, I break down states by whether they collect income taxes or not. In 1989, ten states had no income taxes: Alaska, Connecticut, Florida, Nevada, New Hampshire, South Dakota, Tennessee, Texas, Washington and Wyoming. Individuals in these states represent 13% of the entire sample. If I were to carry out my main identification approach on individuals who are not subject to state taxes and compare them to those who are subject to state taxes, I could be biasing my results simply because the sample size of individuals subject to state taxes is larger. With a larger sample size, the two curves of the distribution of itemized deductions are likely to be less noisy and the curves will tend to intersect at a further point. To address this issue, I break down the states that are subject to state taxes into groups of similar size to that of the states without state taxes.

I find that individuals who reside in states with no state taxes have - on average - a lower cost than individuals who live in states with state taxes. Individuals in states with no state income taxes forego an average of $208, but individuals in states with state income taxes forego more than twice this amount ($444). This shows that the cost of filing taxes outweighs the benefits of being able to deduct one's state taxes. Essentially, individuals who do not have to file a state return are less likely to forego deductions, probably because they spend less time filing their taxes overall since they do not have to file state taxes, and that gives them more time to work on their federal return.

5 Making Sense of the Result

Most taxpayers have 4 to 5 receipts that need to be used to itemize. In addition, they have to fill out schedule A. This is one of the easiest schedules as it only requires taxpayers to enter numbers and sum them up, and is unlikely to exceed an hour of work. I estimated that a taxpayer perceives the task of itemizing as requiring more than 20 hours of work. This in turn implies that each receipt requires an average of 4 hours of record keeping. This back-of-the-envelope calculation suggests that taxpayers are making a behavioral mistake when filing their taxes. The following model offers a possible explanation of why the cost could be this high. It builds upon O'Donoghue and Rabin (1999) and O'Donoghue and Rabin (2008). The model relies on the idea that the cost of record keeping continuously increases for every day that the receipt is not archived after it is received. Receipts that are not archived can be lost or it could take more time to look for them. Knowing this, the rational taxpayer archives her receipts as soon as she gets them. The naive taxpayer knows that it will be more costly to archive the receipt the next day but because of her time inconsistency, she procrastinates on it, leading to a large cost.

5.1 Setting

Assume for simplicity that the taxpayer only needs to itemize one deduction, for example for a charitable contribution she made. Then the taxpayer is facing two distinct costs when considering the decision to itemize deductions. The first one is that of record keeping, denoted here by $c$. The second one is filling out schedule A itself, which is denoted by $k$. Assume that the taxpayer has $N$ periods to perform the two tasks, that they have to be performed on two separate days and that record keeping has to be done before filling out schedule A. This means that the last period in which record keeping can be performed is $N - 1$ whereas filling out schedule A can be performed no later than in period $N$. If the taxpayer succeeds in performing the two tasks she receives a one time benefit $V$. Once the taxpayer gets the receipt for her charitable contribution, she can decide to archive it immediately by incurring a cost $c$ or archive it later and incur a larger cost $c(1 + r)$ in the next period. I denote by $\delta$ the time-discount factor, $\beta$ the present-bias parameter, $t$ the period in which the record keeping is performed and $t + u$ the period in which schedule A is filed. In what follows, I use two definitions:

Definition 1: For given $\beta$, $\delta$, $c$, $k$, $(1 + r)$, $t$ and $u$, a task is said to be $\beta$-worthwhile if $-c(1 + r)^{t-1} + \beta\delta^{u}(V - k) > 0$.

Similarly:

Definition 2: For given $\delta$, $c$, $k$, $(1 + r)$, $t$ and $u$, a task is said to be $\delta$-worthwhile if $-c(1 + r)^{t-1} + \delta^{u}(V - k) > 0$.

5.2 The Rational Taxpayer

The rational taxpayer has a standard utility function where per-period utility is discounted by $\delta$ in the future. The total utility is given by

$$U = u_0 + \sum_{i} \delta^{i} u_i$$

The decision to itemize or claim the standard deduction can be written as follows:

$$\max_{t,u} \; \delta^{t}\left(-c(1 + r)^{t-1} + \delta^{u}(V - k)\right)$$

Cost $c$ is incurred as soon as the taxpayer starts the record keeping. If she waits an additional $u$ periods before filling out schedule A the cost of record keeping is multiplied by $(1 + r)$.

Assuming that $V - k > 0$, i.e. that the benefit is large enough to justify filling out schedule A, the taxpayer would want to perform this task as soon as possible, which means having $u = 1$. The taxpayer is left with choosing $t$ such that:

$$\max_{t} \; \delta^{t}\left(-c(1 + r)^{t-1} + \delta(V - k)\right)$$

Assume the taxpayer is contemplating the decision to perform the record keeping task in the first period, giving her $-c + \delta(V - k)$. She will only perform it if $-c + \delta(V - k) > 0$. And if she waits an additional period she will receive $\delta\left(-c(1 + r) + \delta(V - k)\right)$, which is smaller than the utility she would have enjoyed if the task had been performed in the first period. This means that the rational taxpayer will either perform the tasks immediately or never perform them. And she only performs the tasks if the project is $\delta$-worthwhile.

5.3 The Present-Biased Taxpayer

The present-biased taxpayer maximizes the following utility function:

$$u_0 + \beta \sum_{i} \delta^{i} u_i$$

Which in this context translates to:

$$\max_{t,u} \; \delta^{t}\left(-c(1 + r)^{t-1} + \beta\delta^{u}(V - k)\right)$$

Since benefit $V$ and cost $k$ are both incurred in the same period, the taxpayer will not procrastinate on performing the second task. This means that $u = 1$. The present-biased taxpayer is maximizing:

$$\max_{t} \; \delta^{t}\left(-c(1 + r)^{t-1} + \beta\delta(V - k)\right)$$

First, the task should be $\beta$-worthwhile, i.e. it should be profitable for the present-biased taxpayer to perform the task now. This happens when the following condition is satisfied:

$$-c + \beta\delta(V - k) > 0$$

This condition simply states that the cost today should be smaller than the discounted net benefit tomorrow. Second, she can perform the record keeping now or she can wait and perform it next period. She will prefer performing it next period if the following inequality is satisfied:

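$$\delta^{t}\left[-c(1 + r)^{t-1} + \beta\delta(V - k)\right] < \beta\delta^{t+1}\left[-c(1 + r)^{t} + \delta(V - k)\right]$$

Which can be rewritten as:

$$\beta\delta(1 - \delta)(V - k) < c(1 + r)^{t-1}\left[1 - \beta\delta(1 + r)\right]$$

For $\delta$ close to one, this inequality simplifies to:

$$\beta < \frac{1}{1 + r}$$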

The taxpayer will procrastinate on archiving the charitable donation receipt if the (mis-)perceived benefit of waiting an additional period is greater than the increase in archiving cost. Provided that it holds in period t = 0, the condition will hold in any subsequent period t > 0, meaning that if the task is worthwhile but not performed in the very first period, the taxpayer will procrastinate on it until the deadline. Standard present-bias models show that with a strict deadline, naifs are likely to procrastinate on completing a task but will perform it with certainty in the very last period. Filing taxes has a strict deadline (April 15th) but I will show in what follows that - under certain conditions - the naif will never perform the task simply because the expected benefit $V$ has become too low for the task to be $\beta$-worthwhile. The intuition is the following:

1. In the first stage, itemizing deductions is both $\beta$- and $\delta$-worthwhile. The present-biased taxpayer would profit from performing the task now, but believes she would profit more from doing it later and keeps on postponing the task to the next period.
2. In the second stage, the expected value of itemizing has decreased enough to make the task not $\beta$-worthwhile anymore but it is still $\delta$-worthwhile. The taxpayer would not want to perform the task now but still believes that she will do it tomorrow.
3. In the third stage, the task is neither $\beta$- nor $\delta$-worthwhile. The taxpayer does not want to perform it, either now or later.

For this to happen, we need the task to stop being $\beta$-worthwhile and still be $\delta$-worthwhile in the next period. This is verified when the two following inequalities hold:

$$-c(1 + r)^{t-1} + \beta\delta(V - k) < 0$$

$$-c(1 + r)^{t} + \delta(V - k) > 0$$

Which can be rewritten as follows:

$$\left(\frac{\beta\delta(V - k)}{c}\right)^{\frac{1}{t-1}} - 1 < r < \left(\frac{\delta(V - k)}{c}\right)^{\frac{1}{t}} - 1$$

Under this condition, the taxpayer keeps procrastinating on archiving the receipt until the cost of doing it is too large to justify itemizing altogether. This result holds even for a relatively small $c$ and predicts that large benefits are foregone even if the initial cost would not justify it for a rational taxpayer.
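The procrastination mechanism can be illustrated with a short simulation. This is a sketch under assumptions, not the paper's own computation: it takes the worthwhile condition to be $-c(1+r)^{t-1} + \beta\delta(V-k) > 0$ with schedule A filed one period after archiving, it models the naive taxpayer as comparing acting now with acting next period, and all parameter values are hypothetical.

```python
def value_of_acting_now(t, c, r, beta, delta, V, k):
    """Value, seen from period t, of archiving in t and filing one period later."""
    return -c * (1 + r) ** (t - 1) + beta * delta * (V - k)


def value_of_waiting(t, c, r, beta, delta, V, k):
    """Value, seen from period t, of archiving in t + 1 and filing in t + 2."""
    return beta * delta * (-c * (1 + r) ** t + delta * (V - k))


def archiving_period(c, r, beta, delta, V, k, N):
    """Period in which a naive taxpayer archives the receipt, or None if never.

    Record keeping must be done by period N - 1 so that schedule A can be
    filed by period N. Assumes r > 0, so the archiving cost only grows.
    """
    last_period = N - 1
    for t in range(1, last_period + 1):
        now = value_of_acting_now(t, c, r, beta, delta, V, k)
        if now <= 0:
            # No longer worthwhile; with a growing cost it never will be again.
            return None
        if t == last_period:
            return t  # deadline reached: waiting is no longer an option
        if value_of_waiting(t, c, r, beta, delta, V, k) <= now:
            return t  # waiting does not look attractive, so she acts
    return None


# Hypothetical parameters: the cost starts at 10 and grows 5% per period.
params = dict(c=10.0, r=0.05, delta=1.0, V=400.0, k=20.0)

never = archiving_period(beta=0.9, N=100, **params)          # long horizon
at_deadline = archiving_period(beta=0.9, N=50, **params)     # short horizon
time_consistent = archiving_period(beta=1.0, N=100, **params)
print(never, at_deadline, time_consistent)
```

With these illustrative values the present-biased taxpayer satisfies $\beta < 1/(1+r)$ and postpones every period. With a short horizon she archives in the last admissible period at roughly ten times the initial cost, and with a long horizon the cost overtakes the discounted benefit before the deadline so she never itemizes. The time-consistent taxpayer ($\beta = 1$) archives in the first period.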

5.4 The Partially Naive Taxpayer

The partially naive taxpayer has self-control problems but is aware of them. She is able to look forward, solve the problem backwards and realize that at some point, the task will stop being $\beta$-worthwhile, prompting her to perform the task before it is not worthwhile anymore. Therefore her behavior is similar to that of the rational taxpayer.

5.5 Psychic Cost vs True Cost

My identification strategy shows that the cost of complying with taxes is very high and certainly higher than that calculated by the IRS. If taxpayers are truly not making any behavioral mistakes then the cost that I estimated is the true cost of tax collection, suggesting that it is highly inefficient and needs reforming. On the other hand, if the true model of the taxpayer's behavior is one in which the taxpayer makes a behavioral mistake by being time inconsistent, then one should be careful when defining the cost of complying with taxes. The cost that I identified is composed of the true cost of itemizing and a behavioral cost. This also means that the tax collection process is not necessarily inefficient but rather that the behavior of the taxpayer drives the inefficiency. If the government wants to reduce the cost of complying with taxes, it should not focus on the process itself (having a standard deduction rather than itemized deductions) but should focus on fixing the taxpayer's biases.

6 Alternative Explanations

6.1 Role of Information and Cognitive Abilities

The increase in the standard deduction amount leads some taxpayers who were itemizing deductions to start claiming the standard deduction instead. This means that the taxpayers who are foregoing tax benefits after the reform are well informed of the possibility of itemizing deductions and have the cognitive abilities to do so.

6.2 Audit Probabilities

Could it be that taxpayers believe that itemizers are more likely to be audited than individuals claiming the standard deduction? The probabilities of audit for this portion of the population are lower than 1% and are virtually the same for individuals whose deductions are close to the standard deduction and individuals who claim the standard deduction. Assume that an audit would require one full day of work, i.e. approximately $80 in my sample. This is smaller than the foregone benefits, implying that individuals would have to believe in an audit probability greater than 1 for the fear of audits to explain their behavior.
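The audit argument can be made explicit with a short calculation. The sketch below uses the figures from the text ($378 foregone on average, an audit costing about $80, audit probabilities below 1%) and assumes a risk-neutral taxpayer who compares the foregone benefit with the expected cost of an audit.

```python
foregone_benefit = 378.0        # average foregone benefit, 1989 dollars
audit_cost = 80.0               # one full day of work, from the text
audit_probability_bound = 0.01  # actual audit probabilities are below 1%

# Probability of audit a risk-neutral taxpayer would need to perceive for the
# expected audit cost to match the benefit she gives up.
breakeven_probability = foregone_benefit / audit_cost

# Expected audit cost at the upper bound on the actual probability.
expected_audit_cost = audit_probability_bound * audit_cost

print(f"Break-even audit probability: {breakeven_probability:.2f}")
print(f"Expected audit cost at 1%: ${expected_audit_cost:.2f}")
```

A break-even probability above 1 is not a valid probability, which is why audit risk alone cannot account for the foregone benefits under these assumptions.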

6.3 Other Reforms Affecting the Total Deduction Distribution?

Could there be any other exogenous variation affecting the distribution of itemized deductions in 1989 and contaminating my main identification strategy? It is unlikely. A first check for this is to look at the two distributions and specifically at whether they overlap on the portion that is away from the standard deduction. If they do not, then it means that something else has happened besides the increase in the standard deduction threshold, affecting everybody and not only those who are close to the threshold. Fortunately, the two distributions overlap in regions away from the standard deduction, suggesting that the only effect that I am capturing is that of the standard deduction threshold being increased. But for the sake of exhaustiveness, I also look at all the reforms that could have affected the distributions to rule them out as possible contaminants of my identification strategy. The majority of the tax reforms happened following the TRA86 and were enacted in 1987. Among those, there were some deduction reforms. When comparing 1987 to 1989, I am controlling for the TRA86 reforms. But there might be slow adjustment and lagged responses in 1988 or 1989. To rule this out, I look at the deduction reforms following TRA86 and find that it is reasonable to assume that the adjustment is immediate. The deduction reforms enacted in 1987 are the following (source: IRS):

Increase of the threshold above which medical expenses can be deducted, from 5% of one's AGI in 1986 to 7.5% in 1987. There is no reason to assume that there will be a slow adjustment in this case: the medical expense deduction amount should drop in the aggregate in 1987 because less of it is allowed, but one should not expect it to drop further after 1987.

Sales taxes are no longer deductible. For similar reasons, one should observe a drop in total deductions in 1987, as sales taxes were a large portion of them, but there should be no lagged effect: there could be a lagged effect on the aggregate purchases of individuals in 1988 and 1989, but since sales taxes are no longer allowed as a deduction, this should, by definition, not affect the level of deductions.

The home mortgage interest deduction is subject to a new limit. The home mortgage interest deductions for a given year are capped at the value of one's house (plus renovations). Anything in excess of the value of the house has to be deducted as personal interest, for which only 65% of the total value can be deducted. First, the IRS estimated that very few taxpayers were affected by this reform, since it is very rare that one's home mortgage interest in a given year exceeds the total value of one's house. Second, there is no reason to expect a drop in levels in the subsequent years. If a person truly is affected by this reform, in 1987 she will be forced to claim less in deductions than she was previously claiming. Mortgages are indeed less flexible than regular purchases, so assume that this person cannot adjust her mortgage payments for the next two years and can only do so starting from 1989. What will happen in 1989? She will have an incentive to reduce her mortgage payments to the level at which she can deduct all of them, i.e. the level from 1987. Essentially, this would result in no real change in the levels after 1987, and it can also be ruled out as a possible contaminant of my identification strategy.

There are no other reforms affecting directly or indirectly the amount of itemized deductions an individual can qualify for. Given the situation, it is reasonable to assume that the only real change to the level of itemized deductions from 1987 to 1989 is the increase in the standard deduction threshold.
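The first check described at the start of this subsection, comparing the two distributions away from the threshold, can be sketched as follows. This is a minimal illustration and not the code used in the paper: the function names, the toy data and the price ratio of 1.1 are hypothetical, and raw counts stand in for the weighted frequencies shown in the figures.

```python
def bin_counts(values, bin_width, lo, hi):
    """Count observations in bins [lo + i*bin_width, lo + (i+1)*bin_width)."""
    n_bins = int((hi - lo) // bin_width)
    counts = [0] * n_bins
    for v in values:
        if lo <= v < hi:
            counts[int((v - lo) // bin_width)] += 1
    return counts


def max_relative_gap(pre, post, first_bin):
    """Largest |pre - post| / pre over non-empty bins from first_bin onwards."""
    gaps = [abs(a - b) / a
            for a, b in zip(pre[first_bin:], post[first_bin:]) if a > 0]
    return max(gaps) if gaps else 0.0


# Hypothetical toy data in dollars; NOT taken from the tax files.
price_ratio = 1.1  # assumed 1987-to-1989 price adjustment
deductions_1987 = [5100, 5200, 6100, 10100, 10200, 15100]
deductions_1989 = [5600, 6700, 11200, 11300, 16800]

# Express the earlier year in later-year dollars before binning.
adjusted_1987 = [d * price_ratio for d in deductions_1987]
pre = bin_counts(adjusted_1987, bin_width=500, lo=5000, hi=20000)
post = bin_counts(deductions_1989, bin_width=500, lo=5000, hi=20000)

# Six bins or more above the threshold, the two years should coincide.
gap_far = max_relative_gap(pre, post, first_bin=6)
# Within six bins of the threshold, the later year should show fewer itemizers.
missing_near = sum(pre[:6]) - sum(post[:6])
print(gap_far, missing_near)
```

A large value of `gap_far` would point to a shock hitting all itemizers, whereas a gap confined to the bins near the threshold is what the increase in the standard deduction alone would produce.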

7 Policy Implications

7.1 Cost of Compliance

Policy makers have had no precise estimates of the compliance cost. Most of the literature on the compliance cost is based on survey evidence and suffers from its usual shortcomings (Slemrod and Sorum (1985) and Blumenthal and Slemrod (1992)). To my knowledge, this is the first paper to use a non-parametric approach along with administrative data to reveal the preferences of taxpayers over the compliance cost. The costs are large, informing the policy maker that the welfare lost because of compliance is of policy importance. The cost is also distortionary as it impacts individuals differently: it varies with income, location in states with no state taxes, etc.

If the taxpayer is truly present-biased, my model shows that the policy intervention should not necessarily target the collection system but rather the behavior of the individual. In light of my evidence, a policy that targets reducing the cost of filling out forms seems misguided, since the majority of the cost is due to record keeping. One approach could be to require less evidence of expenses when the taxpayer itemizes. This would prove efficient in reducing the compliance cost but is likely to result in more evasion. The policy maker has to trade off the cost that evasion imposes on society against the cost that compliance imposes on individuals.

My model also shows that there are relatively inexpensive policy interventions that can significantly reduce the cost of compliance. Advocates of pre-populated forms argue that they are likely to reduce evasion and mistakes by taxpayers. My results show that they are also likely to improve taxpayer welfare by reducing the compliance cost. Two of the three most common deductions are state and local income taxes and mortgage interest payments. Both of them are third-party reported, implying that the IRS knows the amount of deductions that the taxpayer qualifies for.

The use of electronic receipts is another channel through which record keeping costs can be further reduced. Some employers issue the W-2 online and some banks provide an electronic 1098. Keeping track of an electronic document can be much easier than keeping track of a paper one. This would only benefit taxpayers who have access to the Internet, but it would not hurt the rest and therefore constitutes a Pareto improvement.

7.2 Screening Literature

There is a long tradition in public economics that emphasizes the benefits of conditioning transfers on fixed characteristics, and more particularly of imposing transaction costs when providing welfare in order to screen richer households from applying for it. To my knowledge, there was no empirical evidence confirming that transaction costs are larger for richer households. This paper shows that this is the case and that such a policy can be efficient. It warns, however, that transaction costs need to be chosen with care, as they can be relatively large and can end up screening more income groups than is optimal. They can also screen present-biased taxpayers versus rational ones rather than richer taxpayers versus poorer ones. Similarly to Saez (2009), this paper also shows that details matter in designing a tax collection system. Details that in theory should not have any impact on decisions can result in significant behavioral distortions.

Conclusion

How heavy is the burden of tax compliance? I answer this question by non-parametrically calculating the cost of itemizing deductions. I show that it is in excess of $700 per taxpayer. Rich taxpayers have a larger cost, consistent with the fact that they have a higher hourly wage. Households with newborns are also more likely to switch to the standard deduction because they have less time available. The magnitude of the cost has important policy implications.

References
Bhargava, S. and D. Manoli (2011): "Why are benefits left on the table? Assessing the role of information, complexity, and stigma on take-up with an IRS field experiment," Tech. rep., Working Paper.

Blumenthal, M. and J. Slemrod (1992): "The compliance cost of the US individual income tax system: A second look after tax reform," National Tax Journal, 185-202.

Feenberg, D. R. and J. S. Skinner (1989): "Sources of IRA saving," National Bureau of Economic Research Cambridge, Mass., USA.

Guyton, J. L., J. F. O'Hare, M. P. Stavrianos, and E. J. Toder (2003): "Estimating the compliance cost of the US individual income tax," National Tax Journal, 673-688.

Jones, D. (2010): "Inertia and overwithholding: explaining the prevalence of income tax refunds," Tech. rep., National Bureau of Economic Research.

O'Donoghue, T. and M. Rabin (1999): "Doing it now or later," American Economic Review, 103-124.

O'Donoghue, T. and M. Rabin (2008): "Procrastination on long-term projects," Journal of Economic Behavior & Organization, 66, 161-175.

Pitt, M. M. and J. Slemrod (1991): "The compliance cost of itemizing deductions: Evidence from individual tax returns," National Bureau of Economic Research Cambridge, Mass., USA.

Rees-Jones, A. (2013): "Loss Aversion Motivates Tax Sheltering: Evidence From US Tax Returns," Available at SSRN.

Saez, E. (2009): "Details matter: The impact of presentation and information on the takeup of financial incentives for retirement saving," American Economic Journal: Economic Policy, 1, 204-228.

Slemrod, J. and N. Sorum (1985): "The compliance cost of the US individual income tax system," National Bureau of Economic Research Cambridge, Mass., USA.

Figure 1: Shallow Distribution of Deductions In the Neighborhood of the Standard Deduction 1980-1985

[Six histogram panels: (a) 1980, (b) 1981, (c) 1982, (d) 1983, (e) 1984, (f) 1985. Horizontal axis: total itemized deductions; vertical axis: frequency. The standard deduction threshold is marked at 3400 for 1980-1984 and at 3540 for 1985. The axis labels give a bin size of 200 for 1980 and 1981 and of 500 for 1982-1985.]

Notes: The figures above depict the distribution of deductions for itemizers filing jointly. The bin size is $500 and the red line depicts the standard deduction threshold for each year. Notice that the distribution is systematically shallow in the neighborhood of the standard deduction threshold.

Figure 2: Shallow Distribution of Deductions In the Neighborhood of the Standard Deduction 1986-1991

[Six histogram panels: (a) 1986, (b) 1987, (c) 1988, (d) 1989, (e) 1990, (f) 1991. Horizontal axis: total itemized deductions (bin size of 500); vertical axis: frequency. The standard deduction threshold is marked at 3670 (1986), 3760 (1987), 5000 (1988), 5200 (1989), 5450 (1990) and 5700 (1991).]

Notes: The figures above depict the distribution of deductions for itemizers filing jointly. The bin size is $500 and the red line depicts the standard deduction threshold for each year. Notice that the distribution is systematically shallow in the neighborhood of the standard deduction threshold.

Figure 3: Shallow Distribution of Deductions In the Neighborhood of the Standard Deduction 1992-1997

[Six histogram panels: (a) 1992, (b) 1993, (c) 1994, (d) 1995, (e) 1996, (f) 1997. Horizontal axis: total itemized deductions (bin size of 500); vertical axis: frequency. The standard deduction threshold is marked at 6000 (1992), 6200 (1993), 6350 (1994), 6550 (1995), 6700 (1996) and 6900 (1997).]

Notes: The figures above depict the distribution of deductions for itemizers filing jointly. The bin size is $500 and the red line depicts the standard deduction threshold for each year. Notice that the distribution is systematically shallow in the neighborhood of the standard deduction threshold.

Figure 4: Shallow Distribution of Deductions In the Neighborhood of the Standard Deduction 1998-2003

[Six histogram panels: (a) 1998, (b) 1999, (c) 2000, (d) 2001, (e) 2002, (f) 2003. Horizontal axis: total itemized deductions (bin size of 500); vertical axis: frequency. The standard deduction threshold is marked at 7100 (1998), 7200 (1999), 7350 (2000), 7600 (2001), 7850 (2002) and 9500 (2003).]

Notes: The figures above depict the distribution of deductions for itemizers filing jointly. The bin size is $500 and the red line depicts the standard deduction threshold for each year. Notice that the distribution is systematically shallow in the neighborhood of the standard deduction threshold.

Figure 5: Shallow Distribution of Deductions In the Neighborhood of the Standard Deduction 2004-2006

[Three histogram panels: (a) 2004, (b) 2005, (c) 2006. Horizontal axis: total itemized deductions (bin size of 500); vertical axis: frequency. The standard deduction threshold is marked at 9700 (2004), 10000 (2005) and 10300 (2006).]

Notes: The figures above depict the distribution of deductions for itemizers filing jointly. The bin size is $500 and the red line depicts the standard deduction threshold for each year. Notice that the distribution is systematically shallow in the neighborhood of the standard deduction threshold.

Figure 6: Distribution of Deductions for Itemizers Filing Jointly Before and After the 1988 Reform

[Histogram overlaying the 1987 and 1989 distributions. Horizontal axis: total itemized deductions (bin size of 500, adjusted to account for inflation), with the standard deduction threshold marked at 5200; vertical axis: frequency.]

Notes: The graph depicts the distribution of deductions for itemizers filing jointly. The 1987 distribution is adjusted to account for inflation in 1989. Notice that the 1989 distribution is lower than the 1987 distribution specifically in the neighborhood of the standard deduction, whereas the two distributions are very similar when comparing them further away from the standard deduction.

Figure 7: Distribution of Deductions for Itemizers Filing Jointly Before and After the 1971 and 1972 Reforms

[Histogram overlaying the 1970, 1971 and 1972 distributions. Horizontal axis: total itemized deductions (bin size of 200, adjusted to account for inflation), with ticks at 1000, 1500 and 2000; vertical axis: frequency.]

Notes: This graph depicts the distribution of deductions for all itemizers as the standard deduction was the same across filing status. The 1970 and 1971 distributions are adjusted to account for inflation. Notice that the 1970 and 1971 distributions are lower than the 1972 distribution specifically in the neighborhood of the standard deduction, whereas the distributions are very similar when comparing them further away from the standard deduction.

Figure 8: Distribution of Deductions for Itemizers Filing Jointly Before and After the 2003 Reform

[Histogram overlaying the 2002 and 2004 distributions. Horizontal axis: total itemized deductions (bin size of 500, adjusted to account for inflation), with the standard deduction threshold marked at 9700; vertical axis: frequency.]

Notes: This graph depicts the distribution of deductions for itemizers filing jointly. The 2004 distribution is adjusted to account for inflation. Notice that the 2004 distribution is lower than the 2002 distribution specifically in the neighborhood of the standard deduction, whereas the two distributions are very similar when comparing them further away from the standard deduction.

Figure 9: Lagged Response: Small Effect During Reform Year

[Two histogram panels. First panel: 1987 and 1988 distributions, with the standard deduction threshold marked at 5000. Second panel: 2002 and 2003 distributions, with the threshold marked at 9500. Horizontal axis: total itemized deductions (bin size of 500, inflation adjusted); vertical axis: frequency.]

Notes: The first graph depicts the distribution of deductions for itemizers filing jointly in 1987 and 1988 and the second one for 2002 and 2003. Notice that the missing mass is smaller than in Figures 6 and 8, showing that there is a lagged response to the reform.

Figure 10: Placebo Test: No Missing Mass In Years With No Reforms


[Four histogram panels, each overlaying two years: (a) 1990-1992, threshold marked at 6000; (b) 1991-1993, threshold marked at 6200; (c) 1992-1994, threshold marked at 6350; (d) 1993-1995, threshold marked at 6550. Horizontal axis: total itemized deductions (bin size of 500, adjusted to account for inflation); vertical axis: frequency.]

Notes: The figures above depict the distribution of deductions for itemizers filing jointly in years where no reforms affecting deductions took place. Notice that there is no missing mass in the neighborhood of the standard deduction.

Figure 11: Placebo Test: No Missing Mass In Years With No Reforms


[Four histogram panels, each overlaying two years: (a) 1994-1996, threshold marked at 6700; (b) 1995-1997, threshold marked at 6900; (c) 1996-1998, threshold marked at 7100; (d) 1997-1999, threshold marked at 7200. Horizontal axis: total itemized deductions (bin size of 500, adjusted to account for inflation); vertical axis: frequency.]

Notes: The figures above depict the distribution of deductions for itemizers filing jointly in years where no reforms affecting deductions took place. Notice that there is no missing mass in the neighborhood of the standard deduction.

Figure 12: Placebo Test: No Missing Mass In Years With No Reforms


[Two histogram panels, each overlaying two years: (a) 1998-2000, threshold marked at 7350; (b) 1999-2001, threshold marked at 7600. Horizontal axis: total itemized deductions (bin size of 500, adjusted to account for inflation); vertical axis: frequency.]

Notes: The figures above depict the distribution of deductions for itemizers filing jointly in years where no reforms affecting deductions took place. Notice that there is no missing mass in the neighborhood of the standard deduction.

Figure 13: Hypothetical Undistorted Density

[Plot of a hypothetical density. Horizontal axis: total itemized deductions; vertical axis: density.]

Notes: I assume a hypothetical distribution of itemized deductions and costs and graph the effect of the cost on the distribution. The true distribution is unobserved and corresponds to what the distribution would look like if there was no cost. The second one is the observed distribution prior to the reform and the third one is after the reform. The red vertical lines correspond to the standard deduction threshold. The first one is the pre-reform one and the second one is the post-reform one. The missing mass that I observe is the difference between the pre- and post-reform distributions on the right-hand side of the post-reform standard deduction threshold. The cost that I estimate is given by the difference between the true distribution and the post-reform distribution on the right-hand side of the post-reform standard deduction.

Figure 14: Hypothetical Undistorted Density With Standard Deduction

[Plot of a hypothetical density with the pre-reform standard deduction (SD pre reform) marked. Horizontal axis: total itemized deductions; vertical axis: density.]

Notes: Assuming that there is no cost to itemizing and there is a standard deduction, the density of itemizers should not be distorted in the neighborhood of the standard deduction and we should not observe any missing mass.

Figure 15: Effect of Cost of Itemizing On Hypothetical Density

[Plot of two hypothetical densities, labeled "true (unobserved)" and "distorted", with the pre-reform standard deduction (SD pre) marked. Horizontal axis: total itemized deductions; vertical axis: density.]

Notes: I assume a hypothetical distribution of itemized deductions and costs and graph the effect of the cost on the distribution. The true distribution is unobserved and corresponds to what the distribution would look like if there was no cost. The second one is the observed distribution prior to the reform and the third one is after the reform. The red vertical lines correspond to the standard deduction threshold. The first one is the pre-reform one and the second one is the post-reform one. The missing mass that I observe is the difference between the pre- and post-reform distributions on the right-hand side of the post-reform standard deduction threshold. The cost that I estimate is given by the difference between the true distribution and the post-reform distribution on the right-hand side of the post-reform standard deduction.

Figure 16: Effect of Cost of Itemizing On Hypothetical Density With Reform

[Plot of three hypothetical densities, labeled "true (unobserved)", "prereform density" and "postreform density", with the pre-reform (SD pre) and post-reform (SD post) standard deductions marked. Horizontal axis: total itemized deductions; vertical axis: density.]

Notes: I assume a hypothetical distribution of itemized deductions and costs and graph the effect of the cost on the distribution. The true distribution is unobserved and corresponds to what the distribution would look like if there was no cost. The second one is the observed distribution prior to the reform and the third one is after the reform. The red vertical lines correspond to the standard deduction threshold. The first one is the pre-reform one and the second one is the post-reform one. The missing mass that I observe is the difference between the pre- and post-reform distributions on the right-hand side of the post-reform standard deduction threshold. The cost that I estimate is given by the difference between the true distribution and the post-reform distribution on the right-hand side of the post-reform standard deduction.
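The mechanism behind these hypothetical densities can be illustrated with a small numerical sketch. It assumes, purely for illustration, that a taxpayer itemizes only when the tax saving from itemizing, tax_rate x (deductions - standard deduction), exceeds her cost of itemizing. The cost values, the tax rate and the two thresholds below are hypothetical and are not estimates from the data.

```python
def itemizing_share(deduction, std_deduction, tax_rate, costs):
    """Share of taxpayers at this deduction level whose tax saving exceeds their cost."""
    saving = tax_rate * (deduction - std_deduction)
    return sum(1 for c in costs if saving > c) / len(costs)


# All numbers below are hypothetical and chosen only for illustration.
costs = [0, 100, 200, 400, 800]   # assumed distribution of itemizing costs (dollars)
tax_rate = 0.25                   # assumed marginal tax rate
sd_pre, sd_post = 4000, 5000      # assumed pre- and post-reform standard deductions
levels = list(range(5000, 10001, 500))

# Observed density relative to the true (unobserved) density at each level.
pre = [itemizing_share(d, sd_pre, tax_rate, costs) for d in levels]
post = [itemizing_share(d, sd_post, tax_rate, costs) for d in levels]

# Missing mass: itemizers who switch to the standard deduction after the reform.
missing = [a - b for a, b in zip(pre, post)]

for d, a, b in zip(levels, pre, post):
    print(d, a, b)
```

In this toy example the share of itemizers falls after the reform only at deduction levels close to the new threshold, while far enough above it the pre- and post-reform shares coincide.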

Figure 17: Reconstruction of the Counterfactual Density

[Plot of the "true (unobserved)", "prereform density" and "postreform density" curves with the pre-reform (SD pre) and post-reform (SD post) standard deductions marked and two bins labeled A and B, each annotated "5 bins from SD". Annotation: "Since no distortion at bin A, no adjustment of bin B". Horizontal axis: total itemized deductions; vertical axis: density.]

Notes: This graph and the next one (Figure 18) illustrate the method that I use to reconstruct the counterfactual density (blue histograms). I consider the first bin for which the pre-reform and post-reform years overlap. At this bin there is no distortion. This means that 5 bins away from the standard deduction there should be no distortion. This also means that 5 bins away from the pre-reform standard deduction, there should be no distortion. But this also corresponds to 3 bins away from the post-reform standard deduction. This implies that 3 bins away from the post-reform standard deduction, the pre-reform density is the true density. Similarly, when looking 4 bins away from the post-reform standard deduction, I find a distortion. This implies in turn that 4 bins away from the pre-reform standard deduction, there should be a distortion of the same proportion as the one that I calculated 4 bins away from the post-reform standard deduction. I adjust the density that is 2 bins away from the post-reform standard deduction by this amount.

Figure 18: Reconstruction of the Counterfactual Density

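[Plot of the "true (unobserved)", "prereform density" and "postreform density" curves with the pre-reform (SD pre) and post-reform (SD post) standard deductions marked and bins annotated "4 bins from SD". Annotation: "Adjust bin B to correct for distortion calculated at bin A". Horizontal axis: total itemized deductions; vertical axis: density.]

Notes: See the notes to Figure 17.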
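The bin-by-bin reconstruction described in the notes to Figure 17 can be sketched as a simple recursion. This is one reading of the procedure and not the code used in the paper. It assumes that the distortion is a proportion that depends only on the distance, in bins, from the standard deduction in force, and that this proportion is the same before and after the reform. The function name, its arguments and the toy densities are hypothetical.

```python
import math


def reconstruct_true_density(pre, post, gap, undistorted_from):
    """
    pre[k], post[k]: observed densities in the bin lying k bins above the
    post-reform standard deduction (same deduction level in both years).
    gap: number of bins between the pre- and post-reform standard deductions.
    undistorted_from: distance in bins from a threshold beyond which the
    density is assumed undistorted.
    Returns (true, ratio) with ratio[k] = observed / true at distance k.
    """
    n = len(post)
    ratio = [1.0] * (n + gap)   # distortion proportion by distance to threshold
    true = [0.0] * n
    # Work from the far bins towards the threshold, so that the proportion
    # needed at distance k + gap is already known when bin k is treated.
    for k in range(n - 1, -1, -1):
        # Pre-reform, this bin sits k + gap bins above the old threshold.
        true[k] = pre[k] / ratio[k + gap]
        if k < undistorted_from:
            ratio[k] = post[k] / true[k]
    return true, ratio[:n]


# Toy example: thresholds 2 bins apart, no distortion from 5 bins onwards.
true_density = [100, 95, 90, 85, 80, 75, 70, 65]
distortion = [0.2, 0.4, 0.6, 0.8, 0.9, 1.0, 1.0, 1.0, 1.0, 1.0]
gap = 2
pre = [t * distortion[k + gap] for k, t in enumerate(true_density)]
post = [t * distortion[k] for k, t in enumerate(true_density)]

recovered, ratio = reconstruct_true_density(pre, post, gap, undistorted_from=5)
print([round(x, 6) for x in recovered])
```

With the thresholds 2 bins apart and no distortion from 5 bins onwards, the bins 3 and 4 above the new threshold take the pre-reform density as it is, and the bins closer to the threshold are scaled up by the proportions measured further out, as in the notes.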

Figure 19: Reconstructed Distribution and Missing Mass in 1989

[Two panels with the standard deduction threshold marked at 5200. First panel: the 1989 distribution reconstructed using the 1987 distribution, and the observed 1989 distribution. Second panel: the observed 1989 distribution, the reconstructed 1989 distribution and the missing mass. Horizontal axis: total itemized deductions (bin size of 500, inflation adjusted).]

Notes: This graph depicts the reconstructed distribution in 1989 using the method that I outline in section 3.3 and the observed distribution for 1989. The missing mass that allows me to estimate the distribution of the cost is given by the area lying between the two curves.

Figure 20: Reconstructed Distribution and Missing Mass in 2004

[Two panels with the standard deduction threshold marked at 9700. First panel: the 2004 distribution reconstructed using the 2002 distribution, and the observed 2004 distribution. Second panel: the observed 2004 distribution, the reconstructed 2004 distribution and the missing mass. Horizontal axis: total itemized deductions (bin size of 1000, inflation adjusted).]

Notes: This graph depicts the reconstructed distribution in 2004 using the method that I outline in section 3.3 and the observed distribution for 2004. The missing mass that allows me to estimate the distribution of the cost is given by the area lying between the two curves.

Figure 21: Relationship Between Income and Foregone Benefits

[Two panels. (a) Foregone benefits (vertical axis, 0 to 500) against income (horizontal axis, 35000 to 60000). (b) Hours spent (vertical axis, 0 to 50) against income (horizontal axis, 35000 to 60000).]

Notes: (a) The first graph depicts the increasing relationship between income and foregone benefits: richer households are more likely to forego deductions. (b) The second graph divides the foregone benefit by the hourly wage of each household and depicts the hours spent itemizing by each income group.

Table 1: Standard Deduction Amounts Across Years For Joint Filers


Year   Standard deduction   Growth rate      Year   Standard deduction   Growth rate
1961   1000                 0.00%            1984   3400                 0.00%
1962   1000                 0.00%            1985   3540                 4.12%
1963   1000                 0.00%            1986   3670                 3.67%
1964   1000                 0.00%            1987   3760                 2.45%
1965   1000                 0.00%            1988   5000                 32.98%
1966   1000                 0.00%            1989   5200                 4.00%
1967   1000                 0.00%            1990   5450                 4.81%
1968   1000                 0.00%            1991   5700                 4.59%
1969   1000                 0.00%            1992   6000                 5.26%
1970   1000                 0.00%            1993   6200                 3.33%
1971   1500                 50.00%           1994   6350                 2.42%
1972   2000                 33.33%           1995   6550                 3.15%
1973   2000                 0.00%            1996   6700                 2.29%
1974   2000                 0.00%            1997   6900                 2.99%
1975   2600                 30.00%           1998   7100                 2.90%
1976   2800                 7.69%            1999   7200                 1.41%
1977   3200                 14.29%           2000   7350                 2.08%
1978   3200                 0.00%            2001   7600                 3.40%
1979   3400                 6.25%            2002   7850                 3.29%
1980   3400                 0.00%            2003   9500                 21.02%
1981   3400                 0.00%            2004   9700                 2.11%
1982   3400                 0.00%            2005   10000                3.09%
1983   3400                 0.00%            2006   10300                3.00%

Notes: The table shows the standard deduction amounts from 1961 to 2006 for joint filers and their growth rate. The years that I use to identify the cost are in bold.
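The growth rate column is the year-over-year percentage change in the standard deduction. As a small check, the sketch below recomputes it for the years with the largest increases, using amounts taken from Table 1; the variable and function names are hypothetical.

```python
# Standard deduction amounts for joint filers, taken from Table 1.
standard_deduction = {
    1970: 1000, 1971: 1500, 1972: 2000,
    1987: 3760, 1988: 5000,
    2002: 7850, 2003: 9500,
}


def growth_rate(series, year):
    """Percentage change between year - 1 and year, rounded to two decimals."""
    return round((series[year] / series[year - 1] - 1) * 100, 2)


for year in (1971, 1972, 1988, 2003):
    print(year, growth_rate(standard_deduction, year))
```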

Table 2: Determinants of Likelihood of Switching to Standard Deduction With and Without Controls
Outcome: Likelihood of switching to the standard deduction: {0,1}

                          (1)       (2)       (3)       (4)       (5)       (6)       (7)       (8)       (9)
newborn x close to SD     0.10***   0.10***                                                               0.09***
                          (0.01)    (0.01)                                                                (0.02)
newborn                   0.01*     0.00                                                                  0.01
                          (0.01)    (0.01)                                                                (0.01)
high ratio x close to SD                      -0.29***  -0.30***                                          -0.17***
                                              (0.07)    (0.06)                                            (0.06)
high ratio                                    0.06***   0.08***                                           0.08***
                                              (0.00)    (0.01)                                            (0.01)
AGI x close to SD                                                 0.09***   0.11***                       0.11***
                                                                  (0.01)    (0.01)                        (0.01)
no preparer x close to SD                                                             -0.11***  -0.11***  -0.08***
                                                                                      (0.02)    (0.02)    (0.02)
no preparer                                                                           -0.03***  -0.02***  -0.03***
                                                                                      (0.00)    (0.00)    (0.00)
close to SD               0.85***   0.81***   0.88***   0.85***   0.64***   0.54***   0.90***   0.87***   0.61***
                          (0.01)    (0.01)    (0.01)    (0.01)    (0.02)    (0.02)    (0.01)    (0.01)    (0.02)
Controls                  No        Yes       No        Yes       No        Yes       No        Yes       Yes
R2                        0.563     0.586     0.571     0.597     0.575     0.603     0.568     0.590     0.617
N                         16157     16157     16157     16157     16157     16157     16157     16157     16157
Clusters (individual)     4357      4357      4357      4357      4357      4357      4357      4357      4357

Notes: Each cell of this table reports an estimate from a separate regression of the likelihood of switching to the standard deduction. "newborn" indicates whether the family experienced a birth during the tax year. "high ratio" corresponds to a dummy variable indicating that a household has a proportion of state and local income taxes and mortgage payment deductions in excess of 80% of total deductions. "no preparer" indicates that the individual did not use the help of a tax preparer during the previous filing season. "close to SD" indicates that the taxpayer's total deductions were close to the standard deduction threshold the previous year.

Table 3: Deduction Growth Rate and Composition By Distance to Standard Deduction


Deduction Type                  Growth Rate   Proportion Below 10th bin   Proportion Above 10th bin
Total Interest Paid             1.4%          46%                         52%
State and Local Income Taxes    4.8%          18%                         16%
Charitable Donations            5.8%          13%                         11%
Real Estate Taxes               13.1%         12%                         10%
Medical Expenses                17.1%         4%                          3%
Total Growth                                  4.86%                       4.3%
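One reading that reproduces the Total Growth row is a composition-weighted average of the growth rates by deduction type, with the weights rescaled to sum to one since the listed proportions do not add up to 100%. The sketch below checks this arithmetic with the values from Table 3; the interpretation and the function name are assumptions made for illustration, not a description of the author's computation.

```python
# Values taken from Table 3 (in percent), in the order of the deduction types.
growth = [1.4, 4.8, 5.8, 13.1, 17.1]
share_below = [46, 18, 13, 12, 4]   # proportion below the 10th bin
share_above = [52, 16, 11, 10, 3]   # proportion above the 10th bin


def weighted_growth(growth_rates, shares):
    """Average growth rate weighted by composition shares (rescaled to sum to one)."""
    return sum(g * s for g, s in zip(growth_rates, shares)) / sum(shares)


total_below = weighted_growth(growth, share_below)
total_above = weighted_growth(growth, share_above)
print(round(total_below, 2), round(total_above, 2))
```

Under this reading, total deductions grow slightly faster below the 10th bin because the faster-growing categories make up a larger share of deductions there.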
