Simulating Income Elasticity Estimates

John M. Parman

24 September 2019

Creating a Sample of Fathers and Sons

First, let’s construct a sample of fathers and sons, matching certain characteristics of the joint distribution of father and son earnings in the United States. We will start by generating a sample of 5,000 fathers whose incomes are distributed log normally, with the mean and standard deviation of log income equal to those given for the sample in Solon (1992).

. clear

. set obs 5000
number of observations (_N) was 0, now 5,000

. gen id = _n

. set seed 24892217

The set obs command generates an empty dataset with 5,000 observations. The gen id command simply creates a unique id number for each observation equal to its observation number. Finally, the set seed command specifies the seed for Stata’s random number generator. (Note: I am setting the seed solely so that the same results are generated each time I compile this document. I chose the seed number based on the serial number of a bill in my pocket.)

Now we are going to generate fathers’ incomes assuming that earnings are distributed log normal. We can do this by creating log income as a random variable using Stata’s rnormal function, using the mean and standard deviation from Table 1 in Solon (1992).

. gen log_father_inc = rnormal(10.1,0.69)
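Because log income is drawn from a normal distribution with mean 10.1 and standard deviation 0.69, income in levels is log-normal, with median exp(10.1) and mean exp(10.1 + 0.69^2/2). The lines below are a small sketch, not part of the original log, that print these implied values next to the median of the simulated draws:

```stata
* Implied moments of a log-normal income distribution with the
* log-scale mean (10.1) and standard deviation (0.69) used above
display "Implied median income: " exp(10.1)
display "Implied mean income:   " exp(10.1 + 0.69^2/2)

* Sample counterpart: exponentiate the median of simulated log income
quietly summarize log_father_inc, detail
display "Sample median income:  " exp(r(p50))
```

The implied mean exceeds the implied median because the log-normal distribution is skewed to the right.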

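As for sons’ earnings, we would like those to be a function of fathers’ earnings. We will assume that sons’ log earnings are linearly related to fathers’ log earnings with a mean zero, normally distributed error term:

$$\ln y_i^{son} = \beta_0 + \beta_1 \ln y_i^{father} + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma_\varepsilon^2)$$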

The value of β1 can be taken directly from the estimated coefficient in Table 2 of Solon (1992). The value of β0 is then simply equal to the mean log income for sons in Table 1 minus β1 times the mean log income for fathers in Table 1. Finally, we can choose the standard deviation for ε that, once used in the above equation, generates son incomes that match the standard deviation of log son earnings given in Table 1. This leads to a value of 0.413 for β1, 5.58 for β0 and 0.94 for σε. With these values, we can now generate sons’ log income values:

. gen son_epsilon = rnormal(0,0.94)

. gen log_son_inc = 5.58+.413 * log_father_inc + son_epsilon

. gen father_inc = exp(log_father_inc)

. gen son_inc = exp(log_son_inc)
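The calibration of the sons’ equation can be checked by hand. Assuming, as in the simulation, that the error term is drawn independently of father’s log income, the mean and variance of son’s log income implied by the chosen parameter values (0.413 for β1, 5.58 for β0, 0.94 for σε, and 10.1 and 0.69 for the father’s mean and standard deviation) work out as follows. The superscript notation for son and father log incomes is my own shorthand:

```latex
\begin{aligned}
E[\ln y^{son}] &= \beta_0 + \beta_1 E[\ln y^{father}]
  = 5.58 + 0.413 \times 10.1 \approx 9.75 \\
\operatorname{Var}(\ln y^{son}) &= \beta_1^2 \operatorname{Var}(\ln y^{father}) + \sigma_\varepsilon^2
  = 0.413^2 \times 0.69^2 + 0.94^2 \approx 0.965 \\
\operatorname{sd}(\ln y^{son}) &\approx \sqrt{0.965} \approx 0.98
\end{aligned}
```

Up to sampling noise, the simulated values of log_son_inc should have a mean and standard deviation close to these implied moments.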

Let’s take a quick look at our generated incomes, summarizing the data, looking at the correlation between father and son incomes, and then looking at the income distributions graphically.

. sum log_son_inc log_father_inc

    Variable │        Obs        Mean    Std. Dev.       Min        Max
─────────────┼─────────────────────────────────────────────────────────
 log_son_inc │      5,000    9.758711    .9891708   6.166673   13.89211
log_father~c │      5,000    10.09548    .6920174   7.581927   12.72381

. corr log_son_inc log_father_inc
(obs=5,000)

             │ log_so~c log_fa~c
─────────────┼──────────────────
 log_son_inc │   1.0000
log_father~c │   0.3022   1.0000


. histogram father_inc, frequency ytitle(Frequency) xtitle(Father's income)
(bin=36, start=1962.4065, width=9268.9957)

. graph export father_inc.png, width(500) replace
(file father_inc.png written in PNG format)

. histogram log_father_inc, frequency ytitle(Frequency) xtitle(Father's log income)
(bin=36, start=7.5819268, width=.14283017)

. graph export log_father_inc.png, width(500) replace
(file log_father_inc.png written in PNG format)

. histogram son_inc, frequency ytitle(Frequency) xtitle(Son's income)
(bin=36, start=476.5979, width=29975.997)

. graph export son_inc.png, width(500) replace
(file son_inc.png written in PNG format)

. histogram log_son_inc, frequency ytitle(Frequency) xtitle(Son's log income)
(bin=36, start=6.1666732, width=.21459554)

. graph export log_son_inc.png, width(500) replace
(file log_son_inc.png written in PNG format)

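[Figure: the distribution of father’s income (father_inc.png)]
[Figure: the distribution of father’s log income (log_father_inc.png)]
[Figure: the distribution of son’s income (son_inc.png)]
[Figure: the distribution of son’s log income (log_son_inc.png)]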

To take a graphical look at the relationship between father and son log incomes, we could use a standard scatterplot. However, with 5,000 observations, a scatterplot will be somewhat uninformative (go ahead and try it yourself using Stata’s scatter command if you would like to see why). Instead, we can use a package for Stata to create a binned scatterplot:

. binscatter log_son_inc log_father_inc, xtitle("Father's log income") ytitle("Son's log income")

. graph export father_son_scatter.png, width(500) replace
(file father_son_scatter.png written in PNG format)

[Figure: binned scatterplot of son’s and father’s log income (father_son_scatter.png)]

Notice the nice, linear relationship between son’s log income and father’s log income. This should not come as a surprise given that this is how we constructed son’s log income in the first place. If you would like to use the binscatter program on your own computer, you can install it with the following command: ssc install binscatter.
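For readers who want to see the raw scatterplot, or who cannot install binscatter, the following do-file fragment is one possible sketch. The hand-rolled version groups fathers into 20 equal-sized bins and plots the bin means. The choice of 20 bins and the variable name father_bin are my own assumptions, and the result is only an approximation of what the binscatter package produces (it omits the fitted line, for instance).

```stata
* Raw scatterplot of all 5,000 father-son pairs
scatter log_son_inc log_father_inc, msize(tiny) ///
    xtitle("Father's log income") ytitle("Son's log income")

* Hand-rolled binned scatterplot: average both variables within
* 20 quantile bins of father's log income, then plot the bin means
preserve
xtile father_bin = log_father_inc, nquantiles(20)
collapse (mean) log_son_inc log_father_inc, by(father_bin)
scatter log_son_inc log_father_inc, ///
    xtitle("Father's log income") ytitle("Son's log income")
restore
```

The preserve and restore commands ensure that the collapsed bin-level data are discarded and the original 5,000 observations are back in memory afterwards.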

The Effect of Rounding on Estimated Intergenerational Income Elasticities

First let’s confirm that our simulated data match the real data used in Solon (1992). To check, let’s run a regression to recover the intergenerational income elasticity for the sample:

. reg log_son_inc log_father_inc

      Source │       SS           df       MS      Number of obs   =     5,000
─────────────┼──────────────────────────────────   F(1, 4998)      =    502.46
       Model │  446.814043         1  446.814043   Prob > F        =    0.0000
    Residual │  4444.50226     4,998  .889256155   R-squared       =    0.0913
─────────────┼──────────────────────────────────   Adj R-squared   =    0.0912
       Total │  4891.31631     4,999  .978458953   Root MSE        =      .943

───────────────┬────────────────────────────────────────────────────────────────
   log_son_inc │      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
───────────────┼────────────────────────────────────────────────────────────────
log_father_inc │    .432021   .0192732    22.42   0.000      .394237     .469805
         _cons │   5.397252   .1950292    27.67   0.000     5.014909    5.779595
───────────────┴────────────────────────────────────────────────────────────────

From the regression results, we see that we get a coefficient on father’s log income of 0.43. Thus we have an intergenerational income elasticity (roughly) equal to that of Solon (1992). We can think of this as the true intergenerational income elasticity for our sample. Now we will consider what happens with two common problems with the way incomes are recorded in survey data: rounding and censoring.
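Before moving on, one way to check that the estimated coefficient is statistically indistinguishable from the value used to build the data is a simple Wald test against 0.413. This is a sketch that was not part of the original log:

```stata
* Re-run the baseline regression quietly, then test whether the slope
* differs from 0.413, the value used to construct log_son_inc
quietly regress log_son_inc log_father_inc
test log_father_inc = 0.413
```

Since 0.413 lies inside the 95% confidence interval reported in the regression output, the test should not reject at the 5% level.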

First, we will explore the effects of rounding. Suppose that the survey provides options for income that are in $5,000 intervals (alternatively, assume that people tend to round their incomes to the nearest $5,000). We can generate rounded versions of the father and son incomes using Stata’s round function and then take the natural log to get rounded log income values:

. gen rounded_father_inc = round(father_inc,5000)

. gen rounded_son_inc = round(son_inc,5000)

. gen log_rounded_father_inc = ln(rounded_father_inc)
(4 missing values generated)

. gen log_rounded_son_inc = ln(rounded_son_inc)
(121 missing values generated)

Now we can use these new variables to re-estimate our intergenerational income elasticity:

. reg log_rounded_son_inc log_rounded_father_inc

      Source │       SS           df       MS      Number of obs   =     4,876
─────────────┼──────────────────────────────────   F(1, 4874)      =    418.22
       Model │  321.010598         1  321.010598   Prob > F        =    0.0000
    Residual │  3741.14468     4,874  .767571743   R-squared       =    0.0790
─────────────┼──────────────────────────────────   Adj R-squared   =    0.0788
       Total │  4062.15527     4,875   .83326262   Root MSE        =    .87611

───────────────────────┬────────────────────────────────────────────────────────────────
   log_rounded_son_inc │      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
───────────────────────┼────────────────────────────────────────────────────────────────
log_rounded_father_inc │   .3714719   .0181646    20.45   0.000     .3358611    .4070826
                 _cons │   6.073227   .1839989    33.01   0.000     5.712506    6.433947
───────────────────────┴────────────────────────────────────────────────────────────────

Notice that the measurement error we introduced by rounding incomes has attenuated the estimated intergenerational income elasticity, reducing the coefficient on father’s log income from 0.43 to 0.37. Using rounded incomes without acknowledging the impact of rounding on the estimation would lead us to conclude that there is considerably more income mobility than is actually present in the underlying data.

The rounding exercise also demonstrates another problem. If you look closely at the commands generating the new log incomes, you will notice that missing values were generated (4 for fathers and 121 for sons). These are cases where income was rounded to zero; the natural log of zero is undefined, hence the missing value for log income. As a result, the regression uses only 4,876 of the 5,000 father-son pairs. One criticism of the intergenerational income elasticity is that its calculation requires dropping individuals with no earnings.
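The drop in the estimated elasticity therefore mixes two things: the rounding of the incomes that remain, and the loss of the pairs whose incomes were rounded to zero. The following do-file fragment is a sketch of how one might separate them; it was not part of the original log. It counts the zeros and then re-runs the regression on the true log incomes, restricted to the pairs that survive rounding.

```stata
* How many incomes were rounded down to zero (and so have missing logs)?
count if rounded_father_inc == 0
count if rounded_son_inc == 0

* True log incomes, but only for the pairs that survive rounding.
* Comparing this slope with the full-sample and rounded-sample slopes
* separates the effect of dropping observations from the effect of
* rounding the remaining incomes.
regress log_son_inc log_father_inc ///
    if !missing(log_rounded_father_inc, log_rounded_son_inc)
```

The two counts should match the numbers of missing values reported when the log variables were generated.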

The Effect of Censoring on Estimated Intergenerational Income Elasticities

Now we will consider what happens when we top code incomes, a common practice in income datasets. We will impose a top code of $100,000 on the rounded incomes using Stata’s min function (all incomes above $100,000 are simply coded as $100,000):

. gen censored_father_inc = min(rounded_father_inc,100000)

. gen censored_son_inc = min(rounded_son_inc,100000)

. gen log_censored_father_inc = ln(censored_father_inc)
(4 missing values generated)

. gen log_censored_son_inc = ln(censored_son_inc)
(121 missing values generated)

. reg log_censored_son_inc log_censored_father_inc

      Source │       SS           df       MS      Number of obs   =     4,876
─────────────┼──────────────────────────────────   F(1, 4874)      =    413.22
       Model │  296.981315         1  296.981315   Prob > F        =    0.0000
    Residual │  3502.90966     4,874  .718692995   R-squared       =    0.0782
─────────────┼──────────────────────────────────   Adj R-squared   =    0.0780
       Total │  3799.89097     4,875  .779464815   Root MSE        =    .84776

────────────────────────┬────────────────────────────────────────────────────────────────
   log_censored_son_inc │      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
────────────────────────┼────────────────────────────────────────────────────────────────
log_censored_father_inc │   .3635857    .017886    20.33   0.000      .328521    .3986504
                  _cons │   6.141692   .1810735    33.92   0.000     5.786706    6.496678
────────────────────────┴────────────────────────────────────────────────────────────────

Notice that this further reduces our estimated intergenerational income elasticity to 0.36. The main takeaway is the same whether we introduce mismeasurement through rounding or through censoring of the data: the mismeasurement creates the appearance of a weaker relationship between fathers’ and sons’ incomes. This lowers the estimated intergenerational income elasticity, leading us to the false conclusion that there is greater mobility. However, this greater estimated mobility is simply a product of measurement; it has nothing to do with sons’ fortunes being less closely tied to those of their fathers.
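Because the top code above was applied to incomes that had already been rounded, the 0.36 estimate reflects both sources of mismeasurement. As a sketch of how one might isolate the effect of censoring alone, the fragment below counts the top-coded observations and then applies the same top code to the unrounded incomes. The topcoded variable names are my own and do not appear in the original log.

```stata
* Number of (rounded) incomes affected by the 100,000 dollar top code
count if rounded_father_inc > 100000 & !missing(rounded_father_inc)
count if rounded_son_inc > 100000 & !missing(rounded_son_inc)

* Top code applied to the unrounded incomes, to isolate censoring
* (topcoded_* are illustrative variable names, not from the original)
gen log_topcoded_father_inc = ln(min(father_inc, 100000))
gen log_topcoded_son_inc = ln(min(son_inc, 100000))
regress log_topcoded_son_inc log_topcoded_father_inc
```

No observations are lost in this version, since the unrounded incomes are never zero.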

The Effect of Transitory Fluctuations in Income

Now we will turn our attention to the difference between average income over the life cycle and income in the current period. In general, annual income over the life cycle follows a concave shape, with earnings rising over the early career of an individual and then falling in the final years of the career. This suggests that observing earnings very early or very late in an individual’s career will lead to underestimates of average earnings and observing earnings in the peak of a career will lead to overestimates of average earnings. This problem can be handled reasonably well by controlling for a quadratic in an individual’s age.

More problematic is that individuals experience transitory fluctuations in income over their careers: temporary rises and falls in income unrelated to overall trends over the life cycle. To examine the effect these transitory fluctuations have on the estimated income elasticity, let’s introduce some random ups and downs in fathers’ and sons’ earnings. We can introduce these transitory fluctuations by treating our son_inc and father_inc variables as average lifetime annual income and creating a new observation of annual income that includes a random increase or decrease relative to this average income.

. gen father_income_shock = (runiform()-.5)

. gen transitory_father_inc = father_inc * (1+father_income_shock)

. gen log_transitory_father_inc = ln(transitory_father_inc)

. gen son_income_shock = (runiform()-.5)

. gen transitory_son_inc = son_inc * (1+son_income_shock)

. gen log_transitory_son_inc = ln(transitory_son_inc)    

In the above commands, we have adjusted incomes by a random percentage drawn uniformly between negative 50% and positive 50%. We can think of these new incomes as observations of a single year of income and the original income variables as observations of the true lifetime average annual income. Now we can see the impact of using one year’s earnings rather than average annual earnings on our estimate of the intergenerational income elasticity:

. reg log_transitory_son_inc log_transitory_father_inc

      Source │       SS           df       MS      Number of obs   =     5,000
─────────────┼──────────────────────────────────   F(1, 4998)      =    375.64
       Model │   378.21947         1   378.21947   Prob > F        =    0.0000
    Residual │  5032.31492     4,998  1.00686573   R-squared       =    0.0699
─────────────┼──────────────────────────────────   Adj R-squared   =    0.0697
       Total │  5410.53439     4,999  1.08232334   Root MSE        =    1.0034

──────────────────────────┬────────────────────────────────────────────────────────────────
   log_transitory_son_inc │      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
──────────────────────────┼────────────────────────────────────────────────────────────────
log_transitory_father_inc │   .3632171   .0187405    19.38   0.000     .3264776    .3999566
                    _cons │   6.063303   .1888758    32.10   0.000     5.693024    6.433583
──────────────────────────┴────────────────────────────────────────────────────────────────

Our estimated intergenerational elasticity is now reduced to 0.36, a rather substantial attenuation bias that would lead us to overestimate the extent of intergenerational mobility. To see how transitory fluctuations affect real data, consider Table 2 in Solon (1992), the source of our empirical elasticity estimate. The estimates in this table show a clear increase in the estimated intergenerational income elasticity as more periods are used to construct average incomes. In a more recent example, Mazumder (2005) finds large changes in the intergenerational income elasticity when using a single observation of annual income versus an average of several years of annual income observations (see Figure 4). One important caveat is that our simulated income shocks may differ from real-world income shocks. In particular, Mazumder notes that transitory income shocks may exhibit some persistence. This autocorrelation in real-world transitory shocks will further weaken the association between observed father and son incomes, creating a greater attenuation of the intergenerational income elasticity estimate.
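The size of the attenuation in our simulation can be compared with the textbook errors-in-variables result. When noise that is unrelated to the true regressor is added to it, the slope is scaled by the reliability ratio Var(true regressor) / (Var(true regressor) + Var(noise)). In our setup the noise added to father’s log income is ln(1 + father_income_shock). The following sketch, which was not part of the original log, computes that ratio from the simulated data. The scalar and variable names are my own.

```stata
* Noise added to father's log income by the multiplicative shock
gen log_father_noise = ln(1 + father_income_shock)

quietly summarize log_father_noise
scalar noise_var = r(Var)
quietly summarize log_father_inc
scalar signal_var = r(Var)

* Reliability ratio and the slope it implies, starting from the slope
* estimated on the true incomes
scalar reliability = signal_var / (signal_var + noise_var)
quietly regress log_son_inc log_father_inc
display "Reliability ratio: " reliability
display "Implied slope:     " reliability * _b[log_father_inc]
```

The implied slope should land in the neighborhood of the 0.36 estimated from the transitory incomes. The shock to sons’ incomes does not enter the ratio, because noise in the dependent variable only adds to the residual variance.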

Exercises

  1. Explore the way that averaging over multiple income observations helps mitigate the attenuation bias due to transitory fluctuations in income. Generate four more years of income data for the fathers and sons, each containing a random transitory fluctuation around the true father and son incomes. Plot the estimated intergenerational earnings elasticities from using one, two, three, four and five years’ worth of income data.
  2. Suppose that these income shocks exhibit persistence. Generate five years of earnings data for the fathers and sons assuming that the income shocks follow an AR(1) process. Plot the estimated intergenerational earnings elasticities from using one through five years of income data and interpret any differences from what you found in Question 1.
  3. One problem with a focus on log income is that we end up dropping individuals with zero earnings. Suppose that there is a 5% unemployment rate. In other words, 5% of fathers and sons will earn zero dollars in a given year instead of our simulated incomes and will be dropped from any intergenerational earnings elasticity estimates. Demonstrate how introducing unemployment affects the estimated intergenerational earnings elasticity if unemployment is random. Clearly explain how you are modeling unemployment.
  4. Repeat (3) assuming that unemployment is more likely for individuals in the left tail of the earnings distribution.
  5. Repeat (3) assuming that fathers who are more likely to be unemployed also have sons who are more likely to be unemployed.
  6. For the sake of brevity, I only looked at one set of randomly generated incomes. It is quite possible that I happened to draw a rather extreme set of random incomes. Ideally, we should repeat the exercise many times and look at the distribution of estimated intergenerational income elasticities for each scenario. Repeat the simulations an additional 100 times, with different random number seeds each time, and calculate the mean and standard deviation of the resulting estimated intergenerational income elasticities when looking at the true incomes, rounded incomes, censored incomes and incomes with transitory shocks.
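
As a possible starting point for Exercise 6, the sketch below wraps the baseline simulation in a program and repeats it with Stata’s simulate command. The program name sim_elasticity and the returned scalar b_true are hypothetical names of my own. The sketch covers only the true incomes; the rounded, censored and transitory cases are left to be added. It also sets a single seed and lets each replication continue the same random-number stream, which is one way (not the only way) of making the draws differ across replications.

```stata
* Hypothetical helper program: one draw of the simulation, returning the
* elasticity estimated from the true (unrounded, uncensored) incomes
capture program drop sim_elasticity
program define sim_elasticity, rclass
    drop _all
    set obs 5000
    gen log_father_inc = rnormal(10.1, 0.69)
    gen log_son_inc = 5.58 + 0.413*log_father_inc + rnormal(0, 0.94)
    regress log_son_inc log_father_inc
    return scalar b_true = _b[log_father_inc]
end

* 100 replications; simulate replaces the data in memory with one
* observation per replication
simulate b_true = r(b_true), reps(100) seed(24892217) nodots: sim_elasticity
summarize b_true
```

The summarize output then gives the mean and standard deviation of the 100 estimated elasticities.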