(Anti)Fragility- N. N. Taleb.nb
1. Risk is Not in The Past (the Turkey Problem)
Introduction: Fragility, not Statistics
Fragility (Chapter 2) can be defined as an accelerating sensitivity to a harmful stressor: this response plots as a concave curve and mathematically culminates in more harm than benefit from the disorder cluster [(i) uncertainty, (ii) variability, (iii) imperfect, incomplete knowledge, (iv) chance, (v) chaos, (vi) volatility, (vii) disorder, (viii) entropy, (ix) time, (x) the unknown, (xi) randomness, (xii) turmoil, (xiii) stressors, (xiv) error, (xv) dispersion of outcomes, (xvi) unknowledge]. Antifragility is the opposite, producing a convex response that leads to more benefit than harm. We do not need to know the history and statistics of an item to measure its fragility or antifragility, or to be able to predict rare and random ("black swan") events. All we need is to be able to assess whether the item is accelerating towards harm or benefit. The relation of fragility, convexity and sensitivity to disorder is thus mathematical, and not derived from empirical data.
The risk of breaking of the coffee cup is not necessarily in the past time series of the variable; in fact surviving objects have to have had a rosy past.
The problem with risk management is that past time series can be (and actually are) unreliable. Some finance journalist (Bloomberg) was commenting on my statement in Antifragile about our chronic inability to get the risk of a variable from the past with economic time series. "Where is he going to get the risk from, since we cannot get it from the past? From the future?", he wrote. Not really, you finance-imbecile: from the present, the present state of the system. This explains in a way why the detection of fragility is vastly more potent than that of risk --and much easier. But this is not just a problem with journalism. Naive inference from time series is incompatible with rigorous statistical inference; yet workers with time series believe that it is statistical inference.
Turkey Problems
Definition: Take, as of time T, a standard sequence X = {X_{t0 + iΔt}}_{i=0}^{N} as the discretely monitored history of the process X_t over the interval (t0, T], with T = t0 + NΔt. The estimator M_T^X(A, f) is defined as

M_T^X(A, f) = Σ_{i=0}^{N} 1_A f(X_{t0 + iΔt}) / Σ_{i=0}^{N} 1_A

where 1_A : X → {0,1} is an indicator function taking the value 1 if X ∈ A and 0 otherwise, and f is a function of X. f(X)=1, f(X)=X, and f(X)=X^n correspond to the probability, the first moment, and the n-th moment, respectively.

a) Standard Estimator. M_T^X(A, f) where f(x) = x and A is defined on the domain of the process X: standard measures from x, such as moments of order z, etc., "as of period" T. The measure might be useful for the knowledge of a process, but remains insufficient for decision making, as the decision-maker may be concerned for risk-management purposes with the left tail (for distributions that are not entirely skewed, unlike pure loss functions such as damage from earthquakes, terrorism, etc.).

b) Standard Risk Estimator. The shortfall S = E[X | X < K], estimated by M_T^X(A, f) with A = (-∞, K), f(x) = x.
Criterion: The measures M or S are considered to be an estimator over the interval (t − NΔt, t] if and only if the following holds in expectation over the period, across counterfactuals of the process, for some threshold ξ:

|E[M_{t+iΔt}^X(A, f)] − M_t^X(A, f)| < ξ

In other words, the estimator should have some stability if it is not to be considered random: it estimates the true value of the variable. This is standard sampling theory; indeed, it is at the core of statistics. Let us rephrase: standard statistical theory does not allow claims on estimators made on a given set unless these generalize, that is, reproduce out of sample, into the part of the series that has not yet taken place (or has not been seen), i.e., for time series, for t' > t.
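The criterion can be checked numerically: split a sample in two and compare the estimator across the halves. A minimal sketch in Python (numpy assumed; a Student T with 3 degrees of freedom stands in for a fat-tailed process whose fourth moment does not exist):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

def fourth_moment(x):
    # sample fourth noncentral moment, i.e. M^X((-inf, inf), x^4)
    return np.mean(x ** 4)

# thin-tailed process: the estimator is stable in and out of sample
gauss = rng.standard_normal(2 * n)
g_in, g_out = fourth_moment(gauss[:n]), fourth_moment(gauss[n:])

# fat-tailed process (Student T, 3 degrees of freedom): the population
# fourth moment is infinite, so the estimate never stabilizes
fat = rng.standard_t(df=3, size=2 * n)
f_in, f_out = fourth_moment(fat[:n]), fourth_moment(fat[n:])
```

For the Gaussian both halves land near the theoretical value of 3; for the fat-tailed sample the two halves routinely disagree by large multiples.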
Take the measure M_t^X((−∞, ∞), X⁴) of the fourth noncentral moment,

M_t^X((−∞, ∞), X⁴) = (1/N) Σ_{i=0}^{N} (X_{t−iΔt})⁴

and the N-sample maximum quartic observation Max{(X_{t−iΔt})⁴}_{i=0}^{N}. Q(N) is the contribution of the maximum quartic variation:

Q(N) = Max{(X_{t−iΔt})⁴}_{i=0}^{N} / Σ_{i=0}^{N} (X_{t−iΔt})⁴
Description of dataset:

VARIABLE             Q (Max Quartic Contr.)   N (years)
Silver               0.94                     46
SP500                0.79                     56
CrudeOil             0.79                     26
Short Sterling       0.75                     17
Heating Oil          0.74                     31
Nikkei               0.72                     23
FTSE                 0.54                     25
JGB                  0.48                     24
Eurodollar Depo 1M   0.31                     19
Sugar 11             0.30                     48
Yen                  0.27                     38
Bovespa              0.27                     16
Eurodollar Depo 3M   0.25                     28
CT                   0.25                     48
DAX                  0.20                     18
Naively, the fourth moment expresses the stability of the second moment; the higher the variations, the higher the fourth moment. For a Gaussian (i.e., the distribution of the square of a Chi-square distributed variable) the maximum contribution should be around .008 ± .0028.
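The Gaussian benchmark, and the contrast with the table, can be reproduced by simulation; a sketch (numpy assumed) comparing a Gaussian sample against a Student T(3) sample:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000

def max_quartic_share(x):
    # Q(N): share of the sample fourth moment owed to the single largest x^4
    q = x ** 4
    return q.max() / q.sum()

gauss_q = max_quartic_share(rng.standard_normal(n))      # tiny, near .008
fat_q = max_quartic_share(rng.standard_t(df=3, size=n))  # a single point dominates
```

For the fat-tailed sample one observation typically carries a large fraction of the measured fourth moment, as in the table above.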
Fig 1: Comparing M[t−1, t] and M[t, t+1], where Δt = 1 year (252 days), for macroeconomic data using extreme deviations, A = (−∞, −2 standard deviations (equivalent)], f(x) = x (replication of data from "The Fourth Quadrant", Taleb, 2009).
Fig 2: The results are a lot worse for large deviations, A = (−∞, −4 standard deviations (equivalent)], f(x) = x.

Fig 3: The regular is predictive of the regular, that is, mean deviation: comparing M[t] and M[t+1 year] for macroeconomic data using regular deviations, A = (−∞, ∞), f(x) = |x|.
When the generating process is a powerlaw with low exponent, plenty of confusion can take place. For instance, take Pinker (2011) claiming that the generating process has a tail exponent ~1.15 and drawing quantitative conclusions from it. The next two figures show the realizations of two subsamples, one before and the other after the turkey problem, illustrating the inability of a sample to deliver true probabilities.
Fig x: First 100 years (sample path): a Monte Carlo generated realization of a process of the "80/20 or 80/02 style", that is, tail exponent α = 1.1.
Fig x: The Turkey Surprise: now 200 years; the second 100 years dwarf the first. These are realizations of the exact same process, seen with a longer window and at a different scale.
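The turkey effect can be reproduced in a few lines. A hypothetical sketch (numpy assumed), using inverse-CDF sampling of a Pareto variable with tail exponent α = 1.1: a single jump tends to dominate the whole cumulative history, so any subsample understates the true tail.

```python
import numpy as np

rng = np.random.default_rng(3)
alpha = 1.1   # tail exponent of the "80/20 or 80/02 style" process
n_years = 200

# inverse-CDF sampling of a Pareto with x_min = 1:
# if U ~ Uniform[0,1), then (1 - U)**(-1/alpha) satisfies P(X > x) = x**(-alpha)
jumps = (1.0 - rng.uniform(size=n_years)) ** (-1.0 / alpha)

path = jumps.cumsum()
first_half, full = path[99], path[-1]
top_share = jumps.max() / jumps.sum()  # share of the history owed to one event
```

With α barely above 1 the mean is dominated by rare giant realizations, so the first 100 "years" say almost nothing about the next 100.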
2. Preasymptotics and The Central Limit in the Real World
(Σ_{i=1}^{N} X_i − N μ) / (√N σ) →_D N(0, 1) as N → ∞
where →_D denotes convergence "in distribution". Granted, convergence "in distribution" is about the weakest form of convergence. Effectively we are dealing with a double problem. The first, as uncovered by Jaynes, corresponds to the abuses of measure theory: some properties that hold at infinity might not hold in all limiting processes --a manifestation of the classical problem of uniform and pointwise convergence. Jaynes 2003 (p. 44): "The danger is that the present measure theory notation presupposes the infinite limit already accomplished, but contains no symbol indicating which limiting process was used (...) Any attempt to go directly to the limit can result in nonsense". Granted, Jaynes is still too Platonic (he falls headlong for the Gaussian by mixing thermodynamics and information). But we accord with him on this point --along with the definition of probability as information incompleteness, about which later. The second problem is that we do not have a "clean" limiting process --the process is itself idealized. Now how should we look at the Central Limit Theorem? Let us see how we arrive at it assuming "independence".
P[−u ≤ Z ≤ u] = ∫_{−u}^{u} (1/√(2π)) e^{−z²/2} dz, where Z = Σ_{i=0}^{n} X_i / (√n σ)

The Gaussian approximation holds inside the "tunnel" [−u, u]; both the odds of falling inside the tunnel itself and the quality of the approximation depend on n, not on x.
Since C(N+M) = C(N) + C(M), the additivity of the log characteristic function under convolution makes it easy to see the speed of the convergence to the Gaussian. Fat tails implies that higher moments implode --not just the 4th.

Table of Normalized Cumulants --Speed of Convergence (dividing by σⁿ where n is the order of the cumulant):

Distribution     PDF                               N-convoluted log CF           2nd   3rd        4th
Normal(μ,σ)      e^{−(x−μ)²/(2σ²)} / (√(2π) σ)     N log(e^{izμ − z²σ²/2})       1     0          0
Poisson(λ)       e^{−λ} λˣ / x!                    N log(e^{(−1+e^{iz}) λ})      1     1/(Nλ)     1/(N²λ²)
Exponential(λ)   λ e^{−xλ}                         N log(λ/(λ − iz))             1     2λ/N       3! λ²/N²
Γ(a,b)           b^{−a} x^{a−1} e^{−x/b} / Γ(a)    N log((1 − ibz)^{−a})         1     2/(abN)    3!/(a²b²N²)

Distribution     5th           6th           7th           8th           9th           10th
Poisson(λ)       1/(N³λ³)      1/(N⁴λ⁴)      1/(N⁵λ⁵)      1/(N⁶λ⁶)      1/(N⁷λ⁷)      1/(N⁸λ⁸)
Exponential(λ)   4! λ³/N³      5! λ⁴/N⁴      6! λ⁵/N⁵      7! λ⁶/N⁶      8! λ⁷/N⁷      9! λ⁸/N⁸
Γ(a,b)           4!/(a³b³N³)   5!/(a⁴b⁴N⁴)   6!/(a⁵b⁵N⁵)   7!/(a⁶b⁶N⁶)   8!/(a⁷b⁷N⁷)   9!/(a⁸b⁸N⁸)

For fat-tailed distributions the higher normalized cumulants are indeterminate:

Distribution                PDF                                                           N-convoluted log CF
StudentT(3)                 6√3 / (π (x²+3)²)                                             N (log(√3 |z| + 1) − √3 |z|)
StudentT(4)                 12 / (x²+4)^{5/2}                                             N log(2 z² K₂(2 |z|))
Mixed Gaussians (p,σ₁,σ₂)   p e^{−x²/(2σ₁²)}/(√(2π)σ₁) + (1−p) e^{−x²/(2σ₂²)}/(√(2π)σ₂)   N log(p e^{−z²σ₁²/2} + (1−p) e^{−z²σ₂²/2})
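The speed of convergence under convolution can be checked by simulation. A sketch (numpy assumed), using the exponential row: in the familiar normalization, the excess kurtosis κ₄/κ₂² of an N-fold sum of exponentials is 6/N, so it should shrink linearly in N.

```python
import numpy as np

rng = np.random.default_rng(5)
M = 200_000  # Monte Carlo replications per N

def excess_kurtosis(x):
    # kappa_4 / kappa_2^2: zero for a Gaussian
    x = x - x.mean()
    return np.mean(x ** 4) / np.mean(x ** 2) ** 2 - 3.0

kurt = {}
for N in (1, 10, 100):
    sums = rng.exponential(scale=1.0, size=(M, N)).sum(axis=1)
    kurt[N] = excess_kurtosis(sums)  # theory: 6/N for exponential summands
```

The measured values track 6, 0.6, 0.06, illustrating the 1/N decay of the fourth cumulant under convolution.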
P[X > a] ≤ σ²/a², hence P[X > n σ] ≤ 1/n²

which effectively accommodates power laws but puts a bound on the probability of large deviations --a bound that remains significant.
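The bound can be verified against an empirical distribution directly; when σ is the sample standard deviation, the Chebyshev inequality holds for the sample by construction, however fat the tails (sketch, numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(9)
x = rng.standard_t(df=3, size=1_000_000)  # a fat-tailed sample
sigma = x.std()

# Chebyshev: P[|X| > n*sigma] <= 1/n**2, even for fat-tailed data
tail_freq = {n: np.mean(np.abs(x) > n * sigma) for n in (2, 3, 4)}
```

The bound is loose for thin tails but binding and informative precisely where power laws live.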
Let us proceed with a simple example, the extremum of a Gaussian variable. Say we generate N Gaussian variables {Z_i}_{i=1}^{N} with mean 0 and unitary standard deviation, and take the highest value we find. We take the upper bound E_j for the N-size sample run j:

E_j = Max {Z_{i,j}}_{i=1}^{N}

Assume we do so M times, to get M samples of maxima for the set E:

E = {Max {Z_{i,j}}_{i=1}^{N}}_{j=1}^{M}

The next figure plots a histogram of the result.
Figure 1: Taking M samples of Gaussian maxima; here N = 30,000, M = 10,000. We get: mean of the maxima = 4.11159, standard deviation = 0.286938, median = 4.07344.
Let us fit to the sample an Extreme Value Distribution (Gumbel) with location and scale parameters a and b, respectively:

f(x; a, b) = (1/b) e^{(a−x)/b − e^{(a−x)/b}}
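The experiment and the fit can be reproduced as follows; a sketch assuming numpy and scipy (scipy's `gumbel_r` is the Gumbel above), with a smaller M than in the figure to keep the runtime modest:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
N, M = 30_000, 2_000  # the figure uses M = 10,000

# M samples of the maximum of N standard Gaussians
maxima = np.array([rng.standard_normal(N).max() for _ in range(M)])

# fit the Gumbel location a and scale b by maximum likelihood
a_hat, b_hat = stats.gumbel_r.fit(maxima)
```

The mean of the maxima lands near the 4.11 reported in Figure 1, and the fitted location sits near 4.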
Now let us generate, exactly as before, but changing the distribution: N powerlaw-distributed random variables Z_i with tail exponent μ = 3, generated from a Student T Distribution with 3 degrees of freedom. Again, we take the upper bound. This time it is not the Gumbel but the Fréchet distribution that fits the result:

f(x; α, β) = (α/β) (x/β)^{−1−α} e^{−(x/β)^{−α}}, for x > 0
Figure 3: Fitting a Fréchet distribution to the Student T generated with μ = 3 degrees of freedom. The Fréchet distribution with α = 3, β = 32 fits up to higher values of E. The next two graphs show the fit more closely.
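Likewise for the fat-tailed case; a sketch assuming scipy, whose `invweibull` is the Fréchet (location pinned at 0). The fitted α should come out near the tail exponent 3, and the scale near the β = 32 quoted above:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
N, M = 30_000, 500

# M samples of the maximum of N Student T(3) variables
maxima = np.array([rng.standard_t(df=3, size=N).max() for _ in range(M)])

# scipy's invweibull is the Frechet; fix the location at 0
alpha_hat, loc, beta_hat = stats.invweibull.fit(maxima, floc=0)
```

Unlike the Gaussian case, where the maxima sit near 4, here the maxima are of order 30 and their distribution keeps a power-law tail with the same exponent as the underlying variables.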
Figure 5: Q-Q plot. Fits up to extremely high values of E; the rest owes to sample insufficiency for extremely large values, a bias that typically causes the underestimation of tails, as the points tend to fall to the right.
How Extreme Value Has a Severe Inverse Problem In the Real World
In the previous case we started with the distribution, with the assumed parameters, then obtained the corresponding values, as these risk modelers do. In the real world, we don't quite know the calibration, the α of the distribution, assuming (generously) that we know the distribution itself. So here we face the inverse problem. The next table illustrates the different calibrations of P_K, the probability that the maximum exceeds a certain value K (expressed as a multiple of β), under different values of K and α.
α      1/P>3β     1/P>10β    1/P>20β
1.00   3.52773    10.5083    20.5042
1.25   4.46931    18.2875    42.7968
1.50   5.71218    32.1254    89.9437
1.75   7.3507     56.7356    189.649
2.00   9.50926    100.501    400.5
2.25   12.3517    178.328    846.397
2.50   16.0938    316.728    1789.35
2.75   21.0196    562.841    3783.47
3.00   27.5031    1000.5     8000.5
3.25   36.0363    1778.78    16 918.4
3.50   47.2672    3162.78    35 777.6
3.75   62.048     5623.91    75 659.8
4.00   81.501     10 000.5   160 000.
4.25   107.103    17 783.3   338 359.
4.50   140.797    31 623.3   715 542.
4.75   185.141    56 234.6   1.51319 × 10⁶
5.00   243.5      100 001.   3.2 × 10⁶
Consider that the error in estimating the α of a distribution is quite large, often > 1/2, and that α is typically overestimated. So we can see that the probabilities get mixed up by more than an order of magnitude. In other words, the imprecision in the computation of α compounds in the evaluation of the probabilities of extreme values.
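The table is consistent with the Fréchet exceedance probability P(max > Kβ) = 1 − exp(−K^(−α)), an assumption recovered from the table's own values. A short sketch (numpy assumed) reproducing two entries and showing how an error of ±1/2 in α moves the tail odds by an order of magnitude:

```python
import numpy as np

def one_over_p(K, alpha):
    # 1/P(max > K*beta) for a Frechet with tail exponent alpha, scale beta
    return 1.0 / (1.0 - np.exp(-K ** -alpha))

# two entries of the table
row_a1 = one_over_p(3, 1.0)   # ~3.53
row_a3 = one_over_p(10, 3.0)  # ~1000.5

# the inverse problem: misjudge alpha = 3 by +-1/2 and the odds of
# exceeding 10*beta move by roughly an order of magnitude each way
spread = one_over_p(10, 3.5) / one_over_p(10, 2.5)
```

For K well beyond β the expression behaves like K^α, which is why small errors in the exponent translate into multiplicative errors in the odds.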
3. On the Difference Between Binaries and Vanillas
This explains how and where prediction markets (or, more generally, discussions of betting matters) do not correspond to reality and have little to do with exposures to fat tails and Black Swan effects. Elementary facts, but with implications. This shows, for instance, how the long shot bias is misapplied to real-life variables, and why political predictions are more robust than economic ones. This discussion is based on Taleb (1997), showing the difference between a binary and a vanilla option.
Definitions
A binary bet (or just a binary, or a digital): an outcome with payoff 0 or 1 (or yes/no, −1/1, etc.). Examples: prediction markets, elections, most games and lottery tickets. Also called digital. Any statistic based on a YES/NO switch. Binaries are effectively bets on probability. They are rarely ecological, except for political predictions. (More technically, they are mapped by the Heaviside function.)

An exposure or vanilla: an outcome with no open limit: say revenues, market crashes, casualties from war, success, growth, inflation, epidemics... in other words, about everything. Exposures are generally expectations, or the arithmetic mean, never bets on probability, but rather the pair probability × payoff.

A bounded exposure: an exposure (vanilla) with an upper and lower bound: say an insurance policy with a cap, or a lottery ticket. When the boundary is close, it approaches a binary bet in properties. When the boundary is remote (and unknown), it can be treated like a pure exposure. The idea of clipping the tails of exposures transforms them into this category.
The Problem
The properties of binaries diverge from those of vanilla exposures. This note shows how the conflation of the two takes place: in prediction markets, in the ludic fallacy (using the world of games to apply to real life).
1. They have diametrically opposite responses to skewness.
2. They respond differently to fat-tailedness (sometimes in opposite directions). Fat tails make binaries more tractable.
3. A rise in complexity lowers the value of the binary and increases that of the exposure.
Some direct applications:
1. Studies of long shot biases that typically apply to binaries should not port to vanillas.
2. Many are surprised that I find many econometricians total charlatans while finding Nate Silver immune to my problem. This explains why.
3. Prediction markets provide very limited information outside specific domains.
4. Etc.
One can hold beliefs that a variable can go lower yet bet that it is going higher. Simply, the digital and the vanilla diverge: P(X > X₀) > ½, but E(X) < X₀. This is normal in the presence of skewness and extremely common with economic variables. Philosophers have a related problem called the lottery paradox, which in statistical terms is not a paradox.
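A two-outcome sketch makes the divergence concrete (hypothetical payoffs, numpy assumed): a bet that wins 90% of the time can still carry a negative expectation.

```python
import numpy as np

rng = np.random.default_rng(11)

# 90% of the time: gain 1; 10% of the time: lose 20 (negative skew)
x = np.where(rng.uniform(size=1_000_000) < 0.9, 1.0, -20.0)

binary = (x > 0).mean()   # what a prediction market prices: P(X > 0)
vanilla = x.mean()        # what the exposure actually earns: E(X)
```

The binary reads near 0.9, "almost certain to go up", while the vanilla expectation is around 0.9 × 1 − 0.1 × 20 = −1.1.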
The Elementary Fat Tails Mistake
A slightly more difficult problem: when I ask economists or social scientists "what happens to the probability of a deviation > 1σ when you fatten the tail (while preserving other properties)?", almost all answer: it increases (so far all have made the mistake). Wrong. They miss the idea that fat tails means the contribution of the extreme events to the total properties increases, and that it is the pair probability × payoff that matters, not just probability.
I've asked variants of the same question. The Gaussian distribution spends 68.2% of the time between ±1 standard deviation. The real world has fat tails. In finance, how much time do stocks spend between ±1 standard deviations? The answer has been invariably "lower". Why? "Because there are more deviations." Sorry, there are fewer deviations: stocks spend between 78% and 98% of the time between ±1 standard deviations (computed from past samples).

Some simple derivations. Let x follow a Gaussian distribution (μ, σ); assume μ = 0 for the exercise. What is the probability of exceeding one standard deviation?

P>1σ = 1 − ½ erfc(−1/√2)

where erfc is the complementary error function; P>1σ = P<−1σ ≈ 15.86%, and the probability of staying within the "stability tunnel" between ±1σ is ≈ 68.2%.

Let us fatten the tail, using a standard method of linear combination of two Gaussians with the two standard deviations σ√(1+a) and σ√(1−a), where a is the "vvol" (this is variance-preserving; technically of no big effect here, as a standard-deviation-preserving spreading gives the same qualitative result). Such a method leads to an immediate raising of the kurtosis by a factor of (1 + a²), since E(x⁴)/E(x²)² = 3(1 + a²). The probability of staying inside the tunnel becomes

P(−σ ≤ x ≤ σ) = 1 − ½ erfc(1/(√2 √(1−a))) − ½ erfc(1/(√2 √(1+a)))

So then, for different values of a, as we can see, the probability of staying inside ±1 sigma increases.
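The derivation above can be checked numerically; a sketch using scipy's erfc, showing the probability of staying inside the ±1σ tunnel rising with the "vvol" a:

```python
import numpy as np
from scipy.special import erfc

def p_inside(a):
    # P(-sigma <= x <= sigma) for the variance-preserving mixture of
    # Gaussians with standard deviations sigma*sqrt(1-a) and sigma*sqrt(1+a)
    return (1.0
            - 0.5 * erfc(1.0 / (np.sqrt(2.0) * np.sqrt(1.0 - a)))
            - 0.5 * erfc(1.0 / (np.sqrt(2.0) * np.sqrt(1.0 + a))))

probs = [p_inside(a) for a in (0.0, 0.3, 0.6, 0.9)]  # 0.682... and rising
```

At a = 0 this recovers the Gaussian 68.2%; as a grows, the mixture is fatter-tailed yet spends more time inside the tunnel.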
Fatter and fatter tails: different values of a. We notice that the higher the peak, the lower the probability of leaving the ±1σ tunnel.
Fatter tails increase the time spent between deviations, giving the illusion of absence of volatility when in fact events are delayed and made worse (my critique of the "Great Moderation"). Stopping Time & Fattening of the Tails of a Brownian Motion: consider the distribution of the time it takes for a continuously monitored Brownian motion S to exit from a "tunnel" with a lower bound L and an upper bound H. Counterintuitively, fatter tails make an exit (at some sigma) take longer. You are likely to spend more time inside the tunnel --since exits are far more dramatic. ψ is the distribution of the exit time τ, where τ = inf{t : S_t ∉ [L, H]}. From Taleb (1997) we have the following approximation:
ψ(t|σ) = (π σ² / (log H − log L)²) Σ_{n=1}^{∞} n (1 − (−1)ⁿ) sin(n π log(S/L) / (log H − log L)) exp(−n² π² σ² t / (2 (log H − log L)²))

(driftless case, with S the starting point of the process inside the tunnel [L, H])
and the fatter-tailed distribution from mixing Brownians with standard deviations separated by a coefficient a:
ψ(t | σ, a) = ½ ψ(t | σ √(1 − a)) + ½ ψ(t | σ √(1 + a))
This graph shows the lengthening of the stopping time between events coming from fatter tails.
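The lengthening can be checked by brute force. For driftless Brownian motion exiting (−d, d), the expected exit time is d²/σ²; averaging over the two mixed sigmas gives (d²/σ²) · ½ (1/(1−a) + 1/(1+a)) = (d²/σ²)/(1 − a²) > d²/σ². A crude discretized Monte Carlo (numpy assumed; the time step dt introduces a small upward bias):

```python
import numpy as np

rng = np.random.default_rng(13)

def mean_exit_time(sigma, d=1.0, dt=0.01, n_paths=4000, max_steps=200_000):
    # Euler-discretized driftless Brownian motion, stopped on leaving (-d, d)
    x = np.zeros(n_paths)
    t_exit = np.zeros(n_paths)
    alive = np.ones(n_paths, dtype=bool)
    for step in range(1, max_steps + 1):
        idx = np.flatnonzero(alive)
        if idx.size == 0:
            break
        x[idx] += sigma * np.sqrt(dt) * rng.standard_normal(idx.size)
        done = idx[np.abs(x[idx]) >= d]
        t_exit[done] = step * dt
        alive[done] = False
    return t_exit.mean()

base = mean_exit_time(1.0)                       # theory: d^2/sigma^2 = 1.0
a = 0.8
mixed = 0.5 * (mean_exit_time(np.sqrt(1 - a)) +  # theory: 1/(1 - a**2) ~ 2.78
               mean_exit_time(np.sqrt(1 + a)))
```

The mixed (fatter-tailed) process stays in the tunnel markedly longer on average, despite having the same overall variance.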
{This is a note: a more advanced discussion explains why a more uncertain mean (vanilla) might mean a less uncertain probability (prediction), etc. Also see "Why We Don't Know What We Are Talking About When We Talk About Probability".}