(Anti)Fragility- N. N. Taleb.nb
1. Risk is Not in The Past (the Turkey Problem)
Introduction: Fragility, not Statistics
Fragility (Chapter 2) can be defined as an accelerating sensitivity to a harmful stressor: this response plots as a concave curve and mathematically culminates in more harm than benefit from the disorder cluster [(i) uncertainty, (ii) variability, (iii) imperfect, incomplete knowledge, (iv) chance, (v) chaos, (vi) volatility, (vii) disorder, (viii) entropy, (ix) time, (x) the unknown, (xi) randomness, (xii) turmoil, (xiii) stressors, (xiv) error, (xv) dispersion of outcomes, (xvi) unknowledge]. Antifragility is the opposite, producing a convex response that leads to more benefit than harm. We do not need to know the history and statistics of an item to measure its fragility or antifragility, or to be able to predict rare and random ("black swan") events. All we need is to be able to assess whether the item is accelerating towards harm or benefit. The relation of fragility, convexity and sensitivity to disorder is thus mathematical, and not derived from empirical data.
The risk of breaking of the coffee cup is not necessarily in the past time series of the variable; in fact surviving objects have to have had a rosy past.
The problem with risk management is that past time series can be (and actually are) unreliable. Some finance journalist (Bloomberg) was commenting on my statement in Antifragile about our chronic inability to get the risk of a variable from the past with economic time series. "Where is he going to get the risk from, since we cannot get it from the past? From the future?", he wrote. Not really, you finance-imbecile: from the present, the present state of the system. This explains in a way why the detection of fragility is vastly more potent than that of risk --and much easier. But this is not just a problem with journalism. Naive inference from time series is incompatible with rigorous statistical inference; yet workers with time series believe that it is statistical inference.
Turkey Problems
Definition: Take, as of time T, a standard sequence X = {X_{t0 + iΔt}}_{i=0}^{N} as the discretely monitored history of the process X_t over the interval (t0, T], with T = t0 + NΔt. The estimator M_T^X(A, f) is defined as

M_T^X(A, f) = Σ_{i=0}^{N} 1_A f(X_{t0 + iΔt}) / Σ_{i=0}^{N} 1_A

where 1_A : X → {0,1} is an indicator function taking the value 1 if X ∈ A and 0 otherwise, and f is a function of X. f(X)=1, f(X)=X, and f(X)=X^n correspond to the probability, the first moment, and the n-th moment, respectively.

a) Standard Estimator. M_T^X(A, f) where f(x) = x and A is defined on the domain of the process X: standard measures from x, such as moments of order z, etc., "as of period" T. The measure might be useful for the knowledge of a process, but remains insufficient for decision making, as the decision-maker may be concerned for risk-management purposes with the left tail (for distributions that are not entirely skewed, unlike pure loss functions such as damage from earthquakes, terrorism, etc.).

b) Standard Risk Estimator. The shortfall S = E[X | X < K], estimated by M_T^X(A, f) with A = (-∞, K), f(x) = x.
Criterion: The measures M or S are considered to be an estimator over the interval (t − NΔt, t] if and only if the following holds in expectation over the period, across counterfactuals of the process, for some threshold ξ:

|E[M_{t+iΔt}^X(A, f)] − M_t^X(A, f)| < ξ

In other words, the estimator should have some stability if it is not to be considered random: it estimates the true value of the variable. This is standard sampling theory; indeed, it is at the core of statistics. Let us rephrase: standard statistical theory does not allow claims on estimators made on a given set unless these generalize, that is, reproduce out of sample, into the part of the series that has not yet taken place (or has not been seen), i.e., for time series, for t' > t.
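The criterion can be checked numerically: split a sample in two and compare the estimator across the halves. A minimal sketch in Python (numpy assumed; a Student T with 3 degrees of freedom stands in for a fat-tailed process whose fourth moment does not exist):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

def fourth_moment(x):
    # sample fourth noncentral moment, i.e. M^X((-inf, inf), x^4)
    return np.mean(x ** 4)

# thin-tailed process: the estimator is stable in and out of sample
gauss = rng.standard_normal(2 * n)
g_in, g_out = fourth_moment(gauss[:n]), fourth_moment(gauss[n:])

# fat-tailed process (Student T, 3 degrees of freedom): the population
# fourth moment is infinite, so the estimate never stabilizes
fat = rng.standard_t(df=3, size=2 * n)
f_in, f_out = fourth_moment(fat[:n]), fourth_moment(fat[n:])
```

For the Gaussian both halves land near the theoretical value of 3; for the fat-tailed sample the two halves routinely disagree by large multiples.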
Take the measure M_t^X((−∞, ∞), X⁴) of the fourth noncentral moment,

M_t^X((−∞, ∞), X⁴) = (1/N) Σ_{i=0}^{N} (X_{t−iΔt})⁴

and the N-sample maximum quartic observation Max{(X_{t−iΔt})⁴}_{i=0}^{N}. Q(N) is the contribution of the maximum quartic variation:

Q(N) = Max{(X_{t−iΔt})⁴}_{i=0}^{N} / Σ_{i=0}^{N} (X_{t−iΔt})⁴
Description of dataset:

VARIABLE             Q (Max Quartic Contr.)   N (years)
Silver               0.94                     46
SP500                0.79                     56
CrudeOil             0.79                     26
Short Sterling       0.75                     17
Heating Oil          0.74                     31
Nikkei               0.72                     23
FTSE                 0.54                     25
JGB                  0.48                     24
Eurodollar Depo 1M   0.31                     19
Sugar 11             0.30                     48
Yen                  0.27                     38
Bovespa              0.27                     16
Eurodollar Depo 3M   0.25                     28
CT                   0.25                     48
DAX                  0.20                     18
Naively, the fourth moment expresses the stability of the second moment; the higher the variations, the higher the fourth moment. For a Gaussian (i.e., the distribution of the square of a Chi-square distributed variable) the maximum contribution should be around .008 ± .0028.
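The Gaussian benchmark, and the contrast with the table, can be reproduced by simulation; a sketch (numpy assumed) comparing a Gaussian sample against a Student T(3) sample:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000

def max_quartic_share(x):
    # Q(N): share of the sample fourth moment owed to the single largest x^4
    q = x ** 4
    return q.max() / q.sum()

gauss_q = max_quartic_share(rng.standard_normal(n))      # tiny, near .008
fat_q = max_quartic_share(rng.standard_t(df=3, size=n))  # a single point dominates
```

For the fat-tailed sample one observation typically carries a large fraction of the measured fourth moment, as in the table above.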
Fig 1: Comparing M[t−1, t] and M[t, t+1], where Δt = 1 year (252 days), for macroeconomic data using extreme deviations, A = (−∞, −2 standard deviations (equivalent)], f(x) = x (replication of data from "The Fourth Quadrant", Taleb, 2009).
Fig 2: The results are a lot worse for large deviations, A = (−∞, −4 standard deviations (equivalent)], f(x) = x.

Fig 3: The regular is predictive of the regular, that is, mean deviation: comparing M[t] and M[t+1 year] for macroeconomic data using regular deviations, A = (−∞, ∞), f(x) = |x|.
When the generating process is a powerlaw with low exponent, plenty of confusion can take place. For instance, take Pinker (2011) claiming that the generating process has a tail exponent ~1.15 and drawing quantitative conclusions from it. The next two figures show the realizations of two subsamples, one before and the other after the turkey problem, illustrating the inability of a sample to deliver true probabilities.
Fig x: First 100 years (sample path): a Monte Carlo generated realization of a process of the "80/20 or 80/02 style", that is, tail exponent α = 1.1.
Fig x: The Turkey Surprise: now 200 years; the second 100 years dwarf the first. These are realizations of the exact same process, seen with a longer window and at a different scale.
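The turkey effect can be reproduced in a few lines. A hypothetical sketch (numpy assumed), using inverse-CDF sampling of a Pareto variable with tail exponent α = 1.1: a single jump tends to dominate the whole cumulative history, so any subsample understates the true tail.

```python
import numpy as np

rng = np.random.default_rng(3)
alpha = 1.1   # tail exponent of the "80/20 or 80/02 style" process
n_years = 200

# inverse-CDF sampling of a Pareto with x_min = 1:
# if U ~ Uniform[0,1), then (1 - U)**(-1/alpha) satisfies P(X > x) = x**(-alpha)
jumps = (1.0 - rng.uniform(size=n_years)) ** (-1.0 / alpha)

path = jumps.cumsum()
first_half, full = path[99], path[-1]
top_share = jumps.max() / jumps.sum()  # share of the history owed to one event
```

With α barely above 1 the mean is dominated by rare giant realizations, so the first 100 "years" say almost nothing about the next 100.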
2. Preasymptotics and The Central Limit in the Real World
(Σ_{i=1}^{N} X_i − N μ) / (√N σ) →_D N(0, 1) as N → ∞
where →_D denotes convergence "in distribution". Granted, convergence "in distribution" is about the weakest form of convergence. Effectively we are dealing with a double problem. The first, as uncovered by Jaynes, corresponds to the abuses of measure theory: some properties that hold at infinity might not hold in all limiting processes --a manifestation of the classical problem of uniform and pointwise convergence. Jaynes 2003 (p. 44): "The danger is that the present measure theory notation presupposes the infinite limit already accomplished, but contains no symbol indicating which limiting process was used (...) Any attempt to go directly to the limit can result in nonsense". Granted, Jaynes is still too Platonic (he falls headlong for the Gaussian by mixing thermodynamics and information). But we accord with him on this point --along with the definition of probability as information incompleteness, about which later. The second problem is that we do not have a "clean" limiting process --the process is itself idealized. Now how should we look at the Central Limit Theorem? Let us see how we arrive at it assuming "independence".
P[−u ≤ Z ≤ u] = ∫_{−u}^{u} (1/√(2π)) e^{−z²/2} dz, where Z = Σ_{i=0}^{n} X_i / (√n σ)

The Gaussian approximation holds inside the "tunnel" [−u, u]; both the odds of falling inside the tunnel itself and the quality of the approximation depend on n, not on x.
Since C(N+M) = C(N) + C(M), the additivity of the log characteristic function under convolution makes it easy to see the speed of the convergence to the Gaussian. Fat tails implies that higher moments implode --not just the 4th.

Table of Normalized Cumulants --Speed of Convergence (dividing by σⁿ where n is the order of the cumulant):

Distribution     PDF                               N-convoluted log CF           2nd   3rd        4th
Normal(μ,σ)      e^{−(x−μ)²/(2σ²)} / (√(2π) σ)     N log(e^{izμ − z²σ²/2})       1     0          0
Poisson(λ)       e^{−λ} λˣ / x!                    N log(e^{(−1+e^{iz}) λ})      1     1/(Nλ)     1/(N²λ²)
Exponential(λ)   λ e^{−xλ}                         N log(λ/(λ − iz))             1     2λ/N       3! λ²/N²
Γ(a,b)           b^{−a} x^{a−1} e^{−x/b} / Γ(a)    N log((1 − ibz)^{−a})         1     2/(abN)    3!/(a²b²N²)

Distribution     5th           6th           7th           8th           9th           10th
Poisson(λ)       1/(N³λ³)      1/(N⁴λ⁴)      1/(N⁵λ⁵)      1/(N⁶λ⁶)      1/(N⁷λ⁷)      1/(N⁸λ⁸)
Exponential(λ)   4! λ³/N³      5! λ⁴/N⁴      6! λ⁵/N⁵      7! λ⁶/N⁶      8! λ⁷/N⁷      9! λ⁸/N⁸
Γ(a,b)           4!/(a³b³N³)   5!/(a⁴b⁴N⁴)   6!/(a⁵b⁵N⁵)   7!/(a⁶b⁶N⁶)   8!/(a⁷b⁷N⁷)   9!/(a⁸b⁸N⁸)

For fat-tailed distributions the higher normalized cumulants are indeterminate:

Distribution                PDF                                                           N-convoluted log CF
StudentT(3)                 6√3 / (π (x²+3)²)                                             N (log(√3 |z| + 1) − √3 |z|)
StudentT(4)                 12 / (x²+4)^{5/2}                                             N log(2 z² K₂(2 |z|))
Mixed Gaussians (p,σ₁,σ₂)   p e^{−x²/(2σ₁²)}/(√(2π)σ₁) + (1−p) e^{−x²/(2σ₂²)}/(√(2π)σ₂)   N log(p e^{−z²σ₁²/2} + (1−p) e^{−z²σ₂²/2})
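The speed of convergence under convolution can be checked by simulation. A sketch (numpy assumed), using the exponential row: in the familiar normalization, the excess kurtosis κ₄/κ₂² of an N-fold sum of exponentials is 6/N, so it should shrink linearly in N.

```python
import numpy as np

rng = np.random.default_rng(5)
M = 200_000  # Monte Carlo replications per N

def excess_kurtosis(x):
    # kappa_4 / kappa_2^2: zero for a Gaussian
    x = x - x.mean()
    return np.mean(x ** 4) / np.mean(x ** 2) ** 2 - 3.0

kurt = {}
for N in (1, 10, 100):
    sums = rng.exponential(scale=1.0, size=(M, N)).sum(axis=1)
    kurt[N] = excess_kurtosis(sums)  # theory: 6/N for exponential summands
```

The measured values track 6, 0.6, 0.06, illustrating the 1/N decay of the fourth cumulant under convolution.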
P[X > a] ≤ σ²/a², hence P[X > n σ] ≤ 1/n²

which effectively accommodates power laws but puts a bound on the probability of large deviations --a bound that remains significant.
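The bound can be verified against an empirical distribution directly; when σ is the sample standard deviation, the Chebyshev inequality holds for the sample by construction, however fat the tails (sketch, numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(9)
x = rng.standard_t(df=3, size=1_000_000)  # a fat-tailed sample
sigma = x.std()

# Chebyshev: P[|X| > n*sigma] <= 1/n**2, even for fat-tailed data
tail_freq = {n: np.mean(np.abs(x) > n * sigma) for n in (2, 3, 4)}
```

The bound is loose for thin tails but binding and informative precisely where power laws live.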
Let us proceed with a simple example, the extremum of a Gaussian variable. Say we generate N Gaussian variables {Z_i}_{i=1}^{N} with mean 0 and unitary standard deviation, and take the highest value we find. We take the upper bound E_j for the N-size sample run j:

E_j = Max {Z_{i,j}}_{i=1}^{N}

Assume we do so M times, to get M samples of maxima for the set E:

E = {Max {Z_{i,j}}_{i=1}^{N}}_{j=1}^{M}

The next figure plots a histogram of the result.
Figure 1: Taking M samples of Gaussian maxima; here N = 30,000, M = 10,000. We get: mean of the maxima = 4.11159, standard deviation = 0.286938, median = 4.07344.
Let us fit to the sample an Extreme Value Distribution (Gumbel) with location and scale parameters a and b, respectively:

f(x; a, b) = (1/b) e^{(a−x)/b − e^{(a−x)/b}}
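The experiment and the fit can be reproduced as follows; a sketch assuming numpy and scipy (scipy's `gumbel_r` is the Gumbel above), with a smaller M than in the figure to keep the runtime modest:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
N, M = 30_000, 2_000  # the figure uses M = 10,000

# M samples of the maximum of N standard Gaussians
maxima = np.array([rng.standard_normal(N).max() for _ in range(M)])

# fit the Gumbel location a and scale b by maximum likelihood
a_hat, b_hat = stats.gumbel_r.fit(maxima)
```

The mean of the maxima lands near the 4.11 reported in Figure 1, and the fitted location sits near 4.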
Now let us generate, exactly as before, but changing the distribution: N powerlaw-distributed random variables Z_i with tail exponent μ = 3, generated from a Student T Distribution with 3 degrees of freedom. Again, we take the upper bound. This time it is not the Gumbel but the Fréchet distribution that fits the result:

f(x; α, β) = (α/β) (x/β)^{−1−α} e^{−(x/β)^{−α}}, for x > 0
Figure 3: Fitting a Fréchet distribution to the Student T generated with μ = 3 degrees of freedom. The Fréchet distribution with α = 3, β = 32 fits up to higher values of E. The next two graphs show the fit more closely.
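Likewise for the fat-tailed case; a sketch assuming scipy, whose `invweibull` is the Fréchet (location pinned at 0). The fitted α should come out near the tail exponent 3, and the scale near the β = 32 quoted above:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
N, M = 30_000, 500

# M samples of the maximum of N Student T(3) variables
maxima = np.array([rng.standard_t(df=3, size=N).max() for _ in range(M)])

# scipy's invweibull is the Frechet; fix the location at 0
alpha_hat, loc, beta_hat = stats.invweibull.fit(maxima, floc=0)
```

Unlike the Gaussian case, where the maxima sit near 4, here the maxima are of order 30 and their distribution keeps a power-law tail with the same exponent as the underlying variables.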
Figure 5: Q-Q plot. Fits up to extremely high values of E; the rest owes to sample insufficiency for extremely large values, a bias that typically causes the underestimation of tails, as the points tend to fall to the right.
How Extreme Value Has a Severe Inverse Problem In the Real World
In the previous case we started with the distribution, with the assumed parameters, then obtained the corresponding values, as these risk modelers do. In the real world, we don't quite know the calibration, the α of the distribution, assuming (generously) that we know the distribution itself. So here we face the inverse problem. The next table illustrates the different calibrations of P_K, the probability that the maximum exceeds a certain value K (expressed as a multiple of β), under different values of K and α.
α      1/P>3β     1/P>10β    1/P>20β
1.00   3.52773    10.5083    20.5042
1.25   4.46931    18.2875    42.7968
1.50   5.71218    32.1254    89.9437
1.75   7.3507     56.7356    189.649
2.00   9.50926    100.501    400.5
2.25   12.3517    178.328    846.397
2.50   16.0938    316.728    1789.35
2.75   21.0196    562.841    3783.47
3.00   27.5031    1000.5     8000.5
3.25   36.0363    1778.78    16 918.4
3.50   47.2672    3162.78    35 777.6
3.75   62.048     5623.91    75 659.8
4.00   81.501     10 000.5   160 000.
4.25   107.103    17 783.3   338 359.
4.50   140.797    31 623.3   715 542.
4.75   185.141    56 234.6   1.51319 × 10⁶
5.00   243.5      100 001.   3.2 × 10⁶
Consider that the error in estimating the α of a distribution is quite large, often > 1/2, and that α is typically overestimated. So we can see that the probabilities get mixed up by more than an order of magnitude. In other words, the imprecision in the computation of α compounds in the evaluation of the probabilities of extreme values.
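The table is consistent with the Fréchet exceedance probability P(max > Kβ) = 1 − exp(−K^(−α)), an assumption recovered from the table's own values. A short sketch (numpy assumed) reproducing two entries and showing how an error of ±1/2 in α moves the tail odds by an order of magnitude:

```python
import numpy as np

def one_over_p(K, alpha):
    # 1/P(max > K*beta) for a Frechet with tail exponent alpha, scale beta
    return 1.0 / (1.0 - np.exp(-K ** -alpha))

# two entries of the table
row_a1 = one_over_p(3, 1.0)   # ~3.53
row_a3 = one_over_p(10, 3.0)  # ~1000.5

# the inverse problem: misjudge alpha = 3 by +-1/2 and the odds of
# exceeding 10*beta move by roughly an order of magnitude each way
spread = one_over_p(10, 3.5) / one_over_p(10, 2.5)
```

For K well beyond β the expression behaves like K^α, which is why small errors in the exponent translate into multiplicative errors in the odds.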
3. On the Difference Between Binaries and Vanillas
This explains how and where prediction markets (or, more generally, discussions of betting matters) do not correspond to reality and have little to do with exposures to fat tails and Black Swan effects. Elementary facts, but with implications. This shows, for instance, how the long shot bias is misapplied to real-life variables, and why political predictions are more robust than economic ones. This discussion is based on Taleb (1997), showing the difference between a binary and a vanilla option.
Definitions
A binary bet (or just a binary, or a digital): an outcome with payoff 0 or 1 (or yes/no, −1/1, etc.). Examples: prediction markets, elections, most games and lottery tickets. Also called digital. Any statistic based on a YES/NO switch. Binaries are effectively bets on probability. They are rarely ecological, except for political predictions. (More technically, they are mapped by the Heaviside function.)

An exposure or vanilla: an outcome with no open limit: say revenues, market crashes, casualties from war, success, growth, inflation, epidemics... in other words, about everything. Exposures are generally expectations, or the arithmetic mean, never bets on probability, but rather the pair probability × payoff.

A bounded exposure: an exposure (vanilla) with an upper and lower bound: say an insurance policy with a cap, or a lottery ticket. When the boundary is close, it approaches a binary bet in properties. When the boundary is remote (and unknown), it can be treated like a pure exposure. The idea of clipping the tails of exposures transforms them into this category.
The Problem
The properties of binaries diverge from those of vanilla exposures. This note shows how the conflation of the two takes place: in prediction markets, in the ludic fallacy (using the world of games to apply to real life).
1. They have diametrically opposite responses to skewness.
2. They respond differently to fat-tailedness (sometimes in opposite directions). Fat tails make binaries more tractable.
3. A rise in complexity lowers the value of the binary and increases that of the exposure.
Some direct applications:
1. Studies of long shot biases that typically apply to binaries should not port to vanillas.
2. Many are surprised that I find many econometricians total charlatans while finding Nate Silver immune to my problem. This explains why.
3. Prediction markets provide very limited information outside specific domains.
4. Etc.
One can hold beliefs that a variable can go lower yet bet that it is going higher. Simply, the digital and the vanilla diverge: P(X > X₀) > ½, but E(X) < X₀. This is normal in the presence of skewness and extremely common with economic variables. Philosophers have a related problem called the lottery paradox, which in statistical terms is not a paradox.
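A two-outcome sketch makes the divergence concrete (hypothetical payoffs, numpy assumed): a bet that wins 90% of the time can still carry a negative expectation.

```python
import numpy as np

rng = np.random.default_rng(11)

# 90% of the time: gain 1; 10% of the time: lose 20 (negative skew)
x = np.where(rng.uniform(size=1_000_000) < 0.9, 1.0, -20.0)

binary = (x > 0).mean()   # what a prediction market prices: P(X > 0)
vanilla = x.mean()        # what the exposure actually earns: E(X)
```

The binary reads near 0.9, "almost certain to go up", while the vanilla expectation is around 0.9 × 1 − 0.1 × 20 = −1.1.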
The Elementary Fat Tails Mistake
A slightly more difficult problem: when I ask economists or social scientists "what happens to the probability of a deviation > 1σ when you fatten the tail (while preserving other properties)?", almost all answer: it increases (so far all have made the mistake). Wrong. They miss the idea that fat tails means the contribution of the extreme events to the total properties increases, and that it is the pair probability × payoff that matters, not just probability.
I've asked variants of the same question. The Gaussian distribution spends 68.2% of the time between ±1 standard deviation. The real world has fat tails. In finance, how much time do stocks spend between ±1 standard deviations? The answer has been invariably "lower". Why? "Because there are more deviations." Sorry, there are fewer deviations: stocks spend between 78% and 98% of the time between ±1 standard deviations (computed from past samples).

Some simple derivations. Let x follow a Gaussian distribution (μ, σ); assume μ = 0 for the exercise. What is the probability of exceeding one standard deviation?

P>1σ = 1 − ½ erfc(−1/√2)

where erfc is the complementary error function; P>1σ = P<−1σ ≈ 15.86%, and the probability of staying within the "stability tunnel" between ±1σ is ≈ 68.2%.

Let us fatten the tail, using a standard method of linear combination of two Gaussians with the two standard deviations σ√(1+a) and σ√(1−a), where a is the "vvol" (this is variance-preserving; technically of no big effect here, as a standard-deviation-preserving spreading gives the same qualitative result). Such a method leads to an immediate raising of the kurtosis by a factor of (1 + a²), since E(x⁴)/E(x²)² = 3(1 + a²). The probability of staying inside the tunnel becomes

P(−σ ≤ x ≤ σ) = 1 − ½ erfc(1/(√2 √(1−a))) − ½ erfc(1/(√2 √(1+a)))

So then, for different values of a, as we can see, the probability of staying inside ±1 sigma increases.
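The derivation above can be checked numerically; a sketch using scipy's erfc, showing the probability of staying inside the ±1σ tunnel rising with the "vvol" a:

```python
import numpy as np
from scipy.special import erfc

def p_inside(a):
    # P(-sigma <= x <= sigma) for the variance-preserving mixture of
    # Gaussians with standard deviations sigma*sqrt(1-a) and sigma*sqrt(1+a)
    return (1.0
            - 0.5 * erfc(1.0 / (np.sqrt(2.0) * np.sqrt(1.0 - a)))
            - 0.5 * erfc(1.0 / (np.sqrt(2.0) * np.sqrt(1.0 + a))))

probs = [p_inside(a) for a in (0.0, 0.3, 0.6, 0.9)]  # 0.682... and rising
```

At a = 0 this recovers the Gaussian 68.2%; as a grows, the mixture is fatter-tailed yet spends more time inside the tunnel.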
Fatter and fatter tails: different values of a. We notice that the higher the peak, the lower the probability of leaving the ±1σ tunnel.
Fatter tails increase the time spent between deviations, giving the illusion of absence of volatility when in fact events are delayed and made worse (my critique of the "Great Moderation"). Stopping Time & Fattening of the Tails of a Brownian Motion: consider the distribution of the time it takes for a continuously monitored Brownian motion S to exit from a "tunnel" with a lower bound L and an upper bound H. Counterintuitively, fatter tails make an exit (at some sigma) take longer. You are likely to spend more time inside the tunnel --since exits are far more dramatic. ψ is the distribution of the exit time τ, where τ = inf{t : S_t ∉ [L, H]}. From Taleb (1997) we have the following approximation:
ψ(t|σ) = (π σ² / (log H − log L)²) Σ_{n=1}^{∞} n (1 − (−1)ⁿ) sin(n π log(S/L) / (log H − log L)) exp(−n² π² σ² t / (2 (log H − log L)²))

(driftless case, with S the starting point of the process inside the tunnel [L, H])
and the fatter-tailed distribution from mixing Brownians with standard deviations separated by a coefficient a:
ψ(t | σ, a) = ½ ψ(t | σ √(1 − a)) + ½ ψ(t | σ √(1 + a))
This graph shows the lengthening of the stopping time between events coming from fatter tails.
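The lengthening can be checked by brute force. For driftless Brownian motion exiting (−d, d), the expected exit time is d²/σ²; averaging over the two mixed sigmas gives (d²/σ²) · ½ (1/(1−a) + 1/(1+a)) = (d²/σ²)/(1 − a²) > d²/σ². A crude discretized Monte Carlo (numpy assumed; the time step dt introduces a small upward bias):

```python
import numpy as np

rng = np.random.default_rng(13)

def mean_exit_time(sigma, d=1.0, dt=0.01, n_paths=4000, max_steps=200_000):
    # Euler-discretized driftless Brownian motion, stopped on leaving (-d, d)
    x = np.zeros(n_paths)
    t_exit = np.zeros(n_paths)
    alive = np.ones(n_paths, dtype=bool)
    for step in range(1, max_steps + 1):
        idx = np.flatnonzero(alive)
        if idx.size == 0:
            break
        x[idx] += sigma * np.sqrt(dt) * rng.standard_normal(idx.size)
        done = idx[np.abs(x[idx]) >= d]
        t_exit[done] = step * dt
        alive[done] = False
    return t_exit.mean()

base = mean_exit_time(1.0)                       # theory: d^2/sigma^2 = 1.0
a = 0.8
mixed = 0.5 * (mean_exit_time(np.sqrt(1 - a)) +  # theory: 1/(1 - a**2) ~ 2.78
               mean_exit_time(np.sqrt(1 + a)))
```

The mixed (fatter-tailed) process stays in the tunnel markedly longer on average, despite having the same overall variance.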
{This is a note: a more advanced discussion explains why a more uncertain mean (vanilla) might mean a less uncertain probability (prediction), etc. Also see "Why We Don't Know What We Are Talking About When We Talk About Probability".}