

Cryptologia

Publication details, including instructions for authors and subscription information: http://www.informaworld.com/smpp/title~content=t725304178

The Voynich Manuscript: Evidence of the Hoax Hypothesis


Andreas Schinner

To cite this Article: Schinner, Andreas (2007) 'The Voynich Manuscript: Evidence of the Hoax Hypothesis', Cryptologia, 31: 2, 95–107

To link to this Article: DOI: 10.1080/01611190601133539 URL: http://dx.doi.org/10.1080/01611190601133539


Cryptologia, 31:95–107, 2007. Copyright © Taylor & Francis Group, LLC. ISSN: 0161-1194 print. DOI: 10.1080/01611190601133539

Abstract In this article, I analyze the Voynich manuscript, using random walk mapping and token/syllable repetition statistics. The results significantly tighten the boundaries for possible interpretations; they suggest that the text has been generated by a stochastic process rather than by encoding or encryption of language. In particular, the so-called Chinese theory now appears less convincing.

Keywords hoax hypothesis, statistical analysis, stochastic process, Voynich manuscript

Downloaded At: 14:22 25 October 2010

Introduction
The Voynich manuscript (the VMS) is a handwritten codex of about 250 pages, ink on vellum, appearing on stylistic grounds to date from around 1500. It contains illustrations of mostly unidentifiable plants, astronomical or astrological diagrams, and naked "nymphs" bathing in strange arrangements of pools or tubs connected by complex systems of pipes. The most striking feature, however, is the text, written in an elegant, unique script that has so far defied any commonly accepted translation. Information about the VMS, its possible history, and attempts at explanation can be found in various places [4, 7, 14, 15]. Only a brief summary will be given here.

Interpretations of the VMS can roughly be divided into three classes:

• Cipher text hypothesis. The VMS contains natural language text (judging from the origin of the manuscript, most probably Latin or German) that has been encrypted.

• Plain text hypothesis. The VMS text is plain text in a natural, not yet identified language that either did not possess an alphabet of its own at the beginning of the 16th century, or whose system of writing appeared too complex to a medieval scholar. The word length statistics make East Asian languages, in particular Chinese, the most promising candidates for this ("Chinese theory"). Alternatively, the script could also have been invented together with an artificial language.

• Hoax hypothesis. The VMS contains no meaningful text at all. In this context, the word "hoax" should be associated with a broad spectrum of possibilities, ranging from intentional forgery for monetary gain to the work of an idiot savant, interpreted by medieval scholars as a revelation of arcane lore.

These three classes are not completely distinct. For example, the VMS could contain a message hidden steganographically in a set of otherwise meaningless strings. This theory is especially difficult to prove or disprove; the best argument against it known so far is a psychological one: the basic principle of steganography is to hide the mere existence of a message, and the worst place to hide a genuine secret is an apparently mysterious book.

It is one of the most striking features of the VMS that even modern computer-aided analysis has so far been unable to rule out any one of these interpretations definitively. Instead, arguments for and against all three viewpoints can be given. Since statistical properties characteristic of natural languages are also present in the VMS text, the encryption method used (if any) should not be too complex; additionally, around 1500, cryptology was still in its early beginnings. Despite these facts, all attempts at decipherment by modern cryptanalysts have failed. On the other hand, the text shows several exotic linguistic features, such as frequent word repetitions or preferred positions for certain letters within a line; this appears to be incompatible with the plain text hypothesis, even in the artificial language version.

Consequently, there are attractions in the hoax hypothesis. However, the VMS text obviously is not composed of simple random strings, and it shows rich linguistic-like structure. It seemed unlikely that a medieval hoaxer (or even an early 20th century forger) could create such a convincing facsimile language within reasonable time. The work by Gordon Rugg [11] has shown that this need not be true: an algorithm feasible even with medieval technology (the "table-and-grille" method) makes it possible for a single person to generate a text as long and complex as the VMS within approximately three months. This, however, is just a possibility and far from a proof of the hoax hypothesis. Furthermore, the table-and-grille method as investigated so far does not explain all of the statistical text properties of the VMS.

Address correspondence to Dr. Andreas Schinner, Institut für Experimentalphysik, Abteilung für Atom- und Oberflächenphysik, Johannes Kepler Universität, Altenberger Straße 69, 4040 Linz, Austria. E-mail: andreas.schinner@jku.at
The three competing explanation classes are thus still of roughly equal relevance. In this article, statistical investigations of the VMS are presented that place additional restrictions on possible solutions. Mapping the text to a random walk uncovers characteristic long-range correlations not present in normal human writings; they fit a stochastic process with memory effects better than a sequence of tokens chosen according to linguistic rules. Furthermore, the distribution of gaps between two similar or selected tokens, respectively, also differs qualitatively from normal texts; its mathematical properties indicate the presence of very unusual random effects. Possible implications of these results for the interpretation of the VMS are discussed in the conclusions section.

Throughout this article, the following usual conventions are used: the term "token" denotes any string of characters separated by spaces or by the start or end of a line; a "word" is a type of token regardless of its frequency in the text. For characters or tokens from the VMS script the European Voynich Alphabet (EVA) is used [15]; the letters (or sequences of letters) are written in italics and put in angle brackets (for example, the notorious most frequent VMS token is transcribed as ⟨daiin⟩). Finally, the analysis presented in this article is based on the various text samples listed in Table 1.
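As a small illustration of the token/word convention, the following sketch counts both for a short sample. The sample string and the function name `tokens_and_words` are made up for illustration; only the definitions (tokens are separated by spaces or line boundaries, words are token types) come from the text.

```python
from collections import Counter

def tokens_and_words(text):
    """Return the token list and a Counter of words (token types).

    str.split() with no argument splits on any whitespace, so both
    spaces and line breaks act as token separators.
    """
    tokens = text.split()
    words = Counter(tokens)
    return tokens, words

# Illustrative two-line sample in an EVA-like spelling (not real VMS data).
sample = "daiin daiin chedy\nqokeedy daiin"
tokens, words = tokens_and_words(sample)
print(len(tokens), "tokens,", len(words), "words")
```

The ratio of words to tokens is one of the quantities summarized per text in Table 1.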


Random Walk Model


Following Kokol, Podgorelec, Zorman, Kokol, and Njivar [8], long-range power law correlations are present in a wide variety of information encoding systems, ranging from human writings (natural languages and computer programs) to DNA sequences. To some extent, they characterize the information content and complexity of communication. A useful method to study correlations in character strings is based on mapping the symbol sequence to a stochastic process that, especially in linguistic literature, is frequently called a "Brownian walk". This terminology is somewhat misleading, since Brownian motion can be described as the scaling limit of a so-called random walk: in the theory of stochastic processes [2] it is characterized by independent steps that all have the same probability distribution, i.e., are uncorrelated. On the other hand, in statistical physics, for example, the expression "random walk with memory" is sometimes used to describe a situation in which the stochastic process generating the steps is of Markovian or even non-Markovian type. In the following, "random walk" should be understood in this generalized sense.

Table 1. Text sources used in this article

Text                   Text part   Language    Number of tokens   Number of words
Voynich manuscript¹    All²        Unknown     36,000             7000
Vulgate Bible          5%³         Latin       25,000             6000
Luther Bible           5%³         German      35,000             4000
Alice in Wonderland    All         English     26,000             3000
Chinese Bible          Genesis     Mandarin⁴   34,000             2000

¹ Majority vote version of interlinear EVA transcription 1.6e6 [15]; ² or particular sections of it, see Table 2; ³ percentages are counted from top of document; ⁴ in pin-yin romanization with all tones removed.

As a first step it is necessary to encode the characters of the texts under investigation as bit sequences. It has been shown that the actual definition of this code table has negligible influence on the quantities of interest, as long as all (or at least almost all) possible bit patterns are used [12]. Since the VMS contains no punctuation signs, they are removed from the other texts too; upper case characters are converted to lower case. Thus the remaining character set consists of the letters a–z, the German umlauts ä, ö, ü, and the German sz ligature ß; empty spaces are ignored. These 30 characters can be represented by a 5-bit code. The bits of the resulting binary string then define the steps ±1 of a random walk. Let

$$\Delta y(l, l_0) = y(l + l_0) - y(l_0) \tag{1}$$

be the walk displacement between step numbers $l_0$ and $l + l_0$. Then

$$F^2(l) = \langle \Delta y^2 \rangle - \langle \Delta y \rangle^2 \tag{2}$$

describes the variance of the displacement. The angle brackets denote averaging over all $l_0$. For pure (uncorrelated) random walks of infinite length, where the steps are Bernoulli trials with probability $p$, one easily obtains:

$$F^2(l) = 4p(1 - p)\,l \tag{3}$$

In general, $F(l)$ will behave asymptotically as $F(l) \propto l^{\alpha}$, where an exponent $\alpha \neq 0.5$ indicates the presence of long-range correlations.
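The mapping described above can be sketched in a few lines. This is a minimal illustration, not the code used for the article: the particular 5-bit code table (alphabet index written in binary) and the names `text_to_bits` and `fluctuation` are assumptions, which the text suggests should not matter much as long as almost all bit patterns occur.

```python
import math
import random

# 30 characters as described in the text; the ordering is an arbitrary choice.
ALPHABET = "abcdefghijklmnopqrstuvwxyzäöüß"

def text_to_bits(text):
    """Encode each character as 5 bits; everything else (spaces, punctuation) is dropped."""
    code = {ch: i for i, ch in enumerate(ALPHABET)}  # hypothetical code table
    bits = []
    for ch in text.lower():
        if ch in code:
            value = code[ch]
            bits.extend((value >> k) & 1 for k in range(4, -1, -1))
    return bits

def fluctuation(bits, l):
    """Root of the displacement variance over all start positions l0, as in Eq. (2)."""
    y = [0]
    for b in bits:                      # bit 1 -> step +1, bit 0 -> step -1
        y.append(y[-1] + (1 if b else -1))
    disp = [y[l0 + l] - y[l0] for l0 in range(len(y) - l)]
    mean = sum(disp) / len(disp)
    var = sum(d * d for d in disp) / len(disp) - mean * mean
    return math.sqrt(max(var, 0.0))

# Usage: for uncorrelated bits with p = 1/2, Eq. (3) predicts F(l) close to sqrt(l).
rng = random.Random(0)
random_bits = [rng.randint(0, 1) for _ in range(20000)]
for l in (10, 100, 1000):
    print(l, round(fluctuation(random_bits, l), 2), round(math.sqrt(l), 2))
```

Plotting `fluctuation` against `l` on a log-log scale and reading off the slope gives an estimate of the exponent α.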


Figure 1. Root mean square fluctuation of the random walk displacement for the VMS and normal language texts. Inset: VMS curve (full line) with low and high l asymptotic behavior, respectively (dashed lines).

Particular care has to be taken when evaluating Eq. (2) for a walk of finite length $N$ to avoid finite size effects: as $l \to N - 1$ the sample size available for calculating the averages (i.e., the number of possible $l_0$ values) tends to 1; consequently, $F(l) \to 0$. In the calculations presented here $l$ is limited to a maximal value of $N/10$. The resulting $F(l)$ on applying this method to the VMS and other texts is shown in Figure 1.

Previous investigations by Kokol et al. [8] of various human writings have demonstrated that for natural language texts (almost independent of the language used) the asymptotic exponent $\alpha$ of $F(l)$ does not notably differ from 0.5, while for computer program source codes significant deviations are observed. As far as the normal language samples are concerned, the present results confirm this. Most interestingly, the VMS text shows completely different behavior: a crossover point exists where the random-process behavior ($\alpha = 0.5$) turns into an asymptotic exponent $\alpha \approx 0.85$, indicating the presence of memory effects in the underlying stochastic process. The principal structure of $F(l)$ remains the same for single sections of the VMS, as presented in Table 2: the asymptotic exponents for parts of the VMS are somewhat lower (between 0.7 and 0.8) than for the whole text; the difference is mainly due to the relatively high sensitivity of $\alpha$ to a reduction of the walk length.

Two facts are especially noteworthy: (i) the crossover point $l_{co} \approx 360$ (= 72 characters × 5 bits) of the whole text fits the average line length well; (ii) this value approximately also holds for sections that are associated with Currier's language A [3], while for sections written in language B $l_{co}$ is significantly higher (by approximately a factor of 3). It appears that in the VMS significant correlations exist between tokens separated by more than an average text line, while within a line the text behaves randomly (like ordinary human writings).
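The crossover values listed in Table 2 can be cross-checked from the fitted prefactor $a$ and exponent $\alpha$ alone: setting the random-walk branch $l^{0.5}$ equal to the asymptotic branch $a\,l^{\alpha}$ gives $l_{co} = a^{1/(0.5 - \alpha)}$. The sketch below does this arithmetic; the dictionary layout and function name are illustrative, and the numbers are the rounded values printed in Table 2, so agreement is only expected to within rounding.

```python
# (a, alpha, reported l_co) per VMS section, copied from Table 2.
TABLE2 = {
    "All":            (0.131, 0.846, 356),
    "Herbal":         (0.243, 0.768, 196),
    "Astrological":   (0.396, 0.659, 339),
    "Biological":     (0.161, 0.762, 1065),
    "Pharmaceutical": (0.314, 0.706, 277),
    "Recipes":        (0.182, 0.738, 1285),
}

def crossover(a, alpha):
    """Solve l**0.5 == a * l**alpha for l."""
    return a ** (1.0 / (0.5 - alpha))

for section, (a, alpha, reported) in TABLE2.items():
    print(f"{section:15s} computed {crossover(a, alpha):7.1f}   reported {reported}")

# A crossover near 360 bits corresponds to 360 / 5 = 72 characters at 5 bits each.
print(360 / 5)
```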
Table 2. Random walk asymptotic displacement variance

VMS section      Folios       Walk length   a¹      α¹      l_co²   Script³
All              1r–116v      954456        0.131   0.846   356     AB
Herbal           1r–66v       272896        0.243   0.768   196     A
Astrological     67r–73v      74721         0.396   0.659   339     ?
Biological       75r–84v      172096        0.161   0.762   1065    B
Pharmaceutical   87r–102v     99176         0.314   0.706   277     A
Recipes          103r–116v    282536        0.182   0.738   1285    B

¹ $F(l) \to a\,l^{\alpha}$ for $l \gg 1$, see Eq. (2) and text; ² crossover $l$-value: $l_{co}^{0.5} = a\,l_{co}^{\alpha}$; ³ Currier language [3] that is dominant in this section.

To inspect this more closely, the step (or bit) autocorrelation function

$$C(l) = \langle n(l + l_0)\, n(l_0) \rangle - \langle n(l_0) \rangle^2 \tag{4}$$

and its corresponding cumulative distribution function

$$C_c(l) = \frac{1}{l} \sum_{k=1}^{l} C(k) \tag{5}$$

are useful quantities. Here $n(k)$ denotes the value (0 or 1) of the bit at position $k$ in the binary string generating the random walk. As demonstrated in Figure 2, positive correlations build up in the VMS within approximately $l < 400$ that are stronger by an order of magnitude than in ordinary text. These correlations decay after some thousand steps. Such positive correlations are typical for a stochastic process in which the probability of a particular random event is increased by previous occurrences of this event.
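A direct, unoptimized sketch of the two quantities follows. The function names are illustrative, the averages run over all start positions $l_0$ for which both bits exist, and the $1/l$ normalization of the cumulative function reflects how Eq. (5) is read here; for long texts an FFT-based estimate would be faster.

```python
def autocorr(bits, l):
    """C(l) = <n(l + l0) n(l0)> - <n(l0)>^2, averaged over all valid l0 (Eq. 4)."""
    m = len(bits) - l
    prod = sum(bits[i + l] * bits[i] for i in range(m)) / m
    mean = sum(bits[:m]) / m
    return prod - mean * mean

def cumulative_autocorr(bits, l):
    """C_c(l) = (1/l) * sum of C(k) for k = 1..l (Eq. 5 as read here)."""
    return sum(autocorr(bits, k) for k in range(1, l + 1)) / l

# Usage on a strictly periodic toy sequence: C(l) oscillates with the period.
toy = [1, 0] * 50
print(autocorr(toy, 1), autocorr(toy, 2), cumulative_autocorr(toy, 2))
```

For uncorrelated bits both functions fluctuate around zero; a slowly building positive $C_c(l)$ is the signature described in the text.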

Figure 2. Cumulative step autocorrelation function Cc(l), cf., Eq. (5), (smoothed by 100 points adjacent averaging); full line: VMS, dashed line: Vulgate Bible. Inset: autocorrelation function C(l), cf., Eq. (4), for l between 1000 and 1030; full line: VMS, gray shaded area: Vulgate Bible.


A classical model for such a system, often applied to cascade processes like particle induced electron emission [1], is the so-called Pólya process. It is based on the Pólya urn scheme, where on drawing a ball of a particular color from an urn a specific number of balls of the same color are put into the urn, increasing the probability of drawing this color again [5] ("spurious contagion"). In the scaling limit of large step numbers $l$ the resulting distribution is the so-called Pólya distribution, also known as the negative binomial distribution

$$P_n = \binom{-1/b}{n} (-b\mu)^n (1 + b\mu)^{-1/b - n} \tag{6}$$

In the present context $P_n$ is the probability that in a walk of length $l \to \infty$ the number of up-steps is equal to $n$. Mean and variance of $P_n$ are given by

$$\langle n \rangle = \mu \tag{7}$$

$$\sigma^2 = \mu (1 + b\mu) \tag{8}$$

The parameter $b$ describes the cascading strength of the process: for $b = 0$ the random steps are uncorrelated and Eq. (6) turns into a Poisson distribution, while for $b = 1$ the so-called Yule-Furry process (also known as the simple birth process) is recovered [2]. Since $\mu \propto l$, it follows from Eq. (8) that an underlying Pólya process results in the asymptotic behavior $F(l) \propto \sqrt{b}\, l$ (i.e., $\alpha = 1$) of the random walk model. In order to reproduce the observed $\alpha \approx 0.85$ from Figure 1, an $l$-dependence of $b$ is necessary. Strictly speaking, the underlying process then is no longer a pure Pólya process, since with non-constant $b$ Eq. (6) no longer satisfies the Kolmogorov equations exactly. Due to the rather weak variation $b \propto l^{-0.3}$, however, it still remains a useful approximation.

The actual representation of the random walk in the form of the VMS text can be used to estimate the true distribution $P_n(l)$. Unfortunately, in particular for large $l$ (which represents the interesting case), the sample size is too small to identify the distribution with compelling evidence (mainly because $b$ is small). The data, however, do not contradict the hypothesis Eq. (6).

The unusual shape of $F(l)$ for the VMS has a major impact on possible interpretations. In particular, the Chinese hypothesis appears not to be compatible with it. The impression that a non-Markovian stochastic process, where the step probability depends on the long-term history, may play a key role in the interpretation of the VMS will be deepened further in the following sections.
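The stated mean and variance, and the two limiting cases, can be checked numerically. The sketch below evaluates Eq. (6) in the equivalent standard negative binomial form with shape $1/b$, writing the mean as `mu`; the function names and the parameter values in the usage lines are arbitrary choices for illustration.

```python
import math

def polya_pmf(n, mu, b):
    """P_n of Eq. (6), computed via log-gamma functions for numerical stability.

    Uses binom(-r, n) * (-x)**n == Gamma(n + r) / (Gamma(r) * n!) * x**n with r = 1/b.
    """
    r = 1.0 / b
    log_p = (math.lgamma(n + r) - math.lgamma(r) - math.lgamma(n + 1)
             + n * math.log(b * mu) - (r + n) * math.log1p(b * mu))
    return math.exp(log_p)

def moments(mu, b, n_max=5000):
    """Return (normalization, mean, variance) by direct summation up to n_max."""
    probs = [polya_pmf(n, mu, b) for n in range(n_max)]
    total = sum(probs)
    mean = sum(n * p for n, p in enumerate(probs))
    var = sum(n * n * p for n, p in enumerate(probs)) - mean ** 2
    return total, mean, var

total, mean, var = moments(mu=20.0, b=0.1)
print(total, mean, var)                 # compare with 1, mu, mu * (1 + b * mu)

# Large-mu behavior behind F(l) ~ sqrt(b) * l: sigma / mu approaches sqrt(b).
big_mu = 1.0e6
print(math.sqrt(big_mu * (1 + 0.1 * big_mu)) / big_mu, math.sqrt(0.1))
```

With `b = 1` the distribution reduces to a geometric one, and for very small `b` it approaches the Poisson distribution, matching the two limits named in the text.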


Similar Tokens Repetition Distance Distribution


In a previous work by G. Landini [9], the repetition distance distributions of the most frequent tokens in the VMS (⟨daiin⟩), Alice in Wonderland ("the"), and the Vulgate Bible ("et"), respectively, were investigated, i.e., the probability distribution of the number of other tokens between two occurrences of the particular one ("iso-word gap"). The result did not show a characteristic difference between the VMS and the normal texts, apart from the well-known enigmatic VMS feature that common words, in particular ⟨daiin⟩, quite frequently appear in sequences and consequently have a non-vanishing probability for zero repetition distance.

As will be demonstrated in this section, it is more instructive to investigate the repetition distance of two similar rather than exactly matching tokens. From the many well-known string distance metrics, the straightforward Levenshtein distance [6] will be used here. More sophisticated methods of calculating string distances tend to be optimized for human writings, which appears problematic in the VMS context of unknown language and meaning (if any). The Levenshtein distance of two character strings is an integer ranging from 0 (exact match) to the maximum of the two string lengths (no similarity), denoting the number of elementary edit operations necessary to make both strings equal. Mapping this number to the interval [0, 100] yields a percentage of dissimilarity for two tokens.

In Figure 3, the similar token repetition distance distribution $P_n$ for the VMS is presented in comparison with normal texts. Here $n$ denotes the number of other tokens between two similar ones, i.e., $n = 0$ corresponds to the situation of two alike tokens in immediate vicinity. Two words are considered similar if their dissimilarity as defined above is less than or equal to 30%; it turns out that the precise value (±10%) of this threshold changes $P_n$ only quantitatively, not qualitatively. The most striking feature is the almost mathematically perfect smooth shape of the VMS curve for $n \to 0$, while the other text sample data display the expected irregular behavior and tend to zero (or at least to small values). As noted previously, this simply expresses the effect that writers normally try to avoid word repetitions. It is especially noteworthy that even the Chinese text lies closer to the European languages than to the VMS, although the higher tendency towards common-word repetition sequences in Asian languages is a frequent argument in favor of the Chinese theory. The remaining text samples listed in Table 1 have been omitted from Figure 3 just to avoid confusion by too many markers; their behavior is comparable to that of the Vulgate Bible.

Let us consider an infinite random text consisting of $N$ words occurring with probabilities $\lambda_k$, $k = 1, \ldots, N$.
The chance for a particular word $k$ to reappear for the next time exactly after $n$ other tokens follows a geometric distribution $\lambda_k (1 - \lambda_k)^n$. The total token repetition distance distribution is then given by

$$P_n = \sum_{k=1}^{N} \lambda_k^2 (1 - \lambda_k)^n \tag{9}$$

The geometric distribution has its maximum at $n = 0$ and decreases monotonically, a behavior also true for the VMS data in Figure 3. The fact that normal texts as well as the VMS obey Zipf's first law [10] suggests the approximation $\lambda_k \propto 1/k$. As a rough estimate for small $n$, the discrete index $k$ may be replaced by a continuous variable $\kappa$, turning the sum Eq. (9) into an integral. Setting $\lambda_\kappa \approx c/\kappa$ with an upper cutoff $\kappa_m$ to ensure convergence of the $\lambda_\kappa$-norm, and under the reasonable assumption $c \ll 1$, Eq. (9) then yields

$$P_n \approx c\, \frac{1 - (1 - c)^{n+1}}{n + 1} \tag{10}$$

Figure 3. Similar tokens repetition distance distribution (maximal dissimilarity 30%) of the VMS, compared with Vulgate Bible and the pin-yin text. Inset: VMS result and fit using Eq. (12) (a = 3.5618, b = 0.1534, q = 0.9885).
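The step from the sum Eq. (9) to Eq. (10) can be spelled out as follows. This is a reconstruction under stated assumptions rather than the author's own derivation: the word probabilities are written as $\lambda_\kappa \approx c/\kappa$, the lower integration limit is taken as $\kappa = 1$, and the last line assumes $(n+1)\,c/\kappa_m \ll 1$, which is how the small-$n$, $c \ll 1$ regime is read here.

```latex
\begin{align*}
P_n &\approx \int_{1}^{\kappa_m} \left(\frac{c}{\kappa}\right)^{2}
        \left(1 - \frac{c}{\kappa}\right)^{n} d\kappa
    && \text{(sum over } k \text{ replaced by an integral)} \\
    &= c \int_{c/\kappa_m}^{c} (1 - \lambda)^{n} \, d\lambda
    && \text{(substituting } \lambda = c/\kappa,\;
       d\kappa = -c\, d\lambda / \lambda^{2}\text{)} \\
    &= \frac{c}{n+1} \left[ \left(1 - \frac{c}{\kappa_m}\right)^{n+1}
        - (1 - c)^{n+1} \right] \\
    &\approx \frac{c}{n+1} \left[ 1 - (1 - c)^{n+1} \right]
    && \text{(for } (n+1)\, c / \kappa_m \ll 1\text{)}
\end{align*}
```

For $n = 0$ this gives $P_0 \approx c^2$, the contribution of the single most frequent word, which is consistent with $c$ being of the order of that word's relative frequency.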


For large $n$, the sum Eq. (9) may be estimated by the maximal summand, as long as the $\lambda_k$ cover the range of values down to zero sufficiently densely. The maximum $\lambda_0$ of the function $f(\lambda) = \lambda^2 (1 - \lambda)^n$ is given by $\lambda_0 = 2/(n + 2)$, and Eq. (9) can be approximated by

$$P_n \approx \frac{4}{e^2} \left( \frac{1}{n + 2} \right)^2 \tag{11}$$

The $n$ dependence of Eqs. (10) and (11) suggests

$$P_n = a\, \frac{1 - q^{n+1}}{1 + n + b n^2} \tag{12}$$

with parameters $a$, $b$, and $q$ as an interpolating fit formula. As can be seen in Figure 3, it represents the VMS data excellently with a reasonable parameter $c = 1 - q \sim \lambda_1$: it equals the order of magnitude of the relative frequency of the VMS token ⟨daiin⟩. The other parameters $a$ and $b$ reflect the mixture of the two asymptotic limits. On a large enough scale all texts are somewhat random and produce the observed $1/n^2$ tail in $P_n$. The small-$n$ behavior of the VMS is the most remarkable effect: it appears to indicate the presence of some kind of random selection process during the text generation, as already noted in the previous section.

It should be emphasized again that the VMS text obviously is not a simple convolution of independent random strings; at the least, the underlying stochastic process must be fairly complex, involving history dependent variation of the step probabilities that builds up correlations. This is also instructively demonstrated by comparing $P_n$ of a text with its token scrambled version (i.e., where the token positions have been transposed randomly). As can be seen in Figure 4, token scrambling modifies the VMS result only quantitatively (which confirms an already present degree of randomness in the original text), whereas the Vulgate Bible curve is transformed in shape towards the VMS data; by contrast, $P_0$ for the VMS is decreased significantly. This effect appears compatible with the assumption that a key stochastic process with spurious contagion, e.g., of Pólya type, is involved in the VMS text generation method.


Figure 4. Similar tokens repetition distance distribution (maximal dissimilarity 30%) of the VMS, compared with Vulgate Bible and token scrambled versions of both texts. The lines just connect the markers to guide the eye.
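The similarity criterion and the gap statistic of this section can be sketched as follows. Several details are assumptions made for illustration: the Levenshtein distance is normalized by the longer token length to obtain the percentage, each token is paired with the next similar token that follows it, the search window `max_gap` is an arbitrary cutoff, and all function names are hypothetical.

```python
import random
from collections import Counter

def levenshtein(s, t):
    """Number of single-character insertions, deletions, and substitutions."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (cs != ct)))   # substitution or match
        prev = cur
    return prev[-1]

def dissimilarity(s, t):
    """Levenshtein distance mapped to a percentage in [0, 100]."""
    longest = max(len(s), len(t))
    return 100.0 * levenshtein(s, t) / longest if longest else 0.0

def gap_distribution(tokens, threshold=30.0, max_gap=200):
    """Relative frequency of n = number of tokens between a token and the next similar one."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        for j in range(i + 1, min(len(tokens), i + 2 + max_gap)):
            if dissimilarity(tok, tokens[j]) <= threshold:
                counts[j - i - 1] += 1
                break
    total = sum(counts.values())
    return {n: c / total for n, c in sorted(counts.items())} if total else {}

def scrambled(tokens, seed=0):
    """Token scrambled version: the same tokens in random order."""
    out = list(tokens)
    random.Random(seed).shuffle(out)
    return out

# Usage on a tiny made-up token list (not real VMS data).
toy = ["daiin", "dain", "chol", "daiin"]
print(gap_distribution(toy))
print(gap_distribution(scrambled(toy)))
```

Applied to a full transcription and to its scrambled version, the two resulting dictionaries correspond to the pairs of curves compared in Figure 4.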

Selected Tokens Repetition Distance Distribution


In the previous section the probability for $n$ other tokens separating two arbitrarily selected but similar ones (with respect to the Levenshtein string metric) has been investigated. Although the unusual behavior of the VMS text in contrast to normal human writings is clearly visible, the statistical details are somewhat concealed due to the nature of the problem: the geometric distribution characteristic of random sequences is expanded to power-law behavior by the summation Eq. (9). Furthermore, the concept of similarity as well as finite sample size effects add extra random noise.

In this section the problem will be modified slightly: what is the probability that two tokens sharing a particular property are separated by $n$ tokens that do not possess this property? Such a property may be the occurrence of a particular letter within a token, or a special word structure. This type of question appears especially promising since it is a well-known fact that VMS words possess a rich variety of characteristic structural details ("crust-mantle-core" decomposition [14]).

The symbol ⟨q⟩ in the VMS appears almost always in word-initial position. It has been speculated that it might be a prefix with the meaning "and", rather than part of the remaining token (much like the Latin suffix "-que"). In Figure 5, the repetition distance distribution of tokens beginning with ⟨q⟩ is plotted, compared with that of the token "und" (the German word for "and") in the Luther Bible. Again, the VMS result yields a surprisingly simple and smooth curve, qualitatively different from that associated with the normal text. A more detailed analysis of the data shows that $P_n$ can be fitted excellently by a mixture of two geometric distributions:

$$P_n = a\, p_1 (1 - p_1)^n + (1 - a)\, p_2 (1 - p_2)^n \tag{13}$$


Figure 5. Repetition distance distribution of VMS tokens beginning with EVA ⟨q⟩ (full squares), and the token "und" in the Luther Bible (open circles), respectively. Full line: fit of the geometric distribution mixture Eq. (13) with parameters a = 0.50275, p1 = 0.28531, p2 = 0.10482.

A mixture of two probability distributions indicates the presence of two independent subpopulations in the statistical data. Eq. (13) is, for example, produced by the following random process: use two "dice" with success probabilities $p_1$ and $p_2$, respectively. Throw a die until success (failure means not to add the ⟨q⟩ prefix to a token in the sequence); then continue with either die 1 or 2, depending on a random decision with probability $a$. However, this should only be seen as an example algorithm; the mechanisms behind the text generation process must be somewhat more complex, as has been demonstrated in the previous sections. In this context it is especially noteworthy that Eq. (13) is also compatible with (i.e., is a good approximation to) the situation of a stochastic process with varying step probability, being gradually decreased from $p_1$ to $p_2$ on failure events, and reset to $p_1$ on success. This provides another link to spurious contagion processes like the Pólya scheme discussed previously.

The ⟨q⟩ prefix is just a single aspect of the fairly complex VMS word grammar. However, the behavior expressed by Eq. (13) is found throughout a wide variety of token selection conditions; a few examples are listed in Table 3. Most interestingly, the crossover point between the two geometric distributions (i.e., the real value $n$ for which both terms of Eq. (13) contribute equally) is in most cases close to the average number of tokens per line. For a token scrambled version of the VMS, however, Eq. (13) is reduced to a single geometric distribution, as is expected in agreement with the previous analysis.

Table 3. Some examples for the parameter fits of Eq. (13)

Selection condition       a         p1        p2        x_C²
Token begins with ⟨q⟩     0.50275   0.28531   0.10482   4.5
Token contains ⟨cCh⟩¹     0.58879   0.12189   0.03196   17.4
Token contains ⟨che⟩      0.90501   0.16838   0.03489   25.7
Token contains ⟨she⟩      0.74403   0.12395   0.02811   24.6
Token ends with ⟨aiin⟩    0.93027   0.12892   0.02001   37.8

¹ ⟨C⟩ stands for a "gallows" character (⟨f⟩, ⟨k⟩, ⟨p⟩, ⟨t⟩); ² crossover point: $x_C = \ln\!\left[\frac{(1 - a)\, p_2}{a\, p_1}\right] \Big/ \ln\!\left[\frac{1 - p_1}{1 - p_2}\right]$.

On the other hand, for normal texts two possible results have been found so far: if the selection criterion is weak and linguistically (almost) irrelevant, then the result will be a single geometric distribution (straight random result); an example is the selection of all tokens in an English text that contain the letter "e". If, however, the condition is correlated with semantic (sub-) structures or at least nontrivial token parts, the result more or less resembles the Luther Bible curve in Figure 5. As in the previous sections, the behavior of the Chinese text does not differ significantly.
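The two-dice example process and the crossover formula from the footnote of Table 3 can be sketched as below. The function names, the seed, and the sample size are arbitrary; the simulation only illustrates that the example process reproduces the mixture Eq. (13) and makes no claim about how the VMS was actually produced.

```python
import math
import random
from collections import Counter

def mixture_pmf(n, a, p1, p2):
    """Mixture of two geometric distributions, Eq. (13)."""
    return a * p1 * (1 - p1) ** n + (1 - a) * p2 * (1 - p2) ** n

def crossover(a, p1, p2):
    """Real n at which both terms of Eq. (13) contribute equally."""
    return math.log((1 - a) * p2 / (a * p1)) / math.log((1 - p1) / (1 - p2))

def two_dice_gaps(a, p1, p2, n_gaps, seed=0):
    """Simulate gaps: pick die 1 with probability a (else die 2), throw until success."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_gaps):
        p = p1 if rng.random() < a else p2
        failures = 0
        while rng.random() >= p:    # each failure is one token without the property
            failures += 1
        gaps.append(failures)
    return gaps

# Fit parameters for tokens beginning with <q>, taken from Table 3.
a, p1, p2 = 0.50275, 0.28531, 0.10482
print(round(crossover(a, p1, p2), 1))

gaps = two_dice_gaps(a, p1, p2, n_gaps=20000)
observed = Counter(gaps)
for n in range(5):
    print(n, round(observed[n] / len(gaps), 4), round(mixture_pmf(n, a, p1, p2), 4))
```

Running `crossover` on the other rows of Table 3 gives values close to the listed $x_C$ column, within the rounding of the printed parameters.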


Conclusions
Concerning the VMS enigma, investigations that emphasize the peculiar structural properties of the VMS text in contrast to normal language are of special interest. All methods of analysis used in the present article fall into this category. Interpreting normal texts as bit sequences yields deviations of little significance from a true (uncorrelated) random walk. For the VMS, this only holds on a small scale of approximately the average line length; beyond that, positive correlations build up: the presence/absence of a symbol appears to increase/decrease the tendency towards another occurrence. The Pólya urn scheme is an example of such behavior; it does not, however, reproduce the VMS data exactly and should be seen as a first approximation only.

Nevertheless, this result has important implications for the possible solutions of the VMS riddle. Encryption tends to destroy correlations in a text rather than build them up. The method, however, could be a more complex variant of a word game, like the children's secret language "Opish" (in which the syllable "op" is added before each vowel); in this case the effective information content of the VMS would at least be rather low. The result appears incompatible with the plain text hypothesis. Even in an artificial language, correlations tend to be contextual, i.e., on the small scale of a few sentences.

Thus, the hoax hypothesis may provide the most convincing explanation base for the data. A variant of the table-and-grille method is still a promising candidate, if the table is filled with syllables selected with the involvement of some lottery algorithm producing the observed statistical effects. The source of the positive correlations might as well be (or partly be) a psychological one: the creator of the table could unconsciously have written them into it while trying to distribute the syllables equally (the human mind is extremely poor at generating random numbers).
An additional problem, however, arises upon reusing a table with different grilles: the variance Eq. (2) is very sensitive to correlations created by overlapping slot patterns, leading to significant structures in $F(l)$ for large $l$. To avoid this behavior, which is not observed for the VMS text, only about 4 to 6 (of the 27 possible) 3 × 3 grilles can be used with a particular 39 × 40 table. It is unlikely that the creator of the VMS excluded the forbidden grille layouts by mere luck, but perhaps out of aesthetic (symmetry) considerations? On the other hand, the table-and-grille scheme need not contain the (whole) truth about the VMS generation process, even if the hoax hypothesis might finally turn out to be correct.

The token repetition statistics also emphasize the strangeness of the VMS language. Again the results differ significantly from the comparative text samples, indicating that the VMS language is more closely related to a stochastic process than to human communication. Of particular interest is the mixture of two geometric distributions Eq. (13) that almost perfectly describes the gap distribution of tokens with, for example, a particular prefix. Such exact statistical properties of complex systems are either trivial (as in the case of purely random aspects) or express an underlying principle. Since Eq. (13) contains a crossover between two terms, it most probably is not trivial (pure randomness would have yielded a single geometric distribution).

Another exact property of the VMS is already well known: the word length distribution follows a binomial distribution almost exactly. This fact has been a strong argument in favor of the Chinese theory [13], since East Asian languages, in particular Chinese, also show this feature. The present investigations, however, let the Chinese theory appear much less promising; instead, the mathematically exact shape of the VMS word length distribution may be seen as additional evidence of an underlying stochastic process (a binomial distribution describes the sum of independent random summands).

It must be emphasized that the present study is not a proof of the hoax hypothesis, nor can it definitively rule out either of the two other main theory classes. It gives, however, some hints on the most promising direction for future investigations.
In the text so far I have tried to avoid stating my personal opinion about the VMS where it goes beyond the presentation of facts and the inevitable basic interpretation of statistics (I am aware of how easily statistics can be misinterpreted to fit a prejudice). From my viewpoint, the VMS is a cleverly set psychological trap, still active after five centuries, reflecting the analyst's expectations and hopes like a mirror without containing meaningful information itself. It was created using algorithmic methods, implicitly or explicitly involving some degree of randomness.

A frequent argument against the hoax hypothesis is that, even with something like the table-and-grille method, the effort for a hoaxer would have been disproportionately high: to defraud Emperor Rudolf II of Bohemia (the possible first buyer of the VMS), a much simpler concept should have been sufficient. As always with psychological arguments, there is the intrinsic danger of projecting one's own value system. Perhaps the VMS is the once-in-a-lifetime masterpiece of a habitual forger, or simply a special kind of artwork, created with no immoral motivation: around 1980 the Italian architect and industrial designer Luigi Serafini wrote and illustrated his famous Codex Seraphinianus (most probably inspired by the VMS), which looks like the visual encyclopedia of an extraterrestrial world and is written in an incomprehensible language with a strange curvilinear script. Obviously there is some artistic or even philosophical attraction in the creation of a phantasmagoric book that has no inherent meaning and, therefore, can take on any meaning.


Acknowledgment
The author wishes to thank M. A. Labi for stimulating discussions and proofreading the manuscript.


About the Author


Dr. Andreas Schinner is a theoretical physicist performing freelance research at the Johannes Kepler University in Linz, Austria. His main area of scientific interest is theoretical solid state physics, particularly particle beam interactions with matter. He also works as a self-employed software developer.

References
1. Benka, O., A. Schinner, and T. Fink. 1995. Distribution of the Number of Emitted Electrons for MeVH -, and He2 -ion Impacts on Metals, Phys. Rev. A, 51(3):2281-2284.
2. Cox, D. R. and H. D. Miller. 1965. The Theory of Stochastic Processes. London: Methuen & Co Ltd.
3. Currier, P. H. 1976. Some important new statistical findings. Proceedings of a Seminar held on 30 November 1976 in Washington DC. Edited by M. E. D'Imperio. Privately printed pamphlet, 30 November 1976. ftp://ftp.funet.fi/pub/doc/religion/occult/necronornicon/voynich/currier.paper. Last date accessed by me/web document update: 20 Feb 2007.
4. D'Imperio, M. E. 1978. The Voynich Manuscript: An Elegant Enigma. Laguna Hills, CA: Aegean Park Press.
5. Feller, W. 1957. An Introduction to Probability Theory and its Applications. Vol. 1, New York: Wiley.
6. Gilleland, M. 2002. Levenshtein Distance in Three Flavors. http://ww.merriampark.com/Id.htm last accessed 20 Feb 2007.
7. Kennedy, G. and R. Churchill. 2005. The Voynich Manuscript: The Unsolved Riddle of an Extraordinary Book Which has Defied Interpretation for Centuries. London: Orion Publishing Group Ltd.
8. Kokol, P., V. Podgorelec, M. Zorman, T. Kokol, and T. Njivar. 1999. Computer and Natural Language Texts: A Comparison Based on Long-Range Correlations, Journal of the American Society for Information Science, 50:1295-1301.
9. Landini, G. 2000. Zipf's laws in the Voynich Manuscript, http-document, currently no longer available on the Internet.
10. Landini, G. 2001. Evidence of Linguistic Structure in the Voynich Manuscript Using Spectral Analysis, Cryptologia, 25(4):275-295.
11. Rugg, G. 2004. An Elegant Hoax? A Possible Solution to the Voynich Manuscript, Cryptologia, 28(1):31-46.
12. Schenkel, A., J. Zhang, and Y. Zhang. 1993. Long Range Correlations in Human Writings, Fractals, 1(1):47-55.
13. Stolfi, J. 2002. Chinese theory Redux: Comparing the VMS and East Asian word length distributions. http://www.ic.unicamp.br/~stolfi/voynich/02-01-18-chinese-redux/ last accessed 20 Feb 2007.
14. Stolfi, J. 2003. Voynich manuscript Stuff. http://www.ic.unicamp.br/~stolfi/voynich/ last accessed 20 Feb 2007.
15. Zandbergen, R. 2003. The Voynich manuscript. http://www.voynich.nu/ last accessed 20 Feb 2007.
