Infinite monkey theorem
The infinite monkey theorem states that a monkey hitting keys at random on a typewriter keyboard for an infinite amount of time will almost surely type a particular chosen text, such as the complete works of William Shakespeare. In this context, "almost surely" is a mathematical term with a precise meaning, and the "monkey" is not an actual monkey; rather, it is a metaphor for an abstract device that produces a random sequence of letters ad infinitum. The theorem illustrates the perils of reasoning about infinity by imagining a vast but finite number, and vice versa. The probability of a monkey typing a given string of text as long as, say, Hamlet is so tiny that, were the experiment conducted, the chance of it actually occurring during a span of time of the order of the age of the universe is minuscule but not zero.
Variants of the theorem include multiple and even infinitely many typists, and the target text varies between an entire library and a single sentence. The history of these statements can be traced back to Aristotle's Metaphysics and Cicero's De natura deorum, through Blaise Pascal and Jonathan Swift, and finally to modern statements with their iconic typewriters. In the early 20th century, Émile Borel and Arthur Eddington used the theorem to illustrate the timescales implicit in the foundations of statistical mechanics. Various Christian apologists on the one hand, and Richard Dawkins on the other, have argued about the appropriateness of the monkeys as a metaphor for evolution.
Today, popular interest in the typing monkeys is sustained by numerous appearances in literature, television and radio, music, and the Internet. In 2003, a humorous experiment was performed with six Celebes Crested Macaques, but their literary contribution was five pages consisting largely of the letter S.
There is a straightforward proof of this theorem. If two events are statistically independent (i.e., neither affects the outcome of the other), then the probability of both happening equals the product of the probabilities of each one happening on its own. For example, if the chance of rain in Sydney on a particular day is 0.3 and the chance of an earthquake in San Francisco on that day is 0.008, then the chance of both happening on that same day is 0.3 × 0.008 = 0.0024.
Suppose the typewriter has 50 keys, and the word to be typed is "banana". Typing at random, the chance that the first letter typed is b is 1/50, and the chance that the second letter typed is a is also 1/50, and so on. Because the keystrokes are independent, the chance of the first six letters matching banana is
- $(1/50) \times (1/50) \times (1/50) \times (1/50) \times (1/50) \times (1/50) = (1/50)^6$.
For the same reason, the chance that the next 6 letters match banana is also $(1/50)^6$, and so on.
From the above, the chance of not typing banana in a given block of 6 letters is $1 - (1/50)^6$. Because each block is typed independently, the chance $X_n$ of not typing banana in any of the first $n$ blocks of 6 letters is
- $X_n = \left(1 - (1/50)^6\right)^n$.
As $n$ grows, $X_n$ gets smaller. For $n$ of one million, $X_n$ is roughly 99.99%, but for $n$ of 10 billion it is roughly 53%, and for $n$ of 100 billion it is roughly 0.17%. As $n$ approaches infinity, the probability $X_n$ approaches zero; that is, by making $n$ large enough, $X_n$ can be made as small as one likes.
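The percentages quoted above can be checked numerically. The following is a minimal sketch, assuming the 50-key typewriter and the six-letter word from the text; the function name `prob_no_match` is introduced here purely for illustration.

```python
import math


def prob_no_match(n_blocks: int, keys: int = 50, word_len: int = 6) -> float:
    """Probability X_n that none of the first n blocks spells the target word."""
    p = (1.0 / keys) ** word_len  # chance that one block matches
    # (1 - p)^n, computed via log1p/exp to keep precision when p is tiny
    return math.exp(n_blocks * math.log1p(-p))


if __name__ == "__main__":
    for n in (10**6, 10**10, 10**11):
        print(f"n = {n:,}: X_n = {prob_no_match(n):.4%}")
```

Run as a script, this should reproduce the approximate values given in the text for one million, 10 billion, and 100 billion blocks.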
The same argument shows why at least one of infinitely many monkeys will (almost surely) produce a text as quickly as it would be produced by a perfectly accurate human typist copying it from the original. In this case $X_n = \left(1 - (1/50)^6\right)^n$, where $X_n$ represents the probability that none of the first $n$ monkeys types banana correctly on their first try. When we consider 100 billion monkeys, the probability falls to 0.17%, and as the number of monkeys $n$ increases to infinity the value of $X_n$ (the probability of the monkeys failing to reproduce the given text) decreases to zero. This is equivalent to stating that the probability that one or more of an infinite number of monkeys will produce a given text on the first try is 100%, or that it is almost certain they will do so.
The two statements above can be stated more generally and compactly in terms of strings, which are sequences of characters chosen from some finite alphabet:
- Given an infinite string where each character is chosen uniformly at random, any given finite string almost surely occurs as a substring at some position (and indeed, infinitely many positions).
- Given an infinite sequence of infinite strings, where each character of each string is chosen uniformly at random, any given finite string almost surely occurs as a prefix of one of these strings (and indeed, as a prefix of infinitely many of these strings in the sequence).
Both follow easily from the second Borel–Cantelli lemma. For the second theorem, let $E_k$ be the event that the $k$th string begins with the given text. Because this has some fixed nonzero probability $p$ of occurring, the $E_k$ are independent, and the sum below diverges,
- $\sum_{k=1}^{\infty} P(E_k) = \sum_{k=1}^{\infty} p = \infty$,
the probability that infinitely many of the $E_k$ occur is 1. The first theorem is shown similarly; one can divide the random string into nonoverlapping blocks matching the size of the desired text, and make $E_k$ the event where the $k$th block equals the desired string.
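The first statement can be illustrated empirically on a very small scale. The sketch below makes several assumptions: a two-letter alphabet, a four-letter target, and a cap on the stream length. The helper `first_occurrence` is a hypothetical name. A finite simulation cannot prove an "almost surely" claim; it only shows how quickly a short target tends to turn up.

```python
import random


def first_occurrence(target: str, alphabet: str, max_chars: int, seed: int = 0) -> int:
    """Return the start index of the first match of `target` in a random
    stream of at most `max_chars` characters, or -1 if none is found."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    window = ""
    for i in range(max_chars):
        # keep only the most recent len(target) characters
        window = (window + rng.choice(alphabet))[-len(target):]
        if window == target:
            return i - len(target) + 1
    return -1


if __name__ == "__main__":
    print(first_occurrence("abba", "ab", max_chars=10_000))
```

With a larger alphabet or a longer target, the expected waiting time grows exponentially, which is why the stream length is capped.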
Ignoring punctuation, spacing, and capitalization, a monkey typing letters uniformly at random has a chance of one in 26 of correctly typing the first letter of Hamlet. It has a chance of one in 676 (26 × 26) of typing the first two letters. Because the probability shrinks exponentially, at 20 letters it already has only a chance of one in $26^{20}$ = 19,928,148,895,209,409,152,340,197,376, roughly equivalent to the probability of buying 4 lottery tickets consecutively and winning the jackpot each time. In the case of the entire text of Hamlet, the probabilities are so vanishingly small they can barely be conceived in human terms. If the text of Hamlet contains 130,000 letters (it is actually more, even stripped of punctuation), then there is a probability of one in $3.4 \times 10^{183{,}946}$ of getting the text right at the first trial. The average number of letters that needs to be typed until the text appears is also $3.4 \times 10^{183{,}946}$.
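The size of these numbers can be checked with logarithms, since the full integer is far too large for floating-point arithmetic. This is a small sketch assuming the 26-letter alphabet and the 130,000-letter figure used above; `hamlet_odds` is an illustrative name.

```python
import math

LETTERS = 26
HAMLET_LENGTH = 130_000  # letter count assumed in the text


def hamlet_odds(length: int = HAMLET_LENGTH, letters: int = LETTERS):
    """Return (mantissa, exponent) such that letters**length is about
    mantissa * 10**exponent, working in log10 to avoid overflow."""
    log10_outcomes = length * math.log10(letters)
    exponent = math.floor(log10_outcomes)
    mantissa = 10 ** (log10_outcomes - exponent)
    return mantissa, exponent


if __name__ == "__main__":
    mantissa, exponent = hamlet_odds()
    print(f"26^{HAMLET_LENGTH} is about {mantissa:.1f} x 10^{exponent}")
    print(f"26^20 = {26 ** 20:,}")  # exact, since Python integers are unbounded
```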
For comparison purposes, there are only about $3 \times 10^{79}$ hydrogen atoms in the observable universe and only $4.3 \times 10^{17}$ seconds have elapsed since the Big Bang. Even if the observable universe were filled with monkeys typing for all time, their total probability to produce a single instance of Hamlet would still be less than one in $10^{183{,}800}$. As Kittel and Kroemer put it, "The probability of Hamlet is therefore zero in any operational sense of an event…", and the statement that the monkeys must eventually succeed "gives a misleading conclusion about very, very large numbers." This is from their textbook on thermodynamics, the field whose statistical foundations motivated the first known expositions of typing monkeys.
One of the forms in which probabilists now know this theorem, with its "dactylographic" [i.e., typewriting] monkeys (French: singes dactylographes; the French word singe covers both monkeys and apes), appeared in Émile Borel's 1913 article "Mécanique Statistique et Irréversibilité" (Statistical mechanics and irreversibility) and in his 1914 book "Le Hasard". His "monkeys" are not actual monkeys; rather, they are a metaphor for an imaginary way to produce a large, random sequence of letters. Borel said that if a million monkeys typed ten hours a day, it was extremely unlikely that their output would exactly equal all the books of the richest libraries of the world; and yet, in comparison, it was even more unlikely that the laws of statistical mechanics would ever be violated, even briefly.
The physicist Arthur Eddington drew on Borel's image further in The Nature of the Physical World (1928), writing:
If I let my fingers wander idly over the keys of a typewriter it might happen that my screed made an intelligible sentence. If an army of monkeys were strumming on typewriters they might write all the books in the British Museum. The chance of their doing so is decidedly more favourable than the chance of the molecules returning to one half of the vessel.
These images invite the reader to consider the incredible improbability of a large but finite number of monkeys working for a large but finite amount of time producing a significant work, and compare this with the even greater improbability of certain physical events. Any physical process that is even less likely than such monkeys' success is effectively impossible, and it may safely be said that such a process will never happen.
Origins and "The Total Library"
In a 1939 essay entitled "The Total Library", Argentine writer Jorge Luis Borges traced the infinite-monkey concept back to Aristotle's Metaphysics. Explaining the views of Leucippus, who held that the world arose through the random combination of atoms, Aristotle notes that the atoms themselves are homogeneous and their possible arrangements only differ in position and ordering. The Greek philosopher compares this to the way that a tragedy and a comedy consist of the same "atoms", i.e., alphabetic characters. Three centuries later, Cicero's De natura deorum (On the Nature of the Gods) argued against the atomist worldview:
He who believes this may as well believe that if a great quantity of the one-and-twenty letters, composed either of gold or any other matter, were thrown upon the ground, they would fall into such order as legibly to form the Annals of Ennius. I doubt whether fortune could make a single verse of them.
Borges follows the history of this argument through Blaise Pascal and Jonathan Swift, then observes that in his own time, the vocabulary had changed. By 1939, the idiom was "that a half-dozen monkeys provided with typewriters would, in a few eternities, produce all the books in the British Museum." (To which Borges adds, "Strictly speaking, one immortal monkey would suffice.") Borges then imagines the contents of the Total Library which this enterprise would produce if carried to its fullest extreme:
Everything would be in its blind volumes. Everything: the detailed history of the future, Aeschylus' The Egyptians, the exact number of times that the waters of the Ganges have reflected the flight of a falcon, the secret and true nature of Rome, the encyclopedia Novalis would have constructed, my dreams and half-dreams at dawn on August 14, 1934, the proof of Pierre Fermat's theorem, the unwritten chapters of Edwin Drood, those same chapters translated into the language spoken by the Garamantes, the paradoxes Berkeley invented concerning Time but didn't publish, Urizen's books of iron, the premature epiphanies of Stephen Dedalus, which would be meaningless before a cycle of a thousand years, the Gnostic Gospel of Basilides, the song the sirens sang, the complete catalog of the Library, the proof of the inaccuracy of that catalog. Everything: but for every sensible line or accurate fact there would be millions of meaningless cacophonies, verbal farragoes, and babblings. Everything: but all the generations of mankind could pass before the dizzying shelves—shelves that obliterate the day and on which chaos lies—ever reward them with a tolerable page.
Borges's total library concept was the main theme of his widely read 1941 short story "The Library of Babel", which describes an unimaginably vast library consisting of interlocking hexagonal chambers, together containing every possible volume that could be composed from the letters of the alphabet and some punctuation characters.
Applications and Criticisms
In his 1931 book The Mysterious Universe, Eddington's rival James Jeans attributed the monkey parable to a "Huxley", presumably meaning Thomas Henry Huxley. This attribution is incorrect. Today, it is sometimes further reported that Huxley applied the example in a now-legendary debate over Charles Darwin's Origin of Species with the Anglican Bishop of Oxford, Samuel Wilberforce, held at a meeting of the British Association for the Advancement of Science at Oxford on June 30, 1860. This story suffers not only from a lack of evidence, but also from the fact that in 1860 the typewriter itself had yet to emerge. Primates were still a sensitive topic for other reasons, and the Huxley-Wilberforce debate did include byplay about apes: the bishop asked whether Huxley was descended from an ape on his grandmother's or his grandfather's side, and Huxley responded something to the effect that he would rather be descended from an ape than from someone who argued as dishonestly as the bishop.
Despite the original mix-up, monkey-and-typewriter arguments are now common in debates over evolution. For example, Doug Powell argues as a Christian apologist that even if a monkey accidentally types the letters of Hamlet, it has failed to produce Hamlet because it lacked the intention to communicate. His parallel implication is that natural laws could not produce the information content in DNA. A more common argument is represented by John MacArthur, who claims that the genetic mutations necessary to produce a tapeworm from an amoeba are as unlikely as a monkey typing Hamlet's soliloquy, and hence the odds against the evolution of all life are impossible to overcome.
Evolutionary biologist Richard Dawkins employs the typing monkey concept in his 1986 book The Blind Watchmaker to demonstrate the abilities of natural selection in producing biological complexity out of random mutations. In the simulation experiment he describes, Dawkins has his Weasel program produce the Hamlet phrase METHINKS IT IS LIKE A WEASEL by typing random phrases but constantly freezing those parts of the output which already match the goal. The point is that random string generation merely serves to furnish raw materials, while selection imparts the information.
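The contrast between unconstrained typing and typing with selection can be sketched in a few lines. The code below follows only the description given above (random letters, with already-matching positions frozen); it is not a reconstruction of Dawkins's own program, and the 27-character alphabet, the seed, and the name `locking_search` are assumptions made for illustration.

```python
import random
import string

TARGET = "METHINKS IT IS LIKE A WEASEL"
ALPHABET = string.ascii_uppercase + " "  # assumed: 26 capitals plus space


def locking_search(target: str = TARGET, seed: int = 0) -> int:
    """Count the rounds of random retyping needed to reach `target`
    when every position that already matches is left untouched."""
    rng = random.Random(seed)
    current = [rng.choice(ALPHABET) for _ in target]
    rounds = 0
    while "".join(current) != target:
        rounds += 1
        for i, wanted in enumerate(target):
            if current[i] != wanted:  # matching positions stay frozen
                current[i] = rng.choice(ALPHABET)
    return rounds


if __name__ == "__main__":
    print(f"Reached the target after {locking_search()} rounds")
```

Each position still needs about 27 attempts on average, but the positions succeed in parallel and keep their progress, so the search typically finishes within a few hundred rounds rather than the roughly $27^{28}$ attempts expected without selection.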
A different avenue for rejecting the analogy between evolution and an unconstrained monkey lies in the problem that the monkey types only one letter at a time, independently of the other letters. Hugh Petrie argues that a more sophisticated setup is required, in his case not for biological evolution but the evolution of ideas:
In order to get the proper analogy, we would have to equip the monkey with a more complex typewriter. It would have to include whole Elizabethan sentences and thoughts. It would have to include Elizabethan beliefs about human action patterns and the causes, Elizabethan morality and science, and linguistic patterns for expressing these. It would probably even have to include an account of the sorts of experiences which shaped Shakespeare's belief structure as a particular example of an Elizabethan. Then, perhaps, we might allow the monkey to play with such a typewriter and produce variants, but the impossibility of obtaining a Shakespearean play is no longer obvious. What is varied really does encapsulate a great deal of already-achieved knowledge.
James W. Valentine, while admitting that the classic monkey's task is impossible, finds that there is a worthwhile analogy between written English and the metazoan genome in this other sense: both have "combinatorial, hierarchical structures" that greatly constrain the immense number of combinations at the alphabet level.
R. G. Collingwood argued in 1938 that art cannot be produced by accident, and wrote as a sarcastic aside to his critics:
…some … have denied this proposition, pointing out that if a monkey played with a typewriter … he would produce … the complete text of Shakespeare. Any reader who has nothing to do can amuse himself by calculating how long it would take for the probability to be worth betting on. But the interest of the suggestion lies in the revelation of the mental state of a person who can identify the 'works' of Shakespeare with the series of letters printed on the pages of a book…
Nelson Goodman took the contrary position, illustrating his point along with Catherine Elgin by the example of Borges' "Pierre Menard, Author of the Quixote":
What Menard wrote is simply another inscription of the text. Any of us can do the same, as can printing presses and photocopiers. Indeed, we are told, if infinitely many monkeys … one would eventually produce a replica of the text. That replica, we maintain, would be as much an instance of the work, Don Quixote, as Cervantes' manuscript, Menard's manuscript, and each copy of the book that ever has been or will be printed.
In another writing, Goodman elaborates, "That the monkey may be supposed to have produced his copy randomly makes no difference. It is the same text, and it is open to all the same interpretations…." Gérard Genette dismisses Goodman's argument as begging the question.
For Jorge J. E. Gracia, the question of the identity of texts leads to a different question, that of author. If a monkey is capable of typing Hamlet, despite having no intention of meaning and therefore disqualifying itself as an author, then it appears that texts do not require authors. Possible solutions include saying that whoever finds the text and identifies it as Hamlet is the author; or that Shakespeare is the author, the monkey his agent, and the finder merely a user of the text. These solutions have their own difficulties, in that the text appears to have a meaning separate from the other agents: what if the monkey operates before Shakespeare is born, or if Shakespeare is never born, or if no one ever finds the monkey's typescript?
Random number generation
The theorem concerns a thought experiment which cannot be fully carried out in practice, since it is predicted to require prohibitive amounts of time and resources. Nonetheless, it has inspired efforts in finite random text generation.
According to an article in The New Yorker, a computer program run by Dan Oliver of Scottsdale, Arizona, came up with a result on August 4, 2004: after the group had worked for 42,162,500,000 billion billion years, one of the "monkeys" typed, "VALENTINE. Cease toIdor:eFLP0FRjWK78aXzVOwm)-‘;8.t . . ." The first 19 letters of this sequence can be found in "The Two Gentlemen of Verona". Other teams have reproduced 18 characters from "Timon of Athens", 17 from "Troilus and Cressida", and 16 from "Richard II".
A website entitled The Monkey Shakespeare Simulator, launched on July 1, 2003, contained a Java applet that simulates a large population of monkeys typing randomly, with the stated intention of seeing how long it takes the virtual monkeys to produce a complete Shakespearean play from beginning to end. For example, it produced this partial line from Henry IV, Part 2, reporting that it took "2,737,850 million billion billion billion monkey-years" to reach 24 matching characters:
- RUMOUR. Open your ears; 9r"5j5&?OWTY Z0d…
Due to processing power limitations, the program uses a probabilistic model (by using a random number generator or RNG) instead of actually generating random text and comparing it to Shakespeare. When the simulator "detects a match" (that is, the RNG generates a certain value or a value within a certain range), the simulator simulates the match by generating matched text.
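The simulator's actual code is not given here, so the following is only one way such a shortcut could work, offered as a sketch under stated assumptions. If each keystroke is uniform over an alphabet of size A, the chance that a random attempt matches at least the first k characters is $(1/A)^k$, so the match length can be drawn directly from a single random number instead of typing and comparing text. The function name and parameters are hypothetical.

```python
import math
import random


def sample_match_length(rng: random.Random, alphabet_size: int, max_len: int) -> int:
    """Draw how many leading characters of a random attempt match the target,
    without generating the attempt itself (inverse-transform sampling)."""
    q = 1.0 / alphabet_size  # chance that any single character matches
    u = 1.0 - rng.random()   # uniform on (0, 1], so log(u) is defined
    # P(length >= k) = q**k, which holds exactly when u <= q**k
    length = int(math.log(u) / math.log(q))
    return min(length, max_len)


if __name__ == "__main__":
    rng = random.Random(0)
    best = max(sample_match_length(rng, 27, 100) for _ in range(1_000_000))
    print(f"Longest simulated match in a million attempts: {best} characters")
```

Under this assumed model, a "match" is detected whenever the drawn value falls in a sufficiently small range, which corresponds to the behaviour described in the paragraph above.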
Questions about the statistics describing how often an ideal monkey should type certain strings can motivate practical tests for random number generators as well; these range from the simple to the "quite sophisticated". Computer science professors George Marsaglia and Arif Zaman report that they used to call such tests "overlapping m- tuple tests" in lecture, since they concern overlapping m-tuples of successive elements in a random sequence. But they found that calling them "monkey tests" helped to motivate the idea with students. They published a report on the class of tests and their results for various RNGs in 1993.
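The basic bookkeeping behind an overlapping m-tuple test can be sketched as follows. This is an illustrative outline only, not the published Marsaglia and Zaman procedure: it counts the overlapping m-letter "words" in a sequence and reports how many possible words never appear, without the statistical analysis that a real test would need. Both function names are assumptions.

```python
from collections import Counter
from itertools import product


def overlapping_tuple_counts(sequence, m):
    """Count every overlapping run of m successive elements in `sequence`."""
    return Counter(tuple(sequence[i:i + m]) for i in range(len(sequence) - m + 1))


def missing_words(sequence, alphabet, m):
    """Number of the len(alphabet)**m possible m-letter words that never occur."""
    seen = overlapping_tuple_counts(sequence, m)
    return sum(1 for word in product(alphabet, repeat=m) if word not in seen)


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    letters = "abcd"
    stream = [rng.choice(letters) for _ in range(200)]
    print(missing_words(stream, letters, m=3), "of", len(letters) ** 3, "words missing")
```

A generator whose output misses far more or far fewer words than chance would predict is suspect; quantifying "far" is the part omitted here.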
Primate behaviorists Cheney and Seyfarth remark that real monkeys would indeed have to rely on chance to have any hope of producing Romeo and Juliet. Unlike apes and particularly chimpanzees, the evidence suggests that monkeys lack a theory of mind and are unable to differentiate between their own and others' knowledge, emotions, and beliefs. Even if a monkey could learn to write a play and describe the characters' behaviour, it could not reveal the characters' minds and so build an ironic tragedy.
In 2003, lecturers and students from the University of Plymouth MediaLab Arts course used a £2,000 grant from the Arts Council to study the literary output of real monkeys. They left a computer keyboard in the enclosure of six Celebes Crested Macaques at Paignton Zoo in Devon, England, for a month, with a radio link to broadcast the results on a website. One researcher, Mike Phillips, defended the expenditure as being cheaper than reality TV and still "very stimulating and fascinating viewing".
Not only did the monkeys produce nothing but five pages consisting largely of the letter S, the lead male began by bashing the keyboard with a stone, and the monkeys continued by urinating and defecating on it. The zoo's scientific officer remarked that the experiment had "little scientific value, except to show that the 'infinite monkey' theory is flawed". Phillips said that the artist-funded project was primarily performance art, and they had learned "an awful lot" from it. He concluded that monkeys "are not random generators. They're more complex than that. … They were quite interested in the screen, and they saw that when they typed a letter, something happened. There was a level of intention there."
The infinite monkey theorem and its associated imagery are considered a popular and proverbial illustration of the mathematics of probability, widely known to the general public through popular culture rather than through the classroom.
The enduring, widespread and popular nature of the knowledge of the theorem was noted in the introduction to a 2001 paper, "Monkeys, Typewriters and Networks — the Internet in the Light of the Theory of Accidental Excellence" (Hoffmann and Hofmann). In 2002, a Washington Post article said: "Plenty of people have had fun with the famous notion that an infinite number of monkeys with an infinite number of typewriters and an infinite amount of time could eventually write the works of Shakespeare." In 2003, the previously mentioned Arts Council funded experiment involving real monkeys and a computer keyboard received widespread press coverage. In 2007, the theorem was listed by Wired magazine in a list of eight classic thought experiments.
The history of the imagery of 'typing monkeys' dates back at least as far as Borel's use of the metaphor in his essay in 1913, and this imagery has recurred many times since in a variety of media. Today, popular interest in the typing monkeys is sustained by numerous appearances in literature, television and radio, music, and the Internet, as well as graphic novels and stand-up comedy routines.