DARWIN AMONG THE MACHINES; OR, THE ORIGINS OF [ARTIFICIAL] LIFE
In examining the prospects for artificial intelligence and artificial life, Samuel Butler (1835-1902) faced the same mysteries that permeate these two subjects today. "I first asked myself whether life might not, after all, resolve itself into the complexity of arrangement of an inconceivably intricate mechanism," he recalled in 1880, retracing the development of his ideas. "If, then, men were not really alive after all, but were only machines of so complicated a make that it was less trouble to us to cut the difficulty and say that that kind of mechanism was 'being alive,' why should not machines ultimately become as complicated as we are, or at any rate complicated enough to be called living, and to be indeed as living as it was in the nature of anything at all to be? If it was only a case of their becoming more complicated, we were certainly doing our best to make them so." 
These questions can be distilled into one essential puzzle: the origin of life. "We wanted to know whence came that germ or those germs of life which, if Mr. Darwin was right, were once the world's only inhabitants," asked Butler. "They could hardly have come hither from some other world; they could not in their wet, cold, slimy state have traveled through the dry ethereal medium which we call space, and yet remained alive. If they traveled slowly, they would die, if fast, they would catch fire." The only viable answer, without recourse to some higher being "at variance with the whole spirit of evolution," was that life "had grown up, in fact, out of the material substances and forces of the world"--as life might once again be growing up out of the material substances and forces of machines.
Freeman J. Dyson, a mathematical physicist better known as one of the architects of quantum electrodynamics, took a mid-career detour into theoretical biology that resulted in a thin volume titled Origins of Life. The essence of my father's hypothesis was that life began not once, but twice. "It is often taken for granted that the origin of life is the same thing as the origin of replication," he wrote, noting "a sharp distinction between replication and reproduction. Cells can reproduce but only molecules can replicate. In modern times, reproduction of cells is always accompanied by replication of molecules, but this need not always have been so. Either life began only once, with the functions of replication and metabolism already present in rudimentary form and linked together from the beginning, or life began twice, with two separate kinds of creatures, one kind capable of metabolism without exact replication, the other kind capable of replication without metabolism.... The most striking fact which we have learned about life as it now exists is the ubiquity of dual structure, the division of every organism into hardware and software components, into protein and nucleic acid. I consider dual structure to be prima facie evidence of dual origin. If we admit that the spontaneous emergence of protein structure and of nucleic acid structure out of molecular chaos are both unlikely, it is easier to imagine two unlikely events occurring separately over a long period of time." 
Over a period of twenty years, Dyson developed a toy mathematical model that "allows populations of several thousand molecular units to make the transition from disorder to order with reasonable probability." These self-sustaining--and haphazardly reproducing--autocatalytic systems then provide energy (and information) gradients hospitable to the development of replication, perhaps first of parasites infecting the metabolism of primitive precursors of modern cells. Once metabolism is infected by replication, as the Darwins showed us, natural selection will do the rest.
Natural selection does not require replication; statistically approximate reproduction, for simple creatures, is good enough. The difference between replication (producing an exact copy) and reproduction (producing a similar copy) is the basis of a broad generalization: genes replicate but organisms reproduce. As organisms became more complicated, they discovered how to replicate instructions (genes) that could help them reproduce; looking at it the other way around, as instructions became more complicated, they discovered how to reproduce organisms to help replicate the genes.
If organisms truly replicated, or reproduced even an approximate likeness of themselves without following a distinct set of inherited instructions, we would have Lamarckian evolution, with acquired characteristics transmitted to the offspring. According to the dual-origin hypothesis, natural selection may have operated in a purely statistical fashion for millions if not hundreds of millions of years before self-replicating instructions took control. This brings us back to Butler versus Darwin, because during this extended evolutionary prelude Lamarckian, not neo-Darwinian, selection would have been at work. We should think twice before dismissing Lamarck, because Lamarckian evolution may have taken our cells the first--and most significant--step toward where we stand today. Genotype and phenotype may have started out synonymous and only later become estranged by the central dogma of molecular biology that allows communication from genotype to phenotype but not the other way. Life, however, arrives at distinctions by increments and rarely erases its steps. Remnants of Lamarckian evolution may be more prevalent, biologically, than we think--not to mention Lamarckian tendencies among machines.
My father asked three fundamental questions: "Is life one thing or two things? Is there a logical connection between metabolism and replication? Can we imagine metabolic life without replication, or replicative life without metabolism?" These same three questions surround the origin(s) of life among machines. Here, too, a dual-origin hypothesis can shift the balance of probabilities in life's favor once the distinction between reproduction and replication is understood. In looking for signs of artificial life, either on the loose or cooked up in the laboratory, however permeable this distinction may prove to be, one should expect to see signs of metabolism without replication and replication without metabolism first. If we look at the world around us, we see a prolific growth of electronic metabolism, populated by virulently replicating code just as the dual-origin hypothesis predicts. The origins of this go back (at least) to the inauguration of John von Neumann's high-speed electronic digital computer at the Institute for Advanced Study in 1951.
"During the summer of 1951," according to Julian Bigelow, "a team of scientists from Los Alamos came and put a large thermonuclear calculation on the IAS machine; it ran for 24 hours without interruption for a period of about 60 days, many of the intermediate results being checked by duplicate runs, and throughout this period only about half a dozen errors were disclosed. The engineering group split up into teams and was in full-time attendance and ran diagnostic and test routines a few times per day, but had little else to do. So it had come alive."
The age of digital computers dawned over the New Jersey countryside while a series of thermonuclear explosions, led by the MIKE test at Eniwetok Atoll on 1 November 1952, corroborated the numerical results. In 1953, a series of experiments performed at the Institute for Advanced Study demonstrated that digital computers could be used not only to develop the means of destroying life, but to spawn lifelike processes of a form so far entirely unknown.
Italian-Norwegian mathematician Nils Aall Barricelli (1912-1993) arrived at the Institute as a visiting member for the spring term of 1953. Barricelli initiated extensive tests of evolution theories in 1953, 1954, and 1956, using the IAS computer to develop a working model of Darwinian evolution and to investigate the role of symbiogenesis in the origin of life. The theory of symbiogenesis was introduced in 1909 by Russian botanist Konstantin S. Merezhkovsky (1855-1921) and expanded by Boris M. Kozo-Polyansky (1890-1957) in 1924. "So many new facts arose from cytology, biochemistry, and physiology, especially of lower organisms," wrote Merezhkovsky in 1909, "that [in] an attempt once again to raise the curtain on the mysterious origin of organisms... I have decided to undertake... a new theory on the origin of organisms, which, in view of the fact that the phenomenon of symbiosis plays a leading role in it, I propose to name the theory of symbiogenesis." Symbiogenesis offered a controversial adjunct to Darwinism, ascribing the complexity of living organisms to a succession of symbiotic associations between simpler living forms. Lichens, a symbiosis between algae and fungi, sustained life in the otherwise barren Russian north; it was only natural that Russian botanists and cytologists took the lead in symbiosis research. Taking root in Russian scientific literature, Merezhkovsky's ideas were elsewhere either ignored or declared unsound, most prominently by Edmund B. Wilson, who dismissed symbiogenesis as "an entertaining fantasy that the dualism of the cell in respect to nuclear and cytoplasmic substance resulted from the symbiotic association of two types of primordial microorganisms, that were originally distinct."
Merezhkovsky viewed both plant and animal life as the result of a combination of two plasms: mycoplasm, represented by bacteria, fungi, blue-green algae, and cellular organelles; and amoeboplasm, represented by certain "monera without nuclea" that formed the nonnucleated material at the basis of what we now term eukaryotic cells. Merezhkovsky believed that mycoids came first. When they were eaten by later-developing amoeboids they learned to become nuclei rather than lunch. It is equally plausible that amoeboids came first, with mycoids developing as parasites later incorporated symbiotically into their hosts. The theory of two plasms undoubtedly contains a germ of truth whether the details are correct or not. Merezhkovsky's two plasms of biology were mirrored in the IAS experiments by embryonic traces of the two plasms of computer technology -- hardware and software -- that were just beginning to coalesce.
The theory of symbiogenesis assumes that the most probable explanation for improbably complex structures (living or otherwise) lies in the association of less complicated parts. Sentences are easier to construct by combining words than by combining letters. Sentences then combine into paragraphs, paragraphs combine into chapters, and, eventually, chapters combine to form a book--highly improbable, but vastly more probable than the chance of arriving at a book by searching the space of possible combinations at the level of letters or words. It was apparent to Merezhkovsky and Kozo-Polyansky that life represents the culmination of a succession of coalitions between simpler organisms, ultimately descended from not-quite-living component parts. Eukaryotic cells are riddled with evidence of symbiotic origins, a view that has been restored to respectability by Lynn Margulis in recent years. But microbiologists arrived too late to witness the symbiotic formation of living cells.
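The combinatorial point behind this argument can be made concrete with a rough calculation. The figures below (a 60-letter sentence, a hypothetical 10,000-word vocabulary, ten words per sentence) are illustrative assumptions, not numbers from the text:

```python
import math

# Illustrative comparison of search-space sizes: assembling a short
# sentence by drawing 60 letters from a 26-letter alphabet, versus
# drawing 10 words from an assumed 10,000-word vocabulary.
letter_space = 60 * math.log10(26)    # log10 of 26**60 combinations
word_space = 10 * math.log10(10_000)  # log10 of 10000**10 combinations

# Searching at the level of words shrinks the space by dozens of
# orders of magnitude -- the advantage that building from pre-formed
# parts (symbiogenesis) confers over building from scratch.
advantage = letter_space - word_space
```

Under these assumptions the word-level search space is some forty-odd orders of magnitude smaller, which is the sense in which a book assembled from words is "highly improbable, but vastly more probable" than one assembled from letters.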
Barricelli enlarged on the theory of cellular symbiogenesis, formulating a more general theory of "symbioorganisms," defined as any "self-reproducing structure constructed by symbiotic association of several self-reproducing entities of any kind." Extending the concept beyond familiar (terrestrial) and unfamiliar (extraterrestrial) chemistries in which populations of self-reproducing molecules might develop by autocatalytic means, Barricelli applied the same logic to self-reproducing patterns of any nature in space or time--such as might be represented by a subset of the 40,960 bits of information, shifting from microsecond to microsecond within the memory of the new machine at the IAS. "The distinction between an evolution experiment performed by numbers in a computer or by nucleotides in a chemical laboratory is a rather subtle one," he observed. 
Barricelli saw the IAS computer as a means of introducing self-reproducing structures into an empty universe and observing the results. "The Darwinian idea that evolution takes place by random hereditary changes and selection has from the beginning been handicapped by the fact that no proper test had been found to decide whether such evolution was possible and how it would develop under controlled conditions," he reported in a review of the experiments performed at the IAS. "A test using living organisms in rapid evolution (viruses or bacteria) would have the serious drawback that the causes of adaptation or evolution would be difficult to state unequivocally, and Lamarckian or other kinds of interpretation would be difficult to exclude. However if, instead of using living organisms, one could experiment with entities which, without any doubt could evolve exclusively by 'mutations' and selection, then and only then would a successful evolution experiment give conclusive evidence; the better if the environmental factors also are under control as for example if they are limited to some sort of competition between the entities used." 
After forty-three years, Barricelli's experiments appear as archaic as Galileo's first attempt at a telescope--less powerful than half a pair of cheap binoculars--although Galileo's salary was doubled by the Venetian Senate in 1609 as a reward. The two Italians compensated for their primitive instruments with vision that was clear. Barricelli tailored his universe to fit within the limited storage capacity of the IAS computer's forty Williams tubes: a total of one two-hundredth of a megabyte, in the units we use today. Operating systems and programming languages did not yet exist. "People had to essentially program their problems in 'absolute'," James Pomerene explained, recalling early programming at the IAS, when every single instruction had to be hand-coded to refer to an absolute memory address. "In other words, you had to come to terms with the machine and the machine had to come to terms with you."
Working directly in binary machine instruction code, Barricelli constructed a cyclical universe of 512 cells, each cell occupied by a number (or the absence of a number) encoded by 8 bits. Simple rules that Barricelli referred to as "norms" governed the propagation of numbers (or "genes"), a new generation appearing as if by metamorphosis after the execution of a certain number of cycles by the central arithmetic unit of the machine. These reproduction laws were configured "to make possible the reproduction of a gene only when other different genes are present, thus necessitating symbiosis between different genes." The laws were concise, ordaining only that each number shift to a new location (in the next generation) determined by the location and value of certain genes in the current generation. Genes depended on each other for survival, and cooperation (or parasitism) was rewarded with success. A secondary level of norms (the "mutation rules") governed what to do when two or more different genes collided in one location, the character of these rules proving to have a marked effect on the evolution of the gene universe as a whole. Barricelli played God, on a very small scale.
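A minimal sketch of such a universe fits in a few lines of modern code. The shift norm below is one plausible reading of the rules described above, not a reconstruction of Barricelli's actual machine code; the gene range, inoculation density, and collision rule are assumptions:

```python
import random

SIZE = 512  # cells in the cyclical universe

def generation(universe, rng):
    """One generation under a toy shift norm: a gene n at cell i
    copies itself to cell (i + n) mod SIZE in the next generation.
    A collision between two different genes invokes a simple
    mutation rule (here, random replacement). This is a plausible
    reading of Barricelli's norms, not his exact rules."""
    nxt = [None] * SIZE
    for i, gene in enumerate(universe):
        if gene is None:
            continue
        target = (i + gene) % SIZE
        if nxt[target] is None or nxt[target] == gene:
            nxt[target] = gene
        else:
            nxt[target] = rng.randint(-8, 8) or 1  # mutation on collision
    return nxt

# Inoculate the empty universe with random genes and run it forward.
rng = random.Random(0)
universe = [rng.randint(-8, 8) or 1 if rng.random() < 0.2 else None
            for _ in range(SIZE)]
for _ in range(100):
    universe = generation(universe, rng)
survivors = sum(g is not None for g in universe)
```

Because each gene's survival depends on where other genes send their copies, cooperation and parasitism emerge from the shift rule itself, with no fitness function imposed from outside.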
The empty universe was inoculated with random numbers generated by drawing playing cards from a shuffled deck. Robust and self-reproducing numerical coalitions (patterns loosely interpreted as "organisms") managed to evolve. "We have created a class of numbers which are able to reproduce and to undergo hereditary changes," Barricelli announced. "The conditions for an evolution process according to the principle of Darwin's theory would appear to be present. The numbers which have the greatest survival in the environment... will survive. The other numbers will be eliminated little by little. A process of adaptation to the environmental conditions, that is, a process of Darwinian evolution, will take place." Over thousands of generations, Barricelli observed a succession of "biophenomena," such as successful crossing between parent symbioorganisms and cooperative self-repair of damage when digits were removed at random from an individual organism's genes.
The experiments were plagued by problems associated with more familiar forms of life: parasites, natural disasters, and stagnation when there were no environmental challenges or surviving competitors against which organisms could exercise their ability to evolve. To control the parasites that infested the initial series of experiments in 1953, Barricelli instituted modified shift norms that prevented parasitic organisms (especially single-gened parasites) from reproducing more than once per generation, thereby closing a loophole through which they had managed to overwhelm more complex organisms and bring evolution to a halt. "Deprived of the advantage of a more rapid reproduction, the most primitive parasites can hardly compete with the more evolved and better organized species... and what in other conditions could be a dangerous one-gene parasite may in this region develop into a harmless or useful symbiotic gene." 
Barricelli discovered that evolutionary progress was achieved not so much through chance mutation as through sex. Gene transfers and crossing between numerical organisms were strongly associated with both adaptive and competitive success. "The majority of the new varieties which have shown the ability to expand are a result of crossing-phenomena and not of mutations, although mutations (especially injurious mutations) have been much more frequent than hereditary changes by crossing in the experiments performed." Echoing the question that Samuel Butler had asked seventy years earlier in Luck, or Cunning? Barricelli concluded that "mutation and selection alone, however, proved insufficient to explain evolutionary phenomena." He credited symbiogenesis with accelerating the evolutionary process and saw "sexual reproduction [as] the result of an adaptive improvement of the original ability of the genes to change host organisms and recombine."
Symbiogenesis leads to parallel processing of genetic code, both within an individual multicellular organism and across the species as a whole. Given that nature allows a plenitude of processors but a limited amount of time, parallel processing allows a more efficient search for those sequences that move the individual, and the species, ahead.
Efficient search is what intelligence is all about. "Even though biologic evolution is based on random mutations, crossing and selection, it is not a blind trial-and-error process," explained Barricelli in a later retrospective of his numerical evolution work. "The hereditary material of all individuals composing a species is organized by a rigorous pattern of hereditary rules into a collective intelligence mechanism whose function is to assure maximum speed and efficiency in the solution of all sorts of new problems... and the ability to solve problems is the primary element of intelligence which is used in all intelligence tests.... Judging by the achievements in the biological world, that is quite intelligent indeed."
A century after On the Origin of Species pitted Charles Darwin and Thomas Huxley against Bishop Wilberforce, there was still no room for compromise between the trial and error of Darwin's natural selection and the supernatural intelligence of a theological argument from design. Samuel Butler's discredited claims of species-level intelligence--neither the chance success of a blind watchmaker nor the predetermined plan of an all-knowing God--were reintroduced by Barricelli, who claimed to detect faint traces of this intelligence in the behavior of pure, self-reproducing numbers, just as viruses were first detected by biologists examining fluids from which they had filtered out all previously identified living forms.
"The notion that no intelligence is involved in biological evolution may prove to be as far from reality as any interpretation could be," Barricelli argued later, in 1963. "When we submit a human or any other animal for that matter to an intelligence test, it would be rather unusual to claim that the subject is unintelligent on the grounds that no intelligence is required to do the job any single neuron or synapse in its brain is doing. We are all agreed upon the fact that no intelligence is required in order to die when an individual is unable to survive or in order not to reproduce when an individual is unfit to reproduce. But to hold this as an argument against the existence of an intelligence behind the achievements in biological evolution may prove to be one of the most spectacular examples of the kind of misunderstandings which may arise before two alien forms of intelligence become aware of one another." Likewise, to conclude from the failure of individual machines to act intelligently that machines are not intelligent may represent a spectacular misunderstanding of the nature of intelligence among machines.
The evolution of digital symbioorganisms took less time to happen than to describe. "Even in the very limited memory of a high speed computer a large number of symbioorganisms can arise by chance in a few seconds," Barricelli reported. "It is only a matter of minutes before all the biophenomena described can be observed." The digital universe had to be delicately adjusted so that evolutionary processes were not immobilized by dead ends. Scattered among the foothills of the evolutionary fitness landscape were local maxima from which "it is impossible to change only one gene without getting weaker organisms." In a closed universe inhabited by simple organisms, the only escape to higher ground was by exchanging genes with different organisms or by local shifting of the rules. "Only replacements of at least two genes can lead from a relative maximum of fitness to another organism with greater vitality," noted Barricelli, who found that the best solution to these problems (besides the invention of sex) was to build a degree of diversity into the universe itself.
"The Princeton experiments were continued for more than 5,000 generations using universes of 512 numbers," Barricelli reported. "Moreover, the actual size of the universe was usually increased far beyond 512 numbers by running several parallel experiments with regular interchanging of several (50 to 100) consecutive numbers between two universes.... Within a few hundred generations a single primitive variety of symbioorganism invaded the whole universe. After that stage was reached no collisions leading to new mutations occurred and no evolution was possible. The universe had reached a stage of 'organized homogeneity' which would remain unchanged for any number of following generations.... In many instances a new mutation rule would lead to a complete disorganization of the whole universe, apparently due to the death by starvation of a parasite, which in this case was the last surviving organism.... Homogeneity problems were eventually overcome by using different mutation rules in different sections of each universe. Also slight modifications of the reproduction rule were used in different universes to create different types of environment... by running several parallel experiments and by exchanging segments between two universes every 200 or 500 generations it was possible to break homogeneity whenever it developed in one of the universes."
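Barricelli's remedy for homogeneity--interchanging runs of consecutive numbers between parallel universes--amounts to a migration step, sketched here on toy universes (the cell contents and segment length are assumptions for illustration):

```python
def exchange_segment(u1, u2, start, length):
    """Swap `length` consecutive cells between two cyclical
    universes: a toy version of Barricelli's interchange of 50-100
    consecutive numbers every 200 or 500 generations."""
    n = len(u1)
    for k in range(length):
        i = (start + k) % n  # segments wrap around the cycle
        u1[i], u2[i] = u2[i], u1[i]

a = [1] * 8
b = [2] * 8
exchange_segment(a, b, start=6, length=4)  # wraps past the end
```

Each universe thereby receives genetic material evolved under slightly different rules, breaking "organized homogeneity" whenever one variety has taken over.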
As Alan Turing had blurred the distinction between intelligence and non-intelligence by means of his universal machine, so Barricelli's numerical symbioorganisms blurred the distinction between living and nonliving things. Barricelli cautioned his audience against "the temptation to attribute to the numerical symbioorganisms a little too many of the properties of living beings," and warned that "the author takes no responsibility for inferences and interpretations which are not rigorous consequences of the facts presented." He stressed that although numerical symbioorganisms and known terrestrial life forms exhibited parallels in evolutionary behavior, this did not imply that numerical symbioorganisms were alive. "Are they the beginning of, or some sort of, foreign life forms? Are they only models?" he asked. "They are not models, not any more than living organisms are models. They are a particular class of self-reproducing structures already defined." As to whether they are living, "it does not make sense to ask whether symbioorganisms are living as long as no clear-cut definition of 'living' has been given." A clear-cut definition of "living" remains elusive to this day.
Barricelli's numerical organisms were like tropical fish in an aquarium, confined to an ornamental fragment of a foreign ecosystem, sealed behind the glass face of a Williams tube. The perforated cards that provided the only lasting evidence of their existence were lifeless imprints, skeletons preserved for study and display. The numerical organisms consisted of genotype alone and were far, far simpler than even the most primitive viruses found in living cells (or computer systems) today. Barricelli knew that "something more is needed to understand the formation of organs and properties with a complexity comparable to those of living organisms. No matter how many mutations occur, the numbers... will never become anything more complex than plain numbers." Symbiogenesis--the forging of coalitions leading to higher levels of complexity--was the key to evolutionary success, but success in a closed, artificial universe has only fleeting meaning in our own. Translation into a more tangible phenotype (the interpretation or execution, whether by physical chemistry or other means, of the organism's genetic code) was required to establish a presence in our universe, if Barricelli's numerical symbioorganisms were to become more than laboratory curiosities, here one microsecond and gone the next.
Barricelli wondered "whether it would be possible to select symbioorganisms able to perform a specific task assigned to them. The task may be any operation permitting a measure of the performance reached by the symbioorganisms involved; for example, the task may consist in deciding the moves in a game being played against a human or against another symbioorganism." In a later series of experiments (performed on an IBM 704 computer at the AEC computing laboratory at New York University in 1959 and at Brookhaven National Laboratory in 1960) Barricelli evolved a class of numerical organisms that learned to play a simple but non-trivial game called "Tac-Tix," played on a 6-by-6 board and invented by Piet Hein. The experiment was configured so as to relate game performance to reproductive success. "With present speed, it may take 10,000 generations (about 80 machine hours on the IBM 704...) to reach an average game quality higher than 1," Barricelli estimated, this being the quality expected of a rank human beginner playing for the first few times. In 1963, using the large Atlas computer at Manchester University, this objective was achieved for a short time, but without further improvement, a limitation that Barricelli attributed to "the severe restrictions... concerning the number of instructions and machine time the symbioorganisms were allowed to use."
In contrast to the IAS experiments, in which the numerical symbioorganisms consisted solely of genetic code, the Tac-Tix experiments led to "the formation of non-genetic numerical patterns characteristic for each symbioorganism. Such numerical patterns may present unlimited possibilities for developing structures and organs of any kind to perform the tasks for which they are designed." A numerical phenotype had taken form. This phenotype was interpreted as moves in a board game, via a limited alphabet of machine instructions to which the gene sequence was mapped, just as sequences of nucleotides code for an alphabet of amino acids in translating proteins from DNA. "Perhaps the closest analogy to the protein molecule in our numeric symbioorganisms would be a subroutine which is part of the symbioorganism's game strategy program, and whose instructions, stored in the machine memory, are specified by the numbers of which the symbioorganism is composed," Barricelli explained. In coding for valid instructions at the level of phenotype rather than genotype, evolutionary search is much more likely to lead to meaningful sequences, for the same reason that a meaningful sentence is far more likely to be evolved by choosing words out of a dictionary than by choosing letters out of a hat.
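The codon-like mapping from genotype to phenotype can be sketched directly. The opcode alphabet below is entirely hypothetical, standing in for the limited alphabet of machine instructions to which Barricelli's gene sequences were mapped:

```python
# Hypothetical "instruction alphabet" for phenotype translation.
OPCODES = ["PASS", "TAKE_ROW", "TAKE_COL", "TAKE_ONE"]

def translate(genes):
    """Map each gene (an integer) onto a valid opcode, so that every
    possible gene sequence yields a meaningful instruction sequence --
    the analogue of nucleotide triplets coding for amino acids."""
    return [OPCODES[g % len(OPCODES)] for g in genes]

strategy = translate([7, 2, 9, 4])
```

Because the modulus maps every gene value onto some valid instruction, mutation and crossing can never produce an unexecutable "sentence"; evolutionary search proceeds over words from the dictionary rather than letters from the hat.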
A purely numerical sequence could, in principle (and in time) evolve to be translated, through any number of intermediary languages, into anything else. "Given a chance to act on a set of pawns or toy bricks of some sort the symbioorganisms will 'learn' how to operate them in a way which increases their chance for survival," Barricelli explained. "This tendency to act on any thing which can have importance for survival is the key to the understanding of the formation of complex instruments and organs and the ultimate development of a whole body of somatic or non-genetic structures." Once the concept of translation from genotype to phenotype is given form, Darwinian evolution picks up speed--not just the evolution of organisms, but the evolution of the genetic language and translation system that provide the flexibility and redundancy to survive in a noisy, unpredictable world. A successful interpretive language not only tolerates ambiguity, it takes advantage of it. "It is almost too easy to imagine possible uses for phenotype structures--because the specifications for an effective phenotype is so sloppy," wrote A. G. Cairns-Smith, in his Seven Clues to the Origin of Life. "A phenotype has to make life easier or less dangerous for the genes that (in part) brought it into existence. There are no rules laid down as to how this should be done."
Barricelli's pronouncements had a vaguely foreboding, Butler-ish air about them, despite the disclaimer about confusing "life-like" with "alive." Samuel Butler had warned that Darwin's irresistible logic applied not only to the kingdom of nature but to the kingdom of machines; Nils Barricelli now demonstrated that it was the kingdom of numbers that held the key to that "Great First Cause" of Erasmus Darwin's "one living filament," whether encoded as strings of nucleotides or as strings of electronic bits. Barricelli saw that electronic digital computers heralded an unprecedented change of evolutionary pace, just as Butler had seen evolution quickened by the age of steam.
Barricelli's use of biological terminology to describe self-reproducing code fragments is reminiscent of early pronouncements about artificial intelligence, when machines that processed information with less intelligence than a pocket calculator were referred to as machines that think. A relic from the age of vacuum tubes and giant brains, Barricelli's IAS experiments strike the modern reader as naïve--until you stop and reflect that numerical symbioorganisms have, in less than fifty years, proliferated explosively, deeply infiltrating the workings of life on earth. With our cooperation as symbiotic hosts, self-reproducing numbers are managing (Barricelli would say learning) to exercise increasingly detailed and far-reaching control over the conditions in our universe that are helping to make life more comfortable in theirs. Are the predictions of Samuel Butler and Nils Barricelli turning out to be correct?
"Since computer time and memory still is a limiting factor, the non-genetic patterns of each numeric symbioorganism are constructed only when they are needed and are removed from the memory as soon as they have performed their task," explained Barricelli, describing the Tac-Tix-playing organisms of 1959. He might as well have been describing that class of numerical symbioorganisms --computer software--that we execute and terminate from moment to moment today. "This situation is in some respects comparable to the one which would arise among living beings if the genetic material got into the habit of creating a body or a somatic structure only when a situation arises which requires the performance of a specific task (for instance a fight with another organism), and assuming that the body would be disintegrated as soon as its objective had been fulfilled."
The precursors of symbiogenesis in the von Neumann universe were order codes, conceived (in the Burks-Goldstine-von Neumann reports) before the digital matrix that was to support their existence had even taken physical form. Order codes constituted a fundamental alphabet that diversified in association with the proliferation of different hosts. In time, successful and error-free sequences of order codes formed into subroutines--the elementary units common to all programs, just as a common repertoire of nucleotides is composed into strings of DNA. Subroutines became organized into an expanding hierarchy of languages, which then influenced the computational atmosphere as pervasively as the oxygen released by early microbes influenced the subsequent course of life.
By the 1960s complex numerical symbioorganisms known as operating systems had evolved, bringing with them entire ecologies of symbionts, parasites, and coevolving hosts. The most successful operating systems, such as OS/360, MS-DOS, and UNIX, succeeded in transforming and expanding the digital universe to better propagate themselves. It took five thousand programmer-years of effort to write and debug the OS/360 code; the parasites and symbionts sprouted up overnight. There was strength in numbers. "The success of some programming systems depended on the number of machines they would run on," commented John Backus, principal author of Fortran, a language that has had a long and fruitful symbiosis with many hosts. The success of the machines depended in turn on their ability to support the successful languages; those that clung to dead languages or moribund operating systems became extinct.
The computational ecology grew by leaps and bounds. In 1954, IBM's model 650 computer shipped with 6,000 lines of code; the first release of OS/360 in 1966 totaled just under 400,000 instructions, expanding to 2 million instructions by the early 1970s. The total of all system software provided by the major computer manufacturers reached 1 million lines of code by 1959, and 100 million by 1972. The amount of random access memory in use worldwide, costing an average $4.00 per byte, reached a total of 1,000 megabytes in 1966, and a total of 10,000 megabytes, at an average cost of $1.20 per byte, in 1971. Annual sales of punched cards by U. S. manufacturers exceeded 200 billion cards (or 500,000 tons) in 1967, after which the number of cards began to decline in favor of magnetic tapes and disks.
In the 1970s, with the introduction of the microprocessor, a second stage of this revolution was launched. The replication of processors thousands and millions at a time led to the growth of new forms of numerical symbioorganisms, just as the advent of metazoans sparked a series of developments culminating in an explosion of new life-forms six hundred million years ago. New species of numerical symbioorganisms began to appear, reproduce, and become extinct at a rate governed by the exchange of floppy disks rather than the frequency of new generations of mainframes at IBM. Code was written, copied, combined, borrowed, and stolen among software producers as freely as in a primordial soup of living but only vaguely differentiated cells. Anyone who put together some code that could be executed as a useful process--like Dan Bricklin's VisiCalc in 1979--was in for a wild ride. Businesses sprouted like mushrooms, supported by the digital mycelium underneath. Corporations came and went, but successful code lived on.
Twenty years later, fueled by an epidemic of packet-switching protocols, a particularly virulent strain of symbiotic code, the neo-Cambrian explosion entered a third and even more volatile phase. Now able to propagate at the speed of light instead of at the speed of circulating floppy disks, numerical symbioorganisms began competing not only for memory and CPU cycles within their local hosts but within a multitude of hosts at a single time. Successful code is now executed in millions of places at once, just as a successful genotype is expressed within each of an organism's many cells. The possibilities of complex, multicellular digital organisms are only beginning to be explored.
The introduction of distributed object-oriented programming languages (metalanguages, such as Java, that allow symbiogenesis to transcend the proprietary divisions between lower-level languages in use by different hosts) is enabling numerical symbioorganisms to roam, reproduce, and execute freely across the computational universe as a whole. Through the same hierarchical evolution by which order codes were organized into subroutines and subroutines into programs, objects, being midlevel conglomerations of code, will form higher-level structures distributed across the net. Object-oriented programming languages were first introduced some years ago with a big splash that turned out to be a flop. But what failed to thrive on the desktop may behave entirely differently on the Internet. Nils Barricelli, in 1985, drew a parallel between higher-level object-oriented languages and the metalanguages used in cellular communication, but he put the analogy the other way: "If humans, instead of transmitting to each other reprints and complicated explanations, developed the habit of transmitting computer programs allowing a computer-directed factory to construct the machine needed for a particular purpose, that would be the closest analogue to the communication methods among cells of various species."
But aren't these analogies deeply flawed? Software is designed, engineered, and reproduced by human beings; programs are not independently self-reproducing organisms, selected by an impartial environment from the random variation that drives the evolutionary processes we characterize as alive. The analogy, however, is valid, because the analog of software in the living world is not a self-reproducing organism, but a self-replicating molecule of DNA. Self-replication and self-reproduction have often been confused. Biological organisms, even single-celled organisms, do not replicate themselves; they host the replication of genetic sequences that assist in reproducing an approximate likeness of themselves. For all but the lowest organisms, there is a lengthy, recursive sequence of nested programs to unfold. An elaborate self-extracting process restores entire directories of compressed genetic programs and reconstructs increasingly complicated levels of hardware on which the operating system runs. That most software is parasitic (or symbiotic) in its dependence on a host metabolism, rather than freely self-replicating, strengthens rather than weakens the analogies with life.
In 1953, Nils Barricelli observed a digital universe in the process of being born. There was only a fraction of a megabyte of random access memory on planet Earth, and only part of it was working at any given time. "The limited capacity of even the largest calculating machines makes it impossible to operate with more than a few thousand genes at a time instead of the thousands of billions of genes and organisms with which nature operates," he wrote in 1957. "This makes it impossible to develop anything more than extremely primitive symbioorganisms even if the most suitable laws of reproduction are chosen." Not so today. Barricelli's universe has expanded explosively, providing numerical organisms with inexhaustible frontiers on which to grow.
"Given enough time in a sufficiently varied universe," predicted Nils Barricelli, "the numeric symbioorganisms might be able to improve considerably their technique in the use of evolutionary processes." In later years, Barricelli continued to apply the perspective gained through his IAS experiments to the puzzle of explaining the origins and early evolution of life. "The first language and the first technology on Earth was not created by humans. It was created by primordial RNA molecules almost 4 billion years ago," he wrote in 1986. "Is there any possibility that an evolution process with the potentiality of leading to comparable results could be started in the memory of a computing machine and carried on to a stage giving fundamental information on the nature of life?" He endeavored "to obtain as much information as possible about the way in which the genetic language of the living organisms populating our planet (terrestrial life forms) originated and evolved."
Barricelli viewed the genetic code "as a language used by primordial 'collector societies' of [transfer] RNA molecules... specialized in the collection of amino acids and possibly other molecular objects, as a means to organize the delivery of collected material." He drew analogies between this language and the languages used by other collector societies, such as social insects, but warned that "trying to use the ant and bee languages as an explanation of the origin of the genetic code would be a gross misunderstanding." Languages are, however, the key to evolving increasingly complex, self-reproducing structures through the cooperation of simpler component parts.
According to Simen Gaure, Nils Barricelli "balanced on a thin line between being truly original and being a crank." Most cranks turn out to be cranks; a few cranks turn out to be right. "The scientific community needs a couple of Barricellis each century," added Gaure. As Barricelli's century draws to a close, the distinctions between A-life (represented by strings of electronic bits) and B-life (represented by strings of nucleotides) are being traversed by the first traces of a language that comprehends them both. Does this represent the gene's learning to manipulate the power of the bit, or does it represent the bit's learning to manipulate the power of the gene? As with the algae and fungi that became lichen, the answer will be both. And it is the business of symbiogenesis to bring such coalitions to life.
6. Julian Bigelow, "Computer Development at the Institute for Advanced Study," in Nicholas Metropolis, J. Howlett, and Gian-Carlo Rota, eds., A History of Computing in the Twentieth Century (New York: Academic Press, 1980), 308.
7. Konstantin S. Merezhkovsky, Theory of Two Plasms as the Basis of Symbiogenesis: A New Study on the Origin of Organisms (in Russian; Kazan, 1909); Boris M. Kozo-Polyansky, A New Principle of Biology: Essay on the Theory of Symbiogenesis (in Russian; Moscow, 1924). The theory is most accessible in English in Liya N. Khakhina's Concepts of Symbiogenesis: A Historical and Critical Study of the Research of Russian Botanists (in Russian; Moscow, 1979), translated by Stephanie Merkel and edited by Lynn Margulis and Mark McMenamin (New Haven: Yale University Press, 1992).
42. Nils A. Barricelli, "On the Origin and Evolution of the Genetic Code, II: Origin of the Genetic Code as a Primordial Collector Language; The Pairing-Release Hypothesis," BioSystems 11 (1979): 19, 21.