The Maximum Genetic Diversity Theory of Evolution
This is the official website for Professor Shi Huang and the Maximum Genetic Diversity (MGD) Theory. This site explores the groundbreaking discovery of the Genetic Equidistance Phenomenon and its implications for human origins, including the revolutionary Out of East Asia Theory as an alternative to the traditional Out of Africa model. Our mission is to explain, document, and share the evidence for MGD — a framework for understanding biodiversity, human origins, and molecular evolution.
The Maximum Genetic Diversity (MGD) theory was inspired by an independent rediscovery of the genetic equidistance phenomenon (GEP), first described in 1963. The GEP initially led to the molecular clock hypothesis, which in turn inspired the neutral theory. The molecular clock has since been proven incorrect and discarded. The MGD theory has emerged as a more accurate interpretation of the GEP, overturning numerous phylogenetic conclusions based on the now-debunked neutral theory. It challenges traditional views on human origins by emphasizing the saturation of genetic diversity. The theory supports the Out of East Asia model of modern human origins.
Unlike the neutral theory, MGD argues that physiological selection and natural selection maintain genetic diversity at an optimal maximum, shaping evolutionary outcomes differently.
In geometry, a gnomon is a figure that, when combined with an existing shape, produces a new figure similar to the original. This concept was introduced in Euclid’s Elements (Book II, Definition 2) and has since been generalized to describe any rule of transformation that generates patterns that are both distinct and self-similar. The golden gnomon is the gnomon of the golden triangle, used here as a metaphor for the iterative, self-similar process of creation in evolution. Our goal in evolutionary biology is to uncover the “gnomon” that drives the endless generation of life’s diversity and complexity (or order).
Explore the foundational concepts of the Maximum Genetic Diversity (MGD) theory:
The more complex the phenotype, the greater the restriction on the choice of molecular building blocks. Complex or ordered systems need higher-precision building parts. In biology, this means there is an inverse relationship between genetic diversity and epigenetic complexity, where complexity is defined as the number of cell types.
For any system that can tolerate a limited level of random errors or noise in its molecular building parts, such errors may be beneficial, deleterious, or neutral depending on circumstances. Limited errors at an optimum level are more likely to be beneficial than deleterious because they are, after all, within tolerable levels and confer economy in construction as well as the strongest possible adaptive capacity, or robustness, to environmental challenges. In biology, substituting “genetic diversity” for “errors in building blocks” gives the equivalent concept. Axiom 2 in fact highlights the valid parts of Kimura’s and Darwin’s theories.
Matter, or randomness, and consciousness, or cognition, are opposites. High randomness inside the body of an individual must result in poor mental function, and the measure of randomness is genetic diversity, since diversity originates from random mutational events. Thus, complex species with higher cognitive capacity must have lower randomness, that is, lower genetic diversity.
Deleterious mutations, which impair a trait, can be mitigated by compensatory mutations. Populations with high genetic diversity are better equipped to tolerate or counteract harmful mutations, making it challenging for natural selection to maintain trait quality and eliminate detrimental variants. Therefore, reducing genetic diversity may be necessary to preserve high-quality traits and effectively remove harmful mutations.
Maximum genetic diversity tends to be higher for simpler taxa and lower for more complex taxa.
Macroevolution involves changes in organismal complexity and often results in an increase in complexity, which is mirrored by an increase in the precision of the building parts or a decrease in the allowed range of the standard deviations for the parts. Microevolution, on the other hand, is an increase in genetic diversity within the allowed standard deviation ranges without significant change in complexity. It encompasses both evolution within species and evolution from one species to another. At the saturation phase, microevolution involves turnover of alleles at the equilibrium level of genetic diversity.
The positions that are conserved in simpler taxa tend to also be conserved in more complex taxa. The positions that are free to change in more complex taxa tend to also be free to change in simpler taxa.
Genetic distance among taxa and genetic diversity within a taxon are mostly at the optimum level today after a very long evolutionary time, especially for fast-evolving sequences. Any level higher or lower than the optimum would be negatively selected. Because genetic diversity, so long as it is within the maximum level, facilitates adaptation, it would be positively selected and would quickly reach the optimum level. The optimum here means a Pareto optimum, or simply the best that can be achieved given the balance between positive and negative selection at a particular time point under a specific level of epigenetic or organismal complexity. As time, environments, and complexity change, the optimum level of nucleotide diversity will also change.
Genetic variants are mostly functional or under balancing selection and quasi-neutral (under both positive and negative selection) rather than neutral.
Genetic distance or molecular distance between two taxa of different complexity is not contributed equally by mutations in the two lineages but rather is mostly contributed by mutations in the simpler lineage.
Non-conservation is not non-function. Fast-changing non-conserved sequences play more important roles in adaptation to the environment than the slowly changing conserved housekeeping genes.
Lower MGD means higher homozygosity, which, however, is very different from the higher homozygosity caused by inbreeding. Lower MGD results in higher-fitness traits because more common alleles, or good alleles, become homozygous. In contrast, inbreeding leads to lower-fitness traits (inbreeding depression) because minor or deleterious alleles become homozygous. Inbreeding produces long runs of homozygosity (ROH), but lower MGD does not.
The origin of the first life involved a reduction in the randomness of life-building molecules, which is fundamentally similar to the reduction in genetic diversity, or randomness, during the step-wise increase in complexity in macroevolution. Both involve the same complexifying, or anti-randomness, force.
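The contrast between lower MGD and inbreeding can be made concrete with a toy calculation. The sketch below is only an illustration, not the method used in the studies behind this site: it compares two hypothetical individuals with the same overall heterozygosity, one whose heterozygous sites are clustered (leaving a long homozygous stretch, as inbreeding would) and one whose heterozygous sites are spread evenly. Genotypes are assumed to be coded as alternate-allele counts (0, 1, 2), and run length is counted in markers rather than in physical distance.

```python
def heterozygosity(genotypes):
    """Fraction of sites that are heterozygous (coded 1; 0 and 2 are homozygous)."""
    return sum(g == 1 for g in genotypes) / len(genotypes)


def longest_homozygous_run(genotypes):
    """Length, in markers, of the longest stretch of consecutive homozygous sites."""
    longest = current = 0
    for g in genotypes:
        if g == 1:  # a heterozygous site breaks the run
            current = 0
        else:
            current += 1
            longest = max(longest, current)
    return longest


# Hypothetical individuals: 20 markers each, 4 heterozygous sites each.
inbred = [1, 1, 1, 1] + [0, 2] * 8      # heterozygous sites clustered
low_diversity = [0, 0, 0, 0, 1] * 4     # heterozygous sites spread out

for name, g in [("inbred", inbred), ("low_diversity", low_diversity)]:
    print(name, heterozygosity(g), longest_homozygous_run(g))
```

Both individuals have a heterozygosity of 0.2, but the longest homozygous run is 16 markers in the first and 4 in the second. Production ROH callers (for example PLINK's `--homozyg` option) work on physical distance and tolerate occasional heterozygous or missing calls; this sketch omits both.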
A. Macroevolution. As species evolve from simple to complex (taxon A → taxon E), the maximum level of genetic diversity that a taxon can tolerate is reduced.
B. Microevolution. Accumulation of random mutations within the tolerated range of genetic diversity leads to speciation (from taxon A → taxon B) without large changes in epigenetic complexity or in the maximum genetic diversity that the taxon can tolerate.
The MGD theory explains the GEP as a result of maximum genetic distance. Over a long evolutionary time and for fast-evolving DNAs, the genetic distance between species has reached the maximum level. The distance between the ingroup species and a simpler outgroup taxon is mainly determined by the maximum genetic diversity of the simpler outgroup. This distance equals the maximum distance allowed among members of the simpler outgroup; e.g., the distance between humans and fish equals the maximum distance between different fish taxa. Changes in the lineage leading to the simpler outgroup mask any changes in the lineages leading to the ingroup taxa.
There are in fact two kinds of genetic equidistance results. Over long evolutionary timescales or for fast-evolving sequences, one observes “maximum genetic equidistance”: different species are equidistant to a species of lower or equal complexity. The original result of Margoliash is maximum genetic equidistance. Over short evolutionary timescales or for slow-evolving sequences, one observes “linear genetic equidistance”, where the molecular clock holds and distance is still linearly related to time: when ingroup species have similar mutation rates, they are equidistant to an outgroup of lower or equal complexity.
This explanation of the genetic equidistance result by the MGD theory can also be illustrated by a simple thought experiment. Suppose we could create a yeast, a fish, and a human being using identical genes for their shared homologs, and let the three organisms diverge for an effectively infinite amount of time, about 500 million years, with each organism remaining phenotypically largely the same as today. A gene in yeast would have changed a lot, up to a maximum of, say, 50%, while its homolog in fish would have changed up to a maximum of, say, 30%, and its homolog in human would have changed very little, say less than 1%. Any change beyond 50% would be lethal to yeast; any change beyond 30% would be lethal to fish; and any change beyond 1% would be lethal to humans. A gene in yeast can change much more than its homolog in fish, which in turn can change more than its homolog in human, because a gene in human encounters far more functional constraints than its homolog in fish or yeast. Thus the genetic distance between yeast and human or fish is mainly determined by the mutations in yeast. In this case, the 50% change in yeast would account for the 50% genetic distance (50% identity) between yeast and human and between yeast and fish, as well as for a within-species distance of up to 50% in yeast. The 30% change in fish would account for the 30% genetic distance (70% identity) between fish and human. In contrast, the neutral theory would predict that human and fish could also, like yeast, change up to 50% or more, and would have a genetic distance of 50% (50% identity).
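The difference between linear and maximum equidistance can be illustrated with a saturating distance curve. The functional form used below, d(t) = D_max * (1 - exp(-r * t / D_max)), is an assumption chosen for illustration, not a model given in the MGD papers: d grows roughly as r * t while it is far below the cap D_max, and levels off at D_max however large r is. Two ingroup lineages with deliberately unequal, hypothetical rates are compared with the same outgroup.

```python
import math


def expected_distance(t, rate, d_max):
    """Hypothetical saturating model of pairwise distance after time t.

    Approximately rate * t while the distance is far below d_max (linear,
    clock-like phase); approaches d_max as t grows (maximum phase).
    """
    return d_max * (1.0 - math.exp(-rate * t / d_max))


D_MAX = 0.5        # assumed cap set by the simpler outgroup
RATE_AB = 0.002    # hypothetical combined rate for outgroup A and ingroup B
RATE_AC = 0.001    # ingroup C evolves more slowly in this toy example

if __name__ == "__main__":
    for t in (10, 100, 1000, 10000):
        d_ab = expected_distance(t, RATE_AB, D_MAX)
        d_ac = expected_distance(t, RATE_AC, D_MAX)
        print(f"t={t:>5}  d(A,B)={d_ab:.3f}  d(A,C)={d_ac:.3f}")
```

At small t the two distances differ roughly in proportion to the rates, so equidistance would only be seen if the rates were similar. At large t both approach D_MAX, so equidistance appears even though the rates differ, and the distances can no longer be used to infer those rates.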
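The thought experiment can be simulated with a toy model. The sketch below assumes, purely for illustration, that the positions a lineage may change are nested (human-changeable sites are a subset of fish-changeable sites, which are a subset of yeast-changeable sites, in line with the nesting of conserved positions described earlier) and that after saturation every changeable position carries a residue different from the ancestor. The 50%, 30%, and 1% caps are the hypothetical values from the thought experiment; the sequence length and random seed are arbitrary.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def saturate(ancestor, n_free, rng):
    """Give each of the first n_free positions a residue different from the ancestor.

    Toy assumptions: the changeable positions are the first n_free sites
    (so smaller sets are nested inside larger ones), and at saturation
    every changeable site differs from the ancestral residue.
    """
    seq = list(ancestor)
    for i in range(n_free):
        seq[i] = rng.choice([aa for aa in AMINO_ACIDS if aa != ancestor[i]])
    return "".join(seq)


def p_distance(x, y):
    """Fraction of aligned positions at which two sequences differ."""
    return sum(a != b for a, b in zip(x, y)) / len(x)


rng = random.Random(1)
LENGTH = 1000
ancestor = "".join(rng.choice(AMINO_ACIDS) for _ in range(LENGTH))

# Hypothetical caps from the thought experiment: 50%, 30%, and 1% of 1000 sites.
yeast = saturate(ancestor, 500, rng)
fish = saturate(ancestor, 300, rng)
human = saturate(ancestor, 10, rng)

if __name__ == "__main__":
    print(f"yeast-human {p_distance(yeast, human):.3f}")
    print(f"yeast-fish  {p_distance(yeast, fish):.3f}")
    print(f"fish-human  {p_distance(fish, human):.3f}")
```

Yeast-human and yeast-fish distances both come out close to yeast's 50% cap, and fish-human close to fish's 30% cap. Shared changed positions occasionally coincide by chance, which pulls the yeast-fish value slightly below 50%.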
Comparison of tetrapod species (T1–T3; human, bird, frog), which are known to have a most recent common ancestor (T), with another species (X; fish). Time flows from the past (left) to the present (right). Species X is the outgroup species and is equally distant to species T1–T3, the ingroup species, both in time of separation and in sequence difference, as measured by the identity matrix. This is illustrated by the lines linking X to T1, T2, and T3 being of equal length, indicating the same level of divergence in both time and sequence. The GEP refers to the fact that X is equally different in protein sequence from T1, T2, and T3. The evolutionary lineages leading to species T1–T3 separated from the lineage leading to X at the same point, V. Furthermore, species T1–T3 are products of an evolutionary process that has been ongoing for the same duration since their common ancestor, V. Therefore, if a given protein shows equal divergence when the same fish protein is compared with the proteins of different tetrapods, this suggests that differences accumulate at a similar rate among tetrapods (T1–T3). The implicit assumption here is that the molecular distance among the species has not yet reached its maximum level, which allows the rate at which sequence differences accumulate to be inferred. If, however, the distance has reached an upper limit, it would no longer be related to time or mutation rate, and the rate inference would be invalid.
Scientific evidence supporting MGD:
Figure. The genetic equidistance result and the maximum genetic diversity theory. A. Maximum genetic equidistance. A ten-amino-acid peptide is used to illustrate the evolutionary process. When the protein is fast-evolving, the equidistance observed today would be maximum distance with a high overlap ratio (the number of overlap positions divided by the number of positions at which the two ingroup species differ). The figure shows 4 overlap positions with an overlap ratio of 1. The C-A distance is 60%, the same as the B-A distance. This is a schematic representation of the original Margoliash genetic equidistance result. B. An example of maximum genetic equidistance. Alignment of human, drosophila, and yeast cytochrome c proteins. Human differs from drosophila at 22 amino acid positions. Human and drosophila are equidistant to yeast, each with 36 amino acid differences. There are 12 overlap positions (in red), and the overlap ratio is 12/22 = 55%. Other mutant positions are colored green, blue, and orange. C. Linear genetic equidistance. When the protein is slowly evolving, assuming the molecular clock holds, the equidistance observed today would be linear distance with a low overlap ratio. Here every substitution in any species means an increase in distance. The figure shows no overlap positions, giving an overlap ratio of 0. The C-A distance is 50% and equals the B-A distance. D. An example of linear equidistance. Alignment of human, orangutan, and mouse TXND9 genes. There are 2 amino acid differences between human and orangutan, which are equidistant to mouse with 6 amino acid differences. The overlap ratio is 0/2 = 0.
Actual data have validated the MGD theory. Genetic non-equidistance to humans despite equidistance in time has also been found for sister species within the teleost fish clade, the arthropod phylum, the Porifera phylum, and the fungi kingdom. In all five cases where the difference in complexity between the ingroup sister species can be intuitively inferred (octopus vs. cockle, Terebratulina vs. Lingula, bird vs. snake, dragonfly vs. louse, and smut vs. yeast), the more complex species always shows greater sequence similarity to humans in fast-evolving genes, fully conforming to the predictions of the MGD theory but not to those of the molecular clock. It has recently been shown that octopus indeed has the lowest heterozygosity level among mollusks, and that the highest heterozygosity level among mollusks is found in the least complex solenogastres. Also, whole-genome sequencing analysis has shown that two New World monkeys are non-equidistant in nucleotide sequence to humans, with the more primitive marmoset being more distant from humans than the owl monkey.
Consider three extant species of similar phenotypic complexity, A, B, and C, where A and B are sister ingroup species and C is the outgroup. According to the GEP and the MGD theory, A and B are equidistant to C, with the distance at an upper limit for fast-evolving genes. Because environmental conditions change over time, ancient species would share variants adapted to past environments, and these variants would have been replaced in the extant lineages by alleles suited to current environments. Consequently, the genetic identity between B and C (or between A and C, as BC = AC per the GEP) would be greater than that between ancient B and extant C. However, for slow-evolving genes and for an ancient taxon from the relatively recent past, the B-C identity is expected to be comparable to the B(ancient)-C identity, as significant evolutionary changes are unlikely to accumulate over a relatively short period.
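The distances and overlap ratios in panels A and C can be computed directly from an alignment. In the sketch below, an overlap position is taken to be one at which the outgroup and both ingroup species all carry different residues, and the overlap ratio is the number of overlap positions divided by the number of positions at which the two ingroup species differ. This definition is our reading of the caption's arithmetic (e.g., 12/22), and the two ten-residue toy alignments are made up to mimic panels A and C, not taken from the figure.

```python
def p_distance(x, y):
    """Fraction of aligned positions at which two sequences differ."""
    if len(x) != len(y):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(x, y)) / len(x)


def overlap_stats(outgroup, ingroup1, ingroup2):
    """Return (overlap positions, ingroup differences, overlap ratio)."""
    if not (len(outgroup) == len(ingroup1) == len(ingroup2)):
        raise ValueError("sequences must be aligned to the same length")
    ingroup_diffs = overlaps = 0
    for a, b, c in zip(outgroup, ingroup1, ingroup2):
        if b != c:
            ingroup_diffs += 1
            if a != b and a != c:  # all three residues differ
                overlaps += 1
    ratio = overlaps / ingroup_diffs if ingroup_diffs else 0.0
    return overlaps, ingroup_diffs, ratio


# Toy alignments (hypothetical): A is the outgroup, B and C the ingroups.
A = "AAAAAAAAAA"
B_max, C_max = "DDEEEEAAAA", "DDFFFFAAAA"   # maximum-equidistance pattern
B_lin, C_lin = "DDDEEAAAAA", "DDDAAFFAAA"   # linear-equidistance pattern

print(p_distance(A, B_max), p_distance(A, C_max), overlap_stats(A, B_max, C_max))
print(p_distance(A, B_lin), p_distance(A, C_lin), overlap_stats(A, B_lin, C_lin))
```

The first toy alignment gives distances of 0.6 to both ingroups with an overlap ratio of 1.0, as in panel A; the second gives 0.5 to both with an overlap ratio of 0.0, as in panel C.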
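The comparison in this scheme reduces to two identity values for each gene. The sketch below computes identity(B, C) minus identity(B_ancient, C) for one aligned gene. The sequences are invented placeholders, and a real test would aggregate over many fast- and slow-evolving genes and handle ancient-DNA damage and missing data, none of which is shown here.

```python
def identity(x, y):
    """Fraction of aligned positions with identical residues."""
    if len(x) != len(y):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a == b for a, b in zip(x, y)) / len(x)


def extant_minus_ancient(b_extant, b_ancient, c_extant):
    """identity(B, C) - identity(B_ancient, C) for one aligned gene.

    As stated in the text, MGD expects a positive value for fast-evolving
    genes and a value near zero for slow-evolving genes with a recent ancient
    sample. Under a strict clock, the ancient sample has diverged from C for
    less time, so it would be expected to be at least as similar to C,
    i.e. a value of zero or below.
    """
    return identity(b_extant, c_extant) - identity(b_ancient, c_extant)


# Hypothetical fast-evolving gene, aligned across the three samples.
c = "ACGTACGTAC"
b = "ACGTACGTTT"           # extant B: 8 of 10 positions identical to C
b_ancient = "ACGTAGGTTT"   # ancient B: 7 of 10 positions identical to C
print(round(extant_minus_ancient(b, b_ancient, c), 3))
```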
Figure. Schemes for testing the molecular clock (MC) hypothesis vs. the MGD theory by using ancient DNA. Extant species are represented by A, B, and C. Ancient taxon from the relatively recent past is represented by B (ancient).
The maximum genetic diversity theory has been instrumental in directing productive research on both evolutionary problems and important biomedical problems. The theory does not mean discarding the old assumptions, but merely narrowing their scope. One must carefully select the DNA sequences that are likely to satisfy those assumptions.
The maximum genetic diversity theory should help resolve difficult historical problems such as the phylogenetic tree of life. Past methods have no concept of maximum distance and mostly use non-informative distance data for inferring phylogeny. The slow clock method based on the MGD theory uses only slow-evolving sequences and thus ensures a linear relationship between distance and time. Its results should therefore be more objective and independent of variations in sequence selection and of the investigator. The slow clock method has re-established a primate phylogeny in which humans and pongids are two separate groups, which has long been the consensus view of paleoanthropologists.
For truly neutral sequences still in the linear phase of divergence, many of the assumptions of the neutral theory, such as the infinite sites model, would be valid. Thus phylogenetics research can largely proceed as before, except that one now has a standard for separating the neutral from the non-informative DNAs. One must now distinguish two different kinds of high sequence similarity: one due to a shorter time since separation, and the other due to common construction that results in the use of similar parts (convergent evolution).
The Out of Africa model of modern human origins is based on the molecular clock and the neutral theory. If one assumes the molecular clock, the high genetic diversity of Africans is interpreted to mean a deeper evolutionary time for Africans. Also, the infinite sites model is assumed in order to infer the derived allele status, which is critical for rooting the phylogenetic tree in Africans by the outgroup rooting method. However, according to the MGD theory and experimental data, both of these assumptions are invalid. By using informative variants and allowing recurrent and back mutations, we have built a new model of modern human origins, the Recent Out of East Asia (ROE) model. The ROE model is consistent with the multiregional model in terms of autosomal evidence, which indicates that the major races have been separated for ~2 million years, as originally claimed by the multiregional model. However, uniparental DNA data indicate a single origin in East Asia at a more recent time.
The likely scenario is that modern humans first evolved in East Asia, as marked by a new, modern version of uniparental DNAs, and then migrated to Europe and Africa, where they admixed with local, less modern people. Admixture replaced the local uniparental DNAs but only part of the local autosomal DNAs, so that Europeans and Africans would have modern uniparental DNAs but largely local autosomal DNAs. Ancient human DNA should be very informative in falsifying incorrect models. Our analysis of ancient DNA samples has confirmed the ROE model. In contrast, researchers who favor the Out of Africa model have yet to report any ancient DNA evidence for their model and have instead found support for the ROE model: ancient DNA samples 40,000–45,000 years old found in Europe and East Asia are East Asian-like rather than African-like.
Most complex traits and diseases are partly heritable and presumably caused by polymorphic genetic variations such as SNPs. The neutral theory views most such variations as nonfunctional and neutral, and hence the study of complex traits and diseases has in the past focused on searching for a few functional variants. Although genome-wide association studies (GWAS) have had some success in identifying a number of variants, these variants account for only a small fraction of the total trait variation, and their functional roles typically remain unclear.
The maximum genetic diversity theory predicts that complex diseases may be caused by excess genetic noise above a threshold, and that such diseases may serve to prevent an unlimited increase in genetic diversity. Complex traits evolved by suppressing genetic noise and hence should be susceptible to damage by excess noise. Conversely, an insufficient amount of genetic diversity may impair adaptive capacities such as immunity. The quantitative variation in a complex trait may correlate with the number of genetic variants.
Results from our tests of the MGD theory have shown the expected pattern that higher minor allele content (MAC), or noise, correlates with many complex diseases. These include an association of MAC with higher lung cancer incidence in mice and humans. Also, Parkinson’s disease patients have higher MAC than controls, and a selected set of ~37,000 minor alleles can predict 2% of Parkinson’s patients. Other diseases that show higher MAC include schizophrenia, type 1 diabetes, type 2 diabetes, lung cancer, and Alzheimer’s disease.
To directly examine the self-evident antagonistic relationship between cognition or consciousness and randomness or genetic diversity, we performed a study analyzing genotype and phenotype data from more than 400,000 people in the UK. We calculated multiple measures of genetic diversity for each individual and examined which traits these measures were associated with, using linear regression that controlled for confounding factors. Among the 17 traits examined, educational attainment, which is highly correlated with cognition or IQ, showed the most robust relationship with genetic diversity, and the association was inverse. This association is likely to be causal, since only brain-expressed genes, but not brain-non-expressed genes, showed an association. The result is also unlikely to be driven by confounding factors, because the association was significantly stronger for non-synonymous variants than for synonymous or intronic variants. Consistently, animal studies have also revealed an inverse relationship between learning and memory and genetic diversity.
Almost all eukaryotes reproduce sexually, through meiosis, which generates haploid gametes from a diploid cell. The purpose of sex has long remained a mystery. The common explanation is that sexual reproduction increases genetic diversity. However, asexual organisms such as bacteria generally have much higher genetic diversity than eukaryotes. It has also been suggested that sexual reproduction can remove chromosomal and epigenetic abnormalities or other deleterious mutations. However, such abnormalities could also be removed by natural selection against abnormal phenotypes.
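The gene-selection step of a slow clock analysis can be sketched as a simple filter. The criterion used here (keep the fraction of genes with the lowest divergence between two reference species) and the parameter `fraction` are assumptions for illustration; the published method's exact selection rules are not described on this page, and the gene names and distances below are placeholders.

```python
def select_slow_genes(divergence_by_gene, fraction=0.1):
    """Return the names of the most conserved genes, slowest first.

    divergence_by_gene: dict mapping gene name -> divergence between two
    reference species (a hypothetical input format).
    """
    ranked = sorted(divergence_by_gene, key=divergence_by_gene.get)
    n_keep = max(1, int(len(ranked) * fraction))
    return ranked[:n_keep]


def mean_distance(distance_by_gene, genes):
    """Average a pairwise distance over the selected genes only."""
    return sum(distance_by_gene[g] for g in genes) / len(genes)


# Placeholder data: divergence between two reference species, per gene.
reference_divergence = {"gene1": 0.30, "gene2": 0.02, "gene3": 0.15, "gene4": 0.01}
slow = select_slow_genes(reference_divergence, fraction=0.5)

# Placeholder distances for one pair of study taxa, per gene.
pair_distance = {"gene1": 0.25, "gene2": 0.010, "gene3": 0.12, "gene4": 0.004}
print(slow, mean_distance(pair_distance, slow))
```

Here `slow` is `['gene4', 'gene2']`. Distances computed only from such conserved genes are intended to stay in the linear phase, where distance still tracks time.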
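Minor allele content can be computed per individual from a genotype matrix. This sketch is an assumption-laden illustration rather than the pipeline used in the cited studies: genotypes are alternate-allele counts (0, 1, 2), the minor allele at each SNP is whichever allele is rarer in the sample itself (ties count the reference allele as minor), and MAC is reported as the fraction of an individual's 2 × n_snps allele copies that are minor.

```python
def minor_allele_content(genotypes):
    """Fraction of each individual's allele copies that are minor alleles.

    genotypes: list of individuals, each a list of alternate-allele counts
    (0, 1, 2) at the same SNPs, in the same order.
    """
    n_ind = len(genotypes)
    n_snps = len(genotypes[0])
    # The alternate allele is minor if its sample frequency is below 0.5,
    # i.e. if fewer than n_ind of the 2 * n_ind copies carry it.
    alt_is_minor = [
        sum(ind[j] for ind in genotypes) < n_ind for j in range(n_snps)
    ]
    mac = []
    for ind in genotypes:
        minor_copies = sum(
            g if alt_is_minor[j] else 2 - g for j, g in enumerate(ind)
        )
        mac.append(minor_copies / (2 * n_snps))
    return mac


# Toy genotype matrix: 4 individuals x 3 SNPs (made-up values).
genotypes = [
    [0, 0, 0],
    [1, 0, 2],
    [0, 1, 2],
    [2, 2, 2],
]
print(minor_allele_content(genotypes))
```

In the toy matrix the alternate allele is minor at the first two SNPs and the reference allele is minor at the third, so the individuals carry 2, 1, 1, and 4 minor copies out of 6.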
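The core of such an analysis is a regression of a trait on a per-individual diversity measure with covariates included. The sketch below uses ordinary least squares via NumPy's `np.linalg.lstsq` on synthetic data generated with a known diversity effect of -0.5, only to show that the adjusted coefficient is recovered. The covariates stand in for factors such as age, sex, or ancestry components; they and all numbers are made up and say nothing about the UK data described above.

```python
import numpy as np


def diversity_coefficient(trait, diversity, covariates):
    """OLS coefficient of `diversity` on `trait`, adjusted for `covariates`.

    trait, diversity: arrays of shape (n,); covariates: array of shape (n, k).
    """
    X = np.column_stack([np.ones(len(trait)), diversity, covariates])
    coef, *_ = np.linalg.lstsq(X, trait, rcond=None)
    return coef[1]  # column 0 is the intercept, column 1 is diversity


# Synthetic data: diversity is partly driven by a confounder (covariate 0).
rng = np.random.default_rng(0)
n = 5000
covariates = rng.normal(size=(n, 2))
diversity = 0.5 * covariates[:, 0] + rng.normal(size=n)
trait = 2.0 * covariates[:, 0] - 0.5 * diversity + rng.normal(scale=0.1, size=n)

adjusted = diversity_coefficient(trait, diversity, covariates)
unadjusted = diversity_coefficient(trait, diversity, np.empty((n, 0)))
print(round(adjusted, 3), round(unadjusted, 3))
```

In this toy setup, leaving out the confounder pulls the estimate far from the true effect (here it even changes sign), which is why controlling for covariates matters.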
The MGD theory offers a straightforward solution to the mystery of sex. According to the theory, macroevolution from a simple taxon to a higher complexity taxon requires a reduction in genetic diversity (at the nucleotide level). The reduction in genetic diversity in an individual of the simple taxon is necessary for the individual to become the incipient individual of the more complex new taxon. As the overall level of genetic variation in an offspring is mostly determined by the inheritance of the combination of single nucleotide variants carried by the parents, sexual reproduction can either increase or decrease the genetic variation in an offspring relative to the parents, but asexual reproduction can only increase the genetic variation in an offspring. Thus, sexual reproduction is essential for reducing genetic diversity necessary for the macroevolution of higher complexity.
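The asymmetry between sexual and asexual transmission can be illustrated with a toy simulation. It assumes free recombination and Mendelian segregation, counts an offspring's genetic variation as the number of variant allele copies it carries, and models asexual reproduction as a clonal copy plus a few new mutations with no back mutation. The parent genotypes, mutation count, and random seed are arbitrary choices.

```python
import random


def sexual_offspring(parent1, parent2, rng):
    """One offspring genotype under Mendelian segregation with free recombination.

    Genotypes are per-locus counts of variant allele copies (0, 1, or 2).
    """
    def transmit(g):
        # A parent with g variant copies passes a variant with probability g / 2.
        return 1 if rng.random() < g / 2 else 0
    return [transmit(a) + transmit(b) for a, b in zip(parent1, parent2)]


def asexual_offspring(parent, rng, new_mutations=2):
    """A clonal copy of the parent plus a few new variants (no back mutation)."""
    child = list(parent)
    mutable = [i for i, g in enumerate(child) if g < 2]
    for i in rng.sample(mutable, min(new_mutations, len(mutable))):
        child[i] += 1
    return child


rng = random.Random(42)
mother = [1] * 100   # heterozygous at 100 loci: 100 variant copies
father = [1] * 100
sexual_loads = [sum(sexual_offspring(mother, father, rng)) for _ in range(200)]
asexual_loads = [sum(asexual_offspring(mother, rng)) for _ in range(200)]

if __name__ == "__main__":
    print("sexual:", min(sexual_loads), "to", max(sexual_loads))
    print("asexual:", min(asexual_loads), "to", max(asexual_loads))
```

With two parents each carrying 100 variant copies, sexually produced offspring land on both sides of 100, while clonal offspring never drop below it.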
Explore all publications by Shi Huang on Google Scholar. Below is a comprehensive list of papers by Huang related to the Maximum Genetic Diversity (MGD) theory:
Discover comprehensive works by Shi Huang and others on the Maximum Genetic Diversity (MGD) theory and related topics:
Note: This section will be updated as more books are published or identified.
Explore how the Maximum Genetic Diversity (MGD) theory and the "Out of East Asia" hypothesis are covered in the media:
Learn about the background and expertise of Dr. Shi Huang, the originator of the Maximum Genetic Diversity (MGD) theory.
Contact: shihuang1@gmail.com
Note: For a detailed CV, please refer to his Google Scholar profile: https://scholar.google.com/citations?user=AD8lLJgAAAAJ
Frequently asked questions about the Maximum Genetic Diversity (MGD) theory and related topics: