Releasing the Truth

Digging for knowledge…


Abiogenesis enigma: Protein’s origin


As you might know, proteins are one of the major “building blocks” of cells; there are up to 10,000 different types of proteins, all manufactured inside each cell. Abiogenesis theorists obviously support the view that these molecules arose “by chance” in a prebiotic world billions of years ago. To date, however, they have absolutely no clue how, as we can read in this article:

“Proteins are the most complex chemicals synthesized in nature and must fold into complicated three-dimensional structures to become active. This poses a particular challenge in explaining their evolution from non-living matter. So far, efforts to understand protein evolution have focused on domains, independently folding units from which modern proteins are formed. Domains however are themselves too complex to have evolved de novo in an abiotic environment. We think that domains arose from the fusion of shorter, non-folding peptides, which evolved as cofactors supporting a primitive, RNA-based life form (the ‘RNA world’).” 1

So, why is it so complicated to explain their origin? Despite the often-repeated innuendo that life and all of its components have “assuredly” originated through natural means, the clear failure of scientists to solve this puzzle can be easily explained by some truths about proteins, their synthesis, their structure and so on. Once these are considered, no one can reasonably take their abiogenetic origin for granted. These truths also explain, without a shadow of doubt, the intriguing fact that not a single protein (even the smallest one, composed of only 8 amino acids) has ever been observed to appear anywhere in the world outside cells and high-tech labs, of course!

What’s a protein?

“Proteins are large biological molecules consisting of one or more chains of amino acids. Proteins perform a vast array of functions within living organisms, including catalyzing metabolic reactions, replicating DNA, responding to stimuli, and transporting molecules from one location to another. Proteins differ from one another primarily in their sequence of amino acids, which is dictated by the nucleotide sequence of their genes, and which usually results in folding of the protein into a specific three-dimensional structure that determines its activity.

A polypeptide is a single linear polymer chain of amino acids bonded together by peptide bonds between the carboxyl and amino groups of adjacent amino acid residues. The sequence of amino acids in a protein is defined by the sequence of a gene, which is encoded in the genetic code. In general, the genetic code specifies 20 standard amino acids;” 2

Talking about amino acids, we’d like to recall another crucial problem for abiogenesis: the absence of naturally occurring homochiral mixtures. As explained in a previous article, the laws of thermodynamics dictate that racemic mixtures always result:

“The left and right handed forms have identical free energy (G), so the free energy difference (ΔG) is zero. The equilibrium constant for any reaction (K) is the equilibrium ratio of the concentration of products to reactants. The relationship between these quantities at any Kelvin temperature (T) is given by the standard equation:

K = exp (–ΔG/RT)

where R is the universal gas constant (= Avogadro’s number x Boltzmann’s constant k) = 8.314 J/K.mol.

For the reaction of changing left-handed to right-handed amino acids (L → R), or the reverse (R → L), ΔG = 0, so K = 1. That is, the reaction reaches equilibrium when the concentrations of R and L are equal; that is, a racemate is produced.”
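As a quick check of the arithmetic in the quote above, here is a minimal sketch that evaluates K = exp(–ΔG/RT) for ΔG = 0. The function names and the temperature of 298.15 K are illustrative assumptions, not part of the quoted text.

```python
import math

R = 8.314  # universal gas constant in J/(K·mol), as given in the quote


def equilibrium_constant(delta_g, temperature):
    """Return K = exp(-ΔG / RT), with ΔG in J/mol and T in kelvin."""
    return math.exp(-delta_g / (R * temperature))


def product_fraction(k):
    """Equilibrium fraction of product for a simple A <-> B interconversion."""
    return k / (1.0 + k)


# Both enantiomers have the same free energy, so ΔG = 0 for L <-> R.
# 298.15 K is an assumed temperature; with ΔG = 0 the result does not depend on T.
k = equilibrium_constant(0.0, 298.15)
print(k)                    # 1.0
print(product_fraction(k))  # 0.5, i.e. equal amounts of each form
```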

Therefore, any abiogenesis theorist has this astounding problem to deal with from the very beginning: without homochiral monomers, there is zero possibility of a ‘magic’ protein self-assembling…


Protein synthesis


It’s quite uncanny that intelligent people with advanced knowledge of the subject would attempt to conceive hypotheses of such molecules originating spontaneously in a wild and hostile inorganic environment, because for cells to build proteins, an intricate, complex and laborious process must take place!



First, genetic information is needed:

“Proteins are assembled from amino acids using information encoded in genes. Each protein has its own unique amino acid sequence that is specified by the nucleotide sequence of the gene encoding this protein. The genetic code is a set of three-nucleotide sets called codons and each three-nucleotide combination designates an amino acid (for example AUG (adenine-uracil-guanine) is the code for methionine).”


Most amino acids can be specified by more than one of the 64 possible codons. Moreover, that specific genetic code must first be transcribed before it can be translated:

“Genes encoded in DNA are first transcribed into pre-messenger RNA (mRNA) by proteins such as RNA polymerase. Most organisms then process the pre-mRNA (also known as a primary transcript) using various forms of Post-transcriptional modification to form the mature mRNA, which is then used as a template for protein synthesis by the ribosome.”

Oh great, a bit complicated, isn’t it? Please read the Wikipedia article on messenger RNA to better understand what it is, how it is manufactured, what it is composed of, and so on; all of this adds more complexity to any explanation of protein origins.



The process of synthesizing a protein from an mRNA template is known as translation. The mRNA is loaded onto the ribosome and is read three nucleotides at a time by matching each codon to its base pairing anticodon located on a transfer RNA molecule, which carries the amino acid corresponding to the codon it recognizes. The enzyme aminoacyl tRNA synthetase “charges” the tRNA molecules with the correct amino acids. The growing polypeptide is often termed the nascent chain. Proteins are always biosynthesized from N-terminus to C-terminus.[6]
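To make the codon-by-codon reading described above concrete, here is a toy sketch. It assumes a small excerpt of the standard genetic code (the full table has 64 entries) and a made-up mRNA string; the function name `translate` is hypothetical, and the real cellular process involves the ribosome, tRNAs and synthetases rather than a lookup table.

```python
# Small excerpt of the standard genetic code; the full table has 64 codons.
CODON_TABLE = {
    "AUG": "Met",  # also the usual start codon
    "UUU": "Phe", "UUC": "Phe",
    "GGU": "Gly", "GGC": "Gly",
    "AAA": "Lys",
    "UGG": "Trp",
    "UAA": "Stop", "UAG": "Stop", "UGA": "Stop",
}


def translate(mrna):
    """Read an mRNA string three nucleotides at a time, starting at the first AUG."""
    start = mrna.find("AUG")
    if start == -1:
        return []  # no start codon, nothing to translate
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        residue = CODON_TABLE.get(codon)
        if residue is None:
            raise ValueError(f"codon {codon} is not in this partial table")
        if residue == "Stop":
            break  # a stop codon ends the chain
        peptide.append(residue)  # chain grows from N-terminus to C-terminus
    return peptide


# Hypothetical mRNA fragment, invented for illustration only.
print(translate("GGAUGUUUGGCAAAUGGUAAGC"))
# ['Met', 'Phe', 'Gly', 'Lys', 'Trp']
```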

The size of a synthesized protein can be measured by the number of amino acids it contains and by its total molecular mass, which is normally reported in units of daltons (synonymous with atomic mass units), or the derivative unit kilodalton (kDa). Yeast proteins are on average 466 amino acids long and 53 kDa in mass.[5] The largest known proteins are the titins, a component of the muscle sarcomere, with a molecular mass of almost 3,000 kDa and a total length of almost 27,000 amino acids.[8]
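The two size figures quoted above can be cross-checked with a little arithmetic. This sketch simply divides mass by length to get the average mass per amino acid; the function name is an illustrative assumption, and the titin numbers are the rounded "almost" values from the quote.

```python
def mean_residue_mass(mass_kda, residues):
    """Average mass per amino acid in daltons, given total mass in kDa."""
    return mass_kda * 1000.0 / residues


yeast = mean_residue_mass(53, 466)      # average yeast protein from the quote
titin = mean_residue_mass(3000, 27000)  # rounded upper figures for titin

print(f"{yeast:.1f} Da per residue")  # 113.7 Da per residue
print(f"{titin:.1f} Da per residue")  # 111.1 Da per residue
```

Both come out a little above 110 Da per amino acid, so the two sets of numbers are consistent with each other.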


Phew! How complicated! You may now ask: are we finally done? And I reply: nope! Now that the ribosome, together with the rRNA and more than 50 other proteins, has finally finished the process, a protein is formed. However, it always starts out in a random coil shape. So what? This shape is mostly useless to the organism, as we can read:

Each protein exists as an unfolded polypeptide or random coil when translated from a sequence of mRNA to a linear chain of amino acids. This polypeptide lacks any stable (long-lasting) three-dimensional structure (the left hand side of the neighbouring figure). 3

In that randomly coiled shape, the protein is highly unstable, fragile and useless for cell building. For proper biological use and better stability, the protein folding process must take place. The resulting 3D shape is known as the native state.

The correct three-dimensional structure is essential to function, although some parts of functional proteins may remain unfolded.[4] Failure to fold into native structure generally produces inactive proteins, but in some instances misfolded proteins have modified or toxic functionality. Several neurodegenerative and other diseases are believed to result from the accumulation of amyloid fibrils formed by misfolded proteins.[5] Many allergies are caused by incorrect folding of some proteins, for the immune system does not produce antibodies for certain protein structures.[6]

Protein folding is important for another reason:


Minimizing the number of hydrophobic side-chains exposed to water is an important driving force behind the folding process.[9] Formation of intramolecular hydrogen bonds provides another important contribution to protein stability.[10] 


And how does the folding occur?



The amino-acid sequence of a protein determines its native conformation.[7] A protein molecule folds spontaneously during or after biosynthesis. While these macromolecules may be regarded as “folding themselves”, the process also depends on the solvent (water or lipid bilayer),[8] the concentration of salts, the pH, the temperature, the possible presence of cofactors and of molecular chaperones.

The process of folding often begins co-translationally, so that the N-terminus of the protein begins to fold while the C-terminal portion of the protein is still being synthesized by the ribosome. Specialized proteins called chaperones assist in the folding of other proteins.

Although most globular proteins are able to assume their native state unassisted, chaperone-assisted folding is often necessary in the crowded intracellular environment to prevent aggregation; chaperones are also used to prevent misfolding and aggregation that may occur as a consequence of exposure to heat or other changes in the cellular environment.

There are two models of protein folding that are currently being confirmed: The first: The diffusion collision model, in which a nucleus is formed, then the secondary structure is formed, and finally these secondary structures are collided together and pack tightly together. The second: The nucleation-condensation model, in which the secondary and tertiary structures of the protein are made at the same time. Recent studies have shown that some proteins show characteristics of both of these folding models.

The essential fact of folding, however, remains that the amino acid sequence of each protein contains the information that specifies both the native structure and the pathway to attain that state. Folding is a spontaneous process independent of energy inputs from nucleoside triphosphates. The passage of the folded state is mainly guided by hydrophobic interactions, formation of intramolecular hydrogen bonds, and van der Waals forces, and it is opposed by conformational entropy.

Only after the folding process do we have a useful, stable protein, with a properly designed shape of up to four structural levels, so that the molecule can perform its biological function.

But remember, many conditions and external factors can destroy proteins, such as hydrolysis (a slow but ceaseless process, because proteins are metastable and hydrophobic) and others:

Under some conditions proteins will not fold into their biochemically functional forms. Temperatures above or below the range that cells tend to live in will cause thermally unstable proteins to unfold or “denature” (this is why boiling makes an egg white turn opaque). High concentrations of solutes, extremes of pH, mechanical forces, and the presence of chemical denaturants can do the same.

A fully denatured protein lacks both tertiary and secondary structure. Under certain conditions some proteins can refold; however, in many cases, denaturation is irreversible.[15] Cells sometimes protect their proteins against the denaturing influence of heat with enzymes known as chaperones or heat shock proteins, which assist other proteins both in folding and in remaining folded. Some proteins never fold in cells at all except with the assistance of chaperone molecules, which either isolate individual proteins so that their folding is not interrupted by interactions with other proteins or help to unfold misfolded proteins, giving them a second chance to refold properly. This function is crucial to prevent the risk of precipitation into insoluble amorphous aggregates.


For a further, in-depth study of the different factors capable of disrupting proteins, read the following articles (a series of 6 parts):


To conclude our observations, it’s impossible not to be sceptical of any theoretical proposition that claims a self-caused origin of proteins, because science has unveiled tons of facts that readily rule out any such scenario:


-Absence of homochiral monomers forming in the environment;

-Necessity of specific genetic information;

-Need for a highly controlled environment, with proper pH level and temperature, and an absence of toxins and of mechanical forces that may easily damage or disrupt the protein;

-Need for specific methods to protect the protein against hydrolysis and oxidation;

-Necessity of having more than 50 other types of protein already manufactured to help with protein synthesis.


The question arises: how in the world could such a specific set of conditions be found on a prebiotic Earth? Such conditions can barely be found even in a first-class laboratory run by qualified and experienced scientists!

You might also enjoy watching this short video about protein synthesis:



1. <>

2. <>

3. <>


DNA- fascinating video

Deoxyribonucleic acid (DNA) is a molecule that encodes the genetic instructions used in the development and functioning of all known living organisms and many viruses. Along with RNA and proteins, DNA is one of the three major macromolecules essential for all known forms of life. Most DNA molecules are double-stranded helices, consisting of two long biopolymers of simpler units called nucleotides—each nucleotide is composed of a nucleobase (guanine, adenine, thymine, and cytosine), recorded using the letters G, A, T, and C, as well as a backbone made of alternating sugars (deoxyribose) and phosphate groups (related to phosphoric acid), with the nucleobases (G, A, T, C) attached to the sugars. DNA is well-suited for biological information storage, since the DNA backbone is resistant to cleavage and the double-stranded structure provides the molecule with a built-in duplicate of the encoded information. The following video shows animations of processes which occur within each one of the 10 trillion cells in our body!


In 1927, Nikolai Koltsov proposed that inherited traits would be inherited via a “giant hereditary molecule” made up of “two mirror strands that would replicate in a semi-conservative fashion using each strand as a template”. In 1928, Frederick Griffith discovered that traits of the “smooth” form of Pneumococcus could be transferred to the “rough” form of the same bacteria by mixing killed “smooth” bacteria with the live “rough” form. This system provided the first clear suggestion that DNA carries genetic information—the Avery–MacLeod–McCarty experiment—when Oswald Avery, along with coworkers Colin MacLeod and Maclyn McCarty, identified DNA as the transforming principle in 1943. DNA’s role in heredity was confirmed in 1952, when Alfred Hershey and Martha Chase in the Hershey–Chase experiment showed that DNA is the genetic material of the T2 phage.

In 1953, James Watson and Francis Crick suggested what is now accepted as the first correct double-helix model of DNA structure in the journal Nature. Their double-helix, molecular model of DNA was then based on a single X-ray diffraction image (labeled as “Photo 51”) taken by Rosalind Franklin and Raymond Gosling in May 1952, as well as the information that the DNA bases are paired — also obtained through private communications from Erwin Chargaff in the previous years. Chargaff’s rules played a very important role in establishing double-helix configurations for B-DNA as well as A-DNA.

Experimental evidence supporting the Watson and Crick model was published in a series of five articles in the same issue of Nature. Of these, Franklin and Gosling’s paper was the first publication of their own X-ray diffraction data and original analysis method that partially supported the Watson and Crick model; this issue also contained an article on DNA structure by Maurice Wilkins and two of his colleagues, whose analysis and in vivo B-DNA X-ray patterns also supported the presence in vivo of the double-helical DNA configurations as proposed by Crick and Watson for their double-helix molecular model of DNA in the previous two pages of Nature.

Repairing system

It’s interesting to note that the DNA molecule is highly reactive and thus very unstable… On a good day about one million bases in the DNA in a human cell are damaged. These lesions are caused by a combination of normal chemical activity within the cell and exposure to radiation and toxins coming from environmental sources including cigarette smoke, grilled foods and industrial wastes. So, organisms have a handful of repair mechanisms, as described in a recent Science Daily article:

A number of environmental toxins and chemotherapy drugs are alkylation agents that can attack DNA.

When a DNA base becomes alkylated, it forms a lesion that distorts the shape of the molecule enough to prevent successful replication. If the lesion occurs within a gene, the gene may stop functioning. To make matters worse, there are dozens of different types of alkylated DNA bases, each of which has a different effect on replication.

One method to repair such damage that all organisms have evolved is called base excision repair. In BER, special enzymes known as DNA glycosylases travel down the DNA molecule scanning for these lesions. When they encounter one, they break the base pair bond and flip the deformed base out of the DNA double helix. The enzyme contains a specially shaped pocket that holds the deformed base in place while detaching it without damaging the backbone. This leaves a gap (called an “abasic site”) in the DNA that is repaired by another set of enzymes.

Human cells contain a single glycosylase, named AAG, that repairs alkylated bases. It is specialized to detect and delete “ethenoadenine” bases, which have been deformed by combining with highly reactive, oxidized lipids in the body. However, AAG also handles many other forms of alkylation damage. Many bacteria, however, have several types of glycosylases that handle different types of damage.

“It’s hard to figure out how glycosylases recognize different types of alkylation damage from studying AAG since it recognizes so many,” says Eichman. “So we have been studying bacterial glycosylases to get additional insights into the detection and repair process.”

That is how they discovered the bacterial glycosylase AlkD with its unique detection and deletion scheme. All the known glycosylases work in basically the same fashion: They flip out the deformed base and hold it in a special pocket while they excise it. AlkD, by contrast, forces both the deformed base and the base it is paired with to flip to the outside of the double helix. This appears to work because the enzyme only operates on deformed bases that have picked up an excess positive charge, making these bases very unstable. If left alone, the deformed base will detach spontaneously. But AlkD speeds up the process by about 100 times. Eichman speculates that the enzyme might also remain at the location and attract additional repair enzymes to the site.

AlkD has a molecular structure that is considerably different from that of other known DNA-binding proteins or enzymes. However, its structure may be similar to that of another class of enzymes called DNA-dependent kinases. These are very large molecules that possess a small active site that plays a role in regulating the cells’ response to DNA damage. AlkD uses several rod-like helical structures called HEAT repeats to grab hold of DNA. Similar structures have been found in the portion of DNA-dependent kinases with no known function, raising the possibility that they play an additional, unrecognized role in DNA repair.

It’s impossible to conceive that such an unstable, complex molecule could have originated all alone, by chance, and lasted any length of time in a prebiotic environment. A primordial Earth with an oxygen-free atmosphere would have had no ozone layer, or only a very thin one. Thus, UV light from the Sun would have bombarded the Earth freely, without filtering; the most damaging types of UV light would have faced no barrier, and it’s a fact that UV light damages and degrades polymers!

Many natural and synthetic polymers are attacked by ultra-violet radiation and products made using these materials may crack or disintegrate (if they’re not UV-stable). The problem is known as UV degradation, and is a common problem in products exposed to sunlight. Continuous exposure is a more serious problem than intermittent exposure, since attack is dependent on the extent and degree of exposure.

Effect of UV exposure on polypropylene rope (the left one is damaged, the right one is a new rope)

Add to it the oxidation problem, hydrolysis, etc… Any natural origin of RNA/DNA is inconceivable!

Human–chimp DNA similarity re-evaluated

The common claim that the human and chimpanzee (chimp) genomes are nearly identical was found to be highly questionable solely by an analysis of the methodology and data outlined in an assortment of key research publications. Reported high DNA sequence similarity estimates are primarily based on prescreened biological samples and/or data. Data too dissimilar to be conveniently aligned was typically omitted, masked and/or not reported. Furthermore, gap data from final alignments was also often discarded, further inflating final similarity estimates. It is these highly selective data-omission processes, driven by Darwinian dogma, that produce the commonly touted 98% similarity figure for human–chimp DNA comparisons. Based on the analysis of data provided in various publications, including the often cited 2005 chimpanzee genome report, it is safe to conclude that human–chimp genome similarity is not more than ~87%, and possibly not higher than 81%.



A common claim is that the DNA of chimpanzees (Pan troglodytes) and humans (Homo sapiens) is about 98% similar. This oversimplified and often-touted estimate can actually involve two completely separate concepts: 1) gene content (the comparative counts of similar types of coding sequences present or absent between different species) and 2) similarities between the actual base pairs of DNA sequences in alignments. For the most part, the modern similarity paradigm refers to DNA sequence alignment research. Biological sequence data often goes through several levels of prescreening, filtering and selection before being summarized and discussed.

One of the major problems with overall research in the field of comparative genetics, as we will show, is that in most studies there is a great deal of preselection applied to the available biological samples and data before the final analysis is undertaken. Only the most promising data from a larger pool is typically extracted for a final analysis.


Early human–chimp studies with reassociation kinetics

The initial estimates of high human-chimp DNA similarity came from a field of study called reassociation kinetics. These initial reports fueled early claims by such popular evolutionary luminaries as Oxford Professor Richard Dawkins, who stated “Chimpanzees and we share more than 99 per cent of our genes.” At the time, this statement was presumptuous, because gene numbers for humans and chimps were not known. The initial drafts of the human and chimp genomes were not announced until 2001 and 2005, respectively.

The supposed gene data Dawkins referred to in 1986 was an indirect estimate based on the reassociation kinetics of mixed human and chimp DNA, not clearly defined genes.1 In reassociation kinetics, heat and/or chemistry are used to separate double-stranded DNA into single strands. When the DNA is allowed to reassociate in a controlled manner, it can be fractionated using various protocols. The slower the reassociation, the more complex and gene-dense the DNA is thought to be. In general, three types of DNA can be recovered: high-copy (highly repetitive, gene poor), low-copy (moderately repetitive, low levels of genes), and single copy (gene-rich). For comparative studies, the single copy fraction of DNA is collected from two species, mixed together, disassociated and allowed to reassociate so that human and chimp DNA can recombine. The level of complementary base matching between strands can be indirectly measured by a variety of methods that indirectly measure rates/levels of reassociation.

The caveat is that only the single-copy fractions of the human and chimp genomes were utilized to obtain early estimates of similarity. Scientists focused on the single-copy fraction because of the high gene content. However, many genes are located in the other genome fractions and were thus left out of the analysis. Another problem is that virtually the entire genome is now known to be functional in some aspect and the non-coding regions have been shown to provide many critical control features and nucleotide templates.


Genomics research—affirming the myth

Subsequent research using sequenced DNA built upon the early high similarity dogma established by reassociation kinetics. In a companion to this paper, we discuss the possibility that an unspoken dogma-based ‘Gold Standard’ regarding the human–chimp similarity issue was established during the initial studies involving reassociation kinetics.

A review paper written by creationist Todd Wood on biological similarity between human and chimp highlighted and supposedly confirmed evolutionary similarity claims, yet ignored the important bioinformatic issues surrounding widespread data omission and selective analyses. Wood’s review did little to support creationist claims that humans were uniquely created in the image of God rather than being a few DNA base pairs from a chimp. Therefore, our focus on DNA sequence similarity will address the same publications listed in Wood’s review in addition to several more recent papers.


Study | Total genomic bases analyzed | Aligned bases | Reported DNA identity | Actual DNA identity*
Britten, 2002 | | | | ~87%
Ebersberger et al., 2002 | | | | < 65%
Liu et al., 2003 | 10,600,000 (total for human, chimp, baboon, and marmoset) | 4,968,069 (human–chimp) | 98.9% no indels |
Wildman et al., 2003 | ~90,000 (exons from 97 genes) | | |
Chimp. Chrom. 22 Consort. | | | 98.5% excluding indels | 80–85% including indels
Nielsen et al., 2005 | | | 99.4% selected gene regions |
Chimp. Seq. Consort. 2005 | Whole genome (5X redundant coverage) | 2.4 Gb | |

* Based on the amount of omitted DNA sequence in the alignments
** Compared to data from The International Human Genome Sequencing Consortium (2004): ((0.9577 × 2.4 Gb) / 2.85 Gb) × 100
? Cannot calculate actual percent identity because data was not provided.

Roy Britten, one of the early pioneers in DNA reassociation kinetics, compared the genomic sequence from five chimp large-insert DNA clones (Bacterial Artificial Chromosomes, or BACs) to human genomic sequence using an atypical Fortran-based computer program. These five chimp BAC sequences were chosen because they were the only ones then available. Researchers typically choose initial seed BACs for genome sequencing because of their single-copy DNA content, which makes them easier to assemble and compare to other species. The total length of the DNA sequence for all 5 BACs was 846,016 bases. However, only 92% of this was alignable to human DNA, thus the final statistics reported on only 779,132 bases. To his credit, Britten included the alignment data on insertions and deletions (indels) and reported a human–chimp similarity of ~95%. However, a more realistic figure would include the complete high-quality sequence of all five BACs, which is just as legitimate as the indels within the alignments, giving a final DNA similarity of 87%.
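The 87% figure can be reproduced from the numbers in the paragraph above. This is a minimal sketch of that arithmetic; it assumes, as the argument does, that sequence which would not align counts as non-matching, and it uses the rounded ~95% identity, so the result is approximate. The variable names are illustrative.

```python
total_bases = 846_016         # combined length of the five chimp BACs
aligned_bases = 779_132       # bases that could be aligned to human DNA
identity_in_alignment = 0.95  # Britten's reported similarity (~95%), indels included

# Share of the BAC sequence that aligned at all.
aligned_fraction = aligned_bases / total_bases

# Assumption: the non-aligned remainder is treated as contributing no matches.
overall_similarity = aligned_fraction * identity_in_alignment

print(f"{aligned_fraction:.1%}")    # 92.1%
print(f"{overall_similarity:.1%}")  # 87.5%
```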


Figure 1. Illustration showing the caveats of a hypothetical pairwise alignment between homologous sequences from two different species.

Another notable study, published by Ebersberger et al. the same year as Britten’s paper, utilized chimp genome sequence obtained from randomly sheared, size-selected fragments in the 300 to 600 base range. These DNA sequences were aligned to an early version of the human genome assembly using the BLAT (BLAST-Like Alignment Tool) algorithm. Researchers selected two-thirds of the total sequence for more detailed analyses. One-third of the chimp sequence would not align to the human genome and was discarded. The methods section in the paper19 describes how the subset of prescreened data was further filtered to obtain only the very best alignments. The resulting data was then subjected to a variety of comparative analyses that, for all practical purposes, are completely meaningless given the extremely high level of selection, data masking, and filtering applied. Not surprisingly, they report only a 1.24% difference in only highly similar aligned areas between human and chimp. A more realistic sequence similarity is not more than 65%.

Shortly after these initial human–chimp comparison papers, a disturbing trend quickly emerged. This trend involved only reporting final alignment results and omitting the specific details of how such data was filtered, masked and selected. Key data to allow critical readers of human–chimp similarity papers to calculate a more accurate overall similarity began to be consistently omitted. For example, Liu et al. reported on the alignment of human genomic sequence with chimp, baboon, and marmoset. Important information concerning the starting set of sequences and specific data for the alignments was omitted. They state only that they used a total amount of 10.6 Mb of sequence for all species combined. Their similarity estimate on the final alignment, omitting indels and non-aligned areas, was 98.9%. Including indels, we derived a value of 95.6% for the alignments, similar to Britten’s research. Important data outside the aligned areas was impossible to evaluate because of the omitted sequence data.
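The gap between an identity figure that omits indels and one that includes them can be illustrated on a toy alignment. The two sequences below are invented for illustration and do not come from any of the cited papers; the function name and the convention of counting every gap column as a mismatch are assumptions of this sketch.

```python
def percent_identity(seq_a, seq_b, count_gaps):
    """Identity between two aligned sequences of equal length ('-' marks a gap).

    With count_gaps=False, columns containing a gap are skipped entirely.
    With count_gaps=True, every gap column is counted as a mismatch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for a, b in zip(seq_a, seq_b):
        has_gap = a == "-" or b == "-"
        if has_gap and not count_gaps:
            continue  # indel column left out of the statistic
        compared += 1
        if not has_gap and a == b:
            matches += 1
    return matches / compared


# Hypothetical aligned fragments, made up for this example.
human = "ACGTACGTAC--GTACGTAC"
chimp = "ACGTACGAACGGGTAC--AC"

print(percent_identity(human, chimp, count_gaps=False))  # 0.9375
print(percent_identity(human, chimp, count_gaps=True))   # 0.75
```

The same alignment yields about 94% or 75% depending only on whether the indel columns are counted.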

Another disturbing trend is that only highly conserved protein-coding sequence (exons) are often utilized to report genome-wide similarity. We now know that non protein-coding sequences, which comprise greater than 95% of the genome, are critical to all aspects of genetics and genome function. Typical of the trend to only align exonic sequences, Wildman, et al. reported on a study that compared only human and chimp protein coding regions of 97 exon fragments for a total of 90,000 bases.

In 2004, Watanabe et al. used a variety of BAC libraries to select clones for DNA sequencing representing chimp chromosome 22. The sequence was then compared to its human homolog. The caveat is that the individual chimp BAC clones were only selected if they each contained 6 to 10 human DNA markers. Unfortunately, critical overall DNA alignment statistics are not given in the paper or in the supplemental information. The authors state a nucleotide substitution rate of 1.44% in aligned areas, but do not give similarity estimates that include indels. While indels are omitted from the alignment similarity, the authors indicate that there were 82,000 of them and provide a histogram that graphically shows the size distribution based on binned data groupings. Oddly, no data for average indel size or total indel length was provided. Likewise, the number of sequence gaps was given, but nothing about cumulative gap size. Based on the limited graphical data provided regarding base substitutions and indels, an overall similarity of about 80 to 85% can be inferred.

One of the most ambiguous of all human–chimp studies was published by Nielsen et al. In keeping with the established obfuscational trend, only highly conserved exons were used and no data were given to allow one to calculate any type of real overall similarity. Of the total starting number of gene sequences in the analysis (20,361) the researchers decided to throw out 33% (6,630) in an ambiguously stated “very conservative quality control”. In other words, one third of the initial chimp data did not align to human, so it got tossed out. In fact, no hard data was actually given.


Chimpanzee rough draft genome assembly data—81% similarity?


The major milestone publication regarding human–chimp genome comparisons was the 2005 Nature paper from the International Chimpanzee Genome Sequencing Consortium.4 Unfortunately, this paper followed the previously established trend: most of the comparative data was given in a highly selective and obfuscated format, and detailed information about the alignments was absent. Most of the paper was concerned with a variety of hypothetical evolutionary analyses of divergence rates and selective forces. Hence, the critical issue of overall similarity was carefully avoided.

However, based on the numbers given in the chimp genome paper, one can determine a rough overall genome similarity between humans and chimps by including concurrent published information from the human genome project. With regard to the overall alignment, the authors state, “Best reciprocal nucleotide-level alignments of the chimpanzee and human genomes cover ~2.4 gigabases (Gb) of high-quality sequence”. At that time, the human euchromatic assembly was estimated to be 99% complete at 2.85 Gb and had an error rate of 1 in 100,000 bases. The chimp genome authors state, “The indel differences between the genomes thus total ~90 Mb. This difference corresponds to ~3% of both genomes and dwarfs the 1.23% difference resulting from nucleotide substitutions.”

In summary, only 2.4 Gb of chimp sequence aligned onto the highly accurate and complete human genome (2.85 Gb), an operation that included the masking of low-complexity sequences. For the chimp sequence that aligned, the data for substitutions and indels indicate 95.8% similarity, a biased figure which excludes the masked regions. Using these numbers, a conservative estimate of genome-wide similarity between chimp and human DNA is 80.6%.
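The arithmetic behind this estimate can be laid out explicitly. The sketch below uses the ~2.4 Gb alignment figure quoted from the consortium paper, the 2.85 Gb human assembly size, and the two difference percentages given above. It assumes, as the estimate itself does, that sequence which did not align contributes no similarity at all; the function and variable names are illustrative only.

```python
# Figures quoted in the text.
aligned_gb = 2.4           # reciprocal best alignments, in gigabases
human_assembly_gb = 2.85   # human euchromatic assembly, in gigabases
substitution_pct = 1.23    # difference from nucleotide substitutions
indel_pct = 3.0            # difference from insertions/deletions ("~3%")


def similarity_within_aligned(substitution_pct, indel_pct):
    """Percent similarity inside the aligned regions only."""
    return 100.0 - substitution_pct - indel_pct


def genome_wide_estimate(aligned_gb, total_gb, aligned_similarity_pct):
    """Scale the aligned-region similarity by the aligned fraction.

    Assumption: unaligned sequence is counted as 0% similar.
    """
    return aligned_gb / total_gb * aligned_similarity_pct


within = similarity_within_aligned(substitution_pct, indel_pct)
overall = genome_wide_estimate(aligned_gb, human_assembly_gb, within)

print(f"similarity within aligned sequence: {within:.1f}%")
print(f"genome-wide estimate: {overall:.1f}%")
```

With these inputs the first figure rounds to 95.8% and the second to 80.6%, matching the numbers in the text. The result is sensitive to the aligned fraction, so a different alignment total would shift the genome-wide estimate accordingly.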


The paradigm starts to crumble


In a study by Ebersberger et al., a large pool of human, chimp, orangutan, rhesus and gorilla genomic sequences was used to construct phylogenies (multiple alignments analyzed in evolutionary tree format). The original pool of DNA sequences went through several levels of selection to preanalyze, trim and filter them for optimal alignment. First, a set of 30,112 sequences was selected that shared homology (overlapping similarity) between the five species. These sequences were aligned, and only those which produced alignments of ≥ 300 bases were retained for another series of alignments. Of these, only the sequences that produced superior statistical probabilities (> 95%) were used in the final analysis. This filtering process removed over 22% of already-known, pre-selected homologous sequence. Despite all of this data filtering, designed to produce the most favourable evolutionary alignments and trees, the results did not show any clear path of ancestry for humans with chimps or any of the great apes. What emerged was a true mosaic of unique human and primate DNA sequences, discounting any clear path of common ancestry. Perhaps the best summary of the research can be found in the authors’ own words.

“For about 23% of our genome, we share no immediate genetic ancestry with our closest living relative, the chimpanzee.

“Thus, in two-thirds of the cases a genealogy results in which humans and chimpanzees are not each other’s closest genetic relatives. The corresponding genealogies are incongruent with the species tree. In accordance with the experimental evidences, this implies that there is no such thing as a unique evolutionary history of the human genome. Rather, it resembles a patchwork of individual regions following their own genealogy.”


The Y-chromosome

One of the most intriguing studies is the Y-chromosome comparison between humans and chimps. In this study, the male-specific region of the Y-chromosome (MSY) was compared between human and chimp. The sequencing produced 25,800,000 bases of highly accurate chimp Y-chromosome sequence distributed among eight contiguous segments. When this was compared to the human Y-chromosome, the differences were enormous. The authors state, “About half of the chimpanzee ampliconic sequence has no homologous, alignable counterpart in the human MSY, and vice versa.”

The ampliconic sequence contains ornate repeat units (called palindromes) that read the same forwards as they do backwards. Dispersed within these palindromes are families of genes that are expressed primarily in the testes. Not only did 50% of this type of sequence fail to align between human and chimp in the Y-chromosome, but humans also had over twice as many total genes (60 in humans vs 25 in chimps). There were also three complete categories of genes (gene families) found in humans that were not even present in chimps. Related to this large difference in gene content, the authors note, “Despite the elaborate structure of the chimpanzee MSY, its gene repertoire is considerably smaller and simpler than that of the human MSY,” and “the chimpanzee MSY contains only two-thirds as many distinct genes or gene families as the human MSY, and only half as many protein-coding transcription units.”
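To make the term “palindrome” concrete: in molecular biology, a DNA sequence is called palindromic when it equals its own reverse complement, so that both strands read the same in the 5′ to 3′ direction. The short Python sketch below checks this for toy sequences. The function names are illustrative, and the real MSY palindromes are far longer than anything shown here, so this only illustrates the concept.

```python
# Translation table mapping each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_dna_palindrome(seq: str) -> bool:
    """True if the sequence equals its own reverse complement."""
    return seq.upper() == reverse_complement(seq)


# GAATTC pairs with CTTAAG, which reads GAATTC on the opposite strand.
print(is_dna_palindrome("GAATTC"))   # True
# GATTACA has reverse complement TGTAATC, so it is not palindromic.
print(is_dna_palindrome("GATTACA"))  # False
```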

A comparison of the so-called X-degenerate gene regions between humans and chimps also showed distinct organizational and locational differences in addition to differences in gene content. In fact, humans have three types (classes) of X-degenerate genes that are not even present in chimps.

Besides the large differences in gene content between human and chimp MSY regions, the overall structural differences were enormous. Take note of some of the additional comments from the authors:

“Moreover, the MSY sequences retained in both lineages have been extraordinarily subject to rearrangement: whole chromosome dot-plot comparison of chimpanzee and human MSYs shows marked differences in gross structure.

“The chimpanzee ampliconic regions are particularly massive (44% larger than in human) and architecturally ornate, with 19 palindromes (compared to eight in human) and elaborate mirroring of nucleotide sequences between the short and long arms of the chromosome, a feature not found in the human MSY.

“Of the 19 chimpanzee palindromes, only 7 are also found in the human MSY; the other 12 are chimpanzee-specific. Unlike the human MSY, nearly all of the chimpanzee MSY palindromes exist in multiple copies.”

The large differences in both the structural arrangement of unique DNA features and the gene content described in the Y-chromosome study are particularly damaging to the human-chimp DNA similarity mythos and the dogma of primate evolution. In fact, the authors shockingly note that given “ … 6 million years of separation, the difference in MSY gene content in chimpanzee and human is more comparable to the difference in autosomal gene content in chicken and human, at 310 million years of separation.”

A large study of genetic variation in the human genome showed that the Y-chromosome was exceptionally stable and had five times less genetic variation than the autosomes. This data makes perfect sense because the Y-chromosome has no similar homolog in the genome and undergoes very little recombination with the X-chromosome during meiosis. Given this lack of recombination and sequence diversity on the Y-chromosome, the primate evolution model encounters a serious problem, because the human and chimp Y-chromosomes should be considerably more similar to each other.

Some cases of high similarity may be due to contamination

Another factor to consider in the human-chimp similarity debate is that some cases of high sequence similarity may be due to contamination. Not only is the chimpanzee genome assembly still largely based on the human genomic framework, it also now appears that the widespread contamination of non-primate databases with human DNA is a serious problem and can run as high as 10% in some cases. Human contamination results from the process of cloning DNA fragments in the lab for sequencing, where airborne human cells come from coughing, sneezing, and physical contact with contaminated fingers.

At the Ensembl database (a joint bioinformatics project between EMBL-EBI and the Wellcome Trust Sanger Institute), a webpage titled ‘Chimp Genebuild’ provides the following information on one of the ways in which the human genome is used as a guide to assemble and annotate the chimp genome:

“Owing to the small number of proteins (many of which aligned in the same location) an additional layer of gene structures was added by projection of human genes. The high-quality annotation of the human genome and the high degree of similarity between the human and chimpanzee genomes enables us to identify genes in chimpanzee by transfer of human genes to the corresponding location in chimp.

“The protein-coding transcripts of the human gene structures are projected through the WGA [whole genome alignment] onto the chromosomes in the chimp genome. Small insertions/deletions that disrupt the reading-frame of the resultant transcripts are corrected for by inserting ‘frame-shift’ introns into the structure.”


