A few weeks ago, I wrote about a thoughtful review by Gert Korthof of Larry’s book, “What’s in Your Genome?”. One of the most interesting aspects of Gert’s review was his confusion about the determinants of genome size. He presented all the pieces of the puzzle but somehow did not see the big picture. In this post, I’ll try to assemble those facts into a coherent picture and present my view.

So let’s begin.

First off, I would like to make two statements that are consistent with the current literature.

1) The total amount of functional DNA is roughly similar among closely related species. You can quibble about what counts as related, but all the data we have agree that this applies to mammals, and it likely applies to vertebrates. Of course, the broader the group, the more variance we should see. Some gene families, like olfactory receptors, vary quite a bit between mammals, but these are exceptions to the rule. Since most vertebrate species have similar effective population sizes (around 10,000), and this number dictates much of how evolution operates, we shouldn’t expect the amount of functional DNA to vary much.

2) The total amount of conserved DNA has a limit that is dictated by the mutation rate. Every functional base pair is a target for harmful mutations, so the more functional DNA a genome carries, the more deleterious mutations arise each generation that selection must purge. The underlying mutation rate is likely dictated by the error rate of DNA polymerase, buffered by DNA repair enzymes. Again, the mutation rate appears to be limited by the effective population size, which does not vary much among vertebrates. Based on this, vertebrates should all have similar limits to the amount of functional DNA they can contain. I would go further and state that the total amount of functional DNA in vertebrates is near this limit. From these assumptions, we can say that the total amount of functionally relevant DNA in most vertebrate genomes is similar. Yet vertebrate genomes vary in size by two orders of magnitude: lungfish genomes are hundreds of times bigger than pufferfish genomes. Consequently, genome size must depend mostly on how much non-functional (aka junk) DNA a genome contains. This extra DNA accumulates between genes and within introns.
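To put rough numbers on why the mutation rate caps functional DNA, here is a back-of-the-envelope Python sketch. It is an illustration under stated assumptions, not a published model: it assumes a point mutation rate of about 1.2 × 10^-8 per base pair per generation and a haploid genome of about 3.1 billion base pairs (rough human figures), treats every mutation in functional DNA as deleterious (an overestimate), and uses the classic Haldane-Muller result that, with multiplicative fitness effects, mean fitness at mutation-selection balance is about exp(-U), where U is the number of new deleterious mutations per genome per generation. The function and constant names are hypothetical.

```python
import math

# Assumed parameters (rough human figures, used only for illustration).
MU_PER_BP = 1.2e-8          # point mutations per base pair per generation
HAPLOID_GENOME_BP = 3.1e9   # haploid genome size in base pairs


def deleterious_mutations_per_generation(functional_bp: float,
                                         mu_per_bp: float = MU_PER_BP,
                                         ploidy: int = 2) -> float:
    """Expected new mutations per genome per generation that hit functional DNA (U).

    Treats every such mutation as deleterious, so this is an upper bound.
    """
    return ploidy * mu_per_bp * functional_bp


def mean_fitness_at_balance(u: float) -> float:
    """Haldane-Muller principle: with multiplicative fitness effects, mean fitness
    at mutation-selection balance is about exp(-U), largely independent of the
    individual effect sizes."""
    return math.exp(-u)


if __name__ == "__main__":
    for fraction in (0.01, 0.1, 0.5, 1.0):
        u = deleterious_mutations_per_generation(fraction * HAPLOID_GENOME_BP)
        print(f"{fraction:>5.0%} functional: U = {u:6.2f}, "
              f"mean fitness = {mean_fitness_at_balance(u):.3g}")
```

Under these assumptions, if just 10% of the genome were functional, U would be about 7.4 new deleterious mutations per diploid genome per generation, and mean fitness would sit below 0.1% of the mutation-free optimum. That is the sense in which the mutation rate limits how much functional DNA a genome can carry.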

But you may point out that adding more DNA to the genome has an associated cost. It takes resources to make this DNA, to package it so that it is not transcribed, and, since gene expression tends to be sloppy, to cope with the junk RNA produced from these extra regions. There are other problems as well. Since genome size correlates with cell size, organisms with large genomes have larger cells. Large cells tend to be metabolically less efficient, as nutrient and gas exchange is limited by diffusion. Large cells also have longer cell division times, which can limit the rate of development. So in most cases, extra DNA appears to be a liability.

If the cost associated with a new insertion of junk DNA is high enough, any mutation that increases genome size will be selected against. However, as I explained previously, the effective population size of a given organism dictates the strength of natural selection. Large populations are subject to strong selection regimes; small populations are subject to weak ones. Whether a slightly deleterious mutation is acted on by selection depends on this strength. If selection is not strong enough to act on a very slightly deleterious mutation, the mutation is effectively neutral: its fate is determined by random genetic drift, and it can become fixed in the population. So the key question is whether the insertion of junk DNA is deleterious enough to be selected against. The answer appears to be no.
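The idea of an "effectively neutral" mutation can be made concrete with Kimura's diffusion approximation for the probability that a single new mutation eventually fixes. The sketch below assumes a diploid Wright-Fisher population whose census size equals its effective size Ne, additive fitness effects, and a selection coefficient s that is negative for a deleterious insertion; the function names and the example value of s are hypothetical. Dividing by the neutral fixation probability, 1/(2Ne), shows how much selection is actually doing: a ratio near 1 means the mutation behaves as if it were neutral.

```python
import math


def fixation_probability(s: float, ne: int) -> float:
    """Kimura's fixation probability for a single new mutant allele.

    Assumes a diploid population with census size equal to Ne, additive
    fitness effects, and an initial allele frequency of 1/(2*Ne):
        u = (1 - exp(-2s)) / (1 - exp(-4*Ne*s))
    """
    if s == 0:
        return 1.0 / (2 * ne)  # neutral case: the limit of the formula as s -> 0
    exponent = -4 * ne * s
    if exponent > 700:
        # Strongly deleterious: u is below roughly 1e-300, so return 0.0
        # rather than let expm1 overflow.
        return 0.0
    # 1 - exp(-x) == -expm1(-x); expm1 keeps precision when s or Ne*s is small.
    return math.expm1(-2 * s) / math.expm1(exponent)


def relative_to_neutral(s: float, ne: int) -> float:
    """Fixation probability divided by the neutral expectation 1/(2*Ne)."""
    return fixation_probability(s, ne) * 2 * ne


if __name__ == "__main__":
    s = -1e-5  # hypothetical fitness cost of one junk DNA insertion
    for ne in (10_000, 80_000, 1_000_000):
        print(f"Ne = {ne:>9,}: 4*Ne*s = {4 * ne * s:8.1f}, "
              f"P(fix) / neutral = {relative_to_neutral(s, ne):.3g}")
```

As a rule of thumb, selection starts to bite once 4Ne|s| exceeds about 1, so the same slightly costly insertion that drifts almost freely in a population of 10,000 is increasingly purged in much larger populations. Keep in mind that an eightfold difference in Ne shifts this threshold by only a factor of eight.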

In the most general terms, this story has merit. Eukaryotes are under weaker selection regimes than prokaryotes, and they have larger genomes (by orders of magnitude). Most eukaryotic genomes are largely non-coding; the reverse is true for prokaryotes. But when we look within eukaryotes, things start to fall apart. Gert writes:

Is it the correct explanation? … Moran writes: "The pufferfish genome is only one-eighth the size of our genome, and the lungfish genome is 40 times larger than the human genome". Can this be explained by different population sizes? Are effective population sizes known? Are there 8 times as many pufferfishes than humans?

And he is, to a certain extent, correct. To be fair, there is a general trend in eukaryotes in which genome size and intron size correlate somewhat with effective population size, but as Gert points out, the correlation is not that strong. For example, the differences between the effective population sizes of pufferfish, humans and lungfish do not explain their differences in genome size. The missing piece of the puzzle, which Gert mentions in a different part of his review, is likely mutational bias.

When thinking about genome size, the mutations that matter are insertion/deletion mutations, also known as indels. We can further divide indels into two main categories: small indels, which are caused by DNA replication errors, and large indels. Some large indels are caused by errors during recombination, some are due to other types of DNA damage, and, critically, some are caused by the insertion of transposable elements. For completeness, the enzymes that insert transposable elements are somewhat sloppy and often insert other bits of DNA, such as reverse-transcribed mRNAs, which become processed pseudogenes. So we can broaden this last category of large indels to include all DNA inserted through transposable element activity.

Okay, so how does mutational bias affect genome size?

It turns out that small indels have a natural bias toward deletions. This is seen in all organisms. In unpublished data, my lab has documented this in humans as well, although I believe it has been reported previously. So what about large indels? For recombination, to be frank, I’m not sure what the current data show. But transposable element activity and its byproducts are likely the main driver of genome size, and this is easy to see: the majority of our junk DNA consists of dead transposable elements. Selection does limit the extent of transposable element activity. After all, transposable elements are mutagenic, and individuals with lots of active copies will experience excess mutational decay of their genomes. Indeed, eukaryotes appear to constantly evolve new ways of fighting transposable elements, either by limiting their ability to copy themselves or by suppressing their unintended side effects. But despite all this, transposable elements, and the insertional bias that accompanies them, are the main cause of genome expansion. They are only slightly deleterious, and much of the time their effect on fitness is low enough that selection does not completely halt their spread. Eukaryotes have also evolved processes that dampen their direct and indirect negative effects (see this, and read the section on Global Solutions to Local Problems), which further reduces their deleteriousness and makes them all the more invisible to purifying selection.

So what does this all mean? Decreases in the strength of selection allow transposable elements to spread. These impose mutational biases that cause the amount of junk DNA to increase. The current amount of junk in a genome likely reflects the number of active transposable elements in a given lineage and how effectively the organism tamps down their activity. It is widely believed that genomes harbouring new types of transposable elements whose activity is not effectively suppressed tend to increase in size. So the recent history of a lineage likely plays a big part in how much junk it has. On top of this, we need to take into account the natural deletion bias for small indels, which pushes genome size down. Details of the recombination machinery and DNA repair pathways in each organism likely have an influence as well. These complicated dynamics largely dictate the size of an organism’s genome. This is why genome size can vary widely between closely related organisms.
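The tug-of-war between insertion and deletion biases can be captured with a deliberately crude mutational-equilibrium model. This is an illustrative sketch, not a fitted model, and every parameter value in it is hypothetical: each generation a lineage gains a roughly constant amount of DNA from transposable element insertions (insertion rate I times mean insertion length L), while small deletions, because of the deletion bias, remove a fixed fraction d of the existing junk DNA. Selection, recombination and DNA repair are ignored entirely. Under those assumptions the amount of junk J follows J(t+1) = (1 - d) * J(t) + I * L and settles at J* = I * L / d. The function names are hypothetical.

```python
def equilibrium_junk_bp(insertions_per_gen: float, insert_len_bp: float,
                        net_loss_per_bp: float) -> float:
    """Fixed point of J(t+1) = (1 - d) * J(t) + I * L, i.e. J* = I * L / d."""
    return insertions_per_gen * insert_len_bp / net_loss_per_bp


def junk_after(generations: int, junk0_bp: float, insertions_per_gen: float,
               insert_len_bp: float, net_loss_per_bp: float) -> float:
    """Closed-form solution of the recurrence: J(t) = J* + (J0 - J*) * (1 - d)**t."""
    if not 0.0 < net_loss_per_bp < 1.0:
        raise ValueError("net_loss_per_bp must lie strictly between 0 and 1")
    j_star = equilibrium_junk_bp(insertions_per_gen, insert_len_bp, net_loss_per_bp)
    return j_star + (junk0_bp - j_star) * (1.0 - net_loss_per_bp) ** generations


if __name__ == "__main__":
    # Hypothetical numbers chosen only to show the qualitative behaviour.
    base = dict(insertions_per_gen=0.02, insert_len_bp=3_000, net_loss_per_bp=2e-8)
    print("baseline J*:", equilibrium_junk_bp(**base))
    # A new, poorly suppressed transposable element family doubles the insertion rate.
    print("TE invasion J*:", equilibrium_junk_bp(**{**base, "insertions_per_gen": 0.04}))
    # Better host suppression halves the insertion rate instead.
    print("suppressed J*:", equilibrium_junk_bp(**{**base, "insertions_per_gen": 0.01}))
    # Starting well below equilibrium, the genome approaches J* over ~1/d generations.
    print("after 5e7 generations:", junk_after(50_000_000, 1e9, **base))
```

The point is qualitative: in this toy model the equilibrium scales directly with the insertion rate and inversely with the deletion rate, and the approach to equilibrium takes on the order of 1/d generations, so a lineage’s recent history with transposable elements can leave its genome far from any steady state.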

In the end, I don’t think there is an easy answer. But we do have enough pieces of the puzzle to get a general idea of what the picture will be: mutational biases play a large part in determining genome size, and these are subject to a whole slew of factors, most notably the activity of transposable elements. As for why pufferfish genomes are small and lungfish genomes are big, we don’t really know. The particular transposable elements that have managed to invade each genome, how susceptible a lineage is to acquiring new transposable elements, recombination effects, and byproducts of DNA repair in each lineage likely make major contributions to that organism’s level of junk DNA. There are likely other factors as well. Selection may limit some of these biases, and there is some evidence that selection may be acting to keep bird genomes small. But across eukaryotes as a whole, my guess is that subtle changes in selection efficiency and population size (as one would see across different vertebrates) play minor roles in shaping genome size.
