A novel technique for surveying the genetic makeup of the complex soil ecosystem is paving the way for a wide range of new applications, including better greenhouse gas management.
There is an invisible world all around us. Air molecules mix and mingle. Radio waves course silently by. And in the soil under our feet, countless trillions of microscopic organisms perform important ecosystem services. They break down plant matter, protect crops from disease, and filter the groundwater. Yet less than 1 percent of them have been studied in a laboratory. Most of their activities—and most of the organisms themselves—remain unnamed and unexplored by science.
Our ignorance of this "secret life of dirt" is no small omission. For example, as carbon dioxide (CO2) levels rise in the atmosphere, plants absorb this greenhouse gas during photosynthesis and use its atoms to construct broader leaves, taller stalks, and bigger roots. When plants shed their leaves, recycle their roots (which happens yearly), or die, the microorganisms (microbes) in the soil go to work on them, resynthesizing and freeing some of the CO2 the plants previously absorbed. How much CO2 is returned to the atmosphere by these microbes, and how much remains in plant matter in the soil? Embarrassingly, no one knows. No one knows for certain what role the soil itself plays in the atmospheric changes that cause global warming. But Los Alamos scientist Cheryl Kuske, together with her colleague John Dunbar and a talented lineup of postdoctoral researchers, technicians, and students, is changing all that with a sweeping approach to genetic study called metagenomics.
Clostridium bacteria like these are abundant in soils, and many species are known to degrade plant matter, which affects how much carbon is stored in the soil and how much is returned to the atmosphere as CO2.
Traditional microbiology lab work entails studying the genetic information from a single organism grown in a culture. Metagenomics, on the other hand, works directly with the complex mixture of DNA found in samples taken from the natural environment (in a soil sample, for instance). While a genome is the complete set of genetic information for a particular organism (for example, the human genome), a metagenome is the complete set of genetic information for an entire community of organisms, and metagenomics describes the gathering and processing of this rich swath of biological information. It is within such community-level biological information that Kuske finds answers about CO2 processing and other soil-related environmental issues.
A soil metagenome is a diverse and complicated thing. On average, one gram of soil contains at least a billion microorganisms. Those billion organisms typically span thousands or tens of thousands of different species (Dunbar and other colleagues have shown that this number can even range up to millions of different species), representing all three major biological domains of life. Worldwide, microbes from two of these domains, archaea and bacteria, contain about as much carbon in their cells as the entire plant kingdom, and about ten times more nitrogen- and phosphorus-based nutrients. The third domain, eukarya, is made up of plants, animals, and others, including fungi—microbes that are essential for plant survival and supply key products for human use, such as antibiotics and yeast. In other words, the underground microbial world, while largely unknown, is far from unimportant. And metagenomics is only just beginning to bring the vast majority of these microbes and their activities within range of discovery—and perhaps appropriation for human benefit.
Denizens of the dirt: Petri dishes (above) harbor various fungi that Kuske isolates in order to study new species involved in degrading plant matter. Their genetic information is added to a database that Kuske compares against gene sequences found in various soils. Penicillium (right) is an example of a fungus that's commonly found in soils (and is the originator of the antibiotic penicillin). Credit: Dennis Kunkel Microscopy, Inc.
"Although these organisms are microscopic, they collectively impact human activities and the Earth's processes at regional and global scales," Kuske says. "And most of their roles are beneficial." Indeed, the lure of such beneficial, collective impacts is part of what makes soil metagenomics so promising. Emerging technologies derived from microbes hold the potential for revolutionary advances in human health, industry, biofuel, greenhouse gas absorption, and even the cleanup of large-scale environmental contamination. With all this on the line, scouring every microbe's every gene by traditional methods might seem inadequate. "We have all these really pressing environmental issues, and we don't have time to do this step by step," Kuske asserts. "We have to jump."
In Kuske's research, soil samples are collected and brought to Los Alamos from a variety of carefully controlled study sites around the country (see "Open-Air Laboratories," facing page). Her team extracts DNA from all the cells in a small vial of soil by bursting the cell walls and membranes and capturing the DNA as it slithers out. Then an advanced, high-throughput DNA sequencing apparatus "reads" the DNA and outputs the genetic sequence as an ordered list of four letters, like "ACCGTGTCAG," in which A, C, G, and T represent bases (as opposed to acids) found in DNA. Since DNA is double-stranded, and each base is naturally paired with a complementary base, scientists refer to this small unit of genetic information as a "base pair."
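To make the base-pairing rule concrete, here is a minimal Python sketch that derives the partner strand for the example read quoted above. The function name `reverse_complement` and the variable names are illustrative choices, not part of the Los Alamos workflow; the sketch also assumes the read contains only the letters A, C, G, and T.

```python
# Pairing rule for DNA bases: A pairs with T, and C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the partner strand of `seq`.

    The two strands of DNA run in opposite directions, so the partner
    is conventionally written reversed as well as complemented.
    """
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


read = "ACCGTGTCAG"                 # the example read from the text
partner = reverse_complement(read)  # the strand that pairs with it
print(read, partner)
```

Applying the function twice returns the original read, which is a quick sanity check that the pairing table is symmetric.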
In general, living cells function by using the sequences of bases in their DNA as a blueprint for assembling proteins. A particularly important type of protein is the enzyme, because only when the right combination of enzymes is present and active do cells carry out various functions, like metabolism or replication. So DNA specifies particular enzymes, and the enzymes enable particular functions; the DNA sequence that encodes one enzyme is called a gene.
But even with state-of-the-art sequencing technology, there are experimental limitations to how much genetic information you can gather about the microorganisms found in a complex mixture like soil. You don't get entire genomes for each microbe in the sample. You get DNA fragments because the DNA itself is brittle and because the sequencing machines can't read sequences beyond a few hundred base pairs—fewer than the thousand or so base pairs in an average bacterial gene. In a typical soil experiment, the sequencing machine might output hundreds of thousands of these different fragments, but there is no obvious way to know which fragments do what for which microbe, since the overwhelming majority of the DNA sequences found in soil communities are not yet known to science. (And a small fraction of the soil DNA isn't microbial at all, but rather comes from bits of old plant matter, dead insects, and so on.)
An electron microscope reveals DNA emerging from a bacterium whose membrane has been ruptured.
The situation is akin to waking up in a large library building in a foreign country and being asked to describe the content of everything in the entire library. There's too much information, and you don't know the language very well. In fact, the metagenomics problem is thornier than that: Since researchers have only tiny DNA fragments to work from, it's more like a library with only a few random pages from each book! Still, you might recognize a few words or concepts that come up often (analogous to recognizing base pair sequences for known genes), and you might recognize a page from a certain type of book, like a dictionary or a travel guide (analogous to recognizing a sequence from a known organism). Now what you need is some kind of procedure for understanding every book in the library—what to look for, which paragraphs to scan, when to move on, and so forth.
There are essentially two approaches to pursue. You could take a targeted approach, searching every page for occurrences of the words or concepts you already know. This might give you a sense of the prevalence of that concept in this foreign culture or allow you to track variations on its theme. Similarly, scientists can focus on a known type of gene—one that helps fungi digest dead plant matter, perhaps—and learn about its prevalence or variety within the soil community. This is called "targeted metagenomics." In the other approach, you could just skim everything in the library, looking for familiar patterns in the language. When biologists take this approach, directly sequencing whatever DNA is present in the hope of finding familiar sequences (embedded in unfamiliar genes) that might lead them to new enzymes or new species, it's called "shotgun metagenomics." Kuske's team uses both targeted and shotgun approaches.
Targeting Consequences for Humanity
For targeted metagenomics, Kuske uses a common laboratory technique known as polymerase chain reaction (PCR) to search for genes in the DNA extracted from soil. The process involves using certain known sequences called "primers" that are designed to locate and attach to a particular sequence within the soil DNA. These primers might seek out the common beginning and ending sequences for a particular class of genes whose variations, from one species to another, occur only between these endpoints. In fact, it is frequently necessary to choose primers to bracket only a small part of one gene, since sequencing reads are limited to less than a few hundred base pairs. The PCR process selectively retrieves the bracketed DNA from the sample and makes copies to be sequenced. That's the targeting: only the chosen gene fragments are sequenced, not the complete jumble of everything present in the soil.
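The bracketing idea can be sketched in software as a simplified "in silico" search. This is only an illustration under stated assumptions: it requires the primers to match exactly and looks at one strand, whereas laboratory PCR is a chemical process that tolerates some mismatch. The function `bracketed_region`, the template, and both primer sequences are invented for the example.

```python
from typing import Optional

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, reversed so it reads in its own direction."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def bracketed_region(template: str, forward: str, reverse: str) -> Optional[str]:
    """Return the stretch of `template` bracketed by the two primers.

    The returned fragment includes both primer sites. None is returned
    if either primer site is missing (exact matching only).
    """
    start = template.find(forward)
    if start == -1:
        return None
    # The reverse primer binds the opposite strand, so on this strand
    # its site appears as the primer's reverse complement.
    site = reverse_complement(reverse)
    end = template.find(site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(site)]


# Hypothetical soil DNA fragment and primer pair.
template = "TTGACCATGCAAGTCGGATTACCGGTTAGCAA"
fragment = bracketed_region(template, forward="ATGCAAG", reverse="CTAACCG")
print(fragment)
```

Only the bracketed stretch is returned, which mirrors the targeting described above: everything outside the primer sites is ignored.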
DNA is stored in a small vial of clear liquid after being extracted from a soil sample.
Kuske's group is developing primers for many genes that can be used to identify organisms or their functions (enzymes). She has been "fishing," as she calls it, for certain specific genes so far, with many more to come. One gene she targets, for example, is called the rRNA gene and is present in all living cells. Every species known has a unique version of this gene, which allows biologists to use it to classify all life forms. So when researchers target this gene, they can track the populations of various microbes in selected environments—in normal versus elevated-CO2 environments, for example—based on the number of occurrences of each version of the gene they find. This works for all organisms, whether or not they have been studied in a laboratory before; even unnamed life forms can be tracked through their unique rRNA genes. The result is a complete picture of population numbers, population shifts, and population diversity in the soil sampled.
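The counting step behind such a census can be sketched in a few lines of Python. The sketch assumes each sequenced read has already been reduced to a label for the rRNA gene version it carries; the labels, the read counts, and the function name `relative_abundance` are all hypothetical, and no real data are shown.

```python
from collections import Counter


def relative_abundance(variant_reads):
    """Map each rRNA gene version to its share of all targeted reads."""
    counts = Counter(variant_reads)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {variant: n / total for variant, n in counts.items()}


# Hypothetical reads from two plots, one at normal CO2 and one at elevated CO2.
ambient = ["variant_A"] * 6 + ["variant_B"] * 3 + ["variant_C"] * 1
elevated = ["variant_A"] * 4 + ["variant_B"] * 5 + ["variant_C"] * 1

before = relative_abundance(ambient)
after = relative_abundance(elevated)

# Population shift: positive means the version is more common at elevated CO2.
shift = {
    variant: after.get(variant, 0.0) - before.get(variant, 0.0)
    for variant in set(before) | set(after)
}
for variant in sorted(shift):
    print(f"{variant}: {shift[variant]:+.2f}")
```

Because the tally works on labels rather than names, it treats an unnamed organism exactly like a well-studied one, which is the property the text highlights.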
Another gene they track is responsible for making cellobiohydrolase, an enzyme that some fungi use to break down cellulose, the main component in plant matter. In a CO2-enriched environment, there is more plant growth because plants convert some of the extra CO2 into extra cellulose. This might sound reassuringly self-correcting: more CO2 (bad) yields more plant growth to absorb more CO2 (good). But if the extra plant matter is ultimately digested with cellobiohydrolase, then the extra absorbed CO2 just gets released and returned to the atmosphere. So it's all about knowing how much CO2 is being recycled, which is partly dependent on how much cellobiohydrolase is in play. Kuske's team is working on that, and models of how the Earth responds to climate change will eventually need to account for this type of soil activity.
Los Alamos scientists Cheryl Kuske and John Dunbar prepare multiple soil DNA samples (dyed blue) for sequencing.
Targeting fungal cellobiohydrolase is a powerful way to follow CO2 recycling from cellulose in the soil. But how can scientists find out if there are other microbes that might perform a similar function, perhaps with a different enzyme? The answer lies in a technique called stable isotope probing. Isotopes of the same element differ only in the number of neutrons in each atom, so they have different masses but behave identically otherwise. At Los Alamos, the research team enriches cellulose with carbon-13 (instead of the much more common carbon-12) and adds this artificially heavy cellulose to the soil sample.
"This technique essentially follows the adage, 'you are what you eat,'" explains Stephanie Eichorst, a postdoctoral research scientist on Kuske's team. Once the microbe digests the heavy cellulose, its "body," including its DNA, becomes enriched in carbon-13. Now, metagenomic DNA can be separated in a centrifuge, which forces the DNA that's enriched with the heavier carbon to sink. If researchers use only the enriched DNA, then whatever genes they target and sequence must belong to the organisms that are actively consuming the cellulose. Stable isotope probing, therefore, can be used to identify which organisms are performing a particular function (digesting cellulose in this case) since it's always possible to target a gene that's unique to each particular microbe. By this approach, Kuske and Eichorst discovered (heavy) genes from a variety of bacteria and fungi that were not previously known to be involved in degrading cellulose. Further study and genome sequencing of these individual organisms should identify new ways in which they digest cellulose (with new enzymes) and new genes to target in future experiments (corresponding to those new enzymes).
Much can be accomplished with targeted metagenomics. Targeting the rRNA gene gives you population statistics. Targeting cellobiohydrolase gives you data on plant degradation and CO2 release. And adding stable isotope probing gives you the specific organisms responsible, possibly providing valuable leads about which enzymes and which genes to study next. But what if you don't filter out part of the DNA by targeting only particular genes? What's possible when you sequence all the DNA you find in an ecosystem, using the shotgun approach, and then analyze those sequence data with a computer?
The marriage of biology and sophisticated computer data processing offers tremendous opportunity for advancing bioscience. What's at stake is a deeper, richer understanding of the tree of life, including knowledge of new genes and new organisms and how they co-evolved, as well as several important new applications.
While targeted metagenomics is a story of demonstrated new capability, shotgun metagenomics is still largely a story of potential. To realize that potential, researchers must learn to recognize segments of new genes when they're buried in an avalanche of unknown sequences. But as the collection of known genes grows, the shotgun approach becomes more powerful.
In principle, progress can be made by bootstrapping: Start by identifying a particular gene fragment of interest from a shotgun sample. This gene fragment could be chosen because it contains a sequence that's familiar from another organism; perhaps it encodes a feature found in some known proteins. Study that gene and perhaps the entire genome it belongs to by traditional methods, and then use the results to expand the database of known sequences.
In practice, however, just identifying a gene fragment to study in a shotgun dataset can be a challenge. With only a few hundred thousand sequence fragments obtained from a billion microbes, and each fragment much shorter than a complete gene, you usually don't get enough genetic data to recognize even a single gene. So Kuske teamed up with computational, theoretical, and genome sequencing colleagues at Los Alamos to pioneer methods of computer-based analysis of very short, recurring base pair sequences that match pieces of known genes. In other words, if you can't recognize enough of any sequence to connect it with a particular gene, then set your sights a little lower: aim to recognize much shorter sequences, up to about 30 base pairs, that have been found before in multiple genes. It is often the case that different organisms share such blips of genetic material because the biochemical function they encode is common to a variety of living cells. Sometimes it just takes a little handhold like this to obtain clues about which genes or which organisms to study in more detail. If the mini-sequence is somehow connected to photosynthesis, for example, then it could be related to some variety of cyanobacteria, which perform photosynthesis.
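One common way to look for short shared sequences is to index every stretch of a fixed length k (a "k-mer") in the known genes and then look up the k-mers of each shotgun fragment. The sketch below illustrates that general idea only; it is not the Los Alamos team's method, the gene names and sequences are invented, and k is set to 8 so that the toy sequences stay readable (the text describes lengths up to about 30 base pairs).

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(known_genes, k):
    """Map each k-mer to the names of the known genes that contain it."""
    index = {}
    for name, seq in known_genes.items():
        for kmer in kmers(seq, k):
            index.setdefault(kmer, set()).add(name)
    return index


def candidate_genes(fragment, index, k):
    """Count how many k-mers `fragment` shares with each known gene."""
    hits = {}
    for kmer in kmers(fragment, k):
        for name in index.get(kmer, ()):
            hits[name] = hits.get(name, 0) + 1
    return hits


# Hypothetical known genes that share a short stretch of sequence.
known_genes = {
    "gene_x": "ATGGCGTACGTTAGCCTAGG",
    "gene_y": "TTGACCGTACGTTAGCAATC",
}
K = 8
index = build_index(known_genes, K)

# Hypothetical shotgun fragment, too short to identify a whole gene.
hits = candidate_genes("GGACGTACGTTAGCTT", index, K)
print(hits)
```

A fragment that shares k-mers with several known genes, as this one does, cannot be assigned to any single gene, but the match still gives the kind of handhold the text describes for deciding what to study next.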
Important results are already emerging. Nick Hengartner, from the Los Alamos Information Sciences group, is the team lead for a Lab-directed R&D project to perform the computational analysis of the shotgun data. He and Kuske have been successful at identifying very short gene fragments from shotgun samples and sorting them into different branches of the bacterial "family tree" (see figure below). Doing this provides a quick snapshot of the distinctive features of the bacterial population in a particular environment. Shotgun data also allow biologists to add to the family tree whenever they find new base pair sequences that have sufficient overlap with known bacterial sequences to allow them to pinpoint the right branch of the tree. With more time and effort, researchers expect to identify unknown organisms, determine their roles in the ecosystem, and even predict their responses in the face of changing environmental conditions.
Soil census: Shotgun data from two distant soil collection sites are compared in this representation of the bacterial family tree. The green and red triangles indicate increases and decreases, respectively, in the population of different bacteria found in a Nevada desert soil sample, relative to soil from a Tennessee sweetgum plantation. The strip of green triangles at the eight o'clock position, for example, shows the increased role of various cyanobacteria (shown in the photograph); these bacteria make up for the lack of desert vegetation by performing many of the functions normally performed by plants, including photosynthesis. In addition to displaying population trends, this diagram demonstrates the capability of Los Alamos scientists to associate short sequences obtained from shotgun methodology with locations on a family tree—an essential ability for identifying and placing new species as they are discovered.
Shotgun metagenomics also offers the potential for valuable new applications. In theory, it should someday be possible to search any well-studied metagenome—in soil, ocean water, caves, digestive systems of animals, and so on—for a specific enzyme of interest. With all the diversity we observe in Earth's microbial communities, it is likely that nature has already evolved biochemical solutions to meet many of our needs. Moreover, the human body is itself a major bacterial community, and comparing bacterial metagenomes across different human populations might allow researchers to quickly identify medically significant features. For example, a shift in the relative abundance of two different types of bacteria in the human gut appears to be at least partly responsible for causing obesity. (See "Metagenomics to the Rescue," above.)
Today, of course, the results remain more modest. Kuske and her team have uncovered trends in microbial populations in different habitats, with and without increased CO2 levels. They have identified organisms that break down cellulose and release CO2—organisms that had been previously unknown to do so. They found that a version of the gene for cellobiohydrolase, employed to break down cellulose, is present in most known fungi. And they are collecting data on how much CO2 the major soil communities can be expected to re-release into the air as global warming continues. Metagenomic research is a bold new initiative, and these early results—those pertaining to our immediate environmental needs—are just the beginning.