A Simple Plan to ID Every Creature on Earth
At any given time, thanks to technology and intelligent application, there are hundreds of revolutions happening quietly in the background. Most go unnoticed by those outside the industry concerned, and even by many inside it, until they turn around one day and find that their way of working has been fundamentally altered for the better.
One such quiet revolution is happening in lepidopterology.
Lepidopterology is the study of butterflies and moths. This passionate study typically involves catching, collecting, and categorising species, which are pinned to boards and kept safe from the elements in drawers. The stereotypical image is of an old man with wrinkly skin and white hair puttering about, and it has not changed much since the Victorian period. Unfortunately, it is not all that far from the truth. Lepidopterology has not fundamentally changed in centuries. Yes, optical scanners make it easier to examine specimens and blow them up for closer detail, but one question remains: with moth and butterfly patterns as unique as fingerprints, exactly which species does a given individual belong to?
It is a problem that plagues the study of animal and insect life around the globe. Farmers, port inspectors, game wardens, exterminators, and even professional biologists are sometimes unsure which species or subspecies a given creature belongs to. They stare at it and simply wonder what it is. This is especially true if that particular expert has not seen those particular markings before.
Matching living things to their names is so notoriously difficult that the problem itself has been given a name: the taxonomic impediment. With insects, given the small scale and the staggering number of similar species, the taxonomic impediment is severe.
More than 90 percent of insects, tens of millions of species, have never been described. As every type of information in the world is being encoded into standard formats, accessible on the Web and searchable from anywhere, plant and animal names stand out as a stubborn exception.
That is now beginning to change.
Enter Dan Janzen, utopian lepidopterist. He has been working in a Costa Rican forest for more than 40 years, and his long-suffering wife is also his research partner. Together, they are struggling to complete the mammoth task of cataloguing and naming every species of moth and butterfly in the forest. Janzen seeks to understand the distribution of each species, the food chain interactions between them, and each one's little niche compared to everything else. The two of them work day and night on this. They've been at this for 40 years. Janzen knows he will die long before it is even remotely completed.
There may be no person in the world faster at spreading moths. Nonetheless, at this rate, his project will fail.
If visual identification is not enough, if human memory is not enough to recognise if the moth you have found is the same species as one you found 20 years ago, when moths arrive in their dozens daily, then a new approach is necessary. This is where the quiet revolution comes in.
Paul Hebert, evolutionary biologist at the University of Guelph, in Canada, may have a solution. It derives its existence from a supermarket shelf and an idle thought.
Bar codes. They identify every product on every supermarket shelf in the world. A product's 'species' is known instantly from its code. So, why not bar-code species? Of course, we don't have to put a bar code on them; the bar code is already there - in their genes.
Until very recently, the problem with DNA sequencing was that it was impossible to sequence an entire flea genome, or a fly's, or a single butterfly's - let alone dozens per day. Yet that is what Paul Hebert is working on: an automatic machine for sequencing insect DNA.
In January 2003, Hebert published a paper in Proceedings of the Royal Society in which he claimed his technique could solve the taxonomic impediment. "Although much biological research depends on species diagnoses," Hebert wrote, "taxonomic expertise is collapsing." He went on to complain of the dwindling number of qualified taxonomists, the tendency of expert identifications to be incorrect, the extreme difficulty of telling many animals apart in various life stages, the small number of species identified in the past 250 years, the vast number of unidentified species still remaining, and, perhaps most damning of all, the fact that even when an expert has identified a group of animals and done the identification correctly and produced a guide, the guide itself is so complex that mistakes are common.
As a remedy, Hebert set out his own method of identifying animals through a small, standard sequence of DNA; he shared test data about Canadian moths, and he added some additional data gleaned from GenBank, a publicly accessible repository of gene sequences. At the end of the paper, he asked for money. "We believe that a CO1 database can be developed within 20 years for 5-10 million animal species on the planet for approximately $1 billion," he wrote.
J. Craig Venter, famous for his work on sequencing the human genome, argued that Hebert's suggestion was uninteresting. The so-called barcode region amounted to just 650 base pairs, less than a ten-millionth of the genome. Why be satisfied with something like that when the cost of whole genome sequencing was rapidly falling? But for Hebert, the triviality of sequencing a little fragment was exactly the point. "It's seven orders of magnitude smaller!" he says. "It's always going to be cheaper. If you can get whole genomes for $10, you will get barcodes for pennies."
Hebert proposed a barcoding factory: Capture a bunch of bugs, remove a leg from each, sequence a bit of DNA, and produce a chart that shows which bugs clump together as a single species. If a sample of that species has already been identified, then the factory can provide a name. Along with legs from bugs, the factory can accept other material that contains DNA - feathers from birds, or bits of mollusk, or samples from a pallet of frozen fish. Once the method is proven and the standard is accepted, such a factory could even be miniaturized. It could be taken out to the field, used to identify a species found in a crate, a box, hiding behind a shed, or even genetic residue from a sting. Find it, break it down and compare it in hours at most, coming back with an exact species match.
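The "chart that shows which bugs clump together" can be pictured with a small sketch. It assumes the barcode sequences are already aligned to the same length and groups specimens whose sequences differ by no more than a chosen fraction of positions. The function names, the toy 20-letter sequences, and the thresholds are all illustrative assumptions, not details of Hebert's factory, which works on the roughly 650-base-pair barcode region.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cluster_barcodes(barcodes, threshold=0.02):
    """Group specimens by single linkage: any pair within `threshold` is merged.

    `barcodes` maps a specimen ID to its aligned sequence. The default
    threshold is an assumed placeholder, not a figure from the article.
    """
    parent = {sid: sid for sid in barcodes}

    def find(sid):
        # Follow parent links to the cluster root, shortening the path as we go.
        while parent[sid] != sid:
            parent[sid] = parent[parent[sid]]
            sid = parent[sid]
        return sid

    for a, b in combinations(barcodes, 2):
        if p_distance(barcodes[a], barcodes[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for sid in barcodes:
        clusters.setdefault(find(sid), []).append(sid)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical toy data: four legs, two apparent species.
specimens = {
    "moth_a": "ACGTACGTACGTACGTACGT",
    "moth_b": "ACGTACGTACGTACGTACGA",
    "moth_c": "TTGTACCTACGAACGTTCGT",
    "moth_d": "TTGTACCTACGAACGTTCGA",
}

if __name__ == "__main__":
    # A looser threshold is used here only because the toy sequences are short.
    print(cluster_barcodes(specimens, threshold=0.1))
    # [['moth_a', 'moth_b'], ['moth_c', 'moth_d']]
```

Single linkage is the simplest possible grouping rule and is used here only for illustration; a production pipeline would need proper sequence alignment and a more careful clustering method.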
Dan Janzen and Paul Hebert met in 2003, at the first meeting funded by the Sloan Foundation. Janzen, after hearing Hebert's bold claims, informed the startled inventor that he was thinking too small. A barcode factory was a pretty good idea, but to rescue field biology, they needed more. Why didn't they work on a machine that was the size of a comb - a species tricorder?
At the 2003 meeting, Janzen and Hebert made a deal. Hebert would provide discount barcode analysis for around $2 each. Janzen would use his unparalleled field research operation to test whether barcoding worked, and he would create a prototype system to inventory animal life. Each barcode would link to a reference specimen, with collection notes, scientific name where possible, and detailed ecological data. Nobody in the world had access to as many fresh, annotated specimens of tropical moths as Janzen did. For decades, he had been hacking his way through the taxonomic impediment.
Barcoding has focused his attention on distinctions that had always been impossible to sort out. "Sometimes you've got all these slightly different moths, and according to convention they are the same species," he says. "The original specimen that goes with this name could be sitting in a dusty drawer in Berlin, and who knows what ecological information goes with it? Maybe none! So we send legs of all these supposedly identical insects to Paul, and sure enough, we get different barcodes. We go back to the box and sort them by barcode, and sure enough, one of the barcode clusters is big, one of them is smaller, one of them is gray, and one of them feeds on a different plant. So there goes your variation - there are four species!"
In Guelph today, the barcoding factory is running at full speed. So far, Hebert's team has analyzed nearly 375,000 specimens.
In Madagascar, a well-known myrmecologist named Brian Fisher has been barcoding ants by the thousands; there is a collaboration under way to get the barcodes of all birds (they have done 30 percent in the past five years) and every species of fish as well.
Barcoding works. When a named reference specimen exists in Hebert's database, the system can accept a bit of tissue, sequence the barcode region, and come up with a species name. Unfortunately, there are only about 47,000 barcodes that link directly to a name, because many of the barcoded specimens still lack a valid, traditional taxonomic identification.
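The lookup step, and the reason unnamed references limit it, can be sketched as follows. This is a minimal illustration that assumes aligned sequences of equal length and a simple nearest-match rule. The record layout, the voucher IDs, the species name, and the threshold are hypothetical and do not describe the real database.

```python
def p_distance(a, b):
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def identify(query, references, threshold=0.02):
    """Return a name for `query` from the closest reference within `threshold`.

    Each reference is a dict with "voucher", "sequence" and "name" keys;
    "name" is None when the specimen has been barcoded but never formally
    identified. The threshold is an assumed placeholder.
    """
    best = None
    for ref in references:
        d = p_distance(query, ref["sequence"])
        if d <= threshold and (best is None or d < best[0]):
            best = (d, ref)
    if best is None:
        return "no match"
    ref = best[1]
    if ref["name"] is None:
        return "matched reference " + ref["voucher"] + ", but it has no name yet"
    return ref["name"]


# Hypothetical reference library: one named specimen, one barcoded but unnamed.
references = [
    {"voucher": "REF-001", "sequence": "ACGTACGTACGTACGTACGT", "name": "Species A (hypothetical)"},
    {"voucher": "REF-002", "sequence": "TTGTACCTACGAACGTTCGT", "name": None},
]

if __name__ == "__main__":
    # A looser threshold is used here only because the toy sequences are short.
    print(identify("ACGTACGTACGTACGTACGA", references, threshold=0.1))
    print(identify("TTGTACCTACGAACGTTCGT", references, threshold=0.1))
    print(identify("GGGGGGGGGGGGGGGGGGGG", references, threshold=0.1))
```

The second query shows the gap the article describes: the barcode matches a reference exactly, yet no species name comes back because nobody has attached one to that specimen.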
"Each sequencer can run 500,000 sequences a year," Hebert says. "Line them up, feed them bug bits, pay the chemistry bill, and we can easily register 1 million species in a decade. Give us a few more sequencers, more chemistry money, more bug bits, and we'll register 100 million species in 20 years and then go swimming on a beach in Costa Rica."