by Andy Greenberg
WHEN BIOLOGISTS SYNTHESIZE DNA, they take pains not to create or spread a dangerous stretch of genetic code that could be used to create a toxin or, worse, an infectious disease. But one group of biohackers has demonstrated how DNA can carry a less expected threat: one designed to infect not humans or animals but computers.
In new research they plan to present at the USENIX Security conference on Thursday, a group of researchers from the University of Washington has shown for the first time that it’s possible to encode malicious software into physical strands of DNA, so that when a gene sequencer analyzes it, the resulting data becomes a program that corrupts gene-sequencing software and takes control of the underlying computer. While that attack is far from practical for any real spy or criminal, it’s one the researchers argue could become more likely over time, as DNA sequencing becomes more commonplace and powerful and is increasingly performed by third-party services on sensitive computer systems. And, perhaps more to the point for the cybersecurity community, it also represents an impressive, sci-fi feat of sheer hacker ingenuity.
“We know that if an adversary has control over the data a computer is processing, it can potentially take over that computer,” says Tadayoshi Kohno, the University of Washington computer science professor who led the project, comparing the technique to traditional hacker attacks that package malicious code in web pages or an email attachment. “That means when you’re looking at the security of computational biology systems, you’re not only thinking about the network connectivity and the USB drive and the user at the keyboard but also the information stored in the DNA they’re sequencing. It’s about considering a different class of threat.”
A Sci-Fi Hack
For now, that threat remains more of a plot point in a Michael Crichton novel than one that should concern computational biologists. But as genetic sequencing is increasingly handled by centralized services—often run by university labs that own the expensive gene sequencing equipment—that DNA-borne malware trick becomes ever so slightly more realistic. Especially given that the DNA samples come from outside sources, which may be difficult to properly vet.
If hackers did pull off the trick, the researchers say they could potentially gain access to valuable intellectual property, or possibly taint genetic analysis like criminal DNA testing. Companies could even potentially place malicious code in the DNA of genetically modified products, as a way to protect trade secrets, the researchers suggest. “There are a lot of interesting—or threatening may be a better word—applications of this coming in the future,” says Peter Ney, a researcher on the project.
Regardless of any practical reason for the research, however, the notion of building a computer attack—known as an “exploit”—with nothing but the information stored in a strand of DNA represented an epic hacker challenge for the University of Washington team. The researchers started by writing a well-known exploit called a “buffer overflow,” designed to fill the space in a computer’s memory meant for a certain piece of data and then spill out into another part of the memory to plant its own malicious commands.
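To make the buffer overflow idea concrete, here is a minimal simulation in Python. It is not the researchers' exploit and cannot harm anything: the 12-byte "memory," the 8-byte buffer, and the adjacent 4-byte field labeled `SAFE` are all hypothetical stand-ins chosen for illustration.

```python
BUF_SIZE = 8        # hypothetical space reserved for the input data
FIELD_OFFSET = 8    # hypothetical adjacent 4-byte field the program trusts


def new_memory() -> bytearray:
    """Return 12 bytes of simulated memory: an 8-byte buffer, then a field."""
    memory = bytearray(12)
    memory[FIELD_OFFSET:FIELD_OFFSET + 4] = b"SAFE"
    return memory


def unchecked_copy(memory: bytearray, data: bytes) -> None:
    """Copy data into the buffer without checking its length (the flaw)."""
    for i, byte in enumerate(data):
        memory[i] = byte


def checked_copy(memory: bytearray, data: bytes) -> None:
    """Copy data into the buffer, rejecting anything that would not fit."""
    if len(data) > BUF_SIZE:
        raise ValueError("input is longer than the buffer")
    unchecked_copy(memory, data)


def trusted_field(memory: bytearray) -> bytes:
    """Read back the field that sits right after the buffer."""
    return bytes(memory[FIELD_OFFSET:FIELD_OFFSET + 4])


if __name__ == "__main__":
    memory = new_memory()
    unchecked_copy(memory, b"ACGT")               # fits inside the buffer
    print(trusted_field(memory))                  # b'SAFE'

    memory = new_memory()
    unchecked_copy(memory, b"AAAAAAAA" + b"EVIL")  # 12 bytes into 8
    print(trusted_field(memory))                  # b'EVIL'
```

The second call spills four bytes past the buffer and silently replaces the neighboring field. In a real program the overwritten region might hold data that steers what the software does next, which is what gives an attacker a foothold.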
But encoding that attack in actual DNA proved harder than they first imagined. DNA sequencers work by mixing DNA with chemicals that bind differently to each of DNA’s basic units of code (the chemical bases A, T, G, and C) and that each emit a different color of light, which is captured in a photo of the DNA molecules. To speed up the processing, the images of millions of bases are split up into thousands of chunks and analyzed in parallel. So all the data that made up their attack had to fit into just a few hundred of those bases, to increase the likelihood it would remain intact throughout the sequencer’s parallel processing.
When the researchers sent their carefully crafted attack to the DNA synthesis service Integrated DNA Technologies in the form of As, Ts, Gs, and Cs, they found that DNA has other physical restrictions too. For their DNA sample to remain stable, they had to maintain a certain ratio of Gs and Cs to As and Ts, because the natural stability of DNA depends on a regular proportion of A-T and G-C pairs. And while a buffer overflow often involves using the same strings of data repeatedly, doing so in this case caused the DNA strand to fold in on itself. All of that meant the group had to repeatedly rewrite their exploit code to find a form that could also survive as actual DNA, which the synthesis service would ultimately send them in a finger-sized plastic vial in the mail.
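The constraints described above can be sketched in a few lines of Python. Everything specific here is an assumption for illustration: the two-bits-per-base mapping (A=00, C=01, G=10, T=11), the 40 to 60 percent G-C window, and the limit of four identical bases in a row are not figures from the article or from any synthesis service. The longest-run check is also only a crude stand-in for the folding problem the researchers ran into.

```python
BASES = "ACGT"  # assumed mapping: A=00, C=01, G=10, T=11


def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four bases, two bits per base."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def dna_to_bytes(seq: str) -> bytes:
    """Decode a base string produced by bytes_to_dna back into bytes."""
    if len(seq) % 4 != 0:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(seq), 4):
        value = 0
        for base in seq[i:i + 4]:
            value = (value << 2) | BASES.index(base)
        out.append(value)
    return bytes(out)


def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C."""
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def longest_run(seq: str) -> int:
    """Length of the longest run of one repeated base."""
    longest = current = 0
    previous = ""
    for base in seq:
        current = current + 1 if base == previous else 1
        longest = max(longest, current)
        previous = base
    return longest


def looks_synthesizable(seq: str, gc_min: float = 0.4,
                        gc_max: float = 0.6, max_run: int = 4) -> bool:
    """Apply the illustrative (assumed) stability thresholds."""
    return gc_min <= gc_fraction(seq) <= gc_max and longest_run(seq) <= max_run


if __name__ == "__main__":
    balanced = bytes_to_dna(b"\x1b\x1b")
    print(balanced, looks_synthesizable(balanced))      # ACGTACGT True
    repetitive = bytes_to_dna(b"\x00\x00")
    print(repetitive, looks_synthesizable(repetitive))  # AAAAAAAA False
```

A payload full of repeated bytes, which is typical of a buffer overflow, fails checks like these. That is the kind of conflict that forced the team to keep rewriting its exploit.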
The result, finally, was a piece of attack software that could survive the translation from physical DNA to the digital format, known as FASTQ, that’s used to store the DNA sequence. And when that FASTQ file is compressed with a common compression program known as fqzcomp—FASTQ files are often compressed because they can stretch to gigabytes of text—it hacks that compression software with its buffer overflow exploit, breaking out of the program and into the memory of the computer running the software to run its own arbitrary commands.
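For readers unfamiliar with FASTQ, each read is stored as four lines of text: an identifier beginning with `@`, the bases, a separator line beginning with `+`, and one quality character per base. The Python sketch below parses that layout with explicit length checks. The function and class names are hypothetical, and the code has nothing to do with fqzcomp's internals; it only shows the kind of input validation whose absence makes a buffer overflow possible.

```python
from typing import List, NamedTuple


class FastqRecord(NamedTuple):
    read_id: str
    sequence: str
    quality: str


def parse_fastq(text: str) -> List[FastqRecord]:
    """Parse FASTQ text laid out as four lines per record, validating each."""
    lines = text.strip().splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input must contain four lines per record")
    records = []
    for i in range(0, len(lines), 4):
        header, sequence, separator, quality = lines[i:i + 4]
        if not header.startswith("@"):
            raise ValueError(f"record {i // 4}: header must start with '@'")
        if not separator.startswith("+"):
            raise ValueError(f"record {i // 4}: separator must start with '+'")
        if len(sequence) != len(quality):
            raise ValueError(f"record {i // 4}: sequence/quality length mismatch")
        records.append(FastqRecord(header[1:], sequence, quality))
    return records


if __name__ == "__main__":
    sample = "@read1\nACGTACGT\n+\nIIIIIIII\n"
    for record in parse_fastq(sample):
        print(record.read_id, record.sequence, len(record.quality))
```

Because the sequence line comes straight from whatever DNA was in the sample, any program that reads it is handling attacker-controllable input, just as a web browser does with a web page.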
A Far-Off Threat
Even then, the attack was fully translated only about 37 percent of the time, since the sequencer’s parallel processing often cut it short or—another hazard of writing code in a physical object—the program decoded it backward. (A strand of DNA can be sequenced in either direction, but code is meant to be read in only one. The researchers suggest in their paper that future, improved versions of the attack might be crafted as a palindrome.)
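The direction problem is easy to see in code. The Python sketch below assumes that "backward" means the sequencer reports the reverse complement of the strand, the standard way the opposite strand of DNA reads, and that a "palindrome" is a sequence equal to its own reverse complement. Both readings are assumptions about what the paper intends.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the sequence as it reads from the opposite strand."""
    return seq.translate(COMPLEMENT)[::-1]


def is_reverse_palindrome(seq: str) -> bool:
    """True if the sequence reads the same from either strand."""
    return seq == reverse_complement(seq)


if __name__ == "__main__":
    print(reverse_complement("ACGTTT"))     # AAACGT
    print(is_reverse_palindrome("ACGTTT"))  # False
    print(is_reverse_palindrome("GAATTC"))  # True
```

An exploit encoded in a sequence like `ACGTTT` would decode to different data when read from the other strand, while one built as a reverse-complement palindrome would come out the same either way.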
Despite that tortuous, unreliable process, the researchers admit, they also had to take some serious shortcuts in their proof-of-concept that verge on cheating. Rather than exploit an existing vulnerability in the fqzcomp program, as real-world hackers do, they modified the program’s open-source code to insert their own flaw allowing the buffer overflow. But aside from writing that DNA attack code to exploit their artificially vulnerable version of fqzcomp, the researchers also performed a survey of common DNA sequencing software and found three actual buffer overflow vulnerabilities in common programs. “A lot of this software wasn’t written with security in mind,” Ney says. That shows, the researchers say, that a future hacker might be able to pull off the attack in a more realistic setting, particularly as more powerful gene sequencers start analyzing larger chunks of data that could better preserve an exploit’s code.
Needless to say, any possible DNA-based hacking is years away. Illumina, the leading maker of gene-sequencing equipment, said as much in a statement responding to the University of Washington paper. “This is interesting research about potential long-term risks. We agree with the premise of the study that this does not pose an imminent threat and is not a typical cyber security capability,” writes Jason Callahan, the company’s chief information security officer. “We are vigilant and routinely evaluate the safeguards in place for our software and instruments. We welcome any studies that create a dialogue around a broad future framework and guidelines to ensure security and privacy in DNA synthesis, sequencing, and processing.”
But hacking aside, the use of DNA for handling computer information is slowly becoming a reality, says Seth Shipman, one member of a Harvard team that recently encoded a video in a DNA sample. (Shipman is married to WIRED senior writer Emily Dreyfuss.) That storage method, while mostly theoretical for now, could someday allow data to be kept for hundreds of years, thanks to DNA’s ability to maintain its structure far longer than data stored in flash memory or magnetically encoded on a hard drive. And if DNA-based computer storage is coming, DNA-based computer attacks may not be so farfetched, he says.
“I read this paper with a smile on my face, because I think it’s clever,” Shipman says. “Is it something we should start screening for now? I doubt it.” But he adds that, with an age of DNA-based data possibly on the horizon, the ability to plant malicious code in DNA is more than a hacker parlor trick.
“Somewhere down the line, when more information is stored in DNA and it’s being input and sequenced constantly,” Shipman says, “we’ll be glad we started thinking about these things.”