DNA can be used to store almost limitless amounts of data in almost no space

In the age of big data, we are quickly producing far more digital information than we can possibly store. Last year, $20 billion was spent on new data centers in the US alone, doubling the capital expenditure on data center infrastructure from 2016. And even with skyrocketing investment in data storage, corporations and the public sector are falling behind.

But there’s hope.

With a nascent technology leveraging DNA for data storage, this may soon become a problem of the past. By encoding bits of data into tiny molecules of DNA, researchers and companies like Microsoft hope to fit entire data centers in a few flasks of DNA by the end of the decade.

But let’s back up.


After the 20th century, we graduated from magnetic tape, floppy disks, and CDs to sophisticated semiconductor memory chips capable of holding data in countless tiny transistors. In keeping with Moore’s Law, we’ve seen an exponential increase in the storage capacity of silicon chips. At the same time, however, the rate at which humanity produces new digital information is exploding. The size of the global datasphere is increasing exponentially, predicted to reach 160 zettabytes (160 trillion gigabytes) by 2025. As of 2016, digital users produced over 44 billion gigabytes of data per day. By 2025, the International Data Corporation (IDC) estimates this figure will surpass 460 billion. And with private sector efforts to improve global connectivity—such as OneWeb and Google’s Project Loon—we’re about to see an influx of data from five billion new minds.

By 2020, three billion new minds are predicted to join the web. With private sector efforts, this number could reach five billion. While companies and services are profiting enormously from this influx, it’s extremely costly to build data centers at the rate needed. At present, about $50 million worth of new data center construction is required just to keep up, not to mention millions in furnishings, equipment, power, and cooling. Moreover, memory-grade silicon is rarely found pure in nature, and researchers predict it will run out by 2040.

Take DNA, on the other hand. At its theoretical limit, we could fit 215 million gigabytes of data in a single gram of DNA.
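To put that density in perspective, here is a rough back-of-the-envelope sketch. It uses only the figures quoted in this article and assumes, optimistically, that the theoretical limit could actually be reached in practice. The function name `grams_of_dna` is just an illustration.

```python
# Figures quoted in the article. Reaching the theoretical density limit
# in a real storage system is an assumption of this sketch.
GB_PER_GRAM = 215e6        # 215 million gigabytes per gram of DNA
GB_PER_ZETTABYTE = 1e12    # one zettabyte is one trillion gigabytes
DATASPHERE_ZB = 160        # projected size of the global datasphere in 2025


def grams_of_dna(zettabytes: float) -> float:
    """Return the mass of DNA, in grams, needed to hold the given volume."""
    return zettabytes * GB_PER_ZETTABYTE / GB_PER_GRAM


if __name__ == "__main__":
    kilograms = grams_of_dna(DATASPHERE_ZB) / 1000
    print(f"{kilograms:.0f} kg of DNA for {DATASPHERE_ZB} zettabytes")
```

Under these assumptions, the entire projected 2025 datasphere would fit in roughly 744 kilograms of DNA.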

But how?

Crash Course

DNA is built from a double helix chain of four nucleotide bases—adenine (A), thymine (T), cytosine (C), and guanine (G). Once formed, these chains fold tightly to form extremely dense, space-saving data stores. To encode data files into these bases, we can use various algorithms that convert binary to base nucleotides—0s and 1s into A, T, C, and G. “00” might be encoded as A, “01” as G, “10” as C, and “11” as T, for instance. Once encoded, information is then stored by synthesizing DNA with specific base patterns, and the final encoded sequences are stored in vials with an extraordinary shelf-life. To retrieve data, encoded DNA can then be read using any number of sequencing technologies, such as Oxford Nanopore’s portable MinION.
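The two-bits-per-base mapping described above can be sketched in a few lines. This is a minimal illustration using the example mapping from the text, not the scheme any particular lab uses. Real encoders also add error correction and avoid long runs of the same base, which this sketch leaves out. The function names `encode` and `decode` are hypothetical.

```python
# Example mapping from the text: two bits per nucleotide.
BITS_TO_BASE = {"00": "A", "01": "G", "10": "C", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of bases, four bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Turn a string of bases back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    strand = encode(b"Hi")
    print(strand)            # GACAGCCG
    print(decode(strand))    # b'Hi'
```

Each byte becomes four bases, so the two-character message "Hi" turns into an eight-base strand.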

Still in its deceptive growth phase, DNA data storage, or NAM (nucleic acid memory), is only beginning to approach the knee of its exponential growth curve. But while the process remains costly and slow, several players are beginning to crack its greatest challenge: retrieval. Just as you might click on a specific file or filter by a search term on your desktop, random access across large data stores has become a top priority for scientists at Microsoft Research and the University of Washington.

Storing over 400 DNA-encoded megabytes of data, U Washington’s DNA storage system now offers random access across all its data with no bit errors.
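The idea behind random access can be sketched in software. The sketch below is an in-memory analogy, not the researchers' actual method: it assumes each strand starts with a file-specific address followed by a chunk number, so that one file can be picked out of an unordered pool. Published systems reportedly make this selection chemically, using primer sequences, rather than by string matching. The strand layout, the lengths, and all function names here are hypothetical.

```python
import random

ADDRESS_LEN = 8   # hypothetical address length, in bases
INDEX_LEN = 4     # hypothetical chunk-number length, in bases
BASES = "AGCT"    # base-4 digits 0..3


def index_to_bases(n: int) -> str:
    """Write a chunk number as a fixed-width base-4 string of nucleotides."""
    if not 0 <= n < 4 ** INDEX_LEN:
        raise ValueError("chunk index out of range")
    digits = []
    for _ in range(INDEX_LEN):
        n, remainder = divmod(n, 4)
        digits.append(BASES[remainder])
    return "".join(reversed(digits))


def bases_to_index(bases: str) -> int:
    """Read a base-4 string of nucleotides back into a chunk number."""
    n = 0
    for base in bases:
        n = n * 4 + BASES.index(base)
    return n


def store(pool: list, address: str, payload: str, chunk: int = 12) -> None:
    """Split a payload into addressed, numbered strands and add them to the pool."""
    if len(address) != ADDRESS_LEN:
        raise ValueError("address has the wrong length")
    for start in range(0, len(payload), chunk):
        header = address + index_to_bases(start // chunk)
        pool.append(header + payload[start:start + chunk])
    random.shuffle(pool)  # a physical pool of DNA has no ordering


def retrieve(pool: list, address: str) -> str:
    """Select only the strands with this address and reassemble the payload."""
    header_len = ADDRESS_LEN + INDEX_LEN
    hits = [s for s in pool if s.startswith(address)]
    hits.sort(key=lambda s: bases_to_index(s[ADDRESS_LEN:header_len]))
    return "".join(s[header_len:] for s in hits)


if __name__ == "__main__":
    pool = []
    store(pool, "ACGTACGT", "GACAGCCG" * 5)
    store(pool, "TTGGCCAA", "ATAT" * 7)
    print(retrieve(pool, "TTGGCCAA"))
```

The point of the design is that reading one file does not require reading the whole pool: only strands carrying the requested address are selected, and the chunk numbers restore their order.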


Even before we guarantee random access for data retrieval, DNA data storage has immediate market applications. According to IDC’s Data Age 2025 study (Figure 5), a huge proportion of enterprise data goes straight to an archive. Over time, the majority of stored data becomes only potentially critical, making it less of a target for immediate retrieval.

Particularly for storing past legal documents, medical records, and other archive data, why waste precious computing power, infrastructure, and overhead?

Data-encoded DNA can last 10,000 years—guaranteed—in cold, dark, and dry conditions at a fraction of the storage cost.

Now that we can easily use natural enzymes to replicate DNA, companies have tons to gain (literally) by using DNA as a backup system—duplicating files for later retrieval and risk mitigation.

And as retrieval algorithms and biochemical technologies improve, random access across data-encoded DNA may become as easy as clicking a file on your desktop.

As you scroll, researchers are already investigating the potential of molecular computing, completely devoid of silicon and electronics.

Harvard professor George Church and his lab, for instance, envision capturing data directly in DNA. As Church has stated, “I’m interested in making biological cameras that don’t have any electronic or mechanical components,” whereby information “goes straight into DNA.” According to Church, DNA recorders would capture audiovisual data automatically. “You could paint it up on walls, and if anything interesting happens, just scrape a little bit off and read it—it’s not that far off.” One day, we may even be able to record biological events in the body. In pursuit of this end, Church’s lab is working to develop an in vivo DNA recorder of neural activity, skipping electrodes entirely.

Arguably the most compact, long-lasting, and universal storage mechanism at our fingertips, DNA offers us unprecedented applications in data storage, and perhaps even in computing.


As DNA data storage falls in cost and rises in speed, commercial user interfaces will become both critical and wildly profitable. Once corporations, startups, and individuals alike can easily save files, images, or even neural activity to DNA, opportunities for disruption abound. Imagine uploading files to a cloud that routes them to encrypted DNA vials rather than to massive, inefficient silicon-based data centers. Corporations could have their own warehouses, and local data networks could allow for heightened cybersecurity, particularly for archives.

And since DNA lasts millennia without maintenance, forget the need to copy databases and power digital archives. As long as we’re human, regardless of technological advances and changes, DNA will always be relevant and readable for generations to come.

But perhaps the most exciting potential of DNA is its portability. If we were to send a single exabyte of data (one billion gigabytes) to Mars using silicon binary media, it would take five Falcon Heavy rockets and cost $486 million in freight alone.

With DNA, we would need five cubic centimeters.

At scale, DNA has the true potential to dematerialize entire space colonies’ worth of data. Throughout evolution, DNA has unlocked extraordinary possibilities, from humans to bacteria. Soon hosting limitless data in almost zero space, it may one day unlock many more.

A Data Storage Revolution? DNA Can Store Near Limitless Data in Almost Zero Space

The eternity drive: Why DNA could be the future of data storage

By Peter Shadbolt, for CNN

How long will the data last on your hard drive or USB stick? Five years? 10 years? Longer?

Already a storage company called Backblaze is running 25,000 hard drives simultaneously to get to the bottom of the question. As each hard drive coughs its last, the company replaces it and logs its lifespan.

While this census has only been running five years, the statistics show a 22% attrition rate over four years.

Some may last longer than a decade, the company says, others may last little more than a year; but the short answer is that storage devices don’t last forever.

Science is now looking to nature, however, to find the best way to store data in a way that will make it last for millions of years.

Researchers at ETH Zurich, in Switzerland, believe the answer may lie in the data storage system that exists in every living cell: DNA.

So compact and complex are its strands that just 1 gram of DNA is theoretically capable of containing all the data of internet giants such as Google and Facebook, with room to spare.

In data storage terms, that gram would be capable of holding 455 exabytes, where one exabyte is equivalent to a billion gigabytes.

Fossilization has been known to preserve DNA in strands long enough to recover an animal’s entire genome, the complete set of genes present in a cell or organism.

So far, scientists have extracted and sequenced the genome of a 110,000-year-old polar bear and more recently a 700,000-year-old horse.

Robert Grass, lecturer at the Department of Chemistry and Applied Biosciences, said the problem with DNA is that it degrades quickly. The project, he said, wanted to find ways of combining the possibility of the large storage density in DNA with the stability of the DNA found in fossils.

“We have found elegant ways of making DNA very stable,” he told CNN. “So we wanted to combine these two stories — to get the high storage density of DNA and combine it with the archaeological aspects of DNA.”

The synthetic process of preserving DNA actually mimics processes found in nature.

As with fossils, keeping the DNA cool, dry and encased (in this case, in microscopic spheres of glass) could keep the information contained in its strands intact for thousands of years.

“The time limit with DNA in fossils is about 700,000 years but people speculate about finding one-million-year storage of genomic material in fossil bones,” he said.

“We were able to show that decay of our DNA and store of information decays at the same rate as the fossil DNA so we get to similar time frames of close to a million years.”

Fresh fossil discoveries are throwing up new surprises about the preservation of DNA.

Human bones discovered in the Sima de los Huesos cave network in Spain show maternally inherited “mitochondrial” DNA that is 400,000 years old – a new record for human remains.

The fact that the DNA survived in the relatively cool climate of a cave, rather than in a frozen environment as with the DNA extracted from mammoth remains in Siberia, has added to the mystery about DNA longevity.

“A lot of it is not really known,” Grass says. “What we’re trying to understand is how DNA decays and what the mechanisms are to get more insight into that.”

What is known is that water and oxygen are the enemies of DNA survival. DNA in a test tube and exposed to air will last little more than two to three years. Encasing it in glass (an inert, neutral agent) and cooling it increases its chances of survival.

Grass says sol-gel technology, which produces solid materials from small molecules, has made it a relatively easy process to get the glass around the DNA molecules.

While the team’s work invites immediate comparison with Jurassic Park, where DNA was extracted from amber fossils, Grass says that prehistoric insects encased in amber are a poor source of prehistoric DNA.

“The best DNA comes from sources that are ceramic and dry — so teeth, bones and even eggshells,” he said.

So far the team has tested their storage method by preserving just 83 kilobytes of data, drawn from two historical documents.

“The first is the Swiss Federal Charter of 1291 — it’s like the Swiss Magna Carta — and the other was the Archimedes Palimpsest; a copy of an Ancient Greek mathematics treatise made by a monk in the 10th century but which had been overwritten by other monks in the 15th century.

“We wanted to preserve these documents to show not just that the method works, but that the method is important too,” he said.

He estimates that the information will be readable in 10,000 years’ time, and if frozen, as long as a million years.

Encoding just 83 kilobytes of data cost about $2,000, making it a relatively expensive process, but Grass is optimistic that the price will come down over time. Advances in technology for medical analysis, he said, are likely to help with this.

“Already the prices for human genome sequences have dropped from several millions of dollars a few years ago to just hundreds of dollars now,” Grass said.

“It makes sense to integrate these advances in medical and genome analysis into the world of IT.”
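To see how far prices would need to fall, the cost figures quoted above can be turned into a per-unit price. This is a rough sketch: it assumes the cost scales linearly with the amount of data and uses decimal units (one gigabyte as one million kilobytes). Both are simplifications, and the function names are hypothetical.

```python
COST_USD = 2_000    # quoted cost of the experiment
DATA_KB = 83        # quoted amount of data encoded, in kilobytes
KB_PER_GB = 1_000_000  # decimal units, an assumption of this sketch


def cost_per_kb(total_usd: float, kilobytes: float) -> float:
    """Return the average cost in dollars of encoding one kilobyte."""
    return total_usd / kilobytes


def cost_to_encode(gigabytes: float, usd_per_kb: float) -> float:
    """Return the cost of encoding a volume of data, assuming linear scaling."""
    return gigabytes * KB_PER_GB * usd_per_kb


if __name__ == "__main__":
    per_kb = cost_per_kb(COST_USD, DATA_KB)
    print(f"${per_kb:.2f} per kilobyte")
    print(f"${cost_to_encode(1, per_kb):,.0f} per gigabyte")
```

At the quoted figures this works out to about $24 per kilobyte, or roughly $24 million per gigabyte if nothing about the process improved.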