In the age of big data, we are quickly producing far more digital information than we can possibly store. Last year, $20 billion was spent on new data centers in the US alone, doubling the capital expenditure on data center infrastructure from 2016. And even with skyrocketing investment in data storage, corporations and the public sector are falling behind.
But there’s hope.
With a nascent technology leveraging DNA for data storage, this may soon become a problem of the past. By encoding bits of data into tiny molecules of DNA, researchers and companies like Microsoft hope to fit entire data centers in a few flasks of DNA by the end of the decade.
But let’s back up.
After the 20th century, we graduated from magnetic tape, floppy disks, and CDs to sophisticated semiconductor memory chips capable of holding data in countless tiny transistors. In keeping with Moore’s Law, we’ve seen an exponential increase in the storage capacity of silicon chips. At the same time, however, the rate at which humanity produces new digital information is exploding. The size of the global datasphere is increasing exponentially, predicted to reach 160 zettabytes (160 trillion gigabytes) by 2025. As of 2016, digital users produced over 44 billion gigabytes of data per day. By 2025, the International Data Corporation (IDC) estimates this figure will surpass 460 billion. And with private sector efforts to improve global connectivity—such as OneWeb and Google’s Project Loon—we’re about to see an influx of data from five billion new minds.
Three billion of those new minds are predicted to join the web by 2020, and with continued private sector efforts the total could reach five billion. While companies and services are profiting enormously from this influx, it’s extremely costly to build data centers at the rate needed. At present, about $50 million worth of new data center construction is required just to keep up, not to mention millions in furnishings, equipment, power, and cooling. Moreover, memory-grade silicon is rarely found pure in nature, and researchers predict it will run out by 2040.
Take DNA, on the other hand. At its theoretical limit, we could fit 215 million gigabytes of data in a single gram of DNA.
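To put that density in perspective, here is a back-of-the-envelope sketch. It assumes the 215-million-gigabytes-per-gram figure holds at scale, which no real system achieves today; the function and variable names are illustrative only.

```python
# Rough arithmetic only: mass of DNA needed at the theoretical density limit.
GB_PER_GRAM = 215e6  # 215 million gigabytes per gram, the figure quoted above


def grams_of_dna(gigabytes: float) -> float:
    """Return the grams of DNA needed to hold `gigabytes` at the theoretical limit."""
    return gigabytes / GB_PER_GRAM


datasphere_2025_gb = 160e12  # 160 zettabytes = 160 trillion gigabytes
one_exabyte_gb = 1e9         # one exabyte = one billion gigabytes

print(f"Projected 2025 datasphere: {grams_of_dna(datasphere_2025_gb) / 1000:.0f} kg")  # 744 kg
print(f"One exabyte: {grams_of_dna(one_exabyte_gb):.2f} g")                            # 4.65 g
```

In other words, at the theoretical limit the entire projected 2025 datasphere would weigh roughly as much as a small car, and a full exabyte would weigh less than a teaspoon of sugar.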
DNA is built from a double helix chain of four nucleotide bases—adenine (A), thymine (T), cytosine (C), and guanine (G). Once formed, these chains fold tightly to form extremely dense, space-saving data stores. To encode data files into these bases, we can use various algorithms that convert binary to base nucleotides—0s and 1s into A, T, C, and G. “00” might be encoded as A, “01” as G, “10” as C, and “11” as T, for instance. Once encoded, information is then stored by synthesizing DNA with specific base patterns, and the final encoded sequences are stored in vials with an extraordinary shelf-life. To retrieve data, encoded DNA can then be read using any number of sequencing technologies, such as Oxford Nanopore’s portable MinION.
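The two-bits-per-base mapping described above can be sketched in a few lines of Python. This is a toy illustration of that example mapping only, not the scheme any particular lab uses; practical encodings typically add error correction and avoid long runs of the same base, and both are omitted here. The function names are hypothetical.

```python
# Toy codec using the example mapping from the text: two bits per base.
BITS_TO_BASE = {"00": "A", "01": "G", "10": "C", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Convert bytes to a string of bases, four bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Convert a string of bases back to the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hi")
print(strand)          # GACAGCCG
print(decode(strand))  # b'Hi'
```

Here the letter "H" (binary 01001000) becomes GACA and "i" (binary 01101001) becomes GCCG. Synthesizing the strand corresponds to writing the data, and sequencing it followed by `decode` corresponds to reading it back.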
Still in its deceptive growth phase, DNA data storage, or NAM (nucleic acid memory), is only beginning to approach the knee of its exponential growth curve. But while the process remains costly and slow, several players are beginning to crack its greatest challenge: retrieval. Random access across large data stores, the DNA equivalent of clicking on a specific file or filtering by a search term on your desktop, has become a top priority for scientists at Microsoft Research and the University of Washington.
Storing over 400 megabytes of DNA-encoded data, the University of Washington’s DNA storage system now offers random access across all its data with no bit errors.
Even before we guarantee random access for data retrieval, DNA data storage has immediate market applications. According to IDC’s Data Age 2025 study (Figure 5), a huge proportion of enterprise data goes straight to an archive. Over time, the majority of stored data becomes only potentially critical, making it less of a target for immediate retrieval.
Particularly for storing past legal documents, medical records, and other archive data, why waste precious computing power, infrastructure, and overhead?
Data-encoded DNA can last 10,000 years—guaranteed—in cold, dark, and dry conditions at a fraction of the storage cost.
Now that we can easily use natural enzymes to replicate DNA, companies have tons to gain (literally) by using DNA as a backup system—duplicating files for later retrieval and risk mitigation.
And as retrieval algorithms and biochemical technologies improve, random access across data-encoded DNA may become as easy as clicking a file on your desktop.
As you scroll, researchers are already investigating the potential of molecular computing, completely devoid of silicon and electronics.
Harvard professor George Church and his lab, for instance, envision capturing data directly in DNA. As Church has stated, “I’m interested in making biological cameras that don’t have any electronic or mechanical components,” whereby information “goes straight into DNA.” According to Church, DNA recorders would capture audiovisual data automatically. “You could paint it up on walls, and if anything interesting happens, just scrape a little bit off and read it—it’s not that far off.” One day, we may even be able to record biological events in the body. In pursuit of this end, Church’s lab is working to develop an in vivo DNA recorder of neural activity, skipping electrodes entirely.
Perhaps the most compact, long-lasting, and universal storage mechanism at our fingertips, DNA offers us unprecedented applications in data storage, and perhaps even in computing.
As the cost of DNA data storage plummets and its speed rises, commercial user interfaces will become both critical and wildly profitable. Once corporations, startups, and individuals alike can easily save files, images, or even neural activity to DNA, opportunities for disruption abound. Imagine uploading files to the cloud and having them travel to encrypted DNA vials, as opposed to massive and inefficient silicon-enabled data centers. Corporations could have their own warehouses, and local data networks could allow for heightened cybersecurity, particularly for archives.
And since DNA lasts millennia without maintenance, forget the need to copy databases and power digital archives. As long as we’re human, regardless of technological advances and changes, DNA will remain relevant and readable for generations to come.
But perhaps the most exciting potential of DNA is its portability. If we were to send a single exabyte of data (one billion gigabytes) to Mars using silicon binary media, it would take five Falcon Heavy rockets and cost $486 million in freight alone.
With DNA, we would need five cubic centimeters.
At scale, DNA has the true potential to dematerialize entire space colonies’ worth of data. Throughout evolution, DNA has unlocked extraordinary possibilities, from bacteria to humans. Soon hosting near-limitless data in almost zero space, it may one day unlock many more.
A Data Storage Revolution? DNA Can Store Near Limitless Data in Almost Zero Space