2-40 Exabytes
“Projections of storage requirements for sequence data depend on the accuracy and application of the sequencing. For every 3 billion bases of human genome sequence, 30-fold more data (~100 gigabases) must be collected because of errors in sequencing, base calling, and genome alignment. This means that as much as 2–40 exabytes of storage capacity will be needed by 2025 just for the human genomes.”
- http://journals.plos.org
Genomic sequencing is expensive
Sequenced data needs to be verifiable (re-running a sequence is time-consuming)
Large data sets can be expensive to maintain & distribute
Global research collaboration requires CDN-like capabilities
Data persistence
Most medical archives need to exist for an extended period of time
PACS images, X-rays, and other medical data typically have a required retention period
Scaling to meet the demands of sequencing growth
Exabytes of data expected to be generated
Difficult to scale traditional enterprise solutions to meet growth
Life Sciences and Genomics
100 GBs
Roughly the amount of disk needed to store a single human genome sequenced with high accuracy. If we were to sequence a large population, the amount of disk needed grows linearly with the number of people sequenced.
Atlanta, GA
Population: 484,044
Capacity Needed: ~49 PBs
New York City, NY
Population: 8,398,748
Capacity Needed: ~840 PBs
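The capacity figures above can be reproduced with a short calculation: population multiplied by the ~100 GB per-genome footprint. The sketch below is illustrative only. It assumes decimal units (1 PB = 1,000,000 GB), one genome per resident, and rounding up to the next whole petabyte; the function name `capacity_pb` and the constants are hypothetical names chosen for this example, not part of any tool.

```python
import math

# Assumption from the slide: ~100 GB of storage per high-accuracy human genome.
GB_PER_GENOME = 100
# Assumption: decimal units, 1 PB = 1,000,000 GB.
GB_PER_PB = 1_000_000


def capacity_pb(population: int, gb_per_genome: int = GB_PER_GENOME) -> float:
    """Storage in PB needed to hold one genome per person (hypothetical helper)."""
    return population * gb_per_genome / GB_PER_PB


# Population figures as quoted on the slides.
cities = {
    "Atlanta, GA": 484_044,
    "New York City, NY": 8_398_748,
}

for name, population in cities.items():
    pb = capacity_pb(population)
    # Round up, since capacity has to cover the full requirement.
    print(f"{name}: {pb:.1f} PB (rounded up: ~{math.ceil(pb)} PB)")
```

Under these assumptions the rounded-up results match the slide values of ~49 PB and ~840 PB. Replication, compression, and intermediate files are not modeled here and would change the real-world numbers.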