2-40 Exabytes

“Projections of storage requirements for sequence data depend on the accuracy and application of the sequencing. For every 3 billion bases of human genome sequence, 30-fold more data (~100 gigabases) must be collected because of errors in sequencing, base calling, and genome alignment. This means that as much as 2–40 exabytes of storage capacity will be needed by 2025 just for the human genomes.”

- http://journals.plos.org
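
The oversampling arithmetic in the quote can be checked in a few lines. This is a minimal sketch using only the two figures quoted above (3 billion bases, 30-fold coverage); the names `GENOME_BASES`, `COVERAGE`, and `raw_bases` are hypothetical and chosen for illustration.

```python
# Figures taken from the quote above; names are illustrative only.
GENOME_BASES = 3_000_000_000  # bases in one human genome
COVERAGE = 30                 # 30-fold oversampling to absorb errors


def raw_bases(genome_bases: int = GENOME_BASES, coverage: int = COVERAGE) -> int:
    """Return the number of bases that must be collected for one genome."""
    return genome_bases * coverage


# Express the result in gigabases (1 gigabase = 1e9 bases).
print(f"{raw_bases() / 1e9:.0f} gigabases per genome")
```

The product is 90 gigabases, which the quote rounds to roughly 100 gigabases.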

Genomic sequencing is expensive

  • Sequenced data needs to be verifiable (re-running a sequence is time-consuming)

  • Large data sets can be expensive to maintain & distribute

  • Global research collaboration requires CDN-like capabilities

Data persistence

  • Most medical archives need to exist for an extended period of time

  • PACS, X-rays, and other data typically have a required retention period

Scaling to meet the demands of sequencing growth

  • Exabytes of data expected to be generated

  • Difficult to scale traditional enterprise solutions to meet growth

Life Sciences and Genomics

100 GB

Roughly the amount of disk space needed to store the data from sequencing a single human genome with high accuracy. Sequencing a large population scales this requirement linearly: the capacity needed is the population multiplied by about 100 GB.

Atlanta, GA

Population: 484,044

Capacity Needed: ~49 PB

New York City, NY

Population: 8,398,748

Capacity Needed: ~840 PB
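
The city figures above follow from multiplying each population by the roughly 100 GB needed per genome. The sketch below reproduces that estimate under two assumptions that are not stated in the source: one genome per resident, and decimal units (1 PB = 1,000,000 GB). The function name `capacity_pb` and the constants are hypothetical.

```python
# Per-genome figure from the text; unit conversion is an assumption (decimal units).
GB_PER_GENOME = 100
GB_PER_PB = 1_000_000


def capacity_pb(population: int, gb_per_genome: float = GB_PER_GENOME) -> float:
    """Estimate storage in petabytes for one genome per person."""
    return population * gb_per_genome / GB_PER_PB


# Populations as listed above.
cities = {"Atlanta, GA": 484_044, "New York City, NY": 8_398_748}

for city, population in cities.items():
    print(f"{city}: {capacity_pb(population):,.1f} PB")
```

This gives about 48.4 PB for Atlanta and 839.9 PB for New York City, close to the rounded figures shown. Because the estimate is a simple product, doubling the population doubles the capacity needed.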


Public Archive