Slow I/O and downtime impacted the run times of the University of California, Santa Cruz's Genome Browser search tool, which scientists use in their work to solve questions of the post-genomic era. UCSC was searching for a storage solution that delivered high-performance random I/O to an exceptionally large number of cluster nodes and that would allow researchers to focus solely on their tests instead of the systems running them.
UCSC's Biomolecular Department Eliminates I/O Bottleneck with Panasas
Customer Success Story
University of California, Santa Cruz

“The Panasas Storage system has reduced run times by over 40 hours.”
Robert Baertsch, Research Assistant, UCSC

SUMMARY
Industry: Life Sciences

THE CHALLENGE
Slow I/O and downtime impacted the run times of their Genome Browser search tool used by scientists in their work to solve questions of the post-genomic era. They were searching for a storage solution that delivered high performance random I/O to an exceptionally large number of cluster nodes and one that would allow them to focus solely on their tests instead of the systems running them.

THE SOLUTION
The fully integrated software/hardware solution included the Panasas® Operating Environment and the PanFS™ parallel file system with the Panasas DirectFLOW® protocol.

THE RESULT
• Exceptional I/O Performance
• A single namespace for simplified cluster management
• Maximized ROI from their clustered computing environment

The Center for Biomolecular Sciences and Engineering at University of California, Santa Cruz (UCSC) launches interdisciplinary research and academic programs that address the scientific questions of the post-genomic era. The Center uses computational, mathematical, and statistical approaches to probe and analyze biological data, from DNA to biological processes to healthcare systems. One of the Center’s major projects is the UCSC Genome Browser, a web-based tool that allows researchers to view all 23 chromosomes of the human genome from as large as a full chromosome down to an individual nucleotide. The UCSC Genome Browser integrates the work of numerous scientists in laboratories worldwide and includes work generated at UCSC in an interactive, graphical display.

The Challenge
The UCSC Genome Browser leverages extremely fast search software that runs on the KiloKluster, a second-generation 1000+ node bioinformatics Linux cluster. It enables researchers to match any DNA sequence to the human genome in seconds and maps experimental data to the reference sequence. In order to process the Browser’s huge quantity of data, the Center searched for a storage solution that delivered high performance random I/O to a large number of cluster nodes. “Our KiloKluster really taxes the capabilities of a storage system,” said Robert Baertsch, Research Assistant at UCSC. “To be fully effective, a storage solution in our environment needs to be able to deliver exceptional performance with a large number of cluster nodes.”

UCSC’s system needed to scale in performance as well as capacity. As a result, the Center searched for a solution that had the potential to scale as a large single pool of data. Similar to many universities, the UCSC researchers work on several complex projects at any one time. The ability to focus solely on their tests, instead of the systems that were running them, was critically important. “Many of our programs would take years to run on a single CPU. Having a cluster with many nodes shortens run time to days or even hours making our research possible,” said Baertsch. “Slow I/O and downtime can really impact overall run times and it is critical to have a storage system that can scale to thousands of nodes with a single system image.” Finally, price is always a major consideration for the Center. A fundamental requirement is to deliver a compelling price point.

The Solution
The Center conducted an extended evaluation process including detailed testing with many high performance network-attached and direct-attached storage solutions. After thorough testing, the Panasas® Storage solution was selected for its ability to deliver high-speed random I/O performance in a large cluster environment, simplified management through a scalable, shared pool of storage and exceptional value. The Panasas Storage system is now connected to the KiloKluster
1-888-panasas www.panasas.com