1. Big Data in Genomics and Personalized Medicine – Challenges and Solutions
Gaurav Kaul
Software Architect, Intel
JAX London 2013
2. Agenda
Global Healthcare Trends
The Rise of Personalized Medicine
Big Data Scenarios in Healthcare
Methods to Manage Big Data
Use Cases
Summary and Next Steps
*Other names and brands may be claimed as the property of others
3. We are at an Inflection Point in Healthcare - TRENDS
[Map: % of population over age 60 in 2050, binned 0-9%, 10-19%, 20-24%, 25-29%, 30+%; WW average age 60+: 21%]
Source: United Nations “Population Aging 2002”
Healthcare costs are RISING - a significant % of GDP
Global AGING - average share of population aged 60+ growing from 10% to 21% by 2050
US Healthcare BIG DATA value: $300 billion in value/year; ~0.7% annual productivity growth
Sources: McKinsey Global Institute Analysis; ESG Research Report 2011 – North American Health Care Provider Market Size and Forecast
4. We are at an Inflection Point in Healthcare - TRENDS
Storage Growth
[Chart: total data for healthcare providers (PB), 0-15,000 PB over 2010-2015, by category: Admin, Imaging, EMR, Email, File, Non-Clinical Imaging, Research; medical imaging archive projection is a case from just one healthcare system]
Data explosion projected to reach 35 zettabytes by 2020, a 44-fold increase from 2009
Sources: McKinsey Global Institute Analysis; ESG Research Report 2011 – North American Health Care Provider Market Size and Forecast
7. Vision for Personalized Medicine
8. How can we take Personalized Medicine mainstream by 2020?
9. A “bioinformatics computing system” includes technologies from this entire “stack”
Software Frameworks
Applications
Programming Model (abstraction)
Virtualization
System Software and Resource Management
Computer Hardware, Storage and Networks
10. A “bioinformatics computing system” includes technologies from this entire “stack”
Software Frameworks
Applications
Programming Model (abstraction)
Virtualization
System Software and Resource Management
Computer Hardware, Storage and Networks
Multiple cores - shared memory, multiple threads, OpenMP
Multiple nodes - MPI; GAS, PGAS; Hadoop
Examples: galaxy.psu.edu; “Searching for SNPs with cloud computing,” Langmead, Schatz et al.
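The two parallelism models above differ mainly in where data lives; the multi-node model that Hadoop implements is a map/shuffle/reduce pipeline. The following is a minimal single-machine simulation of that model, counting variant records per chromosome; the tab-separated record format and all names are illustrative, not taken from the cited SNP-search work.

```python
# Minimal sketch of the MapReduce programming model (Hadoop-style),
# simulated on one machine: map -> shuffle (sort by key) -> reduce.
from itertools import groupby
from operator import itemgetter

def mapper(line):
    """Emit (chromosome, 1) for each hypothetical variant record."""
    chrom, pos, allele = line.rstrip("\n").split("\t")
    yield chrom, 1

def reducer(key, values):
    """Sum the counts collected for one chromosome."""
    return key, sum(values)

def run_job(lines):
    """Run the whole pipeline locally: map, sort by key, reduce per group."""
    mapped = [kv for line in lines for kv in mapper(line)]
    mapped.sort(key=itemgetter(0))  # stands in for Hadoop's shuffle phase
    return {k: reducer(k, (v for _, v in grp))[1]
            for k, grp in groupby(mapped, key=itemgetter(0))}

records = ["chr1\t12345\tA", "chr1\t67890\tT", "chr2\t11111\tG"]
print(run_job(records))  # {'chr1': 2, 'chr2': 1}
```

On a real cluster the mapper and reducer run on many nodes and the framework performs the shuffle, but the per-record logic is exactly this shape.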
12. Big Data – A Foundation For Delivering Big Value
Big Data Building Blocks: Compute, Network, Storage, Software & Technologies
Compute: Intel® Xeon® product family E3/E5/E7; Intel® Atom™; Xeon Phi™ - energy efficient, responsive
Network: Intel® Ethernet controllers; Ethernet adapters; Intel® Ethernet switch silicon; Intel® True Scale Fabric
Storage: intelligent storage1 - scale-out storage1 and scale-up storage1; Intel® SSD 710 series, DC S3700 (SATA); Intel® SSD 910 series (PCIe) - choice, high availability, secure
Software & technologies: Intel® Distribution for Apache Hadoop; Intel® Node Manager; Intel® Data Center Manager; Intel® Expressway Service Gateway; Intel® Cache Acceleration Software; Intel’s Lustre; Intel® VT and Intel® TXT; Intel® AES-NI
Intel’s foundational technologies offer advanced solutions for Big Data analytics
1 Xeon-based storage systems are available in a wide range of configuration options from the industry’s leading storage vendors
13. Big Data Compute Platform Optimizations
Intel® Xeon® E5 family:
Up to 8 cores; up to 20 MB cache
Up to 4 channels DDR3 1600 MHz memory
Integrated PCI Express* 3.0, up to 40 lanes per socket
SCALE-OUT with Hadoop and analytic/DW engines
Proof point: E5 analytics 25x improvement; Hadoop on E5
Intel® Xeon® E7 family (e.g. Xeon E7-4800):
Up to 10 cores; up to 30 MB cache
Up to 8 channels DDR3 1066 MHz memory
4 QPI 1.0 links for robust scalability
SCALE-UP in-memory analytic engines and databases: Oracle*, SAS*, SAP HANA*
Proof point: SAP HANA
[Diagram: per-socket block diagrams showing cores 1-10, QPI links 1-4, RAM and cache]
*Other names and brands may be claimed as the property of others
14. Big Data – A Foundation For Delivering Big Value
Intel® Ethernet Reduces Time to Process Large Data Sets
Trends and challenges: big data is hitting the enterprise with unprecedented volume, velocity, variety, complexity - and OPPORTUNITY
Intel® Ethernet solution: up to 20x performance boost over legacy infrastructure with optimizations on Intel® Xeon® processors, Intel® SSD storage, and 10Gb Intel® Ethernet networking1; 10 Gigabit Ethernet allows quicker import and export of large data sets for processing
Moving the data with 10GbE (2 ports 10GbE vs. 10 ports 1GbE, virtualized hosts behind a hypervisor):
Up to 80% reduction in cables & switch ports
Up to 15% reduction in infrastructure costs
Up to 2x improved bandwidth per server
1 http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/10gbe-10gbase-t-hadoop-clusters-paper.pdf
15. Big Data – A Foundation For Delivering Big Value
Intel® CAS with Intel® SSD Solution
Added as a cache layer, it accelerates Big Data workloads: 50x IOPS; 3x TPC-C; 20x TPC-H throughput performance
Performance near equal to replacing all hard drives with SSDs, at significantly lower cost
http://www.intel.com/content/www/us/en/mission-critical/mission-critical-scalability-oracle-intel-brief.html
16. Big Data – A Foundation For Delivering Big Value
Data Methods for the Right Data Structure
Unstructured data -> emerging technologies: MapReduce/Hive
Structured data -> relational database
Analytical paradigms -> EXALYTICS
17. Big Data – A Foundation For Delivering Big Value
Intel® Distribution for Apache Hadoop* & Tools
File-based encryption in HDFS; up to 20x faster decryption with AES-NI*
Role-based access control for Hadoop services
Up to 8.5x faster Hive queries using the HBase co-processor
Optimized for SSD with Cache Acceleration Software
Adaptive replication in HDFS and HBase
Integrated text search with Lucene
Simplified deployment & comprehensive monitoring
Deployment of HBase across multiple datacenters
Automated configuration with Intel® Active Tuner
Detailed profiling of Hadoop jobs
Simplified design of HBase schemas (+ in 2.4)
REST APIs for deployment and management (+ in 2.4)
HiTune (URL): instrumentation, aggregation engine, report engine, HiTune controller
HiBench (URL) workloads:
1 Micro benchmarks: Sort, WordCount, TeraSort
2 Web search: Nutch Indexing, PageRank
3 Machine learning: Bayesian Classification, K-Means Clustering
4 HDFS: Enhanced DFSIO
Result = many Hadoop optimization tips (IDF2012 presentation “Big Data Analytics on a Performance-optimized Hadoop Infrastructure”)
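The WordCount micro benchmark listed for HiBench is the canonical Hadoop job. As a hedged sketch of what its two phases compute, the following uses plain Python generators standing in for Hadoop's streaming interface (Hadoop delivers records to the reducer sorted by key, which `sorted()` emulates here):

```python
# WordCount in Hadoop-streaming style: the mapper emits "word<TAB>1"
# records, Hadoop sorts them by key, and the reducer sums consecutive
# records with the same key in a single pass.
def map_phase(lines):
    """Mapper: one 'word<TAB>1' record per token."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reduce_phase(sorted_records):
    """Reducer: sum counts for runs of identical keys (input is key-sorted)."""
    current, total = None, 0
    for rec in sorted_records:
        word, n = rec.rsplit("\t", 1)
        if word != current and current is not None:
            yield f"{current}\t{total}"
            total = 0
        current = word
        total += int(n)
    if current is not None:
        yield f"{current}\t{total}"

text = ["big data in genomics", "big data"]
print(list(reduce_phase(sorted(map_phase(text)))))
# ['big\t2', 'data\t2', 'genomics\t1', 'in\t1']
```

Benchmarks like TeraSort stress the same shuffle/sort machinery with much larger keys and values, which is why they expose network and disk bottlenecks.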
18. Life Sciences 2013: Key Industry Challenges and Solutions
Many (most) applications are single-threaded, single address space -> Intel is delivering optimizations, working with the open source community, and developing NGS+HPC curriculum
Some algorithms scale quadratically with the size of the problem; large data sets exceed available memory and storage -> innovations in acceleration, compute, storage, networking, security, and *-as-a-service
International collaboration is an imperative; bioinformatics expertise is scarce -> Intel is working closely with the ecosystem to address enterprise-to-cloud transmission of terabyte payloads
Databases are distributed, data is siloed and will likely stay that way -> need for balanced compute infrastructure
19. Examples of Intel®-powered Servers in Big Data and Analytics
Cisco* UCS Server1 - Intel® Xeon® 5600: Cisco UCS server with EMC Greenplum MR software, an “enterprise-class” Hadoop* distribution that features technology from MapR
Dell* PowerEdge* C Series2 - Intel Xeon 5500/5600: the Dell | Cloudera* solution for Apache* Hadoop, sold pre-configured
Oracle* Sun Fire* server3 - Intel Xeon E7-4800: Oracle Exalytics* In-Memory Machine, featuring the Oracle BI Foundation Suite and Oracle TimesTen In-Memory Database for Exalytics
1 http://gigaom.com/cloud/ciscos-servers-now-tuned-for-hadoop/
2 http://www.businesswire.com/news/home/20110804005376/en/Dell-Cloudera-Collaborate-Enable-Large-Scale-Data
3 http://www.itp.net/mobile/588145-oracle-unveils-exalytics-in-memory-machine
INTEL CONFIDENTIAL
20. Solution 4.0 – NGS Appliances
Configuration 1: 16 cores, 96 GB RAM, 18 TB redundant storage, SSD for OS
Configuration 2: 32 cores, 1.2 TFlops, 18-56 TB RAID
NSS-HA pair; NSS user data; HSS metadata pair; HSS OSS pair; HSS user data; 2U plenum
Actual placement in racks may vary.
Scale through independent solutions, each targeting a different segment & usage model
21. NGS Appliance
Dell Scalable Unit “SANGER”
Infrastructure: Dell PE, PC & F10
Dell NSS (NFS), up to 180 TB: NSS-HA pair, NSS user data
Dell HSS (Lustre), up to 360 TB: HSS metadata pair, HSS OSS pair, HSS user data
M420 (compute), up to 32 nodes; 2U plenum
Challenge: Experiment processing takes 7 days with current infrastructure, delaying treatment for sick patients
Solution: Dell Next Generation Sequencing Appliance
• 9 teraflops of Sandy Bridge processors
• Lustre file storage
• Intel SW tools and engineers
Benefits: RNA-Seq processing reduced to 4 hours
Single-rack solution; actual placement in racks may vary.
Includes everything you need for NGS: compute, storage, software, networking, infrastructure, installation, deployment, training, service & support
23. Use Case: NEXTBIO
Analytics for Genomics Data
• Cost to sequence a genome has fallen by 800x in the last 4 years
• Each genome has ~4 million variants
• Growth in the genomics data in the public and private domain
• Data available in a variety of sources - structured, semi-structured, unstructured
• New aggregated data growing exponentially
Pipeline: Sequencing (3 billion base pairs) -> Data Processing (millions of variants) -> Cloud Storage -> Visualization -> Interpretation & Analytics (millions of variants, millions of patients) -> Commercializing (targeted therapeutics, companion diagnostics, actionable biomarkers)
24. Data-Intensive Discovery: Genomics
Value: enable researchers to discover biomarkers and drug targets by correlating genomic data sets; 90% gain in throughput; 6x data compression
Analytics: provide curated data sets with pre-computed analysis (classification, correlation, biomarkers); provide APIs for applications to combine and analyze public and private data sets
Data management: use Hive and Hadoop for query and search; dynamically partition and scale HBase; Intel Distribution
Infrastructure: 10-node cluster / Intel Xeon E5 processors; 10GbE network
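One common way to realize "dynamically partition and scale HBase" for an ever-growing variant store is to salt row keys so that sequential genomic positions spread evenly across pre-split regions instead of hammering one hot region. This is a generic illustration of the salting technique, not NextBio's actual schema; the key format and bucket count are assumptions.

```python
# Salted HBase-style row keys: a stable hash-derived prefix distributes
# writes for consecutive positions across N_BUCKETS pre-split regions.
import hashlib

N_BUCKETS = 16  # assumed number of pre-split regions

def salted_key(chrom, pos):
    """Build 'bucket|chrom:pos' where bucket is a stable 2-hex-digit prefix."""
    base = f"{chrom}:{pos:010d}"  # zero-pad so keys sort numerically
    bucket = int(hashlib.md5(base.encode()).hexdigest(), 16) % N_BUCKETS
    return f"{bucket:02x}|{base}"

# Consecutive positions land in different buckets, avoiding a hot region:
for p in range(100, 104):
    print(salted_key("chr1", p))
```

The trade-off is that range scans must now fan out over all buckets, which is acceptable for the point-lookup-heavy, write-once workload this slide describes.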
25. Use Case: NEXTBIO
NextBio & Intel Collaboration
Technical challenge:
Immutable data - write once, never change, read many times
Traditional Bloom filters work; Hadoop & HBase well suited
1 genome: 10 million rows
100 genomes: 1 billion rows
1M genomes: 10 trillion rows
100M genomes: 1 quadrillion (1,000,000,000,000,000) rows
App can dynamically partition HBase as data size grows
Intel optimizations for Hadoop:
Optimized Hadoop stack in open source
Stabilized HBase to provide reliable, scalable deployment
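Bloom filters suit the immutable, write-once data described on this slide: insertions are cheap, deletions are never needed, and membership tests never return false negatives, so a row lookup can be skipped whenever the filter says "absent". A minimal sketch follows; the bit-array size, hash count, and variant-key format are illustrative, not production sizing.

```python
# Minimal Bloom filter: k hash positions per item over an m-bit array.
# No false negatives; false-positive rate is tunable via m and k.
import hashlib

class BloomFilter:
    def __init__(self, n_bits=1 << 20, n_hashes=7):
        self.n_bits, self.n_hashes = n_bits, n_hashes
        self.bits = bytearray(n_bits // 8)

    def _positions(self, item):
        """Derive n_hashes bit positions from salted SHA-256 digests."""
        for i in range(self.n_hashes):
            h = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.n_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))

bf = BloomFilter()
bf.add("chr1:12345:A>T")               # hypothetical variant key
print("chr1:12345:A>T" in bf)          # True: never a false negative
print("chr2:99999:G>C" in bf)          # False (false positives possible but rare)
```

HBase ships its own per-store-file Bloom filters; this sketch only shows why the data structure fits write-once row data so well.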
26. Putting it together ...
Software Frameworks
Applications
Programming Model (abstraction)
Virtualization
System Software and Resource Management
Computer Hardware, Storage and Networks
27. Summary
• Enabling an ecosystem of partners to innovate and make the Personalized Medicine vision a reality
• Delivering hardware-enhanced capabilities and software to deploy Personalized Medicine
• Working with Big Data vendors to onboard an increasing number of life science workloads to Hadoop and other analytics technologies
Our main building blocks consist of:

Server: The Xeon family of processors consists of the E3, E5 and E7 product lines, which offer different combinations of capabilities and price points for different workloads. The upcoming Intel MIC (Many Integrated Core) processor is targeted primarily at the portion of the HPC market that values maximum parallel processing density, such as …. And our Atom line aims at the low-cost, low-power, ultra-dense microserver market, where node density is paramount.

Networking: Intel has the industry’s #1 selling 1GbE and 10GbE adapters and silicon, and also offers a family of industry-leading, low-latency 10GbE/40GbE switch silicon products.

Storage: One of the biggest trends in storage is the increasing use of compute within the storage box to reduce latencies and also provide lower overall cost/GB of storage through more efficient storage. For large data sets and those storage workloads requiring the lowest latencies, Xeon is the industry choice. Xeon provides the compute capability in over 80% of the storage market. And Intel enterprise SSDs are designed for the demanding performance and endurance needs of the datacenter.

Software and other technologies: We are developing strong open-source components such as our Intel Distribution of Hadoop. Intel Data Center Manager enables better power management at the server, rack and datacenter level. Advanced RAS (reliability, availability and serviceability) features ensure high levels of system resiliency and availability. And Intel’s heavy investment in industry enabling ensures these become available in the widest choice of systems. The most popular are general-purpose systems, but many of our partners innovate further to create highly workload-optimized platforms and converged architecture systems. The greater level of bundling and integration in these systems allows for simpler and faster deployments and ongoing maintenance.

Now let’s look at the specific building blocks...
Field note: There are a few hyperlinks on this presentation in the blue boxes. The first link in E5 leads to a solution showing a 25x increase in data analytics running on Intel architecture, which shows the capability of the new Xeon E5 processor family, using AVX technology and a variety of other performance optimizations from IBM. The second link in E5 leads to a solution brief highlighting how Intel® Xeon® E5 processor based servers running Hadoop are at least three times faster than the previous solution. They can load, sort, and perform their data analyses faster, and Intel® Hyper-Threading Technology really helps with Hadoop workloads. The link in the E7 proof point is focused on a scale-up in-memory analytics solution, SAP HANA, running on Intel’s Xeon E7 processor family. All these proof points help the customer understand the power and variability of our processor solutions for Big Data.

Key points: Significant performance gains delivered by features such as new Intel® Advanced Vector Extensions and improved Intel® Turbo Boost Technology 2.0. To improve flexibility and operational efficiency, significant improvements in I/O with new Intel® Integrated I/O, which reduces latency ~30% while adding more lanes and higher bandwidth with support for PCI Express 3.0.

Story: To meet the growing demands of IT, such as readiness for cloud computing, the growth in users and the ability to tackle the most complex technical problems, Intel has focused on increasing the capabilities of the processor that lies at the heart of a next generation data center. The Intel Xeon processor E5-2600 product family is the next generation Xeon processor that replaces platforms based on the Intel Xeon processor 5600 & 5500 series. Continuing to build on the success of the Xeon 5600, the E5-2600 product family has increased core count and cache size in addition to supporting more efficient instructions with Intel® Advanced Vector Extensions, to deliver up to an average of 80% more performance across a range of workloads. These processors will offer better than ever performance no matter what your constraint is - floor space, power or budget - and on workloads that range from the most complicated scientific exploration to simple, yet crucial, web serving and infrastructure applications. In addition to the raw performance gains, we’ve invested in improved I/O with Intel Integrated I/O, which reduces latency ~30% while adding more lanes and higher bandwidth with support for PCIe 3.0. This helps to reduce network and storage bottlenecks to unleash the performance capabilities of the latest Xeon processor. The Intel® Xeon® processor E5-2600 product family: versatile processors at the heart of today’s data center. Let’s look at just what kind of performance these products are capable of...

Legal info: Configuration for 80% claim: Source: Performance comparison using best submitted/published 2-socket server results on the SPECfp*_rate_base2006 benchmark as of 6 March 2012. Baseline score of 271 published by Itautec on the Servidor Itautec MX203* and Servidor Itautec MX223* platforms based on the prior generation Intel® Xeon® processor X5690. New score of 492 submitted for publication by Dell on the PowerEdge T620 platform and Fujitsu on the PRIMERGY RX300 S7* platform based on the Intel® Xeon® processor E5-2690. For additional details, please visit www.spec.org. Intel does not control or audit the design or implementation of third party benchmark data or Web sites referenced in this document. Intel encourages all of its customers to visit the referenced Web sites or others where similar performance benchmark data are reported and confirm whether the referenced benchmark data are accurate and reflect performance of systems available for purchase.

Configuration for latency reduction: Source: Intel internal measurements of average time for an I/O device read to local system memory under idle conditions comparing the Intel® Xeon® processor E5-2600 product family (230 ns) vs. the Intel® Xeon® processor 5500 series (340 ns). Baseline configuration: Green City system with two Intel® Xeon® processor E5520 (2.26 GHz, 4C), 12 GB memory @ 1333, C-states disabled, Turbo disabled, SMT disabled. New configuration: Meridian system with two Intel® Xeon® processor E5-2665 (2.4 GHz, 8C), 32 GB memory @ 1600 MHz, C-states enabled, Turbo enabled. The measurements were taken with a LeCroy* PCIe* protocol analyzer using Intel internal Rubicon (PCIe* 2.0) and Florin (PCIe* 3.0) test cards running under Windows* 2008 R2 w/SP1.
Field note: There is a link to a proof point on this slide. Intel IT has a whitepaper on the performance benefits of 10GbE on Apache Hadoop. This whitepaper is at our Intel IT Resource Center, which is useful in many ways for your customer. We would recommend pointing the customer to this site for answers to a variety of questions and configurations.

Up to 20x performance boost over legacy infrastructure with optimizations on Intel® Xeon processors, SSD storage, and 10GbE networking. 10 Gigabit Ethernet (10GbE) networks allow you to quickly import large data sets for processing in multiple locations.

Network: 10 Gigabit Ethernet (10GbE) networking demonstrates its value in the form of high levels of network utilization in the Hadoop cluster. The full use of greater bandwidth can reduce time to ingest and to export data by 80 percent. Moreover, the cost per gigabit of bandwidth with 10GbE is now much lower than 1GbE, making it a natural choice for big data. Much of the performance gain from the underlying hardware requires deep optimization in the software as well as careful tuning of Hadoop configuration parameters. The Intel Distribution is optimized with the latest Intel® processor, storage, and networking hardware components to ensure that the platform delivers balanced performance for the widest range of use cases.

The need for a balanced system: Hadoop is designed and optimized for commonly available hardware. The pace of server innovation has continued unabated for many years, and mainstream systems now deliver massive processing power. To keep pace with that capability, it is vital to deploy Hadoop in the environment it was designed for, one that is balanced between compute, storage, and networking. Hadoop* is increasingly popular for processing big data. Dramatic improvements in mainstream compute and storage resources help make Hadoop clusters viable for most organizations. But to provide a balanced system, those building blocks must be complemented by 10 Gigabit Ethernet (10GbE), rather than legacy Gigabit Ethernet (GbE) networking. This study found success by building on a 10GBASE-T foundation that combines Arista switches, Intel® Ethernet 10 Gigabit Converged Network Adapters, and Intel® Xeon® processor based servers. In the area of networking for this balanced system, the performance of Gigabit Ethernet (GbE) implementations for Hadoop has been a major limiting factor to overall performance. Using the large block size means that, for example, when a packet is dropped and retransmitted, the system needs to handle a large piece of data, which strains network bandwidth in a GbE environment. 10 Gigabit Ethernet (10GbE) networking proves its value in Hadoop clusters through high observed levels of network utilization, demonstrating the benefit of the higher bandwidth.

4x increase in write performance: Hadoop* PUT operation completed in 80 percent less time using 10 Gigabit Ethernet, compared to Gigabit Ethernet.
Field note: There is a link to Intel CAS throughput performance data in the backup of this presentation. There is also a link to a proof point for performance of an SSD on Oracle TimesTen using Intel SSDs. This is a useful whitepaper that shows how adding SSDs to a system configuration saves in both hardware acquisition and software license costs that pay many times over for the initial investment.

There are a variety of new opportunities for solid state disk technologies in the enterprise, and this is enhanced by our new Intel CAS software. Intel Solid State drives come in a variety of form factors, and have enterprise-class levels of reliability along with capacities that are near those of fast rotating media. They can be used as a direct replacement for rotating media. For high-performance needs in the datacenter, Intel SSDs are a great solution that will likely pay for themselves in a short time. We have a pointer to an example that uses Oracle TimesTen if you’re interested in further information or examples. For some applications, adding the Intel Cache Acceleration Software (Intel CAS) solution enables an SSD to act as a local buffer for data on rotating media in the server. This enables you to add in a minimum of cost and get performance at near-SSD levels for all your data, which is a good hybrid solution for cost-conscious deployments. We can look at the performance data in backup if you’re interested.
Key message: Whatever the solution, Intel is actively working with partners to optimize solutions for analyzing the huge variety of data, providing new insight models, and delivering real-time or near real-time information services. Intel is at the core of Big Data across provisioning models and in understanding the right data methods for the right data structure. In the last 24 months there has been more innovation in the DB product market than at any time in the last 10 years. While locality and distribution of compute, storage and IO platforms may vary, Intel has been actively working to optimize its technology portfolio within relational databases, emerging technologies and the analytical engines that are commercially available.
While Intel has started doing work in the area of Big Data with a distribution of Apache Hadoop, you should not assume that this will be the only thing we plan to do. It’s useful to look at what we’re doing and understand the type of capability we can bring to your company with our optimized tools. We are currently focusing our IDH efforts on adding key functionality that we can uniquely provide. For instance, we have added AES-NI support to the distribution, which makes encryption of the data set up to 20x faster. In other words, you have the capability to encrypt your data “for free” in terms of performance, making your data secure without penalty. We are also using our Intel CAS software to optimize data acquisition, and we are adding a variety of other features. Many of these features will be checked back into the Apache open source, providing benefit. If you have interest in understanding our Hadoop roadmap, we would be happy to set up a more detailed meeting with our team to give you details.

Note to field: There is an additional slide in the backup on the Intel Lustre file system distribution, another example of where Intel is contributing to Big Data, specifically in the area of open-source file systems for better performance.

Intel tools for Apache Hadoop, for getting under the hood of Hadoop for tuning & insight:
HiTune: monitors key performance metrics on each server in the cluster, then aggregates/correlates these low-level indicators with high-level data flow models, providing insight into performance bottlenecks, hardware problems, application hot spots and more.
HiBench: measure, validate & compare performance of Hadoop clusters across a variety of workloads. Cluster performance can be measured for specific/common tasks such as sorting, word counting, web searching and data analytics.
Distributed Hadoop environments can be challenging to fine-tune because of the way the framework handles data partitioning, load balancing, fault tolerance, and other low-level operations that Hadoop structures automatically. Intel recently introduced two open-source tools, HiBench and HiTune, to help optimize Hadoop clusters for faster analytics.
Many (most) applications are single threaded; many (most) applications are written for a single address space. NGS-size data quickly pushes 1) and 2) beyond the capacity of a single node: we need multiple threads and a large memory footprint.

Some algorithms (SW, i.e. Smith-Waterman, as an example) scale quadratically with the size of the problem, motivating algorithmic substitution or hardware acceleration.

Cloud: Building in house means capital equipment investment, DC operating costs, and fixed capacity for growing workloads. Building in the cloud offers elastic hourly capacity expansion, but brings challenges around management, ease of use, and data movement. How best to leverage cloud resources in an HPC business process?

As a service: Working subsets are growing too large to fit into available memory. Mapping/aligning with BW (Burrows-Wheeler) and assembly with De Bruijn graphs are good examples, motivating algorithmic innovations and novel approaches to large-memory computers. The amount of data barely fits into currently available disk space (and soon might not).

Databases are distributed and will likely stay that way, motivating much talk of “bringing the computing to the data”, of preprocessing for downstream upload, etc.
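The Smith-Waterman example of quadratic scaling fills an (m+1) x (n+1) dynamic-programming matrix, so doubling both sequence lengths quadruples time and memory; that cost is exactly what motivates the substitution and acceleration mentioned in the notes. A minimal score-only sketch, with illustrative scoring parameters:

```python
# Minimal Smith-Waterman local alignment score, showing the O(m*n)
# time and memory behavior. Match/mismatch/gap values are illustrative.
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # (m+1) x (n+1) DP matrix
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch)
            # Local alignment: scores are floored at 0 so an alignment
            # can start anywhere.
            H[i][j] = max(0, diag, H[i-1][j] + gap, H[i][j-1] + gap)
            best = max(best, H[i][j])
    return best

print(smith_waterman("ACGTT", "ACG"))  # 6: the "ACG" prefix matches, 3 * 2
```

Production aligners avoid paying this cost on full genomes by seeding with index lookups (e.g. Burrows-Wheeler based) and running the quadratic step only on short candidate regions, or by vectorizing the inner loop in hardware.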
Cisco* UCS Server1, Intel® Xeon® 5600. Dell* PowerEdge* C Series2, Intel Xeon 5500/5600: the Dell | Cloudera* solution for Apache* Hadoop combines Dell servers and networking components with Cloudera’s Distribution Including Apache Hadoop (CDH), as well as management tools, training, technology support and professional services, to give customers a single source to deploy, manage, and scale a comprehensive Apache Hadoop-based stack. Oracle* Sun Fire* server3, Intel Xeon E7-4800: Oracle Exalytics* In-Memory Machine features the Oracle BI Foundation Suite and Oracle TimesTen In-Memory Database for Exalytics, enhanced for an Oracle server designed for in-memory analytics. Contains 1 terabyte of RAM, 40 Gb/s InfiniBand and 10 Gb/s Ethernet connectivity, and Integrated Lights Out Management.
IMS Demo Unit provided to BioTeam, configured with:
3 blades, each with dual 5650 CPUs, 24 GB of RAM & 4 GbE NICs
Dual Ethernet switches
7 x 600 GB Intel 320 Series SSD drives
Turnkey solution: MiniLIMS + local analysis engine
Plan is to link to cloud resources: automatic backup & link to hosted MiniLIMS
Will ship with Ion Torrent initially
Solution for any lab needing LIMS
Cost to sequence the full genome will soon reach $1000.
http://www.youtube.com/watch?v=F27BvqqNcY4