WANdisco and Hadoop:
The Future of Big Data
     December 11, 2012
• WANdisco: Wide Area Network Distributed Computing
• Patented technology for active-active replication
• Leader in tools for software engineers (Subversion)
• No venture capital, angel investors or private equity funding
• Listed on the London Stock Exchange on June 1, 2012 in a highly
  successful IPO (LSE:WAND)
• Offices in San Ramon (CA), Boston (MA), Sheffield (UK), Belfast
  (UK), Chengdu (China), Tokyo (Japan)




[Diagram: WANdisco Technology compared with the Traditional Approach]
"unlike conventional solutions, the multi-site
computing system architecture does not rely on a central
transaction coordinator that is known to be a single-point-of-failure."
“Big Data is the new definitive source of competitive advantage across all
industries. For those organizations that understand and embrace the new reality
of Big Data, the possibilities for new innovation, improved agility, and
increased profitability are nearly endless”
- Wikibon
• Fixing specific problems:
    • Easy to use appliance
    • High availability
    • Disaster recovery / zero time
      to recovery (over a WAN)
• Highly differentiated
    • Nobody else can do
      active-active replication
      over a WAN




Dr. Konstantin Shvachko
•   Co-founder of AltoStor, acquired by WANdisco
•   Was part of the team that invented Hadoop at Yahoo in 2006 and went on to
    become the Principal Big Data architect at eBay
•   Credited with the creation and maintenance of the Hadoop Distributed File
    System (HDFS), which is at the very core of both Hadoop and any replication
    solution for Hadoop
Jagane Sundar
•   Co-founder of AltoStor, acquired by WANdisco
•   Was responsible for conceiving, architecting and managing the development
    of AltoScale’s Hadoop As A Service platform before selling it to VertiCloud
•   Visionary behind AltoStor’s Cloud and Big Data Storage Appliance
•   Former Director of Hadoop Engineering at Yahoo! and managed the
    development of Hadoop 0.20.204 with Disk Fail In Place
Complementary IP and skills
•   Ideal fit with our patented active-active replication technology
•   AltoStor founders faced the problems (scaling, performance and high
    availability) that we are planning to solve
• 24-by-7 Reliability, Availability, Scalability and Performance
• Planned as well as unplanned outages are extremely expensive
• Steep learning curve and dearth of trained specialists
• Many enterprises forced to rely on public cloud options such as Amazon
    •   Expensive hourly billing models
    •   Vendor lock-in with difficult migration paths
    •   Periodic availability and performance problems
    •   Data security concerns with cloud-based deployments
• Moving from batch model to real-time transaction model
• Our product suite will be designed to meet all of these challenges



• Plug-and-play pre-packaged software eliminates the need for
  specialized Hadoop skills
• Wizard-based deployment, monitoring and management
• Supports migration from Amazon to private in-house clouds
• S3-enabled filesystem unique to WANdisco’s AltoStor appliance
   • Allows searches for any kind of data (images, videos, etc.) based
      on descriptive characteristics
• HBase support for real-time transaction processing




[Diagram: three NameNodes above HDFS Data]
• Works over a LAN within a single data center
• Works over a WAN across data centers thousands of miles apart
• Supports simultaneous read and write access on every server

[Diagram: two NameNodes above HDFS Data]
[Diagram: WANdisco AltoStor deployment architecture, with Steps 1, 2 and 3 marked. Labels:]
• Applications: Public Cloud S3 Apps for the private cloud (e.g. JungleDisk,
  SmugMug, Senduit, Zmanda, etc.), HBase Apps, Traditional Hadoop M/R Apps
• AltoStor Hadoop: S3 API, HBase API, HDFS API, JobTracker
• WANdisco AltoStor Appliance, Hadoop Mgmt Server
    • Deploy Hadoop(s)
    • Manage Hadoop
    • Monitor Hadoop
• Enterprise Active Directory
• Physical (e.g. rack of Dell servers) or Virtual Infrastructure (e.g. VMware VI)
  for WANdisco AltoStor to use in building Hadoop
• WANdisco AltoStor appliance has the capability to deploy on virtualized
  infrastructure such as VMware
• Advantages:
    • Extra level of reliability
    • Elasticity – live cluster shrinking and expansion
    • Extremely high hardware resource utilization
    • Resource isolation
    • Ease of management – preconfigured VMs




[Diagram: HDFS Clients and an HDFS Cluster of DataNodes, with an Active NameNode and a Standby NameNode over Shared Storage]
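The active/standby arrangement in this diagram can be illustrated with a small sketch. It is a toy model under stated assumptions, not HDFS code: `HaNameNode`, `failover` and `demo_failover` are hypothetical names, and a plain Python list stands in for the shared storage. It shows only the basic idea that one NameNode accepts writes at a time while the standby catches up from the shared edits before taking over.

```python
class HaNameNode:
    """Hypothetical NameNode in an active/standby pair (illustration only)."""

    def __init__(self, name, shared_edits, active=False):
        self.name = name
        self.shared_edits = shared_edits  # list standing in for shared storage
        self.active = active
        self.namespace = set()
        self.applied = 0  # number of shared edits already applied locally

    def write(self, path):
        # Only the active NameNode may record a change.
        if not self.active:
            raise RuntimeError(f"{self.name} is standby and rejects writes")
        self.shared_edits.append(path)
        self.catch_up()

    def catch_up(self):
        # Apply any edits in shared storage that this node has not seen yet.
        for path in self.shared_edits[self.applied:]:
            self.namespace.add(path)
        self.applied = len(self.shared_edits)


def failover(old_active, standby):
    """Demote the old active node and promote the standby."""
    old_active.active = False
    standby.catch_up()
    standby.active = True
    return standby


def demo_failover():
    edits = []
    nn1 = HaNameNode("nn1", edits, active=True)
    nn2 = HaNameNode("nn2", edits)
    nn1.write("/a")
    try:
        nn2.write("/b")  # the standby refuses writes
        rejected = False
    except RuntimeError:
        rejected = True
    new_active = failover(nn1, nn2)
    new_active.write("/c")
    return rejected, sorted(new_active.namespace)
```

Calling `demo_failover()` shows the standby rejecting a write while the first node is active, then serving writes after promotion with the earlier edit already applied.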



[Diagram: three Clients, each connected to an Active NameNode; each Active NameNode has a Proposal Handler and a Dispatch step connected to WANdisco PAXOS]
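The flow in this diagram (client, active NameNode, proposal handler, agreement, dispatch) can be sketched as a toy replicated state machine. This is a minimal sketch under loud assumptions: it does not implement Paxos and is not WANdisco's code. A single in-process `AgreementLog` stands in for the agreement layer, and every class and function name is hypothetical. It shows only why applying agreed operations in one global order leaves every active node with the same namespace.

```python
from dataclasses import dataclass, field


@dataclass
class AgreementLog:
    """Stand-in for the agreement layer: gives proposals one global order.

    Assumption: a real system would reach this order with a distributed
    consensus protocol; here an in-process list plays that role.
    """
    entries: list = field(default_factory=list)

    def propose(self, op):
        self.entries.append(op)
        return len(self.entries) - 1  # global sequence number


@dataclass
class NameNode:
    """Hypothetical active NameNode holding its own copy of the namespace."""
    name: str
    log: AgreementLog
    namespace: set = field(default_factory=set)
    applied: int = 0  # number of agreed entries already applied

    def submit(self, action, path):
        # Proposal handler: do not mutate local state directly;
        # hand the operation to the agreement layer first.
        return self.log.propose((action, path))

    def dispatch(self):
        # Dispatch: apply agreed operations strictly in sequence order.
        while self.applied < len(self.log.entries):
            action, path = self.log.entries[self.applied]
            if action == "mkdir":
                self.namespace.add(path)
            elif action == "delete":
                self.namespace.discard(path)
            self.applied += 1


def demo():
    log = AgreementLog()
    nodes = [NameNode(n, log) for n in ("site-a", "site-b", "site-c")]
    # Each site accepts a write from its own client.
    nodes[0].submit("mkdir", "/data")
    nodes[1].submit("mkdir", "/logs")
    nodes[2].submit("delete", "/data")
    for node in nodes:
        node.dispatch()
    return [sorted(node.namespace) for node in nodes]
```

In `demo()`, writes enter at three different nodes, yet all three end with an identical namespace because each applies the same agreed sequence.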


Questions and Answers
Thank You


http://blogs.wandisco.com/author/jagane-sundar

Visit us at www.wandisco.com

Editor's Notes

  1. SPOFs (single points of failure): NameNode, HBase, YARN