Many modern businesses have IT infrastructures made up of products that have been bolted on and cobbled together over the years. These systems rarely talk to each other, and when they do, it’s even less likely that the business is linking the data in a way that is valuable for them, and valuable for you. (CLICK - ANIMATED BUBBLES) You can find varying definitions for Big Data nearly everywhere you look. Who’s got it right and why should you care? Potentially every definition has a piece of the truth – the one everyone can agree upon. (CLICK - ANIMATED RECTANGLE)
What types of answers are being derived from Big Data? {Talk about your sports car/train analogy – why people are turning to Hadoop} What is Hadoop good for? What’s a database good for? Hadoop is like the train: not as fast as the sports car, but able to carry the whole load of answers to previously unsolvable problems. With Oracle and SQL, for example, you have speed, and some transactions have to happen quickly. But only Hadoop does Big Data. When Hadoop comes into a datacenter, it’s usually because the old systems cannot handle the volume and variety of data. Analytics and data processing: the two workloads that really matter. Nothing scales this big or is this flexible. Nothing else can handle this variety of data. When it was created, Hadoop seemed magical. What types of previously unsolvable problems are being answered? {Touch on a couple of these examples to provide context.}
{Set the stage with this slide – do not get into the nitty-gritty of installation.} This is a massively simplified view of how you go about installing Apache Hadoop. It is a manual process and requires a great deal of expertise, not to mention that Hadoop is now implemented as a “stack.” Therefore, installation is no longer just a matter of deploying MapReduce and HDFS (i.e., “core Hadoop”). In fact, more than half (55%) of Hadoop users today have implemented 5 or more components (Hive, Pig, Zookeeper, etc.), and this number will continue to increase as the ecosystem grows. It’s the apps that run on top that are important. These analytical apps would be impossible without Hadoop. Per some of the use cases we just discussed, these apps help you find bad guys, help find fraud, etc.
{Refer to the installation steps from the previous slide and how, once in production, at a high level, a Hadoop deployment might look like this. Touch on some of the various ways customers are putting Hadoop into production. Transition with the cautionary tale, reiterated from the previous slide, that, while this slide might look simple, it is indeed a major undertaking to “go it alone.”}
As mentioned, installing Hadoop isn’t just “that” simple. {CLICK – RECTANGLES FADE IN} People face challenges when deploying Hadoop on their own. We’re illustrating just some of the potential issues that could be encountered. {Talk to a couple of the challenges as they appear.} {CLICK – BLOCK FADES IN} One of the biggest challenges, along with having the expertise necessary to deploy Hadoop, is the length of time between the initiation of production and when you begin seeing value. This is one of the many reasons what we’re doing at Cloudera is so exciting. The empowerment we provide to the end user and the ability to see results faster over the entire lifecycle of the deployment are honestly unparalleled. Let me share some specifics about how a distribution like Cloudera’s Distribution Including Apache Hadoop (aka CDH) truly helps you rapidly and efficiently operationalize Hadoop.
{Why does it make sense to consider a distribution like CDH? Why is CDH a visionary distribution? Beyond the selling points below, really help the audience understand why CDH is beyond what they could do for themselves or by adopting anyone else’s distribution. Do not comment on competitor faults, but focus on where CDH is going and Cloudera’s commitment to the adoption of Apache Hadoop with a distribution like CDH.}
{You are going to deep dive on CDH, SCM Express and Cloudera Enterprise in the next three slides, so hit these next comments at a high level.}
{CLICK}
The “packaging” steps are done for you – CDH is an integrated Apache Hadoop-based stack
* Contains all of the components needed for production use
* Tested and packaged to work together
{CLICK}
It’s completely free and open source
* Nothing proprietary
* Leverages the rapid innovation by a community of 500+ developers
* Broad partner ecosystem – solution vendors are integrating with open source Apache Hadoop
{CLICK}
It streamlines your path to success with Hadoop
* Simplified deployment with SCM Express
* Wizard-based installation
* Automated configuration based on best practices
* Central management of services across the cluster
* Free download from Cloudera.com
{CLICK}
Rapidly “operationalize” your Apache Hadoop cluster with Cloudera Enterprise
* Get comprehensive visibility and control over your cluster with the Cloudera Management Suite
* Leverage Cloudera’s expertise to consistently meet or exceed your SLAs with Cloudera Support
While Cloudera’s solutions might be the right fit for you, there are many decisions to make that warrant serious consideration before you actually deploy Hadoop.
{Don’t get down into the weeds on these. Stay future-focused – why should people be thinking about these things? How does Cloudera help them arrive at the answers? What is Cloudera doing and where are they going that could make answering these questions easier?}
What are the use cases?
* Beyond core Hadoop, which components will you need?
* Where are the integration points?
* How much capacity/performance do I need?
Will the cluster be a shared resource?
* Which group(s) will be active on the cluster?
* How will cluster resources be provisioned?
* What level of security is required?
What service levels do I need to maintain?
* Do the service levels differ from group to group?
How will we manage and operate the cluster?
* What capabilities do I need?
* What tools/applications are available?
* Build or buy?
How equipped is my organization?
* What is the level of in-house Hadoop expertise?
* How do I fill the gaps (hiring, training)?
Who will provide advanced support?
{CLICK}
How do I manage all of this within my time and budget constraints?
{As you close this slide, lead the audience away from deployment and into lifecycle management.}
So, let’s talk a bit more technically about what Cloudera brings to the table to get you running quickly and reliably pulling real value from all of your data. CDH is the foundation to your deployment, because, as I mentioned, it contains all of the components needed for production use and is tested and packaged to work together:
Apache Hadoop – Reliable, scalable distributed computing
Apache Hive – SQL-like language and metadata repository
Apache Pig – High-level language for expressing data analysis programs
Apache HBase – Hadoop database for random, real-time read/write access
Apache Zookeeper – Highly reliable distributed coordination service
Apache Flume* – Distributed service for collecting and aggregating log and event data
Apache Whirr* – Library for running Hadoop in the cloud
Apache Sqoop* – Integrating Hadoop with RDBMS
Apache Oozie* – Server-based workflow engine for Hadoop activities
Fuse-DFS – Module within Hadoop for mounting HDFS as a traditional file system
Hue – Browser-based desktop interface for interacting with Hadoop
SCM Express is the key piece that helps to ensure your rapid deployment. It can help you provision a complete Apache Hadoop stack in minutes, and it centrally manages system services through a user-friendly interface for up to 50 nodes. It is also available for free on cloudera.com. Let me explain some of the specific challenges people were facing that led us to offer this solution… Now, let me dive into some of the key features that will speed your deployment… {CLICK for each feature – 5 clicks}
If you really want to spend your time finding the meaningful answers buried in your data, you don’t necessarily want to be concerned about Hadoop. The easier it is for you to monitor and manage it over its entire lifecycle, the more you can focus on what matters to your organization – the things that keep you competitive, secure, relevant and productive. Getting tied up in the weeds of operational challenges can inhibit progress toward your organizational goals. {CLICK} Obviously, despite these challenges, we believe in the value of Apache Hadoop. We developed Cloudera Enterprise so that others can more easily recognize that value, not just in deployment efforts, but throughout its lifecycle in production.
So what exactly is it and how exactly does it help you utilize Apache Hadoop in a way that really unlocks your data….
{You may want to point out here the depth of expertise on your staff – do not point at competitors’ lack of depth, but really exploit Cloudera’s expertise.}
Once in production, your operation will likely resemble this on the surface (a very basic illustration of a complex environment), and, over time, there will be a natural ebb and flow of your Hadoop usage. Aside from making your deployment easier, we care about this ebb and flow of the full Hadoop lifecycle. As workloads shift, your team evolves and the types of questions you want to ask change, you want your system to be able to easily shift up and shift down the amount of resources you spend on these things (Activity Monitor, for example, lets you build a picture of the general health and shifting workloads of the data, the speeds of jobs, and what jobs ran in the past). You need to be able to shift your clusters. You also need to be able to plan for expansion. We let you start, then operate, and then grow or sunset pieces of the cluster over time. This is a steady-state, long-term, ongoing platform that delivers value to admins. We let you not just get Hadoop, but use Hadoop long term.
If you would like to learn more about the Hadoop lifecycle, I encourage you to attend the next webcast in this series. Charles Zedlewski, our VP of Product, will explain how to plan for and manage the Apache Hadoop lifecycle inside a Cloudera deployment. {Make a mention of Hadoop World. Something simple along these lines, “Also, don’t forget that Hadoop World is coming up November 8th through the 9th. We encourage you to register now.”} In the meantime, I welcome questions on today’s presentation. We built in time to have a “discussion” on the topic of deploying Apache Hadoop.