Many organizations are struggling to understand Big Data, what it is, and how to best harness it. Generated by mobile devices, social media, click streams, machines, applications, and more, data is exploding at an exponential rate from sources that are increasingly complex and varied.
How do you manage and leverage both structured and unstructured data? How do you use advanced analytics to gain new insights, find anomalies, correlations, and answers that can transform the business?
Learn how enterprises are implementing Hadoop to get the answers to these questions and more.
Thank you for your time today. We’ll walk through a brief presentation to give you an overview of MapR. What we’ll talk about can be summarized in 3 points.
Hadoop is the leading Big Data platform technology, with the power to transform customers’ businesses
MapR gives you the most technologically advanced distribution for Hadoop
MapR has the product, services, and partner network to ensure production success and continued success.
Hadoop is making CIOs rethink their data architecture. It is a fundamental shift in the economics of data storage, processing, and analytics, and is opening up entirely new business opportunities. Let’s talk about 3 key trends we are seeing, as well as 3 realities, or implications for your business and your “readiness” to harness the power of big data and Hadoop.
The first trend is that the industry leaders have shown how to use big data to compete and win in their markets. It’s no longer a nice-to-have – you need big data to compete
Google pioneered MapReduce processing on commodity hardware and used that to catapult itself into position as the leading search engine, even though it was 19th in the market
Yahoo! leveraged these ideas to create Hadoop to keep up with Google, and many mainstream companies have followed with new data-driven applications such as “people you may know” (started by LinkedIn and now used by Facebook, Twitter, and every social application), product recommendation engines, contextual and personalized music services (Beats), measuring digital media effectiveness (comScore), serving more relevant/targeted ads (Comcast, Rubicon Project), fraud and risk detection, healthcare efficacy, and more
What makes the difference? A lot of attention is given to data science and developing sophisticated new algorithms, but in many cases just having more data beats better algorithms. (make point on collecting more consumer interaction as well as transaction data, as an example).
In addition, competitive advantage is decided by very small percentages. Just a 1% improvement in fraud detection can mean hundreds of millions of dollars in savings. A ½% lift in advertising effectiveness means millions in new product sales and profitability. The same can be applied to customer churn, disease diagnosis, and more.
A second trend in enterprise architecture has been big data overwhelming the existing workload-specific systems which are in production. (list of requirements for each of these on the side in text)
People started with mainframes or operational systems which run ERP, finance, CRM and other mission-critical applications. They require… (pick out attributes you want to stress on the left)
You also have data warehouses, marts, data mining, and other analytical systems which pull data from these operational and other systems for providing insights to the business for decision making
The amount and variety of data has been overloading these systems. As you try to ingest new types of data, you reach a point where these systems are no longer cost-effective to scale to terabytes or petabytes of data
Hadoop has become the de facto big data platform, which allows organizations to keep up with big data and feed data-driven applications and processes
This chart, based on data from Indeed.com, shows the percentage growth of jobs.
Compared to other popular technologies such as MongoDB and Cassandra, Hadoop is not only the fastest growing big data technology, it’s one of the fastest growing technologies, period.
Hadoop has the most robust ecosystem and momentum and is the big data platform of choice for industry-leading, data-driven companies
(Also of interest is that Indeed.com (which is a subsidiary of a Japanese-owned company) is a customer of MapR – they harness and analyze all of the job trends data using MapR)
The first reality is that as people put Hadoop into production to relieve the pressure on other systems in their enterprise architecture, it needs to be reliable. Hadoop needs to be held to the same enterprise standards as your Oracle, SAP, Teradata, NetApp storage, or any other enterprise system.
Many organizations are putting Hadoop into their data center to provide (list of use cases underneath) … it can do all of this and more, but
For Hadoop to act as a system of record, it must provide the same guarantees for SLAs, performance, data protection, and more
Most importantly, Hadoop has the potential for both analytics AND operations. It can be used to optimize the data warehouse and to provide batch data refining or storage. But Hadoop can also provide many operational analytics or database operations/jobs when done right.
In a recent article (http://www.cmswire.com/cms/big-data/5-things-to-lessen-your-anxiety-about-big-data-024382.php), Tom Davenport says:
“Big data’s biggest wins come from making many small decisions vs. one that’s huge. The majority of big data driven decisions will be recurring, made at speed (in milliseconds), and at scale; actions will be taken automatically (vs. reviewed and approved by an individual). Examples include ad platforms making many constant adjustments, fraud detection on millions of transactions that are based on individual patterns, fleet management and routing taking into account current conditions…”
This requires a Hadoop platform that can go beyond batch and support streaming writes, so data can be written to the system continuously while analysis is being conducted. It requires high performance to meet the business needs, and real-time operations: the ability to perform online database operations, to react to the business situation, and to impact the business as it happens rather than report on it one week, month, or quarter later.
To do this requires THE RIGHT ARCHITECTURE
96% of US internet traffic
Formerly used 2 other distros
Went to MapR to meet very high SLAs and performance
Push messaging. Starbucks or ESPN applications, and others.
MapR is the only software that they pay for. Have HBase committers on staff.
Took 8 application clusters and moved them into 1 MapR cluster; they now have 1 cluster with 8 sub-clusters running on different sets of nodes. Data placement control enables this.
Went from 12 CDH servers down to 6, just for HBase tables. (They won’t use M7 since they are HBase committers.)
Verizon Teradata example
Less than 10% of CDRs analyzed
More relevant local example: Experian
Solutionary is a Managed Security Services provider with services that include network intrusion detection
----- Meeting Notes (3/27/14 11:12) -----
Zions Bank
Video - Phishing Attack
MapR is the Hadoop technology leader with over 500 paying customers and the largest production deployments in the world.
People like to think of Yahoo, Facebook, and LinkedIn as big Hadoop users, and they are, but you would expect this because of their deep engineering heritage. Mainstream organizations that want to leverage Hadoop without hiring armies of engineers turn to MapR. We have the largest retailer, the largest financial services deployment, and the largest media, healthcare, and government agency deployments.
Through a combination of Apache Hadoop community participation and a differentiated data platform, MapR lets organizations do more with Hadoop in both operational and analytical use cases that are expensive or impossible with other Hadoop distributions.
Again,
Hadoop is the leading Big Data platform technology, with the power to transform customers’ businesses
MapR gives you the most technologically advanced distribution for Hadoop
MapR has the product, services, and partner network to ensure production success and continued success.