Combining Hadoop with
Big Data Analytics
Contents
Hadoop for Big Data Puts Architects on Journey of Discovery
‘Big Data’ Analytics Programs Require Tech Savvy, Business Know-How
This E-Guide explores the growing popularity of
leveraging Hadoop, an open source software framework, to
tackle the challenges of big data analytics, as well as some of
the key business and technical barriers to adoption. Gain
expert advice on the skill sets and roles needed for successful
big data analytics programs, and discover how organizations
can design and assemble the right team.
Hadoop for Big Data Puts Architects on Journey of Discovery
By: Jessica Twentyman, Contributor
“Problems don’t care how you solve them,” said James Kobielus, an analyst
with Forrester Research, in a recent blog on the topic of ‘big data’. “The only
thing that matters is that you do indeed solve them, using any tools or
approaches at your disposal.”
In recent years, that imperative -- to solve the problems posed by big data --
has led data architects at many organizations on a journey of discovery. Put
simply, the traditional database and business intelligence tools that they
routinely use to slice and dice corporate data are just not up to the task of
handling big data.
To put the challenge into perspective, think back a decade: few enterprise
data warehouses grew as large as a terabyte. By 2009, Forrester analysts
were reporting that over two-thirds of Enterprise Data Warehouses (EDWs)
deployed were in the 1 to 10-terabyte range. By 2015, they claim, a majority
of EDWs in large organizations will be 100 TB or larger, “with petabyte-scale
EDWs becoming well entrenched in sectors such as telecoms, financial
services and web commerce.”
What’s needed to analyze these big data stores are special new tools and
approaches that deliver “extremely scalable analytics,” said Kobielus. By
“extremely scalable,” he means analytics that can deal with data that stands
out in four key ways: for its volume (from hundreds of terabytes to petabytes
and beyond); its velocity (up to and including real-time, sub-second delivery);
its variety (diverse structured, unstructured and semi-structured formats); and
its volatility (wherein scores to hundreds of new data sources come and go
from new applications, services, social networks and so on).
Hadoop for Big Data on the Rise
One of the principal approaches to emerge in recent years has been Apache
Hadoop, an open source software framework that supports data-intensive
distributed analytics involving thousands of nodes and petabytes of data.
The underlying technology was originally invented at search engine giant
Google by in-house developers looking for a way to usefully index textual
data and other “rich” information and present it back to users in meaningful
ways. They called this technology MapReduce – and today’s Hadoop is an
open-source version, used by data architects for analytics that are deep and
computationally intensive.
In a recent survey of enterprise Hadoop users conducted by Ventana
Research on behalf of Hadoop specialist Karmasphere, 94% of Hadoop
users reported that they can now perform analyses on large volumes of data
that weren’t possible before; 88% said that they analyze data in greater
detail; and 82% can now retain more of their data.
It’s still a niche technology, but Hadoop’s profile received a serious boost
over the past year, thanks in part to start-up companies such as Cloudera
and MapR that offer commercially licensed and supported distributions of
Hadoop. Its growing popularity is also the result of serious interest shown by
EDW vendors like EMC, IBM and Teradata. EMC bought Hadoop specialist
Greenplum in June 2010; Teradata announced its acquisition of Aster Data in
March 2011; and IBM announced its own Hadoop offering, InfoSphere, in
May 2011.
What stood out about Greenplum for EMC was its x86-based, scale-out
MPP, shared-nothing design, said Mark Sears, architect at the new
organization EMC Greenplum. “Not everyone requires a set-up like that, but
more and more customers do,” he said.
“In many cases,” he continued, “these are companies that are already adept
at analysis, at mining customer data for buying trends, for example. In the
past, however, they may have had to query a random sample of data relating
to 500,000 customers from an overall pool of data of 5 million customers.
That opens up the risk that they might miss something important. With a big
data approach, it’s perfectly possible and relatively cost-effective to query
data relating to all 5 million.”
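The sampling risk Sears describes can be put into rough numbers. The sketch below is illustrative only: the population and sample sizes come from his example, while the 20-customer segment is a hypothetical figure chosen to show how a rare pattern can slip through a 10% random sample.

```python
def prob_segment_missed(population, sample_size, segment_size):
    """Probability that a simple random sample (without replacement)
    contains none of the customers in a small segment of interest."""
    p = 1.0
    for i in range(segment_size):
        # Chance that the i-th segment member is also left out of the sample
        p *= (population - sample_size - i) / (population - i)
    return p

population = 5_000_000   # all customers (from the example above)
sample_size = 500_000    # customers actually queried (from the example above)
segment_size = 20        # hypothetical: customers showing a rare pattern

expected_in_sample = segment_size * sample_size / population
p_missed = prob_segment_missed(population, sample_size, segment_size)

print(f"Expected segment members in the sample: {expected_in_sample:.1f}")
print(f"Chance the sample contains none of them: {p_missed:.1%}")
```

Under these assumptions the sample would be expected to hold only about two of the 20 customers, and roughly one sample in eight would hold none at all. Querying all 5 million records removes that risk.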
Hadoop Ill Understood by the Business
While there’s a real buzz around Hadoop among technologists, it’s still not
widely understood in business circles. In the simplest terms, Hadoop is
engineered to run across a large number of low-cost commodity servers and
distribute data volumes across that cluster, keeping track of where individual
pieces of data reside using the Hadoop Distributed File System (HDFS).
Analytic workloads are performed across the cluster, in a massively parallel
processing (MPP) model, using tools such as Pig and Hive. Results,
meanwhile, are delivered as a unified whole.
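For readers who want a feel for the processing model, here is a minimal single-machine sketch of the map, shuffle and reduce steps. It is not Hadoop code: the function names and the sample records are made up for illustration, and on a real cluster each block would be processed on a separate node.

```python
from collections import defaultdict
from itertools import chain

def map_phase(block):
    """Map: parse each 'customer,amount' record in one block into a key/value pair."""
    pairs = []
    for record in block:
        customer, amount = record.split(",")
        pairs.append((customer, int(amount)))
    return pairs

def shuffle(mapped_blocks):
    """Shuffle: bring together all values that share the same key."""
    grouped = defaultdict(list)
    for key, value in chain.from_iterable(mapped_blocks):
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: collapse each key's values into a single result."""
    return {key: sum(values) for key, values in grouped.items()}

# Hypothetical data: three blocks standing in for pieces of a file
# spread across three servers in the cluster.
blocks = [
    ["alice,30", "bob,12"],
    ["alice,5", "carol,40"],
    ["bob,8"],
]

totals = reduce_phase(shuffle(map_phase(block) for block in blocks))
print(totals)
```

Each block is mapped independently, which is what allows the work to be spread across many commodity servers; only the shuffle step needs to move data between them before the totals are returned as a single result.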
Earlier this year, Gartner analyst Marcus Collins mapped out some of the
ways he’s seeing Hadoop used today: by financial services companies, to
discover fraud patterns in years of credit-card transaction records; by mobile
telecoms providers, to identify customer churn patterns; by academic
researchers, to identify celestial objects from telescope imagery.
“The cost-performance curve of commodity servers and storage is putting
seemingly intractable complex analysis on extremely large volumes of data
within the budget of an increasingly large number of enterprises,” he
concluded. “This technology promises to be a significant competitive
advantage to early adopter organizations.”
Early adopters include daily deal site Groupon, which uses the Cloudera
distribution of Hadoop to analyse transaction data generated by over 70
million registered users worldwide, and NYSE Euronext, which operates the
New York Stock Exchange as well as other equities and derivatives markets
across Europe and the US. NYSE Euronext uses EMC Greenplum to
manage transaction data that is growing, on average, by as much as 2 TB
each day. US bookseller Barnes & Noble, meanwhile, is using Aster Data
nCluster (now owned by Teradata) to better understand customer
preferences and buying patterns across three sales channels: its retail
outlets, its online store and e-reader downloads.
Skills Deficit in Big Data Analytics
While Hadoop’s cost advantage might give it widespread appeal, the skills
challenge it presents is more daunting, according to Collins. “Big data
analytics is an emerging field, and insufficiently skilled people exist within
many organizations or the [wider] job market,” he said.
Skills-related issues that stand in the way of Hadoop adoption include the
technical nature of the MapReduce framework, which requires users to
develop Java code, and the lack of institutional knowledge on how to
architect the Hadoop infrastructure. Another impediment to adoption is the
lack of support for analysis tools used to examine data residing within the
Hadoop architecture.
That, however, is starting to change. For a start, established BI tools vendors
are adding support for Apache Hadoop: one example is Pentaho, which
introduced support in May 2010 and, subsequently, specific support for the
EMC Greenplum distribution.
Another sign of Hadoop’s creep towards the mainstream is support from data
integration suppliers, such as Informatica. One of the most common uses of
Hadoop to date has been as an engine for the ‘transformation’ stage of ETL
(extraction, transformation, loading) processes: the MapReduce technology
lends itself well to the task of preparing data for loading and further analysis
in both conventional RDBMS-based data warehouses and those based on
HDFS/Hive. In response, Informatica announced in May 2011 that integration
is underway between EMC Greenplum and its own Data Integration Platform.
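To illustrate why the transformation stage fits this model, the sketch below treats it as a map-only step: each raw record is cleaned independently of every other, so the work can be split across as many nodes as are available. The pipe-delimited record layout and the cleaning rules are assumptions made for this example, not taken from any vendor's product.

```python
def transform(raw_line):
    """Turn one raw 'customer_id|amount|country' line into a clean row.

    Returns None for records that cannot be repaired, so they can be
    filtered out before loading.
    """
    parts = [part.strip() for part in raw_line.split("|")]
    if len(parts) != 3:
        return None  # wrong number of fields
    customer_id, amount, country = parts
    if not customer_id:
        return None  # missing key
    try:
        amount_value = round(float(amount), 2)
    except ValueError:
        return None  # amount is not numeric
    return (customer_id, amount_value, country.upper())

# Hypothetical raw input, including two records that should be rejected.
raw = ["c1| 19.99 |us", "c2|n/a|gb", "broken line", "c3|5|De"]

clean_rows = [row for row in map(transform, raw) if row is not None]
print(clean_rows)
```

Because `transform` looks at one record at a time and keeps no shared state, the same function could in principle be applied to separate slices of the data in parallel, with the clean rows then loaded into a warehouse.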
Additionally, there’s the emergence of Hadoop appliances, as evidenced by
EMC’s Greenplum HD (a bundle that combines MapR’s version of Hadoop,
the Greenplum database and a standard x86-based server), announced in
May, and more recently (August 2011), the Dell/Cloudera Solution (which
combines Dell PowerEdge C2100 servers and PowerConnect switches with
Cloudera’s Hadoop distribution and its Cloudera Enterprise management
tools).
Finally, Hadoop is proving well suited to deployment in cloud environments, a
fact which gives IT teams the chance to experiment with pilot projects on
cloud infrastructures. Amazon, for example, offers Amazon Elastic
MapReduce as a hosted service running on its Amazon Elastic Compute
Cloud (EC2) and Amazon Simple Storage Service (S3). The Apache
distribution of Hadoop, moreover, now comes with a set of tools designed to
make it easier to deploy and run on EC2.
“Running Hadoop on EC2 is particularly appropriate where the data already
resides within Amazon S3,” said Collins. If the data does not reside on
Amazon S3, he said, then it must be transferred: the Amazon Web Services
(AWS) pricing model includes charges for network costs and Amazon Elastic
MapReduce pricing is in addition to normal Amazon EC2 and S3 pricing.
“Given the volume of data required to run big data analytics, the cost of
transferring data and running the analysis should be carefully considered,” he
cautioned.
The bottom line is that “Hadoop is the future of the EDW and its footprint in
companies’ core EDW architectures is likely to keep growing throughout this
decade,” said Kobielus.
‘Big Data’ Analytics Programs Require Tech Savvy, Business Know-How
By: Rick Sherman, Contributor
Articles and case studies written about “big data” analytics programs
proclaim the successes of organizations, typically with a focus on the
technologies used to build the systems. Product information is useful, of
course, but it’s also critical for enterprises embarking on these projects to be
sure they have people with the right skills and experience in place to
successfully leverage big data analytics tools.
Knowledge of new or emerging big data technologies, such as Hadoop, is
often required, especially if a project involves unstructured or semi-structured
data. But even in such cases, Hadoop know-how is only a small portion of
the overall staffing requirements; skills are also needed in areas such as
business intelligence (BI), data integration and data warehousing. Business
and industry expertise is another must-have for a successful big data
analytics program, and many organizations also require advanced analytics
skills, such as predictive analytics and data mining capabilities.
Data scientist is a new job title that is getting some industry buzz. But it’s
really a combination of skills from the areas listed above, including predictive
modeling, both statistical and business analysis, and data integration
development. I’ve seen some job postings for data scientists that require a
statistical degree as well as an industry-specific one. Having both degrees
and matching professional experience is great, but what are the chances of
finding job candidates who do? Pretty slim!
Taking a more realistic look at what to plan for, I describe below the key skill
sets and roles that should be part of big data analytics deployments. Instead
of listing specific job titles, I’ve kept it general; organizations can bundle
together the required roles and responsibilities in various ways based on the
capabilities and experience levels of their existing workforce and new
employees that they hire.
Business knowledge. In addition to technical skills, companies engaging in
big data analytics projects need to involve people with extensive business
and industry expertise. That must include knowledge of a company’s own
business strategy and operations as well as its competitors, current industry
conditions and emerging trends, customer demographics and both macro-
and microeconomic factors.
Much of the business value derived from big data analytics comes not from
textbook key performance indicators but rather from insights and metrics
gleaned as part of the analytics process. That process is part science (i.e.,
statistics) and part art, with users doing what-if analysis to gain actionable
information about an organization’s business operations. Such findings are
only possible with the participation of business managers and workers who
have firsthand knowledge of business strategies and issues.
Business analysts. This group also has an important role to play in helping
organizations to understand the business ramifications of big data. In
addition to doing analytics themselves, business analysts can be tasked with
things such as gathering business and data requirements for a project,
helping to design dashboards, reports and data visualizations for presenting
analytical findings to business users and assisting in measuring the business
value of the analytics program.
BI developers. These people work with the business analysts to build the
required dashboards, reports and visualizations for business users. In
addition, depending on internal needs and the BI tools that an organization is
using, they can enable self-service BI capabilities by preparing the data and
the required BI templates for use by business executives and workers.
Predictive model builders. In general, predictive models for analyzing big
data need to be custom-built. Deep statistical skills are essential to creating
good models; too often, companies underestimate the required skill level and
hire people who have used statistical tools, including models developed by
others, but don’t have knowledge of the mathematical principles underlying
predictive models.
Predictive modelers also need some business knowledge and an
understanding of how to gather and integrate data into models, although in
both cases they can leverage other people’s more extensive expertise in
creating and refining models. Another key skill for modelers to have is the
ability to assess how comprehensive, clean and current the available data is
before building models. There often are data gaps, and a modeler has to be
able to close them.
Data architects. A big data analytics project needs someone to design the
data architecture and guide its development. Typically, data architects will
need to incorporate various data structures into an architecture, along with
processes for capturing and storing data and making it available for analysis.
This role involves traditional IT skills such as data modeling, data profiling,
data quality, data integration and data governance.
Data integration developers. Data integration is important enough to
require its own developers. These folks design and develop integration
processes to handle the full spectrum of data structures in a big data
environment. In doing so, they ideally ensure that integration is done not in
silos but as part of a comprehensive data architecture.
It’s also best to use packaged data integration tools that support multiple
forms of data, including structured, unstructured and semi-structured
sources. Avoid the temptation to develop custom code to handle extract,
transform and load operations on pools of big data; hand coding can
increase a project’s overall costs -- not to mention the likelihood of an
organization being saddled with undocumented data integration processes
that can’t scale up as data volumes continue to grow.
Technology architects. This role involves designing the underlying IT
infrastructure that will support a big data analytics initiative. In addition to
understanding the principles of designing traditional data warehousing and BI
systems, technology architects need to have expertise in the new
technologies that often are used in big data projects. And if an organization
plans to implement open source tools, such as Hadoop, the architects need
to understand how to configure, design, develop and deploy such systems.
It can be exciting to tackle a big data analytics project and to acquire new
technologies to support the analytics process. But don’t be lulled into thinking
that tools alone spell big data success. The real key to success is ensuring
that your organization has the skills it needs to effectively manage large and
varied data sets as well as the analytics that will be used to derive business
value from the available data.
About the Author:
Rick Sherman is the founder of Athena IT Solutions, which provides
consulting, training and vendor services on business intelligence, data
integration and data warehousing. Sherman has written more than 100
articles and spoken at dozens of events and webinars; he also is an adjunct
faculty member at Northeastern University’s Graduate School of Engineering.
He blogs at The Data Doghouse and can be reached at
rsherman@athena-solutions.com.
Free resources for technology professionals
TechTarget publishes targeted technology media that address your need for
information and resources for researching products, developing strategy and
making cost-effective purchase decisions. Our network of technology-specific
Web sites gives you access to industry experts, independent content and
analysis and the Web’s largest library of vendor-provided white papers,
webcasts, podcasts, videos, virtual trade shows, research reports and more
—drawing on the rich R&D resources of technology providers to address
market trends, challenges and solutions. Our live events and virtual seminars
give you access to vendor neutral, expert commentary and advice on the
issues and challenges you face daily. Our social community IT Knowledge
Exchange allows you to share real world information in real time with peers
and experts.
What makes TechTarget unique?
TechTarget is squarely focused on the enterprise IT space. Our team of
editors and network of industry experts provide the richest, most relevant
content to IT professionals and management. We leverage the immediacy of
the Web, the networking and face-to-face opportunities of events and virtual
events, and the ability to interact with peers—all to create compelling and
actionable information for enterprise IT professionals across all industries
and markets.
The book of elephant tattoo
 

Más de The Marketing Distillery

Internet of Things: manage the complexity, seize the opportunity
Internet of Things: manage the complexity, seize the opportunityInternet of Things: manage the complexity, seize the opportunity
Internet of Things: manage the complexity, seize the opportunityThe Marketing Distillery
 
7 steps to business success on the Internet of Things
7 steps to business success on the Internet of Things7 steps to business success on the Internet of Things
7 steps to business success on the Internet of ThingsThe Marketing Distillery
 
Capitalizing on the Internet of Things: a primer
Capitalizing on the Internet of Things: a primerCapitalizing on the Internet of Things: a primer
Capitalizing on the Internet of Things: a primerThe Marketing Distillery
 
From the Internet of Computers to the Internet of Things
From the Internet of Computers to the Internet of ThingsFrom the Internet of Computers to the Internet of Things
From the Internet of Computers to the Internet of ThingsThe Marketing Distillery
 
Smart networked objects and the Internet of Things
Smart networked objects and the Internet of ThingsSmart networked objects and the Internet of Things
Smart networked objects and the Internet of ThingsThe Marketing Distillery
 
Enhancing intelligence with the Internet of Things
Enhancing intelligence with the Internet of ThingsEnhancing intelligence with the Internet of Things
Enhancing intelligence with the Internet of ThingsThe Marketing Distillery
 
M2M innovations invigorate warehouse management
M2M innovations invigorate warehouse managementM2M innovations invigorate warehouse management
M2M innovations invigorate warehouse managementThe Marketing Distillery
 
How Big Data can help optimize business marketing efforts
How Big Data can help optimize business marketing effortsHow Big Data can help optimize business marketing efforts
How Big Data can help optimize business marketing effortsThe Marketing Distillery
 

Más de The Marketing Distillery (20)

The M2M platform for a connected world
The M2M platform for a connected worldThe M2M platform for a connected world
The M2M platform for a connected world
 
Internet of Things: manage the complexity, seize the opportunity
Internet of Things: manage the complexity, seize the opportunityInternet of Things: manage the complexity, seize the opportunity
Internet of Things: manage the complexity, seize the opportunity
 
7 steps to business success on the Internet of Things
7 steps to business success on the Internet of Things7 steps to business success on the Internet of Things
7 steps to business success on the Internet of Things
 
Capitalizing on the Internet of Things: a primer
Capitalizing on the Internet of Things: a primerCapitalizing on the Internet of Things: a primer
Capitalizing on the Internet of Things: a primer
 
Making sense of consumer data
Making sense of consumer dataMaking sense of consumer data
Making sense of consumer data
 
Capitalizing on the Internet of Things
Capitalizing on the Internet of ThingsCapitalizing on the Internet of Things
Capitalizing on the Internet of Things
 
Managing the Internet of Things
Managing the Internet of ThingsManaging the Internet of Things
Managing the Internet of Things
 
From the Internet of Computers to the Internet of Things
From the Internet of Computers to the Internet of ThingsFrom the Internet of Computers to the Internet of Things
From the Internet of Computers to the Internet of Things
 
Getting started in Big Data
Getting started in Big DataGetting started in Big Data
Getting started in Big Data
 
Smart networked objects and the Internet of Things
Smart networked objects and the Internet of ThingsSmart networked objects and the Internet of Things
Smart networked objects and the Internet of Things
 
Enhancing intelligence with the Internet of Things
Enhancing intelligence with the Internet of ThingsEnhancing intelligence with the Internet of Things
Enhancing intelligence with the Internet of Things
 
Internet of Things application platforms
Internet of Things application platformsInternet of Things application platforms
Internet of Things application platforms
 
Internet of Things building blocks
Internet of Things building blocksInternet of Things building blocks
Internet of Things building blocks
 
Smart cities and the Internet of Things
Smart cities and the Internet of ThingsSmart cities and the Internet of Things
Smart cities and the Internet of Things
 
The ABCs of Big Data
The ABCs of Big DataThe ABCs of Big Data
The ABCs of Big Data
 
M2M innovations invigorate warehouse management
M2M innovations invigorate warehouse managementM2M innovations invigorate warehouse management
M2M innovations invigorate warehouse management
 
Big Data: 8 facts and 8 fictions
Big Data: 8 facts and 8 fictionsBig Data: 8 facts and 8 fictions
Big Data: 8 facts and 8 fictions
 
How Big Data can help optimize business marketing efforts
How Big Data can help optimize business marketing effortsHow Big Data can help optimize business marketing efforts
How Big Data can help optimize business marketing efforts
 
Big Data analytics
Big Data analyticsBig Data analytics
Big Data analytics
 
The Business Analyst as a leader
The Business Analyst as a leaderThe Business Analyst as a leader
The Business Analyst as a leader
 

Último

Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 

Último (20)

Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 

Combining hadoop with big data analytics

  • 1. Combining Hadoop with Big Data Analytics
  • 2. Combining Hadoop with Big Data Analytics
Contents
Hadoop for Big Data Puts Architects on Journey of Discovery
‘Big Data’ Analytics Programs Require Tech Savvy, Business Know-How
This E-Guide explores the growing popularity of leveraging Hadoop, an open source software framework, to tackle the challenges of big data analytics, as well as some of the key business and technical barriers to adoption. Gain expert advice on the skill sets and roles needed for successful big data analytics programs, and discover how organizations can design and assemble the right team.
Hadoop for Big Data Puts Architects on Journey of Discovery
By: Jessica Twentyman, Contributor
“Problems don’t care how you solve them,” said James Kobielus, an analyst with Forrester Research, in a recent blog on the topic of ‘big data’. “The only thing that matters is that you do indeed solve them, using any tools or approaches at your disposal.”
In recent years, that imperative -- to solve the problems posed by big data -- has led data architects at many organizations on a journey of discovery. Put simply, the traditional database and business intelligence tools that they routinely use to slice and dice corporate data are just not up to the task of handling big data.
To put the challenge into perspective, think back a decade: few enterprise data warehouses grew as large as a terabyte. By 2009, Forrester analysts were reporting that over two-thirds of Enterprise Data Warehouses (EDWs) deployed were in the 1 to 10-terabyte range. By 2015, they claim, a majority of EDWs in large organizations will be 100 TB or larger, “with petabyte-scale EDWs becoming well entrenched in sectors such as telecoms, financial services and web commerce.”
What’s needed to analyze these big data stores are special new tools and approaches that deliver “extremely scalable analytics,” said Kobielus. By “extremely scalable,” he means analytics that can deal with data that stands
  • 3. out in four key ways: for its volume (from hundreds of terabytes to petabytes and beyond); its velocity (up to and including real-time, sub-second delivery); its variety (diverse structured, unstructured and semi-structured formats); and its volatility (wherein scores to hundreds of new data sources come and go from new applications, services, social networks and so on).
Hadoop for Big Data on the Rise
One of the principal approaches to emerge in recent years has been Apache Hadoop, an open source software framework that supports data-intensive distributed analytics involving thousands of nodes and petabytes of data.
The underlying technology was originally invented at search engine giant Google by in-house developers looking for a way to usefully index textual data and other “rich” information and present it back to users in meaningful ways. They called this technology MapReduce – and today’s Hadoop is an open-source version, used by data architects for analytics that are deep and computationally intensive.
In a recent survey of enterprise Hadoop users conducted by Ventana Research on behalf of Hadoop specialist Karmasphere, 94% of Hadoop users reported that they can now perform analyses on large volumes of data that weren’t possible before; 88% said that they analyze data in greater detail; and 82% can now retain more of their data.
It’s still a niche technology, but Hadoop’s profile received a serious boost over the past year, thanks in part to start-up companies such as Cloudera and MapR that offer commercially licensed and supported distributions of Hadoop. Its growing popularity is also the result of serious interest shown by EDW vendors like EMC, IBM and Teradata.
EMC bought Hadoop specialist Greenplum in June 2010; Teradata announced its acquisition of Aster Data in March 2011; and IBM announced its own Hadoop offering, InfoSphere, in May 2011.
What stood out about Greenplum for EMC was its x86-based, scale-out MPP, shared-nothing design, said Mark Sears, architect at the new
  • 4. organization EMC Greenplum. “Not everyone requires a set-up like that, but more and more customers do,” he said.
“In many cases,” he continued, “these are companies that are already adept at analysis, at mining customer data for buying trends, for example. In the past, however, they may have had to query a random sample of data relating to 500,000 customers from an overall pool of data of 5 million customers. That opens up the risk that they might miss something important. With a big data approach, it’s perfectly possible and relatively cost-effective to query data relating to all 5 million.”
Hadoop Ill Understood by the Business
While there’s a real buzz around Hadoop among technologists, it’s still not widely understood in business circles. In the simplest terms, Hadoop is engineered to run across a large number of low-cost commodity servers and distribute data volumes across that cluster, keeping track of where individual pieces of data reside using the Hadoop Distributed File System (HDFS). Analytic workloads are performed across the cluster, in a massively parallel processing (MPP) model, using tools such as Pig and Hive. Results, meanwhile, are delivered as a unified whole.
Earlier this year, Gartner analyst Marcus Collins mapped out some of the ways he’s seeing Hadoop used today: by financial services companies, to discover fraud patterns in years of credit-card transaction records; by mobile telecoms providers, to identify customer churn patterns; by academic researchers, to identify celestial objects from telescope imagery.
“The cost-performance curve of commodity servers and storage is putting seemingly intractable complex analysis on extremely large volumes of data within the budget of an increasingly large number of enterprises,” he concluded.
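The cluster model described above (data spread across commodity servers, a record of where each piece lives, work done in parallel, and results returned as one whole) can be mimicked in a few lines. This is a single-machine sketch in Python, not HDFS or Hadoop code: the node names, the block size and the helper functions are invented for illustration, and a thread pool stands in for the separate servers.

```python
from concurrent.futures import ThreadPoolExecutor

NODES = ["node-a", "node-b", "node-c"]  # hypothetical commodity servers
BLOCK_SIZE = 4                          # records per block (tiny, for illustration)


def place_blocks(records):
    """Split records into blocks and assign them to nodes round-robin.

    Returns (storage, block_map): storage maps node -> {block_id: records},
    and block_map records which node holds each block, loosely imitating
    the bookkeeping a distributed file system performs.
    """
    storage = {node: {} for node in NODES}
    block_map = {}
    for block_id, start in enumerate(range(0, len(records), BLOCK_SIZE)):
        node = NODES[block_id % len(NODES)]
        storage[node][block_id] = records[start:start + BLOCK_SIZE]
        block_map[block_id] = node
    return storage, block_map


def node_total(blocks):
    """Work done locally on one node: sum the values in the blocks it holds."""
    return sum(sum(block) for block in blocks.values())


def cluster_total(storage):
    """Run the per-node work in parallel and combine the partial results."""
    with ThreadPoolExecutor(max_workers=len(storage)) as pool:
        partials = list(pool.map(node_total, storage.values()))
    return sum(partials)


transactions = list(range(1, 21))  # 20 made-up transaction amounts
storage, block_map = place_blocks(transactions)
total = cluster_total(storage)
```

The point of the sketch is only the shape of the computation: each node works on the blocks it already holds, and the caller sees a single combined answer. A real cluster adds replication, failure handling and scheduling that are omitted here.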
“This technology promises to be a significant competitive advantage to early adopter organizations.”
Early adopters include daily deal site Groupon, which uses the Cloudera distribution of Hadoop to analyse transaction data generated by over 70 million registered users worldwide, and NYSE Euronext, which operates the
  • 5. New York Stock Exchange as well as other equities and derivatives markets across Europe and the US. NYSE Euronext uses EMC Greenplum to manage transaction data that is growing, on average, by as much as 2TB each day. US bookseller Barnes & Noble, meanwhile, is using Aster Data nCluster (now owned by Teradata) to better understand customer preferences and buying patterns across three sales channels: its retail outlets, its online store and e-reader downloads.
Skills Deficit in Big Data Analytics
While Hadoop’s cost advantage might give it widespread appeal, the skills challenge it presents is more daunting, according to Collins. “Big data analytics is an emerging field, and insufficiently skilled people exist within many organizations or the [wider] job market,” he said.
Skills-related issues that stand in the way of Hadoop adoption include the technical nature of the MapReduce framework, which requires users to develop Java code, and the lack of institutional knowledge on how to architect the Hadoop infrastructure. Another impediment to adoption is the lack of support for analysis tools used to examine data residing within the Hadoop architecture.
That, however, is starting to change. For a start, established BI tools vendors are adding support for Apache Hadoop: one example is Pentaho, which introduced support in May 2010 and, subsequently, specific support for the EMC Greenplum distribution.
Another sign of Hadoop’s creep towards the mainstream is support from data integration suppliers, such as Informatica.
One of the most common uses of Hadoop to date has been as an engine for the ‘transformation’ stage of ETL (extraction, transformation, loading) processes: the MapReduce technology lends itself well to the task of preparing data for loading and further analysis in both conventional RDBMS-based data warehouses and those based on HDFS/Hive. In response, Informatica announced in May 2011 that integration was underway between EMC Greenplum and its own Data Integration Platform.
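To make the ‘transformation’ role of MapReduce concrete, here is a toy, single-process imitation of the map, shuffle and reduce steps. It is a sketch in plain Python rather than Hadoop’s Java API; the record layout, the sample data and the function names are all hypothetical, chosen only to show raw lines being cleaned and aggregated into rows ready for loading.

```python
from collections import defaultdict

# Hypothetical raw log lines in the form "customer_id,channel,amount"
RAW_LINES = [
    "c1,store,20.00",
    "c2,online,15.50",
    "c1,online,4.50",
    "bad line",
    "c2,ereader,9.99",
]


def map_phase(line):
    """Parse one raw line and emit (key, value) pairs; skip malformed input."""
    parts = line.split(",")
    if len(parts) != 3:
        return []
    customer, _channel, amount = parts
    try:
        return [(customer, float(amount))]
    except ValueError:
        return []


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single load-ready row."""
    return {"customer": key, "orders": len(values), "total": round(sum(values), 2)}


def run_job(lines):
    """Chain the three steps; a real cluster runs map and reduce in parallel."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return [reduce_phase(key, grouped[key]) for key in sorted(grouped)]


rows = run_job(RAW_LINES)
```

Because each call to `map_phase` looks at one line and each call to `reduce_phase` looks at one key, both steps can be spread across many machines, which is what makes the pattern attractive for preparing large volumes of data.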
  • 6. Additionally, there’s the emergence of Hadoop appliances, as evidenced by EMC’s Greenplum HD (a bundle that combines MapR’s version of Hadoop, the Greenplum database and a standard x86-based server), announced in May, and more recently (August 2011), the Dell/Cloudera Solution (which combines Dell PowerEdge C2100 servers and PowerConnect switches with Cloudera’s Hadoop distribution and its Cloudera Enterprise management tools).
Finally, Hadoop is proving well suited to deployment in cloud environments, a fact which gives IT teams the chance to experiment with pilot projects on cloud infrastructures. Amazon, for example, offers Amazon Elastic MapReduce as a hosted service running on its Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3). The Apache distribution of Hadoop, moreover, now comes with a set of tools designed to make it easier to deploy and run on EC2.
“Running Hadoop on EC2 is particularly appropriate where the data already resides within Amazon S3,” said Collins. If the data does not reside on Amazon S3, he said, then it must be transferred: the Amazon Web Services (AWS) pricing model includes charges for network costs, and Amazon Elastic MapReduce pricing is in addition to normal Amazon EC2 and S3 pricing. “Given the volume of data required to run big data analytics, the cost of transferring data and running the analysis should be carefully considered,” he cautioned.
The bottom line is that “Hadoop is the future of the EDW and its footprint in companies’ core EDW architectures is likely to keep growing throughout this decade,” said Kobielus.
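Collins’s caution about cloud costs can be made concrete with a back-of-the-envelope calculation. The sketch below, in Python, simply adds up the four charges he mentions (data transfer, EC2 compute, the Elastic MapReduce surcharge and S3 storage). Every rate is a caller-supplied placeholder, not an actual AWS price, and the function and parameter names are hypothetical.

```python
def estimate_job_cost(data_gb, transfer_per_gb, node_hours,
                      ec2_per_node_hour, emr_per_node_hour,
                      s3_per_gb_month=0.0, months_stored=0.0):
    """Return a cost breakdown for one analysis run.

    All rates are assumptions supplied by the caller; check the provider's
    current price list before relying on any figure.
    """
    costs = {
        "transfer": data_gb * transfer_per_gb,                 # moving data in
        "ec2": node_hours * ec2_per_node_hour,                 # base compute
        "emr": node_hours * emr_per_node_hour,                 # surcharge on top of EC2
        "s3": data_gb * s3_per_gb_month * months_stored,       # storage
    }
    costs["total"] = sum(costs.values())
    return costs


# Placeholder figures only: 10 TB of data, 200 node-hours, invented rates.
example = estimate_job_cost(
    data_gb=10_000,
    transfer_per_gb=0.10,
    node_hours=200,
    ec2_per_node_hour=0.50,
    emr_per_node_hour=0.10,
    s3_per_gb_month=0.10,
    months_stored=1,
)
```

With these invented rates, transfer and storage outweigh compute, which illustrates why Collins singles out data already held in S3 as the most natural fit; different rates would of course shift the balance.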
‘Big Data’ Analytics Programs Require Tech Savvy, Business Know-How
By: Rick Sherman, Contributor
Articles and case studies written about “big data” analytics programs proclaim the successes of organizations, typically with a focus on the
  • 7. technologies used to build the systems. Product information is useful, of course, but it’s also critical for enterprises embarking on these projects to be sure they have people with the right skills and experience in place to successfully leverage big data analytics tools.
Knowledge of new or emerging big data technologies, such as Hadoop, is often required, especially if a project involves unstructured or semi-structured data. But even in such cases, Hadoop know-how is only a small portion of the overall staffing requirements; skills are also needed in areas such as business intelligence (BI), data integration and data warehousing. Business and industry expertise is another must-have for a successful big data analytics program, and many organizations also require advanced analytics skills, such as predictive analytics and data mining capabilities.
Data scientist is a new job title that is getting some industry buzz. But it’s really a combination of skills from the areas listed above, including predictive modeling, both statistical and business analysis, and data integration development. I’ve seen some job postings for data scientists that require a statistical degree as well as an industry-specific one. Having both degrees and matching professional experience is great, but what are the chances of finding job candidates who do? Pretty slim!
Taking a more realistic look at what to plan for, I describe below the key skill sets and roles that should be part of big data analytics deployments. Instead of listing specific job titles, I’ve kept it general; organizations can bundle together the required roles and responsibilities in various ways based on the capabilities and experience levels of their existing workforce and new employees that they hire.
Business knowledge. In addition to technical skills, companies engaging in big data analytics projects need to involve people with extensive business and industry expertise. That must include knowledge of a company’s own business strategy and operations as well as its competitors, current industry conditions and emerging trends, customer demographics and both macro- and microeconomic factors.
  • 8. Much of the business value derived from big data analytics comes not from textbook key performance indicators but rather from insights and metrics gleaned as part of the analytics process. That process is part science (i.e., statistics) and part art, with users doing what-if analysis to gain actionable information about an organization’s business operations. Such findings are only possible with the participation of business managers and workers who have firsthand knowledge of business strategies and issues.
Business analysts. This group also has an important role to play in helping organizations to understand the business ramifications of big data. In addition to doing analytics themselves, business analysts can be tasked with things such as gathering business and data requirements for a project, helping to design dashboards, reports and data visualizations for presenting analytical findings to business users and assisting in measuring the business value of the analytics program.
BI developers. These people work with the business analysts to build the required dashboards, reports and visualizations for business users. In addition, depending on internal needs and the BI tools that an organization is using, they can enable self-service BI capabilities by preparing the data and the required BI templates for use by business executives and workers.
Predictive model builders. In general, predictive models for analyzing big data need to be custom-built. Deep statistical skills are essential to creating good models; too often, companies underestimate the required skill level and hire people who have used statistical tools, including models developed by others, but don’t have knowledge of the mathematical principles underlying predictive models.
Predictive modelers also need some business knowledge and an understanding of how to gather and integrate data into models, although in both cases they can leverage other people's more extensive expertise in creating and refining models. Another key skill for modelers to have is the ability to assess how comprehensive, clean and current the available data is before building models. There often are data gaps, and a modeler has to be able to close them.
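As a rough illustration of that pre-modeling check, the sketch below measures how complete and how current a small set of records is. It is a minimal example under stated assumptions, not a method taken from this guide: the function name `assess_readiness`, the field names, the 90-day freshness window and the sample rows are all hypothetical.

```python
from datetime import date


def assess_readiness(records, required_fields, as_of, max_age_days):
    """Summarize how complete and how current a list of dict records is."""
    total = len(records)

    # Completeness: share of records with a non-empty value for each field.
    completeness = {}
    for field in required_fields:
        filled = sum(1 for r in records if r.get(field) not in (None, ""))
        completeness[field] = filled / total if total else 0.0

    # Currency: share of records updated within the allowed age window.
    fresh = sum(
        1
        for r in records
        if r.get("updated") is not None
        and (as_of - r["updated"]).days <= max_age_days
    )

    return {
        "rows": total,
        "completeness": completeness,
        "fresh_share": fresh / total if total else 0.0,
    }


# Hypothetical sample data, for illustration only.
sample = [
    {"customer_id": 1, "region": "EMEA", "spend": 120.0, "updated": date(2012, 5, 1)},
    {"customer_id": 2, "region": "", "spend": None, "updated": date(2011, 1, 15)},
    {"customer_id": 3, "region": "APAC", "spend": 75.5, "updated": date(2012, 4, 20)},
    {"customer_id": 4, "region": "AMER", "spend": 310.0, "updated": None},
]

report = assess_readiness(sample, ["region", "spend"], date(2012, 6, 1), 90)
print(report)
```

A report like this only surfaces the gaps. Deciding whether to close them by sourcing more data, imputing values or narrowing the model's scope remains the modeler's judgment call.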
Data architects. A big data analytics project needs someone to design the data architecture and guide its development. Typically, data architects will need to incorporate various data structures into an architecture, along with processes for capturing and storing data and making it available for analysis. This role involves traditional IT skills such as data modeling, data profiling, data quality, data integration and data governance.
Data integration developers. Data integration is important enough to require its own developers. These folks design and develop integration processes to handle the full spectrum of data structures in a big data environment. In doing so, they ideally ensure that integration is done not in silos but as part of a comprehensive data architecture. It's also best to use packaged data integration tools that support multiple forms of data, including structured, unstructured and semi-structured sources. Avoid the temptation to develop custom code to handle extract, transform and load operations on pools of big data; hand coding can increase a project's overall costs -- not to mention the likelihood of an organization being saddled with undocumented data integration processes that can't scale up as data volumes continue to grow.
Technology architects. This role involves designing the underlying IT infrastructure that will support a big data analytics initiative. In addition to understanding the principles of designing traditional data warehousing and BI systems, technology architects need to have expertise in the new technologies that often are used in big data projects.
And if an organization plans to implement open source tools, such as Hadoop, the architects need to understand how to configure, design, develop and deploy such systems.
It can be exciting to tackle a big data analytics project and to acquire new technologies to support the analytics process. But don't be lulled into thinking that tools alone spell big data success. The real key to success is ensuring that your organization has the skills it needs to effectively manage large and varied data sets as well as the analytics that will be used to derive business value from the available data.
About the Author: Rick Sherman is the founder of Athena IT Solutions, which provides consulting, training and vendor services on business intelligence, data integration and data warehousing. Sherman has written more than 100 articles and spoken at dozens of events and webinars; he also is an adjunct faculty member at Northeastern University's Graduate School of Engineering. He blogs at The Data Doghouse and can be reached at rsherman@athena-solutions.com.
Free resources for technology professionals
TechTarget publishes targeted technology media that address your need for information and resources for researching products, developing strategy and making cost-effective purchase decisions. Our network of technology-specific Web sites gives you access to industry experts, independent content and analysis and the Web's largest library of vendor-provided white papers, webcasts, podcasts, videos, virtual trade shows, research reports and more -- drawing on the rich R&D resources of technology providers to address market trends, challenges and solutions. Our live events and virtual seminars give you access to vendor-neutral, expert commentary and advice on the issues and challenges you face daily. Our social community IT Knowledge Exchange allows you to share real-world information in real time with peers and experts.
What makes TechTarget unique?
TechTarget is squarely focused on the enterprise IT space. Our team of editors and network of industry experts provide the richest, most relevant content to IT professionals and management. We leverage the immediacy of the Web, the networking and face-to-face opportunities of events and virtual events, and the ability to interact with peers -- all to create compelling and actionable information for enterprise IT professionals across all industries and markets.
Related TechTarget Websites