The Briefing Room with Rick van der Lans and Think Big, a Teradata Company
Live Webcast on June 16, 2015
Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=197f8106531874cc5c14081ca214eaff
Hadoop is arguably one of the most disruptive technologies of the last decade. Once lauded solely for its ability to transform the speed of batch processing, it has marched steadily forward and promulgated an array of performance-enhancing accessories, notably Spark and YARN. Hadoop has evolved into much more than a file system and batch processor, and it now promises to stand as the data management and analytics backbone for enterprises.
Register for this episode of The Briefing Room to learn from veteran Analyst Rick van der Lans, as he discusses the emerging roles of Hadoop within the analytics ecosystem. He’ll be briefed by Ron Bodkin of Think Big, a Teradata Company, who will explore Hadoop’s maturity spectrum, from typical entry use cases all the way up the value chain. He’ll show how enterprises that already use Hadoop in production are finding new ways to exploit its power and build creative, dynamic analytics environments.
Visit InsideAnalysis.com for more information.
3. Twitter Tag: #briefr The Briefing Room
Welcome
Host:
Eric Kavanagh
eric.kavanagh@bloorgroup.com
@eric_kavanagh
4. Mission
• Reveal the essential characteristics of enterprise software, good and bad
• Provide a forum for detailed analysis of today's innovative technologies
• Give vendors a chance to explain their product to savvy analysts
• Allow audience members to pose serious questions... and get answers!
5. Topics
June: INNOVATORS
July: SQL INNOVATION
August: REAL-TIME DATA
6. Three Types of Evolution
• Disruptive selection
• Stabilizing selection
• Directional selection
7. Types of Evolution?
WHAT GOES AROUND, COMES AROUND!
8. Analyst: Rick van der Lans
Rick F. van der Lans is an independent analyst, consultant, author and lecturer specializing in data warehousing, business intelligence, analytics, big data and database technology. He is Managing Director of R20/Consultancy. He has advised many large companies worldwide on defining their business intelligence architectures. His popular IT books have been translated into many languages and have sold over 100,000 copies. Rick writes for TechTarget.com and B-eye-Network. For the last 25 years, he has been presenting professionally around the globe and at international events.
9. Think Big, A Teradata Company
• Last year Teradata acquired Think Big Analytics, Inc., a consulting and solutions company focused on big data solutions
• Think Big has expertise in implementing a variety of open source technologies, such as Hadoop, HBase, Cassandra, MongoDB and Storm, as well as experience with Hortonworks, Cloudera and MapR
• Its consultants can assist with the planning, management and deployment of big data implementations
10. Guest: Ron Bodkin
Ron Bodkin is Founder & President of Think Big, A Teradata Company. Ron founded Think Big to help companies realize measurable value from Big Data. The company’s expertise spans all facets of data science and data engineering and helps its customers drive maximum value from their Big Data initiatives.
Previously, Ron was Vice President of Engineering at Quantcast, where he led the data science and engineering teams that pioneered the use of Hadoop and NoSQL for batch and real-time decision making. Prior to that, Ron was Founder of New Aspects, which provided enterprise consulting for aspect-oriented programming. Ron was also Co-Founder and CTO of B2B applications provider C-Bridge, which he led to a team of 900 people and a successful IPO. Ron graduated with honors from McGill University with a B.S. in Math and Computer Science. Ron also earned his Master’s Degree in Computer Science from MIT, leaving the PhD program after presenting the idea for C-Bridge and placing in the finals of the $50K Entrepreneurship Contest.
11. MAKING BIG DATA COME ALIVE
The Maturity Model: Taking the Growing Pains out of
Hadoop
Ron Bodkin
Founder and President, Think Big
12. Agenda
• What is Big Data?
• Hadoop Maturity Spectrum (from MapReduce to Integrated Hadoop Architecture)
• Taking the Growing Pains out of Hadoop
34. Who is Think Big?
• 100% Big Data Focus
• Founded in 2010 with 100+ engagements across 70 clients
• Unlock value of big data with data science and data engineering services
• Proven vendor-neutral open source integration expertise
• Agile team-based development methodology
• Think Big Academy for skills and organizational development
• Global delivery model
35. Perceptions & Questions
Analyst:
Rick van der Lans
47. Questions on acceptance
You talked about the skills gap. Sometimes Hadoop is used to develop operational systems, and
sometimes to develop BI systems. Now, many specialists working in BICCs are good at ETL programming,
reporting, and data modeling, but not so good at low-level programming. For example, many of them
have never programmed in Java. What would you recommend they do to be able to use Hadoop?
Should organizations use Hadoop for developing new systems, or for migrating existing systems? How
easy do you think it is to integrate a new Hadoop-based system with the current infrastructure?
Normally, when a technology becomes more and more popular, it becomes clearer and clearer to
everyone what the costs and the risks are. For example, how many man-hours should be spent on
managing a Hadoop environment? And what’s the productivity of a MapReduce programmer versus a
SQL programmer?
Questions on architecture
Now that so much data is produced in a distributed fashion, is the concept of physically centralizing
data for integration really realistic? Isn’t big data too big to move?
Over the years, large multi-national organizations have always struggled to develop one large
centralized enterprise-wide data warehouse. In most cases, the problems were organizational. Who
owns that data, and so on? Why would the data lake or data reservoir succeed today?
48. Questions on technical aspects
It was clear that for you YARN is a key module in the Hadoop platform. How mature is YARN today?
How good is it, for example, in avoiding one massive query that monopolizes the entire platform and
slows down all other applications?
My feeling is still that HDFS is not good at joining files. The SQL-on-Hadoop engines are not strong at
joins either. Do you think this will change? For example, can you take a star schema from a SQL-based
data mart, consisting of 7 dimensions and 1 fact table, migrate it to Hadoop, and expect great
performance?
Any idea where the tipping point is, when to forget about a proprietary SQL database and move to Hadoop
with SQL-on-Hadoop?
You mentioned OLAP on Hadoop using Spark and you indicated that such a solution scales to thousands
of simultaneous users. Is that really true? Has that been proven? Or does that work when the reporting
tools extract data and load it into the memory of the client machine?
Almost every vendor of some big data tool publishes benchmark results showing how fast their
product is. But many of these benchmarks are single-user, single-query benchmarks, which I think is
not that useful. Why are they not showing multi-user benchmark results? Do you think they’re hiding
something?
49. Questions on technical aspects, cont’d.
On one of your slides you said that Hadoop is 50 PB+ scale. But realistically, how many companies on
this planet want to store that much data? And if it is 50 PB+ scale, doesn’t that mean that for the
majority of companies it’s overkill in functionality? In general, how big should big be to justify the
Hadoop platform?
To end this list of questions and to finalize the briefing, you indicated that analytics on big data is
where the real business value lies. I fully agree with that, but can you elaborate on this a little more?
51. Upcoming Topics
www.insideanalysis.com
June: INNOVATORS
July: SQL INNOVATION
August: REAL-TIME DATA
52. THANK YOU for your ATTENTION!
Some images provided courtesy of Wikimedia Commons and
"Selection Types Chart" by Azcolvin429 - Own work. Licensed under CC BY-SA 3.0 via Wikimedia Commons -
https://commons.wikimedia.org/wiki/File:Selection_Types_Chart.png#/media/File:Selection_Types_Chart.png
and http://sagamer.co.za/img/evolution-phone.jpg