Dr. Gail Zhou presented this topic at DevNexus on Feb 25, 2014. Big Data history, opportunities, and applications. Big Data key concepts, reference architecture with open source technology stacks. Hadoop architecture explained (HDFS, Map Reduce, and YARN). Big Data start-up challenges and strategies to overcome them. Technology update: Hadoop and Cassandra based technology offerings.
Gail Zhou on "Big Data Technology, Strategy, and Applications"
1. DevNexus 2014, Data + Integration
Big Data Technology, Strategy, and Applications
Dr. Gail Zhou
Gail Z Associates, LLC
February 25, 2014
LinkedIn: http://www.linkedin.com/in/gailZhou
Email: gail.r.zhou@gmail.com
2. Outline
• What is Big Data and why is it such a big deal? Where can we use Big Data?
• Big Data Key Concepts and Technologies, using Hadoop as an example
• Big Data Challenges and Start up Strategy: What are the challenges? How do you get started on Big Data?
• Appendix: Other Big Data Technologies, Integration of Big Data with Existing Applications (an example)
3. What is Big Data and
why is it such a big deal?
4. A Brief History of Big Data
Sources: Wikipedia, Forbes.com, and other articles
• 1941: “Information Explosion” term coined.
• 1963: Physicist and science historian Derek Price concluded that the
number of new journals had grown exponentially.
• 1990: Computer scientist Peter J. Denning, “Saving All the Bits”:
what machines can we build to monitor, process, and understand
the data, its meanings, and patterns? Can we get intelligence out
of the data?
• 1998: Steve Bryson et al., “Visually exploring gigabyte data sets
in real time”, ACM, section “Big Data for Scientific Visualization”.
5. A Brief History of Big Data Cont’d
Sources: Wikipedia, Forbes.com, and other articles
• 2001: Doug Laney, META Group, “3D Data
Management: Controlling Data Volume,
Velocity, and Variety” (more Vs now: Veracity,
Variability, and Value)
6. A Brief History of Big Data Cont’d
Sources: Wikipedia, Forbes.com, and other articles
• 2001 – 2003: Google grew rapidly as a result of its new
revenue model, 5 cents per click. Google is now a
giant and a Big Data leader.
• 1994 – Present: Yahoo!, Hadoop Shop (10K Nodes),
Genome, Big Data Analytics.
• 1994 – Present: Amazon, AWS Cloud.
• 2003 – Present: Facebook, Twitter, LinkedIn, etc.
• 2013 and beyond : Many others.
8. Population Growth Chart: Does it have something to do with Big Data? Machines,
Satellites, Cameras, Internet, computers, and mobile phones are just “enablers” of
big data.
Source: Global Education Project
9. Information Explosion. This is just the beginning.
You've got mail (too much).
You are embarrassed to admit you don’t know a lot
of cool things happening in the world.
Don’t despair. You are not alone.
Image sources: Newbury College, UK; www.spchui.net; www.ucg.org
11. Big Data Opportunities
• Medical Research and Healthcare: Massive collected research and clinical
information can be used to predict and prevent diseases, moving us from
‘sick care’ to ‘health care’.
• Telecom: Traffic data and patterns can be used in real time to re-route traffic.
• Defense: Satellite images and other information can be mashed up to
identify threats.
• Utilities: Smart meter monitoring.
• Public Safety: Pattern recognition and social media can help to predict
crimes.
• Financial Industry: Pattern recognition and business rules can flag fraudulent
activities.
• Functional Areas: Investigational Search, Pricing Optimization, Risk
Analysis, Churn Analysis, Behavior Analysis, Transactions Analysis,
Revenue Assurance, Recommendation Engines, etc.
12. Where Big Data Can Shine
• Traditional (Examples)
– Financial Transactions
– Energy and Infrastructure
– Transportation
– Life Science and Healthcare
• Big Data (Examples)
– Advertisements
– Search and Indexing
– Social Networks
– Science Research
– Communications
• Notes
– Big Data technology is not a replacement for traditional technology
– Big Data is complementary
– In some cases, Big Data is the only way to get things done
– Big Data has its own challenges
13. Key Concepts in Big Data –
Technology and Architectures
23. Big Data Start up Challenges
• Business urgency, time-to-market pressures
– A Big Data start up needs careful planning
– Big Data needs infrastructure, software stacks, people, and a start up plan
• Lack of Big Data resources, lack of sponsorship (except in some companies)
– Big Data is complex and requires multiple skill sets (mostly new to many companies): Infrastructure, Administration, Security, Programming, Testing, etc.
– Skepticism about Big Data
• Integration with existing technologies and systems
– Big Data solutions cannot be developed in isolation
– Integration with existing systems will be a top challenge (requires both sides to do additional work)
• Open source: Stability, Maturity, and Security
24. Suggested Big Data Start up Strategy
• Full analysis of business needs and information requirements. Business drivers:
– Revenue generation? Cost reduction? Customer retention? Compliance?
– Process improvement? Fraud detection? Analytics? Dashboard?
– Solving a tough problem? Retiring/replacing technologies and systems?
• Technology Evaluation and Selection
– Define requirements and objectives first
– Evaluate a variety of technology stacks – develop a framework first
• Executive Support for Start up Resources
• Prototyping, Discovery, and Planning
– Rent infrastructure in the cloud – VMware, Amazon EC2, and others
– Use spare hardware and network bandwidth
– Assessment, proposal, and project/program plan for next steps
• Start small and keep delivering
– Architecture design, estimation, business case
– Obtain funding and executive sponsorship, owners, etc.
– SDLC: don’t forget hardware, security, testing, etc.
26. Hadoop & Cassandra Based Offerings
Name | Offerings | Notes
Apache Hadoop | Hadoop Core | Enhancement: YARN
Cloudera | Enhanced Hadoop | Leader
DataStax | Enhanced Apache Cassandra | Cassandra is a distributed NoSQL DB
Hortonworks | Hadoop development and support. Hortonworks Data Platform (HDP) | Yahoo funded $23M + others. Major alliances.
MapR | Develops and sells Hadoop-derived software. M3, M5, M7. | Alliance with EMC, Amazon, and Google.
Sqoop | HDFS and SQL integration |
Hue | Hadoop GUI tools |
Amazon | AWS, cloud Hadoop cluster |
Microsoft | Windows Azure HDInsight |
IBM, Dell, etc. | Hardware, Software, Services |
27. Hadoop Related Technologies (Examples)
Name | Functions | Notes
Apache Hue | Hadoop GUI | Hadoop itself has a command line.
Apache HBase | NoSQL distributed DB, key/value column family store, runs on top of Hadoop | Bigtable-like storage for Hadoop, written in Java.
Apache Pig | High-level programming language for MapReduce | Pig Latin; interoperability with Python, JavaScript, Ruby, and Groovy
Apache Hive | Data warehouse on top of Hadoop. HiveQL | Summaries, queries, and analysis. Open sourced by Facebook.
Apache ZooKeeper | Hadoop configuration / build tools | Distributed configuration, synchronization, etc.
Apache Sqoop | Move RDBMS data into Hadoop | Command-line tool
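Several of the tools in the table above (Pig, Hive) are higher-level layers over MapReduce. As a rough illustration of the pattern they build on, here is a minimal single-process sketch of a word count split into map, shuffle, and reduce steps. It is plain Python, not Hadoop API code; the function names (map_phase, shuffle, reduce_phase, word_count) and the sample input are made up for this example, and a real cluster would spread each step across many nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as a framework would do between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the list of values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical sample input; in Hadoop the lines would come from files in HDFS.
    sample = ["big data is big", "data is everywhere"]
    print(word_count(sample))
```

Because each map call sees only one line and each reduce call sees only one key, both steps can run in parallel on separate machines, which is the property Hadoop exploits.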