DevNexus 2014, Data + Integration
Big Data Technology, Strategy, and Applications

Dr. Gail Zhou
Gail Z Associates, LLC
February 25, 2014
LinkedIn: http://www.linkedin.com/in/gailZhou
Email: gail.r.zhou@gmail.com

Outline
• What is Big Data and why is it such a big deal? Where can we use Big Data?
• Big Data Key Concepts and Technologies, using Hadoop as an example
• Big Data Challenges and Start up Strategy: What are the challenges? How do you get started on Big Data?

Appendix: Other Big Data Technologies, Integration of Big Data
with Existing Applications (an example)

What is Big Data and
why is it such a big deal?

A Brief History of Big Data
Sources: Wikipedia, Forbes.com, and other articles

• 1941: The term “Information Explosion” is coined.
• 1963: Physicist and science historian Derek Price concluded that the number of new journals was growing exponentially.
• 1990: Computer scientist Peter J. Denning, “Saving All the Bits”: what machines can we build to monitor, process, and understand the data, its meanings, and patterns? Can we get intelligence out of the data?
• 1998: Steve Bryson et al., “Visually exploring gigabyte data sets in real time”, ACM, section “Big Data for Scientific Visualization”.

A Brief History of Big Data Cont’d
Sources: Wikipedia, Forbes.com, and other articles

• 2001: Doug Laney, Meta Group, “3D Data Management: Controlling Data Volume, Velocity, and Variety” (added since: Veracity, Variability, and Value)

A Brief History of Big Data Cont’d
Sources: Wikipedia, Forbes.com, and other articles

• 2001 – 2003: Google grows rapidly as a result of its new revenue model, 5 cents per click. Google is now a giant big data leader.
• 1994 – Present: Yahoo!, Hadoop shop (10K nodes), Genome, Big Data Analytics.
• 1994 – Present: Amazon, AWS Cloud.
• 2003 – Present: Facebook, Twitter, LinkedIn, etc.
• 2013 and beyond: Many others.

Population Growth Chart: Does it have something to do with Big Data? Machines, satellites, cameras, the Internet, computers, and mobile phones are just “enablers” of big data.

Source: Global Education Project

Source: Newbury College, UK

www.spchui.net

Information Explosion. It is just the real beginning.
You got mail (too much).
You are embarrassed to admit you don’t know a lot
of cool things happening in the world.

www.ucg.org

Don’t despair. You are not alone.

Big Data Opportunities

• Medical Research and Healthcare: Massive amounts of collected research and clinical information can be used to predict and prevent diseases, moving us from ‘sick care’ to ‘health care’.
• Telecom: Traffic data and patterns can be used in real time to re-route traffic.
• Defense: Satellite images and other information can be mashed up to identify threats.
• Utilities: Smart meter monitoring.
• Public Safety: Pattern recognition and social media can help to predict crimes.
• Financial Industry: Pattern recognition and business rules can flag fraudulent activities.
• Functional Areas: Investigational Search, Pricing Optimization, Risk Analysis, Churn Analysis, Behavior Analysis, Transactions Analysis, Revenue Assurance, Recommendation Engines, etc.

Where Big Data Can Shine
• Traditional (Examples)
– Financial Transactions
– Energy and Infrastructure
– Transportation
– Life Science and Healthcare

• Big Data (Examples)
– Advertisements
– Search and Indexing
– Social Networks
– Science Research
– Communications

• Notes
– Big Data technology is not a replacement for traditional technology
– Big Data is complementary
– In some cases, Big Data is the only way to get things done
– Big Data has its own challenges

Key Concepts in Big Data –
Technology and Architectures

Hadoop HDFS
Files are split into blocks (64 MB, 128 MB, etc.), and each block is stored on different nodes according to a replication factor (default 3).
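
As a rough illustration of what the block size and replication factor mean for storage, the sketch below estimates how many blocks a file is split into and how much raw cluster space its replicas use. The function name `hdfs_footprint` and the 1024 MB example file are hypothetical, chosen only for this example; the arithmetic assumes that the last block occupies only the bytes it actually holds.

```python
import math


def hdfs_footprint(file_size_mb, block_size_mb=128, replication=3):
    """Estimate (blocks, block replicas, raw storage in MB) for one file.

    Hypothetical helper for illustration only; it does not talk to a cluster.
    """
    # A file is cut into fixed-size blocks; the last block may be partial.
    blocks = math.ceil(file_size_mb / block_size_mb)
    # Every block is copied to `replication` different nodes.
    replicas = blocks * replication
    # A partial last block only uses its real size, so raw usage is
    # simply the file size multiplied by the replication factor.
    raw_mb = file_size_mb * replication
    return blocks, replicas, raw_mb


blocks, replicas, raw_mb = hdfs_footprint(1024)
print(f"{blocks} blocks, {replicas} block replicas, {raw_mb} MB of raw storage")
```

With the defaults assumed here, a 1024 MB file becomes 8 blocks and 24 block replicas spread over the cluster, using about 3 GB of raw disk. Halving the block size to 64 MB doubles the block count but leaves the raw storage unchanged.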

Hadoop Logical View

http://nosqlessentials.com
Professor: Fernando Rodriguez Olivera

Hadoop Logical View (HDFS + Map Reduce)

Hadoop V1 – Map Reduce Jobs Execution
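
As a rough illustration of the map, shuffle, and reduce phases that a MapReduce job goes through, the following is a minimal single-process sketch in plain Python. It is not Hadoop code and does not use any Hadoop API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample input are hypothetical and exist only for this example. On a real cluster, the map and reduce steps would run as tasks on many nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # Run the three phases in sequence, in one process (illustration only).
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data is big", "data is everywhere"])
print(counts)
```

The point of the split is that each map call only needs its own line and each reduce call only needs the values for its own key, which is what lets a framework distribute the work.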

Hadoop 2.0 with YARN

YARN Interaction & Sequence

Big Data Challenges,
Suggested Startup Strategy

Big Data Start up Challenges
• Business urgency, time-to-market pressures
• A Big Data start up needs careful planning
• Big Data needs infrastructure, software stacks, people, and a start up plan
• Lack of Big Data resources, lack of sponsorship (except in some companies)
• Big Data is complex and requires multiple skill sets (mostly new to many companies): Infrastructure, Administration, Security, Programming, Testing, etc.
• Skepticism about Big Data
• Integration with existing technologies and systems
• Cannot develop isolated big data solutions
• Integration with existing systems will be a top challenge (requires both sides to do additional work)
• Open source: stability, maturity, and security
Suggested Big Data Start up Strategy
• Full analysis of business needs and information requirements. Business drivers:
• Revenue generation? Cost reduction? Customer retention? Compliance?
• Process improvement? Fraud detection? Analytics? Dashboard?
• Solving a tough problem? Retiring/replacing technologies and systems?
• Technology Evaluation and Selection
• Define requirements and objectives first
• Evaluate a variety of technology stacks, and develop an evaluation framework first
• Executive support for start up resources
• Prototyping, Discovery, and Planning
• Rent infrastructure in the cloud: VMware, Amazon EC2, and others
• Use spare hardware and network bandwidth
• Assessment, proposal, and project/program plan for next steps
• Start small and keep delivering
• Architecture Design, Estimation, Business Case
• Obtain funding and executive sponsorship, owners, etc.
• SDLC: don’t forget hardware, security, testing, etc.

Appendix

Hadoop & Cassandra Based Offerings
Name | Offerings | Notes
--- | --- | ---
Apache Hadoop | Hadoop Core | Enhancement: YARN
Cloudera | Enhanced Hadoop | Leader
DataStax | Enhanced Apache Cassandra | Cassandra is a distributed NoSQL DB
Hortonworks | Hadoop development and support. Hortonworks Data Platform (HDP) | Yahoo funded $23M + others. Major alliances.
MapR | Develops and sells Hadoop-derived software: M3, M5, M7 | Alliance with EMC, Amazon, and Google
Sqoop | HDFS and SQL integration |
Hue | Hadoop GUI tools |
Amazon | AWS, cloud Hadoop cluster |
Microsoft | Windows Azure HDInsight |
IBM, Dell, etc. | Hardware, software, services |

Hadoop Related Technologies (Examples)
Name | Functions | Notes
--- | --- | ---
Apache Hue | Hadoop GUI | Hadoop itself has a command-line interface
Apache HBase | NoSQL distributed DB, key/value column family store, runs on top of Hadoop | Bigtable-like storage for Hadoop, written in Java
Apache Pig | High-level programming language for MapReduce | Pig Latin; interoperability with Python, JavaScript, Ruby, and Groovy
Apache Hive | Data warehouse on top of Hadoop. HiveQL | Summaries, queries, and analysis. Open sourced by Facebook.
Apache ZooKeeper | Hadoop configuration and coordination service | Distributed configuration, synchronization, etc.
Apache Sqoop | Move RDBMS data into Hadoop | Command-line tool

Cassandra

http://nosqlessentials.com
Professor: Fernando Rodriguez Olivera
HBase

http://blog.sematext.com/2012/07/16/hbase-memstore-what-you-should-know/
http://blog.cloudera.com/blog/2011/06/biodiversity-indexing-migration-from-mysql-to-hadoop/
