SlideShare una empresa de Scribd logo
1 de 34
Descargar para leer sin conexión
巨量資料處理工具的過去、現在與未來
三種處理工具與未來的挑戰

Big Data Tools : PAST, NOW and FUTURE
- Three Types of Processing Tools and the Next Big Thing

National Center for High-performance Computing

Jazz Yao-Tsung Wang
<0303109@narlabs.org.tw>
WHO AM I ? JAZZ ?
• About Speaker :
– Jazz Yao-Tsung Wang / Associate Researcher
– Co-Founder and Evangelist of Hadoop.TW
– 0303109@narlabs.org.tw

• All slides and training materials could be found at
– http://trac.nchc.org.tw/cloud
– Be Green ! Try not to print the slides. It changes frequently.

FOSS User
Debian/Ubutnu
Access Grid
Motion/VLC
Red5
Debian Router
DRBL/Clonezilla
Hadoop

FOSS Developer
TRTC WSU/
Haduzilla /
Hadop4Win / Ezilla
FOSS Evangelist
DRBL/Clonezilla
Partclone/Tuxboot
Hadoop Ecosystem
2
演講大綱 Agenda

Linux is everywhere

開放無所不在

What is Big Data ?

何謂巨量資料

Big Data in Motion !

即時巨資應用

The Next Big Thing ? 下半場的重點
Conclusion

三大結論回顧
3
3 Buzzwords in 2013
三大年度熱門關鍵字

物聯網
Internet of Things
雲端運算
Cloud Computing
巨量資料
Big Data
4
巨量資料的奇幻漂流
     Life of Big Data

5
Linux Adoption in Enterprise
increase in last 3 years

Source: "2013 Enterprise End User Report",
Linux Foundation, March 2013
http://www.linuxfoundation.org/publications/linux-foundation/linux-adoption-trends-end-user-report-2013

6
Linux in the Cloud
← Amazon
Yahoo ↓

If you have 4000+
server, which OS will
you choose?
http://www.datacenterknowledge.com/archives/2010/09/20/inside-the-yahoo-computing-coop/

http://bits.blogs.nytimes.com/2013/01/08/amazons-unknown-unknowns/?_r=0

7
Linux in the Devices !!
Linux have dominated Embedded, Mobile .....
maybe Internet of Things in near future ….

http://crackberry.com/sites/crackberry.com/files/styles/large/public/topic_images/2013/ANDROID.png
http://boxysystems.com/myblog/wp-content/uploads/2011/01/google-chrome-OS-logo.jpg

Source: The most popular end-user Linux distributions are ...
http://www.zdnet.com/the-most-popular-end-user-linux-distributions-are-7000017223/

8
Why Linux ?
Total Cost of Ownership !

Source: How Red Hat Enterprise Linux trims total cost of ownership (TCO) in comparison to Windows Server, October 7, 2013
http://www.redhat.com/about/news/archive/2013/10/how-red-hat-enterprise-linux-trims-total-cost-of-ownership-in-comparison-to-windows-server

9
演講大綱 Agenda

Linux is everywhere

開放無所不在

What is Big Data ?

何謂巨量資料

Big Data in Motion !

即時巨資應用

The Next Big Thing ? 下半場的重點
Conclusion

三大結論回顧
10
巨量資料的三大挑戰
Challenges - 3 Vs of Big Data
Volume 資料數量
(amount of data)
EB

參考來源:
[1] Laney, Douglas. "3D Data Management: Controlling
Data Volume, Velocity and Variety" (6 February 2001)
[2] Gartner Says Solving 'Big Data' Challenge Involves
More Than Just Managing Volumes of Data, June 2011

Structured
結構化資料

Batch ( 批次作業 )

Semi-structured
半結構化資料

PB

Unstructured
非結構化資料
Variety 資料多樣性
(data types, sources)

Realtime ( 即時資料 )
TB

Velocity 資料增加率
(speed of data in/out)

巨量資料的挑戰在於如何管理「數量」、「增加率」與「多樣性」

11 11
知識源自彙整過去,智慧在能預測未來
Knowledge is from the PAST, Wisdom is for the FUTURE.
資料多寡不是重點,重點是
我們想要產生什麼價值呢?
時效合理嘛?成本合理嘛?

It does not
matter how big is
your data. The
goal is to create
VALUE within
reasonable time
period and total
cost of
ownership.
http://www.pursuantgroup.com/blog/tag/dikw-model/

12 12
PAST: Big Data at Rest

http://nextbigsound.com/

http://www.openpdc.com/
Source: 10 ways big data changes everything,
http://gigaom.com/2012/03/11/10-ways-big-data-is-changing-everything

13
處理巨量資料的三類技術 (1)
Data at Rest – MapReduce Framework

e

Volume

e
uc
ed
R
ap
M
Batch

Hadoop
HPCC
Unstructured
Variety

EB

PB

TB

Structured
Petabyte File System

m
ra
F

k
or
w

Realtime

Velocity

14 14
巨量資料處理的資訊架構
The SMAQ stack for big data
LAMP is for Web Services.

未來處理海量資料的人必需知道
You will need a big data stack called
SMAQ ( Storage, MapReduce and Query )

參考來源: The SMAQ stack for big data , Edd Dumbill , 22 September 2010 ,   
      http://radar.oreilly.com/2010/09/the-smaq-stack-for-big-data.html
圖片來源: http://smashingweb.ge6.org/wp-content/uploads/2011/10/apache-php-mysql-ubuntu.png

15
高資料通量處理平台 Hadoop
Key Concept : Data Locality

Hadoop is a framework for developer
to wrote and execute massive data
processing applications easily.
Hadoop includes two parts: HDFS and MapReduce.
Warehouse
for data source and
output results.
HDFS stores
unstructure data
and structure data

Processing

Grouping

Map
One in
One out

Reduce
Multiple in, One out
16
批次作業的運算時間
Processing Time of Batch Jobs

17
演講大綱 Agenda

Linux is everywhere

開放無所不在

What is Big Data ?

何謂巨量資料

Big Data in Motion !

即時巨資應用

The Next Big Thing ? 下半場的重點
Conclusion

三大結論回顧
18
NOW: Big Data in Motion

[ 金融 ] Trading Robot

[ 災防 ] 海嘯、土石流
Disaster Prevention
Tsunami Forecast
[ 資訊 ] 機房即時用電資訊監控、警訊

Realtime Data Center Power Usage
and related notifications

http://www.newmobilelife.com/2013/04/21/facebook-pue-real-time-charts/

19
處理巨量資料的三類技術 (2)
Data in Motion – In-Memory Processing

Volume
EB

Batch

HBase / Drill
/ Impala

PB

Unstructured
Variety

Structured

Realtime
TB

Velocity

20 20
Google 的技術演進 vs
Apache 專案
Big Query
(JSON, SQL-like)
Incremental Index Update
(Caffeine)
Graph Database

Dremel
(2010)

Apache Drill
(2012)

Percolator
(2010)
Pregel
(2009)

Apache Giraph
(2011)

BigTable
(2006)

Apache HBase
(2007)

MapReduce
(2004)

Hadoop MapReduce
(2006)

Google File System
(2003)

HDFS
(2006)

21
令人眼花撩亂的多樣化資料庫選擇
NoSQL vs NewSQL

http://www.infoq.com/news/2011/04/newsql

22 22
In-Memory Processing 的運算時間
以 HBase 為例

23
處理巨量資料的三類技術 (3)
Streaming Data Collection
Volume
EB

Structured

Batch
PB

Unstructured
Variety

Realtime
TB

Message Queue
Storm / Kafka

Velocity

24 24
Twitter Storm
+ Apache Kafka

http://blog.infochimps.com/2012/10/30/next-gen-real-time-streaming-storm-kafka-integration/

25
混合模式的巨量資料處理架構
Lambda Architecture for Big Data after 2013
HBase
Storm

ElephantDB
Or
Voldemort

Source: Lambda Architecture, 8. March 2013
http://www.ymc.ch/en/lambda-architecture-part-1
26
結論二: Golden Ratio for Big Data
1 core : 2+ GB RAM : 1 SSD Disk

FLOPS=~IOPS

27
演講大綱 Agenda

Linux is everywhere

開放無所不在

What is Big Data ?

何謂巨量資料

Big Data in Motion !

即時巨資應用

The Next Big Thing ? 下半場的重點
Conclusion

三大結論回顧
28
市場現況: Gartner Hype Cycle 2013
萌芽期

夢幻期

幻滅期

平原期

高原期

29
NEXT: Big Data Security
品質管控

當我們緊密相連 .....

世界政經:歐盟想分 Tweeter
找出經濟、政治的脈動

國家安全:美國 PRISM 計劃
( 網軍 ! 終極警探 4.0 )

權限管控
組織如何因應 APT ?
Big Data 平台本身的安全性 ?

有太多安全的問題等待解決!

數量管控

Source: Gartner (March 2011), 'Big
Data' Is Only the Beginning of
Extreme Information Management,
April 2011,
http://www.gartner.com/id=1622715
30
結論三: For Big Data Security,
buy hardware with encryption support
Power Hardware

PowerVM

PowerSC

Systems

Virtualization

Security &
Compliance

Power Hardware
with security built
in & hardware
assists for
encryption

Secure Virtualization
Platform ensuring
isolation integrity

Virtualization Centric
Security Extensions
to protect the Cloud
and Virtual Data
Center

AIX Operating
System

Systems Software

Secure Operating
Systems– defense in
depth: role based
access, trusted
execution, encrypted
file system
31
演講大綱 Agenda

Linux is everywhere

開放無所不在

What is Big Data ?

何謂巨量資料

Big Data in Motion !

即時巨資應用

The Next Big Thing ? 下半場的重點
Conclusion

三大結論回顧
32
Conclusion:
1.Switch to Linux, it helps to reduce TCO (total cost of
ownership) about 34% !
2.Hadoop only solve the problem for Big Data at Rest. You will
need In-Memory Processing tools such as HBase, Spark,
Impala for Big Data in Motion. To cleaning Big Data in realtime, you could try Message Queue, Storm and Kafka.
3.Consider to buy servers with SSD disk, it helps to improve
processing time for Big Data at Rest. For Big Data in Motion,
you will need tools like In-Memory Processing. Remember to
buy more RAM for your cluster.
4.To protect your big data, please survey hardwares and
operation system (OS) which have hardware encryption
support !
33
問題與討論
Questions?

34

Más contenido relacionado

La actualidad más candente

Büyük Veriyle Büyük Resmi Görmek
Büyük Veriyle Büyük Resmi GörmekBüyük Veriyle Büyük Resmi Görmek
Büyük Veriyle Büyük Resmi Görmekideaport
 
Big data technologies and Hadoop infrastructure
Big data technologies and Hadoop infrastructureBig data technologies and Hadoop infrastructure
Big data technologies and Hadoop infrastructureRoman Nikitchenko
 
Guest Lecture: Introduction to Big Data at Indian Institute of Technology
Guest Lecture: Introduction to Big Data at Indian Institute of TechnologyGuest Lecture: Introduction to Big Data at Indian Institute of Technology
Guest Lecture: Introduction to Big Data at Indian Institute of TechnologyNishant Gandhi
 
The datascientists workplace of the future, IBM developerDays 2014, Vienna by...
The datascientists workplace of the future, IBM developerDays 2014, Vienna by...The datascientists workplace of the future, IBM developerDays 2014, Vienna by...
The datascientists workplace of the future, IBM developerDays 2014, Vienna by...Romeo Kienzler
 
Hadoop at the Center: The Next Generation of Hadoop
Hadoop at the Center: The Next Generation of HadoopHadoop at the Center: The Next Generation of Hadoop
Hadoop at the Center: The Next Generation of HadoopAdam Muise
 
Bio bigdata
Bio bigdata Bio bigdata
Bio bigdata Mk Kim
 
Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Ranjith Sekar
 
Introduction to Big data & Hadoop -I
Introduction to Big data & Hadoop -IIntroduction to Big data & Hadoop -I
Introduction to Big data & Hadoop -IEdureka!
 
Big Data - A brief introduction
Big Data - A brief introductionBig Data - A brief introduction
Big Data - A brief introductionFrans van Noort
 
2014 july 24_what_ishadoop
2014 july 24_what_ishadoop2014 july 24_what_ishadoop
2014 july 24_what_ishadoopAdam Muise
 
Introduction to Big Data & Hadoop
Introduction to Big Data & HadoopIntroduction to Big Data & Hadoop
Introduction to Big Data & HadoopEdureka!
 
What is Hadoop? Oct 17 2013
What is Hadoop? Oct 17 2013What is Hadoop? Oct 17 2013
What is Hadoop? Oct 17 2013Adam Muise
 
Intro to HDFS and MapReduce
Intro to HDFS and MapReduceIntro to HDFS and MapReduce
Intro to HDFS and MapReduceRyan Tabora
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big DataHaluan Irsad
 
Introduction to Big Data and Hadoop
Introduction to Big Data and HadoopIntroduction to Big Data and Hadoop
Introduction to Big Data and HadoopFebiyan Rachman
 
Detailed presentation on big data hadoop +Hadoop Project Near Duplicate Detec...
Detailed presentation on big data hadoop +Hadoop Project Near Duplicate Detec...Detailed presentation on big data hadoop +Hadoop Project Near Duplicate Detec...
Detailed presentation on big data hadoop +Hadoop Project Near Duplicate Detec...Ashok Royal
 

La actualidad más candente (20)

Büyük Veriyle Büyük Resmi Görmek
Büyük Veriyle Büyük Resmi GörmekBüyük Veriyle Büyük Resmi Görmek
Büyük Veriyle Büyük Resmi Görmek
 
Big data technologies and Hadoop infrastructure
Big data technologies and Hadoop infrastructureBig data technologies and Hadoop infrastructure
Big data technologies and Hadoop infrastructure
 
Thinking BIG
Thinking BIGThinking BIG
Thinking BIG
 
Guest Lecture: Introduction to Big Data at Indian Institute of Technology
Guest Lecture: Introduction to Big Data at Indian Institute of TechnologyGuest Lecture: Introduction to Big Data at Indian Institute of Technology
Guest Lecture: Introduction to Big Data at Indian Institute of Technology
 
The datascientists workplace of the future, IBM developerDays 2014, Vienna by...
The datascientists workplace of the future, IBM developerDays 2014, Vienna by...The datascientists workplace of the future, IBM developerDays 2014, Vienna by...
The datascientists workplace of the future, IBM developerDays 2014, Vienna by...
 
Hadoop at the Center: The Next Generation of Hadoop
Hadoop at the Center: The Next Generation of HadoopHadoop at the Center: The Next Generation of Hadoop
Hadoop at the Center: The Next Generation of Hadoop
 
Bio bigdata
Bio bigdata Bio bigdata
Bio bigdata
 
Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Hadoop and BigData - July 2016
Hadoop and BigData - July 2016
 
BIG DATA
BIG DATABIG DATA
BIG DATA
 
Introduction to Big data & Hadoop -I
Introduction to Big data & Hadoop -IIntroduction to Big data & Hadoop -I
Introduction to Big data & Hadoop -I
 
Big Data - A brief introduction
Big Data - A brief introductionBig Data - A brief introduction
Big Data - A brief introduction
 
2014 july 24_what_ishadoop
2014 july 24_what_ishadoop2014 july 24_what_ishadoop
2014 july 24_what_ishadoop
 
Introduction to Big Data & Hadoop
Introduction to Big Data & HadoopIntroduction to Big Data & Hadoop
Introduction to Big Data & Hadoop
 
What is Hadoop? Oct 17 2013
What is Hadoop? Oct 17 2013What is Hadoop? Oct 17 2013
What is Hadoop? Oct 17 2013
 
Intro to HDFS and MapReduce
Intro to HDFS and MapReduceIntro to HDFS and MapReduce
Intro to HDFS and MapReduce
 
社群、協會、國際連結
社群、協會、國際連結社群、協會、國際連結
社群、協會、國際連結
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Introduction to Big Data and Hadoop
Introduction to Big Data and HadoopIntroduction to Big Data and Hadoop
Introduction to Big Data and Hadoop
 
Big data abstract
Big data abstractBig data abstract
Big data abstract
 
Detailed presentation on big data hadoop +Hadoop Project Near Duplicate Detec...
Detailed presentation on big data hadoop +Hadoop Project Near Duplicate Detec...Detailed presentation on big data hadoop +Hadoop Project Near Duplicate Detec...
Detailed presentation on big data hadoop +Hadoop Project Near Duplicate Detec...
 

Destacado

Hadoop Deployment Model @ OSDC.TW
Hadoop Deployment Model @ OSDC.TWHadoop Deployment Model @ OSDC.TW
Hadoop Deployment Model @ OSDC.TWJazz Yao-Tsung Wang
 
2015-05-20 製造業生產歷程全方位整合查詢與探勘的規劃心法
2015-05-20 製造業生產歷程全方位整合查詢與探勘的規劃心法2015-05-20 製造業生產歷程全方位整合查詢與探勘的規劃心法
2015-05-20 製造業生產歷程全方位整合查詢與探勘的規劃心法Jazz Yao-Tsung Wang
 
13 09-28 hadoop-in_taiwan_2013_opening
13 09-28 hadoop-in_taiwan_2013_opening13 09-28 hadoop-in_taiwan_2013_opening
13 09-28 hadoop-in_taiwan_2013_openingJazz Yao-Tsung Wang
 
2006-11-16 RFID and OSS for Agriculture
2006-11-16 RFID and OSS for Agriculture2006-11-16 RFID and OSS for Agriculture
2006-11-16 RFID and OSS for AgricultureJazz Yao-Tsung Wang
 
Build Your Private Cloud with Ezilla and Haduzilla
Build Your Private Cloud with Ezilla and HaduzillaBuild Your Private Cloud with Ezilla and Haduzilla
Build Your Private Cloud with Ezilla and HaduzillaJazz Yao-Tsung Wang
 
2014-10-17 探析台灣巨量資料產業供應鏈串聯現況
2014-10-17 探析台灣巨量資料產業供應鏈串聯現況2014-10-17 探析台灣巨量資料產業供應鏈串聯現況
2014-10-17 探析台灣巨量資料產業供應鏈串聯現況Jazz Yao-Tsung Wang
 
ClassCloud: switch your PC Classroom into Cloud Testbed
ClassCloud: switch your PC Classroom into Cloud TestbedClassCloud: switch your PC Classroom into Cloud Testbed
ClassCloud: switch your PC Classroom into Cloud TestbedJazz Yao-Tsung Wang
 
2016-07-12 Introduction to Big Data Platform Security
2016-07-12 Introduction to Big Data Platform Security2016-07-12 Introduction to Big Data Platform Security
2016-07-12 Introduction to Big Data Platform SecurityJazz Yao-Tsung Wang
 
Big Data Projet Management the Body of Knowledge (BDPMBOK)
Big Data Projet Management the Body of Knowledge (BDPMBOK)Big Data Projet Management the Body of Knowledge (BDPMBOK)
Big Data Projet Management the Body of Knowledge (BDPMBOK)Jazz Yao-Tsung Wang
 
Hadoop 生態系十年回顧與未來展望
Hadoop 生態系十年回顧與未來展望Hadoop 生態系十年回顧與未來展望
Hadoop 生態系十年回顧與未來展望Jazz Yao-Tsung Wang
 

Destacado (11)

Hadoop Deployment Model @ OSDC.TW
Hadoop Deployment Model @ OSDC.TWHadoop Deployment Model @ OSDC.TW
Hadoop Deployment Model @ OSDC.TW
 
2015-05-20 製造業生產歷程全方位整合查詢與探勘的規劃心法
2015-05-20 製造業生產歷程全方位整合查詢與探勘的規劃心法2015-05-20 製造業生產歷程全方位整合查詢與探勘的規劃心法
2015-05-20 製造業生產歷程全方位整合查詢與探勘的規劃心法
 
13 09-28 hadoop-in_taiwan_2013_opening
13 09-28 hadoop-in_taiwan_2013_opening13 09-28 hadoop-in_taiwan_2013_opening
13 09-28 hadoop-in_taiwan_2013_opening
 
2006-11-16 RFID and OSS for Agriculture
2006-11-16 RFID and OSS for Agriculture2006-11-16 RFID and OSS for Agriculture
2006-11-16 RFID and OSS for Agriculture
 
Build Your Private Cloud with Ezilla and Haduzilla
Build Your Private Cloud with Ezilla and HaduzillaBuild Your Private Cloud with Ezilla and Haduzilla
Build Your Private Cloud with Ezilla and Haduzilla
 
2014-10-17 探析台灣巨量資料產業供應鏈串聯現況
2014-10-17 探析台灣巨量資料產業供應鏈串聯現況2014-10-17 探析台灣巨量資料產業供應鏈串聯現況
2014-10-17 探析台灣巨量資料產業供應鏈串聯現況
 
ClassCloud: switch your PC Classroom into Cloud Testbed
ClassCloud: switch your PC Classroom into Cloud TestbedClassCloud: switch your PC Classroom into Cloud Testbed
ClassCloud: switch your PC Classroom into Cloud Testbed
 
2016-07-12 Introduction to Big Data Platform Security
2016-07-12 Introduction to Big Data Platform Security2016-07-12 Introduction to Big Data Platform Security
2016-07-12 Introduction to Big Data Platform Security
 
Big Data Projet Management the Body of Knowledge (BDPMBOK)
Big Data Projet Management the Body of Knowledge (BDPMBOK)Big Data Projet Management the Body of Knowledge (BDPMBOK)
Big Data Projet Management the Body of Knowledge (BDPMBOK)
 
Hadoop.TW : Now and Future
Hadoop.TW : Now and FutureHadoop.TW : Now and Future
Hadoop.TW : Now and Future
 
Hadoop 生態系十年回顧與未來展望
Hadoop 生態系十年回顧與未來展望Hadoop 生態系十年回顧與未來展望
Hadoop 生態系十年回顧與未來展望
 

Similar a Big Data Tools : PAST, NOW and FUTURE

Drinking from the Fire Hose: Practical Approaches to Big Data Preparation and...
Drinking from the Fire Hose: Practical Approaches to Big Data Preparation and...Drinking from the Fire Hose: Practical Approaches to Big Data Preparation and...
Drinking from the Fire Hose: Practical Approaches to Big Data Preparation and...Inside Analysis
 
Big data data lake and beyond
Big data data lake and beyond Big data data lake and beyond
Big data data lake and beyond Rajesh Kumar
 
20150630 kca big-data-with-cloud_output
20150630 kca big-data-with-cloud_output20150630 kca big-data-with-cloud_output
20150630 kca big-data-with-cloud_outputericpi Bi
 
Big data? No. Big Decisions are What You Want
Big data? No. Big Decisions are What You WantBig data? No. Big Decisions are What You Want
Big data? No. Big Decisions are What You WantStuart Miniman
 
The book of elephant tattoo
The book of elephant tattooThe book of elephant tattoo
The book of elephant tattooMohamed Magdy
 
Big data analytics 1
Big data analytics 1Big data analytics 1
Big data analytics 1gauravsc36
 
Data analysis trend 2015 2016 v071
Data analysis trend 2015 2016 v071Data analysis trend 2015 2016 v071
Data analysis trend 2015 2016 v071Chun Myung Kyu
 
Python's Role in the Future of Data Analysis
Python's Role in the Future of Data AnalysisPython's Role in the Future of Data Analysis
Python's Role in the Future of Data AnalysisPeter Wang
 
Big Data PPT by Rohit Dubey
Big Data PPT by Rohit DubeyBig Data PPT by Rohit Dubey
Big Data PPT by Rohit DubeyRohit Dubey
 
Big Data made easy in the era of the Cloud - Demi Ben-Ari
Big Data made easy in the era of the Cloud - Demi Ben-AriBig Data made easy in the era of the Cloud - Demi Ben-Ari
Big Data made easy in the era of the Cloud - Demi Ben-AriDemi Ben-Ari
 
Scott Edmunds slides from #IDCC13 Data Science session
Scott Edmunds slides from #IDCC13 Data Science sessionScott Edmunds slides from #IDCC13 Data Science session
Scott Edmunds slides from #IDCC13 Data Science sessionGigaScience, BGI Hong Kong
 
Big Data - A Real Life Revolution
Big Data - A Real Life RevolutionBig Data - A Real Life Revolution
Big Data - A Real Life RevolutionCapgemini
 
An Overview of BigData
An Overview of BigDataAn Overview of BigData
An Overview of BigDataValarmathi V
 
Hadoop for beginners free course ppt
Hadoop for beginners   free course pptHadoop for beginners   free course ppt
Hadoop for beginners free course pptNjain85
 

Similar a Big Data Tools : PAST, NOW and FUTURE (20)

Bigdata notes
Bigdata notesBigdata notes
Bigdata notes
 
Drinking from the Fire Hose: Practical Approaches to Big Data Preparation and...
Drinking from the Fire Hose: Practical Approaches to Big Data Preparation and...Drinking from the Fire Hose: Practical Approaches to Big Data Preparation and...
Drinking from the Fire Hose: Practical Approaches to Big Data Preparation and...
 
Big data data lake and beyond
Big data data lake and beyond Big data data lake and beyond
Big data data lake and beyond
 
20150630 kca big-data-with-cloud_output
20150630 kca big-data-with-cloud_output20150630 kca big-data-with-cloud_output
20150630 kca big-data-with-cloud_output
 
Big data? No. Big Decisions are What You Want
Big data? No. Big Decisions are What You WantBig data? No. Big Decisions are What You Want
Big data? No. Big Decisions are What You Want
 
The book of elephant tattoo
The book of elephant tattooThe book of elephant tattoo
The book of elephant tattoo
 
Big data analytics 1
Big data analytics 1Big data analytics 1
Big data analytics 1
 
Big data
Big dataBig data
Big data
 
Data analysis trend 2015 2016 v071
Data analysis trend 2015 2016 v071Data analysis trend 2015 2016 v071
Data analysis trend 2015 2016 v071
 
Python's Role in the Future of Data Analysis
Python's Role in the Future of Data AnalysisPython's Role in the Future of Data Analysis
Python's Role in the Future of Data Analysis
 
On Big Data
On Big DataOn Big Data
On Big Data
 
Big Data PPT by Rohit Dubey
Big Data PPT by Rohit DubeyBig Data PPT by Rohit Dubey
Big Data PPT by Rohit Dubey
 
Big Data 2.0
Big Data 2.0Big Data 2.0
Big Data 2.0
 
Big Data made easy in the era of the Cloud - Demi Ben-Ari
Big Data made easy in the era of the Cloud - Demi Ben-AriBig Data made easy in the era of the Cloud - Demi Ben-Ari
Big Data made easy in the era of the Cloud - Demi Ben-Ari
 
Scott Edmunds slides from #IDCC13 Data Science session
Scott Edmunds slides from #IDCC13 Data Science sessionScott Edmunds slides from #IDCC13 Data Science session
Scott Edmunds slides from #IDCC13 Data Science session
 
Big Data - A Real Life Revolution
Big Data - A Real Life RevolutionBig Data - A Real Life Revolution
Big Data - A Real Life Revolution
 
Unit 1
Unit 1Unit 1
Unit 1
 
An Overview of BigData
An Overview of BigDataAn Overview of BigData
An Overview of BigData
 
Hadoop for beginners free course ppt
Hadoop for beginners   free course pptHadoop for beginners   free course ppt
Hadoop for beginners free course ppt
 
Data mining with big data
Data mining with big dataData mining with big data
Data mining with big data
 

Último

All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...Daniel Zivkovic
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfinfogdgmi
 
The Kubernetes Gateway API and its role in Cloud Native API Management
The Kubernetes Gateway API and its role in Cloud Native API ManagementThe Kubernetes Gateway API and its role in Cloud Native API Management
The Kubernetes Gateway API and its role in Cloud Native API ManagementNuwan Dias
 
Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )Brian Pichman
 
Valere | Digital Solutions & AI Transformation Portfolio | 2024
Valere | Digital Solutions & AI Transformation Portfolio | 2024Valere | Digital Solutions & AI Transformation Portfolio | 2024
Valere | Digital Solutions & AI Transformation Portfolio | 2024Alexander Turgeon
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesMd Hossain Ali
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UbiTrack UK
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024SkyPlanner
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintMahmoud Rabie
 
9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding TeamAdam Moalla
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPathCommunity
 
How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?IES VE
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioChristian Posta
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024D Cloud Solutions
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Websitedgelyza
 
99.99% of Your Traces Are (Probably) Trash (SRECon NA 2024).pdf
99.99% of Your Traces  Are (Probably) Trash (SRECon NA 2024).pdf99.99% of Your Traces  Are (Probably) Trash (SRECon NA 2024).pdf
99.99% of Your Traces Are (Probably) Trash (SRECon NA 2024).pdfPaige Cruz
 
Governance in SharePoint Premium:What's in the box?
Governance in SharePoint Premium:What's in the box?Governance in SharePoint Premium:What's in the box?
Governance in SharePoint Premium:What's in the box?Juan Carlos Gonzalez
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfAijun Zhang
 
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDEADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDELiveplex
 

Último (20)

All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdf
 
The Kubernetes Gateway API and its role in Cloud Native API Management
The Kubernetes Gateway API and its role in Cloud Native API ManagementThe Kubernetes Gateway API and its role in Cloud Native API Management
The Kubernetes Gateway API and its role in Cloud Native API Management
 
Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )
 
Valere | Digital Solutions & AI Transformation Portfolio | 2024
Valere | Digital Solutions & AI Transformation Portfolio | 2024Valere | Digital Solutions & AI Transformation Portfolio | 2024
Valere | Digital Solutions & AI Transformation Portfolio | 2024
 
201610817 - edge part1
201610817 - edge part1201610817 - edge part1
201610817 - edge part1
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership Blueprint
 
9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation Developers
 
How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and Istio
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Website
 
99.99% of Your Traces Are (Probably) Trash (SRECon NA 2024).pdf
99.99% of Your Traces  Are (Probably) Trash (SRECon NA 2024).pdf99.99% of Your Traces  Are (Probably) Trash (SRECon NA 2024).pdf
99.99% of Your Traces Are (Probably) Trash (SRECon NA 2024).pdf
 
Governance in SharePoint Premium:What's in the box?
Governance in SharePoint Premium:What's in the box?Governance in SharePoint Premium:What's in the box?
Governance in SharePoint Premium:What's in the box?
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdf
 
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDEADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
 

Big Data Tools : PAST, NOW and FUTURE

  • 1. 巨量資料處理工具的過去、現在與未來 三種處理工具與未來的挑戰 Big Data Tools : PAST, NOW and FUTURE - Three Types of Processing Tools and the Next Big Thing National Center for High-performance Computing Jazz Yao-Tsung Wang <0303109@narlabs.org.tw>
  • 2. WHO AM I ? JAZZ ? • About Speaker : – Jazz Yao-Tsung Wang / Associate Researcher – Co-Founder and Evangelist of Hadoop.TW – 0303109@narlabs.org.tw • All slides and training materials could be found at – http://trac.nchc.org.tw/cloud – Be Green ! Try not to print the slides. It changes frequently. FOSS User Debian/Ubutnu Access Grid Motion/VLC Red5 Debian Router DRBL/Clonezilla Hadoop FOSS Developer TRTC WSU/ Haduzilla / Hadop4Win / Ezilla FOSS Evangelist DRBL/Clonezilla Partclone/Tuxboot Hadoop Ecosystem 2
  • 3. 演講大綱 Agenda Linux is everywhere 開放無所不在 What is Big Data ? 何謂巨量資料 Big Data in Motion ! 即時巨資應用 The Next Big Thing ? 下半場的重點 Conclusion 三大結論回顧 3
  • 4. 3 Buzzwords in 2013 三大年度熱門關鍵字 物聯網 Internet of Things 雲端運算 Cloud Computing 巨量資料 Big Data 4
  • 6. Linux Adoption in Enterprise increase in last 3 years Source: "2013 Enterprise End User Report", Linux Foundation, March 2013 http://www.linuxfoundation.org/publications/linux-foundation/linux-adoption-trends-end-user-report-2013 6
  • 7. Linux in the Cloud ← Amazon Yahoo ↓ If you have 4000+ server, which OS will you choose? http://www.datacenterknowledge.com/archives/2010/09/20/inside-the-yahoo-computing-coop/ http://bits.blogs.nytimes.com/2013/01/08/amazons-unknown-unknowns/?_r=0 7
  • 8. Linux in the Devices !! Linux have dominated Embedded, Mobile ..... maybe Internet of Things in near future …. http://crackberry.com/sites/crackberry.com/files/styles/large/public/topic_images/2013/ANDROID.png http://boxysystems.com/myblog/wp-content/uploads/2011/01/google-chrome-OS-logo.jpg Source: The most popular end-user Linux distributions are ... http://www.zdnet.com/the-most-popular-end-user-linux-distributions-are-7000017223/ 8
  • 9. Why Linux ? Total Cost of Ownership ! Source: How Red Hat Enterprise Linux trims total cost of ownership (TCO) in comparison to Windows Server, October 7, 2013 http://www.redhat.com/about/news/archive/2013/10/how-red-hat-enterprise-linux-trims-total-cost-of-ownership-in-comparison-to-windows-server 9
  • 10. 演講大綱 Agenda Linux is everywhere 開放無所不在 What is Big Data ? 何謂巨量資料 Big Data in Motion ! 即時巨資應用 The Next Big Thing ? 下半場的重點 Conclusion 三大結論回顧 10
  • 11. 巨量資料的三大挑戰 Challenges - 3 Vs of Big Data Volume 資料數量 (amount of data) EB 參考來源: [1] Laney, Douglas. "3D Data Management: Controlling Data Volume, Velocity and Variety" (6 February 2001) [2] Gartner Says Solving 'Big Data' Challenge Involves More Than Just Managing Volumes of Data, June 2011 Structured 結構化資料 Batch ( 批次作業 ) Semi-structured 半結構化資料 PB Unstructured 非結構化資料 Variety 資料多樣性 (data types, sources) Realtime ( 即時資料 ) TB Velocity 資料增加率 (speed of data in/out) 巨量資料的挑戰在於如何管理「數量」、「增加率」與「多樣性」 11 11
  • 12. 知識源自彙整過去,智慧在能預測未來 Knowledge is from the PAST, Wisdom is for the FUTURE. 資料多寡不是重點,重點是 我們想要產生什麼價值呢? 時效合理嘛?成本合理嘛? It does not matter how big is your data. The goal is to create VALUE within reasonable time period and total cost of ownership. http://www.pursuantgroup.com/blog/tag/dikw-model/ 12 12
  • 13. PAST: Big Data at Rest http://nextbigsound.com/ http://www.openpdc.com/ Source: 10 ways big data changes everything, http://gigaom.com/2012/03/11/10-ways-big-data-is-changing-everything 13
  • 14. 處理巨量資料的三類技術 (1) Data at Rest – MapReduce Framework e Volume e uc ed R ap M Batch Hadoop HPCC Unstructured Variety EB PB TB Structured Petabyte File System m ra F k or w Realtime Velocity 14 14
  • 15. 巨量資料處理的資訊架構 The SMAQ stack for big data LAMP is for Web Services. 未來處理海量資料的人必需知道 You will need a big data stack called SMAQ ( Storage, MapReduce and Query ) 參考來源: The SMAQ stack for big data , Edd Dumbill , 22 September 2010 ,          http://radar.oreilly.com/2010/09/the-smaq-stack-for-big-data.html 圖片來源: http://smashingweb.ge6.org/wp-content/uploads/2011/10/apache-php-mysql-ubuntu.png 15
  • 16. 高資料通量處理平台 Hadoop Key Concept : Data Locality Hadoop is a framework for developer to wrote and execute massive data processing applications easily. Hadoop includes two parts: HDFS and MapReduce. Warehouse for data source and output results. HDFS stores unstructure data and structure data Processing Grouping Map One in One out Reduce Multiple in, One out 16
  • 18. 演講大綱 Agenda Linux is everywhere 開放無所不在 What is Big Data ? 何謂巨量資料 Big Data in Motion ! 即時巨資應用 The Next Big Thing ? 下半場的重點 Conclusion 三大結論回顧 18
  • 19. NOW: Big Data in Motion [ 金融 ] Trading Robot [ 災防 ] 海嘯、土石流 Disaster Prevention Tsunami Forecast [ 資訊 ] 機房即時用電資訊監控、警訊 Realtime Data Center Power Usage and related notifications http://www.newmobilelife.com/2013/04/21/facebook-pue-real-time-charts/ 19
  • 20. 處理巨量資料的三類技術 (2) Data in Motion – In-Memory Processing Volume EB Batch HBase / Drill / Impala PB Unstructured Variety Structured Realtime TB Velocity 20 20
  • 21. Google 的技術演進 vs Apache 專案 Big Query (JSON, SQL-like) Incremental Index Update (Caffeine) Graph Database Dremel (2010) Apache Drill (2012) Percolator (2010) Pregel (2009) Apache Giraph (2011) BigTable (2006) Apache HBase (2007) MapReduce (2004) Hadoop MapReduce (2006) Google File System (2003) HDFS (2006) 21
  • 24. 處理巨量資料的三類技術 (3) Streaming Data Collection Volume EB Structured Batch PB Unstructured Variety Realtime TB Message Queue Storm / Kafka Velocity 24 24
  • 25. Twitter Storm + Apache Kafka http://blog.infochimps.com/2012/10/30/next-gen-real-time-streaming-storm-kafka-integration/ 25
  • 26. 混合模式的巨量資料處理架構 Lambda Architecture for Big Data after 2013 HBase Storm ElephantDB Or Voldemort Source: Lambda Architecture, 8. March 2013 http://www.ymc.ch/en/lambda-architecture-part-1 26
  • 27. 結論二: Golden Ratio for Big Data 1 core : 2+ GB RAM : 1 SSD Disk FLOPS=~IOPS 27
  • 28. 演講大綱 Agenda Linux is everywhere 開放無所不在 What is Big Data ? 何謂巨量資料 Big Data in Motion ! 即時巨資應用 The Next Big Thing ? 下半場的重點 Conclusion 三大結論回顧 28
  • 29. 市場現況: Gartner Hype Cycle 2013 萌芽期 夢幻期 幻滅期 平原期 高原期 29
  • 30. NEXT: Big Data Security 品質管控 當我們緊密相連 ..... 世界政經:歐盟想分 Tweeter 找出經濟、政治的脈動 國家安全:美國 PRISM 計劃 ( 網軍 ! 終極警探 4.0 ) 權限管控 組織如何因應 APT ? Big Data 平台本身的安全性 ? 有太多安全的問題等待解決! 數量管控 Source: Gartner (March 2011), 'Big Data' Is Only the Beginning of Extreme Information Management, April 2011, http://www.gartner.com/id=1622715 30
  • 31. 結論三: For Big Data Security, buy hardware with encryption support Power Hardware PowerVM PowerSC Systems Virtualization Security & Compliance Power Hardware with security built in & hardware assists for encryption Secure Virtualization Platform ensuring isolation integrity Virtualization Centric Security Extensions to protect the Cloud and Virtual Data Center AIX Operating System Systems Software Secure Operating Systems– defense in depth: role based access, trusted execution, encrypted file system 31
  • 32. 演講大綱 Agenda Linux is everywhere 開放無所不在 What is Big Data ? 何謂巨量資料 Big Data in Motion ! 即時巨資應用 The Next Big Thing ? 下半場的重點 Conclusion 三大結論回顧 32
  • 33. Conclusion: 1.Switch to Linux, it helps to reduce TCO (total cost of ownership) about 34% ! 2.Hadoop only solve the problem for Big Data at Rest. You will need In-Memory Processing tools such as HBase, Spark, Impala for Big Data in Motion. To cleaning Big Data in realtime, you could try Message Queue, Storm and Kafka. 3.Consider to buy servers with SSD disk, it helps to improve processing time for Big Data at Rest. For Big Data in Motion, you will need tools like In-Memory Processing. Remember to buy more RAM for your cluster. 4.To protect your big data, please survey hardwares and operation system (OS) which have hardware encryption support ! 33