Hadoop introduction
- 2. Outline
• Background
• Hello world
• Installation
• Related
12/20/12 2
- 3. Background
• Why Hadoop?
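• Accessible: runs on commodity clusters or cloud services such as AWS
• Robust: gracefully handles most hardware failures
• Scalable: scales linearly as nodes are added
• Simple: 1 == 1w (1w = 10,000)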
• Key Points:
• Scale-out
• Moving code to data
- 4. Background: History
• Apache top-level project, created by Doug Cutting
• Lucene -> Nutch -> Hadoop (2004)
• Yahoo (1w, i.e. 10,000)
• Facebook (Hive, HBase, …)
• Hulu (HBase)
• Baidu (3,000 TB, one week)
• Twitter (tweet data)
- 5. Background
• Comparing SQL database and Hadoop
• Structure:
• SQL (structured data, fixed schema)
• Hadoop (key-value pairs; unstructured data such as text and pictures)
• Scale-out <- scale-up
• Key-value pairs <- relational tables
• Functional programming <- declarative queries
• Offline batch processing (write once, read many times) <- online transactions
- 6. Background – Understanding
• Word Count
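• As the file size grows, a single in-memory hash table runs out of memory
• Disk-based hash table (more complex)
• Distributed:
• Phase 1: each machine processes its part of the input
• Phase 2: merge the results
• Shuffle the partitions to the appropriate machines (e.g., by first letter of the alphabet)
• With that, we have already built a minimal Hadoop.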
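The two phases and the shuffle step above can be sketched in a few lines of Python. This is only an illustration that simulates several machines inside one process: the function names (`count_part`, `partition_key`, `shuffle`, `merge`, `word_count`) and the first-letter partitioning rule are assumptions made for the example, not part of Hadoop.

```python
from collections import Counter, defaultdict


def count_part(lines):
    """Phase 1: count the words in one machine's share of the input."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def partition_key(word):
    """Pick the partition that owns a word, based on its first character."""
    first = word[0]
    return first if "a" <= first <= "z" else "other"


def shuffle(partial_counts):
    """Route every (word, count) pair to the partition that owns the word."""
    partitions = defaultdict(list)
    for counts in partial_counts:
        for word, n in counts.items():
            partitions[partition_key(word)].append((word, n))
    return partitions


def merge(pairs):
    """Phase 2: sum the partial counts that arrived at one partition."""
    total = Counter()
    for word, n in pairs:
        total[word] += n
    return total


def word_count(parts):
    """Run phase 1 on every part, shuffle, then merge each partition."""
    partial = [count_part(part) for part in parts]
    result = {}
    for pairs in shuffle(partial).values():
        result.update(merge(pairs))
    return result
```

Because every occurrence of a given word is routed to the same partition, each partition can be merged independently of the others, which is what lets phase 2 run in parallel as well.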
- 7. Hello World: Word Count
• Two phases:
• Mapping: takes the input data and loads it into the mapper
• Reducing: processes all the output from the mappers and produces the final result
• 1.1 list(filename, file content)
• 1.2 list(word, 1)
• 2.1 list(word, list(1))
• 2.2 list(word, count)
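The four data shapes listed above can be traced with a small in-memory sketch. It is not the Hadoop API: `map_fn`, `reduce_fn`, and `run_job` are hypothetical names, and sorting followed by `itertools.groupby` stands in for the framework's grouping of values by key.

```python
from itertools import groupby
from operator import itemgetter


def map_fn(filename, content):
    """Step 1.1 -> 1.2: (filename, file content) becomes a list of (word, 1)."""
    return [(word, 1) for word in content.split()]


def reduce_fn(word, ones):
    """Step 2.1 -> 2.2: (word, list of 1s) becomes (word, count)."""
    return (word, sum(ones))


def run_job(files):
    """Apply map_fn to every file, group the pairs by word, then reduce."""
    mapped = [pair for name, content in files for pair in map_fn(name, content)]
    # Sort by word so that groupby sees all pairs for one word together.
    mapped.sort(key=itemgetter(0))
    grouped = [
        (word, [one for _, one in group])
        for word, group in groupby(mapped, key=itemgetter(0))
    ]
    return [reduce_fn(word, ones) for word, ones in grouped]
```

For example, `run_job([("a.txt", "hello world"), ("b.txt", "hello hadoop")])` produces one `(word, count)` pair per distinct word, ordered by word.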
- 9. Installation
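• Modes:
• Standalone mode (default)
• Pseudo-distributed mode (recommended for development and debugging)
• Fully distributed mode
• Configuration:
• Basic configuration
• SSH configuration
• Ubuntu configuration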
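As a rough illustration of the basic configuration for pseudo-distributed mode, the sketch below renders the three site files as XML strings. It assumes a Hadoop 1.x-era release (in line with the deck's 2012 date), where the relevant properties are `fs.default.name`, `dfs.replication`, and `mapred.job.tracker`. The `localhost` ports 9000 and 9001 are conventional example values, and `render_site_xml` is a hypothetical helper, not a Hadoop tool.

```python
from xml.sax.saxutils import escape

# Assumed Hadoop 1.x property names; host and ports are example values only.
PSEUDO_DISTRIBUTED = {
    "core-site.xml": {"fs.default.name": "hdfs://localhost:9000"},
    "hdfs-site.xml": {"dfs.replication": "1"},
    "mapred-site.xml": {"mapred.job.tracker": "localhost:9001"},
}


def render_site_xml(properties):
    """Render a dict of property names and values as a Hadoop site file."""
    lines = ['<?xml version="1.0"?>', "<configuration>"]
    for name, value in properties.items():
        lines += [
            "  <property>",
            f"    <name>{escape(name)}</name>",
            f"    <value>{escape(value)}</value>",
            "  </property>",
        ]
    lines.append("</configuration>")
    return "\n".join(lines) + "\n"
```

Each rendered string would be saved under the matching file name in the installation's configuration directory. A replication factor of 1 is used here because a pseudo-distributed setup has only one DataNode.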
- 10. Hadoop Framework
• HDFS:
• NameNode: tracks, directs, and records
• DataNode: low-level I/O operations
• Secondary NameNode
• MapReduce:
• JobTracker
• TaskTracker
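The division of labor on the HDFS side can be pictured with a toy in-memory model: the NameNode only keeps metadata (which blocks make up a file and where the replicas live), while the DataNodes hold the bytes. Everything here is a simplified assumption for illustration, including the class and function names, the round-robin placement, and the tiny block size; real HDFS uses far larger blocks and a more careful placement policy.

```python
class DataNode:
    """Stores block contents; performs the actual reads and writes."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block id -> bytes

    def write(self, block_id, data):
        self.blocks[block_id] = data

    def read(self, block_id):
        return self.blocks[block_id]


class NameNode:
    """Keeps metadata only: the blocks of each file and their locations."""

    def __init__(self, datanodes, replication=2):
        self.datanodes = datanodes
        self.replication = replication  # assumed <= number of DataNodes
        self.files = {}  # filename -> list of (block id, [DataNode, ...])
        self._cursor = 0  # round-robin starting position

    def allocate(self, filename, num_blocks):
        placement = []
        for index in range(num_blocks):
            replicas = [
                self.datanodes[(self._cursor + i) % len(self.datanodes)]
                for i in range(self.replication)
            ]
            self._cursor += 1
            placement.append((f"{filename}#{index}", replicas))
        self.files[filename] = placement
        return placement

    def locate(self, filename):
        return self.files[filename]


def put(namenode, filename, data, block_size=4):
    """Client write: ask the NameNode where to put blocks, then write them."""
    chunks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    placement = namenode.allocate(filename, len(chunks))
    for (block_id, replicas), chunk in zip(placement, chunks):
        for node in replicas:
            node.write(block_id, chunk)


def get(namenode, filename):
    """Client read: look up block locations, then read the first replica."""
    return b"".join(
        replicas[0].read(block_id)
        for block_id, replicas in namenode.locate(filename)
    )
```

Note that the file bytes never pass through `NameNode`; the client functions `put` and `get` talk to the `DataNode` objects directly once they know the block locations.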
- 11. Related
• Programming:
• Java
• Python
• Jython (translates Python to Java bytecode)
• Hadoop Streaming (stdin, stdout)
• Dumbo
• Happy
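Hadoop Streaming lets any program act as a mapper or reducer by reading lines from stdin and writing tab-separated key/value lines to stdout. The sketch below shows word count in that style. It assumes the reducer receives its input sorted by key, and the function names (`mapper`, `reducer`, `run`) are illustrative rather than anything Hadoop requires.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line for every word in the input lines."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_lines):
    """Sum the counts per word; input lines must already be sorted by word."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run(stage, stdin=sys.stdin, stdout=sys.stdout):
    """Connect a stage ('map' or 'reduce') to the given input and output streams."""
    step = mapper if stage == "map" else reducer
    for out in step(stdin):
        stdout.write(out + "\n")
```

A script that calls `run("map")` or `run("reduce")` depending on its command-line argument can be tried locally by piping a text file through the map stage, a sort, and the reduce stage before submitting it to a cluster.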
- 12. Related
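• Pig: high-level data-flow language
• Hive: SQL-like data warehouse
• HBase: column-oriented database modeled on Google Bigtable
• ZooKeeper: coordination system for shared state
• Chukwa: data collection system
• Mahout: data mining and machine learning
• Hama: matrix computation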
- 13. Resource
• Book:
• Hadoop in Action
• Hadoop 实战 (2nd edition)
• Videos and Google course
• URL:
• Collected resources