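Basic flow for reading one block of data

readBlock(offset of the compressed block in the file,
          size of the compressed block on disk,
          size of the data after decompression [block-compressed files usually record this size])

1. compressAlgo.getDecompressor()
   Obtain a Decompressor for the compression algorithm the user selected. It may come from the CodecPool or be newly created.
2. new BoundedRangeFileInputStream
   Create a BoundedRangeFileInputStream for the block described by the arguments above. It reads that one block of compressed data from the file.
3. compressAlgo.createDecompressionStream()
   Obtain the decompression stream.
4. Read from this stream. The data read is already decompressed.
5. Close the decompression stream.

Object relationships of a decompression stream (LZO as the example)

Each stream wraps the underlying stream listed below it. The superclasses and interfaces of each class are given after "extends" and "implements".

BufferedInputStream (1 KB buffer)
    extends FilterInputStream
BlockDecompressorStream (64 KB decompression buffer, bound to one Decompressor)
    extends DecompressorStream, which extends CompressionInputStream
BoundedRangeFileInputStream (covers one start-end range of the underlying stream; several can exist on the same underlying stream, and closing one does not close the underlying stream)
FSDataInputStream (the file on HDFS)
    extends DataInputStream; implements Seekable, PositionedReadable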
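
The read flow above can be sketched with the JDK alone. This is a minimal illustration, not Hadoop code: java.util.zip's Inflater stands in for the LZO Decompressor, a ByteArrayInputStream stands in for the FSDataInputStream, and the names ReadBlockSketch, BoundedStream, compress and demo are hypothetical. BoundedStream only imitates the two properties the slide attributes to BoundedRangeFileInputStream: it exposes a fixed range of the underlying stream, and its close does not close that stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;
import java.util.zip.Inflater;
import java.util.zip.InflaterInputStream;

public class ReadBlockSketch {

    /** Hypothetical stand-in for BoundedRangeFileInputStream. */
    static final class BoundedStream extends FilterInputStream {
        private long remaining; // bytes of the range not yet consumed

        BoundedStream(InputStream in, long length) {
            super(in);
            this.remaining = length;
        }

        @Override public int read() throws IOException {
            if (remaining <= 0) return -1;
            int b = in.read();
            if (b >= 0) remaining--;
            return b;
        }

        @Override public int read(byte[] buf, int off, int len) throws IOException {
            if (len == 0) return 0;
            if (remaining <= 0) return -1;
            int n = in.read(buf, off, (int) Math.min(len, remaining));
            if (n > 0) remaining -= n;
            return n;
        }

        @Override public boolean markSupported() { return false; }

        @Override public void close() {
            // Intentionally empty: the underlying file stream stays open.
        }
    }

    /** Reads one compressed block, given its offset, on-disk size and uncompressed size. */
    static byte[] readBlock(byte[] file, int offset, int onDiskSize, int uncompressedSize)
            throws IOException {
        InputStream fileStream = new ByteArrayInputStream(file); // stand-in for the HDFS stream
        if (fileStream.skip(offset) != offset) {                 // stand-in for a seek
            throw new IOException("offset beyond end of file");
        }
        Inflater inflater = new Inflater();                      // a pool could supply this instead
        try (DataInputStream in = new DataInputStream(new InflaterInputStream(
                new BoundedStream(fileStream, onDiskSize), inflater, 64 * 1024))) {
            byte[] out = new byte[uncompressedSize];
            in.readFully(out);                                   // data arrives already decompressed
            return out;
        } finally {
            inflater.end();
        }
    }

    /** Test helper: compresses one block in zlib format. */
    static byte[] compress(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Builds a two-block "file" in memory and reads back only the second block. */
    static String demo() {
        byte[] plain = "second block".getBytes(StandardCharsets.UTF_8);
        byte[] first = compress("first block".getBytes(StandardCharsets.UTF_8));
        byte[] second = compress(plain);
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        file.write(first, 0, first.length);
        file.write(second, 0, second.length);
        try {
            byte[] block = readBlock(file.toByteArray(), first.length, second.length, plain.length);
            return new String(block, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Passing the recorded uncompressed size to readBlock lets the caller allocate the output buffer once and use readFully, which is presumably why block-compressed files store that size.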
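Basic flow for writing one block of data

Start writeBlock

1. compressAlgo.getCompressor()
   Obtain a Compressor for the compression algorithm the user selected. It may come from the CodecPool or be newly created.
2. compressAlgo.createCompressionStream()
   Obtain the compression stream.
3. new DataOutputStream
   This is the interface used directly for writing.
4. Write data of any type to this stream.
5. When a block is complete, flush the stream, but do not close it. Calling close would close every underlying stream and therefore the underlying file. The underlying file stream must be closed separately, after all blocks have been written.

Object relationships of a compression stream (LZO as the example)

Each stream wraps the underlying stream listed below it. The superclasses and interfaces of each class are given after "extends" and "implements".

DataOutputStream (top layer, so that data of any type can be written)
    extends FilterOutputStream; implements DataOutput
BufferedOutputStream (4 KB write buffer)
    extends FilterOutputStream
FinishOnFlushCompressionStream (on flush, it first calls finish on the underlying compression stream, then flush, then resetState on the underlying stream)
    extends FilterOutputStream
BlockCompressorStream (64 KB compression buffer)
    extends CompressorStream, which extends CompressionOutputStream
FSDataOutputStream (the underlying file stream)
    extends DataOutputStream; implements Syncable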
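
The finish-on-flush behaviour described above can be imitated with the JDK alone. This is a sketch under stated assumptions, not the Hadoop implementation: java.util.zip's Deflater stands in for the LZO Compressor, a ByteArrayOutputStream stands in for the FSDataOutputStream, and the names WriteBlockSketch, FinishOnFlushStream and demo are hypothetical. Each flush ends the current compressed block and resets the deflater, so the same stream chain can write the next block without the file ever being closed in between.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.Inflater;
import java.util.zip.InflaterInputStream;

public class WriteBlockSketch {

    /** Hypothetical analogue of FinishOnFlushCompressionStream, built on DeflaterOutputStream. */
    static final class FinishOnFlushStream extends DeflaterOutputStream {
        FinishOnFlushStream(OutputStream out, Deflater deflater) {
            super(out, deflater, 64 * 1024); // 64 KB compression buffer, as on the slide
        }

        @Override public void flush() throws IOException {
            finish();     // write the tail of the current compressed block
            out.flush();  // push it to the underlying file stream
            def.reset();  // make the compressor ready for the next block
        }
    }

    /** Writes two blocks through one stream chain, then decodes each block independently. */
    static List<String> demo() {
        try {
            ByteArrayOutputStream file = new ByteArrayOutputStream(); // stand-in for the file stream
            Deflater deflater = new Deflater();                       // a pool could supply this instead
            DataOutputStream data = new DataOutputStream(
                    new BufferedOutputStream(new FinishOnFlushStream(file, deflater), 4 * 1024));

            List<Integer> blockEnds = new ArrayList<>();
            for (String block : new String[] {"first block", "second block"}) {
                data.writeUTF(block);
                data.flush();                // ends the block; nothing is closed
                blockEnds.add(file.size());  // remember where this block ends in the file
            }
            deflater.end();
            file.close();                    // the file is closed once, after all blocks

            // Verification: every block must be decodable on its own.
            byte[] bytes = file.toByteArray();
            List<String> decoded = new ArrayList<>();
            int start = 0;
            for (int end : blockEnds) {
                Inflater inflater = new Inflater();
                try (DataInputStream in = new DataInputStream(new InflaterInputStream(
                        new ByteArrayInputStream(bytes, start, end - start), inflater))) {
                    decoded.add(in.readUTF());
                } finally {
                    inflater.end();
                }
                start = end;
            }
            return decoded;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Flushing the DataOutputStream propagates down through the BufferedOutputStream to the compression layer, which is why a single flush at the top of the chain is enough to terminate a block.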

Hadoop compress-stream
