Advanced Hive
Contents
      HQL functions


      HQL statements


      HQL optimization


      Hive parameter tuning
HQL functions

    Type conversion:       cast('1' as bigint)

    Conditional:           case when a.uid is null then 0 else a.uid end = b.uid

    Text:                  regexp_extract, regexp_replace

    Complex-type access:   A[n] (array element), M[key] (map value), S.x (struct field)
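HQL statements

    Sort By:         Hive uses the columns in SORT BY to sort the rows before
                     feeding them to a reducer, so each reducer's output is
                     sorted but the overall result is not.

    Order By:        Sorts the entire result, which sends all rows through a
                     single reducer.

    Distribute By:   Hive uses the columns in DISTRIBUTE BY to distribute the
                     rows among reducers.

    Cluster By:      A shortcut for DISTRIBUTE BY and SORT BY on the same columns.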
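
The difference between these four clauses can be modelled in a few lines of Python. This is only a sketch of the semantics, not of Hive itself: the modulo in `distribute_by` is an assumed stand-in for Hive's hash partitioning, and the `uid`/`pv` rows are invented for illustration.

```python
def distribute_by(rows, key, num_reducers):
    """Send each row to a reducer chosen from its key.

    Assumption: integer keys and a plain modulo stand in for Hive's hash.
    """
    buckets = [[] for _ in range(num_reducers)]
    for row in rows:
        buckets[row[key] % num_reducers].append(row)
    return buckets


def sort_by(buckets, key):
    """SORT BY: sort the rows inside each reducer independently."""
    return [sorted(bucket, key=lambda r: r[key]) for bucket in buckets]


def cluster_by(rows, key, num_reducers):
    """CLUSTER BY k behaves like DISTRIBUTE BY k followed by SORT BY k."""
    return sort_by(distribute_by(rows, key, num_reducers), key)


def order_by(rows, key):
    """ORDER BY: a single reducer sees every row, so the order is global."""
    return sorted(rows, key=lambda r: r[key])


# Hypothetical sample rows.
rows = [{"uid": u, "pv": p} for u, p in [(3, 10), (1, 7), (4, 2), (2, 5), (1, 1)]]

per_reducer = cluster_by(rows, "uid", 2)
uids_per_reducer = [[r["uid"] for r in bucket] for bucket in per_reducer]
global_uids = [r["uid"] for r in order_by(rows, "uid")]

print(uids_per_reducer)  # sorted inside each reducer only
print(global_uids)       # sorted across the whole result
```

Each reducer's list is sorted and equal keys always share a reducer, but only `order_by` produces one globally ordered list.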
HQL statements

    Multi-table / file insert:   FROM *
                                 INSERT OVERWRITE TABLE
                                 INSERT OVERWRITE DIRECTORY
                                 (one FROM clause feeds several inserts)

    Adding external resources:   ADD { FILE[S] | JAR[S] | ARCHIVE[S] }

    Map Join:                    SELECT /*+ MAPJOIN(b) */ a.key, a.value
                                 FROM a JOIN b ON a.key = b.key
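
What the MAPJOIN hint asks for can be sketched as an in-memory hash join: the small table `b` is loaded into a lookup table, and the large table `a` is streamed past it, so no shuffle to reducers is needed. The Python below is a simplified model under that assumption; the sample rows are hypothetical.

```python
def map_join(big_rows, small_rows, key):
    """Join two row lists by building a hash table on the small side."""
    # Build phase: the small table is held entirely in memory.
    lookup = {}
    for row in small_rows:
        lookup.setdefault(row[key], []).append(row)

    # Probe phase: each row of the big table is matched as it is read.
    joined = []
    for big in big_rows:
        for small in lookup.get(big[key], []):
            joined.append((big, small))
    return joined


# Hypothetical tables: a is the large table, b the small dimension table.
a = [{"key": 1, "value": "x"}, {"key": 2, "value": "y"}, {"key": 3, "value": "z"}]
b = [{"key": 1}, {"key": 3}]

# Equivalent of selecting a.key, a.value from the joined rows.
result = [(big["key"], big["value"]) for big, _ in map_join(a, b, "key")]
print(result)
```

The sketch only works when the small side fits in memory, which is why the hint names the smaller table.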
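HQL optimization

    Goals:            Read and write in memory as much as possible.
                      Spread the map output more evenly across the reducers.
                      Check the query plan with EXPLAIN.

    Group By:         hive.groupby.skewindata = true

    Count Distinct:   Handle rows whose value is null separately.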
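
With hive.groupby.skewindata = true, a skewed GROUP BY is planned as two jobs: the first spreads rows randomly over the reducers and aggregates partially, the second groups the partial results by key. The sketch below models that idea for a simple count; the key names, row counts, and reducer count are made up for illustration.

```python
import random
from collections import Counter


def two_stage_count(keys, num_reducers, rng):
    """Count keys in two stages so that no reducer receives all rows of a hot key."""
    # Stage 1: rows go to a random reducer, which counts what it receives.
    partials = [Counter() for _ in range(num_reducers)]
    for k in keys:
        partials[rng.randrange(num_reducers)][k] += 1

    # Stage 2: partial counts are grouped by key and added together.
    final = Counter()
    for partial in partials:
        final.update(partial)
    return partials, final


# Hypothetical skewed input: one key accounts for 90% of the rows.
keys = ["hot"] * 9000 + ["a"] * 500 + ["b"] * 500
partials, final = two_stage_count(keys, 4, random.Random(0))
stage1_load = [sum(p.values()) for p in partials]

print(stage1_load)   # rows handled by each stage-1 reducer
print(final["hot"])  # the final count is unchanged
```

In the second stage the hot key still lands on one reducer, but that reducer receives at most one partial count per stage-1 reducer instead of every raw row.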
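HQL optimization

    Join

    Use the table whose join key is most evenly distributed as the driving table.

    Small table joined to large table: use a map join so that the small
    dimension table is loaded into memory first and the work of the reduce
    phase is completed on the map side.

    Large table joined to large table: replace null keys with a string plus a
    random number so that the skewed rows are spread across different reducers.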
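
The null-key trick for large joins can be illustrated outside Hive. In the sketch below, `reducer_for` uses CRC32 as an assumed, deterministic stand-in for the real partitioner, and the `null_` prefix and bucket count are arbitrary choices; none of these names come from Hive.

```python
import random
import zlib


def reducer_for(key, num_reducers):
    """Pick a reducer from the key (CRC32 is an assumption, not Hive's hash)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_reducers


def salt_null_key(key, rng, prefix="null_", buckets=1000):
    """Replace a null key with a string plus a random number; keep real keys."""
    if key is None:
        return f"{prefix}{rng.randrange(buckets)}"
    return key


rng = random.Random(0)
# Hypothetical join keys: 1000 nulls and a few real ids.
keys = [None] * 1000 + [101, 102, 103]

# Without salting, every null key is routed to the same reducer.
before = {reducer_for(k, 4) for k in keys if k is None}

# With salting, the former nulls are spread over several reducers.
salted = [salt_null_key(k, rng) for k in keys]
after = {reducer_for(k, 4) for k in salted if isinstance(k, str)}

print(len(before), len(after))
```

A null key never matches anything in an equi-join, so giving it a value that cannot collide with a real key leaves the matches unchanged while removing the hot spot.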
Hive parameters

    Only the needed columns:      hive.optimize.cp = true

    Only the needed partitions:   hive.optimize.pruner = true

    Map Join:                     hive.optimize.bucketmapjoin = true;
                                  hive.optimize.bucketmapjoin.sortedmerge = true;
Hive parameters

    Number of reducers:    set mapred.reduce.tasks
                           hive.exec.reducers.bytes.per.reducer (default 1000^3)
                           hive.exec.reducers.max (default 999)

    Merging small files:   hive.merge.mapfiles = true
                               whether to merge map output files
                           hive.merge.mapredfiles = false
                               whether to merge reduce output files
                           hive.merge.size.per.task = fileSize
                               size of the merged files
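
How the three reducer settings interact can be written as a small function. This is a sketch of the usual estimate (input size divided by bytes per reducer, capped by the maximum, overridden by an explicit task count), using the defaults quoted on the slide; the function and parameter names are illustrative, and the exact behaviour may differ between Hive versions.

```python
def estimate_reducers(input_bytes,
                      bytes_per_reducer=1000 ** 3,  # hive.exec.reducers.bytes.per.reducer
                      max_reducers=999,             # hive.exec.reducers.max
                      reduce_tasks=-1):             # mapred.reduce.tasks, -1 = not set
    """Estimate how many reducers a job gets under the settings above."""
    # An explicit mapred.reduce.tasks wins over the estimate.
    if reduce_tasks > 0:
        return reduce_tasks
    # Otherwise: ceil(input size / bytes per reducer), at least 1, at most the cap.
    estimated = -(-input_bytes // bytes_per_reducer)
    return max(1, min(max_reducers, estimated))


print(estimate_reducers(10 * 1000 ** 3))                   # 10 GB of input
print(estimate_reducers(5000 * 1000 ** 3))                 # capped by the maximum
print(estimate_reducers(10 * 1000 ** 3, reduce_tasks=32))  # explicit setting
```

Lowering hive.exec.reducers.bytes.per.reducer therefore raises the reducer count for the same input, up to the configured maximum.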
