3. HQL functions
Type conversion functions: cast("1" as bigint)
Conditional functions: case when a.uid is null then 0 else a.uid end = b.uid
Text functions: regexp_extract, regexp_replace
Complex data type access: A[n], M[key], S.x
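The functions above can be combined in a single query. The following is a minimal sketch, assuming a hypothetical table `events` with columns `uid BIGINT`, `url STRING`, `tags ARRAY<STRING>`, `props MAP<STRING,STRING>` and `loc STRUCT<city:STRING>`; none of these names come from the original slides.

```sql
-- Hypothetical table (assumption, not from the slides):
--   events(uid BIGINT, url STRING, tags ARRAY<STRING>,
--          props MAP<STRING,STRING>, loc STRUCT<city:STRING>)
SELECT
  CAST('1' AS BIGINT)                          AS one,           -- type conversion
  CASE WHEN uid IS NULL THEN 0 ELSE uid END    AS uid_or_zero,   -- conditional
  regexp_extract(url, '^https?://([^/]+)', 1)  AS host,          -- capture group 1
  regexp_replace(url, '[?].*$', '')            AS url_no_query,  -- drop the query string
  tags[0]                                      AS first_tag,     -- A[n]: array element
  props['source']                              AS source,        -- M[key]: map lookup
  loc.city                                     AS city           -- S.x: struct field
FROM events;
```

The `CASE WHEN ... END = b.uid` form on the slide is the same conditional expression used as a join or filter condition rather than as a selected column.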
4. HQL statements
Sort By: Hive uses the columns in SORT BY to sort the rows before feeding the rows to a reducer.
Order By
Distribute By: Hive uses the columns in Distribute By to distribute the rows among reducers.
Cluster By: Cluster By is a short-cut for both Distribute By and Sort By.
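To make the difference between the four clauses concrete, here is a short sketch against a hypothetical table `events(uid BIGINT, ts BIGINT)`; the table and column names are assumptions for illustration only.

```sql
-- Hypothetical table (assumption): events(uid BIGINT, ts BIGINT)

-- ORDER BY: one global ordering of the whole result set.
SELECT uid, ts FROM events ORDER BY ts;

-- SORT BY: rows are sorted within each reducer only, so the
-- overall output is not globally ordered when there are several reducers.
SELECT uid, ts FROM events SORT BY ts;

-- DISTRIBUTE BY + SORT BY: all rows with the same uid go to the same
-- reducer, and each reducer sorts its rows by uid, then ts.
SELECT uid, ts FROM events DISTRIBUTE BY uid SORT BY uid, ts;

-- CLUSTER BY uid is shorthand for DISTRIBUTE BY uid SORT BY uid.
SELECT uid, ts FROM events CLUSTER BY uid;
```

As a rule of thumb, ORDER BY tends to be the most expensive of the four on large inputs because the final ordering has to pass through a single reducer.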
5. HQL statements
Multi-table / file insert:
    From *
    insert overwrite table
    insert overwrite directory
Adding external resources:
    ADD { FILE[S] | JAR[S] | ARCHIVE[S] }
Map Join:
    SELECT /*+ MAPJOIN(b) */ a.key, a.Value FROM a join b on a.key = b.key
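A multi-insert statement scans the source once and writes several outputs, and ADD registers a resource for the current session. The sketch below assumes a hypothetical source table `events(uid BIGINT, url STRING)`, an existing target table `events_with_uid(uid BIGINT, url STRING)`, and placeholder paths; all of these are illustrative names, not taken from the slides.

```sql
-- Hypothetical tables and paths (assumptions for illustration).

-- One scan of events, two outputs: a table and a directory.
FROM events e
INSERT OVERWRITE TABLE events_with_uid
  SELECT e.uid, e.url WHERE e.uid IS NOT NULL
INSERT OVERWRITE DIRECTORY '/tmp/events_without_uid'
  SELECT e.url WHERE e.uid IS NULL;

-- Register external resources for the session.
ADD JAR /tmp/my_udfs.jar;     -- placeholder path
ADD FILE /tmp/lookup.txt;     -- placeholder path
LIST JARS;                    -- show the jars added so far
```

In the Map Join statement, the hinted table `b` is the one loaded into memory on each mapper, so it should be the small side of the join.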
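6. HQL optimization
Goals:
    Read and write in memory as much as possible
    Distribute the map output more evenly across the reducers
explain
Group by: hive.groupby.skewindata=true
Count Distinct: handle NULL (empty) values separately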
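The three techniques above might look as follows in practice. This is a sketch under assumptions: the table `events(uid BIGINT, url STRING)` is hypothetical, and filtering NULLs before the distinct count is one common reading of "handle NULL values separately", not necessarily the exact approach the author had in mind.

```sql
-- Hypothetical table (assumption): events(uid BIGINT, url STRING)

-- Inspect the execution plan before running an expensive query.
EXPLAIN
SELECT uid, COUNT(*) FROM events GROUP BY uid;

-- Ask Hive to spread skewed GROUP BY keys over an extra
-- aggregation stage instead of sending each hot key to one reducer.
SET hive.groupby.skewindata=true;
SELECT uid, COUNT(*) AS cnt FROM events GROUP BY uid;

-- COUNT(DISTINCT ...) ignores NULLs in any case, so dropping NULL rows
-- early leaves the result unchanged while keeping a large block of
-- NULL rows out of the shuffle.
SELECT COUNT(DISTINCT uid) AS distinct_uids
FROM events
WHERE uid IS NOT NULL;
```

If the number of NULL rows is also needed, it can be computed in a separate, cheap `COUNT(*) ... WHERE uid IS NULL` query.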