19. S3 Filesystem
  Multipart upload
  Instance credentials
  Role support
  Reliability
Query Optimizer
  Single distinct => Group By
  Joins with similar subqueries
Parquet File Format
  Schema evolution
  Parquet 1.6
Complex Types
  Various new functions
  Comparability
21. Single Distinct => Group By
Original query:
  select count(distinct c)
  from t
Rewritten query:
  select count(*)
  from (select c
        from t
        group by c)
Original plan:
  Output
    Count Aggregation (masks = {column$distinct})
      Distinct (marker = column$distinct)
        Table Scan
Rewritten plan:
  Output
    Count Aggregation (masks = {})
      Group By Aggregation (count)
        Table Scan
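The equivalence behind this rewrite can be checked on a toy table. The sketch below is only an illustration, not the optimizer's implementation: it uses Python's built-in sqlite3 module as a stand-in engine, and the helper name distinct_counts and the sample values are assumptions made up for the example. The sample data deliberately contains no NULLs, because in SQLite count(distinct c) skips NULL while group by c keeps a NULL group, so the two forms can differ on nullable columns.

```python
import sqlite3


def distinct_counts(values):
    """Run both query forms from the slide on an in-memory table t(c).

    Hypothetical helper for illustration; returns (original, rewritten).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (c integer)")
    conn.executemany("insert into t (c) values (?)", [(v,) for v in values])

    # Original form: a single distinct aggregate.
    original = conn.execute(
        "select count(distinct c) from t"
    ).fetchone()[0]

    # Rewritten form: group by c first, then count the groups.
    rewritten = conn.execute(
        "select count(*) from (select c from t group by c)"
    ).fetchone()[0]

    conn.close()
    return original, rewritten


# Assumed sample data without NULLs: three distinct values.
print(distinct_counts([1, 2, 2, 3, 3, 3]))  # (3, 3)
```

Both forms return 3 here. With a NULL in the column, for example [1, None, 1], SQLite returns 1 for the original form and 2 for the rewritten one, so a rewrite of this kind has to account for NULL handling.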
22. Joins with Similar Subqueries
Original query:
  select *
  from (select k,
               agg1,
               agg2
        from t
        group by k) a
  join (select k,
               agg3,
               agg4
        from t
        group by k) b
  on ( a.k = b.k )
Original plan:
  Output
    Join (key = k)
      Group By Aggregation (key = k; agg1, agg2)
        Table Scan (table = t)
      Group By Aggregation (key = k; agg3, agg4)
        Table Scan (table = t)
23. Joins with Similar Subqueries
Rewritten query:
  select k, agg1,
         agg2, agg3,
         agg4
  from t
  group by k
Rewritten plan:
  Output
    Group By Aggregation (key = k; agg1, agg2, agg3, agg4)
      Table Scan (table = t)
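To see that the join of two similar subqueries and the single merged aggregation return the same rows, here is a small sketch using Python's built-in sqlite3 module as a stand-in engine. It is an illustration under assumptions, not the optimizer's code: the column v, the concrete aggregates chosen for agg1 through agg4 (sum, count, min, max), the helper run_both, and the sample rows are all made up for the example. The joined form selects named columns rather than *, so that k appears once and the outputs are directly comparable. The sample keys contain no NULLs, since a join on a.k = b.k would drop a NULL key that group by keeps.

```python
import sqlite3

# Joined form: two aggregations over the same table and key, joined on k.
JOINED = """
select a.k, a.agg1, a.agg2, b.agg3, b.agg4
from (select k, sum(v) as agg1, count(v) as agg2 from t group by k) a
join (select k, min(v) as agg3, max(v) as agg4 from t group by k) b
on (a.k = b.k)
order by a.k
"""

# Merged form: one scan and one aggregation computing all four aggregates.
MERGED = """
select k, sum(v) as agg1, count(v) as agg2, min(v) as agg3, max(v) as agg4
from t
group by k
order by k
"""


def run_both(rows):
    """Hypothetical helper: load rows into t(k, v) and run both forms."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (k text, v integer)")
    conn.executemany("insert into t (k, v) values (?, ?)", rows)
    joined = conn.execute(JOINED).fetchall()
    merged = conn.execute(MERGED).fetchall()
    conn.close()
    return joined, merged


# Assumed sample data with non-NULL keys.
joined, merged = run_both([("a", 1), ("a", 5), ("b", 2), ("b", 2), ("b", 9)])
print(joined == merged)  # True
print(merged)            # [('a', 6, 2, 1, 5), ('b', 13, 3, 2, 9)]
```

The merged form reads t once instead of twice and needs no join, which is the saving the plan on this slide shows.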