Scalable Data Warehousing on Hadoop
Alan F. Gates, Co-founder, Hortonworks
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
What Do You Expect in a Hadoop Data Warehouse?
Benchmarks focus on two questions:
– How much of the TPC-DS query set can it run?
– How fast can it run it?
What Do You Expect in a Data Warehouse?
• High Performance SQL:2011
• High Storage Capacity
• Security
• Support for BI, Cubes, Data Science
• Monitoring & Management
• Governance
• Data Lifecycle Management
• Replication & D/R
• Workload Management
• Data Ingestion
So, Back to TPC-DS...
High Performance SQL:2011
Apache Hive Overview
Apache Hive is a SQL data warehouse engine that delivers fast, scalable SQL processing on Hadoop and in the cloud.
Features:
• Extensive SQL:2011 Support
• ACID Transactions
• In-Memory Caching
• Cost-Based Optimizer
• User-Based Dynamic Security
• JDBC and ODBC Support
• Compatible with Every Major BI Tool
• Proven at 300+ PB Scale
Apache Hive: Fast Facts
• Most Queries per Hour: 100,000 queries per hour (Yahoo Japan)
• Analytics Performance: 100 million rows/s per node (with Hive LLAP)
• Largest Hive Warehouse: 300+ PB raw storage (Facebook)
• Largest Cluster: 4,500+ nodes (Yahoo)
Apache Hive: Journey to SQL:2011 Analytics
Data Types
• Numeric: FLOAT, DOUBLE; DECIMAL; INT, TINYINT, SMALLINT, BIGINT; BOOLEAN
• String: CHAR, VARCHAR; BLOB (BINARY), CLOB (STRING)
• Date, Time: DATE, TIMESTAMP, Interval Types
• Complex Types: ARRAY / MAP / STRUCT / UNION
• Nested Data Analytics: Nested Data Traversal; Lateral Views
SQL Features
• Core SQL Features: Date, Time and Arithmetical Functions; INNER, OUTER, CROSS and SEMI Joins; Derived Table Subqueries; Correlated + Uncorrelated Subqueries; UNION ALL; UDFs, UDAFs, UDTFs; Common Table Expressions; UNION DISTINCT
• Advanced Analytics: OLAP and Windowing Functions; OLAP: Partition, Order by UDAF; CUBE and Grouping Sets
• ACID Transactions: INSERT / UPDATE / DELETE
• Procedural Extensions: HPL/SQL
• Constraints: Primary / Foreign Key (Non-Validated)
File Formats
• Columnar: ORCFile, Parquet
• Text: CSV, Logfile
• Nested / Complex: Avro, JSON, XML
• Custom Formats
• Other Features: XPath Analytics
Hive 2
• ACID MERGE; Multi Subquery; Scalar Subqueries; Non-Equijoins; INTERSECT / EXCEPT; Recursive CTEs; NOT NULL Constraints; Default Values; Multi-statement Transactions
Legend: New / Future work / Hive 2
Track Hive SQL:2011 completion: HIVE-13554
Hive 2 with LLAP: Architecture Overview
[Diagram: SQL queries over ODBC / JDBC → HiveServer2 (query endpoint) → query coordinators → LLAP daemons running query executors on a YARN cluster, with an in-memory cache shared across all users → deep storage: HDFS and compatible stores (S3, WASB, Isilon).]
Hive 2 with LLAP: 25+x Performance Boost (Interactive / 1TB Scale)
[Chart: per-query time in seconds (lower is better) for Hive 1 / Tez and Hive 2 / LLAP, with the speedup factor (x) on a second axis.]
Hive 2 with LLAP is on average 26x faster than Hive 1.
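The slide does not say how the 26x average is computed, and the choice matters for ratios. A minimal Python sketch, using hypothetical per-query times (not the benchmark's data), compares three common summaries: the arithmetic mean of per-query speedups, their geometric mean, and the ratio of total run times.

```python
from math import prod

def speedups(baseline_s, new_s):
    """Per-query speedup factor: baseline time divided by new time."""
    return [b / n for b, n in zip(baseline_s, new_s)]

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def geometric_mean(xs):
    """n-th root of the product; the usual way to summarize ratios."""
    return prod(xs) ** (1 / len(xs))

# Hypothetical per-query times in seconds, NOT the benchmark's data.
hive1_tez = [40.0, 120.0, 200.0, 60.0]
hive2_llap = [4.0, 3.0, 10.0, 6.0]

factors = speedups(hive1_tez, hive2_llap)
print(factors)                            # [10.0, 40.0, 20.0, 10.0]
print(arithmetic_mean(factors))           # 20.0
print(round(geometric_mean(factors), 2))  # about 16.82
print(round(sum(hive1_tez) / sum(hive2_llap), 2))  # ratio of total times
```

The same four runs yield 20x, about 16.8x or about 18.3x depending on the summary, so an "average speedup" is only comparable with another figure computed the same way.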
Apache Hive vs. Apache Impala at 10TB
Highlights:
• 10TB scale on 10 identical AWS nodes.
• Hive and Impala showed similar times on most smaller queries.
• Hive scaled better, with many queries completing in under 2 minutes where Impala ran to the timeout (3000 s).
Apache Hive vs. Presto on a Partitioned 1TB Dataset
Highlights:
• Presto lacks basic performance optimizations like dynamic partition pruning.
• On a real dataset / workload, Presto performs poorly without full rewrites.
• Example: Query 55 without rewrites = 185.17 s, with rewrites = 16 s; LLAP = 1.37 s.
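Dynamic partition pruning evaluates the filter on a small dimension table at run time and uses the resulting join keys to skip fact-table partitions that cannot match. A toy Python sketch of the idea follows. The tables (`date_dim`, a `sales` table partitioned by a date key) and the data are hypothetical, and this is not Hive's or Presto's implementation.

```python
# Hypothetical fact table partitioned by a date key: partition key -> rows.
sales_partitions = {
    20160101: [{"item": 1, "amount": 10.0}],
    20160102: [{"item": 2, "amount": 5.0}],
    20161101: [{"item": 1, "amount": 7.5}],
    20161102: [{"item": 3, "amount": 2.5}],
}

# Hypothetical dimension table: date key -> month of year.
date_dim = [
    {"date_sk": 20160101, "moy": 1},
    {"date_sk": 20160102, "moy": 1},
    {"date_sk": 20161101, "moy": 11},
    {"date_sk": 20161102, "moy": 11},
]

def scan_with_dpp(month):
    """Return (partitions actually read, rows read) for sales joined to a month filter."""
    # Step 1: run the filter on the small dimension table first.
    keys = {d["date_sk"] for d in date_dim if d["moy"] == month}
    # Step 2: prune fact partitions whose key cannot join, before reading any rows.
    scanned = [k for k in sales_partitions if k in keys]
    rows = [r for k in scanned for r in sales_partitions[k]]
    return scanned, rows

scanned, rows = scan_with_dpp(11)
print(scanned)                          # [20161101, 20161102]
print(sum(r["amount"] for r in rows))   # 10.0
```

Without the pruning step, all four partitions would be read and the non-matching rows discarded only after the join.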
Hive LLAP: Stable Performance under High Concurrency
Concurrent Queries   Average Runtime
5                    7.76 s
25                   36.24 s
100                  102.89 s
• 5 → 25 concurrent queries (5x queries): 4.6x runtime difference
• 25 → 100 concurrent queries (4x queries): 2.8x runtime difference
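One way to read this table is through Little's law: throughput ≈ concurrency / average latency. This assumes a steady state in which that many queries are always in flight, which is an assumption because the slide does not describe the test harness. A short Python calculation on the slide's own numbers:

```python
# (concurrent queries, average runtime in seconds), from the slide
measurements = [(5, 7.76), (25, 36.24), (100, 102.89)]

def throughput_qps(concurrency, avg_runtime_s):
    """Little's law, L = lambda * W, rearranged to lambda = L / W (queries per second).

    Assumes a steady state with `concurrency` queries always in flight.
    """
    return concurrency / avg_runtime_s

# Growth in query count vs. growth in runtime between consecutive rows
for (c0, t0), (c1, t1) in zip(measurements, measurements[1:]):
    print(f"{c1 // c0}x queries -> {t1 / t0:.2f}x runtime")

for c, t in measurements:
    print(f"{c:>3} concurrent: {throughput_qps(c, t):.3f} queries/s")
```

The runtime ratios come out at 4.67x and 2.84x, which appear to be the slide's 4.6x and 2.8x truncated to one decimal. Estimated throughput rises from about 0.64 to about 0.97 queries per second, so runtime grows more slowly than concurrency.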
How Much Can It Hold, and Where?
High Storage Capacity
Storage
• HDFS, of course: the default in the Hadoop world
• More and more, the cloud
• In S3 a move is a copy, but the current implementation assumes a move is atomic and nearly free
  – Modifying Hadoop (HADOOP-11694) and Hive (HIVE-14535)
• ACID in the cloud
  – The compactor moves a lot of files around; this needs to be optimized
  – Need to figure out how streaming ingest works in the cloud
• LLAP: caching is much more valuable in the cloud
  – Looking at flushing the cache to SSD so that misses are less costly
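The S3 point arises because object stores have no real rename. A "move" of a directory is a copy of every object followed by deletes, so it is neither atomic nor cheap. The toy in-memory model below shows the failure mode. It uses a hypothetical `ObjectStore` class, not the S3 or Hadoop FileSystem API. If the process dies partway through, both the source and destination prefixes are left half-populated.

```python
class ObjectStore:
    """Toy key-value object store: keys are full paths; there are no real directories."""

    def __init__(self):
        self.objects = {}
        self.copy_ops = 0

    def put(self, key, data):
        self.objects[key] = data

    def rename_prefix(self, src, dst, fail_after=None):
        """Emulate a directory rename as copy-then-delete, one object at a time.

        fail_after simulates a crash after that many objects have been moved.
        """
        keys = sorted(k for k in self.objects if k.startswith(src))
        for i, key in enumerate(keys):
            if fail_after is not None and i == fail_after:
                raise RuntimeError("crashed mid-rename")
            new_key = dst + key[len(src):]
            self.objects[new_key] = self.objects[key]  # copy: cost grows with data size
            self.copy_ops += 1
            del self.objects[key]                       # then delete the original

store = ObjectStore()
for n in range(3):
    store.put(f"warehouse/t/_tmp/part-{n}", b"...")

try:
    store.rename_prefix("warehouse/t/_tmp/", "warehouse/t/final/", fail_after=2)
except RuntimeError:
    pass

# Two files moved, one left behind: readers can see a partial result.
print(sorted(store.objects))
```

On HDFS the same rename is a single metadata operation, which is the assumption the slide says the current implementation relies on.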
Is My Data Safe?
Security
Security Today in Hadoop
Centralized Security Administration with Ranger & Knox
• Authentication (Who am I / prove it?)
  – Kerberos
  – API security with Apache Knox
• Authorization (What can I do?)
  – Fine-grained access control with Apache Ranger
• Audit (What did I do?)
  – Centralized audit reporting with Apache Ranger
• Data Protection (Can data be encrypted at rest and over the wire?)
  – Wire encryption
  – HDFS encryption + Ranger KMS
Authentication: API Security with Knox
Apache Knox extends the reach of the Hadoop REST API without Kerberos complexities.
• Single, simple point of access for a cluster
  – Kerberos encapsulation
  – Single Hadoop access point
  – REST API hierarchy
  – Consolidated API calls
  – Multi-cluster support
• Centralized and consistent secure API across one or more clusters
  – Eliminates SSH "edge node"
  – Central API management
  – Central audit control
  – Service-level authorization
• Integrated with existing IdM systems
  – SSO: SAMLv2, SiteMinder and OAM
  – LDAP and AD integration
  – SSO for Hadoop UIs (Ranger, Ambari, ...)
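With a gateway, clients address every service through one URL prefix instead of individual cluster hosts. The minimal Python helper below builds such URLs. It assumes the commonly documented layout `https://{host}:{port}/gateway/{topology}/{service}/...`, with port 8443 and a topology named `default`. Both are deployment-specific assumptions, and the host name is hypothetical. The example turns a WebHDFS call that would otherwise go to a NameNode directly into a gateway URL.

```python
from urllib.parse import urlencode

def knox_url(host, service_path, params=None, topology="default", port=8443):
    """Build a URL that routes a REST call through a Knox gateway.

    The path layout and the default port/topology are assumptions for illustration.
    """
    url = f"https://{host}:{port}/gateway/{topology}/{service_path.lstrip('/')}"
    if params:
        url += "?" + urlencode(params)
    return url

# Hypothetical gateway host; LISTSTATUS is a standard WebHDFS operation.
print(knox_url("knox.example.com", "webhdfs/v1/tmp", {"op": "LISTSTATUS"}))
# https://knox.example.com:8443/gateway/default/webhdfs/v1/tmp?op=LISTSTATUS
```

Because clients only ever see the gateway host, the cluster's internal hosts and Kerberos setup stay hidden behind it. This is the "single access point" idea on the slide.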
Apache Ranger: Per-User Row Filtering by Region in Hive
Example CUSTOMERS table:
User ID   Region   Total Spend
1         East     5,131
2         East     27,828
3         West     55,493
4         West     7,193
5         East     18,193
Original query:
SELECT * FROM CUSTOMERS
WHERE total_spend > 10000
Query rewrites based on dynamic Ranger policies (data accessed through LLAP):
Dynamic rewrite for User 2 (East Region):
SELECT * FROM CUSTOMERS
WHERE total_spend > 10000
AND region = 'East'
Dynamic rewrite for User 1 (West Region):
SELECT * FROM CUSTOMERS
WHERE total_spend > 10000
AND region = 'West'
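The effect of the rewrite can be reproduced in miniature with Python's built-in sqlite3 module standing in for Hive. A hypothetical `USER_REGIONS` mapping plays the role of the Ranger policy, and the region filter is added to each user's query before it runs. This only shows the effect of the policy. How Ranger and Hive actually attach the filter during query planning is not modeled.

```python
import sqlite3

# Hypothetical policy: the region each querying user is allowed to see.
USER_REGIONS = {"user1": "West", "user2": "East"}

def run_filtered(conn, user, predicate):
    """Run SELECT * FROM customers WHERE <predicate>, plus the user's row filter.

    The predicate is trusted text in this sketch. It is parenthesized so that
    an OR inside it cannot bypass the region filter.
    """
    sql = f"SELECT * FROM customers WHERE ({predicate}) AND region = ? ORDER BY user_id"
    return conn.execute(sql, (USER_REGIONS[user],)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (user_id INTEGER, region TEXT, total_spend INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "East", 5131), (2, "East", 27828), (3, "West", 55493),
     (4, "West", 7193), (5, "East", 18193)],
)

print(run_filtered(conn, "user2", "total_spend > 10000"))  # [(2, 'East', 27828), (5, 'East', 18193)]
print(run_filtered(conn, "user1", "total_spend > 10000"))  # [(3, 'West', 55493)]
```

Both users send the same query text and get different rows. No per-region views or copies of the table are needed.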
Apache Ranger: Dynamic Data Masking of Hive Columns
Protect sensitive data in real time with dynamic data masking/obfuscation!
Goal: Mask or anonymize sensitive columns of data (e.g. PII, PCI, PHI) in Hive query output
• Benefits
  – Sensitive information never leaves the database
  – No changes are required at the application or Hive layer
  – No need to produce additional protected duplicate versions of datasets
  – Simple and easy-to-set-up masking policies
• Core Technologies: Ranger, Hive
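To make "masking" concrete, here is a small Python sketch of two typical column masks: a partial mask that keeps only the last four characters, and a one-way hash. The function names and the choice of `x` as the mask character are assumptions for illustration. They are not Ranger's or Hive's built-in functions. The point is that only the query output changes, while the stored value stays as it is.

```python
import hashlib

def partial_mask(value, keep_last=4, mask_char="x"):
    """Replace every character except the last `keep_last` with mask_char."""
    if value is None:
        return None
    kept = value[-keep_last:] if keep_last > 0 else ""
    return mask_char * (len(value) - len(kept)) + kept

def hash_mask(value):
    """One-way SHA-256 hash: equal inputs still compare equal, originals stay hidden."""
    if value is None:
        return None
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

# Hypothetical row as stored; only the query output is masked.
stored = {"name": "Jane Doe", "card": "4111111111111111"}
output = {"name": hash_mask(stored["name"]), "card": partial_mask(stored["card"])}
print(output["card"])  # xxxxxxxxxxxx1111
print(stored["card"])  # 4111111111111111 (unchanged)
```

Hashing keeps values joinable and countable without revealing them. However, an unsalted hash of a low-entropy value, such as a short ID, can be reversed by brute force.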
Dynamic Tag-based Access Policies with Apache Atlas
• Basic tag policy: PII example. Access and entitlements must be tag-based (ABAC) and scalable in implementation.
• Geo-based policy: policy based on IP address; proxy IP substitution may be required. Rule enforcement must be geo-aware.
• Time-based policy: timer for data access, decoupled from deletion of data.
• Prohibitions: prevent combining Hive tables that together may pose a risk.
Key Benefits:
• New scalable, metadata-based security paradigm
• Dynamic, real-time policy
• Active protection: fast updates to changes
• Centralized and simple-to-manage policy
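Tag-based (attribute-based) access control evaluates policies against classifications attached to the data, not against table or column names. In the minimal Python sketch below, the tags, group names and policy shape are all hypothetical. Columns tagged `PII` are denied unless the user is in an allowed group. A time-based policy ends access after a given date without deleting any data, and unknown tags are denied by default.

```python
from datetime import date

# Hypothetical classifications, as a metadata catalog such as Atlas might record them.
COLUMN_TAGS = {
    "customers.email": {"PII"},
    "customers.region": set(),
    "campaign.clicks": {"CAMPAIGN"},
}

# Hypothetical tag policies: tag -> (groups allowed to read, last day of access or None).
TAG_POLICIES = {
    "PII": ({"privacy_officers"}, None),
    "CAMPAIGN": ({"analysts"}, date(2017, 12, 31)),
}

def can_read(column, user_groups, today):
    """Allow a column only if the user satisfies the policy of every tag on it."""
    for tag in COLUMN_TAGS.get(column, set()):
        policy = TAG_POLICIES.get(tag)
        if policy is None:
            return False  # unknown tag: deny by default
        allowed_groups, last_day = policy
        if not user_groups & allowed_groups:
            return False
        if last_day is not None and today > last_day:
            return False  # time-based policy: access ends, the data is not deleted
    return True

analyst = {"analysts"}
print(can_read("customers.email", analyst, date(2017, 6, 1)))   # False
print(can_read("customers.region", analyst, date(2017, 6, 1)))  # True
print(can_read("campaign.clicks", analyst, date(2017, 6, 1)))   # True
print(can_read("campaign.clicks", analyst, date(2018, 1, 1)))   # False
```

Protecting a newly discovered sensitive column then only requires tagging it. No per-table policy has to be written, which is what makes the approach scale.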
What's There and Where Did It Come From?
Governance
Apache Atlas: Cross-Component Dataset Lineage
[Diagram: an RDBMS, Sqoop, the Teradata Connector, Apache Kafka and a custom activity reporter, with lineage collected in the Atlas metadata repository.]
• Any process using Sqoop is covered.
• No other tool tracks IoT out of the box.
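Lineage of this kind is a directed graph of datasets and the processes that connect them. In the Python sketch below, the dataset and process names are invented and the code is not the Atlas API. It shows how "where did this table come from?" becomes an upstream breadth-first traversal across components.

```python
from collections import deque

# Hypothetical lineage edges: output dataset -> list of (process, input dataset).
UPSTREAM = {
    "hive:sales_summary": [("hive_query", "hive:sales")],
    "hive:sales": [("sqoop_import", "rdbms:orders"), ("streaming_ingest", "kafka:clicks")],
    "kafka:clicks": [("custom_reporter", "iot:sensors")],
}

def upstream_lineage(dataset):
    """Breadth-first walk returning every (input, process, output) edge upstream."""
    edges, seen, queue = [], {dataset}, deque([dataset])
    while queue:
        current = queue.popleft()
        for process, source in UPSTREAM.get(current, []):
            edges.append((source, process, current))
            if source not in seen:  # guard against revisiting shared inputs
                seen.add(source)
                queue.append(source)
    return edges

for edge in upstream_lineage("hive:sales_summary"):
    print(edge)
```

Lineage is only as complete as the edges that get reported. This is why the slide stresses hooks into Sqoop, Kafka and custom reporters.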
Apache Atlas Enables a Business Catalog for Ease of Use
• Organize data assets along business terms
  – Authoritative: hierarchical taxonomy creation
  – Agile modeling: model conceptual, logical and physical assets
  – Definition and assignment of tags like PII (Personally Identifiable Information)
• Comprehensive features for compliance
  – Multiple user profiles, including Data Steward and Business Analyst
  – Object auditing to track "who did it"
  – Metadata versioning to track "what did they do"
• Faster insight
  – Data Quality tab for profiling and sampling
  – User comments
Key Benefits:
• Organize data assets along business terms
• Compliance features
• Faster insight
How Will My Users Interact With It?
Support for BI, Cubes, Data Science
Druid: Deep Multidimensional Analytics
• Real-time analytics: streaming sources (Storm, Kafka, Spark) deliver events, logs, transactions and sensor data
• Scalably ingest historical data from transactional and web systems (historical sources: HDFS, S3)
• Druid data cubes: ultra-fast analytics and slice-and-dice, with deep, fast drilldown across any dimension
• Access from Hive / Spark, BI tools, a REST API and the Superset UI
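Part of what makes cube-style slice-and-dice fast is rollup: raw events are pre-aggregated by time bucket and dimension values, so queries scan far fewer rows. The minimal Python sketch below uses hypothetical events and an hourly granularity. It illustrates the idea only and is not Druid's ingestion spec or code.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw events: (ISO timestamp, country, device, bytes)
events = [
    ("2016-06-28T10:05:00", "US", "mobile", 120),
    ("2016-06-28T10:17:00", "US", "mobile", 80),
    ("2016-06-28T10:45:00", "DE", "desktop", 300),
    ("2016-06-28T11:02:00", "US", "mobile", 50),
]

def rollup(rows):
    """Pre-aggregate to hourly buckets: (hour, country, device) -> [count, total_bytes]."""
    cube = defaultdict(lambda: [0, 0])
    for ts, country, device, nbytes in rows:
        hour = datetime.fromisoformat(ts).replace(minute=0, second=0, microsecond=0)
        cell = cube[(hour, country, device)]
        cell[0] += 1
        cell[1] += nbytes
    return dict(cube)

def slice_sum(cube, country):
    """Slice the cube on one dimension value and sum the bytes metric."""
    return sum(total for (hour, c, device), (count, total) in cube.items() if c == country)

cube = rollup(events)
print(len(events), "events ->", len(cube), "rows")  # 4 events -> 3 rows
print(slice_sum(cube, "US"))                        # 250
```

The trade-off is granularity. Once events are rolled up to the hour, per-event detail is gone, which is why full-fidelity data is also kept in Hive.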
Druid's Role in Scalable Data Warehousing
[Diagram, by layer:
• UI: SQL BI tools, MDX tools, Superset UI (fast exploration)
• Unified SQL and MDX layer: HiveServer2 (Hive SQL), HiveServer2 (MDX), Thrift Server (SparkSQL), fast SQL / MDX
• Core platform: Hive and Druid (OLAP indexes) over S3 or HDFS, with real-time feeds (Kafka, Storm, etc.)
• Ranger, Atlas and Ambari management across the stack]
Analytics at Scale with No Data Movement
• Source data systems → Syncsort: high-performance data movement; high-performance data import from all major EDW platforms
• Hadoop: scalable storage and compute
• Hive LLAP: high-performance SQL; fast, scalable SQL analytics; intelligent in-memory caching
• AtScale Intelligence Platform: OLAP cubes for higher performance; define OLAP cubes for 10x faster queries; unified semantic layer for all BI tools
• Queries can use pre-aggregated data ... or full-fidelity data
Spark Column Security with LLAP
• Fine-grained column-level access control for SparkSQL.
• Fully dynamic policies per user. Doesn't require views.
• Use standard Ranger policies and tools to control access and masking policies.
Flow:
1. SparkSQL gets data locations, known as "splits", from HiveServer2 and plans the query.
2. HiveServer2 authorizes access using Ranger. Per-user policies like row filtering are applied.
3. Spark gets a modified query plan based on the dynamic security policy.
4. Spark reads data from LLAP. Filtering / masking is guaranteed by the LLAP server.
[Diagram: Spark Client; HiveServer2 (authorization); Hive Metastore (data locations, view definitions); Ranger Server (dynamic policies); LLAP (data read, filter pushdown); arrows numbered 1 to 4 as in the flow above.]
Apache Zeppelin: Attaches to Hive and Spark
But Wait, There's More
• Monitoring & Management
• Data Lifecycle Management
• Replication & D/R
• Data Ingestion
Scalable Data Warehousing on Hadoop
Capabilities and applications:
• Batch SQL: ETL, reporting, data mining, deep analytics
• OLAP / Cube: multidimensional analytics, MDX tools, Excel
• Interactive SQL: reporting; BI tools: Tableau, MicroStrategy, Cognos
• Sub-Second SQL: ad-hoc, drill-down; BI tools: Tableau, Excel
• ACID / MERGE: continuous ingestion from operational DBMS, slowly changing dimensions
Core platform:
• Scale-out storage
• Petabyte-scale processing
• Core SQL engine
• Apache Tez: scalable distributed processing
• Advanced cost-based optimizer
• Connectivity
• Advanced security
• JDBC / ODBC
• Comprehensive SQL:2011 coverage
• MDX
Legend: Existing / Development / Emerging
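The ACID / MERGE capability refers to statements such as Hive's `MERGE INTO ... USING ... WHEN MATCHED THEN UPDATE ... WHEN NOT MATCHED THEN INSERT`, which apply a batch of changes from an operational database in one pass. The Python sketch below shows those upsert semantics on an in-memory table keyed by `id`. The data and the `_deleted` flag are hypothetical. Transactions, delta files and MERGE's check against multiple source rows matching one target row are not modeled.

```python
def merge(target, changes, key="id"):
    """Apply change rows to target (dict: key -> row) with MERGE-style semantics.

    WHEN MATCHED AND flagged deleted -> DELETE
    WHEN MATCHED                     -> UPDATE (overwrite the row)
    WHEN NOT MATCHED                 -> INSERT
    Returns how many rows each branch touched.
    """
    counts = {"inserted": 0, "updated": 0, "deleted": 0}
    for change in changes:
        k = change[key]
        row = {c: v for c, v in change.items() if c != "_deleted"}
        if k in target:
            if change.get("_deleted"):
                del target[k]
                counts["deleted"] += 1
            else:
                target[k] = row
                counts["updated"] += 1
        elif not change.get("_deleted"):
            target[k] = row
            counts["inserted"] += 1
    return counts

customers = {1: {"id": 1, "region": "East"}, 2: {"id": 2, "region": "East"}}
changes = [
    {"id": 2, "region": "West"},   # matched -> update
    {"id": 3, "region": "West"},   # not matched -> insert
    {"id": 1, "_deleted": True},   # matched and flagged -> delete
]
print(merge(customers, changes))  # {'inserted': 1, 'updated': 1, 'deleted': 1}
print(customers)
```

Overwriting matched rows corresponds to a type 1 slowly changing dimension. A type 2 dimension would instead close out the old row and insert a new version.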
For More Details
• Today
  – Running Zeppelin in Enterprise – 3:10
  – Dancing Elephants – Efficiently Working with Object Stores from Apache Spark and Hive – 4:20
  – Open Metadata and Governance with Apache Atlas – 5:10
  – LLAP: Building Cloud First BI – 5:50
• Tomorrow
  – Interactive Analytics At Scale in Apache Hive Using Druid – 9:00
  – Disaster Recovery and Cloud Migration for your Apache Hive Warehouse – 11:00
  – LLAP: Building Cloud-First BI – 11:50
  – Treat Your Enterprise Data Lake Indigestion: Enterprise Ready Security and Governance – 3:10
  – Birds of a Feather Session for Hive and HBase – 6:00
