This talk will describe how Hue can be integrated with existing Hadoop deployments with minimal changes/disturbances. Romain will cover details on how Hue can leverage the existing authentication system and security model of your company. He will also cover the Hive/Shark/Pig/Oozie best practice setup for Hue.
http://www.meetup.com/hadoop/events/125191612/
2. WHAT IS HUE?
WEB INTERFACE FOR MAKING HADOOP EASIER TO USE
Suite of apps for each Hadoop component, like Hive, Pig, Impala, Oozie, Solr, Sqoop2, HBase...
5. TARGET OF HUE
GETTING STARTED WITH HADOOP
BEING PRODUCTIVE EXPLORING DIFFERENT ANGLES OF THE PLATFORM
LET ANY USER FOCUS ON BIG DATA PROCESSING
BEING COMPATIBLE WITH ANY HADOOP VERSION (0.20/1.2.0/2.3.0)
8. TALKS AROUND THE WORLD
Meetups and events in NYC, Paris, LA, Tokyo, SF, Stockholm, Vienna, San Jose, Singapore…
Coming up in London, West coast
RETREATS
Nov 13 Koh Chang, Thailand
May 14 Curaçao, Netherlands Antilles
9. FAST PACE
LAST 30 DAYS
41 issues created and 38 resolved.
Core team + Community
11. HISTORY
HUE 1
Desktop-like in a browser. It did its job but was pretty slow, had memory leaks and was not very IE friendly, though definitely advanced for its time (2009-2010).
15. HISTORY
HUE 3.5+
Where we are now: new UI, several new apps, the most user-friendly features to date.
16. WHICH VERSION TO USE?
HUE 2.X: 1-2 years old
HUE 3.X: 6 months
HUE 3.5 + 1/2 3.6: 1k commits later
17. WHICH DISTRIBUTION?
GITHUB: Very latest (HACKER)
TARBALL: Advanced preview (ADVANCED USER)
CDH / CM: The most stable and cross component checked (NORMAL USER)
21. WHAT DO YOU NEED?
SERVER
Python 2.4 2.6
That’s it if using a packaged version. If building from the source, here are the extra packages
CLIENT
Web Browser
IE 9+, FF 10+, Chrome, Safari
22. WHAT DOES THE HUE SERVICE LOOK LIKE?
1 SERVER
Process serving pages and also static content
1 DB
For cookies, saved queries, workflows, …
23. HOW TO CONFIGURE HUE
HUE.INI
Similar to core-site.xml but with .INI syntax
Where?
/etc/hue/conf/hue.ini
or
$HUE_HOME/desktop/conf/pseudo-distributed.ini
[desktop]
  [[database]]
    # Database engine is typically one of:
    # postgresql_psycopg2, mysql, or sqlite3
    engine=sqlite3
    ## host=
    ## port=
    ## user=
    ## password=
    name=desktop/desktop.db
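The nested headers above ([desktop] containing [[database]]) go beyond flat INI: the number of brackets gives the nesting depth, which a flat INI reader does not model. The following is a minimal sketch of how that nesting can be read. It is not Hue's own loader; the function name parse_nested_ini and the sample text are assumptions made for illustration.

```python
import re


def parse_nested_ini(text):
    """Parse hue.ini-style text into nested dicts (illustrative sketch only)."""
    root = {}
    stack = [root]  # stack[d] is the dict of the section currently open at depth d
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment ("##" marks a commented-out default)
        header = re.match(r"^(\[+)([^\[\]]+)(\]+)$", line)
        if header:
            depth = len(header.group(1))
            if depth != len(header.group(3)) or depth > len(stack):
                raise ValueError("bad section header: " + line)
            del stack[depth:]  # close any deeper or sibling section
            section = stack[-1].setdefault(header.group(2).strip(), {})
            stack.append(section)
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError("expected key=value: " + line)
        stack[-1][key.strip()] = value.strip()
    return root


# Hypothetical sample mirroring the slide's [[database]] block.
SAMPLE = """
[desktop]
  [[database]]
    # Database engine
    engine=sqlite3
    ## host=
    name=desktop/desktop.db
"""

config = parse_nested_ini(SAMPLE)
print(config["desktop"]["database"]["engine"])
```

Switching to MySQL or PostgreSQL would then only mean changing engine and uncommenting the host, port, user and password keys in the same section.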
28. LIST OF GROUPS AND PERMISSIONS
CONFIGURE APPS AND PERMISSIONS
A permission can:
- allow access to one app (e.g. Hive Editor)
- modify data from the app (e.g. drop Hive tables or edit cells in HBase Browser)
A list of permissions
29. PERMISSIONS IN ACTION
User ‘test’ belongs to the group ‘hiveonly’, which has only the ‘hive’ permissions
31. RPC CALLS TO ALL THE HADOOP COMPONENTS
HDFS EXAMPLE
[Diagram: WebHDFS REST in front of the NN and the DNs (DN, DN, DN, …, DN)]
http://localhost:50070/webhdfs/v1/<PATH>?op=LISTSTATUS
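The LISTSTATUS call above can be sketched in Python as follows. This is an illustration under assumptions, not Hue's client code: the helper names webhdfs_url and list_names are hypothetical, and the sample response is hand-written to follow the FileStatuses/FileStatus JSON layout that WebHDFS returns for LISTSTATUS. No request is sent, so the sketch runs without a cluster.

```python
import json
from urllib.parse import quote, urlencode


def webhdfs_url(base, path, op, **params):
    """Build a WebHDFS REST URL such as <base>/tmp?op=LISTSTATUS."""
    query = urlencode({"op": op, **params})
    return base.rstrip("/") + quote(path) + "?" + query


def list_names(liststatus_json):
    """Return (name, type) pairs from a LISTSTATUS response body."""
    statuses = json.loads(liststatus_json)["FileStatuses"]["FileStatus"]
    return [(s["pathSuffix"], s["type"]) for s in statuses]


url = webhdfs_url("http://localhost:50070/webhdfs/v1", "/tmp", "LISTSTATUS")
print(url)

# Hand-written sample response, trimmed to the two fields used here.
SAMPLE_RESPONSE = json.dumps({
    "FileStatuses": {
        "FileStatus": [
            {"pathSuffix": "logs", "type": "DIRECTORY"},
            {"pathSuffix": "data.csv", "type": "FILE"},
        ]
    }
})
print(list_names(SAMPLE_RESPONSE))
```

Fetching the URL with any HTTP client against a running NameNode would return a body of the same shape.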
32. HOW
The host/port of every service API (Oozie, Yarn, HDFS, HBase…) is specified in hue.ini, in one section per major service (e.g. [hbase]), for Hue core ([desktop]) or for a Hue lib ([liboozie])
[hbase]
  # Comma-separated list of HBase Thrift servers for
  # clusters in the format of '(name|host:port)'.
  hbase_clusters=(Cluster|localhost:9090)

[liboozie]
  # The URL where the Oozie service runs on.
  # oozie_url=http://hue.ent.cloudera.com:11000/oozie
Full list
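The '(name|host:port)' format of hbase_clusters can be illustrated with a small parser. This is a sketch under assumptions, not the code Hue uses: parse_hbase_clusters is a hypothetical helper, and the second cluster in the usage line is invented to show the comma-separated form.

```python
import re


def parse_hbase_clusters(value):
    """Turn '(name|host:port),(name|host:port)' into {name: (host, port)}."""
    clusters = {}
    for name, host, port in re.findall(r"\(([^|()]+)\|([^:()]+):(\d+)\)", value):
        clusters[name.strip()] = (host.strip(), int(port))
    return clusters


# The value from the slide.
single = parse_hbase_clusters("(Cluster|localhost:9090)")
print(single)

# Hypothetical two-cluster value, for illustration only.
several = parse_hbase_clusters("(Prod|hbase1.example.com:9090),(Test|hbase2.example.com:9091)")
print(sorted(several))
```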
33. KERBEROS
1 Hue ticket/principal - no user ticket
Hue uses its ticket for authenticating to every other service (HDFS, Oozie, …)
Read more in the Hue Security Guide
34. HUE KERBEROS TICKET
Add Hue user principal to Kerberos
kadmin: addprinc -randkey hue/hue.server.fully.qualified.domain.name@YOUR-REALM.COM
Test
$ kinit -k -t /etc/hue/hue.keytab hue/hue.server.fully.qualified.domain.name@YOUR-REALM.COM
Ticket should be renewable (krb5.conf and kdc.conf)
hue.ini
[desktop]
  [[kerberos]]
    # Path to Hue's Kerberos keytab file
    hue_keytab=/etc/hue/hue.keytab
    # Kerberos principal name for Hue
    hue_principal=hue/FQDN@REALM
    # add kinit path for non root users
    kinit_path=/usr/kerberos/bin/kinit
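The three [[kerberos]] settings map directly onto the kinit test command shown on this slide. The sketch below only assembles that command from the settings. The function names are hypothetical, and can_get_ticket is defined but never called here because it needs a real KDC and keytab.

```python
import subprocess


def kinit_command(keytab, principal, kinit_path="kinit"):
    """Return the argv list for a keytab-based kinit: kinit -k -t <keytab> <principal>."""
    return [kinit_path, "-k", "-t", keytab, principal]


def can_get_ticket(keytab, principal, kinit_path="kinit"):
    """Run kinit and report success (requires a reachable KDC; not called in this sketch)."""
    result = subprocess.run(kinit_command(keytab, principal, kinit_path))
    return result.returncode == 0


# Values copied from the hue.ini excerpt; FQDN and REALM are placeholders there too.
cmd = kinit_command("/etc/hue/hue.keytab", "hue/FQDN@REALM", "/usr/kerberos/bin/kinit")
print(" ".join(cmd))
```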
35. HOW
IMPERSONATION
Hue is a “super proxy”
Client could be on a Windows machine, phone… and interact with all the Hadoop services
Call for getting the information about an HDFS file
http://localhost:50070/webhdfs/v1/tmp?op=GETFILESTATUS&user.name=hue&doas=bob
WebHDFS, add to core-site.xml
<!-- Hue WebHDFS proxy user setting -->
<property>
  <name>hadoop.proxyuser.hue.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hue.groups</name>
  <value>*</value>
</property>
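The impersonation call above carries two identities: user.name is the proxy user (hue) and doas is the end user being impersonated (bob). A minimal sketch of building such a URL follows. The helper name impersonated_url is an assumption for illustration, and nothing is sent over the network.

```python
from urllib.parse import quote, urlencode


def impersonated_url(base, path, op, proxy_user, end_user):
    """Build a WebHDFS URL where proxy_user acts on behalf of end_user."""
    query = urlencode([("op", op), ("user.name", proxy_user), ("doas", end_user)])
    return base.rstrip("/") + quote(path) + "?" + query


url = impersonated_url("http://localhost:50070/webhdfs/v1", "/tmp", "GETFILESTATUS", "hue", "bob")
print(url)
```

The call is only accepted for doas=bob when the hadoop.proxyuser.hue.hosts and hadoop.proxyuser.hue.groups properties allow the hue user to impersonate from that host and for that user's groups.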
36. OTHER SECURITY FEATURES
HTTPS
SSL DB
SSL WITH HIVESERVER2
AUDITING
READ MORE …
37. HOW
HIGH AVAILABILITY
2 Hue instances
HA proxy
Multi DB
Performance: like a website, mostly RPC calls
39. SUM-UP
INSTALL: Install Hue on one machine + Hue Kerberos ticket
CONFIGURE: Configure hue.ini to point to each Service API
ENABLE: Enable Hadoop Service APIs for Hue as a proxy user
HELP: Get help on @gethue or hue-user
LDAP: Use an LDAP backend
43. GET HUE
CLOUDERA’S CDH
Stable and highly tested releases perfectly integrated with the Hadoop ecosystem, automagically configured by Cloudera Manager.
TARBALL
Try in advance the latest and greatest but you’ll have to configure everything on your own.
CLOUDERA’S DEMO VM
Get to play with Hue and various Hadoop components in 5 minutes. It’s a self-contained CDH environment ready to use.
HORTONWORKS*
In HDP there’s an old forked version of Hue 2.3.
MAPR*
Newer version than HDP, close to the original 2.5 minus apps like HBase, Impala, Sqoop, Search.
HP CLOUD*
The newest addition, ships Hue 3.0 through the GreenButton products.
* YOUR MILEAGE MAY VARY.
BIGTOP
EMBEDDED/DEMO IN IND. COMPANIES
44. WHAT ARE YOUR USE CASES?
WHICH COMPONENTS DO YOU USE?
WHAT WOULD YOU LIKE TO SEE IN HUE?
INTERESTED IN CONTRIBUTING?
WANNA SAY HELLO?
DO YOU WANT A TAILOR MADE TEAM RETREAT?
QUESTIONS?
TEAM@GETHUE.COM
47. HOW
HDFS FILE BROWSER
Add the Hue WebHDFS proxy user setting to core-site.xml, as in the IMPERSONATION slide
Add the property below to hdfs-site.xml to enable WebHDFS in the NameNode and DataNodes
hdfs-site.xml
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
hue.ini
[hadoop]
  [[hdfs_clusters]]
    # HA support by using HttpFs

    [[[default]]]
      # Enter the filesystem uri
      ##fs_defaultfs=hdfs://localhost:8020

      # Use WebHdfs/HttpFs as the communication mechanism.
      ##webhdfs_url=http://localhost:50070/webhdfs/v1
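Before pointing Hue at a cluster, it can help to check that the hdfs-site.xml property is actually set. The sketch below reads Hadoop-style XML with the standard library. The function names are hypothetical, and the sample document is a hand-written stand-in wrapped in the <configuration> root element that Hadoop site files use.

```python
import xml.etree.ElementTree as ET


def hadoop_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop *-site.xml document."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}


def webhdfs_enabled(xml_text):
    """True when dfs.webhdfs.enabled is explicitly set to true in the given XML."""
    value = hadoop_properties(xml_text).get("dfs.webhdfs.enabled") or ""
    return value.strip().lower() == "true"


# Hand-written sample standing in for a real hdfs-site.xml.
SAMPLE_HDFS_SITE = """<configuration>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>"""

print(webhdfs_enabled(SAMPLE_HDFS_SITE))
```

The same hadoop_properties helper could be pointed at core-site.xml to confirm the two hadoop.proxyuser.hue.* entries.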
48. HOW
YARN / MR2
Example of config for having Hue interact with Yarn
[hadoop]
  [[yarn_clusters]]

    [[[default]]]
      # Enter the host on which you are running the ResourceManager
      resourcemanager_host=localhost

      # The port where the ResourceManager IPC listens on
      ## resourcemanager_port=8032

      # Whether to submit jobs to this cluster
      submit_to=True

      # Change this if your YARN cluster is Kerberos-secured
      ## security_enabled=false

      # URL of the ResourceManager API
      ## resourcemanager_api_url=http://localhost:8088

      # URL of the ProxyServer API
      ## proxy_api_url=http://localhost:8088

      # URL of the HistoryServer API
      # history_server_api_url=http://localhost:19888

    [[[ha]]]
      # Enter the host on which you are running the failover Resource Manager
      resourcemanager_api_url=http://localhost:8088
      ## logical_name=
      submit_to=True
49. HOW
HIVE (IMPALA / SHARK)
Based on HiveServer2 interface
Note for Hive:
<property>
  <name>hive.server2.enable.doAs</name>
  <value>true</value>
</property>
Video demo
Setup tutorial
[beeswax]
  # Host where Hive server Thrift daemon is running.
  # If Kerberos security is enabled, use fully-qualified domain name (FQDN).
  ## hive_server_host=localhost
  ## hive_server_port=10000

  # Hive configuration directory, where hive-site.xml is located
  ## hive_conf_dir=/etc/hive/conf
50. HOW
OOZIE
Make sure share lib is installed
Alternative Dashboard and Editors
[liboozie]
  #oozie_url=http://localhost.com:11000/oozie
HOW
PIG
Comes with Oozie, no PigServer yet
Oozie sharelib
Oozie credentials for security