Putting all your data in one place makes it a target for bad guys.
There are plenty of examples in the news of major data breaches, and no one wants to be responsible for one.
It is best to consider security up front, at design time.
It’s the right thing to do.
5. Kerberos Overview
• Kerberos: A computer network authentication protocol that works on the basis of tickets
to allow nodes to prove their identity to each other in a secure manner, relying
extensively on encryption
• Messages are exchanged between:
• Client
• Server
• Kerberos Key Distribution Center (KDC).
• Note: the KDC is not part of Hadoop, but most Linux distros come with the MIT
Kerberos KDC.
• Passwords are not sent across the network; instead, passwords are used to compute
encryption keys
• Authentication status is cached (don’t need to send credentials with each request)
• Timestamps are essential to Kerberos (make sure system clocks are synchronized!)
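Two of the points above, deriving a key from a password and rejecting messages with stale timestamps, can be sketched in a few lines of Python. This is an illustration only, not a Kerberos implementation. The function names are hypothetical. The PBKDF2 step follows the general approach used for the AES encryption types, where real Kerberos applies a further key-derivation step that is omitted here. The 300-second tolerance is assumed to match the usual MIT Kerberos default clock skew, and the realm and principal values are made up.

```python
import hashlib


def derive_key_sketch(password: str, realm: str, principal: str,
                      iterations: int = 4096, key_len: int = 32) -> bytes:
    """Stretch a password into key material with PBKDF2-HMAC-SHA1.

    Hypothetical helper: the salt is built from realm + principal, so the
    same password yields different keys for different users or realms.
    The password itself never has to cross the network.
    """
    salt = (realm + principal).encode("utf-8")
    return hashlib.pbkdf2_hmac("sha1", password.encode("utf-8"),
                               salt, iterations, key_len)


def within_clock_skew(message_time: float, local_time: float,
                      max_skew_seconds: float = 300.0) -> bool:
    """Return True if the two timestamps differ by at most the allowed skew.

    Hypothetical helper: a receiver would reject a message whose timestamp
    is too far from its own clock, which is why clocks must be synchronized.
    """
    return abs(local_time - message_time) <= max_skew_seconds


# Example values (made up for illustration).
key = derive_key_sketch("correct horse", "EXAMPLE.COM", "alice")
print(len(key))                           # 32 bytes of key material
print(within_clock_skew(1000.0, 1200.0))  # True: 200 s apart
print(within_clock_skew(1000.0, 1400.0))  # False: 400 s apart
```

The second check is the reason an unsynchronized clock on a single node can cause authentication to fail even when the credentials are correct.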
7. Configuring Security in Hadoop
• Hadoop security configuration is a specialized topic
• Many specifics depend on:
• Version of Hadoop
• Type of Kerberos being used (AD or MIT)
• Operating System and Distribution
• Little room for misconfiguration
• Must follow instructions exactly
• Mistakes often result in vague “access denied” errors
• May need to work around version-specific bugs
• The can help
configure a secure system
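Because misconfiguration tends to surface only as a vague "access denied", a small pre-flight check of the configuration can help. The sketch below parses Hadoop-style configuration XML and reports whether two standard core-site.xml properties, hadoop.security.authentication and hadoop.security.authorization, are set for a Kerberos-secured cluster. The helper names and the sample XML are hypothetical, and a real secure setup needs many more settings (principals, keytabs, per-service options) that depend on the Hadoop version and distribution.

```python
import xml.etree.ElementTree as ET


def read_hadoop_properties(xml_text: str) -> dict:
    """Parse <configuration><property>... XML into a name -> value dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def security_findings(props: dict) -> list:
    """Return a list of human-readable problems; empty means both checks pass.

    Unset properties fall back to Hadoop's documented defaults
    ("simple" authentication, authorization disabled).
    """
    findings = []
    if props.get("hadoop.security.authentication", "simple") != "kerberos":
        findings.append("hadoop.security.authentication is not set to kerberos")
    if props.get("hadoop.security.authorization", "false").lower() != "true":
        findings.append("hadoop.security.authorization is not set to true")
    return findings


# Sample configuration text (made up for illustration).
SAMPLE = """<configuration>
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
  <property>
    <name>hadoop.security.authorization</name>
    <value>true</value>
  </property>
</configuration>"""

for problem in security_findings(read_hadoop_properties(SAMPLE)):
    print(problem)
```

With the sample above nothing is printed. An empty or default configuration would report both properties.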