What you will learn:
- Key Insights on Existing Big Data Architecture
- Unique Security Risks and Vulnerabilities of Big Data Technologies
- Top 5 Solutions to mitigate these security challenges
7. Big Data Architecture
No Single Silver Bullet
• Hadoop is already unsuitable for many Big Data problems
• Real-time analytics
o Cloudscale, Storm
• Graph computation
o Giraph and Pregel (examples of graph computation include shortest paths, degrees of separation, etc.)
• Low latency queries
o Dremel
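To make the "degrees of separation" example concrete, here is a minimal single-machine sketch using breadth-first search. It is not the Giraph or Pregel API; those systems distribute a similar computation across many workers. The `friends` graph and the function name are made up for illustration.

```python
from collections import deque

def degrees_of_separation(graph, source, target):
    """Return the hop count of a shortest path, or None if unreachable."""
    if source == target:
        return 0
    visited = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None

# Hypothetical friendship graph (adjacency lists).
friends = {
    "ann": ["bob", "cara"],
    "bob": ["ann", "dan"],
    "cara": ["ann"],
    "dan": ["bob", "eve"],
    "eve": ["dan"],
}

print(degrees_of_separation(friends, "ann", "eve"))
```

On one machine this is trivial; the architectural point of the slide is that the same question over billions of edges calls for a graph-specific engine rather than batch MapReduce.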
10. Input Validation and Filtering
• Input Validation
o What kind of data is untrusted?
o What are the untrusted data sources?
• Data Filtering
o Filter Rogue or malicious data
• Challenges
o GBs or TBs of continuous data
o Signature-based data filtering has limitations
o How can the behavioral aspect of data be filtered?
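The validation and filtering questions above can be sketched as a per-record check applied lazily to a stream. This is only an illustration under assumptions: the source names, field names, and value range are hypothetical, and it covers simple rule-based checks, not the behavioral filtering that the slide flags as an open challenge.

```python
import math

# Hypothetical allow-list of trusted data sources.
ALLOWED_SOURCES = {"sensor-gateway", "billing-api"}

def validate_record(record):
    """Return a list of rejection reasons (an empty list means accepted)."""
    problems = []
    if record.get("source") not in ALLOWED_SOURCES:
        problems.append("unknown source")
    value = record.get("value")
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        problems.append("value is not numeric")
    elif math.isnan(value) or not (0.0 <= value <= 1000.0):
        problems.append("value out of range")
    device = record.get("device_id")
    if not isinstance(device, str) or not device.isalnum() or len(device) > 32:
        problems.append("malformed device_id")
    return problems

def filter_stream(records):
    """Yield only records that pass validation, one at a time."""
    for record in records:
        if not validate_record(record):
            yield record
```

Because `filter_stream` is a generator, it never holds the whole input in memory, which matters when the input is a continuous feed of gigabytes or terabytes.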
11. Granular Access Controls
• Designed for performance, with almost no security in mind
• Security in Big Data is still an area of ongoing research
• Table-, row- or cell-level access control is missing
• Ad hoc queries pose additional challenges
• Access Control is disabled by default
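As a rough illustration of what row- and cell-level access control means, the sketch below filters query results per role before returning them. The roles, column names, and grant table are hypothetical, and a real system would enforce this inside the data store rather than in application code.

```python
# Hypothetical role-to-column grants; a real matrix would live in policy storage.
COLUMN_GRANTS = {
    "analyst": {"region", "purchase_total"},
    "support": {"customer_id", "region"},
    "admin": {"customer_id", "region", "purchase_total", "card_number"},
}

def visible_cells(row, role, row_filter=None):
    """Return the cells the role may read, or None if the row is hidden.

    Unknown roles get no columns (deny by default).
    """
    if row_filter is not None and not row_filter(row):
        return None
    allowed = COLUMN_GRANTS.get(role, set())
    return {col: val for col, val in row.items() if col in allowed}

def run_query(rows, role, row_filter=None):
    """Apply row-level and cell-level restrictions to every result row."""
    result = []
    for row in rows:
        cells = visible_cells(row, role, row_filter)
        if cells:  # skip hidden rows and rows with no visible cells
            result.append(cells)
    return result
```

The deny-by-default lookup is the design choice worth noting: it is the opposite of the "disabled by default" behavior described above.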
12. Insecure Data Storage
• Data is spread across various nodes; authentication, authorization and encryption are challenging
• Autotiering moves cold data to less secure media
o What if cold data is sensitive?
• Encryption of real-time data can have performance impacts
• Secure communication among nodes, middleware and end users is disabled by default
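One way to address the "what if cold data is sensitive?" question is to make the tiering decision aware of sensitivity. The sketch below is a hypothetical policy function; the tier names, the `Block` fields, and the 30-day threshold are assumptions for illustration and do not correspond to any particular product.

```python
from dataclasses import dataclass

@dataclass
class Block:
    block_id: str
    days_since_access: int
    sensitive: bool

def choose_tier(block, cold_after_days=30):
    """Pick a storage tier for a block.

    Tier names are hypothetical: 'fast-secure', 'cold-encrypted', 'cold-plain'.
    """
    if block.days_since_access < cold_after_days:
        return "fast-secure"
    # Cold data: sensitive blocks must not land on the less secure medium.
    return "cold-encrypted" if block.sensitive else "cold-plain"

print(choose_tier(Block("b-17", days_since_access=90, sensitive=True)))
```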
13. Privacy Concerns in Data Mining and Analytics
• Monetization of Big Data generally involves
Data Mining and Analytics
• Sharing of results involves multiple challenges
o Invasion of Privacy
o Invasive Marketing
o Unintentional Disclosure of Information
• Examples
o AOL's release of anonymized search logs: users could easily be identified
o Netflix faced a similar problem
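The examples above show that removing names alone does not anonymize a data set. A simple pre-release check, sketched here, is to find the smallest group of records that share the same quasi-identifier values (the idea behind k-anonymity, which is brought in here for illustration and is not named in the slides). The records and field names are invented.

```python
from collections import Counter

def smallest_group(records, quasi_identifiers):
    """Return the size of the smallest group sharing the same quasi-identifier values.

    A result of 1 means at least one record is unique on those fields.
    """
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    return min(Counter(keys).values()) if keys else 0

# Made-up "anonymized" release: no names, but zip and birth year remain.
released = [
    {"zip": "30301", "birth_year": 1980, "topic": "gardening"},
    {"zip": "30301", "birth_year": 1980, "topic": "travel"},
    {"zip": "30302", "birth_year": 1975, "topic": "health"},
]

print(smallest_group(released, ["zip", "birth_year"]))
```

A result of 1 signals that someone with outside knowledge of a person's zip code and birth year could single out that person's record.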
14. Top 5 Best Practices
• Secure your Computation Code
o Implement access control, code signing, and dynamic analysis of computational code
o Have a strategy to protect data in case of untrusted code
• Implement Comprehensive Input Validation
and Filtering
o Implement validation and filtering of input data from internal or external sources
o Evaluate the input validation and filtering of your Big Data solution
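For the "secure your computation code" practice listed above, the sketch below shows the general shape of verifying job code before it runs. It uses an HMAC with a shared key as a stand-in; real code signing normally relies on asymmetric signatures and a managed key infrastructure. The function names and sample key are hypothetical.

```python
import hashlib
import hmac

def sign_job(code: bytes, key: bytes) -> str:
    """Compute a hex HMAC-SHA256 tag over the job code."""
    return hmac.new(key, code, hashlib.sha256).hexdigest()

def verify_job(code: bytes, signature: str, key: bytes) -> bool:
    """Return True only if the tag matches the code (constant-time comparison)."""
    expected = sign_job(code, key)
    return hmac.compare_digest(expected, signature)

# Hypothetical usage: refuse to schedule a job whose code was altered.
key = b"demo-key-not-for-production"
job = b"def mapper(record): return record['region'], 1"
tag = sign_job(job, key)
print(verify_job(job, tag, key))
print(verify_job(job + b" # tampered", tag, key))
```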
15. Top 5 Best Practices
• Implement Granular Access Control
o Review the role and privilege matrix
o Review permissions to execute ad hoc queries
o Enable access control
• Secure your Data Storage and Computation
o Sensitive data should be segregated
o Enable data encryption for sensitive data
o Audit administrative access on data nodes
o API security
16. Top 5 Best Practices
• Review and Implement Privacy Preserving
Data Mining and Analytics
o Analytics data should not disclose sensitive information
• Get the Big Data Audited
18. Big Data Architecture
Key Insights
• Distributed Architecture & Auto Tiering
• Real Time, Streaming and Continuous
Computation
• Ad hoc Queries
• Parallel and Powerful Computation
Language
• Move the Code, Not the data
• Non Relational Data
• Variety of Input Sources
19. Top 5 Security Risks
• Insecure Computation
• End Point Input Validation and
Filtering
• Granular Access Control
• Insecure Data Storage and
Communication
• Privacy Preserving Data Mining and
Analytics
Editor's notes
Partitioned, Distributed and Replicated among multiple Data Nodes
1000s of data nodes
Autotiering: moving the hottest data to high-performance drives and the coldest data to low-performance, less secure drives