5. Risk Management
• Problem: Scoring of Customers and
Projects
• Solution: Finance History, Communication
and Pattern Detection
• Users: Finance, Insurance
6. Recommendations
• Problem: Recommend suitable products that
complement previous purchases and match the
customer's interests
• Solution: Statistical analysis of interests and
purchase history, detection of matching swarm
patterns
• Users: eCommerce, Advertising
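One simple way to picture the purchase-history analysis named above is a co-occurrence count: products that often appear in the same basket are suggested together. The sketch below is only an illustration under that assumption; the function names and the sample baskets are hypothetical and not taken from the slides.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence(baskets):
    """Count how often each ordered pair of products shares a basket."""
    co = defaultdict(Counter)
    for basket in baskets:
        # set() ensures a product repeated in one basket is counted once.
        for a, b in permutations(set(basket), 2):
            co[a][b] += 1
    return co


def recommend(co, product, top_n=3):
    """Return up to top_n products most often bought together with product."""
    counts = co.get(product, Counter())
    # Sort by descending count, then by name to keep the result deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]


# Hypothetical purchase history, one list per basket.
baskets = [
    ["camera", "sd_card", "tripod"],
    ["camera", "sd_card"],
    ["camera", "bag"],
    ["laptop", "bag"],
]
co = build_cooccurrence(baskets)
print(recommend(co, "camera"))  # ['sd_card', 'bag', 'tripod']
```

On a cluster the same counting would typically be distributed over many machines; the logic per basket stays the same.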
7. Graph-Analytics
• Problem: Detect trends and curves in large
distributed networks (Wired, Social, Mesh)
• Solution: Collect and mine all available data,
then apply self-learning pattern models to
detect trends and produce forecasts
• Users: Enterprises, Gov, NGO, Provider,
Telco, Stock Exchange
9. Text Analysis
• Problem: Detect the meaning of the written
word (Sentiment Analysis)
• Solution: Keyword patterns, Coherence
detection, Path detection
• Users: eCommerce, Social Media Service
Provider, Attitude Research
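The keyword-pattern approach to sentiment analysis can be sketched in a few lines: count words from a positive list and a negative list and compare. This is a deliberately minimal illustration; the word lists and function names are hypothetical, and a real system would use a much larger lexicon plus the coherence and path detection mentioned above.

```python
import re

# Hypothetical keyword lists, chosen only for this example.
POSITIVE = {"good", "great", "love", "fast"}
NEGATIVE = {"bad", "slow", "broken", "hate"}


def sentiment_score(text):
    """Return (#positive keywords) minus (#negative keywords) in text."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return pos - neg


def label(text):
    """Map the keyword score to a coarse sentiment label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(label("Great camera, I love it"))                       # positive
print(label("Delivery was slow and the box arrived broken"))  # negative
```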
10. Real-World Data Volumes
• Ebay: 12 PB, Search Optimization
• Facebook: 50 PB, Logs, Reports
• Walmart: 4.5 PB, Customer Transactions
http://wiki.apache.org/hadoop/PoweredBy
http://en.wikipedia.org/wiki/Big_data
11. Apache Hadoop
• Software Framework for storing and processing
large amounts of unstructured data
• Apache License
• Two core components:
• HDFS: Distributed data storage
• MapReduce: Distributed data processing
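To make the MapReduce idea concrete, here is a single-process sketch of the three phases (map, shuffle, reduce) using the classic word-count example. It runs in plain Python and does not use the Hadoop API; the function names are illustrative assumptions, and on a real cluster the map and reduce calls would run in parallel on different nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence on a list of lines."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


print(word_count(["big data", "big cluster"]))
# {'big': 2, 'data': 1, 'cluster': 1}
```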
12. Hadoop Cluster
[Diagram: grid of identical Data Nodes forming the cluster]
Data Node: 4-16 Cores, 4-16 Disks,
8-64 GB RAM, 1-10 Gbit/s Network
13. Hadoop Distributed File System
[Diagram: a File is split into Blocks, and the Blocks are distributed across the Data Nodes]
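The file-to-blocks-to-nodes idea of this slide can be sketched as follows. The block size of 128 MB and the replication factor of 3 are assumptions for the example (they match commonly used HDFS defaults, but the slides do not state them), and the round-robin placement is a simplification: real HDFS uses a rack-aware placement policy. All names are hypothetical.

```python
MB = 1024 * 1024


def split_into_blocks(file_size, block_size=128 * MB):
    """Return the sizes of the blocks a file of file_size bytes occupies."""
    full, rest = divmod(file_size, block_size)
    # Only the last block may be smaller than block_size.
    return [block_size] * full + ([rest] if rest else [])


def place_blocks(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    Simplified stand-in for HDFS placement, which is rack-aware.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


blocks = split_into_blocks(300 * MB)
print([size // MB for size in blocks])  # [128, 128, 44]

placement = place_blocks(len(blocks), ["dn1", "dn2", "dn3", "dn4"])
print(placement[0])  # ['dn1', 'dn2', 'dn3']
print(placement[2])  # ['dn3', 'dn4', 'dn1']
```

Because every block lives on several nodes, losing a single Data Node does not lose data in this model.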
18. Scope
• Successful Audits per ISO 27001
• Analyze different Data Sources from
different Databases and CRM Systems
• Real-Time and Lifetime Statistics per Product
• Periodic Analytics and Statistics Jobs
• Weekly Re-Import into CRM
• Single Queries per User (Analyst) over a
Secured GUI
19. Solution Path
• Cluster Authentication and Authorization via
Kerberos, with encrypted data communication / Data
Protection
• Sqoop Connector to CRM / DB
• Teradata, Oracle, Postgres, MySQL, MS SQL
• Hive - HBase Integration
• Hive Analytics, controlled automatically via the Oozie
Workload Orchestrator
• Hue Shell, Authentication via Kerberos SPNEGO
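As an illustration of the Sqoop step, the sketch below assembles the argument list for a `sqoop import` that copies one database table into Hive. It only builds and prints the command; nothing is executed. The JDBC URL, user, password file path and table names are hypothetical placeholders, and the helper function is not part of Sqoop itself. The flags used are standard Sqoop 1 import options.

```python
import shlex


def sqoop_import_args(jdbc_url, username, password_file, table,
                      hive_table, mappers=4):
    """Build the argv for a Sqoop 1 import of one table into Hive."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,             # JDBC URL of the source database
        "--username", username,
        "--password-file", password_file,  # avoids a password on the command line
        "--table", table,                  # source table in the CRM database
        "--hive-import",                   # load the result into Hive
        "--hive-table", hive_table,
        "--num-mappers", str(mappers),     # number of parallel import tasks
    ]


# All values below are made-up examples.
args = sqoop_import_args(
    jdbc_url="jdbc:postgresql://crm-db.example.com/crm",
    username="etl_user",
    password_file="/user/etl/.crm_password",
    table="customers",
    hive_table="crm.customers",
)
print(shlex.join(args))
```

In a setup like the one described, a scheduler such as Oozie would typically trigger a command of this kind on a fixed interval instead of an analyst running it by hand.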
20. [Architecture diagram, recoverable labels only]
CRM Park
Integration: Sqoop
CDH
Authentication: Kerberos (AD, MIT v5)
Real Time: HBase
Hive
Automation: Oozie
End User: Hue