This document discusses using Hadoop for real-time business intelligence (BI). It notes that scalability is now critical for BI due to the growth of data and social media. Hadoop provides a scalable and cost-effective solution compared to traditional SQL databases. Key components are distributed search with Katta and faceted search for interactive exploration and aggregation of both structured and unstructured data. The goal is responses in under 5 seconds to support real-time BI.
10. Scalability in BI
• Scalability matters now
• Social Media: Catalyst
• All data is important
• Data doesn’t scale with business size anymore
11. Search as BI
• Katta = Distributed Search on Hadoop
• Bobo = Faceted Lucene
17. Doing it Cheap
• 100 TB, Structured and Unstructured
• Oracle - $100,000,000
• “NewSQL” - $4,000,000
• Hadoop + Katta - $250,000
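To put the slide's figures on a common scale, the arithmetic below works out cost per terabyte and each option's cost relative to Hadoop + Katta. It is a minimal sketch that uses only the three totals and the 100 TB volume quoted above; the variable and function names are illustrative, not from the original deck.

```python
# Totals quoted on the slide: cost in USD to handle 100 TB of data.
DATA_TB = 100
costs_usd = {
    "Oracle": 100_000_000,
    "NewSQL": 4_000_000,
    "Hadoop + Katta": 250_000,
}


def cost_per_tb(total_usd, terabytes=DATA_TB):
    """Return the cost in USD for one terabyte."""
    return total_usd / terabytes


# Cost per terabyte for each option.
per_tb = {name: cost_per_tb(usd) for name, usd in costs_usd.items()}

# How many times more expensive each option is than Hadoop + Katta.
baseline = costs_usd["Hadoop + Katta"]
ratio = {name: usd / baseline for name, usd in costs_usd.items()}

for name in costs_usd:
    print(f"{name}: ${per_tb[name]:,.0f} per TB, {ratio[name]:.0f}x Hadoop + Katta")
```

On these numbers, Oracle comes to $1,000,000 per TB, "NewSQL" to $40,000 per TB, and Hadoop + Katta to $2,500 per TB.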
18. Why We Need Hadoop
• Need to process high-latency data to get the “small stuff” fast
• Robust Ecosystem
• Need more than SQL: an RDBMS is not a Swiss Army knife
19. Aggregation is Real-Time
• Distributed Search w/ Katta + Facets = Aggregation-Based BI
• Sum, Count, Filter, Avg, Group
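The five operations listed above can be illustrated with a small in-memory sketch. This is not the Katta or Bobo API: it is plain Python over hypothetical sample records, and the field names (`region`, `channel`, `revenue`) and the `facet_aggregate` helper are assumptions made for illustration. It shows only the shape of a faceted query: filter, group by a facet field, then compute count, sum, and average per group.

```python
from collections import defaultdict

# Hypothetical sample records; the field names are illustrative only.
docs = [
    {"region": "US", "channel": "web", "revenue": 120.0},
    {"region": "US", "channel": "store", "revenue": 80.0},
    {"region": "EU", "channel": "web", "revenue": 200.0},
    {"region": "EU", "channel": "web", "revenue": 100.0},
]


def facet_aggregate(records, group_field, value_field, predicate=None):
    """Filter records, group them by a facet field, and aggregate per group."""
    buckets = defaultdict(lambda: {"count": 0, "sum": 0.0})
    for rec in records:
        # Filter: skip records that do not match the optional predicate.
        if predicate is not None and not predicate(rec):
            continue
        # Group: one bucket per distinct facet value.
        bucket = buckets[rec[group_field]]
        bucket["count"] += 1                # Count
        bucket["sum"] += rec[value_field]   # Sum
    # Avg is derived from the running sum and count of each bucket.
    return {
        key: {"count": b["count"], "sum": b["sum"], "avg": b["sum"] / b["count"]}
        for key, b in buckets.items()
    }


# Revenue by region, restricted to the "web" channel.
web_by_region = facet_aggregate(
    docs, "region", "revenue", predicate=lambda r: r["channel"] == "web"
)
print(web_by_region)
```

Because count and sum are additive, per-shard results of this kind can be merged by adding them together, which is what makes the approach a natural fit for a distributed index.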
20. Protips: Review
• Understand High vs. Low Latency data
• Hadoop makes it cheap
• Pre-aggregate w/ Hadoop, Explore w/ Katta + Faceted Search
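The pre-aggregation step can be pictured as a map phase that emits a key per raw event and a reduce phase that sums per key. The sketch below imitates that pattern in plain Python rather than running on Hadoop; the log format, the event names, and the `map_event` and `reduce_counts` helpers are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import chain

# Hypothetical raw event log: "<ISO timestamp> <event type>" per line.
raw_events = [
    "2011-06-01T10:00:00 page_view",
    "2011-06-01T10:05:00 page_view",
    "2011-06-01T11:30:00 purchase",
    "2011-06-02T09:15:00 page_view",
]


def map_event(line):
    """Map phase: emit ((day, event_type), 1) for one raw log line."""
    timestamp, event_type = line.split(" ", 1)
    day = timestamp.split("T", 1)[0]
    yield (day, event_type), 1


def reduce_counts(pairs):
    """Reduce phase: sum the emitted values for each key."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


# Daily counts per event type: the small, pre-aggregated data set that
# would then be indexed for interactive exploration.
daily_counts = reduce_counts(chain.from_iterable(map_event(e) for e in raw_events))
print(daily_counts)
```

The batch job absorbs the high-latency work over the raw data, so the search layer only has to serve the much smaller aggregated records.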
21. The Future
• Search/BI as a Platform: “Google my Data Warehouse”
• Real-Time MR on HBase