Implementing a big data program can be a long and arduous journey. Each organization has its own business drivers and technical considerations that shape its big data adoption roadmap. Whatever your organization's specific driver is, whether managing a rapid surge of data, implementing a new set of analytic capabilities, incorporating unstructured data into your enterprise data platform, or accessing real-time information for actionable intelligence, the approach and roadmap you put in place to reach that goal become all the more critical in a space where early success stories are relatively rare, skill sets are hard to find and technologies are still evolving.
In this session we will chronicle the journeys of four organizations that were early adopters of big data. Each charted a different path to its big data goals. We will look at the key drivers behind their respective approaches, and at what worked and what did not.
Big Data Journeys: Review of roadmaps taken by early adopters to achieve their big data goals
1. Big Data Journeys
A review of roadmaps taken by early
adopters to achieve their big data goals
TDWI Big Data Solution Summit
San Diego, CA // June 4-6, 2012
Krishnan Parasuraman
CTO, Digital Media
Netezza & Big Data Solutions
2. Big Data Journeys
Talking Points
• Journeys of 4 organizations
• Different drivers and considerations
• Different paths to big data realization
• Key learnings
3. Big Data Considerations
1 Leading Financial Services Solution Provider: Volume
2 Large Online Content Publisher: Value
3 Global Telecommunications Major: Velocity
4 Emerging Digital Media Marketer: Variety
4. 1 Leading Financial Services solution provider
Provider of financial services and products to both businesses and consumers
Strategic shift to deliver goods and services via digital channels
• Provide a personalized customer experience online
• Anticipate user behavior and guide users to specific functionality
• Maintain a consistent experience across online, mobile and social channels
5. 1 Leading Financial Services solution provider
Volume
Big Data Solution Considerations
• Large volumes of data
• Data integration
• Deep analytics
• Large number of attributes
6. 1 Leading Financial Services solution provider
2007: Before the Digital Shift
Diagram: Digital Data (Clickstream), Internal Data Sources, EDW, Analytics, Biz. Users, Data Analysts (flow steps numbered 1-4)
• Top 10 display advertiser in the US
• 25 billion impressions per quarter
• 1 billion clicks per day during peak usage
• Regression analysis for conversion tracking
7. 1 Leading Financial Services solution provider
2008: Roadmap 1.0
Diagram: Digital Data (Clickstream), Internal Data Sources, EDW, Analytics, Biz. Users, Data Analysts
Step 1: Move to a massively parallel data warehousing appliance to address volume, scale and performance considerations.
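The deck does not describe the appliance internals. As a rough illustration of what "massively parallel" means in a shared-nothing warehouse, the sketch below hash-distributes clickstream rows across a hypothetical number of data slices, aggregates each slice independently, and merges the partial results. Everything here (the `NUM_SLICES` value, the `user_id` and `campaign` fields, the helper names) is a hypothetical illustration in plain Python, with threads standing in for nodes; it is not any vendor's API.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

NUM_SLICES = 4  # hypothetical number of data slices (one per node)


def slice_for(key: str, num_slices: int = NUM_SLICES) -> int:
    """Stable hash so the same key always lands on the same slice."""
    return zlib.crc32(key.encode("utf-8")) % num_slices


def distribute(rows: list[dict], key_field: str) -> dict[int, list[dict]]:
    """Hash-distribute rows across slices on the distribution key."""
    slices: dict[int, list[dict]] = defaultdict(list)
    for row in rows:
        slices[slice_for(row[key_field])].append(row)
    return slices


def local_counts(rows: list[dict]) -> dict[str, int]:
    """Each slice aggregates only the rows it owns (shared-nothing)."""
    counts: dict[str, int] = defaultdict(int)
    for row in rows:
        counts[row["campaign"]] += 1
    return counts


def parallel_click_counts(rows: list[dict]) -> dict[str, int]:
    slices = distribute(rows, key_field="user_id")
    # Threads stand in for independent nodes scanning their own slices.
    with ThreadPoolExecutor(max_workers=NUM_SLICES) as pool:
        partials = list(pool.map(local_counts, slices.values()))
    # Final step: merge the partial aggregates from every slice.
    total: dict[str, int] = defaultdict(int)
    for partial in partials:
        for campaign, n in partial.items():
            total[campaign] += n
    return dict(total)


if __name__ == "__main__":
    clicks = [
        {"user_id": "u1", "campaign": "spring"},
        {"user_id": "u2", "campaign": "spring"},
        {"user_id": "u3", "campaign": "fall"},
    ]
    print(parallel_click_counts(clicks))
```

The design point is that no slice needs another slice's rows to compute its partial result; only the small partial aggregates move during the merge.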
8. 1 Leading Financial Services solution provider
2010: Roadmap 2.0
Diagram: Digital Data (Clickstream), Internal Data Sources, EDW + Analytics with In-DB Analytics, Biz. Users, Data Analysts
Step 2: Leverage in-database analytics to run analytics at scale, closer to the data.
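In-database analytics means computing results where the data lives rather than exporting detail rows to a separate analytics tool. A minimal sketch, assuming a hypothetical `campaign_daily` table with `impressions` and `conversions` columns and using Python's built-in `sqlite3` as a stand-in for the warehouse: the database returns only the sufficient statistics for an ordinary least-squares line, in the spirit of the regression analysis for conversion tracking listed on slide 6.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Hypothetical daily campaign table, held in memory for the demo."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE campaign_daily (impressions REAL, conversions REAL)")
    conn.executemany(
        "INSERT INTO campaign_daily VALUES (?, ?)",
        [(1, 3), (2, 5), (3, 7), (4, 9)],
    )
    return conn


def fit_in_database(conn: sqlite3.Connection) -> tuple[float, float]:
    """Least-squares fit of conversions = a + b * impressions.

    The database scans every row and returns only five aggregates,
    so the detail rows never leave the warehouse.
    """
    n, sx, sy, sxx, sxy = conn.execute(
        """
        SELECT COUNT(*),
               SUM(impressions),
               SUM(conversions),
               SUM(impressions * impressions),
               SUM(impressions * conversions)
        FROM campaign_daily
        """
    ).fetchone()
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # slope
    a = (sy - b * sx) / n  # intercept
    return a, b


if __name__ == "__main__":
    intercept, slope = fit_in_database(make_demo_db())
    print(intercept, slope)  # 1.0 2.0
```

The demo rows lie exactly on conversions = 1 + 2 × impressions, so the fit recovers intercept 1.0 and slope 2.0. The fit assumes at least two distinct impression values; otherwise the slope denominator is zero.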
9. 1 Leading Financial Services solution provider
2010: Roadmap 3.0
Diagram: Digital Data (Clickstream), Internal Data Sources, EDW + Analytics with In-DB Analytics, Biz. Users, Data Analysts
Step 3: Offload data pre-processing, cleansing and normalization to Hadoop for elastic scalability and an analytics sandbox.
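A minimal sketch of the kind of cleansing and normalization job that might be offloaded, written as plain-Python map, shuffle and reduce phases rather than against the actual Hadoop API. The tab-separated record layout (timestamp, user ID, URL) and the normalization rules are assumptions for illustration only.

```python
from collections import defaultdict
from typing import Iterable, Iterator
from urllib.parse import urlsplit


def mapper(line: str) -> Iterator[tuple[str, int]]:
    """Parse, cleanse and normalize one raw clickstream record."""
    parts = line.rstrip("\n").split("\t")
    # Cleansing: skip malformed records instead of failing the whole job.
    if len(parts) != 3 or not parts[2].strip():
        return
    _timestamp, _user_id, url = parts
    # Normalization: lower-case the host, drop the query string and any
    # trailing slash so variants of the same page share one key.
    u = urlsplit(url.strip())
    page = u.netloc.lower() + (u.path.rstrip("/") or "/")
    yield page, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group mapper output by key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key: str, values: list[int]) -> tuple[str, int]:
    """Sum the click counts for one normalized page."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> dict[str, int]:
    pairs = (pair for line in lines for pair in mapper(line))
    return dict(reducer(k, vs) for k, vs in shuffle(pairs).items())


if __name__ == "__main__":
    raw = [  # hypothetical sample records
        "2012-06-04T10:00:00\tu1\thttp://Example.com/offers/?src=email",
        "2012-06-04T10:00:05\tu2\thttps://example.com/offers",
        "bad record",
        "2012-06-04T10:01:00\tu3\thttp://example.com/",
    ]
    print(run_job(raw))  # {'example.com/offers': 2, 'example.com/': 1}
```

Because the mapper handles each record independently, this kind of work parallelizes across a cluster and can be scaled out without touching the warehouse.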
10. 2 Large Online Content Publisher
One of the internet’s top destinations for specialized content
Value
Support business partners’ and affiliate marketers’ data needs
• Provide regular data feeds (no performance SLAs)
• Manage cost of infrastructure
11. 2 Large Online Content Publisher
2010: Roadmap 1.0
Diagram: Data Sources, ETL, EDW + Analytics, Biz. Users, Data Analysts, Partners & Affiliates (flow steps numbered 1-2)
• 15 million unique visitors
• 210 million page views
• 2 TB of new data per day
• 1 million+ new content items per day
12. 2 Large Online Content Publisher
2011: Roadmap 2.0
Diagram: Data Sources, ELT, EDW + Analytics, Biz. Users, Data Analysts, Partners & Affiliates (flow steps numbered 1-4)
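Moving from ETL (slide 11) to ELT means raw data is loaded first and transformed inside the warehouse. A minimal sketch using Python's `sqlite3` as a stand-in for the warehouse; the `staging_pageviews` and `partner_daily_feed` tables, their columns, and the sample rows are hypothetical.

```python
import sqlite3


def load_raw(conn: sqlite3.Connection, raw_rows: list[tuple[str, str, str]]) -> None:
    """Extract + Load: land source rows unchanged in a staging table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS staging_pageviews "
        "(ts TEXT, partner_id TEXT, content_id TEXT)"
    )
    conn.executemany("INSERT INTO staging_pageviews VALUES (?, ?, ?)", raw_rows)


def transform_in_db(conn: sqlite3.Connection) -> list[tuple[str, str, int]]:
    """Transform: build the partner feed with SQL inside the database."""
    conn.execute("DROP TABLE IF EXISTS partner_daily_feed")
    conn.execute(
        """
        CREATE TABLE partner_daily_feed AS
        SELECT partner_id, substr(ts, 1, 10) AS day, COUNT(*) AS page_views
        FROM staging_pageviews
        WHERE partner_id IS NOT NULL AND partner_id <> ''
        GROUP BY partner_id, substr(ts, 1, 10)
        """
    )
    conn.commit()
    return conn.execute(
        "SELECT partner_id, day, page_views FROM partner_daily_feed "
        "ORDER BY partner_id, day"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_raw(conn, [
        ("2011-03-01T08:00:00", "p1", "c1"),
        ("2011-03-01T09:00:00", "p1", "c2"),
        ("2011-03-02T10:00:00", "p2", "c3"),
        ("2011-03-02T11:00:00", "", "c4"),  # no partner: filtered out in SQL
    ])
    print(transform_in_db(conn))  # [('p1', '2011-03-01', 2), ('p2', '2011-03-02', 1)]
```

Keeping the raw staging table means a new partner feed can be derived later by writing another SQL transform, without re-extracting from the sources.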
13. 3 Global Telecommunications Major
One of the leading cell phone carrier networks in the world
Velocity
React to network disruptions immediately
• Predict outages and congestion before they appear
• Address disruptions in real time
14. 3 Global Telecommunications Major
Until 2007: A world of voice and limited data
Diagram: Data Sources, EDW, Analytics, Call Centers, Network engineers
• Call Detail Records
• Network transmission logs
• Thousands of events per second
Response latency = hours or days
15. 3 Global Telecommunications Major
2008-2011: Voice, Data, Smartphones and 3G
Diagram: Data Sources, EDW + Analytics with in-database modeling and scoring, Call Centers, Network engineers
• Call Detail Records
• Network transmission logs
• Millions of events per second
Response latency = minutes
Step 1: Adoption of a massively parallel data warehousing appliance reduced overall latency from hours to minutes.
16. 3 Global Telecommunications Major
2011+: Video Services, 4G LTE
Diagram: Data Sources, Stream processing, EDW + Analytics with in-database modeling and scoring, Call Centers, Network engineers
• Call Detail Records
• Network transmission logs
• Hundreds of millions of events per second
Response latency = seconds
Step 2: Stream processing provided a real-time analytics capability, took processing workload off the data warehouse, and was designed to scale.
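A minimal sketch of the stream-processing idea: each event is evaluated as it arrives against a per-cell sliding time window, instead of waiting for a batch load into the warehouse. The `CellEvent` fields, the drop-rate metric, and the window, threshold and minimum-event parameters are all hypothetical, and events are assumed to arrive in timestamp order.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class CellEvent:
    ts: float       # event time in seconds
    cell_id: str    # hypothetical cell-site identifier
    dropped: bool   # hypothetical flag: call or session dropped


class DropRateMonitor:
    """Per-cell sliding-window drop rate, evaluated as each event arrives.

    Assumes events arrive in timestamp order and window_seconds > 0.
    """

    def __init__(self, window_seconds: float = 60.0,
                 threshold: float = 0.2, min_events: int = 5):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.min_events = min_events
        self.windows: dict[str, deque] = {}
        self.drops: dict[str, int] = {}

    def process(self, event: CellEvent) -> Optional[str]:
        window = self.windows.setdefault(event.cell_id, deque())
        window.append(event)
        self.drops[event.cell_id] = self.drops.get(event.cell_id, 0) + int(event.dropped)
        # Evict events that have slid out of the time window, keeping a
        # running drop count so each event costs O(1) amortized work.
        while window[0].ts <= event.ts - self.window_seconds:
            old = window.popleft()
            self.drops[event.cell_id] -= int(old.dropped)
        total = len(window)
        rate = self.drops[event.cell_id] / total
        if total >= self.min_events and rate >= self.threshold:
            return f"ALERT cell={event.cell_id} drop_rate={rate:.2f}"
        return None


if __name__ == "__main__":
    monitor = DropRateMonitor(window_seconds=60, threshold=0.5, min_events=4)
    for t, d in [(0, False), (10, True), (20, True), (30, True)]:
        alert = monitor.process(CellEvent(t, "A", d))
    print(alert)  # ALERT cell=A drop_rate=0.75
```

State is bounded by the window size per cell, which is what lets this style of processing keep latency in seconds and scale out by partitioning cells across workers.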
17. 4 Digital Media Marketer
Specializes in multi-channel marketing across online, offline, mobile and social channels
Variety
Monitor social media channels and engage with customers
• Manage large volumes of unstructured data
• Correlate structured and unstructured data to increase targeting and relevance
18. 4 Digital Media Marketer
• 100 TB+ data under management
• “Listen” to 100 million+ tweets per day
• Manage large volumes of unstructured data
2010: Roadmap 1.0
Diagram: Data Sources, EDW + Analytics, Biz. Users, Data Analysts (flow steps numbered 1-3)
• 100-node Hadoop cluster
• Manage structured and unstructured data
• Components included Hive, HBase, Mahout
19. 4 Digital Media Marketer
2012: Roadmap 2.0
Diagram: Data Sources, EDW + Analytics, Biz. Users, Data Analysts
Hadoop and massively parallel data warehouse co-existence: manage unstructured and structured data at scale
20. Key Takeaways
1 Adapt your roadmap to your primary big data consideration
2 Massively parallel data warehouse appliances and Hadoop are complementary technologies
3 Consider an evolutionary approach over a big bang approach
4 Your EDW will not be monolithic: understand the integration implications across products
21. Big Data Journeys
Bon Voyage!
Krishnan Parasuraman
@kparasuraman