1. Digital Enterprise Data Explosion
Deriving Insights from Convergence of
Structured, Unstructured & Semi-structured Data
Sanjeev Kumar
VP & MD, Informatica India
Dec 2013
2. The Challenge
[Chart: waves of technology by era, showing users, business value, technologies, and sources]
1960s-1970s: MAINFRAME (OS/360); users: few employees; value: back office automation; sources: 10^2
1980s: CLIENT-SERVER; users: many employees; value: front office productivity; sources: 10^4
1990s: WEB; users: customers/consumers; value: e-commerce; sources: 10^6
2007: CLOUD; users: business ecosystems; value: line-of-business self-service; sources: 10^7
2011: SOCIAL; users: communities & society; value: social engagement; sources: 10^9
2014: INTERNET OF THINGS; users: devices & machines; value: real-time optimization; sources: 10^11
3. The Challenge
Data integration becoming the barrier to business success
[Same technology-wave chart as slide 2, with added annotations]
Focus: Processes; + People; + Products and things
Data integration approach: stand-alone projects, corporate IT driven; data infrastructure, LOB driven; data ecosystem
4. Telecom: Data Stream Flow & Integration
[Diagram: data flows from telecom sources into a Hadoop based "Data Lake"; components listed as labeled]
Paths: Stream Processing (Real Time); Analytics Processing (Batch)
Sources: Telecom Switches (4G in future); Transactional Application; Machine Generated Data
Collection: Messaging Collector Agent; Streaming Collection; Ultra Messaging; Message Queue; Event Feeds
Integration: PowerExchange for MOM; PowerCenter + Data Transformation; Event Processing
Targets: Data Lake (Time Sliced Data); DWH / DM; Operational Intelligence; Analytic Application
5. Converged Data Integration (DI) Architecture
Real Time + Batch, Core ETL + Big Data
[Diagram: layered architecture; components listed as labeled]
Layers: Data Sources; Collection Layer; Staging Layer; Data Integration Layer; BI Layer
Data sources: Social Media; Network Logs; Content; Network Elements; Transactions; External Data
Other components: Real-time Streaming; Batch; Replication; Grid; NoSQL; Hadoop; CEP; MDM; Data Quality; EDW; Business Intelligence; Data Exploration; Data Distribution; Archival; Management and Operations
6. Lambda Architecture = Batch + RealTime
Query = function (all data)
[Diagram: a new data stream feeds both the batch layer and the speed layer]
Batch Layer: All Data; Precompute Views
Serving Layer: Batch Views
Speed Layer: Stream Processing; Realtime View
Query: combines the batch views with the realtime view
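The idea on this slide, "Query = function (all data)", can be illustrated with a minimal sketch. This is a toy under stated assumptions, not how any particular product implements it: the class `LambdaCounter` and its method names are hypothetical, the "views" are simple per-key event counts, and in-memory Python structures stand in for the batch store and the stream processor.

```python
from collections import Counter


class LambdaCounter:
    """Toy Lambda Architecture that counts events per key (hypothetical names)."""

    def __init__(self):
        self.all_data = []              # append-only master dataset ("All Data")
        self.batch_view = Counter()     # precomputed by the batch layer
        self.realtime_view = Counter()  # kept up to date by the speed layer

    def ingest(self, key):
        """New data goes to both layers: stored for batch, counted for speed."""
        self.all_data.append(key)
        self.realtime_view[key] += 1

    def run_batch(self):
        """Batch layer: recompute the view from all data.

        After the recompute, the batch view already covers everything the
        realtime view held, so the realtime view is reset.
        """
        self.batch_view = Counter(self.all_data)
        self.realtime_view = Counter()

    def query(self, key):
        """Serving side: merge the batch view with the realtime view."""
        return self.batch_view[key] + self.realtime_view[key]


if __name__ == "__main__":
    lam = LambdaCounter()
    for switch_id in ["cell-17", "cell-17", "cell-42"]:  # made-up event keys
        lam.ingest(switch_id)
    lam.run_batch()          # slow, complete recompute
    lam.ingest("cell-17")    # arrives after the batch run
    print(lam.query("cell-17"))
```

The design point is that a query never waits for the batch layer: events that arrived after the last batch run are answered from the realtime view, and the next batch run absorbs them.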
Editor's notes
Herein lies the challenge we are all facing. If we are going to harness this new world, we will have to become much more data-centric in how we design architectures. This chart, showing the progression of technology innovations over the years, tells an interesting story. Wave after wave of new technologies promised more business potential. We saw some big changes, most recently with the introduction of the internet and e-commerce. And now we are at the next wave, the Internet of Things.
But all of this comes at a price for IT. IT simply cannot absorb new technologies at the pace of their introduction. Each one brings a new learning curve and adoption curve as IT works out how to program and harness it. That complexity drives up costs and increases risk across the board, whether you are building new applications or trying to analyze new sets of data.
The first couple of generations of technology helped businesses focus on their processes and optimize them for greater efficiency. Data integration between applications or into the data warehouse was done mostly by corporate IT through stand-alone projects, separate for each use case.
Then came social, and we started focusing on people and optimizing the experiences of customers and others, in addition to the processes. In fact, many business processes are enhanced by combining social data with our traditional systems. Think of how recruiters now use LinkedIn more and more instead of traditional candidate databases. We are seeing the cracks in our past data integration approaches, because we now need to think much more in terms of data infrastructure than data integration. We need a platform that can handle all of these things, and we need to develop collaboratively with our lines of business. It is no longer shadow IT but cooperative IT.
Beyond that, we are starting to build products that can self-optimize in real time. Again, this does not replace the old, but it will stress our data infrastructure even further, as we need to think in terms of a data ecosystem.
A change is needed. IT needs to get away from its traditional cycle of adopting new technology through re-skilling and re-programming. Instead, IT needs to invest in an information platform that shields it from new technology to such an extent that it can harness that technology's power at the speed of business. That is the key to unleashing the true potential of the information in our company.