BICS uses a Hadoop data lake powered by the Informatica Big Data Platform to enable predictive analytics and customer centricity. The data lake provides scalable storage and processing for billions of telecommunications transactions. BICS aims to migrate more analytics and reporting from its Teradata data warehouse to Hadoop to gain cost efficiencies and handle increasing data volumes and complex analytics. The roadmap includes moving near real-time subscriber tracking to Hadoop while maintaining low latency, as well as computing new analytics and providing longer term historical reporting from Hadoop.
BICS empowers predictive analytics and customer centricity with a Hadoop based Data Lake
1. BICS Empowers Predictive Analytics and Customer Centricity with a Hadoop-Based Data Lake
Danielle Kana, BICS
Bert Oosterhof, Informatica
15 April, 2015
2. slide 2 | BICS confidential | 27 April 2015
Agenda
Introduction
Business Intelligence @ BICS
Informatica Big Data Platform
Who is BICS?
Big Data @ BICS
Q and A
3. The #1 Independent Leader in Data Integration
Informatica
Annual Total Revenue ($ millions), 2005-2014
Year      2005  2006  2007  2008  2009  2010  2011  2012  2013  2014
Revenue    267   325   391   456   501   650   784   812   948  1,048
Total Revenue CAGR (2005-2014) = 16%
Founded: 1993
Headquarters: Redwood City, CA
2014 Revenue: $1.048 billion
2014 GAAP Diluted EPS: $1.03
2014 Non-GAAP* Diluted EPS: $1.59
Partners: Over 500
• Major SI, ISV, OEM and on-demand leaders
Customers: Over 5,500
• Customers in 82 countries
• Direct presence in 28 countries
• Ranked #1 in TNS Customer Loyalty rankings for 9 consecutive years
Employees: Over 3,600
Technology Leadership:
• Gartner positions INFA in the Leaders quadrant for Data Integration, Data Quality, MDM – Customer Data, Integration Platform as a Service (iPaaS), Structured Data Archiving and Application Retirement, and Data Masking
• Forrester Research names INFA a leader in Hybrid Integration, Master Data Management Solutions, Data Governance Tools, Product Information Management, and Big Data Streaming Analytics Solutions.
* A reconciliation of GAAP and non-GAAP results is provided in the Appendix section, as well as on Informatica’s Investor Relations website.
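The 16% growth figure can be checked against the revenue series. A quick arithmetic sketch, assuming the chart values are annual revenue in $ millions and that the CAGR is taken over the nine year-on-year intervals from 2005 to 2014:

```python
# Annual total revenue in $ millions, as listed on the slide (2005-2014).
revenue = {
    2005: 267, 2006: 325, 2007: 391, 2008: 456, 2009: 501,
    2010: 650, 2011: 784, 2012: 812, 2013: 948, 2014: 1048,
}


def cagr(start_value: float, end_value: float, periods: int) -> float:
    """Compound annual growth rate over `periods` compounding intervals."""
    return (end_value / start_value) ** (1 / periods) - 1


first, last = min(revenue), max(revenue)
rate = cagr(revenue[first], revenue[last], last - first)
print(f"CAGR {first}-{last}: {rate:.1%}")
```

The result is roughly 16.4%, which matches the slide's rounded 16%.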
5. Proven Technology Leadership
• Data Governance Tools
• Master Data Management
• Data Virtualization
• Big Data Streaming Analytics Platforms
• Enterprise ETL
• Cloud Data Integration
• Product Information Management
7. Big Data Related Business Initiatives
Financial Services
• Fraud Detection
• Risk & Portfolio Analysis
• Investment Recommendations
Retail & Telco
• Proactive Customer Engagement
• Location Based Services
Manufacturing
• Connected Vehicle
• Predictive Maintenance
Healthcare & Pharma
• Predicting Patient Outcomes
• Total Cost of Care
• Drug Discovery
Public Sector
• Health Insurance Exchanges
• Public Safety
• Tax Optimization
• Fraud Detection
Media & Entertainment
• Online & In-Game Behavior
• Customer X/Up-Sell
8. 80% of the work in big data projects is data integration and data quality
“80% of the work in any data project is in cleaning the data”
“70% of my value is an ability to pull the data, 20% of my value is using data-science…”
“I spend more than half my time integrating, cleansing, and transforming data without doing any actual analysis.”
9. Big Data Competencies and Disciplines (People, Process, Technology)
Exploration & Discovery
• People: data scientists, data analysts, business SMEs
• Process: explore data, build predictive analytics (PA) models, test/validate models
• Technology: data wrangling (preparing data for analysis, both batch and schema-on-read), visualization, advanced analytics, managed data lake
Data Management & Governance
• People: CDO, data stewards, data engineers, data management, architects
• Process: manage data as an asset, catalog data assets, certify data, master data, build repeatable data pipelines to feed analytic apps
• Technology: data warehouse optimization, managed data lake, data quality, certified data sets, managing master data, enriching master data, managing metadata, enterprise information catalog, security, data masking
Operationalize & Monetize
• People: analysts, SMEs, apps and data engineers, DevOps
• Process: operationalize insights, build data products that monetize data assets (e.g. “be Google”), Agile SDLC
• Technology: automated/scheduled big data integration pipelines, streaming analytics, pub/sub delivery (DIH), delivery of data to the point of use, data warehouse, managed data lake
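“Schema-on-read” means raw data is landed as-is and a schema is applied only when the data is read. A minimal sketch of the idea, assuming hypothetical pipe-delimited records and field names (nothing here reflects BICS’s actual formats):

```python
from typing import Callable, Iterable, Iterator, Optional

# Raw lines are stored exactly as received; no schema is enforced at write time.
RAW_LINES = [
    "2015-04-15T10:00:00|BE|OK|12",
    "2015-04-15T10:00:05|NL|FAIL|",
    "garbage line",
]


def optional_int(text: str) -> Optional[int]:
    """Parse an integer field that may be empty."""
    return int(text) if text else None


# The schema belongs to the reader, not to the storage (hypothetical fields).
Schema = list[tuple[str, Callable[[str], object]]]
CALL_SCHEMA: Schema = [
    ("ts", str),
    ("country", str),
    ("status", str),
    ("duration_s", optional_int),
]


def read_with_schema(lines: Iterable[str], schema: Schema, sep: str = "|") -> Iterator[dict]:
    """Apply `schema` at read time; skip rows that do not fit it."""
    for line in lines:
        parts = line.split(sep)
        if len(parts) != len(schema):
            continue  # does not fit this schema; another reader may still use it
        try:
            yield {name: parse(value) for (name, parse), value in zip(schema, parts)}
        except ValueError:
            continue  # a field failed to parse under this schema


rows = list(read_with_schema(RAW_LINES, CALL_SCHEMA))
print(rows)
```

The malformed line is never rejected at load time. It simply does not match this particular reader’s schema, which is the trade-off that distinguishes schema-on-read from a warehouse’s schema-on-write loading.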
10. The Big Data Journey and Use Cases
A phased approach to implementing Big Data initiatives
Phases: First Pilot(s); Data Warehouse Optimization; Data Discovery; Real-Time Operational Intelligence; Intelligent Data Lake
Progression: from Driven by IT (Lower Infrastructure Cost) to Driven by Business (Added Business Value)
“Get all the information you can, we’ll think of a use for it later”
“Deriving value from all this data is hard and costly. Is there a better way?”
“I need my data to tell the future to help me succeed and prevent me from making mistakes.”
“What’s Hadoop and how does it work?”
Use cases: Predictive Maintenance, Lower Total Cost of Care, Customer X/Up-Sell, Public Safety, Fraud Detection
Data sources: Machine Device, Cloud; Documents and Emails; Relational, Mainframe; Social Media, Web Logs
12. We deliver best-in-class wholesale telecommunication solutions to any communication service provider worldwide
1997: Creation of the Carrier & Wholesale business unit @ Belgacom
2000-2003: A global player is born: offices are opened in Asia Pacific, America and the Middle East
2001-2004: Belgacom ICS goes mobile. Soon 100 mobile operators are connected
2005: Spin-off & JV with Swisscom
2006: Strategic partnerships with Omantel and MTN & launch of VoIP
2007: GRX leader with over 100 customers connected
2009: JV with MTN ICS & launch of HomeSend, the first global mobile money hub
2010: Full deployment of roaming suite of services & Belgacom ICS becomes …
2011: BICS connects its 40th service provider to its IPX
2013: BICS enables the world’s 1st international LTE roaming relations & concludes a partnership with MasterCard for HomeSend
13. Introducing BICS
• More than 700 customers, including over 400 mobile data customers
• Top 3 voice carrier with over 28 billion minutes
• World leader in mobile data services
• EUR 1.65 billion in revenue
• HQ in Brussels, with offices in Bern, Dubai, Singapore and New York
• 400+ employees
Chart: 22.4% | 20% | 57.6%
14. BICS business
Belgacom | Swisscom | MTN
Sender: fixed operators, mobile operators, xSPs
Receiver: fixed operators, mobile operators, xSPs
16. Top 3 voice carrier
Over 28 billion minutes
Backed by a Tier 1 network:
• 100 points of presence (PoPs) in 55 cities and 33 countries
• 9 metropolitan area networks
• A teleport in La Ciotat
• Participation in 40 submarine cables
17. World leader in mobile data services
• Innovative market leader in 3GRX services, with 220+ customers connected
• N°1 in signalling services, with access to more than 850 mobile networks
• Over 2.3 billion international SMS transported (2014)
Connectivity | Messaging | Roaming
21. Reports
• Monitoring of the customer’s network performance to guarantee service levels
• Analysis of consumer trends to support expansion plans
• Identifying consumer preferences to launch targeted marketing campaigns
• Monitoring and tracking of subscribers for problem identification and resolution in real time
22. Network Performance Monitoring
Near real time (5-10 minutes)
Distribution by country
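A “distribution by country” refreshed every few minutes is essentially a count per country per time window. A minimal sketch using 5-minute tumbling windows; the `(timestamp, country)` event shape and the window size are assumptions for illustration, not the actual BICS pipeline:

```python
from collections import Counter, defaultdict
from datetime import datetime

WINDOW_MINUTES = 5  # assumption: 5-minute tumbling windows


def window_start(ts: datetime, minutes: int = WINDOW_MINUTES) -> datetime:
    """Floor a timestamp to the start of its window (window size must divide 60)."""
    return ts.replace(minute=ts.minute - ts.minute % minutes, second=0, microsecond=0)


def distribution_by_country(events: list[tuple[datetime, str]]) -> dict[datetime, Counter]:
    """Count events per (window, country)."""
    counts: dict[datetime, Counter] = defaultdict(Counter)
    for ts, country in events:
        counts[window_start(ts)][country] += 1
    return counts


events = [
    (datetime(2015, 4, 15, 10, 1, 30), "BE"),
    (datetime(2015, 4, 15, 10, 3, 0), "BE"),
    (datetime(2015, 4, 15, 10, 4, 59), "FR"),
    (datetime(2015, 4, 15, 10, 6, 0), "BE"),
]
dist = distribution_by_country(events)
for window in sorted(dist):
    print(window.strftime("%H:%M"), dict(dist[window]))
```

Each window can be published as soon as it closes, so end-to-end latency stays around the window length plus processing time, which is how a 5-10 minute “near real time” target is typically met.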
23. Network Performance Monitoring (2)
Transactions status
25. Troubleshooting subscribers for problem resolution in real time
26. IT Challenges
• Storage cost
− Various types of products/technologies (SS7, 2G, 3G, 4G)
− Billions of transactions per day, per technology
− Estimated growth of more than 100% per year
• Performance is key!
− Near real-time reporting (latency < 10 minutes)
− Processing of huge volumes of data
− Increasing demand for complex analytics
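Growth of more than 100% per year means volume at least doubles annually, and if history is retained, total storage grows even faster. A small arithmetic sketch with a hypothetical baseline normalised to 1.0 (the slide gives no absolute volumes); because the slide says “more than 100%”, these figures are lower bounds:

```python
def projected_volume(current: float, annual_growth: float, years: int) -> float:
    """Volume after `years` of compound growth; annual_growth=1.0 means +100% per year."""
    return current * (1 + annual_growth) ** years


def cumulative_storage(first_year_volume: float, annual_growth: float, years: int) -> float:
    """Total stored after `years` if every year's intake is kept (no purging)."""
    return sum(projected_volume(first_year_volume, annual_growth, y) for y in range(years))


# Hypothetical baseline: this year's intake normalised to 1.0.
baseline = 1.0
for year in range(1, 4):
    print(f"year {year}: {projected_volume(baseline, 1.0, year):.0f}x today's intake")
print(f"stored after 3 years: {cumulative_storage(baseline, 1.0, 3):.0f}x one year's intake")
```

At exactly 100% growth, three years of retained history already amounts to 1 + 2 + 4 = 7 times the first year’s intake. This is the storage-cost pressure behind moving high-volume data to cheaper Hadoop storage.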
29. Hadoop: How?
• Which flavour? (Cloudera, Hortonworks, MapR…)
• Hadoop set-up: commodity hardware or appliance?
• How to integrate Hadoop into the current DWH architecture?
30. BICS Integration Strategy
• Teradata Hadoop Appliance (Hortonworks)
− Kick-start with Hadoop
− Easy set-up/implementation of a functional platform
− Pre-configured/designed/tested
− Plug & play
• Hybrid architecture
− Keep the current architecture for the critical flows, and use Hadoop mainly for high-volume data with less critical latency constraints.
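The hybrid principle above can be read as a routing rule per data flow. A minimal sketch of that rule only; the thresholds, the `Flow` fields and the flow names are hypothetical, since the slide states the principle but no numbers:

```python
from dataclasses import dataclass

# Hypothetical thresholds: the slide gives the principle, not the numbers.
HIGH_VOLUME_RECORDS_PER_DAY = 1_000_000_000
LATENCY_TOLERANT_MINUTES = 15


@dataclass
class Flow:
    name: str
    critical: bool               # part of today's critical reporting flows
    records_per_day: int
    max_latency_minutes: float   # how stale consumers can accept the data to be


def target_platform(flow: Flow) -> str:
    """Route a flow according to the hybrid-architecture principle."""
    if flow.critical:
        return "teradata"  # keep the current architecture for critical flows
    if (flow.records_per_day >= HIGH_VOLUME_RECORDS_PER_DAY
            and flow.max_latency_minutes >= LATENCY_TOLERANT_MINUTES):
        return "hadoop"  # high volume with a relaxed latency constraint
    return "teradata"


flows = [
    Flow("critical_kpi_feed", True, 50_000_000, 5),
    Flow("signalling_detail", False, 3_000_000_000, 60),
    Flow("reference_data", False, 10_000, 1440),
]
for f in flows:
    print(f.name, "->", target_platform(f))
```

This mirrors only the rule stated on this slide. The roadmap later moves additional workloads, including low-latency subscriber tracking, onto Hadoop.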
31. BICS Hybrid Big Data Architecture
32. Informatica Big Data Edition (BDE) Expectations
• Shorten the learning curve
Existing ETL skills can be reused to develop on Hadoop
• Easy data integration on Hadoop
Visual development environment & extensive library of prebuilt transformations
• Reuse of existing ETL code
Existing ETL code can be easily reused on Hadoop
33. Informatica BDE Expectations (2)
• Provide universal data access
Easy ingestion and processing of all data types and formats into Hadoop
• Provide high-speed data ingestion and extraction
Move data between source systems, Hadoop, and target applications using high-performance connectivity
• Allow data profiling on Hadoop
Profile data on Hadoop to understand the data and identify data quality issues
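Data profiling usually starts with simple per-column statistics that expose quality issues such as missing values or unexpected ranges. A minimal plain-Python sketch of that idea; it is not Informatica’s profiling feature, and the sample columns are hypothetical:

```python
def profile(rows: list[dict]) -> dict[str, dict]:
    """Minimal column profile: null/empty count, distinct count, min and max.

    Assumes each column holds mutually comparable values (all str, all int, ...).
    """
    columns = sorted({key for row in rows for key in row})
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None and v != ""]
        report[col] = {
            "nulls": len(values) - len(present),
            "distinct": len(set(present)),
            "min": min(present) if present else None,
            "max": max(present) if present else None,
        }
    return report


sample = [
    {"country": "BE", "duration_s": 12},
    {"country": "NL", "duration_s": None},
    {"country": "", "duration_s": 40},
    {"country": "BE", "duration_s": 7},
]
for column, stats in profile(sample).items():
    print(column, stats)
```

On a cluster, the same aggregates would be computed in parallel over partitions and then merged. Counts and min/max combine trivially, while exact distinct counts need more care at scale.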
34. The Roadmap
• Phase 1: Migrate the data storage and processing from the Teradata DB to the hybrid platform
− Set up the loading of the data into Hadoop
− Move all the tracking applications (which use the detailed data) into Hadoop while keeping the SLA (<1 min) for subscriber tracking
− Move the processing of all the higher-latency applications (15-minute, hourly, daily) to Hadoop
• Phase 2: Compute the new analytics on Hadoop and provide longer historical reporting to the customers
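Keeping the subscriber-tracking SLA of under one minute after migration implies measuring end-to-end latency from event time to the moment the data is queryable. A minimal sketch of such a compliance check; the `(event_time, ready_time)` pairs are hypothetical measurements:

```python
from datetime import datetime, timedelta

TRACKING_SLA = timedelta(minutes=1)  # "<1 min" for subscriber tracking


def sla_compliance(pairs: list[tuple[datetime, datetime]], sla: timedelta = TRACKING_SLA) -> float:
    """Share of (event_time, ready_time) pairs whose latency is strictly below `sla`.

    An empty input is treated as fully compliant.
    """
    if not pairs:
        return 1.0
    within = sum(1 for event_time, ready_time in pairs if ready_time - event_time < sla)
    return within / len(pairs)


t0 = datetime(2015, 4, 15, 10, 0, 0)
observed = [
    (t0, t0 + timedelta(seconds=20)),
    (t0, t0 + timedelta(seconds=59)),
    (t0, t0 + timedelta(seconds=75)),
    (t0, t0 + timedelta(seconds=30)),
]
print(f"{sla_compliance(observed):.0%} of events met the SLA")
```

Reporting a compliance ratio rather than an average makes outliers visible. A mean latency of 46 s here would hide the one event that missed the SLA.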
36. Architecture overview (diagram labels, grouped by layer)
Data Sources: Machine Device, Cloud; Documents and Emails; Relational, Mainframe; Social Media, Web Logs
Data Ingestion: Data Streaming; Change Data Capture; Batch Load; Archiving
Data Management: Data Integration & Data Quality (Integrate & Prepare); Master Data Management (Single, Complete Version of Truth); Data Warehouse; Virtual Data Machine; Scalable Storage & Processing; Data Security
Data Delivery: Batch Load; Data Integration Hub (Pub / Sub); Data Virtualization (Data as a Service; Loose Coupling & Abstraction)
Applications and Visualization: Mobile Apps; Visualization & Analytics; Real-Time Alerts; Event-Based Processing; Agile Analytics; Advanced Analytics; Machine Learning
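Change Data Capture turns source changes into a stream of insert/update/delete events so that only deltas are moved. The simplest way to illustrate this is to compare two keyed snapshots. Note that this snapshot-comparison approach is a stand-in for illustration: log-based CDC tools typically read the database transaction log instead. The table contents below are hypothetical:

```python
def diff_snapshots(before: dict, after: dict) -> list[tuple[str, object, dict]]:
    """Snapshot-comparison CDC: emit (operation, key, row) change events."""
    changes = []
    for key, row in after.items():
        if key not in before:
            changes.append(("insert", key, row))
        elif before[key] != row:
            changes.append(("update", key, row))
    # Sort deleted keys so the output order is deterministic.
    for key in sorted(before.keys() - after.keys()):
        changes.append(("delete", key, before[key]))
    return changes


yesterday = {1: {"plan": "A"}, 2: {"plan": "B"}, 3: {"plan": "C"}}
today = {1: {"plan": "A"}, 2: {"plan": "B2"}, 4: {"plan": "D"}}
for change in diff_snapshots(yesterday, today):
    print(change)
```

Downstream, the events can be applied to a Hadoop copy of the table or published to subscribers, which avoids reloading unchanged rows.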
37. Unleash the Power of Hadoop
Informatica Developers are Now Hadoop Developers
Sources: Machine Device, Cloud; Documents and Emails; Relational, Mainframe; Social Media, Web Logs
On Hadoop: Profile, Parse, Cleanse, ETL, Match
Data movement: Archive, Stream, Load, Replicate, Services, Events, Topics
Targets: Data Warehouse, Mobile Apps, Analytics & Op Dashboards, Alerts, Analytics Teams
38. Staff Projects with Readily Available Skills
Informatica Developers are Hadoop Developers
Hand-coding
A large global bank grew staff from 2 Java developers to 100 Informatica developers after implementing Informatica Big Data Edition.
Careerbuilder.com found in a survey that there were 27,000 requests for Hadoop skills but only 3,000 resumes listing Hadoop skills, whereas there are over 100,000 trained Informatica developers globally.
39. Reduce Risk of Changing Technologies
Informatica provides an insurance policy as Hadoop changes
Minimize or eliminate the need to rebuild or recode data pipelines & quickly adopt new innovations in the Big Data community
Development / Deployment targets: Hadoop, Cloud, DI Servers, Data Warehouse
40. Maximize Your Return On Big Data
Hadoop complements your existing infrastructure
Sources: Transactions, OLTP, OLAP; Social Media, Web Logs; Documents, Email; Machine Device, Scientific
Stages: Access & Ingest → Parse & Prepare → Discover & Profile → Transform & Cleanse → Extract & Deliver
Manage (i.e. Security, Performance, Governance, Collaboration)
Operational Systems and Analytical Systems: OLTP, ODS, Data Mart, Data Warehouse, MDM
Data Assets, Data Products
Hadoop & other NoSQL
41. Data Ingestion and Extraction
Moving terabytes of data per hour
Sources: Transactions, OLTP, OLAP; Social Media, Web Logs; Documents, Email; Industry Standards; Machine Device, Scientific
Methods: Replicate, Streaming, Batch Load, Extract, Archive
Targets: Data Warehouse, MDM, Applications, Low Cost Store
42. Unleash the Power of Big Data
With high performance Universal Data Access
Messaging and Web Services: WebSphere MQ, JMS, MSMQ, SAP NetWeaver XI, Web Services, TIBCO, webMethods
Packaged Applications: JD Edwards, Lotus Notes, Oracle E-Business, PeopleSoft, SAP NetWeaver, SAP NetWeaver BI, SAS, Siebel
Relational and Flat Files: Oracle, DB2 UDB, DB2/400, SQL Server, Sybase, Informix, Teradata, Netezza, ODBC, JDBC
Mainframe and Midrange: ADABAS, Datacom, DB2, IDMS, IMS, VSAM, C-ISAM, Binary Flat Files, Tape Formats…
Unstructured Data and Files: Word, Excel, PDF, StarOffice, WordPerfect, Email (POP, IMAP), HTTP, Flat files, ASCII reports, HTML, RPG, ANSI, LDAP
Industry Standards: EDI–X12, EDI-Fact, RosettaNet, HL7, HIPAA, AST, FIX, SWIFT, Cargo IMP, MVR
XML Standards: ebXML, HL7 v3.0, ACORD (AL3, XML), XML, LegalXML, IFX, cXML
SaaS/BPO: Salesforce CRM, Force.com, RightNow, NetSuite, ADP, Hewitt, SAP By Design, Oracle OnDemand
Social Media: Facebook, Twitter, LinkedIn, Kapow, Datasift
MPP Appliances: Pivotal, Vertica, Netezza, Teradata, Aster
43. Real-Time Data Collection and Streaming
Ultra Messaging Bus: Publish/Subscribe
Leverage high-performance messaging infrastructure: publish with Ultra Messaging for global distribution without additional staging or landing.
Sources:
• Web Servers, Operations Monitors, rsyslog, SLF4J, etc.
• Handhelds, Smart Meters, etc.
• Discrete Data Messages
• Internet of Things, Sensor Data
Targets:
• HDFS, HBase
• Real Time Analysis, Complex Event Processing
• NoSQL Databases: Cassandra, Riak, MongoDB
Management and Monitoring: ZooKeeper
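Publish/subscribe decouples sources from targets. A publisher sends to a topic, and every subscriber to that topic receives the message with no intermediate staging step. The toy in-process bus below illustrates the pattern only. It is not the Ultra Messaging API, and the topic name and handlers are hypothetical:

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class Bus:
    """Toy in-process, synchronous topic-based publish/subscribe bus."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> int:
        """Hand `message` to every subscriber of `topic`; return how many received it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


bus = Bus()
hdfs_sink: list[dict] = []  # stands in for an HDFS/HBase writer
alerts: list[dict] = []     # stands in for a complex-event-processing rule


def high_reading_alert(message: dict) -> None:
    """Hypothetical CEP-style rule: flag readings above a threshold."""
    if message["value"] > 100:
        alerts.append(message)


bus.subscribe("meter.readings", hdfs_sink.append)
bus.subscribe("meter.readings", high_reading_alert)
bus.publish("meter.readings", {"meter": "m1", "value": 42})
bus.publish("meter.readings", {"meter": "m2", "value": 180})
print(len(hdfs_sink), len(alerts))
```

A production bus adds the things this sketch omits, such as network transport, asynchronous delivery, and handling of slow or failed subscribers.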
44. Informatica Vibe Data Stream for Machine Data
• High-performance, efficient streaming data collection over LAN/WAN
• GUI provides ease of configuration, deployment & use
• Continuous ingestion of real-time generated data (sensors, logs, etc.) from machine-generated & other data sources
• Enable real-time interactions & response
• Real-time delivery directly to multiple targets (batch/stream processing)
• Highly available, efficient, scalable
• Available ecosystem of lightweight agents (sources & targets)
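Delivering the same collected data to both stream and batch targets usually means forwarding each record immediately to stream consumers while buffering records for bulk loads. A minimal sketch of that fan-out; the `FanOutAgent` class, the batch size and the target callables are hypothetical and are not part of the Vibe Data Stream product:

```python
from typing import Callable


class FanOutAgent:
    """Toy collector: forwards each record to stream targets immediately
    and to batch targets in fixed-size batches."""

    def __init__(self, batch_size: int = 3) -> None:
        self.batch_size = batch_size
        self.stream_targets: list[Callable[[str], None]] = []
        self.batch_targets: list[Callable[[list[str]], None]] = []
        self._buffer: list[str] = []

    def ingest(self, record: str) -> None:
        for target in self.stream_targets:
            target(record)  # real-time path: no buffering
        self._buffer.append(record)
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Send any buffered records to the batch targets."""
        if not self._buffer:
            return
        batch, self._buffer = self._buffer, []
        for target in self.batch_targets:
            target(batch)


agent = FanOutAgent(batch_size=3)
live: list[str] = []           # e.g. a stream processor
batches: list[list[str]] = []  # e.g. a bulk loader into HDFS
agent.stream_targets.append(live.append)
agent.batch_targets.append(batches.append)
for line in ["l1", "l2", "l3", "l4"]:
    agent.ingest(line)
agent.flush()  # push the final partial batch
print(live, batches)
```

Flushing on size alone can leave a partial batch waiting indefinitely. A real agent would typically also flush on a timer.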