AWS Webcast - Informatica - Big Data Solutions Showcase
1. Big Data Solution Showcase
Informatica High Performance Big Data Loading for AWS
Watch this webinar on demand on: https://connect.awswebcasts.com/p4pshu7r7fi/
2. Presenters
Ronen Schwartz
Vice President and General Manager
Informatica Cloud
Chris Keyser
Partner Solution Architect
Amazon Web Services
3. Agenda
• Introduction to AWS
• How Informatica Cloud powers Cloud Analytics with Redshift
• UBM’s Business Challenge: Understanding Their Customers
• How UBM Achieves Customer Insight
• Q & A
4. Why Are Customers Adopting AWS?
• Agility, speed to market & flexibility
• Don’t have to guess on capacity
• Global in minutes
• Cost savings through economies of scale
• Trade capital expense for variable expense
• Security and compliance
6. Big Data
Potentially Massive Data Sets
Iterative, experimental style of data
manipulation and analysis
Frequently not a steady-state workload;
peaks and valleys
Time to results is key
Hard to configure/manage
AWS Cloud
Massive, virtually unlimited capacity
Iterative, experimental style of
infrastructure deployment/usage
Efficient with highly variable
workloads
Parallel compute clusters from single
data source
Managed services for data storage
and analysis
7. AWS Data Services
Data
• Variety: Structured, Unstructured, Text, Binary
• Volume: Gigabytes, Terabytes, Petabytes
• Velocity: Millisecond, Second, Minute, Hour, Day
Storage Services: EBS, EC2 Instance Storage, S3, CloudFront, Glacier
SQL Stores: RDS, Redshift
Hadoop: EMR
NoSQL: DynamoDB
Stream: Kinesis
Caching: ElastiCache
Orchestrate: Data Pipeline
8. Storage Services – Object Store
Amazon S3
99.999999999% durability
Stores anything
Lifecycle and Versioning
Fine Grained Access Control
Reduced Redundancy Storage
9. Amazon DynamoDB - NoSQL, Durable, Low Latency, At Scale
WRITES
Continuously replicated to 3 AZs
Persisted to disk (SSD)
READS
Strongly or eventually consistent
11. Nokia and AWS: 50% Cost Savings with 2x Faster Queries
Hadoop Tools Improving Rapidly
On-demand, Flexible, Big Data Technologies
Cheaper and Faster
Redshift & Hadoop Price-performance Advantage over RDBMS
>50% platform cost savings
>2x faster queries
Minimal DBA support
Redshift, Hadoop, S3, EMR, Data Pipeline for ETL
Cost-effective for 10s of TB data sets
AMI-based Services
Internet Speed Report Authoring
Hypothesis testing vs. waterfall
15. So, How Do You Try Amazon Redshift Quickly & Easily?
Amazon Redshift
16. Use New Cloud & Traditional Data Sources
Amazon Redshift
ERP, CRM Apps
Files
Legacy, RDBMS
Firewall
Logs, JSONs, Social
SaaS Apps
17. How To Manage Integration In This New World?
Amazon Redshift
ERP, CRM Apps
Files
Legacy, RDBMS
Firewall
Experiment.
Prototype.
Repeat.
Logs, JSONs, Social
SaaS Apps
19. Amazon EMR (Hadoop) and Amazon DynamoDB (NoSQL)
ERP, CRM Apps
Files
Legacy, RDBMS
Amazon RDS
Amazon Redshift
Amazon EMR
Logs, JSONs, Social
SaaS Apps
DynamoDB
20. Growth Path to Hybrid Data Warehouse
ERP, CRM Apps
Files
Legacy, RDBMS
Amazon RDS
Amazon Redshift
Amazon EMR
Logs, JSONs, Social
SaaS Apps
DynamoDB
Traditional Staging DB
Traditional Data Warehouse
21. Informatica Cloud - Get it right. Go live. Grow flexibly.
• Cloud Data Integration: Leverage Existing Bulk Data
• Cloud Application Integration: Real Time Access to Actionable Data
• Cloud Test Data Management: Secure Development Data
• Cloud Data Quality: Cleanse and De-dupe Data
• Cloud Master Data Management: Consolidate and Visualize Data
“The Informatica Cloud Platform is the only complete solution for cloud integration and data management that allows SaaS application administrators, architects, and developers to easily power optimal processes connected with enterprise-ready data across cloud, on-premises, big data, social, and mobile environments.”
23. Technical Innovations for AWS Data Loading
• Broadest out-of-the-box integration for AWS: S3, DynamoDB, Kinesis, Redshift and RDS available
• Agile data loading for cloud data warehousing with Redshift
  • Create target using cloud designer and multiple source objects
• High performance parallel data loading architecture
  • E.g. load data in parallel across all 32 nodes in a Redshift cluster
• Push down optimization for increased throughput
  • Push data transformations down to optimal source/target database engine
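The parallel loading idea above can be illustrated with a minimal Python sketch. It is not Informatica's implementation: `partition`, `load_chunk` and `parallel_load` are hypothetical names, and the per-node "load" is a placeholder that only counts rows. It shows the general pattern of splitting a data set into one chunk per node and loading the chunks concurrently.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, parts):
    """Split rows into `parts` roughly equal chunks (round-robin)."""
    return [rows[i::parts] for i in range(parts)]


def load_chunk(node_id, chunk):
    """Hypothetical stand-in for loading one chunk into one node.

    A real loader would write the chunk to the warehouse here;
    this placeholder only reports how many rows it received.
    """
    return node_id, len(chunk)


def parallel_load(rows, nodes=32):
    """Load all chunks concurrently, one worker per node."""
    chunks = partition(rows, nodes)
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        results = list(pool.map(load_chunk, range(nodes), chunks))
    return dict(results)


rows = [{"id": i} for i in range(1000)]
loaded = parallel_load(rows, nodes=32)
total_loaded = sum(loaded.values())
```

Round-robin partitioning keeps chunk sizes within one row of each other, so no single worker becomes the bottleneck; the node count of 32 simply mirrors the example on the slide.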
30. UBM Tech Events
Bringing Together the World’s Technology Communities
Our complete understanding of how technology is built, sold and used creates unique market value for you.
Technology Segments
Security
Game & App Development
Technology Markets
Enterprise
Infrastructure & Cloud
Mobile Broadband & Wireless Infrastructure
Communications & Collaboration
Vertical Markets and Professionals
Government IT
Tech Marketers
IT Service & Support
Electronics
Electronic & ARM-based computer design
Signal Integrity & High-speed Design
Designers of Things
Embedded System Design
IT Executives
31. UBM Tech Media
Bringing Together the World’s Technology Communities
Strong Editorial Brands | 135+ Awards In 3 Years | 3 Launches In 4 Months
Technology Segments
Security
Infrastructure
Game Development
Development
Technology Markets
Enterprise IT
Telecommunications
Unified Communications
Vertical Markets and Professionals
Government & Healthcare IT
IT Service & Support
ThinkHDI.com
Financial Services
Electronics
Electronics Engineering
Global Supply & Design Chain
EE Training & Education
Analog Design
System Design
Electronic Parts Search
32. UBM Tech Credentials
Decades of Insight and Experience. Proven Results.
Data Warehouse and Analytics
Customer Insight is Critical to What We Do
• Customer Insights team will utilize new technologies and tools to build an even better understanding of the needs of our communities
• Allows us to foster deeper relationships with customers by providing them with the right products and services at the right time
• Ability to provide a more holistic view of prospects and customers for our clients
Deep Analytics
• Consolidated data presented at the client level
• Journey mapped on a buying cycle funnel
• Ability to drill down and view information at a topic, product, campaign & customer level
38. Goal is Behavioral Targeting
Online and Live Event Content Engagement
Topical Metadata attached to each content type (e.g. Events, Lead Nurturing, Webinars)
• Metadata is generated from content created for, and engaged with at, live events and online products
• As users engage with these products, metadata gets attached to them, weighted by level of engagement, recency, etc.
• Increased engagement leads to collection of more behavior data, which can then be matched with the metadata of the live event that needs to be promoted
• While the promotion is underway, the matched users get highly personalized emails inviting them to attend certain events, online programs, etc.
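The weighting and matching described above could be sketched roughly as follows. This is an illustration only: the slide gives no formula, so the engagement weights, the 90-day half-life and the function names `topic_scores` and `match_score` are all assumptions, and the sample data is made up.

```python
from datetime import date

# Hypothetical weights per engagement type; the slide does not give values.
ENGAGEMENT_WEIGHT = {"view": 1.0, "download": 2.0, "webinar": 3.0, "event": 5.0}
HALF_LIFE_DAYS = 90  # assumed: an interaction's weight halves every 90 days


def topic_scores(interactions, today):
    """Sum recency-decayed engagement weights per topic for one user."""
    scores = {}
    for topic, kind, when in interactions:
        age_days = (today - when).days
        decay = 0.5 ** (age_days / HALF_LIFE_DAYS)
        scores[topic] = scores.get(topic, 0.0) + ENGAGEMENT_WEIGHT[kind] * decay
    return scores


def match_score(user_scores, event_topics):
    """Total user affinity for the topics attached to an event."""
    return sum(user_scores.get(topic, 0.0) for topic in event_topics)


today = date(2014, 6, 1)
user = [
    ("security", "webinar", date(2014, 6, 1)),   # same day: full weight
    ("security", "view", date(2014, 3, 3)),      # 90 days old: half weight
    ("cloud", "download", date(2013, 12, 3)),    # 180 days old: quarter weight
]
scores = topic_scores(user, today)
security_event = match_score(scores, {"security", "mobile"})
```

Users could then be ranked by `match_score` against the topical metadata of the event being promoted, with the highest-scoring users receiving the personalized invitation.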
42. Phase 1 - Solution Summary
Sources
• OpenCalais: Content classes and taxonomy
• NextGen: Customer, Demographics, Event, Site Registration
• Eloqua: Customer, Permissions, Behaviour
• EV2: Customer, Demographics, Event, Registration
Data Integration: Informatica Cloud
Create an integrated view of customer
Generating a holistic view of customers by:
• Merging customer data from Online, Registration and Onsite
• Ensuring data quality and consistency
Data Warehouse: Redshift
Data made available in Data Warehouse
Providing coherent information to aid decision making by:
• Storing customer information across time and across event
• Ensuring availability of relevant information
Reporting Component: Tableau
Provide visibility to customer information
Develop complete understanding of customers
• Report on customer sources
• Report on customer, content consumption & registrations
• Report on customer behaviour
Advanced Analytics Component: R
Generate insights to improve customer engagement
• Analyse online behaviour and create customer personas based on customer content affinity
• Identify potential attendees based on their personas
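The Phase 1 summary mentions merging customer data from Online, Registration and Onsite sources while ensuring data quality and consistency. A minimal sketch of that kind of merge is shown below. It assumes email is the matching key, which the slides do not state, and `normalize_email` and `merge_customers` are hypothetical helpers with made-up sample records.

```python
def normalize_email(email):
    """Basic data-quality step: trim whitespace and lowercase the key."""
    return email.strip().lower()


def merge_customers(*sources):
    """Merge records from several sources into one record per email.

    Later sources fill in fields that earlier ones lack; existing
    non-empty values are kept, and each record tracks where it was seen.
    """
    merged = {}
    for source_name, records in sources:
        for rec in records:
            key = normalize_email(rec["email"])
            target = merged.setdefault(key, {"email": key, "sources": []})
            if source_name not in target["sources"]:
                target["sources"].append(source_name)
            for field, value in rec.items():
                if field != "email" and value and not target.get(field):
                    target[field] = value
    return merged


# Made-up sample records for illustration.
online = [{"email": "Ana@Example.com ", "name": "Ana", "title": ""}]
registration = [{"email": "ana@example.com", "title": "CTO", "event": "Event A"}]
onsite = [{"email": "bo@example.com", "name": "Bo", "event": "Event A"}]

customers = merge_customers(
    ("online", online), ("registration", registration), ("onsite", onsite)
)
```

Here the two records for Ana collapse into one, with the job title filled in from the registration source; a production pipeline would need more robust identity matching than a normalized email.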
43. Data Warehouse Architecture
The slide outlines the elements and services of the warehouse, with details showing how the components will fit together, providing an organizing framework to support implementation.
Data Sources (sources of data required for phase 1)
• Next Gen: SSO & Site Registration
• EV2: Event Registration
• Eloqua: Marketing and Automation
• Open Calais: Content Tagging
Data Integration Layer (Informatica Cloud)
Extracting data from various sources into staging area, transforming data to a target state and loading into the Data Warehouse
• Source Dependent Data Integration (ETL)
• Source Independent Data Integration (ETL)
• Source System Data Sets
• Data staging
• Basic Data Cleansing
• Business rules consolidation
• Common Business Rules
Data Warehouse Layer (Amazon Redshift)
Creating a central repository of data, both current and historical, that allows decision makers/users to have all potential information required for decision making
• Product
• Customer
• Behavior
• Registration
• Content Taxonomy
• Campaign
• History
• Demographic
BI & Reporting and Advance Analytics (Tableau, Revolution R)
Transforming raw data into meaningful and actionable insights, discovery of patterns & hidden opportunities for decision making and closing the loop with source systems as required
• BI & Reporting: Reports, Dashboard, Interactive Visualisation
• Advance Analytics: Predictive Modeling, Simulation
Governance and Security
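The data integration layer described on this slide (extract into a staging area, transform to a target state, load into the warehouse) can be sketched in a few lines. This example uses an in-memory SQLite database as a stand-in for the warehouse, not Amazon Redshift or Informatica Cloud; the table names `stg_registration` and `dw_registration` and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the warehouse
conn.executescript("""
    CREATE TABLE stg_registration (email TEXT, event TEXT, registered_on TEXT);
    CREATE TABLE dw_registration (
        email TEXT, event TEXT, registered_on TEXT,
        UNIQUE (email, event)
    );
""")

# Extract: raw rows as they might arrive from a source system (made up).
raw_rows = [
    ("  Ana@Example.com", "Event A", "2014-01-15"),
    ("ana@example.com", "Event A", "2014-01-15"),   # duplicate after cleansing
    ("bo@example.com", "Event B", "2014-02-01"),
    ("", "Event B", "2014-02-02"),                  # missing key, rejected
]
conn.executemany("INSERT INTO stg_registration VALUES (?, ?, ?)", raw_rows)

# Transform + load: cleanse in SQL, then insert only valid, distinct rows.
conn.execute("""
    INSERT OR IGNORE INTO dw_registration (email, event, registered_on)
    SELECT DISTINCT lower(trim(email)), event, registered_on
    FROM stg_registration
    WHERE trim(email) <> ''
""")
conn.commit()

loaded = conn.execute(
    "SELECT email, event FROM dw_registration ORDER BY email"
).fetchall()
```

Keeping the raw rows in a staging table and doing the cleansing in SQL means the transformation runs inside the database engine rather than in the client, which is also the idea behind push down optimization.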