Designing Big Data Analytics on the Cloud With Oracle Applications, Redshift & Tableau
An architecture overview of designing a cloud-based big data analytics solution – leverage best practices & avoid common pitfalls.
Session ID: 10333
Prepared by: Sam Palani (@samx18)
Director, Infrastructure & Cloud Solutions
CTR, Inc.
Introduction – Sam Palani
• Engaged with CTR as the Director, Infrastructure &
Cloud Solutions.
• 14+ Years working with Oracle and related
products: Oracle RDBMS, Oracle Applications, Unix
& Linux
• 4+ Years building and supporting enterprise class
data driven applications that leverage the cloud.
• Certified on Oracle, Java, AWS, PMP, CSM & Six
Sigma Green Belt.
• More about me? http://samx18.io
Introduction - CTR
• Global Systems Integrator – North America & Asia
Pacific locations.
• Oracle Platinum Partner.
• AWS & Tableau Partner.
• AWS, Tableau & Oracle certified resources
strategically located across the globe.
• Working with clients since 1998 to deploy &
support data-driven applications, both on site and in
a cloud model.
More about CTR? http://ctrworld.com
Session Agenda
• Summary – What the session will cover
• Prerequisites and Assumptions
• The Data Problem
• The Enterprise Data Warehouse - Redshift
• Data Loading & Transformations
• Data Visualizations – Tableau
• Live Data & Data Extracts
• Live Demo
• Additional Resources – Where you can explore more
• Questions / Feedback
Summary
This session will cover how to leverage Amazon Web
Services (AWS) Redshift to design a data warehouse
on the cloud for Oracle Applications and integrate it
with data visualization tools like Tableau.
The session will focus on best practices and tips &
tricks to perform fast data analysis as well as discover
hidden insights in EBS data.
We will also discuss the best data load strategies
specific to Oracle Applications.
The session will be aimed mainly at technical
audiences.
Prerequisites and Assumptions
The topics discussed in this presentation and the
related white paper are of an intermediate to
advanced level.
It is assumed that readers and participants have a
good understanding of Oracle products and concepts,
namely Oracle Database and Oracle Applications, as
well as a basic understanding of the current enterprise
cloud models available with AWS.
In addition, you will also need an active Oracle
Technology Network account, an active Amazon Web
Services account, and an account with Tableau.
The Data Problem
How did we end up here?
As businesses expand their Oracle
Applications / ERP footprint, it becomes increasingly
challenging to manage and consolidate all their
reporting data.
Segregation of transactional data from reporting
data is just one of the challenges.
The Data Problem
Let's briefly look at the challenges:
• Data Size
– Data growth.
– Low storage costs.
– Analyze & report on big data.
• Data Complexity
– Normalized data.
– Summarized data.
– Transformations needed.
• Multiple Data Sources
– ERP data.
– Interface systems.
– External sources.
• Performance
– Trade-off between functionality & performance.
– Transactional system vs. reporting system.
• Scalability
– Non-constant reporting needs.
– Regular vs. peak phases.
– Respond to changing business demands.
Enter The Enterprise Data Warehouse
• Enterprise data warehouses have attempted to
address some or all of the concerns listed.
• Historically, traditional data warehouses have not
been easy to design, build and maintain.
• They often required a significant up-front investment
in terms of capital as well as resources.
• Not easy to scale up or ramp down.
• Significant investment in terms of supporting
infrastructure.
Cloud Data Warehouse - Redshift
• AWS Redshift is an on-demand data warehouse
solution hosted on the Amazon AWS cloud.
• An Amazon Redshift data warehouse is a collection
of computing resources called nodes, which are
organized into a group called a cluster.
• Each cluster runs an Amazon Redshift engine and
contains one or more databases.
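As a rough sketch of what launching such a cluster involves, the parameters below are the kind you would pass to the Redshift CreateCluster API (for example via boto3's create_cluster). All names and sizes here are illustrative assumptions, not values from the session.

```python
# Illustrative parameters for launching a small Redshift cluster.
# The identifier, node type, and sizes are placeholders.
params = {
    "ClusterIdentifier": "ebs-reporting-dw",  # hypothetical name
    "NodeType": "dc1.large",
    "NumberOfNodes": 2,        # compute nodes; AWS adds the leader node
    "DBName": "reporting",     # the default database created at launch
    "MasterUsername": "admin",
    "MasterUserPassword": "<password>",
    "PubliclyAccessible": False,
}

# With boto3 installed: boto3.client("redshift").create_cluster(**params)
# In a single-node cluster (NumberOfNodes=1) the one node acts as
# both leader and compute.
```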
AWS Components
In addition to Redshift, AWS offers a variety of IaaS and
PaaS products; however, for the purpose of this discussion
and the presentation, we will be leveraging these specific
tools.
• S3 - Simple Storage Service (S3) is object-based
storage that you can access on demand.
• IAM & Security Groups - AWS Identity and Access
Management (IAM) enables you to securely control
access to AWS services and resources for your users.
• VPC - Amazon Virtual Private Cloud (VPC) is a virtual
network dedicated to your account. It is logically
isolated from other virtual networks in the AWS cloud.
Redshift Architecture
Key components of the Redshift architecture:
• AWS Region
– Region where your cluster will be created.
– Cannot be changed later.
• Clusters
– One or more nodes.
• Leader Node
– Each cluster has one leader.
– Responsible for all management & assignment tasks.
• Compute Nodes
– One or more compute nodes.
– In a single-node cluster, the same node acts as leader &
compute.
Redshift Architecture
Key components of the Redshift architecture (continued):
• Database
– Clusters have one or more databases.
– A default DB is created when the cluster is launched.
– Additional DBs are created manually after connecting to
the cluster.
• Access Control
– Supports database-level access control.
– Also supports AWS IAM-level access control.
– Application-level access control via security groups.
• SSL
– Allows encrypted connections to the cluster.
– SSL does not cover user authentication or data
encryption.
– Redshift provides various other methods of encrypting
data at rest.
Setup Prerequisites for Redshift
A couple of things you need to take care of before you
can launch & connect:
• A SQL client
– Any client that supports JDBC/ODBC connections.
– SQL Workbench/J.
• Firewall rules
– Default port of 5439.
– Inbound & outbound rules to accept TCP on 5439.
• Security group settings
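Since Redshift speaks the PostgreSQL wire protocol on port 5439, any JDBC/ODBC client can connect. A minimal sketch of building a connection string; the endpoint, database, and credentials are placeholders, not values from the deck.

```python
# Build a libpq-style connection string for a Redshift cluster.
# Host, database name, and credentials below are hypothetical.
def redshift_dsn(host, dbname, user, password, port=5439):
    """Default Redshift port is 5439; sslmode=require asks for an
    encrypted connection (it does not authenticate the user by itself)."""
    return (f"host={host} port={port} dbname={dbname} "
            f"user={user} password={password} sslmode=require")

dsn = redshift_dsn(
    "mycluster.abc123.us-east-1.redshift.amazonaws.com",  # placeholder endpoint
    "reporting", "admin", "<password>")
# With a driver such as psycopg2 installed: conn = psycopg2.connect(dsn)
```

Remember that the security group and firewall rules above must allow inbound TCP on 5439 from the client's address before this connection can succeed.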
Redshift Cluster Launch
Redshift Best Practices
• Sort Keys
– Choose based on the kind of data queried most often.
– For recently updated data, use the timestamp column.
– For joined tables, use the join column.
– For range filters, use the filter column.
• Distribution Style
– Minimize redistribution during query execution.
• COPY compression
– Use the native COPY command to enable compression
during data loads.
• Primary & Foreign Key constraints
– Informational only.
– But used by the optimizer for efficient query plans.
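The sort-key and distribution advice above shows up directly in the table DDL. A sketch of what that looks like; the table and column names are made up for the example, not taken from the session.

```python
# Illustrative Redshift DDL applying the advice above; the table and
# column names are hypothetical.
ddl = """
CREATE TABLE sales_fact (
    sale_id      BIGINT,
    customer_id  INTEGER,
    sale_date    DATE,            -- use DATE/TIMESTAMP types where appropriate
    amount       DECIMAL(12,2)
)
DISTKEY (customer_id)   -- distribute on the join column to limit redistribution
SORTKEY (sale_date);    -- sort on the column used in range filters
"""
print(ddl)
```

With this layout, joins on customer_id can be resolved node-locally, and range filters on sale_date can skip blocks outside the requested date range.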
Redshift Best Practices
• Smallest Possible Column Size
– Compression by default.
– Complex queries need temporary space.
• Use Date & Timestamp columns where appropriate.
Redshift Data Load
• Redshift can load data in parallel
– Speed scales almost proportionally to the number of nodes.
• COPY command to load data from S3
– Use a manifest file to enforce data load constraints.
– Use multiple input files.
– Automatic compression.
• Using a data pipeline
• ETL load from direct data sources
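The manifest file mentioned above is a small JSON document listing the exact S3 objects COPY should load. A minimal sketch; the bucket and file names are hypothetical, not from the deck.

```python
import json

# A COPY manifest: COPY loads exactly the files listed here, no more.
# Bucket and object names below are illustrative placeholders.
manifest = {
    "entries": [
        {"url": "s3://my-etl-bucket/gl_balances.part00", "mandatory": True},
        {"url": "s3://my-etl-bucket/gl_balances.part01", "mandatory": True},
    ]
}

# "mandatory": True enforces the load constraint: COPY fails if a
# listed file is missing instead of silently loading a partial set.
print(json.dumps(manifest, indent=2))
```

The manifest itself is uploaded to S3 and referenced from the COPY command with the MANIFEST keyword.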
Redshift Data Load – COPY
• COPY provides better performance compared to
traditional inserts.
• Can be used to load data directly from S3 or an
EC2 instance via SSH.
• Uses automatic compression by default to ensure
the best performance.
copy users from 's3://awssampledb/tickit/allusers_pipe.txt'
credentials 'aws_access_key_id=XXXXXXXXXXXX;aws_secret_access_key=XXXXXXXXXXX'
delimiter '|';
Redshift Data Load – Data Pipeline
Redshift Data Load – Cloud ETL
[Diagram: on-premise data source → secure agent → Redshift]
Data Visualizations – Tableau
Tableau is a visual data exploration tool that can
efficiently work with multiple data sources to create
interactive data visualizations and dashboards.
• Visual Data Exploration
– Think visually.
– Automatically composes the queries and analytical
computations needed to create the picture.
• Multiple Data Sources
– Redshift, Oracle & many more.
– No support yet for NoSQL.
Data Visualizations – Tableau
• Database Independence
– No dependence on how data is stored.
• VizQL
– Built on the industry standards SQL & MDX.
• Publishing Options
– Interactive reports via the server.
– Static reports as PDFs, etc.
– Data exports to Excel, etc.
Tableau Components
• Tableau Desktop
– The main tool where you will design and create
your workbooks and dashboards.
– Available on both Windows and Mac platforms; however,
the Mac version has a few feature limitations.
• Tableau Server
– Used to serve the reports and dashboards to users.
– Reports & dashboards are published from the desktop to
the server.
– Can only be installed on a Windows server.
• Tableau Online
– Hosted software-as-a-service solution.
– All features of Tableau Server are available on Tableau
Online.
In addition to the above, there is also Tableau Public, where you can publish
your workbooks to a public domain. It is not designed for enterprise use.
Tableau Redshift Connection
Live vs. Data Extracts
• Tableau can make both live connections to your data
source and use Tableau Data Extracts (TDEs).
• A TDE is a compressed snapshot of data stored on
disk and loaded into memory as required.
• TDEs have a couple of advantages with Redshift:
– A TDE is a columnar store, similar to the columnar
format used in Redshift itself.
– A key aspect of TDE design is how extracts are structured,
which impacts how they are loaded into memory and
used by Tableau.
– Extracts also add portability to your application.
Live Demo
Putting it all together in a live demo
Additional Resources
Informatica
Getting Started with Informatica Cloud -
http://videos.informaticacloud.com/HY6u/getting-started-with-informatica-cloud/
AWS Documentation
Redshift - https://aws.amazon.com/redshift/getting-started/
Computing - http://aws.amazon.com/ec2/
AMI - https://aws.amazon.com/marketplace
Tableau
Getting started with Tableau - http://www.tableau.com/learn/tutorials/on-demand/getting-started
Thank You
Sam Palani
spalani@ctrworld.com
@samx18
http://samx18.io
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 

Big Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau

Prerequisites and Assumptions
The topics discussed in this presentation and related white paper are of an intermediate to advanced level. It is assumed that readers and participants have a good understanding of Oracle products and concepts, namely Oracle Database and Oracle Applications, as well as a basic understanding of the current enterprise cloud models available with AWS.
In addition, you will need an active Oracle Technology Network account, an Amazon Web Services account, and a Tableau account.
6
The Data Problem
How did we end up here? As businesses expand their Oracle Applications / ERP footprint, it becomes increasingly challenging to manage and consolidate all their reporting data. Segregation of transactional data from the reporting data is just one of the challenges.
7
The Data Problem
Let's briefly look at the challenges below:
• Data Size – Data growth; low storage costs; analyze & report on big data.
• Data Complexity – Normalized data; summarized data; transformations needed.
• Multiple Data Sources – ERP data; interface systems; external sources.
• Performance – Trade-off between functionality & performance; transactional system vs. reporting system.
• Scalability – Non-constant reporting needs; regular vs. peak phases; respond to changing business demands.
8
Enter The Enterprise Data Warehouse
• Enterprise data warehouses have attempted to address some or all of the concerns listed.
• Historically, traditional data warehouses have not been easy to design, build and maintain.
• They often required a significant up-front investment in terms of capital as well as resources.
• Not easy to scale up or ramp down.
• Significant investment in supporting infrastructure.
9
Cloud Data Warehouse - Redshift
• AWS Redshift is an on-demand data warehouse solution hosted on the Amazon AWS cloud.
• An Amazon Redshift data warehouse is a collection of computing resources called nodes, which are organized into a group called a cluster.
• Each cluster runs an Amazon Redshift engine and contains one or more databases.
10
AWS Components
In addition to Redshift, AWS offers a variety of IaaS and PaaS products; however, for the purpose of this discussion and the presentation, we will be leveraging these specific tools.
• S3 – Simple Storage Service (S3) is an object-based storage service that you can access on demand.
• IAM & Security Groups – AWS Identity and Access Management (IAM) enables you to securely control access to AWS services and resources for your users.
• VPC – Amazon Virtual Private Cloud (VPC) is a virtual network dedicated to your account. It is logically isolated from other virtual networks in the AWS cloud.
11
Redshift Architecture
Key components of the Redshift architecture:
• AWS Region
– Region where your cluster will be created.
– Cannot be changed.
• Clusters
– One or more nodes.
• Leader Nodes
– Each cluster will have one leader.
– Responsible for all management & assignment tasks.
• Compute Nodes
– One or more compute nodes.
– In a single-node cluster, the same node acts as both leader & compute.
13
Redshift Architecture
Key components of the Redshift architecture:
• Database
– Clusters have one or more databases.
– A default DB is created when the cluster is launched.
– Additional DBs are created manually after connecting to the cluster.
• Access Control
– Supports database-level access control.
– Also supports AWS IAM-level access control.
– Application-level access control via security groups.
• SSL
– Allows encrypted connections to the cluster.
– SSL does not cover user authentication or data encryption.
– Redshift provides various other methods of encrypting data at rest.
14
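The database-level access control mentioned above uses standard SQL grants inside the cluster. A minimal sketch — the user, schema, and password below are illustrative, not from the deck:

```sql
-- Create a read-only reporting user and grant it access
-- to existing tables in the public schema (illustrative names)
CREATE USER report_user PASSWORD 'Str0ngPassw0rd1';
GRANT USAGE ON SCHEMA public TO report_user;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO report_user;
```

IAM-level and security-group controls sit outside the database and are configured in the AWS console or CLI, not in SQL.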
Setup Prerequisites for Redshift
A couple of things you need to take care of before you can launch & connect:
• A SQL client
– Any client that supports JDBC/ODBC connections.
– SQL Workbench/J.
• Firewall rules
– Default port of 5439.
– Inbound & outbound rules to accept TCP on port 5439.
• Security group settings
15
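For reference, the JDBC connection string your SQL client will need follows the documented Redshift pattern below — the cluster endpoint, region, and database name shown are illustrative placeholders:

```
jdbc:redshift://examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com:5439/dev
```

The endpoint host is shown on the cluster's detail page in the AWS console; note the default port 5439, which is why the firewall rules above must allow it.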
Redshift Best Practices
• Sort Keys
– Based on the kind of data queried most often.
– For recently updated data, use the timestamp.
– For joined tables, use the joined column.
– For range filters, use the filter column.
• Distribution Style
– Minimize redistribution during query execution.
• COPY compression
– Use the native COPY command to enable compression during data loads.
• Primary & Foreign Key constraints
– Informational only.
– But used by the optimizer for efficient query plans.
17
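As a concrete illustration of the sort-key and distribution-style guidance above, here is a minimal DDL sketch — the table and column names are hypothetical, not from the deck:

```sql
-- Hypothetical fact table: distribute on the column used in joins
-- (customer_id) and sort on the timestamp used in range filters
CREATE TABLE orders (
  order_id    BIGINT,
  customer_id BIGINT,
  order_date  TIMESTAMP,
  amount      DECIMAL(12,2)
)
DISTSTYLE KEY
DISTKEY (customer_id)
SORTKEY (order_date);
```

Distributing on the join column keeps matching rows on the same node, and sorting on the filter column lets Redshift skip blocks during range scans — the two ways redistribution and I/O are minimized.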
Redshift Best Practices
• Use the smallest possible column size.
– Compression is applied by default.
– Complex queries need temporary space.
• Use date & timestamp columns where appropriate.
18
Redshift Data Load
• Redshift can load data in parallel
– Almost proportional to the number of nodes.
• COPY command to load data from S3
– Use a manifest file to enforce data load constraints.
– Use multiple input files.
– Automatic compression.
• Using a data pipeline
• ETL load from direct data sources
19
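The manifest file mentioned above is a small JSON document listing the exact S3 objects COPY must load; a minimal sketch, with illustrative bucket and key names:

```json
{
  "entries": [
    {"url": "s3://mybucket/data/orders.part00", "mandatory": true},
    {"url": "s3://mybucket/data/orders.part01", "mandatory": true}
  ]
}
```

Upload it to S3 and reference it by adding the MANIFEST option to the COPY command; `"mandatory": true` makes the load fail if a listed file is missing, which is how the load constraints are enforced.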
Redshift Data Load – COPY
• COPY provides better performance compared to traditional inserts.
• Can be used to load data directly from S3 or from an EC2 instance via SSH.
• Uses automatic compression by default to ensure the best performance.

copy users from 's3://awssampledb/tickit/allusers_pipe.txt'
credentials 'aws_access_key_id=XXXXXXXXXXXX;aws_secret_access_key=XXXXXXXXXXX'
delimiter '|';
20
Redshift Data Load – Data Pipeline
21
Redshift Data Load – Cloud ETL
[Architecture diagram: on-premise data source → secure agent → Redshift]
22
Data Visualizations – Tableau
Tableau is a visual data exploration tool that can efficiently work with multiple data sources to create interactive data visualizations and dashboards.
• Visual Data Exploration
– Think visually.
– Automatically composes the queries and analytical computations needed to create the picture.
• Multiple Data Sources
– Redshift, Oracle & many more.
– No support yet for NoSQL.
23
Data Visualizations – Tableau
• Database Independence
– No dependence on how data is stored.
• VizQL
– Built upon the industry standards SQL & MDX.
• Publishing Options
– Interactive reports via server.
– Static reports as PDFs, etc.
– Data exports to Excel, etc.
24
Tableau Components
• Tableau Desktop
– The main tool where you will design and create your workbooks and dashboards.
– Available on both Windows and Mac platforms; however, the Mac version has a few feature limitations.
• Tableau Server
– Used to serve the reports and dashboards to the users.
– Reports & dashboards are published from the desktop to the server.
– Can only be installed on a Windows server.
• Tableau Online
– Hosted software-as-a-service solution.
– All features of Tableau Server are available on Tableau Online.
In addition to the above, you also have Tableau Public, where you can publish your workbooks to a public domain. This is not designed for enterprise use.
25
Live vs. Data Extracts
• Tableau can make both live connections to your data source and use data extracts (TDE).
• A TDE is a compressed snapshot of data stored on disk and loaded into memory as required.
• A TDE has a couple of advantages with Redshift:
– A TDE is a columnar store, similar to the columnar format used in Redshift itself.
– A key aspect of TDE design is how extracts are structured, which impacts how they are loaded into memory and used by Tableau.
– It also enables you to add portability to your application.
27
Live Demo
Putting it all together in a live demo.
28
Additional Resources
Informatica
Getting Started with Informatica Cloud - http://videos.informaticacloud.com/HY6u/getting-started-with-informatica-cloud/
AWS Documentation
Redshift - https://aws.amazon.com/redshift/getting-started/
Computing - http://aws.amazon.com/ec2/
AMI - https://aws.amazon.com/marketplace
Tableau
Getting started with Tableau - http://www.tableau.com/learn/tutorials/on-demand/getting-started
29

Editor's notes

  1. Certified on Oracle, Java & AWS technologies, as well as industry-standard certifications like PMP, CSM & Six Sigma.
  2. This isn't something new to us; we have been working with clients for almost two decades.
  3. We plan to make this flexible & interactive: 30-35 minutes for the presentation, followed by a short demo of around 15 minutes, with the rest for Q&A.
  4. Although the session is aimed at technical audiences, the subject is presented in a fairly simple format to make it easy to understand. We also have a technical whitepaper along with this presentation that covers these topics in greater technical depth; I strongly encourage you to go through it.
  5. So how did we end up here? Where did all this data come from, and why now? The answer: it's always been there, but now we have the tools & technology to look at it. I intentionally emphasize reporting data; transactional data is important, but it's not the core objective of our discussion today.
  6. If we were to break or bucket this data problem up, we could do so as follows, starting with Data Size. As data has grown over time, storage costs have gone down drastically, giving businesses an opportunity to analyze and report on this big data. Data is complex by nature, especially in a transactional system; complexity includes normalization and the lack of summarization. Additionally, there may also be a need for transformations. Another related data problem is performance, especially when you are using the same system to serve both your transactional & reporting needs. Last but not least is the scalability problem.
  7. Traditional data warehouses have attempted to address these issues. It often took months to design, build & implement a traditional data warehouse. These were big CapEx projects; scaling a traditional DW was never easy, and a ramp-down was unheard of.
  8. Enter the cloud DW, Redshift. You can think of this as DWaaS. We will discuss what this architecture looks like & means to you a little later, but the important thing to remember is that all this happens seamlessly behind the scenes, and the best part is that you can spin this up within a few minutes.
  9. Before we go further: you can read the details about this in the additional resources section of the PPT or in the whitepaper.
  10. Now going back to Redshift, remember the node/cluster architecture. This is how it looks. In a single-node architecture, LN + CN are the same node. Take snapshot backups and easily create multiple Redshift instances.
  11. An AWS region is a geographically distinct location. At present there are 11 AWS regions. Based on your business location, you can choose the region closest to you to launch your Redshift cluster.
  12. Redshift provides various other methods of encrypting data at rest.
  13. Before you launch, make sure you have the tools to connect. One of my favorites is SQL Workbench/J, but basically any tool that supports ODBC/JDBC will work. Remember, SQL Workbench/J is different from MySQL Workbench.
  14. The accompanying whitepaper goes over the process of creating a Redshift cluster in step-by-step detail along with explanations. Here is the summary.
  15. Amazon Redshift stores your data on disk in sorted order according to the sort key. Use sort keys based on the data queried most often. Distribution style: when you execute a query, the query optimizer redistributes the rows to the compute nodes as needed to perform any joins and aggregations. The goal in selecting a table distribution style is to minimize the impact of the redistribution step by locating the data where it needs to be before the query is executed. If you are loading data, use the native COPY command, which compresses data by default before load. Key constraints are optional, but you can use them to optimize query plans.
  16. Redshift is known for its massive parallel processing and load abilities. You can load data in three main ways: the COPY command, a data pipeline, and ETLs.
  17. The native COPY command is always the preferred method of loading data compared to other alternatives. It has performance tuning & advanced data compression built in by default.
  18. Data Pipeline is an AWS-specific technology that can be used to load data from AWS database services like RDS. Think of a pipeline as a DB link on steroids.
  19. Why choose an ETL? Transformations needed on the data; combining & merging data from multiple sources; parsing data like XML. Many ETL options are available, but since we are building a cloud-based solution, we chose Informatica's cloud-based ETL solution. Data in its original format resides on premise in your Oracle Apps DB, which the secure agent transforms using the rules and templates that you specify within Informatica Cloud. Your transformed data resides in Redshift on the AWS cloud. At no point is your data uploaded to Informatica Cloud. From a security standpoint you can encrypt the connections between your secure agent and AWS Redshift, as well as encrypt the data at rest within Redshift, so it's a highly secure model.
  20. So now we have the data consolidated from multiple sources, and we can start exploring it. Enter Tableau: visual data exploration. Redshift, Oracle & many more are supported out of the box.
  21. Publish to a Tableau server for interactive analytic reports delivered via a browser or mobile devices. Static options include PDF exports and data exports to Excel.
  22. The server can serve both browsers and mobile devices. As of now, it can only be installed on a Windows physical or virtual server. You can install Tableau Server either on premise or on a cloud platform like AWS. Tableau Public, on the other hand, is a SaaS service from Tableau.
  23. As mentioned, Tableau provides out-of-the-box connectivity to Redshift. You start by providing the JDBC URL of your Redshift cluster and port number. Next, specify the Redshift database name (remember, a Redshift cluster can have multiple databases). Lastly, the schema/password where your data is stored. That's it.
  24. Before we jump into the demo, there is one cool thing about Tableau that I would like to mention here: live connections vs. data extracts. For the purpose of demos, you can easily move TDEs across machines.