Data-driven organizations can struggle to meet new and growing business intelligence requirements with existing data warehouse platforms that are constrained by limited scalability and performance. The solution is a data warehouse that scales to meet real-time demands and uses resources more efficiently and cost-effectively. Join Snowflake, AWS, and Ask.com to learn how Ask.com enhanced BI service levels and decreased expenses while meeting the demand to collect, store, and analyze over a terabyte of data per day. Snowflake Computing delivers a fast, flexible elastic data warehouse that reduces complexity and overhead, built on the elasticity, flexibility, and resiliency of AWS.
Join us to learn:
• How Ask.com eliminates data redundancy and simplifies and accelerates data loading, unloading, and administration
• How to support new and fluid data consumption patterns with consistently high performance
• Best practices for scaling high data volumes on Amazon EC2 and Amazon S3
Who should attend: CIOs, CTOs, CDOs, Directors of IT, IT Administrators, IT Architects, Data Warehouse Developers, Database Administrators, Business Analysts and Data Architects
3. Why AWS for Big Data?
Immediately available
Broad and deep capabilities
Trusted and secure
Scalable
4. Big Data is for everyone
The market for Big Data technologies is growing more than six times faster than the information technology market as a whole…
…and the companies that use their data well win.
6. Amazon S3
Highly durable object storage for all types of data
Internet-scale storage: grow without limits
Benefit from AWS’s massive security investments
Built-in redundancy: designed for 99.999999999% durability
Low price per GB per month, with no commitment and no up-front cost
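To make the eleven-nines figure concrete, here is a back-of-envelope sketch in Python. It assumes, as a simplification rather than AWS's own model, that the figure is an annual probability of losing any single object and that losses are independent across objects. Under those assumptions, storing 10 million objects means an expected loss of about one object every 10,000 years.

```python
# Illustrative arithmetic only. Assumption (not from the slide): "99.999999999%
# durability" is read as an annual, per-object survival probability, with
# losses independent across objects.
DURABILITY = 0.99999999999          # eleven nines, as quoted on the slide
annual_loss_prob = 1 - DURABILITY   # roughly 1e-11 per object per year


def expected_objects_lost_per_year(num_objects: int) -> float:
    """Expected number of objects lost per year under the assumption above."""
    return num_objects * annual_loss_prob


def years_per_expected_loss(num_objects: int) -> float:
    """Average number of years between single-object losses."""
    return 1 / expected_objects_lost_per_year(num_objects)


if __name__ == "__main__":
    n = 10_000_000
    print(f"{expected_objects_lost_per_year(n):.1e} objects lost per year")
    print(f"about one object every {years_per_expected_loss(n):,.0f} years")
```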
7. Amazon Elastic Compute Cloud (EC2)
Virtual servers hosted on the Amazon cloud
Scale up or down quickly, as needed
Pay for what you use
Familiar operating systems
9. Poll Question 2: Data sources
What data sources would you like to analyze today? (Select all that apply.)
– Customer-facing mobile applications (JSON/XML, semi-structured, etc.)
– Traditional applications (structured data)
– Machine data or web log data from your website (JSON/XML)
10. About IAC Publishing Labs
Headquartered in Oakland, CA
One of the largest collections of premium publishers
Growth through acquisition
BI Team
Provides centralized analytics
Manages 300+ million events and 50-100 million keywords
Manages marketing terms for bidding and monetization
Imports more than 1.5 terabytes of raw data daily
Incorporates different data sources
IAC Publishing Labs (Ask.com, About.com)
IAC Publishing Labs turns data from a business barrier into a business accelerator with Snowflake on AWS
11. Let’s be honest
I want to spend all day waiting for infrastructure, struggling to get access to data, and competing for resources
12. I want to be a data superhero, taking on the toughest data challenges to wrestle new insights from data
13. The Challenge
With a query running longer than 30 minutes, there was a 70% chance the (on-premises) database would shut down
Legacy Data Warehouse
Large MPP warehouse: 6 months of history
Large Cloudera Hadoop cluster: more than 6 months of data retention
Pains
Can’t natively process JSON data
No test/dev environment
Unstable!
A Rigid System that Could Not Keep Pace with the Growing Business
14. Path to Better BI
Requirements
120+ metrics
Need to move to an ‘as-a-service’ architecture
Evaluation
Proof-of-concept vendors: Snowflake on AWS, Google BigQuery, and other cloud data warehouse alternatives
After completing evaluations and choosing Snowflake Elastic Data Warehouse on AWS, IAC Publishing Labs was in full production in just three months
15. Benefits
Ability for a large number of users to query the same data
Querying JSON data from the web logs
Pinpoint logging in near real time
Spin up/down warehouses of any size as required
Chosen solution: Snowflake on AWS
IAC Publishing Labs now has a single environment for processing data and producing results
16. Scalability with Greater Control on the AWS Cloud
Flexibility: spin up/down warehouses of any size as required
One data warehouse to rule them all: Snowflake architecture allows us to combine data and workloads in one environment
Thorough testing: instantly able to create new dev/test environments without creating multiple copies of data or impacting production
Concurrency: unlimited processing with Snowflake and Amazon EC2
Stability: separate, controlled access for new users
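The "thorough testing" point relies on cloning an environment without duplicating its data. The toy Python model below illustrates the general copy-on-write idea under stated assumptions: a table is a list of references to immutable partitions, so a clone copies only that list, and new writes create new partitions instead of modifying shared ones. The `Table` and `Partition` classes are hypothetical and are not Snowflake's API or internal design.

```python
from dataclasses import dataclass, field

# Hypothetical toy model, not Snowflake's implementation: a table is a list of
# references to immutable partitions, so cloning copies metadata, not rows.


@dataclass(frozen=True)
class Partition:
    rows: tuple  # immutable once written


@dataclass
class Table:
    partitions: list = field(default_factory=list)

    def clone(self) -> "Table":
        # New list, same Partition objects: no row data is duplicated.
        return Table(partitions=list(self.partitions))

    def append_rows(self, rows) -> None:
        # Writes add a new partition rather than mutating a shared one,
        # so changes to a clone never leak into the original.
        self.partitions.append(Partition(tuple(rows)))

    def row_count(self) -> int:
        return sum(len(p.rows) for p in self.partitions)


if __name__ == "__main__":
    prod = Table()
    prod.append_rows([("event-1",), ("event-2",)])
    dev = prod.clone()                  # instant dev/test copy
    dev.append_rows([("test-event",)])  # experiment without touching prod
    print(prod.row_count(), dev.row_count())        # 2 3
    print(prod.partitions[0] is dev.partitions[0])  # True: storage is shared
```

This sketch only handles appends; updates and deletes would need their own replacement-partition logic, which is out of scope here.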
17. Enhanced Service Levels
Improvements
Stable systems
30+ analysts querying concurrently
Loading data 24x7x365 (1.5 TB loaded every day)

                   Then                        Now
Data processing    Once a day, over 3 hours    Every hour, under 10 minutes
Data loading       Every 5 hours               Every 15 seconds
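A quick Python sketch puts the load-frequency change in perspective. It assumes the 1.5 TB per day applies to both the old and new schedules, that data arrives evenly across loads, and decimal units (1 TB = 1,000,000 MB); none of these assumptions come from the slide.

```python
# Back-of-envelope arithmetic under stated assumptions: 1.5 TB/day in both
# schedules, spread evenly across loads, decimal units (1 TB = 1,000,000 MB).
SECONDS_PER_DAY = 24 * 60 * 60
DAILY_VOLUME_MB = 1.5 * 1_000_000


def loads_per_day(interval_seconds: float) -> float:
    """Average number of loads per day at a fixed load interval."""
    return SECONDS_PER_DAY / interval_seconds


def mb_per_load(interval_seconds: float) -> float:
    """Average volume each load must ingest, in MB."""
    return DAILY_VOLUME_MB / loads_per_day(interval_seconds)


if __name__ == "__main__":
    schedules = {"then (every 5 hours)": 5 * 60 * 60, "now (every 15 seconds)": 15}
    for label, interval in schedules.items():
        print(f"{label}: {loads_per_day(interval):,.1f} loads/day, "
              f"~{mb_per_load(interval):,.1f} MB per load")
```

Under these assumptions, each 15-second load handles roughly 260 MB instead of roughly 312 GB per 5-hour batch, a 1,200-fold increase in load frequency.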
18. Transition from Cost Center to Value Center
Snowflake on AWS holds both internal and external data together
Single source of truth
Real-time visibility
Data metrics match the speed of business
No capital/infrastructure investments
Elastic Data Warehouse
19. Results
Establishing one source of truth in a centralized data warehouse
Consolidating technologies and eliminating legacy platforms
Providing enhanced BI service levels through
Changing the BI team from a cost center to a value center
Decreasing data warehouse environment expenses significantly (by 78%)
20. Poll Question 3: Pain points
For your data warehouse and/or big data platforms, what are the biggest pain points? (Select all that apply.)
– Constantly juggling performance, management, and user scaling
– Difficulty in bringing in new sources of data
– Current solution costs are too high
21. What data analysts want
Easy access to all relevant data
Expanded direct access to data insights
Without burdensome cost and complexity
22. Common realities
Silos of Data
Difficult to bring together diverse data: application data, machine-generated data, streaming data
23. Common realities
Complex Data Infrastructure
Significant resources spent building and maintaining data platforms
Diagram: Data Warehouse(s), Datamarts, Hadoop & NoSQL
24. Common realities
Frustrated Analysts
Limited by incomplete data, delays in access, and poor performance
25. Introducing Snowflake’s Elastic Data Warehouse
All-new SQL data warehouse: no legacy code or constraints
Designed for the cloud: running on AWS
Delivered as a service: no infrastructure, knobs, or tuning to manage
26. Bring together data in one place
Any scale of data
– Transparently scale up and down, online and on demand
27. Bring together data in one place
Efficient storage, low cost
– Columnar, automatically compressed storage, and pay only for what you use
28. Bring together data in one place
Native support for diverse data
– Structured and semi-structured data (JSON, Avro, ...) in one system, without sacrificing performance or flexibility
29. Accelerate analytics
Simpler, faster data pipeline
– Load data directly into Snowflake at any time, without additional systems or transformations
30. Accelerate analytics
Any scale of workload
– Elastic scaling to handle jobs of any size, with in-database performance for complex queries
32. Accelerate analytics at a fraction of the cost
Infinite concurrency scaling
– Give workloads independent processing resources, without needing to copy or move data
At a fraction of the cost
– Cloud economies of scale: pay only for what you use, and align storage and processing costs with use
33. No infrastructure, knobs, or tuning
                                     On-premises   Cloud data warehouse   Data warehouse service
Infrastructure
  Datacenter                         ✘             ✔                      ✔
  Hardware & software                ✘             ✔                      ✔
  Upgrades & scaling                 ✘             ✘                      ✔
Database management & tuning
  Index management                   ✘             ✘                      ✔
  Data partitioning                  ✘             ✘                      ✔
  Metadata & statistics maintenance  ✘             ✘                      ✔
  Query optimization                 ✘             ✘                      ✔
Data & service availability
  Failure recovery                   ✘             ✔                      ✔
  Disaster recovery                  ✘             ✘                      ✔
  Data protection                    ✘             ✘                      ✔
  Service monitoring & alerting      ✘             ✘                      ✔
Security
  Physical security                  ✘             ✔                      ✔
  Deployment security                ✘             ✔                      ✔
  Security monitoring                ✘             ✘                      ✔
✘ = managed by the customer; ✔ = managed by the vendor
34. In summary
Fast analytics
“Snowflake gets us answers an order of magnitude faster. As a result we can do 100 times more queries per day.”
Balaji Rao, Accordant Media
Without the cost
“Snowflake is extremely cost effective—we have saved nearly 80% by implementing Snowflake.”
Rolfe Lindberg, DoubleDown
Zero management
“Snowflake makes it possible for us to focus on making use of our data without the complexity and resources required by traditional data warehousing and big data solutions.”
Ethan Erchinger, Chime
One place for data
“I can't say enough about how fantastic the native JSON support is.”
Josh McDonald, KIXEYE