4. Running a platform requires people with the right skills
1. There is complexity to manage
2. Agile working environment
3. Data Rock Stars are rare
4. Data modelling and SQL invaluable
5. What technology enabled us to launch a data platform?
A step change in the evolution of cloud computing.
1. Decoupling storage from compute
Pay for compute only when needed (scale up, down, out)
Pay for storage separately (very cheap)
2. Low barrier to entry
Extremely easy to set up
Very low price (no CAPEX)
3. Same data used by everybody – no impact, each to their own compute
10. Price of Entry
KETL can now offer cloud services with little risk
No software licensing costs
No hardware costs
Time to deployment rapid
We can use existing team skills to carry out Extract Load Transform
We can iterate quickly and let designs evolve
Time to value for clients massively reduced
12. KETL
30 Queen Charlotte St
Bristol
BS1 4JH
+44 (0)117 251 0064
www.ketl.co.uk
info@ketl.co.uk
@KETL_BI
For more information on what we do please contact Helen Woodcock.
We host regular workshops
Editor's notes
This talk is about how KETL were able to launch a data platform
There are three key ingredients – People, Technology, and Pricing
People
This talk is about KETL's experience with developing a cloud platform.
Three key factors in its development are people, technology, and price.
I’d like to share my experiences to date in order to give you some context.
We are all products of our experiences. Good or Bad.
It’s true to say that we can only improve going forward.
As our Marketing team would say “I am a seasoned professional”.
These grey hairs come from experience – I wish I had cloud when I started.
When I started work we had to physically build the servers, compile operating systems, fine tune software and run business processes as best we could.
The environment was fragile because budgetary and technical constraints meant that true resilience was expensive and difficult to obtain.
The financial and emotional investment in systems stymied free form creative development.
Changes to any code often came with downtime, arduous deployments, and on occasion hardware upgrades too.
In the old world, as sharp and keen as I think I was, I was still perceived as a bottleneck by the business.
I was also the custodian and gatekeeper to the data.
We all talk about the cloud and the new world, but many of the people and businesses I meet still carry lots of technical debt, frustrations, and fears – and I completely understand where they are coming from.
CALL CENTRE
I had to replace three spreadsheets and an Access database for a call-centre
If I looked at the data it was mostly complete and superficially fit for purpose
When I sat with the team taking calls I began to understand the stress of using Excel and Access whilst on the phone to the client
When you start to realise that technology is a service that should enable people, and you see the power of the right data at the right time, you can better understand my personal goal of being an enabler
That is what we want for our platform too
I’m not a data rock star yet
The data platform is a service that has to integrate with different organisations and data sources
Interactions with the outside world 70% of the time require dealing with people
Good Service requires good people – people are key
For me good data people display some key personal qualities:
Accuracy (don’t send me a CV with missing full stops and typos)
Consistency and reliability
Evidence of working with data problems (the number of rows do not always determine the complexity of the problem)
The “old skills” are still valuable today
Environments are built from scripts – we can deploy identical client-server environments using a script called with a different variable, plus a vanilla configuration of cloud servers and data services
Coding of key business functions into reusable working patterns – systematic thought processes (we still have to deliver a chain of events even if they now last seconds rather than hours)
SQL skills – it matters little which database you learnt SQL on; the fundamentals of SQL should give an employee an understanding of data warehousing concepts
I look for experience of Kimball or Data Vault schema design (even if not directly stated)
It still requires humans to interpret business processes
Can’t build a data service without a data team
Why is this piece of data here?
Why is there missing data?
What does this piece of data relate to in the business?
What does people friendly mean?
How do we get to the heart of what a customer wants?
Part of our platform has to deliver a semantic layer to the client – this is where we describe and codify data values in business terms
The outcome we are trying to achieve is consistency in business reporting and a centralisation of business logic
We are only able to code this layer if we understand and interact with the business users and match code to meaning – a sketch follows the customer example below
An Example: What is a customer?
Is it a credit card?
An email list member?
A loyalty card?
A gift recipient?
Do they expire?
How are customers counted?
How are they uniquely identifiable?
Other typical scenarios are summarising business activity markers into sales stages or grouping products into categories
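As a minimal sketch of what codifying such rules in a semantic layer can look like (the table and column names here are hypothetical, not an actual client schema):

-- Hypothetical semantic-layer view: one agreed definition of "customer"
create or replace view analytics.customer as
select
    c.customer_id,
    c.email,
    case
        when c.loyalty_card_id is not null then 'Loyalty member'
        when c.credit_card_token is not null then 'Card holder'
        else 'Email list member'
    end as customer_type,
    -- business rule: a customer past their expiry date no longer counts
    iff(c.expiry_date < current_date(), 'Expired', 'Active') as status
from raw.crm_customers c;

Because every report selects from the one agreed view, counts of customers stay consistent across the business.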
IN MY EXPERIENCE:
There is complexity still to manage
Client technical debt
Access to information and third party systems
Multiple data sources, multiple vendors
Provision of data ingestion access points and EC2 servers
Reporting Requirements
We use agile methodologies:
de-risk the complexity
make tasks manageable
Time to value for our clients has improved dramatically with Snowflake
We recently implemented a proof of concept (end-to-end) from extract to load in four days
Previously it would have taken about four months (elapsed time) – hardware, VPCs, software, licences, schema pre-design, etc
Proof of concept uses production data and replicates production functionality, the only limitation was the scope of required outputs
We used to have a long design stage prior to data load, now we spend the time exploring the real data and adapting the design as we go
Typically we can add fields and new sources within a sprint (two weeks)
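To illustrate why this is quick, adding a field can be as small a change as the following sketch (illustrative names, not a real client schema):

-- Surface a newly requested field in the raw table...
alter table raw.orders add column delivery_region varchar;

-- ...and expose it through the reporting view within the same sprint
create or replace view analytics.orders as
select order_id, order_date, delivery_region
from raw.orders;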
New Skills can be taught and learned (many skills transferable)
So many tutorials online, so many great e-learning courses
We are running a Zero to Snowflake course
Like any of us coders who feel we can do pretty much anything – there is no substitute for hands-on experience
Necessity is the mother of invention
Our platform was designed to make the steps we do for all clients repeatable and automated
Data Rock Stars are rare
Teams with a combination of skills, differences of opinion, different backgrounds and domain experience provide the best results
No single point of failure, or anything too esoteric
Domain experience helps speed up insight
Data modelling and SQL are key to the product
Kimball Star Schemas / Data Vault
Understanding join logic
SQL load scripts
Many of you will have these attributes, so starting a journey in Snowflake will not be as hard as you may think
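For anyone new to these terms, here is a minimal sketch of the kind of Kimball-style star schema and join logic meant above (all names are illustrative):

-- A dimension table describing products
create table dim_product (
    product_key  integer,
    product_name varchar,
    category     varchar
);

-- A fact table recording sales events, keyed to the dimension
create table fact_sales (
    date_key    integer,
    product_key integer,        -- joins to dim_product
    quantity    integer,
    net_amount  number(12,2)
);

-- A typical star join: facts aggregated by a dimension attribute
select p.category, sum(f.net_amount) as total_sales
from fact_sales f
join dim_product p on p.product_key = f.product_key
group by p.category;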
TECHNOLOGY is now meeting our expectations
Legacy data pipelines with long batch windows on maxed out limited hardware – fragile and high maintenance, changes difficult
Initial cloud offerings reduced capital expenditure on equipment but still required lots of system administrators
Recent improvements in shared services and containerisation (Docker)
Key elements of the Snowflake technology de-risked the investment
Benefits of the technology easily replicated and shared for different client types
enables fail-fast (development and querying)
lots of different teams can work on the same production data set
the same data can be split between different server groups (no impact across teams)
ability to process data loading/unloading without impacting running queries
zero copy clones and separate compute
time travel
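A minimal sketch of what the last two points look like in practice, assuming a production database called prod_db (the names are illustrative):

-- Zero-copy clone: a full development copy that stores no new data
create database dev_db clone prod_db;

-- Time travel: query a table as it was an hour ago
select count(*)
from prod_db.sales.orders at(offset => -3600);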
We are running a hands-on Zero to Snowflake session November 6th – details at end
There are other MPP databases
In productive use since 2016
TECHNOLOGY:
This architecture diagram illustrates the core concepts
Every user has access to the same data (subject to permissions)
Data is stored once
Teams can use “clones” of production data to carry out development
Different teams can use their own virtual warehouses (compute resources)
The loading warehouse is generally about parallel file ingestion (particularly for legacy sources)
CSV is the quickest
File sizes should be about the same, 10MB-100MB compressed maximum
The NUMBER OF FILES is key
4 cores / 8 threads -> eight files loaded in parallel, one file per thread
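As a sketch of how that ingestion looks in practice (the bucket, stage, and table names are hypothetical, and credentials/storage integration are omitted):

-- Hypothetical external stage over an S3 bucket of compressed CSV extracts
create or replace stage load_stage
    url = 's3://example-bucket/extracts/'
    file_format = (type = csv compression = gzip);

-- COPY ingests the staged files in parallel: a warehouse with 8 threads
-- works through roughly eight similar-sized files at a time
copy into raw.orders
from @load_stage/orders/;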
Adhoc analytics warehouse
Scaled for query time responsiveness
Multiple users, more clusters – resolves concurrency, prevents queued queries
Development can scale up and down the warehouse to test different functions
For a proof of concept, a single small server is adequate for view development
We are running a hands-on Zero to Snowflake session November 6th
Bring a laptop
TECHNOLOGY:
Each virtual warehouse can have from 1 server per cluster (X-Small) to 128 servers per cluster (4X-Large)
Each virtual warehouse cluster can be scaled out identically (up to 10 clusters)
Automated Cluster Scale Out
Single command line scale-up/down
Result Cache persisted for 24 hours – the clock resets each time the results are reused, up to a maximum of 31 days
Pay for only the compute used
As a company we do not have to predict demand but are able to respond to it
We are able to set limits and alerts around usage such that we can be pro-active with running costs
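A sketch of the single-command scaling and cost controls described above (warehouse and monitor names are illustrative; multi-cluster warehouses and resource monitors depend on your Snowflake edition):

-- Scale up for a heavy load window, then back down – one command each way
alter warehouse etl_wh set warehouse_size = 'XLARGE';
alter warehouse etl_wh set warehouse_size = 'SMALL';

-- Scale out for concurrency: extra clusters start only when queries queue
alter warehouse bi_wh set min_cluster_count = 1 max_cluster_count = 4;

-- Limits and alerts around usage, so running costs stay pro-actively managed
create resource monitor monthly_cap with credit_quota = 100
    triggers on 80 percent do notify
             on 100 percent do suspend;
alter warehouse etl_wh set resource_monitor = monthly_cap;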
TECHNOLOGY:
The old ETL processes, where data took ages to load in series and had to be manipulated outside of the database, are over
Load (and reload) all the data, then transform in situ – the Extract Load Transform process (see the sketch after these notes)
Transformation in situ does not require tooling, just a good understanding of SQL
Snowflake allows the parallel loading of data files (streams and other feeds)
The more nodes you have in the virtual server, the more threads you have to load files
Query Result Cache – the cache is part of the Snowflake service and returns previously calculated results
Disk is slow but cheap; the SSD cache is proportional to the number of nodes in the virtual warehouse – it disappears on ramp-down
Tuning – ramp up/ramp down
Ramp-up for parallel ingestion
Ramp out for concurrency – many BI report users
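A minimal sketch of transforming in situ after the load, with illustrative names (here the raw extract lands as semi-structured JSON in a VARIANT column):

-- Raw data lands first, untransformed
create or replace table staging.orders_raw (payload variant);

-- The T of ELT happens inside the database, in plain SQL
create or replace table warehouse.orders as
select
    payload:order_id::integer          as order_id,
    payload:order_date::date           as order_date,
    payload:net_amount::number(12,2)   as net_amount
from staging.orders_raw;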
Snowflake is an enabler
We can ingest data very easily
Bash
Python
Connectors
SQL Scripts and S3
We can rapidly prototype and deploy data models
Develop views in design stage
Share data with customers
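Sharing with customers uses Snowflake secure data sharing; a sketch with hypothetical names (the consumer account identifier would be the client's own):

-- Grant a client account read access to live data without copying it
create share client_share;
grant usage on database analytics_db to share client_share;
grant usage on schema analytics_db.public to share client_share;
grant select on table analytics_db.public.customer_summary to share client_share;
alter share client_share add accounts = client_account;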
As mentioned earlier, we are able to implement Proof of Concept warehouses (end-to-end)
From four months to four days
Data tables can be refined through continual iterations
The scalability and speed allow experimentation and measurement before the outcome has to be fixed
The investigation of data becomes a doing thing rather than a thinking thing – we find issues quicker
Flexibility in modelling
We can use our “old skills” in designing the data model and generating the semantic layer for the client
Notes:
AWS Lambda execution time limit: 15 minutes
EC2 Orchestrated Scripts – CRON / Python / Bash
Apache Airflow
Come and meet our Rock Star
You learn from doing
Other Notes:
Forecasting using auto.arima – candidate ARIMA models are searched through to find the best fit. ARIMA stands for Auto-Regressive Integrated Moving Average