An overview of Snowflake
• Snowflake is a cloud data warehouse provided as SaaS, with full support for ANSI SQL and for both structured and semi-structured data.
• It enables users to create tables and start querying data with minimal administration.
• It offers both the traditional shared-disk and shared-nothing architectures, delivering the best of both.
(Diagram: shared-nothing architecture vs. shared-disk architecture)
• Snowflake provides unlimited storage scalability without refactoring; multiple clusters can read and write shared data, and clusters can be resized instantly with no downtime involved.
• Full ACID transactional consistency across the entire system.
• Logical assets such as servers, buckets, etc. are managed centrally.
Snowflake is supported by 3 different layers:
• Storage, Compute, and Cloud Services
• Snowflake processes queries using the MPP (massively parallel processing) concept: each node holds part of the data locally, while a central data repository stores data that is accessible by all compute nodes.
• The Snowflake architecture consists of 3 layers:
1. Database Storage
2. Query Processing
3. Cloud Services
Database Storage Layer:
• Snowflake organizes data into multiple micro-partitions that are internally optimized and compressed. Data is stored in a columnar format in cloud storage, which works like the shared-disk model and keeps data management simple.
• Compute nodes connect to the storage layer to fetch data for querying, as the storage layer is independent of compute. Because Snowflake is provisioned in the cloud, storage is elastic, and the user pays only per TB stored per month.
Query Processing Layer:
• Snowflake uses virtual warehouses to run queries. A distinguishing feature of Snowflake is that it separates the query processing layer from disk storage.
Cloud Services Layer:
All activities such as authentication, security, metadata management of loaded data, and query optimization are coordinated across this layer.
Benefits of the Cloud Services layer:
• Multi-tenant, transactional, and secure
• Runs in the AWS cloud
• Millions of queries per day over petabytes of data
• Replicated for availability and scalability
• Focused on ease of use and service experience
• A collection of services such as access control, query optimizer, and transaction manager
1. Extract data from Oracle to CSV using SQL*Plus.
2. Perform data type conversion and other transformations.
3. Stage the files to S3.
4. Finally, copy the staged files to Snowflake tables.
Step 1: Code:
-- Turn on the spool
spool spoolfile.txt
select * from dba_table;
spool off
Note: The spool file will not be available until spooling is turned off.
#!/usr/bin/bash
FILE="students.csv"
sqlplus -s user_name/password@oracle_db <<EOF
SET PAGESIZE 35000
SET COLSEP "|"
SET LINESIZE 230
SET FEEDBACK OFF
SPOOL $FILE
SELECT * FROM STUDENTS;
SPOOL OFF
EXIT
EOF

#!/usr/bin/bash
FILE="emp.csv"
sqlplus -s scott/tiger@XE <<EOF
SET PAGESIZE 50000
SET COLSEP ","
SET LINESIZE 200
SET FEEDBACK OFF
SPOOL $FILE
SELECT * FROM EMP;
SPOOL OFF
EXIT
EOF
Step 1.2:
• For an incremental load, we need to generate SQL with a proper condition that selects only the records modified after the last data pull; see the sketch after the query below.
Query:
select * from students
where last_modified_time > last_pull_time
  and last_modified_time <= sys_time;
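A minimal Python sketch of generating such an incremental extract query (the table name, timestamp column, and the bookmark file that persists last_pull_time are illustrative assumptions):

from datetime import datetime, timezone
from pathlib import Path

# Illustrative bookmark file that remembers the last pull time between runs.
BOOKMARK = Path("/tmp/students.last_pull")

def build_incremental_query(table: str, ts_col: str) -> str:
    """Build the extract query for one incremental pull window."""
    last_pull = BOOKMARK.read_text().strip() if BOOKMARK.exists() else "1970-01-01 00:00:00"
    now = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    query = (
        f"SELECT * FROM {table} "
        f"WHERE {ts_col} > TO_TIMESTAMP('{last_pull}', 'YYYY-MM-DD HH24:MI:SS') "
        f"AND {ts_col} <= TO_TIMESTAMP('{now}', 'YYYY-MM-DD HH24:MI:SS')"
    )
    # In a real pipeline, advance the bookmark only after a successful load.
    BOOKMARK.write_text(now)
    return query

print(build_incremental_query("students", "last_modified_time"))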
Step 2: Below are the recommendations for data type conversion when moving from Oracle to Snowflake; a representative subset is shown below.
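Commonly recommended mappings include:

Oracle                 Snowflake
NUMBER                 NUMBER
VARCHAR2 / NVARCHAR2   VARCHAR
CHAR                   CHAR
DATE                   DATE or TIMESTAMP
TIMESTAMP              TIMESTAMP
FLOAT                  FLOAT
CLOB                   VARCHAR
BLOB                   BINARY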
Step 3:
• To load data into Snowflake, the data needs to be uploaded to an S3 location (Steps 1 and 2 cover extracting Oracle data to flat files).
• We need a Snowflake instance that runs on AWS. This instance needs the ability to access the S3 files in AWS.
• This access can be either internal or external, and the process is called staging.
Create an internal stage:
create or replace stage my_oracle_stage
copy_options = (on_error='skip_file')
file_format = (type = 'CSV' field_delimiter = ',' skip_header = 1);
Use the PUT command below to stage files to an internal Snowflake stage:
PUT file://path_to_your_file/your_filename @internal_stage_name
To upload the file items_data.csv from the /tmp/oracle_data/data/ directory to an internal stage named oracle_stage:
put file:///tmp/oracle_data/data/items_data.csv @oracle_stage;
Ref: https://docs.snowflake.net/manuals/sql-reference/sql/put.html
Step 3 (external staging option):
• Snowflake supports any accessible Amazon S3 or Microsoft Azure location as an external staging area. You can create a stage pointing to that location, and data can be loaded directly into the Snowflake table through the stage; there is no need to move the data to an internal stage first.
• To create an external stage pointing to an S3 location, IAM credentials with proper access permissions are required.
If data needs to be decrypted before loading into Snowflake, the proper keys must be provided.
create or replace stage oracle_ext_stage
url='s3://snowflake_oracle/data/load/files/'
credentials=(aws_key_id='1d318jnsonmb5#dgd4rrb3c'
aws_secret_key='aii998nnrcd4kx5y6z')
encryption=(master_key = 'eSxX0jzskjl22bNaaaDuOaO8=');
Once data is extracted from Oracle, it can be uploaded to S3 using the direct upload option or using an AWS SDK in your favourite programming language; Python's boto3 is a popular choice for this, as in the sketch below. Once the data is in S3, an external stage can be created pointing to that location.
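A minimal boto3 upload sketch (the bucket name and file paths are illustrative assumptions; credentials are assumed to come from the environment or an IAM role):

import boto3

# Assumes AWS credentials are available via environment variables,
# ~/.aws/credentials, or an attached IAM role.
s3 = boto3.client("s3")

# Illustrative names; replace with your bucket and extract file.
bucket = "snowflake_oracle"
local_file = "/tmp/oracle_data/data/items_data.csv"
key = "data/load/files/items_data.csv"

# Upload the extracted CSV so the external stage can see it.
s3.upload_file(local_file, bucket, key)
print(f"Uploaded {local_file} to s3://{bucket}/{key}")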
Step 4: Copy staged files to the Snowflake table
• We have extracted data from Oracle, uploaded it to an S3 location, and created an external Snowflake stage pointing to that location. The next step is to copy the data into the table. The command used to do this is COPY INTO.
Note: To execute the COPY INTO command, compute resources in a Snowflake virtual warehouse are required, and your Snowflake credits will be utilized.
• To load from a named internal stage:
copy into oracle_table
from @oracle_stage;
• To load from the external stage (only one file is specified):
copy into my_ext_stage_table
from @oracle_ext_stage/tutorials/dataloading/items_ext.csv;
• To copy directly from an external location without creating a stage:
copy into oracle_table
from 's3://mybucket/oracle_snow/data/files'
credentials=(aws_key_id='$AWS_ACCESS_KEY_ID'
aws_secret_key='$AWS_SECRET_ACCESS_KEY')
encryption=(master_key = 'eSxX009jhh76jkIuLPH5r4BD09wOaO8=')
file_format = (format_name = csv_format);
Files can also be specified using patterns:
copy into oracle_pattern_table
from @oracle_stage
file_format = (type = 'TSV')
pattern='.*/.*/.*[.]csv[.]gz';
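These COPY INTO statements can also be issued programmatically. A minimal sketch using the Snowflake Python connector (the account, credentials, and warehouse/database names are illustrative assumptions):

import snowflake.connector  # pip install snowflake-connector-python

# Illustrative connection parameters; substitute your own account details.
conn = snowflake.connector.connect(
    user="etl_user",
    password="********",
    account="your_account_identifier",
    warehouse="etl_wh",   # COPY INTO needs a running warehouse and consumes credits
    database="oracle_db",
    schema="public",
)
try:
    cur = conn.cursor()
    # Load all files currently staged at @oracle_stage into the target table.
    cur.execute("copy into oracle_table from @oracle_stage")
    for row in cur.fetchall():
        print(row)  # one status row per loaded file
finally:
    conn.close()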
Step 4.1: Update the Snowflake table
The basic idea is to load the incrementally extracted data into an intermediate or temporary (landing) table and then modify records in the final table with the data in the intermediate table. The three methods below are generally used for this.
1. Update the rows in the target table with new data (with the same keys), then insert the new rows from the intermediate or landing table that are not yet in the final table:
UPDATE oracle_target_table t
SET t.value = l.value
FROM landing_delta_table l
WHERE t.id = l.id;
INSERT INTO oracle_target_table (id, value)
SELECT id, value FROM landing_delta_table
WHERE id NOT IN (SELECT id FROM oracle_target_table);
2. Delete the rows from the target table that are also in the landing table, then insert all rows from the landing table into the final table. The final table will then have the latest data without duplicates:
DELETE FROM oracle_target_table
WHERE id IN (SELECT id FROM landing_table);
INSERT INTO oracle_target_table (id, value)
SELECT id, value FROM landing_table;
3. MERGE statement: the standard SQL MERGE statement combines inserts and updates. It is used to apply the changes in the landing table to the target table with one SQL statement:
MERGE INTO oracle_target_table t1
USING landing_delta_table t2 ON t1.id = t2.id
WHEN MATCHED THEN UPDATE SET value = t2.value
WHEN NOT MATCHED THEN INSERT (id, value) VALUES (t2.id, t2.value);
This approach works when you have a comfortable project timeline and a pool of experienced engineers who can build and maintain the pipeline. However, it comes with considerable coding and maintenance overhead.
Ref: https://hevodata.com/blog/oracle-to-snowflake-etl/
Q&A
https://www.analytics.today/blog/top-10-reasons-snowflake-rocks
https://www.g2.com/reports/grid-report-for-data-warehouse-fall-
2019?featured=snowflake&secure%5Bgated_consumer%5D=0043e810-90c1-4257-a24a-
f7a3b7e6b1c3&secure%5Btoken%5D=04647245837d1e63f5d46e942153e0beed97b18b25f466
db19d0c54901467747&utm_campaign=gate-768549
Q&A
