SlideShare una empresa de Scribd logo
1 de 80
Descargar para leer sin conexión
premiseo.com dataredkite.com
26/02/2021 Meetup Azure Lille 1
Azure Data
dataredkite.com
premiseo.com
Who are we ?
26/02/2021 2
I'm a data and cloud Architect and Spark lover.
I worked many years as an Oracle consultant and
expert, and now I work with Cloud solutions devoted to
solve complex problems with high volumes of data.
I am a Data Analyst & Solution Architect indepedent -
☁️ MCSE, Cosmos DB & Delta lover.
I developed my skills through various clients' projects,
teaching at the University and personal proof of
concepts.
I’m also the Co-Founder of DataRedkite, a product which
can quickly give to its user a good management of data
in Microsoft Azure DataLake.
Laurent Leturgez Alexandre Bergere
Meetup Azure Lille
dataredkite.com
premiseo.com
Summary
26/02/2021 Meetup Azure Lille 3
Relational Databases NoSQL Databases Big Data Storage
Data Big Data Streaming
Storage :
Compute :
premiseo.com dataredkite.com
Storage
4
premiseo.com dataredkite.com
26/02/2021 Meetup Azure Lille 5
Relational Databases
Managed relational SQL Database
as a service
Azure SQL Database
Managed MariaDB database
service for app developers
Azure Database for MariaDB
Managed MySQL database service
for app developers
Azure Database for MySQL
Managed Postgres database
service for app developers
Azure Database for PostGres
premiseo.com dataredkite.com
26/02/2021 Meetup Azure Lille 6
Relational Databases
Managed relational SQL Database
as a service
Azure SQL Database
Managed MySQL database service
for app developers
Azure Database for MySQL
Managed MariaDB database
service for app developers
Azure Database for MariaDB
Managed Postgres database
service for app developers
Azure Database for PostGres
premiseo.com dataredkite.com
26/02/2021 Meetup Azure Lille 7
Relational Databases
Managed relational SQL Database
as a service
Azure SQL Database Azure SQL Database
dataredkite.com
premiseo.com
Azure SQL Database
27/04/2021 Meetup Azure Lille 8
• Azure SQL
• SQL Server Paas service
• Managed upgrades, patches, backups and monitoring
• Latest Stable version of SQL Server
• 99,99% availability
• Deployment model
• Single Database : database runs on non shared resources
• Elastic Pool : database runs with a collection of databases that share set of resources at a
predictable price
dataredkite.com
premiseo.com
Azure SQL Database
27/04/2021 Meetup Azure Lille 9
• Azure SQL
• Purchasing model
• DTU (Database Transaction Unit) : https://docs.microsoft.com/en-us/azure/azure-sql/database/service-tiers-
dtu
• Basic tier
• Standard Tier
• Premium Tier
• vCore model
• Serverless
• Service Tier
• General Purpose (vCore) / Standard (DTU) : Common workloads
• Business Critical (vCore) / Premium (DTU) : High transaction and availability / low latency IO
• HyperScale (vCore) :
• Up to 100Tb Database
• Rapid Scale up (compute resources)
• Rapid Scale out (read only nodes : read workload / hot-standby)
dataredkite.com
premiseo.com
Azure SQL Database
27/04/2021 Meetup Azure Lille 10
• Azure SQL Managed Instance
• Features
• Paas platform for lift and shift at scale
• Broadest SQL Server engine compatibility (network integration, features etc.)
• With perservation of all Paas capabilities (patching, updates, backups, HA etc.)
• vCore purchase model only
• BYOL available
• SQL Virtual Machine
• SQL Server deployment on VM (Linux and Windows)
• Can choice SQL Server version
• From 2008 R2
• Up to 2019
dataredkite.com
premiseo.com
Azure SQL Database
27/04/2021 Meetup Azure Lille 11
Azure SQL Database
Managed Instance
Instance scoped model with
high compatibility to SQL Server
Best for modernisation at scale
with low cost effort (lift & shift)
Single
Standalone managed database
for predictable and stable
workloads
Elastic Pool
Shared resources model :
multitenant
premiseo.com dataredkite.com
26/02/2021 Meetup Azure Lille 12
Relational Databases
Managed relational SQL Database
as a service
Azure SQL Database
Managed MySQL database service
for app developers
Azure Database for MySQL
Managed MariaDB database
service for app developers
Azure Database for MariaDB
Managed Postgres database
service for app developers
Azure Database for PostGre
dataredkite.com
premiseo.com
Azure Database for PostgreSQL
27/04/2021 Meetup Azure Lille 13
• Paas Service for PostgreSQL
• Runs on Windows
• Single Server
• v9.5 to 11
• Up to 64 vCores depending on SKU (https://docs.microsoft.com/en-us/azure/postgresql/concepts-pricing-
tiers)
• Up to 2 for Basic SKU
• Up to 64 for General Purpose SKU
• Up to 32 for Memory Optimized SKU
• Bunch of PG Extensions available
• Automated Backup (retention up to 35days)
• Backup frequency and backup types depend on database size
• Geo-redundant backup option (General Purpose & Memory Optimized)
dataredkite.com
premiseo.com
Azure Database for PostgreSQL
27/04/2021 Meetup Azure Lille 14
• Paas Service for PostgreSQL
• HyperScale (Citus)
• High performance and analytical workloads beyond 100Gb
• Hyperscale delivers
• Horizontal scaling across multiple machine (with Sharding)
• Query parallelization across these servers
• High performance for analytics
• Based on server groups
• Design approach required for table distribution and performance
• Distributed tables (based on distribution column)
• Reference tables (content concentrated into a single shard replicated on every worker node)
• Local tables (ordinary unsharded tables. Perfect for small tables not involded into joins)
• Automated backup through storage snapshots
dataredkite.com
premiseo.com
Azure Database for PostgreSQL
27/04/2021 Meetup Azure Lille 15
• Paas Service for PostgreSQL
• Flexible Server (Preview)
• Automated patching
• Automatic backups
• Performance adjustment in three switchable compute tiers : Burstable, GP, Memory Optimized
High Availability Zone Redundant HA (Optional)
premiseo.com dataredkite.com
26/02/2021 Meetup Azure Lille 16
Relational Databases
Managed relational SQL Database
as a service
Azure SQL Database
Managed MariaDB database
service for app developers
Azure Database for MariaDB
Managed Postgres database
service for app developers
Azure Database for PostGre
Managed MySQL database service
for app developers
Azure Database for MySQL
dataredkite.com
premiseo.com
Azure Database for MariaDB
27/04/2021 Meetup Azure Lille 17
• Paas Service for MariaDB
• Runs on Windows
• Single Server
• V10.2 and 10.3
• Up to 64 vCores depending on SKU (https://docs.microsoft.com/en-us/azure/mariadb/concepts-pricing-tiers)
• Up to 2 for Basic SKU
• Up to 64 for General Purpose SKU
• Up to 32 for Memory Optimized SKU
• Automated Backup (retention up to 35days)
• Backup frequency and backup types depend on database size
• Geo-redundant backup option (General Purpose & Memory Optimized)
premiseo.com dataredkite.com
26/02/2021 Meetup Azure Lille 18
Relational Databases
Managed relational SQL Database
as a service
Azure SQL Database
Managed MariaDB database
service for app developers
Azure Database for MariaDB
Managed Postgres database
service for app developers
Azure Database for PostGre
Managed MySQL database service
for app developers
Azure Database for MySQL
dataredkite.com
premiseo.com
Azure Database for MySQL
27/04/2021 Meetup Azure Lille 19
• Paas Service for MySQL
• Runs on Windows
• Single Server
• V5.6, 5.7, and 8.0
• Up to 64 vCores depending on SKU (https://docs.microsoft.com/en-us/azure/mysql/concepts-pricing-tiers)
• Up to 2 for Basic SKU
• Up to 64 for General Purpose SKU
• Up to 32 for Memory Optimized SKU
• Automated Backup (retention up to 35days)
• Backup frequency and backup types depend on database size
• Geo-redundant backup option (General Purpose & Memory Optimized)
dataredkite.com
premiseo.com
Azure Database for MySQL
27/04/2021 Meetup Azure Lille 20
• Paas Service for MySQL
Flexible Server (Preview)
• V5.7
• Automated patching
• Automatic backups
• Performance adjustment in three switchable compute tiers : Burstable, GP, Memory Optimized
• Network Isolation
• Private Access through Vnet integration
• Public Access
dataredkite.com
premiseo.com
Azure Database for MySQL
27/04/2021 Meetup Azure Lille 21
• Paas Service for MySQL
Flexible Server (Preview)
High Availability Zone Redundant HA (Optional)
premiseo.com dataredkite.com
26/02/2021 Meetup Azure Lille 22
NOSQL Databases
Globally distributed, multi-model
database for any scale
Azure Cosmos DB
dataredkite.com
premiseo.com
Azure Cosmos DB
26/02/2021 23
A globally distributed, massively scalable, multi-model database service
Azure Cosmos DB
o SQL API
o MongoDB API
o Cassandra API
o Gremlin API
o Table API
dataredkite.com
premiseo.com
Azure Cosmos DB
26/02/2021 24
Throughput
Cosmic Notes
premiseo.com dataredkite.com
26/02/2021 Meetup Azure Lille 25
Big Data
Storage
REST-based object storage for
unstructured data
Storage Account
Massively scalable, secure data
lake functionality built on Azure
Blob Storage
Azure Data Lake Storage
premiseo.com dataredkite.com
26/02/2021 Meetup Azure Lille 26
Big Data
Storage
REST-based object storage for
unstructured data
Storage Account
Massively scalable, secure data
lake functionality built on Azure
Blob Storage
Azure Data Lake Storage
dataredkite.com
premiseo.com
Storage Account
26/02/2021 27
o Azure Blobs : A scalable object store for text and binary data
o Azure Files : Managed file shares for cloud or on-premises deployments
o Azure Queues : A messaging store for reliable messaging between application components
o Azure Tables : A NoSQL store for no-schema storage of structured data
Azure Storage accounts are the base storage type within Azure. Azure Storage offers a very scalable object store for data
objects and file system services in the cloud. It can also provide a messaging store for reliable messaging, or it can act as a
NoSQL store.
Azure selected four of these data services and placed them together under the name Azure Storage. The four services are
Azure Blobs, Azure Files, Azure Queues, and Azure Tables. The following illustration shows the elements of Azure Storage
dataredkite.com
premiseo.com
Storage Account
26/02/2021 28
Type of Storage Account
Storage account type Services Redundancy options
General-purpose V2 Basic storage account type for blobs, files, queues, and tables. Recommended
for most scenarios using Azure Storage.
LRS, GRS, RA-GRS, ZRS, GZRS,
RA-GZRS
General-purpose V1 Legacy account type for blobs, files, queues, and tables. Use general-purpose
v2 accounts instead when possible.
LRS, GRS, RA-GRS
BlockBlobStorage Storage accounts with premium performance characteristics for block blobs
and append blobs. Recommended for scenarios with high transactions rates, or
scenarios that use smaller objects or require consistently low storage latency.
LRS, ZRS
FileStorage Files-only storage accounts with premium performance characteristics.
Recommended for enterprise or high performance scale applications.
LRS, ZRS
BlobStorage Legacy Blob-only storage accounts. Use general-purpose v2 accounts instead
when possible.
LRS, GRS, RA-GRS
dataredkite.com
premiseo.com
Replication Options
27/04/2021 29
dataredkite.com
premiseo.com
Replication Strategy
27/04/2021 30
premiseo.com dataredkite.com
26/02/2021 Meetup Azure Lille 31
Big Data
Storage
REST-based object storage for
unstructured data
Storage Account
Massively scalable, secure data
lake functionality built on Azure
Blob Storage
Azure Data Lake Storage
dataredkite.com
premiseo.com
Azure Datalake Store
26/02/2021 32
Azure Data Lake Storage is a Hadoop-compatible data repository that can store any size or type of data. This storage
service is available as Generation 1 (Gen1) or Generation 2 (Gen2).
Key features of Data Lake Storage:
o Unlimited scalability
o Hadoop compatibility
o Security support for both access control lists (ACLs) & RBAC (for Gen 2 only)
o POSIX compliance
o An optimized Azure Blob File System (ABFS) driver that's designed for big-data analytics
o Zone-redundant storage
o Geo-redundant storage
Azure Datalake Gen 1 Azure Datalake Gen 2
dataredkite.com
premiseo.com
Choose a storage solution on Azure
26/02/2021 33
Data classification Operations Latency & throughput Transactional support Recommended service
Product catalog data Semi-structured because of
the need to extend or modify
the schema for new products
o Customers require a high
number of read operations,
with the ability to query on
many fields within the
database.
o The business requires a
high number of write
operations to track the
constantly changing
inventory.
High throughput and low
latency
Required Azure Cosmos DB
Photos and videos Unstructured o Only need to be retrieved
by ID.
o Customers require a high
number of read operations
with low latency.
o Creates and updates will be
somewhat infrequent and
can have higher latency
than read operations.
Retrievals by ID need to
support low latency and high
throughput. Creates and
updates can have higher
latency than read operations.
Not required Azure Blob storage
Business data Structured Read-only, complex analytical
queries across multiple
databases
Some latency in the results is
expected based on the
complex nature of the queries
Required Azure SQL Database
Azure Database for MariaDB
Azure Database for PostGre
Azure Database for MySQL
premiseo.com dataredkite.com
Compute
34
premiseo.com dataredkite.com
26/02/2021 Meetup Azure Lille 35
Data
Compute
Managed data-integration solution
Data Factory
Process events with serverless
code
Azure Functions
premiseo.com dataredkite.com
26/02/2021 Meetup Azure Lille 36
Data
Compute
Managed data-integration solution
Data Factory
Process events with serverless
code
Azure Functions
dataredkite.com
premiseo.com
Azure Function
37
Azure Functions is the serverless compute service from Microsoft. Functions are event-driven: each function defines a
trigger — the exact definition of the event source, for instance, the name of a storage queue.
Uses cases:
If you want to... then...
Build a web API Implement an endpoint for your web applications using the HTTP trigger
Process file uploads Run code when a file is uploaded or changed in blob storage
Build a serverless workflow Chain a series of functions together using durable functions
Respond to database changes Run custom logic when a document is created or updated in Cosmos DB
Run scheduled tasks Execute code at set times
Create reliable message queue systems Process message queues using Queue Storage, Service Bus, or Event Hubs
dataredkite.com
premiseo.com
Azure Function
38
Consumption Plan Functions
Consumption Plan (B1, B2, B3, S1, S2, S3
Scale automatically and only pay for compute resources when your functions are running. On
the Consumption plan, instances of the Functions host will be dynamically added and
removed based on the number of incoming events.
Premium plan (P1v2, P2v2, P3v3)
While automatically scaling based on demand, use prewarmed workers to run applications
with no delay after being idle, run on more powerful instances and connect to VNETs.
Azure App Service plan
Run Functions within an App Service plan at regular App Service plan rates. Good fit for long-
running operations, as well as when more predictive scaling and costs are required.
Azure Functions hosting options : Azure Plan
dataredkite.com
premiseo.com
27/04/2021 39
Durable Functions is a library that brings workflow orchestration abstractions to Azure Functions. It introduces a number of idioms and tools
to define stateful, potentially long-running operations, and manages a lot of mechanics of reliable communication and state management
behind the scenes.
Log of events in the course of orchestrator
progression
3 steps of a workflow executed in sequence
https://medium.com/hackernoon/making-sense-of-azure-durable-functions-
645ecb3c1d58
Azure Function
Azure Durable Functions
premiseo.com dataredkite.com
26/02/2021 Meetup Azure Lille 40
Data
Compute
Managed data-integration solution
Data Factory
Process events with serverless
code
Azure Functions
dataredkite.com
premiseo.com
Azure Data Factory
27/04/2021 Meetup Azure Lille 41
• Serverless Data Integration service
• Data Pipeline : logical group of activities
• Data Flow : Data Transformation activity
• Data Copy : Data Transfer activity
• SSIS Integration
• Git integration
dataredkite.com
premiseo.com
Azure Data Factory
27/04/2021 Meetup Azure Lille 42
• Serverless Data Integration service
• Job scheduling
• Automatically through internal Scheduler
• Manually
• SDK : .NET, Python
• REST API
• PowerShell
dataredkite.com
premiseo.com
Azure Data Factory
27/04/2021 Meetup Azure Lille 43
• Serverless Data Integration service
• Integration runtime
• Compute infrastructure used by ADF to provide data integration
• Azure : Serverless
• Self Hosted : Onprem or Azure Virtual Machine (Windows)
• SSIS
Activity Features
Azure Data Flow
Data Copy
Dispatch Activity (HDI, Databricks, SQL …)
Cloud to Cloud data transfer/flows
Self-Hosted Data Flow
Data Copy
Dispatch Activity (HDI, Databricks, SQL …)
OnPrem or Virtual Machine deployment (Windows)
OnPrem <-> Cloud data transfer/flows
When connectors are not available
SSIS SSIS Package execution Private or public Network
premiseo.com dataredkite.com
26/02/2021 Meetup Azure Lille 44
Big Data
Compute
Fast, easy, and collaborative
Apache Spark-based analytics
platform
Azure Databricks
HDInsight supports the latest open
source projects from the Apache
Hadoop and Spark ecosystems.
Azure HDInsight
Managed Enterprise
Datawarehouse and BigData
Analytics service
Azure Synapse Analytics
premiseo.com dataredkite.com
26/02/2021 Meetup Azure Lille 45
Big Data
Compute
Fast, easy, and collaborative
Apache Spark-based analytics
platform
Azure Databricks
HDInsight supports the latest open
source projects from the Apache
Hadoop and Spark ecosystems.
Azure HDInsight
Managed Enterprise
Datawarehouse and BigData
Analytics service
Azure Synapse Analytics
dataredkite.com
premiseo.com
Azure Databricks
26/02/2021 46
dataredkite.com
premiseo.com
Azure Databricks
26/02/2021 47
Azure Databricks is a data analytics platform optimized for the Microsoft Azure cloud services platform. Azure Databricks
offers two environments for developing data intensive applications:
o Azure Databricks Workspace: provides an interactive workspace that enables collaboration between data engineers,
data scientists, and machine learning engineers.
o Azure Databricks SQL Analytics: provides an easy-to-use platform for analysts who want to run SQL queries on their
data lake, create multiple visualization types to explore query results from different perspectives, and build and share
dashboards.
premiseo.com dataredkite.com
26/02/2021 Meetup Azure Lille 48
Big Data
Compute
Fast, easy, and collaborative
Apache Spark-based analytics
platform
Azure Databricks
HDInsight supports the latest open
source projects from the Apache
Hadoop and Spark ecosystems.
Azure HDInsight
Managed Enterprise
Datawarehouse and BigData
Analytics service
Azure Synapse Analytics
dataredkite.com
premiseo.com
Azure HDInsights
27/04/2021 Meetup Azure Lille 49
• Managed Hadoop distribution for Azure
• Based on Cloudera Hortonworks hadoop distribution
• Comes in various flavours / shapes (VM shapes and number)
• Hadoop : General purpose (HDFS, Yarn, MapReduce, Hive, Pig, Sqoop, Oozie)
• Spark
• Kafka
• HBase
• Hive / LLAP (Interactive Query)
• Storm (Stream processing)
• ML Services with R
dataredkite.com
premiseo.com
Azure HDInsights
27/04/2021 Meetup Azure Lille 50
• At least one Storage account mandatory (for libs and binaries)
• External Metastores available for Ambari, Hive and Oozie
• HDInsights architecture
premiseo.com dataredkite.com
26/02/2021 Meetup Azure Lille 51
Big Data
Compute
Fast, easy, and collaborative
Apache Spark-based analytics
platform
Azure Databricks
HDInsight supports the latest open
source projects from the Apache
Hadoop and Spark ecosystems.
Azure HDInsigth
Managed Enterprise
Datawarehouse and BigData
Analytics service
Azure Synapse Analytics
dataredkite.com
premiseo.com
Azure Synapse Analytics
27/04/2021 Meetup Azure Lille 52
Stream Analytics
Data Factory
Data Lake
Modern Analytics
MPP
Datawarehouse
dataredkite.com
premiseo.com
Azure Synapse Analytics
27/04/2021 Meetup Azure Lille 53
MPP
Datawarehou
se
Choice of language (T-SQL, Spark
SQL, Python, Scala, .Net)
Analytics ready (Analysis Services,
Power BI)
Data Science and AI Ready (Azure
Machine Learning integration)
dataredkite.com
premiseo.com
Azure Synapse Analytics
27/04/2021 Meetup Azure Lille 54
Synapse Analytics
• Sample Use Case : Pure Business Intelligence !
dataredkite.com
premiseo.com
Azure Synapse Analytics
27/04/2021 Meetup Azure Lille 55
• Not for small database (Usually > 1Tb)
• Cost Model
• Synapse Provisioned
• T-SQL Pool with DWU (Datawarehouse Units)
• Storage (Geo redundant option)
• Synapse Serverless
• Spark Pools
• Synapse Pipeline
dataredkite.com
premiseo.com
Azure Synapse Analytics
27/04/2021 Meetup Azure Lille 56
• Architecture
• DMS (Data Movement Service)
• Used for Data Colocation
• Key point: Data Partitioning
and Data Distribution
dataredkite.com
premiseo.com
Azure Synapse Analytics
27/04/2021 Meetup Azure Lille 57
• Hash distributed table • Replicated Table • Round Robin
distributed Table
• Example
• Dimension to Fact table join
premiseo.com dataredkite.com
26/02/2021 Meetup Azure Lille 58
Big Data
Streaming
Real-time data stream processing
from millions of IoT devices
Azure Stream Analytics
Connect, monitor and manage
billions of IoT assets
Azure IoT Hub
Real-time data stream with Kafka
Azure HDInsigth & Kafka
Use Spark Streaming with
Databricks
Spark Streaming with Databricks
premiseo.com dataredkite.com
26/02/2021 Meetup Azure Lille 59
Big Data
Streaming
Real-time data stream processing
from millions of IoT devices
Azure Stream Analytics
Connect, monitor and manage
billions of IoT assets
Azure IoT Hub
Real-time data stream with Kafka
Azure HDInsigth & Kafka
Use Spark Streaming with
Databricks
Spark Streaming with Databricks
dataredkite.com
premiseo.com
Azure Streaming Analytics
26/02/2021 60
dataredkite.com
premiseo.com
Azure Streaming Analytics
26/02/2021 61
o Azure Stream Analytics supports user-defined functions (UDF) or user-defined aggregates (UDA) in JavaScript for cloud jobs and C# for IoT
Edge jobs
UDFs, UDAs, and custom deserializers:
o Analyze real-time telemetry streams from IoT devices
o Web logs/clickstream analytics
o Geospatial analytics for fleet management and driverless vehicles
o Remote monitoring and predictive maintenance of high value assets
o Real-time analytics on Point of Sale data for inventory control and anomaly detection
Examples scenarios:
premiseo.com dataredkite.com
26/02/2021 Meetup Azure Lille 62
Big Data
Streaming
Real-time data stream processing
from millions of IoT devices
Azure Stream Analytics
Connect, monitor and manage
billions of IoT assets
Azure IoT Hub
Real-time data stream with Kafka
Azure HDInsigth & Kafka
Use Spark Streaming with
Databricks
Spark Streaming with Databricks
dataredkite.com
premiseo.com
Azure Iot Hub
63
Azure IoT Hub :
o The cloud gateway that connects IoT devices to gather data and drive business insights and automation.
o The big data streaming service of Azure. It is designed for high throughput data streaming scenarios where customers
may send billions of requests per day.
o Bi-directional communication capabilities
dataredkite.com
premiseo.com
Iot Hub or Event Hubs
64
IoT Hub was developed to address the unique requirements of connecting IoT devices to the Azure cloud while Event Hubs
was designed for big data streaming. Microsoft recommends using Azure IoT Hub to connect IoT devices to Azure.
IoT Capability IoT Hub standard tier IoT Hub basic tier Event Hubs
Device-to-cloud messaging
Protocols: HTTPS, AMQP, AMQP over webSockets
Protocols: MQTT, MQTT over webSockets
Per-device identity
File upload from devices
Device Provisioning Service
Cloud-to-device messaging
Device twin and device management
Device streams (preview)
IoT Edge
premiseo.com dataredkite.com
26/02/2021 Meetup Azure Lille 65
Big Data
Streaming
Real-time data stream processing
from millions of IoT devices
Azure Stream Analytics
Connect, monitor and manage
billions of IoT assets
Azure IoT Hub
Real-time data stream with Kafka
Azure HDInsigth & Kafka
Connect, monitor and manage
billions of IoT assets
Spark Streaming with Databricks
dataredkite.com
premiseo.com
Apache Kafka on HDInsight architecture
27/04/2021 Meetup Azure Lille 66
premiseo.com dataredkite.com
26/02/2021 Meetup Azure Lille 67
Big Data
Streaming
Real-time data stream processing
from millions of IoT devices
Azure Stream Analytics
Connect, monitor and manage
billions of IoT assets
Azure IoT Hub
Real-time data stream with Kafka
Azure HDInsigth & Kafka
Use Spark Streaming with
Databricks
Spark Streaming with Databricks
dataredkite.com
premiseo.com
Azure Databricks
26/02/2021 68
o Apache Spark Streaming is a scalable fault-tolerant streaming processing system that natively supports both batch and
streaming workloads.
o Spark Streaming is an extension of the core Spark API
premiseo.com dataredkite.com
26/02/2021 Meetup Azure Lille 69
Data Tools
dataredkite.com
premiseo.com
Azure Data Studio
26/02/2021 70
Azure Data Studio is a cross-platform database tool that you can run on Windows, macOS, and Linux. You'll use it to
connect to SQL Data Warehouse and Azure SQL Database.
Previously released under the preview name SQL Operations Studio, Azure Data Studio offers a modern editor experience
with IntelliSense, code snippets, source control integration, and an integrated terminal. It is engineered with the data
platform user in mind, with built in charting of query result sets and customizable dashboards.
dataredkite.com
premiseo.com
Storage Explorer
26/02/2021 71
Begin by downloading and installing Storage Explorer. You can use Storage Explorer to do several operations against data
in your Azure Storage account and data lake:
o Upload files or folders from your local computer into Azure Storage.
o Download cloud-based data to your local computer.
o Copy or move files and folders around in the storage account.
o Delete data from the storage account.
dataredkite.com
premiseo.com
Visual Studio Code
26/02/2021 72
Visual Studio Code is a lightweight source code editor which runs on your desktop and is available for Windows, macOS
and Linux. It comes with built-in support for JavaScript, TypeScript and Node.js and has a rich ecosystem of extensions for
other languages (such as C++, C#, Java, Python, PHP, Go) and runtimes (such as .NET and Unity).
premiseo.com dataredkite.com
26/02/2021 Meetup Azure Lille 73
Data Migration Tools
dataredkite.com
premiseo.com
Summary
26/02/2021 74
Scenario Some recommended solutions
Disaster Recovery Azure geo-redundant backups
Read Scale Use read-only replicas to load balance read-only query
workloads (preview)
ETL (OLTP to OLAP) Azure Data Factory or SQL Server Integration Services or
Databricks
Migration from on-premises SQL Server to Azure SQL
Database
Azure Database Migration Service
Kept up-to-date across several Azure SQL databases or SQL
Server database
Azure SQL Data Sync
Detecting compatibility issues that can impact database
functionality in your new version of SQL Server or Azure SQL
Database
Data Migration Assistant (DMA)
premiseo.com dataredkite.com
26/02/2021 Meetup Azure Lille 75
Resources
dataredkite.com
premiseo.com
Azure charts
26/02/2021 76
https://azurecharts.com/
premiseo.com dataredkite.com
26/02/2021 77
Just few sources in Microsoft Learn:
o Azure for the Data Engineer
o Store data in Azure
o Work with relational data in Azure
o Large Scale Data Processing with Azure Data Lake Storage Gen2
o Implement a Data Streaming Solution with Azure Streaming Analytics
o Implement a Data Warehouse with Azure SQL Data Warehouse
Sources
dataredkite.com
premiseo.com
Fill the form
78
https://forms.office.com/Pages/ResponsePage.as
px?id=M3s0akU8nUyLePs4Zpn6Tp_2uFsS8cJJsHCS
wweCY5JUNVlMMllQNU4yRUVVWjFEOU5GVVc2S
VU3Si4u
Your turn !
premiseo.com dataredkite.com
26/02/2021 Meetup Azure Lille 79
Fast, easy, and collaborative
Apache Spark-based analytics
platform
Azure Databricks
Next Session: Azure
Databricks
dataredkite.com
premiseo.com
Thank you
26/02/2021 80
Meetup Azure Lille
dataredkite.com
https://premiseo.com/

Más contenido relacionado

La actualidad más candente

Azure intelligent edge solutions overview
Azure intelligent edge solutions overviewAzure intelligent edge solutions overview
Azure intelligent edge solutions overviewCenk Ersoy
 
What are the Business Benefits of Microsoft Azure
What are the Business Benefits of Microsoft AzureWhat are the Business Benefits of Microsoft Azure
What are the Business Benefits of Microsoft AzureChris Roche
 
Cassandra at eBay - Cassandra Summit 2013
Cassandra at eBay - Cassandra Summit 2013Cassandra at eBay - Cassandra Summit 2013
Cassandra at eBay - Cassandra Summit 2013Jay Patel
 
Azure Storage
Azure StorageAzure Storage
Azure StorageMustafa
 
ITCamp 2018 - Thomas Maurer - Azure Stack - Everything you need to know!
ITCamp 2018 - Thomas Maurer - Azure Stack - Everything you need to know!ITCamp 2018 - Thomas Maurer - Azure Stack - Everything you need to know!
ITCamp 2018 - Thomas Maurer - Azure Stack - Everything you need to know!ITCamp
 
Data saturday Oslo Azure Purview Erwin de Kreuk
Data saturday Oslo Azure Purview Erwin de KreukData saturday Oslo Azure Purview Erwin de Kreuk
Data saturday Oslo Azure Purview Erwin de KreukErwin de Kreuk
 
Google Cloud Storage | Google Cloud Platform Tutorial | Google Cloud Architec...
Google Cloud Storage | Google Cloud Platform Tutorial | Google Cloud Architec...Google Cloud Storage | Google Cloud Platform Tutorial | Google Cloud Architec...
Google Cloud Storage | Google Cloud Platform Tutorial | Google Cloud Architec...Edureka!
 
Mastering azure devOps - Dot Net Tricks
Mastering azure devOps - Dot Net TricksMastering azure devOps - Dot Net Tricks
Mastering azure devOps - Dot Net TricksGaurav Singh
 
AWS for the Data Professional
AWS for the Data ProfessionalAWS for the Data Professional
AWS for the Data ProfessionalLynn Langit
 
Benefits of the Azure cloud
Benefits of the Azure cloudBenefits of the Azure cloud
Benefits of the Azure cloudJames Serra
 
Tom Grey - Google Cloud Platform
Tom Grey - Google Cloud PlatformTom Grey - Google Cloud Platform
Tom Grey - Google Cloud PlatformFondazione CUOA
 
Citrix on Azure
Citrix on AzureCitrix on Azure
Citrix on AzureMustafa
 
Azure Stack - Azure in your own Data Center
Azure Stack - Azure in your own Data CenterAzure Stack - Azure in your own Data Center
Azure Stack - Azure in your own Data CenterAdnan Hashmi
 
Introduction to Microsoft Azure
Introduction to Microsoft AzureIntroduction to Microsoft Azure
Introduction to Microsoft AzureGuy Barrette
 
Architecting Enterprise Applications In The Cloud
Architecting Enterprise Applications In The CloudArchitecting Enterprise Applications In The Cloud
Architecting Enterprise Applications In The CloudAmazon Web Services
 
Building Hybrid Cloud Apps with Azure and Azure stack
Building Hybrid Cloud Apps with Azure and Azure stackBuilding Hybrid Cloud Apps with Azure and Azure stack
Building Hybrid Cloud Apps with Azure and Azure stackWinWire Technologies Inc
 

La actualidad más candente (20)

Ppt on cloud service
Ppt on cloud servicePpt on cloud service
Ppt on cloud service
 
Azure intelligent edge solutions overview
Azure intelligent edge solutions overviewAzure intelligent edge solutions overview
Azure intelligent edge solutions overview
 
What are the Business Benefits of Microsoft Azure
What are the Business Benefits of Microsoft AzureWhat are the Business Benefits of Microsoft Azure
What are the Business Benefits of Microsoft Azure
 
Cassandra at eBay - Cassandra Summit 2013
Cassandra at eBay - Cassandra Summit 2013Cassandra at eBay - Cassandra Summit 2013
Cassandra at eBay - Cassandra Summit 2013
 
Google Cloud Platform
Google Cloud Platform Google Cloud Platform
Google Cloud Platform
 
Azure Storage
Azure StorageAzure Storage
Azure Storage
 
ITCamp 2018 - Thomas Maurer - Azure Stack - Everything you need to know!
ITCamp 2018 - Thomas Maurer - Azure Stack - Everything you need to know!ITCamp 2018 - Thomas Maurer - Azure Stack - Everything you need to know!
ITCamp 2018 - Thomas Maurer - Azure Stack - Everything you need to know!
 
Introduction to Microsoft Azure Cloud
Introduction to Microsoft Azure CloudIntroduction to Microsoft Azure Cloud
Introduction to Microsoft Azure Cloud
 
Data saturday Oslo Azure Purview Erwin de Kreuk
Data saturday Oslo Azure Purview Erwin de KreukData saturday Oslo Azure Purview Erwin de Kreuk
Data saturday Oslo Azure Purview Erwin de Kreuk
 
Google Cloud Storage | Google Cloud Platform Tutorial | Google Cloud Architec...
Google Cloud Storage | Google Cloud Platform Tutorial | Google Cloud Architec...Google Cloud Storage | Google Cloud Platform Tutorial | Google Cloud Architec...
Google Cloud Storage | Google Cloud Platform Tutorial | Google Cloud Architec...
 
04 Azure IAAS 101
04 Azure IAAS 10104 Azure IAAS 101
04 Azure IAAS 101
 
Mastering azure devOps - Dot Net Tricks
Mastering azure devOps - Dot Net TricksMastering azure devOps - Dot Net Tricks
Mastering azure devOps - Dot Net Tricks
 
AWS for the Data Professional
AWS for the Data ProfessionalAWS for the Data Professional
AWS for the Data Professional
 
Benefits of the Azure cloud
Benefits of the Azure cloudBenefits of the Azure cloud
Benefits of the Azure cloud
 
Tom Grey - Google Cloud Platform
Tom Grey - Google Cloud PlatformTom Grey - Google Cloud Platform
Tom Grey - Google Cloud Platform
 
Citrix on Azure
Citrix on AzureCitrix on Azure
Citrix on Azure
 
Azure Stack - Azure in your own Data Center
Azure Stack - Azure in your own Data CenterAzure Stack - Azure in your own Data Center
Azure Stack - Azure in your own Data Center
 
Introduction to Microsoft Azure
Introduction to Microsoft AzureIntroduction to Microsoft Azure
Introduction to Microsoft Azure
 
Architecting Enterprise Applications In The Cloud
Architecting Enterprise Applications In The CloudArchitecting Enterprise Applications In The Cloud
Architecting Enterprise Applications In The Cloud
 
Building Hybrid Cloud Apps with Azure and Azure stack
Building Hybrid Cloud Apps with Azure and Azure stackBuilding Hybrid Cloud Apps with Azure and Azure stack
Building Hybrid Cloud Apps with Azure and Azure stack
 

Similar a 20210427 azure lille_meetup_azure_data_stack

Azure Data services
Azure Data servicesAzure Data services
Azure Data servicesRajesh Kolla
 
Perth Azure Usergroup Build 2018 updates
Perth Azure Usergroup Build 2018 updatesPerth Azure Usergroup Build 2018 updates
Perth Azure Usergroup Build 2018 updatesNirmal Thewarathanthri
 
Azure data platform overview
Azure data platform overviewAzure data platform overview
Azure data platform overviewJames Serra
 
Rising Interest in Open Source Relational Databases
Rising Interest in Open Source Relational DatabasesRising Interest in Open Source Relational Databases
Rising Interest in Open Source Relational DatabasesChristopher Foot
 
OSS DB on Azure
OSS DB on AzureOSS DB on Azure
OSS DB on Azurerockplace
 
DBaaS with EDB Postgres on AWS
DBaaS with EDB Postgres on AWSDBaaS with EDB Postgres on AWS
DBaaS with EDB Postgres on AWSEDB
 
Azure - Data Platform
Azure - Data PlatformAzure - Data Platform
Azure - Data Platformgiventocode
 
2014.10.22 Building Azure Solutions with Office 365
2014.10.22 Building Azure Solutions with Office 3652014.10.22 Building Azure Solutions with Office 365
2014.10.22 Building Azure Solutions with Office 365Marco Parenzan
 
Scalable relational database with SQL Azure
Scalable relational database with SQL AzureScalable relational database with SQL Azure
Scalable relational database with SQL AzureShy Engelberg
 
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part20812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2Raul Chong
 
Serverless Data Platform
Serverless Data PlatformServerless Data Platform
Serverless Data PlatformShu-Jeng Hsieh
 
Afternoons with Azure - Azure Data Services
Afternoons with Azure - Azure Data ServicesAfternoons with Azure - Azure Data Services
Afternoons with Azure - Azure Data ServicesCCG
 
Clash of Technologies Google Cloud vs Microsoft Azure
Clash of Technologies Google Cloud vs Microsoft AzureClash of Technologies Google Cloud vs Microsoft Azure
Clash of Technologies Google Cloud vs Microsoft AzureMihail Mateev
 
MySQL Ecosystem in 2020
MySQL Ecosystem in 2020MySQL Ecosystem in 2020
MySQL Ecosystem in 2020Alkin Tezuysal
 
FIWARE Global Summit - A Multi-database Plugin for the Orion FIWARE Context B...
FIWARE Global Summit - A Multi-database Plugin for the Orion FIWARE Context B...FIWARE Global Summit - A Multi-database Plugin for the Orion FIWARE Context B...
FIWARE Global Summit - A Multi-database Plugin for the Orion FIWARE Context B...FIWARE
 
Azure SQL Database
Azure SQL DatabaseAzure SQL Database
Azure SQL Databaserockplace
 
A Tour of Azure SQL Databases (NOVA SQL UG 2020)
A Tour of Azure SQL Databases  (NOVA SQL UG 2020)A Tour of Azure SQL Databases  (NOVA SQL UG 2020)
A Tour of Azure SQL Databases (NOVA SQL UG 2020)Timothy McAliley
 

Similar a 20210427 azure lille_meetup_azure_data_stack (20)

Azure Data services
Azure Data servicesAzure Data services
Azure Data services
 
Perth Azure Usergroup Build 2018 updates
Perth Azure Usergroup Build 2018 updatesPerth Azure Usergroup Build 2018 updates
Perth Azure Usergroup Build 2018 updates
 
Azure data platform overview
Azure data platform overviewAzure data platform overview
Azure data platform overview
 
Rising Interest in Open Source Relational Databases
Rising Interest in Open Source Relational DatabasesRising Interest in Open Source Relational Databases
Rising Interest in Open Source Relational Databases
 
Lakehouse in Azure
Lakehouse in AzureLakehouse in Azure
Lakehouse in Azure
 
OSS DB on Azure
OSS DB on AzureOSS DB on Azure
OSS DB on Azure
 
DBaaS with EDB Postgres on AWS
DBaaS with EDB Postgres on AWSDBaaS with EDB Postgres on AWS
DBaaS with EDB Postgres on AWS
 
Azure - Data Platform
Azure - Data PlatformAzure - Data Platform
Azure - Data Platform
 
2014.10.22 Building Azure Solutions with Office 365
2014.10.22 Building Azure Solutions with Office 3652014.10.22 Building Azure Solutions with Office 365
2014.10.22 Building Azure Solutions with Office 365
 
Scalable relational database with SQL Azure
Scalable relational database with SQL AzureScalable relational database with SQL Azure
Scalable relational database with SQL Azure
 
IBM - Introduction to Cloudant
IBM - Introduction to CloudantIBM - Introduction to Cloudant
IBM - Introduction to Cloudant
 
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part20812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
0812 2014 01_toronto-smac meetup_i_os_cloudant_worklight_part2
 
Serverless Data Platform
Serverless Data PlatformServerless Data Platform
Serverless Data Platform
 
Afternoons with Azure - Azure Data Services
Afternoons with Azure - Azure Data ServicesAfternoons with Azure - Azure Data Services
Afternoons with Azure - Azure Data Services
 
Azure data platform overview
Azure data platform overviewAzure data platform overview
Azure data platform overview
 
Clash of Technologies Google Cloud vs Microsoft Azure
Clash of Technologies Google Cloud vs Microsoft AzureClash of Technologies Google Cloud vs Microsoft Azure
Clash of Technologies Google Cloud vs Microsoft Azure
 
MySQL Ecosystem in 2020
MySQL Ecosystem in 2020MySQL Ecosystem in 2020
MySQL Ecosystem in 2020
 
FIWARE Global Summit - A Multi-database Plugin for the Orion FIWARE Context B...
FIWARE Global Summit - A Multi-database Plugin for the Orion FIWARE Context B...FIWARE Global Summit - A Multi-database Plugin for the Orion FIWARE Context B...
FIWARE Global Summit - A Multi-database Plugin for the Orion FIWARE Context B...
 
Azure SQL Database
Azure SQL DatabaseAzure SQL Database
Azure SQL Database
 
A Tour of Azure SQL Databases (NOVA SQL UG 2020)
A Tour of Azure SQL Databases  (NOVA SQL UG 2020)A Tour of Azure SQL Databases  (NOVA SQL UG 2020)
A Tour of Azure SQL Databases (NOVA SQL UG 2020)
 

Más de Alexandre BERGERE

Databases - beyond SQL : Cosmos DB (part 6)
Databases - beyond SQL : Cosmos DB (part 6)Databases - beyond SQL : Cosmos DB (part 6)
Databases - beyond SQL : Cosmos DB (part 6)Alexandre BERGERE
 
comparatifs des familles NoSQL & concepts de modélisation
comparatifs des familles NoSQL & concepts de modélisationcomparatifs des familles NoSQL & concepts de modélisation
comparatifs des familles NoSQL & concepts de modélisationAlexandre BERGERE
 
Iot streaming with Azure Stream Analytics from IotHub to the full data slack
Iot streaming with Azure Stream Analytics from IotHub to the full data slackIot streaming with Azure Stream Analytics from IotHub to the full data slack
Iot streaming with Azure Stream Analytics from IotHub to the full data slackAlexandre BERGERE
 

Más de Alexandre BERGERE (6)

Databases - beyond SQL : Cosmos DB (part 6)
Databases - beyond SQL : Cosmos DB (part 6)Databases - beyond SQL : Cosmos DB (part 6)
Databases - beyond SQL : Cosmos DB (part 6)
 
comparatifs des familles NoSQL & concepts de modélisation
comparatifs des familles NoSQL & concepts de modélisationcomparatifs des familles NoSQL & concepts de modélisation
comparatifs des familles NoSQL & concepts de modélisation
 
Azure data stack_2019_08
Azure data stack_2019_08Azure data stack_2019_08
Azure data stack_2019_08
 
Big dataclasses 2019_nosql
Big dataclasses 2019_nosqlBig dataclasses 2019_nosql
Big dataclasses 2019_nosql
 
Iot streaming with Azure Stream Analytics from IotHub to the full data slack
Iot streaming with Azure Stream Analytics from IotHub to the full data slackIot streaming with Azure Stream Analytics from IotHub to the full data slack
Iot streaming with Azure Stream Analytics from IotHub to the full data slack
 
MongoDB classes 2019
MongoDB classes 2019MongoDB classes 2019
MongoDB classes 2019
 

Último

GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 

Último (20)

GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 

20210427 azure lille_meetup_azure_data_stack

  • 2. dataredkite.com premiseo.com Who are we ? 26/02/2021 2 I'm a data and cloud Architect and Spark lover. I worked many years as an Oracle consultant and expert, and now I work with Cloud solutions devoted to solve complex problems with high volumes of data. I am a Data Analyst & Solution Architect indepedent - ☁️ MCSE, Cosmos DB & Delta lover. I developed my skills through various clients' projects, teaching at the University and personal proof of concepts. I’m also the Co-Founder of DataRedkite, a product which can quickly give to its user a good management of data in Microsoft Azure DataLake. Laurent Leturgez Alexandre Bergere Meetup Azure Lille
  • 3. dataredkite.com premiseo.com Summary 26/02/2021 Meetup Azure Lille 3 Relational Databases NoSQL Databases Big Data Storage Data Big Data Streaming Storage : Compute :
  • 5. premiseo.com dataredkite.com 26/02/2021 Meetup Azure Lille 5 Relational Databases Managed relational SQL Database as a service Azure SQL Database Managed MariaDB database service for app developers Azure Database for MariaDB Managed MySQL database service for app developers Azure Database for MySQL Managed Postgres database service for app developers Azure Database for PostGres
  • 6. premiseo.com dataredkite.com 26/02/2021 Meetup Azure Lille 6 Relational Databases Managed relational SQL Database as a service Azure SQL Database Managed MySQL database service for app developers Azure Database for MySQL Managed MariaDB database service for app developers Azure Database for MariaDB Managed Postgres database service for app developers Azure Database for PostGres
  • 7. premiseo.com dataredkite.com 26/02/2021 Meetup Azure Lille 7 Relational Databases Managed relational SQL Database as a service Azure SQL Database Azure SQL Database
  • 8. dataredkite.com premiseo.com Azure SQL Database 27/04/2021 Meetup Azure Lille 8 • Azure SQL • SQL Server Paas service • Managed upgrades, patches, backups and monitoring • Latest Stable version of SQL Server • 99,99% availability • Deployment model • Single Database : database runs on non shared resources • Elastic Pool : database runs with a collection of databases that share set of resources at a predictable price
  • 9. dataredkite.com premiseo.com Azure SQL Database 27/04/2021 Meetup Azure Lille 9 • Azure SQL • Purchasing model • DTU (Database Transaction Unit) : https://docs.microsoft.com/en-us/azure/azure-sql/database/service-tiers- dtu • Basic tier • Standard Tier • Premium Tier • vCore model • Serverless • Service Tier • General Purpose (vCore) / Standard (DTU) : Common workloads • Business Critical (vCore) / Premium (DTU) : High transaction and availability / low latency IO • HyperScale (vCore) : • Up to 100Tb Database • Rapid Scale up (compute resources) • Rapid Scale out (read only nodes : read workload / hot-standby)
  • 10. dataredkite.com premiseo.com Azure SQL Database 27/04/2021 Meetup Azure Lille 10 • Azure SQL Managed Instance • Features • Paas platform for lift and shift at scale • Broadest SQL Server engine compatibility (network integration, features etc.) • With perservation of all Paas capabilities (patching, updates, backups, HA etc.) • vCore purchase model only • BYOL available • SQL Virtual Machine • SQL Server deployment on VM (Linux and Windows) • Can choice SQL Server version • From 2008 R2 • Up to 2019
  • 11. dataredkite.com premiseo.com Azure SQL Database 27/04/2021 Meetup Azure Lille 11 Azure SQL Database Managed Instance Instance scoped model with high compatibility to SQL Server Best for modernisation at scale with low cost effort (lift & shift) Single Standalone managed database for predictable and stable workloads Elastic Pool Shared resources model : multitenant
  • 12. premiseo.com dataredkite.com 26/02/2021 Meetup Azure Lille 12 Relational Databases Managed relational SQL Database as a service Azure SQL Database Managed MySQL database service for app developers Azure Database for MySQL Managed MariaDB database service for app developers Azure Database for MariaDB Managed Postgres database service for app developers Azure Database for PostGre
  • 13. dataredkite.com premiseo.com Azure Database for PostgreSQL 27/04/2021 Meetup Azure Lille 13 • Paas Service for PostgreSQL • Runs on Windows • Single Server • v9.5 to 11 • Up to 64 vCores depending on SKU (https://docs.microsoft.com/en-us/azure/postgresql/concepts-pricing- tiers) • Up to 2 for Basic SKU • Up to 64 for General Purpose SKU • Up to 32 for Memory Optimized SKU • Bunch of PG Extensions available • Automated Backup (retention up to 35days) • Backup frequency and backup types depend on database size • Geo-redundant backup option (General Purpose & Memory Optimized)
  • 14. dataredkite.com premiseo.com Azure Database for PostgreSQL 27/04/2021 Meetup Azure Lille 14 • Paas Service for PostgreSQL • HyperScale (Citus) • High performance and analytical workloads beyond 100Gb • Hyperscale delivers • Horizontal scaling across multiple machine (with Sharding) • Query parallelization across these servers • High performance for analytics • Based on server groups • Design approach required for table distribution and performance • Distributed tables (based on distribution column) • Reference tables (content concentrated into a single shard replicated on every worker node) • Local tables (ordinary unsharded tables. Perfect for small tables not involded into joins) • Automated backup through storage snapshots
  • 15. dataredkite.com premiseo.com Azure Database for PostgreSQL 27/04/2021 Meetup Azure Lille 15 • Paas Service for PostgreSQL • Flexible Server (Preview) • Automated patching • Automatic backups • Performance adjustment in three switchable compute tiers : Burstable, GP, Memory Optimized High Availability Zone Redundant HA (Optional)
  • 16. premiseo.com dataredkite.com 26/02/2021 Meetup Azure Lille 16 Relational Databases Managed relational SQL Database as a service Azure SQL Database Managed MariaDB database service for app developers Azure Database for MariaDB Managed Postgres database service for app developers Azure Database for PostGre Managed MySQL database service for app developers Azure Database for MySQL
  • 17. dataredkite.com premiseo.com Azure Database for MariaDB 27/04/2021 Meetup Azure Lille 17 • Paas Service for MariaDB • Runs on Windows • Single Server • V10.2 and 10.3 • Up to 64 vCores depending on SKU (https://docs.microsoft.com/en-us/azure/mariadb/concepts-pricing-tiers) • Up to 2 for Basic SKU • Up to 64 for General Purpose SKU • Up to 32 for Memory Optimized SKU • Automated Backup (retention up to 35days) • Backup frequency and backup types depend on database size • Geo-redundant backup option (General Purpose & Memory Optimized)
  • 18. premiseo.com dataredkite.com 26/02/2021 Meetup Azure Lille 18 Relational Databases Managed relational SQL Database as a service Azure SQL Database Managed MariaDB database service for app developers Azure Database for MariaDB Managed Postgres database service for app developers Azure Database for PostGre Managed MySQL database service for app developers Azure Database for MySQL
  • 19. dataredkite.com premiseo.com Azure Database for MySQL 27/04/2021 Meetup Azure Lille 19 • Paas Service for MySQL • Runs on Windows • Single Server • V5.6, 5.7, and 8.0 • Up to 64 vCores depending on SKU (https://docs.microsoft.com/en-us/azure/mysql/concepts-pricing-tiers) • Up to 2 for Basic SKU • Up to 64 for General Purpose SKU • Up to 32 for Memory Optimized SKU • Automated Backup (retention up to 35days) • Backup frequency and backup types depend on database size • Geo-redundant backup option (General Purpose & Memory Optimized)
  • 20. dataredkite.com premiseo.com Azure Database for MySQL 27/04/2021 Meetup Azure Lille 20 • Paas Service for MySQL Flexible Server (Preview) • V5.7 • Automated patching • Automatic backups • Performance adjustment in three switchable compute tiers : Burstable, GP, Memory Optimized • Network Isolation • Private Access through Vnet integration • Public Access
  • 21. dataredkite.com premiseo.com Azure Database for MySQL 27/04/2021 Meetup Azure Lille 21 • Paas Service for MySQL Flexible Server (Preview) High Availability Zone Redundant HA (Optional)
  • 22. premiseo.com dataredkite.com 26/02/2021 Meetup Azure Lille 22 NOSQL Databases Globally distributed, multi-model database for any scale Azure Cosmos DB
  • 23. dataredkite.com premiseo.com Azure Cosmos DB 26/02/2021 23 A globally distributed, massively scalable, multi-model database service Azure Cosmos DB o SQL API o MongoDB API o Cassandra API o Gremlin API o Table API
  • 25. premiseo.com dataredkite.com 26/02/2021 Meetup Azure Lille 25 Big Data Storage REST-based object storage for unstructured data Storage Account Massively scalable, secure data lake functionality built on Azure Blob Storage Azure Data Lake Storage
  • 26. premiseo.com dataredkite.com 26/02/2021 Meetup Azure Lille 26 Big Data Storage REST-based object storage for unstructured data Storage Account Massively scalable, secure data lake functionality built on Azure Blob Storage Azure Data Lake Storage
  • 27. dataredkite.com premiseo.com Storage Account 26/02/2021 27 o Azure Blobs : A scalable object store for text and binary data o Azure Files : Managed file shares for cloud or on-premises deployments o Azure Queues : A messaging store for reliable messaging between application components o Azure Tables : A NoSQL store for no-schema storage of structured data Azure Storage accounts are the base storage type within Azure. Azure Storage offers a very scalable object store for data objects and file system services in the cloud. It can also provide a messaging store for reliable messaging, or it can act as a NoSQL store. Azure selected four of these data services and placed them together under the name Azure Storage. The four services are Azure Blobs, Azure Files, Azure Queues, and Azure Tables. The following illustration shows the elements of Azure Storage
  • 28. dataredkite.com premiseo.com Storage Account 26/02/2021 28 Type of Storage Account Storage account type Services Redundancy options General-purpose V2 Basic storage account type for blobs, files, queues, and tables. Recommended for most scenarios using Azure Storage. LRS, GRS, RA-GRS, ZRS, GZRS, RA-GZRS General-purpose V1 Legacy account type for blobs, files, queues, and tables. Use general-purpose v2 accounts instead when possible. LRS, GRS, RA-GRS BlockBlobStorage Storage accounts with premium performance characteristics for block blobs and append blobs. Recommended for scenarios with high transactions rates, or scenarios that use smaller objects or require consistently low storage latency. LRS, ZRS FileStorage Files-only storage accounts with premium performance characteristics. Recommended for enterprise or high performance scale applications. LRS, ZRS BlobStorage Legacy Blob-only storage accounts. Use general-purpose v2 accounts instead when possible. LRS, GRS, RA-GRS
  • 31. premiseo.com dataredkite.com 26/02/2021 Meetup Azure Lille 31 Big Data Storage REST-based object storage for unstructured data Storage Account Massively scalable, secure data lake functionality built on Azure Blob Storage Azure Data Lake Storage
  • 32. dataredkite.com premiseo.com Azure Datalake Store 26/02/2021 32 Azure Data Lake Storage is a Hadoop-compatible data repository that can store any size or type of data. This storage service is available as Generation 1 (Gen1) or Generation 2 (Gen2). Key features of Data Lake Storage: o Unlimited scalability o Hadoop compatibility o Security support for both access control lists (ACLs) & RBAC (for Gen 2 only) o POSIX compliance o An optimized Azure Blob File System (ABFS) driver that's designed for big-data analytics o Zone-redundant storage o Geo-redundant storage Azure Datalake Gen 1 Azure Datalake Gen 2
  • 33. dataredkite.com premiseo.com Choose a storage solution on Azure 26/02/2021 33 Data classification Operations Latency & throughput Transactional support Recommended service Product catalog data Semi-structured because of the need to extend or modify the schema for new products o Customers require a high number of read operations, with the ability to query on many fields within the database. o The business requires a high number of write operations to track the constantly changing inventory. High throughput and low latency Required Azure Cosmos DB Photos and videos Unstructured o Only need to be retrieved by ID. o Customers require a high number of read operations with low latency. o Creates and updates will be somewhat infrequent and can have higher latency than read operations. Retrievals by ID need to support low latency and high throughput. Creates and updates can have higher latency than read operations. Not required Azure Blob storage Business data Structured Read-only, complex analytical queries across multiple databases Some latency in the results is expected based on the complex nature of the queries Required Azure SQL Database Azure Database for MariaDB Azure Database for PostGre Azure Database for MySQL
  • 35. premiseo.com dataredkite.com 26/02/2021 Meetup Azure Lille 35 Data Compute Managed data-integration solution Data Factory Process events with serverless code Azure Functions
  • 36. premiseo.com dataredkite.com 26/02/2021 Meetup Azure Lille 36 Data Compute Managed data-integration solution Data Factory Process events with serverless code Azure Functions
  • 37. dataredkite.com premiseo.com Azure Function 37 Azure Functions is the serverless compute service from Microsoft. Functions are event-driven: each function defines a trigger — the exact definition of the event source, for instance, the name of a storage queue. Uses cases: If you want to... then... Build a web API Implement an endpoint for your web applications using the HTTP trigger Process file uploads Run code when a file is uploaded or changed in blob storage Build a serverless workflow Chain a series of functions together using durable functions Respond to database changes Run custom logic when a document is created or updated in Cosmos DB Run scheduled tasks Execute code at set times Create reliable message queue systems Process message queues using Queue Storage, Service Bus, or Event Hubs
  • 38. dataredkite.com premiseo.com Azure Function 38 Consumption Plan Functions Consumption Plan (B1, B2, B3, S1, S2, S3 Scale automatically and only pay for compute resources when your functions are running. On the Consumption plan, instances of the Functions host will be dynamically added and removed based on the number of incoming events. Premium plan (P1v2, P2v2, P3v3) While automatically scaling based on demand, use prewarmed workers to run applications with no delay after being idle, run on more powerful instances and connect to VNETs. Azure App Service plan Run Functions within an App Service plan at regular App Service plan rates. Good fit for long- running operations, as well as when more predictive scaling and costs are required. Azure Functions hosting options : Azure Plan
  • 39. dataredkite.com premiseo.com 27/04/2021 39 Durable Functions is a library that brings workflow orchestration abstractions to Azure Functions. It introduces a number of idioms and tools to define stateful, potentially long-running operations, and manages a lot of mechanics of reliable communication and state management behind the scenes. Log of events in the course of orchestrator progression 3 steps of a workflow executed in sequence https://medium.com/hackernoon/making-sense-of-azure-durable-functions- 645ecb3c1d58 Azure Function Azure Durable Functions
  • 40. premiseo.com dataredkite.com 26/02/2021 Meetup Azure Lille 40 Data Compute Managed data-integration solution Data Factory Process events with serverless code Azure Functions
  • 41. dataredkite.com premiseo.com Azure Data Factory 27/04/2021 Meetup Azure Lille 41 • Serverless Data Integration service • Data Pipeline : logical group of activities • Data Flow : Data Transformation activity • Data Copy : Data Transfer activity • SSIS Integration • Git integration
  • 42. dataredkite.com premiseo.com Azure Data Factory 27/04/2021 Meetup Azure Lille 42 • Serverless Data Integration service • Job scheduling • Automatically through internal Scheduler • Manually • SDK : .NET, Python • REST API • PowerShell
  • 43. dataredkite.com premiseo.com Azure Data Factory 27/04/2021 Meetup Azure Lille 43 • Serverless Data Integration service • Integration runtime • Compute infrastructure used by ADF to provide data integration • Azure : Serverless • Self Hosted : Onprem or Azure Virtual Machine (Windows) • SSIS Activity Features Azure Data Flow Data Copy Dispatch Activity (HDI, Databricks, SQL …) Cloud to Cloud data transfer/flows Self-Hosted Data Flow Data Copy Dispatch Activity (HDI, Databricks, SQL …) OnPrem or Virtual Machine deployment (Windows) OnPrem <-> Cloud data transfer/flows When connectors are not available SSIS SSIS Package execution Private or public Network
  • 44. premiseo.com dataredkite.com 26/02/2021 Meetup Azure Lille 44 Big Data Compute Fast, easy, and collaborative Apache Spark-based analytics platform Azure Databricks HDInsight supports the latest open source projects from the Apache Hadoop and Spark ecosystems. Azure HDInsight Managed Enterprise Datawarehouse and BigData Analytics service Azure Synapse Analytics
  • 45. premiseo.com dataredkite.com 26/02/2021 Meetup Azure Lille 45 Big Data Compute Fast, easy, and collaborative Apache Spark-based analytics platform Azure Databricks HDInsight supports the latest open source projects from the Apache Hadoop and Spark ecosystems. Azure HDInsight Managed Enterprise Datawarehouse and BigData Analytics service Azure Synapse Analytics
  • 47. dataredkite.com premiseo.com Azure Databricks 26/02/2021 47 Azure Databricks is a data analytics platform optimized for the Microsoft Azure cloud services platform. Azure Databricks offers two environments for developing data intensive applications: o Azure Databricks Workspace: provides an interactive workspace that enables collaboration between data engineers, data scientists, and machine learning engineers. o Azure Databricks SQL Analytics: provides an easy-to-use platform for analysts who want to run SQL queries on their data lake, create multiple visualization types to explore query results from different perspectives, and build and share dashboards.
  • 48. premiseo.com dataredkite.com 26/02/2021 Meetup Azure Lille 48 Big Data Compute Fast, easy, and collaborative Apache Spark-based analytics platform Azure Databricks HDInsight supports the latest open source projects from the Apache Hadoop and Spark ecosystems. Azure HDInsight Managed Enterprise Datawarehouse and BigData Analytics service Azure Synapse Analytics
  • 49. dataredkite.com premiseo.com Azure HDInsights 27/04/2021 Meetup Azure Lille 49 • Managed Hadoop distribution for Azure • Based on Cloudera Hortonworks hadoop distribution • Comes in various flavours / shapes (VM shapes and number) • Hadoop : General purpose (HDFS, Yarn, MapReduce, Hive, Pig, Sqoop, Oozie) • Spark • Kafka • HBase • Hive / LLAP (Interactive Query) • Storm (Stream processing) • ML Services with R
  • 50. dataredkite.com premiseo.com Azure HDInsights 27/04/2021 Meetup Azure Lille 50 • At least one Storage account mandatory (for libs and binaries) • External Metastores available for Ambari, Hive and Oozie • HDInsights architecture
  • 51. premiseo.com dataredkite.com 26/02/2021 Meetup Azure Lille 51 Big Data Compute Fast, easy, and collaborative Apache Spark-based analytics platform Azure Databricks HDInsight supports the latest open source projects from the Apache Hadoop and Spark ecosystems. Azure HDInsigth Managed Enterprise Datawarehouse and BigData Analytics service Azure Synapse Analytics
  • 52. dataredkite.com premiseo.com Azure Synapse Analytics 27/04/2021 Meetup Azure Lille 52 Stream Analytics Data Factory Data Lake Modern Analytics MPP Datawarehouse
  • 53. dataredkite.com premiseo.com Azure Synapse Analytics 27/04/2021 Meetup Azure Lille 53 MPP Datawarehou se Choice of language (T-SQL, Spark SQL, Python, Scala, .Net) Analytics ready (Analysis Services, Power BI) Data Science and AI Ready (Azure Machine Learning integration)
  • 54. dataredkite.com premiseo.com Azure Synapse Analytics 27/04/2021 Meetup Azure Lille 54 Synapse Analytics • Sample Use Case : Pure Business Intelligence !
  • 55. dataredkite.com premiseo.com Azure Synapse Analytics 27/04/2021 Meetup Azure Lille 55 • Not for small database (Usually > 1Tb) • Cost Model • Synapse Provisioned • T-SQL Pool with DWU (Datawarehouse Units) • Storage (Geo redundant option) • Synapse Serverless • Spark Pools • Synapse Pipeline
  • 56. dataredkite.com premiseo.com Azure Synapse Analytics 27/04/2021 Meetup Azure Lille 56 • Architecture • DMS (Data Movement Service) • Used for Data Colocation • Key point: Data Partitioning and Data Distribution
  • 57. dataredkite.com premiseo.com Azure Synapse Analytics 27/04/2021 Meetup Azure Lille 57 • Hash distributed table • Replicated Table • Round Robin distributed Table • Example • Dimension to Fact table join
  • 58. premiseo.com dataredkite.com 26/02/2021 Meetup Azure Lille 58 Big Data Streaming Real-time data stream processing from millions of IoT devices Azure Stream Analytics Connect, monitor and manage billions of IoT assets Azure IoT Hub Real-time data stream with Kafka Azure HDInsigth & Kafka Use Spark Streaming with Databricks Spark Streaming with Databricks
  • 59. premiseo.com dataredkite.com 26/02/2021 Meetup Azure Lille 59 Big Data Streaming Real-time data stream processing from millions of IoT devices Azure Stream Analytics Connect, monitor and manage billions of IoT assets Azure IoT Hub Real-time data stream with Kafka Azure HDInsigth & Kafka Use Spark Streaming with Databricks Spark Streaming with Databricks
  • 61. dataredkite.com premiseo.com Azure Streaming Analytics 26/02/2021 61 o Azure Stream Analytics supports user-defined functions (UDF) or user-defined aggregates (UDA) in JavaScript for cloud jobs and C# for IoT Edge jobs UDFs, UDAs, and custom deserializers: o Analyze real-time telemetry streams from IoT devices o Web logs/clickstream analytics o Geospatial analytics for fleet management and driverless vehicles o Remote monitoring and predictive maintenance of high value assets o Real-time analytics on Point of Sale data for inventory control and anomaly detection Examples scenarios:
  • 62. premiseo.com dataredkite.com 26/02/2021 Meetup Azure Lille 62 Big Data Streaming Real-time data stream processing from millions of IoT devices Azure Stream Analytics Connect, monitor and manage billions of IoT assets Azure IoT Hub Real-time data stream with Kafka Azure HDInsigth & Kafka Use Spark Streaming with Databricks Spark Streaming with Databricks
  • 63. dataredkite.com premiseo.com Azure Iot Hub 63 Azure IoT Hub : o The cloud gateway that connects IoT devices to gather data and drive business insights and automation. o The big data streaming service of Azure. It is designed for high throughput data streaming scenarios where customers may send billions of requests per day. o Bi-directional communication capabilities
  • 64. dataredkite.com premiseo.com Iot Hub or Event Hubs 64 IoT Hub was developed to address the unique requirements of connecting IoT devices to the Azure cloud while Event Hubs was designed for big data streaming. Microsoft recommends using Azure IoT Hub to connect IoT devices to Azure. IoT Capability IoT Hub standard tier IoT Hub basic tier Event Hubs Device-to-cloud messaging Protocols: HTTPS, AMQP, AMQP over webSockets Protocols: MQTT, MQTT over webSockets Per-device identity File upload from devices Device Provisioning Service Cloud-to-device messaging Device twin and device management Device streams (preview) IoT Edge
  • 65. premiseo.com dataredkite.com 26/02/2021 Meetup Azure Lille 65 Big Data Streaming Real-time data stream processing from millions of IoT devices Azure Stream Analytics Connect, monitor and manage billions of IoT assets Azure IoT Hub Real-time data stream with Kafka Azure HDInsigth & Kafka Connect, monitor and manage billions of IoT assets Spark Streaming with Databricks
  • 66. dataredkite.com premiseo.com Apache Kafka on HDInsight architecture 27/04/2021 Meetup Azure Lille 66
  • 67. premiseo.com dataredkite.com 26/02/2021 Meetup Azure Lille 67 Big Data Streaming Real-time data stream processing from millions of IoT devices Azure Stream Analytics Connect, monitor and manage billions of IoT assets Azure IoT Hub Real-time data stream with Kafka Azure HDInsigth & Kafka Use Spark Streaming with Databricks Spark Streaming with Databricks
  • 68. dataredkite.com premiseo.com Azure Databricks 26/02/2021 68 o Apache Spark Streaming is a scalable fault-tolerant streaming processing system that natively supports both batch and streaming workloads. o Spark Streaming is an extension of the core Spark API
  • 70. dataredkite.com premiseo.com Azure Data Studio 26/02/2021 70 Azure Data Studio is a cross-platform database tool that you can run on Windows, macOS, and Linux. You'll use it to connect to SQL Data Warehouse and Azure SQL Database. Previously released under the preview name SQL Operations Studio, Azure Data Studio offers a modern editor experience with IntelliSense, code snippets, source control integration, and an integrated terminal. It is engineered with the data platform user in mind, with built in charting of query result sets and customizable dashboards.
  • 71. dataredkite.com premiseo.com Storage Explorer 26/02/2021 71 Begin by downloading and installing Storage Explorer. You can use Storage Explorer to do several operations against data in your Azure Storage account and data lake: o Upload files or folders from your local computer into Azure Storage. o Download cloud-based data to your local computer. o Copy or move files and folders around in the storage account. o Delete data from the storage account.
  • 72. dataredkite.com premiseo.com Visual Studio Code 26/02/2021 72 Visual Studio Code is a lightweight source code editor which runs on your desktop and is available for Windows, macOS and Linux. It comes with built-in support for JavaScript, TypeScript and Node.js and has a rich ecosystem of extensions for other languages (such as C++, C#, Java, Python, PHP, Go) and runtimes (such as .NET and Unity).
  • 73. premiseo.com dataredkite.com 26/02/2021 Meetup Azure Lille 73 Data Migration Tools
  • 74. dataredkite.com premiseo.com Summary 26/02/2021 74 Scenario Some recommended solutions Disaster Recovery Azure geo-redundant backups Read Scale Use read-only replicas to load balance read-only query workloads (preview) ETL (OLTP to OLAP) Azure Data Factory or SQL Server Integration Services or Databricks Migration from on-premises SQL Server to Azure SQL Database Azure Database Migration Service Kept up-to-date across several Azure SQL databases or SQL Server database Azure SQL Data Sync Detecting compatibility issues that can impact database functionality in your new version of SQL Server or Azure SQL Database Data Migration Assistant (DMA)
  • 77. premiseo.com dataredkite.com 26/02/2021 77 Just few sources in Microsoft Learn: o Azure for the Data Engineer o Store data in Azure o Work with relational data in Azure o Large Scale Data Processing with Azure Data Lake Storage Gen2 o Implement a Data Streaming Solution with Azure Streaming Analytics o Implement a Data Warehouse with Azure SQL Data Warehouse Sources
  • 79. premiseo.com dataredkite.com 26/02/2021 Meetup Azure Lille 79 Fast, easy, and collaborative Apache Spark-based analytics platform Azure Databricks Next Session: Azure Databricks
  • 80. dataredkite.com premiseo.com Thank you 26/02/2021 80 Meetup Azure Lille dataredkite.com https://premiseo.com/