Windows Azure Storage
Overview, Internals and Best Practices
Sponsors
About me
• Program Manager @ Edgar Online, RRD
• Windows Azure MVP
• Co-organizer of Odessa .NET User Group
• Ukrainian IT Awards 2013 Winner – Software Engineering
• http://cloudytimes.azurewebsites.net/
• http://www.linkedin.com/in/antonvidishchev
• https://www.facebook.com/anton.vidishchev
What is Windows Azure Storage?
Windows Azure Storage
• Cloud storage: access anywhere and anytime
  • Blobs, Disks, Tables and Queues
• Highly durable, available and massively scalable
  • Easily build "internet scale" applications
  • 10 trillion stored objects
  • 900K requests/sec on average (2.3+ trillion per month)
• Pay for what you use
• Exposed via easy and open REST APIs
• Client libraries in .NET, Java, Node.js, Python, PHP, Ruby
Abstractions – Blobs and Disks
Abstractions – Tables and Queues
Data centers
Windows Azure Data Storage Concepts
• Account
  • Container → Blobs: https://<account>.blob.core.windows.net/<container>
  • Table → Entities: https://<account>.table.core.windows.net/<table>
  • Queue → Messages: https://<account>.queue.core.windows.net/<queue>
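The addressing scheme above is regular enough to capture in a few lines. A minimal sketch, assuming only the three endpoint patterns shown on this slide; `storage_url` is a hypothetical helper, not part of any Azure client library.

```python
# Hypothetical helper: builds the resource URLs shown on the slide.
SERVICES = ("blob", "table", "queue")

def storage_url(account: str, service: str, name: str) -> str:
    """Return the URL of a container (blob), table or queue in an account."""
    if service not in SERVICES:
        raise ValueError(f"unknown service: {service!r}")
    return f"https://{account}.{service}.core.windows.net/{name}"

print(storage_url("myaccount", "blob", "images"))
# https://myaccount.blob.core.windows.net/images
```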
How is Azure Storage used by Microsoft?
Internals
Design Goals
• Highly available with strong consistency
  • Provide access to data in the face of failures/partitioning
• Durability
  • Replicate data several times within and across regions
• Scalability
  • Need to scale to zettabytes
  • Provide a global namespace to access data around the world
  • Automatically scale out and load balance data to meet peak traffic demands
Windows Azure Storage Stamps
Access blob storage via the URL: http://<account>.blob.core.windows.net/
[Diagram: Storage Location Service; data access flows through a load balancer (LB) to the front-ends, partition layer and DFS layer of a storage stamp; intra-stamp replication within each stamp; inter-stamp (geo) replication between stamps]
Architecture Layers inside Stamps
[Diagram: Partition Layer and its Index]
Availability with Consistency for Writing
• All writes are appends to the end of a log, which is an append to the last extent in the log
• Write consistency across all replicas of an extent:
  • Appends are ordered the same across all 3 replicas of an extent (file)
  • Only return success once all 3 replicas' appends are committed to storage
  • When an extent reaches a certain size, or on a write failure or load balancing (LB), seal the extent's replica set and never append any more data to it
• Write availability, to handle failures during a write:
  • Seal the extent's replica set
  • Append immediately to a new extent (replica set) on 3 other available nodes
  • Add this new extent to the end of the partition's log (stream)
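The write path above can be modelled as a toy in-memory simulation. This is a sketch under stated assumptions, not the service's implementation: `Extent`, `Stream`, `max_blocks` and the `failed_nodes` argument are all hypothetical, and a failure is simulated by naming the failed nodes rather than detected from a real append error.

```python
from dataclasses import dataclass, field

@dataclass
class Extent:
    nodes: tuple                     # the 3 nodes holding this extent's replicas
    replicas: list = field(default_factory=lambda: [[], [], []])
    sealed: bool = False             # sealed extents never accept more appends

class Stream:
    """Toy model of a partition's log: an ordered list of extents (hypothetical)."""

    def __init__(self, node_pool, max_blocks=4):
        self.node_pool = list(node_pool)
        self.max_blocks = max_blocks
        self.extents = [self._new_extent(exclude=())]

    def _new_extent(self, exclude):
        nodes = tuple(n for n in self.node_pool if n not in exclude)[:3]
        if len(nodes) < 3:
            raise RuntimeError("not enough available nodes")
        return Extent(nodes)

    def append(self, block, failed_nodes=frozenset()):
        last = self.extents[-1]
        if set(last.nodes) & set(failed_nodes):
            # Write failure: seal the replica set, then continue on 3 other nodes.
            last.sealed = True
            last = self._new_extent(exclude=last.nodes + tuple(failed_nodes))
            self.extents.append(last)
        for replica in last.replicas:    # same append order on all 3 replicas
            replica.append(block)
        # Success is reported only after all 3 replicas hold the block.
        if len(last.replicas[0]) >= self.max_blocks:
            last.sealed = True           # size limit reached: seal, start a new extent
            self.extents.append(self._new_extent(exclude=()))
```

Because a sealed extent is never modified again, its three replicas stay bit-wise identical, which is what lets the read path use any of them.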
Availability with Consistency for Reading
• Read consistency: can read from any replica, since the data in each replica of an extent is bit-wise identical
• Read availability: send out parallel read requests if the first read takes longer than the 95th-percentile latency
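The read-availability rule above is a "hedged read". A minimal asyncio sketch, assuming simulated replicas (`read_replica` just sleeps) and a threshold taken from a nearest-rank 95th percentile of past latencies; the function names are hypothetical.

```python
import asyncio

def p95(samples):
    """Nearest-rank 95th percentile of observed latencies."""
    ordered = sorted(samples)
    rank = -(-95 * len(ordered) // 100)   # ceil(0.95 * n) using integer math
    return ordered[rank - 1]

async def read_replica(replica_id, delay):
    await asyncio.sleep(delay)            # stand-in for a real network read
    return replica_id

async def hedged_read(delays, threshold):
    """Read replica 0; if it is slower than `threshold`, also read the other
    replicas in parallel and return whichever answers first."""
    primary = asyncio.create_task(read_replica(0, delays[0]))
    done, _ = await asyncio.wait({primary}, timeout=threshold)
    if done:
        return primary.result()
    backups = {asyncio.create_task(read_replica(i, d))
               for i, d in enumerate(delays) if i != 0}
    done, pending = await asyncio.wait({primary} | backups,
                                       return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()                     # drop the slower duplicate reads
    return done.pop().result()
```

This is safe only because every replica of an extent holds identical bytes, so any answer is as good as another.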
Dynamic Load Balancing – Partition Layer
• Spreads index/transaction processing across partition servers
  • The master monitors traffic load and resource utilization on partition servers
  • It dynamically load balances partitions across servers to achieve better performance/availability
• Does not move data around; it only reassigns which part of the index a partition server is responsible for
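A greedy sketch of the idea above: move ownership of partitions from the hottest server to the coolest one while that helps. The greedy policy is an assumption chosen for illustration; the slide does not describe the master's actual algorithm, and `rebalance` is hypothetical.

```python
def rebalance(assignment, load):
    """Greedy sketch of partition load balancing.

    assignment: partition server -> set of partition names it serves
    load:       partition name -> observed request rate
    Only ownership of index ranges changes; the data stays where it is in
    the DFS layer, so nothing is copied here.
    """
    def server_load(server):
        return sum(load[p] for p in assignment[server])

    moves = []
    while True:
        hot = max(assignment, key=server_load)
        cold = min(assignment, key=server_load)
        gap = server_load(hot) - server_load(cold)
        # Moving a partition helps only if its load is below the gap; each such
        # move strictly lowers the sum of squared server loads, so this terminates.
        candidates = [p for p in assignment[hot] if 0 < load[p] < gap]
        if not candidates:
            return moves
        best = min(candidates, key=lambda p: abs(gap - 2 * load[p]))
        assignment[hot].remove(best)
        assignment[cold].add(best)
        moves.append((best, hot, cold))
```

For example, with `ps1` serving A, B, C (loads 50, 30, 20) and `ps2` serving D (load 10), one move hands A to `ps2` and the hottest server drops from 100 to 60.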
Dynamic Load Balancing – DFS Layer
• DFS read load balancing across replicas
  • Monitor latency/load on each node/replica; dynamically select which replica to read from, and start additional reads in parallel based on the 95th-percentile latency
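One way to "dynamically select which replica to read from" is to keep a smoothed latency per replica and pick the lowest. The exponentially weighted moving average (EWMA) here is a swapped-in technique for illustration, not something the slide specifies; `ReplicaSelector` and `alpha` are hypothetical.

```python
class ReplicaSelector:
    """Track a smoothed (EWMA) latency per replica and read from the fastest."""

    def __init__(self, replicas, alpha=0.5):
        self.alpha = alpha                         # weight of the newest sample
        self.latency = {r: None for r in replicas}

    def record(self, replica, seconds):
        old = self.latency[replica]
        self.latency[replica] = seconds if old is None else (
            self.alpha * seconds + (1 - self.alpha) * old)

    def choose(self):
        # Replicas with no samples yet are tried first so every one gets measured.
        unmeasured = [r for r, v in self.latency.items() if v is None]
        if unmeasured:
            return unmeasured[0]
        return min(self.latency, key=self.latency.get)
```

A higher `alpha` reacts faster to a replica that suddenly slows down, at the cost of more flapping between replicas.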
Architecture Summary
• Durability: all data is stored with at least 3 replicas
• Consistency: all committed data across all 3 replicas is identical
• Availability: can read from any of the 3 replicas; if there are any issues writing, seal the extent and continue appending to a new extent
• Performance/scale: retry based on 95th-percentile latencies; automatically scale out and load balance based on load/capacity



Additional details can be found in the SOSP paper:

• "Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency", ACM Symposium on Operating Systems Principles (SOSP), Oct. 2011
Best Practices
General .NET Best Practices for Azure Storage
• Disable Nagle for small messages (< 1400 bytes)
  • ServicePointManager.UseNagleAlgorithm = false;
• Disable Expect 100-Continue*
  • ServicePointManager.Expect100Continue = false;
• Increase the default connection limit
  • ServicePointManager.DefaultConnectionLimit = 100; (or more)
• Take advantage of the .NET 4.5 GC
  • GC performance is greatly improved
  • Background GC: http://msdn.microsoft.com/en-us/magazine/hh882452.aspx
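`ServicePointManager.UseNagleAlgorithm` is a .NET-level switch; the socket option it corresponds to is `TCP_NODELAY`. For clients outside .NET, a sketch of the same idea with Python's standard `socket` module (this shows the socket-level analogue, not how the .NET storage client is implemented):

```python
import socket

def make_nodelay_socket():
    """Create a TCP socket with Nagle's algorithm disabled (TCP_NODELAY)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Send small writes immediately instead of waiting to coalesce them.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock

sock = make_nodelay_socket()
print(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0)  # True
sock.close()
```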
General Best Practices
• Locate storage accounts close to compute/users
• Understand account scalability targets
  • Use multiple storage accounts to scale beyond a single account's targets
  • Distribute your storage accounts across regions
• Consider warming up the storage for better performance
• Cache critical data sets
  • To get more requests/sec than the account/partition targets
  • As a backup data set to fall back on
• Distribute load over many partitions and avoid spikes
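Spreading data over several storage accounts needs a stable mapping from an object to its account. A minimal sketch, assuming hash-based placement (the account names are hypothetical). It uses `hashlib` rather than Python's built-in `hash()`, which is randomized per process for strings.

```python
import hashlib

def pick_account(blob_name, accounts):
    """Map a blob name to one of several storage accounts, stably across runs."""
    digest = hashlib.sha256(blob_name.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(accounts)
    return accounts[index]

accounts = ["myapp0", "myapp1", "myapp2"]   # hypothetical account names
print(pick_account("photos/cat.jpg", accounts))
```

Plain modulo placement remaps most names when an account is added; if the account list will grow, a consistent-hashing scheme avoids that.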
General Best Practices (cont.)
• Use HTTPS
• Optimize what you send and receive
  • Blobs: range reads, metadata, HEAD requests
  • Tables: upsert, projection, point queries
  • Queues: Update Message
• Control parallelism at the application layer
  • Unbounded parallelism can lead to slow latencies and throttling
• Enable logging and metrics on each storage service
Blob Best Practices
• Try to match your read size with your write size
  • Avoid reading small ranges on blobs with large blocks
  • CloudBlockBlob.StreamMinimumReadSizeInBytes / StreamWriteSizeInBytes
• How do I upload a folder the fastest?
  • Upload multiple blobs simultaneously
• How do I upload a blob the fastest?
  • Use parallel block upload
• Concurrency (C): multiple workers upload different blobs
• Parallelism (P): multiple workers upload different blocks of the same blob
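Parallel block upload (P > 1) stages blocks independently and then commits them in order, mirroring the Put Block / Put Block List REST operations. A sketch, assuming an in-memory `FakeBlockBlob` (hypothetical, not an Azure SDK class); block IDs are base64 strings of equal length, as block blobs require.

```python
import base64
from concurrent.futures import ThreadPoolExecutor

class FakeBlockBlob:
    """Hypothetical in-memory stand-in: staged blocks plus committed content."""

    def __init__(self):
        self.staged = {}
        self.committed = b""

    def put_block(self, block_id, data):
        self.staged[block_id] = data

    def put_block_list(self, block_ids):
        self.committed = b"".join(self.staged[b] for b in block_ids)

def block_id(index):
    # Zero-pad before encoding so every ID in the blob has the same length.
    return base64.b64encode(f"{index:08d}".encode()).decode()

def parallel_upload(blob, data, block_size, workers):
    """P > 1: several workers upload different blocks of the same blob."""
    chunks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    ids = [block_id(i) for i in range(len(chunks))]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for every block and re-raises any worker exception.
        list(pool.map(blob.put_block, ids, chunks))
    blob.put_block_list(ids)   # the committed order defines the blob's content
```

Concurrency (C > 1) would instead hand whole blobs to the pool, one `parallel_upload(..., workers=1)` per blob.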
Concurrency vs. Blob Parallelism
XL VM uploading 512 blobs of 256 MB each (total upload size = 128 GB)
• C=1, P=1 => averaged ~13.2 MB/s
• C=1, P=30 => averaged ~50.72 MB/s
• C=30, P=1 => averaged ~96.64 MB/s
• A single TCP connection is bound by TCP rate control and round-trip time (RTT)
• C=30 vs. P=30: the test completed almost twice as fast with C=30
• A single blob is bound by the limits of a single partition
• Accessing multiple blobs concurrently scales
[Chart: upload time (s) per configuration]
Blob Download
XL VM downloading 50 blobs of 256 MB each (total download size = 12.5 GB)
• C=1, P=1 => averaged ~96 MB/s
• C=30, P=1 => averaged ~130 MB/s
[Chart: download time (s), C=1, P=1 vs. C=30, P=1]
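The average rates above convert back into approximate test durations. This is derived arithmetic, not measured data, assuming each average held for the whole run and 1 GB = 1024 MB (which makes 512 × 256 MB = 128 GB and 50 × 256 MB = 12.5 GB, matching the slides).

```python
UPLOAD_MB = 512 * 256      # 131,072 MB = 128 GB
DOWNLOAD_MB = 50 * 256     # 12,800 MB = 12.5 GB

def seconds(total_mb, mb_per_s):
    return total_mb / mb_per_s

for label, rate in [("C=1, P=1", 13.2), ("C=1, P=30", 50.72), ("C=30, P=1", 96.64)]:
    print(f"upload   {label}: ~{seconds(UPLOAD_MB, rate):,.0f} s")
for label, rate in [("C=1, P=1", 96), ("C=30, P=1", 130)]:
    print(f"download {label}: ~{seconds(DOWNLOAD_MB, rate):,.0f} s")
print(f"C=30 vs. P=30 speed-up: {96.64 / 50.72:.2f}x")
# upload   C=1, P=1: ~9,930 s
# upload   C=1, P=30: ~2,584 s
# upload   C=30, P=1: ~1,356 s
# download C=1, P=1: ~133 s
# download C=30, P=1: ~98 s
# C=30 vs. P=30 speed-up: 1.91x
```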
Table Best Practices
• Critical queries: choose PartitionKey and RowKey values that avoid hotspots
• Table scans are expensive: avoid them at all costs in latency-sensitive scenarios
• Batch: use the same PartitionKey for entities that need to be updated together
• Schema-less: store multiple types in the same table
• Single index, {PartitionKey, RowKey}: if needed, concatenate columns to form composite keys
• Entity locality: {PartitionKey, RowKey} determines sort order
  • Store related entities together to reduce IO and improve performance
• Table Service client layer in 2.1 and 2.2: dramatic performance improvements and a better NoSQL interface
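Two of the table guidelines above in code: composite keys built by concatenation, and batches grouped by PartitionKey. A sketch with hypothetical helpers. Keys compare as strings, so numbers are zero-padded (non-negative integers assumed); an entity group transaction is limited to 100 operations in a single partition.

```python
from itertools import groupby

MAX_BATCH = 100   # entity group transactions allow at most 100 operations

def composite_key(*parts, width=10):
    """Concatenate columns into one key. Non-negative ints are zero-padded so
    string order matches numeric order."""
    return "_".join(f"{p:0{width}d}" if isinstance(p, int) else str(p)
                    for p in parts)

def batches(entities):
    """Yield batches that share one PartitionKey, at most MAX_BATCH each."""
    ordered = sorted(entities, key=lambda e: (e["PartitionKey"], e["RowKey"]))
    for _, group in groupby(ordered, key=lambda e: e["PartitionKey"]):
        group = list(group)
        for i in range(0, len(group), MAX_BATCH):
            yield group[i:i + MAX_BATCH]

print(composite_key("2013", 42))   # 2013_0000000042
```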
Queue Best Practices
• Make message processing idempotent: messages become visible again if a client worker fails to delete them
• Benefit from Update Message: extend the visibility timeout based on the message, or save intermediate state
• Message count: use it to scale workers
• Dequeue count: use it to identify poison messages, or to check whether the invisibility time used is long enough
• Blobs to store large messages: increase throughput by having larger batches
• Multiple queues: to get more than a single queue (partition) target
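The visibility, dequeue-count and idempotency rules above can be simulated with a toy queue. A sketch under stated assumptions: `ToyQueue`, `work_once`, `idempotent` and the threshold of 3 dequeues are hypothetical, and time is passed in explicitly instead of read from a clock.

```python
class ToyQueue:
    """Hypothetical in-memory queue modelling visibility timeouts and dequeue counts."""

    def __init__(self):
        self.messages = {}                 # id -> [body, dequeue_count, visible_at]
        self.next_id = 0

    def put(self, body):
        self.messages[self.next_id] = [body, 0, 0.0]
        self.next_id += 1

    def get(self, now, visibility_timeout):
        for msg_id, msg in self.messages.items():
            if msg[2] <= now:
                msg[1] += 1                           # dequeue count
                msg[2] = now + visibility_timeout     # hidden, not deleted
                return msg_id, msg[0], msg[1]
        return None

    def delete(self, msg_id):
        self.messages.pop(msg_id, None)

def work_once(queue, handler, now, poison, max_dequeue=3, timeout=30.0):
    """Process one message; park it in `poison` once it has failed too often."""
    got = queue.get(now, timeout)
    if got is None:
        return
    msg_id, body, dequeue_count = got
    if dequeue_count > max_dequeue:
        poison.append(body)                # likely a poison message
        queue.delete(msg_id)
        return
    try:
        handler(msg_id, body)
    except Exception:
        return                             # not deleted: reappears after `timeout`
    queue.delete(msg_id)

def idempotent(handler, seen):
    """Skip messages that were already applied, so redelivery is harmless."""
    def wrapped(msg_id, body):
        if msg_id not in seen:
            handler(msg_id, body)
            seen.add(msg_id)
    return wrapped
```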
Thank you!
• Q&A

 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 

SQL Saturday: Azure Storage, by Anton Vidishchev

  • 1. Windows Azure Storage Overview, Internals and Best Practices
  • 3. About me
      - Program Manager @ Edgar Online, RRD
      - Windows Azure MVP
      - Co-organizer of Odessa .NET User Group
      - Ukrainian IT Awards 2013 Winner – Software Engineering
      - http://cloudytimes.azurewebsites.net/
      - http://www.linkedin.com/in/antonvidishchev
      - https://www.facebook.com/anton.vidishchev
  • 4. What is Windows Azure Storage?
  • 5. Windows Azure Storage
      - Cloud storage: anywhere and anytime access
          - Blobs, Disks, Tables and Queues
      - Highly durable, available and massively scalable
          - Easily build "internet scale" applications
          - 10 trillion stored objects
          - 900K requests/sec on average (2.3+ trillion per month)
      - Pay for what you use
      - Exposed via easy and open REST APIs
      - Client libraries in .NET, Java, Node.js, Python, PHP, Ruby
  • 9. Windows Azure Data Storage Concepts
      - Account
          - Container (holds Blobs): https://<account>.blob.core.windows.net/<container>
          - Table (holds Entities): https://<account>.table.core.windows.net/<table>
          - Queue (holds Messages): https://<account>.queue.core.windows.net/<queue>
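The three endpoint patterns above differ only in the service host segment. The sketch below builds them from an account name and a resource name. The function name `resource_url` and the sample account `myaccount` are hypothetical. This is string formatting only; it does not call any Azure SDK.

```python
# Host suffix per data abstraction, taken from the URL patterns on the slide.
SERVICE_HOSTS = {
    "blob": "blob.core.windows.net",    # container of blobs
    "table": "table.core.windows.net",  # table of entities
    "queue": "queue.core.windows.net",  # queue of messages
}


def resource_url(account: str, service: str, resource: str) -> str:
    """Return https://<account>.<service>.core.windows.net/<resource>."""
    if service not in SERVICE_HOSTS:
        raise ValueError(f"unknown service: {service}")
    return f"https://{account}.{SERVICE_HOSTS[service]}/{resource}"


if __name__ == "__main__":
    print(resource_url("myaccount", "blob", "photos"))
```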
  • 10. How is Azure Storage used by Microsoft?
  • 12. Design Goals
      - Highly available with strong consistency
          - Provide access to data in the face of failures/partitioning
      - Durability
          - Replicate data several times within and across regions
      - Scalability
          - Need to scale to zettabytes
          - Provide a global namespace to access data around the world
          - Automatically scale out and load balance data to meet peak traffic demands
  • 13. Windows Azure Storage Stamps
      Access blob storage via the URL: http://<account>.blob.core.windows.net/
      [Diagram: Storage Location Service; data access flows through a load balancer (LB) into each of two storage stamps. Each stamp contains Front-Ends, a Partition Layer and a DFS Layer with intra-stamp replication. Inter-stamp (geo) replication runs between the stamps.]
  • 14. Architecture Layers inside Stamps
      [Diagram: Partition Layer, Index]
  • 15. Availability with Consistency for Writing
      All writes are appends to the end of a log, which is an append to the last extent in the log.
      - Write consistency across all replicas for an extent:
          - Appends are ordered the same across all 3 replicas for an extent (file)
          - Only return success if all 3 replica appends are committed to storage
          - When an extent reaches a certain size, or on a write failure or load balancing, seal the extent's replica set and never append any more data to it
      - Write availability: to handle failures during a write
          - Seal the extent's replica set
          - Append immediately to a new extent (replica set) on 3 other available nodes
          - Add this new extent to the end of the partition's log (stream)
      [Diagram: Partition Layer]
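The write rules on this slide can be illustrated with a toy, single-process model. The class names `Node`, `Extent` and `Stream` and the `max_records` seal threshold are hypothetical. Real sealing also involves agreeing on the commit length with the stream manager, which is omitted here.

The model is a sketch under these assumptions:
- An append succeeds only when all 3 replicas accept it.
- A failed append seals the extent at its committed length.
- The record is retried on a fresh extent, placed on 3 other healthy nodes where possible.

```python
from dataclasses import dataclass, field


class ReplicaDown(Exception):
    """Raised by a simulated node that cannot accept an append."""


@dataclass
class Node:
    name: str
    healthy: bool = True
    data: dict = field(default_factory=dict)  # extent id -> appended records

    def append(self, extent_id, record):
        if not self.healthy:
            raise ReplicaDown(self.name)
        self.data.setdefault(extent_id, []).append(record)


@dataclass
class Extent:
    extent_id: int
    replicas: list
    sealed: bool = False
    length: int = 0  # committed length: records acknowledged by all 3 replicas


class Stream:
    """A partition's log: an ordered list of extents; only the last is appendable."""

    def __init__(self, nodes, max_records=4):
        self.nodes = nodes
        self.max_records = max_records  # hypothetical seal-by-size threshold
        self.extents = []
        self._open_new_extent()

    def _open_new_extent(self):
        healthy = [n for n in self.nodes if n.healthy]
        if len(healthy) < 3:
            raise RuntimeError("fewer than 3 healthy nodes available")
        previous = {n.name for n in self.extents[-1].replicas} if self.extents else set()
        # Prefer nodes that did not host the extent just sealed (stable sort).
        healthy.sort(key=lambda n: n.name in previous)
        self.extents.append(Extent(len(self.extents), healthy[:3]))

    def append(self, record):
        ext = self.extents[-1]
        try:
            for node in ext.replicas:  # every replica applies appends in the same order
                node.append(ext.extent_id, record)
        except ReplicaDown:
            # Write availability: seal at the committed length, retry on a new extent.
            ext.sealed = True
            self._open_new_extent()
            return self.append(record)
        ext.length += 1  # success only after all 3 replicas accepted the append
        if ext.length >= self.max_records:
            ext.sealed = True  # sealed extents never accept more data
            self._open_new_extent()

    def read_all(self):
        # Any healthy replica will do: committed data is identical on all of them.
        out = []
        for ext in self.extents:
            node = next(n for n in ext.replicas if n.healthy)
            out.extend(node.data.get(ext.extent_id, [])[: ext.length])
        return out
```

A partially applied append, for example one that reached the first replica but not the second, lies beyond the sealed extent's committed length. Readers therefore ignore it, and the record becomes visible only once it is committed on the new extent.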
  • 16. Availability with Consistency for Reading
      - Read consistency: can read from any replica, since data in each replica for an extent is bit-wise identical
      - Read availability: send out parallel read requests if the first read is taking longer than the 95th-percentile latency
      [Diagram: Partition Layer]
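The read-availability rule is a form of hedged request. The sketch below sends the read to one replica and waits up to the 95th-percentile latency. If no answer has arrived by then, it sends the same read to the next replica and returns whichever answer comes first.

Assumptions in this sketch:
- Replicas are modelled as zero-argument callables. `simulated_replica` is a hypothetical stand-in that just sleeps.
- The p95 is computed from a local latency history.
- An error from the first replica to finish is re-raised rather than retried.

```python
import concurrent.futures as cf
import statistics
import time


def p95(latencies):
    """95th-percentile latency from a history of samples (seconds)."""
    return statistics.quantiles(latencies, n=20, method="inclusive")[-1]


def hedged_read(replicas, latency_history):
    """Send the read to replicas[0]; each time the outstanding reads pass the
    p95 latency without an answer, also send it to the next replica.
    Returns the first result that arrives."""
    threshold = p95(latency_history)
    pool = cf.ThreadPoolExecutor(max_workers=len(replicas))
    try:
        pending = {pool.submit(replicas[0])}
        for replica in replicas[1:]:
            done, _ = cf.wait(pending, timeout=threshold, return_when=cf.FIRST_COMPLETED)
            if done:
                return done.pop().result()
            pending.add(pool.submit(replica))  # hedge: read the same data elsewhere
        done, _ = cf.wait(pending, return_when=cf.FIRST_COMPLETED)
        return done.pop().result()
    finally:
        # Don't block on the slower replica; its late answer is simply ignored.
        pool.shutdown(wait=False, cancel_futures=True)


def simulated_replica(name, delay):
    """Hypothetical stand-in for reading an extent from one replica."""
    def read():
        time.sleep(delay)
        return f"data from {name}"
    return read
```

Because every replica of an extent holds bit-wise identical data, it does not matter which copy answers first. That property is what makes this hedging safe.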
  • 17. Dynamic Load Balancing – Partition Layer
      Spreads index/transaction processing across partition servers
      - Master monitors traffic load/resource utilization on partition servers
      - Dynamically load balance partitions across servers to achieve better performance/availability
      - Does not move data around, only reassigns what part of the index a partition server is responsible for
      [Diagram: Partition Layer, Index]
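The key point of this slide is that rebalancing is a metadata change. The master reassigns a key range to another partition server, and no data is copied. The sketch below models that with a sorted list of range boundaries and an owner per range.

Everything here is a hypothetical illustration, not the real master's algorithm:
- The class `PartitionMap` and the server names are made up.
- The "move the hottest partition of the busiest server to the idlest server" rule is a deliberately naive policy.

```python
import bisect


class PartitionMap:
    """Hypothetical master-side map from key ranges (partitions) to partition servers.

    Rebalancing only edits this map: the partition's data stays where it is in
    the DFS layer, and the new server simply starts serving that index range."""

    def __init__(self, boundaries, servers):
        # boundaries must be sorted; partition i holds keys < boundaries[i]
        # (and >= boundaries[i-1]); the last partition is unbounded above.
        self.boundaries = list(boundaries)
        self.owner = [servers[i % len(servers)] for i in range(len(boundaries) + 1)]

    def partition_for(self, key):
        return bisect.bisect_right(self.boundaries, key)

    def server_for(self, key):
        return self.owner[self.partition_for(key)]

    def rebalance(self, load_by_partition, servers):
        """Naive greedy policy: move the hottest partition of the busiest server
        to the least-loaded server. Returns the moved partition index or None."""
        server_load = {s: 0 for s in servers}
        for p, load in enumerate(load_by_partition):
            server_load[self.owner[p]] += load
        busiest = max(server_load, key=server_load.get)
        idlest = min(server_load, key=server_load.get)
        if busiest == idlest:
            return None
        owned = [p for p, s in enumerate(self.owner) if s == busiest]
        hottest = max(owned, key=lambda p: load_by_partition[p])
        self.owner[hottest] = idlest  # metadata change only; no data is copied
        return hottest
```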
  • 18. Dynamic Load Balancing – DFS Layer
      DFS read load balancing across replicas
      - Monitor latency/load on each node/replica; dynamically select which replica to read from and start additional reads in parallel based on the 95th-percentile latency
      [Diagram: Partition Layer]
  • 19. Architecture Summary
      - Durability: all data stored with at least 3 replicas
      - Consistency: all committed data is identical across all 3 replicas
      - Availability: can read from any of the 3 replicas; if there are any issues writing, seal the extent and continue appending to a new extent
      - Performance/Scale: retry based on 95th-percentile latencies; auto scale out and load balance based on load/capacity
      - Additional details can be found in the SOSP paper:
          - "Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency", ACM Symposium on Operating Systems Principles (SOSP), Oct. 2011
  • 21. General .NET Best Practices for Azure Storage
      - Disable Nagle for small messages (< 1400 bytes)
          - ServicePointManager.UseNagleAlgorithm = false;
      - Disable Expect 100-Continue*
          - ServicePointManager.Expect100Continue = false;
      - Increase the default connection limit
          - ServicePointManager.DefaultConnectionLimit = 100; (or more)
      - Take advantage of the .NET 4.5 GC
          - GC performance is greatly improved
          - Background GC: http://msdn.microsoft.com/en-us/magazine/hh882452.aspx
  • 22. General Best Practices
      - Locate storage accounts close to compute/users
      - Understand account scalability targets
          - Use multiple storage accounts to get more
          - Distribute your storage accounts across regions
      - Consider warming up the storage for better performance
      - Cache critical data sets
          - To get more requests/sec than the account/partition targets
          - As a backup data set to fall back on
      - Distribute load over many partitions and avoid spikes
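One common way to "use multiple storage accounts to get more" is to shard keys across accounts with a stable hash. That way every worker agrees on which account holds a given key.

This is a minimal sketch, not something the slide prescribes:
- The account names in `ACCOUNTS` and the function `account_for` are hypothetical.
- CRC-32 is used because Python's built-in `hash()` of a string is salted per process.

```python
import zlib

# Hypothetical account names; the slide only says "use multiple storage accounts".
ACCOUNTS = ["contosodata0", "contosodata1", "contosodata2"]


def account_for(key, accounts=ACCOUNTS):
    """Map a key to one storage account with a stable hash.

    Python's built-in hash() of str is salted per process, so CRC-32 is used
    instead to keep the mapping identical across workers and restarts."""
    return accounts[zlib.crc32(key.encode("utf-8")) % len(accounts)]
```

With plain modulo hashing, adding an account remaps most existing keys. If the account count is expected to change, a lookup table or consistent hashing avoids bulk data migration.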
  • 23. General Best Practices (cont.)
      - Use HTTPS
      - Optimize what you send & receive
          - Blobs: range reads, metadata, HEAD requests
          - Tables: upsert, projection, point queries
          - Queues: Update Message
      - Control parallelism at the application layer
          - Unbounded parallelism can lead to high latencies and throttling
      - Enable logging & metrics on each storage service
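Controlling parallelism at the application layer means putting an explicit cap on in-flight requests instead of firing them all at once. The sketch below uses a fixed-size thread pool as that cap and counts how many simulated requests overlap.

`fake_storage_call` and `InFlightCounter` are hypothetical helpers for the demonstration, and the cap of 8 is arbitrary. A real limit should come from measuring latency and throttling responses against the account and partition targets.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_bounded(tasks, max_parallel):
    """Run zero-argument callables with at most max_parallel in flight.
    Results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(lambda task: task(), tasks))


class InFlightCounter:
    """Tracks how many simulated requests are running at once."""

    def __init__(self):
        self._lock = threading.Lock()
        self.current = 0
        self.peak = 0

    def __enter__(self):
        with self._lock:
            self.current += 1
            self.peak = max(self.peak, self.current)
        return self

    def __exit__(self, *exc_info):
        with self._lock:
            self.current -= 1
        return False


def fake_storage_call(counter, i):
    """Hypothetical stand-in for one storage request."""
    with counter:
        time.sleep(0.01)
    return i
```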
  • 24. Blob Best Practices
      - Try to match your read size with your write size
          - Avoid reading small ranges on blobs with large blocks
          - CloudBlockBlob.StreamMinimumReadSizeInBytes / StreamWriteSizeInBytes
      - How do I upload a folder the fastest?
          - Upload multiple blobs simultaneously
      - How do I upload a blob the fastest?
          - Use parallel block upload
      - Concurrency (C): multiple workers upload different blobs
      - Parallelism (P): multiple workers upload different blocks for the same blob
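Parallel block upload works in two phases:
1. Split the blob into blocks and upload the blocks concurrently.
2. Commit the ordered list of block IDs; that commit order defines the final blob's bytes.

The sketch below models this with an in-memory `FakeBlockBlob` whose `put_block` and `put_block_list` methods are hypothetical stand-ins named after the REST operations (Put Block, Put Block List); it is not the Azure SDK. Block IDs are Base64 strings of equal length, as the service requires within one blob.

```python
import base64
from concurrent.futures import ThreadPoolExecutor


class FakeBlockBlob:
    """Hypothetical in-memory stand-in for a block blob (mirrors Put Block / Put Block List)."""

    def __init__(self):
        self.uncommitted = {}
        self.content = b""

    def put_block(self, block_id, data):
        self.uncommitted[block_id] = data

    def put_block_list(self, block_ids):
        # Commit: the blob becomes the listed blocks, in the listed order.
        self.content = b"".join(self.uncommitted[b] for b in block_ids)
        self.uncommitted.clear()


def block_id(index):
    # Block IDs within one blob must all have the same length once encoded;
    # a zero-padded counter guarantees that.
    return base64.b64encode(f"{index:06d}".encode()).decode()


def parallel_upload(blob, data, block_size, parallelism):
    chunks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    ids = [block_id(i) for i in range(len(chunks))]
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        # P workers upload different blocks of the same blob.
        list(pool.map(blob.put_block, ids, chunks))
    blob.put_block_list(ids)
    return ids
```

Concurrency (C) in the slide's sense would instead run several `parallel_upload` calls, one per blob, side by side.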
  • 25. Concurrency vs. Blob Parallelism
      - C=1, P=1 => averaged ~13.2 MB/s
      - C=1, P=30 => averaged ~50.72 MB/s
      - C=30, P=1 => averaged ~96.64 MB/s
      - A single TCP connection is bound by TCP rate control & RTT
      - C=30 vs. P=30: the C=30 test completed almost twice as fast!
          - A single blob is bound by the limits of a single partition
          - Accessing multiple blobs concurrently scales
      [Chart: upload time (s) for an XL VM uploading 512 blobs of 256 MB each (total upload size = 128 GB)]
  • 26. Blob Download
      - XL VM downloading 50 blobs of 256 MB each (total download size = 12.5 GB)
          - C=1, P=1 => averaged ~96 MB/s
          - C=30, P=1 => averaged ~130 MB/s
      [Chart: download time (s) for C=1, P=1 and C=30, P=1]
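The averaged rates on the last two slides can be turned into approximate run times. This is a quick sanity check of the "almost twice as fast" claim, under two assumptions:
- 1 GB = 1024 MB, as the slides' totals imply (512 × 256 MB = 128 GB; 50 × 256 MB = 12.5 GB).
- Each average held for the whole run.

The results are derived estimates, not measured times.

```python
UPLOAD_MB = 512 * 256    # 512 blobs x 256 MB = 131,072 MB = 128 GB
DOWNLOAD_MB = 50 * 256   # 50 blobs x 256 MB = 12,800 MB = 12.5 GB

UPLOAD_RATES = {"C=1, P=1": 13.2, "C=1, P=30": 50.72, "C=30, P=1": 96.64}  # MB/s
DOWNLOAD_RATES = {"C=1, P=1": 96.0, "C=30, P=1": 130.0}                   # MB/s


def transfer_seconds(total_mb, mb_per_s):
    """Estimated duration if the averaged rate held for the whole transfer."""
    return total_mb / mb_per_s


if __name__ == "__main__":
    for name, rate in UPLOAD_RATES.items():
        print(f"upload   {name}: ~{transfer_seconds(UPLOAD_MB, rate):,.0f} s")
    for name, rate in DOWNLOAD_RATES.items():
        print(f"download {name}: ~{transfer_seconds(DOWNLOAD_MB, rate):,.0f} s")
```

This gives roughly 9,930 s, 2,584 s and 1,356 s for the three upload runs. The C=30 vs. P=30 ratio is 96.64 / 50.72 ≈ 1.9, which matches "almost twice as fast". The two download runs come out at roughly 133 s and 98 s.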
  • 27. Table Best Practices
      - Critical queries: choose PartitionKey and RowKey to avoid hotspots
          - Table scans are expensive; avoid them at all costs for latency-sensitive scenarios
      - Batch: use the same PartitionKey for entities that need to be updated together
      - Schema-less: store multiple types in the same table
      - Single index, {PartitionKey, RowKey}: if needed, concatenate columns to form composite keys
      - Entity locality: {PartitionKey, RowKey} determines sort order
          - Store related entities together to reduce IO and improve performance
      - Table Service Client Layer in 2.1 and 2.2: dramatic performance improvements and a better NoSQL interface
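Several of the table rules above follow from having a single index on {PartitionKey, RowKey}. The hypothetical in-memory `ToyTable` below makes them concrete:
- Point queries are one lookup.
- A partition scan returns entities in key order.
- Composite keys are built by concatenation.
- Batches must share one PartitionKey.
- Different entity types can live side by side.

It is an illustration, not the Azure Table client. The 100-entity cap reflects the service's documented limit for an entity group transaction.

```python
class ToyTable:
    """Hypothetical in-memory model of a table whose only index is {PartitionKey, RowKey}."""

    MAX_BATCH = 100  # entity group transactions are limited to 100 entities

    def __init__(self):
        self.rows = {}

    def upsert(self, entity):
        self.rows[(entity["PartitionKey"], entity["RowKey"])] = entity

    def point_query(self, pk, rk):
        # Both keys given: a single index lookup, the cheapest query.
        return self.rows.get((pk, rk))

    def partition_scan(self, pk):
        # Entities come back sorted by {PartitionKey, RowKey}.
        return [self.rows[k] for k in sorted(self.rows) if k[0] == pk]

    def batch_upsert(self, entities):
        # A batch must target one PartitionKey (and at most MAX_BATCH entities).
        if len({e["PartitionKey"] for e in entities}) != 1:
            raise ValueError("a batch must target a single PartitionKey")
        if len(entities) > self.MAX_BATCH:
            raise ValueError("too many entities in one batch")
        for e in entities:
            self.upsert(e)


def composite_key(*parts, sep="_"):
    """Concatenate columns into one key, e.g. a RowKey of 'Smith_John'."""
    return sep.join(parts)
```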
  • 28. Queue Best Practices
      - Make message processing idempotent: messages become visible again if a client worker fails to delete them
      - Benefit from Update Message: extend the visibility timeout based on the message, or save intermediate state
      - Message count: use it to scale workers
      - Dequeue count: use it to identify poison messages or to check whether the visibility timeout in use is adequate
      - Blobs to store large messages: increase throughput by having larger batches
      - Multiple queues: to get more than the scalability target of a single queue (partition)
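The queue advice fits together as one worker loop:
- A dequeued message is hidden for a visibility timeout, and its dequeue count grows each time it is handed out.
- A crash before delete makes the message reappear, so the worker skips work it already recorded.
- A message dequeued too many times is parked as poison.

This is a minimal sketch under stated assumptions:
- `ToyQueue`, `process_one` and the `max_dequeue=3` threshold are hypothetical; this is not the Azure Queue client.
- Idempotency is modelled with a simple set of processed message IDs. A real system would persist that record.

```python
import time
from dataclasses import dataclass


@dataclass
class Message:
    id: int
    body: str
    visible_at: float = 0.0
    dequeue_count: int = 0


class ToyQueue:
    """Hypothetical in-memory queue with visibility timeouts (not the Azure SDK)."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.messages = {}
        self._next_id = 0

    def put(self, body):
        self.messages[self._next_id] = Message(self._next_id, body)
        self._next_id += 1

    def get(self, visibility_timeout):
        now = self.clock()
        for msg in self.messages.values():
            if msg.visible_at <= now:
                msg.visible_at = now + visibility_timeout  # hide from other workers
                msg.dequeue_count += 1
                return msg
        return None

    def update_visibility(self, msg, visibility_timeout):
        # Like Update Message: extend the lease while long work is in progress.
        msg.visible_at = self.clock() + visibility_timeout

    def delete(self, msg):
        self.messages.pop(msg.id, None)

    def __len__(self):  # message count, e.g. to decide how many workers to run
        return len(self.messages)


def process_one(queue, handler, processed_ids, poison, visibility_timeout=30, max_dequeue=3):
    msg = queue.get(visibility_timeout)
    if msg is None:
        return False
    if msg.dequeue_count > max_dequeue:
        poison.append(msg.body)  # park the poison message instead of retrying forever
        queue.delete(msg)
        return True
    if msg.id not in processed_ids:  # idempotency: skip work already done
        handler(msg.body)  # an exception here leaves the message to reappear
        processed_ids.add(msg.id)
    queue.delete(msg)
    return True
```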