The Public Cloud is a Lie

Tapio Rautonen
@trautonen linkedin.com/in/trautonen

De facto visibility optimization technology for the video game industry
World-leading 3D optimization and delivery platform
Professional AWS user since 2010 for many of the biggest public cloud users in Finland
Bending Scala and other technologies to provide the most immersive 3D experience from the cloud
Tapio Rautonen

THE PROMISE
Elastically provisioned unlimited pool of resources available
over the network on demand without human interaction

Amazon Elastic Compute Cloud (Amazon EC2)
Provides secure, resizable compute capacity in the cloud. It is
designed to make web-scale cloud computing easier for
developers.
Amazon EC2 enables you to increase or decrease capacity
within minutes, not hours or days. You can commission one,
hundreds, or even thousands of server instances
simultaneously.

Amazon Simple Storage Service (Amazon S3)
An object storage service that offers industry-leading
scalability, data availability, security, and performance.
Scale your storage resources up and down to meet
fluctuating demands, without upfront investments or resource
procurement cycles.

THE DESIRE
Creating a virtual copy of the real world down to the detail of
a grain of sand requires massive computing resources

Katajanokka, Helsinki, by Umbra
Same location by Google Earth

Umbra Composit Platform
Umbra Composit can take terabytes of 3D input data,
optimize and realign it for streaming purposes for any device
from virtual glasses to mobile phones and game engines.
Umbra’s patented optimization techniques and cloud
computing allow the platform to scale beyond the traditional
limits associated with ultra-high-resolution 3D content.

THE FACT CHECK
It’s just servers in someone else’s data center

… without human interaction
Request an increase in the limits for resources provided by
Amazon EC2. Complete the required fields on the limit
increase form. We'll respond to you using the contact method
that you specified.

… unlimited resources
Launching a nontrivial number of instances, unusual instance
types, or bad timing can result in an insufficient capacity error.
For many computing resources there are hard limits that
cannot be worked around.
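One common way to live with insufficient capacity errors is to treat the instance type and Availability Zone as flexible. The sketch below tries a list of (instance type, zone) candidates in order and returns the first one that launches. `launch_fn` and `InsufficientCapacityError` are hypothetical stand-ins for the real SDK call and its error (EC2 reports the `InsufficientInstanceCapacity` error code); this is not the AWS API, only the fallback pattern.

```python
from typing import Callable, Iterable, Optional, Tuple


class InsufficientCapacityError(Exception):
    """Hypothetical error: the provider has no capacity for this request."""


def launch_with_fallback(
    launch_fn: Callable[[str, str], str],
    candidates: Iterable[Tuple[str, str]],
) -> Tuple[str, str, str]:
    """Try each (instance_type, zone) pair in order.

    launch_fn is a hypothetical callable returning an instance ID.
    Returns (instance_id, instance_type, zone) for the first successful launch
    and re-raises the last capacity error if every candidate fails.
    """
    last_error: Optional[InsufficientCapacityError] = None
    for instance_type, zone in candidates:
        try:
            instance_id = launch_fn(instance_type, zone)
            return instance_id, instance_type, zone
        except InsufficientCapacityError as err:
            # Remember the failure and move on to the next candidate.
            last_error = err
    if last_error is None:
        raise ValueError("no candidates given")
    raise last_error
```

Ordering the candidates from most to least preferred keeps the common case cheap, while hard limits still surface as an error instead of being silently retried forever.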

… it really is just servers
Pretty much everything in AWS is built on top of EC2. The
development experience is not like Heroku or some other
well-designed PaaS.
EKS is supposed to be a managed Kubernetes service, but it’s
actually a whole lot of proprietary configuration instead of
just running Kubernetes as a service.

THE REALITY
The cloud is only as strong as the weakest tool in the chain

It could be the SDK ...
The rewrite of the AWS Java SDK to use Netty and non-blocking
IO has resulted in many concurrency-related issues.
Asynchronous multipart S3 uploads fail randomly. Sent or
received SQS messages are lost in high-throughput/high-concurrency
scenarios.
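Losses like this are easy to miss unless the application checks end to end. A minimal sketch, assuming each message carries a producer-assigned sequence number in its body or attributes (a convention introduced here, not something SQS provides), reconciles what was sent against what was received and reports both missing and duplicated IDs.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def reconcile(sent_ids: Iterable[int], received_ids: Iterable[int]) -> Tuple[List[int], List[int]]:
    """Compare producer-side and consumer-side sequence numbers.

    Returns (missing, duplicated), both sorted:
    - missing: IDs that were sent but never received
    - duplicated: IDs that were received more than once
    """
    sent = set(sent_ids)
    counts = Counter(received_ids)
    # Iterating a Counter yields its keys, i.e. every ID seen at least once.
    missing = sorted(sent.difference(counts))
    duplicated = sorted(i for i, n in counts.items() if n > 1)
    return missing, duplicated
```

Running a check like this during load tests is what turns "messages are sometimes lost" into a reproducible, countable failure.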

The documentation claims something but the services do the opposite ...
S3 should scale to enormous numbers of concurrent requests
when content is highly sharded. But it’s still pretty easy to hit
Slow Down errors or internal server errors with tens of
thousands of concurrent requests.
DynamoDB scaling and resharding should happen without
the user noticing anything, but in high-throughput cases
the service usually still fails with internal server errors.
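The usual client-side defence against throttling and transient 5xx errors is to retry with capped exponential backoff and jitter. The sketch below is generic Python, not the AWS SDK's built-in retry policy; `is_retryable` is a hypothetical predicate you would write to match the errors your SDK actually raises, and the default delays are illustrative.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    op: Callable[[], T],
    is_retryable: Callable[[Exception], bool],
    max_attempts: int = 8,
    base_delay: float = 0.05,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call op(), retrying retryable errors with "full jitter" backoff.

    Before retry n (0-based) the delay is uniform in
    [0, min(max_delay, base_delay * 2**n)]. Non-retryable errors, and the
    error from the final attempt, are re-raised unchanged.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return op()
        except Exception as err:
            # Give up on errors that retrying cannot fix, and on the last attempt.
            if not is_retryable(err) or attempt == max_attempts - 1:
                raise
            # Full jitter: random delay below an exponentially growing, capped ceiling.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")
```

The jitter matters at the concurrency levels described above: without it, thousands of clients that were throttled together retry together and hit the same limit again.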

Everything web-scale relies on optimal sharding ...
EFS can only hit the promised numbers when the content is
properly sharded on a really big volume.
S3 and DynamoDB suffer from the hot-key problem, where
high scalability can be reached only with optimal sharding
and access patterns.
The PostgreSQL-compatible Aurora database is a lot slower than
MySQL due to sharding limitations.
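A common mitigation for hot keys is write sharding: spread one logical key over several physical keys by appending a suffix derived from a stable hash, and fan reads out over every suffix. A minimal sketch, assuming a fixed shard count chosen up front; `SHARD_COUNT` and the `key#n` format are illustrative choices, not an AWS convention. The same idea applies to hashed S3 key prefixes.

```python
import hashlib
from typing import List

SHARD_COUNT = 10  # illustrative; size it to the write rate one logical key must absorb


def shard_for(item_id: str, shard_count: int = SHARD_COUNT) -> int:
    """Deterministically map an item to a shard.

    Python's built-in hash() is salted per process, so hashlib is used to
    get the same shard on every machine and every run.
    """
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def sharded_key(logical_key: str, item_id: str, shard_count: int = SHARD_COUNT) -> str:
    """Physical partition key used when writing item_id under logical_key."""
    return f"{logical_key}#{shard_for(item_id, shard_count)}"


def all_shard_keys(logical_key: str, shard_count: int = SHARD_COUNT) -> List[str]:
    """Every physical key a reader must query to see all items of logical_key."""
    return [f"{logical_key}#{n}" for n in range(shard_count)]
```

The compromise is explicit: writes spread over `shard_count` partitions, but every read of the logical key now costs `shard_count` queries.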

THE PITFALLS
“Everything fails all the time”
- Werner Vogels

Who do you trust?
Defensive coding can result in complex, hard-to-debug
scenarios without actually increasing confidence.
Relying on a managed cloud service is a good starting
point, but know its limitations and pitfalls.
Cloud-native software architectures should embrace the
“let it crash” philosophy.
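“Let it crash”, popularized by Erlang/OTP, means a worker does not try to recover locally from every unexpected error: it fails, and a supervisor restarts it from a known-good state. A minimal Python sketch of that idea, with an illustrative restart budget; `supervise` and `RestartBudgetExceeded` are hypothetical names. In a cloud deployment the supervisor role is often played by the platform itself, for example Kubernetes restarting a failed container.

```python
import logging
from typing import Callable

log = logging.getLogger("supervisor")


class RestartBudgetExceeded(Exception):
    """Raised when a worker keeps crashing more often than allowed."""


def supervise(worker_factory: Callable[[], Callable[[], None]], max_restarts: int = 3) -> int:
    """Run a fresh worker until it returns normally.

    Each crash is logged and a brand-new worker is built from the factory,
    so no possibly corrupted state survives a failure. Returns the number
    of restarts that were needed; gives up after max_restarts restarts.
    """
    restarts = 0
    while True:
        worker = worker_factory()  # fresh state on every (re)start
        try:
            worker()
            return restarts
        except Exception as err:
            if restarts >= max_restarts:
                raise RestartBudgetExceeded(f"gave up after {restarts} restarts") from err
            restarts += 1
            log.warning("worker crashed, restart %d of %d: %r", restarts, max_restarts, err)
```

The worker stays simple because it contains no recovery logic; the restart budget keeps a permanently broken dependency from turning into an endless crash loop.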

Everything comes with a compromise
With SQL and ACID properties there will be scalability limits.
With sharding and scalability there will be no exactly-once
guarantee.
With managed services you lose some of the configuration
options.
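Without an exactly-once guarantee, a distributed queue typically delivers messages at least once, so consumers must tolerate duplicates. A minimal sketch of an idempotent consumer, assuming every message carries a unique ID; the in-memory set is for illustration only and stands in for what would need to be a durable store in production.

```python
from typing import Callable, Set


class IdempotentConsumer:
    """Apply each message's side effect at most once per message ID.

    Assumption: IDs are unique per logical message. The seen-set lives in
    memory here purely for illustration.
    """

    def __init__(self, handler: Callable[[str], None]) -> None:
        self._handler = handler
        self._seen: Set[str] = set()

    def consume(self, message_id: str, body: str) -> bool:
        """Process one delivery; returns False if it was a duplicate and skipped."""
        if message_id in self._seen:
            return False
        self._handler(body)
        # Record the ID only after the handler succeeded, so a crash
        # mid-handler leads to a redelivery instead of a lost message.
        self._seen.add(message_id)
        return True
```

Recording the ID after the side effect trades a small window of possible double processing for never losing a message, which is the at-least-once compromise described above.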

The weird problems only happen when everything is at its limits
Losing a few messages only when pushing hundreds of
thousands of messages at a rate of thousands of messages
per second.
A service failing only while internal resharding or some other
user-invisible operation is ongoing.

THE END
Using the public cloud doesn’t remove the need to
understand the inner workings of each tool you use
