
AWS as a Data Platform for Cloud and On-Premises Workloads | AWS Public Sector Summit 2016



This session discusses the set of data services that AWS offers for managing all types of data, including files, objects, databases, and data warehouses. We will discuss use cases for each AWS data service, including unique capabilities that the cloud enables and hybrid scenarios for integrating and migrating on-premises data to AWS. This session discusses Amazon S3, AWS Storage Gateway, Amazon EBS, Amazon RDS, Amazon Redshift, and native databases running on AWS. It also covers some of the key data and storage capabilities provided by AWS partners, and considerations for integrating with and migrating enterprise data to the cloud.




  1. 1. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Joe Healy, Worldwide Public Sector AWS Consultant. June 20, 2016. AWS as a Data Platform
  2. 2. Common constraints: too much, too fast, too many types of data (velocity, volume & variety); too constrained (no capacity); too expensive (costs); too insecure ("unique" compliance requirements); too complicated (skill-set gaps). Not always as difficult as originally perceived.
  3. 3. [AWS platform overview diagram: infrastructure core services (compute, storage, databases, networking) across Regions, Availability Zones, and Points of Presence; analytics, app services, mobile services, enterprise apps, and development & operations; security & compliance; hybrid architecture; Marketplace; and technical & business support.]
  4. 4. Move Store Process Deliver Journey to Value & Insight
  5. 5. Journey to Value & Insight Move Store Process Deliver
  6. 6. Move AWS Import/Export Snowball Amazon S3 Transfer Acceleration AWS Direct Connect
  7. 7. AWS Import/Export Snowball Move
  8. 8. What is Snowball? Petabyte scale data transport E-ink shipping label Ruggedized case “8.5G Impact” All data encrypted end-to-end 50 TB & 80 TB 10Gb network Rain & dust resistant Tamper-resistant case & electronics
  9. 9. New ways to transfer data into the cloud: AWS Import/Export Snowball • Now holds 60% more – New 80 TB model, $250/job – 50 TB still available in US West and US East for $200/job • New regional availability – Currently in US West (Oregon) and US East (N. Virginia) – US West (N. California), GovCloud (US), Asia Pacific (Sydney), and EU (Ireland) regions expected by the end of 2016
  10. 10. How fast is Snowball? Less than 1 day to transfer 50 TB via a 10 Gb connection with Snowball; less than 1 week including shipping. Days to transfer 50 TB via the Internet at typical utilizations (1 Gbps / 500 Mbps / 300 Mbps / 150 Mbps): 25% utilization: 19 / 38 / 63 / 126; 50%: 9 / 19 / 32 / 63; 75%: 6 / 13 / 21 / 42.
  11. 11. How fast is Snowball? Less than 1 day to transfer 250 TB via 5x10 Gb connections with 5 Snowballs; less than 1 week including shipping. Days to transfer 250 TB via the Internet at typical utilizations (1 Gbps / 500 Mbps / 300 Mbps / 150 Mbps): 25% utilization: 95 / 190 / 316 / 632; 50%: 47 / 95 / 158 / 316; 75%: 32 / 63 / 105 / 211.
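The day counts on the two slides above follow from straightforward bandwidth arithmetic. A rough sketch (decimal units, protocol overhead ignored, so results are estimates rather than AWS-published figures):

```python
import math

def transfer_days(terabytes, gbps, utilization):
    """Approximate days to move `terabytes` TB over a link running at
    `gbps` Gbit/s with the given average utilization (0.0-1.0).
    Uses decimal units (1 TB = 1000 GB) and ignores protocol overhead."""
    tb_per_day = gbps * utilization * 86400 / 8 / 1000  # Gbit/s -> TB/day
    return math.ceil(terabytes / tb_per_day)

# A 1 Gbps link at 25% utilization needs roughly 19 days for 50 TB,
# which is why a Snowball (days, including shipping) wins at this scale.
```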
  12. 12. How is my data transported securely? • All data is encrypted with 256-bit encryption by the Snowball client • Keys are managed by AWS Key Management Service (AWS KMS) and are never sent to the Snowball appliance • Strong chain of custody • Tamper-resistant case • Tamper-resistant electronics (TPM) • Each Snowball appliance is erased according to NIST 800-88 media sanitization guidelines between every job
  13. 13. Pricing. Usage charge per job: $200.00 (50 TB), $250.00 (80 TB). Extra day charge (first 10 days* are free): $15.00. Data transfer in: $0.00/GB. Data transfer out: $0.03/GB. Shipping**: varies. Amazon S3 charges: standard storage and request fees apply. * Starts one day after the appliance is delivered to you; the first day the appliance is received at your site and the last day it is shipped out are also free and not counted against the 10-day free usage time. ** Shipping charges are based on your shipment destination and the shipping option (e.g., overnight, 2-day) you choose.
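As a worked example of the pricing dimensions above, a small estimator (a hypothetical helper, not an AWS tool; it excludes the standard S3 storage and request fees, and shipping varies by destination so it is passed in):

```python
def snowball_job_cost(model_tb, extra_days=0, egress_gb=0, shipping=0.0):
    """Estimate Snowball charges from the pricing table: per-job usage
    charge, $15/day beyond the free 10 days, $0.03/GB data transfer out."""
    base = {50: 200.0, 80: 250.0}[model_tb]  # usage charge per job
    return base + 15.0 * extra_days + 0.03 * egress_gb + shipping

# e.g. an 80 TB job kept 2 days beyond the free window:
# snowball_job_cost(80, extra_days=2) -> 280.0
```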
  14. 14. Amazon S3 Transfer Acceleration Move
  15. 15. Optimizing Internet performance is hard: complicated setup and management, hard to optimize performance, expensive, proprietary.
  16. 16. Introducing Amazon S3 Transfer Acceleration S3 Bucket AWS Edge Location Uploader Optimized throughput! Typically 50% to 400% faster Change your endpoint, not your code 56 global Edge Locations No firewall exceptions No client software required
  17. 17. Service traffic flow, client to S3 bucket example: (1) the client resolves the accelerated endpoint via Amazon Route 53; (2) HTTP/S PUT/POST to the nearest AWS Edge Location; (3) HTTPS PUT/POST over optimized connections to an EC2 proxy in the same AWS Region as the bucket; (4) the request is delivered to the S3 bucket.
  18. 18. How fast is S3 Transfer Acceleration? [Chart: time in hours to upload 500 GB to a bucket in Singapore from Edge Locations in Rio de Janeiro, Warsaw, New York, Atlanta, Madrid, Virginia, Melbourne, Paris, Los Angeles, Seattle, Tokyo, and Singapore, comparing the public Internet with S3 Transfer Acceleration.]
  19. 19. Getting started 1. Enable S3 transfer acceleration on your S3 bucket. 2. Update your application or destination URL to <bucket-name>.s3-accelerate.amazonaws.com 3. Done!
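With the AWS SDK for Python (boto3), step 1 is a single API call, put_bucket_accelerate_configuration. A sketch using a hypothetical bucket name:

```python
def accelerate_config(bucket, enabled=True):
    """Build the arguments for boto3's s3.put_bucket_accelerate_configuration."""
    return {
        "Bucket": bucket,
        "AccelerateConfiguration": {"Status": "Enabled" if enabled else "Suspended"},
    }

# With boto3 (bucket name is hypothetical):
#   import boto3
#   s3 = boto3.client("s3")
#   s3.put_bucket_accelerate_configuration(**accelerate_config("example-bucket"))
# Then point uploads at the accelerated endpoint:
#   https://example-bucket.s3-accelerate.amazonaws.com
```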
  20. 20. How much will it help me?
  21. 21. Pricing*. Data transfer in from the Internet**: $0.04/GB (Edge Location in US, EU, JP), $0.08/GB (Edge Location in rest of the world). Data transfer out to the Internet: $0.04/GB. Data transfer out to another AWS Region: $0.04/GB. Amazon S3 charges: standard data transfer charges apply. * Standard Amazon S3 data transfer charges also apply. ** "Fast or free": if an accelerated transfer was likely no faster than a regular S3 transfer, there is no acceleration bandwidth charge.
  22. 22. AWS Direct Connect Move
  23. 23. Use cases • Connect on-premises resources to resources in VPC • Faster connectivity • Dedicated connection • Less operational overhead versus VPN • Multiple VPCs supported • Dedicated connectivity to your public AWS services • S3 for data ingestion • EC2 public interfaces • Owned by the customer • Other customers’ services • Cost • Data transfer out is $.02 or $.03 per GB in NA versus $.09 to the Internet
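The cost comparison in the last bullet is simple per-GB arithmetic. A sketch (flat rates for illustration only; actual AWS data transfer pricing is tiered by volume):

```python
def monthly_egress_cost(gb_out, price_per_gb):
    """Monthly data-transfer-out cost at a flat per-GB rate
    (a simplification; real AWS pricing is tiered)."""
    return gb_out * price_per_gb

# 10 TB out per month in North America:
#   over Direct Connect at $0.02/GB -> about $200
#   over the Internet   at $0.09/GB -> about $900
```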
  24. 24. Direct Connect worldwide locations 12 Regions 28 DX Locations
  25. 25. GovCloud North America Direct Connect locations (Las Vegas) (Seattle) (NYC) (Ashburn) (San Jose) (LA) (Dallas) • Private (VPC) access to 1 designated region • Public (S3) access to all US regions DX Location choice provides: (Santa Clara) (Portland)
  26. 26. Move Store Process Deliver Journey to insight
  27. 27. S3 – Object storage options EBS – Throughput optimized Store
  28. 28. S3 – Object storage options Storage
  29. 29. Selecting the right object storage for your needs S3 S3-IA Glacier L i f e c y c l e Available S3: 99.99% S3-IA: 99.9% Performant Low Latency High throughput Secure SSE, client encryption, IAM integration Event notifications SQS, SNS, and Lambda Versioning Keep multiple copies automatically Cross-region replication Common namespace Define storage class per object Durable 99.999999999% Scalable Elastic capacity No preset limits “Hot” data Active and/or temporary data “Warm” data Infrequently accessed data “Cold” data Archive and compliance data
  30. 30. Selecting the right object storage for your needs. All three classes are durable (99.999999999%) and scalable (elastic capacity, no preset limits), and lifecycle policies move objects between them. S3 ("hot" data, active and/or temporary): $0.03/GB per month, low latency, high throughput, no minimum object size (> 0 KB) or minimum duration (≥ 0 days). S3-IA ("warm" data, infrequently accessed): $0.0125/GB per month, ≥ 128 KB minimum object size, ≥ 30 days minimum duration, $0.01/GB retrieval. Glacier ("cold" archive and compliance data): $0.007/GB per month, ≥ 90 days minimum duration, 3-5 hour retrievals, $0.01/GB retrieval (with a free allowance of about 5% of stored data per month).
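Lifecycle transitions between these classes can be configured through the S3 API, for example with boto3's put_bucket_lifecycle_configuration. A sketch using a hypothetical bucket and prefix, with transition days matching the minimum durations above:

```python
def tiering_rule(prefix, ia_days=30, glacier_days=90):
    """One S3 lifecycle rule moving objects from Standard to S3-IA and then
    to Glacier. Dict shape matches boto3's put_bucket_lifecycle_configuration."""
    return {
        "ID": "tier-" + (prefix or "all"),
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": ia_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_days, "StorageClass": "GLACIER"},
        ],
    }

# With boto3 (bucket name and prefix are hypothetical):
#   s3.put_bucket_lifecycle_configuration(
#       Bucket="example-bucket",
#       LifecycleConfiguration={"Rules": [tiering_rule("logs/")]})
```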
  31. 31. Amazon Elastic Block Store (Amazon EBS) throughput optimized volumes Storage
  32. 32. ST1/SC1 performance Burst bucket based on MB/sec (vs. IOPS) • Scales with size of volume • Max throughput 500 MB/sec (ST1), 250 MB/sec (SC1) Use cases • Workloads with majority sequential I/O • EMR, Kafka, Hadoop, Splunk/log processing, media
  33. 33. Throttling • I/O requests of 1 MB or less count as 1 MB I/O credit • Sequential I/Os are merged into 1 MB I/O credits • Throttle designed to reward streaming and big data workloads with large data sets, large I/O block sizes, and sequential I/O patterns. • Small, random I/Os are inefficient and quickly drain the burst bucket
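A rough model of how the burst bucket drains under the charging rule above (the 1 MB minimum charge per request; the sequential-merge behavior described on the slide is not modeled here):

```python
import math

def credit_charged_mib(request_sizes_kib):
    """MiB of ST1/SC1 burst credit consumed by a list of I/O request sizes
    (in KiB). Each request of 1 MiB or less is charged as a full 1 MiB;
    larger requests are charged per MiB."""
    return sum(max(1, math.ceil(size / 1024)) for size in request_sizes_kib)

# 256 random 4 KiB reads drain 256 MiB of credit; the same 1 MiB of data
# read as one sequential request drains just 1 MiB.
```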
  34. 34. Amazon EBS volumes
  35. 35. Move Store Process Deliver Journey to insight
  36. 36. Amazon RDS – engine review AWS Database Migration Service Amazon Kinesis Process
  37. 37. Amazon Relational Database Service (Amazon RDS) Process
  38. 38. No infrastructure management Scale up/down Cost-effective Instant provisioning Application compatibility Amazon RDS
  39. 39. Amazon RDS engines Commercial Open source Amazon Aurora
  40. 40. Trade-offs with a managed service Fully managed host and OS • No access to the database host operating system • Limited ability to modify configuration that is managed on the host operating system • No functions that rely on configuration from the host OS Fully managed storage • Max storage limits • SQL Server—4 TB • MySQL, MariaDB, PostgreSQL, Oracle—6 TB • Aurora—64 TB • Growing your database is a process
  41. 41. High availability—multi-AZ Availability Zone A AWS Region Availability Zone B Replicated storage Same instance type as master
  42. 42. AWS Database Migration Service Process
  43. 43. Bring your on-premises databases into AWS Move data to the same or a different database engine Start your first migration in 10 minutes or less Keep your apps running during the migration Replicate from on premises, EC2 or RDS to EC2 or RDS One-time migration or ongoing replication AWS DMS
  44. 44. Amazon Kinesis Process
  45. 45. Amazon Kinesis: streaming data done the AWS way Makes it easy to capture, deliver, and process real-time data streams Pay as you go, no up-front costs Elastically scalable Right services for your specific use cases Real-time latencies Easy to provision, deploy, and manage
  46. 46. Amazon Kinesis Streams • For technical developers • Build your own custom applications that process or analyze streaming data • GA at re:Invent 2013 Amazon Kinesis Firehose • For all developers, data scientists • Easily load massive volumes of streaming data into Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service • GA at re:Invent 2015 Amazon Kinesis Analytics • For all developers, data scientists • Easily analyze data streams using standard SQL queries • Preview Amazon Kinesis: streaming data made easy Services make it easy to capture, deliver, and process streams on AWS
  47. 47. Sending & reading data from Kinesis Streams AWS SDK LOG4J Flume Fluentd Get* APIs Kinesis Client Library + Connector Library Apache Storm Amazon Elastic MapReduce Sending Consuming AWS Mobile SDK Kinesis Producer Library AWS Lambda Apache Spark
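To illustrate how Kinesis Streams distributes records across shards, here is a sketch of its documented partition-key routing: the MD5 hash of the key picks a shard by hash-key range. This assumes the stream's shards split the 128-bit key space evenly; it is illustrative, not the actual service code.

```python
import hashlib

def shard_for_key(partition_key, num_shards):
    """Map a partition key to a shard index the way Kinesis Streams routes
    records: MD5 of the key yields a 128-bit hash key, and the shard whose
    hash-key range contains it receives the record (even split assumed)."""
    hash_key = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    return hash_key * num_shards // 2 ** 128

# Records with the same partition key always land on the same shard,
# which preserves per-key ordering.
```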
  48. 48. Amazon Kinesis Firehose Load massive volumes of streaming data into Amazon S3, Amazon Redshift, and Amazon ES • Zero administration: Capture and deliver streaming data into S3, Amazon Redshift, and Amazon ES without writing an application or managing infrastructure. • Direct-to-data store integration: Batch, compress, and encrypt streaming data for delivery into data destinations in as little as 60 secs using simple configurations. • Seamless elasticity: Seamlessly scales to match data throughput without intervention. Capture and submit streaming data to Firehose Analyze streaming data using your favorite BI tools Firehose loads streaming data continuously into S3, Amazon Redshift, and Amazon ES
  49. 49. Move Store Process Deliver Journey to insight
  50. 50. Amazon QuickSight Amazon Elasticsearch Service Amazon CloudFront Deliver
  51. 51. Amazon QuickSight Fast, easy-to-use, cloud-powered business intelligence for 1/10th the cost of traditional BI solutions Deliver
  52. 52. Big data challenges for our customers Lots of data Lots and lots of questions Few insights Who are my top customers and what are they buying? Which devices are showing time for maintenance? What is my product profitability by region? Why is my most profitable region not growing? How much inventory do I have? Has my fraud account expense increased? How is my marketing campaign performing? How is my employee satisfaction trending?
  53. 53. Traditional business intelligence costs too much and takes too long: pay $ millions before seeing the first analysis; 3-year TCO of $150 to $250 per user per month; 6 to 12 months of consulting and software implementation time.
  54. 54. [QuickSight architecture diagram: business users access SPICE through the QuickSight UI (mobile devices, web browsers) or, via the QuickSight API, through partner BI products. Connectors, data prep, metadata, and suggestions feed SPICE from Amazon S3, Amazon Kinesis, Amazon DynamoDB, Amazon EMR, Amazon Redshift, Amazon RDS, files, and apps, plus on-premises data over Direct Connect or JDBC/ODBC.]
  55. 55. Easy exploration of AWS data Securely discover and connect to AWS data Quickly explore AWS data sources • Relational databases • NoSQL databases • Amazon EMR, Amazon S3, files • Streaming data sources Easily import data from any table or file Automatic detection of data types Amazon EMR Amazon Kinesis Amazon Dynamo DB Amazon Redshift Amazon RDS Amazon S3 File Upload Third Party
  56. 56. Fast insights with SPICE • Super-fast, Parallel, In-memory, Calculation Engine • 2x to 4x compression of columnar data • Compiled queries with machine code generation • Rich calculations • SQL-like syntax • Very fast response time to queries • Fully managed: no hardware or software to license
  57. 57. Amazon Elasticsearch Service Deliver
  58. 58. Key benefits Easy cluster creation and configuration management Support for ELK Security with AWS IAM Monitoring with Amazon CloudWatch Auditing with AWS CloudTrail Integration options with other AWS services (CloudWatch Logs, Amazon DynamoDB, Amazon S3, Amazon Kinesis)
  59. 59. Kibana UI
  60. 60. Amazon CloudFront Content delivery network Deliver
  61. 61. An extensive global network. Europe (16): Amsterdam (2), Dublin, Frankfurt (3), London (3), Madrid, Marseille, Milan, Paris (2), Stockholm, Warsaw. South America (2): Rio de Janeiro, Sao Paulo. North America (21): Ashburn, VA (3), Atlanta, GA, Chicago, IL, Dallas, TX (2), Hayward, CA, Jacksonville, FL, Los Angeles, CA (2), Miami, FL, Newark, NJ, New York, NY (3), Palo Alto, CA, Seattle, WA, San Jose, CA, South Bend, IN, St. Louis, MO. Asia (14): Chennai, Hong Kong (2), Manila, Melbourne, Mumbai, Osaka, Singapore (2), Seoul (2), Sydney, Taipei, Tokyo (2).
  62. 62. Delivering customer experience: CloudFront serves dynamic content (*.php) from Elastic Load Balancing in front of Amazon EC2 or a custom origin, and static content (*.jpg) from Amazon S3 or a custom origin.
  63. 63. AWS WAF
  64. 64. CloudFront with AWS WAF: malicious traffic from hackers, bad bots, site scraping, SQL injection, XSS, and other attacks is blocked by WAF rules at Edge Locations, while legitimate traffic passes through to the origin (S3, ELB, and/or EC2, or a customer on-premises origin server and origin storage). Works with custom origins and with both static and dynamic content.
  65. 65. Thank you!

Editor's notes

  • Welcome. Thank you for attending this session. 
    My name is Joe Healy and I am a Consultant within the WWPS Professional Services team at AWS, based in Herndon. 
    In this session, I am going to provide an overview of some of the available AWS services that can help you leverage AWS as a data platform.
  • Before you can pursue this potentially lofty goal, there may be constraints in place that are preventing the start of your journey.

    These constraints can come in many forms:
    - Volume, variety, velocity of Data
    - No available capacity (bandwidth, people..)
    - Too Expensive – I have paid for my equipment, there isn’t anything else to pay for
    - Not secure enough for your requirements
    - Too complex. We operate fine the way we do things. We can't adapt to this cloud model. It isn't applicable to us.

    Hopefully during this session, I am able to address some of these concerns and show you how you can overcome the perceived barrier(s) to adoption.
  • Looking at this chart, you can see that there are many capabilities available from AWS which address many of your constraints and requirements.

    Areas such as:
    - Security and Compliance - The visibility into as well as the governance of the security controls that you have within AWS is staggering.  We have validated these capabilities by having our processes and procedures measured against many of the industry compliance standards.  You can read about these on our Security/Compliance website. FedRAMP, PCI, HIPAA are just a few examples
    - Infrastructure – Our infrastructure is available so that you can build and deploy your applications in the most cost effective, highly available and secure manner. We currently have 12 Public regions throughout the world (4 in the US), each comprised of at least 2 Availability Zones as well as 56 Edge Locations which enable you to satisfy your user experience requirements by bringing content closer to them among other things.
    - Support – Through our Support team, Solutions Architects, Professional Services, and Account Managers, you have a tremendous set of human resources to help you with your specific journey.
    - Partners – Our Partners are a force multiplier, providing in-depth assistance through consulting or managed services, or through individual solutions provided by ISVs as SaaS offerings or via our Marketplace.
    - Hybrid – Picking up and dropping your entire infrastructure into AWS isn't a realistic short-term goal for most companies; in reality, it may never be a long-term goal either. So you will have to integrate between one or more locations. We provide some excellent capabilities to make your AWS infrastructure a logical extension of your existing environments.
    - Services – Whether you plan to do a simple lift and shift to EC2 from your existing environment, or you are planning to move higher in the stack to lessen the administrative/operational burden, AWS is consistently evolving its portfolio of services to meet your requirements.

  • Given the breadth of services that are available and the limited time we have, I will narrow down the examples of services in this session.

    There are different goals for the different types of data that you have. There is a tremendous level of value and insight in your data, and there is a series of steps that data needs to pass through to reach its specific goal. In this session, we will categorize those steps into the following phases:

    Move – You have to get your data from Point A to Point B. As the quantity of data increases and the transfer timeframe requirements shrink, this introduces a complex issue.
    Store – Once at AWS, you need to choose a data platform that meets your data requirements from an availability, performance, and price perspective.
    Process – To extract and correlate the information within the data, you need to perform some type of processing.
    Deliver – The insight discovered in your data is worthless if you aren't able to make it accessible to your customer or community.

    This will not be an exhaustive session. I will try to focus on some of the newer capabilities/features that are available. Working with your AWS Account team or referencing our documentation will enable you to dive deep on all of the services we have.
  • So let's start with the Move phase of our journey.
  • Snowball
    The Import/Export service introduced this new capability last year. Prior to Snowball, if you wanted to use the Import/Export service you had to purchase your own external hard drives. The maximum device capacity supported was 16 TB, and depending on the interface type available on your device (eSATA, USB 3/2), the transfer speed to/from the device could vary wildly.

    To help streamline and standardize this process, the Snowball appliance was developed. It has been very well received.

    So what is it?
  • What is AWS Import/Export Snowball?

    Snowball is a new AWS Import/Export offering that provides a petabyte-scale data transfer service that uses Amazon-provided storage devices for transport.
    With the launch of Snowball customers are now able to use highly secure, rugged Amazon-owned Network Attached Storage (NAS) devices, called Snowballs, to ship their data.
    Once received and set up, customers are able to copy up to 80 TB of data from their on-premises file system to the Snowball using the Snowball client software over a 10 Gbps network interface.
    Prior to transfer to the Snowball, all data is encrypted with 256-bit GCM encryption by the client.
    When customers finish transferring data to the device they simply ship it back to an AWS facility where the data is ingested at high speed into Amazon S3.
  • Compare and contrast Internet vs 1x Snowball.
  • Compare and contrast Internet vs 5x Snowball.
  • From a security perspective the Snowball device itself is always treated as untrusted as it passes through multiple parties – AWS, the customer, and the shipper. For this reason all data is encrypted before it is ever written to the device, and the keys for encryption are only stored on the host performing the encryption, never the Snowball itself.

    Additionally, AWS supports a strong chain of custody through the entire process, providing notifications of each step in the process so you always know where your Snowball, and your data, are at all times.

    The device itself has been custom designed to be tamper resistant, leveraging custom hardware to make the device difficult to physically compromise, as well as tamper evident seals which are verified upon receipt.

    The device also leverages an industry-standard trusted platform module, providing independent verification of the device's firmware; the Snowball will not boot if it detects that the device has been compromised.
  • Perhaps the Snowball device doesn’t meet your specific requirements.

    Maybe you can’t facilitate bringing an external device into your network.

    Or, the timeframes or cycle of time you need to conduct your transfers may not meet your deadlines. There are logistics steps that are out of your control. It is a streamlined process, but you are talking about a physical device that needs to be received, configured, loaded with data, shipped back to AWS, and have its data copied off. Many of those steps you don't have control over.

    This next service, may help you address some of those restrictions, if they exist.

    Amazon S3 Transfer Acceleration is a new capability that was added this year which simplifies and potentially increases the speed that data is transferred directly to/from an S3 bucket without any third party utilities or software.

    If you currently use any WAN optimization products to make your point-to-point transfers to AWS more efficient, you will be interested in learning more about this service.
  • Leveraging the Internet for file transfers can be a frustrating task as much of the path to your destination is out of your control.
    You may have extremely high Internet bandwidth, but as your data travels through the public internet, you are susceptible to any weak link along the way.

    Solutions that exist to try to mitigate this problem are extremely complex. They require custom proprietary software installed on EVERY client initiating the transfer and, in many cases, special software installed where the data is being ingested as well.

    Finally, these typically require a large up-front fee and a minimum payment amount, and they are prohibitively expensive.
  • This is why we’re happy to introduce S3 Transfer Acceleration, a way to move data faster over long geographic distances. “Long distances” means across or between continents, not across town. It ensures that your data moves as fast as your first mile, and removes the vagaries of intermediate networks.

    S3-XA has shown typical performance benefits of up to 400% (5x) in optimal conditions that we’ve seen from internal testing and our beta customer results.
    S3-XA is extremely simple to use. As it is a feature of S3, you simply need to enable your bucket with a checkbox, and change your endpoint.
    To mitigate the long-path problem we described earlier, S3-XA leverages our 56 POP locations to ensure your transfers travel a shorter distance on the public Internet and then travel the remaining portion over an optimized route via the Amazon backbone.
    Since S3-XA is an extension of S3, it uses standard TCP and HTTP and thus does not require any firewall exceptions or custom software installation.
  • This is what the flow of a request transferred through S3-XA looks like:
    The client's request hits Route 53, which resolves the acceleration endpoint to the best POP latency-wise.

    From there, S3 Transfer Acceleration selects the fastest path and sends data over persistent HTTPS connections to an EC2 proxy fleet in the same AWS Region as the S3 bucket. We maximize the send and receive windows here to maximize customers' utilization of the available bandwidth.

    From here, the request is finally sent to S3.

    The service achieves acceleration thanks to:
    - Routing optimized to maximize routing on AMZN network
    - TCP optimizations along the path to maximize data transfer
    - Persistent connections to minimize connection setup and maximize connection reuse

  • See how much geography hurts?

    In general, the farther your bucket, the more benefit from moving over the AWS network.
  • Just 2 small steps. The setup is that simple.

    Behind the scenes, a CloudFront distribution and R53 Alias record is created for every bucket endpoint and the request is routed through an accelerated path
  • To determine if S3-XA is something that will benefit you and your customers, we developed an S3-XA Speed Checker to compare the likely transfer speed for a given endpoint.
    The tool compares the upload speed of S3 and S3-XA from the location where the tool is running to other S3 regions.
    Depending on where your S3 bucket lives, you can determine if S3-XA will give you the performance benefits you desire before turning the feature on.
  • Data Transfer In from Internet depends on the location from where the request originated.
    No request fees.
    Simple per GB pricing.

    Legal approved language on fast or free: For uploads only, Each time you use Amazon S3 Transfer Acceleration to transfer an object, we will check whether Amazon S3 Transfer Acceleration likely will be faster than a regular Amazon S3 transfer. To do this, we will use the origin location of the object transferred and the location of the Edge Location processing the accelerated transfer relative to the destination AWS Region. If we determine, in our sole discretion,  that Amazon S3 Transfer Acceleration likely was not faster than a regular Amazon S3 transfer of the same object to the same destination AWS Region, we will not charge for that use of Amazon S3 Transfer Acceleration.
  • Available directly in 1Gb or 10Gb port speeds.

    Through a partner, you can go down as low as 50Mbps. Some examples being any of the major Telcos which you may be working with already.  
    A partner can remove much of the administrative burden for managing your connectivity.
  • Worldwide locations
  • North America Direct Connect Locations

    4 AWS Regions
    10 Direct Connect Locations to leverage

    Each Direct Connect Location has a 1 to 1 relationship with a specific region for Private Connectivity (Private VIF to a VPC VGW)

    Each Direct Connect Location has a 1 to all relationship to the AWS Regions when using a Public Interface and Public Services (S3)

    The list is growing to provide more options and to bring the “last mile” distance down between your infrastructure and the chosen Region.
  • Now that we understand how to move data to AWS, let's discuss some options for optimally storing the data.
  • S3 and EBS are the two most common services to leverage for storage: S3 for object storage, and EBS for block-based storage via an EC2-mounted file system.

    I wanted to showcase some abilities and new(er) features in both of these services which may be of interest.
  • With each object that you store in S3, there are 4 available storage classes.
    1. Standard - 11 9's durability...
    2. RRS - 4 9's of durability
    3. SIA - 11 9's of durability, less available and duration/access taxes
    4. Glacier - 11 9's of durability, 3-5 hour SLA for object access retrieval (very cold)

    We will primarily focus Standard, SIA and Glacier
  • In looking at these three storage classes (Standard, SIA and Glacier) you can see their purpose with respect to the expected Hotness of the data stored within their respective class.

    S3 Standard is for your "Hottest" data that you need to have the protection Standard provides but also the direct and immediate accessibility of it.

    S3-IA – Keeps your data warm, just in case you need direct and immediate access to it. But you are going to pay a request fee per object.

    Glacier – This is your archive data which isn’t meant to be directly accessible. There is a 3-5 hour SLA for each object to be retrieved. This is Cold, Archive data

    Lifecycle policies can manage the change of storage class for your objects to meet your business rules for the type of data stored in a bucket.

  • As you move from Standard (hot) object classes down to Glacier (cold), your storage price at the object level decreases; however, accessibility decreases and the price for object retrieval increases.

    Prices vary between regions. These prices are representing our US-East-1 region
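A quick back-of-the-envelope sketch of how the trade-off plays out. The per-GB prices below are assumptions (roughly the 2016 us-east-1 list prices) used only to illustrate the arithmetic; always check current pricing.

```python
# Illustrative storage-cost arithmetic. Prices are assumptions, not
# authoritative -- check the current AWS pricing pages before deciding.
PRICE_PER_GB_MONTH = {"standard": 0.030, "standard_ia": 0.0125, "glacier": 0.007}
IA_RETRIEVAL_PER_GB = 0.01  # Standard-IA also charges per GB retrieved

def monthly_cost(storage_class, gb_stored, gb_retrieved=0.0):
    """Storage cost plus any retrieval fee for one month."""
    cost = PRICE_PER_GB_MONTH[storage_class] * gb_stored
    if storage_class == "standard_ia":
        cost += IA_RETRIEVAL_PER_GB * gb_retrieved
    return round(cost, 2)

# 1 TB of rarely read data: IA is cheaper even after retrieving 50 GB.
print(monthly_cost("standard", 1024))          # 30.72
print(monthly_cost("standard_ia", 1024, 50))   # 13.3
print(monthly_cost("glacier", 1024))           # 7.17
```

The takeaway: for data read only occasionally, the lower per-GB rate of SIA or Glacier dominates the retrieval fees; for hot data, the retrieval fees would quickly erase the savings.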
  • Recently, a new volume type was added to the Elastic Block Store service.

    These are the ST1 and SC1 volume types which are classified as being Throughput Optimized versus measuring in terms of IOPS like the other volumes.
  • The volumes work based on a burst credit model, similar to the T2 EC2 instance family, which has a CPU bursting model.

    Depending on the volume type (SC1, ST1), you will have a baseline level of throughput, and based on the volume type and size you will receive additional burst credits and a burst ceiling for each TB of volume size.

    These volumes are optimized for sequential workloads.

    Historically for these use cases, you would launch EC2 instances which have Ephemeral/Instance storage available to the EC2 instance. The quantity and type of storage varies from instance type to instance type (availability as well).

    If you needed a lot of local, ephemeral storage for your application, you were forced to choose a very large EC2 instance, even though you may not have needed all of the CPU/memory resources that came with it. You also had to build in data protection yourself, since these are temporary (ephemeral) disks: as soon as you shut down the server, everything is erased. And because you can't perform EBS snapshots against these volumes, you were potentially doing replication to a standby system for resiliency.

    Now that you can allocate and attach throughput-optimized EBS volumes, you can leverage EBS snapshot capability for your RPO/RTO objectives, and you can right-size the EC2 instance type to meet your CPU/memory and performance requirements.

  • As I mentioned before, these volumes are optimized for sequential I/O. The available credits are depleted based on the size of the I/O requests.

    If you issue many very small random I/O requests, each request will deplete a full 1 MB I/O credit.

    Sequential I/O requests are merged into the same 1 MB I/O credit, so you will deplete credits much more slowly.
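The scaling of throughput with volume size can be sketched as a small model. The per-TiB rates below reflect the documented ST1/SC1 figures as I understand them (baseline 40 MiB/s per TiB for ST1, 12 for SC1, with burst ceilings); verify them against the current EBS documentation before relying on them.

```python
# Sketch of how ST1/SC1 throughput scales with volume size. Per-TiB rates
# and per-volume caps are taken from the EBS docs as assumptions -- verify.
VOLUME_SPECS = {
    #        baseline MiB/s per TiB, burst MiB/s per TiB, per-volume caps
    "st1": {"base": 40, "burst": 250, "base_cap": 500, "burst_cap": 500},
    "sc1": {"base": 12, "burst": 80,  "base_cap": 192, "burst_cap": 250},
}

def throughput(volume_type, size_tib):
    """Return (baseline, burst) throughput in MiB/s for a volume."""
    spec = VOLUME_SPECS[volume_type]
    baseline = min(spec["base"] * size_tib, spec["base_cap"])
    burst = min(spec["burst"] * size_tib, spec["burst_cap"])
    return baseline, max(baseline, burst)

# A 2 TiB st1 volume: 80 MiB/s baseline, bursting to the 500 MiB/s ceiling.
print(throughput("st1", 2))    # (80, 500)
print(throughput("sc1", 1))    # (12, 80)
print(throughput("st1", 16))   # (500, 500) -- large volumes hit the cap
```

Note how a sufficiently large ST1 volume reaches the per-volume cap and effectively runs at its burst rate all the time.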
  • You can still provision up to a 16TB volume, but you have to start at 500GB.

    The price benefits can be substantial, depending on the I/O requirements.

  • Now we move into the Process Phase of the Journey.
  • There are many more services available that would meet the Process Phase of our Data.

    However, I wanted to spend a little time on a few.

    RDS is our Relational Database service where you can outsource the administration and operational tasks for your RDBMS to us and you can focus on the schema, data and the security of it.

    A newer capability we have is the Database Migration Service. I wanted to give some details about how this service can be used

    Kinesis is a great service for handling the ingestion and processing of streaming data. There are some new features that you may not be aware of.
  • Quick overview of what RDS is.

    This is a Deep Dive so there are some assumptions that some of the basics with RDS and the benefits are already understood.
    We are going to touch on many of these in more depth throughout the presentation.

    RDS is a managed database service. This service allows you more time to focus on your application: You focus on Schema Design, query construction, query optimization, and building your application.

    Infra Mgmt
    AWS does patching
    AWS Handles backup and replication
    AWS manages the Infrastructure and making sure that it is healthy
    You focus on your application
    HA and automated failover management
    High-end features that you could build on your own, but that you get automatically.

    Instant Provisioning
    Simple and Fast to deploy
    When you need to launch a new database or change your existing one you can at any point in time with no need to wait for infrastructure to be ordered or configured.

    Scale up/Down
    Simple and fast to scale. You can change your configuration to meet your needs when you want to.

    No Cost to get started
    Pay only for what you consume

    Application Compatibility
    * Six different engines to choose from
    * There are many popular applications, or even your own custom code, that you may be running on your own infrastructure, and they can still work on RDS. If you are using one of the engines that are currently supported, there is a good chance you can get it working on RDS.

    When you think about all that it takes to get new database infrastructure and an actual database up and running there are a lot of things that an expert DBA and infrastructure person would have to do. With RDS you are getting this with just a few clicks and are up and running in a manner of minutes.
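Those "few clicks" map to a single API call. Below is a hypothetical parameter set for launching a Multi-AZ MySQL instance; every specific value (identifier, instance class, storage size, credentials) is an illustrative assumption.

```python
# Hypothetical parameters for launching a Multi-AZ MySQL instance on RDS.
# All values are placeholders for illustration, not recommendations.
db_params = {
    "DBInstanceIdentifier": "example-db",
    "Engine": "mysql",
    "DBInstanceClass": "db.m4.large",
    "AllocatedStorage": 100,        # GB; RDS provisions and manages the disks
    "MasterUsername": "admin",
    "MasterUserPassword": "change-me",   # use a secrets store in practice
    "MultiAZ": True,                # standby instance in a second AZ
    "BackupRetentionPeriod": 7,     # days of automated backups
}

# With boto3 installed and credentials configured, this would launch it:
# import boto3
# rds = boto3.client("rds")
# rds.create_db_instance(**db_params)
```

Everything an expert DBA would otherwise set up by hand (replication, backups, patching windows) hangs off these few parameters.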
  • There are lots of different choices for your database engine on RDS. Each of these engines operate differently, offer different functionality, and have different licensing requirements.
    Everyone has their favorite engine and they use them for specific purposes.

    On the commercial side we have Oracle and Microsoft SQL Server
    On the open source side we have MySQL, PostgreSQL, and MariaDB
    And in its own category we have Amazon Aurora, which is a MySQL-compatible relational database built to take advantage of many of the properties that exist with modern cloud computing.

    MariaDB – A fork of the MySQL database, led by the original developers of MySQL after concerns that the project might become closed following its acquisition by Oracle. It works to maintain high compatibility with MySQL, and also has features to support non-blocking operations and progress reporting.
  • RDS is a managed service so in some cases you cannot do everything like you might do with a database running on EC2 or in your own data center. AWS is doing some of the administration so there are some tradeoffs.
    It is important for you to understand some of the limitations that exist within RDS as you look to use it.

    The RDS service fully manages the host, operating system, and database version that you are running on. This takes a lot of burden off your hands but you also get no access to the database host operating system, limited ability to modify configuration that is normally managed on the host operating system, and generally no access to functions that rely on configuration from the host operating system. If one of the reasons that you primarily access the host operating system is for metrics we have made some improvements in that space in order to help you along and we will talk about those later on.

    All of your storage on RDS is also managed. Once again this takes a lot of burden off of you from an administrative standpoint, but it also means there are some limits. You can't just order more or larger disks and have them swapped in, and you cannot connect your database to a different backend SAN. There are storage limits of 4 TB with SQL Server; 6 TB with MySQL, MariaDB, PostgreSQL, and Oracle; and 64 TB with Aurora. If you choose to grow the size of your database, you have to tell the RDS service that you want more storage so that it can provision it. If you have hit the max, you will have to decide whether to shard across multiple RDS instances, purge some of your current data, or archive old data to another environment.

    There are gaps between what you can do with a self-managed database and what you can do with RDS, but the gap is narrowing as we roll out new functionality.
  • For a more robust database architecture you are going to want to look at having a Multi Availability Zone configuration.

    With a Multi-Availability Zone configuration, you choose which Availability Zone you want your primary database instance to be in. The RDS service will then place a standby instance and storage in another Availability Zone of the AWS Region you are operating in. The standby instance will be of the same type as your primary, and its storage will have the same configuration and size.

    The RDS service will then take responsibility for ensuring that your primary is healthy and that your standby is in a state that you can recover to. Your data on the primary database is synchronously replicated to the storage in the standby configuration. This standby is only there to handle failover from your primary; it is not something that you can log in to or access while the primary is up and working.

    The failover conditions that this configuration handles are:
    Loss of availability in primary AZ
    Loss of network connectivity to primary
    Compute unit failure on primary
    Storage failure on primary

    SQL Server uses mirroring to support this functionality
  • Move data to the same or different database engine
    ~ Supports Oracle, Microsoft SQL Server, MySQL, PostgreSQL, MariaDB, Amazon Aurora, Amazon Redshift (soon)

    Keep your apps running during the migration
    ~ DMS minimizes impact to users by capturing and applying data changes

    Start your first migration in 10 minutes or less
    ~ The AWS Database Migration Service takes care of infrastructure provisioning and allows you to set up your first database migration task in less than 10 minutes

    Replicate within, to or from AWS EC2 or RDS
    ~ After migrating your database, use the AWS Database Migration Service to replicate data into your Redshift data warehouses, cross-region to other RDS instances, or back to on-premises

    The Schema Conversion Tool is available for your more complicated heterogeneous platform migrations – Oracle DB -> MySQL/Aurora/PostgreSQL for example
    - It will analyze objects such as database views, stored procedures and functions and convert that logic over to the target database.
    - Anything not converted is clearly marked for review

  • Amazon Kinesis – Service for Data streaming
  • Easy to use: Focus on quickly launching data streaming applications instead of managing infrastructure.
    Real-Time: Collect real-time data streams and promptly respond to key business events and operational triggers.
    Flexible: Choose the service, or combination of services, for your specific data streaming use cases.
  • Three different capabilities available (or will be soon) within Kinesis

    Kinesis Streams – For your near real time data streaming and associated processing application. Highly customizable and scalable.
    Kinesis Firehose – Simplified data ingestion endpoint for consolidating and loading of data into other AWS Services such as S3.
    Kinesis Analytics – Perform inline analysis on your data streams using standard SQL queries. In preview at the moment.
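One detail worth understanding with Kinesis Streams is how records are distributed across shards: the MD5 hash of each record's partition key is mapped into a 128-bit key space that is split among the shards. The sketch below simplifies this (real shards carry explicit hash-key ranges), but it captures the routing behavior.

```python
# Simplified model of Kinesis Streams shard routing: the MD5 hash of the
# partition key selects a shard. Real shards have explicit hash-key ranges;
# this assumes an even split of the 128-bit space for illustration.
import hashlib

def shard_for(partition_key, num_shards):
    """Map a partition key to a shard index in [0, num_shards)."""
    hash_value = int(hashlib.md5(partition_key.encode()).hexdigest(), 16)
    return hash_value * num_shards // (2 ** 128)

# Records with the same partition key always land on the same shard,
# which is what preserves per-key ordering.
print(shard_for("sensor-42", 4) == shard_for("sensor-42", 4))  # True
```

This is why choosing a partition key with enough cardinality matters: a single hot key concentrates all of its traffic on one shard.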
  • There are lots of ways to process streaming data, just like there are lots of ways to process batch data with Hadoop
    Open source from AWS

    Ingest data from many data sources
    Load data into one or more data targets from a single Stream for specific analytic or retention requirements

  • A Kinesis Firehose stream can load data directly to S3, Redshift and Amazon Elasticsearch. One or all three can be leveraged by the same stream to meet your specific requirements.

    The service manages the scalability of the underlying resources to meet your streaming data requirements.
  • Now to the final Phase which is Deliver.
  • These services will help in the delivery of Insights and analytics gathered from the data.
  • Amazon Quicksight
  • Questions that organizations have for their data
  • Traditional Business Intelligence tools are very expensive and complicated to implement.
  • Quicksight makes it easy for all employees to build visualizations, perform ad-hoc analysis, and quickly get business insights from their data.

    Quicksight integrates automatically with other AWS Data services such as RDS, RedShift, S3 or even flat files.

    Super-fast, Parallel, In-memory Calculation Engine (SPICE) – enabling users to run interactive queries against complex datasets and get rapid responses

  • Relational databases (Amazon RDS, Amazon Aurora, Amazon Redshift)
    NoSQL databases (Amazon DynamoDB)
    Amazon EMR, Amazon S3, files (CSV, Excel, TSV, XLF, CLF)
    Streaming data sources (Amazon DynamoDB, Amazon Kinesis)
  • If you are using an ELK stack presently (Elasticsearch, Logstash, Kibana) then you know the complexity involved with managing this platform.

    The new Amazon Elasticsearch Service removes the administrative/operational burden allowing you to focus on the indices and visualizations.
  • Amazon ES offers several features which we will go through in more detail shortly but some quick highlights.
    There are several options with the console, SDK, or CLI to easily set up the cluster with an optimized configuration to match your application needs.
    The service exposes the underlying Elasticsearch API so you can easily migrate existing workloads. It comes with built-in Kibana, and we have released a Logstash output plugin that makes it easy for you to connect your Logstash instances to your domains running in Amazon ES.
    You have several options to secure your cluster using AWS IAM. We will walk through this in more detail.
    The service also comes with several integrations with other AWS services, like CloudWatch Logs, DynamoDB, etc., to make the experience of connecting all these services a lot easier for you.
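As a concrete illustration of the IAM point, here is a sketch of a resource-based access policy that restricts an Amazon ES domain to a single source IP. The account ID, domain name, and IP address are all placeholder assumptions.

```python
# Illustrative IAM resource policy for an Amazon ES domain: allow all ES
# actions, but only from one source IP. Account ID, domain name, and IP
# are placeholders, not real values.
import json

access_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": "*"},
            "Action": "es:*",
            "Resource": "arn:aws:es:us-east-1:111122223333:domain/example-domain/*",
            "Condition": {"IpAddress": {"aws:SourceIp": "203.0.113.10"}},
        }
    ],
}

# The JSON form is what you would paste into the domain's access policy.
print(json.dumps(access_policy, indent=2))
```

IP-based policies like this are the simplest option; signed requests with IAM users or roles give finer-grained control.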
  • Here is a sample dashboard using Kibana 4 running on an Amazon ES domain. This shows VPC flow logs data being visualized in a Kibana dashboard.
  • This list has changed. We now have 56 Edge locations located around the world.

    This is on top of the 12 Regions.

    If you are able to bring content closer to your users, the experience will be better.
  • So no matter if you are using a custom origin or AWS, and no matter the content type, CloudFront will work with you to improve your users’ experience.

    User to CloudFront
    Routing based on lowest latency
    SSL termination close to viewers
    CloudFront to Origin
    TCP optimizations
    Keep-alive connections
    Network paths monitoring
    HTTP verb optimization (GET, PUT, etc.)

  • Let's talk about why we built WAF, based on customer feedback.
    Initially WAF will be a CloudFront (CDN) offering, but it will be extended shortly after launch to include ELB