by Nikita Patil, Associate Solutions Architect AWS
AWS Data & Analytics Week is an opportunity to learn about Amazon’s family of managed analytics services. These services provide easy, scalable, reliable, and cost-effective ways to manage your data in the cloud. We explain the fundamentals and take a technical deep dive into Amazon Redshift data warehouse; Data Lake services including Amazon EMR, Amazon Athena, and Amazon Redshift Spectrum; Log Analytics with Amazon Elasticsearch Service; and data preparation and placement services with AWS Glue and Amazon Kinesis. You will learn how to get started, how to support applications, and how to scale.
Amazon Elasticsearch Service
Hands-on Lab: Log Analytics
© 2017 Amazon Web Services, Inc. and its affiliates. All rights reserved.
This lab demonstrates the basic steps required to get started with Amazon Elasticsearch Service: creating
clusters, configuring cluster nodes and storage, and setting up AWS Identity and Access Management (IAM) policies.
By the end of this lab you will be able to:
• Deploy an Amazon Elasticsearch Service domain
• Create an Amazon CloudWatch Log Group for AWS CloudTrail logs
• Subscribe an Amazon CloudWatch Log Group to Amazon Elasticsearch Service
• Monitor Amazon Elasticsearch Cluster Metrics
• Some familiarity with IAM Roles and EC2 Instances is recommended. Previous knowledge of Kibana and
Elasticsearch is desirable.
• Have installed node.js and aws-es-proxy (available via npm).
• Have correctly installed and configured the AWS command line interface. See the documentation for
Amazon Elasticsearch Service
Amazon Elasticsearch Service introduction
Amazon Elasticsearch Service is a managed service that makes it easy to deploy, operate, and scale
Elasticsearch in the AWS cloud. Elasticsearch is a popular open-source search and analytics engine for use
cases such as log analytics, real-time application monitoring, clickstream analytics, and text search.
With Amazon Elasticsearch Service, you get direct access to the Elasticsearch open-source API so that existing
code and applications will work seamlessly. You can set up and configure your Amazon Elasticsearch cluster in
minutes from the AWS Management Console.
Amazon Elasticsearch Service provisions all the resources for your cluster and launches it. Amazon
Elasticsearch Service automatically detects and replaces failed Amazon Elasticsearch nodes, reducing the
overhead associated with self-managed infrastructures. You can deploy an Amazon Elasticsearch cluster in
minutes using the AWS Management Console. There are no upfront costs to set up Amazon Elasticsearch
clusters, and you pay only for the service resources that you use.
Amazon Elasticsearch Service offers the following benefits of a managed service:
• Simple cluster scaling via API
• Self-healing clusters
• High availability on-demand
• Automatic cluster snapshots for data durability
• Cluster monitoring
Components of Amazon Elasticsearch Service
Amazon Elasticsearch Service contains the following components:
Domain: An Amazon Elasticsearch domain comprises an Elasticsearch cluster – hardware and software – along
with additional hardware and software providing load-balancing, security, and monitoring. The domain is
exposed by service endpoints for Amazon Elasticsearch Service, with a name that must meet the following criteria:
• Uniquely identifies a domain within an AWS account
• Starts with a letter or number
• Contains at least three characters, but not more than 28 characters
• Contains only lowercase characters a-z, the numbers 0-9, and the hyphen (-)
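The length and character rules above can be checked locally before you open the console. The following is a minimal sketch, not part of the original lab: the function name is_valid_domain_name is hypothetical, and the check covers only the rules listed here (uniqueness within an account can only be confirmed by the service itself).

```python
import re

# Pattern encoding the naming rules listed above: the first character is a
# lowercase letter or digit, followed by 2-27 more characters drawn from
# a-z, 0-9, or the hyphen, for a total length of 3 to 28 characters.
_DOMAIN_NAME_RE = re.compile(r"[a-z0-9][a-z0-9-]{2,27}")


def is_valid_domain_name(name: str) -> bool:
    """Return True if `name` satisfies the length and character rules above."""
    return _DOMAIN_NAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("mytestdomain", "ab", "MyTestDomain", "-logs"):
        print(candidate, is_valid_domain_name(candidate))
```

The name used later in this lab, mytestdomain, passes this check.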
Cluster: A cluster is a collection of one or more data nodes, optional dedicated master nodes, and storage
required to run Elasticsearch.
Node: A node is a single instance within an Elasticsearch cluster that can recognize and process messages or
forward them to other nodes.
Storage: Amazon Elasticsearch Service supports two storage types: instance storage (the default) and
Elastic Block Store (EBS) volumes, which can be General Purpose (SSD), Provisioned IOPS (SSD), or Magnetic.
Amazon Elasticsearch Service is commonly used with the following services:
AWS CloudTrail
Use AWS CloudTrail to get a history of the Amazon Elasticsearch Service API calls and related events for your
account. CloudTrail is a web service that records API calls from your accounts and delivers the resulting log files
to your Amazon S3 bucket. You can also use CloudTrail to track changes that were made to your AWS
resources.
Amazon CloudWatch
An Elasticsearch domain automatically sends metrics to Amazon CloudWatch so that you can gather and
analyze performance statistics. You can monitor these metrics by using the AWS CLI or the AWS SDKs. You
can also subscribe a CloudWatch Logs log group to an Amazon Elasticsearch Service domain to load the
data in that log group into Amazon ES.
Amazon Kinesis
Amazon Kinesis is a managed service that scales elastically for real-time processing of streaming data at a
massive scale. Amazon Elasticsearch Service provides Lambda sample code for integration with Kinesis.
Amazon Simple Storage Service (Amazon S3)
Amazon Simple Storage Service (Amazon S3) is storage for the Internet. You can use Amazon S3 to store and
retrieve any amount of data at any time, from anywhere on the web. Amazon Elasticsearch Service provides
Lambda sample code for integration with S3.
AWS Identity and Access Management (IAM)
AWS Identity and Access Management (IAM) is a web service that you can use to manage users and user
permissions in AWS. Use IAM to create user-based or IP-based access policies for your Amazon Elasticsearch
Service domains.
Amazon Elasticsearch Service Integration with Other Services
Amazon Elasticsearch Service integrates with the following services to provide data ingestion:
AWS Lambda
AWS Lambda is a zero-administration compute platform for back-end web developers that runs your code in the
AWS cloud and provides you with a fine-grained pricing structure. Amazon Elasticsearch Service provides
sample code to run on Lambda that integrates with Amazon Kinesis and Amazon S3.
Amazon DynamoDB
Amazon DynamoDB is a fully managed NoSQL database service that provides fast and predictable performance
with seamless scalability. Amazon Elasticsearch Service provides a Logstash plugin to support DynamoDB
streams and sign AWS service requests.
Create and Test an Amazon Elasticsearch Domain
Deploy an Amazon Elasticsearch Service Domain
1. In the AWS Management Console, click Elasticsearch Service.
2. Click Get Started or Create a new domain depending on whether you have an existing domain.
3. In Step 1: Elasticsearch domain name, type mytestdomain into the box, and click Next.
4. On the Step 2: Configure cluster page, in the Node configuration section, select the following information
from the drop-down lists:
a. In the Instance count text box, leave 1 (the default).
This is the number of instances that you wish to deploy with your cluster.
b. In the Instance type section, select t2.small.elasticsearch (free tier eligible).
When you launch an instance, the instance type that you specify determines the hardware of the host
computer used for your instance. Each instance type offers different compute, memory, and storage
capabilities.
c. Do not select Enable dedicated master.
A dedicated master node is a cluster node that performs cluster management tasks, but does not hold
data or respond to data upload requests. This offloading of cluster management tasks increases the
stability of your Amazon Elasticsearch clusters. We recommend that you avoid allocating dedicated
master nodes for all small and short-lived Amazon Elasticsearch domains.
d. Do not select Enable zone awareness.
If you enable zone awareness, you should use the Elasticsearch API to set up replicas for your
cluster. Amazon Elasticsearch Service will distribute replicas across the nodes in Availability Zones.
This will increase the availability of your cluster.
5. In the Storage Configuration section, select the following information from the drop-down lists:
a. Storage type: EBS
Amazon Elasticsearch Service provides two storage options: instance store and Elastic Block Store (EBS).
b. EBS volume type: General Purpose (SSD)
General Purpose (SSD) storage is suitable for a wide variety of database workloads that have
moderate I/O requirements. The baseline of 3 IOPS per GB and the ability to burst up to 3,000 IOPS
will provide you with predictable performance well-suited to many applications.
c. EBS volume size: 10 GB
Amazon EBS provides durable, block-level storage volumes that you can attach to a running instance.
You can use Amazon EBS as a primary storage device for data that requires frequent and granular
updates.
6. In the Snapshot configuration section, select the following from the drop-down lists:
a. Automated snapshots start hour: 00:00 UTC (default)
7. In the Advanced options section, leave the values at the default settings.
8. Click Next.
9. On the Step 3: Set up access page, select the following:
a. Select the radio button for Public access.
Public access domains have public IPs for their endpoints, resolved through DNS.
b. Under Access policy, select Allow or deny access to one or more AWS accounts or IAM users
from the drop-down menu.
c. In the resulting dialog:
i. Set the “Account ID or ARN” to: Your AWS account ID
ii. Click OK.
10. Click Next.
11. Review the system configuration, and click Confirm.
Note: The service can take ten minutes to deploy. While waiting for the service to deploy, you can complete
the steps in the next section.
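For reference, the console choices made in steps 1 to 11 can also be written down as a request payload. The sketch below only builds and prints a Python dictionary; it makes no AWS calls. The key names follow the parameters of the boto3 Amazon ES client's create_elasticsearch_domain operation, while the account ID, the Region, and the exact shape of the access policy are assumptions for illustration, not values taken from the lab.

```python
import json

# Placeholder values (assumptions, not from the lab): substitute your own.
ACCOUNT_ID = "123456789012"
REGION = "us-east-1"
DOMAIN_NAME = "mytestdomain"


def build_access_policy(account_id: str, region: str, domain_name: str) -> dict:
    """Resource-based policy granting the given account access to the domain.

    This approximates the "allow access to one or more AWS accounts" template;
    the policy that the console generates may differ in detail.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
                "Action": "es:*",
                "Resource": f"arn:aws:es:{region}:{account_id}:domain/{domain_name}/*",
            }
        ],
    }


def build_domain_request(account_id: str = ACCOUNT_ID,
                         region: str = REGION,
                         domain_name: str = DOMAIN_NAME) -> dict:
    """Collect the settings chosen in steps 1 to 11 into one request payload."""
    policy = build_access_policy(account_id, region, domain_name)
    return {
        "DomainName": domain_name,
        "ElasticsearchClusterConfig": {
            "InstanceType": "t2.small.elasticsearch",  # step 4b
            "InstanceCount": 1,                        # step 4a
            "DedicatedMasterEnabled": False,           # step 4c
            "ZoneAwarenessEnabled": False,             # step 4d
        },
        "EBSOptions": {
            "EBSEnabled": True,      # step 5a
            "VolumeType": "gp2",     # General Purpose (SSD), step 5b
            "VolumeSize": 10,        # step 5c
        },
        "SnapshotOptions": {"AutomatedSnapshotStartHour": 0},  # step 6a
        "AccessPolicies": json.dumps(policy),  # the API expects a JSON string
    }


if __name__ == "__main__":
    print(json.dumps(build_domain_request(), indent=2))
```

If you have boto3 installed and configured, a dictionary of this shape could be passed as keyword arguments to the client's create_elasticsearch_domain call, but the console steps above are all this lab requires.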
Create a CloudTrail Log Group for Amazon Elasticsearch Service
12. Click Services.
13. Under Management Tools, right-click CloudTrail, and click Open link in new tab.
Note: This allows you to keep the Elasticsearch dashboard open so you can monitor the cluster creation
14. Hide the Services drop-down menu by clicking the icon below.
15. Switch to the new tab, and click Trails in the left navigation pane.
16. Click Create trail.
17. Name the trail mytestdomain-trail.
18. Leave Apply trail to all regions set to Yes.
19. Leave Management events set to All.
20. Skip the Data events section.
21. Under Storage location, type mytestdomain-s3 as the S3 bucket.
22. Skip the Advanced section.
23. Click Create.
24. You will be returned to the CloudTrail dashboard. Click mytestdomain-trail to open the configuration page.
25. Scroll down to the CloudWatch Logs section.
26. Under CloudWatch Logs, click Configure.
Note: If the Configure button is not visible, click the pencil icon to open the log creation wizard.
27. In the New or existing log group* text box, delete the existing text, and enter CloudTrail/mytestdomain-trail.
28. Click Continue. This will bring you to the IAM console, to enable CloudTrail to write to CloudWatch Logs.
29. Click Allow.
Note: Wait until the console returns to the CloudTrail Configuration page. To generate more CloudTrail
log data, view some of the different AWS Services, such as Amazon VPC, Amazon EC2, or any of the other
services in the AWS Management Console. While CloudTrail generates these logs, they will be sent to your
Create an IAM role for Lambda to write to Amazon ES
This is a workaround for a current bug in the console.
30. Navigate to the IAM console.
31. Click the Policies tab in the left navigation pane.
32. Click Create policy.
33. Click the JSON tab.
34. Paste the following
35. Click Review policy.
36. Name the policy PostToAllAmazonESDomains.
37. Click Create policy.
38. Select the Roles tab in the left navigation pane.
39. Click Create role.
40. Leave AWS service selected under Select type of trusted entity.
41. Select Lambda as the service that will use this role.
42. Click Next: Permissions.
43. Type PostToAllAmazonESDomains in the search box.
44. Click the check mark next to the policy in the search results.
45. Click Next: Review.
46. Set the Role name to PostToAllAmazonESDomains.
47. Click Create role.
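The policy document pasted in step 34 is not reproduced in this copy of the lab. As a rough illustration only, the sketch below builds one plausible permissions policy, inferred from the policy name PostToAllAmazonESDomains, together with the trust policy that lets the Lambda service assume the role. The action list and the wildcard resource are assumptions, not the lab's actual policy; use the policy supplied with the lab if you have it.

```python
import json


def build_es_post_policy() -> dict:
    """Permissions policy: allow HTTP POST requests to any Amazon ES domain.

    Assumption: es:ESHttpPost on all domains is inferred from the policy
    name used in this lab; the original document may grant different actions.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["es:ESHttpPost"],
                "Resource": "arn:aws:es:*:*:*",
            }
        ],
    }


def build_lambda_trust_policy() -> dict:
    """Trust policy: lets the Lambda service assume the role (steps 40-41)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "lambda.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_es_post_policy(), indent=2))
    print(json.dumps(build_lambda_trust_policy(), indent=2))
```

Selecting Lambda as the trusted service in step 41 makes the console attach a trust policy of this kind for you, so only the permissions policy has to be pasted by hand.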
Subscribe a CloudWatch Log Group to Amazon Elasticsearch Service
48. Switch back to the Elasticsearch dashboard browser tab and wait until the mytestdomain cluster Domain
status changes to Active.
Note: Here is a link to some additional reading while waiting for the cluster to go Active:
49. Enter the CloudWatch dashboard by clicking Services.
50. Under Management Tools, right-click CloudWatch, and click Open link in new tab.
51. Switch to the CloudWatch Logs tab in your browser and click Logs in the left navigation pane.
52. On the Log Groups page, select the check box for the log group you just created, CloudTrail/mytestdomain-trail.
53. Click the Actions drop-down list, and select Stream to Amazon Elasticsearch Service.
54. In the Amazon ES Cluster* drop-down list, select the Amazon Elasticsearch Cluster that was created
55. In the Lambda IAM Execution Role drop-down list, select Create new IAM role, and click Allow.
[Workaround: choose the role you created above]
56. Click Next.
57. In the Log Format drop-down list, select AWS CloudTrail.
58. Ignore the Subscription Filter Pattern section. This is not required for this exercise.
59. In the Select Log Data to Test section, click Test Pattern. You should see 50 matches under Results.
60. Click Next.
61. Review and click Next again.
62. Click Start Streaming.
63. A Success page should now be displayed informing you of the service activation.
64. Click Close.
65. Close the current browser tab.
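Behind the Start Streaming button, CloudWatch Logs invokes a Lambda function with batches of log events, and that function forwards them to the domain. The sketch below is not the function the console creates; it only illustrates the two data transformations involved, assuming the standard CloudWatch Logs subscription payload (base64-encoded, gzip-compressed JSON under awslogs.data) and the Elasticsearch bulk API's newline-delimited format. The function names, the document field names, and the index name passed in are hypothetical.

```python
import base64
import gzip
import json
from datetime import datetime, timezone


def decode_awslogs_event(event: dict) -> dict:
    """Decode the payload CloudWatch Logs delivers to a subscribed Lambda."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(compressed))


def to_bulk_body(payload: dict, index_name: str) -> str:
    """Turn decoded log events into a newline-delimited bulk request body.

    Each log event becomes two lines: an "index" action and the document.
    Older Elasticsearch versions may also require a "_type" in the action.
    """
    lines = []
    for log_event in payload["logEvents"]:
        action = {"index": {"_index": index_name, "_id": log_event["id"]}}
        # CloudWatch Logs timestamps are milliseconds since the Unix epoch.
        timestamp = datetime.fromtimestamp(
            log_event["timestamp"] / 1000, tz=timezone.utc
        ).isoformat()
        document = {
            "@timestamp": timestamp,
            "@message": log_event["message"],
            "@log_group": payload["logGroup"],
            "@log_stream": payload["logStream"],
        }
        lines.append(json.dumps(action))
        lines.append(json.dumps(document))
    # The bulk API requires a trailing newline after the last line.
    return "\n".join(lines) + "\n" if lines else ""
```

For CloudTrail logs, the message of each event is itself a JSON document, which is why fields such as eventName and eventSource appear individually in Kibana later in this lab.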
Use Kibana to visualize your CloudTrail logs
66. Return to the Elasticsearch Service browser tab you initially opened.
67. Return to the Elasticsearch Service dashboard.
68. In the left navigation pane, click mytestdomain.
69. Copy your domain’s Endpoint.
70. Run aws-es-proxy on your laptop, using the endpoint from the last step.
71. In your browser, navigate to http://localhost:9200/_plugin/kibana
72. In the top right, click Set up index patterns.
73. In the Index Pattern text box, type the index name or copy-paste from below (it has the format: cwl-
74. Click Next Step.
75. Drop down the Time Filter field name, and select @timestamp.
76. Click Create Index Pattern.
77. Click the Discover tab to view the timestamped log events.
Note: By default, Kibana shows logs for the last 15 minutes. You can change the interval by clicking the
small clock symbol in the top right corner.
These are some of the log attributes that are displayed in the Kibana dashboard:
• PrincipalId - A unique identifier for the entity that made the call. For requests made with temporary
security credentials, this value includes the session name that is passed to the AssumeRole,
AssumeRoleWithWebIdentity, or GetFederationToken API call.
• AccountId - The account that owns the entity that granted permissions for the request. If the request
was made using temporary security credentials, this is the account that owns the IAM user or role
that was used to obtain credentials.
• AccessKeyId - The access key ID that was used to sign the request. If the request was made using
temporary security credentials, this is the access key ID of the temporary credentials.
• SessionContext - If the request was made with temporary security credentials, the SessionContext
is an element that provides information about the session that was created for those credentials.
Sessions are created when any API is called that returns temporary credentials. Sessions are also
created when users work in the console and when users make a request using APIs that include
78. Click the disclosure triangle next to one of the search results.
79. Scroll down to reveal the fields and values that CloudTrail sends.
80. Click to the right of the awsRegion, eventName, eventSource, eventType, and userIdentity.arn
81. Scroll to the top of the list and you will see column headers for the fields you selected, along with the first
row of values. Click the disclosure triangle at the left edge of the first row of values to collapse it and see the
data in tabular form.
82. You can filter out logs (rows) that you don’t want to see. Click the disclosure triangle to open
one of the table’s rows. Click the filter-out (minus) icon next to the value of the eventName field. This will
filter out all rows in the table that have the same value.
83. Kibana shows you your current filters at the top of the page, right below the search bar.
84. If you hover over the filter, you will see additional controls, including a trash basket you can use to remove it.
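A filter of the kind applied in step 82, combined with the default 15-minute time window, corresponds roughly to an Elasticsearch bool query. The sketch below builds such a query body as a Python dictionary. The function name and the example value DescribeInstances are hypothetical, and the query Kibana actually sends may differ in detail.

```python
import json


def recent_events_excluding(field: str, value: str, window: str = "15m") -> dict:
    """Query body: events from the last `window`, excluding one field value."""
    return {
        "query": {
            "bool": {
                # Restrict results to the recent time window (date math).
                "filter": [{"range": {"@timestamp": {"gte": f"now-{window}"}}}],
                # Drop every document whose field matches the given value.
                "must_not": [{"match_phrase": {field: value}}],
            }
        }
    }


if __name__ == "__main__":
    # Hypothetical example value; use any eventName that appears in your logs.
    print(json.dumps(recent_events_excluding("eventName", "DescribeInstances"), indent=2))
```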
Create a visualization
You create visualizations in Kibana to analyze your data and reveal patterns. You can save your
visualizations and build them into dashboards that update in near real time.
85. Click the Visualize tab in the left navigation pane.
86. Click Create a visualization.
87. Select the Pie visualization.
88. Under From a New Search, Select Index, you’ll see your index name. Click it.
89. Click Split Slices.
90. Under Aggregation, select Terms.
91. Under Field, select eventSource.keyword (you may have to scroll down to the Strings section of the list).
92. Click the control to reveal a pie chart of different services
93. Click Add sub-buckets.
94. Click Split Slices.
95. From the Sub Aggregation menu, select Terms.
96. From the Field menu, select userIdentity.arn.keyword.
97. This shows you which user has called the different services. (Some calls may not include a user ARN.)
98. At the top of the screen, click Save.
99. Name your visualization Services and identities pie.
100. Click Save.
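The pie chart built in this section corresponds to a nested terms aggregation: one bucket per eventSource, split again by userIdentity.arn. The sketch below builds an equivalent request body as a Python dictionary. The aggregation names (services, identities) and the bucket limit are illustrative choices, not values Kibana is known to use.

```python
import json


def services_by_identity_query(max_buckets: int = 5) -> dict:
    """Request body: top services, each split by the calling identity."""
    return {
        "size": 0,  # return only aggregation results, no individual hits
        "aggs": {
            "services": {
                "terms": {"field": "eventSource.keyword", "size": max_buckets},
                "aggs": {
                    "identities": {
                        "terms": {
                            "field": "userIdentity.arn.keyword",
                            "size": max_buckets,
                        }
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(services_by_identity_query(), indent=2))
```

Events without a user ARN fall outside the inner buckets, which matches the note in step 97.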
Create a Time-Based Visualization
101. Click the Visualize tab again, then click the button to create a new visualization.
102. Choose a Line visualization.
103. Click your index name.
104. Under Buckets, click X-Axis.
105. Select Date Histogram from the Aggregation menu. This reveals a Count of all events on the Y axis.
106. You can Add sub-buckets, for example with a Terms aggregation again, to get a graph of the occurrences of
different field values.
107. Instead, click the disclosure triangle next to Y-Axis.
108. Under Aggregation, select Unique Count. You could also create sums, mins, maxes, etc. These are
useful for monitoring quantities such as CPU and data flow.
109. Select eventName.keyword from the Field drop down.
110. This yields a graph of the count of different calls that you are making to various AWS services. Click
Save, and save your visualization as Unique calls.
Continue to experiment with different visualizations to see what you can discover about your account and the
resources that you are using.
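The Unique calls line chart combines a date histogram on @timestamp with a unique count (a cardinality aggregation) on eventName.keyword. A minimal sketch of an equivalent request body follows. The aggregation names and the one-hour bucket are illustrative, and the interval parameter reflects Elasticsearch versions current when this lab was written; newer versions use fixed_interval or calendar_interval instead.

```python
import json


def unique_calls_over_time_query(interval: str = "1h") -> dict:
    """Request body: number of distinct event names per time bucket."""
    return {
        "size": 0,  # aggregation results only
        "aggs": {
            "over_time": {
                # X axis: one bucket per interval of the event timestamp.
                "date_histogram": {"field": "@timestamp", "interval": interval},
                "aggs": {
                    # Y axis: approximate count of distinct API call names.
                    "unique_calls": {
                        "cardinality": {"field": "eventName.keyword"}
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(unique_calls_over_time_query(), indent=2))
```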
Create a Near Real Time Dashboard
111. Click the Dashboard tab.
112. Click Create a dashboard.
113. Click the Add button.
114. Click the Services and identities pie.
115. Click Unique calls.
116. Click Add again at the top of the screen to collapse the visualizations list.
117. Click Auto refresh at the top of the screen. This drops down a list of choices for the frequency of refresh
118. Click 10 Seconds. Kibana refreshes all data in all tabs every 10 seconds. You can experiment with the
AWS console and see the results show up in Kibana.
Monitoring Amazon Elasticsearch Cluster Metrics and Statistics
119. Return to the Elasticsearch Service dashboard.
120. Click mytestdomain.
121. Click the Indices tab.
This shows the indices and documents that have been loaded into your Amazon Elasticsearch Service domain. Click
the drop-down arrows to display further data: Count, Size in bytes, Query total, and Mappings.
122. Click the Monitoring tab.
This displays statistics for the cluster, including CPU Utilization, Read Latency, Write Latency, and other
core system resources.
123. Scroll down, and select the CPUUtilization metric.
124. From the Statistic drop-down list, select Maximum.
125. Click Update graph.
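The same CPUUtilization graph can be requested outside the console, since the domain publishes its metrics to Amazon CloudWatch. The sketch below only assembles the request parameters and makes no AWS calls. The key names follow the CloudWatch GetMetricStatistics operation, and Amazon ES metrics live in the AWS/ES namespace with DomainName and ClientId (your account ID) dimensions. The account ID, the one-hour window, and the five-minute period are placeholder assumptions.

```python
from datetime import datetime, timedelta, timezone


def cpu_max_request(domain_name, account_id, now=None, hours=1):
    """Parameters for the maximum CPUUtilization of a domain over a window."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/ES",
        "MetricName": "CPUUtilization",
        "Dimensions": [
            {"Name": "DomainName", "Value": domain_name},
            {"Name": "ClientId", "Value": account_id},  # the AWS account ID
        ],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 300,              # seconds per data point
        "Statistics": ["Maximum"],  # matches the choice in step 124
    }


if __name__ == "__main__":
    # Placeholder account ID; substitute your own.
    print(cpu_max_request("mytestdomain", "123456789012"))
```

With boto3 installed and configured, a dictionary of this shape could be passed as keyword arguments to the CloudWatch client's get_metric_statistics call.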
Congratulations! You now know how to:
• Deploy an Amazon Elasticsearch Service domain
• Create an Amazon CloudWatch Log Group for AWS CloudTrail logs
• Subscribe an Amazon CloudWatch Log Group to Amazon Elasticsearch Service
• Monitor Amazon Elasticsearch Cluster Metrics
• For more information about Amazon Elasticsearch Service, see https://aws.amazon.com/elasticsearch-service/
• For more information about AWS CloudTrail, see https://aws.amazon.com/cloudtrail/
• For more information about Amazon CloudWatch, see https://aws.amazon.com/cloudwatch/