There are several ways of collecting big data; one of the most promising is S3/CloudFront logging. It’s low cost and quick to implement. Let's dive in and see how to set up S3/CloudFront logging with your application.
1. COLLECTING BIG DATA WITH
S3/CLOUDFRONT LOGGING
Moty Michaely, VP R&D
Xplenty Data Integration-as-a-Service
2. In our recent article, “Scale Your Data Collection on the Cloud
Like a Champ”, we reviewed several ways of collecting big
data, the most promising of which was S3/CloudFront
logging. It’s low cost and quick to implement. Now we’d like
to dig deeper and show how to set up S3/CloudFront logging
with your application.
3. DEFINE APP DATA
Sit back and think - which data would you like to collect? Which app
events should be logged? These could be page visits, mouse clicks, logins,
errors, etc. Some of them may include parameters such as the page visit
URL. Write them all down. Be as thorough as possible so you don’t lose
any precious data.
4. CREATE AN AWS ACCOUNT
If you don’t already have an AWS (Amazon Web Services) account, you
can sign up here. Registration is free with the basic support package.
5. CREATE AN S3 BUCKET
Go to the S3 dashboard and create a bucket for saving the logs. Note that
the bucket must have a unique name across Amazon’s service and adhere
to DNS rules: 3-63 characters; only lowercase letters, numbers, hyphens, and
periods; no underscores; and it shouldn't look like an IP address. Don’t turn on logging - we will
do so via CloudFront.
(See the screenshot on the next slide for a visual explanation)
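The naming rules above can be checked before you try to create the bucket. The following is a minimal sketch, not Amazon's own validation: isValidBucketName is a hypothetical helper, and it assumes that names must be lowercase, may contain hyphens, and must start and end with a letter or number.

```javascript
// Hypothetical helper: a rough check of the bucket naming rules listed above.
function isValidBucketName(name) {
  // 3-63 characters long.
  if (name.length < 3 || name.length > 63) return false;
  // Assumed rule set: lowercase letters, numbers, hyphens, and periods only
  // (so no underscores), starting and ending with a letter or number.
  if (!/^[a-z0-9][a-z0-9.-]*[a-z0-9]$/.test(name)) return false;
  // Must not look like an IP address, e.g. 192.168.0.1.
  if (/^\d{1,3}(\.\d{1,3}){3}$/.test(name)) return false;
  return true;
}

console.log(isValidBucketName('xplenty-logs')); // true
console.log(isValidBucketName('my_logs'));      // false (underscore)
console.log(isValidBucketName('192.168.0.1'));  // false (looks like an IP address)
```

The check says nothing about uniqueness; only Amazon can tell you whether the name is already taken.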
7. CREATE EVENT IMAGES
Set up directories in the image bucket, for example /mouse, to organize
events by categories, and create 1x1 pixel images (see previous post) for
all the events that you defined in the first step, e.g. click.png, login.png,
error.png. Don’t worry about event parameters at the moment; we will
deal with them shortly.
All files uploaded to S3 are set as private by default, so make sure to change the file
permissions to public. You may use tools such as CloudBerry
Explorer or S3 Browser to do so and much more.
8. CREATE EVENT IMAGES CONT.
Set HTTP headers for all the images so that they will be cached by
CloudFront, thus saving GET requests from CloudFront edge locations to
S3. Go to the relevant bucket, check the image files on the left, click
Actions at the top, choose Properties, and open the Metadata section.
Add the following metadata line and click save:
▪ Cache-Control: max-age=31536000
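The max-age value is given in seconds, and 31536000 is simply one 365-day year. A short sketch of the arithmetic, with variable names chosen only for illustration:

```javascript
// One 365-day year in seconds, the unit that Cache-Control max-age uses.
var SECONDS_PER_YEAR = 365 * 24 * 60 * 60;
var cacheControl = 'max-age=' + SECONDS_PER_YEAR;

console.log(cacheControl); // max-age=31536000
```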
10. CREATE A CLOUDFRONT DISTRIBUTION
Creating a CloudFront distribution costs extra, but it’s mandatory - it logs
the query string, adds extra log info such as edge locations, and helps to
deliver files via Amazon’s CDN to shorten load times. Access
the CloudFront dashboard and create a web distribution for the image S3
bucket. Make sure that Use Origin Cache Headers is set under Object
Caching (it’s the default setting).
11. CREATE A CLOUDFRONT DISTRIBUTION
CONT.
Note that the distribution gets a random domain name. It could take a
while before it starts working because the DNS servers need to be
updated to support it. You can also set a more friendly domain using the
Alternate Domain Names (CNAMEs) option under Distribution Settings,
though it requires configuring your DNS settings so that your domain
points to CloudFront’s domain name. See Amazon’s documentation for
more info.
14. TURN LOGGING ON
Still in the CloudFront dashboard, check the distribution on the left, click
Distribution Settings at the top, click Edit under the General tab, enable
logging, and insert the bucket where you want to store the logs.
17. CODE A FUNCTION TO CALL EVENTS
Time to get your hands dirty and write a method that registers events, or
call one of your app’s developers to do it for you. The code could be on
the client side, server side, or both depending on the architecture. The
method should simply send an asynchronous HTTP GET request to the
relevant image URL, e.g. to http://logs.xplenty.com/mouse/click.png (links
in this format for demo purposes only, not operational).
If you need to send additional event parameters, use the query
string (don’t forget URL encoding), e.g.
http://logs.xplenty.com/mouse/click.png?id=login&url=http%3A%2F%2Fwww.example.com%2Flogin
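If you build the URL by hand rather than letting a library do it, the encoding step could look like the sketch below. buildEventUrl is a hypothetical helper, and the logs.xplenty.com address is the same non-operational demo domain used above.

```javascript
// Hypothetical helper: builds the event image URL with URL-encoded parameters.
function buildEventUrl(baseUrl, category, action, params) {
  // Encode every key and value so that characters such as ':' and '/'
  // survive inside the query string.
  var pairs = Object.keys(params).map(function (key) {
    return encodeURIComponent(key) + '=' + encodeURIComponent(params[key]);
  });
  var url = baseUrl + '/' + category + '/' + action + '.png';
  return pairs.length ? url + '?' + pairs.join('&') : url;
}

var eventUrl = buildEventUrl('http://logs.xplenty.com', 'mouse', 'click', {
  id: 'login',
  url: 'http://www.example.com/login'
});

console.log(eventUrl);
// http://logs.xplenty.com/mouse/click.png?id=login&url=http%3A%2F%2Fwww.example.com%2Flogin
```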
18. EXAMPLE CODE TO CALL EVENTS
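// Registers an event by requesting its image; CloudFront logs the request.
$.CloudFrontLog = function (attr) {
    var url = 'http://logs.xplenty.com/' + attr.category + '/' + attr.action + '.png',
        data = {
            id: attr.id,
            url: attr.url
        };
    // jQuery URL-encodes the data and appends it as the query string.
    return $.get(url, data);
};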
19. CALL THE EVENTS
Dig through your app’s code and add event calls using the method
that you’ve just written. This will collect the data that you defined in
step 1. Here’s a jQuery code sample for logging client-side button
clicks:
$('.btn').click(function (e) {
    var id = $(this).attr('id');
    $.CloudFrontLog({
        action: 'click',
        category: 'mouse',
        id: id,
        url: location.href
    });
});
20. TEST
Use your staging environment to call events via the application and check
that the logs are generated accordingly. Patience, young padawan: it may
take an hour or so until Amazon writes them.
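Once log files show up, a quick way to check them is to read a few lines back into objects. The sketch below assumes the log text has already been decompressed, that it starts with a "#Fields:" header naming the columns, and that the data lines are tab-separated. parseLog is a hypothetical helper, and the sample is made up with a shortened field list, so compare it against a real log file before relying on it.

```javascript
// Hypothetical helper: maps each tab-separated log line to an object keyed by
// the field names found in the "#Fields:" header line.
function parseLog(text) {
  var fields = [];
  var records = [];
  text.split('\n').forEach(function (line) {
    if (line.indexOf('#Fields:') === 0) {
      // Header line: the field names follow the "#Fields:" prefix.
      fields = line.slice('#Fields:'.length).trim().split(/\s+/);
    } else if (line && line.charAt(0) !== '#') {
      // Data line: pair each value with the field name at the same position.
      var values = line.split('\t');
      var record = {};
      fields.forEach(function (name, i) { record[name] = values[i]; });
      records.push(record);
    }
  });
  return records;
}

// Made-up sample with a shortened field list, for illustration only.
var sample = [
  '#Version: 1.0',
  '#Fields: date time cs-method cs-uri-stem sc-status cs-uri-query',
  ['2014-01-01', '12:00:00', 'GET', '/mouse/click.png', '200', 'id=login'].join('\t')
].join('\n');

var records = parseLog(sample);
console.log(records[0]['cs-uri-stem']);  // /mouse/click.png
console.log(records[0]['cs-uri-query']); // id=login
```

Reading the column names from the header instead of hard-coding positions keeps the check working if the field list differs from the sample.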
21. GO LIVE!
Everything should be ready for you to collect big data like a champ -
update the production environment and let the logging begin. Don't know
what to do with the data? See how to analyze AWS logs in 15 minutes.