3. A quick intro to Beaconstac
Beaconstac is a proximity marketing and analytics platform for beacons
Several beacon-specific events are defined to aid proximity marketing
The events include camp-on, beacon exit, region enter, and region exit
The Beaconstac analytics platform makes it easy for managers, marketers, and developers to analyze event data
Components include the Beaconstac iOS/Android SDKs and the Beaconstac portal
4. Why Hadoop?
Collect event logs generated by Beaconstac SDK usage
Needed a system to answer queries like:
o Heat map of beacons by the number of visits received in a specified time interval (see the Python sketch after this list)
o Heat map of beacons by the amount of time spent in a specified time interval
o Average time spent by users near different beacons
o Last seen per user
o Last seen per beacon
o Analyzing data with custom attribute filters
o Path traversed in an area by individual users
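
As a concrete example, the visits heat map maps naturally onto a Hadoop Streaming job. Below is a minimal Python sketch of a mapper and reducer that count visits per beacon; the tab-separated log layout (timestamp, user_id, beacon_id, event_type) and the 'camp_on' event label are illustrative assumptions, not the actual Beaconstac log schema.

#!/usr/bin/env python
# mapper.py - emits "beacon_id<TAB>1" for every camp-on event.
# Assumed (hypothetical) input line: timestamp<TAB>user_id<TAB>beacon_id<TAB>event_type
import sys

for line in sys.stdin:
    fields = line.rstrip('\n').split('\t')
    if len(fields) < 4:
        continue  # skip malformed lines
    timestamp, user_id, beacon_id, event_type = fields[:4]
    # A visit begins when a user camps on a beacon; restricting to a time
    # interval would be an extra check on the timestamp field here.
    if event_type == 'camp_on':
        print('%s\t1' % beacon_id)

#!/usr/bin/env python
# reducer.py - sums the counts per beacon. Hadoop Streaming sorts mapper
# output by key, so all lines for one beacon arrive consecutively.
import sys

current_beacon, count = None, 0
for line in sys.stdin:
    beacon_id, value = line.rstrip('\n').split('\t')
    if beacon_id != current_beacon and current_beacon is not None:
        print('%s\t%d' % (current_beacon, count))
        count = 0
    current_beacon = beacon_id
    count += int(value)
if current_beacon is not None:
    print('%s\t%d' % (current_beacon, count))

The other queries follow the same pattern with different keys and values, e.g. keying on user_id for last seen per user.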
5. Leveraging Amazon's EMR for Beaconstac Analytics
Amazon's Streaming API for writing mapper and reducer functions in Python
Input – Copy the mapper/reducer programs to Amazon S3
Output – Copy the processed output data back to S3
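
With boto3 (the AWS SDK for Python), both copies are one-liners; the bucket and key names below are placeholders:

import boto3

s3 = boto3.client('s3')

# Upload the Streaming scripts so the EMR job can fetch them.
s3.upload_file('mapper.py', 'beaconstac-emr', 'scripts/mapper.py')
s3.upload_file('reducer.py', 'beaconstac-emr', 'scripts/reducer.py')

# After the job finishes, pull an output part file back down.
s3.download_file('beaconstac-emr', 'output/part-00000', 'part-00000')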
Initial tests were run using Amazon's EMR console, where you can define the following:
1) Cluster configuration – Name, termination protection, logging, log location on S3, etc.
2) Software configuration – Hadoop AMI version, applications to be installed on startup, etc.
3) Hardware configuration – Types of nodes: master, core, and task
4) Security keys, allowed users
5) Bootstrap actions – Configure Hadoop, custom actions, etc.
6) Steps – Streaming program, Hive program, Pig program
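
The same six sections can also be supplied programmatically. Below is a hedged boto3 sketch of run_job_flow that mirrors the console configuration; the cluster name, bucket paths, instance types, and key pair are placeholders, and newer EMR clusters take a release label where the console above used a Hadoop AMI version:

import boto3

emr = boto3.client('emr', region_name='us-east-1')

response = emr.run_job_flow(
    # 1) Cluster configuration: name and logging location on S3
    Name='beaconstac-analytics',
    LogUri='s3://beaconstac-emr/logs/',
    # 2) Software configuration (release label instead of a Hadoop AMI version)
    ReleaseLabel='emr-5.36.0',
    # 3) Hardware configuration, plus 4) the EC2 key pair for SSH access
    Instances={
        'MasterInstanceType': 'm4.large',
        'SlaveInstanceType': 'm4.large',
        'InstanceCount': 3,
        'Ec2KeyName': 'my-key-pair',
        'KeepJobFlowAliveWhenNoSteps': False,
        'TerminationProtected': False,  # 1) termination protection
    },
    # 5) Bootstrap actions would go in a BootstrapActions=[...] argument.
    # 6) Steps: one Hadoop Streaming step wiring up the Python scripts
    Steps=[{
        'Name': 'beacon-visit-counts',
        'ActionOnFailure': 'TERMINATE_CLUSTER',
        'HadoopJarStep': {
            'Jar': 'command-runner.jar',
            'Args': [
                'hadoop-streaming',
                '-files', 's3://beaconstac-emr/scripts/mapper.py,'
                          's3://beaconstac-emr/scripts/reducer.py',
                '-mapper', 'mapper.py',
                '-reducer', 'reducer.py',
                '-input', 's3://beaconstac-emr/input/',
                '-output', 's3://beaconstac-emr/output/',
            ],
        },
    }],
    JobFlowRole='EMR_EC2_DefaultRole',
    ServiceRole='EMR_DefaultRole',
)
print(response['JobFlowId'])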
9. How Does AWS Data Pipeline Work?
Pipeline definition – specifies the business logic of your data management.
AWS Data Pipeline web service – interprets the pipeline definition and assigns tasks to workers to move and transform data.
Task runner – polls the AWS Data Pipeline web service for tasks and then performs those tasks.
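
A bare-bones task runner is essentially a poll loop against the web service. The boto3 sketch below shows the shape of that loop; the worker group name and run_task body are hypothetical placeholders, and AWS's own Task Runner additionally reports progress and heartbeats:

import boto3

dp = boto3.client('datapipeline', region_name='us-east-1')
WORKER_GROUP = 'beaconstac-workers'  # hypothetical; must match the pipeline definition

def run_task(task):
    # Placeholder: inspect task['objects'] and perform the copy or
    # transform the pipeline definition describes for this task.
    pass

while True:
    # Long-polls the service; the response has no taskObject when no work is ready.
    resp = dp.poll_for_task(workerGroup=WORKER_GROUP)
    task = resp.get('taskObject')
    if task is None:
        continue
    try:
        run_task(task)
        dp.set_task_status(taskId=task['taskId'], taskStatus='FINISHED')
    except Exception as exc:
        dp.set_task_status(taskId=task['taskId'], taskStatus='FAILED',
                           errorId='TaskFailed', errorMessage=str(exc))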
10. Morpheus version of the data pipeline
The pipeline (shown as a diagram in the deck) chains three stages:
Copy logs from Kafka to S3
o Runs every hour
o Requires a Kafka consumer script (sketched below)
Run EMR jobs
o Runs once every day
o Processes each job and produces output
o Each job comprises mapper and reducer scripts
Copy the output to Elasticsearch
o Runs once every day
o Inserts the output into Elasticsearch
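
For the first stage, the Kafka consumer script can be as small as the sketch below: it drains the topic once per hourly run and writes the batch as one S3 object. It uses the kafka-python client, and the topic, broker, and bucket names are assumptions:

import boto3
from datetime import datetime, timezone
from kafka import KafkaConsumer  # kafka-python client

consumer = KafkaConsumer(
    'beacon-events',                       # hypothetical topic name
    bootstrap_servers=['localhost:9092'],
    group_id='s3-copier',                  # committed offsets let hourly runs resume
    auto_offset_reset='earliest',
    consumer_timeout_ms=10000,             # stop iterating once the topic is drained
)

# Collect whatever has accumulated since the last hourly run.
batch = [msg.value.decode('utf-8') for msg in consumer]

if batch:
    key = 'logs/%s.log' % datetime.now(timezone.utc).strftime('%Y-%m-%d-%H')
    boto3.client('s3').put_object(
        Bucket='beaconstac-event-logs',    # hypothetical bucket
        Key=key,
        Body='\n'.join(batch).encode('utf-8'),
    )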