5. Twister
• Iterative MapReduce
• Configure once, use many times
• Map -> Reduce -> Combine
• Static data configured with a partition file and reused across iterations
• Provides a fault-tolerant solution
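The Map -> Reduce -> Combine cycle over reused static data can be illustrated with a small single-process sketch. It does not use the Twister API: `run_iterative`, `map_fn`, `reduce_fn`, and `combine_fn` are hypothetical names, and the toy job (moving an estimate toward the mean of the data) is only there to give the loop something to iterate on.

```python
from collections import defaultdict

def run_iterative(partitions, map_fn, reduce_fn, combine_fn, state, iterations):
    """Toy driver: the static partitions are set up once and reused every iteration."""
    for _ in range(iterations):
        # Map: each partition sees its cached static data plus the broadcast state.
        grouped = defaultdict(list)
        for part in partitions:
            for key, value in map_fn(part, state):
                grouped[key].append(value)
        # Reduce: merge all intermediate values that share a key.
        reduced = {key: reduce_fn(key, values) for key, values in grouped.items()}
        # Combine: gather the reduce outputs into the state for the next iteration.
        state = combine_fn(reduced, state)
    return state

def map_fn(part, estimate):
    # Emit the partial residual sum and the partition size.
    yield "sum", sum(x - estimate for x in part)
    yield "count", len(part)

def reduce_fn(key, values):
    return sum(values)

def combine_fn(reduced, estimate):
    # Move halfway toward the current global mean.
    return estimate + 0.5 * reduced["sum"] / reduced["count"]

partitions = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # static data, configured once
estimate = run_iterative(partitions, map_fn, reduce_fn, combine_fn, 0.0, 20)
```

Only the small `state` value changes between iterations; the partitions are never reloaded, which is the point of "configure once, use many times".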
8. Data Structures
• Vector
• String delimited by commas
• StringValue
• HashMap<String, Integer>
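As a rough illustration of how these structures fit together, the sketch below parses comma-delimited transaction strings into lists (standing in for Vector) and tallies item occurrences in a dictionary (standing in for HashMap<String, Integer>). The function names and sample data are assumptions made for this example, not taken from the implementation.

```python
def parse_transaction(line):
    """Split a comma-delimited transaction string into a list of item names."""
    return [item.strip() for item in line.split(",") if item.strip()]

def count_items(lines):
    """Count, for each item, the number of transactions that contain it."""
    counts = {}
    for line in lines:
        # set() ensures an item repeated inside one transaction is counted once.
        for item in set(parse_transaction(line)):
            counts[item] = counts.get(item, 0) + 1
    return counts

lines = ["bread, milk", "bread, butter, milk", "butter"]
item_counts = count_items(lines)
```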
9. Inputs
• Configuration file
– Number of items & transactions
– Minimum support count %
• Partition file
– Split data
– Number of items & transactions
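The two inputs can be pictured with a short sketch: one helper turns a minimum support percentage into an absolute transaction count, and another splits the transactions into partitions. The actual file formats are not shown in the slides, so the function names and the round-robin split are assumptions for illustration only.

```python
import math

def min_support_count(percent, num_transactions):
    """Convert a minimum support percentage into an absolute transaction count."""
    return math.ceil(percent * num_transactions / 100)

def split_transactions(transactions, num_partitions):
    """Distribute transactions round-robin over the requested number of partitions."""
    return [transactions[i::num_partitions] for i in range(num_partitions)]

threshold = min_support_count(2, 10000)          # 2% of 10000 transactions
parts = split_transactions(list(range(10)), 3)   # 10 dummy transactions, 3 partitions
```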
13. Time vs. Transactions
[Figure: Time vs. Transactions. x-axis: number of transactions (10000, 20000, 30000); y-axis: time (scale 0 to 14).]
14. Time vs. Itemsets
[Figure: Time vs. Itemsets. x-axis: itemsets (25, 50, 75); y-axis: seconds (scale 0 to 250).]
15. Time vs. Itemsets
[Figure: Time vs. Itemsets, with series for 5 mappers and 20 mappers. x-axis: itemsets (25, 50, 75); y-axis: seconds (scale 0 to 250).]
16. Implementation of Classifier Tool in Twister
Magesh khanna Vadivelu, Shivaraman Janakiraman
magevadi@indiana.edu, shivjana@indiana.edu
Motivation:
Mining frequent itemsets from large-scale databases has emerged as an important problem in the data mining and knowledge discovery research community. To address this problem, we propose implementing the Apriori algorithm, a classification algorithm, in Twister, a distributed framework that makes use of MapReduce. We specify a map function that processes a key-value pair to generate a set of intermediate key-value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Our implementation of the Apriori algorithm runs on a large cluster of machines and is highly scalable. At the application level, this Apriori implementation can be used to identify the patterns in which customers buy products in a supermarket.

Architecture:
Twister has several components. The client side drives MapReduce jobs. Daemons and workers, which live on compute nodes, manage MapReduce tasks. Connections between components are based on SSH and messaging software. To drive a MapReduce job, the client first configures the job: it assigns the MapReduce methods to the job, prepares KeyValue pairs, and, if required, configures static data for the MapReduce tasks through a partition file. Messages are transmitted through a network of message brokers using a publish/subscribe mechanism.

Results:
[Figure: Time vs. Itemsets.]
[Figure: Time vs. Transactions.]
More transactions increase the execution time, but not as much as more itemsets do. This is because transactions are static data cached in memory for every MapReduce cycle, whereas itemsets are broadcast in each MapReduce cycle.
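The map and reduce roles described in the Motivation text can be sketched in a few lines. This is a simplified, single-process variant of Apriori written for illustration, not the authors' Twister code: `map_count`, `reduce_sum`, `next_candidates`, and `apriori` are hypothetical names, candidate generation uses plain combinations with subset pruning, and the sample transactions are made up. Partitions play the role of cached static data, while the candidate itemsets are the values broadcast in each cycle.

```python
from collections import defaultdict
from itertools import combinations

def map_count(partition, candidates):
    """Map: emit (candidate, number of transactions in this partition containing it)."""
    for cand in candidates:
        yield cand, sum(1 for t in partition if cand <= t)

def reduce_sum(pairs):
    """Reduce: add up the partial counts emitted for the same candidate."""
    totals = defaultdict(int)
    for cand, n in pairs:
        totals[cand] += n
    return totals

def next_candidates(frequent, k):
    """Build k-itemsets whose (k-1)-subsets are all frequent."""
    items = sorted(set().union(*frequent))
    return [frozenset(c) for c in combinations(items, k)
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))]

def apriori(partitions, min_count):
    # Static data: converted once and reused in every cycle.
    partitions = [[frozenset(t) for t in p] for p in partitions]
    all_items = sorted({i for p in partitions for t in p for i in t})
    candidates = [frozenset([i]) for i in all_items]
    frequent, k = {}, 1
    while candidates:
        # One MapReduce cycle: candidates are broadcast to every partition.
        pairs = [pair for p in partitions for pair in map_count(p, candidates)]
        totals = reduce_sum(pairs)
        level = {c: n for c, n in totals.items() if n >= min_count}
        frequent.update(level)
        k += 1
        candidates = next_candidates(set(level), k)
    return frequent

partitions = [
    [["bread", "milk"], ["bread", "butter", "milk"]],
    [["butter"], ["bread", "milk", "butter"]],
]
result = apriori(partitions, min_count=3)
```

In this sketch the transaction partitions stay fixed while the candidate list is recomputed and passed to every partition in each cycle, which mirrors the cached-versus-broadcast distinction drawn in the Results text.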