This document summarizes Asakusa, an open source framework for developing and executing batch applications on Hadoop. It discusses how Asakusa uses domain-specific languages (DSLs) to define batch workflows as directed acyclic graphs (DAGs) of operators, and compiles these into MapReduce jobs executed on Hadoop. It also describes components such as ModelGenerator, the Ashigel compiler, and ThunderGate, which integrates with databases.
1. Hadoop
Asakusa
Rev. 1.0
2011/7/1
OSS
110701
Copyright 2011(C) OSS Laboratories Inc. All Rights Reserved
2. Asakusa
Hadoop
Hadoop
→
13 100
3. Hadoop BI
4-5 20-30
IF
4. Hadoop
BI
MapReduce
5. Asakusa
Hadoop
BI/
MonkeyMagic
Oozie
Pig
Hive
MapReduce
Java
Core HDFS MapReduce
6. Hadoop Asakusa
Hadoop
Asakusa
DAG
DSL Ashigel
ModelGenerator
MapReduce
Ashigel
MonkeyMagic
ThunderGate MySQL
7. Asakusa
[Architecture diagram. Components shown: DSL and I/F; DB; Ashigel; ThunderGate (Importer / Exporter / Recoverer); Asakusa MapReduce jobs (JOB); MonkeyMagic; Model Generator; Template Generator; Test Driver; Hadoop (HDFS, MapReduce); MySQL]
8. Asakusa
ModelGenerator
Writable DSL
ThunderGate
Ashigel Import/Export
API
DSL MapReduce
DAG DSL
DSL
9. Asakusa
Java Eclipse/Maven
MapReduce Model Generator
DSL Domain Specific Language Ashigel Compiler
Runtime Library
ThunderGate
10. ModelGenerator
MapReduce
Java writable
ModelGenerator
MySQL DDL (SQL): Table, View
HadoopIO
Eclipse DSL
Table Hadoop
Hadoop Table
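The slide above says that ModelGenerator produces Java Writable model classes from MySQL table and view definitions. The following is only a hand-written sketch of what such a class could look like, not actual generator output. The table `order_detail`, its columns, and the class name `OrderDetailModel` are hypothetical, and `SimpleWritable` is a local stand-in that mirrors the two methods of Hadoop's `Writable` interface so the example runs without Hadoop on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Stand-in for org.apache.hadoop.io.Writable (same two methods).
interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical model for a table such as:
//   CREATE TABLE order_detail (order_id BIGINT, item_code VARCHAR(32), amount BIGINT);
class OrderDetailModel implements SimpleWritable {
    private long orderId;
    private String itemCode = "";
    private long amount;

    public long getOrderId() { return orderId; }
    public void setOrderId(long orderId) { this.orderId = orderId; }
    public String getItemCode() { return itemCode; }
    public void setItemCode(String itemCode) { this.itemCode = itemCode; }
    public long getAmount() { return amount; }
    public void setAmount(long amount) { this.amount = amount; }

    // Writes the columns in a fixed order.
    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(orderId);
        out.writeUTF(itemCode);
        out.writeLong(amount);
    }

    // Reads the columns back in the same order they were written.
    @Override
    public void readFields(DataInput in) throws IOException {
        orderId = in.readLong();
        itemCode = in.readUTF();
        amount = in.readLong();
    }

    // Serializes a model to bytes and reads it back into a fresh object.
    static OrderDetailModel roundTrip(OrderDetailModel src) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            src.write(new DataOutputStream(buf));
            OrderDetailModel copy = new OrderDetailModel();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Generating such classes from the table definition keeps the Hadoop-side record layout in step with the database schema, which appears to be the point of the Table-to-Hadoop mapping on this slide.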
11. Ashigel
Hadoop DSL
BatchDSL
FlowDSL
DSL
OperatorDSL Map/Reduce
Map/Reduce
Map Reduce Map Reduce
12. DAG DSL
DSL
3 DSL
BatchDSL
FlowDSL
OperatorDSL
Replace
TX DSL DB
MR
13. BatchDSL
BatchDSL
DSL
job
14. FlowDSL
Flowpart FlowDSL
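@Override
protected void describe() {
    // Join the order details with the item master
    Join join = op.join(itemIn, orderIn);
    // Set a status on the orders that found no matching item, then output them
    SetStatus missing = op.setStatus(join.missed, " ");
    orderOut.add(missing.out);
    // Summarize the joined records
    Sum sum = op.sum(join.joined);
    // Convert each summary into an amount record and output it
    ToAmount result = op.toAmount(sum.out);
    resultOut.add(result.out);
    // Terminate the output that is not used downstream
    core.stop(result.original);
}
Callouts: the DAG is written in Java; each operator call returns an Operator object.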
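To make the data flow of the FlowDSL example easier to follow, here is a plain-Java sketch of the same join, set-status, and sum steps running on in-memory collections. It does not use the Asakusa API and is not how Asakusa executes a flow (the framework compiles the flow into MapReduce jobs). The class `FlowSketch`, the `Order` fields, and the price map are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FlowSketch {
    // Hypothetical order line.
    static final class Order {
        final long orderId;
        final String itemCode;
        final int quantity;
        String status = "";

        Order(long orderId, String itemCode, int quantity) {
            this.orderId = orderId;
            this.itemCode = itemCode;
            this.quantity = quantity;
        }
    }

    // Two outputs of the join step, like join.joined and join.missed.
    static final class JoinResult {
        final List<long[]> joined = new ArrayList<>(); // rows of {orderId, amount}
        final List<Order> missed = new ArrayList<>();
    }

    // Join: look up each order's item in the price master.
    static JoinResult join(Map<String, Long> itemPrices, List<Order> orders) {
        JoinResult result = new JoinResult();
        for (Order order : orders) {
            Long price = itemPrices.get(order.itemCode);
            if (price == null) {
                result.missed.add(order);
            } else {
                result.joined.add(new long[] {order.orderId, price * order.quantity});
            }
        }
        return result;
    }

    // SetStatus: mark every unmatched order.
    static void setStatus(List<Order> missed, String status) {
        for (Order order : missed) {
            order.status = status;
        }
    }

    // Sum: total amount per order id, in first-seen order.
    static Map<Long, Long> sum(List<long[]> joined) {
        Map<Long, Long> totals = new LinkedHashMap<>();
        for (long[] row : joined) {
            totals.merge(row[0], row[1], Long::sum);
        }
        return totals;
    }
}
```

Each static method plays the role of one operator, and passing one method's result to the next corresponds to an edge in the DAG.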
15. FlowDSL
Flowpart FlowDSL 2
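@Override
protected void describe() {
    ReNewTxApMachingOperatorFactory f = new ReNewTxApMachingOperatorFactory();
    UpdateTransactionsOperatorFactory f1 = new UpdateTransactionsOperatorFactory();
    CoreOperatorFactory core = new CoreOperatorFactory();
    // Branch on whether a slip No. is present
    BranchSlipNoWithWithout bra11 = f.branchSlipNoWithWithout(inApMaching);
    // Branch on record gap / unmatched
    BranchRecGapAndUnmatch bra12 = f.branchRecGapAndUnmatch(bra11.out2);
    // Initialize the slip information
    InitSlipInfo upd11 = f.initSlipInfo(bra12.out1);
    // Initialize the bill information
    InitBillInfo upd12 = f.initBillInfo(bra12.out1);
    // Merge the two flows back into one
    Confluent<TxApMaching> cfl11 = core.confluent(bra12.out2, upd12.out);
    // Group and sort the merged records
    GroupSortBranchDeficitSurplusDiv grs11 = f1.groupSortBranchDeficitSurplusDiv(cfl11.out);
    // (the rest of the flow is not shown on the slide)
}
Callouts: the factory methods create operators (Ope); passing one operator's output port (for example bra11.out2) to another operator draws an edge.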
16. OperatorDSL
DSL
DAG
Operator Operator
MapReduce
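/**
 * Joins an order detail with its item master record.
 * @param info item master record
 * @param order order detail record
 * @return the joined record
 */
@MasterJoin
public abstract JoinOrder join(ItemInfo info, OrderDetail order);

/**
 * Summarizes the joined records.
 * @param each joined record
 * @return the summarized record
 */
@Summarize
public abstract SumOrder sum(JoinOrder each);

/**
 * Converts a summarized record into an order amount.
 * @param total summarized record
 * @return the order amount
 */
@Convert
public OrderAmount toAmount(SumOrder total) {
    // amount is a reusable OrderAmount field of the operator class
    // (its declaration is not shown on the slide)
    amount.setAmount(total.getAmount());
    amount.setOrderId(total.getOrderId());
    return amount;
}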
17. OperatorDSL
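import java.util.List;

public abstract class ExampleOperator {
    /**
     * Selects the master record whose validity period contains the transaction date.
     * @param masters candidate master records
     * @param tx transaction record
     * @return the selected master, or null if none applies
     */
    @MasterSelection
    public ItemMst selectItemMst(List<ItemMst> masters, HogeTrn tx) {
        for (ItemMst mst : masters) {
            if (mst.getStart() <= tx.getDate() &&
                    tx.getDate() <= mst.getEnd()) {
                return mst;
            }
        }
        return null;
    }

    /**
     * Updates the transaction with the price of the selected master.
     * @param master selected master record
     * @param tx transaction record to update
     */
    @MasterJoinUpdate(selection = "selectItemMst")
    public void updateWithMaster(
            @Key(group = "id") ItemMst master,
            @Key(group = "itemId") HogeTrn tx) {
        tx.setPrice(master.getPrice());
    }
}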
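The master selection logic above can be tried outside Asakusa. The sketch below is a plain-Java analogue, assuming the start and end values are inclusive calendar dates; the class `MasterSelectionSketch`, the `ItemMaster` type, and the use of `java.time.LocalDate` are hypothetical choices for illustration and are not taken from the slide.

```java
import java.time.LocalDate;
import java.util.List;

class MasterSelectionSketch {
    // Hypothetical master record with an inclusive validity period.
    static final class ItemMaster {
        final LocalDate start;
        final LocalDate end;
        final long price;

        ItemMaster(LocalDate start, LocalDate end, long price) {
            this.start = start;
            this.end = end;
            this.price = price;
        }
    }

    // Returns the first master whose period contains txDate, or null if none does.
    static ItemMaster select(List<ItemMaster> masters, LocalDate txDate) {
        for (ItemMaster master : masters) {
            boolean notBeforeStart = !txDate.isBefore(master.start);
            boolean notAfterEnd = !txDate.isAfter(master.end);
            if (notBeforeStart && notAfterEnd) {
                return master;
            }
        }
        return null;
    }
}
```

Because the selection method may return null, a caller in this sketch has to handle the "no valid master" case explicitly before reading the price.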
18. OperatorDSL
CoGroup
Confluent
Convert
Duplicate
GroupSort
Split
MasterBranch
MasterCheck
MasterJoin
MasterJoinUpdate
Summarize
Branch
Checkpoint
Empty
Identity
DAG
Logging
Stop
20. DSL
Excel
BatchDSL
FlowDSL
OperatorDSL
JUnit
JUnit
21.
DAG
STS
SPF (Shortest Path First)
GRASP (General Responsibility Assignment Software Principles)
TX
Join
No_SQL
22.
[Figure: graph of TRN nodes connected by edges; labels shown: 4 In, 5 In, 7 In, 9 In, 4 Out, 5 Out]
23.
[Figure: TRN nodes arranged by level, Level-0 through Level-3; labels shown: 4 In, 5 Out, 4-5]
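The figure on this slide arranges the nodes of the graph into levels (Level-0 to Level-3). One common way to compute such levels for a DAG, sketched below, is to give every source node level 0 and every other node one more than the highest level among its predecessors, using Kahn's topological ordering. Whether the slides use this exact rule is an assumption; the class `DagLevels` and the node names in the usage note are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DagLevels {
    // edges maps each node to its successors. Returns node -> level, where
    // sources are level 0 and every other node is 1 + max level of its predecessors.
    static Map<String, Integer> levels(Map<String, List<String>> edges) {
        // Count incoming edges for every node that appears as a key or a successor.
        Map<String, Integer> inDegree = new HashMap<>();
        for (String node : edges.keySet()) {
            inDegree.putIfAbsent(node, 0);
        }
        for (List<String> successors : edges.values()) {
            for (String successor : successors) {
                inDegree.merge(successor, 1, Integer::sum);
            }
        }

        // Start from the nodes with no incoming edges.
        Map<String, Integer> level = new LinkedHashMap<>();
        Deque<String> ready = new ArrayDeque<>();
        for (String node : edges.keySet()) {
            if (inDegree.get(node) == 0) {
                level.put(node, 0);
                ready.add(node);
            }
        }

        int visited = 0;
        while (!ready.isEmpty()) {
            String node = ready.remove();
            visited++;
            for (String successor : edges.getOrDefault(node, List.of())) {
                // Keep the deepest level seen so far for the successor.
                level.merge(successor, level.get(node) + 1, Integer::max);
                // A successor becomes ready once all its predecessors are processed.
                if (inDegree.merge(successor, -1, Integer::sum) == 0) {
                    ready.add(successor);
                }
            }
        }
        if (visited != inDegree.size()) {
            throw new IllegalArgumentException("graph has a cycle");
        }
        return level;
    }
}
```

For a graph with edges a to c, b to c, b to d, and c to e, this gives a and b level 0, c and d level 1, and e level 2. Nodes on the same level do not depend on each other, so in principle they can be processed in the same stage.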
24. Asakusa
MapReduce
JarFile
HadoopJob
Job TRX Hadoop
job
IO