Azkaban in my use case
2017/03/09
@wyukawa
Workflow Engines Meetup #1
#wfemeetup
https://connpass.com/event/50900/

  1. Azkaban in my use case
     2017/03/09, @wyukawa
     Workflow Engines Meetup #1 (#wfemeetup)
     https://connpass.com/event/50900/
  2. Azkaban
     • Implemented at LinkedIn to solve the problem of Hadoop job dependencies
     • Written in Java
       – Not modern Java (raw servlets, Velocity, ...)
  3. Azkaban features
     • Simple job management tool
       – Define job dependencies
       – Retry
       – Scheduling
       – Web UI
     • See dependencies/execution time/logs
     • Stores logs in the DB as blobs
       – SPOF
       – Cannot register holidays
       – Not triggered by file creation events
     • Mail notification only
       – HTTP Job Callback
     • No binary
       – Need to build from source
     • Development is not very active
     • The mailing list doesn’t function very well
  4. Job File

       # foo.job
       type=command
       command=echo foo
       retries=1
       retry.backoff=300000

       # bar.job
       type=command
       dependencies=foo
       command=echo bar
  5. Job History
  6. Scheduling
  7. Failure Options
     • Finish Current Running
       – Finishes only the currently running jobs. It will not start any new jobs.
     • Cancel All
       – Immediately kills all jobs and fails the flow.
     • Finish All Possible
       – Keeps executing jobs as long as their dependencies are met.
  8. Difference when ccc failed
  9. Why is “Finish All Possible” not the default?
  10. Re-running a failed flow
     • Pressing the “Prepare Execution” button lets the user re-execute only the failed jobs. It’s convenient!
  11. Concurrent Execution Options
     • Skip Execution
       – Do not run the flow if it is already running.
     • Run Concurrently
       – Run the flow anyway. The previous execution is unaffected.
     • Pipeline
  12. SLA Notification
     • If the duration threshold is exceeded, an alert email can be sent or the flow can be killed automatically.
  13. Flow parameters
     • A parameter (for example, a date) can be set when Azkaban executes a flow
  14. Q/A
     https://connpass.com/event/50900/
     ○ △ ❌ ○ ○ △
  15. My use case
     • Use Azkaban to manage Hadoop jobs
       – Write batches in Python
     • Use the Azkaban API
       – I created a client: https://github.com/wyukawa/eboshi
       – Commit scheduling information to GHE
     • Writing job files is painful
       – I created a generation tool: https://github.com/wyukawa/ayd
       – Generates 1 flow from 1 YAML file
  16. Python batch example

       def validate_before(self):
           hive.exists("access_log", "yyyymmdd='%s'" % (...))

       def process(self):
           insert_query = """
           INSERT OVERWRITE TABLE aggregate PARTITION(yyyymmdd='%s')
           SELECT ...
           FROM access_log
           WHERE ...
           GROUP BY ...
           """ % (...)
           hiveCli.query(insert_query)

       def validate_after(self):
           hive.exists("aggregate", "yyyymmdd='%s'" % (...))
  17. YAML example

       foo:
         type: command
         command: echo "foo"
         retries: 1
         retry.backoff: 300000
       bar:
         type: command
         command: echo "bar"
         dependencies: foo
         retries: 1
         retry.backoff: 300000
  18. Job Management Overview
     (diagram labels: git push, push button, upload job, register schedule, git pull, generate job file)
  19. Log Analysis Platform
     (diagram labels: Hadoop, Hive of HDP2.5.3; Azkaban 3.15.0-1-g77411d7; Presto 0.166; Cognos; Prestogres; Netezza; DBDB; ETL with Python 2.7.13; InfiniDB; Pentaho; Saiku)
  20. My usage situation
     • More than 120 Azkaban flows
     • Many daily batches, plus a few hourly, weekly, and monthly batches
     • Most flows are related to Hive
     • Azkaban runs on the batch server
     • I prepared template Azkaban flows for re-aggregating past data
       – Pass the job name and date as parameters
       – Set Run Concurrently
     • I don’t use SLA, but I may use it in the future
       – https://github.com/azkaban/azkaban/pull/911
     • I don’t use HTTP Job Callback
       – I use HipChat from the Python ETL
  21. My impressions
     • Simple
     • Easy to use
     • The Web UI is convenient
     • The API is useful
     • There is no reason to replace Azkaban
     • I hope development becomes more active
  22. Podcast
     • https://itunes.apple.com/jp/podcast/wyukawas-podcast/id1152456701
     • http://wyukawa.tumblr.com/
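
Slide 15 says a generation tool produces one flow from one YAML file, and slides 4 and 17 show the two formats side by side. The sketch below is not that tool (ayd). It only illustrates the idea, assuming the YAML has already been parsed into a plain dict; `render_job_files` is a hypothetical name, and the `bar` job uses the shorter property set from slide 4.

```python
def render_job_files(flow):
    """Return {filename: contents} for a dict shaped like the YAML on slide 17.

    Hypothetical helper: each top-level key becomes one <name>.job file
    whose lines are key=value properties, as in the job files on slide 4.
    """
    files = {}
    for job_name, props in flow.items():
        lines = ["%s=%s" % (key, value) for key, value in props.items()]
        files[job_name + ".job"] = "\n".join(lines) + "\n"
    return files


# Dict equivalent of a two-job flow: bar depends on foo.
flow = {
    "foo": {
        "type": "command",
        "command": "echo foo",
        "retries": 1,
        "retry.backoff": 300000,
    },
    "bar": {
        "type": "command",
        "dependencies": "foo",
        "command": "echo bar",
    },
}

job_files = render_job_files(flow)
for name, contents in job_files.items():
    print("# " + name)
    print(contents)
```

A real generator would also need to read the YAML file and package the job files for upload; both steps are left out here.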

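Slides 7, 11, and 13 describe options that are chosen when a flow is executed, and slide 20 mentions template flows that take a job name and date as parameters with Run Concurrently set. As a rough sketch of how those choices might be passed through the Azkaban AJAX API, the code below only builds the request parameters and sends nothing. The parameter names (`failureAction`, `concurrentOption`, `flowOverride[...]`) and their values are written as I understand the Azkaban documentation, so check them against your Azkaban version. The function name, session id, project, and flow names are hypothetical.

```python
from urllib.parse import urlencode

# Assumed mapping from the Web UI labels on slides 7 and 11 to API values.
FAILURE_ACTIONS = {
    "Finish Current Running": "finishCurrent",
    "Cancel All": "cancelImmediately",
    "Finish All Possible": "finishPossible",
}
CONCURRENT_OPTIONS = {
    "Skip Execution": "skip",
    "Run Concurrently": "ignore",
    "Pipeline": "pipeline",
}


def build_execute_flow_params(session_id, project, flow,
                              failure_option, concurrent_option,
                              flow_params=None):
    """Build the form parameters for an executeFlow request (hypothetical helper)."""
    params = {
        "ajax": "executeFlow",
        "session.id": session_id,
        "project": project,
        "flow": flow,
        "failureAction": FAILURE_ACTIONS[failure_option],
        "concurrentOption": CONCURRENT_OPTIONS[concurrent_option],
    }
    # Flow parameters (slide 13), for example the date to re-aggregate.
    for key, value in (flow_params or {}).items():
        params["flowOverride[%s]" % key] = value
    return params


params = build_execute_flow_params(
    "dummy-session", "my_project", "reaggregate",
    "Finish All Possible", "Run Concurrently",
    {"date": "20170309"},
)
body = urlencode(params)  # form-encoded body for a POST to the executor endpoint
print(body)
```

Keeping the UI labels as dictionary keys means an unknown option raises a `KeyError` right away rather than sending a value the server might silently ignore.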