3. About Karmasphere
● Productivity suite for Developers and Analysts.
● Point-and-drool GUI for Hadoop, Hive, Cascading, Pig.
● MapReduce development and debugging on-cluster.
● Integrated with Eclipse and NetBeans IDEs.
● Interface between a human (you!) and a Hadoop cluster.
● Does the boring, tedious or repetitive bits.
● Finds the errors fast, before you do.
● Works anywhere with anything.
[Diagram labels: "HALP!", "Karmasphere", "Hockey sticks!"]
4. The Idea
● Collect Underpants
● ....?
● Profit
But what goes in the middle?
5. The Problem
● Collect Data
● Convert to MapReduce
● Execute
● Debug
● Tune
● … Profit
Get someone else to do it!
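To make the "Convert to MapReduce" step concrete, here is a minimal in-memory sketch of the map, shuffle and reduce phases for a word count. It is plain Python and does not use Hadoop. The function names (`map_words`, `shuffle`, `reduce_counts`, `run_job`) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_words(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle phase: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce phase: collapse the values for one key into a total."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle and reduce in sequence over an in-memory input."""
    mapped = (pair for line in lines for pair in map_words(line))
    grouped = shuffle(mapped)
    return dict(reduce_counts(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(run_job(["collect data", "collect underpants", "profit"]))
```

On a real cluster the framework does the shuffle and runs many map and reduce tasks in parallel. The remaining steps on the list (execute, debug, tune) only appear once the job runs there.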
6. How long will it take?
● Performance
Of what? Surely not the computer.
9. Analytics Performance
But what about this bit?
Or this bit?
That the human understands the problem does not mean that the computer understands the problem.
10. Analytics Performance
But what about this bit?
Or this bit?
The computer knowing the answer is not the same as the human understanding the answer.
11. Common MapReduce Challenges
● How do I write a Hadoop job?
● Did my job work?
● If it didn't throw an exception, it worked. Right?
● Did I get the correct answer?
● Are you sure?
● Do you have enough information to prove that?
● … to your accountants or customers?
● What happened? Or, what do I need to know?
● Please note, this feature is now officially called the “Job Profiler”, not the “What?! Window.”
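One way to approach "Did I get the correct answer?" is to have the job keep counters and check a few invariants afterwards. The sketch below is plain Python, not Hadoop code and not how the Job Profiler works. The record format (tab-separated query and count), the counter names, the 1% threshold, and the functions `sum_by_query` and `check_job` are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def sum_by_query(lines):
    """Sum counts per query, keeping counters for every record seen."""
    counters = Counter()
    totals = defaultdict(int)
    for line in lines:
        counters["records_read"] += 1
        parts = line.rstrip("\n").split("\t")
        try:
            if len(parts) != 2:
                raise ValueError("expected two tab-separated fields")
            count = int(parts[1])
        except ValueError:
            # Count bad records instead of silently dropping them.
            counters["records_malformed"] += 1
            continue
        totals[parts[0]] += count
        counters["records_used"] += 1
    return dict(totals), counters


def check_job(totals, counters, max_malformed_ratio=0.01):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    read = counters["records_read"]
    if counters["records_used"] + counters["records_malformed"] != read:
        problems.append("records are unaccounted for")
    if read and counters["records_malformed"] / read > max_malformed_ratio:
        problems.append("too many malformed records")
    if read and not totals:
        problems.append("input was read but no output was produced")
    return problems
```

A job that finishes without an exception can still fail these checks, and that gap is what the questions above are pointing at. The counter values are also something concrete to show an accountant or a customer.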
14. Common Analytical Tasks
So common, in fact, that ...
group
sort
aggregate
intersection
unique
limit
scan
join
function
hash
materialize
condition
set operations
store
cat
index
17. Pig
An imperative scripting language
data = LOAD '$input'
    AS (query:CHARARRAY, count:INT);
queries_group = GROUP data BY query
    PARALLEL $reducers;
queries_sum = FOREACH queries_group GENERATE
    group AS query,
    SUM(data.count) AS count;
queries_ordered = ORDER queries_sum BY count DESC
    PARALLEL $reducers;
Simple and accessible to all.
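For readers who do not know Pig, the script above computes roughly the following, shown here as an in-memory Python sketch. It assumes the input has already been parsed into (query, count) pairs. The function name `sum_and_order` is hypothetical, and the `PARALLEL $reducers` hints have no equivalent in this single-process version.

```python
from collections import defaultdict


def sum_and_order(rows):
    """Group rows by query, sum the counts, and order by total descending."""
    totals = defaultdict(int)
    for query, count in rows:      # GROUP data BY query
        totals[query] += count     # SUM(data.count)
    # ORDER queries_sum BY count DESC. Breaking ties by query name is an
    # addition here so that the result is deterministic.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    sample = [("hadoop", 2), ("pig", 5), ("hadoop", 4), ("hive", 5)]
    for query, total in sum_and_order(sample):
        print(query, total)
```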
22. How to get involved
● Getting started as a Hadoop Java Developer?
– Download Karmasphere Studio FREE!
● Deploying Hadoop jobs in production?
– Use Karmasphere Studio Professional Edition.
● Want to use high-level languages like SQL?
– Talk to us about Karmasphere Analyst.
– Join the beta programme!
23. Questions, Errata, Heckling
● Some questions suggested by others:
● Where can I download Karmasphere Studio Community Edition?
– Visit http://www.karmasphere.com/ for free downloads and great justice.
● What about building production-ready jobs for enterprise deployment?
– Ask us about introductory offers on Karmasphere Studio Professional Edition.
● How can I use graphical SQL on Hadoop?
– Talk to us about the Karmasphere Analyst Sekrit(!) Beta.
● Some questions I thought up:
● How do I (something awfully complicated)?
– Please talk to us, we enjoy the challenges.
● Is there any tea on this spaceship?
● And some from the audience, please!
● I get paid by the answer. I need questions.