Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/1jaRzhu.
Rusty Sears introduces REEF along with examples of computational frameworks, including interactive sessions, iterative graph processing, bulk synchronous computations, Hive queries, and MapReduce. Filmed at qconsf.com.
Rusty Sears is a member of Microsoft's Cloud Information Services Lab, where he works on infrastructure for large-scale hosted services. In addition to his work on REEF, he has an interest in log-structured indexing and persistent storage for serving workloads. Prior to Microsoft, he worked on backend storage and services for mobile and large-scale applications at Yahoo! Research.
REEF: Retainable Evaluator Execution Framework
1.
2. Watch the video with slide synchronization on InfoQ.com!
http://www.infoq.com/presentations/reef
3. Presented at QCon San Francisco
www.qconsf.com
5. True multi-tenancy…
Unified realtime-batch workflows
In-situ processing
Utilization: one cluster for scientists and production
…but, only for sophisticated apps
Fault tolerance
Pre-emption
Elasticity
9. [Diagram: operators σ, π, ⋈]
Tedious: users write code to dump + load data at each step
Slow: Data unnecessarily written to disk, read back (and re-parsed) at each step
Hard to build: Each duplicates the same mechanisms under the hood
10. Support YARN versions of new (and existing) scalable data pipelines.
Allow them to be transparently composed.
Move redundant tooling and plumbing into shared libraries.
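To make "transparently composed" concrete, here is a minimal single-process sketch in Python. It assumes that each step (σ, π, ⋈) can be written as a function over an in-memory stream of rows, so one step feeds the next without a dump to disk and a re-parse in between. The function names and the sample data are hypothetical and are not part of REEF.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


def select(rows: Iterable[Row], predicate: Callable[[Row], bool]) -> Iterator[Row]:
    """σ: keep only the rows that satisfy the predicate."""
    return (row for row in rows if predicate(row))


def project(rows: Iterable[Row], columns: List[str]) -> Iterator[Row]:
    """π: keep only the named columns of each row."""
    return ({c: row[c] for c in columns} for row in rows)


def join(left: Iterable[Row], right: Iterable[Row], key: str) -> Iterator[Row]:
    """⋈: hash join on a shared key column."""
    index: Dict[object, List[Row]] = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    for l in left:
        for r in index.get(l[key], []):
            yield {**l, **r}


# Hypothetical sample data, for illustration only.
users = [
    {"id": 1, "name": "ann", "age": 31},
    {"id": 2, "name": "bob", "age": 17},
]
orders = [
    {"id": 1, "item": "disk"},
    {"id": 2, "item": "cable"},
    {"id": 1, "item": "rack"},
]

# The steps are chained lazily; no intermediate result is written out or re-parsed.
adults = select(users, lambda r: r["age"] >= 18)
result = list(project(join(adults, orders, "id"), ["name", "item"]))
```

Because each step takes and returns an iterator of rows, the shared plumbing (streaming, row format) lives in one place instead of being duplicated per tool.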
11. YARN handles resource management (security, quotas, priorities)
Per-job Drivers request resources, coordinate computations, and handle faults, preemption, etc…
REEF Evaluators hold hardware resources, allowing multiple Activities (π, σ, etc…) to use the same cached state.
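The division of labor on this slide can be illustrated with a toy, single-process sketch in Python. It is not the REEF API: the `Evaluator` and `Driver` classes, their methods, and the two activities are hypothetical stand-ins. The sketch only shows the idea that an evaluator outlives individual activities, so state loaded by one activity is still cached for the next.

```python
from typing import Callable, Dict, List


class Evaluator:
    """Hypothetical stand-in for a retained container that outlives any one activity."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.cache: Dict[str, object] = {}
        self.loads = 0  # counts how often data actually had to be loaded

    def cached(self, key: str, loader: Callable[[], object]) -> object:
        # Load on first use only; later activities reuse the cached value.
        if key not in self.cache:
            self.cache[key] = loader()
            self.loads += 1
        return self.cache[key]

    def run(self, activity: Callable[["Evaluator"], object]) -> object:
        return activity(self)


class Driver:
    """Hypothetical per-job coordinator that submits activities to its evaluator."""

    def __init__(self, evaluator: Evaluator) -> None:
        self.evaluator = evaluator

    def submit_all(self, activities: List[Callable[[Evaluator], object]]) -> List[object]:
        return [self.evaluator.run(a) for a in activities]


def load_numbers() -> List[int]:
    # Stands in for an expensive read-and-parse step.
    return list(range(1, 11))


def select_even(ev: Evaluator) -> List[int]:
    data = ev.cached("numbers", load_numbers)
    return [x for x in data if x % 2 == 0]


def total(ev: Evaluator) -> int:
    data = ev.cached("numbers", load_numbers)
    return sum(data)


evaluator = Evaluator("evaluator-0")
driver = Driver(evaluator)
evens, numbers_sum = driver.submit_all([select_even, total])
```

After both activities have run, `evaluator.loads` is 1: the second activity found the data already cached. Resource negotiation, fault handling, and preemption are left out of the sketch.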
15. Configuring distributed systems is hard
So is reasoning about event flows
Tang performs static and dynamic checks to help ease the pain
16. container-4872364523847-02.stderr:
NullPointerException at:
java…eval():1234
ShellActivity.helper():546
ShellActivity.onNext():789
YarnEvaluator.onNext():12
Error:
Unknown parameter “Command”
Missing required parameter “cmd”
Error:
Required instanceof Evaluator
Got ShellActivity
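The contrast on this slide, a stack trace buried in a container's stderr versus a direct message about the configuration, can be sketched as an up-front check in Python. This is not Tang's API. `Param`, `check_config`, and `SHELL_PARAMS` are hypothetical names, and the sketch assumes that an activity declares its parameters before anything is launched.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Param:
    """Hypothetical declaration of one named configuration parameter."""
    name: str
    kind: type
    required: bool = True


def check_config(declared: List[Param], supplied: Dict[str, object]) -> List[str]:
    """Return every configuration problem found, before any container starts."""
    errors: List[str] = []
    known = {p.name for p in declared}

    # Names that were supplied but never declared (typos, renamed options).
    for name in supplied:
        if name not in known:
            errors.append(f'Unknown parameter "{name}"')

    # Declared parameters that are absent or bound to a value of the wrong type.
    for p in declared:
        if p.name not in supplied:
            if p.required:
                errors.append(f'Missing required parameter "{p.name}"')
        elif not isinstance(supplied[p.name], p.kind):
            got = type(supplied[p.name]).__name__
            errors.append(
                f'Parameter "{p.name}": required instanceof {p.kind.__name__}, got {got}'
            )
    return errors


# Hypothetical declaration for a shell activity that needs a "cmd" string.
SHELL_PARAMS = [Param("cmd", str)]

errors = check_config(SHELL_PARAMS, {"Command": "ls -l"})
```

Here `errors` names both the misspelled key and the parameter that was left unset, so the problem surfaces at submission time rather than as a NullPointerException inside a running container.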