Explorys leverages HBase and the Hadoop stack to power the next generation of Enterprise Performance Management for Healthcare. The Explorys team will present in three parts: a functional and technical overview of Explorys, approaches to MapReduce performance tuning, and system operations for HBase and Hadoop.
4. Healthcare’s Data Overload
The volume of data, plus the variety of systems and sources of data, is piling up at a velocity that traditional data approaches were not designed to support.
5. Explorys Provides...
A platform to leverage data across systems, venues, and partners to drive care quality, cost efficiency, and risk mitigation.
Rapidly deployable Software-as-a-Service apps for leadership and providers.
Extensible Data-as-a-Service functions to support healthcare IT and business intelligence.
6. Explorys’ Customers and Patient Span by ZIP Code
80 hospitals, hundreds of ambulatory practices, and thousands of providers caring for 14 million patients.
8. 44 billion curated clinical, operational, and financial data points (44,000,131,117) and counting.
9. What Explorys Does
Platform and Apps
DataGrid
The Applications
Explore: High-speed search and population exploration.
Measure: Provider- and group-level performance metrics and benchmarks.
Registry: Automated care and disease management registries.
Engage: Rule-based patient and provider workflow and outreach.
11. HBase and MR at Explorys
Casey Stella
Senior Software Engineer
12. MapReduce Strategies
HBase at Explorys
HBase is our transactional data store
Row keys group all data for a given patient together
MR jobs process data from HBase
Transform and report on data
Sample data
Emit data in a form that applications can access efficiently
Naïve MR jobs place heavy stress on the cluster
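Slide 12 says only that row keys group a given patient’s data together. As a hedged illustration, here is a Python sketch of one possible composite key: a fixed-width patient id, then a record type and a timestamp. The layout and the names `make_row_key` and `patient_of` are hypothetical, not Explorys’ actual schema; the sketch relies only on the fact that HBase keeps rows sorted by the raw bytes of their keys.

```python
import struct

# Hypothetical layout (the deck only says keys group a patient's data together):
# [8-byte patient id][record type][0x00 separator][8-byte timestamp]


def make_row_key(patient_id: int, record_type: str, timestamp_ms: int) -> bytes:
    """Build a composite row key whose leading bytes identify the patient."""
    # Fixed-width, big-endian, unsigned: byte-wise order matches numeric order,
    # and no patient id can be a byte prefix of another one.
    return (
        struct.pack(">Q", patient_id)
        + record_type.encode("utf-8")
        + b"\x00"  # terminates the variable-length record type
        + struct.pack(">Q", timestamp_ms)
    )


def patient_of(row_key: bytes) -> int:
    """Recover the patient id from the first 8 bytes of a key."""
    return struct.unpack(">Q", row_key[:8])[0]


# HBase keeps rows sorted by the raw bytes of the row key, so sorting the
# keys here mimics the order in which a scan would return them.
keys = [
    make_row_key(2, "encounter", 1_000),
    make_row_key(1, "observation", 2_000),
    make_row_key(1, "encounter", 1_000),
    make_row_key(2, "observation", 3_000),
]
print([patient_of(k) for k in sorted(keys)])  # [1, 1, 2, 2]
```

The fixed-width id is the important choice: with variable-length decimal ids and no separator, the bytes of patient 1’s keys are a prefix of patient 10’s, so a prefix scan for patient 1 would also match patient 10.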
13. Local Aggregation
Map Task 1: Patient 1: Encounter, Patient 1: Observation, Patient 1: Observation, Patient 1: Diagnosis
Map Task 2: Patient 1: Drug, Patient 2: Encounter, Patient 2: Observation, Patient 2: Observation
Locally aggregate processing of a patient in an individual mapper
Fewer keys and chunkier values
Sorting is cheaper
Careful
Patient data can span tasks
Potential scalability issues
Data-Intensive Text Processing with MapReduce by Jimmy Lin and Chris Dyer covers this technique very well
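The local-aggregation pattern on slide 13 (Lin and Dyer call it in-mapper combining) can be simulated in plain Python without a cluster. In this sketch, under the assumption that each map task sees records already grouped by patient (which is how an HBase scan delivers rows keyed by patient), the mapper emits one list per patient instead of one pair per record, and the reducer merges the pieces of Patient 1 that were split across the two tasks. All function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

Record = Tuple[str, str]  # (patient id, record type)


def naive_map(records: Iterable[Record]) -> Iterator[Tuple[str, str]]:
    """Emit one pair per record: many small keys to sort and shuffle."""
    for patient, record_type in records:
        yield patient, record_type


def aggregating_map(records: Iterable[Record]) -> Iterator[Tuple[str, List[str]]]:
    """In-mapper aggregation: emit one chunky value per patient per task.

    Assumes records arrive grouped by patient, so only the current
    patient is buffered in memory.
    """
    current: Optional[str] = None
    chunk: List[str] = []
    for patient, record_type in records:
        if patient != current and chunk:
            yield current, chunk  # patient changed: flush the previous one
            chunk = []
        current = patient
        chunk.append(record_type)
    if chunk:
        yield current, chunk


def shuffle(pairs: Iterable[Tuple[str, List[str]]]) -> Dict[str, List[List[str]]]:
    """Group map output by key, as the framework does between map and reduce."""
    grouped: Dict[str, List[List[str]]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_patient(partials: Iterable[List[str]]) -> List[str]:
    """A patient can span map tasks, so the reducer merges the partial chunks."""
    merged: List[str] = []
    for part in partials:
        merged.extend(part)
    return merged


# The two map tasks from the slide.
task1 = [("Patient 1", "Encounter"), ("Patient 1", "Observation"),
         ("Patient 1", "Observation"), ("Patient 1", "Diagnosis")]
task2 = [("Patient 1", "Drug"), ("Patient 2", "Encounter"),
         ("Patient 2", "Observation"), ("Patient 2", "Observation")]

naive_pairs = list(naive_map(task1)) + list(naive_map(task2))
local_pairs = list(aggregating_map(task1)) + list(aggregating_map(task2))
print(len(naive_pairs), len(local_pairs))  # 8 3
# ['Encounter', 'Observation', 'Observation', 'Diagnosis', 'Drug']
print(reduce_patient(shuffle(local_pairs)["Patient 1"]))
```

Flushing whenever the patient changes keeps memory bounded by the largest single patient rather than by the whole task, but a patient with a very long history can still be expensive, which is the scalability caveat on the slide.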
14. MapReduce and Junior Engineers
MapReduce is distributed computing for the masses
The masses still do stupid things
The masses still have to write MR jobs to get their work done
Safety at Explorys
Most of our engineers start without prior experience in Hadoop or HBase
Giving them a book only goes so far
Need a combination of process and technology
Still an uphill battle
15. MapReduce and Junior Engineers
Process
Jobs are tested in the development grid with real data
Most MapReduce work is pushed to teams where MR and HBase education is a priority
Technology
Built an API wrapping the Hadoop mapreduce package
An alternative job-builder interface with added type safety
The ability to swap in different contexts at launch time
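Slide 15 does not show the Explorys API, so the following is only a hypothetical Python sketch of the two ideas it names: a job builder that rejects a job whose mapper output types do not match the reducer input types, and a launch context (here, which tables to read and write) that is chosen when the job is launched rather than baked into the job. `JobBuilder`, `LaunchContext`, `JobSpec`, and the table names are invented for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional, Tuple

# Hypothetical sketch: not the Explorys API and not a Hadoop class.


@dataclass(frozen=True)
class LaunchContext:
    """Settings chosen at launch time; the fields are illustrative."""
    name: str
    input_table: str
    output_table: str


@dataclass(frozen=True)
class JobSpec:
    mapper: Callable[..., Any]
    reducer: Callable[..., Any]
    key_type: type
    value_type: type
    context: LaunchContext


class JobBuilder:
    """Builder that refuses to produce a job whose map and reduce types disagree."""

    def __init__(self) -> None:
        self._mapper: Optional[Callable[..., Any]] = None
        self._reducer: Optional[Callable[..., Any]] = None
        self._map_out: Optional[Tuple[type, type]] = None    # what the mapper emits
        self._reduce_in: Optional[Tuple[type, type]] = None  # what the reducer accepts

    def mapper(self, fn: Callable[..., Any], out_key: type, out_value: type) -> "JobBuilder":
        self._mapper, self._map_out = fn, (out_key, out_value)
        return self

    def reducer(self, fn: Callable[..., Any], in_key: type, in_value: type) -> "JobBuilder":
        self._reducer, self._reduce_in = fn, (in_key, in_value)
        return self

    def build(self, context: LaunchContext) -> JobSpec:
        if self._mapper is None or self._reducer is None or self._map_out is None:
            raise ValueError("a job needs both a mapper and a reducer")
        if self._map_out != self._reduce_in:
            raise TypeError(
                f"mapper emits {self._map_out}, reducer expects {self._reduce_in}")
        key_type, value_type = self._map_out
        return JobSpec(self._mapper, self._reducer, key_type, value_type, context)


# The same job definition can be bound to a different context at launch time.
CONTEXTS: Dict[str, LaunchContext] = {
    "dev": LaunchContext("dev", "patients_sample", "report_dev"),
    "prod": LaunchContext("prod", "patients", "report"),
}

builder = (JobBuilder()
           .mapper(lambda row: (row["patient"], 1), str, int)
           .reducer(lambda key, counts: (key, sum(counts)), str, int))
dev_job = builder.build(CONTEXTS["dev"])
prod_job = builder.build(CONTEXTS["prod"])
print(dev_job.context.input_table, prod_job.context.input_table)  # patients_sample patients
```

In a plain Hadoop job, a mismatch between what the mapper actually emits and the declared map output classes typically surfaces only when the job runs; a Java wrapper could use generic type parameters so the compiler catches it instead, which this Python sketch approximates with a check in `build()`.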
16. Building a Solid Foundation
Daniel Washburn
Systems Engineer
18. Performance Management
Collect as much as you can
Ganglia, OpenTSDB
Nagios, Zenoss
Understand what you’re monitoring
If you don’t know what a metric means, look it up!
Work with customers to understand what’s important to them
Act on it
State-based alerting is where many people stop
A data-driven, predictive approach is the goal
Create dashboards
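The speaker notes describe one data-driven alert in use at Explorys: a script that reports when an individual TaskTracker is more than 2 standard deviations from the mean for the grid. The script itself is not shown, so this is a minimal Python sketch of that rule, assuming the per-tracker metric has already been pulled from a collector such as Ganglia or OpenTSDB. The metric (average map-task duration) and the function name are hypothetical.

```python
import statistics
from typing import Dict, List


def outlier_trackers(metric_by_tracker: Dict[str, float], k: float = 2.0) -> List[str]:
    """Return trackers whose metric is more than k standard deviations from the grid mean."""
    values = list(metric_by_tracker.values())
    if len(values) < 2:
        return []
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population std dev over the whole grid
    if stdev == 0:
        return []  # every tracker reports the same value: nothing stands out
    return sorted(name for name, value in metric_by_tracker.items()
                  if abs(value - mean) > k * stdev)


# Hypothetical metric: average map-task duration in seconds per TaskTracker.
durations = {f"tt{i:02d}": 10.0 for i in range(1, 10)}
durations["tt10"] = 100.0
print(outlier_trackers(durations))  # ['tt10']
```

Using the population standard deviation has a consequence on small grids: a single outlier among n trackers can sit at most √(n−1) standard deviations from the mean, so with five or fewer trackers a lone straggler never crosses the 2σ line.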
19. Configuration Management
Consistency is essential
Do this while you’re still small!
Choose a methodology
Parallel execution/distribution
Configuration management engine
Implement it
Parallel-ssh, MCollective
Puppet
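Slide 19 stresses consistency across hosts. A cheap complement to a configuration management engine such as Puppet is to fingerprint each host’s copy of a config file and flag hosts that differ from the cluster’s most common version. This Python sketch assumes the files have already been collected (for example with parallel-ssh, which is not shown); `drifted_hosts` and the sample contents are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Dict, List


def fingerprint(text: str) -> str:
    """Content hash of one host's copy of a config file."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def drifted_hosts(config_by_host: Dict[str, str]) -> List[str]:
    """Hosts whose config differs from the most common version across the cluster."""
    digests = {host: fingerprint(text) for host, text in config_by_host.items()}
    if not digests:
        return []
    majority, _count = Counter(digests.values()).most_common(1)[0]
    return sorted(host for host, digest in digests.items() if digest != majority)


# Hypothetical contents of one file as collected from three region servers.
base = ("<configuration><property>"
        "<name>hbase.regionserver.handler.count</name><value>30</value>"
        "</property></configuration>")
collected = {"rs1": base, "rs2": base, "rs3": base.replace("30", "10")}
print(drifted_hosts(collected))  # ['rs3']
```

The majority version is not necessarily the correct one; where configs live in version control, as the speaker notes recommend, comparing every host against the checked-in file is the stronger test.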
20. Release Management
Upgrade early and often
Become comfortable with the process
The logistics of upgrading can be tough, but it’s worth it
Get involved with the community
HBase is constantly evolving
The mailing lists and IRC channel are very active
Your contribution might help someone else
21. Teamwork
It takes a village…
… to raise an HBase
Effective communication is essential
We’re all part of the effort
Administrators
Engineers
Developers
End users
22. Thank You!
Questions?
Doug Meil
Chief Software Architect
Doug.Meil@explorys.com
Casey Stella
Senior Software Engineer
Casey.Stella@explorys.com
Daniel Washburn
Systems Engineer
Daniel.Washburn@explorys.com
www.explorys.com
Speaker Notes
Performance Management: Monitoring and reporting
Configuration Management: Automation
Release Management: Upgrades and tuning
Teamwork: You’re in this together
Customer Service: Understand who you work for
Step 1: monitor, Monitor, MONITOR!
Hadoop and HBase ship with native Ganglia reporting. It is reasonably easy to set up, though Ganglia can be finicky.
Nagios, Zenoss, etc.: everyone uses some sort of NMS (network management system). Choose your poison.
OpenTSDB is great for those who want everything in one place, forever.
Step 2: Understand what you’re monitoring
If you don’t know what a metric means, look it up! Always be learning.
It may take you 20 minutes to figure out what something means, but you’ll know it next time.
Work with customers to understand what’s important to them, too.
This doesn’t always mean paying customers, although they are important. It also means other teams in your company.
Step 3: Act on the data
State-based alerting is easy: any NMS can give you up/down alerts.
Data-driven alerts are harder: we have a script that reports when an individual TaskTracker is more than 2 standard deviations from the mean for the grid.
Behavioral monitoring is the goal: “listen for the silence” and report when expected tasks run for too long, or don’t run at all. We’re still working on this.
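The notes call behavioral monitoring, or “listening for the silence,” the goal but say Explorys is still working on it. A minimal Python sketch of the idea, under the assumption that each expected job has a latest start time and a maximum runtime: alert when a job that should have started has no run, or when a run has gone on too long. `Expectation`, `Run`, `listen_for_silence`, and the job names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Expectation:
    """Hypothetical: a job that should start by `start_by` and finish within `max_runtime`."""
    job: str
    start_by: datetime
    max_runtime: timedelta


@dataclass(frozen=True)
class Run:
    started: datetime
    finished: Optional[datetime] = None  # None while the job is still running


def listen_for_silence(expectations: List[Expectation],
                       runs: Dict[str, Run], now: datetime) -> List[str]:
    """Alert on expected jobs that never started or are taking too long."""
    alerts: List[str] = []
    for exp in expectations:
        run = runs.get(exp.job)
        if run is None:
            if now > exp.start_by:  # the silence: nothing ran when something should have
                alerts.append(f"{exp.job}: expected to start by {exp.start_by:%H:%M}, no run found")
            continue
        elapsed = (run.finished or now) - run.started
        if elapsed > exp.max_runtime:
            alerts.append(f"{exp.job}: ran {elapsed}, longer than the expected {exp.max_runtime}")
    return alerts


day = datetime(2012, 6, 1)
expectations = [
    Expectation("nightly_load", day.replace(hour=2), timedelta(hours=1)),
    Expectation("report_build", day.replace(hour=3), timedelta(hours=2)),
    Expectation("morning_sample", day.replace(hour=7), timedelta(hours=1)),
]
runs = {"nightly_load": Run(started=day.replace(hour=1))}  # started, never finished
for alert in listen_for_silence(expectations, runs, now=day.replace(hour=6)):
    print(alert)
```

At 06:00 this flags the load that has been running for five hours and the report build that never started, but not the sampler, whose start window has not opened yet.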
Do this when you’re small!
No, really. Don’t wait. Do it now.
Consistency is essential
You must trust your platform. You have to know that everything is working.
Your customers must trust your platform. They’ll try to work around you if you can’t provide stability.
Use version control. Manually editing configs will only take you so far; it breaks down quickly. It’s not about blame, it’s about consistency.
Choose a methodology and implement it
Parallel execution/distribution: we’ve managed to strong-arm our way using SVN and parallel-ssh. Our arms are tired.
Configuration management: configuration management tools mean you change it once and it goes everywhere.
That means the difference between a date night and a date with your computer.
Upgrade early and often
Test, test, and re-test!
The logistics of upgrading can be tough, but it’s worth it.
Get involved with the community
HBase is constantly evolving.
Your feature request might help someone else, too.
hbase-user and hbase-dev are very active mailing lists.
The HBase developers don’t bite (hard).
Case studies and documentation are always welcome.
It takes a village…
… to raise an HBase
Inter-team communication is essential
We’re all part of the effort:
Administrators
Engineers
Developers
Managers
End users