When your clients need only a small database for a personal music library and some kind of HTTP interface to it, everything looks nice: you can pick from plenty of mature frameworks and trusted approaches for your application.
But what changes when you step beyond existing solutions to tackle something like population health management?
Let's talk about our Big Data experience and meaningful framework usage:
What makes the difference when you go Big Data and Hadoop.
Frameworks and big data: hamsters vs hipsters.
Reality matters. Frameworks cost. How much?
What framework is good for you?
Making your own frameworks.
frameworksdays.com
CAN CHIMPS
DO BIG DATA?
A real book with a shocking title,
available for pre-order. This is
exactly what is happening now in
the Big Data industry.
Roses are red.
Violets are blue.
We do Hadoop
What about YOU?
FRAMEWORK
An essential supporting structure of a building, vehicle, or object.
In computer programming, a software framework is an abstraction in which software providing generic functionality can be selectively changed by additional user-written code, thus providing application-specific software.
FRAMEWORKS
DICTATE APPROACH
Frameworks exist to reduce the amount of work through reuse. The more you can reuse, the better. But complex frameworks are too massive to be flexible: they limit your solutions.
Doing Big Data, you usually build a unique solution.
3 SIMPLE HADOOP PRINCIPLES
● OPEN SOURCE framework for big data: both distributed storage and processing.
● RELIABILITY and fault tolerance by SOFTWARE design. Example: the file system uses replication factor 3 as the default.
● Horizontal scalability of the INFRASTRUCTURE, from a single computer up to thousands of nodes.
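A back-of-the-envelope sketch of what the default replication factor of 3 means in practice (plain Python; the 300 TB figure is just an illustration): reliability is bought in software, and paid for in raw capacity.

```python
# HDFS-style replication: every block is stored on N different
# nodes (default N = 3), trading raw capacity for the ability
# to survive node failures purely in software.
def usable_capacity(raw_tb, replication=3):
    """Usable space given raw cluster capacity and the replication factor."""
    return raw_tb / replication

def tolerated_failures(replication=3):
    """Node failures a fully replicated block can survive."""
    return replication - 1

print(usable_capacity(300))   # 300 TB raw -> 100.0 TB usable
print(tolerated_failures())   # each block survives 2 node losses
```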
How everyone (who usually sells something) depicts Hadoop complexity:
[Diagram: a GREAT BIG INFRASTRUCTURE around a SMALL CUTE CORE, with YOUR APPLICATION sitting SAFE and FRIENDLY on top.]
How it looks from the real user's point of view:
[Diagram: the HADOOP CORE buried in a COMPLETELY UNKNOWN INFRASTRUCTURE; YOUR APPLICATION is the only thing YOU UNDERSTAND, surrounded by FEAR OF the rest and a feeling that something is wrong.]
WHAT BRICKS SHOULD WE TAKE
TO BUILD BIG DATA SOLUTION?
● We should build
unique solutions
using the same
approaches.
● So bricks are to
be flexible.
WHAT BRICKS SHOULD WE TAKE
TO BUILD BIG DATA SOLUTION?
● We should build
robust solution with
high reliability.
● Bricks are to be
simple and
replaceable.
WHAT BRICKS SHOULD WE TAKE
TO BUILD BIG DATA SOLUTION?
● We should be able
to change our
solution over
time.
● Bricks are to be
small.
WHAT BRICKS SHOULD WE TAKE
TO BUILD BIG DATA SOLUTION?
● As flexible as
possible.
● Focused on a specific
aspect, without
requiring a large
infrastructure.
● Simple and
interchangeable.
HADOOP 2.x CORE AS A FRAMEWORK
BASIC BLOCKS
● ZooKeeper as the coordination service.
● HDFS as the file system layer.
● YARN as resource management.
● MapReduce as the basic distributed processing option.
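A minimal, single-process sketch of the MapReduce programming model listed above (pure Python, no Hadoop involved): map emits key/value pairs, the shuffle groups them by key, and reduce aggregates each group.

```python
from collections import defaultdict

# Word count in the MapReduce style: the map phase emits
# (word, 1) pairs, the shuffle groups them by key, and the
# reduce phase sums each group. Real Hadoop distributes
# these same phases across the cluster.
def map_phase(line):
    for word in line.split():
        yield word, 1

def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    return key, sum(values)

lines = ["big data big frameworks", "big data"]
pairs = (p for line in lines for p in map_phase(line))
result = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(result)  # {'big': 3, 'data': 2, 'frameworks': 1}
```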
Hadoop: don't do it yourself
REUSE AS IS
● The BASIC infrastructure is quite reusable to build on, at least until you know it really well.
● Do you have the manpower to re-implement it? You'd better contribute in that case.
WHAT DO WE USUALLY
EXPECT FROM A NEW
FRAMEWORK?
● FASTER: frameworks provide a higher layer of abstraction, so coding goes faster.
● CHEAPER: some part of the work is already done.
● BETTER: top framework contributors are usually top engineers.
OOOPS...
● FASTER: a higher layer of abstraction speeds up coding, but learning the new approach takes additional time.
● CHEAPER: some part of the work is already done, but maintaining the new framework adds cost.
● BETTER: top framework contributors are usually top engineers, but you get lots of defects due to your own lack of experience with the new framework.
The same picture again, with the verdict stamped on top: of BETTER, CHEAPER, and FASTER you get ONLY TWO? Sometimes a promised benefit is simply NONEXISTENT.
JUST A FEW EXAMPLES
● Spring Batch: the main thread that started the Spring context forgot to check the task completion status.
● Apache Spark: persistence to disk was limited to 2 GB due to the ByteBuffer int limitation.
● Apache HBase still has no effective guard against client RPC timeouts.
● What about binary data like hashes? Still no effective out-of-the-box support.
ONLY
REAL
EXPERIENCE
NEW FRAMEWORKS ARE
ALWAYS A HEADACHE
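The Spark limit mentioned above follows directly from Java's signed 32-bit int that indexes a ByteBuffer; the arithmetic is easy to check (plain Python):

```python
# A Java ByteBuffer is indexed by a signed 32-bit int, so one
# buffer can address at most 2**31 - 1 bytes: just short of
# 2 GiB. Hence the 2 GB cap on a single persisted block in
# older Spark versions.
INT_MAX = 2**31 - 1
limit_gib = INT_MAX / 2**30

print(INT_MAX)              # 2147483647
print(round(limit_gib, 3))  # 2.0
```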
SO BIG DATA TECHNOLOGY
BOOKS ARE ALWAYS OUTDATED
Great books, but by the time they are printed they are already old. Read the original e-books with updates.
FRAMEWORKS IN BIG DATA
HAMSTERS vs HIPSTERS
Hive over HBase: the simplest way to access your HBase data for analytics, but with significant overhead even compared to MapReduce access.
Apache HBase is the top OLTP solution for Hadoop; Hive can provide an SQL connector to it.
Use HBase direct RPC for OLTP, MapReduce or Spark when you need performance, and Hive when you need a faster implementation.
Crazy idea: Hive running over HBase table snapshots.
ETL: FRAMEWORKS COST
● We do object transformations when we do ETL from SQL to NoSQL objects.
● Practically any ORM framework eats at least 10% of the CPU resource.
● Is that a small or a big amount? It depends on who pays...
[Diagram: an SQL server JOINs Table1..Table4 and feeds parallel ETL streams into BIG DATA shards.]
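A minimal sketch of the object transformation meant above, written without any ORM: a hand-coded mapper from a joined SQL row to a NoSQL-style document. All field names here are hypothetical, not from the original.

```python
# Hand-written row -> document mapper: no ORM, no reflection,
# just a direct transformation of a joined SQL row (a tuple)
# into a NoSQL-style document (a dict). Field names are
# illustrative only.
def row_to_document(row):
    patient_id, name, visit_date, diagnosis = row
    return {
        "id": patient_id,
        "name": name,
        "visits": [{"date": visit_date, "diagnosis": diagnosis}],
    }

doc = row_to_document((42, "John Doe", "2015-03-01", "flu"))
print(doc["id"], doc["visits"][0]["diagnosis"])  # 42 flu
```

The point is not that ORMs are bad; it is that in an ETL stream executed billions of times, even this trivial mapping is exactly where a 10% CPU tax gets charged.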
10% overhead...
● Single desktop application: computers usually have unused CPU power, so a 10% overhead is not very noticeable and the user accepts it.
● The user pays for the electricity and hardware.
● Lots of mobile clients: they can tolerate a 10% performance degradation, and the application still works.
● But all of your users pay for your 10% performance overhead.
10% overhead...
● Single-server solution: OK, usually you have 10% spare capacity.
● So you pay for the overhead but don't notice it until the capacity is needed. You still have the same 1 server.
10% overhead...
● A 10% overhead across 1000 servers with a properly distributed job means up to 100 additional servers are needed.
● That is a direct maintenance cost.
10% overhead...
IN CLUSTERS YOU DIRECTLY PAY
FOR OVERHEAD WITH ADDITIONAL
CLUSTER NODES.
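The arithmetic behind the claim above, using integer math to keep the ceiling exact (plain Python; the 1000-node figure is the slide's own example):

```python
# Extra nodes needed to win back a fixed relative overhead:
# the CPU burned on every node must be bought back as whole
# additional machines (ceiling division on integers).
def extra_nodes(cluster_size, overhead_percent):
    return (cluster_size * overhead_percent + 99) // 100

print(extra_nodes(1, 10))     # 1: even one server needs headroom
print(extra_nodes(1000, 10))  # 100: a hundred extra servers to buy and maintain
```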
MAKING YOUR OWN
FRAMEWORK
● The most common reason for your own framework is … growing complexity and support cost.
● Developing a new framework and migrating to it can be cheaper than supporting the existing solutions.
● You don't want to depend on an existing framework's development.
MAKING FRAMEWORK
LAZY STYLE
● First build multiple solutions, then integrate them into a single approach.
● GOOD: You only integrate what is already in use, so there is less wasted work.
● BAD: You act reactively.
MAKING FRAMEWORK
PROACTIVE STYLE
● You improve the framework before the actual need arises.
● GOOD: You are guided by an approach, not by need, so you usually get a cleaner design.
● BAD: You are more likely to build things that turn out not to be needed.
OUTSIDE YOUR TEAM
● Great, you have an additional workforce. But from now on you also have external support tickets.
● Usually you can still influence your users, so major changes remain possible, just harder.
● Pay more attention to documentation and training for the other teams. It pays off.
OUTSIDE YOUR COMPANY
● You receive an additional workforce: people start contributing to your framework. But don't be too optimistic.
● Community support is good, but you need to support community applications.
● You are no longer flexible: you don't control the users of your framework.
LESSONS LEARNED: CORE
● Avoid inventing a unique approach for every Big Data solution. It is critical to have a good, relatively stable foundation.
● Your Big Data CORE architecture should be a layered infrastructure built from small, simple, unified, replaceable components (the UNIX way).
● Be ready for packaging issues, but try to reuse as much as possible at the CORE layer.
LESSONS LEARNED: BEYOND THE CORE
● When selecting frameworks to extend your big data core, prefer solutions with a stable approach, flexible functionality, and a healthy community. Revise your choices, as the world changes fast.
● Prefer contributing to a good existing solution over starting your own.
● The more frequently you change something, the higher-level the tool you need for it. But in big data you pay directly for any performance overhead.
● If you have started your own framework, the more popular it gets, the less freedom you have to modify it, so flexibility alone is a bad reason to start one.