An introduction to what big data is, what it can and cannot do, the importance of data science, and how to architect big data solutions (lambda architecture)
3. “Big data is data that exceeds the processing
capacity of conventional database systems.
The data is too big, moves too fast, or doesn’t fit
the strictures of your database architectures.”
–Edd Dumbill, O’Reilly
What is Big Data?
http://radar.oreilly.com/2012/01/what-is-big-data.html
4. The 3 V’s of Big Data
• Volume
• Velocity
• Variety
• (Veracity)
10. New tools and technologies to store and
process all data on a cluster of commodity
hardware so that the system acts as one, is
resilient and scales linearly.
What is Big Data? — revisited
11. So what?
The data lake is a large data pool
in which the schema and data requirements are not defined
until the data is queried, processed, analysed
or delivered as information to the end user.
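The "schema is not defined until the data is queried" idea is often called schema-on-read. A minimal sketch of it follows, assuming raw events are stored as JSON lines; the event fields and the function names `read_purchases` and `total_spend` are hypothetical and only serve the illustration.

```python
import json

# Raw events land in the "lake" exactly as produced; nothing is validated on write.
# The records below are made-up illustration data.
RAW_EVENTS = [
    '{"user": "alice", "amount": "12.50", "ts": "2014-10-31"}',
    '{"user": "bob", "amount": 7, "channel": "web"}',
    '{"device": "sensor-1", "temp_c": 21.4}',
]


def read_purchases(raw_lines):
    """Apply a (hypothetical) purchase schema only at read time."""
    for line in raw_lines:
        record = json.loads(line)
        # Records that do not fit this query's schema are skipped, not rejected on write.
        if "user" in record and "amount" in record:
            yield {"user": str(record["user"]), "amount": float(record["amount"])}


def total_spend(raw_lines):
    """A query that decides what the data means only when it runs."""
    return sum(purchase["amount"] for purchase in read_purchases(raw_lines))


print(total_spend(RAW_EVENTS))
```

A different query could read the same raw lines with a different schema (for example, sensor readings) without any change to how the data was stored.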
12. “We don’t do Hadoop because we have Big
Data; we do Big Data because we have
Hadoop.”
So what?
–Unknown developer, Facebook
13. “In the years ahead, the same power that big
data awards enterprise companies will be the
norm for small business.”
So what?
–Matt Ehrlichman
http://blogs.wsj.com/accelerators/2014/10/31/matt-ehrlichman-big-data-for-small-firms/
14. What does Big Data enable?
• Combine data from within and without your
organisation
• Build new products and services
• Analyse all data (e.g. 5 TB of historic event data at rest in an Oracle database)
15. Big Data is no panacea
• First decide what problem you want to solve; pick a
real business problem to add immediate value
• Start small: the technology is made for linear
scalability (a 3-node cluster is a cluster!)
• Then become lean: learn through experimentation
16. Big Data challenges
• Beware of hype, Big Data-washing and fads
• Tech infancy
• IT vs. business
• Data is hard
• Lack of skills!
Shameless self-plug: BigBoards!
17. Big Data opportunity
• Big Data is here to stay
• The vendor market is HUGE and will grow massively as
Big Data blends into the datacenter
• However, the practitioner’s market can deliver
EXPONENTIALLY more value
18. “Data doesn't create meaning. We do.”
https://www.ted.com/talks/susan_etlinger_what_do_we_do_with_all_this_big_data
Data Science FTW
–Susan Etlinger
20. Ideal Data System
• It just runs your queries on the complete dataset,
yielding results instantaneously
• It interacts with the CAP theorem once the data is partitioned
• There is just immutable data and functions on
that data
• Distinguishing features:
    • a constantly growing, append-only dataset
    • only immutable facts are added
    • queries are recomputed from the master data
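The properties above (immutable facts, an append-only master dataset, queries recomputed as functions over all data) can be sketched roughly as follows. This is an illustration under stated assumptions, not a lambda architecture implementation: `PageView`, `MasterDataset` and `views_per_url` are hypothetical names, and an in-memory list stands in for distributed storage.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)  # frozen: a fact cannot be modified once created
class PageView:
    user: str
    url: str
    timestamp: int


class MasterDataset:
    """Append-only store: facts can be added, never updated or deleted."""

    def __init__(self) -> None:
        self._facts: List[PageView] = []

    def append(self, fact: PageView) -> None:
        self._facts.append(fact)

    def facts(self) -> Tuple[PageView, ...]:
        # Hand out an immutable snapshot so callers cannot alter the store.
        return tuple(self._facts)


def views_per_url(facts: Tuple[PageView, ...]) -> Dict[str, int]:
    """A query is a pure function of the complete dataset."""
    counts: Dict[str, int] = {}
    for fact in facts:
        counts[fact.url] = counts.get(fact.url, 0) + 1
    return counts


dataset = MasterDataset()
dataset.append(PageView("alice", "/home", 1))
dataset.append(PageView("bob", "/home", 2))
dataset.append(PageView("alice", "/pricing", 3))

# Recompute the query from scratch over all facts.
print(views_per_url(dataset.facts()))  # {'/home': 2, '/pricing': 1}
```

Because the query never mutates its input, a wrong result can be fixed by correcting the function and recomputing over the same master data.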