6. Challenge
Growth of data generated per year: 40%
Growth of IT spending per year: 5%
Source: McKinsey
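The gap compounds every year. Here is a minimal sketch of that, assuming as a simplification that the 40% and 5% annual rates stay constant and compound yearly. The function names are illustrative only.

```python
def growth_gap(data_growth: float, budget_growth: float, years: int) -> float:
    """How many times faster data has grown than the IT budget after `years`,
    assuming both compound annually at constant rates."""
    return ((1 + data_growth) / (1 + budget_growth)) ** years


def years_until_gap(data_growth: float, budget_growth: float, factor: float) -> int:
    """Smallest whole number of years after which data has outgrown the budget by `factor`."""
    if data_growth <= budget_growth:
        raise ValueError("the gap never widens if data grows no faster than the budget")
    years = 0
    while growth_gap(data_growth, budget_growth, years) < factor:
        years += 1
    return years


if __name__ == "__main__":
    # Rates from the slide: data +40%/year, IT spending +5%/year.
    for y in (1, 5, 10):
        print(f"after {y:2d} years: data/budget ratio = {growth_gap(0.40, 0.05, y):.2f}")
    print("years until a 10x gap:", years_until_gap(0.40, 0.05, 10.0))
```

With these rates the yearly ratio is 1.40 / 1.05 = 4/3, so the gap after n years is (4/3)^n. That is about 4.2x after five years, and it passes 10x in year nine.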
7. Maybe Big Data is...
• When any of volume, velocity, variety, value
(cost?) becomes a problem
• When new use cases emerge and new things
become possible because of new data
sources
8. For example
• US cell updates: 600B/day
• Items shared on social media: 4B/day
• Smart meter readings (2015): 29B/day
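Daily totals are hard to reason about when sizing a system. Dividing by the 86,400 seconds in a day gives the average ingest rate a platform would need to sustain. This is a minimal sketch that assumes arrivals are spread evenly across the day. Real traffic is bursty, so peak rates will be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Daily volumes as quoted on the slide (B = billion).
daily_volumes = {
    "US cell updates": 600e9,
    "Items shared on social media": 4e9,
    "Smart meter readings (2015)": 29e9,
}


def per_second(per_day: float) -> float:
    """Average events per second, assuming arrivals are spread evenly over the day."""
    return per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    for name, volume in daily_volumes.items():
        print(f"{name}: ~{per_second(volume):,.0f} events/s on average")
```

For example, 600B cell updates a day averages out to roughly 6.9 million writes per second, before allowing for peaks.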
9. Agenda
• What’s Big Data?
• Why now?
• Why care?
• Key technologies
• How can I get started?
15. Why care?
“Companies that can harness big data will
trample data incompetents”
The Economist, May 26th 2011
16. Why care - take 2
• The competition will do it (and you’ll get
fired)
• Competitive advantage to be gained by
doing it well (you get promoted)
• It’s not hard to get started (no need for huge
investment)
17. What are we looking for?
• Data / Information
• Insights
• Actionable intelligence
20. Big = Slow?
[Figure: throughput (records/ms) falls as datasets get larger; x-axis: records (in millions), 0 to 100]
Source: Gerard Maas, http://www.gerardmaas.net/2011/06/bigdata-on-rdbms
22. Hadoop
• Great for unstructured data or arbitrary
queries
• MapReduce framework for distributed
compute
• Tools now making it accessible
• Still essentially a batch processing system
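You can illustrate the MapReduce model without a cluster. The sketch below is a single-process word count in plain Python. It is not Hadoop's API. In Hadoop, the map and reduce functions run as parallel tasks across many machines, and the framework performs the shuffle between them over the network. The function names here are illustrative.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Mapper: emit (word, 1) for every word in one input record.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle: group together all values emitted for the same key.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reducer: combine all values for one key into a single result.
    return key, sum(values)


def map_reduce(lines: Iterable[str]) -> dict[str, int]:
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(map_reduce(["big data is big", "data is data"]))
    # {'big': 2, 'data': 3, 'is': 2}
```

Because each mapper sees only one record and each reducer sees only one key, both phases can be spread across machines. That independence is also why the model suits batch jobs over a whole dataset.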
24. Use cases
• Tracking trending topics on social media
• Network and infrastructure monitoring
• Web and ad analytics dashboards and platforms
• Real-time A/B testing
• User profiling
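To illustrate the first use case, here is a minimal sliding-window counter for trending topics. The `TrendingTopics` class and its methods are hypothetical. It keeps every event in memory and assumes events arrive in timestamp order, so it is a sketch of the idea rather than a production design.

```python
from collections import Counter, deque


class TrendingTopics:
    """Exact counts of topic mentions within a sliding time window (hypothetical helper)."""

    def __init__(self, window_seconds: float) -> None:
        self.window = window_seconds
        self.events: deque = deque()  # (timestamp, topic) pairs, oldest first
        self.counts: Counter = Counter()

    def _expire(self, now: float) -> None:
        # Evict events at or before `now - window` and decrement their topic counts.
        while self.events and self.events[0][0] <= now - self.window:
            _, topic = self.events.popleft()
            self.counts[topic] -= 1
            if self.counts[topic] == 0:
                del self.counts[topic]

    def add(self, topic: str, timestamp: float) -> None:
        # Assumes timestamps arrive in non-decreasing order.
        if self.events and timestamp < self.events[-1][0]:
            raise ValueError("events must arrive in timestamp order")
        self._expire(timestamp)
        self.events.append((timestamp, topic))
        self.counts[topic] += 1

    def top(self, n: int, now: float) -> list:
        """The n most-mentioned topics in the window ending at `now`."""
        self._expire(now)
        return self.counts.most_common(n)


if __name__ == "__main__":
    trends = TrendingTopics(window_seconds=60)
    for ts, topic in [(0, "#bigdata"), (10, "#nosql"), (30, "#bigdata")]:
        trends.add(topic, ts)
    print(trends.top(1, now=30))  # [('#bigdata', 2)]
```

The counts are updated on every write, so reading the current top topics never rescans history. That is the property the real-time use cases on this slide need.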
31. Data sources
• Web, SCM, Retail
• Location Services
• Infra Monitoring
• Smart Metering
• Oil/Gas Sensors
• Ad Marketplaces
• Fraud Detection
• Social Media
32. Capabilities
• Open source, supported, or “packaged”
solution?
• How do “commodity” servers fit your
infrastructure?
• Don’t rule out Cloud deployments to get
quick answers
33. Acunu
Discover the Potential of Real Time Big Data with Acunu Activate
Every CIO, Architect and Analyst knows of existing data with huge untapped potential within their organisation. Evolving Big Data technologies provide new paths to revenue with both customers and prospects.
Acunu partners with you to deliver competitive advantage by capturing data and exploring its benefits. You’ll validate the value of Big Data by building real applications and dashboards to drive new value for your business.
“Key business and technology trends are disrupting the traditional data management and processing landscape. Data analysis is increasingly being viewed as a competitive advantage. An increasingly sensor-enabled and instrumented business environment is generating huge volumes of data… Traditional IT infrastructure is simply not able to meet the demands of this new situation.” -Gartner
At the outset, we work with you to identify and develop use cases and areas where Big Data tools could be utilised to add significant business value. We work with you to recommend solution architectures for your specific use cases. We then deploy Acunu Reflex in your own data center or in the cloud, and can include Apache Hadoop for investigative work and Acunu Analytics for real-time decision support.
Once the software is installed, we work with you to integrate, capture and store sources of data from inside your organisation. We provide hands-on assistance to help you showcase the business value of your data through live proof-of-concept applications. You’ll get results quickly, with successive iterations delivering ongoing value.
As a result, you gain an understanding of Big Data’s transformative capability through working demonstrations and have a clear route to deliver that competitive advantage to your business.
A Comprehensive Big Data Discovery Package
• Workshops: An Acunu Specialist delivers workshops and provides on-demand consulting to enable your development team to build Big Data applications.
• Structure & Planning: A dedicated Project Lead will keep the project on track through kickoff, reviews and regular calls. Progressively, we’ll help you plan next steps.
• Ecosystem of Expertise: Acunu’s Big Data expertise is complemented through our partners. Together we will build your own Big Data ecosystem.
• Deployment: We deploy Acunu’s database and storage software, complete with management tools, to your own hardware or to Amazon’s public cloud.
• Data Source Integration: We work with you to integrate sources of log, clickstream, sensor, monitoring or similar data into Acunu Reflex.
• Support & Training: We deliver hands-on training on the Acunu Reflex infrastructure to your operations staff, and provide support throughout the project.
Acunu Reflex
Makes Big Data results easy, economic and fast
• Zero to Big Data Hero: Build a Big Data database cluster on commodity hardware in hours, not days.
• Save Money versus Open Source Alternatives: Save up to 60% on hardware and operation costs.
• Database lag getting you down? Milliseconds turning into minutes?
What is Acunu Reflex?
Easy
Acunu provides an integrated suite of technologies to support rapid development and deployment of your Big Data applications. Getting started is easy with a single, fast installation, handling all the details usually associated with OS tuning, storage optimization, database integration and management. This alleviates the complexity of NoSQL development, deployment and support. The platform is flexible and scalable, providing simple, one-click deployment. Scale linearly with ease and deploy across numerous machines within a data center or across a globally distributed public or private cloud.
Economic
Acunu’s subscription-based pricing model ensures continuous value, skipping charges for non-production deployments, so you can defer technology expenses until your application goes into production. Acunu provides the NoSQL domain expertise you need, reducing your technology deployment costs without compromising your data security. The platform is architected to store significantly more data per node than competing technologies, with a focus on reducing both your initial hardware and operational costs over time. Acunu’s support for commodity hardware and large-capacity disks further reduces your costs.
Fast
Acunu provides a suite of products focused on bringing you the performance your Big Data applications demand. Whether it’s a globally distributed database, millions to billions of records, tremendous amounts of machine-generated data or managing millions of active users, Acunu provides you with real-time results. Acunu has the professional services and support to get your applications up and running in the shortest possible time. Acunu leverages best-in-class open source solutions, adding management and performance technology to accelerate your Big Data results.
34. www.acunu.com @acunu
Apache, Apache Cassandra, Cassandra, Hadoop, and the eye and
elephant logos are trademarks of the Apache Software Foundation.
Speaker notes
Analyst firms appear to be fighting with each other to come up with the most Vs to define Big Data.
“Big” seems to imply the size of the dataset, but there’s more to it than this.

There’s nothing new about huge datasets: plenty of people have been working with big datasets for years:
• Seismic survey datasets in the PB range, on very high-end supercomputing hardware
• Weather: ECMWF has supercomputers with more than a PB of disk storage
• HEP: 15 PB/year for the CERN HB project
The rest of us can’t afford supercomputers.

But we all have a challenge with datasets that grow faster than IT budgets (these numbers are from McKinsey and are probably optimistic with respect to IT budgets).

So maybe Big Data is really when we have one of these two things... and we haven’t already solved the problem. Perhaps there’s a silent [New] in “Big Data” :)

Here are some new datasets that typify both the challenge and the opportunity of big data.
So we’ve explored what Big Data might be. Let’s move on to look at why the Big Data hype is happening now.

Exponential drop in price: 1 GB cost around $200K in 1980. Today, I can buy SATA disks at 4p/GB.

Basic economics: reduce the price and demand goes up.

With huge reductions in cost and waves of commoditisation, the scene is set for repeated disruptive innovation, not just in storage technology itself but in the products and services that rely on it.
What’s big data about? We’re looking to get insight from data. Data trumps intuition and common sense every time (funny anecdotal examples). More data means better decisions, based on fact, not folklore.
• Odd-coloured cars are more reliable.
• Vegetarians are less likely to miss flights.
• Computing hardware doesn’t fail at high temperatures, as was thought, but changing temperatures kill it.
• A person who’s just viewed a particular web page is more likely to buy product X.
RDBMS: Ted Codd, IBM, 1970. System R, DB2, Oracle...
By the late 1980s, it was the standard. Usurpers (e.g. object-oriented databases) failed to gain significant market share; hierarchical (IMS, developed for Apollo) and network (CODASYL) databases pretty much disappeared.
However, while RDBMSs have become the default choice, they aren’t necessarily the best fit for some Big Data use cases. Some problems:

Problem 1: Performance. Dealing with time-series data is a common Big Data use case. We’re not looking to do complex transactions, but we need to store the data so we can access it for analytics, etc. Relational databases do not handle this well.
I’ve been investigating the performance of our “big vendor” RDBMS to hold months of sensor data. So far, the results are not really encouraging. I observe an exponential performance drop on the single-index (PK) table holding the data as more records are added. Here’s a plot of insert performance, measured per 5K records, as records are continuously added to the table. Records are added by 5 parallel client threads, each inserting 1K records in batch mode. The client is an optimized Java app, using raw JDBC for the batch inserts. I haven’t found a faster way than that to add records to the relational DB.
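The measurement method described above can be sketched as follows. This is a stand-in built on several assumptions. It uses Python's built-in `sqlite3` module instead of the "big vendor" RDBMS and JDBC, a single client thread instead of five, and a hypothetical `(sensor_id, ts, value)` schema. It shows how to record throughput per batch as the table grows. It does not reproduce the plotted results, which depend heavily on the engine, indexes and hardware.

```python
import sqlite3
import time


def measure_insert_throughput(total_rows: int, batch_size: int = 5_000) -> list[tuple[int, float]]:
    """Insert `total_rows` synthetic sensor readings in batches and time each batch.

    Returns one (rows_in_table_after_batch, records_per_ms) pair per batch.
    """
    conn = sqlite3.connect(":memory:")  # swap in a file path to include disk I/O
    # The primary key is the table's only index, mirroring the single-index (PK) table in the notes.
    conn.execute(
        "CREATE TABLE readings ("
        " sensor_id INTEGER, ts INTEGER, value REAL,"
        " PRIMARY KEY (sensor_id, ts))"
    )
    results: list[tuple[int, float]] = []
    inserted = 0
    while inserted < total_rows:
        n = min(batch_size, total_rows - inserted)
        # Hypothetical readings spread over 1,000 sensors; ts is unique, so keys never collide.
        batch = [((inserted + i) % 1000, inserted + i, float(i)) for i in range(n)]
        start = time.perf_counter()
        with conn:  # commit each batch as one transaction
            conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", batch)
        elapsed_ms = (time.perf_counter() - start) * 1000
        inserted += n
        results.append((inserted, n / max(elapsed_ms, 1e-6)))
    conn.close()
    return results


if __name__ == "__main__":
    for rows, rate in measure_insert_throughput(200_000)[::10]:
        print(f"{rows:>8,} rows: {rate:,.0f} records/ms")
```

To look for the trend shown in the plot, run it with a much larger `total_rows` and compare early and late batches.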
Problem 2: Increasingly, we’d like to scale out rather than scale up. Why? (i) incremental capacity (and cost) management; (ii) availability; (iii) distribution; (iv) potential cloud deployment (onto relatively small machines).
Relational DBs tend to push towards a single big machine, or a tightly coupled cluster with expensive hardware such as InfiniBand or SAN storage.
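Incremental capacity depends on being able to add a machine without reshuffling all the data. One common way to get that property is consistent hashing, the scheme behind the token ring in Dynamo-style stores such as Apache Cassandra. The notes do not prescribe it, so treat this as an illustrative assumption. The `HashRing` class and its `vnodes` parameter are hypothetical, and BLAKE2b is used only as a convenient stable hash.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Stable 64-bit position on the ring (BLAKE2b used only as a convenient stable hash)."""
    return int.from_bytes(hashlib.blake2b(key.encode(), digest_size=8).digest(), "big")


class HashRing:
    """Consistent-hash ring with virtual nodes (hypothetical helper for illustration)."""

    def __init__(self, nodes: list[str], vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._ring: list[tuple[int, str]] = []  # sorted (position, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each physical node owns `vnodes` points on the ring, which evens out the load.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (ring_position(f"{node}#{i}"), node))

    def node_for(self, key: str) -> str:
        # The owner is the first ring point at or after the key's position, wrapping around.
        if not self._ring:
            raise LookupError("ring has no nodes")
        idx = bisect.bisect_left(self._ring, (ring_position(key), "")) % len(self._ring)
        return self._ring[idx][1]


if __name__ == "__main__":
    keys = [f"sensor-{i}" for i in range(10_000)]
    ring = HashRing(["node1", "node2", "node3"])
    before = {k: ring.node_for(k) for k in keys}
    ring.add_node("node4")  # scale out by one machine
    moved = [k for k in keys if ring.node_for(k) != before[k]]
    print(f"{len(moved) / len(keys):.0%} of keys moved; all to node4:",
          all(ring.node_for(k) == "node4" for k in moved))
```

A new node claims only the ring segments just before its own points. So keys move only to the new node, and with N existing nodes roughly 1/(N+1) of them move. A naive `hash(key) % N` scheme would instead remap most keys.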
From Google via Yahoo. Not really a database, but it provides a distributed filesystem intended to store large files, with no schema.
Not trivial to set up, but the tools are getting better: you no longer need to write Java code to run queries. See Hive (HQL) for SQL-like access.
But it’s still batch, and plenty of the time you want real-time.

We’re looking to act on the insights that the data bring. If we don’t act, we’re just observing. But action is often time-critical because the world is changing (e.g. we’re monitoring an oil well, trading financial instruments, or trying to understand the behaviour of lots of people), and insights from yesterday’s data are historical documents: interesting, perhaps, but not a great guide to action.

Some concrete examples of things that the people we’re working with are interested in capturing.

Lots of databases.

Lots of different kinds of databases, with different goals in mind.
One way to view them is to see where they sit in CAP terms (Brewer’s CAP theorem).
Many different data models.
Picking the right solution (or combination) can deliver significant cost savings and increased capability, and allow granular growth over time.
Paradoxically, for “big data”, consider starting with something small.
Let's look at the other items...
How can we help?
• Acunu Activate: a focused package of work to help you discover big data opportunities and understand how to exploit them.
• Acunu Reflex: a fully supported distributed database to support real-time big data use cases.
• Acunu Analytics: currently in preview, launching soon; provides real-time results for queries that would normally be costly to compute.