Agile Analytics
1. AGILE ME
Science is Agile by Design
There is a thing called Gravity
Gravity warps space and time
Gravitational Waves are Real
Refinements of understanding take place in increments
3. AGILE ME
ATIF ABDUL RAHMAN
Agile Analytics
www.About.Me/AtifAbdulRahman
I was like her according to Pearson-R;
We were both outliers
19th March, 2016, Dubai, UAE
4. AGILE ME
Let’s address the elephant in the room
5. AGILE ME
BI is Bureaucratic
Let’s address the elephant in the room
• Data Warehouse Architectures are Fragmented by Design
• Vendor & Tools Lock-In created artificial constraints
• Manpower Outsourcing Industry boomed and thrived upon this bureaucracy
6. AGILE ME
Relay Race: Everybody is Waiting
7. AGILE ME
Our ability to process data has always been a step behind our ability to
generate it; essentially, data was always big. However, our
technologies eventually reached the shelf life of incremental improvement.
9. AGILE ME
The “Big” in Big Data is not about the data being big; it is about the big disruption
11. AGILE ME
Democratization of Data Infrastructure
Big Data technologies are removing barriers and constraints; they are an enabler rather than a disruption in themselves.
• Utility hardware can do more now
• Open source is leading the technology stack
• Significant reduction in dependency on IT
• Arguably started by Hadoop, but it is not the only player
• Resources freed up for Data Governance
15. AGILE ME
Data doesn’t reveal its secrets very easily!
Big data is like teenage sex:
everyone talks about it, nobody
really knows how to do it, everyone
thinks everyone else is doing it, so
everyone claims they are doing it...
16. AGILE ME
[Figure: “BigDataness” by data type: Obs, Transactional, Declarative (biggest in size; difficult and misleading)]
*In very specific environments, more data with simpler algorithms works better (a basic premise of the VC theorem in machine learning).
“What information consumes is rather obvious: it consumes the attention of its recipients. Hence, a wealth of information creates a poverty of attention.” (Herbert Simon)
The signal-to-noise ratio decreases with more data, making it harder to find true signals in general.
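One way to see the signal-to-noise point is a small simulation, sketched below under stated assumptions. Every column here is pure random noise, so any correlation with the target is spurious by construction. The function names, the row count and the feature counts are illustrative choices, not something taken from the slides, and the sketch only covers one aspect of the claim: looking at more variables, rather than collecting more rows.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def strongest_spurious_correlation(n_rows=50, feature_counts=(5, 50, 500), seed=0):
    """For growing sets of noise features, report the largest |r| against a noise target."""
    rng = random.Random(seed)
    # Hypothetical data: the target and every feature are independent Gaussian noise.
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    features = [
        [rng.gauss(0, 1) for _ in range(n_rows)]
        for _ in range(max(feature_counts))
    ]
    abs_r = [abs(pearson_r(column, target)) for column in features]
    # Each larger feature set contains the smaller ones, so the maximum never shrinks.
    return {k: max(abs_r[:k]) for k in feature_counts}


if __name__ == "__main__":
    for k, r in strongest_spurious_correlation().items():
        print(f"{k:>4} noise features -> strongest |r| = {r:.2f}")
```

Because the larger feature sets include the smaller ones, the strongest spurious correlation can only stay the same or grow as more columns are added, which is the "more hay in the haystack" effect in miniature.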
17. AGILE ME
Why is it taking so long to predict the future?
Common complaints heard by data scientists
18. AGILE ME
Gartner Hype Cycle, 2014 and 2015: From Big Data to Analytics
Good News: Big Data hype has already peaked;
now everyone wants value from it (Analytics).
19. AGILE ME
Learning & Empirical Process Control
• CRISP-DM
• Analytics is inherently Agile
CRISP-DM is the most widely adopted knowledge discovery approach. It is
incremental in nature and focuses on feedback, improvement and
empirical learning, which makes it well aligned with the Agile Manifesto.
“Insights are Discovered, not Designed” (Russell Jurney)
20. AGILE ME
THE NON-DATA DRIVEN PATTERN OF DATA DRIVEN INITIATIVES
With little investment, a lot of value can be gained (similar to an MVP)
80/20 Rule
21. AGILE ME
Agile Manifesto for Analytics
• Individuals & Interactions: Analytics is a Team Sport
• Working Models: Models are Refined instead of Designed
• Customer Collaboration: User Stories Emerge
• Responding to Change: Models always have Expiry Dates
TDWI Survey 2013: 80% of practitioners reported improved success rates using Agile.
Andy Palmer, CEO, Tamr
22. AGILE ME
Rise of the Data Scientist: An Agile Creature
These unicorns are rare, but teams of data scientists are common
23. AGILE ME
Agile Apps vs Agile Analytics
[Diagram labels: Features, UX, User Stories, Value]
Differences between App and Analytics User Stories
Applications (Features Mostly):
• We need the top N recommendations with their ratings
Analytics (Insights Mostly):
• We need to find similar books
• We need to find books that the reader might purchase
Differences must be addressed:
24. AGILE ME
Adapting for Analytics
[Figure: 2×2 matrix of Requirements (Clear / Not Clear) against Right Data (Available / Not Available); labels include Refinement and Data Enrichment]
• Have as narrow a scope as possible;
• Contain explicitly quantitative clauses;
• Are ranked by relative value; and
• Are potentially answerable given the available data.
• User Stories emerge after the fact
• Data usefulness is discovered after the fact
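The criteria above (narrow scope, quantitative clauses, ranking by relative value, answerability with the available data) can be sketched as a small backlog filter. This is only an illustration: the `AnalyticsStory` class, the `prioritise` function, the field names and the example stories are all hypothetical and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass
class AnalyticsStory:
    question: str               # narrow, explicitly quantitative question
    relative_value: int         # higher means more valuable to stakeholders
    required_fields: frozenset  # data fields needed to answer the question


def prioritise(stories, available_fields):
    """Keep stories answerable with the available data, ranked by relative value."""
    answerable = [s for s in stories if s.required_fields <= available_fields]
    return sorted(answerable, key=lambda s: s.relative_value, reverse=True)


# Hypothetical backlog for a book recommendation product.
backlog = [
    AnalyticsStory("Which 10 books is a reader most likely to buy next month?", 8,
                   frozenset({"purchases", "ratings"})),
    AnalyticsStory("Does social activity lift recommendation accuracy by 5% or more?", 5,
                   frozenset({"purchases", "social_graph"})),
    AnalyticsStory("Which titles are similar to a given book?", 3,
                   frozenset({"ratings"})),
]

ranked = prioritise(backlog, available_fields=frozenset({"purchases", "ratings"}))
for story in ranked:
    print(story.relative_value, story.question)
```

In this sketch the social-network story drops out until the `social_graph` data becomes available, which mirrors the point that data usefulness is discovered after the fact and the backlog is revisited as data is enriched.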
25. AGILE ME
Getting back to Science:
Most analytics problems are not linear problems like those in most
application development. Analytics demands Agility on Steroids!
26. AGILE ME
Remember Galileo’s Sad Story?
27. AGILE ME
Data Virtualization
An enabler to put Agile on steroids and deliver awesome analytics projects
*TDWI
29. AGILE ME
Dear Agile Practitioners, Always Remember:
• Data Scientists are better at Statistics than most Programmers and better at Programming than most Statisticians.
• Choose your (Agile) Approach
• Provision the Agile Data Architecture
• Party Hard
However, to make sense of the whole thing, what remains relevant is the need and the capability to identify signals amid the noise.
Big data is such a hype because, as with humans, data obesity is a real problem: we are spitting out more noise than ever, at an exponential rate, making it harder and harder to identify the signals.
In fact, our established technology has peaked just trying to process this noise, let alone find signals. (Clay Shirky: information filters are not working; there are more cute cat videos and other junk on the internet.)
Hence the birth of a plethora of new technologies and techniques that allow us to get back to finding signals amid the noise.
So the fundamental question remains: while we keep adding more hay to the haystack, are we still able to find the needle?
It was 1960 when the term big data first appeared in scientific publications; even then, the volume, variety and velocity of data were bigger than our ability to compute them.
It was a hype back then too, and it pretty much fit the hype cycle.
Not just once but arguably at least three times. Well, Neo had 5 iterations!
The hype cycle of AI gave us expert systems as its plateau of productivity.
The hype cycle of DSS gave us BI as its plateau.
The hype cycle of big data may eventually lead to a plateau of data science.
Eventually, all plateaus are useful, but we need to manage expectations of what can be done really well now and what should be postponed to a later disruptive innovation.
Take a user story such as: use the user’s social network data to make recommendations. Data might eventually reveal that social network activity has no impact on the accuracy of recommendations. Books someone bought in the past might not be a valuable basis for recommendations either, as the person might be looking for books in other genres.
Perhaps all of these data are very powerful indicators of what to recommend, but churning through them to identify the 10 books to recommend might be very computation- and performance-intensive and take 20 seconds to arrive at the list. It is difficult, if not impossible, to know these characteristics at the outset. It is in this degree of uncertainty that analytics projects and application software projects differ. Added to this are requirements coming from a variety of stakeholders (marketing, publisher relationship manager, warehouse relationship manager, et al.) who might want to have a say in what the recommendation engine spews out.
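Since the cost of producing the list cannot be known at the outset, one option is to measure it early on a small slice of data. The sketch below is a minimal illustration of that idea, not a recommendation engine: the scores are made up, and `top_n`, `timed_top_n` and the `budget_seconds` parameter are hypothetical names introduced here.

```python
import heapq
import time


def top_n(scores, n=10):
    """Return the n highest-scoring (book, score) pairs, best first."""
    return heapq.nlargest(n, scores.items(), key=lambda item: item[1])


def timed_top_n(scores, n=10, budget_seconds=1.0):
    """Run top_n and report whether it finished within the latency budget."""
    start = time.perf_counter()
    result = top_n(scores, n)
    elapsed = time.perf_counter() - start
    return result, elapsed, elapsed <= budget_seconds


# Made-up scores; in practice they would come from whatever model the team is testing.
scores = {"book_a": 0.91, "book_b": 0.40, "book_c": 0.77, "book_d": 0.12}
recommendations, elapsed, within_budget = timed_top_n(scores, n=2)
print(recommendations, f"{elapsed:.6f}s", within_budget)
```

Running the same measurement as data sources are added (purchase history, social network activity) gives the team an early signal on whether a story is still feasible within its latency budget.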