Data Science Innovations:
Roadmap to Hadoop Ecosystem & Spark
Suresh.sood@uts.edu.au
linkedin.com/in/sureshsood
@soody
http://www.slideshare.net/ssood/datainnovation
February 4, 2015
Topic Areas for Discussion
1. Statistics/Data mining or Data Science?
2. What is big data and what is the challenge today?
3. Data types
4. Data Science workflows & discovery
5. Hadoop
6. Data Science innovation
7. New Sources of Information (Big data) -> Data Driven Innovations
8. Internet of Things
9. Data Science Innovations
10. Apache Spark
Statistics, Data Mining or Data Science?
• Statistics
– precise deterministic causal analysis over precisely collected data
• Data Mining
– deterministic causal analysis over re-purposed data, carefully sampled
• Data Science
– trending/correlation analysis over existing data using the bulk of the population, i.e. big data
Adapted from:
NIST Big Data taxonomy draft report (see http://bigdatawg.nist.gov/show_InputDoc.php)
What is Big Data?
Unknown relationships
Unstructured data
95% of data not collected
Social-Psychological-Local-Mobile-GPS-M2M
Beyond transactions, including interactions and observations
Big Data Challenge Today: Moving from Transactions Alone to Relationships and Empathy
Current State = Transactions $$$
We do this stuff well, e.g. collect payments …
Future State = Human Empathy (relationships)
We don't really do this, e.g. user-generated content, ratings, reviews, 1:1 dialogue, distress signals, geolocation
Data Types
• Astronomical
• Documents
• Earthquake
• Email
• Environmental sensors
• Fingerprints
• Health (personal) Images
• Graph data (social network)
• Location
• Marine
• Particle accelerator
• Satellite
• Scanned survey data
• Sound
• Text
• Transactions
• Video
Data Science Workflows & Discovery
Hadoop & Spark Explained
Hadoop Configurations (Single and Multi-Rack)
Adapted from: http://stackiq.com/
Cluster manager, e.g. Apache Ambari, Apache Mesos, or Rocks
An 18-data-node configuration with 3 TB drives represents 648 TB of raw storage. With the HDFS standard replication factor of 3, this gives 216 TB of usable storage.
Name/secondary/data nodes – 6 core, 96 GB
Management node – 4 core, 16 GB
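The storage figures on this slide can be checked with a few lines of Python. This is a minimal sketch: the drive size, node count and replication factor come from the slide, while the 12 drives per node is an assumption, inferred only because it makes the raw figure come out at 648 TB.

```python
# Capacity arithmetic for the 18-data-node configuration on the slide.
DRIVE_TB = 3             # from the slide
DATA_NODES = 18          # from the slide
DRIVES_PER_NODE = 12     # assumption: inferred so that raw storage equals 648 TB
REPLICATION_FACTOR = 3   # HDFS standard replication factor, from the slide


def raw_storage_tb(nodes, drives_per_node, drive_tb):
    """Total disk capacity across all data nodes, in TB."""
    return nodes * drives_per_node * drive_tb


def usable_storage_tb(raw_tb, replication):
    """Capacity left once every block is stored `replication` times."""
    return raw_tb / replication


raw_tb = raw_storage_tb(DATA_NODES, DRIVES_PER_NODE, DRIVE_TB)
usable_tb = usable_storage_tb(raw_tb, REPLICATION_FACTOR)
print(f"raw: {raw_tb} TB, usable: {usable_tb:.0f} TB")
```

The sketch ignores any space set aside for the operating system or intermediate job data, so a real cluster would likely offer somewhat less than the usable figure.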
Data Science Innovation
Data science innovation is something an
organization has not done before or even
something nobody anywhere has done before. A
data science innovation focuses on discovering
and using new or untraditional data sources to
solve new problems.
Adapted from:
Franks, B. (2012) Taming the Big Data Tidal Wave, p. 255, John Wiley & Sons
http://tacocopter.com/
New Sources of Information (Big data): Social Media + Internet of Things -> Data Science Innovations
Internet of Things (IOTs)
“trillion sensors”
Source: www.tsensorssummit.org
Data Science Innovations
ID | Analytics | Innovative info source | Innovation | Software/Platform
1 | Node-Link (NLA) | Multiple | Reduce suspect list from 18 m to 230/32 | New version, Spark GraphX
2 | ANZ Truckometer | NZ transport authority real-time traffic data | GDP forecast 6 months in advance | N/A
3 | Driving (Usage Based) | Black box (telematics), unstructured data | Pay as you drive policy, pay how you drive | Hadoop MapReduce
4a | Deception (veracity) | Found stories, online blogs | Flag fake stories: text, images and short video | MongoDB – Python dictionary
4b | Psychological State | Twitter and Instagram | Junk words | MongoDB – Python dictionary
4c | Thematic Apperception Technique | Mobile phone screen customisation | Automated informant testing | Sparkling Water (H2O/Spark), Deep Learning
5 | Brand | Brand stories "found" online | Brand user profile | R/Hadoop
6 | Supermarket shopper behavior | CCTV/beacon transmitters | "My store" product placement based on time of day, predictive shopping behaviour | MongoDB, Hadoop 2 Cluster, Spark GraphX, Spark MLlib
7 | Sandbag exercise | Sandbag sensors | Virtual trainer | Spark GraphX, Spark MLlib
8 | Oil reserves shipment monitoring | Skybox (Google) satellite images | Improved oil forecast | "Busboy" – C/Hadoop
9 | J score for mobile energy usage | Sparse incomplete data from community of mobile users | Energy bug mgmt. | Spark/Amazon Web
Suresh Sood 2015
1. Node Link Analytics
• 1990s: Ivan Milat killed 7 backpackers, making him Australia's most notorious serial killer
• Everyone in Australia was a suspect
• Large volumes of data from multiple sources
– RTA vehicle records
– Gym memberships
– Gun licensing records
– Internal police records
• Police applied node link analysis techniques (NetMap) to the data
• Harness power of the human mind
• Analyst can spot indirect links, patterns, structure, relationships and anomalies
• A bottom-up approach with a process of discovery to uncover structure
• Reduced the suspect list from 18 million to 230
• Further analysis with the use of additional satellite information reduced this to 32
Data Information Knowledge
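The idea of linking people to records across several sources can be sketched in a few lines. This is not the NetMap method and uses no case data: the record tuples, person IDs and the `min_sources` threshold are all hypothetical, and the filter is a simple stand-in for what an analyst would do visually on a node-link chart.

```python
from collections import defaultdict

# Hypothetical toy records as (source, person_id) pairs. Not real case data.
records = [
    ("vehicle_records", "P1"), ("vehicle_records", "P2"), ("vehicle_records", "P3"),
    ("gym_memberships", "P2"), ("gym_memberships", "P3"),
    ("gun_licences", "P3"), ("gun_licences", "P4"),
    ("police_records", "P3"), ("police_records", "P1"),
]


def build_links(records):
    """Map each person node to the set of source nodes it is linked to."""
    links = defaultdict(set)
    for source, person in records:
        links[person].add(source)
    return links


def shortlist(links, min_sources):
    """Keep people linked to at least `min_sources` distinct sources."""
    return sorted(p for p, sources in links.items() if len(sources) >= min_sources)


links = build_links(records)
print(shortlist(links, 3))
```

Raising `min_sources` shrinks the shortlist, which mirrors in miniature how combining sources narrows a very large suspect pool.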
2. ANZ Truckometer
The ANZ Heavy Traffic Index comprises flows of vehicles weighing more than 3.5 tonnes (primarily trucks) on 11 selected roads around NZ. It is contemporaneous with GDP growth.
The ANZ Light Traffic Index is made up of light or total traffic flows (primarily cars and vans) on 10 selected roads around the country. It gives a six-month lead on GDP growth.
http://www.anz.co.nz/about-us/economic-markets-research/truckometer/
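One way to check whether an indicator leads a target series is to correlate the indicator with the target shifted by different numbers of months. The sketch below is illustrative only: both series are synthetic, built so that the target repeats the indicator six months later, and the function names are hypothetical. It is not ANZ's methodology.

```python
def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def lagged_correlation(indicator, target, lead):
    """Correlate indicator[t] with target[t + lead]."""
    if lead == 0:
        return pearson(indicator, target)
    return pearson(indicator[:-lead], target[lead:])


# Synthetic monthly series: the target repeats the indicator 6 months later.
indicator = [1, 3, 2, 5, 4, 6, 8, 7, 9, 12, 10, 11, 13, 15, 14, 16, 18, 17]
target = [0] * 6 + indicator[:-6]

# Try leads of 0 to 8 months and keep the one with the highest correlation.
best_lead = max(range(0, 9), key=lambda k: lagged_correlation(indicator, target, k))
print(best_lead)
```

With real traffic and GDP data the peak would be far less clean, and a high correlation at some lead would still need checking against out-of-sample periods.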
3. Black Box Insurance
• Big data transforms actuarial insurance from using probability methods to estimate premiums into dynamic risk management, using real data to generate individually tailored premiums
• Estimate a 20 km work or home journey, with a data point acquired every minute and each journey capturing 12 points per km. Assume 1,000 km of driving per month, generating 12,000 points per month and 144,000 points per car per annum. Hence, 1,000 cars lead to 144 million points per annum.
• A telematics (black box) monitor helps assess driving behavior and prices the policy on true driver-centric premiums by capturing:
– Number of journeys
– Distances travelled
– Types of roads
– Speed
– Time of travel
– Acceleration and braking
– Any accidents
– Location ?
• Benefits low mileage, smooth and safe drivers
• Privacy vs. saving money on insurance (Canada; http://bit.ly/Black_box)
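The data-volume estimate on this slide works out as follows. All four inputs are the slide's own assumptions; only the variable names are introduced here.

```python
# Telematics data volume, using the assumptions stated on the slide.
POINTS_PER_KM = 12       # data points captured per km
KM_PER_MONTH = 1_000     # assumed monthly driving distance per car
MONTHS_PER_YEAR = 12
CARS = 1_000             # fleet size in the slide's example

points_per_month = POINTS_PER_KM * KM_PER_MONTH
points_per_car_year = points_per_month * MONTHS_PER_YEAR
fleet_points_per_year = points_per_car_year * CARS

print(f"{points_per_month:,} points per car per month")
print(f"{points_per_car_year:,} points per car per year")
print(f"{fleet_points_per_year:,} points per year for {CARS:,} cars")
```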
Psychological analytics helps put human context into Business
• Behavior data -> Links human emotions to business -> Analyse footprints left behind.
• What does customer satisfaction really mean? Is the person actually happy?
• How do we take the emotional dimension into account for customer experience?
• How do we recognize someone is dissatisfied?
• How do we recognize a "distressed" person?
• Do we use text and voice? Will sleeping patterns and eating habits help?
• Would you act differently if someone is happy?
• How do you coach employees to see how someone sounds in emotional terms?
• Understanding when distress exists and when a customer needs enhanced service
• Behavior data reveals attitude and intent. This is more predictive of future opportunities and risk than historical data
4a. Elaboration of Trip to Paris Blog Story (Means-End & Heider)
Source: Woodside, Sood & Miller (2008), When Consumers and Brands Talk, Psychology & Marketing
1. Gayle
2. Paige
3. Paris
4. "The occasion was my cousin Paige's 16th"
5. "I am a Canadian and get by in French."
6. "All I can say is WOW! We rented a 2 bedroom, 1 ½ bath apartment (two showers), "Merlot" from ParisPerfect http://www.parisperfect.com/ and boy was it ever perfect!"
7. "We had a full view of the Eiffel from our charming little terrace. ....We were within walking distance to two metro stops (Pont d'Alma or Ecole Militaire)"
8. "We were walkable to many good bistros, cafes and bakeries and only a few blocks from the wonderful market street Rue Cler."
9. "I bought a Paris Pratique pocket-sized book at a Metro station. This handy guide has detailed maps of each arrondisement, as well as the metro lines, the bus lines, the RER and the SCNF (trains). I'll never be without this again."
10. "Six months before our trip, I gave Paige a couple of good guide books on Paris and suggested she let me know what her interests were since after all, this was to be her trip."
11. Sites
• The Marais
• Notre Dame
• L'Arc de Triomphe - 248 steps up and 248 steps down...
• Champs Elysee
• Jacquemart Museum
• Louvre Lite
• Musee D'Orsay
• Les Invalides, Napoleon's Tomb and the Napoleon Museum
• Sacre Coeur
• Monmartre
• Rodin Museum
• Pompidou Museum
• Train to Vernon, bike to Giverny with Fat Tire Bike Tours
• http://www.fattirebiketoursparis.com/
• Eiffel Tower
12. Unforgettable Memories: "This trip had so many memories, but here are a few choice highlights........On our very first night, knowing that the Eiffel Tower light show started at 10:00 p.m.... she [Paige] dropped her camera…down 6 flights…we were stunned…Spanish Family below standing below [with pieces of the camera]"
13. "The father stretched out his cupped hands which held all of the pieces they were able to recover, including the memory stick and he very solemnly said, "El muerto..."."
14. "They had decide to come to Paris to find the Harley Davidson store so they could buy Harley Paris t-shirts."
15. "Michael Osman is an American artists living in Paris." "He supplements his income by being a tour guide." "I found out about him on Fodors" "So I engaged Michael for two days."
16. "On our trip to Giverny, we met a young woman from Brisbane, Australia who was traveling on her own and we invited her to join us. Three of us enjoyed delicious and innovative soufflés, while Paige had the rack of lamb. We shared two dessert soufflés, one chocolate and the other cherry/almond. Yum"
17. "I wanted Paige to get a feel for shopping experiences that she would not have at home (aka the ubiquitous mall)."
18. "We went on Fat Tire's day trip to Monet's gardens and house in Giverny, about an hour outside Paris."
19. "I know Paige will treasure the memory of this girl's trip for many years to come."
The Newman Model of Deception (Pennebaker et al)
Key word categories for deception mapping:
1. Self words, e.g. "I" and "me", decrease when someone distances themselves from content
2. Exclusive words, e.g. "but" and "or", decrease with fabricated content owing to the complexity of maintaining deception
3. Negative emotion words, e.g. "hate", increase owing to shame or guilty feelings
4. Motion verbs, e.g. "go" or "move", increase as exclusive words go down, to keep the story on track
Instagram Deception (Suspects outside of -20 & +20)
Vine Deception (Suspects outside of -5 and +5)
4b. Psychological State
• LIWC (analyzewords.com)
– Reveal personality from word usage
– Uses LIWC classification of words
• TweetPsych (tweetpsych.com/)
– Linguistic analysis using:
– RID
– LIWC
Note: TweetPsych is not without critics:
http://psychcentral.com/blog/archives/2009/06/18/putting-cool-ahead-of-science-tweetpsych/
4c. Thematic Apperception Technique
Social CRM integrates “breadcrumb” data
5. Brand User Analytics
Aquarius,Aries,Cancer,Capricorn,Gemini,Leo,Libra,
Pisces, Sagittarius,Scorpio,Taurus,Virgo
Ambivalent, Employee, Opposer, Reporter, Supporter
11. Committed Partnerships, 12. Compartmentalised
Friendship,13. Childhood friendship,14. Courtship,15. Fling, 16.
Secret-Affair, 17. Enslavement , 2. Marriages of Convenience,3.
Best Friendships,4. Kinships, 5. Rebounds/ Avoidance-Driven,6.
Courtships,7.Dependencies 8. Enmities, 9. Love-Hate (Sweeney and
Chew)
Africa,Argentina,Australia,Australia/Hong Kong,
Austria, California, Canada, China, Egypt, England,
Finland, France Germany, Guernsey, Holland, India,
Indonesia, Ireland , Israel, Italy , Japan, Kuwait,
Malaysia, Nepal,Paraguay , Philippines, Phillipines,
Portugual, Saudi Arabia, Singapore South Africa,
Spain, Sweden, Taiwan, Thailand,UK ,USA
A&F,Beijing ,Gucci,LVMH,New York,Old Navy,
,Paris, Sydney, Tiffany, Tokyo, Tommy, Versace
An-Verb,An-Vis,Hol-Verb,Hol-Vis
Depriv/Enhance,Enhance/Depriv
Variables and Data Types in Big Data Set
Model Comparison By Variables/Predictors
6. Supermarket Shopper Behavior
Beacon
Active Card
7.Smart Sandbag System
smart-dove.com
The first 3 columns are the x, y, z axes of the gyroscope, then the x, y, z axes of the accelerometer. These are raw data from 40 repetitions of the shoulder press exercise. A standard deviation and moving average algorithm builds the chart, and a Hidden Markov Model extracts features and builds a model of the exercise. All models are put into the cloud for scoring trainee exercises.
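The smoothing step described above can be sketched with a sliding window over one sensor axis. The sample values and the window width of 3 are hypothetical, and the Hidden Markov Model stage is not covered here.

```python
from statistics import mean, pstdev


def rolling(values, window, func):
    """Apply `func` to each full sliding window of width `window`."""
    return [func(values[i:i + window]) for i in range(len(values) - window + 1)]


# Hypothetical samples for a single axis (e.g. gyroscope x). Not real sensor data.
gyro_x = [0.0, 1.0, 2.0, 1.0, 0.0, 1.0, 2.0]

smoothed = rolling(gyro_x, 3, mean)    # moving average
spread = rolling(gyro_x, 3, pstdev)    # rolling (population) standard deviation

for avg, sd in zip(smoothed, spread):
    print(f"mean={avg:.3f} sd={sd:.3f}")
```

The same `rolling` helper would be applied to each of the six columns in turn before any feature extraction.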
8. Oil reserves shipment monitoring
Ras Tanura Najmah compound, Saudi Arabia
Source: http://www.skyboximaging.com/blog/monitoring-oil-reserves-from-space
9. Carat: Collaborative Energy Diagnosis
Information Architecture
Source: http://carat.cs.berkeley.edu
Spark: Streaming, GraphX, SparkSQL, MLlib
Square Kilometer Array (SKA)
• Data collected in a single day would take nearly two million years to play back on an MP3 player.
• Central computer has processing power of about one hundred million PCs.
• SKA will use enough optical fiber linking up all the radio telescopes to wrap twice around the Earth.
• Dishes of SKA when fully operational will produce 10 times the global internet traffic as of 2013.
• Aperture arrays in the SKA could produce more than 100 times the global internet traffic as of 2013.
• The SKA will generate enough raw data to fill 15 million 64 GB MP3 players every day.
• The SKA supercomputer will perform 10^18 operations per second - equivalent to the number of stars in three million Milky Way galaxies - in order to process all the data that the SKA will produce.
• So sensitive that it will be able to detect an airport radar on a planet 50 light years away.
• Thousands of antennas with collecting area of about one square kilometer (that's 1,000,000 square meters).
• Previous mapping of Centaurus A galaxy took a team 12,000 hours of observations or several years. SKA ETA 5 minutes!
• In first six hours of operation, SKA will generate more information than all previous radio telescopes in the world combined.
To the scientists involved, however, the SKA is no testbed, it’s a transformative instrument
which, according to Luijten, will lead to “fundamental discoveries of how life and planets and
matter all came into existence. As a scientist, this is a once in a lifetime opportunity.”
Sources: http://bit.ly/amazin-facts & http://bit.ly/astro-ska
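To put the MP3-player figure in storage units, the arithmetic below converts it to petabytes per day. The two inputs come from the slide; the use of decimal units (1 PB = 1,000,000 GB) is an assumption.

```python
# Daily raw data volume implied by "15 million 64 GB MP3 players every day".
PLAYERS_PER_DAY = 15_000_000   # from the slide
PLAYER_GB = 64                 # from the slide
GB_PER_PB = 1_000_000          # assumption: decimal units

gb_per_day = PLAYERS_PER_DAY * PLAYER_GB
pb_per_day = gb_per_day / GB_PER_PB

print(f"{gb_per_day:,} GB per day = {pb_per_day:,.0f} PB per day")
```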
Centaurus A
Caution!
“Children never put off till
tomorrow what will keep
them from going to bed
tonight”
ADVERTISING AGE

Datainnovation

  • 1. Data Science Innovations: Roadmap to Hadoop Ecosystem & Spark Suresh.sood@uts.edu.au linkedin.com/in/sureshsood @soody http://www.slideshare.net/ssood/datainnovation February 4, 2015
  • 2. Topic Areas for Discussion 1. Statistics/Data mining or Data Science? 2. What is big data and the challenge today ? 3. Data types 4. Data Science workflows & discovery 5. Hadoop 6. Data Science innovation 7. New Sources of Information (Big data)  Data Driven Innovations 8. Internet of Things 9. Data Science Innovations 10. Apache Spark
  • 3. Statistics, Data Mining or Data Science ? • Statistics – precise deterministic causal analysis over precisely collected data • Data Mining – deterministic causal analysis over re-purposed data carefully sampled • Data Science – trending/correlation analysis over existing data using bulk of population i.e. big data Adapted from: NIST Big Data taxonomy draft report (see http://bigdatawg.nist.gov /show_InputDoc.php)
  • 4. Unknown relationships Unstructured data 95% of data not collected Social-Psychological- local-Mobile-GPS-M2M Beyond Transactions including interactions and observations 4 What is Big Data ?
  • 5. Big Data Challenge Today : Moving from Transactions Alone to Relationships and Empathy Current State = Transactions $$$ We do this stuff well e.g. Collect payments … Future State = Human Empathy (relationships) We don’t do this really e.g. User generated content, ratings, reviews, 1:1 dialogue, Distress Signals, Geolocation 5
  • 6. Data Types • Astronomical • Documents • Earthquake • Email • Environmental sensors • Fingerprints • Health (personal) Images • Graph data (social network) • Location • Marine • Particle accelerator • Satellite • Scanned survey data • Sound • Text • Transactions • Video
  • 8. Hadoop & Spark Explained
  • 9. HadoopConfigurations(SingleandMulti-Rack) Adapted from: http://stackiq.com/ Cluster manager e.g. Apache Ambari, Apache Mesos, or Rocks 3 TB drives ,18 data nodes configuration represents 648 TB of raw storage HDFS standard replication factor of 3 216 TB of usable storage Name/secondary/data nodes – 6 core 96 GB Management node – 4 core 16 GB
  • 10. Data Science Innovation Data science innovation is something an organization has not done before or even something nobody anywhere has done before. A data science innovation focuses on discovering and using new or untraditional data sources to solve new problems. Adapted from: Franks, B. (2012) Taming the Big Data Tidal Wave, p. 255, John Wiley & Son
  • 11. http://tacocopter.com/ New Sources of Information (Big data) : Social Media + Internet of Things  Data Science Innovations
  • 12. Internet of Things (IOTs) “trillion sensors” Source: www.tsensorssummit.org
  • 13. Data Science Innovations ID Analytics Innovative Info source Innovation Software/Platform 1. Node-Link (NLA) Multiple Reduce suspect list from 18 m to 230/32 New version Spark GraphX 2. ANZ Truckometer NZ transport authority real time traffic data GDP forecast 6 months in advance N/A 3. Driving (Usage Based) Black box (telematics) Unstructured data Pay as you drive policy Pay how you drive Hadoop Map Reduce 4a. Deception (veracity) Found stories online blogs Flag fake stories text, images and short video MongoDB – Python dictionary 4b. Psychological State Twitter and Instagram Junk words MongoDB – Python dictionary 4c. Thematic Apperception Technique Mobile phone screen customisation Automated informant testing Sparkling Water (H2O/Spark) Deep Learning 5. Brand Brand stories “found” online Brand user profile R/Hadoop 6. Supermarket shopper behavior CCTV /beacon transmitters “My store” product placement based on time of day predictive shopping behaviour MongoDB Hadoop 2 Cluster Spark GraphX Spark MLib 7. Sandbag exercise Sandbag sensors Virtual trainer Spark GraphX Spark MLib 8. Oil reserves shipment monitoring Skybox (Google) satellite images Improved oil forecast “Busboy” – C /Hadoop 9. J score for mobile energy usage Sparse incomplete data from community of mobile users Energy bug mgmt. Spark/Amazon Web Suresh Sood 2015
  • 14. 1. Node Link Analytics • 1990’s Ivan Milat killed 7 backpackers making him Australia's most notorious Serial Killer • Everyone in Australia was a suspect • Large volumes of data from multiple sources  RTA Vehicle records  Gym Memberships  Gun Licensing records  Internal Police records • Police applied node link analysis techniques (NetMap) to the data • Harness power of the human mind • Analyst can spot indirect links, patterns , structure, relationships and anomalies • A bottom-up approach with process of discovery to uncover structure • Reduced the suspect list from 18 million to 230 • Further analysis with the use of additional satellite information reduced this to 32 Data Information Knowledge
  • 15. 2. ANZ Truckometer: The ANZ Heavy Traffic Index comprises flows of vehicles weighing more than 3.5 tonnes (primarily trucks) on 11 selected roads around NZ. It is contemporaneous with GDP growth. The ANZ Light Traffic Index is made up of light or total traffic flows (primarily cars and vans) on 10 selected roads around the country. It gives a six-month lead on GDP growth. Source: http://www.anz.co.nz/about-us/economic-markets-research/truckometer/
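One way to check whether an index leads a target series is to correlate the index at time t with the target at time t + k for several values of k. The sketch below is a generic lagged Pearson correlation, not ANZ's published method. The function names are hypothetical and the series are synthetic, built so that the target simply repeats the indicator two periods later.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(vx * vy)

def lagged_correlation(indicator, target, lead):
    """Correlate indicator[t] with target[t + lead]."""
    if lead < 0 or lead >= len(indicator) - 1:
        raise ValueError("lead out of range")
    if lead == 0:
        return pearson(indicator, target)
    return pearson(indicator[:-lead], target[lead:])

def best_lead(indicator, target, max_lead):
    """Lead (in periods) with the highest correlation, searched over 0..max_lead."""
    return max(range(max_lead + 1),
               key=lambda k: lagged_correlation(indicator, target, k))

# Synthetic series for illustration only: target repeats indicator 2 periods later.
indicator = [1, 3, 2, 5, 4, 6, 8, 7, 9, 12]
target = [0, 0, 1, 3, 2, 5, 4, 6, 8, 7]
print(best_lead(indicator, target, max_lead=4))
```

With monthly data, a best lead of 6 would correspond to the six-month lead described on the slide.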
  • 16. 3. Black Box Insurance
      • Big data transforms actuarial insurance from using probability methods to estimate premiums into dynamic risk management that uses real data to generate individually tailored premiums
      • Estimate: a 20 km work or home journey, with a data point acquired every min and the journey capturing 12 points per km. Assume 1,000 km of driving per month, generating 12,000 points per month and 144,000 points per car per annum. Hence, 1,000 cars lead to 144 million points per annum.
      • Telematics technology (a black box monitor) helps assess driving behavior and price the policy with driver-centric premiums by capturing:
          – Number of journeys
          – Distances travelled
          – Types of roads
          – Speed
          – Time of travel
          – Acceleration and braking
          – Any accidents
          – Location ?
      • Benefits low-mileage, smooth and safe drivers
      • Privacy vs. saving money on insurance (Canada; http://bit.ly/Black_box)
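The data-volume estimate on this slide can be reproduced step by step. The sketch below takes the slide's figures as given, in particular the 12 points per km; only the variable names are made up.

```python
# Worked version of the slide's data-volume estimate.
# All inputs are the slide's own figures; the names are illustrative.
POINTS_PER_KM = 12       # slide: journey captures 12 points per km
KM_PER_MONTH = 1_000     # slide: assumed monthly driving distance
MONTHS_PER_YEAR = 12
CARS = 1_000

points_per_month = POINTS_PER_KM * KM_PER_MONTH
points_per_car_year = points_per_month * MONTHS_PER_YEAR
points_fleet_year = points_per_car_year * CARS

print(points_per_month)     # 12000
print(points_per_car_year)  # 144000
print(points_fleet_year)    # 144000000
```

Even a small fleet of 1,000 cars reaches 144 million points a year, which is why the table lists Hadoop MapReduce for this use case.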
  • 17. Psychological analytics helps put human context into business
      • Behavior data -> Links human emotions to business -> Analyse footprints left behind
      • What does customer satisfaction really mean? Is the person actually happy?
      • How do we take the emotional dimension into account for customer experience?
      • How do we recognize someone is dissatisfied?
      • How do we recognize a “distressed” person?
      • Do we use text and voice? Will sleeping patterns and eating habits help?
      • Would you act differently if someone is happy?
      • How do you coach employees to hear how someone sounds in emotional terms?
      • Understanding when distress exists and when a customer needs enhanced service
      • Behavior data reveals attitude and intent. This is more predictive of future opportunities and risk than historical data
  • 19. Elaboration of Trip to Paris Blog Story (Means-End & Heider). Source: Woodside, Sood & Miller 2008, When Consumers and Brands Talk, Psychology & Marketing
      1. Gayle
      2. Paige
      3. Paris
      4. "The occasion was my cousin Paige’s 16th"
      5. "I am a Canadian and get by in French."
      6. "All I can say is WOW! We rented a 2 bedroom, 1 ½ bath apartment (two showers), "Merlot" from ParisPerfect http://www.parisperfect.com/ and boy was it ever perfect! "
      7. "We had a full view of the Eiffel from our charming little terrace. ....We were within walking distance to two metro stops (Pont d'Alma or Ecole Militaire) "
      8. "We were walkable to many good bistros, cafes and bakeries and only a few blocks from the wonderful market street Rue Cler."
      9. "I bought a Paris Pratique pocket-sized book at a Metro station. This handy guide has detailed maps of each arrondisement, as well as the metro lines, the bus lines, the RER and the SCNF (trains). I'll never be without this again."
      10. "Six months before our trip, I gave Paige a couple of good guide books on Paris and suggested she let me know what her interests were since after all, this was to be her trip."
      11. Sites
          • The Marais
          • Notre Dame
          • L'Arc de Triomphe - 248 steps up and 248 steps down...
          • Champs-Élysées
          • Jacquemart Museum
          • Louvre Lite
          • Musee D'Orsay
          • Les Invalides, Napoleon's Tomb and the Napoleon Museum
          • Sacre Coeur
          • Montmartre
          • Rodin Museum
          • Pompidou Museum
          • Train to Vernon, bike to Giverny with Fat Tire Bike Tours http://www.fattirebiketoursparis.com/
          • Eiffel Tower
      12. Unforgettable Memories: "This trip had so many memories, but here are a few choice highlights........On our very first night, knowing that the Eiffel Tower light show started at 10:00 p.m.... she [Paige] dropped her camera…down 6 flights…we were stunned…Spanish Family below standing below [with pieces of the camera]"
      13. "The father stretched out his cupped hands which held all of the pieces they were able to recover, including the memory stick and he very solemnly said, "El muerto...".
      14. "They had decide to come to Paris to find the Harley Davidson store so they could buy Harley Paris t-shirts."
      15. "Michael Osman is an American artists living in Paris." "He supplements his income by being a tour guide." "I found out about him on Fodors" "So I engaged Michael for two days."
      16. "On our trip to Giverny, we met a young woman from Brisbane, Australia who was traveling on her own and we invited her to join us. Three of us enjoyed delicious and innovative soufflés, while Paige had the rack of lamb. We shared two dessert soufflés, one chocolate and the other cherry/almond. Yum"
      17. "I wanted Paige to get a feel for shopping experiences that she would not have at home (aka the ubiquitous mall). "
      18. "We went on Fat Tire's day trip to Monet's gardens and house in Giverny, about an hour outside Paris."
      19. "I know Paige will treasure the memory of this girl's trip for many years to come."
  • 21. The Newman Model of Deception (Pennebaker et al.) Key word categories for deception mapping:
      1. Self words e.g. “I” and “me” – decrease when someone distances themselves from content
      2. Exclusive words e.g. “but” and “or” – decrease with fabricated content owing to the complexity of maintaining deception
      3. Negative emotion words e.g. “hate” – increase owing to feelings of shame or guilt
      4. Motion verbs e.g. “go” or “move” – increase as exclusive words go down, to keep the story on track
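The four categories above map naturally onto a Python dictionary of word lists. The sketch below is a minimal illustration, not the deck's actual implementation. The word lists contain only the examples quoted on the slide (a real study would use full LIWC-style dictionaries), and the unweighted `deception_score` is a hypothetical stand-in for whatever weighting the model uses.

```python
import re
from collections import Counter

# Only the quoted examples come from the slide; everything else is assumed.
CATEGORIES = {
    "self": {"i", "me"},
    "exclusive": {"but", "or"},
    "negative_emotion": {"hate"},
    "motion": {"go", "move"},
}

def category_rates(text):
    """Return each category's share of all words in `text` (0.0 to 1.0)."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for word in words:
        for name, vocab in CATEGORIES.items():
            if word in vocab:
                counts[name] += 1
    total = len(words) or 1  # avoid division by zero on empty text
    return {name: counts[name] / total for name in CATEGORIES}

def deception_score(text):
    """Hypothetical unweighted score: higher means more deception markers.

    Negative emotion and motion words push the score up; self and
    exclusive words pull it down, following the directions on the slide.
    """
    r = category_rates(text)
    return (r["negative_emotion"] + r["motion"]) - (r["self"] + r["exclusive"])

print(category_rates("I hate to go but me"))
print(deception_score("I hate to go but me"))
```

Scores from a function like this only become meaningful once compared against a baseline of known-genuine stories from the same platform.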
  • 22. Instagram Deception (Suspects outside of -20 & +20) Vine Deception (Suspects outside of -5 and +5)
  • 23. 4b. Psychological State
      • LIWC (analyzewords.com)
          – Reveal personality from word usage
          – Uses LIWC classification of words
      • TweetPsych (tweetpsych.com/)
          – Linguistic analysis using:
              – RID
              – LIWC
      Note: TweetPsych is not without critics: http://psychcentral.com/blog/archives/2009/06/18/putting-cool-ahead-of-science-tweetpsych/
  • 25. 5. Brand User Analytics: Social CRM integrates “breadcrumb” data
  • 26. Variables and Data Types in Big Data Set
      • Aquarius, Aries, Cancer, Capricorn, Gemini, Leo, Libra, Pisces, Sagittarius, Scorpio, Taurus, Virgo
      • Ambivalent, Employee, Opposer, Reporter, Supporter
      • 2. Marriages of Convenience, 3. Best Friendships, 4. Kinships, 5. Rebounds/Avoidance-Driven, 6. Courtships, 7. Dependencies, 8. Enmities, 9. Love-Hate, 11. Committed Partnerships, 12. Compartmentalised Friendship, 13. Childhood friendship, 14. Courtship, 15. Fling, 16. Secret-Affair, 17. Enslavement (Sweeney and Chew)
      • Africa, Argentina, Australia, Australia/Hong Kong, Austria, California, Canada, China, Egypt, England, Finland, France, Germany, Guernsey, Holland, India, Indonesia, Ireland, Israel, Italy, Japan, Kuwait, Malaysia, Nepal, Paraguay, Philippines, Phillipines, Portugual, Saudi Arabia, Singapore, South Africa, Spain, Sweden, Taiwan, Thailand, UK, USA
      • A&F, Beijing, Gucci, LVMH, New York, Old Navy, Paris, Sydney, Tiffany, Tokyo, Tommy, Versace
      • An-Verb, An-Vis, Hol-Verb, Hol-Vis
      • Depriv/Enhance, Enhance/Depriv
  • 28. Model Comparison By Variables/Predictors
  • 29. 6. Supermarket Shopper Behavior (figure labels: Beacon, Active Card)
  • 30. 7. Smart Sandbag System (smart-dove.com) The first 3 columns are the x, y, z axes of the gyroscope, followed by the x, y, z axes of the accelerometer. These are raw data from 40 repetitions of a shoulder press exercise. Standard deviation and a moving average algorithm are used to build the chart, and a Hidden Markov Model is used to extract features and build a model of the exercise. All models are put into the cloud for trainee exercise scoring.
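The smoothing step mentioned above can be sketched as a moving average and a rolling standard deviation over one sensor axis. This is a generic illustration rather than the smart-dove implementation. The window size and sample readings are hypothetical, and the Hidden Markov Model stage is not shown.

```python
from statistics import pstdev

def _check_window(values, window):
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")

def moving_average(values, window):
    """Simple moving average; returns len(values) - window + 1 points."""
    _check_window(values, window)
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

def rolling_std(values, window):
    """Population standard deviation over the same sliding windows."""
    _check_window(values, window)
    return [pstdev(values[i:i + window])
            for i in range(len(values) - window + 1)]

# Hypothetical single-axis readings, not data from the slide.
axis_x = [0.0, 1.0, 2.0, 1.0, 0.0, 1.0, 2.0]
smoothed = moving_average(axis_x, window=3)
spread = rolling_std(axis_x, window=3)

print(smoothed)
print(spread)
```

In a pipeline like the one described, the same two functions would be applied to each of the six columns (three gyroscope axes, three accelerometer axes) before any model is fitted.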
  • 31. 8. Oil reserves shipment monitoring: Ras Tanura Najmah compound, Saudi Arabia. Source: http://www.skyboximaging.com/blog/monitoring-oil-reserves-from-space
  • 32. 9. Carat: Collaborative Energy Diagnosis
  • 35. Square Kilometer Array (SKA)
      • Data collected in a single day would take nearly two million years to play back on an MP3 player
      • The central computer has the processing power of about one hundred million PCs
      • The SKA will use enough optical fiber linking up all the radio telescopes to wrap twice around the Earth
      • The dishes of the SKA, when fully operational, will produce 10 times the global internet traffic as of 2013
      • Aperture arrays in the SKA could produce more than 100 times the global internet traffic as of 2013
      • The SKA will generate enough raw data to fill 15 million 64 GB MP3 players every day
      • The SKA supercomputer will perform 10^18 operations per second, equivalent to the number of stars in three million Milky Way galaxies, in order to process all the data that the SKA will produce
      • So sensitive that it will be able to detect an airport radar on a planet 50 light years away
      • Thousands of antennas with a collecting area of about one square kilometer (that's 1,000,000 square meters)
      • Previous mapping of the Centaurus A galaxy took a team 12,000 hours of observations, or several years. SKA ETA: 5 minutes!
      • In its first six hours of operation, the SKA will generate more information than all previous radio telescopes in the world combined
      To the scientists involved, however, the SKA is no testbed, it’s a transformative instrument which, according to Luijten, will lead to “fundamental discoveries of how life and planets and matter all came into existence. As a scientist, this is a once in a lifetime opportunity.”
      Sources: http://bit.ly/amazin-facts & http://bit.ly/astro-ska
      Figure label: Centaurus A
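The MP3-player comparison is easier to reason about in standard units. The conversion below is a rough sketch that takes the slide's 15 million players of 64 GB each as given and assumes decimal gigabytes (10^9 bytes), which the slide does not specify.

```python
# Convert "15 million 64 GB MP3 players every day" into standard units.
PLAYERS_PER_DAY = 15_000_000   # slide figure
GB_PER_PLAYER = 64             # slide figure
BYTES_PER_GB = 10**9           # assumption: decimal gigabytes
SECONDS_PER_DAY = 24 * 60 * 60

bytes_per_day = PLAYERS_PER_DAY * GB_PER_PLAYER * BYTES_PER_GB
petabytes_per_day = bytes_per_day / 10**15
terabytes_per_second = bytes_per_day / SECONDS_PER_DAY / 10**12

print(f"{petabytes_per_day:.0f} PB/day")    # 960 PB/day
print(f"{terabytes_per_second:.1f} TB/s")   # 11.1 TB/s
```

At close to an exabyte of raw data per day, storing everything is impractical, so most of the processing has to happen as the data streams in.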
  • 36. Caution! “Children never put off till tomorrow what will keep them from going to bed tonight” ADVERTISING AGE