Dark Data: Where the Future Lies
Vince Kellen, Ph.D.
Senior Vice Provost
Analytics and Technologies
University of Kentucky
Vince.Kellen@uky.edu
March 5, 2014
This is a living document subject to substantial revision.
The economic case
 The global economy is now [permanently] fueled by information
 Innovation is becoming the merging of human creativity and
increasingly automated information extraction
 Data is growing exponentially; human creativity ‘cycles’ are not
 We are going to need [novel, surprising, freaky] ways of increasing
the speed of information extraction from vast and growing data
reserves
 Finally, we are going to have to develop [novel, surprising, freaky]
economic ‘infrastructure’ to foster emergent designs for turning
extracted information into wealth creation faster
[Figure: population, wealth, technology, knowledge. Eras shown: hunting and foraging, agricultural revolution, rise of the ‘world system’, industrial revolution, post-information revolution]
Sources: Wikipedia; various; UN Report World Population to 2300 (2004)
Diffusion accelerates technology adoption
Communications technology accelerates diffusion
World’s technological installed capacity to store information
Hilbert M, Lopez P. 2011. The World’s Technological Capacity to Store, Communicate, and Compute Information. Science. Vol. 332 no. 6025 pp. 60-65.
The world's total information, about 1.8 zettabytes, could be stored in about four grams of DNA.
Harvard stores 70 billion books using DNA.
Research team stores 5.5 petabits, or about 5.5 million gigabits, per cubic millimeter in a DNA storage medium
http://www.computerworld.com/s/article/9230401/Harvard_stores_70_billion_books_using_DNA
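The storage figures on this slide can be sanity-checked with a little arithmetic. The sketch below is only an illustration: it assumes decimal (SI) prefixes and reads the 1.8 zettabytes as bytes, and the function and variable names are invented for this example.

```python
# Sanity-check the DNA storage figures quoted on this slide.
# Assumption: SI (decimal) prefixes; all names here are illustrative only.

ZETTA = 10**21
EXA = 10**18
PETA = 10**15
GIGA = 10**9


def implied_density_exabytes_per_gram(total_bytes: float, grams: float) -> float:
    """Storage density implied by packing total_bytes into grams of DNA."""
    return total_bytes / grams / EXA


def petabits_to_gigabits(petabits: float) -> float:
    """Convert petabits to gigabits (one petabit is a million gigabits)."""
    return petabits * PETA / GIGA


world_information_bytes = 1.8 * ZETTA  # figure from the slide
dna_grams = 4.0                        # figure from the slide

density = implied_density_exabytes_per_gram(world_information_bytes, dna_grams)
print(f"Implied density: {density:.0f} exabytes per gram")
print(f"5.5 petabits = {petabits_to_gigabits(5.5):,.0f} gigabits")
```

Under these assumptions, four grams holding 1.8 zettabytes implies a density of several hundred exabytes per gram.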
Dr. Church keeps a vial of DNA encoded with copies of his latest book.
Photo: Kelvin Ma for the Wall Street Journal
Cause…
http://andrewmcafee.org/2011/01/jevons-computation-efficicency-hardware-investmen/
Effect!
Desperately seeking productivity
Moore’s law, growth in data, IT investments
As data grows exponentially, so does dark data
Dark data
Rate of innovation, pace of urban life
 In order to sustain exponential economic growth rates, the rate of
innovation must increase. Otherwise we will not have exponential
growth
 Information flowing through human culture (cities) is akin to blood
flowing through a circulatory system.
• Both cities and animals conserve physical energy (molecules). As both
get bigger, they conserve more energy
 However, the two systems behave in fundamentally different ways
when it comes to ‘output’
• As cities get bigger, their ‘pace of life’ and economic output increase.
The rate of information flow quickens
• As animals get bigger, their ‘pace of life’ and metabolic output
decrease. The rate of metabolic flow slows
Bettencourt, et al. (2007). Growth, innovation, scaling and the pace of life in cities. www.pnas.org/cgi/doi/10.1073/pnas.0610172104
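The contrast between cities and animals can be written as a power law, Y = y0 * N^beta. The sketch below is illustrative only. The exponents (about 1.15 for the socioeconomic output of cities, about 0.75 for animal metabolic rate) are commonly cited approximations assumed here, not values taken from the slide, and the function name is hypothetical.

```python
# Power-law scaling sketch: Y = y0 * size**beta.
# The exponents are assumed approximations, not values from the slide.
CITY_BETA = 1.15    # assumption: superlinear (socioeconomic output of cities)
ANIMAL_BETA = 0.75  # assumption: sublinear (metabolic rate of animals)


def per_unit_output(size: float, beta: float, y0: float = 1.0) -> float:
    """Output per unit of size (per resident, or per unit of body mass)."""
    return y0 * size**beta / size


for size in (1e3, 1e6, 1e9):
    city = per_unit_output(size, CITY_BETA)
    animal = per_unit_output(size, ANIMAL_BETA)
    print(f"size={size:.0e}  city per-unit={city:.2f}  animal per-unit={animal:.4f}")
```

With an exponent above 1, output per unit rises as the system grows (the quickening pace of city life). With an exponent below 1, it falls (the slowing pace of large animals).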
Information rules
 Information quickening drives economic growth, encouraging consumption
(and conservation) of molecules
• While fears of a Malthusian collapse have long haunted economists,
innovation and technology have enabled growth so far
• Analytics can lead to productivity increases
 Information’s dominance in the economy appears to be causing slowing or
reversing population growth rates
• Rising populations drive rates of innovation and economic growth. No
population growth might be worrisome
• Is rising information unexpectedly going to cool down the economy?
 While innovation allows both growth and efficient use of resources, to
sustain growth we are going to need more innovation, not less!
• Increasing stores of information and means of action will be needed
• DARK DATA WILL NEED TO BE MINED
Bits versus Atoms
Physical material exhibits limits to scale. Data does not. Computing cost-
effectiveness growth enables exponential data growth
Two overlapping, interacting systems
The two systems now interact. Fewer molecules create more data. Information fuels
economic growth, reduces population growth rates, improves utilization of molecules. The
rate at which dark data is applied will affect all these rates
Molecules
Dark data
Information
Pause. Where are we?
Data and information are very important at this point in human history. How
do we take advantage of these megatrends?
Production and consumption of information
 In order to unleash dark data, we have to worry about two
problems: better production of information from dark data reserves
and better means of applying mined information to economic
activity (consumption)
 Production
• We will need new purely human, purely technical and human-technical
ways of extracting information from growing reserves of data
 Consumption
• We are desperately going to need [old, new] human beings with a very
different orientation to data and decision making
Production ideas
 Crowd-sourced and community-sourced analytics.
• Skills will be scarce. We have to do a better job of matching analytics work to
global skill sets
 Dark data exchanges
• Can we sell our dark data to others for their exploitation? Can we buy others’
dark data?
 Dark data reserves exploration
• More use of automated means of discovering data reserves and cataloging
their location. Idea generation on possible value from mining
 Data refineries
• We need to improve the rate at which data can be refined. Automated
metadata extraction, automated data quality detection, semi-automated model
construction, elimination of ‘one-off’ models and better reuse of partial or
complete models
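As one hedged illustration of the ‘data refinery’ idea, the sketch below profiles a small batch of records and reports basic metadata and quality signals for each field. The record layout, field names and the choice of indicators are assumptions made for this example, not a method described in the slides.

```python
from collections import Counter
from typing import Any


def profile_records(records: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Return simple metadata and quality indicators for each field.

    Assumes field values are hashable (numbers, strings, None).
    """
    fields = sorted({key for record in records for key in record})
    profile: dict[str, dict[str, Any]] = {}
    for field in fields:
        values = [record.get(field) for record in records]
        # Treat None and empty strings as missing values.
        present = [v for v in values if v not in (None, "")]
        type_counts = Counter(type(v).__name__ for v in present)
        profile[field] = {
            "dominant_type": type_counts.most_common(1)[0][0] if present else None,
            "missing_rate": 1 - len(present) / len(values),
            "distinct_values": len(set(present)),
            "mixed_types": len(type_counts) > 1,
        }
    return profile


# Hypothetical sample records with typical quality problems.
records = [
    {"student_id": 1, "gpa": 3.4, "major": "Biology"},
    {"student_id": 2, "gpa": None, "major": "biology"},
    {"student_id": 3, "gpa": "3.9", "major": ""},
    {"student_id": 4, "gpa": 2.8},
]

profile = profile_records(records)
for field, stats in profile.items():
    print(field, stats)
```

A profile like this could feed a catalog of data reserves, flagging fields with high missing rates or mixed types before anyone spends time modeling them.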
Production ideas
 Make widespread use of rapid data discovery tools. The ability to go from
the first question to the final answer quickly matters greatly
 Combine purely automated technical methods of extraction and
refinement with human, collaborative processes to further refine the data
 Develop and use refined, automated data movement tools
 Increase data’s ‘surface area’
through careful model design aimed
at facilitating regular analysts’ use
 Increase data transparency, make
available to many more analysts
 Utilize new ways of obscuring data to
improve privacy and security without
sacrificing pattern discovery
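One possible way to obscure data while keeping patterns discoverable is keyed pseudonymization: each identifier is replaced with a stable token, so analysts can still count and join records without seeing who they belong to. The sketch below is an assumption about how that might look, not the author's stated technique. The key, sample names and function name are hypothetical.

```python
import hashlib
import hmac
from collections import Counter


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Map an identifier to a stable token using HMAC-SHA256."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]  # shortened for readability


# Hypothetical key; a real one would be generated and stored securely.
key = b"example-key-keep-out-of-source-control"

visits = ["alice", "bob", "alice", "carol", "alice", "bob"]
tokens = [pseudonymize(name, key) for name in visits]

# The frequency pattern survives even though the names do not.
raw_counts = sorted(Counter(visits).values())
masked_counts = sorted(Counter(tokens).values())
print(raw_counts, masked_counts)
```

Because the same input always yields the same token under a given key, visit frequencies and joins across tables are preserved. Anyone without the key cannot readily map tokens back to names.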
Information consumption dysfunction
 The No. 1 impediment to improved use of dark data is human
psychology. The dominant regime for managing information and power
must end
 This regime has the following attributes:
• Define goals and try to achieve them
• Maximize winning, minimize losing
• Unilateral control and accountability
 This regime causes the following dysfunctions
• Information is power, thus data is hoarded, metadata formation is guarded,
‘framing of problems’ becomes a competitive battlefield
• Gamers rely on data obfuscation to make untestable claims
• Reliance on personal anecdote and sample sizes of 1
• Threat-induced reactions to difficult data, causing data suppression
• Cover-ups, manipulation of others, assaults on autonomy and agency
See Chris Argyris and Double Loop Learning. http://en.wikipedia.org/wiki/Chris_Argyris
The problems with the dominant regime
 It’s in our nature; all humans are
highly skilled at this behavior. It is part
of being a child and a parent
 It is toxic to creative, high IQ talent
 It inhibits team performance
 It creates internal political theater
 It severely limits the application of
insights from dark data
 It causes awfully bad, if not tragic,
public spectacles
Needed: A new culture of information
 A new cultural model needs to develop, based on the following
attributes
• Transparency. Provide equal access to all sides of a debate
• Rapid validation. Find and use tools that let all sides of a debate
analyze, validate or refute insights into data
• Instead of maximizing winning and minimizing losing, encourage
small, fast failure. Instead of ‘punishing’ individuals, put the focus on
team rewards and multi-lateral control
• Instead of empowering leaders so that accountability can be overly
simple, establish more intricate performance measurement systems
that stabilize the enterprise and provide better feedback to many
 The future of exploitation of dark data will be owned by teams that
can collaborate well, challenge members productively and stay
together long enough to turn the data into economic wealth
How can you spot the person who can’t succeed?
 Shine light on their data and data management processes. Ask
them to document and share details about their model. See if they
will allow others to independently verify their results. Engage in a
conversation about their model assumptions
 Gamers playing under the old rules will typically do the following
• Defer, delay or avoid meeting or producing the evidence
• Refer to concepts like ‘we’re the experts’ or ‘we can’t explain it to non-
experts’
• Change the subject
• Cite powers outside of their control that limit their ability to respond
• Go undercover and hide for a while
 You can’t succeed with a house full of gamers
Building expert teams takes skill and time
 Expert teams share a clear and common purpose and a strong mission
 Expert teams share mental models
• Their members anticipate each other. They can coordinate without the need for overt communication
 They are adaptive
• They are self-correcting. Their members compensate for each other. They reallocate functions. They engage in
a cycle of prebrief-performance-debrief, giving feedback to each other. They establish and revise team goals.
They differentiate between high and low priorities. They have mechanisms for anticipating and reviewing issues
and problems of members. They periodically review and diagnose team effectiveness and team vitality
 They have clear (but not overly clear or rigid) roles and responsibilities
• Members understand their roles and how they fit together
 They have strong team leadership
• Led by someone with good leadership, not just technical skills. They have team members who believe the
leader cares about them. They provide situation updates. They foster teamwork, coordination and cooperation.
They self-correct first
 They develop a strong sense of "collective"
• Trust, teamness and confidence are important. They manage conflict well. Members confront each other
effectively. They trust each other’s intentions
 They optimize performance outcomes
• They make fewer errors. They communicate often enough, ensuring members have the information to be able
to contribute. They make better decisions
 They cooperate and coordinate
• They identify team task work requirements. They ensure, through staffing and development, that the team
possesses the right mix of competencies. They consciously integrate new members. They distribute and assign
work thoughtfully. They examine and adjust the physical workspace to optimize communication and
coordination
Other consumption ideas
 Examine decision-making within the enterprise. Find bottlenecks
to faster decisions. Draw a new line separating central from local
agency. Let projects proceed with light/fast approval with follow-up
and audit later
 More rapid or time-boxed decision making. Use agile approaches.
Minimally viable products. Incremental releases
 Reward spontaneous collaboration. Design committees, teams,
units based on collaboration IQ rather than representativeness
 Automate more decisions, starting with the mundane or risk-free
 Define new roles with complementary analysis and application
skills. Hire more generalists with excellent critical thinking
CEO imperative
 Designing an organization that can take advantage of dark data is
very difficult. It is a CEO problem
 The challenge has many layers
• Understanding where to strategically apply dark data findings, how to
compete on analytics
• Ascertaining organizational and infrastructure readiness
• Establishing executive and employee incentive models that help
• Managing and monitoring progress at the technical, individual, team
and enterprise level
• Enforcing evidence-based decision making and changing the culture
• Designing the models to be used throughout the enterprise
 CIOs can play a strong role, but the CEO, IMHO, has to own this
CEO Advisory Engagement
1. Strategic possibilities
• Examine the firm’s business model, value-creating activities
• Identify areas where analytics and data may help, through ideation sessions
2. Dark data inventory
• Document the data assets across the enterprise
• Categorize and rank by quality and availability
3. Value network assessment
• Evaluate the value for upstream and downstream players
• Identify potential sources, uses for dark data
4. Economic estimates
• Identify use cases, evaluate potential benefit and risks
• Prioritize opportunities
5. Organizational development and change management
• Identify culture issues, skill gaps, org structure changes, incentives, additional
resources needed, communications approach, timelines and sequencing
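The inventory step (document assets, then categorize and rank them by quality and availability) could be sketched roughly as follows. The asset names, the 0-to-1 scoring scales and the weighted-average score are all assumptions made for illustration. The slides do not prescribe a scoring method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataAsset:
    name: str
    quality: float       # assumed scale: 0.0 (unusable) to 1.0 (clean)
    availability: float  # assumed scale: 0.0 (locked away) to 1.0 (readily accessible)


def rank_assets(assets: list[DataAsset], quality_weight: float = 0.5) -> list[DataAsset]:
    """Rank assets by a weighted average of quality and availability."""
    def score(asset: DataAsset) -> float:
        return quality_weight * asset.quality + (1 - quality_weight) * asset.availability

    return sorted(assets, key=score, reverse=True)


# Hypothetical inventory entries.
inventory = [
    DataAsset("web server logs", quality=0.6, availability=0.9),
    DataAsset("scanned paper forms", quality=0.3, availability=0.2),
    DataAsset("CRM free-text notes", quality=0.8, availability=0.5),
]

ranked = rank_assets(inventory)
for position, asset in enumerate(ranked, start=1):
    print(position, asset.name)
```

Changing `quality_weight` shifts the ranking toward cleaner or more accessible assets, which is one way to make the prioritization assumptions explicit and open to challenge.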
Summary
 Information is redefining humanity in ways we still don’t understand. The
future is not certain. It will be written by winners
 Economic growth depends on rates of innovation. Innovation depends on
new insights which come chiefly from data
 Data is growing exponentially. Human ability to process it is not. Thus,
dark data is growing exponentially too
 Firms differ widely in their [in]ability to mine data for information
(production) and apply information in decisions (consumption)
 A [largely, partially] semi-automated analytic discovery and refining
capability is imminent
 Winners will find new ways of organizing themselves and their
ecosystems to gain advantage, speeding up timeframes
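The summary's argument that dark data grows exponentially can be illustrated with a toy model. In the sketch below, total data compounds each year while the amount people can process grows only linearly. Every number (starting volumes, the 40% growth rate, the yearly capacity gain) is a made-up assumption chosen to show the shape of the curve, not a measurement.

```python
def dark_data(
    year: int,
    initial_data: float = 100.0,     # assumed starting volume (arbitrary units)
    growth_rate: float = 0.4,        # assumed 40% yearly data growth
    initial_capacity: float = 80.0,  # assumed volume humans can process at year 0
    capacity_gain: float = 5.0,      # assumed linear yearly gain in processing
) -> float:
    """Volume of data left unprocessed ('dark') in a given year."""
    total = initial_data * (1 + growth_rate) ** year
    processed = min(total, initial_capacity + capacity_gain * year)
    return total - processed


for year in (0, 5, 10):
    total = 100.0 * 1.4 ** year
    dark = dark_data(year)
    print(f"year {year:2d}: total={total:8.1f}  dark={dark:8.1f}  share={dark / total:.0%}")
```

Under these assumptions the dark share climbs toward 100%, because a linear processing capacity cannot keep pace with compounding data growth.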
Questions?
“Get your facts first, and then you can distort them as you please.”
-Mark Twain

A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 

Dark Data: Where the Future Lies

  • 1. Dark Data: Where the Future Lies. Vince Kellen, Ph.D., Senior Vice Provost, Analytics and Technologies, University of Kentucky. Vince.Kellen@uky.edu. March 5, 2014. This is a living document subject to substantial revision.
  • 2. The economic case
      • The global economy is now [permanently] fueled by information
      • Innovation is becoming the merging of human creativity and increasingly automated information extraction
      • Data is growing exponentially; human creativity ‘cycles’ are not
      • We are going to need [novel, surprising, freaky] ways of increasing the speed of information extraction from vast and growing data reserves
      • Finally, we are going to have to develop [novel, surprising, freaky] economic ‘infrastructure’ to foster emergent designs for turning extracted information into wealth creation faster
  • 3. [Population, wealth, technology, knowledge]
      • Eras shown: hunting and foraging; agricultural revolution; rise of the ‘world system’; industrial revolution; post-information revolution
      • Diffusion accelerates technology adoption
      • Communications technology accelerates diffusion
      • Sources: Wikipedia; various; UN Report World Population to 2300 (2004)
  • 4. World’s technological installed capacity to store information
      • Source: Hilbert M, Lopez P. 2011. The World’s Technological Capacity to Store, Communicate, and Compute Information. Science. Vol. 332 no. 6025 pp. 60-65.
  • 5. The total world's information, which is 1.8 zettabytes, could be stored in about four grams of DNA.
      • Harvard stores 70 billion books using DNA. Research team stores 5.5 petabits, or 1 million gigabits, per cubic millimeter in DNA storage medium
      • http://www.computerworld.com/s/article/9230401/Harvard_stores_70_billion_books_using_DNA
      • Photo: Kelvin Ma for the Wall Street Journal. Dr. Church keeps a vial of DNA encoded with copies of his latest book.
  • 9. Moore’s law, growth in data, IT investments
  • 10. As data grows exponentially, so does dark data (chart label: dark data)
  • 11. Rate of innovation, pace of urban life
      • In order to sustain exponential economic growth rates, the rate of innovation must increase. Otherwise we will not have exponential growth
      • Information flowing through human culture (cities) is akin to blood flowing through a circulatory system
          • Both cities and animals conserve physical energy (molecules). As both get bigger, they conserve energy
      • However, the two systems behave in fundamentally different ways when it comes to ‘output’
          • As cities get bigger, their ‘pace of life’ and economic output increase. The rate of information flow quickens
          • As animals get bigger, their ‘pace of life’ and metabolic output decrease. The rate of metabolic flow decreases
      • Bettencourt, et al. (2007). Growth, innovation, scaling and the pace of life in cities. www.pnas.org/cgi/doi/10.1073/pnas.0610172104
  • 12. Information rules
      • Information quickening drives economic growth, encouraging consumption (and conservation) of molecules
          • While fears of a Malthusian collapse have haunted economists forever, innovation and technology have enabled growth so far
          • Analytics can lead to productivity increases
      • Information’s dominance in the economy appears to be slowing or reversing population growth rates
          • Rising populations drive rates of innovation and economic growth. No population growth might be worrisome
          • Is rising information unexpectedly going to cool down the economy?
      • While innovation allows both growth and efficient use of resources, to sustain growth we are going to need more innovation, not less!
          • Increasing stores of information and means of action will be needed
          • DARK DATA WILL NEED TO BE MINED
  • 13. Bits versus Atoms
      • Physical material exhibits limits to scale. Data does not
      • Computing cost-effectiveness growth enables exponential data growth
  • 14. Two overlapping, interacting systems
      • The two systems now interact. Fewer molecules create more data. Information fuels economic growth, reduces population growth rates, and improves utilization of molecules. The rate at which dark data is applied will affect all these rates
      • Diagram labels: Molecules, Dark data, Information
  • 15. Pause. Where are we?
      • Data and information are very important at this point in human history. How do we take advantage of these megatrends?
  • 16. Production and consumption of information
      • In order to unleash dark data, we have to worry about two problems: better production of information from dark data reserves and better means of applying mined information to economic activity (consumption)
      • Production
          • We will need new purely human, purely technical and human-technical ways of extracting information from growing reserves of data
      • Consumption
          • We are desperately going to need [old, new] human beings with a very different orientation to data and decision making
  • 17. Production ideas
      • Crowd-sourced and community-sourced analytics
          • Skills will be scarce. We have to do a better job of matching analytics to global skill sets
      • Dark data exchanges
          • Can we sell our dark data to others for their exploitation? Can we buy others’ dark data?
      • Dark data reserves exploration
          • More use of automated means of discovering data reserves and cataloging their location. Idea generation on possible value from mining
      • Data refineries
          • We need to improve the rate at which data can be refined: automated metadata extraction, automated data quality detection, semi-automated model construction, elimination of ‘one-off’ models and better reuse of partial or complete models
  • 18. Production ideas
      • Make widespread use of rapid data discovery tools. The ability to go from the first question to the final answer quickly matters greatly
      • Combine purely automated technical methods of extraction and refinement with human, collaborative processes to further refine the data
      • Develop and use refined, automated data movement tools
      • Increase data’s ‘surface area’ through careful model design aimed at facilitating regular analysts’ use
      • Increase data transparency; make data available to many more analysts
      • Utilize new ways of obscuring data to improve privacy and security without sacrificing pattern discovery
  • 19. Information consumption dysfunction
      • The No. 1 impediment for improved use of dark data is human psychology. The dominant regime for managing information and power must end
      • This regime has the following attributes:
          • Define goals and try to achieve them
          • Maximize winning, minimize losing
          • Unilateral control and accountability
      • This regime causes the following dysfunctions:
          • Information is power, thus data is hoarded, metadata formation is guarded, ‘framing of problems’ becomes a competitive battlefield
          • Gamers that rely on data obfuscation to make untestable claims
          • Reliance on personal anecdote and sample sizes of 1
          • Threat-induced reactions to difficult data, causing data suppression
          • Cover-ups, manipulation of others, assaults on autonomy and agency
      • See Chris Argyris and Double Loop Learning. http://en.wikipedia.org/wiki/Chris_Argyris
  • 20. The problems with the dominant regime
      • It’s in our nature: all humans are highly skilled at this behavior. It is part of being a child and a parent
      • It is toxic to creative, high IQ talent
      • It inhibits team performance
      • It creates internal political theater
      • It terribly limits the application of insights from dark data
      • It causes awfully bad, if not tragic, public spectacles
  • 21. Needed: A new culture of information
      • A new cultural model needs to develop, based on the following attributes
          • Transparency. Provide equal access to all sides of a debate
          • Rapid validation. Find and use tools that let all sides of a debate analyze, validate or refute insights into data
          • Instead of maximizing winning and minimizing losing, encourage small, fast failure. Instead of ‘punishing’ individuals, put the focus on team rewards and multi-lateral control
          • Instead of empowering leaders so that accountability can be overly simple, establish more intricate performance measurement systems that stabilize the enterprise and provide better feedback to many
      • The future of exploitation of dark data will be owned by teams that can collaborate well, challenge members productively and stay together long enough to turn the data into economic wealth
  • 22. How can you spot the person who can’t succeed?
      • Shine light on their data and data management processes. Ask them to document and share details about their model. See if they will allow others to independently verify their results. Engage in a conversation about their model assumptions
      • Gamers playing under the old rules will typically do the following
          • Defer, delay and avoid the meeting or producing the evidence
          • Refer to concepts like ‘we’re the experts’ or ‘we can’t explain it to non-experts’
          • Change the subject
          • Cite powers outside of their control that limit their ability to respond
          • Go undercover and hide for a while
      • You can’t succeed with a house full of gamers
  • 23. Building expert teams takes skill and time
      • Expert teams share a clear and common purpose and a strong mission
      • Expert teams share mental models
          • Their members anticipate each other. They can communicate without the need for overt communication
      • They are adaptive
          • They are self-correcting. Their members compensate for each other. They reallocate functions. They engage in a cycle of prebrief-performance-debrief, giving feedback to each other. They establish and revise team goals. They differentiate between high and low priorities. They have mechanisms for anticipating and reviewing issues and problems of members. They periodically review and diagnose team effectiveness and team vitality
      • They have clear (but not overly clear or rigid) roles and responsibilities
          • Members understand their roles and how they fit together
      • They have strong team leadership
          • Led by someone with good leadership, not just technical skills. They have team members who believe the leader cares about them. They provide situation updates. They foster teamwork, coordination and cooperation. They self-correct first
      • They develop a strong sense of "collective"
          • Trust, teamness and confidence are important. They manage conflict well. Members confront each other effectively. They trust each other’s intentions
      • They optimize performance outcomes
          • They make fewer errors. They communicate often enough, ensuring members have the information to be able to contribute. They make better decisions
      • They cooperate and coordinate
          • They identify team task work requirements. They ensure, through staffing and development, that the team possesses the right mix of competencies. They consciously integrate new members. They distribute and assign work thoughtfully. They examine and adjust the physical workspace to optimize communication and coordination
  • 24. Other consumption ideas
      • Examine decision-making within the enterprise. Find bottlenecks to faster decisions. Draw a new line separating central from local agency. Let projects proceed with light/fast approval, with follow-up and audit later
      • More rapid or time-boxed decision making. Use agile approaches, minimum viable products and incremental releases
      • Reward spontaneous collaboration. Design committees, teams and units based on collaboration IQ rather than representativeness
      • Automate more decisions, starting with the mundane or risk-free
      • Define new roles with complementary analysis and application skills. Hire more generalists with excellent critical thinking
  • 25. CEO imperative
      • Designing an organization that can take advantage of dark data is very difficult. It is a CEO problem
      • The challenge has many layers
          • Understanding where to strategically apply dark data findings, how to compete on analytics
          • Ascertaining organizational and infrastructure readiness
          • Establishing executive and employee incentive models that help
          • Managing and monitoring progress at the technical, individual, team and enterprise level
          • Enforcing evidence-based decision making and changing the culture
          • Designing the models to be used throughout the enterprise
      • CIOs can play a strong role, but the CEO, IMHO, has to own this
  • 26. CEO Advisory Engagement
      1. Strategic possibilities
          • Examine the firm’s business model, value-creating activities
          • Identify areas where analytics and data may help, through ideation sessions
      2. Dark data inventory
          • Document the data assets across the enterprise
          • Categorize and rank by quality and availability
      3. Value network assessment
          • Evaluate the value for upstream and downstream players
          • Identify potential sources, uses for dark data
      4. Economic estimates
          • Identify use cases, evaluate potential benefit and risks
          • Prioritize opportunities
      5. Organizational development and change management
          • Identify culture issues, skill gaps, org structure changes, incentives, additional resources needed, communications approach, timelines and sequencing
  • 27. Summary
      • Information is redefining humanity in ways we still don’t understand. The future is not certain. It will be written by winners
      • Economic growth depends on rates of innovation. Innovation depends on new insights which come chiefly from data
      • Data is growing exponentially. Human ability to process it is not. Thus, dark data is growing exponentially too
      • Firms differ widely in their [in]ability to mine data for information (production) and apply information in decisions (consumption)
      • A [largely, partially] semi-automated analytic discovery and refining capability is imminent
      • Winners will find new ways of organizing themselves and their ecosystems to gain advantage, speeding up timeframes
  • 28. Questions? “Get your facts first, and then you can distort them as you please.” -Mark Twain
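
The growth argument in the summary (data grows exponentially, human ability to process it does not, so dark data grows too) can be illustrated with a small sketch. This is a toy model and not something taken from the slides: the 40% annual data growth, the linearly growing processing capacity, the starting values, and the function name `dark_data_series` are all illustrative assumptions.

```python
def dark_data_series(years, initial_data=1.0, growth_rate=0.4,
                     initial_capacity=0.5, capacity_step=0.1):
    """Return (year, total, processed, dark) tuples for a toy growth model.

    All parameters are hypothetical: total data compounds at `growth_rate`
    per year, while processing capacity rises by a fixed `capacity_step`.
    """
    rows = []
    for year in range(years + 1):
        total = initial_data * (1 + growth_rate) ** year    # exponential growth
        capacity = initial_capacity + capacity_step * year  # linear growth
        processed = min(total, capacity)  # cannot process more data than exists
        rows.append((year, total, processed, total - processed))
    return rows


if __name__ == "__main__":
    for year, total, processed, dark in dark_data_series(10):
        print(f"year {year:2d}  total {total:7.2f}  "
              f"processed {processed:5.2f}  dark share {dark / total:6.1%}")
```

Under these assumed numbers the unprocessed share rises every year, because a compounding quantity eventually outruns any quantity that grows by a fixed step. Different assumed rates change how quickly that happens, not whether it does.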