Big Data, Democratized Analytics and Deep Context will Change How We
                        Think and Do Development
      Aniket Bhushan, Senior Researcher, The North-South Institute, abhushan@nsi-ins.ca
                                       Dec 21, 2011

International development as a field of research and practice has been more of a laggard than a
leader in using big data and powerful analytics. Much of the data is dated or of poor quality.
Huge areas, including those of the greatest interest, remain entirely unmapped, data-poor or
otherwise poorly understood.

This situation is changing faster than anyone predicted, and the set of tools driving this
evolution represents one of the most important trends in international development. The
proliferation of mobile technologies and computing power, and the democratization of analytics
under an open-source, open-data umbrella, will fundamentally change the way we think about and do
development. I provide a synopsis of the most impactful developments in three key areas: the base
(data) layer, the analytics layer, and the feedback (data) layer. Together these advances are
changing how we think about data in development, how we develop a deeper contextual understanding
at a very micro level without losing the ability to aggregate and generalize, and how we bring it
all together in a meaningful way to make better decisions.




[Figure: the three layers. Base layer (big and open): international institutions (World Bank,
IMF, UN Data); public sector data at various levels; private sector (proprietary). Analytics
layer: visualization, virtualization. Feedback layer ("crowds"): purposive (push); anonymous
(big).]

The Base Layer: Open Data and Big Data

How do we know what we know in the field of international development? What is the information
or evidence base, who generates it, and how? There are at least three main collectors, sorters
and repositories of development information: international institutions (UN, World Bank, IMF
etc.), national and sub-national official public sector institutions, and the private sector.
Change is afoot in each.

Take for instance the open data push by international institutions such as the World Bank and
the African Development Bank, or the UN’s Global Pulse initiative. Opening up the World Bank’s
databank has made a huge amount of information available to a wider range of stakeholders than
ever before. Similarly, Global Pulse is creating a platform to harness new data streams. Groups
such as Development Gateway and Aid Data have seized the opportunity to push openness further. A
good example of what this opening makes possible is a tool like Development Loop, which not only
plots all World Bank and African Development Bank projects at precise geographic locations across
Africa but also overlays them with feedback sourced from the intended beneficiaries of each
project or initiative.

This full circle or loop is a powerful reminder of the importance of data transparency and
universal standards. Opening up aid data in a standardized format will make geocoding a potent
tool for real transparency and accountability.

To appreciate what a game-changer open data can be, examine the current state of affairs.
Research conducted recently in Uganda by UK-based Publish What You Fund, aimed at simply tallying
up the financial resources available for development, found that the government was unaware of
the amount donors planned to spend in that year (2006-07). The planned expenditure was more than
double what the government was aware of; indeed, financial resources flowing into the country
were far higher than had been estimated. Or take another example, what the World Bank’s Chief
Economist for Africa calls a “statistical tragedy”. The majority of Africa’s population lives in
countries that still use an outdated (1960s) method of national income accounting to generate
fundamental data points such as Gross Domestic Product (GDP). Ghana, for instance, only shifted
to the 1993 UN system of national accounts last year. When it did so, it found its GDP was 62
percent higher than previously thought, catapulting the country to ‘middle income’ status.

If even the most basic information often taken for granted is riddled with problems then what do
we really know? Data reliability is one issue but time-lag is another. Most of the data used in
international development is stale by the time it is called upon in decision making. The
information base we rely on in international development needs to be bolstered by building
bridges with new sources and data-streams.

Opening up proprietary private sector data and exposing it to the concerns of international
development will be a game changer in the coming years. To date international institutions have
made the most progress towards data openness, and select public sector authorities (for instance
under the purview of the Open Government Partnership) are also making progress, but it is the
private sector – the main repository of “big data” – that is the holy grail. If you total all the
data collected by the US Library of Congress (one of the largest public sector repositories) it
would be about 235 terabytes as of April 2011. Wal-Mart processes and stores about 2500 terabytes
per hour! The big data revolution is changing business models fundamentally. Like capital and
labour, data itself has become a commercial driver, and firms like Google, Facebook and Twitter
are the first “data factories”, pioneers much like their antecedents in the industrial revolution.

This is truly big data, and its growth has exploded off the charts thanks in large part to the
explosion of mobile sensors (the most important of which is the mobile phone, but think also of
credit cards, laptops, GPS and everything from radio-frequency tags to QR codes) and the rapid
democratization of high-power analytics. Whether it is geocoded mobile phone data modeled to
track slum development, predict microfinance loan defaults or provide weather-indexed insurance
to small farmers, big data is already a game-changer in development, and we have only begun to
scratch the surface.

The Analytics Layer: Virtualization and Visualization Are Driving Democratization

Analytics is simply the collection of tools and techniques used to make sense of data. High-
power analytics proved a game-changer in the commercial sector and can now do the same in the
social sector. At its core analytics is about unearthing and understanding relationships and
patterns. Analytics helped retailers discover unlikely trends, most famously that customers who
came in to buy diapers also tended to buy beer! It can do the same for complex social systems.
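
Patterns like the diapers-and-beer example are usually expressed through simple association
measures. The following is a minimal sketch, assuming a handful of invented shopping baskets; the
data and the function name `association_stats` are illustrative only and do not come from any
retailer. It computes support (how often the two items appear together), confidence (how often
beer appears given diapers) and lift (how much more often than chance).

```python
from typing import Iterable, Set, Tuple


def association_stats(baskets: Iterable[Set[str]], a: str, b: str) -> Tuple[float, float, float]:
    """Return (support, confidence, lift) for the rule 'a implies b'."""
    baskets = list(baskets)
    n = len(baskets)
    n_a = sum(1 for basket in baskets if a in basket)
    n_b = sum(1 for basket in baskets if b in basket)
    n_ab = sum(1 for basket in baskets if a in basket and b in basket)
    if n == 0 or n_a == 0 or n_b == 0:
        raise ValueError("both items must appear in at least one basket")
    support = n_ab / n              # share of baskets containing both items
    confidence = n_ab / n_a         # share of 'a' baskets that also contain 'b'
    lift = confidence / (n_b / n)   # above 1 means 'b' is more likely when 'a' is present
    return support, confidence, lift


# Hypothetical baskets, invented for illustration only.
BASKETS = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"beer", "chips"},
    {"milk", "bread"},
    {"diapers", "beer", "chips"},
]

support, confidence, lift = association_stats(BASKETS, "diapers", "beer")
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 only signals co-occurrence; it says nothing about why the two items are bought
together.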


Developments in analytics have kept pace with the speed with which big data has grown.
Bringing this capacity to bear on development challenges such as food security and urbanization
is just getting started.

Before we look at some examples, let us put in perspective what we mean by the explosion of
mobile sensors. The mobile phone is already the most celebrated example; anyone who has visited
Kenya (or really any African country) has seen vividly the power of mobile money. This is old
news. Anyone who has followed elections in Kenya has probably also heard of Ushahidi, a locally
developed open-source platform to map reports of post-election violence, which went live in 2008.
This is also old news.

What is new is what is now made possible by bringing the requisite analytics to bear on the huge
proliferation of mobile sensors. Mobile phones have grown from under 750 million, with more than
two thirds in developed countries, to over 5 billion, with about four times as many in developing
countries as in the developed world. Of the 5 billion, about 1 billion belong to people who live
on less than $5 a day. The developing world is the leading driver of mobile big data. This
includes voice, text, financial, locational and positional information, which can now be overlaid
on the base data layer described earlier (income, health, education and other indicators
generated by official sources) to produce new insights into real behaviour and complex incentive
structures.
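
At its simplest, overlaying mobile data on the base layer is a join on a shared geographic key.
The sketch below assumes invented call records and invented population figures keyed by
hypothetical district codes; the names `CALL_RECORDS`, `CENSUS` and `overlay` are assumptions for
illustration, not any operator's or statistical office's format.

```python
from collections import defaultdict

# Hypothetical anonymized call records: (district code, call minutes).
CALL_RECORDS = [
    ("district_a", 12.0), ("district_a", 3.5), ("district_b", 7.0),
    ("district_b", 1.0), ("district_b", 4.0), ("district_c", 9.0),
]

# Hypothetical official indicators keyed by the same district codes.
CENSUS = {
    "district_a": {"population": 1000},
    "district_b": {"population": 4000},
}


def overlay(call_records, census):
    """Aggregate call minutes per district and join them to census indicators."""
    totals = defaultdict(float)
    for district, minutes in call_records:
        totals[district] += minutes

    merged = {}
    for district, minutes in totals.items():
        official = census.get(district)
        if official is None:
            continue  # no base-layer data for this district, so it cannot be overlaid
        merged[district] = {
            "call_minutes": minutes,
            "population": official["population"],
            "minutes_per_1000_people": 1000 * minutes / official["population"],
        }
    return merged


merged = overlay(CALL_RECORDS, CENSUS)
for district, row in sorted(merged.items()):
    print(district, row)
```

Districts that appear in only one source drop out of the result, which is itself a useful
reminder of where the base layer is thin.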

Sticking with Kenya, take the example of the Engineering Social Systems (ESS) lab. Coupling
terabytes of mobile phone data with Kenyan census information, ESS is modeling the growth of
slums to inform urban planners about where to locate services such as water pumps and public
toilets. In Uganda the same group is developing causal structures of food security. In Rwanda
they collected a sample of every phone call over a four-year period, coupled with a random
survey, to analyze how different people react to the same economic shock. What is really
interesting is the way experimental initiatives are being brought out of the lab into real-world
application. The shift that is taking place is fundamental: away from models governing theory to
models informed by and built on real networks (see Reality Mining).

For a long time development analysis has at best been limited to correlations and inferences
based on correlations. For the first time, big data coupled with high-power analytics is opening
up the possibility, if not of entirely causal dynamics, then at least of more robust inferences.
Our traditional methods of inquiry have conditioned us to think in terms of generalizing on the
basis of random sampling. For the first time, the proliferation of mobile sensors is making
possible highly targeted yet nonintrusive and anonymous inquiry.

And this is not another story about mobile phones alone. The point is the rapid emergence of
altogether new data-streams, in step with the development of analytical capacity to draw useful
inference out of them. Take Twitter, for instance, which in two weeks generates a volume of
information about the size of the entire US Library of Congress and, together with Facebook, has
already shown its efficacy during the Arab uprisings. At the heart of this evolution are
open-source software systems and tools that allow the simultaneous collection, categorization,
and analysis of various data types, from Twitter hashtags to videos to positional data and
machine IDs. Swift River, developed by Ushahidi, is an example of a free open-source platform
that enables rapid simultaneous filtering and verification of real-time data from channels like
Twitter, SMS, email and others. It also visualizes the information in dashboards that the average
user can understand. This is particularly powerful for monitoring immediate post-crisis
developments, when the information flow suddenly increases but is only useful if analyzed
immediately.

Democratization of analytics, driven by a commitment to open source, is furthered by
virtualization of platforms and visualization of information (to make it engaging for the average
user). An aspect of virtualization is community or crowd-driven problem solving. Take for
instance Data Without Borders (DWB), a pro-bono data scientist exchange. DWB organizes ‘data
dives’ to help NGOs, civil society organizations and others, who might not have the time,
capacity or inclination but may be sitting on information useful for purposes beyond their
imagination, to make sense of their own information. At a recent data dive DWB helped a human
rights group that allows users to anonymously upload information about violations get a better
look at who was using the system and plot trends, without compromising anonymity. Using
open-source tools (such as R) they also visualized the information to make it easy to understand.

Here are some tools in a fast-growing toolkit worth following:

       Social network analysis: with the growing penetration of web 2.0 technologies, social
       media is becoming the dominant communication channel for rapid exchange. Network
       analysis includes a set of techniques used to characterize relationships among discrete
       nodes in a graph or a network. In social network analysis, connections between
       individuals in a community or organization are analyzed, e.g., how information travels, or
       who has the most influence over whom. Examples of applications include identifying key
       opinion leaders, and identifying bottlenecks in enterprise information flows. Two
       important techniques within network theory and analysis are exploratory data analysis
       (EDA) and link analysis. EDA is an approach for analyzing large datasets in summarized
       formats according to their main characteristics. EDA came about in part as a reaction to
       undue emphasis in statistical fields on hypothesis testing or “confirmatory analysis”.
       EDA emphasizes using the data to suggest hypotheses to test, and testing the
       assumptions on which inference is based. Similarly, link analysis focuses on analyzing
       relationships among nodes through visualization methods, including network diagrams
       and association matrices. Gephi is an interactive open-source (Java-based) platform for
       complex systems and network analyses.
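
One of the simplest ways to look for "key opinion leaders" in a network is degree centrality: the
share of other nodes each node is directly connected to. The sketch below is a minimal
illustration on an invented five-person network; the names, edges and the function
`degree_centrality` are assumptions for illustration and are not tied to Gephi or any dataset
mentioned here.

```python
from collections import defaultdict

# Hypothetical undirected "who talks to whom" edges, invented for illustration.
EDGES = [
    ("amina", "ben"),
    ("amina", "chidi"),
    ("amina", "dede"),
    ("ben", "chidi"),
    ("dede", "efua"),
]


def degree_centrality(edges):
    """Return each node's degree divided by the number of other nodes."""
    neighbours = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        neighbours[u].add(v)
        neighbours[v].add(u)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(adjacent) / (n - 1) for node, adjacent in neighbours.items()}


centrality = degree_centrality(EDGES)
most_connected = max(centrality, key=centrality.get)
print(most_connected, centrality[most_connected])
```

Degree is only one notion of influence; measures such as betweenness would be needed to find the
bottlenecks in information flow mentioned above.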

       Automated web-scraping: almost everything is on the web today, which means almost
       everything can be reached through a web (HTTP) address. Web scraping is an automated
       technique for collecting information from the web. Web scraping transforms typically
       unstructured data in HTML format into structured data that can be centralized in a
       spreadsheet. Simply using MS Excel it is possible, for instance, to scrape tables and
       other data in a variety of unstructured formats (including real-time data, e.g. weather
       information on a world clock, or stock prices), and save them as a spreadsheet for
       further use. Web scraping, though somewhat mired in legal and privacy issues, has the
       potential to massively increase the volume of accessible information. For example UN
       Global Pulse is running a project with Price Stats and the Billion Prices Project at MIT,
       investigating daily bread prices across Latin American countries by scraping the web for
       online prices. The end product is a demonstration, an e-bread index which tracks bread
       price inflation in near real time (daily), and can be compared and contrasted with, or
       used to complement, the traditional consumer price index (which is only published on a
       monthly basis). UN Global Pulse is also pioneering Hunch Works, the first social network
       for hypothesis formation, evidence gathering and decision making. Hunch Works allows
       researchers to connect with other experts with complementary resources so that together
       they can quickly determine if data signals are indications of deepening crisis and
       warrant further investigation.
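
The step from unstructured HTML to structured rows, and from rows to a simple daily index, can be
sketched with the Python standard library alone. Everything below is an assumption for
illustration: the HTML snippet, the prices and the names `TableParser`, `scrape_prices` and
`price_index` are invented, and this is not the method used by Global Pulse or the Billion Prices
Project. A real scraper would also need to fetch pages over the network and respect each site's
terms of use.

```python
from html.parser import HTMLParser

# Hypothetical page fragment with invented prices.
SAMPLE_HTML = """
<table>
  <tr><th>date</th><th>price</th></tr>
  <tr><td>2011-12-01</td><td>2.00</td></tr>
  <tr><td>2011-12-02</td><td>2.10</td></tr>
  <tr><td>2011-12-03</td><td>2.20</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append([cell.strip() for cell in self._row])
            self._row = None


def scrape_prices(html):
    """Turn the first row into column names and the rest into records."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


def price_index(records, base=100.0):
    """Express each day's price relative to the first day (first day = base)."""
    first = float(records[0]["price"])
    return [(r["date"], round(base * float(r["price"]) / first, 1)) for r in records]


records = scrape_prices(SAMPLE_HTML)
index = price_index(records)
for day, value in index:
    print(day, value)
```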


       World Bank – ADePT software platform: ADePT is a free (Stata-based) platform
       developed by the World Bank that automates and standardizes analysis. ADePT allows
       complex statistical analyses, and direct pre-configured access to a range of micro-level
       data from the Bank’s and other sources. It is particularly useful for economic analysis,
       especially in developing countries where researchers may not have ready access
       to otherwise expensive statistical packages.

       Hadoop – distributed parallel analytics: if big data is the latest buzzword in IT then
       Hadoop is a large part of the story behind it. Hadoop is a platform for distributing
       problems, tasks and analyses across a number of servers, speeding up analysis and
       shrinking the distance between data, analysis and result. Hadoop works behind things you
       know well, although you have never seen it and are unlikely to know it is there. For
       example, one of the most well-known implementations is at Facebook, which brings core
       data stored by you into Hadoop clusters, where it is compared against your friends and
       their interests to suggest recommendations back to you. High-power distributed parallel
       analytics solutions like Hadoop help square the big data circle and make it small, in
       real time.
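
Hadoop is best known for the MapReduce pattern: split the data into partitions, process each
partition independently (map), group the intermediate results by key (shuffle), and combine each
group (reduce). The sketch below imitates that pattern in a single Python process, on invented
text reports, to show the shape of the computation; it does not use Hadoop itself, and the names
`PARTITIONS`, `map_phase`, `shuffle` and `reduce_phase` are illustrative assumptions.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical text reports, already split into two partitions. On a real
# cluster each partition would live on, and be processed by, a different server.
PARTITIONS = [
    ["water pump broken", "water shortage"],
    ["road blocked", "water pump repaired"],
]


def map_phase(partition):
    """Emit a (word, 1) pair for every word in one partition."""
    return [(word, 1) for line in partition for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Combine the values for each key; here, by summing the counts."""
    return {key: sum(values) for key, values in grouped.items()}


# Each map call is independent of the others, which is what makes the work
# easy to spread across many machines.
mapped = [map_phase(partition) for partition in PARTITIONS]
word_counts = reduce_phase(shuffle(chain.from_iterable(mapped)))
print(word_counts)
```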

The Feedback Layer: Deep Context, Complex Microsystems, Real-Time Loops

The efficacy of the feedback layer is also new. This layer has two key aspects: the purposive or
push-driven response and the big anonymous response (discussed above). Targeted crowdsourcing has
already come a long way. The Ushahidi experience in Kenya, for instance, also worked for
monitoring elections in Afghanistan and tracking emergencies, including the cholera outbreak that
followed the Haiti earthquake. Mobile phone SMS platforms have been adapted to make participatory
budgeting more inclusive in hard-to-reach areas such as conflict-affected South Kivu in the
Democratic Republic of Congo, and results have been encouraging.

To understand how powerful the feedback layer can be, consider the experience of Mobile Accord,
which, at the initiative of the World Bank’s World Development Report 2011, ran Geo Poll, an
SMS-based targeted poll, in the DRC. The poll asked 10 sensitive questions, including on topics
such as rape and violence against women in conflict zones. The survey produced 1.2 million text
responses, and the outputs were turned into a video, “DRC Speaks”, which captured people’s
responses to questions about their experiences in their own words. This ended up being one of the
largest surveys ever conducted in the country.
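
Turning a large volume of free-text SMS replies into usable results starts with normalizing and
tallying them. The following is a minimal sketch under stated assumptions: the replies, question
codes and the `ACCEPTED` mapping are invented, and nothing here reflects how Geo Poll actually
processed its responses.

```python
from collections import Counter, defaultdict

# Hypothetical (question code, raw SMS text) pairs, invented for illustration.
RESPONSES = [
    ("q1", " Yes "), ("q1", "no"), ("q1", "YES"),
    ("q1", "oui"), ("q2", "no"), ("q2", "maybe?"),
]

# Assumed mapping from accepted spellings to canonical answers.
ACCEPTED = {"yes": "yes", "y": "yes", "oui": "yes", "no": "no", "n": "no", "non": "no"}


def tally(responses):
    """Count canonical answers per question; unrecognized replies go to 'other'."""
    counts = defaultdict(Counter)
    for question, text in responses:
        answer = ACCEPTED.get(text.strip().lower(), "other")
        counts[question][answer] += 1
    return counts


counts = tally(RESPONSES)
for question in sorted(counts):
    print(question, dict(counts[question]))
```

Keeping an explicit "other" bucket matters: replies that do not fit the expected answers are
often where people describe their experience in their own words.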




There is a pattern here. In the base layer more and more (new and old) data is opening every day.
In the analytics layer experimental ideas are leaving the lab for real world application;
virtualization and visualization are helping foster new communities geared towards collaboration
and collective problem solving. Similarly in the feedback layer the tools are also democratizing.
Ushahidi has created a very easy to use version of their implementation called CrowdMap.
Anyone who knows how to set up an email account will be able to set up their own incident
mapping of whatever trend, alert or issue they are interested in getting feedback from the crowd
on. The service is already being used to track everything from citizen report cards assessing
corruption in India to the Syrian uprising to national emergencies on the tiny island of Samoa.
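
Incident mapping of this kind rests on a simple operation: grouping geotagged reports into map
cells so that clusters stand out. The sketch below assumes invented latitude and longitude pairs
and a hypothetical grid size; `REPORTS`, `grid_cell` and `count_by_cell` are illustrative names
and do not describe CrowdMap's internals.

```python
import math
from collections import Counter

# Hypothetical (latitude, longitude) pairs for crowd-submitted reports.
REPORTS = [
    (-1.2921, 36.8219),
    (-1.2950, 36.8240),
    (-1.3100, 36.8010),
    (-4.0435, 39.6682),
]


def grid_cell(lat, lon, cell_size_deg=0.05):
    """Map a coordinate to the integer index of the grid cell that contains it."""
    return (math.floor(lat / cell_size_deg), math.floor(lon / cell_size_deg))


def count_by_cell(reports, cell_size_deg=0.05):
    """Count how many reports fall into each grid cell."""
    return Counter(grid_cell(lat, lon, cell_size_deg) for lat, lon in reports)


cell_counts = count_by_cell(REPORTS)
for cell, count in cell_counts.most_common():
    print(cell, count)
```

A fixed grid in degrees is the crudest possible choice; cells shrink in ground distance away from
the equator, so a production map would use a proper spatial index or projection.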

The feedback layer is also innovating in reverse – backwards from new to old media and formats
– something essential in international development, where so much remains unmapped or in
non-digital formats. Take mapping. In the developing world there are still vast areas beyond GPS
coverage. But people in those communities have intimate knowledge of their surroundings. If only
there were a way to tap into that knowledge and build a bridge between that data source and an
open-source digital resource like Open Street Map. Walking Papers does precisely that.
Contributors can simply draw by hand, on plain paper, say a map of their neighbourhood, and
upload it to Walking Papers, where a community specializes in taking that information and using
it to deepen Open Street Map.

Looking Ahead

International development as a field of practice and research has tended to be a laggard in using
big data, powerful analytics and innovative sourcing techniques, and with good reason. However,
this is changing faster than anyone predicted and faster than most organizations that ‘do
development’ are prepared for. The open data movement has widened access to a broad range of
basic contextual information. A similar push is needed to open private sector data in the service
of social good. Big data is beginning to have a big impact on how we think about development
challenges, because we now have the ability to make it understandable like never before. Powerful
analytical tools and collaborative platforms are dramatically changing what is possible for even
the most intractable challenges, like understanding socioeconomic risks and responses, dealing
with urban planning, and better preparing for emergencies. For the first time we have a feedback
layer which has made possible deep and near real-time awareness of what is working or not
working, where and why. Together, big data, democratized analytics and the ability to tap deep
contexts will change the way we think about and do development in the coming years.




Bibliography
Aid Data, Development Gateway, African Development Bank - Development Loop. Development Loop. n.d.
          http://www.aiddata.org/content/index/Maps/development-loop-app (accessed Dec 21, 2011).

Courtney, Alexa, and David Kilcullen. "Big data, small wars, local insights: Designing for development with conflict-affected
         communities." What Matters. McKinsey & Company, December 2, 2011.

Crowd Map. n.d. https://crowdmap.com/mhi.

Data Without Borders. n.d. http://datawithoutborders.cc/.

Davis, Steve, and Jonathan Bays. "Harnessing big data to address the world’s problems." What Matters. McKinsey & Company,
          November 2, 2011.

DRC Speaks (Geo Poll). n.d. http://www.youtube.com/watch?v=1VtaWPAtHyA.

Eagle, Nathan. Reality Mining. Dataset: http://reality.media.mit.edu/, Massachusetts: Massachusetts Institute of Technology,
          2009.

Engineering Social Systems. "Big Data for Social Good." Engineering Social Systems. n.d. http://ess.santafe.edu/bigdata.html
         (accessed December 21, 2011).

Jay Chen, Trishank Karthik, Lakshminaryanan Subramanian. Contextual Information Portals. Online: http://ai-d.org/, Association
          for the Advancement of Artificial Intelligence, n.d.

McKinsey Global Institute. Big data: The next frontier for innovation, competition, and productivity. McKinsey & Company, 2011.

Metha, Abhishek. Big Data: Powering the Next Industrial Revolution. White paper:
         http://www.tableausoftware.com/learn/whitepapers/big-data-revolution, Seattle: Tableau Software, n.d.

Priebatsch, Seth. The game layer on top of the world. Video online at:
          http://www.ted.com/talks/lang/en/seth_priebatsch_the_game_layer_on_top_of_the_world.html, Boston: Ted Talks,
          2010.

Publish What You Fund. "Aid Budgets in Uganda." Publish what you fund. n.d.
         http://www.publishwhatyoufund.org/resources/uganda/ (accessed December 21, 2011).

Shantayanan Devarajan. "Africa’s statistical tragedy." Africa Can End Poverty (blog of World Bank, Africa Chief Economist).
         October 6, 2011. http://blogs.worldbank.org/africacan/africa-s-statistical-tragedy (accessed October 6, 2011).

Swift River. n.d. http://ushahidi.com/products/swiftriver-platform.

UN Global Pulse. n.d. http://www.unglobalpulse.org/.

Ushahidi. n.d. http://ushahidi.com/.

Walking Papers. n.d. http://walking-papers.org/.




23 ijcse-01238-1indhunishaShivlal Mewada
 
PSFK Future Of Real-Time Information
PSFK Future Of Real-Time InformationPSFK Future Of Real-Time Information
PSFK Future Of Real-Time InformationPSFK
 
Global Pulse Magazine - Fall 2011
Global Pulse Magazine - Fall 2011Global Pulse Magazine - Fall 2011
Global Pulse Magazine - Fall 2011UN Global Pulse
 
Global pulse technology summary
Global pulse technology summaryGlobal pulse technology summary
Global pulse technology summarySara-Jayne Terp
 
WEF - Personal Data New Asset Report2011
WEF - Personal Data New Asset Report2011WEF - Personal Data New Asset Report2011
WEF - Personal Data New Asset Report2011Vincent Ducrey
 
Challenges facing Information and Records Management Professionals
Challenges facing Information and Records Management  ProfessionalsChallenges facing Information and Records Management  Professionals
Challenges facing Information and Records Management ProfessionalsCollabor8now Ltd
 
Big data for the next generation of event companies
Big data for the next generation of event companiesBig data for the next generation of event companies
Big data for the next generation of event companiesRaj Anand
 

Similar a Big data, democratized analytics and deep context, (20)

Big Data for Development: Opportunities and Challenges, Summary Slidedeck
Big Data for Development: Opportunities and Challenges, Summary SlidedeckBig Data for Development: Opportunities and Challenges, Summary Slidedeck
Big Data for Development: Opportunities and Challenges, Summary Slidedeck
 
Big Data For Development A Primer
Big Data For Development A PrimerBig Data For Development A Primer
Big Data For Development A Primer
 
The future of real time information
The future of real time informationThe future of real time information
The future of real time information
 
Intuit 2020 Report: The New Data Democracy
Intuit 2020 Report: The New Data DemocracyIntuit 2020 Report: The New Data Democracy
Intuit 2020 Report: The New Data Democracy
 
Guidance for Incorporating Big Data into Humanitarian Operations - 2015 - web...
Guidance for Incorporating Big Data into Humanitarian Operations - 2015 - web...Guidance for Incorporating Big Data into Humanitarian Operations - 2015 - web...
Guidance for Incorporating Big Data into Humanitarian Operations - 2015 - web...
 
Opportunities and Challenges in Crisis Informatics
Opportunities and Challenges in Crisis InformaticsOpportunities and Challenges in Crisis Informatics
Opportunities and Challenges in Crisis Informatics
 
Big Data-Job 2
Big Data-Job 2Big Data-Job 2
Big Data-Job 2
 
data, big data, open data
data, big data, open datadata, big data, open data
data, big data, open data
 
Global pulse technology
Global pulse technologyGlobal pulse technology
Global pulse technology
 
Big data Paper
Big data PaperBig data Paper
Big data Paper
 
Big data and development
Big data and developmentBig data and development
Big data and development
 
Future trends jan12 final
Future trends jan12 finalFuture trends jan12 final
Future trends jan12 final
 
Big data for development
Big data for development Big data for development
Big data for development
 
23 ijcse-01238-1indhunisha
23 ijcse-01238-1indhunisha23 ijcse-01238-1indhunisha
23 ijcse-01238-1indhunisha
 
PSFK Future Of Real-Time Information
PSFK Future Of Real-Time InformationPSFK Future Of Real-Time Information
PSFK Future Of Real-Time Information
 
Global Pulse Magazine - Fall 2011
Global Pulse Magazine - Fall 2011Global Pulse Magazine - Fall 2011
Global Pulse Magazine - Fall 2011
 
Global pulse technology summary
Global pulse technology summaryGlobal pulse technology summary
Global pulse technology summary
 
WEF - Personal Data New Asset Report2011
WEF - Personal Data New Asset Report2011WEF - Personal Data New Asset Report2011
WEF - Personal Data New Asset Report2011
 
Challenges facing Information and Records Management Professionals
Challenges facing Information and Records Management  ProfessionalsChallenges facing Information and Records Management  Professionals
Challenges facing Information and Records Management Professionals
 
Big data for the next generation of event companies
Big data for the next generation of event companiesBig data for the next generation of event companies
Big data for the next generation of event companies
 

Big data, democratized analytics and deep context,

The Base Layer: Open Data and Big Data

How do we know what we know in the field of international development? What is the information, or evidence base, and who generates it and how? There are at least three main collectors, sorters and repositories of development information: international institutions (UN, World Bank, IMF etc.), national and sub-national official public sector institutions, and the private sector. Change is afoot in each.

Take for instance the open data push by international institutions such as the World Bank and African Development Bank, or the UN’s Global Pulse initiative. Opening up the World Bank’s databank has made a huge amount of information available to a wider range of stakeholders than ever before; similarly, Global Pulse is creating a platform to harness new data streams. Groups such as Development Gateway and Aid Data have seized the opportunity to push openness further. A good example of what this opening makes possible is a tool like Development Loop, which not only plots all World Bank and African Development Bank projects at precise geographic locations across Africa but also overlays them with feedback sourced from the intended beneficiaries of each project or initiative. This full circle, or loop, is a powerful reminder of the importance of data transparency and universal standards. Opening up aid data in a standardized format will make geocoding a potent tool for real transparency and accountability.

To appreciate what a game-changer open data can be, examine the current state of affairs. Recent research in Uganda by UK-based Publish What You Fund, aimed simply at tallying up the financial resources available for development, found that the government was unaware of the amount donors planned to spend in that year (2006-07). Planned expenditure was more than double what the government was aware of; financial resources flowing into the country were far higher than had been estimated.

Or take another example, what the World Bank’s Chief Economist for Africa calls a “statistical tragedy”. The majority of Africa’s population lives in countries that still use an outdated (1960s) method of national income accounting to generate fundamental data points such as Gross Domestic Product (GDP). Ghana, for instance, only shifted to the 1993 UN system of national accounts last year. When it did so, it found that its GDP was 62 percent higher than previously thought, catapulting the country to ‘middle income’ status. If even the most basic information, often taken for granted, is riddled with problems, then what do we really know?

Data reliability is one issue, but time-lag is another. Most of the data used in international development is stale by the time it is called upon in decision making. The information base we rely on in international development needs to be bolstered by building bridges with new sources and data-streams.

Opening up proprietary private sector data and exposing it to the concerns of international development will be a game changer in the coming years. To date, international institutions have made the most progress towards data openness, and select public sector authorities (for instance under the purview of the Open Government Partnership) are also making progress, but it is the private sector, the main repository of “big data”, that is the holy grail. If you total all the data collected by the US Library of Congress (one of the largest public sector repositories) it would be about 235 terabytes as of April 2011. Wal-Mart processes and stores about 2500 terabytes per hour! The big data revolution is fundamentally changing business models. Like capital and labour, data itself has become a commercial driver, and firms like Google, Facebook and Twitter are the first “data factories”, pioneers much like their antecedents in the industrial revolution.

This is truly big data, and its growth has exploded off the charts thanks in large part to the explosion of mobile sensors (the most important of which is the mobile phone, but think also of credit cards, laptops, GPS and everything from radio-frequency tags to QR codes) and the rapid democratization of high-power analytics.
Whether it is geocoded mobile phone data modeled to track slum development, to predict microfinance loan defaults or to provide weather-indexed insurance to small farmers, big data is already a game-changer in development, and we have only begun to scratch the surface.

The Analytics Layer: Virtualization and Visualization are Driving Democratization

Analytics is simply the collection of tools and techniques used to make sense of data. High-power analytics proved a game-changer in the commercial sector and can now do the same in the social sector. At its core, analytics is about unearthing and understanding relationships and patterns. Analytics helped retailers discover unlikely trends, most famously that customers who came in to buy diapers also tended to buy beer! It can do the same for complex social systems.
Developments in analytics have kept pace with the speed at which big data has grown. Bringing this capacity to bear on development challenges such as food security and urbanization is just getting started. Before we look at some examples, let us put in perspective what we mean by the explosion of mobile sensors.

The mobile phone is already the most celebrated example; anyone who has visited Kenya (or really any African country) has seen vividly the power of mobile money. This is old news. Anyone who has followed elections in Kenya has probably also heard of Ushahidi, a locally developed open-source platform to map reports of post-election violence, which went live in 2008. This is also old news.

What is new is what is now made possible by bringing the requisite analytics to bear on the huge proliferation of mobile sensors. Mobile phones have grown from under 750 million, with more than two thirds in developed countries, to over 5 billion, with about four times as many in developing countries as in the developed world. Of the 5 billion, about 1 billion are in the hands of people who live on less than $5 a day. The developing world is the leading driver of mobile big data. This includes voice, text, financial, locational and positional information, which can now be overlaid on the base data layer described earlier (income, health, education and other indicators generated by official sources) to produce new insights into real behaviour and complex incentive structures.

Sticking with Kenya, take the example of the Engineering Social Systems (ESS) lab. Coupling terabytes of mobile phone data with Kenyan census information, ESS is modeling the growth of slums to inform urban planners about where to locate services such as water pumps and public toilets. In Uganda the same group is developing causal structures of food security. In Rwanda they collected a sample of every phone call over a four-year period, coupled with a random survey, to analyze how different people react to the same economic shock.
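The overlay of mobile data on official indicators described above can be sketched in a few lines. This is only an illustration under stated assumptions: the district names, call counts, population figures and the `overlay` helper are all made up for the example and do not come from ESS or any real dataset.

```python
from collections import defaultdict

# Hypothetical, made-up records: (district, number of calls) taken from
# anonymized call logs. Real data would be far larger and messier.
call_records = [
    ("District A", 120),
    ("District B", 45),
    ("District A", 80),
    ("District C", 10),
]

# Hypothetical census table keyed by the same district names.
census = {
    "District A": {"population": 1000},
    "District B": {"population": 900},
}


def overlay(call_records, census):
    """Aggregate calls per district, then join them onto the census rows."""
    calls_by_district = defaultdict(int)
    for district, calls in call_records:
        calls_by_district[district] += calls

    merged = {}
    for district, indicators in census.items():
        calls = calls_by_district.get(district, 0)
        merged[district] = {
            **indicators,
            "calls": calls,
            # Normalize by population so districts can be compared.
            "calls_per_1000": 1000 * calls / indicators["population"],
        }
    return merged


merged = overlay(call_records, census)
for district, row in merged.items():
    print(district, row)
```

Districts that appear in the call logs but not in the census table (here "District C") are dropped by this join, which is itself a small reminder of how gaps in the base layer limit what the newer data streams can be compared against.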
What is really interesting is the way experimental initiatives are being brought out of the lab into real world application. The shift that is taking place is fundamental: away from models governed by theory to models informed by and built on real networks (see Reality Mining). For a long time development analysis has at best been limited to correlations and inferences based on correlations. For the first time, big data coupled with high-power analytics is opening up the possibility, if not of entirely causal dynamics, then at least of more robust inferences. Our traditional methods of inquiry have conditioned us to think in terms of generalizing on the basis of random sampling. For the first time, the proliferation of mobile sensors is making possible highly targeted yet nonintrusive and anonymous inquiry.

And this is not another story about mobile phones alone. The point is the rapid emergence of altogether new data-streams, in step with the development of the analytical capacity to draw useful inferences from them. Take Twitter, for instance, which generates a volume of information about the size of the entire US Library of Congress every two weeks and, together with Facebook, has already shown its efficacy during the Arab uprisings. At the heart of this evolution are open-source software systems and tools that allow the simultaneous collection, categorization and analysis of various data types, from Twitter hashtags to videos to positional data and machine IDs.

Swift River, developed by Ushahidi, is an example of a free open-source platform that enables rapid simultaneous filtering and verification of real-time data from channels like Twitter, SMS, email and others. It also visualizes the information in dashboards that the average user can understand. This is particularly powerful for monitoring immediate post-crisis developments, when the information flow suddenly increases but is only useful if analyzed immediately.

Democratization of analytics, driven by a commitment to open source, is furthered by the virtualization of platforms and the visualization of information (to make it engaging for the average user).
An aspect of virtualization is community or crowd driven problem solving. Take for instance Data Without Borders (DWB), a pro-bono data scientist exchange. DWB organizes ‘data dives’ to help NGOs, civil society organizations and others make sense of their own information; these groups may lack the time, capacity or inclination to do so, yet may be sitting on information useful for purposes beyond their imagination. At a recent data dive DWB helped a human rights group, which allows users to anonymously upload information about violations, get a better look at who was using the system and plot trends without compromising anonymity. Using open-source tools (such as R) they also visualized the information to make it easy to understand.

Here are some tools in a fast growing toolkit worth following:

Social network analysis: with the growing penetration of web 2.0 technologies, social media is becoming the dominant communication channel for rapid exchange. Network analysis includes a set of techniques used to characterize relationships among discrete nodes in a graph or a network. In social network analysis, connections between individuals in a community or organization are analyzed, e.g., how information travels, or who has the most influence over whom. Examples of applications include identifying key opinion leaders and identifying bottlenecks in enterprise information flows. Two important techniques within network theory and analysis are exploratory data analysis (EDA) and link analysis. EDA is an approach for analyzing large datasets by summarizing their main characteristics. EDA came about in part as a reaction to an overemphasis in statistical fields on hypothesis testing or “confirmatory analysis”. EDA emphasizes using the data to suggest hypotheses to test, and testing the assumptions on which inference is based. Similarly, link analysis focuses on analyzing relationships among nodes through visualization methods, including network diagrams and association matrices. Gephi is an interactive open-source (Java-based) platform for complex systems and network analyses.

Automated web-scraping: almost everything is on the web today, which means almost everything has a hypertext (HTTP) related identity. Web scraping is an automated technique for collecting information from the web. It transforms typically unstructured data in HTML format into structured data that can be centralized in a spreadsheet. Simply using MS Excel it is possible, for instance, to scrape tables and other data in a variety of unstructured formats (including real-time data, e.g. weather information on a world clock, or stock price information) and save them as a spreadsheet for further use. Web scraping, though somewhat mired in legal and privacy issues, has the potential to massively increase the volume of accessible information. For example, UN Global Pulse is running a project with Price Stats and the Billion Prices Project at MIT, investigating daily bread prices across Latin American countries by scraping the web for online prices. The end product is a demonstration, an e-bread index which tracks bread price inflation in real time (daily) and can be compared with, contrasted with, or used to complement the traditional consumer price index (which is only published on a monthly basis).

UN Global Pulse is also pioneering Hunch Works, the first social network for hypothesis formation, evidence gathering and decision making. Hunch Works allows researchers to connect with other experts who have complementary resources, so that together they can quickly determine whether data signals are indications of a deepening crisis and warrant further investigation.
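The web-scraping step described above, turning an HTML table into structured rows and then into a simple daily price index, can be sketched with Python's standard library alone. This is a minimal sketch, not the method used by Global Pulse or the Billion Prices Project: the sample HTML, the prices in it, and the `TableScraper`, `scrape_table` and `price_index` names are all hypothetical, and a real scraper would fetch live pages and handle far messier markup.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Return the table in `html` as a list of rows (lists of strings)."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


def price_index(prices, base=100.0):
    """Express each price relative to the first one (first day = base)."""
    first = prices[0]
    return [base * price / first for price in prices]


# Made-up sample page standing in for an online price listing.
SAMPLE_HTML = """
<table>
  <tr><th>day</th><th>price</th></tr>
  <tr><td>2011-12-01</td><td>2.00</td></tr>
  <tr><td>2011-12-02</td><td>2.10</td></tr>
  <tr><td>2011-12-03</td><td>2.20</td></tr>
</table>
"""

rows = scrape_table(SAMPLE_HTML)
header, body = rows[0], rows[1:]
prices = [float(price) for _day, price in body]
index = price_index(prices)
for (day, _price), value in zip(body, index):
    print(day, round(value, 1))
```

Once the rows are structured, writing them to a spreadsheet or comparing the daily index with a monthly consumer price index is ordinary tabular work.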
World Bank ADePT software platform: ADePT is a free (Stata-based) platform developed by the World Bank that automates and standardizes analysis. ADePT allows complex statistical analyses and direct pre-configured access to a range of micro level data from the Bank’s and other sources. It is particularly useful for economic analysis, especially in developing countries where researchers may not have ready access to otherwise expensive statistical packages.

Hadoop (distributed parallel analytics): if big data is the latest buzz word in IT, then Hadoop is a large part of the story behind it. Hadoop is a platform for distributing problems, tasks and analyses across a number of servers, speeding up analysis and shrinking the distance between data, analysis and result. Hadoop works behind services you know well, although you have probably never seen it and may not know it is there. For example, one of the most well-known implementations is at Facebook, which brings the core data you have stored into Hadoop clusters, where it is compared against your friends and their interests to suggest recommendations back to you. High-power distributed parallel analytics solutions like Hadoop help square the big data circle: they make it small, in real time.

The Feedback Layer: Deep Context, Complex Microsystems, Real-Time Loops

The efficacy of the feedback layer is also new. This layer has two key aspects: the purposive or push driven response and the big anonymous response (discussed above). Targeted crowdsourcing has already come a long way. The Ushahidi experience in Kenya, for instance, also worked for monitoring elections in Afghanistan and for tracking emergencies, including the cholera outbreak after the Haiti earthquake. Mobile phone SMS platforms have been adapted to make participatory budgeting more inclusive in hard to reach areas such as conflict-affected South Kivu in the Democratic Republic of Congo, and results have been encouraging.
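The divide-and-combine idea behind Hadoop, described above, can be illustrated with a toy map and reduce over short text reports. This is a sketch of the general pattern only and does not use Hadoop or its API: the threads stand in for separate servers, and the messages and function names are made up for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def map_chunk(messages):
    """Map step: count the words in one chunk of messages."""
    counts = Counter()
    for message in messages:
        counts.update(message.lower().split())
    return counts


def reduce_counts(partials):
    """Reduce step: merge the per-chunk counts into one total."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def split(items, n_chunks):
    """Deal the items round-robin into n_chunks lists."""
    return [items[i::n_chunks] for i in range(n_chunks)]


def word_count(messages, workers=2):
    """Count words by mapping over chunks in parallel, then reducing."""
    chunks = split(messages, workers)
    # Each worker thread plays the role of one server in a cluster.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce_counts(partials)


# Made-up text reports of the kind an SMS platform might collect.
messages = [
    "water pump broken",
    "road blocked near market",
    "water shortage near market",
    "pump repaired",
]

totals = word_count(messages)
print(totals.most_common(4))
```

Because each chunk is counted independently and the partial counts are simply added together, the result is the same as counting everything on one machine, while the work can be spread over as many workers as there are chunks.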
To understand how powerful the feedback layer can be, consider the experience of Mobile Accord, which, at the initiative of the World Bank’s World Development Report 2011, ran Geo Poll, an SMS-based targeted poll in the DRC. The poll asked 10 sensitive questions, including questions about topics such as rape and violence against women in conflict zones. The survey produced 1.2 million text responses, and the outputs were turned into a video, “DRC Speaks”, which captured people’s responses to questions about their experiences in their own words. This ended up being one of the largest surveys ever conducted in the country.
There is a pattern here. In the base layer more and more (new and old) data is opening up every day. In the analytics layer experimental ideas are leaving the lab for real world application; virtualization and visualization are helping foster new communities geared towards collaboration and collective problem solving. Similarly, in the feedback layer the tools are also democratizing. Ushahidi has created a very easy to use version of its implementation called CrowdMap. Anyone who knows how to set up an email account will be able to set up their own incident map for whatever trend, alert or issue they want feedback on from the crowd. The service is already being used to track everything from citizen report cards assessing corruption in India to the Syrian uprising to national emergencies on the tiny island of Samoa.

The feedback layer is also innovating in reverse, backwards from new to old media and formats, something essential in international development where so much remains unmapped or in non-digital formats. Take mapping. In the developing world there are still vast areas beyond GPS coverage. But people in those communities have intimate knowledge of their surroundings. If only there was a way to tap into that knowledge and build a bridge between that data source and an open-source digital resource like Open Street Map. Walking Papers does precisely that. Contributors can simply draw by hand, on plain paper, a map of their neighbourhood, say, and upload it to Walking Papers, where a community specializes in taking that information and using it to deepen Open Street Map.

Looking Ahead

International development as a field of practice and research has tended to be a laggard in using big data, powerful analytics and innovative sourcing techniques, and with good reason. However, this is changing faster than anyone predicted and faster than most organizations that ‘do development’ are prepared for.
The open data movement has widened access to a broad range of basic contextual information. A similar push is needed to open private sector data in the service of social good. Big data is beginning to have a big impact on how we think about development challenges, because we now have the ability to make it understandable like never before. Powerful analytical tools and collaborative platforms are dramatically changing what is possible for even the most intractable challenges, like understanding socioeconomic risks and responses, dealing with urban planning, and better preparing for emergencies. For the first time we have a feedback layer which has made possible deep and near real-time awareness of what is working or not working, where, and why.

Together, big data, democratized analytics and the ability to tap deep contexts will change the way we think about and do development in the coming years.
Bibliography

Aid Data, Development Gateway, African Development Bank - Development Loop. Development Loop. n.d. http://www.aiddata.org/content/index/Maps/development-loop-app (accessed Dec 21, 2011).
Courtney, Alexa, and David Kilcullen. "Big data, small wars, local insights: Designing for development with conflict-affected communities." What Matters. McKinsey & Company, December 2, 2011.
Crowd Map. n.d. https://crowdmap.com/mhi.
Data Without Borders. n.d. http://datawithoutborders.cc/.
Davis, Steve, and Jonathan Bays. "Harnessing big data to address the world’s problems." What Matters. McKinsey & Company, November 2, 2011.
DRC Speaks (Geo Poll). n.d. http://www.youtube.com/watch?v=1VtaWPAtHyA.
Eagle, Nathan. Reality Mining. Dataset: http://reality.media.mit.edu/, Massachusetts: Massachusetts Institute of Technology, 2009.
Engineering Social Systems. "Big Data for Social Good." Engineering Social Systems. n.d. http://ess.santafe.edu/bigdata.html (accessed December 21, 2011).
Jay Chen, Trishank Karthik, Lakshminaryanan Subramanian. Contextual Information Portals. Online: http://ai-d.org/, Association for the Advancement of Artificial Intelligence, n.d.
McKinsey Global Institute. Big data: The next frontier for innovation, competition, and productivity. McKinsey & Company, 2011.
Metha, Abhishek. Big Data: Powering the Next Industrial Revolution. White paper: http://www.tableausoftware.com/learn/whitepapers/big-data-revolution, Seattle: Tableau Software, n.d.
Priebatsch, Seth. The game layer on top of the world. Video online at: http://www.ted.com/talks/lang/en/seth_priebatsch_the_game_layer_on_top_of_the_world.html, Boston: Ted Talks, 2010.
Publish What You Fund. "Aid Budgets in Uganda." Publish what you fund. n.d. http://www.publishwhatyoufund.org/resources/uganda/ (accessed December 21, 2011).
Shantayanan Devarajan. "Africa’s statistical tragedy." Africa Can End Poverty (blog of World Bank, Africa Chief Economist). October 6, 2011. http://blogs.worldbank.org/africacan/africa-s-statistical-tragedy (accessed October 6, 2011).
Swift River. n.d. http://ushahidi.com/products/swiftriver-platform.
UN Global Pulse. n.d. http://www.unglobalpulse.org/.
Ushahidi. n.d. http://ushahidi.com/.
Walking Papers. n.d. http://walking-papers.org/.