This issue is provided by
the Johns Hopkins University Press Journals Division
and powered by Project MUSE®
Terms and Conditions of Use
Thank you for purchasing this Electronic J-Issue from the Journals Division of the Johns
Hopkins University Press. We ask that you respect the rights of the copyright holder by
adhering to the following usage guidelines:
This issue is for your personal, noncommercial use only. Individual articles from this
J-Issue may be printed and stored on your personal computer.
You may not redistribute, resell, or license any part of the issue.
You may not post any part of the issue on any web site without the written permission of
the copyright holder.
You may not alter or transform the content in any manner that would violate the rights of
the copyright holder.
Sharing of personal account information, logins, and passwords is not permitted.
SAIS Review vol. XXXIV no. 1 (Winter–Spring 2014)
© 2014 by The Johns Hopkins University Press
Foreword
Following the exposure of the U.S. National Security Agency’s
(NSA) controversial surveillance program, there has been heated debate
surrounding the collection and storage of personal data. Our latest issue of
The SAIS Review of International Affairs, “Policy by Numbers: How Big Data
is Transforming Security, Governance, and Development,” seeks to move
beyond the sensationalism that has accompanied the NSA revelations. We
hope to provide readers a more nuanced perspective on the role of data
in international affairs, with a diverse collection of interviews, essays, and
opinion editorials from scholars, technologists, and policymakers.
We explore the rise of big data, in which governments and profit-
seeking organizations make policies and predictions based upon correla-
tions among massive quantities of data. We examine the trend toward open
data, in which governments provide valuable datasets directly to the public.
We assess the impact of data—positive and negative, international and do-
mestic—on public policy, national security, international development, and
individual well-being.
While the rise of big and open data is associated with promising ap-
plications, there are still vast uncertainties regarding how best to exploit
this technology. We hope that readers from the academic and public policy
communities will feel empowered to enhance their understanding of techni-
cal tools and data analysis, in an age where technological innovation often
outpaces government policy.
We begin with a conversation with Robert Kirkpatrick, Director of
the United Nations Global Pulse Initiative. The UN Global Pulse Initiative
collects and analyzes real-time data to better protect populations from
socioeconomic shocks. Kirkpatrick explores the challenges associated with
big data analytics, the surprising correlations among seemingly unrelated
datasets, and the initiative’s effort to predict food price crises with data
from social media.
Human rights data often impacts policy decisions. The next three
articles explore the opportunities and risks associated with collecting and
analyzing this sensitive information. Megan Price and Patrick Ball use
case studies of violent conflicts in Syria and Iraq to evaluate data-gathering
methodologies in conflict scenarios. They warn that datasets from conflict
scenarios are often subject to bias, and should not be used in isolation
to draw conclusions. Monti Narayan Datta argues that the collection of
quantitative data on modern day slavery has generated discussion in media
and among policymakers on how to mitigate and eradicate slavery. Our
interview with Arch Puddington, Vice President for Research at Freedom
House, discusses worldwide trends in freedom, and the impact of Freedom
House’s annual reports and indices.
The rise of big and open data has a powerful impact on government
policymaking. Pongkwan Sawasdipakdi frames data as an information
weapon in the context of Thai domestic politics. She examines the govern-
ment’s rice-pledging scheme, and argues that contrasting datasets from the
government and opposition parties are used to gain political power and
credibility. Ian Kalin describes the theory and practice of open data policy
in the United States, and explains how government leaders can replicate
successful open data initiatives. Joel Gurin argues that open government
and open data can improve economic growth, transparency, and citizen
engagement. He also notes the obstacles to implementing open data initiatives
in developing countries.
How can policymakers craft policies and frameworks that best take
advantage of big data? Given the fast pace of innovation and the slow pace
of policy, Kord Davis discusses how to bridge the gap between policymak-
ers and innovators. He identifies spaces where the public and private sector
can collaborate to produce effective and balanced policy. Aniket Bhushan
argues that the rise of big data and open data has created an opportunity
for disruptive innovation in international affairs. He offers examples related
to real-time macroeconomic analysis, humanitarian response, and poverty
measurement.
Data impacts national security and individual privacy, as well. Chris
Poulin outlines the processes of data collection and analysis, using case
studies from the Arab Spring, medical risk analysis, and his work at the
Durkheim Project, a data analysis initiative that seeks to predict and prevent
veteran suicides. David Rubin, Kim Lynch, Jason Escaravage, and Hillary
Lerner explain how to balance the opposing forces of opportunity and risk,
collective security and individual privacy, and innovation and protection
when using data for national security programs.
Finally, we look to China for lessons on data infrastructure. Eric Hagt
traces the history of China’s satellite navigation system, Beidou, and com-
pares its potential as a tool for development versus domestic and national
security. Margaret Ross statistically analyzes the risk of potential disrup-
tions to the global undersea cable communications network.
We conclude with analyses of influential literature and scholarly re-
search. Ilaria Mazzocco reviews Big Data: A Revolution That Will Transform
How We Live, Work, and Think by Viktor Mayer-Schönberger and Kenneth
Cukier. Bartholomew Thanhauser reviews Evgeny Morozov’s To Save Ev-
erything, Click Here: The Folly of Technological Solutionism.
We would like to thank our advisory board for their guidance in shap-
ing our exploration of data, our excellent editorial staff for their dedication
and persistence, and our authors for their thoughtful work on complex
global challenges. Their combined contributions made the publication of
“Policy by Numbers” possible.
Meghan Kleinsteiber, Editor-in-Chief
Lauren Caldwell, Senior Editor
A Conversation with Robert
Kirkpatrick, Director of United
Nations Global Pulse
You are director of United Nations Global Pulse, an initiative to
leverage real-time data and analytics to monitor impacts of inter-
national and local shocks. How did the idea of Global Pulse come
about? What is your mission statement?
The initial idea of Global Pulse came about in the aftermath of the global
financial crisis. There was a recognition that we live in a hyper-connected
world where information moves at the speed of light, and crises and vul-
nerabilities can emerge quickly, but we’re still using two- to three-year-
old statistics to make most policy decisions. It was clear that there were
swathes of people being pushed below the poverty line almost overnight,
and we needed to modernize our systems and capacities for absorbing real-time
information for decision-making.
As a result, United Nations Secretary-General Ban Ki-moon established Global
Pulse in 2009 to act as an innovation lab and catalyst for the United Nations.
We bring together global development experts, as well as experts from academia
and the private sector, to explore how
analysis of big data can reveal faster insights about human well-being and
emerging vulnerabilities, in order to better protect populations from hunger,
poverty, and disease.
So Global Pulse’s mission is to accelerate the use of data science for
sustainable development and humanitarian action, to address systemic bar-
riers to adoption, and to cultivate a robust innovation ecosystem.
Robert Kirkpatrick is the director of UN Global Pulse, an initiative of the Executive
Office of the United Nations Secretary-General. The Global Pulse initiative explores
how Big Data and real-time analytics technologies can power a more agile approach to
sustainable development.
As the New York Times wrote in its August 2013 profile of Global Pulse,
the United Nations is often perceived as a “sprawling bureaucracy.”
What makes the Global Pulse team unique? What qualities—personal
and professional—do you seek in a team member?
Global Pulse is unique because we have an “intrapreneurial” approach. It
requires risk-taking and innovation to discover and generate new tools,
techniques, and methodologies to help the UN system and wider community
leverage new sources of real-time information and insights in the service
of humanitarian response and development work. This also requires a real
blend of expertise from within and outside of the UN. Due to the experi-
mental nature of our work, we are set up as a network of labs.
We have multidisciplinary teams working at our Pulse Labs in New
York, Jakarta, and Kampala that include data scientists and analysts, social
scientists, legal experts, and communications and partnerships specialists.
Pulse Lab teams design, scope, and co-create projects with UN agencies
and national institutions that provide sectoral expertise, and with private
sector or academic partners who provide access to data or analytical and
engineering tools.
When building a team, I look for “T-shaped people”—that is, people with
a broad range of skills and a flexible attitude, as well as deep knowledge of
one discipline, whether it is data science, design, partnership management,
or legal and privacy matters.
From what range of sources do you derive the data used for your
analyses? Which datasets do you consider to be the most unique or
surprising? What are the challenges associated with data collection
and analysis?
Global Pulse is interested in trends that reveal something about human
well-being, and these trends can be found in data produced by people as they
go about their daily lives (sometimes known as “data exhaust”). Broadly
speaking, we have been exploring two types of data in the Pulse Labs. The
first is data that reflects “what people say,” which includes publicly available
content from the open web, such as tweets, blog posts, news stories, and so
forth. The second is data that reflects “what people do,” which can include
information routinely generated for business intelligence and to optimize
sales in a private sector company. An example of “what people do” data is
anonymized mobile phone traffic information, which can reveal everything
from footfall in a shopping district during rush hour to how a population
migrates after a natural catastrophe.
A dataset that may be surprising is postal data (the traffic and volume
of packages being shipped), which can be used as a proxy for GDP and eco-
nomic activity in a country or region. We are beginning a series of research
projects with the Universal Postal Union (UPU), the United Nations special-
ized agency for the postal sector, to explore this relationship further.
There are several challenges associated with moving this kind of
analysis out of an innovation lab and into practice, including the need to
build skills and capacity around data science, the formation of sustainable
partnerships with potential data providers in the private sector, and
identifying where new data and insights can fit into the planning and
decision-making processes. And most importantly, we must take data protection
and privacy norms, policies, and techniques to a new level to mitigate the
potential for misuse. Our mission to find responsible ways of using big data
for global development purposes does not include analyzing private or
confidential information.
We follow, and advocate for, robust privacy protection principles.
Does Global Pulse focus on certain sectors? If so, why?
Although we can and do work with any part of the UN system that has a de-
velopment problem that data science might contribute to solving, there are
certain areas that are particularly well-suited to big data analysis. This year,
we will focus in particular on public health, including attitudes to health
as expressed on social media, news media, and patterns in anonymized
search data. For example, in partnership with the World Health Organiza-
tion, we are exploring whether early warning of non-communicable disease
risk factors in a country or community could be understood via analysis of
key words in social media data. We continue to look at parental attitudes
to immunizing children as expressed on social media, in order to address
misinformation that stops parents from protecting their children against
preventable diseases.
Another research priority is food security. In Indonesia, our Pulse Lab
Jakarta research team is exploring whether big data can provide insights
about the impacts of food price changes, in order to support the social pro-
tection policies of the government of Indonesia. Other areas of focus this
year include supporting humanitarian action through new data analytics
techniques, finding new ways to measure economic well-being, and using
digital data mining to help shape the priority development agenda that will
replace the Millennium Development Goals after they expire in 2015. Across
all sectors, though, Global Pulse conducts a range of activities to strengthen
the big data for development (BD4D) ecosystem by guiding the development
of regulatory frameworks and technical standards to address data-sharing
and privacy protection challenges. We support an emerging community of
practice to accelerate public sector adoption through advocacy, policy guid-
ance, and technical assistance.
How do you identify and maintain relationships with your private sec-
tor partners?
Private sector partners are incredibly important in helping us leverage big
data as a resource for sustainable development. Using big data responsibly
and effectively requires several different elements, so there are different
areas of expertise, knowledge, and resources we look for when building
partnerships. The Global Pulse network of partners and collaborators in-
cludes forward-thinking private sector companies that are willing to engage
in “data philanthropy,” by granting access to data and technology tools to
the public sector. Our network also includes industry leaders, universities,
research institutes, and non-profit networks of researchers and innovators
who are ready to bring their skills and expertise to bear for advancing the
use of data science across the global development and humanitarian fields.
To establish and maintain these relationships, we have a partnership
manager and a privacy and legal expert, both of whom help guide potential
partners through the process. They work with counterparts in the com-
panies to ensure that safeguards, legal agreements, and data protection
principles are in place. Once collaboration is underway, our research team
will work closely with data analysts in the partner organization to initiate a
project or exploration. Often, the data never leaves the business that owns it;
rather, our data scientists guide the process and then the trends or results
are shared. This modality works well when the data is sensitive.
The experience with our partners, overall, is one of mutual learning.
The Pulse Lab network offers a safe “sandbox” for de-risking this type of
experimentation as we all learn together how the public and private sector
can responsibly harness big data for development.
Could you share a few Global Pulse success stories? Similarly, which
development challenges (such as particular regions or issues) are par-
ticularly difficult to tackle?
There are success stories on the data philanthropy front in which telecom-
munications companies have made anonymized datasets available as part
of a competition or challenge. For example, last year we collaborated with
Orange Telecom to host a “Data for Development Challenge” in which the
company opened up a dataset of anonymized mobile phone data to more
than eighty research teams from around the world to analyze. This research
garnered insights that the international development community can be
inspired by or learn from.
In terms of projects we are carrying out, as I mentioned previously, our
Pulse Lab in Indonesia is conducting research on mining tweets to understand
food price crises. Their research has provided new insights into the
very real problem of sudden increases in the price of staple foodstuffs, like
rice, pushing families below the poverty line and causing regional
economic instability. Real-time information about these impacts could help
policymakers and governments provide support to families who are suffer-
ing as a result of food price hikes.
Going forward, we plan to conduct further research on social media
analysis for food security and for crowdsourcing food prices, since these are
areas of focus for the government of Indonesia, and will be applicable in
many other parts of the world. Certainly, this is not a solution that can be
applied universally. Social media analysis is of limited use in countries where
internet penetration is low, and even in regions of a country where the digital
divide is vast. There are also analytical challenges yet to be resolved,
including the necessity of building technologies that can support diverse local
languages.
We are in the early days of discovering how big data can be applied to
development and humanitarian
contexts, and there are diverse challenges ranging from data access and the
capacity to use real-time data in decision-making to data privacy. These
challenges will be addressed over time.
As Viktor Mayer-Schönberger and Kenneth Cukier wrote, big data is
revolutionizing the way we solve problems. You have noted several ways
that data collection and analysis is helping Global Pulse address global
development challenges. But does the shift toward big data analytics
have drawbacks for the field of international development, as well?
There are risks rather than drawbacks. I hear a lot about the supposed fear
that all development decisions would be made by algorithms. This is not
a realistic fear, but rather a false dichotomy between quantitative use of
data in decision-making and policymakers using qualitative experiences to
decide a course of action. Big data will always be one part of a solution, not
the only solution. Of course, we need research, official statistics, and the
deep knowledge of field workers, communities, and practitioners, but big
data and data science represent a useful addition to the development and
humanitarian worker’s toolbox.
Another common misperception is that real-time data would replace
official statistics, but this assumption is unrealistic, as well. Official statis-
tics will continue to provide high-quality snapshots of progress that can
be benchmarked. But increasingly, between those annual, bi-annual, or
monthly updates, real-time data sources will provide valuable interim feed-
back and indicators. This feedback can enable course-correction when it is
evident that a program isn’t working. And real-time data can reveal shifts
in food-pricing, population changes, or disease outbreaks within a day or
an hour, rather than a month or a year.
Many graduate schools of international affairs, including the Johns
Hopkins School of Advanced International Studies (SAIS), offer cours-
es or concentrations in international development. Do you find that
these programs offer adequately rigorous quantitative or data analysis
requirements? What skills should students develop if they intend to
enter the field of international development?
There is a need for greater skills capacity for data analysis—this is something
that is needed across the board and not only in our field. The international
development practitioner of the future will be someone who is data literate,
and capable of using data analysis to inform his or her understanding and
decision-making. So yes, we’d like to see graduate schools covering data for
development and the processes involved. Just as the schools of journalism are
now teaching data journalism, all students must understand how to identify
credible informative sources, how to perform, or at least understand,
quantitative statistical and data analysis, and how to appropriately use data
to inform
their judgment. The good news is that I see a lot of appetite for these skills
from current students, so this change is beginning to happen.
Big Data, Selection Bias, and the
Statistical Patterns of Mortality in
Conflict
Megan Price and Patrick Ball
The notion of “big data” implies very specific technical assumptions. The tools that have made
big data immensely powerful in the private sector depend on having all (or nearly all) of the
possible data. In our experience, these technical assumptions are rarely met with data about
the policy and social world. This paper explores how information is generated about killings
in conflict, and how the process of information generation shapes the statistical patterns in
the observed data. Using case studies from Syria and Iraq, we highlight the ways in which
bias in the observed data could mislead policy. The paper closes with recommendations about
the use of data and analysis in the development of policy.
Introduction
Emerging technology has greatly increased the amount and availability
of data in a wide variety of fields. In particular, the notion of “big data”
has gained popularity in a number of business and industry applications,
enabling companies to track products, measure marketing results, and in
some cases, successfully predict customer behavior.¹ These successes have,
understandably, led to excitement about the potential to apply these methods
in an increasing number of disciplines.
Megan Price is the director of research at the Human Rights Data Analysis Group. She
has conducted data analyses for projects in a number of locales including Syria and
Guatemala. She recently served as the lead statistician and head author of two reports
commissioned by the Office of the United Nations High Commissioner for Human
Rights.
Patrick Ball is the executive director of the Human Rights Data Analysis Group.
Beginning in El Salvador in 1991, Patrick has designed technology and conducted
quantitative analyses for truth commissions, non-governmental organizations, domestic
and international criminal tribunals, and United Nations missions. Most recently, he
provided expert testimony in the trial of former de facto President of Guatemala, Gen.
José Efraín Ríos Montt.
The materials contained herein represent the opinions of the authors and editors and should not be
construed to be the view of HRDAG, any of HRDAG’s constituent projects, the HRDAG Board of
Advisers, the donors to HRDAG, or this project.
Although we share this excitement about the potential power of data
analysis, our decades of experience analyzing data about conflict-related
violence motivate us to proceed with caution. The data available to human
rights researchers are fundamentally different from the data available
to business and industry. The difference is whether the data are complete.
In most business processes, an organization has access to all the data: every
item sold in the past twelve months, every customer who clicked through
their website, etc. In the exceptional cases where complete data are unavail-
able, industry analysts are often able to generate a representative sample of
the data of interest.²
In human rights, and more specifically in studies of conflict violence,
we rarely have access to complete data. What we have instead are snapshots
of violence. Typical sources include a few videos of public killings posted to
YouTube, a particular set of events retrospectively recorded by a truth
commission, stories covered in the local or international press, protesters’
SMS messages aggregated onto a map, and victims’ testimonies recorded by
non-governmental human rights organizations (NGOs). Statistically speaking,
these snapshots are “convenience samples,” and they cover an unknown proportion
of the total number of cases of violence.³ It is mathematically difficult,
often impossible, to know how much is undocumented and, consequently,
missing from the sample.
Incompleteness is not a criticism of data—collecting complete or rep-
resentative data under conflict conditions is generally impossible. The chal-
lenge is that researchers and advocates naturally want to address questions
that require either the total number or a representative subset of cases of
violence. How many people have been killed? What proportion was from
a vulnerable population? Were more victims killed last week or this week?
Which perpetrator(s) are committing the majority of the violence? Basing
answers and policy decisions on analyses of partial datasets with unknown,
indeed unknowable, biases can prove to be misleading. These concerns
should not deter researchers from asking questions of data; rather, they should
caution them against basing conclusions on inadequate analyses of raw data.
We conclude by suggesting methods from several quantitative disciplines
to estimate the bias in direct observations.
The Problem of Bias
When people record data about events in the world, the records are almost
always partial; the examples that follow show why the observation of violence
often misses some or most of it. Most samples are partial, and in samples not
collected randomly, the patterns of omission may have structure that influences
the patterns observed in the data. For example, killings in urban areas may be
nearly always reported, while killings
in rural areas are rarely documented. Thus, the probability of an event be-
ing reported depends on where the event happened. Consequently, analysis
done directly from this data will suggest that violence is primarily urban.
This conclusion is incorrect because the data simply omit many cases from the
rural areas (or at least include proportionally fewer of them). In this case, the
analysis is finding a pattern in the documentation that may appear to be a
pattern in true violence—but if analysts are unaware of the documentation
group’s relatively weaker coverage of the rural areas, they can be misled
by the quantitative result. In our experience, even when analysts are aware
of variable coverage in different areas, it is enormously difficult to draw a
meaningful conclusion from a statistical pattern that is affected by bias.
Statisticians call this problem “selection bias” because some events
(in this example, urban ones) are more likely to be “selected” for the sample
than other events (in this example, rural ones). Selection bias can affect
human rights data collection in many ways.⁴ We use the word “bias” in the
statistical sense, meaning a statistical difference between what is observed
and what is “truth” or reality. “Bias” in this sense is not used to connote
judgment. Rather, the point is to focus attention on empirical, calculable
differences between what is observed and what actually happened.
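The urban and rural example above can be made concrete with a small simulation. The sketch below is illustrative only: the counts and reporting probabilities are invented, and the function name `simulate_reporting` is hypothetical rather than anything used by the authors. It assumes each killing is reported independently, with a probability that depends only on where it occurred.

```python
import random


def simulate_reporting(n_urban, n_rural, p_urban, p_rural, seed=0):
    """Return (true_urban_share, observed_urban_share) for one simulated conflict.

    All inputs are hypothetical: n_* are true numbers of killings and
    p_* are the chances that a single killing in that area is documented.
    """
    rng = random.Random(seed)
    # Each killing is documented independently with its area's probability.
    reported_urban = sum(rng.random() < p_urban for _ in range(n_urban))
    reported_rural = sum(rng.random() < p_rural for _ in range(n_rural))
    reported_total = reported_urban + reported_rural
    if reported_total == 0:
        raise ValueError("no killings were documented in this simulation")
    true_share = n_urban / (n_urban + n_rural)
    observed_share = reported_urban / reported_total
    return true_share, observed_share


# Assumed scenario: most violence is rural, but rural killings are rarely documented.
true_share, observed_share = simulate_reporting(
    n_urban=1000, n_rural=3000, p_urban=0.9, p_rural=0.1
)
print(f"true urban share of killings:     {true_share:.2f}")
print(f"observed urban share of killings: {observed_share:.2f}")
```

Under these assumed numbers, only a quarter of the true killings are urban, yet the documented records are dominated by urban cases. An analyst working from the records alone would see a pattern of documentation and could mistake it for a pattern of violence.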
In this article, we focus on a particular kind of selection bias called
“event size bias.” Event size bias is the variation in the probability that a
given event is reported, related to the size of the event: big events are likely
to be known, small events are less likely to be known. In studies of conflict
violence, this kind of bias arises when events that involve only one victim
are less likely to be documented than events that involve larger groups of
victims. For example, a market bombing may involve the deaths of many
people. The very public nature of the attack means that the event is likely to
attract extensive attention from multiple media organizations. By contrast,
an assassination of a single person, at night, by perpetrators who hide the
victim’s body, may go unreported. The victim’s family may be too afraid to
report the event, and the body may not be discovered until much later, if at
all. These differences in the likelihood of observing information about an
event can skew the available data and result in misleading interpretations
about patterns of violence.⁵
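Event size bias can be sketched in the same way. The model below is an assumption made purely for illustration and is not drawn from the article: it supposes that each victim independently gives an event a fixed chance of coming to light, so the probability that an event is reported rises with its size. The function names, the mix of event sizes, and the value of `p_single` are all hypothetical.

```python
import random


def report_probability(size, p_single=0.2):
    """Hypothetical model: each victim independently brings the event to light."""
    return 1 - (1 - p_single) ** size


def simulate_event_size_bias(n_events=5000, seed=1):
    """Return two dicts mapping event size to true and documented victim counts."""
    rng = random.Random(seed)
    true_by_size = {}
    seen_by_size = {}
    for _ in range(n_events):
        # Assumed mix: mostly single-victim events, occasionally larger ones.
        size = rng.choice([1, 1, 1, 1, 1, 1, 2, 2, 5, 20])
        true_by_size[size] = true_by_size.get(size, 0) + size
        if rng.random() < report_probability(size):
            seen_by_size[size] = seen_by_size.get(size, 0) + size
    return true_by_size, seen_by_size


true_by_size, seen_by_size = simulate_event_size_bias()
for size in sorted(true_by_size):
    coverage = seen_by_size.get(size, 0) / true_by_size[size]
    print(f"event size {size:>2}: {coverage:.0%} of victims documented")
```

In this toy setting, victims of large events are documented almost completely while most victims of single-victim events are missed, so the raw records overstate the share of violence that occurs in large incidents.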
Case Studies
We present here two examples from relatively well-documented conflicts.
Some analysts have argued that information about conflict-related killings
in Iraq and Syria is complete, or at least sufficient for detailed statistical
analysis. In contrast, our analysis finds that in both cases, the available data
are likely to be systematically biased in ways that are likely to confound
interpretation.
Syria
Many civilian groups are currently carrying out documentation efforts in the
midst of the ongoing conflict in Syria. In early 2012, the Office of the United
Nations High Commissioner for Human Rights (OHCHR) commissioned
the Human Rights Data Analysis Group (HRDAG) to examine datasets from
several of these groups, and in two reports, Price et al. provide in-depth
descriptions of these sources.⁶ In this section, we focus our attention on four
sources (in essence, lists of people killed) which cover the entire length of
the ongoing conflict and which have continued to provide us with updated
records of victims. These sources are the Syrian Center for Statistics and
Research⁷ (CSR-SY), the Syrian Network for Human Rights⁸ (SNHR), the Syria
Shuhada website⁹ (SS) and the Violations Documentation Centre¹⁰ (VDC).
Figure 1 shows the number of victims documented by each of the four
sources over time within the Syrian governorate of Tartus. The large peak
visible in all four lines in May 2013 corresponds to an alleged massacre in
Banias.¹¹ It appears that all four sources documented some portion of this
event. Many victims were recorded in the alleged massacre, this event was
very well reported, and all four of our sources reflect this event in their lists.
However, three out of the four sources document very little violence occur-
ring before or after May 2013 in Tartus. The fourth source, VDC, shows the
peak of violence in May as the culmination of a year of consistent month-
to-month increases in the number of reported killings.
When interpreting figures such as Figure 1, we should not aim to iden-
tify a single “correct” source. All of these sources are documenting different
snapshots of the violence, and all of them are contributing substantial num-
bers of unique records of victims undocumented by the other sources.12
The
presence of event size bias is detectable in this particular example because
all four of the sources obviously captured a similar event (or set of events)
in May 2013, while at the same time one of those sources captured a very
different subset of events during the preceding months. If we did not have
access to the VDC data, our analysis of conflict violence in Tartus would
incorrectly conclude that the alleged massacre in May 2013 was an isolated
event surrounded by relatively low levels of violence.
The conclusion from Figure 1 should not be that VDC is doing a
“better” job of documenting victims. VDC is clearly capturing some events
that are not captured by the other sources, but there is no way to tell how
many events are not being captured by VDC. From this figure alone we
cannot conclude what other biases may be present in the observed data.
For example, the relatively small peak in February 2012 could be as small
as it seems, or it could be as large as the later peak in May 2013. Without a
method of statistical estimation that uses a probability model to account
for the undocumented events, it is impossible to know.13
To underline this crucial point: despite the availability of a large
amount of data describing violence in Tartus, there is no mathematically
sound method to draw conclusions about the patterns of violence directly
from the data (though it is possible to use the data and statistical models to
estimate how many events are missing). The differences in the four sources
available to us make it possible to detect the event size bias occurring in May
2013, but what other biases might also be present in these observed data and
hidden from view? What new events might a fifth, sixth, or seventh source
document? Are there enough undocumented events such that if they were
included, our interpretation of the patterns would change? These are the
crucial questions that must be examined when interpreting perceived pat-
terns in observed data.
Iraq
We detect a subtler form of event size bias in data from the Iraq Body Count
(IBC), which indexes media and other sources that report on violent deaths
in Iraq since the Allied invasion in March 2003.14
Our analysis is motivated
by a recent study by Carpenter et al., which found evidence of substantial
event size bias.15
Their approach was to compare the U.S. military’s “sig-
nificant acts” (SIGACTS) database to the IBC records. As they report, this
comparison showed that “[e]vents that killed more people were far more
likely to appear in both datasets, with 94.1% of events in which ≥20 people
were killed being likely matches, as compared with 17.4% of … killings [that
occurred one at a time].”16
This implies that IBC, SIGACTS, or both, capture
a higher fraction of large events than small events. Carpenter et al. go on
to note that “[t]he possibility that large events, or certain kinds of events
(e.g., car bombs) are overrepresented might allow attribution that one side
in a conflict was more recklessly killing civilians, when in fact, that is just
an artifact of the data collection process.”17
Figure 1. Number of Victims Documented by Four Sources, Over Time, in Tartus
Motivated by this analysis, we considered other ways to examine IBC
records for evidence of potential event size bias. Since IBC aggregates records
from multiple sources, updated IBC data already incorporate many
records from SIGACTS.18
In contrast to the work of Carpenter et al., who
treated IBC and SIGACTS as two separate data sources and conducted their
own independent record linkage between the two sources, we examined only
records in the IBC database, including those labeled as from SIGACTS.
It should be noted that we conducted this analysis on a subset of the
data after filtering out very large events with more than fifty victims. We
made this choice because, on inspection, many of the records with larger
numbers of reported victims are data released in batches by institutions
such as morgues, or incidents aggregated over a period of time, rather than
specific, individual events.
We began by identifying the top one hundred data sources; one or
more of the top one hundred sources cover 99.4 percent of the incidents
in IBC.19
Given these sources, we counted the number of sources (up to
one hundred) for each event. Event size was defined as the mean (rounded
to the nearest integer) of the reported maximum and minimum event size
values. Then the data were divided into three categories: events with one
victim, events with two to five victims, and events with six to fifty victims.
The analysis was performed on these groups.
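The size definition and grouping described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' own code (note 18 says they used pandas; this version uses the standard library). The function names, the rule that ties such as 1.5 round up, the choice to apply the fifty-victim cutoff to the computed size, and the sample values are all assumptions.

```python
import math

def event_size(reported_min, reported_max):
    """Mean of the reported minimum and maximum, rounded to the nearest integer.

    Assumption: ties such as 1.5 round up. The article does not say how
    ties were handled.
    """
    mean = (reported_min + reported_max) / 2
    return math.floor(mean + 0.5)

def size_category(size):
    """Map an event size to one of the three groups used in the analysis."""
    if size > 50:
        return None  # very large records are filtered out (assumed cutoff on size)
    if size >= 6:
        return "6-50"
    if size >= 2:
        return "2-5"
    if size == 1:
        return "1"
    return None  # size 0 falls outside the three groups

# Hypothetical (min, max) pairs, not real IBC records.
examples = [(1, 1), (2, 3), (4, 9), (60, 80)]
categories = [size_category(event_size(lo, hi)) for lo, hi in examples]
print(categories)
```

Python's built-in round() rounds ties to the nearest even integer, which is why the sketch uses an explicit floor instead; a different tie rule would move some events between adjacent groups.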
Figure 2 summarizes our findings. The shading of each bar in Figure
2 indicates the proportion of events of that size reported by one, two, or
three or more sources. For each category of event sizes, most events have
two sources. For events of size one, the second most frequent number of
sources is one, accounting for nearly a third of all events of this size; almost
no single-victim events have three or more sources. The number of events
with three or more sources increases quickly in medium-sized events and
in large events. Relatively few of the largest events are reported by a single
source. Thus there seems to be a relationship between event size and the
number of sources: larger events are captured by more sources. This rein-
forces the finding by Carpenter et al. that larger events are more likely to
be captured by both IBC and SIGACTS. We have generalized this finding to
the top one hundred sources; larger events are more likely to be captured
by multiple sources.
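The tabulation behind a chart like Figure 2 can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the toy events are hypothetical, and the numbers are invented rather than taken from IBC.

```python
from collections import Counter, defaultdict

def source_bucket(n_sources):
    """Collapse a source count into the three groups shown in Figure 2."""
    if n_sources >= 3:
        return "3+"
    return str(n_sources)

def proportions_by_size(events):
    """events: iterable of (size_category, n_sources) pairs.

    Returns {size_category: {source_bucket: share of events in that category}}.
    """
    counts = defaultdict(Counter)
    for category, n_sources in events:
        counts[category][source_bucket(n_sources)] += 1
    result = {}
    for category, buckets in counts.items():
        total = sum(buckets.values())
        result[category] = {b: n / total for b, n in buckets.items()}
    return result

# Made-up events for illustration only; these are not IBC figures.
toy_events = [
    ("1", 1), ("1", 2), ("1", 2), ("1", 1),
    ("6-50", 2), ("6-50", 3), ("6-50", 5), ("6-50", 2),
]
table = proportions_by_size(toy_events)
print(table)
```

Each inner dictionary sums to one, so the shares within a size category can be compared across categories even when the categories contain very different numbers of events.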
The number of sources covering an event is an indicator of how “inter-
esting” an event is to a community of documentation groups—in this case,
media organizations. The pattern shown in Figure 2 implies that media
sources are more interested in larger events than smaller events. Greater
interest in the larger events implies that larger events are more likely to be
reported (observed) by multiple sources relative to smaller events. Since a
larger proportion of small events are covered by only a single source, it is
likely that more small events are missed, and therefore excluded from IBC.20
As noted by Carpenter et al., the correlation between event attributes
and the likely reporting of those events can result in highly misleading in-
terpretation of apparent patterns in the data. As a relatively neutral example,
analysts might erroneously conclude that most victims in Iraq were killed
in large events, whereas this may actually be an artifact of the data collec-
tion. A potentially more damag-
ing, incorrect conclusion might
be reached if large events are
centered in certain geographic
regions or attributed to certain
perpetrators; in these cases,
reading the raw data directly
would mistake the event size
bias for a true pattern, thereby
misleading the analyst. Inap-
propriate interpretations could result in incorrect decisions regarding
security measures, intervention strategies, and ultimately, accountability.
Figure 2. Proportion of Events Covered by One, Two, or Three or More Sources
Discussion
Event size bias is one of many kinds of selection and reporting biases that
are common to human rights data collection. It is important to recall that
we refer here to biases in the statistical sense: a measurable difference be-
tween the observed sample and the underlying population of interest. The
biases that worry us here affect statistics and quantitative analyses; we are
not implying that the political goals of the data collection groups have
influenced their work.
In the context of conflict violence, meaningful statistical analysis
involves comparisons to answer questions such as: Did more violence oc-
cur this month or last month? Were there more victims of ethnicity A or
B? Did the majority of the
violence occur in the north or
the south of the country? The
concern about bias focuses
on how the data collection
process may more effectively
document one month rela-
tive to another, creating the
appearance of a difference
between the months. In that case, the apparent difference is the result of
changes in the documentation
process, not real changes in the patterns of violence.
To make sense of such comparisons, the observed data must in some
way be adjusted to represent the true rates. There are a number of methods
for making this adjustment if the observed data were collected at random,
but this is rarely the case. There are relatively few models that can adjust
data that were collected simply because they were available.
In order to compare nonrandom data across categories like months
or regions, the analyst must assume that the rate at which events from
each category are observed is the same: for example, that 60 percent of all
killings that occurred in March were documented, and that 60 percent of all
killings that occurred in April were documented. This rate is called the
coverage rate, and it is unknown
unless the true number of events is somehow known or estimated. If the
coverage rates for different categories are not the same, the observed data
tell only the story of the documentation; they do not indicate an accurate
pattern. For example, if victims of ethnicity A are killed in large-scale vio-
lent events with many witnesses, while victims of ethnicity B are killed in
targeted, isolated violent events, we may receive more reports of victims of
ethnicity A and erroneously conclude that the violence is targeted at eth-
nicity A. Until we adjust for the event size bias resulting in more reports of
victims of ethnicity A, we cannot draw conclusions about the true relation-
ship between the number of victims from ethnicity A versus B.
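A short worked example may make the effect of unequal coverage rates concrete. All numbers below are hypothetical and chosen only for illustration; in practice the true counts and the coverage rates are exactly what the analyst does not know.

```python
# Hypothetical numbers chosen for illustration; none come from real data.
true_deaths = {"A": 100, "B": 150}      # unknown to the analyst in practice
coverage_rate = {"A": 0.8, "B": 0.3}    # share of true deaths that get documented

# What the documentation groups actually record.
observed = {g: round(true_deaths[g] * coverage_rate[g]) for g in true_deaths}

def larger_group(counts):
    """Return the group with the larger count."""
    return max(counts, key=counts.get)

print(observed)
print(larger_group(observed))     # group that looks worse hit in the raw data
print(larger_group(true_deaths))  # group that was actually worse hit
```

In this sketch the raw counts point to ethnicity A while the underlying totals point to ethnicity B. The comparison flips purely because the two coverage rates differ.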
There are many other kinds of selection bias. As an example, when rely-
ing on media sources, journalists make decisions about what is considered
newsworthy. Sometimes their decisions may create event size bias, as large
events are frequently considered newsworthy. But the deaths of individual,
prominent members of a society are frequently also considered newsworthy.
Conversely, media “fatigue” may result in under-documentation later
in a conflict, or when other newsworthy stories limit the amount of
time and space available to cover victims of a specific conflict.21 Many other
characteristics of both the documentation groups and the conflict can result
in these kinds of biases, such as logistical or budgetary limitations, trust or
affinity variations within the community, and the security and stability of
the situation on the ground.22 As each of these factors changes, coverage
rates are likely to change as well.
The fundamental reason why biases are so problematic for quantita-
tive analyses is that bias often correlates with other dimensions that are
interesting to analysts, such as trends over time, patterns over space, differ-
ences compared by the victims’ sex, or some other factor. As in the example
of ethnicities A and B above, the event size bias is correlated with the kind
of event. Failing to adjust for the reporting bias leads to the wrong conclu-
sion. As another example, consider the Iraq case described above: If event
size is correlated with the events’ perpetrators, then bias on event size means
bias on perpetrator, and a naïve reading of the data could lead to security
officials trying to solve the wrong security problems. Or, in the Syria case,
if decisions about resource allocation to Tartus were made on the basis of
the observed information, without taking into account the patterns of killings
that were not observed, researchers might inaccurately conclude
that the violence documented in May 2013 represented an isolated event. One
could imagine that such a conclusion could lead to any number of incorrect
decisions: sending aid groups into Tartus under the erroneous assumption
of relative security, or failing to send aid and assistance before or after May
2013 on the assumption that such resources were more needed elsewhere.
It is important to note that these challenges frequently lack a scientific
solution.23
We do not need to simply capture more data. What we need is
to appropriately recognize and adjust for the biases present in the available
data. Indeed, as indicated in the Iraq example, where multiple media sources
appear to share similar biases, the addition of more data perpetuates and
in some cases amplifies the event size bias.
Detection of, and adjustment for, bias requires statistical estimation.
A wide variety of statistical methods can be used to adjust for bias and es-
timate what is missing from observed data. In our work we favor multiple
systems estimation, which has been developed under the name capture-
recapture in ecology, and used to study a variety of human populations
in research in demography and public health. Analysts more familiar with
traditional survey methods often prefer adjustments based on post-stratifi-
cation or “raking,” each of which involves scaling unrepresentative data to a
known representative sample or population.24
Each method has limitations
and requires assumptions, which may or may not be reasonable, but formal
statistical models provide a way to make those assumptions explicit, and in
some cases, to test whether they are appropriate. Comparisons from raw data
implicitly but necessarily assume that such snapshots are statistically repre-
sentative. This assumption may sometimes be true, but only by coincidence.
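The simplest capture-recapture case, with only two lists, can be sketched as follows. This is not the multiple systems estimation used in our work, which draws on three or more lists and models the dependence between them. It is the classic two-list Lincoln-Petersen estimator, shown only to illustrate how overlap between sources carries information about what no source recorded. The function name and victim identifiers are hypothetical, and the sketch assumes independent lists, error-free record linkage, and equal capture probabilities, assumptions that event size bias itself can violate.

```python
def two_list_estimate(list_a, list_b):
    """Lincoln-Petersen estimate of total population size from two lists.

    Assumes the lists are independent, records are matched without error,
    and every individual is equally likely to appear on a given list.
    """
    a, b = set(list_a), set(list_b)
    overlap = len(a & b)
    if overlap == 0:
        raise ValueError("no overlap between lists; the estimate is undefined")
    return len(a) * len(b) / overlap

# Hypothetical victim identifiers after record linkage (not real data).
source_1 = {"v01", "v02", "v03", "v04", "v05", "v06"}
source_2 = {"v04", "v05", "v06", "v07", "v08"}

documented = len(source_1 | source_2)              # seen by at least one source
estimated_total = two_list_estimate(source_1, source_2)
estimated_missing = estimated_total - documented   # seen by neither source
print(documented, estimated_total, estimated_missing)
```

The smaller the overlap relative to the list sizes, the larger the estimated number of undocumented victims. When the independence assumption fails, for instance because both sources favor large events, the two-list estimate is itself biased, which is one reason to prefer models built on more lists.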
Conclusions
Carpenter et al. warn that “press members and scientists alike should be cau-
tious about assuming the completeness and representativeness of tallies for
which no formal evaluation of sensitivity has been conducted. Citing partial
tallies as if they were scientific samples confuses the public, and opens the
press and scholars to being manipulated in the interests of warring parties.”
In a back-of-the-envelope description elsewhere, we have shown that small
variations in coverage rates can lead to an exactly wrong conclusion from
raw data.25
Groups such as the Iraq Body Count, the Syrian Center for Statistics
and Research, the Syrian Network for Human Rights, the Syria Shuhada
website, and the Violations Documentation Centre collect invaluable data,
and they do so systematically, and with principled discipline. These groups
should continue to collate and share it as a fundamental record of the past.
The data can also be used in qualitative research about specific cases, and in
some circumstances, in statistical models that can adjust for biases.
It is tempting, particularly in emotionally charged research such as
studies of conflict-related violence, to search available data for answers. It is
intuitive to create infographics, to draw maps, and to calculate statistics and
draft graphs to look for patterns in the data. Unfortunately, all people, even
statisticians, tend to draw conclusions even when they know that the data
are inadequate to support comparisons. Weakly founded statistics tend to
mislead the reader.
Statistics, graphs, and maps are seductive because they seem to prom-
ise a solid basis for conclusions. The current obsession with using data to
formulate evidence-based policy increases the pressure to use statistics, even
as new doubts emerge about whether “big data” predictions about social
conditions are accurate.26
When calculations are made in a way that enables
a mathematical foundation for statistical inference, these statistics deliver
on the promise of an objective
measurement in relation to a
specific question. But analysis
with inadequate data is very hard
even for subject matter experts
to interpret. In the worst case,
it offers a falsely precise view,
a view that may be completely
wrong. In the best case, it invites speculation about what’s missing and
what biases are uncontrolled, creating more questions than answers, and
ultimately, a distraction. When policymakers turn to statistical analysis to
address key questions, they must ensure that the analysis gives the right
answers.
Notes
1 
One extreme example includes Target successfully predicting a customer’s pregnancy, as
reported in the New York Times and Forbes. In particular, Target noticed that pregnant women
buy specific kinds of products at regular points in their pregnancy, and the company used
this information to build marketing campaigns.
2 
However it is certainly worth noting that even in these contexts sometimes big data are
not big enough and may still be subject to the kinds of biases we worry about in this paper.
See Kate Crawford’s keynote at STRATA and Tim Harford’s recent post in the Financial Times
for examples.
3 
Specifically, “convenience samples” refer to data that is non-randomly collected, though
collecting such data is rarely convenient.
4 
Another common kind of bias that affects human rights data is reporting bias. Whereas se-
lection bias focuses on how the data collection process identifies events to sample, reporting
bias describes how some points become hidden, while others become visible, as a result of
the actions and decisions of the witnesses and interviewees. For an overview of the impact
of selection bias on human rights data collection, see Jule Krüger, Patrick Ball, Megan Price,
and Amelia Hoover Green (2013). “It Doesn’t Add Up: Methodological and Policy Implica-
tions of Conflicting Casualty Data.” In Counting Civilian Casualties: An Introduction to Recording
and Estimating Nonmilitary Deaths in Conflict, ed. by Taylor B. Seybolt, Jay D. Aronson, and
Baruch Fischhoff. Oxford UP.
5 
Christian Davenport and Patrick Ball. “Views to a Kill: Exploring the Implications of Source
Selection in the Case of Guatemalan State Terror, 1977–1996.” Journal of Conflict Resolution
46(3): 427–450. 2002.
6 
Megan Price, Jeff Klingner, Anas Qtiesh, and Patrick Ball (2013). “Full Updated Statistical
Analysis of Documentation of Killings in the Syrian Arab Republic.” Human Rights Data
Analysis Group, commissioned by the United Nations Office of the High Commissioner
for Human Rights (OHCHR). Megan Price, Jeff Klingner, and Patrick Ball (2013). “Prelimi-
nary Statistical Analysis of Documentation of Killings in the Syrian Arab Republic.” The
Benetech Human Rights Program, commissioned by the United Nations Office of the High
Commissioner for Human Rights (OHCHR).
7 
http://www.csr-sy.com
8 
http://www.syrianhr.org
9 
http://syrianshuhada.com
10 
http://www.vdc-sy.info
11 
See reports in the LA Times, BBC, and the Independent, among others.
12 
Price et al. 2013.
13 
See https://hrdag.org/mse-the-basics/ for the first in a series of blog posts describing
Multiple Systems Estimation (MSE), or Kristian Lum, Megan Emily Price, and David Banks
(2013). “Applications of Multiple Systems Estimation in Human Rights Research.” The
American Statistician 67(4): 191–200. DOI: 10.1080/00031305.2013.821093
14 
http://www.iraqbodycount.org
15 
Carpenter D, Fuller T, Roberts L. “WikiLeaks and Iraq Body Count: the sum of parts may
not add up to the whole—a comparison of two tallies of Iraqi civilian deaths.” Prehosp Disaster
Med. 2013;28(3):1–7. doi:10.1017/S1049023X13000113
16 
Ibid.
17 
Ibid.
18 
We downloaded the ibc-incidents file on 14 Feb 2014, and processed it using the pandas
package in Python.
19 
The top 100 sources include, for example, AFP, AL-SHAR, AP, CNN, DPA, KUNA, LAT,
MCCLA, NINA, NYT, REU, VOI, WP, XIN, and US DOD VIA WIKILEAKS.
20 
These assumptions can be formalized and tested within the framework of ‘species richness,’
which is a branch of ecology that estimates the number of different types of species within
a geographic area and/or time period of interest using models for data organized in a very
similar way to the IBC’s event records. See Wang, Ji-Ping. “Estimating species richness by a
Poisson-compound gamma model.” Biometrika 97.3 (2010): 727–740.
21 
A research question to address this might be: Do media-reported killings in a globally-
interesting conflict like Iraq or Syria decline during periods when other stories attract
interest? Do reported killings decline during the Olympics?
22 
Krüger et al. (2013)
23 
Bias issues can sometimes be resolved with appropriate statistical models, that is, with
better scientific reasoning about the specific kind of data involved. However, we underline
that bias is not solvable with better technology. Indeed, some of the most severely biased
datasets we have studied are those collected by semi- or fully-automated, highly technologi-
cal methods. Technology tends to increase analytic confusion because it tends to amplify
selection bias.
24 
For a description of multiple systems estimation, see Lum et al. 2013. For methods on
missing data in survey research which might be applicable to the adjustment of raw, non-
random data if population-level information is available, see Brick, J. Michael, and Graham
Kalton. “Handling missing data in survey research.” Statistical methods in medical research 5.3
(1996): 215–238. For an overview of species richness models which might be used to estimate
total populations from data organized like the IBC, see Wang, op. cit. For an analysis of
sampling issues in “elusive” populations, see Johnston, Lisa G., and Keith Sabin. “Sampling
hard-to-reach populations with respondent driven sampling.” Methodological Innovations
Online 5.2 (2010): 38–48.
25 
https://hrdag.org/why-raw-data-doesnt-support-analysis-of-violence/
26 
Lazer, David and Kennedy, Ryan and King, Gary and Vespignani, Alessandro, Google Flu
Trends Still Appears Sick: An Evaluation of the 2013–2014 Flu Season (March 13, 2014).
Available at SSRN: http://ssrn.com/abstract=2408560
SAIS Review vol. XXXIV no. 1 (Winter–Spring 2014)
© 2014 by The Johns Hopkins University Press
Using Big Data and Quantitative
Methods to Estimate and Fight
Modern Day Slavery
Monti Narayan Datta
Given the hidden, criminal nature of contemporary slavery, empirically estimating the
proportion of the population enslaved at the national and global level is a challenge. At the
same time, little is understood about what happens to the lives of the survivors of slavery
once they are free. I discuss some data collection methods from two nongovernmental
organizations (NGOs) I have worked with that shed light on these issues. The first NGO,
the Walk Free Foundation, estimates that there are about 30 million enslaved in the world
today. The second NGO, Free the Slaves, employs a longitudinal analysis to chronicle the
lives of survivors. The acquisition and dissemination of such information is crucial because
policymakers and donors sometimes require hard data before committing time, political
will, and resources to the cause.
Unpacking the Problem of Contemporary Slavery
As Kevin Bales of the Wilberforce Institute for the Study of Slavery and
Emancipation explains, “Slavery is the possession and control of a
person in such a way as to significantly deprive that person of his or her
individual liberty, with the intent of exploiting that person through their
use, management, profit, transfer or disposal. Usually this exercise will be
achieved through means such as violence or threats of violence, deception
and/or coercion.”1 Thus, at its core, slavery is a dynamic between two
individuals, the enslaved and the slaveholder, in which the slaveholder has a
monopoly of control and violence upon the enslaved. The slaveholder can
coerce the enslaved to perform a number of abominable acts. This can include:
sexual servitude on the streets of New York City;2 adult labor in the coltan
mines of the Congo;3 child slavery in the shrimp farms of Bangladesh; or
forced domestic servitude in the suburbs of Los Angeles.4 Compounding the
matter is that enslaved persons can spend years, sometimes decades, under
such conditions.5 This can sometimes lead to slavery lasting across several
generations. Short of homicide, slavery is one of the most inhumane crimes
one person can commit against another.
Monti Narayan Datta is an assistant professor of political science at the University of
Richmond. His current book project, forthcoming with Cambridge University Press,
focuses on the consequences of anti-Americanism. He is working on several projects
on human trafficking and modern day slavery with Free the Slaves and Chab Dai and
the Walk Free Foundation. Along with Kevin Bales and Fiona David, he is a co-author
of the Global Slavery Index: http://www.globalslaveryindex.org.
In recent years, a number of governments and international govern-
mental organizations have addressed modern day slavery at home and
abroad. In the United States, Congress passed the Victims of Trafficking
and Violence Protection Act (TVPA) in 2000. The TVPA established the
President’s Interagency Task Force to Monitor and Combat Trafficking—a
cabinet-level group whose mission is to coordinate efforts to combat traf-
ficking in persons—led by the U.S. State Department. Since then, the State
Department has produced its annual Trafficking in Persons (TIP) Report,
which has become “the U.S. Government’s principal diplomatic tool to en-
gage foreign governments on human trafficking.”6
Although not without
controversy, the TIP Report has educated many on the sources and impact
of modern day slavery.7
On the global stage, between 2000 and 2001 the United Nations Gen-
eral Assembly adopted three protocols to its Convention against Transna-
tional Organized Crime: (1) the Protocol to Prevent, Suppress and Punish
Trafficking in Persons, especially Women and Children; (2) the Protocol
against the Smuggling of Migrants by Land, Sea, and Air; and (3) the Pro-
tocol against the Illicit Manufacturing and Trafficking in Firearms. With
117 signatory countries, these protocols, known as the Palermo Protocols,
advanced the global discussion not only on what constitutes contemporary
slavery, but also on what the international community can do to mitigate
its spread.
Although some may argue that international agreements like the
Palermo Protocols and documents like the TIP Report matter only margin-
ally,8
others counter that they catalyze change.9
Building upon a crest of public
awareness on human trafficking, U.S. President Barack Obama proclaimed
in 2012 at the Clinton Global Initiative, “We are turning the tables on the
traffickers. Just as they are now using technology and the Internet to exploit
their victims, we are going to harness technology to stop them.”10
Although he did not mention it explicitly, President Obama was re-
ferring to the idea of using big data to mitigate contemporary slavery. As
Pulitzer Prize-winning journalist Steve Lohr explains, big data is “shorthand
for advancing trends in technology that open the door to a new approach
to understanding the world and making decisions.”11
This typically involves
using software to find trends and patterns in large amounts of aggregated
data from the Internet, sometimes from publicly available data, and other
times from clandestinely obtained data.
Along the lines of utilizing publicly available data, the tech-giant
Google announced in April 2013 a big data partnership with the Polaris
Project, an antislavery NGO in Washington, D.C. The partnership, called the
Global Human Trafficking Hotline Network, aims to use data mining soft-
ware to identify human trafficking trends from the hotline that can even-
tually inform “eradication, prevention, and victim protection strategies.”12
Although using big data to fight trafficking is new, the idea has been
demonstrated by scholars like Mark Latonero of the Annenberg Center on
Communication Leadership & Policy at the University of Southern Califor-
nia. Latonero’s team partnered with local law enforcement agencies in Los
Angeles, explored trends in human trafficking on websites like Backpage.com,
and applied this information to target specific traffickers. This was
done by mining data from advertisements for the sexual services of domestic
minors on the adult section of Backpage in the Greater Los Angeles area, and
identifying the phone numbers from those ads that appeared in the greatest
frequencies. With this information, Latonero’s team was able to provide law
enforcement with data linking certain phone numbers to criminal networks.
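The frequency count at the heart of this approach can be sketched in Python. This is a hedged illustration, not a description of the tools Latonero's team actually used: the regular expression is a simplified pattern for U.S.-style numbers, the function name is hypothetical, and the ad texts and 555 numbers are fictional.

```python
import re
from collections import Counter

# Simplified pattern for U.S.-style numbers such as 213-555-0147 or (213) 555 0147.
PHONE_PATTERN = re.compile(r"\(?(\d{3})\)?[\s.-]?(\d{3})[\s.-]?(\d{4})")

def phone_counts(ad_texts):
    """Count in how many ads each normalized phone number appears."""
    counts = Counter()
    for text in ad_texts:
        # findall returns one (area, prefix, line) tuple per match.
        numbers = {"".join(match) for match in PHONE_PATTERN.findall(text)}
        counts.update(numbers)  # a set, so each ad counts a number at most once
    return counts

# Fictional ad texts with fictional 555 numbers.
ads = [
    "call 213-555-0147 anytime",
    "new in town (213) 555 0147",
    "text 310.555.0199",
    "213-555-0147 or 310-555-0199",
]
top = phone_counts(ads).most_common(1)
print(top)
```

Normalizing each number to bare digits before counting means that differently formatted versions of the same number are tallied together, which matters when advertisers vary the formatting across ads.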
The U.S. government is also using big data to mine private information
networks, not on the World Wide Web, but on what is called the Deep Web,
the part of the Internet that is not indexed by search engines like Google.
The Defense Advanced Research Projects Agency (DARPA), an agency of the
U.S. Department of Defense, recently launched a program called Memex to hunt
criminal networks on the Deep Web. The first domain DARPA intends to uncover
with this new technology is human trafficking.13
These developments in big data dovetail with a broader discussion
within academia about how social science researchers can apply quantitative
methods to estimate trends in contemporary slavery. Although rigorous,
many studies of modern day
slavery only exist in the pol-
icy and academic communi-
ties, and very few published
works actually employ
quantitative methods. In
a comprehensive review of
the research-based literature
on contemporary slavery,
Elżbieta M. Goździak and
Micah N. Bump of George-
town University found that,
of 218 research-based journal articles, only seven (about 3 percent) were
based on quantitative methods. Without hard data, it can be challenging
for scholars to make generalizable inferences to inform policy.
In this paper, using big data as a backdrop, I discuss some novel quan-
titative methods employed by two NGOs I have worked with that shed light
on contemporary slavery. The first NGO, the Walk Free Foundation, esti-
mates that there are about 30 million enslaved in the world today. The sec-
ond NGO, Free the Slaves, working with its local Indian partner, MSEMVS,
assesses the lives of survivors and how they are reintegrating into society.
The acquisition and dissemination of such information is crucial because
policymakers and donors sometimes require hard data before committing
time, political will, and resources to the cause.
The Walk Free Foundation
Australian philanthropists Andrew and Nicola Forrest established the Walk
Free Foundation (Walk Free)14
three years ago to eradicate contemporary
slavery. After meeting with Microsoft co-founder Bill Gates, Andrew For-
rest was inspired to explore the underpinnings of contemporary slavery
using quantitative methods. As Forrest recounts, “Global modern slavery
is hard to measure, and Bill’s a measure kind of guy,” adding, “in manage-
ment speak, if you can’t measure it, it doesn’t exist.”15
For Forrest, it was
important to inform people in the business and policy worlds of the extent
to which slavery exists, country-by-country, to prompt action. Although
some quantitative assessments of contemporary slavery existed, very little
was publicly available. Forrest sought to collect more precise data to dis-
seminate freely and thus launched a Global Slavery Index (GSI), on which
I have been working since 2012.16
The 2013 GSI ranks 162 of the world’s nations in terms of their level
of contemporary slavery. Methodologically, these rankings are based on
several factors; the most novel is an estimation of the proportion of the
population enslaved in each country. For this measure, the GSI team (led
by Kevin Bales and Fiona David) has drawn upon secondary source data
analysis that Bales pioneered for his book, Disposable People, and later dis-
seminated in Scientific American.17
These secondary sources consisted of a
review of the public record, including materials from published reports from
governments, the investigations of NGOs and international organizations,
and journalistic reports. The GSI team has also drawn upon data from rep-
resentative random sample surveys to extrapolate the prevalence of slavery
for selected comparable countries. Figure 1 illustrates the 2013 GSI data for
the proportion of the population estimated to be enslaved.
In Figure 1 the countries with darker shades indicate a corresponding
higher proportion of enslavement. Some of the countries with the highest
proportions are Haiti (about 2.1 percent of the population enslaved), Mau-
ritania (about 4.0 percent of the population enslaved), Pakistan (about 1.2
percent of the population enslaved), and
India (about 1.1 percent of the popula-
tion enslaved).
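If each country's count in Table 1 is read as its estimated proportion applied to its national population, the two published figures can be cross-checked against each other. The short Python sketch below divides the Table 1 counts by the rounded proportions quoted above to recover the population that each pair implies. The function and variable names are illustrative assumptions, not part of the GSI methodology, and the results are only as precise as the rounded percentages.

```python
# Counts from Table 1 and rounded proportions quoted in the text (2013 GSI).
gsi_2013 = {
    "Haiti": (209_165, 0.021),
    "Mauritania": (151_353, 0.040),
    "Pakistan": (2_127_132, 0.012),
    "India": (13_956_010, 0.011),
}


def implied_population(enslaved, proportion):
    """Population implied by an enslaved count and a prevalence proportion."""
    if proportion <= 0:
        raise ValueError("proportion must be positive")
    return enslaved / proportion


# Hypothetical helper mapping: country -> implied national population.
implied = {
    country: implied_population(count, share)
    for country, (count, share) in gsi_2013.items()
}

for country, population in implied.items():
    print(f"{country}: roughly {population / 1e6:.1f} million people implied")
```

The implied populations are of the expected order of magnitude for each country, which suggests the quoted percentages and the Table 1 counts are mutually consistent.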
Table 1 lists the 2013 GSI data in
terms of the total estimated number of
the enslaved, country-by-country. This is
a novel contribution compared to other
estimates of contemporary slavery. Such information can be useful to busi-
ness people, policymakers, and students who want a more informed under-
standing of where slavery occurs and with what frequency.
Overall, the 2013 GSI estimates about 29.8 million are enslaved among
the 162 countries under study. The country with the fewest estimated
enslaved in 2013 was Iceland (twenty-two enslaved), and the country
with the greatest number was India (13.9 million enslaved). The standard
deviation (or spread) was extremely large: about 1.2 million enslaved.
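To see why the spread is so large, it may help to compute the same summary statistics on a handful of rows from Table 1. The Python sketch below is a minimal illustration: it uses only nine of the 162 countries, so it will not reproduce the 1.2 million figure, and the function name summarize is an assumption made for this example. It does show how a single very large value (India) pulls the standard deviation above the mean.

```python
from statistics import mean, pstdev

# A few rows copied from Table 1 (2013 GSI estimates); not the full 162 countries.
estimated_enslaved = {
    "Iceland": 22,
    "Luxembourg": 69,
    "United Kingdom": 4_426,
    "United States": 59_644,
    "Mauritania": 151_353,
    "Haiti": 209_165,
    "Pakistan": 2_127_132,
    "China": 2_949_243,
    "India": 13_956_010,
}


def summarize(counts):
    """Return the smallest and largest country plus mean and population std dev."""
    values = list(counts.values())
    return {
        "min": min(counts, key=counts.get),
        "max": max(counts, key=counts.get),
        "mean": mean(values),
        "std": pstdev(values),
    }


summary = summarize(estimated_enslaved)
# Drop the largest value to see how much it drives the spread.
without_india = summarize(
    {country: n for country, n in estimated_enslaved.items() if country != "India"}
)

print(summary)
print(without_india)
```

Removing India from this subset cuts the standard deviation sharply, which is the pattern one would expect in the full dataset as well.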
One important question is whether the GSI has made a difference in the
real world. One way to shed light on this is to explore some of the statis-
tics achieved since the GSI’s launch in October 2013. To date, the GSI has
received over half a million website visits. There have been over thirteen
thousand downloads of the full report, available in English, Arabic, French,
and Spanish. Moreover, there have been over fifteen hundred media reports
about the GSI in over thirty-five countries, including The Economist,18
De Standaard,19 La Vanguardia,20 CNN,21 National Public Radio,22 and Time.23
Some of the media reports about the GSI illustrate how it can gener-
ate discussion on an underreported issue. In India, for instance, where the
GSI estimates the greatest number of the enslaved to be, media response
has been strong. The Times of India reported, “Sixty-six years after indepen-
dence, India has the dubious distinction of being home to half the number
of modern-day slaves in the world.”24
Perhaps due to such sentiments, the
Hindustan Times discussed the causes of slavery in India and observed, “Some
of the reasons for high numbers caught in slavery in India are the difficulty
in accessing protections and government entitlements, such as the food
rations card, corruption or non-performance of safety nets (such as the
National Employment Guarantee, primary health care and pensions) and
practices of land grabbing and asset domination by high-caste groups.”25
There is also some evidence that the GSI has begun to influence gov-
ernment policy. In January of this year, building upon the momentum of
the GSI, Andrew Forrest signed a memorandum of understanding (MOU)
Figure 1. Global Slavery Index (GSI)—Proportion of the Population Estimated Enslaved in 2013
Table 1. Global Slavery Index—Estimated Enslaved in 2013
Country	 Estimated Enslaved 	 Country	 Estimated Enslaved
Afghanistan	 86,089 	 Lebanon	 4,028
Albania	 11,372 	 Lesotho	 14,560
Algeria	 70,860 	 Liberia	 29,504
Angola	 16,767 	 Libya	 17,683
Argentina	 35,368 	 Lithuania	 2,909
Armenia	 10,678 	 Luxembourg	 69
Australia	 3,167 	 Macedonia	 6,226
Austria	 1,100 	 Madagascar	 19,184
Azerbaijan	 33,439 	 Malawi	 110,391
Bahrain	 2,679 	 Malaysia	 25,260
Bangladesh	 343,192 	 Mali	 102,240
Barbados	 46 	 Mauritania	 151,353
Belarus	 11,497 	 Mauritius	 535
Belgium	 1,448 	 Mexico	 103,010
Benin	 80,371 	 Moldova	 33,325
Bolivia 	 29,886 	 Mongolia	 4,729
Bosnia and Herzegovina	 13,789 	 Montenegro	 2,234
Botswana	 14,298 	 Morocco	 50,593
Brazil	 209,622 	 Mozambique	 173,493
Brunei 	 417 	 Myanmar	 384,037
Bulgaria	 27,739 	 Namibia	 15,729
Burkina Faso	 114,745 	 Nepal	 258,806
Burundi	 71,146 	 Netherlands	 2,180
Cambodia	 106,507 	 New Zealand	 495
Cameroon	 153,258 	 Nicaragua	 5,798
Canada	 5,863 	 Niger	 121,249
Cape Verde	 3,688 	 Nigeria	 701,032
Central African Republic	 32,174 	 Norway	 652
Chad	 86,329 	 Oman	 5,739
Chile	 37,846 	 Pakistan	 2,127,132
China	 2,949,243 	 Panama	 548
Colombia	 129,923 	 Papua New Guinea	 6,131
Costa Rica	 679 	 Paraguay	 19,602
Côte d’Ivoire	 156,827 	 Peru	 82,272
Croatia	 15,346 	 Philippines	 149,973
Cuba	 2,116 	 Poland	 138,619
Czech Republic	 37,817 	 Portugal	 1,368
Democratic Republic of the Congo	 462,327 	 Qatar	 4,168
Denmark	 727 	 Republic of the Congo	 30,889
Djibouti	 2,929 	 Romania	 24,141
Dominican Republic	 23,183 	 Russia	 516,217
Ecuador	 44,072 	 Rwanda	 80,284
Egypt	 69,372 	 Saudi Arabia	 57,504
El Salvador	 10,490 	 Senegal	 102,481
Equatorial Guinea	 5,453 	 Serbia	 25,981
Eritrea	 44,452 	 Sierra Leone	 44,644
Estonia	 1,496 	 Singapore	 1,105
Ethiopia	 651,110 	 Slovakia	 19,458
Finland	 704 	 Slovenia	 7,402
France	 8,541 	 Somalia	 73,156
Gabon	 13,707 	 South Africa	 44,545
Gambia	 14,046 	 South Korea	 10,451
Georgia	 16,227 	 Spain	 6,008
Germany	 10,646 	 Sri Lanka	 19,267
Ghana	 181,038 	 Sudan	 264,518
Greece	 1,466 	 Suriname	 1,522
Guatemala	 13,194 	 Swaziland	 1,302
Guinea	 82,198 	 Sweden	 1,237
Guinea-Bissau	 12,186 	 Switzerland	 1,040
Guyana	 2,264 	 Syria	 19,234
Haiti	 209,165 	 Tajikistan	 23,802
Honduras	 7,503 	 Tanzania	 329,503
Hong Kong, SAR China	 1,543 	 Thailand	 472,811
Hungary	 35,763 	 Timor-Leste	 1,020
Iceland	 22 	 Togo	 48,794
India	 13,956,010 	 Trinidad and Tobago	 486
Indonesia	 210,970 	 Tunisia	 9,271
Iran 	 65,312 	 Turkey	 120,201
Iraq	 28,252 	 Turkmenistan	 14,711
Ireland	 321 	 Uganda	 254,541
Israel	 8,096 	 Ukraine	 112,895
Italy	 7,919 	 United Arab Emirates	 18,713
Jamaica	 2,386 	 United Kingdom	 4,426
Japan	 80,032 	 United States	 59,644
Jordan	 12,843 	 Uruguay	 9,978
Kazakhstan	 46,668 	 Uzbekistan	 166,667
Kenya	 37,349 	 Venezuela 	 79,629
Kuwait	 6,608 	 Vietnam	 248,705
Kyrgyzstan	 16,027 	 Yemen	 41,303
Laos	 50,440 	 Zambia	 96,175
Latvia	 2,040 	 Zimbabwe	 93,749
Source: The Global Slavery Index
with the Pakistani State of Punjab. In the business world, it is atypical for a
government to sign a deal with a businessman to help eradicate slavery within
its own borders. Yet Forrest was able to leverage his influence in Pakistan
to encourage a conversation that aims to provide the state of Punjab with
inexpensive coal in exchange for assurances that the government will work
toward the liberation of its own people.26 Although it is too early to see
how Pakistan will live up to its promise, this agreement may herald future
MOUs between NGOs like Walk Free and governments that want to mitigate
slavery, and one day even eradicate it.
The GSI may also be influencing heads of state. Former U.S. President
Jimmy Carter references the GSI several times in his new bestselling book,
A Call to Action: Women, Religion, Violence, and Power. And the GSI has been
publicly endorsed by, among others, Hillary Clinton, Gordon Brown, Julia
Gillard, and Tony Blair.27
Free the Slaves
The GSI strives to use big data to count the number of slaves in the world.
Other NGOs have begun to employ longitudinal techniques to chronicle
the lives of survivors of slavery once they are free. One such NGO is Free
the Slaves (FTS), which Kevin Bales, Peggy Callahan, and Jolene Smith co-
founded in 2000 as the sister-organization of Anti-Slavery International (the
oldest international human rights organization in the world).28
Early in its evolution, FTS reasoned that the liberation of any slave
would be beneficial not only for that individual, but also for the commu-
nity, and thus produce a “freedom dividend,” multiplied by each additional
person freed. As FTS explains, “Local communities thrive when formerly
enslaved people start their own businesses; communities begin to flourish as
people come together to organize and watch out for one another; children
go to school—and the benefits extend for generations.”29
For the past decade, FTS has partnered with different grassroots orga-
nizations in Haiti, India, Nepal, Ghana, the Democratic Republic of Congo,
and Brazil to empower local communities of the enslaved to seek liberation.
In India, FTS has worked with a local grassroots organization called Manav
Sansadhan Evam Mahila Vikas Sansthan (MSEMVS).30 Through the efforts
of MSEMVS, over 150 villages have eradicated slavery and trafficking
in recent years and many more are beginning to experience liberation in
the North Indian States of Uttar Pradesh and Bihar, two of India’s poorest
states, as Figure 2 highlights.
In addition to empowering people in rural Uttar Pradesh to seek liberation,
MSEMVS has been among the first NGOs to begin longitudinal studies of
the effects of liberation, alongside quantitative studies of the predictive
factors of enslavement. The studies are intended to provide insight into:
(1) whether slavery and trafficking have been eradicated; and (2) whether
the socio-economic conditions of people living in these communities have
improved. I consulted with Free the Slaves at this time, and, along with
Ginny Baumann, Jody Sarich, Austin Choi-Fitzpatrick, and Jessica Leslie,
helped put together a follow-up report for the village of Kukrouthi in Ut-
tar Pradesh.
The follow-up report was conducted among the residents of three
hamlets in Kukrouthi village.31
There were two sources of information: The
first was a set of 120 household level surveys, and the second was a set of
focus group discussions. A total of 929 people were accounted for by the
surveys. The time periods under comparison were 2009 (when the libera-
tion process began) and 2011 (when the process of self-liberation by local
residents was completed).
Some of the key findings from the 2009 and 2011 studies follow; they lend
credence to FTS’s supposition that liberation yields a “freedom dividend.”
Growth in Childhood Education
One important indicator of a freedom dividend in Kukrouthi village is the
number of children in school. The underlying premise is that in free com-
munities children receive better education, which fuels a society’s human
capital. In Kukrouthi, the team from MSEMVS found evidence of significant
growth in childhood education rates. Whereas in 2009 only 69 percent of
the school-aged children were reported to be in school, by 2011, 91 percent
were enrolled, as Figure 3 illustrates.
Figure 2. Uttar Pradesh, North India
Better Nutrition
Another key indicator illuminating the freedom dividend is access to adequate
nutrition. As with childhood education, the team from MSEMVS
reported a dramatic increase in the share of families that were able to eat
three meals a day, from 31 percent in 2009 to 71 percent in 2011. The share
more than doubled, a relative increase of about 129 percent, as Table 2 details.
Table 2. Number of Daily Meals by Household in Kukrouthi, 2009 and 2011
Number of Meals	Year	Percentage
Two Meals Per Day	2011	22%
	2009	31%
Three Meals Per Day	2011	71%
	2009	31%
No Response	2011	8%
	2009	3%
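Changes between two survey percentages can be expressed either in percentage points or as a relative change, and the two are easy to confuse. The Python sketch below applies both measures to the schooling and three-meals figures reported for Kukrouthi. The function names are assumptions introduced for illustration only; the input values are the 2009 and 2011 percentages given in the text.

```python
def percentage_point_change(before, after):
    """Absolute difference between two percentages, in percentage points."""
    return after - before


def relative_change_percent(before, after):
    """Change relative to the starting value, expressed as a percent."""
    if before == 0:
        raise ValueError("relative change is undefined when the baseline is zero")
    return (after - before) / before * 100


# (2009 percentage, 2011 percentage) as reported for Kukrouthi.
indicators = {
    "children in school": (69, 91),
    "three meals a day": (31, 71),
}

changes = {
    name: (
        percentage_point_change(before, after),
        relative_change_percent(before, after),
    )
    for name, (before, after) in indicators.items()
}

for name, (points, relative) in changes.items():
    print(f"{name}: +{points} percentage points, {relative:.0f} percent relative increase")
```

For the three-meals indicator this gives a rise of 40 percentage points, or about 129 percent relative to the 2009 level.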
Figure 3. Percent Children in School in Kukrouthi, 2009 and 2011
Improved Access to Health Care
Yet another strong indicator of a freedom dividend is access to health care,
even if of rudimentary quality. In 2011, MSEMVS reported that almost the
entire population of Kukrouthi village had access to health care. This was
another dramatic increase compared to 2009, when MSEMVS found that
just 52 percent of families received health care treatment. Table 3 provides
a breakdown of this comparison.
Table 3. Comparison of Access to Health Care in Kukrouthi, 2009 and 2011
Access to Health Care	Year	Percent
Yes	2011	96%
	2009	57%
No	2011	3%
	2009	43%
Don’t Know	2011	1%
	2009	.
No Response	2011	1%
	2009	1%
Improvement in Childhood Vaccinations
Lastly, in 2009, just one-third of children had the proper number of recom-
mended vaccinations (i.e., three vaccinations). By 2011, this had increased
to 90 percent, as Table 4 shows.
Table 4. Comparison of Child Vaccinations in Kukrouthi, 2009 and 2011
Immunizations	Year	Percentage
None	 2011	.
	 2009	49%
One	 2011	3%
	 2009	7%
Two	 2011	7%
	 2009	12%
Three	 2011	90%
	 2009	33%
A World Without Slavery
Applying quantitative methods to the study of contemporary slavery could
contribute significantly to shedding more light on the phenomenon. In
collaboration with my colleagues at the Walk Free Foundation, I have
used quantitative methods to estimate the total number of enslaved in the
world today. This, in turn,
has generated discussion
among the media and pol-
icy community on how to
mitigate modern day slav-
ery, with an eye toward its
eradication. With Free the
Slaves and MSEMVS, we
have begun to chronicle systematically how communities can benefit from
freedom. This information provides preliminary evidence to policy makers
that liberating slaves provides a wide range of socioeconomic benefits.
The modern day anti-slavery movement is young. Moving forward, we
need more scholars and policy makers who want to explore what quantita-
tive methods and big data can do for the movement. We are at a point in
the world where everyone agrees that contemporary slavery is a wrong that
must be addressed. The time is ripe for further discussion on how to make
this a reality. I hope we can get there, at least in part, through employing
quantitative methods and exploring big data.
Notes
1 Kevin Bales, The Global Slavery Index, 2013. http://www.globalslaveryindex.org/report/#view-online
2 For example: http://www.gems-girls.org/get-involved/very-young-girls
3 Congo. https://www.freetheslaves.net/congo
4 CNN Freedom Project, http://thecnnfreedomproject.blogs.cnn.com.
5 For example: Survivors of Slavery Speak Out, http://survivorsofslavery.org
6 Trafficking In Persons Report, http://www.state.gov/j/tip/rls/tiprpt
7 For example: http://www.coha.org/the-trafficking-in-persons-report-who-is-the-united-states-to-judge
8 For example: John J. Mearsheimer, “The False Promise of International Institutions,” International Security, Vol. 19, No. 3 (1995) pp. 5–49.
9 For example: Anne-Marie Slaughter, A New World Order, (Princeton University Press, 2005).
10 Barack Obama, “Remarks by the President to the Clinton Global Initiative,” September 25, 2012. http://www.whitehouse.gov/the-press-office/2012/09/25/remarks-president-clinton-global-initiative
11 Steve Lohr, “The Age of Big Data,” The New York Times, February 11, 2012. http://www.nytimes.com/2012/02/12/sunday-review/big-datas-impact-in-the-world.html?pagewanted=all&_r=0
12 “Polaris Project Launches Global Human Trafficking Hotline Network.” http://www.polarisproject.org/media-center/news-and-press/press-releases/767-polaris-project-launches-global-human-trafficking-hotline-network
13 “Darpa Reinventing Search Engines to Fight Crime,” Wired, February 11, 2014. http://www.wired.co.uk/news/archive/2014-02/11/darpa-memex-human-trafficking
14 The Walk Free Foundation, http://www.walkfreefoundation.org
15 Elisabeth Behrmann, “Gates Helps Australia’s Richest Man in Bid to End Slavery,” Bloomberg, April 14, 2013. http://www.bloomberg.com/news/2013-04-10/gates-helps-australia-s-richest-man-in-bid-to-end-slavery.html
16 The Global Slavery Index, http://www.globalslaveryindex.org
17 Kevin Bales, “The Social Psychology of Modern Slavery,” Scientific American, April 2002.
18 “Dry Bones,” The Economist, October 19, 2013. http://www.economist.com/news/international/21588105-hateful-practice-deep-roots-still-flourishing-dry-bones
19 “Wereldwijd bijna 29 miljoen slaven [29 million people in slavery worldwide],” De Standaard, October 17, 2013. http://nos.nl/artikel/563375-wereldwijd-bijna-30-miljoen-slaven.html
20 “Casi 30 millones de personas son esclavos modernos [Almost 30 million people are modern slaves],” La Vanguardia, October 18, 2013. http://www.lavanguardia.com/20131018/54391301708/casi-30-millones-de-personas-son-esclavos-modernos-barce-lona.html
21 Tim Hume, “India, China, Pakistan, Nigeria on Slavery’s List of Shame, Says Report,” CNN, October 17, 2013. http://www.cnn.com/2013/10/17/world/global-slavery-index
22 Audie Cornish, “Report Estimates 30 Million People in Slavery Worldwide,” National Public Radio, October 17, 2013, http://www.npr.org/templates/story/story.php?storyId=236407720
23 Nilanjana Bhowmick, “Report: Almost 14 Million Indians Live Like Slaves,” Time, October 17, 2013. http://world.time.com/2013/10/17/report-almost-14-million-indians-live-like-slaves/
24 “India Has Half the World’s Modern Slaves: Study,” The Times of India, October 18, 2013. http://timesofindia.indiatimes.com/india/India-has-half-the-worlds-modern-slaves-Study/articleshow/24313244.cms
25 Abhijit Patnaik, “Modern Slavery Widespread in India,” Hindustan Times, October 17, 2013. http://www.hindustantimes.com/India-news/NewDelhi/Modern-slavery-widespread-in-India/Article1-1136431.aspx
26 Dennis Shanahan, “Andrew Forrest Strikes Cheap Coal Deal to End Pakistan Slavery,” The Australian, January 23, 2014, http://www.theaustralian.com.au/business/mining-energy/andrew-forrest-strikes-cheap-coal-deal-to-end-pakistan-slavery/story-e6frg9df-1226808181875#
27 The Global Slavery Index. http://www.globalslaveryindex.org/endorsements
28 Anti-Slavery International. http://www.antislavery.org/english
29 FTS In India: Free a Village, Build a Movement. http://www.freetheslaves.net
30 https://www.ashanet.org/projects/project-view.php?p=907
31 Ginny Baumann, et al, “Follow Up Study of Slavery and Poverty In Kukrouthi village, St Ravidas Nagar District, Uttar Pradesh,” June 2012, unpublished manuscript. Free the Slaves.
A Conversation with Arch
Puddington, Vice President for
Research at Freedom House
Who is the target audience of Freedom House reports?
From the beginning, we have sought to provide analysis that combines
scholarly rigor with a methodology and vocabulary that is accessible to the
general public. Obviously, there is a niche group of policymakers here and
in Europe, as well as journalists, scholars, political activists and dissidents,
who make up our core audience. But our data are also widely used by educa-
tors and students, including at the secondary level.
We have also developed a growing audience among foreign government
officials. This is in large measure due to the important role of democracy
and honest governance in the calculations of international development
agencies, financial institutions, and governments. Especially since Freedom
House findings have been formally incorporated into the foreign assistance
process of the American government, we have experienced a major increase
in communications with foreign diplomats, who want to discuss, or com-
plain about, our conclusions about their countries.
Freedom House’s 2013 Freedom on the Net report examined internet
activism and “increasingly sophisticated restrictions on internet free-
dom” by authoritarian regimes. Based on the report, what opportuni-
ties and obstacles do new technologies offer in promoting freedom?
New technologies offer a significant opportunity to advance democracy.
Throughout the world, online activists and ordinary social media users uti-
lize these tools to organize, lobby, and hold their governments accountable.
Women’s rights groups, free speech advocates, and human rights organiza-
tions have staged successful advocacy campaigns to overturn or prevent the
passage of oppressive laws. In many authoritarian states, such as China,
Saudi Arabia, and Bahrain, exposés by online and citizen journalists reveal-
ing corruption, police abuse, and pollution often force the authorities to
acknowledge the issue, and in some cases, hold the perpetrators accountable.
Arch Puddington is vice president for research at Freedom House. He manages the
publication of Freedom in the World, an annual report assessing global political rights
and civil liberties, and is responsible for the development of new research and advocacy
programs.
Unfortunately, the transformative power of digital media is not limited
to individuals fighting to promote freedom. Technological advances also
bring new tools to censor the web and intimidate citizens who are engaged
in online speech that
is deemed to threaten
the regime, insult the
dominant religion,
or sow social discord.
Authoritarian regimes
monitor the personal
communications of
their citizens for po-
litical reasons, with
the goal of identify-
ing and suppressing
government critics and human rights activists. Such monitoring can have
dire repercussions for the targeted individuals in those countries, including
imprisonment, torture, and even death.
In 2007, Freedom House published a report indicating a “profoundly
disturbing deterioration”—a greater number of countries were becom-
ing less free than were becoming more free. Could you share insights
on this finding?
According to our findings, more countries have experienced declines in
freedom than have experienced gains during each of the last eight years.
This is unprecedented in the forty-one-year history of Freedom in the World.
At the same time, this decline is not in itself a cause for alarm. Many of the
declines represent quite small
setbacks and not a pell-mell
retreat from the gains of pre-
vious decades. Many coun-
tries that embraced democ-
racy over the previous four
decades had little experience
with the institutions of free-
dom, and their adherence to
good government standards
is beginning to fray. Especially in times of relative scarcity, corruption is
emerging as a particular evil, especially as top-to-bottom graft and favorit-
ism erodes popular faith in democratic institutions.
A more serious problem that is reflected in our findings is the durabil-
ity of what we call modern authoritarian regimes. Russia’s Vladimir Putin
and the Chinese Communist Party leadership are the best examples of this
phenomenon, but there are others as well: Aliyev in Azerbaijan, the Iranian
clerics, Correa in Ecuador, the post-Chavez group in Venezuela. Modern
authoritarians preside over countries that are well-integrated into the
global economic and diplomatic systems and often possess energy riches.
The leaders are unabashedly antidemocratic and anti-Western. They devote
their energies to the control of the political process, the press, civil society,
and the rule of law. They avoid the excesses and stupidities of communism,
especially in economic policy, but use nuanced and sophisticated methods
to control the levers of power. Modern authoritarianism has emerged over
the past fifteen years, and its practitioners have grown in power and even
international respectability over time. Modern authoritarianism today ranks
as the most worrying threat to freedom around the world.
The most recent Freedom in the World report noted that the number of
electoral democracies has risen, while the distribution of countries
in each of the “free,” “partly free,” and “not free” categories did not
change significantly in comparison to 2012. Why do you think this is
the case?
One way to think of a country with a designation as “free” is as a liberal or
consolidated democracy. In recent years, the number of free countries has
remained steady at eighty-seven to ninety, meaning that approximately 45
percent of the world’s sovereign states enjoy systems that guarantee com-
petitive elections and a broad range of civil liberties. On the other hand, the
number of electoral democracies has oscillated between 115 and 123. There
are thus some thirty countries that can be said to have met internationally
accepted standards for competitive elections but which fall short on other
indicators that measure liberal democracy—press freedom, minority rights,
gender equality, corruption, and so forth.
Freedom in the World ranks Mexico as an electoral democracy but also
places it in the “partly free” category because of the impact of uncontrolled
violence. Indonesia likewise qualifies as an electoral democracy but is ranked
as “partly free” because, among other problems, its government has been
unable to secure the rights of religious minorities. Given that the standards
for gaining a designation as an electoral democracy are less strict than for
achieving designation as a free country, it is not surprising that there is more
movement in and out of the electoral democracy category.
What do you make of recent articles from BBC and al-Jazeera (among
others) calling attention to corruption in the EU? Some have argued
that the quality of electoral democracies in the United States and in
Europe has been deteriorating. What are your thoughts on this
assertion? How closely do you think popular indices like those from
Freedom House mirror the reality on the ground?
I’m not overly exercised about the level of corruption in the EU. Every so-
ciety based on money transactions suffers from corruption to one degree
or another. The key here is whether corruption is pervasive, officially toler-
ated, and engaged in by the political leadership. The most damning report
on European corruption was commissioned by the EU itself, and most EU
countries have media which investigate corruption charges and an independent
judiciary which prosecutes corrupt officials. Europe should be concerned
when officials are intervening to prevent the press from uncovering
corrupt acts or prosecutors from bringing charges against officials accused
of graft. All too often accusations of widespread corruption in democracies
are advanced by people of bad faith from countries—Russia and Belarus, for
example—where corruption is a way of life.
As for the United States, there clearly are growing problems with its
political system. Gerrymandering has gotten worse and the new movement
for voter identification has been implemented in ways that suggest efforts
to weaken Democratic candidates. At the same time, the American system
retains a unique dynamism. It remains open to the emergence of new faces
(Barack Obama) and new forces (the Tea Party). Despite its multinational
character, the United States has managed to avoid the emergence of influ-
ential parties or movements that preach racism or xenophobia.
We devote considerable effort to capturing these nuances in Freedom in
the World and other reports. Freedom in the World is not a report on governance
per se; we endeavor to reflect the level of freedom an individual experiences
on the ground, and zero in on the threats to freedom whether they come
from the state, terrorists, extremist movements, or other sources. We have
developed a methodology that looks at the broad set of institutions and val-
ues that make up human freedom while providing a flexibility that enables
us to highlight the qualitative differences between one society and another.
How does Freedom House collect the quantitative and qualitative in-
formation used in its reports? How do you extract significant insights
from this information?
We see our principal role as providing analysis, including scores and judg-
ments about democratic performance, to the policymaking community, the
media, and scholars. Our analysts make use of the vast sources of informa-
tion that are available these days, including government reports, the find-
ings of think tanks and NGOs, reports of multilateral institutions, press
accounts, interviews with officials and critics alike, and the many other
sources that have emerged in the data explosion era.
Freedom House is a source for analysis, not data. We see our role as
providing assessments on the state of freedom, identifying the principal
threats to freedom, and showcasing global and regional trends. Using data
from our country analysis, we are able to identify the global and regional
trajectory of freedom, broadly defined, as well as specific elements of free-
dom, such as freedom of expression and press, elections, corruption and
transparency, civil society, and rule of law. We can, in other words, illuminate
which institutions of democracy are most vulnerable to pressure from au-
thoritarian rulers, and which institutions have proved most durable. There
are other organizations that see their mission as providing data on elections,
corruption, assaults on journalists, economic freedom, and so forth. Free-
dom House, by contrast, works to inform the public about the gains and
setbacks in democratic government, civil liberties, and personal freedom.
What do you make of the recent trend in which governments freely
release open data? What are the policy implications of this trend?
Clearly, enhanced transparency is preferable to less openness. My concern is
that some governments will be tempted to fudge or falsify data or decide to
stop publishing informa-
tion when the results are
embarrassing. For some
time now, Argentina has
been publishing inflation
figures that most experts
regard as bogus. After po-
litical attention was drawn
to spiraling crime rates, the
Venezuelan government
stopped publishing statis-
tics on violent crime. These
examples suggest that in the future, as in the past, the data world will be
divided between democracies that almost always publish honest statistics
and other countries whose data may or may not reflect reality.
For democracies, political leaderships will face a new challenge in
explaining the unwelcome news that will inevitably emerge from published
data. More data will mean a more informed citizenry, especially at the elite
level. But it will also mean more pressure on governments to communicate,
often in response to the arguments of demagogues, why unemployment
rates, inequality, traffic accidents, or test scores for children are moving in
the wrong direction.
Authoritarian regimes will have it easier. Their leaders will either quash
uncomfortable facts or distort them. Here it will be essential that interna-
tional financial institutions, transparency think tanks, and the global busi-
ness community weigh in by demanding honest accounting. It is instructive
that Argentina agreed to adjust its inflation figures after pressure from the
IMF.
The field of international affairs has become more focused on collecting
and analyzing large quantities of data. What would you recommend to
international affairs students as the field becomes more data-driven?
I would urge students to remember that data and facts can be manipulated
and misused. Serious assessment of a society’s political well-being requires
facts, but it also demands honest interpretation. An overemphasis on data
can distort an analyst’s efforts to understand the true quality of freedom
as thoroughly as can outright bias.
Of Note
A Deterioration of Democracy?
Corruption, Transparency, and Apathy in
the Western World
Rachel Ostrow
Arch Puddington, in his interview with the SAIS Review of International
Affairs, expresses a firm belief in the power of interpretation. “An over-
emphasis on data,” he says, “can distort an analyst’s efforts to understand
the true quality of freedom as thoroughly as can outright bias.” Freedom
House’s annual reports on freedom have made data on democracy accessible
for millions of people in the diplomatic, academic, and wider communities.
These analyses have criticized governments throughout the Middle East,
Africa, and Asia for anti-democratic and autocratic methods. However, Free-
dom House’s important work researching authoritarianism and democracy
throughout the world should start to focus once again on its birthplace—the
Western world.
Freedom House, based in the United States (and largely funded by
government agencies such as the State Department and the U.S. Agency
for International Development), could be—and has been—accused of hav-
ing Western biases. A quick look at Freedom in the World 2014 shows that the
United States and Canada, as well as the majority of European nations, are
classified as “free” right up to the Ukrainian border.1 However, several
countries in Europe—as Freedom House rightly notes—have suffered democratic
backsliding. France, Switzerland, and Hungary have all passed laws or gone
through social movements seeking to limit the rights of migrants and ethnic
minorities. These occurrences, though noted in Freedom House’s analysis,
do not seriously affect the calculations within.
Hungary itself is an excellent example of where this analysis has
masked the more sinister undertones within an open democracy. Hungary’s
recent re-election of Viktor Orban—in an election widely seen as free and
fair—can be seen as a backwards turn for Hungarian democracy. As a member
of the right-wing, nationalist Fidesz party, Orban will likely have to make
concessions to the far-right, anti-Semitic, and anti-Roma Jobbik party, which
Rachel Ostrow is a second-year M.A. candidate at the Johns Hopkins University Paul H.
Nitze School of Advanced International Studies (SAIS) concentrating in Russian and
Eurasian Studies. She is Web Editor of The SAIS Review.
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1
Sais.34.1

Más contenido relacionado

La actualidad más candente

UN Global Pulse: Big Data for a Better World (Strata Conf NYC)
UN Global Pulse: Big Data for a Better World (Strata Conf NYC)UN Global Pulse: Big Data for a Better World (Strata Conf NYC)
UN Global Pulse: Big Data for a Better World (Strata Conf NYC)UN Global Pulse
 
The Future of Big Data
The Future of Big Data The Future of Big Data
The Future of Big Data EMC
 
Unleashing government’s ‘innovation mojo’ an interview with the us chief tec...
Unleashing government’s ‘innovation mojo’  an interview with the us chief tec...Unleashing government’s ‘innovation mojo’  an interview with the us chief tec...
Unleashing government’s ‘innovation mojo’ an interview with the us chief tec...Mondher Ben-Hamida
 
Online Misinformation: Challenges and Future Directions
Online Misinformation: Challenges and Future DirectionsOnline Misinformation: Challenges and Future Directions
Online Misinformation: Challenges and Future DirectionsMiriam Fernandez
 
Identifying Trends in Discrimination against women in the workplace In Social...
Identifying Trends in Discrimination against women in the workplace In Social...Identifying Trends in Discrimination against women in the workplace In Social...
Identifying Trends in Discrimination against women in the workplace In Social...UN Global Pulse
 
Using Twitter to Understand the Post-2015 Global Conversation - Project Overview
Using Twitter to Understand the Post-2015 Global Conversation - Project OverviewUsing Twitter to Understand the Post-2015 Global Conversation - Project Overview
Using Twitter to Understand the Post-2015 Global Conversation - Project OverviewUN Global Pulse
 
(Lim Jun Hao) G8 Individual Essay for BGS
(Lim Jun Hao) G8 Individual Essay for BGS(Lim Jun Hao) G8 Individual Essay for BGS
(Lim Jun Hao) G8 Individual Essay for BGSJun Hao Lim
 
Amplification and Personalization: The impact of metrics, analytics, and algo...
Amplification and Personalization: The impact of metrics, analytics, and algo...Amplification and Personalization: The impact of metrics, analytics, and algo...
Amplification and Personalization: The impact of metrics, analytics, and algo...Nicole Blanchett
 
‘Like a Virus’: Disinformation in the Age of COVID-19
‘Like a Virus’: Disinformation in the Age of COVID-19‘Like a Virus’: Disinformation in the Age of COVID-19
‘Like a Virus’: Disinformation in the Age of COVID-19Axel Bruns
 
The first-ever-infodemilogical-study-1993
The first-ever-infodemilogical-study-1993The first-ever-infodemilogical-study-1993
The first-ever-infodemilogical-study-1993Ahmed-Refat Refat
 
Fake News, Algorithmic Accountability and the Role of Data Journalism in the ...
Fake News, Algorithmic Accountability and the Role of Data Journalism in the ...Fake News, Algorithmic Accountability and the Role of Data Journalism in the ...
Fake News, Algorithmic Accountability and the Role of Data Journalism in the ...Liliana Bounegru
 
Understanding Public Perceptions of Immunisation Using Social Media - Project...
Understanding Public Perceptions of Immunisation Using Social Media - Project...Understanding Public Perceptions of Immunisation Using Social Media - Project...
Understanding Public Perceptions of Immunisation Using Social Media - Project...UN Global Pulse
 
Information Literacy, Privacy, & Risk: What Are the Implications of Mass Surv...
Information Literacy, Privacy, & Risk: What Are the Implications of Mass Surv...Information Literacy, Privacy, & Risk: What Are the Implications of Mass Surv...
Information Literacy, Privacy, & Risk: What Are the Implications of Mass Surv...g8briel
 
What Data Can Do: A Typology of Mechanisms . Angèle Christin
What Data Can Do: A Typology of Mechanisms . Angèle Christin What Data Can Do: A Typology of Mechanisms . Angèle Christin
What Data Can Do: A Typology of Mechanisms . Angèle Christin eraser Juan José Calderón
 
NCSU invited talk: Leveraging Social Media for Tourism Marketplace Coordination
NCSU invited talk: Leveraging Social Media for Tourism Marketplace CoordinationNCSU invited talk: Leveraging Social Media for Tourism Marketplace Coordination
NCSU invited talk: Leveraging Social Media for Tourism Marketplace CoordinationArtificial Intelligence Institute at UofSC
 
Future of Privacy - The Emerging View 11 06 15
Future of Privacy - The Emerging View 11 06 15 Future of Privacy - The Emerging View 11 06 15
Future of Privacy - The Emerging View 11 06 15 Future Agenda
 
Using Twitter Data to Analyse Public Sentiment on Fuel Subsidy Policy Reform ...
Using Twitter Data to Analyse Public Sentiment on Fuel Subsidy Policy Reform ...Using Twitter Data to Analyse Public Sentiment on Fuel Subsidy Policy Reform ...
Using Twitter Data to Analyse Public Sentiment on Fuel Subsidy Policy Reform ...UN Global Pulse
 
KM Russia 2014 - John Girard
KM Russia 2014 - John GirardKM Russia 2014 - John Girard
KM Russia 2014 - John GirardJohn Girard
 
Big Data: Friend, Phantom or Foe?
Big Data: Friend, Phantom or Foe?Big Data: Friend, Phantom or Foe?
Big Data: Friend, Phantom or Foe?John Girard
 
Open Government Data & Privacy Protection
Open Government Data & Privacy ProtectionOpen Government Data & Privacy Protection
Open Government Data & Privacy ProtectionSylvia Ogweng
 

La actualidad más candente (20)

UN Global Pulse: Big Data for a Better World (Strata Conf NYC)
UN Global Pulse: Big Data for a Better World (Strata Conf NYC)UN Global Pulse: Big Data for a Better World (Strata Conf NYC)
UN Global Pulse: Big Data for a Better World (Strata Conf NYC)
 
The Future of Big Data
The Future of Big Data The Future of Big Data
The Future of Big Data
 
Unleashing government’s ‘innovation mojo’ an interview with the us chief tec...
Unleashing government’s ‘innovation mojo’  an interview with the us chief tec...Unleashing government’s ‘innovation mojo’  an interview with the us chief tec...
Unleashing government’s ‘innovation mojo’ an interview with the us chief tec...
 
Online Misinformation: Challenges and Future Directions
Online Misinformation: Challenges and Future DirectionsOnline Misinformation: Challenges and Future Directions
Online Misinformation: Challenges and Future Directions
 
Identifying Trends in Discrimination against women in the workplace In Social...
Identifying Trends in Discrimination against women in the workplace In Social...Identifying Trends in Discrimination against women in the workplace In Social...
Identifying Trends in Discrimination against women in the workplace In Social...
 
Using Twitter to Understand the Post-2015 Global Conversation - Project Overview
Using Twitter to Understand the Post-2015 Global Conversation - Project OverviewUsing Twitter to Understand the Post-2015 Global Conversation - Project Overview
Using Twitter to Understand the Post-2015 Global Conversation - Project Overview
 
(Lim Jun Hao) G8 Individual Essay for BGS
(Lim Jun Hao) G8 Individual Essay for BGS(Lim Jun Hao) G8 Individual Essay for BGS
(Lim Jun Hao) G8 Individual Essay for BGS
 
Amplification and Personalization: The impact of metrics, analytics, and algo...
Amplification and Personalization: The impact of metrics, analytics, and algo...Amplification and Personalization: The impact of metrics, analytics, and algo...
Amplification and Personalization: The impact of metrics, analytics, and algo...
 
‘Like a Virus’: Disinformation in the Age of COVID-19
‘Like a Virus’: Disinformation in the Age of COVID-19‘Like a Virus’: Disinformation in the Age of COVID-19
‘Like a Virus’: Disinformation in the Age of COVID-19
 
The first-ever-infodemilogical-study-1993
The first-ever-infodemilogical-study-1993The first-ever-infodemilogical-study-1993
The first-ever-infodemilogical-study-1993
 
Fake News, Algorithmic Accountability and the Role of Data Journalism in the ...
Fake News, Algorithmic Accountability and the Role of Data Journalism in the ...Fake News, Algorithmic Accountability and the Role of Data Journalism in the ...
Fake News, Algorithmic Accountability and the Role of Data Journalism in the ...
 
Understanding Public Perceptions of Immunisation Using Social Media - Project...
Understanding Public Perceptions of Immunisation Using Social Media - Project...Understanding Public Perceptions of Immunisation Using Social Media - Project...
Understanding Public Perceptions of Immunisation Using Social Media - Project...
 
Information Literacy, Privacy, & Risk: What Are the Implications of Mass Surv...
Information Literacy, Privacy, & Risk: What Are the Implications of Mass Surv...Information Literacy, Privacy, & Risk: What Are the Implications of Mass Surv...
Information Literacy, Privacy, & Risk: What Are the Implications of Mass Surv...
 
What Data Can Do: A Typology of Mechanisms . Angèle Christin
What Data Can Do: A Typology of Mechanisms . Angèle Christin What Data Can Do: A Typology of Mechanisms . Angèle Christin
What Data Can Do: A Typology of Mechanisms . Angèle Christin
 
NCSU invited talk: Leveraging Social Media for Tourism Marketplace Coordination
NCSU invited talk: Leveraging Social Media for Tourism Marketplace CoordinationNCSU invited talk: Leveraging Social Media for Tourism Marketplace Coordination
NCSU invited talk: Leveraging Social Media for Tourism Marketplace Coordination
 
Future of Privacy - The Emerging View 11 06 15
Future of Privacy - The Emerging View 11 06 15 Future of Privacy - The Emerging View 11 06 15
Future of Privacy - The Emerging View 11 06 15
 
Using Twitter Data to Analyse Public Sentiment on Fuel Subsidy Policy Reform ...
Using Twitter Data to Analyse Public Sentiment on Fuel Subsidy Policy Reform ...Using Twitter Data to Analyse Public Sentiment on Fuel Subsidy Policy Reform ...
Using Twitter Data to Analyse Public Sentiment on Fuel Subsidy Policy Reform ...
 
KM Russia 2014 - John Girard
KM Russia 2014 - John GirardKM Russia 2014 - John Girard
KM Russia 2014 - John Girard
 
Big Data: Friend, Phantom or Foe?
Big Data: Friend, Phantom or Foe?Big Data: Friend, Phantom or Foe?
Big Data: Friend, Phantom or Foe?
 
Open Government Data & Privacy Protection
Open Government Data & Privacy ProtectionOpen Government Data & Privacy Protection
Open Government Data & Privacy Protection
 

Destacado

Cyber Training: Developing the Next Generation of Cyber Analysts
Cyber Training: Developing the Next Generation of Cyber AnalystsCyber Training: Developing the Next Generation of Cyber Analysts
Cyber Training: Developing the Next Generation of Cyber AnalystsBooz Allen Hamilton
 
Miles To Go Before They Are Green
Miles To Go Before They Are GreenMiles To Go Before They Are Green
Miles To Go Before They Are GreenBooz Allen Hamilton
 
Using Advanced Analytics for Data-Driven Decision Making
Using Advanced Analytics for Data-Driven Decision MakingUsing Advanced Analytics for Data-Driven Decision Making
Using Advanced Analytics for Data-Driven Decision MakingBooz Allen Hamilton
 
Re-Imagined Infrastructure System: US 2040 Economy
Re-Imagined Infrastructure System: US 2040 EconomyRe-Imagined Infrastructure System: US 2040 Economy
Re-Imagined Infrastructure System: US 2040 EconomyBooz Allen Hamilton
 
Improving Intelligence Analysis Through Cloud Analytics
Improving Intelligence Analysis Through  Cloud AnalyticsImproving Intelligence Analysis Through  Cloud Analytics
Improving Intelligence Analysis Through Cloud AnalyticsBooz Allen Hamilton
 
Mitigating Our Nation’s Risks – Calling Upon the Whole Community
Mitigating Our Nation’s Risks – Calling Upon the Whole CommunityMitigating Our Nation’s Risks – Calling Upon the Whole Community
Mitigating Our Nation’s Risks – Calling Upon the Whole CommunityBooz Allen Hamilton
 
Mission Engineering Solution Infographic
Mission Engineering Solution InfographicMission Engineering Solution Infographic
Mission Engineering Solution InfographicBooz Allen Hamilton
 
Acquiring the Right Talent for the Cyber Age: The Need for a Candidate Develo...
Acquiring the Right Talent for the Cyber Age: The Need for a Candidate Develo...Acquiring the Right Talent for the Cyber Age: The Need for a Candidate Develo...
Acquiring the Right Talent for the Cyber Age: The Need for a Candidate Develo...Booz Allen Hamilton
 
Strategic Information Management Through Data Classification
Strategic Information Management Through Data ClassificationStrategic Information Management Through Data Classification
Strategic Information Management Through Data ClassificationBooz Allen Hamilton
 
The Next Gen Program Analysis Infographic
The Next Gen Program Analysis InfographicThe Next Gen Program Analysis Infographic
The Next Gen Program Analysis InfographicBooz Allen Hamilton
 
Predicting Mission Success through Improved Data Collection, Reuse and Analysis
Predicting Mission Success through Improved Data Collection, Reuse and AnalysisPredicting Mission Success through Improved Data Collection, Reuse and Analysis
Predicting Mission Success through Improved Data Collection, Reuse and AnalysisBooz Allen Hamilton
 
What's Ahead for EHRs: Experts Weigh In
What's Ahead for EHRs: Experts Weigh InWhat's Ahead for EHRs: Experts Weigh In
What's Ahead for EHRs: Experts Weigh InBooz Allen Hamilton
 
Enabling Big Data with Data-Level Security:The Cloud Analytics Reference Arch...
Enabling Big Data with Data-Level Security:The Cloud Analytics Reference Arch...Enabling Big Data with Data-Level Security:The Cloud Analytics Reference Arch...
Enabling Big Data with Data-Level Security:The Cloud Analytics Reference Arch...Booz Allen Hamilton
 
Supply Chain Data Standards in Healthcare
Supply Chain Data Standards in HealthcareSupply Chain Data Standards in Healthcare
Supply Chain Data Standards in HealthcareBooz Allen Hamilton
 

Destacado (20)

Cyber Training: Developing the Next Generation of Cyber Analysts
Cyber Training: Developing the Next Generation of Cyber AnalystsCyber Training: Developing the Next Generation of Cyber Analysts
Cyber Training: Developing the Next Generation of Cyber Analysts
 
Miles To Go Before They Are Green
Miles To Go Before They Are GreenMiles To Go Before They Are Green
Miles To Go Before They Are Green
 
Using Advanced Analytics for Data-Driven Decision Making
Using Advanced Analytics for Data-Driven Decision MakingUsing Advanced Analytics for Data-Driven Decision Making
Using Advanced Analytics for Data-Driven Decision Making
 
Re-Imagined Infrastructure System: US 2040 Economy
Re-Imagined Infrastructure System: US 2040 EconomyRe-Imagined Infrastructure System: US 2040 Economy
Re-Imagined Infrastructure System: US 2040 Economy
 
Polaris Product Fact Sheet
Polaris Product Fact SheetPolaris Product Fact Sheet
Polaris Product Fact Sheet
 
The Biggest Bang Theory
The Biggest Bang TheoryThe Biggest Bang Theory
The Biggest Bang Theory
 
The Business of Change
The Business of ChangeThe Business of Change
The Business of Change
 
Improving Intelligence Analysis Through Cloud Analytics
Improving Intelligence Analysis Through  Cloud AnalyticsImproving Intelligence Analysis Through  Cloud Analytics
Improving Intelligence Analysis Through Cloud Analytics
 
Mitigating Our Nation’s Risks – Calling Upon the Whole Community
Mitigating Our Nation’s Risks – Calling Upon the Whole CommunityMitigating Our Nation’s Risks – Calling Upon the Whole Community
Mitigating Our Nation’s Risks – Calling Upon the Whole Community
 
Mission Engineering Solution Infographic
Mission Engineering Solution InfographicMission Engineering Solution Infographic
Mission Engineering Solution Infographic
 
Dynamic Defense
Dynamic DefenseDynamic Defense
Dynamic Defense
 
The Vigilant Enterprise
The Vigilant EnterpriseThe Vigilant Enterprise
The Vigilant Enterprise
 
Acquiring the Right Talent for the Cyber Age: The Need for a Candidate Develo...
Acquiring the Right Talent for the Cyber Age: The Need for a Candidate Develo...Acquiring the Right Talent for the Cyber Age: The Need for a Candidate Develo...
Acquiring the Right Talent for the Cyber Age: The Need for a Candidate Develo...
 
Strategic Information Management Through Data Classification
Strategic Information Management Through Data ClassificationStrategic Information Management Through Data Classification
Strategic Information Management Through Data Classification
 
The Next Gen Program Analysis Infographic
The Next Gen Program Analysis InfographicThe Next Gen Program Analysis Infographic
The Next Gen Program Analysis Infographic
 
Predicting Mission Success through Improved Data Collection, Reuse and Analysis
Predicting Mission Success through Improved Data Collection, Reuse and AnalysisPredicting Mission Success through Improved Data Collection, Reuse and Analysis
Predicting Mission Success through Improved Data Collection, Reuse and Analysis
 
What's Ahead for EHRs: Experts Weigh In
What's Ahead for EHRs: Experts Weigh InWhat's Ahead for EHRs: Experts Weigh In
What's Ahead for EHRs: Experts Weigh In
 
Enabling Big Data with Data-Level Security:The Cloud Analytics Reference Arch...
Enabling Big Data with Data-Level Security:The Cloud Analytics Reference Arch...Enabling Big Data with Data-Level Security:The Cloud Analytics Reference Arch...
Enabling Big Data with Data-Level Security:The Cloud Analytics Reference Arch...
 
Supply Chain Data Standards in Healthcare
Supply Chain Data Standards in HealthcareSupply Chain Data Standards in Healthcare
Supply Chain Data Standards in Healthcare
 
IP Theft
IP TheftIP Theft
IP Theft
 

Similar a Sais.34.1

The future of real time information
The future of real time informationThe future of real time information
The future of real time informationthaiscarbonell1512
 
Clustering analysis on news from health OSINT data regarding CORONAVIRUS-COVI...
Clustering analysis on news from health OSINT data regarding CORONAVIRUS-COVI...Clustering analysis on news from health OSINT data regarding CORONAVIRUS-COVI...
Clustering analysis on news from health OSINT data regarding CORONAVIRUS-COVI...ALexandruDaia1
 
Experimenting with Big Data and AI to Support Peace and Security
Experimenting with Big Data and AI to Support Peace and SecurityExperimenting with Big Data and AI to Support Peace and Security
Experimenting with Big Data and AI to Support Peace and SecurityUN Global Pulse
 
Guidance for Incorporating Big Data into Humanitarian Operations - 2015 - web...
Guidance for Incorporating Big Data into Humanitarian Operations - 2015 - web...Guidance for Incorporating Big Data into Humanitarian Operations - 2015 - web...
Guidance for Incorporating Big Data into Humanitarian Operations - 2015 - web...Katie Whipkey
 
Transforming Social Big Data into Timely Decisions and Actions for Crisis Mi...
Transforming Social Big Data into Timely Decisions  and Actions for Crisis Mi...Transforming Social Big Data into Timely Decisions  and Actions for Crisis Mi...
Transforming Social Big Data into Timely Decisions and Actions for Crisis Mi...Amit Sheth
 
"Big Data for Development: Opportunities and Challenges"
"Big Data for Development: Opportunities and Challenges" "Big Data for Development: Opportunities and Challenges"
"Big Data for Development: Opportunities and Challenges" UN Global Pulse
 
Future of the Internet Predictions March 2014 PIP Report
Future of the Internet Predictions March 2014 PIP ReportFuture of the Internet Predictions March 2014 PIP Report
Future of the Internet Predictions March 2014 PIP ReportVasily Ryzhonkov
 
Human Rights Council Study Guide
Human Rights Council Study GuideHuman Rights Council Study Guide
Human Rights Council Study Guidedudasings
 
Baban Hasnat is a professor of international business and ec.docx
Baban Hasnat is a professor of international business and ec.docxBaban Hasnat is a professor of international business and ec.docx
Baban Hasnat is a professor of international business and ec.docxwilcockiris
 
Targeted disinformation warfare how and why foreign efforts are
Targeted disinformation warfare  how and why foreign efforts areTargeted disinformation warfare  how and why foreign efforts are
Targeted disinformation warfare how and why foreign efforts arearchiejones4
 
Gender Equality and Big Data. Making Gender Data Visible
Gender Equality and Big Data. Making Gender Data Visible Gender Equality and Big Data. Making Gender Data Visible
Gender Equality and Big Data. Making Gender Data Visible UN Global Pulse
 
US Intelligence Council : Global trends 2040
US Intelligence Council : Global trends 2040US Intelligence Council : Global trends 2040
US Intelligence Council : Global trends 2040Energy for One World
 
Global Trends 2040
Global Trends 2040Global Trends 2040
Global Trends 2040ICJ-ICC
 
Big Data Analysis and Terrorism
Big Data Analysis and TerrorismBig Data Analysis and Terrorism
Big Data Analysis and TerrorismAmanda Tapp
 
CybersecurityTFReport2016 PRINT
CybersecurityTFReport2016 PRINTCybersecurityTFReport2016 PRINT
CybersecurityTFReport2016 PRINTAimee Shuck
 
Wedf brochure (september)
Wedf brochure (september)Wedf brochure (september)
Wedf brochure (september)Morne Olivier
 
MASINT and Global War on Terror
MASINT and Global War on TerrorMASINT and Global War on Terror
MASINT and Global War on TerrorTpeisi Nesby
 

Similar a Sais.34.1 (20)

The future of real time information
The future of real time informationThe future of real time information
The future of real time information
 
Clustering analysis on news from health OSINT data regarding CORONAVIRUS-COVI...
Clustering analysis on news from health OSINT data regarding CORONAVIRUS-COVI...Clustering analysis on news from health OSINT data regarding CORONAVIRUS-COVI...
Clustering analysis on news from health OSINT data regarding CORONAVIRUS-COVI...
 
Experimenting with Big Data and AI to Support Peace and Security
Experimenting with Big Data and AI to Support Peace and SecurityExperimenting with Big Data and AI to Support Peace and Security
Experimenting with Big Data and AI to Support Peace and Security
 
Guidance for Incorporating Big Data into Humanitarian Operations - 2015 - web...
Guidance for Incorporating Big Data into Humanitarian Operations - 2015 - web...Guidance for Incorporating Big Data into Humanitarian Operations - 2015 - web...
Guidance for Incorporating Big Data into Humanitarian Operations - 2015 - web...
 
Transforming Social Big Data into Timely Decisions and Actions for Crisis Mi...
Transforming Social Big Data into Timely Decisions  and Actions for Crisis Mi...Transforming Social Big Data into Timely Decisions  and Actions for Crisis Mi...
Transforming Social Big Data into Timely Decisions and Actions for Crisis Mi...
 
"Big Data for Development: Opportunities and Challenges"
"Big Data for Development: Opportunities and Challenges" "Big Data for Development: Opportunities and Challenges"
"Big Data for Development: Opportunities and Challenges"
 
Future of the Internet Predictions March 2014 PIP Report
Future of the Internet Predictions March 2014 PIP ReportFuture of the Internet Predictions March 2014 PIP Report
Future of the Internet Predictions March 2014 PIP Report
 
DIGITAL LIFE IN 2025
DIGITAL LIFE IN 2025DIGITAL LIFE IN 2025
DIGITAL LIFE IN 2025
 
Digital Life in 2025
Digital Life in 2025Digital Life in 2025
Digital Life in 2025
 
Human Rights Council Study Guide
Human Rights Council Study GuideHuman Rights Council Study Guide
Human Rights Council Study Guide
 
Baban Hasnat is a professor of international business and ec.docx
Baban Hasnat is a professor of international business and ec.docxBaban Hasnat is a professor of international business and ec.docx
Baban Hasnat is a professor of international business and ec.docx
 
Targeted disinformation warfare how and why foreign efforts are
Targeted disinformation warfare  how and why foreign efforts areTargeted disinformation warfare  how and why foreign efforts are
Targeted disinformation warfare how and why foreign efforts are
 
Gender Equality and Big Data. Making Gender Data Visible
Gender Equality and Big Data. Making Gender Data Visible Gender Equality and Big Data. Making Gender Data Visible
Gender Equality and Big Data. Making Gender Data Visible
 
US Intelligence Council : Global trends 2040
US Intelligence Council : Global trends 2040US Intelligence Council : Global trends 2040
US Intelligence Council : Global trends 2040
 
Global Trends 2040
Global Trends 2040Global Trends 2040
Global Trends 2040
 
Big Data Analysis and Terrorism
Big Data Analysis and TerrorismBig Data Analysis and Terrorism
Big Data Analysis and Terrorism
 
CybersecurityTFReport2016 PRINT
CybersecurityTFReport2016 PRINTCybersecurityTFReport2016 PRINT
CybersecurityTFReport2016 PRINT
 
Wedf brochure (september)
Wedf brochure (september)Wedf brochure (september)
Wedf brochure (september)
 
MASINT and Global War on Terror
MASINT and Global War on TerrorMASINT and Global War on Terror
MASINT and Global War on Terror
 
Big Data Paper
Big Data PaperBig Data Paper
Big Data Paper
 

Sais.34.1

  • 1. This issue is provided by the Johns Hopkins University Press Journals Division and powered by Project MUSE®
  • 2. Terms and Conditions of Use Thank you for purchasing this Electronic J-Issue from the Journals Division of the Johns Hopkins University Press. We ask that you respect the rights of the copyright holder by adhering to the following usage guidelines: This issue is for your personal, noncommercial use only. Individual articles from this J-Issue may be printed and stored on your personal computer. You may not redistribute, resell, or license any part of the issue. You may not post any part of the issue on any web site without the written permission of the copyright holder. You may not alter or transform the content in any manner that would violate the rights of the copyright holder. Sharing of personal account information, logins, and passwords is not permitted.
  • 3. Foreword. SAIS Review vol. XXXIV no. 1 (Winter–Spring 2014). © 2014 by The Johns Hopkins University Press. Following the exposure of the U.S. National Security Agency’s (NSA) controversial surveillance program, there has been heated debate surrounding the collection and storage of personal data. Our latest issue of The SAIS Review of International Affairs, “Policy by Numbers: How Big Data is Transforming Security, Governance, and Development,” seeks to move beyond the sensationalism that has accompanied the NSA revelations. We hope to provide readers a more nuanced perspective on the role of data in international affairs, with a diverse collection of interviews, essays, and opinion editorials from scholars, technologists, and policymakers. We explore the rise of big data, in which governments and profit-seeking organizations make policies and predictions based upon correlations among massive quantities of data. We examine the trend toward open data, in which governments provide valuable datasets directly to the public. We assess the impact of data, positive and negative, international and domestic, on public policy, national security, international development, and individual well-being. While the rise of big and open data is associated with promising applications, there are still vast uncertainties regarding how best to exploit this technology. We hope that readers from the academic and public policy communities will feel empowered to enhance their understanding of technical tools and data analysis, in an age where technological innovation often outpaces government policy. We begin with a conversation with Robert Kirkpatrick, Director of the United Nations Global Pulse Initiative. The UN Global Pulse Initiative collects and analyzes real-time data to better protect populations from socioeconomic shocks.
Kirkpatrick explores the challenges associated with big data analytics, the surprising correlations among seemingly unrelated datasets, and the initiative’s effort to predict food price crises with data from social media. Human rights data often impacts policy decisions. The next three articles explore the opportunities and risks associated with collecting and analyzing this sensitive information. Megan Price and Patrick Ball use case studies of violent conflicts in Syria and Iraq to evaluate data-gathering methodologies in conflict scenarios. They warn that datasets from conflict scenarios are often subject to bias, and should not be used in isolation to draw conclusions. Monti Narayan Datta argues that the collection of quantitative data on modern day slavery has generated discussion in media and among policymakers on how to mitigate and eradicate slavery. Our interview with Arch Puddington, Vice President for Research at Freedom House, discusses worldwide trends in freedom, and the impact of Freedom House’s annual reports and indices.
  • 4. The rise of big and open data has a powerful impact on government policymaking. Pongkwan Sawasdipakdi frames data as an information weapon in the context of Thai domestic politics. She examines the government’s rice-pledging scheme, and argues that contrasting datasets from the government and opposition parties are used to gain political power and credibility. Ian Kalin describes the theory and practice of open data policy in the United States, and explains how government leaders can replicate successful open data initiatives. Joel Gurin argues that open government and open data can improve economic growth, transparency, and citizen engagement. He also notes the obstacles for implementing open data initiatives in developing countries. How can policymakers craft policies and frameworks that best take advantage of big data? Given the fast pace of innovation and the slow pace of policy, Kord Davis discusses how to bridge the gap between policymakers and innovators. He identifies spaces where the public and private sector can collaborate to produce effective and balanced policy. Aniket Bhushan argues that the rise of big data and open data has created an opportunity for disruptive innovation in international affairs. He offers examples related to real-time macroeconomic analysis, humanitarian response, and poverty measurement. Data impacts national security and individual privacy, as well. Chris Poulin outlines the processes of data collection and analysis, using case studies from the Arab Spring, medical risk analysis, and his work at the Durkheim Project, a data analysis initiative that seeks to predict and prevent veteran suicides. David Rubin, Kim Lynch, Jason Escaravage, and Hillary Lerner explain how to balance the opposing forces of opportunity and risk, collective security and individual privacy, and innovation and protection when using data for national security programs.
Finally, we look to China for lessons on data infrastructure. Eric Hagt traces the history of China’s satellite navigation system, Beidou, and compares its potential as a tool for development versus domestic and national security. Margaret Ross statistically analyzes the risk of potential disruptions to the global undersea cable communications network. We conclude with analyses of influential literature and scholarly research. Ilaria Mazzocco reviews Big Data: A Revolution That Will Transform How We Live, Work, and Think by Viktor Mayer-Schönberger and Kenneth Cukier. Bartholomew Thanhauser reviews Evgeny Morozov’s To Save Everything, Click Here: The Folly of Technological Solutionism. We would like to thank our advisory board for their guidance in shaping our exploration of data, our excellent editorial staff for their dedication and persistence, and our authors for their thoughtful work on complex global challenges. Their combined contributions made the publication of “Policy by Numbers” possible. Meghan Kleinsteiber, Editor-in-Chief; Lauren Caldwell, Senior Editor
  • 5. A Conversation with Robert Kirkpatrick, Director of United Nations Global Pulse You are director of United Nations Global Pulse, an initiative to leverage real-time data and analytics to monitor impacts of international and local shocks. How did the idea of Global Pulse come about? What is your mission statement? The initial idea of Global Pulse came about in the aftermath of the global financial crisis. There was a recognition that we live in a hyper-connected world where information moves at the speed of light, and crises and vulnerabilities can emerge quickly, but we’re still using two- to three-year-old statistics to make most policy decisions. It was clear that there were swathes of people being pushed below the poverty line almost overnight, and we needed to modernize our systems and capacities for absorbing real-time information for decision-making. As a result, United Nations Secretary-General Ban Ki-moon established Global Pulse in 2009 to act as an innovation lab and catalyst for the United Nations. We bring together global development experts, as well as experts from academia and the private sector, to explore how analysis of big data can reveal faster insights about human well-being and emerging vulnerabilities, in order to better protect populations from hunger, poverty, and disease. So Global Pulse’s mission is to accelerate the use of data science for sustainable development and humanitarian action, to address systemic barriers to adoption, and to cultivate a robust innovation ecosystem. Robert Kirkpatrick is the director of UN Global Pulse, an initiative of the Executive Office of the United Nations Secretary-General. The Global Pulse initiative explores how Big Data and real-time analytics technologies can power a more agile approach to sustainable development. We . . .
explore how digital data sources and real-time analytics technologies can help reveal insights about human well-being and emerging vulnerabilities, in order to better protect populations from shocks.
  • 6. As the New York Times wrote in its August 2013 profile of Global Pulse, the United Nations is often perceived as a “sprawling bureaucracy.” What makes the Global Pulse team unique? What qualities, personal and professional, do you seek in a team member? Global Pulse is unique because we have an “intrapreneurial” approach. It requires risk-taking and innovation to discover and generate new tools, techniques, and methodologies to help the UN system and wider community leverage new sources of real-time information and insights in the service of humanitarian response and development work. This also requires a real blend of expertise from within and outside of the UN. Due to the experimental nature of our work, we are set up as a network of labs. We have multidisciplinary teams working at our Pulse Labs in New York, Jakarta, and Kampala that include data scientists and analysts, social scientists, legal experts, and communications and partnerships specialists. Pulse Lab teams design, scope, and co-create projects with UN agencies and national institutions that provide sectoral expertise, and with private sector or academic partners who provide access to data or analytical and engineering tools. When building a team, I look for “T-shaped people,” that is, people with a broad range of skills and a flexible attitude, as well as deep knowledge of one discipline, whether it is data science, design, partnership management, or legal and privacy matters. From what range of sources do you derive the data used for your analyses? Which datasets do you consider to be the most unique or surprising? What are the challenges associated with data collection and analysis? Global Pulse is interested in trends that reveal something about human well-being, which can be revealed from data produced by people as they go about their daily lives (sometimes known as “data exhaust”).
Broadly speaking, we have been exploring two types of data in the Pulse Labs. The first is data that reflects “what people say,” which includes publicly available content from the open web, such as tweets, blog posts, news stories, and so forth. The second is data that reflects “what people do,” which can include information routinely generated for business intelligence and to optimize sales in a private sector company. An example of “what people do” data is anonymized mobile phone traffic information, which can reveal everything from footfall in a shopping district during rush hour to how a population migrates after a natural catastrophe. A dataset that may be surprising is postal data (the traffic and volume of packages being shipped), which can be used as a proxy for GDP and economic activity in a country or region. We are beginning a series of research projects with the Universal Postal Union (UPU), the United Nations specialized agency for the postal sector, to explore this relationship further. There are several challenges associated with moving this kind of analysis out of an innovation lab and into practice, including the need to
  • 7. build skills and capacity around data science, the formation of sustainable partnerships with potential data providers in the private sector, and identifying where new data and insights can fit into the planning and decision-making processes. And most importantly, we must take data protection and privacy norms, policies, and techniques to a new level to mitigate the potential for misuse. Our mission to find responsible ways of using big data for global development purposes does not include analyzing private or confidential information. We follow, and advocate for, robust privacy protection principles. Does Global Pulse focus on certain sectors? If so, why? Although we can and do work with any part of the UN system that has a development problem that data science might contribute to solving, there are certain areas that are particularly well-suited to big data analysis. This year, we will focus in particular on public health, including attitudes to health as expressed on social media, news media, and patterns in anonymized search data. For example, in partnership with the World Health Organization, we are exploring whether early warning of non-communicable disease risk factors in a country or community could be understood via analysis of key words in social media data. We continue to look at parental attitudes to immunizing children as expressed on social media, in order to address misinformation that stops parents from protecting their children against preventable diseases. Another research priority is food security. In Indonesia, our Pulse Lab Jakarta research team is exploring whether big data can provide insights about the impacts of food price changes, in order to support the social protection policies of the government of Indonesia.
Other areas of focus this year include supporting humanitarian action through new data analytics techniques, finding new ways to measure economic well-being, and using digital data mining to help shape the priority development agenda that will replace the Millennium Development Goals after they expire in 2015. Across all sectors, though, Global Pulse conducts a range of activities to strengthen the big data for development (BD4D) ecosystem by guiding the development of regulatory frameworks and technical standards to address data-sharing and privacy protection challenges. We support an emerging community of practice to accelerate public sector adoption through advocacy, policy guidance, and technical assistance. How do you identify and maintain relationships with your private sector partners? Private sector partners are incredibly important in helping us leverage big data as a resource for sustainable development. Using big data responsibly
  • 8. and effectively requires several different elements, so there are different areas of expertise, knowledge, and resources we look for when building partnerships. The Global Pulse network of partners and collaborators includes forward-thinking private sector companies that are willing to engage in “data philanthropy,” by granting access to data and technology tools to the public sector. Our network also includes industry leaders, universities, research institutes, and non-profit networks of researchers and innovators who are ready to bring their skills and expertise to bear for advancing the use of data science across the global development and humanitarian fields. To establish and maintain these relationships, we have a partnership manager and a privacy and legal expert, both of whom help guide potential partners through the process. They work with counterparts in the companies to ensure that safeguards, legal agreements, and data protection principles are in place. Once collaboration is underway, our research team will work closely with data analysts in the partner organization to initiate a project or exploration. Often, the data never leaves the business that owns it; rather, our data scientists guide the process and then the trends or results are shared. This modality works well when the data is sensitive. The experience with our partners, overall, is one of mutual learning. The Pulse Lab network offers a safe “sandbox” for de-risking this type of experimentation as we all learn together how the public and private sector can responsibly harness big data for development. Could you share a few Global Pulse success stories? Similarly, which development challenges (such as particular regions or issues) are particularly difficult to tackle? There are success stories on the data philanthropy front in which telecommunications companies have made anonymized datasets available as part of a competition or challenge.
For example, last year we collaborated with Orange Telecom to host a “Data for Development Challenge” in which the company opened up a dataset of anonymized mobile phone data to more than eighty research teams from around the world to analyze. This research garnered insights that the international development community can be inspired by or learn from. In terms of projects we are carrying out, as I mentioned previously, our Pulse Lab in Indonesia is conducting research on mining tweets to understand food price crises. Their research has provided new insights into the very real problem of sudden increases in the price of staple foodstuffs, such as rice, pushing families below the poverty line and causing regional economic instability. Real-time information about these impacts could help policymakers and governments provide support to families who are suffering as a result of food price hikes. Going forward, we plan to conduct further research on social media analysis for food security and for crowdsourcing food prices, since these are areas of focus for the government of Indonesia, and will be applicable in many other parts of the world. Certainly, this is not a solution that can be
  • 9. applied universally. Social media analysis is of limited use in countries where internet penetration is low, and even in regions of a country where the digital divide is vast. There are also analytical challenges yet to be resolved, including the necessity of building technologies that can support diverse local languages. We are in the early days of discovering how big data can be applied to development and humanitarian contexts, and there are diverse challenges ranging from data access and the capacity to use real-time data in decision-making to data privacy. These challenges will be addressed over time. As Viktor Mayer-Schönberger and Kenneth Cukier wrote, big data is revolutionizing the way we solve problems. You have noted several ways that data collection and analysis is helping Global Pulse address global development challenges. But does the shift toward big data analytics have drawbacks for the field of international development, as well? There are risks rather than drawbacks. I hear a lot about the supposed fear that all development decisions would be made by algorithms. This is not a realistic fear, but rather a false dichotomy between quantitative use of data in decision-making and policymakers using qualitative experiences to decide a course of action. Big data will always be one part of a solution, not the only solution. Of course, we need research, official statistics, and the deep knowledge of field workers, communities, and practitioners, but big data and data science represent a useful addition to the development and humanitarian worker’s toolbox. Another common misperception is that real-time data would replace official statistics, but this assumption is unrealistic, as well. Official statistics will continue to provide high-quality snapshots of progress that can be benchmarked.
But increasingly, between those annual, bi-annual, or monthly updates, real-time data sources will provide valuable interim feedback and indicators. This feedback can enable course-correction when it is evident that a program isn’t working. And real-time data can reveal shifts in food-pricing, population changes, or disease outbreaks within a day or an hour, rather than a month or a year.
  • 10. Many graduate schools of international affairs, including the Johns Hopkins School of Advanced International Studies (SAIS), offer courses or concentrations in international development. Do you find that these programs offer adequately rigorous quantitative or data analysis requirements? What skills should students develop if they intend to enter the field of international development? There is a need for greater skills capacity for data analysis; this is something that is needed across the board and not only in our field. The international development practitioner of the future will be someone who is data literate, and capable of using data analysis to inform his or her understanding and decision-making. So yes, we’d like to see graduate schools covering data for development and the processes involved. Just as the schools of journalism are now teaching data journalism, all students must understand how to identify credible informative sources, how to perform, or at least understand, quantitative statistical and data analysis, and how to appropriately use data to inform their judgment. The good news is that I see a lot of appetite for these skills from current students, so this change is beginning to happen.
  • 11. Big Data, Selection Bias, and the Statistical Patterns of Mortality in Conflict Megan Price and Patrick Ball The notion of “big data” implies very specific technical assumptions. The tools that have made big data immensely powerful in the private sector depend on having all (or nearly all) of the possible data. In our experience, these technical assumptions are rarely met with data about the policy and social world. This paper explores how information is generated about killings in conflict, and how the process of information generation shapes the statistical patterns in the observed data. Using case studies from Syria and Iraq, we highlight the ways in which bias in the observed data could mislead policy. The paper closes with recommendations about the use of data and analysis in the development of policy. Introduction Emerging technology has greatly increased the amount and availability of data in a wide variety of fields. In particular, the notion of “big data” has gained popularity in a number of business and industry applications, enabling companies to track products, measure marketing results, and in some cases, successfully predict customer behavior.1 These successes have, understandably, led to excitement about the potential to apply these methods in an increasing number of disciplines. Megan Price is the director of research at the Human Rights Data Analysis Group. She has conducted data analyses for projects in a number of locales including Syria and Guatemala. She recently served as the lead statistician and head author of two reports commissioned by the Office of the United Nations High Commissioner of Human Rights. Patrick Ball is the executive director of the Human Rights Data Analysis Group.
Beginning in El Salvador in 1991, Patrick has designed technology and conducted quantitative analyses for truth commissions, non-governmental organizations, domestic and international criminal tribunals, and United Nations missions. Most recently, he provided expert testimony in the trial of former de facto President of Guatemala, Gen. José Efraín Ríos Montt. The materials contained herein represent the opinions of the authors and editors and should not be construed to be the view of HRDAG, any of HRDAG’s constituent projects, the HRDAG Board of Advisers, the donors to HRDAG, or this project.
  • 12. Although we share this excitement about the potential power of data analysis, our decades of experience analyzing data about conflict-related violence motivates us to proceed with caution. The data available to human rights researchers is fundamentally different from the data available to business and industry. The difference is whether the data are complete. In most business processes, an organization has access to all the data: every item sold in the past twelve months, every customer who clicked through their website, etc. In the exceptional cases where complete data are unavailable, industry analysts are often able to generate a representative sample of the data of interest.2 In human rights, and more specifically in studies of conflict violence, we rarely have access to complete data. What we have instead are snapshots of violence. Typical sources include a few videos of public killings posted to YouTube, a particular set of events retrospectively recorded by a truth commission, stories covered in the local or international press, protesters’ SMS messages aggregated onto a map, and victims’ testimonies recorded by non-governmental human rights organizations (NGOs). Statistically speaking, these snapshots are “convenience samples,” and they cover an unknown proportion of the total number of cases of violence.3 It is mathematically difficult, often impossible, to know how much is undocumented and, consequently, missing from the sample. Incompleteness is not a criticism of data: collecting complete or representative data under conflict conditions is generally impossible. The challenge is that researchers and advocates naturally want to address questions that require either the total number or a representative subset of cases of violence. How many people have been killed? What proportion was from a vulnerable population? Were more victims killed last week or this week?
Which perpetrator(s) are committing the majority of the violence? Basing answers and policy decisions on analyses of partial datasets with unknown, indeed unknowable, biases can prove to be misleading. These concerns should not deter researchers from asking questions of data; rather, they should caution them against basing conclusions on inadequate analyses of raw data. We conclude by suggesting methods from several quantitative disciplines to estimate the bias in direct observations.

The Problem of Bias

When people record data about events in the world, the records are almost always partial; reasons why the observation of violence often misses some or most of the violence are presented in the examples to follow. Most samples are partial, and in samples not collected randomly, the patterns of omission may have structure that influences the patterns observed in the data. For example, killings in urban areas may be nearly always reported, while killings
in rural areas are rarely documented. Thus, the probability of an event being reported depends on where the event happened. Consequently, analysis done directly from these data will suggest that violence is primarily urban. This conclusion is incorrect because the data simply do not include many (or at least include proportionally fewer) cases from the rural areas. In this case, the analysis is finding a pattern in the documentation that may appear to be a pattern in true violence, but if analysts are unaware of the documentation group’s relatively weaker coverage of the rural areas, they can be misled by the quantitative result. In our experience, even when analysts are aware of variable coverage in different areas, it is enormously difficult to draw a meaningful conclusion from a statistical pattern that is affected by bias. Statisticians call this problem “selection bias” because some events (in this example, urban ones) are more likely to be “selected” for the sample than other events (in this example, rural ones). Selection bias can affect human rights data collection in many ways.4 We use the word “bias” in the statistical sense, meaning a statistical difference between what is observed and what is “truth” or reality. “Bias” in this sense is not used to connote judgment. Rather, the point is to focus attention on empirical, calculable differences between what is observed and what actually happened. In this article, we focus on a particular kind of selection bias called “event size bias.” Event size bias is the variation in the probability that a given event is reported, related to the size of the event: big events are likely to be known, small events are less likely to be known. In studies of conflict violence, this kind of bias arises when events that involve only one victim are less likely to be documented than events that involve larger groups of victims.
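The definition of event size bias above can be made concrete with a small arithmetic sketch. Every number below is invented for illustration (the counts of events and the reporting percentages are assumptions, not figures from any dataset discussed here), and the function names are hypothetical. The sketch only shows how unequal reporting probabilities can change the apparent share of victims killed in large events.

```python
# Illustrative numbers only; none of these figures come from the article.
TRUE_EVENTS = {1: 600, 10: 20}   # victims per event -> true number of events
REPORT_PCT = {1: 20, 10: 90}     # assumed % chance an event of that size is documented


def expected_documented(true_events, report_pct):
    """Expected number of documented events of each size."""
    return {size: n * report_pct[size] / 100 for size, n in true_events.items()}


def share_of_victims_in_large_events(events, threshold=5):
    """Fraction of all victims who died in events with at least `threshold` victims."""
    total = sum(size * n for size, n in events.items())
    large = sum(size * n for size, n in events.items() if size >= threshold)
    return large / total


documented = expected_documented(TRUE_EVENTS, REPORT_PCT)
true_share = share_of_victims_in_large_events(TRUE_EVENTS)
observed_share = share_of_victims_in_large_events(documented)
print(f"true share: {true_share:.2f}, share in documented data: {observed_share:.2f}")
```

Under these assumed numbers, a quarter of the victims died in large events, yet the documented records would suggest that most did. The direction and size of the distortion depend entirely on the reporting probabilities, which in practice are unknown.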
For example, a market bombing may involve the deaths of many people. The very public nature of the attack means that the event is likely to attract extensive attention from multiple media organizations. By contrast, an assassination of a single person, at night, by perpetrators who hide the victim’s body, may go unreported. The victim’s family may be too afraid to report the event, and the body may not be discovered until much later, if at all. These differences in the likelihood of observing information about an event can skew the available data and result in misleading interpretations about patterns of violence.5

Case Studies

We present here two examples from relatively well-documented conflicts. Some analysts have argued that information about conflict-related killings in Iraq and Syria is complete, or at least sufficient for detailed statistical analysis. In contrast, our analysis finds that in both cases, the available data are likely to be systematically biased in ways that are likely to confound interpretation.

Syria

Many civilian groups are currently carrying out documentation efforts in the midst of the ongoing conflict in Syria. In early 2012, the United Nations Office for the High Commissioner for Human Rights (OHCHR) commissioned
the Human Rights Data Analysis Group (HRDAG) to examine datasets from several of these groups, and in two reports, Price et al. provide in-depth descriptions of these sources.6 In this section, we focus our attention on four sources (in essence, lists of people killed) which cover the entire length of the ongoing conflict and which have continued to provide us with updated records of victims. These sources are the Syrian Center for Statistics and Research7 (CSR-SY), the Syrian Network for Human Rights8 (SNHR), the Syria Shuhada website9 (SS) and the Violations Documentation Centre10 (VDC). Figure 1 shows the number of victims documented by each of the four sources over time within the Syrian governorate of Tartus. The large peak visible in all four lines in May 2013 corresponds to an alleged massacre in Banias.11 It appears that all four sources documented some portion of this event. Many victims were recorded in the alleged massacre, this event was very well reported, and all four of our sources reflect this event in their lists. However, three out of the four sources document very little violence occurring before or after May 2013 in Tartus. The fourth source, VDC, shows the peak of violence in May as the culmination of a year of consistent month-to-month increases in the number of reported killings. When interpreting figures such as Figure 1, we should not aim to identify a single “correct” source. All of these sources are documenting different snapshots of the violence, and all of them are contributing substantial numbers of unique records of victims undocumented by the other sources.12 The presence of event size bias is detectable in this particular example because all four of the sources obviously captured a similar event (or set of events) in May 2013, while at the same time one of those sources captured a very different subset of events during the preceding months.
If we did not have access to the VDC data, our analysis of conflict violence in Tartus would incorrectly conclude that the alleged massacre in May 2013 was an isolated event surrounded by relatively low levels of violence. The conclusion from Figure 1 should not be that VDC is doing a “better” job of documenting victims. VDC is clearly capturing some events that are not captured by the other sources, but there is no way to tell how many events are not being captured by VDC. From this figure alone we cannot conclude what other biases may be present in the observed data. For example, the relatively small peak in February 2012 could be as small as it seems, or it could be as large as the later peak in May 2013. Without a method of statistical estimation that uses a probability model to account for the undocumented events, it is impossible to know.13 To underline this crucial point: despite the availability of a large amount of data describing violence in Tartus, there is no mathematically sound method to draw conclusions about the patterns of violence directly from the data (though it is possible to use the data and statistical models to estimate how many events are missing). The differences in the four sources available to us make it possible to detect the event size bias occurring in May 2013, but what other biases might also be present in this observed data and hidden from view? What new events might a fifth, sixth, or seventh source document? Are there enough undocumented events such that if they were
included, our interpretation of the patterns would change? These are the crucial questions that must be examined when interpreting perceived patterns in observed data.

Figure 1. Number of Victims Documented by Four Sources, Over Time, in Tartus

Iraq

We detect a subtler form of event size bias in data from the Iraq Body Count (IBC), which indexes media and other sources that report on violent deaths in Iraq since the Allied invasion in March 2003.14 Our analysis is motivated by a recent study by Carpenter et al., which found evidence of substantial event size bias.15 Their approach was to compare the U.S. military’s “significant acts” (SIGACTS) database to the IBC records. As they report, this comparison showed that “[e]vents that killed more people were far more likely to appear in both datasets, with 94.1% of events in which ≥20 people were killed being likely matches, as compared with 17.4% of … killings [that occurred one at a time].”16 This implies that IBC, SIGACTS, or both, capture a higher fraction of large events than small events. Carpenter et al. go on
to note that “[t]he possibility that large events, or certain kinds of events (e.g., car bombs) are overrepresented might allow attribution that one side in a conflict was more recklessly killing civilians, when in fact, that is just an artifact of the data collection process.”17 Motivated by this analysis, we considered other ways to examine IBC records for evidence of potential event size bias. Since IBC aggregates records from multiple sources, updated IBC data already incorporates many records from SIGACTS.18 In contrast to the work of Carpenter et al., who treated IBC and SIGACTS as two separate data sources and conducted their own independent record linkage between the two sources, we examined only records in the IBC database, including those labeled as from SIGACTS. It should be noted that we conducted this analysis on a subset of the data after filtering out very large events with more than fifty victims. We made this choice because, on inspection, many of the records with larger numbers of reported victims are data released in batches by institutions such as morgues, or incidents aggregated over a period of time, rather than specific, individual events. We began by identifying the top one hundred data sources; one or more of the top one hundred sources cover 99.4 percent of the incidents in IBC.19 Given these sources, we counted the number of sources (up to one hundred) for each event. Event size was defined as the mean (rounded to the nearest integer) of the reported maximum and minimum event size values. Then the data were divided into three categories: events with one victim, events with two to five victims, and events with six to fifty victims. The analysis was performed on these groups. Figure 2 summarizes our findings. The shading of each bar in Figure 2 indicates the proportion of events of that size reported by one, two, or three or more sources.
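The tabulation steps described above can be sketched in a few lines of standard-library Python. This is not the authors' code (note 18 says they used pandas on the ibc-incidents file); the record layout with keys `min`, `max`, and `sources`, the function names, and the toy records are all hypothetical, and rounding halves upward is an assumption because the text does not state a tie-breaking rule.

```python
import math
from collections import Counter, defaultdict


def event_size(reported_min, reported_max):
    """Mean of the reported minimum and maximum, rounded to the nearest integer.
    Ties are rounded up here; that rule is an assumption."""
    return math.floor((reported_min + reported_max) / 2 + 0.5)


def size_category(size):
    """Map an event size to the three categories used in the text."""
    if size == 1:
        return "1"
    if 2 <= size <= 5:
        return "2-5"
    if 6 <= size <= 50:
        return "6-50"
    return None  # outside the analyzed range (for example, more than fifty victims)


def source_bucket(n_sources):
    """Collapse the source count into one, two, or three or more."""
    return "3+" if n_sources >= 3 else str(n_sources)


def source_proportions(incidents, top_sources):
    """For each size category, the proportion of events in each source bucket."""
    counts = defaultdict(Counter)
    for rec in incidents:
        category = size_category(event_size(rec["min"], rec["max"]))
        n_sources = len(set(rec["sources"]) & top_sources)
        if category is None or n_sources == 0:
            continue
        counts[category][source_bucket(n_sources)] += 1
    return {
        category: {bucket: n / sum(c.values()) for bucket, n in c.items()}
        for category, c in counts.items()
    }


# Toy records, invented for illustration only.
toy_incidents = [
    {"min": 1, "max": 1, "sources": ["AP"]},
    {"min": 1, "max": 1, "sources": ["AP", "REU"]},
    {"min": 3, "max": 4, "sources": ["AP", "REU", "AFP"]},
    {"min": 60, "max": 70, "sources": ["AP"]},  # filtered out: more than fifty victims
]
toy_result = source_proportions(toy_incidents, top_sources={"AP", "REU", "AFP"})
```

Each inner dictionary corresponds to one bar of a chart like Figure 2, with the bucket proportions playing the role of the shaded segments.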
For each category of event sizes, most events have two sources. For events of size one, the second most frequent number of sources is one, accounting for nearly a third of all events of this size; almost no single-victim events have three or more sources. The number of events with three or more sources increases quickly in medium-sized events and in large events. Relatively few of the largest events are reported by a single source. Thus there seems to be a relationship between event size and the number of sources: larger events are captured by more sources. This reinforces the finding by Carpenter et al. that larger events are more likely to be captured by both IBC and SIGACTS. We have generalized this finding to the top one hundred sources; larger events are more likely to be captured by multiple sources. The number of sources covering an event is an indicator of how “interesting” an event is to a community of documentation groups (in this case, media organizations). The pattern shown in Figure 2 implies that media sources are more interested in larger events than smaller events. Greater interest in the larger events implies that larger events are more likely to be reported (observed) by multiple sources relative to smaller events. Since a larger proportion of small events are covered by only a single source, it is likely that more small events are missed, and therefore excluded from IBC.20
As noted by Carpenter et al., the correlation between event attributes and the likely reporting of those events can result in highly misleading interpretation of apparent patterns in the data. As a relatively neutral example, analysts might erroneously conclude that most victims in Iraq were killed in large events, whereas this may actually be an artifact of the data collection. A potentially more damaging, incorrect conclusion might be reached if large events are centered in certain geographic regions or attributed to certain perpetrators; in these cases, reading the raw data directly would mistake the event size bias for a true pattern, thereby misleading the analyst. Inappropriate interpretations could result in incorrect decisions regarding security measures, intervention strategies, and ultimately, accountability.

Figure 2. Proportion of Events Covered by One, Two, or Three or More Sources
Discussion

Event size bias is one of many kinds of selection and reporting biases that are common to human rights data collection. It is important to recall that we refer here to biases in the statistical sense: a measurable difference between the observed sample and the underlying population of interest. The biases that worry us here affect statistics and quantitative analyses; we are not implying that the political goals of the data collection groups have influenced their work. In the context of conflict violence, meaningful statistical analysis involves comparisons to answer questions such as: Did more violence occur this month or last month? Were there more victims of ethnicity A or B? Did the majority of the violence occur in the north or the south of the country? The concern about bias focuses on how the data collection process may more effectively document one month relative to another, creating the appearance of a difference between the months. Unfortunately, in such cases the apparent difference is the result of changes in the documentation process, not real changes in the patterns of violence. To make sense of such comparisons, the observed data must in some way be adjusted to represent the true rates. There are a number of methods for making this adjustment if the observed data were collected at random, but this is rarely the case. There are relatively few models that can adjust data that were collected because they were simply available. In order to compare nonrandom data across categories like months or regions, the analyst must assume that the rate at which events from each category are observed is the same. For example, the analyst must assume that 60 percent of the killings that occurred in March were documented and that 60 percent of the killings that occurred in April were documented. This rate is called the coverage rate, and it is unknown, unless somehow the true number of events were known or estimated.
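A short worked example may help show why the equal-coverage assumption matters. The monthly totals and coverage percentages below are invented for illustration, and the helper names are hypothetical. The sketch treats coverage as a known percentage, which is precisely what is not available in practice.

```python
def documented_count(true_count, coverage_pct):
    """Number of killings that end up in the records at a given coverage rate."""
    return true_count * coverage_pct // 100


def adjusted_count(observed_count, coverage_pct):
    """Scale an observed count back up, assuming the coverage rate is known."""
    return observed_count * 100 / coverage_pct


# Hypothetical truth: violence increases from March to April.
true_march, true_april = 1000, 1500

# Equal coverage (60 percent in both months): the raw comparison points the right way.
equal = (documented_count(true_march, 60), documented_count(true_april, 60))

# Unequal coverage (60 percent, then 30 percent): the raw comparison reverses.
unequal = (documented_count(true_march, 60), documented_count(true_april, 30))

print("equal coverage:", equal)
print("unequal coverage:", unequal)
print("April adjusted with its own coverage rate:", adjusted_count(unequal[1], 30))
```

With equal coverage the records show 600 and then 900 killings, preserving the true increase. With unequal coverage they show 600 and then 450, an apparent decline produced entirely by the documentation process.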
If the coverage rates for different categories are not the same, the observed data tell only the story of the documentation; they do not indicate an accurate pattern. For example, if victims of ethnicity A are killed in large-scale violent events with many witnesses, while victims of ethnicity B are killed in targeted, isolated violent events, we may receive more reports of victims of ethnicity A and erroneously conclude that the violence is targeted at ethnicity A. Until we adjust for the event size bias resulting in more reports of victims of ethnicity A, we cannot draw conclusions about the true relationship between the number of victims from ethnicity A versus B. There are many other kinds of selection bias. As an example, when relying on media sources, journalists make decisions about what is considered newsworthy. Sometimes their decisions may create event size bias, as large
events are frequently considered newsworthy. But the deaths of individual, prominent members of a society are frequently also considered newsworthy. Conversely, media “fatigue” may result in under-documentation later in a conflict, or when other newsworthy stories limit the amount of time and space available to cover victims of a specific conflict.21 Many other characteristics of both the documentation groups and the conflict can result in these kinds of biases, such as logistical or budgetary limitations, trust or affinity variations within the community, and the security and stability of the situation on the ground.22 As each of these factors changes, coverage rates are likely to change as well. The fundamental reason why biases are so problematic for quantitative analyses is that bias often correlates with other dimensions that are interesting to analysts, such as trends over time, patterns over space, differences compared by the victims’ sex, or some other factor. As in the example of ethnicities A and B above, the event size bias is correlated with the kind of event. Failing to adjust for the reporting bias leads to the wrong conclusion. As another example, consider the Iraq case described above: If event size is correlated with the events’ perpetrators, then bias on event size means bias on perpetrator, and a naïve reading of the data could lead to security officials trying to solve the wrong security problems. Or, in the Syria case, if decisions about resource allocation to Tartus were made on the basis of the observed information, without taking into account the patterns of killings that were not observed, researchers might have inaccurately concluded that violence documented in May 2013 represented an isolated event.
One could imagine that such a conclusion could lead to any number of incorrect decisions: sending aid groups into Tartus under the erroneous assumption of relative security, or failing to send aid and assistance before or after May 2013, assuming that such resources were more needed elsewhere. It is important to note that these challenges frequently lack a scientific solution.23 We do not need to simply capture more data. What we need is to appropriately recognize and adjust for the biases present in the available data. Indeed, as indicated in the Iraq example, where multiple media sources appear to share similar biases, the addition of more data perpetuates and in some cases amplifies the event size bias. Detection of, and adjustment for, bias requires statistical estimation. A wide variety of statistical methods can be used to adjust for bias and estimate what is missing from observed data. In our work we favor multiple systems estimation, which has been developed under the name capture-recapture in ecology, and used to study a variety of human populations in research in demography and public health. Analysts more familiar with traditional survey methods often prefer adjustments based on post-stratification or “raking,” each of which involves scaling unrepresentative data to a known representative sample or population.24 Each method has limitations and requires assumptions, which may or may not be reasonable, but formal statistical models provide a way to make those assumptions explicit, and in some cases, to test whether they are appropriate. Comparisons from raw data implicitly but necessarily assume that such snapshots are statistically representative. This assumption may sometimes be true, but only by coincidence.
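As a rough illustration of the capture-recapture idea mentioned above, the sketch below implements the classical two-list Lincoln-Petersen estimator and Chapman's small-sample variant. This is a textbook simplification, not the multiple systems estimation models used in the work described here, which draw on more than two lists and explicit statistical models. It assumes the two lists are independent, that every victim is equally likely to appear on a given list, and that records have already been matched across lists; these assumptions are often unreasonable for conflict data. The function names and the toy identifiers are hypothetical.

```python
def lincoln_petersen(list_a, list_b):
    """Two-list estimate of the total population: n_a * n_b / overlap."""
    a, b = set(list_a), set(list_b)
    overlap = len(a & b)
    if overlap == 0:
        raise ValueError("no overlap between lists; the estimate is undefined")
    return len(a) * len(b) / overlap


def chapman(list_a, list_b):
    """Chapman's variant, which remains defined when the overlap is zero."""
    a, b = set(list_a), set(list_b)
    overlap = len(a & b)
    return (len(a) + 1) * (len(b) + 1) / (overlap + 1) - 1


# Toy data: matched victim identifiers, invented for illustration.
source_a = range(0, 60)    # 60 victims documented by a first source
source_b = range(40, 90)   # 50 victims documented by a second source, 20 in common

documented_total = len(set(source_a) | set(source_b))
estimated_total = lincoln_petersen(source_a, source_b)
print(documented_total, estimated_total, round(chapman(source_a, source_b), 1))
```

In this toy case the two lists together document 90 victims, while the estimator suggests roughly 150, implying that about 60 were documented by neither source. The intuition is that a small overlap between lists signals a large undocumented population.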
Conclusions

Carpenter et al. warn that “press members and scientists alike should be cautious about assuming the completeness and representativeness of tallies for which no formal evaluation of sensitivity has been conducted. Citing partial tallies as if they were scientific samples confuses the public, and opens the press and scholars to being manipulated in the interests of warring parties.” In a back-of-the-envelope description elsewhere, we have shown that small variations in coverage rates can lead to an exactly wrong conclusion from raw data.25 Groups such as the Iraq Body Count, the Syrian Center for Statistics and Research, the Syrian Network for Human Rights, the Syria Shuhada website, and the Violations Documentation Centre collect invaluable data, and they do so systematically, and with principled discipline. These groups should continue to collate and share these data as a fundamental record of the past. The data can also be used in qualitative research about specific cases, and in some circumstances, in statistical models that can adjust for biases. It is tempting, particularly in emotionally charged research such as studies of conflict-related violence, to search available data for answers. It is intuitive to create infographics, to draw maps, and to calculate statistics and draft graphs to look for patterns in the data. Unfortunately, all people, even statisticians, tend to draw conclusions even when we know that the data are inadequate to support comparisons. Weakly founded statistics tend to mislead the reader. Statistics, graphs, and maps are seductive because they seem to promise a solid basis for conclusions.
The current obsession with using data to formulate evidence-based policy increases the pressure to use statistics, even as new doubts emerge about whether “big data” predictions about social conditions are accurate.26 When calculations are made in a way that enables a mathematical foundation for statistical inference, these statistics deliver on the promise of an objective measurement in relation to a specific question. But analysis with inadequate data is very hard even for subject matter experts to interpret. In the worst case, it offers a falsely precise view, a view that may be completely wrong. In the best case, it invites speculation about what’s missing and what biases are uncontrolled, creating more questions than answers, and ultimately, a distraction. When policymakers turn to statistical analysis to address key questions, they must ensure that the analysis gives the right answers.
Notes

1. One extreme example includes Target successfully predicting a customer’s pregnancy, as reported in the New York Times and Forbes. In particular, Target noticed that pregnant women buy specific kinds of products at regular points in their pregnancy, and the company used this information to build marketing campaigns.
2. However it is certainly worth noting that even in these contexts sometimes big data are not big enough and may still be subject to the kinds of biases we worry about in this paper. See Kate Crawford’s keynote at STRATA and Tim Harford’s recent post in the Financial Times for examples.
3. Specifically, “convenience samples” refer to data that are non-randomly collected, though collecting such data is rarely convenient.
4. Another common kind of bias that affects human rights data is reporting bias. Whereas selection bias focuses on how the data collection process identifies events to sample, reporting bias describes how some points become hidden, while others become visible, as a result of the actions and decisions of the witnesses and interviewees. For an overview of the impact of selection bias on human rights data collection, see Jule Krüger, Patrick Ball, Megan Price, and Amelia Hoover Green (2013). “It Doesn’t Add Up: Methodological and Policy Implications of Conflicting Casualty Data.” In Counting Civilian Casualties: An Introduction to Recording and Estimating Nonmilitary Deaths in Conflict, ed. by Taylor B. Seybolt, Jay D. Aronson, and Baruch Fischhoff. Oxford UP.
5. Christian Davenport and Patrick Ball. “Views to a Kill: Exploring the Implications of Source Selection in the Case of Guatemalan State Terror, 1977–1996.” Journal of Conflict Resolution 46(3): 427–450. 2002.
6. Megan Price, Jeff Klingner, Anas Qtiesh, and Patrick Ball (2013). “Full Updated Statistical Analysis of Documentation of Killings in the Syrian Arab Republic.” Human Rights Data Analysis Group, commissioned by the United Nations Office of the High Commissioner for Human Rights (OHCHR). Megan Price, Jeff Klingner, and Patrick Ball (2013). “Preliminary Statistical Analysis of Documentation of Killings in the Syrian Arab Republic.” The Benetech Human Rights Program, commissioned by the United Nations Office of the High Commissioner for Human Rights (OHCHR).
7. http://www.csr-sy.com
8. http://www.syrianhr.org
9. http://syrianshuhada.com
10. http://www.vdc-sy.info
11. See reports in the LA Times, BBC, and the Independent, among others.
12. Price et al. 2013.
13. See https://hrdag.org/mse-the-basics/ for the first in a series of blog posts describing Multiple Systems Estimation (MSE) or Kristian Lum, Megan Emily Price and David Banks (2013). Applications of Multiple Systems Estimation in Human Rights Research. The American Statistician, 67:4, 191–200. DOI: 10.1080/00031305.2013.821093
14. http://www.iraqbodycount.org
15. Carpenter D, Fuller T, Roberts L. “WikiLeaks and Iraq Body Count: the sum of parts may not add up to the whole—a comparison of two tallies of Iraqi civilian deaths.” Prehosp Disaster Med. 2013;28(3):1–7. doi:10.1017/S1049023X13000113
16. Ibid.
17. Ibid.
18. We downloaded the ibc-incidents file on 14 Feb 2014, and processed it using the pandas package in python.
19. The top 100 sources include, for example, AFP, AL-SHAR, AP, CNN, DPA, KUNA, LAT, MCCLA, NINA, NYT, REU, VOI, WP, XIN, and US DOD VIA WIKILEAKS.
20. These assumptions can be formalized and tested within the framework of ‘species richness,’ which is a branch of ecology that estimates the number of different types of species within a geographic area and/or time period of interest using models for data organized in a very similar way to the IBC’s event records. See Wang, Ji-Ping. “Estimating species richness by a Poisson-compound gamma model.” Biometrika 97.3 (2010): 727–740.
21. A research question to address this might be: Do media-reported killings in a globally-interesting conflict like Iraq or Syria decline during periods when other stories attract interest? Do reported killings decline during the Olympics?
22. Krüger et al. (2013)
23. Bias issues can sometimes be resolved with appropriate statistical models, that is, with better scientific reasoning about the specific kind of data involved. However, we underline that bias is not solvable with better technology. Indeed, some of the most severely biased datasets we have studied are those collected by semi- or fully-automated, highly technological methods. Technology tends to increase analytic confusion because it tends to amplify selection bias.
24. For a description of multiple systems estimation, see Lum et al. 2013. For methods on missing data in survey research which might be applicable to the adjustment of raw, non-random data if population-level information is available, see Brick, J. Michael, and Graham Kalton. “Handling missing data in survey research.” Statistical methods in medical research 5.3 (1996): 215–238. For an overview of species richness models which might be used to estimate total populations from data organized like the IBC, see Wang, op. cit. For an analysis of sampling issues in “elusive” populations, see Johnston, Lisa G., and Keith Sabin. “Sampling hard-to-reach populations with respondent driven sampling.” Methodological Innovations Online 5.2 (2010): 38–48.
25. https://hrdag.org/why-raw-data-doesnt-support-analysis-of-violence/
26. Lazer, David and Kennedy, Ryan and King, Gary and Vespignani, Alessandro, Google Flu Trends Still Appears Sick: An Evaluation of the 2013–2014 Flu Season (March 13, 2014). Available at SSRN: http://ssrn.com/abstract=2408560
SAIS Review vol. XXXIV no. 1 (Winter–Spring 2014)
© 2014 by The Johns Hopkins University Press

Using Big Data and Quantitative Methods to Estimate and Fight Modern Day Slavery

Monti Narayan Datta

Given the hidden, criminal nature of contemporary slavery, empirically estimating the proportion of the population enslaved at the national and global level is a challenge. At the same time, little is understood about what happens to the lives of the survivors of slavery once they are free. I discuss some data collection methods from two nongovernmental organizations (NGOs) I have worked with that shed light on these issues. The first NGO, the Walk Free Foundation, estimates that there are about 30 million enslaved in the world today. The second NGO, Free the Slaves, employs a longitudinal analysis to chronicle the lives of survivors. The acquisition and dissemination of such information is crucial because policymakers and donors sometimes require hard data before committing time, political will, and resources to the cause.

Unpacking the Problem of Contemporary Slavery

As Kevin Bales of the Wilberforce Institute for the Study of Slavery and Emancipation explains, “Slavery is the possession and control of a person in such a way as to significantly deprive that person of his or her individual liberty, with the intent of exploiting that person through their use, management, profit, transfer or disposal. Usually this exercise will be achieved through means such as violence or threats of violence, deception and/or coercion.”1 Thus, at its core, slavery is a dynamic between two individuals, the enslaved and the slaveholder, in which the slaveholder has a monopoly of control and violence upon

Monti Narayan Datta is an assistant professor of political science at the University of Richmond. His current book project, forthcoming with Cambridge University Press, focuses on the consequences of anti-Americanism.
He is working on several projects on human trafficking and modern day slavery with Free the Slaves, Chab Dai, and the Walk Free Foundation. Along with Kevin Bales and Fiona David, he is a co-author of the Global Slavery Index: http://www.globalslaveryindex.org.
the enslaved. The slaveholder can coerce the enslaved to perform a number of abominable acts. This can include: sexual servitude on the streets of New York City;2 adult labor in the coltan mines of the Congo;3 child slavery in the shrimp farms of Bangladesh; or forced domestic servitude in the suburbs of Los Angeles.4 Compounding the matter is that enslaved persons can spend years, sometimes decades, under such conditions.5 This can sometimes lead to slavery lasting across several generations. Short of homicide, slavery is one of the most inhumane crimes one person can commit against another. In recent years, a number of governments and international governmental organizations have addressed modern day slavery at home and abroad. In the United States, Congress passed the Victims of Trafficking and Violence Protection Act (TVPA) in 2000. The TVPA established the President’s Interagency Task Force to Monitor and Combat Trafficking (a cabinet-level group whose mission is to coordinate efforts to combat trafficking in persons), led by the U.S. State Department. Since then, the State Department has produced its annual Trafficking in Persons (TIP) Report, which has become “the U.S. Government’s principal diplomatic tool to engage foreign governments on human trafficking.”6 Although not without controversy, the TIP Report has educated many on the sources and impact of modern day slavery.7 On the global stage, between 2000 and 2001 the United Nations General Assembly adopted three protocols to its Convention against Transnational Organized Crime: (1) the Protocol to Prevent, Suppress and Punish Trafficking in Persons, especially Women and Children; (2) the Protocol against the Smuggling of Migrants by Land, Sea, and Air; and (3) the Protocol against the Illicit Manufacturing and Trafficking in Firearms.
With 117 signatory countries, these protocols, known as the Palermo Protocols, advanced the global discussion not only on what constitutes contemporary slavery, but also on what the international community can do to mitigate its spread.
Although some may argue that international agreements like the Palermo Protocols and documents like the TIP Report matter only marginally,8 others counter that they catalyze change.9 Building upon a crest of public awareness of human trafficking, U.S. President Barack Obama proclaimed in 2012 at the Clinton Global Initiative, “We are turning the tables on the traffickers. Just as they are now using technology and the Internet to exploit their victims, we are going to harness technology to stop them.”10
Although he did not mention it explicitly, President Obama was referring to the idea of using big data to mitigate contemporary slavery. As Pulitzer Prize-winning journalist Steve Lohr explains, big data is “shorthand for advancing trends in technology that open the door to a new approach to understanding the world and making decisions.”11 This typically involves using software to find trends and patterns in large amounts of aggregated data from the Internet, sometimes from publicly available data, and other times from clandestinely obtained data.
Along the lines of utilizing publicly available data, the tech giant Google announced in April 2013 a big data partnership with the Polaris Project, an antislavery NGO in Washington, D.C. The partnership, called the Global Human Trafficking Hotline Network, aims to use data mining software to identify human trafficking trends from hotline data that can eventually inform “eradication, prevention, and victim protection strategies.”12
Although using big data to fight trafficking is new, the idea has been demonstrated by scholars like Mark Latonero of the Annenberg Center on Communication Leadership & Policy at the University of Southern California. Latonero’s team partnered with local law enforcement agencies in Los Angeles, explored trends in human trafficking on websites like Backpage.com, and applied this information to target specific traffickers. The team mined data from advertisements for the sexual services of domestic minors on the adult section of Backpage in the Greater Los Angeles area and identified the phone numbers that appeared most frequently in those ads. With this information, Latonero’s team was able to provide law enforcement with data linking certain phone numbers to criminal networks.
The U.S. government is also using big data to mine private information networks, not on the World Wide Web, but on what is called the Deep Web: that part of the Internet that is not searchable through search engines like Google. The Defense Advanced Research Projects Agency (DARPA), a research agency of the U.S. Department of Defense, recently launched a program called Memex to hunt criminal networks on the Deep Web. The first domain DARPA intends to uncover with this new technology is human trafficking.13
These developments in big data dovetail with a broader discussion within academia about how social science researchers can apply quantitative methods to estimate trends in contemporary slavery.
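The frequency analysis attributed above to Latonero’s team (finding the phone numbers that recur most often across advertisements) can be illustrated with a minimal sketch. The ad texts, the regular expression, and the function names below are all hypothetical; the article does not describe the team’s actual tooling, so this is only one plausible way to do such a count.

```python
import re
from collections import Counter

# Hypothetical ad texts, invented for illustration only.
ads = [
    "New in town, call 213-555-0142 anytime",
    "Available now (213) 555-0142",
    "Text 310.555.0199 for details",
    "call 2135550142 today",
    "Visiting this week, 818-555-0107",
]

# Matches common US formats: 213-555-0142, (213) 555-0142, 213.555.0142, 2135550142
PHONE_RE = re.compile(r"\(?\b(\d{3})\)?[\s.-]?(\d{3})[\s.-]?(\d{4})\b")


def normalize_numbers(text):
    """Return every phone number found in text as a bare 10-digit string."""
    return ["".join(groups) for groups in PHONE_RE.findall(text)]


def count_numbers(ad_texts):
    """Count how many distinct ads mention each normalized phone number."""
    counts = Counter()
    for text in ad_texts:
        # Count a number once per ad so one repetitive ad cannot dominate.
        counts.update(set(normalize_numbers(text)))
    return counts


phone_counts = count_numbers(ads)
top_numbers = phone_counts.most_common(2)
print(top_numbers)
```

Normalizing every match to a bare digit string is the design choice that matters here: the same number written three different ways is counted as one number appearing in three ads.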
Although rigorous, many studies of modern day slavery exist only in the policy and academic communities, and very few published works actually employ quantitative methods. In a comprehensive review of the research-based literature on contemporary slavery, Elżbieta M. Goździak and Micah N. Bump of Georgetown University found that, of 218 research-based journal articles, only seven (about 3 percent) were based on quantitative methods. Without hard data, it can be challenging for scholars to make generalizable inferences to inform policy.
In this paper, using big data as a backdrop, I discuss some novel quantitative methods employed by two NGOs I have worked with that shed light on contemporary slavery. The first NGO, the Walk Free Foundation, estimates that there are about 30 million enslaved in the world today. The second NGO, Free the Slaves, working with its local Indian partner, MSEMVS, assesses the lives of survivors and how they are reintegrating into society. The acquisition and dissemination of such information is crucial because policymakers and donors sometimes require hard data before committing time, political will, and resources to the cause.
The Walk Free Foundation
Australian philanthropists Andrew and Nicola Forrest established the Walk Free Foundation (Walk Free)14 three years ago to eradicate contemporary slavery. After meeting with Microsoft co-founder Bill Gates, Andrew Forrest was inspired to explore the underpinnings of contemporary slavery using quantitative methods. As Forrest recounts, “Global modern slavery is hard to measure, and Bill’s a measure kind of guy,” adding, “in management speak, if you can’t measure it, it doesn’t exist.”15 For Forrest, it was important to inform people in the business and policy worlds of the extent to which slavery exists, country-by-country, to prompt action. Although some quantitative assessments of contemporary slavery existed, very little was publicly available. Forrest sought to collect more precise data to disseminate freely and thus launched a Global Slavery Index (GSI), on which I have been working since 2012.16
The 2013 GSI ranks 162 of the world’s nations in terms of their level of contemporary slavery. Methodologically, these rankings are based on several factors; the most novel is an estimation of the proportion of the population enslaved in each country. For this measure, the GSI team (led by Kevin Bales and Fiona David) has drawn upon the secondary source data analysis that Bales pioneered for his book, Disposable People, and later disseminated in Scientific American.17 These secondary sources consisted of a review of the public record, including published reports from governments, the investigations of NGOs and international organizations, and journalistic reports. The GSI team has also drawn upon data from representative random sample surveys to extrapolate the prevalence of slavery for selected comparable countries.
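To make the idea of extrapolating from a representative random sample survey concrete, here is a minimal sketch. The survey figures are invented, and the estimator (a simple sample proportion with a normal-approximation interval) is an assumption chosen for illustration; it is not presented as the GSI team’s documented procedure, which would also have to account for survey design and weighting.

```python
import math


def estimate_enslaved(sample_size, cases_found, population, z=1.96):
    """Extrapolate a survey prevalence to a national count.

    Returns (point_estimate, lower_bound, upper_bound) in whole persons,
    using a normal-approximation interval (95 percent when z=1.96).
    """
    p_hat = cases_found / sample_size
    std_err = math.sqrt(p_hat * (1 - p_hat) / sample_size)
    low = max(0.0, p_hat - z * std_err)
    high = min(1.0, p_hat + z * std_err)
    return tuple(round(p * population) for p in (p_hat, low, high))


# Hypothetical survey: 5,000 respondents, 55 identified cases,
# in a country of 10 million people.
point, low, high = estimate_enslaved(5_000, 55, 10_000_000)
print(f"Estimated enslaved: {point:,} (interval {low:,} to {high:,})")
```

With these invented inputs the point estimate is 110,000 people, with an interval of roughly 81,000 to 139,000, which shows how wide the uncertainty can be even for a fairly large sample when prevalence is near 1 percent.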
Figure 1 illustrates the 2013 GSI data for the proportion of the population estimated to be enslaved; darker shades indicate a higher proportion of enslavement. Some of the countries with the highest proportions are Mauritania (about 4.0 percent of the population enslaved), Haiti (about 2.1 percent), Pakistan (about 1.2 percent), and India (about 1.1 percent).
Table 1 lists the 2013 GSI data in terms of the total estimated number of the enslaved, country-by-country. This is a novel contribution compared to other estimates of contemporary slavery. Such information can be useful to business people, policymakers, and students who want a more informed understanding of where slavery occurs and with what frequency.
Overall, the 2013 GSI estimates that about 29.8 million people are enslaved among the 162 countries under study. The country with the smallest estimated number of enslaved in 2013 was Iceland (twenty-two enslaved), and the country with the greatest number was India (13.9 million enslaved). The standard deviation (or spread) was extremely large: about 1.2 million enslaved.
One important question is whether the GSI has made a difference in the real world. One way to shed light on this is to explore some of the statistics recorded since the GSI’s launch in October 2013. To date, the GSI has received over half a million website visits. There have been over thirteen thousand downloads of the full report, available in English, Arabic, French, and Spanish. Moreover, there have been over fifteen hundred media reports about the GSI in over thirty-five countries, including in The Economist,18 De Standaard,19 La Vanguardia,20 CNN,21 National Public Radio,22 and Time.23
Some of the media reports about the GSI illustrate how it can generate discussion on an underreported issue. In India, for instance, where the GSI estimates the greatest number of the enslaved to be, media response has been strong. The Times of India reported, “Sixty-six years after independence, India has the dubious distinction of being home to half the number of modern-day slaves in the world.”24 Perhaps due to such sentiments, the Hindustan Times discussed the causes of slavery in India and observed, “Some of the reasons for high numbers caught in slavery in India are the difficulty in accessing protections and government entitlements, such as the food rations card, corruption or non-performance of safety nets (such as the National Employment Guarantee, primary health care and pensions) and practices of land grabbing and asset domination by high-caste groups.”25
There is also some evidence that the GSI has begun to influence government policy. In January of this year, building upon the momentum of the GSI, Andrew Forrest signed a memorandum of understanding (MOU) with the Pakistani province of Punjab. It is atypical for a government to sign a deal with a businessman to help eradicate slavery within its own borders. Yet Forrest was able to leverage his influence in Pakistan to encourage a conversation that aims to provide the province of Punjab with inexpensive coal in exchange for assurances that the government will work toward the liberation of its own people.26 Although it is too early to see how Pakistan will hold up to its promise, this agreement may herald future MOUs between NGOs like Walk Free and governments that want to mitigate slavery, and one day even eradicate it.
Figure 1. Global Slavery Index (GSI): Proportion of the Population Estimated Enslaved in 2013
Table 1. Global Slavery Index: Estimated Enslaved in 2013
Country | Estimated Enslaved | Country | Estimated Enslaved
Afghanistan | 86,089 | Lebanon | 4,028
Albania | 11,372 | Lesotho | 14,560
Algeria | 70,860 | Liberia | 29,504
Angola | 16,767 | Libya | 17,683
Argentina | 35,368 | Lithuania | 2,909
Armenia | 10,678 | Luxembourg | 69
Australia | 3,167 | Macedonia | 6,226
Austria | 1,100 | Madagascar | 19,184
Azerbaijan | 33,439 | Malawi | 110,391
Bahrain | 2,679 | Malaysia | 25,260
Bangladesh | 343,192 | Mali | 102,240
Barbados | 46 | Mauritania | 151,353
Belarus | 11,497 | Mauritius | 535
Belgium | 1,448 | Mexico | 103,010
Benin | 80,371 | Moldova | 33,325
Bolivia | 29,886 | Mongolia | 4,729
Bosnia and Herzegovina | 13,789 | Montenegro | 2,234
Botswana | 14,298 | Morocco | 50,593
Brazil | 209,622 | Mozambique | 173,493
Brunei | 417 | Myanmar | 384,037
Bulgaria | 27,739 | Namibia | 15,729
Burkina Faso | 114,745 | Nepal | 258,806
Burundi | 71,146 | Netherlands | 2,180
Cambodia | 106,507 | New Zealand | 495
Cameroon | 153,258 | Nicaragua | 5,798
Canada | 5,863 | Niger | 121,249
Cape Verde | 3,688 | Nigeria | 701,032
Central African Republic | 32,174 | Norway | 652
Chad | 86,329 | Oman | 5,739
Chile | 37,846 | Pakistan | 2,127,132
China | 2,949,243 | Panama | 548
Colombia | 129,923 | Papua New Guinea | 6,131
Costa Rica | 679 | Paraguay | 19,602
Côte d’Ivoire | 156,827 | Peru | 82,272
Croatia | 15,346 | Philippines | 149,973
Cuba | 2,116 | Poland | 138,619
Czech Republic | 37,817 | Portugal | 1,368
Democratic Republic of the Congo | 462,327 | Qatar | 4,168
Denmark | 727 | Republic of the Congo | 30,889
Djibouti | 2,929 | Romania | 24,141
Dominican Republic | 23,183 | Russia | 516,217
Ecuador | 44,072 | Rwanda | 80,284
Egypt | 69,372 | Saudi Arabia | 57,504
El Salvador | 10,490 | Senegal | 102,481
Equatorial Guinea | 5,453 | Serbia | 25,981
Eritrea | 44,452 | Sierra Leone | 44,644
Estonia | 1,496 | Singapore | 1,105
Ethiopia | 651,110 | Slovakia | 19,458
Finland | 704 | Slovenia | 7,402
France | 8,541 | Somalia | 73,156
Gabon | 13,707 | South Africa | 44,545
Gambia | 14,046 | South Korea | 10,451
Georgia | 16,227 | Spain | 6,008
Germany | 10,646 | Sri Lanka | 19,267
Ghana | 181,038 | Sudan | 264,518
Greece | 1,466 | Suriname | 1,522
Guatemala | 13,194 | Swaziland | 1,302
Guinea | 82,198 | Sweden | 1,237
Guinea-Bissau | 12,186 | Switzerland | 1,040
Guyana | 2,264 | Syria | 19,234
Haiti | 209,165 | Tajikistan | 23,802
Honduras | 7,503 | Tanzania | 329,503
Hong Kong, SAR China | 1,543 | Thailand | 472,811
Hungary | 35,763 | Timor-Leste | 1,020
Iceland | 22 | Togo | 48,794
India | 13,956,010 | Trinidad and Tobago | 486
Indonesia | 210,970 | Tunisia | 9,271
Iran | 65,312 | Turkey | 120,201
Iraq | 28,252 | Turkmenistan | 14,711
Ireland | 321 | Uganda | 254,541
Israel | 8,096 | Ukraine | 112,895
Italy | 7,919 | United Arab Emirates | 18,713
Jamaica | 2,386 | United Kingdom | 4,426
Japan | 80,032 | United States | 59,644
Jordan | 12,843 | Uruguay | 9,978
Kazakhstan | 46,668 | Uzbekistan | 166,667
Kenya | 37,349 | Venezuela | 79,629
Kuwait | 6,608 | Vietnam | 248,705
Kyrgyzstan | 16,027 | Yemen | 41,303
Laos | 50,440 | Zambia | 96,175
Latvia | 2,040 | Zimbabwe | 93,749
Source: The Global Slavery Index
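The concentration of the Table 1 estimates in a handful of countries can be checked with a short calculation. The sketch below uses the five largest country totals copied from Table 1 and the rounded global figure of 29.8 million quoted in the text; the variable and function names are illustrative only.

```python
# Estimates copied from Table 1 (the five largest country totals).
estimated_enslaved = {
    "India": 13_956_010,
    "China": 2_949_243,
    "Pakistan": 2_127_132,
    "Nigeria": 701_032,
    "Ethiopia": 651_110,
}

# Rounded global estimate quoted in the text (about 29.8 million).
GLOBAL_TOTAL = 29_800_000


def share_of_total(count, total=GLOBAL_TOTAL):
    """Return count as a fraction of the global estimate."""
    return count / total


shares = {country: share_of_total(n) for country, n in estimated_enslaved.items()}
top_five_share = share_of_total(sum(estimated_enslaved.values()))

for country, share in shares.items():
    print(f"{country}: {share:.1%}")
print(f"Top five combined: {top_five_share:.1%}")
```

On these figures India alone accounts for a little under half of the global estimate, which is consistent with the Times of India description quoted above, and the five largest totals together account for roughly two-thirds.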
The GSI may also be influencing heads of state. Former U.S. President Jimmy Carter references the GSI several times in his new bestselling book, A Call to Action: Women, Religion, Violence, and Power. And the GSI has been publicly endorsed by, among others, Hillary Clinton, Gordon Brown, Julia Gillard, and Tony Blair.27
Free the Slaves
The GSI strives to use big data to count the number of slaves in the world. Other NGOs have begun to employ longitudinal techniques to chronicle the lives of survivors of slavery once they are free. One such NGO is Free the Slaves (FTS), which Kevin Bales, Peggy Callahan, and Jolene Smith co-founded in 2000 as the sister organization of Anti-Slavery International (the oldest international human rights organization in the world).28
Early in its evolution, FTS reasoned that the liberation of any slave would be beneficial not only for that individual, but also for the community, and thus produce a “freedom dividend,” multiplied by each additional person freed. As FTS explains, “Local communities thrive when formerly enslaved people start their own businesses; communities begin to flourish as people come together to organize and watch out for one another; children go to school—and the benefits extend for generations.”29
For the past decade, FTS has partnered with different grassroots organizations in Haiti, India, Nepal, Ghana, the Democratic Republic of Congo, and Brazil to empower local communities of the enslaved to seek liberation. In India, FTS has worked with a local grassroots organization called Mina Sansadham Evam Mahila Vikas Sansthan (MSEMVS).30 Through the efforts of MSEMVS, over 150 villages have eradicated slavery and trafficking in recent years, and many more are beginning to experience liberation in the North Indian states of Uttar Pradesh and Bihar, two of India’s poorest states, as Figure 2 highlights.
In addition to empowering people in rural Uttar Pradesh to seek liberation, MSEMVS has been among the first NGOs to begin several longitudinal studies on the effects of liberation, as well as quantitative studies of the predictive factors of enslavement. The studies are intended to provide insight into: (1) whether slavery and trafficking have been eradicated; and (2) whether the socio-economic conditions of people living in these communities have improved. I consulted with Free the Slaves at this time, and, along with Ginny Baumann, Jody Sarich, Austin Choi-Fitzpatrick, and Jessica Leslie, helped put together a follow-up report for the village of Kukrouthi in Uttar Pradesh.
The follow-up report was conducted among the residents of three hamlets in Kukrouthi village.31 There were two sources of information: the first was a set of 120 household-level surveys, and the second was a set of focus group discussions. A total of 929 people were accounted for by the surveys. The time periods under comparison were 2009 (when the liberation process began) and 2011 (when the process of self-liberation by local residents was completed).
Some of the key findings from comparing the 2009 and 2011 studies follow. They lend credence to FTS’s supposition that there is a “freedom dividend” after liberation.
Growth in Childhood Education
One important indicator of a freedom dividend in Kukrouthi village is the number of children in school. The underlying premise is that in free communities children receive better education, which fuels a society’s human capital. In Kukrouthi, the team from MSEMVS found evidence of significant growth in childhood education rates. Whereas in 2009 only 69 percent of the school-aged children were reported to be in school, by 2011, 91 percent were enrolled, as Figure 3 illustrates.
Figure 2. Uttar Pradesh, North India
Better Nutrition
Another key indicator illuminating the freedom dividend is access to adequate nutrition. As with childhood education, the team from MSEMVS reported a dramatic increase in the share of families that were able to eat three meals a day, from 31 percent in 2009 to 71 percent in 2011. The share more than doubled (an increase of about 129 percent), as Table 2 details.
Table 2. Number of Daily Meals By Household in Kukrouthi, 2009 and 2011
Number of Meals | Year | Percentage
Two Meals Per Day | 2011 | 22%
Two Meals Per Day | 2009 | 31%
Three Meals Per Day | 2011 | 71%
Three Meals Per Day | 2009 | 31%
No Response | 2011 | 8%
No Response | 2009 | 3%
Figure 3. Percent Children in School in Kukrouthi, 2009 and 2011
Improved Access to Health Care
Yet another strong indicator of a freedom dividend is access to health care, even if of rudimentary quality. In 2011, MSEMVS reported that almost the entire population of Kukrouthi village had access to health care. This was another dramatic increase compared to 2009, when MSEMVS found that just 52 percent of families received health care treatment. Table 3 provides a breakdown of this comparison.
Table 3. Comparison of Access to Health Care in Kukrouthi, 2009 and 2011
Access to Health Care | Year | Percent
Yes | 2011 | 96%
Yes | 2009 | 57%
No | 2011 | 3%
No | 2009 | 43%
Don’t Know | 2011 | 1%
Don’t Know | 2009 | .
No Response | 2011 | 1%
No Response | 2009 | 1%
Improvement in Childhood Vaccinations
Lastly, in 2009, just one-third of children had the proper number of recommended vaccinations (i.e., three vaccinations). By 2011, this had increased to 90 percent, as Table 4 shows.
Table 4. Comparison of Child Vaccinations in Kukrouthi, 2009 and 2011
Immunizations | Year | Percentage
None | 2011 | .
None | 2009 | 49%
One | 2011 | 3%
One | 2009 | 7%
Two | 2011 | 7%
Two | 2009 | 12%
Three | 2011 | 90%
Three | 2009 | 33%
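When comparing the 2009 and 2011 survey percentages, it helps to separate the change in percentage points from the relative (percent) change, since the two are easily confused. The sketch below applies both measures to the indicators reported for Kukrouthi; the function names are illustrative, and the health care row uses the Table 3 figures.

```python
def percentage_point_change(before, after):
    """Absolute difference between two percentages, in percentage points."""
    return after - before


def relative_change(before, after):
    """Change relative to the starting value, expressed as a percent."""
    return (after - before) / before * 100


# (2009 value, 2011 value) in percent, as reported in the text and tables.
indicators = {
    "children in school": (69, 91),
    "three meals per day": (31, 71),
    "access to health care (Table 3)": (57, 96),
    "three vaccinations": (33, 90),
}

for name, (y2009, y2011) in indicators.items():
    points = percentage_point_change(y2009, y2011)
    relative = relative_change(y2009, y2011)
    print(f"{name}: {points:+d} points, {relative:.0f}% relative increase")
```

For example, the move from 31 to 71 percent of families eating three meals a day is a gain of 40 percentage points, or about 129 percent in relative terms.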
A World Without Slavery
Applying quantitative methods to the study of contemporary slavery could contribute significantly to shedding more light on the phenomenon. In collaboration with my colleagues at the Walk Free Foundation, I have used quantitative methods to estimate the total number of enslaved in the world today. This, in turn, has generated discussion among the media and policy community on how to mitigate modern day slavery, with an eye toward its eradication. With Free the Slaves and MSEMVS, we have begun to chronicle systematically how communities can benefit from freedom. This information provides preliminary evidence to policymakers that liberating slaves provides a wide range of socioeconomic benefits.
The modern day anti-slavery movement is young. Moving forward, we need more scholars and policymakers who want to explore what quantitative methods and big data can do for the movement. We are at a point in the world where everyone agrees that contemporary slavery is a wrong that must be addressed. The time is ripe for further discussion on how to make its eradication a reality. I hope we can get there, at least in part, through employing quantitative methods and exploring big data.
Notes
1 Kevin Bales, The Global Slavery Index, 2013. http://www.globalslaveryindex.org/report/#view-online
2 For example: http://www.gems-girls.org/get-involved/very-young-girls
3 Congo. https://www.freetheslaves.net/congo
4 CNN Freedom Project, http://thecnnfreedomproject.blogs.cnn.com.
5 For example: Survivors of Slavery Speak Out, http://survivorsofslavery.org
6 Trafficking In Persons Report, http://www.state.gov/j/tip/rls/tiprpt
7 For example: http://www.coha.org/the-trafficking-in-persons-report-who-is-the-united-states-to-judge
8 For example: John J. Mearsheimer, “The False Promise of International Institutions,” International Security, Vol. 19, No. 3 (1995) pp. 5–49.
9 For example: Anne-Marie Slaughter, A New World Order, (Princeton University Press, 2005).
10 Barack Obama, “Remarks by the President to the Clinton Global Initiative,” September 25, 2012. http://www.whitehouse.gov/the-press-office/2012/09/25/remarks-president-clinton-global-initiative
11 Steve Lohr, “The Age of Big Data,” The New York Times, February 11, 2012. http://www.nytimes.com/2012/02/12/sunday-review/big-datas-impact-in-the-world.html?pagewanted=all&_r=0
12 “Polaris Project Launches Global Human Trafficking Hotline Network.” http://www.polarisproject.org/media-center/news-and-press/press-releases/767-polaris-project-launches-global-human-trafficking-hotline-network
13 “Darpa Reinventing Search Engines to Fight Crime,” Wired, February 11, 2014. http://www.wired.co.uk/news/archive/2014-02/11/darpa-memex-human-trafficking
14 The Walk Free Foundation, http://www.walkfreefoundation.org
15 Elisabeth Behrmann, “Gates Helps Australia’s Richest Man in Bid to End Slavery,” Bloomberg, April 14, 2013. http://www.bloomberg.com/news/2013-04-10/gates-helps-australia-s-richest-man-in-bid-to-end-slavery.html
16 The Global Slavery Index, http://www.globalslaveryindex.org
17 Kevin Bales, “The Social Psychology of Modern Slavery,” Scientific American, April 2002.
18 “Dry Bones,” The Economist, October 19, 2013. http://www.economist.com/news/international/21588105-hateful-practice-deep-roots-still-flourishing-dry-bones
19 “Wereldwijd bijna 29 miljoen slaven [Almost 29 million slaves worldwide],” De Standaard, October 17, 2013. http://nos.nl/artikel/563375-wereldwijd-bijna-30-miljoen-slaven.html
20 “Casi 30 millones de personas son esclavos modernos [Almost 30 million people are modern slaves],” La Vanguardia, October 18, 2013. http://www.lavanguardia.com/20131018/54391301708/casi-30-millones-de-personas-son-esclavos-modernos-barcelona.html
21 Tim Hume, “India, China, Pakistan, Nigeria on Slavery’s List of Shame, Says Report,” CNN, October 17, 2013. http://www.cnn.com/2013/10/17/world/global-slavery-index
22 Audie Cornish, “Report Estimates 30 Million People in Slavery Worldwide,” National Public Radio, October 17, 2013. http://www.npr.org/templates/story/story.php?storyId=236407720
23 Nilanjana Bhowmick, “Report: Almost 14 Million Indians Live Like Slaves,” Time, October 17, 2013. http://world.time.com/2013/10/17/report-almost-14-million-indians-live-like-slaves/
24 “India Has Half the World’s Modern Slaves: Study,” The Times of India, October 18, 2013. http://timesofindia.indiatimes.com/india/India-has-half-the-worlds-modern-slaves-Study/articleshow/24313244.cms
25 Abhijit Patnaik, “Modern Slavery Widespread in India,” Hindustan Times, October 17, 2013. http://www.hindustantimes.com/India-news/NewDelhi/Modern-slavery-widespread-in-India/Article1-1136431.aspx
26 Dennis Shanahan, “Andrew Forrest Strikes Cheap Coal Deal to End Pakistan Slavery,” The Australian, January 23, 2014. http://www.theaustralian.com.au/business/mining-energy/andrew-forrest-strikes-cheap-coal-deal-to-end-pakistan-slavery/story-e6frg9df-1226808181875#
27 The Global Slavery Index. http://www.globalslaveryindex.org/endorsements
28 Anti-Slavery International. http://www.antislavery.org/english
29 FTS In India: Free a Village, Build a Movement. http://www.freetheslaves.net
30 https://www.ashanet.org/projects/project-view.php?p=907
31 Ginny Baumann, et al, “Follow Up Study of Slavery and Poverty In Kukrouthi village, St Ravidas Nagar District, Uttar Pradesh,” June 2012, unpublished manuscript. Free the Slaves.
A Conversation with Arch Puddington, Vice President for Research at Freedom House
Arch Puddington is vice president for research at Freedom House. He manages the publication of Freedom in the World, an annual report assessing global political rights and civil liberties, and is responsible for the development of new research and advocacy programs.
Who is the target audience of Freedom House reports?
From the beginning, we have sought to provide analysis that combines scholarly rigor with a methodology and vocabulary that is accessible to the general public. Obviously, there is a niche group of policymakers here and in Europe, as well as journalists, scholars, political activists and dissidents, who make up our core audience. But our data are also widely used by educators and students, including at the secondary level.
We have also developed a growing audience among foreign government officials. This is in large measure due to the important role of democracy and honest governance in the calculations of international development agencies, financial institutions, and governments. Especially since Freedom House findings have been formally incorporated into the foreign assistance process of the American government, we have experienced a major increase in communications with foreign diplomats, who want to discuss, or complain about, our conclusions about their countries.
Freedom House’s 2013 Freedom on the Net report examined internet activism and “increasingly sophisticated restrictions on internet freedom” by authoritarian regimes. Based on the report, what opportunities and obstacles do new technologies offer in promoting freedom?
New technologies offer a significant opportunity to advance democracy. Throughout the world, online activists and ordinary social media users utilize these tools to organize, lobby, and hold their governments accountable. Women’s rights groups, free speech advocates, and human rights organizations have staged successful advocacy campaigns to overturn or prevent the passage of oppressive laws. In many authoritarian states, such as China, Saudi Arabia, and Bahrain, exposés by online and citizen journalists revealing corruption, police abuse, and pollution often force the authorities to acknowledge the issue, and in some cases, hold the perpetrators accountable.
Unfortunately, the transformative power of digital media is not limited to individuals fighting to promote freedom. Technological advances also bring new tools to censor the web and intimidate citizens who are engaged in online speech that is deemed to threaten the regime, insult the dominant religion, or sow social discord. Authoritarian regimes monitor the personal communications of their citizens for political reasons, with the goal of identifying and suppressing government critics and human rights activists. Such monitoring can have dire repercussions for the targeted individuals in those countries, including imprisonment, torture, and even death.
In 2007, Freedom House published a report indicating a “profoundly disturbing deterioration”: a greater number of countries were becoming less free than were becoming more free. Could you share insights on this finding?
According to our findings, more countries have experienced declines in freedom than have experienced gains during each of the last eight years. This is unprecedented in the forty-one-year history of Freedom in the World. At the same time, this decline is not in itself a cause for alarm. Many of the declines represent quite small setbacks and not a pell-mell retreat from the gains of previous decades. Many countries that embraced democracy over the previous four decades had little experience with the institutions of freedom, and their adherence to good government standards is beginning to fray. In times of relative scarcity, corruption is emerging as a particular evil, especially as top-to-bottom graft and favoritism erode popular faith in democratic institutions.
A more serious problem that is reflected in our findings is the durability of what we call modern authoritarian regimes.
Russia’s Vladimir Putin and the Chinese Communist Party leadership are the best examples of this phenomenon, but there are others as well: Aliyev in Azerbaijan, the Iranian clerics, Correa in Ecuador, the post-Chavez group in Venezuela. Modern authoritarians preside over countries that are well-integrated into the global economic and diplomatic systems and often possess energy riches.
The leaders are unabashedly antidemocratic and anti-Western. They devote their energies to the control of the political process, the press, civil society, and the rule of law. They avoid the excesses and stupidities of communism, especially in economic policy, but use nuanced and sophisticated methods to control the levers of power. Modern authoritarianism has emerged over the past fifteen years, and its practitioners have grown in power and even international respectability over time. Modern authoritarianism today ranks as the most worrying threat to freedom around the world.
The most recent Freedom in the World report noted that the number of electoral democracies has risen, while the distribution of countries in each of the “free,” “partly free,” and “not free” categories did not change significantly in comparison to 2012. Why do you think this is the case?
One way to think of a country with a designation as “free” is as a liberal or consolidated democracy. In recent years, the number of free countries has remained steady at eighty-seven to ninety, meaning that approximately 45 percent of the world’s sovereign states enjoy systems that guarantee competitive elections and a broad range of civil liberties. On the other hand, the number of electoral democracies has oscillated between 115 and 123. There are thus some thirty countries that can be said to have met internationally accepted standards for competitive elections but which fall short on other indicators that measure liberal democracy: press freedom, minority rights, gender equality, corruption, and so forth. Freedom in the World ranks Mexico as an electoral democracy but also places it in the “partly free” category because of the impact of uncontrolled violence. Indonesia likewise qualifies as an electoral democracy but is ranked as “partly free” because, among other problems, its government has been unable to secure the rights of religious minorities.
Given that the standards for gaining a designation as an electoral democracy are less strict than for achieving designation as a free country, it is not surprising that there is more movement in and out of the electoral democracy category.
What do you make of recent articles from BBC and al-Jazeera (among others) calling attention to corruption in the EU? Some have argued that the quality of electoral democracy in the United States and in Europe has been deteriorating. What are your thoughts on this assertion? How closely do you think popular indices like those from Freedom House mirror the reality on the ground?
I’m not overly exercised about the level of corruption in the EU. Every society based on money transactions suffers from corruption to one degree or another. The key here is whether corruption is pervasive, officially tolerated, and engaged in by the political leadership. The most damning report on European corruption was commissioned by the EU itself, and most EU countries have media which investigate corruption charges and an independent judiciary which prosecutes corrupt officials. Europe should be concerned when officials are intervening to prevent the press from uncovering corrupt acts or prosecutors from bringing charges against officials accused of graft. All too often accusations of widespread corruption in democracies are advanced by people of bad faith from countries (Russia and Belarus, for example) where corruption is a way of life.
As for the United States, there clearly are growing problems with its political system. Gerrymandering has gotten worse and the new movement for voter identification has been implemented in ways that suggest efforts to weaken Democratic candidates. At the same time, the American system retains a unique dynamism. It remains open to the emergence of new faces (Barack Obama) and new forces (the Tea Party). Despite its multinational character, the United States has managed to avoid the emergence of influential parties or movements that preach racism or xenophobia.
We devote considerable effort to capturing these nuances in Freedom in the World and other reports. Freedom in the World is not a report on governance per se; we endeavor to reflect the level of freedom an individual experiences on the ground, and zero in on the threats to freedom whether they come from the state, terrorists, extremist movements, or other sources. We have developed a methodology that looks at the broad set of institutions and values that make up human freedom while providing a flexibility that enables us to highlight the qualitative differences between one society and another.
How does Freedom House collect the quantitative and qualitative information used in its reports? How do you extract significant insights from this information?
We see our principal role as providing analysis, including scores and judgments about democratic performance, to the policymaking community, the media, and scholars.
Our analysts make use of the vast sources of information that are available these days, including government reports, the findings of think tanks and NGOs, reports of multilateral institutions, press accounts, interviews with officials and critics alike, and the many other sources that have emerged in the data explosion era.
Freedom House is a source for analysis, not data. We see our role as providing assessments on the state of freedom, identifying the principal threats to freedom, and showcasing global and regional trends. Using data from our country analysis, we are able to identify the global and regional trajectory of freedom, broadly defined, as well as specific elements of freedom, such as freedom of expression and press, elections, corruption and transparency, civil society, and rule of law. We can, in other words, illuminate which institutions of democracy are most vulnerable to pressure from authoritarian rulers, and which institutions have proved most durable. There are other organizations that see their mission as providing data on elections, corruption, assaults on journalists, economic freedom, and so forth. Freedom House, by contrast, works to inform the public about the gains and setbacks in democratic government, civil liberties, and personal freedom.
  • 40. What do you make of the recent trend in which governments freely release open data? What are the policy implications of this trend?
Clearly, enhanced transparency is preferable to less openness. My concern is that some governments will be tempted to fudge or falsify data or decide to stop publishing information when the results are embarrassing. For some time now, Argentina has been publishing inflation figures that most experts regard as bogus. After political attention was drawn to spiraling crime rates, the Venezuelan government stopped publishing statistics on violent crime. These examples suggest that in the future, as in the past, the data world will be divided between democracies that almost always publish honest statistics and other countries whose data may or may not reflect reality.
For democracies, political leaderships will face a new challenge in explaining the unwelcome news that will inevitably emerge from published data. More data will mean a more informed citizenry, especially at the elite level. But it will also mean more pressure on governments to communicate, often in response to the arguments of demagogues, why unemployment rates, inequality, traffic accidents, or test scores for children are moving in the wrong direction.
Authoritarian regimes will have it easier. Their leaders will either quash uncomfortable facts or distort them. Here it will be essential that international financial institutions, transparency think tanks, and the global business community weigh in by demanding honest accounting. It is instructive that Argentina agreed to adjust its inflation figures after pressure from the IMF.
The field of international affairs has become more focused on collecting and analyzing large quantities of data. What would you recommend to international affairs students as the field becomes more data-driven?
I would urge students to remember that data and facts can be manipulated and misused. Serious assessment of a society’s political well-being requires facts, but it also demands honest interpretation. An overemphasis on data can distort an analyst’s efforts to understand the true quality of freedom as thoroughly as can outright bias.
  • 41. SAIS Review vol. XXXIV no. 1 (Winter–Spring 2014)
Of Note
A Deterioration of Democracy? Corruption, Transparency, and Apathy in the Western World
Rachel Ostrow
Rachel Ostrow is a second-year M.A. candidate at the Johns Hopkins University Paul H. Nitze School of Advanced International Studies (SAIS) concentrating in Russian and Eurasian Studies. She is Web Editor of The SAIS Review.
Arch Puddington, in his interview with the SAIS Review of International Affairs, expresses a firm belief in the power of interpretation. “An overemphasis on data,” he says, “can distort an analyst’s efforts to understand the true quality of freedom as thoroughly as can outright bias.” Freedom House’s annual reports on freedom have made data on democracy accessible for millions of people in the diplomatic, academic, and wider communities. These analyses have criticized governments throughout the Middle East, Africa, and Asia for anti-democratic and autocratic methods. However, Freedom House’s important work researching authoritarianism and democracy throughout the world should start to focus once again on its birthplace: the Western world.
Freedom House, based in the United States (and largely funded by government agencies such as the State Department and the U.S. Agency for International Development), could be, and has been, accused of having Western biases. A quick look at Freedom in the World 2014 shows that the United States and Canada, as well as the majority of European nations, are classified as “free” right up to the Ukrainian border.1 However, several countries in Europe, as Freedom House rightly notes, have suffered democratic backsliding. France, Switzerland, and Hungary have all passed laws or gone through social movements seeking to limit the rights of migrants and ethnic minorities. These occurrences, though noted in Freedom House’s analysis, do not seriously affect the calculations within.
Hungary itself is an excellent example of where this analysis has masked the more sinister undertones within an open democracy. Hungary’s recent re-election of Viktor Orban, in an election widely seen as free and fair, can be seen as a backwards turn for Hungarian democracy. As a member of the right-wing, nationalist Fidesz party, Orban will likely have to make concessions to the far-right, anti-Semitic, and anti-Roma Jobbik party, which