Introduction talk at the University of Strathclyde (Scotland) Algorithms Workshop, providing a quick overview of the fundamental and practical reasons why algorithms are/are not technical black boxes. (This talk does not address trade secrets or other business reasons for lack of transparency.) The presentation was given to an audience of academics and students at the law department.
3. • Technical issues
– Fundamental
– Practical
• Business/management interests, e.g. trade secrets
How a decision was reached can in principle be revealed, if sufficient data about the state of the system at the time of operation is available (and we have access to the code).
Why a particular chain of operations was performed is much more difficult to reveal (especially with ML).
Origins of the ‘black box’
4. • Machine Learning (ML)
• Hand coded
Fundamental properties
O1 = f(w1, H1, w2, H2, w3, H3)
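A minimal sketch of the formula above, assuming f is a weighted sum passed through a sigmoid (the slide does not specify f; the weights w and hidden values H are invented for illustration):

```python
import math

def f(w, H):
    # Toy output node: weighted sum of hidden values, squashed by a sigmoid.
    # The sigmoid is an assumption; the slide leaves f unspecified.
    return 1.0 / (1.0 + math.exp(-sum(wi * hi for wi, hi in zip(w, H))))

w = [0.4, -1.2, 0.7]   # parameters (set by training or by hand)
H = [1.0, 0.5, 2.0]    # hidden values derived from the input data
O1 = f(w, H)           # with w and H known, *how* O1 is computed is fully traceable
print(O1)
```

With the parameters and data in hand, every arithmetic step from input to O1 can be replayed; this is the 'how' that the next slide contrasts with 'why'.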
5. Machine Learning
• If parameters & data are known, we can trace how the output is computed.
• If the history of data is known, we can in principle trace how parameters were set.
• Explaining why certain parameters are optimal can be very difficult.
=> Explaining why output is produced is difficult.
Hand Coded
• If parameters & data are known, we can trace how the output is computed.
• We know the parameters were set by engineers.
• We can ask the engineers why certain parameters were chosen.
=> Explaining why output is produced depends on the engineers.
Fundamental transparency: how vs. why
6. • High dimensionality of Big Data algorithms can make interpretation of the 'explanation' problematic
– e.g. Google's page ranking algorithm is estimated to involve 200+ parameters
• Approximated transparency through dimensionality reduction, e.g. Principal Component Analysis (PCA) (see the sketch below)
– requires case-by-case analysis depending on input data
– a 'general' solution is only valid for 'majority case' conditions
High dimensionality, a.k.a. when an explanation is not transparent
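A minimal sketch of approximated transparency via PCA, using scikit-learn. The 200-dimensional data is synthetic and the choice of two retained components is an assumption for illustration:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 200))   # synthetic stand-in for 200+ parameter data
X[:, 0] += 3 * X[:, 1]             # inject some correlated structure

pca = PCA(n_components=2)          # reduce 200 dimensions to 2 for interpretation
Z = pca.fit_transform(X)

# Fraction of the original variance the 2-D 'explanation' preserves:
# a reminder that the reduced view is only valid for the 'majority case'.
print(pca.explained_variance_ratio_.sum())
```

The explained-variance ratio makes the approximation explicit: whatever the two components fail to capture is exactly the behaviour the simplified explanation cannot account for.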
7. Machine Learning
• If Machine Learning algorithms use 'in situ' continuous or intermittent learning, the parameter settings change over time.
• To re-create a system behaviour requires knowledge of the past parameter states.
Hand Coded
• Hand coded systems are also frequently updated, especially if there is an 'arms race' between the service provider and users trying to 'game' the system (e.g. Google search vs. Search Engine Optimization).
Practical issues: non-static algorithms
In some cases, randomness might be built into an algorithm's design, meaning its outcomes can never be perfectly predicted.
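A minimal sketch of both practical issues, using an invented online update rule and Python's built-in random module: the parameter drifts with every observation, so re-creating an old output requires the parameter state logged at that time, and without a fixed seed the data stream itself is unrepeatable:

```python
import random

random.seed(42)      # without a fixed seed, built-in randomness makes runs unrepeatable

w = 0.5              # a single parameter, learned 'in situ'
history = []         # log of past parameter states
for t in range(5):
    x = random.uniform(-1, 1)     # incoming data point
    y_pred = w * x                # system output at time t
    w += 0.1 * (x - y_pred) * x   # toy online update: w drifts with every observation
    history.append((t, w))

# Re-creating the output at t=2 requires the parameter state logged at t=2,
# not the current value of w.
print(history)
```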
8. • Defining precisely what a task/problem is (logic)
• Breaking that down into a precise set of instructions, factoring in any contingencies, such as how the algorithm should perform under different conditions (control).
• "Explain it to something as stonily stupid as a computer" (Fuller 2008).
• Many tasks and problems are extremely difficult or impossible to translate into algorithms and end up being hugely oversimplified (see the toy example below).
• Mistranslating the problem and/or solution will lead to erroneous outcomes and random uncertainties.
The challenge of translating a task/problem into an algorithm
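A toy illustration of the translation problem: the fuzzy task 'flag suspicious logins' reduced to precise logic and control. The features and threshold are invented, and the oversimplification is the point; any real notion of 'suspicious' is far richer than this rule:

```python
def is_suspicious(login_hour: int, failed_attempts: int) -> bool:
    # Logic: what counts as 'suspicious'? Here, crudely, a night-time login
    # after repeated failures. Control: a single rule, with no contingencies
    # for travel, shared accounts, time zones, etc.
    return (login_hour < 6 or login_hour > 22) and failed_attempts >= 3

print(is_suspicious(3, 4))    # True:  flagged, as intended
print(is_suspicious(14, 10))  # False: a daytime brute-force attempt goes unflagged
```

The rule executes exactly as specified, yet the mistranslation (equating 'suspicious' with 'night-time plus failures') produces precisely the erroneous outcomes the bullet above warns about.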
9. System design in the real world
https://effectivesoftwaredesign.com/2012/04/23/communication-problems-in-software-projects/
10. • Algorithms are created through trial and error, play, collaboration, discussion, and negotiation.
• They are teased into being: edited, revised, deleted and restarted, shared with others, passing through multiple iterations stretched out over time and space.
• They are always somewhat uncertain, provisional and messy fragile accomplishments.
• Algorithmic systems are not standalone little boxes, but massive, networked ones with hundreds of hands reaching into them, tweaking and tuning, swapping out parts and experimenting with new arrangements.
Algorithm creation
Gillespie, T. (2014a) 'The relevance of algorithms', in Gillespie, T., Boczkowski, P.J. and Foot, K.A. (eds) Media Technologies: Essays on Communication, Materiality, and Society. Cambridge, MA: MIT Press, pp. 167-93; Neyland, D. (2014) 'On organizing algorithms', Theory, Culture and Society, online first. Cited in: Kitchin, R. and Dodge, M. (2017) 'The (in)security of Smart Cities: Vulnerabilities, Risks, Mitigation and Prevention', SocArXiv, February 13. osf.io/preprints/socarxiv/f6z63
11. • Deconstructing and tracing how an algorithm is constructed in code and mutates over time is not straightforward.
• Code often takes the form of a "Big Ball of Mud": "[a] haphazardly structured, sprawling, sloppy, duct-tape and bailing wire, spaghetti code jungle".
Examining pseudo-code/source code
Foote, B. and Yoder, J. (1997) 'Big Ball of Mud', Pattern Languages of Program Design 4: 654-92. Cited in: Kitchin, R. and Dodge, M. (2017) 'The (in)security of Smart Cities: Vulnerabilities, Risks, Mitigation and Prevention', SocArXiv, February 13. osf.io/preprints/socarxiv/f6z63
12. • Reverse engineering is the process of articulating the specifications of a system through a rigorous examination, drawing on domain knowledge, observation, and deduction, to unearth a model of how that system works.
• By examining what data is fed into an algorithm and what output is produced, it is possible to start to reverse engineer how the recipe of the algorithm is composed (how it weights and preferences some criteria) and what it does (see the sketch below).
Reverse engineering
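A minimal sketch of reverse engineering by input/output probing: treat the system as a black box, feed it varied inputs, and fit a surrogate model to the observed outputs. The hidden weights and the linear form are assumptions made purely for illustration:

```python
import numpy as np

def black_box(x):
    # Stand-in for an opaque system; the true weights are hidden from the analyst.
    hidden_w = np.array([2.0, -0.5, 0.0, 1.5])
    return x @ hidden_w

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))   # probe inputs fed to the system
y = black_box(X)                # observed outputs

# Fit a linear surrogate by least squares: the recovered coefficients
# estimate how the black box weights and preferences each criterion.
w_est, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w_est)                    # approximately [2.0, -0.5, 0.0, 1.5]
```

Real systems are rarely linear, so in practice the surrogate only approximates the recipe, but the workflow (probe, observe, model) is the same.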
13. • HOW
– With access to the code, data and parameter settings, HOW the output was produced can be 'explained'.
– High dimensionality can make the 'explanation' difficult to understand.
– Dimensionality reduction can help to generate an approximate explanation that is understandable.
• WHY
– Can be (very) difficult to determine, especially if Machine Learning methods are used.
– Approximate explanation based on the manually set optimization targets can help.
Conclusion
Researchers might search Google using the same terms on multiple computers in multiple jurisdictions to get a sense of how its PageRank algorithm is constructed and works in practice (Mahnke and Uprichard 2014). They might experiment with posting and interacting with posts on Facebook to try to determine how its EdgeRank algorithm positions and prioritises posts in user timelines (Bucher 2012). Or they might use proxy servers and feed dummy user profiles into e-commerce systems to see how prices vary across users and locales (Wall Street Journal, detailed in Diakopoulos 2013).