School of Data - mapping company networks
1. Some slide prompts to support a data framing investigation around corporate data, originally prepared for the OGP Festival, London, October 2013. For more information, contact: schoolOfData.org
2. These notes provide a worked example of how to download company ownership relationship data from OpenCorporates (opencorporates.com) using the cross-platform data cleaning tool OpenRefine (openrefine.org), and then visualise the data using the cross-platform Gephi network visualisation tool (gephi.org).
3. OpenCorporates is a private company that has set itself the ambitious task of building a database of registered company information for every legal corporate entity in the world. One of the views OpenCorporates offers over at least some of the data in its database shows how companies are connected by beneficial ownership or shareholder relationships. Although complex, this diagram is “human readable”: the data is presented in a way that is intended to make some sort of meaningful sense to us.
4. But as well as publishing data for us humans to read, OpenCorporates also makes data available in a way that machines can read: machine readable data. You may have heard of the term “API” in the context of data publishing websites. To all intents and purposes, an API is an interface that computers can use to get information out of websites in a way that they, and the databases they work with, can understand. The data is published in a format known as JSON (JavaScript Object Notation). But you don’t really need to know much more than that: just that it’s called JSON, and that tools that can parse and work with JSON can parse and work with the data that the OpenCorporates API publishes.
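To make “machine readable” a little more concrete, here is a minimal Python sketch of what parsing JSON looks like. The sample text and its field names (`relationships`, `parent_name`, `child_name`, `relationship_type`) are invented for illustration; they are an assumption, not the actual structure of the OpenCorporates response.

```python
import json

# Hypothetical JSON text. The real OpenCorporates output may be
# structured differently and use different field names.
SAMPLE = """
{"relationships": [
  {"parent_name": "Example Holdings Ltd",
   "child_name": "Example Retail Ltd",
   "relationship_type": "subsidiary"},
  {"parent_name": "Example Holdings Ltd",
   "child_name": "Example Logistics Ltd",
   "relationship_type": "subsidiary"}
]}
"""


def parse_relationships(text):
    """Turn JSON text into a list of Python dictionaries."""
    data = json.loads(text)
    return data["relationships"]


if __name__ == "__main__":
    for rel in parse_relationships(SAMPLE):
        print(rel["parent_name"], "->", rel["child_name"])
```

Any tool that can do the equivalent of `json.loads` can work with the data; OpenRefine does this step for us, without any code.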
5. If you aren’t a programmer, here’s a way of getting the data out of OpenCorporates and into a tabular form you may be more comfortable with, and which we can use to generate a network diagram to display in a tool such as Gephi… You can download the OpenRefine application from openrefine.org. When you run it on your computer, it will launch an application that runs inside a browser tab using your default web browser.
6. We can get company ownership data (subsidiary relations, major shareholdings, etc.) from OpenCorporates by hacking the web address/URL of a company page on OpenCorporates. From a company page on OpenCorporates, which should have the form:
http://opencorporates.com/companies/JURISDICTION/COMPANY_ID
add the following to the end of the web address/URL:
/network.json?depth=2
to give something with the following form:
http://opencorporates.com/companies/JURISDICTION/COMPANY_ID/network.json?depth=2
(Note: company network data may not be available in all jurisdictions or for all companies.)
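If you have several companies to look up, the URL hack can be scripted. The following is a small sketch that only builds the address as a string and does not fetch anything; the function names are my own, and the jurisdiction code `gb` and company number `00000000` in the usage line are placeholders rather than a real company.

```python
def company_url(jurisdiction, company_id):
    """Build a company page URL of the form shown in the notes."""
    return "http://opencorporates.com/companies/{}/{}".format(
        jurisdiction, company_id
    )


def network_url(page_url, depth=2):
    """Append /network.json?depth=N to a company page URL."""
    # Strip any trailing slash so we do not end up with a double slash.
    return page_url.rstrip("/") + "/network.json?depth=" + str(depth)


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    print(network_url(company_url("gb", "00000000")))
```

The resulting addresses can be pasted into OpenRefine's import dialogue.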
7. In OpenRefine, select the option to Create [a new] Project using the web address (or URL) of the JSON data page that reveals the data relating to the corporate ownership network of the company we are interested in on OpenCorporates. Note that you can import data into OpenRefine from several web addresses all in one go, though the data returned from each URL should have the same format or structure. Using multiple URLs results in a combined data set, which can be quite handy.
8. Being machine readable, the data makes more sense to OpenRefine than it probably does to us! Select a block of data in the preview view that is typical of a set of data that you want to map into a single row in a “traditional” spreadsheet-like view. Data blocks are typically contained within braces (curly brackets); these things: { } Note that in some machine readable data, some data blocks may be contained within other data blocks… Each of the items in a single data block can be mapped into a separate cell (that is, a separate column) in a single row of data. So each data block is a row, and each item in the block is a column… OpenRefine will give you a preview of how the data will look if you click the right button!
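The “each block is a row, each item is a column” idea can be sketched in a few lines of Python. This is only an illustration of the mapping, not how OpenRefine is implemented, and the field names in the sample blocks are made up. Blocks nested inside other blocks are not handled here.

```python
def blocks_to_rows(blocks):
    """Map a list of flat data blocks (dicts) onto columns and rows.

    Each block becomes one row; each distinct item name becomes a column.
    """
    columns = []
    for block in blocks:
        for key in block:
            if key not in columns:
                columns.append(key)
    # Items missing from a block are left as empty cells.
    rows = [[block.get(col, "") for col in columns] for block in blocks]
    return columns, rows


if __name__ == "__main__":
    # Hypothetical blocks, for illustration only.
    sample = [
        {"parent_name": "Example Holdings Ltd", "child_name": "Example Retail Ltd"},
        {"parent_name": "Example Holdings Ltd", "child_name": "Example Logistics Ltd",
         "relationship_type": "subsidiary"},
    ]
    header, table = blocks_to_rows(sample)
    print(header)
    for row in table:
        print(row)
```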
9. You can preview the effect of making particular block selections using Update Preview. To return to the block highlighter, use ‘Pick Record Nodes’. When you are happy with your selection, you are ready to “Create Project”.
10. Once we’re happy with the data preview, we can import the data into a more familiar looking layout. The arrows at the top of each column pop up menus that allow us to run a wide variety of operations on a column. One of the operations lets us change the column name, so I’m going to rename the child company and parent company columns to what Gephi expects: Source and Target.
11. This is the format that Gephi wants to see when we import data from a simple two-column, comma-separated values (CSV) text file. One of the columns needs to be called Source, another needs to be called Target. When constructing the network diagram, Gephi then knows to draw a line going from each Source element to the corresponding Target.
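For reference, here is a minimal Python sketch that produces a two-column file of this shape using the standard `csv` module. The company names are invented, and which company goes in which column is left to you; the code just writes whatever (source, target) pairs it is given.

```python
import csv
import io


def edges_to_csv(edges):
    """Return CSV text with a Source,Target header and one row per edge."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target"])
    writer.writerows(edges)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical company names, for illustration only.
    print(edges_to_csv([
        ("Example Retail Ltd", "Example Holdings Ltd"),
        ("Example Logistics Ltd", "Example Holdings Ltd"),
    ]))
```

Using the `csv` module rather than joining strings by hand means that company names containing commas are quoted correctly.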
12. The OpenCorporates network data in tabulated form. The default column names are not necessarily as human readable as they could be! In particular, we can identify the name of the parent company and the child company for each ownership relation. We also have access to the OpenCorporates IDs for all of those companies. The type of relationship between the companies is also described. For the moment, we will treat them all equally. (If you want to view just those company connections that relate to a particular type of relation, use the Facet or Text Filter tool applied to the appropriate column.)
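As a rough analogue of what faceting and text filtering do, the sketch below counts the distinct values in a column and then keeps only the rows that match some text. It is not OpenRefine's implementation, and the column name `relationship_type` and its values are assumptions made for the example.

```python
from collections import Counter


def facet(rows, column):
    """Count how many rows have each distinct value in a column."""
    return Counter(row[column] for row in rows)


def text_filter(rows, column, text):
    """Keep rows whose value in the column contains the text (ignoring case)."""
    needle = text.lower()
    return [row for row in rows if needle in row[column].lower()]


if __name__ == "__main__":
    # Hypothetical rows, for illustration only.
    rows = [
        {"child": "A Ltd", "parent": "P Ltd", "relationship_type": "subsidiary"},
        {"child": "B Ltd", "parent": "P Ltd", "relationship_type": "shareholding"},
        {"child": "C Ltd", "parent": "P Ltd", "relationship_type": "subsidiary"},
    ]
    print(facet(rows, "relationship_type"))
    print(text_filter(rows, "relationship_type", "share"))
```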
13. From the appropriate column menu, select “Edit Column” and then “Rename this column” to change the column name.
14. We can now export the data using the Custom Tabular Exporter. Deselect all the columns, then select just the Source and Target columns: we will only export data from these two columns.
15. Preview your data to check that it looks like the sort of data you expect to export. From the Download tab, select the CSV output type and export your data. It should be saved into the default download directory used by your browser, with a file name that corresponds to the OpenRefine project name. You should now have the two-column data saved to your computer, ready to load into Gephi.
16. Gephi is a powerful cross-platform desktop tool for visualising data that describes networks, such as social networks or corporate ownership networks. You can import data into Gephi using specialised graph/network representation formats, or from simple two-column data files where each row describes a simple connection between two elements (e.g. thing1, thing2 would say that thing1 connects to thing2). You can download the Gephi application from gephi.org. When you run it on your computer, it will launch a desktop application. Note that Gephi requires Java. If you are on a Mac, you may need to download and install Java yourself: www.java.com
17. Launch Gephi (download it from gephi.org if you don’t already have it installed) and select Data Laboratory. If the Data Table toolbar is empty, go to the application’s File menu and select ‘New Project’. A new project will be created and you should see several toolbar options appear in the Data Table.
18. Load the data in using the “Import Spreadsheet” tool option. Make sure that you select Edges table as the table type. If your data file does not have Source and Target column names, an error will occur and you will not be able to import the data file. (In such a case, you could always open the file in a text editor, change the column names in the file, save it, and try again. Alternatively, go into OpenRefine, change the column names there, and re-export the custom tabulated data…)
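If you want to check a file before handing it to Gephi, a quick test of the header row can be sketched as follows. The function name is my own, and the check is deliberately strict about spelling and capitalisation; whether Gephi itself is that strict is not something these notes confirm.

```python
import csv
import io


def has_source_and_target(csv_text):
    """Return True if the first row of the CSV names both Source and Target."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    return "Source" in header and "Target" in header


if __name__ == "__main__":
    print(has_source_and_target("Source,Target\nA Ltd,B Ltd\n"))
    print(has_source_and_target("child_name,parent_name\nA Ltd,B Ltd\n"))
```

To check a saved file, read its contents first, for example with `open(path, newline="").read()`, and pass the text to the function.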
19. The final stage of the import gives some additional information about how uploaded data will be treated. Because we are simply loading in data that describes how one company (identified by its name) is connected to another company (also identified by its name), we need to get Gephi to automatically create a node each time it sees a new company (as identified by its company name…).
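The idea of creating a node the first time a name is seen can be illustrated with a short Python sketch. This shows the principle only and says nothing about how Gephi does it internally; the company names are placeholders.

```python
def nodes_from_edges(edges):
    """List each distinct name once, in the order it first appears."""
    seen = set()
    nodes = []
    for source, target in edges:
        for name in (source, target):
            if name not in seen:
                seen.add(name)
                nodes.append(name)
    return nodes


if __name__ == "__main__":
    # Hypothetical edges, for illustration only.
    edges = [("A Ltd", "P Ltd"), ("B Ltd", "P Ltd"), ("P Ltd", "Q Plc")]
    print(nodes_from_edges(edges))
```

Because nodes are identified purely by name, two spellings of the same company would produce two separate nodes, which is one reason cleaning the names in OpenRefine first can be worthwhile.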
20. When the data is imported, we can preview it, either by looking at the list of nodes that have been created, or at the ‘edges’, that is, the connections between two companies.
21. So now let’s see where we can start to view this data as a network visualisation. Click on the top palette Overview button to get an overview of the network in visual form. This is the area where we can interactively visualise the network.
22. The default Overview layout has three main areas:
- in the middle is the canvas where we can see the current layout of the network; along the left hand side of the central panel are several tools for operating on the elements shown on the canvas; along the bottom of the central panel are several tools for controlling how text labels are displayed.
- to the left are several tools for manipulating what the network looks like: tools for laying out the network (that is, positioning the nodes) automatically, as well as colouring and sizing the nodes;
- to the right are several tools that allow us to analyse and process the graph (that is, the mathematical structure that defines the network); for example, we can run various statistics on the network, or filter the nodes that are displayed according to one or more specified criteria.
23. Let’s start by laying out the network. There are several layout tools provided by default (you can install more from the Tools->Plugins menu) which each have slightly different behaviours and can be differently effective at laying out networks with different sorts of structure. A couple of good all-round layout algorithms are:
- ForceAtlas2
- Yifan Hu.
If you imagine connected nodes held together by springs, you can think of these layout tools as trying to position the nodes so that the springs are stretched as little as possible. Sort of.
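To get a feel for the spring picture, here is a toy Python sketch in which every edge is a spring with a preferred length, and each round nudges connected nodes so that the spring is stretched (or squashed) a little less. This is emphatically not ForceAtlas2 or Yifan Hu, which do considerably more (for example, pushing unconnected nodes apart); the parameter names and default values are arbitrary choices for the example.

```python
import math


def spring_layout(positions, edges, rest_length=1.0, step=0.1, iterations=200):
    """Relax a layout by treating each edge as a spring of a given rest length."""
    pos = {node: [x, y] for node, (x, y) in positions.items()}
    for _ in range(iterations):
        forces = {node: [0.0, 0.0] for node in pos}
        for a, b in edges:
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            dist = math.hypot(dx, dy) or 1e-9  # avoid dividing by zero
            stretch = dist - rest_length       # positive when overstretched
            fx = stretch * dx / dist
            fy = stretch * dy / dist
            # Pull a towards b and b towards a (or push apart if squashed).
            forces[a][0] += fx
            forces[a][1] += fy
            forces[b][0] -= fx
            forces[b][1] -= fy
        for node in pos:
            pos[node][0] += step * forces[node][0]
            pos[node][1] += step * forces[node][1]
    return {node: (p[0], p[1]) for node, p in pos.items()}


if __name__ == "__main__":
    start = {"A": (0.0, 0.0), "B": (10.0, 0.0), "C": (0.0, 10.0)}
    print(spring_layout(start, [("A", "B"), ("A", "C")]))
```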
24. At the moment, we don’t know what each node represents. By default, when labels are switched on, Gephi looks for a label column value associated with a node and displays that. But we can also display other values. In this case, we are using a company name as the node ID, so we can select id as the element to display when we switch labels on. Click on the clipboard icon on the toolbar at the bottom of the screen to raise the label selector. To actually switch labels on, click on the leftmost/darkest T button on the toolbar at the bottom of the screen. The slider on the right controls the text label size.
25. We can also change the size of labels proportional to the size of a node, but how do we size nodes? Whilst it is possible to load in data that describes various attributes associated with each node (for example, in the case of a company node it might be the turnover or profit in the last financial year), we can also generate information about each node based on various network properties. For example, the degree of a node says how many connections it has with other nodes. Where connections are ‘directed’ (that is, represented by arrows), the number of arrows that leave a node is referred to as the out-degree of the node, and the number of arrows that come into a node as the in-degree.
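These definitions are easy to check by hand on a small example. The sketch below counts in-degree, out-degree and total degree from a list of directed (source, target) pairs; the function name and the sample names are my own, and parallel edges are simply counted once each.

```python
from collections import Counter


def degree_counts(edges):
    """Return in-degree, out-degree and degree for every node in an edge list."""
    out_degree = Counter()
    in_degree = Counter()
    for source, target in edges:
        out_degree[source] += 1  # an arrow leaves the source
        in_degree[target] += 1   # an arrow arrives at the target
    nodes = set(out_degree) | set(in_degree)
    return {
        node: {
            "in": in_degree[node],
            "out": out_degree[node],
            "degree": in_degree[node] + out_degree[node],
        }
        for node in nodes
    }


if __name__ == "__main__":
    # Hypothetical edges, for illustration only.
    edges = [("A Ltd", "P Ltd"), ("B Ltd", "P Ltd"), ("P Ltd", "Q Plc")]
    for name, counts in sorted(degree_counts(edges).items()):
        print(name, counts)
```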
26. We can use the Average Degree statistic tool to calculate the degree, in-degree and out-degree values for each node. We can then use these values as the basis for sizing the nodes in the network visualisation.
27. Here we have sized the nodes by Degree. The min and max size parameters can be set as required to scale the size of the nodes.
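One simple way to picture what the min and max size parameters do is as a straight-line rescaling: the node with the lowest value gets the minimum size, the node with the highest value gets the maximum size, and everything else falls in between. The sketch below assumes that linear mapping; Gephi may offer other interpolation options, and the default sizes used here are arbitrary.

```python
def scale_sizes(values, min_size=10.0, max_size=50.0):
    """Linearly map each node's value onto the range [min_size, max_size]."""
    lowest = min(values.values())
    highest = max(values.values())
    if highest == lowest:
        # Every node has the same value, so give them all the minimum size.
        return {node: min_size for node in values}
    span = highest - lowest
    return {
        node: min_size + (value - lowest) * (max_size - min_size) / span
        for node, value in values.items()
    }


if __name__ == "__main__":
    # Hypothetical degree values, for illustration only.
    print(scale_sizes({"A Ltd": 1, "P Ltd": 3, "Q Plc": 5}))
```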
28. We can set the label size so that it is proportional to the node size: from the black/dark A label on the toolbar at the bottom of the screen, select the [proportional to] Node Size menu option.
29. As well as tools for generating grand-scale layouts, there are also layout tools for tweaking a particular layout. The Expansion tool just stretches (or shrinks) the layout in the x and y directions. This can be good for just putting a bit of space into a layout. The Label Adjust tool juggles nodes so that their labels don’t overlap. Note that this tool may move some nodes quite a distance compared to their neighbours and so may upset any meaningful spatial relationships obtained using the other layout tools.
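Stretching a layout amounts to multiplying every coordinate by the same factor. The sketch below scales positions about the centre of the layout so that the picture grows or shrinks in place; scaling about the centre rather than the origin is my assumption, not a documented detail of Gephi's Expansion tool.

```python
def expand(positions, factor=1.2):
    """Scale node positions about their centroid by the given factor."""
    if not positions:
        return {}
    count = len(positions)
    cx = sum(x for x, _ in positions.values()) / count
    cy = sum(y for _, y in positions.values()) / count
    return {
        node: (cx + (x - cx) * factor, cy + (y - cy) * factor)
        for node, (x, y) in positions.items()
    }


if __name__ == "__main__":
    layout = {"A": (0.0, 0.0), "B": (2.0, 0.0)}
    print(expand(layout, 2.0))   # factor above 1 stretches
    print(expand(layout, 0.5))   # factor below 1 shrinks
```

Because every distance is multiplied by the same factor, the relative arrangement of the nodes is preserved, unlike with Label Adjust.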
30. We can colour and size nodes according to a wide range of properties obtained from running various network statistics. As you work with network data more and more, you start to get a feel for which tools to use to help you look for particular patterns, structures and stories within the data. But that is a tutorial for another day…
31. We can use various tools in concert to tweak the layout of the network. In this example, I have:
- sized the nodes by degree;
- set the label sizes proportional to the Degree;
- tweaked the scale using the text-size slider;
- used the Authority value (obtained via the HITS statistic) to colour the nodes;
- laid out the network using a ForceAtlas2 algorithm, a bit of Expansion and a dash of Label Adjust.
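For the curious, the Authority value comes from the HITS algorithm, which gives each node a hub score (it points at good authorities) and an authority score (good hubs point at it). Below is a textbook-style sketch of the iteration in plain Python; it is not Gephi's implementation, and the fixed iteration count and the normalisation used are choices made for this example, so the numbers need not match Gephi's output.

```python
import math


def hits(edges, iterations=50):
    """Return (hub, authority) score dictionaries for a directed edge list."""
    nodes = sorted({name for edge in edges for name in edge})
    hub = {node: 1.0 for node in nodes}
    authority = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # Authority: sum of the hub scores of nodes pointing at this node.
        authority = {node: 0.0 for node in nodes}
        for source, target in edges:
            authority[target] += hub[source]
        norm = math.sqrt(sum(v * v for v in authority.values())) or 1.0
        authority = {node: v / norm for node, v in authority.items()}
        # Hub: sum of the authority scores of nodes this node points at.
        hub = {node: 0.0 for node in nodes}
        for source, target in edges:
            hub[source] += authority[target]
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {node: v / norm for node, v in hub.items()}
    return hub, authority


if __name__ == "__main__":
    # Hypothetical edges, for illustration only.
    hub, authority = hits([("A", "C"), ("B", "C"), ("A", "D")])
    for node in sorted(authority):
        print(node, round(authority[node], 3))
```

In this small example C is pointed at by two nodes and D by one, so C ends up with the higher authority score, while A and B, which nothing points at, score zero.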