Scoda project companygraph
1. This tutorial describes how to use network analysis tools to visually explore the links between companies working on the same contract.
2. The example dataset we will use comes from the World Bank. Each row represents a contract. Inspecting the column names tells us what data we have available about each contract. Looking at the data, we can see how we could order the companies based on the value of the total contract amount; or we might order the contracts by time; or we might look to see which contracts were awarded in a particular project, or to a particular company in the event of the same company being awarded more than one contract.
3. We might also wish to look for patterns in the data that show us how the things described in one row might connect to things described in other rows. For example, can we organise the data somehow to see which companies are associated with which projects? Could a network style visualisation help us do this?
4. But if we were to draw a network, what sort of thing should we connect to what? And how would we know what to connect to each other? One way is to look at the data… at which point we might notice that some of the entries within a column take on the same value. This means that we can “connect” the data that appears in different rows using these common elements…
5. So what columns have usefully repeating elements? The projects column certainly has repeating elements, so we should be able to draw diagrams that show all the companies that connect to each project. And if a company is associated with more than one project, it should in a certain sense be seen to join those projects together…
6. A few of the contract numbers repeat, so it might be interesting to explore the extent to which companies connect to contracts. If two different companies are associated with the same contracts, that might be interesting.
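
One way to check which columns have repeating elements is to count the values in each column. The sketch below is only an illustration: the sample rows, the supplier names and the `Contract No` column name are hypothetical stand-ins, not values from the World Bank data.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; the real export has many more columns.
SAMPLE = """Project ID,Supplier,Contract No
P001,Acme Ltd,C-1
P001,Bolt GmbH,C-2
P002,Acme Ltd,C-3
P003,Cable SA,C-3
"""


def repeated_values(rows, column):
    """Return {value: count} for values that appear in more than one row."""
    counts = Counter(row[column] for row in rows)
    return {value: n for value, n in counts.items() if n > 1}


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for column in rows[0]:
    # Columns with repeated values are candidates for linking rows together.
    print(column, repeated_values(rows, column))
```

Any value reported here appears in at least two rows, so it could act as a common element that joins those rows in a network.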
7. Let’s get some data so we can start to explore the network…
8. We just need to do a little bit of tidying of the data before we make use of it. The major problem is that the Total Contract Amount column does not contain numbers, as such… In particular, we need to get rid of the dollar sign. Let’s create a new column into which we can put the cleaned values.
9. This little bit of code says: take the value of each cell in the original column and replace the $ symbol with nothing (that is, an empty string). In other words, delete the dollar sign… Put this value in the corresponding cell of the new column, and make the cell a number type.
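
For readers working outside OpenRefine, the same cleaning step might look roughly like this in Python. This is a sketch rather than the expression used in the tutorial; the function name `clean_amount` is made up, and stripping thousands separators is an extra assumption that the slides do not mention.

```python
def clean_amount(raw):
    """Delete the dollar sign from a cell value and return it as a number.

    Removing commas is an assumption about how the amounts are formatted;
    the tutorial itself only mentions the dollar sign.
    """
    text = raw.replace("$", "")   # replace the $ symbol with an empty string
    text = text.replace(",", "")  # assumed: drop thousands separators
    return float(text.strip())    # make the value a number type


print(clean_amount("$1,250,000.50"))
```

The cleaned value would then go into the new Amount column, leaving the original Total Contract Amount column untouched.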
10. Now we can export the data using the Custom Tabular Exporter, which allows us to select just those columns we want to export. (This can be very handy when a table has a large number of columns that we are not interested in!) I have rearranged the cells in the Custom Tabular Exporter simply by clicking on them and dragging them around. We just want three columns for now: Project ID, Supplier, and our new Amount column. Now that you know how to export the data just a few columns at a time, you should be able, once you are comfortable with the process of visualising the data, to take other slices through the data (such as companies related to contracts) and visualise them yourself. You might also like to try using a similar method on a data set of your own…
11. There’s a final bit of tidying to do before we can use this data in Gephi, the application we’ll be using to visualise the network. In particular, Gephi expects the data to be presented to it with particular column names. Open the exported CSV data in a text editor and rename the columns: Source,Target,Weight (no spaces?) Note – you could have also renamed the columns in OpenRefine before exporting them…
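
The export and rename steps could also be scripted. The sketch below assumes each row is a dictionary keyed by the column names used above (Project ID, Supplier, Amount); the function name `to_edge_csv` and the sample row are hypothetical.

```python
import csv
import io


def to_edge_csv(rows):
    """Write only the three wanted columns, under the Source,Target,Weight header."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for row in rows:
        # Any other columns in the row are simply left out of the export.
        writer.writerow([row["Project ID"], row["Supplier"], row["Amount"]])
    return out.getvalue()


# Hypothetical row with an extra column we are not interested in.
sample = [{"Project ID": "P001", "Supplier": "Acme Ltd", "Amount": 100.0, "Region": "X"}]
print(to_edge_csv(sample))
```

Here the project is treated as the Source and the supplier as the Target, matching the column order chosen in the exporter.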
12. We might also wish to look for patterns in the data that show us how the things described in one row might connect to things described in other rows. For example, can we organise the data somehow to see which companies are associated with which projects? Could a network style visualisation help us do this?
13. Network diagrams allow us to show relationships between different things. Networks are referred to in mathematical terms as graph structures, or graphs. You may be more familiar with thinking of things like line charts and bar charts as graphs, but when it comes to networks, we use the term graph to describe the mathematical structure that defines the network. The circles – or nodes – represent “things” in the network, in this case, particular companies or projects. The lines – or edges – represent relationships between the things in the network. In this example, the edges represent contracts that associate a particular company with one or more projects (or conversely, associate a project with one or more companies). Where nodes are placed in the diagram can be used to convey information about the structure of the network. Many different algorithms exist to lay out (that is, place, or position) the nodes at specific points in the diagram. Typically, we try to place nodes that are heavily interconnected by edges close to each other. Nodes that are grouped closely together on the page might then be assumed to be associated in some way because of the larger number of links that connect them to each other.
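
To make the idea of nodes and edges concrete, here is a minimal sketch of a graph held as a plain Python structure. The project IDs and company names are invented for illustration, and `build_graph` is a hypothetical helper, not part of any tool used in this tutorial.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map: node -> set of neighbouring nodes."""
    neighbours = defaultdict(set)
    for source, target in edges:
        neighbours[source].add(target)
        neighbours[target].add(source)
    return dict(neighbours)


# Each pair is one edge: a contract linking a project to a company.
edges = [("P001", "Acme Ltd"), ("P001", "Bolt GmbH"), ("P002", "Acme Ltd")]
graph = build_graph(edges)
for node, linked in sorted(graph.items()):
    print(node, "->", sorted(linked))
```

In this toy example the company Acme Ltd is linked to two projects, so it joins them together in the network.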
14. Launch Gephi and from the File menu select New Project. Click on the Data Laboratory tab, and then Import Spreadsheet. Load in the file (with amended column names) as an Edges Table. The default settings should be fine…
15. Click on the Overview tab – you should see the network that connects Companies to Project IDs displayed there… But what does it mean? And can we tidy it up a little?!
16. I used the Yifan Hu layout to generate this view over the network. Yifan Hu is a good all round layout engine that works particularly well when the data is hierarchically structured. Another good general purpose layout algorithm is ForceAtlas2.
17. Whilst we might get a feeling for the structure and shape of the dataset as a whole from the overall visualisation, we often want to inspect one or more of the nodes in detail. The quickest way of doing this is to look at the labels… You may also have noticed that some edges are thicker than others. In this case, the line thicknesses are proportional to the contract value, which we set in the weight column. If a company is associated with more than a single contract on a particular project, the edge weight will be proportional to the overall (total) sum of values of all the contracts relating that company to that project.
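
The summing of contract values for repeated company and project pairs can be sketched as follows. This illustrates the idea only; it is not a description of how Gephi merges edges internally, and the function name and sample values are hypothetical.

```python
from collections import defaultdict


def merge_parallel_edges(rows):
    """Sum the weights of rows that share the same (source, target) pair."""
    totals = defaultdict(float)
    for source, target, weight in rows:
        totals[(source, target)] += weight
    return dict(totals)


# Two contracts link P001 to Acme, so their values are added together.
rows = [("P001", "Acme", 100.0), ("P001", "Acme", 50.0), ("P002", "Acme", 10.0)]
print(merge_parallel_edges(rows))
```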
18. As well as using space (or position) and colour to represent structural elements of the network, we can also use edge weight (that is, the thickness, or width, of the lines connecting nodes to each other) to represent some feature of the network. In this case, we might use edge weight to represent the value of the contract that connects a company with a project, or the number of contracts that a company has on a particular project. When placing nodes, we might also use edge weight to contribute to the determination of how closely two connected nodes should be placed to each other. If you think of the edge thickness in terms of the size, thickness or strength of a mechanical spring, you might perhaps start to imagine how nodes connected by thick springs will be pulled closer to each other than nodes connected by much weaker springs.
19. As well as edge thickness, we might also make use of node size to highlight some feature of the network. In this example, we use node size to represent the degree of each node, that is, the number of edges connected to it. Sometimes, we might want to highlight nodes that have small numbers of connections, for example to identify projects with very few companies contracted to them. In this case, we might make nodes with only a single incoming edge very large, and nodes with a large number of edges much smaller. Either way, the node size reflects how well connected a node is. In this case, the size of a project node indicates how many companies are associated with it, and the size of a company node depicts how many project contracts the company is engaged with. Note that we can combine edge weight and node size, for example, by setting node size proportional to the summed weights of edges that are connected to the node. Hopefully, you are already starting to see how a network diagram can provide a range of powerful visual representations for helping us explore the structure of a network and identify key elements of it.
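
Degree, and the summed weight of the edges at each node, are simple to compute by hand. The sketch below assumes the edges are held as a dictionary from (source, target) pairs to weights; the names and values are invented for illustration.

```python
from collections import Counter


def degrees(edges):
    """Return (degree, weighted degree) counters for every node.

    `edges` maps (source, target) pairs to an edge weight.
    """
    degree = Counter()
    weighted = Counter()
    for (source, target), weight in edges.items():
        for node in (source, target):
            degree[node] += 1         # one more edge touches this node
            weighted[node] += weight  # running total of edge weights
    return degree, weighted


edges = {("P001", "Acme"): 150.0, ("P002", "Acme"): 10.0, ("P001", "Bolt"): 5.0}
degree, weighted = degrees(edges)
print(degree["Acme"], weighted["Acme"])
```

Either counter could then drive node size: the first gives the number of connections, the second the total contract value touching the node.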
20. We can size the nodes according to statistical values calculated over the network. In this case, we might want to highlight nodes according to the total value of contracts flowing into them (for companies) or out of them (for projects). The weighted average statistic calculates the corresponding value for each node in the network. The spline operator in the Ranking tab – where we set the node size – allows us to tweak the relationship between the value used to size the node and the node size. The default is a simple linear proportional map. However, we may find that the values we want to map are “clumped” together (for example, one very large value and a range of smaller values clumped together at the other end of the overall range). In such a case, we might want to tweak the mapping to provide a little more salience when it comes to distinguishing between the values that are otherwise clumped together. As well as making node size proportional to some quantity, we can also set the label size to be proportional to the node size.
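
To see why a non-linear mapping helps with clumped values, compare a linear map with a logarithmic one. Note that this is not Gephi's spline operator: a log transform is swapped in here as a simple stand-in for a curved mapping, and the function names and size limits are assumptions.

```python
import math


def linear_size(value, lo, hi, min_size=10.0, max_size=50.0):
    """Map value from the range [lo, hi] linearly onto [min_size, max_size]."""
    if hi == lo:
        return min_size
    t = (value - lo) / (hi - lo)
    return min_size + t * (max_size - min_size)


def log_size(value, lo, hi, min_size=10.0, max_size=50.0):
    """Same mapping after a log transform; assumes non-negative values."""
    return linear_size(math.log1p(value), math.log1p(lo), math.log1p(hi),
                       min_size, max_size)


# One very large value and several small ones clumped near the bottom.
values = [5, 10, 20, 1000]
for v in values:
    print(v, round(linear_size(v, 0, 1000), 1), round(log_size(v, 0, 1000), 1))
```

With the linear map the three small values get almost identical sizes, while the log map spreads them apart, which is the kind of extra salience the paragraph above describes.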
21. There are several other tools available to us that allow us to explore other properties of the network. For example, there is a wide selection of filters that allow us to select particular filtered views of the network. In this case, we use the degree range filter to show only nodes that have degree of two or more. This filters out nodes that have degree 1 – for example, companies that are only associated with a single project. The result is a view over the network that shows which companies are associated with two or more projects, and which projects they are. The node sizes are indicative of the total overall value of contracts associated with each particular node. So for example, we see that Siemens AG is associated with contracts from projects P072018 and P090104. The large node size suggests that the sum total of contracts Siemens AG has received via these projects is quite significant. In addition, the thickness of the line from P072018 to Siemens AG suggests that the total value of contracts (or maybe just a single contract) Siemens AG has received from that project is quite large.
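
A degree range filter can be approximated in a few lines. This sketch computes degree on the full network and then keeps only edges whose two end nodes both pass the threshold; Gephi's own filter may differ in detail, and the function name and sample edges are hypothetical.

```python
from collections import Counter


def degree_filter(edges, min_degree=2):
    """Keep only edges whose two end nodes both have degree >= min_degree."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    keep = {node for node, d in degree.items() if d >= min_degree}
    return [(s, t) for s, t in edges if s in keep and t in keep]


edges = [("P001", "Acme"), ("P001", "Bolt"), ("P002", "Acme"), ("P002", "Cable")]
# Bolt and Cable each have degree 1, so their edges drop out of the view.
print(degree_filter(edges))
```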
22. So far, our network diagram has shown us how companies relate to projects, and conversely, how projects relate to companies. But sometimes we may want to know rather more directly the extent to which two things are connected by virtue of having a common partner – for example, which companies worked on the same projects together, or which projects are linked by virtue of having used the same companies. When the data is represented as a graph, we can manipulate the graph in order to generate derived graphs that can capture these sorts of relationship directly.
23. When we have a dataset represented in the form of a network, we can start to analyse it by looking at additional network properties. For example, for the projects and companies graph, we might process the graph so as to remove project nodes and replace the edges with edges that connect companies that were on one or more of the same projects with each other. We might even use edge weight to depict how many projects there were in common between two companies.
24. From the workspace menu, duplicate the original network (remember to turn off all the filters! We want the whole network.) You will automatically be moved to a new workspace containing a copy of the original network. (Navigate between workspaces from the workspace selector at the bottom right hand corner of the whole application window.) In the Multimode Networks Projection panel, click on Graph Coloring to try to split the network into complementary types of node (companies and projects). Hopefully, the tool will return with the report that Bipartite: true. That is, two complementary sets of nodes have been found (nodes in the first group are only ever connected to nodes in the second group). Click on Load attributes and select the Node Color Multimode option.
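
The bipartite check amounts to trying to colour the graph with two colours so that no edge joins two nodes of the same colour. The sketch below uses a breadth-first search to do this; it illustrates the general technique and is not the plugin's actual code. The function name and sample edges are hypothetical.

```python
from collections import defaultdict, deque


def two_colour(edges):
    """Return {node: 0 or 1} if the graph is bipartite, otherwise None."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    colour = {}
    for start in neighbours:
        if start in colour:
            continue  # already reached from an earlier starting node
        colour[start] = 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for other in neighbours[node]:
                if other not in colour:
                    colour[other] = 1 - colour[node]  # give the opposite colour
                    queue.append(other)
                elif colour[other] == colour[node]:
                    return None  # an edge joins two same-colour nodes
    return colour


colouring = two_colour([("P001", "Acme"), ("P001", "Bolt"), ("P002", "Acme")])
print(colouring)
```

If a colouring comes back, every edge links a node of one colour to a node of the other, which is what lets the projects and companies be treated as two separate node types.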
25. To check what the multimode tool has called nodes of each type, click on the edit button in the palette toolbar, and click on a project node. An edit panel will appear – make a note of what colour the project type node has been labeled. We can now use the multimode network projection tool to process the network by joining together company nodes that are connected by a common project, and deleting the project nodes. That is, we want to connect blue company nodes to blue company nodes if they are connected by edges that pass through a common red project node. Once we have made the mapping, we can delete the inner red project nodes. Running the projection results in several distinct clusters of companies that are connected to each other by virtue of being associated with the same project, as well as some companies that bridge different clusters by virtue of being associated with companies from different projects.
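
The projection itself can be sketched without Gephi: group the companies by project, then link every pair of companies that share a project. This is a minimal illustration under the assumption that edges are (project, company) pairs; `project_onto_companies` is a made-up name, and the edge weight here counts shared projects.

```python
from collections import defaultdict
from itertools import combinations


def project_onto_companies(edges):
    """Return {(company_a, company_b): number of projects they share}."""
    members = defaultdict(set)
    for project, company in edges:
        members[project].add(company)
    shared = defaultdict(int)
    for companies in members.values():
        # Sorting gives each pair one canonical order, so (a, b) == (b, a).
        for a, b in combinations(sorted(companies), 2):
            shared[(a, b)] += 1
    return dict(shared)


edges = [("P001", "Acme"), ("P001", "Bolt"),
         ("P002", "Acme"), ("P002", "Bolt"), ("P002", "Cable")]
print(project_onto_companies(edges))
```

Swapping the order of each pair to (company, project) before calling the function would give the complementary projection, linking projects that share companies.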
26. Conversely, we might remove the company nodes, and identify a new set of edges that connect projects that shared one or more common contracted companies. Again, edge thickness might be used to show how tightly connected two projects were by virtue of increasing numbers of common contracted companies.
27. By projecting the original network onto the network that shows links between projects that arise from common companies, we get a much clearer picture about how many projects there are, as well as possible linkages between them.
28. Here are some of the things you have hopefully learned… feel free to add anything else you might have learned to the list…
29. For more information, and a wide range of further tutorials on all matters data related, visit the School Of Data at SchoolOfData.org, or on Twitter via @SchoolOfData.