1. Challenges and Opportunities of Linked Open Energy Data
Chris Davis
http://enipedia.tudelft.nl
c.b.davis@tudelft.nl
2. Who am I?
● Postdoc Energy & Industry, TBM, TU Delft
● Focus on Industrial Ecology, Open Data, Collaborative Software, Modeling, Visualization, Analytics, etc.
3. Motivations
● Energy and sustainability are some of the most important topics of the 21st century
● Need both aggregated and fine-grained data
● Research can be data intensive
● There's a lot out there, but connecting it is tedious
● Researchers often duplicate effort
● It would be great to revolutionize how we deal with this data
4. There's a Tension...
Information wants to be free
because it has become so cheap
to distribute, copy, and recombine
- too cheap to meter.
Stewart Brand
5. There's a Tension...
It wants to be expensive
because it can be
immeasurably valuable
to the recipient.
Stewart Brand
6. There's a Tension...
That tension will not go away.
It leads to endless wrenching debate
about price, copyright, “intellectual property,”
and the moral rightness of casual distribution,
because each round of new devices
makes the tension worse, not better.
Stewart Brand
7. There's a Tension...
If you cling blindly
to the expensive part
of the paradox,
you miss all the action
going on in the free part.
The pressure of the paradox
forces information
to explore incessantly.
Stewart Brand
8. Pirolli & Card (2005) The Sensemaking Process and Leverage Points for Analyst Technology as Identified Through Cognitive Task Analysis
11. It's about Resource Efficiency
● Information is a resource just as much as physical resources
● ...however, it ideally gets better the more that it is used
● Data quality is (partly) a function of the amount of attention it gets
● Structure leads to benefits, but requires effort – figure out what has most value to the community
21. A tale of one (or four?) power stations and seven data sets
22. How the European Commission manages data
Large Combustion Plants Directive
http://ec.europa.eu/environment/air/pollutants/stationary/lcp/legislation.htm
29. Matching Entities
09600 1 co erzgeb fabrikstrasse felix foto gmbh jr kg schoeller spezialpapiere und
weissenborn werk
0001
09600 1 909 anlagenkonto anlagennummer co erzgeb fabrikstrasse felix foto gmbh jr kg
kraftwerk schoeller spezialpapiere technocell und weissenborn
49086 burg
co felix foto gmbh gretesch jr kg osnabruck schoeller spezialpapiere und
co erzgeb fabrikstrasse felix foto gmbh jr kg kraftwerk
schoeller spezialpapiere technocell und weissenborn
0001 09600 1 909 anlagenkonto anlagennummer
http://en.wikipedia.org/wiki/Self-information
https://en.wikipedia.org/wiki/Claude_Shannon
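One way to read this slide, given the link to self-information: tokens that occur in every record (gmbh, co, kg) say little about identity, while rarer tokens (a postal code, a street name) carry most of the evidence for a match. The following is a minimal sketch under stated assumptions, not necessarily the method used for Enipedia. It assumes each record is a bag of normalized tokens and scores a candidate pair by summing the self-information of the tokens they share. The function names, the tiny three-record corpus, and the grouping of the slide's tokens into three records are illustrative assumptions.

```python
import math
from collections import Counter


def token_self_information(records):
    """Self-information (in bits) of each token, estimated from the
    fraction of records that contain it: I(t) = log2(N / df(t))."""
    n = len(records)
    doc_freq = Counter(tok for rec in records for tok in set(rec.split()))
    return {tok: math.log2(n / count) for tok, count in doc_freq.items()}


def match_score(a, b, info):
    """Sum the self-information of tokens shared by records a and b.
    Tokens found in every record contribute 0 bits."""
    shared = set(a.split()) & set(b.split())
    return sum(info.get(tok, 0.0) for tok in shared)


# Token bags taken from the slide; splitting them into exactly these
# three records is an assumption made for illustration.
records = [
    "09600 1 co erzgeb fabrikstrasse felix foto gmbh jr kg schoeller "
    "spezialpapiere und weissenborn werk",
    "09600 1 909 anlagenkonto anlagennummer co erzgeb fabrikstrasse felix "
    "foto gmbh jr kg kraftwerk schoeller spezialpapiere technocell und "
    "weissenborn",
    "49086 burg co felix foto gmbh gretesch jr kg osnabruck schoeller "
    "spezialpapiere und",
]

info = token_self_information(records)
score_same_site = match_score(records[0], records[1], info)
score_other_site = match_score(records[0], records[2], info)
print(score_same_site > score_other_site)
```

The first two records share the postal code and street tokens, so they score higher than the first and third, which only share the company-name tokens common to the whole corpus. With a realistic corpus the frequencies would be estimated over many more records.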
30. Current data management practices result in:
Unintentionally Anonymized Open Data
Optimized for Inefficient Maintenance
and an Uphill Battle to Enforce Principles of Data Integrity
31. It's power laws all the way down
● Both contributors & data
● Challenge is aligning the two
34. Officially Curated vs. Crowdsourced Data
● Crowdsourcing generally OK for easily verifiable data
● Officially curated data needed for comprehensive, hard to verify data, small specialized communities
● Crowdsourced data is only possible because of revision control.
35. How to Measure Data Quality?
Data Quality = Researcher Skill/Experience × # Viewers/Editors × Ease of Independent Verification
Low Editor Diversity
High Editor Diversity
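The product form matters here: because the factors multiply, a weak factor drags the whole result down rather than being averaged away. A minimal sketch of that property, assuming each factor is expressed as a score between 0 and 1; the slide gives no units or scales, so the scale, the function name, and the example values are all hypothetical.

```python
def data_quality(skill, attention, verifiability):
    """Multiplicative heuristic from the slide: skill/experience times
    number of viewers/editors times ease of independent verification.
    Each argument is assumed to be a score in [0, 1] (hypothetical scale)."""
    for name, value in (("skill", skill),
                        ("attention", attention),
                        ("verifiability", verifiability)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return skill * attention * verifiability


# Expert-maintained, widely viewed, but hard to verify independently.
hard_to_verify = data_quality(0.9, 0.9, 0.1)
# Middling on every factor.
balanced = data_quality(0.6, 0.6, 0.6)
print(hard_to_verify < balanced)
```

Under these assumed scores the balanced data set comes out ahead, even though two of its three factors are lower.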
36. How to Measure Data Quality?
● Eric Raymond – “Given enough eyeballs, all bugs are shallow”
● But... not all eyes are evenly distributed