A talk given at the annual Computer Science for High School Teachers event at Victoria University of Wellington. I presented some basics of the World Wide Web and why it is worth preserving; our work on tools that let non-experts create semantically enriched content; a current project to identify NZ native birds from their calls, which combines citizen science with contemporary deep learning using TensorFlow; a project investigating the impact of online citizen science on the development of science capabilities in primary school children; and my collaboration with Adam Grener from the School of English, Film, Theater and Media Studies at VUW, with whom I am working on computational tools for literary studies.
1. Our World is Socio-technical
Markus Luczak-Roesch | @mluczak
Senior Lecturer in Information Systems
School of Information Management
Victoria University of Wellington
2. Humans in the information age
• IT project management
• Business process design
• Application development
• User experience design
• Business and systems analysis
Data insights
Information Management
3. Today’s road ahead
• From Dickens to data science
• Citizen scientists in the classroom
• What is the World Wide Web and why should we preserve it?
http://www.morguefile.com
8. The World Wide Web
The Web - a decentralized hypermedia system
Basic client-server architecture
Hypertext gateway to represent
databases as hypertext
9. The REST is history
10. The REST is history
11. What is REST?
• Representational State Transfer
• generic architectural style for network-based systems
• architectural style of the World Wide Web
• literature: Roy Thomas Fielding. 2000. Architectural Styles and the Design of Network-Based Software Architectures. Ph.D. Dissertation. University of California, Irvine. AAI9980887.
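To make the idea of a uniform interface a little more concrete, the following minimal sketch applies a small set of generic methods to resources identified by URIs. The `ResourceStore` class, the example URI, and the choice of status codes are assumptions made purely for illustration; they are not taken from Fielding's dissertation or from the talk.

```python
class ResourceStore:
    """Toy in-memory server. It keeps resource state only, no per-client session."""

    def __init__(self):
        self._resources = {}  # URI -> current representation

    def handle(self, method, uri, representation=None):
        """Apply a generic method to the resource identified by uri."""
        if method == "GET":
            if uri in self._resources:
                return 200, self._resources[uri]
            return 404, None
        if method == "PUT":
            created = uri not in self._resources
            self._resources[uri] = representation
            return (201 if created else 200), representation
        if method == "DELETE":
            if uri in self._resources:
                del self._resources[uri]
                return 204, None
            return 404, None
        return 405, None  # method not supported by this sketch


# Hypothetical resource URI and representation, for illustration only.
store = ResourceStore()
print(store.handle("PUT", "/birds/example", {"name": "example bird"}))
print(store.handle("GET", "/birds/example"))
print(store.handle("DELETE", "/birds/example"))
print(store.handle("GET", "/birds/example"))
```

The same handful of methods works for any resource, which is the point of the sketch: clients need to know URIs and representations, not resource-specific operations.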
13. HTTP
• Hypertext Transfer Protocol
• transfer data between Web servers and clients
• transport protocol is TCP
• text-based
• literature: R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach and T. Berners-Lee. Hypertext Transfer Protocol - HTTP/1.1. RFC 2616, http://www.ietf.org/rfc/rfc2616.txt
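Because HTTP/1.1 is text-based, a request can be assembled and a response inspected with plain string handling. The sketch below builds a minimal GET request and parses the status line of a response. The helper names `build_get_request` and `parse_status_line`, the host `example.org`, and the canned response are assumptions for illustration; no network connection is opened.

```python
def build_get_request(host, path="/"):
    """Return the bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",   # request line: method, target, version
        f"Host: {host}",          # the Host header is mandatory in HTTP/1.1
        "Connection: close",
        "",                       # an empty line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(raw_response):
    """Split the first response line into (version, status code, reason)."""
    status_line = raw_response.split(b"\r\n", 1)[0].decode("ascii")
    parts = status_line.split(" ", 2)
    reason = parts[2] if len(parts) > 2 else ""
    return parts[0], int(parts[1]), reason


# Canned response used instead of a real server reply.
canned = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html></html>"
print(build_get_request("example.org", "/index.html").decode("ascii"))
print(parse_status_line(canned))
```

Sending these bytes over a TCP connection to a Web server (for example one opened with Python's `socket.create_connection`) would be the next step; it is left out here so the example runs offline.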
19. The Semantic Web vision
“The Semantic Web is an
extension of the current web
in which information is given
well-defined meaning, better
enabling computers and
people to work in
cooperation.“
Berners-Lee, Hendler, and Lassila, 2001.
20. Metadata
• Data about data
• describe content
• in the best case metadata is machine processable
• Author
• Title
• Date
• …
• Species: Android
• Height…
Content
Metadata
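What "machine processable" can mean in practice is sketched below: a metadata record is serialized to JSON, read back, and checked for the fields named on the slide. The field values, the `REQUIRED_FIELDS` tuple, and the `missing_fields` helper are placeholders invented for this illustration.

```python
import json

# Field names follow the slide; treating them as required is an assumption.
REQUIRED_FIELDS = ("Author", "Title", "Date")


def missing_fields(metadata):
    """List the required fields that are absent or empty in a record."""
    return [field for field in REQUIRED_FIELDS if not metadata.get(field)]


# Placeholder values, not real data.
record = {"Author": "A. N. Author", "Title": "An example page", "Date": "2018-01-01"}

serialized = json.dumps(record, sort_keys=True)  # exchange format a program can read
restored = json.loads(serialized)

print(serialized)
print(missing_fields(restored))
print(missing_fields({"Title": "Untitled"}))
```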
23. M. Luczak-Rösch, R. Heese. Linked Data Authoring for Non-Experts. In proceedings of LDOW 2009, co-located with World Wide Web Conference, Madrid, Spain.
Fig. 8. loomp screenshot – authoring with annotation elements
Fig. 3. Core elements of the loomp Domain Ontology: mashup, fragment
Fig. 4. Selected elements of the loomp content model with example encoding for content and annotations
• Panels: loomp content data, loomp annotation data, loomp vocabulary, external ontology (foaf:Person, foaf:name), Ext. Linked Data (e.g. DBpedia: dbp:uri1 Frankfurt/Main, dbp:uri2 Frankfurt (Oder), dbp:uri3 city)
• Link types: symbolic links (expressing identity), RDF relationships
• Fragments and mashups are typed as loomp:Fragment and loomp:Mashup, respectively (cf. Figure 3)
• Example: the shared fragment4 is linked via rdf:_1 to mashup1 and via rdf:_3 to mashup2
Markus Luczak-Rösch, Ralf Heese, Adrian Paschke, "Future Content Authoring", In Nodalities – The Magazine of the Semantic Web, Issue 11, pp. 17-18, 2010.
Populating the Semantic Web
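The loomp content model from the figures can be approximated as a list of subject-predicate-object triples. The sketch below encodes the shared-fragment example (fragment4 linked via rdf:_1 to mashup1 and via rdf:_3 to mashup2) and queries which mashups reuse a fragment. Plain tuples stand in for a real RDF library, and the direction of the rdf:_n links (mashup as subject, fragment as object) is an assumption; the helper `mashups_containing` is hypothetical.

```python
# Each triple is (subject, predicate, object), using the prefixed names from the figure.
TRIPLES = [
    ("data:mashup1", "rdf:type", "loomp:Mashup"),
    ("data:mashup2", "rdf:type", "loomp:Mashup"),
    ("data:fragment4", "rdf:type", "loomp:Fragment"),
    # Assumed direction: the mashup is the container, the fragment its member.
    ("data:mashup1", "rdf:_1", "data:fragment4"),
    ("data:mashup2", "rdf:_3", "data:fragment4"),
]


def mashups_containing(triples, fragment):
    """Return the subjects that list the fragment via a membership property rdf:_n."""
    return sorted(
        subject
        for subject, predicate, obj in triples
        if obj == fragment and predicate.startswith("rdf:_")
    )


print(mashups_containing(TRIPLES, "data:fragment4"))
```

Because the fragment is a resource with its own identifier, it can be reused in several mashups without copying its text.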
28. The online citizen science workflow
• Online Citizen Science Platform
• Machine Learning
• Scientific Results
• Citizen scientists
29. Our project: Detecting NZ native birds using AI
• Recorded sound around Zealandia
• Segment the recordings and upload them to Zooniverse for people to decide whether they contain bird calls (and if so, which bird it is)
• Use the data from Zooniverse to train and test our AI
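One plausible way to turn volunteer answers into training and test data is sketched below: votes per audio segment are combined by majority, segments without sufficient agreement are dropped, and the remaining labels are split into two sets. The input format (pairs of clip id and label), the labels, the 0.6 agreement threshold, and all function names are assumptions for illustration; this is not the project's actual pipeline or the Zooniverse export format.

```python
from collections import Counter


def majority_label(votes, min_agreement=0.6):
    """Return the most common label if enough volunteers agree, else None."""
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) >= min_agreement else None


def aggregate(classifications, min_agreement=0.6):
    """Map each clip id to its agreed label, skipping clips without consensus."""
    votes_per_clip = {}
    for clip_id, label in classifications:
        votes_per_clip.setdefault(clip_id, []).append(label)
    labels = {}
    for clip_id, votes in votes_per_clip.items():
        label = majority_label(votes, min_agreement)
        if label is not None:
            labels[clip_id] = label
    return labels


def train_test_split(labels, test_every=5):
    """Deterministically send every test_every-th clip (by sorted id) to the test set."""
    train, test = {}, {}
    for index, clip_id in enumerate(sorted(labels)):
        target = test if index % test_every == 0 else train
        target[clip_id] = labels[clip_id]
    return train, test


# Hypothetical volunteer classifications: (clip id, label).
votes = [
    ("clip1", "bird A"), ("clip1", "bird A"), ("clip1", "no bird"),
    ("clip2", "bird B"), ("clip2", "bird A"),
]
agreed = aggregate(votes)
print(agreed)
print(train_test_split(agreed, test_every=2))
```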
30. Serendipitous discoveries through talk
• Hanny’s Voorwerp: Galaxy Zoo [2007]
• Green Pea Galaxies: Galaxy Zoo [2007]
• Yellow Balls: Milky Way [2009]
• Circumbinary Planet Ph1b: Planet Hunters [2012]
• Convict Worm: Seafloor Explorer [2012]
• Spanish Flu: Operation War Diaries [2014]
31. Upcoming project: Understanding the impact of online citizen science participation on the development of science capabilities of primary age children.
33. Dickens and the Serial Novel Form
All fourteen of Dickens’s completed novels were published serially in weekly or monthly installments.
34. From “Sketches” to Novels
1836-37
1864-65
“I have endeavoured in the progress of this Tale, to resist the
temptation of the current Monthly Number, and to keep a
steadier eye upon the general purpose and design.”
Preface to Martin Chuzzlewit (1844)
36. TIC approach applied to Victorian novels
• Source texts from Project Gutenberg
• Configure: k slices of n words (e.g. n=1,000)
• Lexicon of characters (can also be a list of other entities such as places or phrases)
• Automatic/semi-automatic methods (detect/match entities in text, topic detection, sentiment analysis, part-of-speech tagging)
• Matched information per slice (e.g. Abel Magwitch, Joe Gargery, Mrs Joe Gargery, Pip in one slice; Abel Magwitch in another)
• Directed network, dynamic network visualization
• Analyze: network visualisations, statistics and other measures
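The workflow on this slide can be sketched in a few lines: cut the text into slices of n words, match a lexicon of characters in each slice, and link slices in which the same character recurs. The edge rule used here (connect each slice to the next slice mentioning the same character) is one plausible reading of the directed network in the diagram, not necessarily the project's exact method; the function names and the toy text are invented for illustration.

```python
import re
from collections import defaultdict


def slice_words(text, n):
    """Split text into consecutive slices of n words (the last may be shorter)."""
    words = text.split()
    return [words[i:i + n] for i in range(0, len(words), n)]


def match_entities(words, lexicon):
    """Return the lexicon entries that occur as whole words in one slice."""
    joined = " ".join(words)
    # Note: a short name also matches inside a longer one that contains it
    # (e.g. a name that is the tail of another); a real lexicon needs a rule for that.
    return {name for name in lexicon
            if re.search(r"\b" + re.escape(name) + r"\b", joined)}


def build_network(matches):
    """Directed edges (earlier slice, later slice) -> characters shared by both."""
    last_seen = {}
    edges = defaultdict(set)
    for index, entities in enumerate(matches):
        for entity in sorted(entities):
            if entity in last_seen:
                edges[(last_seen[entity], index)].add(entity)
            last_seen[entity] = index
    return dict(edges)


# Toy text, not a quotation from the novel.
toy_text = "Pip met Magwitch . Joe spoke . Pip left with Joe ."
lexicon = ["Pip", "Joe", "Magwitch"]

slices = slice_words(toy_text, 4)
matches = [match_entities(s, lexicon) for s in slices]
network = build_network(matches)
print(matches)
print(network)
```

For a full novel, the toy text would be replaced by a Project Gutenberg plain-text file and n set to something like 1,000, as on the slide.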
39. Not-so-distant Reading
Distant Reading “tackles literary problems by
scientific means: hypothesis-testing,
computational modeling, quantitative
analysis. ... understanding literature not by
studying particular texts, but by aggregating and
analyzing massive amounts of data.” [1]
[1] Schulz, Kathryn. “What Is Distant Reading?” New York Times, 24 June 2011.
42. ● At the School of Information Management we focus on
○ humans in the information age
○ data insights
● Topics at the intersection of
○ computer science (e.g. data management, data analytics,
software development)
○ behavioral science (e.g. organizational decision making,
social networks)
○ social science (organizational processes, inequalities)
Thanks to my collaborators
• Dr Adam Grener (Dickens project)
• Emma Fenton (Dickens project)
• Tom Goldfinch (Dickens project)
• Isabel Parker (Dickens project)
• Victor Anton (NZ bird identification)
• Jacob Woods (NZ bird identification)
• Dr Dayle Anderson (Citizen science in educ.)
• Dr Cathal Doyle (Citizen science in educ.)
• …