Finding relevant multimedia content is notoriously difficult, and the difficulty grows with the size and heterogeneity of the collection. Linked cultural media collections are heterogeneous by nature and grow rapidly, mainly through the enormous amounts of user-generated content and metadata placed on the Internet every day. Without mechanisms that keep every part of these collections easily accessible to any user, at any time and in any use context, their value to the community, as well as their value as an economic asset, will decline.
demo: http://2-dot-rma-accurator.appspot.com/#Intro
website: http://sealincmedia.wordpress.com/
3. however, only a small fraction, about 8,000 items, is currently on display
4. to grant the public access to the objects
in archives and depots, the Rijksmuseum
started to digitize the artworks ...
5. … and present the collection online. 125,000 artworks are already available, and another 40,000 are added every year
6. the expertise of museum professionals lies in describing and annotating the collection with art-historical information; thus, for most artworks we know when they were created and by whom
7. “We’re adding 40,000 items to the collection every year. After the scan, we have limited time for each painting, and this occasionally results in incomplete annotations.”
Henrike Hövelmann, Head of Print Cabinet Online
8. detailed information about the depicted
objects, e.g. which species the animal or plant
belongs to, is in most cases not available
9. the need for more detailed annotations:
this painting is annotated only with “bird with
blue head near branch with red leaf”, and the
species of the bird and the plant are missing
10. by involving people from outside the museum in the annotation process, we support museum professionals in their annotation task
11. we use crowdsourcing to get more annotations, and nichesourcing, i.e. niches of people with the right expertise, to add more specific information
12. first, we use the crowd from Crowdflower &
Amazon Mechanical Turk to make a general
classification of the artworks
13. the crowd tags artworks on a generic level, e.g.
‘bird’, ‘flower’. Most people can provide
common knowledge tags, but it is unlikely that
they also know the scientific name of the bird.
15. we use sources like Twitter to find experts or groups of experts in certain areas, e.g. bird lovers, ornithologists, or people who enjoy bird-watching in their spare time
16. these experts can contribute their knowledge
about bird species using the Accurator
platform
17. we create user profiles for each expert to better match the annotation tasks with the right expert, e.g. if an expert knows songbirds well, but not much about birds of prey …
18. … she will be asked to annotate more of the former
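A minimal sketch of how such profile-based task routing could work; the `ExpertProfile` fields, the scores, and the threshold are our illustration, not the actual Accurator implementation:

```python
from dataclasses import dataclass, field

@dataclass
class ExpertProfile:
    """Hypothetical per-expert record of topic expertise scores in [0, 1]."""
    name: str
    expertise: dict = field(default_factory=dict)  # topic -> score

def route_task(task_topic, experts, threshold=0.5):
    """Rank experts by expertise in the task's topic, keeping only
    those above a minimum threshold."""
    scored = [(e, e.expertise.get(task_topic, 0.0)) for e in experts]
    eligible = [(e, s) for e, s in scored if s >= threshold]
    return [e for e, s in sorted(eligible, key=lambda pair: pair[1], reverse=True)]

# an expert who knows songbirds well but little about birds of prey
anna = ExpertProfile("Anna", {"songbirds": 0.9, "birds of prey": 0.2})
print([e.name for e in route_task("songbirds", [anna])])      # ['Anna']
print([e.name for e in route_task("birds of prey", [anna])])  # []
```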
19. we have developed a platform where users can enter
tags, either by using terms from a structured
vocabulary or by adding free text
20. experts can enter any information about the
depicted object & they can also review the tags that
others have provided
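A sketch of what a submitted tag might look like, assuming each tag either links to a concept URI from a structured vocabulary or carries only free text; the class, field names, and the example URI are hypothetical, not the platform's actual data model:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Tag:
    """An annotation entered by a user: either a term from a structured
    vocabulary (identified by a concept URI) or a free-text label."""
    label: str
    annotator: str
    concept_uri: Optional[str] = None  # set when the term comes from a vocabulary

    @property
    def is_vocabulary_term(self) -> bool:
        return self.concept_uri is not None

# a vocabulary-backed tag (Pericrocotus is the minivet genus) and a free-text tag
t1 = Tag("Pericrocotus", "expert_a", concept_uri="http://example.org/taxon/pericrocotus")
t2 = Tag("bird with blue head", "worker_17")
print(t1.is_vocabulary_term, t2.is_vocabulary_term)  # True False
```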
21. for tasks that are too difficult, we developed a game
in which players can carry out an expert annotation
task with some assistance
22. … and the possibility to gain points and compete with others keeps the users engaged
23. to evaluate the correctness of annotations, they are reviewed & rated by other experts with expertise in the same topic
25. for example, if expert A has annotated the bird as a Minivet and expert B, whose specialty is also Japanese birds, is certain that it is not a Minivet, he can rate expert A's annotation as incorrect and add his own
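The kind of review record this implies could look as follows; the names and fields are hypothetical, and the replacement value is only a placeholder:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Review:
    """A peer review of another expert's tag by a reviewer with
    expertise in the same topic."""
    tag_label: str
    reviewer: str
    correct: bool
    replacement: Optional[str] = None  # the reviewer's own tag, if any

# expert B rejects expert A's "Minivet" tag and adds his own suggestion
review = Review("Minivet", "expert_b", correct=False, replacement="some-other-species")
```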
26. in addition to the peer reviews, we use trust algorithms to determine the reputation of experts over time. This reputation is also considered when assessing annotation correctness
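One common way to maintain such a reputation over time is exponential smoothing over review outcomes; this is only an illustration of the idea, not the specific trust algorithm used in the project:

```python
def update_reputation(reputation, review_positive, weight=0.1):
    """Nudge an expert's reputation toward 1.0 after a positive peer
    review and toward 0.0 after a negative one (exponential smoothing)."""
    outcome = 1.0 if review_positive else 0.0
    return (1 - weight) * reputation + weight * outcome

rep = 0.5  # neutral starting reputation
for positive in [True, True, False, True]:
    rep = update_reputation(rep, positive)
print(round(rep, 3))  # reputation drifts upward with mostly positive reviews
```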
27. Trust-aware Ranking & Relevance
x
Legend
Accepted Tags
Rejected Tags
Cluster Medoid
Reviewers
Evaluate
Evaluated
Tags
Provide
External
Annotators
Extrapolate
Provenance
Un-evaluated
Tags
Generate
Provenancebased
estimates
Tags
x
x
x
Generate
x
x
+
x
Cluster semantically
similar tags. Store the
corresponding evidence.
Reputationbased
estimates
Tag1 - Accept
Tag2 - Reject
Tag3 - Accept
……
TagN - Accept
Predict Tag
evalutation
28. we use the computed correctness to select high-quality annotations
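Putting the workflow of the diagram above together, a rough sketch of how un-evaluated tags could be scored and filtered; the linear weighting, the support values, and the threshold are illustrative assumptions, not the published algorithm:

```python
def predict_correctness(annotator_reputation, cluster_support, w=0.5):
    """Combine a reputation-based estimate (trust in the annotator) with a
    provenance/cluster-based estimate (evidence from semantically similar,
    already-evaluated tags in the same cluster)."""
    return w * annotator_reputation + (1 - w) * cluster_support

def select_high_quality(tags, reputations, support, threshold=0.7):
    """Keep only tags whose predicted correctness reaches the threshold."""
    return [
        t for t in tags
        if predict_correctness(reputations[t["annotator"]], support[t["label"]]) >= threshold
    ]

tags = [{"label": "Minivet", "annotator": "expert_a"},
        {"label": "sparrow", "annotator": "worker_3"}]
reputations = {"expert_a": 0.9, "worker_3": 0.4}        # reputation-based estimates
support = {"Minivet": 0.8, "sparrow": 0.3}              # share of accepting evidence in each tag's cluster
print(select_high_quality(tags, reputations, support))  # keeps only the Minivet tag
```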