Quaero, the first research and innovation cluster on multimedia and multilingual content processing

The Quaero program stems from the need to federate and strengthen an emerging technological sector dealing with the semantic processing of multimedia and multilingual content (text, speech, music, still and video images, scanned documents). It also arises from the will to systematically benchmark results against the international state of the art, to organize a complete technology-transfer value chain, and to mobilize the actors of the sector around applications corresponding to identified and potentially large markets, such as search engines, personalized TV, and the digitization of cultural heritage.
This program is carried by a consortium of 32 French and German partners from the public and private sectors. During the R&D phase, from 2008 to 2013, these partners produced corpora, performed research covering a large scientific spectrum, developed and tested increasingly elaborate models, shared experience, and integrated software. The work plan was adapted as the context evolved. New applications appeared, such as the management of incoming business mail and computer-aided multilingual web site creation, and additional effort was put into the corresponding technologies, such as handwriting recognition and machine translation. Collaboration became more extensive, especially across disciplines and between research and industry.
Thanks to these efforts, which led to more than 800 national and international publications and numerous distinctions, about one hundred core technology modules and application demonstrators have been developed, some of which are already commercially exploited. Many of these technologies are of interest beyond the consortium members. Presenting them is the purpose of this catalog.
The Technology Catalog presents 72 modules and demonstrators, each described on a double page that details its application domain and technical characteristics. It is composed of two parts:
• 59 Core Technology Modules, organized by thematic domain; the list of 12 domains, provided on p. 4, is repeated on each left-hand page
• 13 Application Demonstrators; their list, provided on p. 5, is repeated on each left-hand page
The catalog can also be searched by institution using the index provided at the end of the document.
quaero-catalogue-210x210-v1.6-page-per-page.indd 5 02/10/2013 09:53:32
Core Technology Modules (12 thematic domains)
• Semantic Acquisition & Annotation (5), p. 8 to 17
• Q&A (4), p. 20 to 27
• Speech Processing (7), p. 30 to 43
• Translation of Text and Speech (2), p. 46 to 49
• Audio Processing (3), p. 52 to 57
• Document Processing (10), p. 60 to 79
• Object Recognition & Image Clustering (3), p. 82 to 87
• Music Processing (7), p. 90 to 103
• Indexing, Ranking and Retrieval (1), p. 106 to 107
• Content Analysis (4), p. 110 to 117
• Gesture Recognition (1), p. 120 to 121
• Video Analysis & Structuring (12), p. 124 to 147
Application Demonstrators
• Chromatik, p. 148
• MECA: Multimedia Enterprise CApture, p. 150
• MediaCentric®, p. 152
• MediaSpeech® product line, p. 154
• MobileSpeech, p. 156
• MuMa: The Music Mashup, p. 158
• OMTP: Online Multimedia Translation Platform, p. 160
• Personalized and social TV, p. 162
• PlateusNet, p. 164
• SYSTRANLinks, p. 166
• Voxalead Débat Public, p. 168
• Voxalead multimedia search engine, p. 170
• VoxSigma SaaS, p. 172
Semantic Acquisition & Annotation
• AlvisAE: Alvis Annotation Editor, Inra, p. 8
• KIWI: Keyword extractor, Inria, p. 14
Alvis Annotation Editor

Application sectors:
• Any sector using text documents
• Information extraction
• Content analysis

Target users and customers:
With AlvisAE, remote users display annotated documents in their web browser, manually create new annotations over the text, and share them.

Partners: Inra
AlvisAE: Alvis Annotation Editor

Contact details:
Robert Bossy
robert.bossy@jouy.inra.fr
INRA MIG
Domaine de Vilvert
78352 Jouy-en-Josas, France
http://bibliome.jouy.inra.fr

Description:
AlvisAE is a web annotation editor designed to display and edit fine-grained formal semantic annotations of textual documents. The annotations are used for fast reading or for training machine learning algorithms in text mining. They can also be stored in a database and queried. The annotations are entities, n-ary relations, and groups. Entities can be discontinuous and overlapping. They are typed by a small set of categories or by concepts from an external ontology.
The user can dynamically extend the ontology by dragging new annotations from the text to the ontology. AlvisAE supports collaborative and concurrent annotation and adjudication. Input documents can be in HTML or text format. AlvisAE also takes as input semantic pre-annotations automatically produced by AlvisNLP.

Technical requirements:
• Server side: Java 6 or 7, a Java application, and an RDBMS
• Client side: any recent JavaScript-enabled web browser (e.g. Firefox, Chromium, Safari); Internet Explorer is not supported

Conditions for access and use:
AlvisAE is developed by the INRA Mathématique, Informatique et Génome lab and is the property of INRA. It can be supplied under license on a case-by-case basis. An open-source distribution is planned in the short term.
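The annotation model described above, with entities that may be discontinuous and overlapping plus n-ary relations over them, can be sketched as follows. This is a minimal illustration of the data model, not AlvisAE's actual API; all names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Entity:
    # Discontinuous entity: a list of (start, end) character fragments.
    fragments: list
    type: str

    def text(self, doc: str) -> str:
        # Concatenate the covered fragments, skipping the gaps.
        return " ".join(doc[s:e] for s, e in self.fragments)

@dataclass
class Relation:
    # N-ary relation over entities, e.g. {"taxon": e1, "habitat": e2}.
    type: str
    args: dict

doc = "Bacillus subtilis grows in soil."
# Two overlapping entities over the same stretch of text.
species = Entity([(0, 17)], "Taxon")
genus = Entity([(0, 8)], "Genus")
# A discontinuous entity skipping the verb phrase.
habitat = Entity([(0, 17), (27, 31)], "Taxon-in-Habitat")
lives_in = Relation("LivesIn", {"taxon": species, "habitat": habitat})

print(species.text(doc))
print(habitat.text(doc))
```

Storing entities as fragment lists rather than single spans is what makes discontinuity and overlap representable without special cases.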
AlvisIR: Semantic document indexing and search engine framework

Target users and customers:
Domain-specific communities, especially technical and scientific, willing to build search engines and information systems to manage documents with fine-grained semantic annotations.

Application sectors:
Search engine and information system development.

Partners: Inra
AlvisIR

Contact details:
Robert Bossy
robert.bossy@jouy.inra.fr
INRA MIG
Domaine de Vilvert
78352 Jouy-en-Josas Cedex, France
http://bibliome.jouy.inra.fr

Description:
AlvisIR is a complete suite for indexing documents with fine-grained semantic annotations. The search engine performs a semantic analysis of the user query and searches for synonyms and sub-concepts.
AlvisIR has two main components:
1. the indexing tool and search daemon, based on IndexData's Zebra, which supports standard CQL queries;
2. the web user interface, featuring result snippets, query-term highlighting, facet filtering, and concept hierarchy browsing.
Setting up a search engine requires the semantic resources for query analysis (synonyms and a concept hierarchy) and a set of annotated documents. AlvisIR is closely integrated with AlvisNLP and TyDI, for document annotation and semantic resource acquisition respectively.
Indicative indexing time: 24 min for a corpus containing 5 million annotations.
Indicative response time: 18 s for a response containing 20,000 annotations.

Technical requirements:
• Linux platform
• Perl
• libxml2
• Zebra indexing engine
• PHP5

Conditions for access and use:
Sources available upon request. Free for use by academic institutions.
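The query-expansion step described above, matching a query term against its synonyms and all of its sub-concepts, can be sketched as a transitive closure over a small concept hierarchy. The data and function names here are illustrative, not AlvisIR's internals.

```python
# Toy semantic resources: a synonym table and a concept hierarchy.
SYNONYMS = {"bacterium": ["microbe"]}
SUB_CONCEPTS = {"bacterium": ["bacillus", "coccus"], "bacillus": ["b. subtilis"]}

def expand(term):
    """Return the term plus its synonyms and all transitive sub-concepts."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t in seen:
            continue
        seen.add(t)
        stack.extend(SYNONYMS.get(t, []))
        stack.extend(SUB_CONCEPTS.get(t, []))
    return sorted(seen)

print(expand("bacterium"))
```

Any document annotated with any concept in the expanded set then matches the original query, which is what makes a query for a general concept retrieve documents about its specializations.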
AlvisNLP: A pipeline framework for Natural Language Processing

Target users and customers:
Projects that require standard Natural Language Processing tools for production and research purposes.

Application sectors:
• Natural language processing
• Content analysis
• Information retrieval

Partners: Inra
AlvisNLP: Alvis Natural Language Processing

Contact details:
Robert Bossy
robert.bossy@jouy.inra.fr
INRA MIG
Domaine de Vilvert
78352 Jouy-en-Josas Cedex, France
http://bibliome.jouy.inra.fr

Description:
AlvisNLP is a pipeline framework for annotating text documents using Natural Language Processing (NLP) tools for sentence and word segmentation, named-entity recognition, term analysis, semantic typing, and relation extraction (see the paper by Nedellec et al. in Handbook on Ontologies, 2009, for a comprehensive overview).
The available functions are accessible as modules that can be composed in a sequence forming the pipeline. This sequence, as well as the parameters of each module, is specified through an XML-based configuration file.
New components can easily be integrated into the pipeline. To implement a new module, one builds a Java class that manipulates text annotations following the data model defined in AlvisNLP. The class is loaded at run time by AlvisNLP, which makes integration much easier.

Technical requirements:
Java 7, Weka

Conditions for access and use:
Sources available upon request. Free for use by academic institutions.
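The pipeline mechanism described above, where an XML configuration names a sequence of modules run in order over a shared annotation structure, can be sketched as follows. The module names and XML layout are hypothetical, not AlvisNLP's actual schema.

```python
import xml.etree.ElementTree as ET

# Registry of available modules; each takes the document plus its parameters
# and returns the document enriched with new annotations.
REGISTRY = {
    "sentence-split": lambda doc, p: {**doc, "sentences": doc["text"].split(". ")},
    "tokenize": lambda doc, p: {**doc, "tokens": doc["text"].split()},
}

CONFIG = """
<pipeline>
  <module name="sentence-split"/>
  <module name="tokenize"/>
</pipeline>
"""

def run_pipeline(config_xml, text):
    doc = {"text": text}
    # Run each configured module in sequence, threading the document through.
    for mod in ET.fromstring(config_xml).findall("module"):
        doc = REGISTRY[mod.get("name")](doc, mod.attrib)
    return doc

doc = run_pipeline(CONFIG, "AlvisNLP runs modules in order. Each one adds annotations.")
print(len(doc["sentences"]), len(doc["tokens"]))
```

Keeping the module sequence in configuration rather than code is what lets new components be swapped in without touching the framework, as the description notes for AlvisNLP's dynamically loaded Java classes.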
Keyword extractor

Target users and customers:
Multimedia industry actors, and all academic or industrial laboratories interested in textual document processing.

Application sectors:
• Textual and multimedia document processing

Partners: Inria
KIWI: Keyword extractor

Contact details:
General issues: Patrick Gros, patrick.gros@irisa.fr
Technical issues: Sébastien Campion, scampion@irisa.fr
IRISA/Texmex team
Campus de Beaulieu
35042 Rennes Cedex, France
http://www.irisa.fr/

Description:
Kiwi is a software tool dedicated to the extraction of keywords from a textual document. From an input text, preferably a normalized one, Kiwi outputs a weighted word vector (see Figure 1). This ranked keyword vector can then be used as a document description or for indexing purposes.
Kiwi was developed at Irisa/INRIA Rennes by the Texmex team. Its author is Gwénolé Lecorvé.

Technical requirements:
• PC with Unix/Linux OS
• the TreeTagger [1] software installed on the system
• the Flemm [2] software installed on the system
[1] http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/
[2] http://www.univnancy2.fr/pers/namer/Telecharger_Flemm.htm

Conditions for access and use:
Kiwi was developed at Irisa/Inria-Rennes and is the property of Inria. Registration at the Agency for Program Protection (APP) in France is in process.
Kiwi is currently available as a prototype only. It can be released and supplied under license on a case-by-case basis.
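A weighted, ranked keyword vector of the kind Kiwi outputs can be sketched with a generic TF-IDF weighting: words frequent in the input document but rare in a background corpus get high weights. Kiwi's actual weighting scheme is not specified in this catalog, so this is purely an illustration of the output format.

```python
import math
from collections import Counter

# Tiny background corpus used to estimate how common each word is in general.
background = [
    "the cat sat on the mat",
    "the dog ran in the park",
    "a day in the park",
]

def keyword_vector(text, corpus):
    """Return (word, weight) pairs ranked best-first, TF-IDF style."""
    tf = Counter(text.split())
    n = len(corpus)
    weights = {}
    for word, count in tf.items():
        df = sum(1 for d in corpus if word in d.split())
        weights[word] = count * math.log((1 + n) / (1 + df))
    return sorted(weights.items(), key=lambda kv: -kv[1])

vec = keyword_vector("the park keeper fed the cat in the park", background)
print(vec)
```

Function words like "the" occur in every background document and so receive weight zero, while document-specific words rise to the top of the vector, exactly the behavior wanted for a document description or index key.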
TyDI: A platform for the validation, structuring and export of termino-ontologies

Target users and customers:
The primary use of TyDI is the design of termino-ontologies for the indexing of textual documents. It can therefore be of great help to most projects involving natural language processing.

Application sectors:
• Terminology structuring
• Textual document indexing
• Natural language processing

Partners: Inra
TyDI: Terminology Design Interface

Contact details:
Robert Bossy
robert.bossy@jouy.inra.fr
INRA MIG
Domaine de Vilvert
78352 Jouy-en-Josas Cedex, France
http://bibliome.jouy.inra.fr

Description:
TyDI is a collaborative tool for the manual validation and annotation of terms, either originating from terminologies or extracted from training corpora of textual documents. It is used on the output of so-called term extractor programs (such as Yatea), which identify candidate terms (e.g. compound nouns).
With TyDI, a user can validate candidate terms and specify synonymy and hyperonymy relations. These annotations can then be exported in several formats and used in other Natural Language Processing tools.
Figure 1: The client interface of TyDI. It is composed of several panels (hierarchical/tabular view of the terms, search panel, context of appearance of selected terms, …).

Technical requirements:
• Server side: Glassfish and PostgreSQL servers
• Client side: Java Virtual Machine version 1.5

Conditions for access and use:
TyDI is developed by INRA, Mathématique, Informatique et Génome, and is the property of INRA. TyDI can be supplied under license on a case-by-case basis. For more information, please contact Robert Bossy (robert.bossy@jouy.inra.fr).
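The workflow described above, validating candidate terms, linking them by synonymy and hyperonymy, then exporting for downstream NLP tools, can be sketched as follows. The field names and the triple-like export format are hypothetical, not TyDI's actual export schema.

```python
# Candidate terms as a term extractor such as Yatea might propose them.
candidates = ["lactic acid bacterium", "lactic bacterium", "bacterium"]

validated = {}  # term -> {"hypernym": ..., "synonyms": [...]}

def validate(term, hypernym=None, synonyms=()):
    """Mark a candidate as validated, with its semantic relations."""
    validated[term] = {"hypernym": hypernym, "synonyms": list(synonyms)}

validate("bacterium")
validate("lactic acid bacterium", hypernym="bacterium",
         synonyms=["lactic bacterium"])

def export_triples(terms):
    """Export validated terms as simple SKOS-like triples, one per line."""
    lines = []
    for term, info in terms.items():
        if info["hypernym"]:
            lines.append(f'"{term}" broader "{info["hypernym"]}"')
        for syn in info["synonyms"]:
            lines.append(f'"{term}" altLabel "{syn}"')
    return "\n".join(lines)

print(export_triples(validated))
```

Hyperonymy becomes a `broader` link and synonymy an alternative label, which is the kind of structure a semantic indexer can consume directly.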
Q&A
• FIDJI: Web Question-Answering System, LIMSI-CNRS, p. 20
• Question-Answering System, Synapse Développement, p. 22
• QAVAL: Question Answering by Validation, LIMSI-CNRS, p. 24
• RITEL: Spoken and Interactive Question-Answering System, LIMSI-CNRS, p. 26
A question-answering system aims at answering questions written in natural language with a precise answer.

Target users and customers:
Web question answering is an end-user application. FIDJI is an open-domain QA system for French and English.

Application sectors:
Information retrieval on the Web or in document collections.

Partners: LIMSI-CNRS
FIDJI: Web Question-Answering System

Contact details:
Véronique Moriceau, moriceau@limsi.fr
Xavier Tannier, xtannier@limsi.fr
LIMSI-CNRS
Groupe ILES, B.P. 133
91403 Orsay Cedex, France
http://www.limsi.fr/

Description:
Document retrieval systems such as search engines provide the user with a large set of URL/snippet pairs containing relevant information with respect to a query. To obtain a precise answer, the user then needs to locate the relevant information within the documents, and possibly to combine different pieces of information coming from one or several documents.
To avoid these problems, focused retrieval aims at identifying relevant documents and locating the precise answer to a user question within a document. Question answering (QA) is a type of focused retrieval: its goal is to provide the user with a precise answer to a natural language question. While information retrieval (IR) methods are mostly numerical and use little linguistic knowledge, QA often implies deep linguistic processing, large resources, and expert rule-based modules.
Most question-answering systems can extract the answer to a factoid question when it is explicitly present in texts, but are not able to combine different pieces of information to produce an answer. FIDJI (Finding In Documents Justifications and Inferences), an open-domain QA system for French and English, aims at going beyond this limitation and focuses on introducing text-understanding mechanisms.
The objective is to produce answers which are fully validated by a supporting text (or passage) with respect to a given question. The main difficulty is that an answer (or some of the pieces of information composing it) may be validated by several documents. For example:
Q: Which French Prime Minister committed suicide?
A: Pierre Bérégovoy
P1: The French Prime Minister Pierre Bérégovoy warned Mr. Clinton against…
P2: Two years later, Pierre Bérégovoy committed suicide after he was indirectly implicated…
In this example, the information "French Prime Minister" and "committed suicide" is validated by two different, complementary passages. Indeed, this question may be decomposed into two sub-questions, e.g. "Who committed suicide?" and "Is this person a French Prime Minister?".
FIDJI uses syntactic information, especially dependency relations, which allow question decomposition. The goal is to match the dependency relations derived from the question against those of a passage, and to validate the type of the potential answer in this passage or in another document.
Another important aim of FIDJI is to answer new categories of questions, called complex questions, typically "how" and "why" questions. Complex questions do not exist in traditional evaluation campaigns but have been introduced within the Quaero framework. Answers to these questions are no longer short and precise, but rather parts of documents or even full documents. In this case, the linguistic analysis of the question provides much information on the possible form of the answer and the keywords that should be sought in candidate passages.

Technical requirements:
PC with Linux platform

Conditions for access and use:
Available for licensing on a case-by-case basis
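The cross-passage validation idea in the Bérégovoy example can be sketched as follows: an answer is kept only if each piece of information from the decomposed question is supported by some passage mentioning the candidate, not necessarily the same passage. The constraints are given by hand here as plain strings; FIDJI derives them from dependency parses.

```python
passages = [
    "The French Prime Minister Pierre Bérégovoy warned Mr. Clinton",
    "Two years later, Pierre Bérégovoy committed suicide",
]

# Sub-questions expressed as text constraints the candidate must co-occur with.
sub_questions = ["French Prime Minister", "committed suicide"]

def is_validated(candidate, constraints, passages):
    """True if every constraint co-occurs with the candidate in some passage."""
    return all(
        any(candidate in p and c in p for p in passages)
        for c in constraints
    )

print(is_validated("Pierre Bérégovoy", sub_questions, passages))  # True
print(is_validated("Mr. Clinton", sub_questions, passages))       # False
```

"Pierre Bérégovoy" is validated because the two constraints are satisfied by two different passages, while "Mr. Clinton" fails: no passage links him to "committed suicide".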
A question-answering system allows the user to ask questions in natural language and to obtain one or several answers. For Boolean and generic questions, our system is able to generate potential questions and to return the corresponding answers.

Target users and customers:
As an end-user application, question answering is the easiest way for anybody to find information: ask the question as you want and obtain answers, not snippets or pages.

Application sectors:
Search for and find precise answers in any collection of texts, from the Web or any other source (voice recognition, optical character recognition, etc.), with possible correction of the source text, the ability to generate questions from generic requests (possibly even a single word), the ability to find similar questions and their answers, etc.
Monolingual and multilingual question-answering system. Languages: English and French (plus Spanish, Portuguese, and Polish, with partners using the same API).

Partners: Synapse Développement
Question-Answering System

Contact details:
Patrick Séguéla
patrick.seguela@synapse-fr.com
Synapse Développement
33, rue Maynard
31000 Toulouse, France
http://www.synapse-developpement.fr/

Description:
The technology is a system based on extensive linguistic resources and on state-of-the-art NLP technologies, especially syntactic and semantic parsing, with sophisticated features such as anaphora resolution, word sense disambiguation, and relations between named entities.
On news and Web corpora, our system is regularly awarded in national and international evaluation campaigns (EQueR 2004; CLEF 2005, 2006, 2007; Quaero 2008, 2009).

Technical requirements:
• PC with Windows or Linux
• RAM: 4 GB minimum
• HDD: 100 GB minimum

Conditions for access and use:
SDK available for integration in programs or Web services. For specific conditions of use, contact us.
A question-answering system adapted to searching for precise answers in textual passages extracted from Web documents or text collections.

Target users and customers:
Question answering is both for the general public, to retrieve precise information in raw texts, and for companies and organizations that have specific text-mining needs. Question-answering systems suggest short answers, with their justification passage, to questions asked in natural language.

Application sectors:
Search engine extension, technology monitoring

Partners: LIMSI-CNRS
QAVAL: Question Answering by VALidation

Contact details:
Brigitte Grau
Brigitte.Grau@limsi.fr
LIMSI-CNRS
ILES Group, B.P. 133
91403 Orsay Cedex, France
www.limsi.fr/Scientifique/iles

Description:
The large number of documents currently on the Web, but also on intranet systems, makes it necessary to provide users with intelligent assistant tools that help them find the specific information they are searching for. Relevant information at the right time can help solve a particular task. The purpose is thus to give access to the content of texts, and not only to documents. Question-answering systems address this need.
Question-answering systems aim at finding answers to a question asked in natural language, using a collection of documents. When the collection is extracted from the Web, the structure and style of the texts are quite different from those of newspaper articles. We developed QAVAL, a question-answering system based on an answer validation process able to handle both kinds of documents. A large number of candidate answers are extracted from short passages and then validated according to question and excerpt characteristics. The validation module is based on a machine learning approach. It takes into account criteria characterizing both excerpt and answer relevance at the surface, lexical, syntactic, and semantic levels, in order to deal with different types of texts.
QAVAL is made of sequential modules corresponding to five main steps. The question analysis provides the main characteristics used to retrieve excerpts and guide the validation process. Short excerpts are obtained directly from the search engine, parsed, and enriched with the question characteristics, which allows QAVAL to compute the different features for validating or discarding candidate answers.

Technical requirements:
Linux platform

Conditions for access and use:
Available for licensing on a case-by-case basis
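The validation step described above, scoring each candidate answer from features at several levels and discarding low-scoring ones, can be sketched as follows. The features and the fixed weights are illustrative only; QAVAL learns this decision with a machine learning model over richer lexical, syntactic, and semantic criteria.

```python
def features(question, excerpt, candidate):
    """A few toy relevance features for a (question, excerpt, candidate) triple."""
    q_terms = set(question.lower().split())
    e_terms = set(excerpt.lower().split())
    return {
        "lexical_overlap": len(q_terms & e_terms) / len(q_terms),
        "answer_in_excerpt": 1.0 if candidate in excerpt else 0.0,
        "excerpt_length_ok": 1.0 if len(excerpt.split()) < 40 else 0.0,
    }

# Hand-set weights standing in for a learned model.
WEIGHTS = {"lexical_overlap": 0.5, "answer_in_excerpt": 0.3, "excerpt_length_ok": 0.2}

def score(question, excerpt, candidate):
    f = features(question, excerpt, candidate)
    return sum(WEIGHTS[k] * v for k, v in f.items())

q = "who discovered penicillin"
good = score(q, "Alexander Fleming discovered penicillin in 1928", "Alexander Fleming")
bad = score(q, "antibiotics are widely used in medicine", "antibiotics")
print(good > bad)
```

Because features characterize both the excerpt and the candidate, a spurious candidate from an off-topic excerpt scores low even when it appears verbatim in its passage.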
A spoken and interactive QA system that helps users find an answer to their question, spoken or written, in a collection of documents.

Target users and customers:
Question answering is an end-user application. The purpose is to go beyond the traditional way of retrieving information through search engines. Our system is interactive, with both a speech (phone or microphone) and a text (web) interface.

Application sectors:
QA systems can be viewed as a direct extension of search engines: they allow users to ask questions in natural language.

Partners: LIMSI-CNRS
29. 27
RITEL: Spoken and Interactive Question-Answering System
Contact details:
Sophie Rosset
sophie.rosset@limsi.fr
Description:
Technical requirements:
Conditions for access and use:
LIMSI-CNRS
TLP Group B.P. 133
91403 Orsay Cedex France
http://www.limsi.fr/tlp/
PC with Linux platform.
Available for licensing on case-by-case basis.
There are different ways to go beyond standard
retrieval systems such as search engines. One of them
is to offer the users different ways to express their
query: some prefer to use speech to express a query,
while others prefer written natural language. Another
way is to allow the user to interact with the system.
The Ritel system aims at integrating a dialog system
and an open-domain information retrieval system to
allow a human to ask a general question (f.i. « Who is
currently presiding the Senate?’’ or « How did the price
of gas change for the last ten years?’’) and refine his
research interactively.
A human-computer dialog system analyses and acts
on the user requests depending on the task at hand,
the previous interactions and the user’s behaviour.
Its aim is to provide the user with the information
being sought while maintaining a smooth and natural
interaction flow.
The following example illustrates the kind of possible
interaction with the Ritel system:
[S] Hi, Ritel speaking! What is your first question?
[U] who built the Versailles Castle
[S] Your search is about Versailles Castle and built . The answer is
Louis XIII. Do you want to ask another question?
[U] in which year
[S] 1682, according to the documents I had access to. Another
question?
[U] Who designed the garden
[S] The following items are used for searching: Versailles, gardens
and designed. André Le Nôtre. Anything else?
The dialog system comprises a component for user utterance
analysis, a component for dialog management, and a component for
interaction management. The system for information retrieval and
question-answering is tightly integrated within it. The user interface
can be phone-based or web-based for written interaction.
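The context carry-over visible in the sample dialog (the follow-up "in which year" inherits "Versailles" from the previous turn) can be sketched in a few lines. The fact store and the stop-word heuristic below are hypothetical stand-ins for illustration, not Ritel components:

```python
# Toy sketch of turn-keeping in an interactive QA dialog: the manager
# carries the content words of the previous question so that an elliptical
# follow-up ("in which year?") is completed from context. The fact store
# and stop-word list are hypothetical stand-ins, not Ritel components.
class ToyDialogManager:
    def __init__(self, facts):
        self.facts = facts      # {frozenset(content words): answer}
        self.context = set()    # content words carried over between turns

    def ask(self, utterance):
        stop = {"who", "what", "in", "which", "the", "did", "how", "is", "was"}
        keywords = {w.lower().strip("?") for w in utterance.split()} - stop
        answer = self._lookup(keywords)
        if answer is None:                  # elliptical turn: reuse context
            keywords |= self.context
            answer = self._lookup(keywords)
        self.context = keywords
        return answer or "I could not find an answer."

    def _lookup(self, keywords):
        # prefer the most specific fact whose key words are all present
        best = None
        for key, answer in self.facts.items():
            if key <= keywords and (best is None or len(key) > best[0]):
                best = (len(key), answer)
        return best[1] if best else None
```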
Acoustic Speaker Diarization -
LIMSI-CNRS p30
Automatic Speech Transcription -
Vocapia p36
Speech-to-Text -
Karlsruhe Institute of Technology
(KIT) p42
The module aims at performing
automatic segmentation and clustering
of an input audio stream according to
speaker identity, using acoustic cues.
Target users and customers
Multimedia document indexing and archiving
services.
Partners:
LIMSI-CNRS
Application sectors
• Multimedia document management
• Content-based search in audio-visual documents
Acoustic Speaker Diarization
Contact details:
Claude Barras
claude.barras@limsi.fr
LIMSI-CNRS
Spoken Language Processing Group B.P. 133
91403 Orsay Cedex France
http://www.limsi.fr/tlp/
Technical requirements: a standard PC with Linux operating system.
Conditions for access and use: the technology developed at
LIMSI-CNRS is available for licensing on a case-by-case basis.
Description:
Speaker diarization is the process of partitioning an
input audio stream into homogeneous segments
according to their speaker identity. This partitioning
is a useful preprocessing step for an automatic
speech transcription system, but it can also improve
the readability of the transcription by structuring the
audio stream into speaker turns. One of the major
issues is that the number of speakers in the audio
stream is generally unknown a priori and needs to be
automatically determined.
Given samples of known speakers’ voices, speaker
verification techniques can further be applied to
provide clusters of identified speakers.
The LIMSI multi-stage speaker diarization system combines an
agglomerative clustering based on Bayesian information criterion
(BIC) with a second clustering stage using speaker identification
(SID) techniques with more complex models.
This system participated in several evaluations on acoustic
speaker diarization, on US English Broadcast News for NIST
Rich Transcription 2004 Fall (NIST RT’04F) and on French
broadcast radio and TV news and conversations for the ESTER-1
and ESTER-2 evaluation campaigns, providing state-of-the-art
performance. Within the QUAERO program, LIMSI is developing
improved speaker diarization and speaker tracking systems for
broadcast news but also for more interactive data like talk shows.
It is a building block of the system presented by QUAERO partners
to the REPERE challenge on multimodal person identification.
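The BIC-based merge decision at the heart of the first clustering stage can be illustrated on toy one-dimensional data. Real diarization systems model cepstral feature vectors with multivariate Gaussians and tune the penalty weight on development data; this sketch only shows the decision rule:

```python
import math

def delta_bic(x, y, lam=1.0):
    """Delta-BIC between modelling two 1-D segments separately vs. merged,
    with single-Gaussian models. Negative values favour merging."""
    def nlogvar(s):
        n = len(s)
        m = sum(s) / n
        var = max(sum((v - m) ** 2 for v in s) / n, 1e-12)
        return n * math.log(var)
    n = len(x) + len(y)
    penalty = 0.5 * lam * 2 * math.log(n)   # 2 free params (mean, variance) in 1-D
    return nlogvar(x + y) - nlogvar(x) - nlogvar(y) - penalty

def bic_cluster(segments, lam=1.0):
    """Greedy agglomerative clustering: repeatedly merge the pair with the
    lowest delta-BIC while that value is negative."""
    clusters = [list(s) for s in segments]
    while len(clusters) > 1:
        d, i, j = min(
            (delta_bic(clusters[i], clusters[j], lam), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if d >= 0:          # no pair is better merged: stop
            break
        clusters[i] = clusters[i] + clusters.pop(j)
    return clusters
```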
Audio and Text synchronization tool
Target users and customers
• E-editors
• Media content producers
• Media application developers
• Search interface integrators
Partners:
Vecsys
Bertin Technologies
Exalead
Application sectors
• Public/private debates and conferences: e.g. parliament, meetings
• E-learning/e-books: e.g. audiobooks
• Media asset management: e.g. search in annotated media streams (TV, radio, films…)
MediaSpeech® alignment
Contact details:
Ariane Nabeth-Halber
anabeth@vecsys.fr
Vecsys
Parc d’Activité du Pas du Lac
10 bis avenue André Marie Ampère
78180 Montigny-le-Bretonneux France
http://www.vecsys.fr
Technical requirements: standard Web access.
Conditions for access and use: available in SaaS mode, or installed
on a server or as a Virtual Machine in the MediaSpeech® product line.
Quotation on request.
Description:
This technology synchronizes an audio stream with its
associated text transcript: it takes as inputs both audio
stream and raw transcript and produces as output a
“time coded” transcript, i.e. each word or group of
words is associated with its precise occurrence in the audio stream.
The technology is robust and gracefully handles slight variations
between audio speech and text transcript.
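One common way to implement such synchronization, shown here only as a sketch of the general idea (not the MediaSpeech® implementation), is to align the raw transcript to timestamped ASR output by edit distance and copy the time codes onto matched words; words missing from the ASR output simply stay uncoded:

```python
def align_transcript(raw_words, recognized):
    """Toy time-coding: align raw transcript words to ASR output tuples
    (word, start, end) with Levenshtein distance, then copy time codes
    onto matched words. Unmatched words get (word, None, None)."""
    rec_words = [w for w, _, _ in recognized]
    n, m = len(raw_words), len(rec_words)
    # standard edit-distance DP table
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        cost[i][0] = i
    for j in range(m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = cost[i-1][j-1] + (raw_words[i-1].lower() != rec_words[j-1].lower())
            cost[i][j] = min(sub, cost[i-1][j] + 1, cost[i][j-1] + 1)
    # backtrace, keeping time codes where the words match
    out, i, j = [], n, m
    while i > 0 and j > 0:
        if raw_words[i-1].lower() == rec_words[j-1].lower():
            _, start, end = recognized[j-1]
            out.append((raw_words[i-1], start, end))
            i, j = i - 1, j - 1
        elif cost[i][j] == cost[i-1][j-1] + 1:      # substitution
            out.append((raw_words[i-1], None, None))
            i, j = i - 1, j - 1
        elif cost[i][j] == cost[i-1][j] + 1:        # word absent from ASR
            out.append((raw_words[i-1], None, None))
            i -= 1
        else:                                       # spurious ASR word
            j -= 1
    while i > 0:
        out.append((raw_words[i-1], None, None))
        i -= 1
    return list(reversed(out))
```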
Automatic speech recognition, also
known as speech-to-text, is the
transcription of speech into (machine-
readable) text by a computer
Target users and customers
• Researchers
• Developers
• Integrators
Partners:
RWTH Aachen University
Application sectors
The uses of automatic speech recognition are too
manifold to list here. Main usages
today are customer interaction via the telephone,
healthcare dictation and usage on car navigation
systems and smartphones. With increasingly better
technology, these applications are extending to audio
mining, speech translation and an increased use of
human computer interaction via speech.
Automatic Speech Recognition
Contact details:
Volker Steinbiss
steinbiss@informatik.rwth-aachen.de
RWTH Aachen University
Lehrstuhl Informatik 6
Templergraben 55
52072 Aachen Germany
http://www-i6.informatik.rwth-aachen.de
Technical requirements: speech recognition is a computationally and
memory-intensive process, so the typical set-up is to have one or
several computers on the internet serving the speech recognition
requirements of many users.
Conditions for access and use: RWTH provides an open-source speech
recognizer free of charge for academic usage. Other usage is subject
to a bilateral agreement.
Description:
Automatic speech recognition is a very hard problem
in computer science but more mature than machine
translation.
After a media hype at the end of the 1990s, the
technology has continuously improved and it
has been adopted by the market, e.g. in large
deployments in the customer contact sector, in the
automation in radiology dictation, or in voice enabled navigation
systems in the automotive sector.
Public awareness has increased through the use on smart-phones,
in particular Siri. The research community concentrates on
problems such as the recognition of spontaneous speech or the
easy acquisition of new languages.
Vocapia Research develops core multilingual large
vocabulary speech recognition technologies* for voice
interfaces and automatic audio indexing applications.
This speech-to-text technology is available for multiple
languages. (* Under license from LIMSI-CNRS)
Target users and customers
The targeted users and customers of speech-to-text
transcription technologies are actors in the multimedia and call
center sector, including academic and industrial organizations
interested in the automatic mining and processing of audio or
audiovisual documents.
Partners:
Vocapia
Application sectors
This core technology can serve as the basis for a variety of
applications: multilingual audio indexing, teleconference
transcription, telephone speech analytics, transcription of
speeches, subtitling…
Large vocabulary continuous speech recognition is the key
technology for enabling content-based information access
in audio and audiovisual documents. Most of the linguistic
information is encoded in the audio channel of audiovisual data,
which once transcribed can be accessed using text-based tools.
Via speech recognition, spoken document retrieval can support
random access using specific criteria to relevant portions
of audio documents, reducing the time needed to identify
recordings in large multimedia databases. Some
applications are data-mining, news-on-demand, and
media monitoring.
Automatic Speech Transcription
Contact details:
Bernard Prouts
prouts@vocapia.com
contact@vocapia.com
+33 (0)1 84 17 01 14
Vocapia Research
28, rue Jean Rostand
Parc Orsay Université
91400 Orsay France
www.vocapia.com
Technical requirements: PC with Linux platform (for licensed use).
Conditions for access and use: the VoxSigma software is available
both via licensing and via our web service.
Description:
The Vocapia Research speech transcription system
transcribes the speech segments located in an audio
file. Currently, systems for 17 language varieties are
available for broadcast and web data. Conversational
speech transcription systems are available for 7
languages.
The transcription system has two main components:
an audio partitioner and a word recognizer.
The audio partitioner divides the acoustic signal into
homogeneous segments, and associates appropriate (document
internal) speaker labels with the segments.
For each speech segment, the word recognizer determines
the sequence of words, associating start and end times and a
confidence measure for each word.
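Output of this kind is naturally represented as XML. The snippet below consumes a hypothetical transcript shaped after the description above (speaker-labelled segments, per-word start time, duration and confidence); the actual VoxSigma schema may differ:

```python
import xml.etree.ElementTree as ET

# Hypothetical transcript in the spirit of the description above: segments
# carry a speaker label, words carry start time, duration and confidence.
# This is an illustrative shape, not the actual VoxSigma schema.
DOC = """<AudioDoc>
  <SpeechSegment spkid="S1" stime="0.0" etime="2.1">
    <Word stime="0.10" dur="0.30" conf="0.98">bonjour</Word>
    <Word stime="0.55" dur="0.40" conf="0.87">tout</Word>
    <Word stime="1.00" dur="0.45" conf="0.91">le monde</Word>
  </SpeechSegment>
</AudioDoc>"""

def low_confidence_words(xml_text, threshold=0.9):
    """Return (speaker, word, start) for words below a confidence
    threshold, e.g. to flag them for manual correction."""
    root = ET.fromstring(xml_text)
    flagged = []
    for seg in root.iter("SpeechSegment"):
        for word in seg.iter("Word"):
            if float(word.get("conf")) < threshold:
                flagged.append((seg.get("spkid"), word.text,
                                float(word.get("stime"))))
    return flagged
```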
Language Resources production
infrastructure
Target users and customers
• Linguistic resources providers
• Audio content transcribers; media transcribers
• Speech processing users and developers
Partners:
Vecsys
LIMSI-CNRS
Application sectors
• Language resources production
• Speech technology industry
• Media subtitling, conferences and meetings transcription services
Corinat®
Contact details:
Ariane Nabeth-Halber
anabeth@vecsys.fr
Vecsys
Parc d’Activité du Pas du Lac
10 bis avenue André Marie Ampère
78180 Montigny-le-Bretonneux France
http://www.vecsys.fr
Technical requirements: standard Web access.
Conditions for access and use: quotation on request.
Description:
Corinat® is a hardware/software infrastructure
for language resources production that offers the
following functionalities:
• Data collection (broadcast, conversational)
• Automatic pre-processing of audio data
• Distribution of annotation tasks
• Semi-automatic post-processing of annotations
Corinat® is a high availability platform (24/7), with a web-based
interface for language resources production management in any
location.
Vocapia Research provides a language
identification technology* that can identify
languages in audio data.
(* Under license from LIMSI-CNRS)
Target users and customers
The targeted users and customers of language
recognition technologies are actors in the multimedia
and call center sectors, including academic and industrial
organizations, as well as actors in the defense domain,
interested in the processing of audio documents, and in
particular if the collection of documents contains multiple
languages.
Partners:
Vocapia
Application sectors
A language identification system can be run prior to
a speech recognizer. Its output is used to load the
appropriate language-dependent speech recognition
models for the audio document.
Alternatively, the language identification might be used to
dispatch audio documents or telephone calls to a human
operator fluent in the identified language.
Other potential applications also involve the use of LID
as a front-end to a multi-lingual translation system. This
technology can also be part of an automatic system for
spoken data retrieval or automatically enriched transcriptions.
Language Identification
Contact details:
Bernard Prouts
prouts@vocapia.com
contact@vocapia.com
+33 (0)1 84 17 01 14
Vocapia Research
28, rue Jean Rostand
Parc Orsay Université
91400 Orsay
www.vocapia.com
Technical requirements: PC with Linux platform (for licensed use).
Conditions for access and use: the VoxSigma software is available
both via licensing and via our web service.
Description:
The VoxSigma software suite can recognize the
language spoken in an audio document or in speech
segments defined in an input XML file. The set of
possible languages and their associated models
can be specified by the user.
LID systems are available for broadcast and
conversational data. Currently 15 languages
for broadcast news audio and 50 languages for
conversational telephone speech are included in the
respective VR LID system. New languages can easily
be added to the system.
The VoxSigma software suite uses multiple phone-based
decoders in parallel to take a decision about which language is in
the audio file.
The system specifies the language of the audio document along
with a confidence score. In the current version, it is assumed that
a channel of an audio document is in a single language. In future
versions, it is planned to allow multiple languages in a single
document.
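The decision rule (score the input under one model per language, return the best-scoring language with a confidence) can be demonstrated with toy character-bigram models standing in for the phone-based decoders:

```python
import math
from collections import Counter

# Toy stand-in for per-language models run in parallel: character-bigram
# models scored against an input string. Real LID runs phone decoders on
# audio; only the decision rule (best model plus a confidence) is shared.
def train_bigram(text):
    pairs = Counter(zip(text, text[1:]))
    total = sum(pairs.values())
    return {p: c / total for p, c in pairs.items()}

def score(model, text, floor=1e-4):
    # log-likelihood with a floor probability for unseen bigrams
    return sum(math.log(model.get(p, floor)) for p in zip(text, text[1:]))

def identify(models, text):
    scores = {lang: score(m, text) for lang, m in models.items()}
    best = max(scores, key=scores.get)
    # softmax over log-scores as a crude confidence measure
    z = sum(math.exp(s - scores[best]) for s in scores.values())
    return best, 1.0 / z
```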
Transcription of human speech into
written word sequences
Target users and customers
Companies who want to integrate the transcription
of human speech into their products.
Partners:
Karlsruhe Institute of Technology (KIT)
Application sectors
Speech-to-Text technology is key to indexing
multimedia content as it is found in multimedia
databases or in video and audio collections on
the World Wide Web, and to making it searchable
by human queries. In addition, it offers a natural
interface for submitting and executing queries.
This technology is further part of speech-translation
services. In combination with machine translation
technology, it is possible to design machines that
take human speech as input and translate it into a
new language. This can be used to enable human-to-human
communication across the language barrier or to
access content in a cross-lingual way.
Speech-to-Text
Contact details:
Prof. Alex Waibel
waibel@ira.uka.de
Karlsruhe Institute of Technology (KIT)
Adenauerring 2
76131 Karlsruhe Germany
http://isl.anthropomatik.kit.edu
Technical requirements: Linux-based server with 2 GB of RAM.
Conditions for access and use: available for licensing on a
case-by-case basis.
Description:
The KIT speech transcription system is based on the
JANUS Recognition Toolkit (JRTk) which features
the IBIS single pass decoder. The JRTk is a flexible
toolkit which follows an object-oriented approach and
which is controlled via Tcl/Tk scripting.
Recognition can be performed in different modes:
In offline mode, the audio to be recognized is first
segmented into sentence-like units. These segments
are then clustered in an unsupervised way according
to speaker. Recognition can then be performed
in several passes. In between passes, the models
are adapted in an unsupervised manner in order
to improve the recognition performance. System
combination using confusion network combination
can be used in addition to further improve
recognition performance.
In run-on mode, the audio to be recognized is continuously
processed without prior segmentation. The output is a steady
stream of words.
The recognizer can be flexibly configured to meet given real-time
requirements, between the poles of recognition accuracy and
recognition speed.
Within the Quaero project, we are targeting the languages English,
French, German, Russian, and Spanish. Given sufficient amounts
of training material, the HMM based acoustic models can be easily
adapted to additional languages and domains.
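The final combination step can be illustrated with a toy voting scheme. Real confusion-network combination aligns competing lattices before voting; this sketch assumes the hypotheses are already aligned word-for-word:

```python
from collections import Counter

# Toy illustration of the voting idea behind confusion-network
# combination: several recognition passes vote position by position and
# the majority word wins. Real CNC aligns lattices first; here the
# hypotheses are assumed to be aligned word-for-word already.
def combine_hypotheses(hypotheses):
    combined = []
    for words in zip(*hypotheses):
        combined.append(Counter(words).most_common(1)[0][0])
    return combined
```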
Machine Translation - RWTH
Aachen University p46
Automatic translation of text breaks the
language barrier: It allows instant access
to information in foreign languages.
Target users and customers
• Researchers
• Developers
• Integrators
Partners:
RWTH Aachen University
Application sectors
As translation quality is far below the work of
professional human translators, machine translation
is targeted to situations where instant access and low
cost are key and high quality is not demanded, for
example:
• Internet search (cross-language document retrieval)
• Internet (on-the-fly translation of foreign-language websites or news feeds)
Machine Translation
Contact details:
Volker Steinbiss
steinbiss@informatik.rwth-aachen.de
RWTH Aachen University
Lehrstuhl Informatik 6
Templergraben 55
52072 Aachen Germany
http://www-i6.informatik.rwth-aachen.de
Technical requirements: translation is a memory-intensive process,
so the typical set-up is to have one or several computers on the
internet serving the translation requirements of many users.
Conditions for access and use: RWTH provides open-source translation
tools free of charge for academic usage. Other usage is subject to a
bilateral agreement.
Description:
Machine translation is a very hard problem in
computer science and has been worked on for
decades. The corpus-based methods that emerged
in the 1990s allow the computer to actually learn
translation from existing bilingual texts – you could
say, from many translation examples.
A correct mapping is indeed not easy to learn, as the translation
of a word depends on its context, and word orders typically differ
across languages. It is fascinating to see this technology improving
over the years. The learning methods are more of a mathematical
kind and can be applied to any language pair.
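The corpus-based idea, learning word correspondences from nothing but sentence pairs, can be shown with a toy co-occurrence count; real systems use probabilistic alignment models rather than this crude majority rule:

```python
from collections import Counter, defaultdict

# Toy illustration of "learning translation from examples": count which
# target words co-occur with each source word across sentence pairs and
# keep the most frequent pairing. Real statistical MT uses probabilistic
# alignment models; this only shows the corpus-based principle.
def learn_lexicon(sentence_pairs):
    cooc = defaultdict(Counter)
    for src, tgt in sentence_pairs:
        for s in src.split():
            for t in tgt.split():
                cooc[s][t] += 1
    return {s: counts.most_common(1)[0][0] for s, counts in cooc.items()}
```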
Automatic translation of speech practically
sub-titles – in your native language! – the
speech of foreign-language speakers.
Target users and customers
• Researchers
• Developers
• Integrators
Partners:
RWTH Aachen University
Application sectors
• Sub-titling of broadcasts via television or internet
• Internet search in audio and video material (cross-language retrieval)
Speech Translation
Contact details:
Volker Steinbiss
steinbiss@informatik.rwth-aachen.de
RWTH Aachen University
Lehrstuhl Informatik 6
Templergraben 55
52072 Aachen Germany
http://www-i6.informatik.rwth-aachen.de
Technical requirements: speech translation is a computationally and
memory-intensive process, so the typical set-up is to have one or
several computers on the internet serving the speech translation
requirements of many users.
Conditions for access and use: RWTH provides an open-source speech
recognizer and various open-source tools free of charge for academic
usage. Other usage is subject to a bilateral agreement.
Description:
In a nutshell, speech translation is the combination of
two hard computer science problems, namely speech
recognition (automatic transcription of speech into
text) and machine translation (automatic translation
of a text from a source to a target language).
While both technologies do not work perfectly, it is impressive to
see them working in combination, in particular when we have not
even rudimentary knowledge of the source language – for many
of us, this is the case for the Chinese or the Arabic language.
The mathematical methods behind both speech recognition
and machine translation are related, and the systems draw their
knowledge from large amounts of example data.
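The cascade reduces to function composition: a recognizer produces text and a translator maps it into the target language. Both components below are trivial stubs standing in for the real systems:

```python
# The cascade described above as function composition: a (stubbed)
# recognizer turns audio into text, which a (stubbed) translator maps
# into the target language. The stubs are placeholders for real systems.
def make_speech_translator(recognize, translate):
    def speech_translate(audio):
        transcript = recognize(audio)
        return translate(transcript)
    return speech_translate
```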
Sync Audio Watermarking -
Technicolor p52
SAMuSA: Speech And Music
Segmenter and Annotator - Inria p54
Yaafe: Audio feature extractor -
Télécom ParisTech p56
Technicolor Sync Audio Watermarking
technologies
Target users and customers
Content Owners•
Studios•
Broadcasters•
Content distributors•
Partners:
Technicolor
Application sectors
Technicolor Sync Audio Watermarking allows studios and content owners:
• to create more valuable and attractive content by delivering premium quality information
• to generate additional earnings through targeted ads, e-commerce and product placement alongside main screen content
Technicolor Sync Audio Watermarking allows broadcasters and content distributors:
• to provide distinctive content and retain audiences
• to control complementary content on the 2nd screen within their branded environment
• to leverage real-time, qualified behavior metadata to better understand customers and deliver personalized content and recommendations
ContentArmor™ Audio Watermarking allows content owners to deter content leakage by tracking the source of pirated copies.
Sync Audio Watermarking
Contact details:
Gwenaël Doërr
gwenael.doerr@technicolor.com
Technicolor R&D France
975, avenue des Champs Blancs
ZAC des Champs Blancs / CS 176 16
35 576 Cesson-Sévigné France
http://www.technicolor.com
Technical requirements:
• The Technicolor Sync Audio Watermarking detector works on Android and iOS.
• The watermark embedder of both technologies works on Linux and MacOS.
Conditions for access and use: both systems can be licensed as
software executables or libraries.
Description:
With Technicolor Sync Audio Watermarking
technologies, studios, content owners, aggregators
and distributors can sync live, recorded or time-
shifted content and collect qualified metadata.
And thanks to Technicolor’s expertise in both
watermarking and entertainment services, these
solutions are easily integrated into your existing post-
production, broadcast and any new media delivery
workflows.
Technicolor sync technologies open access to all the
benefits of new attractive companion app markets
with no additional infrastructure cost.
Content identification and a time stamp are inaudibly inserted
into the audio signal in post-production or during broadcast.
The 2nd screen device picks up the audio signal, decodes the
watermark and synchronizes the app on the 2nd screen thanks to
the embedded content identification data. Audio watermarking
uses the original content audio signal as its transmission channel,
ensuring compatibility with all existing TVs, PVRs or DVD/Blu-
ray players as well as legacy devices without network interfaces. It
works for realtime, time-shifted and recorded content.
Speech And Music Segmenter and
Annotator
Target users and customers
The targeted users and customers are the
multimedia industry actors, and all academic or
industrial laboratories interested in audio document
processing.
Partners:
Inria
Application sectors
• Audio and multimedia document processing
SAMuSA: Speech And Music Segmenter and Annotator
Contact details:
General issues:
Patrick Gros
patrick.gros@irisa.fr
IRISA/Texmex team
Campus de Beaulieu
35042 Rennes Cedex France
http://www.irisa.fr/
Technical requirements: PC with Unix/Linux OS.
Conditions for access and use: SAMuSA was developed at Irisa in
Rennes and is the property of CNRS and Inria. It is currently
available as a prototype only and can be released and supplied
under license on a case-by-case basis.
Technical issues:
Sébastien Campion
scampion@irisa.fr
Description:
The SAMuSA module takes an audio file or stream
as input, and returns a text file containing the
detected segments of speech, music and silence.
To perform segmentation, SAMuSA uses audio class
models as external resources. It also calls external
tools for audio feature extraction (Spro software
[1]), and for audio segmentation and classification
(Audioseg software [2]). These tools are included in
the SAMuSA package.
Trained on hours of various TV and radio programs,
this module provides accurate results: 95% of speech and 90% of
music are correctly detected.
One hour of audio can be processed in approximately one
minute on a standard computer.
[1] http://gforge.inria.fr/projects/spro/
[2] http://gforge.inria.fr/projects/audioseg/
SAMuSA was developed in Irisa/INRIA Rennes by the Metiss
team.
The SAMuSA authors are: Frédéric Bimbot, Guillaume Gravier,
Olivier Le Blouch.
The Spro author is: Guillaume Gravier
The Audioseg authors are: Mathieu Ben, Michaël Betser,
Guillaume Gravier
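The returned text file is a list of labelled time segments. The toy helper below illustrates the final post-processing step of turning frame-level class decisions into such segments; it sketches the output format only and is not SAMuSA code:

```python
# Turn frame-level class decisions (one label per fixed-length frame)
# into (label, start, end) segments, the shape of a segmenter's output.
def frames_to_segments(labels, frame_dur=0.01):
    segments = []
    start = 0
    for i in range(1, len(labels) + 1):
        # close the current segment at the end or on a label change
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((labels[start],
                             round(start * frame_dur, 2),
                             round(i * frame_dur, 2)))
            start = i
    return segments
```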
Yaafe is a low-level and mid-level audio feature extractor, designed to extract a large number of features over large audio files.
Target users and customers
Targeted integrators and users are industrial or academic laboratories in the field of audio signal processing, in particular for music information retrieval tasks.
Partners:
Télécom ParisTech
Application sectors
• Music information retrieval
• Audio segmentation
Yaafe: Audio feature extractor
Contact details:
S. Essid
slim.essid@telecom-paristech.fr
Description:
Technical requirements:
Conditions for access and use:
Télécom ParisTech
37 rue Dareau
75014 Paris / France
http://www.tsi.telecom-paristech.fr/aao/en/2010/02/19/yaafe-audio-feature-extractor/
Yaafe is a C++/Python software package available for Linux and Mac.
Yaafe has been released under the LGPL licence and is available for download on SourceForge. Some mid-level features are available in a separate library, under a proprietary licence.
Yaafe is designed to extract a large number of features simultaneously and efficiently. It automatically optimizes feature computation, so that each intermediate representation (spectrum, CQT, envelope, etc.) is computed only once.
Yaafe works in streaming mode, so it has a low memory footprint and can process arbitrarily long audio files.
Available features include spectral features, perceptual features (loudness), MFCC, CQT, chroma, chords and onset detection.
Users can select their own set of features and transformations (derivative, temporal integration) and easily adapt all parameters to their own task.
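The "each intermediate representation is computed only once" design can be illustrated with a toy engine in Python. This is an illustrative sketch, not Yaafe's actual implementation: two features both request the spectrum, which is computed once and cached.

```python
# Toy illustration (not Yaafe's code) of shared intermediate
# representations: both features below depend on the spectrum,
# which is computed a single time and then reused from a cache.

import math

class FeatureEngine:
    def __init__(self, frame):
        self.frame = frame
        self._cache = {}
        self.spectrum_calls = 0   # counts actual spectrum computations

    def spectrum(self):
        """Naive DFT magnitude; cached so dependent features reuse it."""
        if "spectrum" not in self._cache:
            self.spectrum_calls += 1
            n = len(self.frame)
            mags = []
            for k in range(n):
                re = sum(self.frame[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
                im = -sum(self.frame[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
                mags.append(math.hypot(re, im))
            self._cache["spectrum"] = mags
        return self._cache["spectrum"]

    def spectral_centroid(self):
        s = self.spectrum()
        total = sum(s)
        return sum(k * v for k, v in enumerate(s)) / total if total else 0.0

    def spectral_energy(self):
        return sum(v * v for v in self.spectrum())

engine = FeatureEngine([0.0, 1.0, 0.0, -1.0])
centroid = engine.spectral_centroid()
energy = engine.spectral_energy()
print(engine.spectrum_calls)  # 1: the spectrum was shared, not recomputed
```

Yaafe generalizes this idea to a whole dataflow graph of features, which is what keeps simultaneous extraction of many features efficient.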
Colorimetric Correction System - Jouve p60
Document Reader - A2iA p66
Handwriting Recognition System - Jouve p72
Recognition of Handwritten Text - RWTH Aachen University p78
Document Classification System - Jouve p62
Document Structuring System - Jouve p68
Image Descreening System - Jouve p74
Document Layout Analysis System - Jouve p64
Grey Level Character Recognition System - Jouve p70
Image Resizing for Print on Demand Scanning - Jouve p76
A specific tool to create a suitable
colorimetric correction and check its
stability over time
Target users and customers
Everyone who has to deal with high colorimetric constraints.
Partners:
Jouve
Application sectors
• Patrimony
• Industry
Colorimetric Correction System
Contact details:
Jean-Pierre Raysz
jpraysz@jouve.fr
Technical requirements: Conditions for access and use:
Jouve R&D
1, rue du Dr Sauvé
53000 Mayenne France
www.jouve.com
Any Posix compliant system
Ask Jouve
Description:
The system uses a file containing the reference values of a calibration target together with the image obtained by scanning that target. A profile is created from this file. To improve the correction, a color transformation table is integrated into the system.
To guarantee the required quality, the system repeatedly checks the values of a calibration target against the specifications.
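One simple form of such a correction can be sketched with NumPy: fit a linear transform mapping the scanner's measured values of the calibration target to its known reference values (an illustrative sketch under that assumption, not Jouve's actual profile machinery; the patch values below are invented).

```python
import numpy as np

# Illustrative sketch: least-squares fit of a 3x3 color correction
# matrix from calibration-target patches. Real profiles (e.g. ICC)
# are richer; this only shows the linear-fit idea.

def fit_correction(measured, reference):
    """Fit M by least squares so that measured @ M ~ reference."""
    M, *_ = np.linalg.lstsq(measured, reference, rcond=None)
    return M

def apply_correction(pixels, M):
    """Apply the fitted correction to an array of RGB rows."""
    return pixels @ M

# Synthetic target: known reference patches and a distorted "scan".
reference = np.array([[0.9, 0.1, 0.1],
                      [0.1, 0.9, 0.1],
                      [0.1, 0.1, 0.9],
                      [0.5, 0.5, 0.5]])
distortion = np.array([[0.8, 0.1, 0.0],
                       [0.1, 0.9, 0.1],
                       [0.05, 0.0, 0.7]])
measured = reference @ distortion
M = fit_correction(measured, reference)
corrected = apply_correction(measured, M)
print(np.allclose(corrected, reference))  # True
```

Re-scanning the target later and re-checking the fit against the specification is what the stability check over time amounts to.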
A generic tool for classifying documents
based on a hybrid learning technique
Target users and customers
Everyone who has to deal with document classification and already has a large amount of classified documents.
Partners:
Jouve
Application sectors
• Industrial property
• Scientific Edition
Document Classification System
Contact details:
Gustavo Crispino
gcrispino@jouve.fr
Technical requirements: Conditions for access and use:
Jouve R&D
30, rue du Gard
62300 Lens France
www.jouve.com
Any Posix compliant system
Ask Jouve
Description:
The fully automatic system is based on linguistic resources extracted from already-classified documents.
On a 100-class patent pre-classification task, the system achieves 85% precision, which is 5% better than human operators on this task.
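As an illustration of classification driven purely by already-classified documents, here is a minimal nearest-centroid bag-of-words sketch. Jouve's actual hybrid learning technique is proprietary and not shown; the corpus and class names below are invented.

```python
# Toy sketch: build one centroid per class from labeled documents,
# then classify new text by cosine similarity to the centroids.
# Not Jouve's method; purely an illustration of learning from an
# already-classified corpus.

from collections import Counter
import math

def vectorize(text):
    return Counter(text.lower().split())

def centroid(docs):
    total = Counter()
    for d in docs:
        total.update(vectorize(d))
    return total

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def classify(text, centroids):
    vec = vectorize(text)
    return max(centroids, key=lambda c: cosine(vec, centroids[c]))

training = {
    "chemistry": ["polymer resin compound", "acid catalyst reaction"],
    "mechanics": ["gear shaft bearing", "piston valve assembly"],
}
centroids = {label: centroid(docs) for label, docs in training.items()}
print(classify("a catalyst for polymer reactions", centroids))  # chemistry
```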
A generic tool to identify and extract
regions of text by analyzing connected
components
Target users and customers
Everyone who has to deal with document image
analysis.
Layout analysis is the first major step in a document
image analysis workflow. The correctness of
the output of page segmentation and region
classification is crucial as the resulting representation
is the basis for all subsequent analysis and
recognition processes.
Partners:
Jouve
Application sectors
• Industry
• Service
• Patrimony
• Edition
• Administration
Document Layout Analysis System
Contact details:
Jean-Pierre Raysz
jpraysz@jouve.fr
Technical requirements: Conditions for access and use:
Jouve R&D
1, rue du Dr Sauvé
53000 Mayenne France
www.jouve.com
Any Posix compliant system
Ask Jouve
Description:
The system identifies and extracts regions of text by analyzing connected components constrained by black and white (background) separators. The rest is filtered out as non-text. First, the image is binarized, any skew is corrected and black page borders are removed. Connected components are then extracted and filtered by size (very small components are discarded).
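The connected-component extraction and size-filtering steps can be sketched as follows: a minimal flood-fill labeller on a toy binarized page. Production systems use optimized two-pass labelling; this is only an illustration of the filtering idea.

```python
# Minimal sketch of the step described above: label 4-connected
# components in a binarized page and drop very small ones (noise).

def connected_components(grid):
    """Flood-fill labelling; returns a list of pixel-coordinate sets."""
    h, w = len(grid), len(grid[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if grid[y][x] and not seen[y][x]:
                stack, comp = [(y, x)], set()
                seen[y][x] = True
                while stack:
                    cy, cx = stack.pop()
                    comp.add((cy, cx))
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and grid[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                comps.append(comp)
    return comps

def filter_small(comps, min_size=3):
    """Discard components below min_size pixels (likely noise)."""
    return [c for c in comps if len(c) >= min_size]

page = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
]
comps = connected_components(page)
kept = filter_small(comps)
print(len(comps), len(kept))  # 3 1
```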
Classification of all types of paper
documents, Data Extraction and Mail
Processing and Workflow Automation
Target users and customers
• Independent Software Vendors
• Business Process Outsourcers
Partners:
A2iA
Application sectors
Bank, Insurance, Administration, Telecom and Utility
Companies, Historical Document Conversion
Document Reader
Contact details:
Venceslas Cartier
venceslas.cartier@a2ia.com
Technical requirements:
Conditions for access and use:
A2iA
39, rue de la Bienfaisance
75008 Paris France
www.a2ia.com
Wintel Platform
Upon request
Description:
Classification of all types of paper documents
A2iA DocumentReader classifies digitized
documents into user-defined classes or “categories”
(letters, contracts, claim forms, accounts receivable,
etc.) based on both their geometry and their content.
The software analyzes the layout of items on the
document. Then, using a general dictionary and trade
vocabulary, it carries out a literal transcription of the
handwritten and/or typed areas.
A2iA DocumentReader can then extract key-words
or phrases in order to determine the category of the
document.
Data Extraction
A2iA DocumentReader uses three methods to extract data from all types of paper documents:
• Extraction from predefined documents. Some documents (such as checks, bank documents and envelopes) are preconfigured within A2iA DocumentReader. The software recognizes their structure, the format of the data to be extracted and their location on the document.
• Extraction from structured documents. A2iA DocumentReader recognizes and extracts data within a fixed location on the document.
• Extraction from semi-structured documents. The layout of the document varies but the data to be extracted remains unchanged. A2iA DocumentReader locates this data by its format and the proximity of key-words, wherever they appear on the document.
Mail Processing and Workflow Automation
A2iA DocumentReader analyzes the entire envelope or folder at a holistic level, just as a human would, to identify its purpose and subject matter (termination of subscription, request for assistance, change of address, etc.). All of the documents together can have a different meaning or purpose than a single document on its own. A2iA DocumentReader then transmits the digital data to the classification application in order to route the mail to the correct person or department. Mail is sent to the appropriate location as soon as it arrives: processing and response times are minimized, workflow is automated, and manual labor is decreased.
A generic tool to recognize the logical structure of documents from an OCR stream
Target users and customers
Everyone who has to deal with electronic document encoding from the original source material and needs to consider the hierarchical structure represented in the digitized document.
Partners:
Jouve
Application sectors
• Industry
• Service
• Patrimony
• Administration
Document Structuring System
Contact details:
Jean-Pierre Raysz
jpraysz@jouve.fr
Technical requirements: Conditions for access and use:
Jouve R&D
1, rue du Dr Sauvé
53000 Mayenne France
www.jouve.com
Any Posix compliant system
Ask Jouve
Description:
The system recognizes the logical structure of documents from an OCR stream in accordance with the description of a model (DTD, XML Schema).
The result is a hierarchically structured flow. The model captures both the macro-structure of the documents and the micro-structure of their content.
A recognition engine for degraded
printed documents
Target users and customers
Everyone who has to deal with character recognition on grey-level images. Specifically targeted at low-quality documents, the system also outperforms off-the-shelf OCR engines on good-quality images.
Partners:
Jouve
Application sectors
• Heritage scanning
• Printing
Grey Level Character Recognition System
Contact details:
Jean-Pierre Raysz
jpraysz@jouve.fr
Description:
Technical requirements: Conditions for access and use:
Jouve R&D
1, rue du Dr Sauvé
53000 Mayenne France
www.jouve.com
Any Posix compliant system
Ask Jouve
Unlike other OCR engines, this system processes grey-level images directly (without using an intermediate black-and-white image).
Using all the information present in the image, it is able to recognize degraded characters.
Capture handwritten and machine-printed data from documents
Target users and customers
Everyone who has to deal with forms containing handwritten fields or who has to process incoming mail.
Partners:
Jouve
Application sectors
• Banking
• Healthcare
• Government
• Administration
Handwriting Recognition System
Contact details:
Jean-Pierre Raysz
jpraysz@jouve.fr
Technical requirements: Conditions for access and use:
Jouve R&D
1, rue du Dr Sauvé
53000 Mayenne France
www.jouve.com
Any Posix compliant system
Ask Jouve
Description:
The JOUVE ICR (Intelligent Character Recognition) engine is a combination of two complementary systems: HMMs and multidimensional recurrent neural networks.
This engine has the advantage of dealing with input data of varying size and of taking context into account. JOUVE ICR keeps increasing the recognition rate on handwritten fields in forms by exploiting links between the fields.
A system that removes annoying
halftones in scanned images
Target users and customers
Everyone who has to deal with high quality
reproduction of halftone images.
Partners:
Jouve
Application sectors
• Heritage scanning
• Printing
Image Descreening System
Contact details:
Christophe Lebouleux
clebouleux@jouve.fr
Technical requirements: Conditions for access and use:
Jouve R&D
1, rue du Dr Sauvé
53000 Mayenne France
www.jouve.com
Any Posix compliant system
Ask Jouve
Description:
Halftone is a process for reproducing photographs or other images in which the various tones of grey or color are produced by variously sized dots of ink. When a document printed with this process is scanned, a very distracting screening effect may appear.
The system uses a combination of peak removal in the Fourier domain and local Gaussian blur.
A specific tool for recreating matter that was lost when scanning bound books
Target users and customers
Everyone who has to deal with high quality reproduction of bound books.
Partners:
Jouve
Application sectors
• Heritage scanning
• Printing
Image Resizing for Print on Demand Scanning
Contact details:
Christophe Lebouleux
clebouleux@jouve.fr
Technical requirements: Conditions for access and use:
Jouve R&D
1, rue du Dr Sauvé
53000 Mayenne France
www.jouve.com
• Any Posix compliant system
• Grey level or color images
Ask Jouve
Description:
In many cases, when documents have been debound before scanning (which removes part of the original), we are asked to provide an image at the original size, and sometimes to provide images larger than the original for reprint purposes.
Using the seam carving technique, we are able to obtain very realistic results.
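The core of seam carving, finding a minimal-energy seam by dynamic programming, can be sketched as follows. To enlarge an image, such low-energy seams are duplicated rather than removed; this toy version on a grayscale grid is not Jouve's production code.

```python
# Toy seam carving sketch: gradient-magnitude energy, then a DP pass
# to find the cheapest top-to-bottom vertical seam.

def energy(img):
    """Simple gradient-magnitude energy map."""
    h, w = len(img), len(img[0])
    e = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            dy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            e[y][x] = abs(dx) + abs(dy)
    return e

def min_vertical_seam(e):
    """One x-coordinate per row forming the cheapest connected seam."""
    h, w = len(e), len(e[0])
    cost = [row[:] for row in e]
    for y in range(1, h):
        for x in range(w):
            cost[y][x] += min(cost[y - 1][max(x - 1, 0):min(x + 2, w)])
    # Backtrack from the cheapest bottom cell.
    x = min(range(w), key=lambda i: cost[h - 1][i])
    seam = [x]
    for y in range(h - 1, 0, -1):
        x = seam[-1]
        candidates = [i for i in (x - 1, x, x + 1) if 0 <= i < w]
        seam.append(min(candidates, key=lambda i: cost[y - 1][i]))
    seam.reverse()
    return seam

img = [
    [10, 10, 0, 10],
    [10, 10, 0, 10],
    [10, 10, 0, 10],
]
seam = min_vertical_seam(energy(img))
print(seam)  # a zero-energy path, one column index per row
```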
Recognition of handwritten text transforms handwritten text into machine-readable text on a computer.
Target users and customers
• Researchers
• Developers
• Integrators
Partners:
RWTH Aachen University
Application sectors
Recognition of printed or handwritten text is heavily used in the mass processing of paper mail, filled-out forms and letters, e.g. to insurance companies, and has been covered by the media in connection with the mass digitization of books. New usage patterns will evolve from better coverage of handwriting and of difficult writing systems such as Arabic or Chinese, and from the recognition of text in any form of image data, which, due to digital cameras and the Internet, is being produced and distributed in ever-increasing volumes.
Recognition of Handwritten Text
Contact details:
Volker Steinbiss
steinbiss@informatik.rwth-aachen.de
Technical requirements:
Conditions for access and use:
RWTH Aachen University
Lehrstuhl Informatik 6
Templergraben 55
52072 Aachen Germany
http://www-i6.informatik.rwth-aachen.de
The text needs to be available in digitized form, e.g. through a scanner, as part of a digital image or video. Processing takes place on a normal computer.
RWTH does not currently provide public access to software in this area. Any usage should be subject to a bilateral agreement.
Description:
Optical character recognition (OCR) works sufficiently well on printed text but is particularly difficult for handwritten material, because handwriting exhibits far higher variability than print.
Methods that have proven successful in other areas, such as speech recognition and machine translation, are being exploited to tackle this set of OCR problems.
Image Clusterization System - Jouve p82
A generic tool to perform automatic
clustering of scanned images
Target users and customers
Everyone who has to group a large set of images so that images in the same group are more similar to each other than to those in other groups, as, for instance, in incoming mail processing.
Partners:
Jouve
Application sectors
• Banking
• Insurance
• Industry
Image Clusterization System
Contact details:
Jean-Pierre Raysz
jpraysz@jouve.fr
Technical requirements: Conditions for access and use:
Jouve R&D
1, rue du Dr Sauvé
53000 Mayenne France
www.jouve.com
Any Posix compliant system
Ask Jouve
Description:
Two kinds of methods have been implemented. The first applies optical character recognition to the pages; distances are then computed between the images to classify and the images contained in a database of labeled images.
The second randomly selects a pool of images from a directory; for each image, invariant key points are extracted and characteristic features (SIFT or SURF) are computed to build the clusters.
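The final grouping step of the second method can be sketched with a toy k-means over feature vectors. This is illustrative only: real inputs would be SIFT/SURF-derived descriptors, Jouve's implementation is not shown, and the 8-dimensional vectors below are synthetic.

```python
import numpy as np

# Toy sketch of the clustering step: once every scanned image is
# reduced to a feature vector, k-means groups similar images.

def kmeans(points, k=2, iters=10):
    # Crude deterministic init: k points spread across the dataset.
    centers = points[np.linspace(0, len(points) - 1, k).astype(int)]
    for _ in range(iters):
        # Assign each point to its nearest center, then recompute centers.
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(dists, axis=1)
        centers = np.array([points[labels == j].mean(axis=0) for j in range(k)])
    return labels

rng = np.random.default_rng(42)
form_a = 1.0 + 0.1 * rng.standard_normal((20, 8))   # scans of one form type
form_b = -1.0 + 0.1 * rng.standard_normal((20, 8))  # scans of another
labels = kmeans(np.vstack([form_a, form_b]))
print(set(labels[:20].tolist()), set(labels[20:].tolist()))  # each form type lands in one cluster
```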
A generic tool to automatically identify documents, photos and text zones in scanned images
Target users and customers
Everyone who has to deal with document recognition, e.g. identity cards, passports or invoices.
Partners:
Jouve
Application sectors
• Administration
• Banking
• Insurance
Image Identification System
Contact details:
Jean-Pierre Raysz
jpraysz@jouve.fr
Technical requirements: Conditions for access and use:
Jouve R&D
1, rue du Dr Sauvé
53000 Mayenne France
www.jouve.com
Any Posix compliant system
Ask Jouve
Description:
The system searches for the best match between image signatures and model signatures, determining whether the same kind of model is present in the image to be segmented.
The segmentation defined on the model is then transferred to the image by applying an affine transformation (translation, rotation and homothety).
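The transfer step can be sketched with NumPy: estimate the affine transformation from matched point pairs by least squares, then map the model's segmentation into the image. How Jouve computes and matches signatures is not shown; the synthetic points below only verify the fitting idea.

```python
import numpy as np

# Sketch of the transfer step: fit dst ~ A @ src + t (2x2 matrix A,
# translation t) from matched keypoints, then map segmentation points.

def fit_affine(src, dst):
    """Least-squares affine fit from matched 2-D point arrays."""
    X = np.hstack([src, np.ones((len(src), 1))])   # rows [x, y, 1]
    params, *_ = np.linalg.lstsq(X, dst, rcond=None)
    A, t = params[:2].T, params[2]
    return A, t

def apply_affine(points, A, t):
    return points @ A.T + t

# Synthetic check: rotation + scale (homothety) + translation.
theta, scale, t_true = 0.3, 1.5, np.array([4.0, -2.0])
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
src = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 3.0]])
dst = src @ (scale * R).T + t_true
A, t = fit_affine(src, dst)
segmentation = np.array([[0.5, 0.5]])              # a model segmentation point
mapped = apply_affine(segmentation, A, t)
print(np.allclose(mapped, segmentation @ (scale * R).T + t_true))  # True
```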
Leading Image Recognition
Technologies
Target users and customers
• Brands
• Retailers
• Social Media Monitoring companies
• Research companies
• Government agencies
Partners:
LTU technologies
Application sectors
• Visual Brand Intelligence: e-reputation, brand protection
• Media Monitoring
• M-Commerce and E-Commerce: augmented reality, interactive catalogs, virtual shop, advanced search functionalities, etc.
• Visual Asset Management: image classification, image de-duplication, image filtering, moderation, etc.
LTU Leading Image Recognition Technologies
Contact details:
Frédéric Jahard
fjahard@ltutech.com
Technical requirements:
Conditions for access and use:
LTU technologies
Headquarters:
132 rue de Rivoli
75001 Paris, France
+33 1 53 43 01 68
Coming soon
Coming soon
US office:
232 Madison Ave
New York, NY 10016 USA
+1 646 434 0273
http://www.ltutech.com
Description:
Founded in 1999 by researchers at MIT, Oxford and
Inria, LTU provides cutting-edge image recognition
technologies and services to global companies and
organizations such as Adidas, Kantar Media and Ipsos.
LTU’s solutions are available on demand with LTU Cloud or on an on-premise basis with LTU Enterprise Software. These patented image recognition solutions enable LTU’s clients to manage their visual assets effectively – internally and externally – and to bring their end-users truly innovative visual experiences.
In an image-centric world, LTU’s expertise runs the image
recognition gamut from visual search, visual data management,
investigations and media monitoring, to e-commerce, brand
intelligence, and mobile applications.
AudioPrint - IRCAM p90
Ircamchord: Automatic Chord Estimation - IRCAM p96
Music Structure - Inria p102
Ircamaudiosim: Acoustical Similarity Estimation - IRCAM p92
Ircammusicgenre and Ircammusicmood: Genre and Mood Estimation - IRCAM p98
Ircambeat: Music Tempo, Meter, Beat and Downbeat Estimation - IRCAM p94
Ircamsummary: Music Summary Generation and Music Structure Estimation - IRCAM p100
AudioPrint captures the acoustical
properties by computing a robust
representation of the sound
Target users and customers
AudioPrint is dedicated to middleware integrators
that wish to develop audio fingerprint applications
(i.e. systems for live recognition of music on air),
as well as synchronization frameworks for second
screen applications (a mobile device brings contents
directly related to the live TV program). The music
recognition application can also be used by digital
rights management companies.
Partners:
IRCAM
Application sectors
• Second screen software providers
• Digital rights management
• Music query software developers
AudioPrint
Contact details:
Frédérick Rousseau
Frederick.Rousseau@ircam.fr
Technical requirements: Conditions for access and use:
IRCAM
Sound Analysis /Synthesis
1 Place Igor-Stravinsky
75004 Paris France
http://www.ircam.fr
AudioPrint is available as a static library for Linux,
Mac OS X and iOS platforms.
Ircam Licence
Description:
AudioPrint is an efficient technology for live or offline recognition of musical tracks within a database of learnt tracks. It captures the acoustical properties of the audio signal by computing a symbolic representation of the sound profile that is robust to common alterations.
Moreover, it provides a very precise estimate of the temporal offset within the detected musical track. This offset estimate can be used to synchronize devices.
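A drastically simplified sketch of fingerprint matching with offset estimation follows. The actual AudioPrint representation is proprietary; here each frame is reduced to its dominant spectral bin, and a query excerpt is located inside a reference track by sliding the symbol sequences past each other.

```python
import numpy as np

# Very simplified fingerprint sketch (not IRCAM's representation):
# one symbol per frame = index of the dominant FFT bin; matching
# finds the frame offset where the query aligns with the reference.

def fingerprint(signal, frame=64):
    """Dominant FFT bin per non-overlapping frame."""
    n = len(signal) // frame
    frames = np.reshape(signal[:n * frame], (n, frame))
    mags = np.abs(np.fft.rfft(frames, axis=1))
    mags[:, 0] = 0                      # ignore the DC bin
    return np.argmax(mags, axis=1)

def locate(query_fp, ref_fp):
    """Frame offset in the reference where the query matches best."""
    scores = [np.sum(query_fp == ref_fp[off:off + len(query_fp)])
              for off in range(len(ref_fp) - len(query_fp) + 1)]
    return int(np.argmax(scores))

rng = np.random.default_rng(0)
track = rng.standard_normal(64 * 100)      # the "known" track
excerpt = track[64 * 30 : 64 * 45]         # query taken at frame offset 30
offset = locate(fingerprint(excerpt), fingerprint(track))
print(offset)  # 30
```

The recovered offset is exactly the temporal-offset estimate that makes synchronization of devices possible.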
Ircamaudiosim estimates the acoustical
similarity between two music tracks.
It can be used to perform music
recommendation based on music
content similarity.
Target users and customers
Ircamaudiosim allows the development of music recommendation based on music content similarity. It can therefore be used in any system (online or offline) requiring music recommendation, such as a recommendation engine for an online music service or for offline music collection browsing.
Partners:
IRCAM
Application sectors
• Online music providers
• Online music portals
• Music player developers
• Music software developers
Ircamaudiosim: Acoustical Similarity Estimation
Contact details:
Frédérick Rousseau
Frederick.Rousseau@ircam.fr
Technical requirements: Conditions for access and use:
IRCAM
Sound Analysis /Synthesis
1 Place Igor-Stravinsky
75004 Paris France
http://www.ircam.fr
Ircamaudiosim is available as software or as a dynamic library for the Windows, Mac OS X and Linux platforms.
Ircam Licence
Description:
Ircamaudiosim estimates the acoustical similarity between two audio tracks. Each music track in a database is first analyzed in terms of its acoustical content (timbre, rhythm, harmony). An efficient representation of this content allows fast comparison between two music tracks.
The system is therefore scalable to large databases. Given a target music track, the most similar items of the database (in terms of acoustical content) can be found quickly and then used to provide recommendations to the listener.
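A toy sketch of the underlying idea, compact content descriptors enabling fast comparison, is shown below. This is not IRCAM's algorithm: the "features" are random stand-ins for the timbre/rhythm/harmony analysis, and the descriptor is just per-dimension mean and standard deviation.

```python
import numpy as np

# Toy content-similarity sketch (not Ircamaudiosim's method):
# summarize each track's frame-level features into a small fixed-size
# descriptor, then rank tracks by descriptor distance. The compact
# descriptor is what makes comparison fast enough for large databases.

def describe(frames):
    """Compact descriptor: per-dimension mean and std of frame features."""
    frames = np.asarray(frames, dtype=float)
    return np.concatenate([frames.mean(axis=0), frames.std(axis=0)])

def most_similar(target_frames, database):
    """Name of the database track whose descriptor is closest to the target."""
    d_target = describe(target_frames)
    dists = {name: np.linalg.norm(describe(f) - d_target)
             for name, f in database.items()}
    return min(dists, key=dists.get)

rng = np.random.default_rng(1)
base = rng.standard_normal((100, 4))                       # target track features
database = {
    "near_twin": base + 0.05 * rng.standard_normal((100, 4)),
    "unrelated": rng.standard_normal((100, 4)) + 3.0,
}
print(most_similar(base, database))  # near_twin
```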