David Baehrens
Large-Scale Patent Classification
at the European Patent Office
ABOUT AVERBIS
Founded: 2007
Location: Freiburg im Breisgau
Team: Domain & IT-Experts
Focus: Leverage structured & unstructured information
Current Sectors: Pharma, Health, Automotive, Publishers & Libraries
PORTFOLIO
Solutions: Libraries, Pharma, Patents, Healthcare, Social Media, Automotive
Terminology Management, Text Mining, Search & Analytics, NoSQL, Categorization & Clustering
TERMINOLOGY MANAGEMENT
• Terminology management software
• Provision of terminologies
• Mappings between terminologies
• Building terminology-based applications
Synonyms: dimethyl sulfoxide, dimethylsulfoxide, Domoso, Infiltrina
Hierarchies: cancer, carcinoma, melanoma, lymphoma, glioblastoma…
Patterns: dates, citations, mail addresses…
Rule-based extraction of all different kinds of complex information
Persons, Locations, Genes, …
Co-occurrences, typed relations, e.g. Genes / Diseases / Modification Type
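The three kinds of terminology knowledge listed above (synonyms, hierarchies, patterns) can be sketched as a small annotator. This is only an illustration under stated assumptions: the dictionaries, the regular expressions, and the names `annotate` and `is_a` are hypothetical and are not taken from the Averbis software.

```python
import re

# Hypothetical mini-terminology: each surface form maps to one preferred label.
SYNONYMS = {
    "dimethyl sulfoxide": "dimethyl sulfoxide",
    "dimethylsulfoxide": "dimethyl sulfoxide",
    "domoso": "dimethyl sulfoxide",
    "infiltrina": "dimethyl sulfoxide",
}

# Hypothetical hierarchy: narrower term -> broader term.
BROADER = {
    "carcinoma": "cancer",
    "melanoma": "cancer",
    "lymphoma": "cancer",
    "glioblastoma": "cancer",
}

# Deliberately simple patterns for ISO-style dates and mail addresses.
PATTERNS = {
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "mail": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def is_a(term, ancestor):
    """Follow the broader-term links upward and report whether `ancestor` is reached."""
    while term in BROADER:
        term = BROADER[term]
        if term == ancestor:
            return True
    return False


def annotate(text):
    """Return (start, end, type, normalized value) tuples sorted by position."""
    found = []
    lowered = text.lower()
    # Synonyms are normalized to their preferred label.
    for surface, concept in SYNONYMS.items():
        for m in re.finditer(r"\b" + re.escape(surface) + r"\b", lowered):
            found.append((m.start(), m.end(), "concept", concept))
    # Hierarchy terms are reported as they appear; is_a() resolves the parent.
    for narrow in BROADER:
        for m in re.finditer(r"\b" + re.escape(narrow) + r"\b", lowered):
            found.append((m.start(), m.end(), "concept", narrow))
    # Pattern matches keep their original surface text.
    for kind, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            found.append((m.start(), m.end(), kind, m.group()))
    return sorted(found)
```

Calling `annotate` on a sentence that mentions "Domoso" yields a concept annotation normalized to "dimethyl sulfoxide", and `is_a("melanoma", "cancer")` resolves the hierarchy link.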
TEXT MINING
Term Detection
Regular Expressions
Rule Engine
Named Entities
Relations
Syntax Detection: Sentences, Tokens, POS-Tags, Chunks, Paragraphs, Sections, Stemming, Decompounding…
RULE ENGINE
1. NAME OF THE MEDICINAL PRODUCT
Desloratadine ratiopharm 5 mg film-coated tablets
Primary Field Name   Secondary Field Name         Field Value
MedicalProductName   coveredText                  Desloratadine ratiopharm 5 mg film-coated tablets
                     inventedPartName             DESLORATADINE
                     strengthPart                 5 mg
                     pharmaceuticalDoseFormPart   FILM-COATED TABLET
Text / Rule / Result
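The extraction shown on this slide can be approximated with a single regular-expression rule. The sketch below is an assumption-laden illustration, not the actual rule engine: the pattern, the list of units, and the function name `parse_product_name` are hypothetical; only the field names and the example text come from the slide.

```python
import re

# Hypothetical rule: <invented name> [further name words] <strength> <dose form>.
PRODUCT_RULE = re.compile(
    r"^(?P<invented>[A-Za-z]+)"                           # first word = invented name
    r"(?:\s+[A-Za-z]+)*?"                                 # optional further words, e.g. company
    r"\s+(?P<strength>\d+(?:[.,]\d+)?\s*(?:mg|g|ml|µg))"  # number plus unit
    r"\s+(?P<form>.+?)s?$"                                # rest = dose form, plural "s" dropped
)


def parse_product_name(text):
    """Split a medicinal product name into the fields shown on the slide."""
    m = PRODUCT_RULE.match(text.strip())
    if m is None:
        return None  # the rule does not apply to this text
    return {
        "coveredText": m.group(0),
        "inventedPartName": m.group("invented").upper(),
        "strengthPart": m.group("strength"),
        "pharmaceuticalDoseFormPart": m.group("form").upper(),
    }


result = parse_product_name("Desloratadine ratiopharm 5 mg film-coated tablets")
```

A production rule set would need many such rules plus terminology lookups for dose forms; one regex is only enough to show the text-to-fields idea.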
SEARCH & NOSQL
Free-text + concept-based search
Text mining integration
Guided navigation / facets
NoSQL functionalities
Multi- & cross-lingual search
Related documents
Based on Apache Solr
• Extended Query Syntax
• JSON-API
• Scalability
…
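Since the search component is based on Apache Solr, a free-text query with facets can be sketched as a plain HTTP request URL. The host, the collection name `patents`, the field name `department`, and the helper `build_solr_query` are assumptions for illustration; `q`, `rows`, `wt`, `facet` and `facet.field` are standard parameters of Solr's select handler. The product's own extended query syntax and JSON API are not shown here.

```python
from urllib.parse import urlencode


def build_solr_query(base_url, collection, text, facet_fields=(), rows=10):
    """Build a URL for Solr's standard /select handler with optional field facets."""
    params = [("q", text), ("rows", rows), ("wt", "json")]
    if facet_fields:
        params.append(("facet", "true"))
        # One facet.field parameter per field used for guided navigation.
        params.extend(("facet.field", field) for field in facet_fields)
    return f"{base_url}/solr/{collection}/select?{urlencode(params)}"


# Hypothetical host, collection and facet field.
url = build_solr_query(
    "http://localhost:8983", "patents", "film coated tablet", ["department"]
)
```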
DOCUMENT CLASSIFICATION
Hotel Reviews
Patents
INFORMATION DISCOVERY
Terminology Management, Text Mining, Search & Analytics, NoSQL, Categorization & Clustering
Delivery / Deployment / Runtime Environment
Integration Tests / Continuous Integration
Extensive Documentation
Common Architecture / Application Design
User & Role Management, Security
Communication Bus
Project Management
PATENT CLASSIFICATION AT EPO
Tender No. 1585
1) Pre-classification of unpublished patents into departments
2) Re-classification of published patents when the category system changes
ABOUT EPO
• The European Patent Office (EPO) grants European patents for the Contracting States to the European Patent Convention
• Second largest intergovernmental institution in Europe
• Not an EU institution
• Self-financing, i.e. revenue from fees covers operating and capital expenditure
NUMBER OF STAFF
Status: December 2008
PATENT APPLICATIONS
http://www.epo.org/about-us/annual-reports-statistics/annual-report/2014.html
COOPERATIVE PATENT CLASSIFICATION
• Patent classification system based on ECLA / IPC
• Jointly developed by the European Patent Office (EPO) and the United States Patent and Trademark Office (USPTO)
• Used by both the EPO and the USPTO since 1 January 2013
• Currently contains about 250,000 classes
EXAMPLE CPC CLASS
GRANTED PATENT
EARLY PATENT
PATENT CLASSIFICATION AT EPO
Tender No. 1585
1) Pre-classification of unpublished patents into departments
Our Motivation:
• Great classification use case
– Big data (80 million patents available)
– Large-scale category system (>250,000 CPC codes)
– Tough constraints on classification quality and response time
• Text mining success story
OLD CLASSIFICATION PROCESS
PATENTS → CLASSIFICATION → DEPARTMENTS
CLASSIFICATION COMPLEXITY
~250,000 CPC codes
~1,500 ranges
250 departments
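The three levels above suggest a routing chain from CPC code to range to department. The slides do not say how codes map to ranges, so the sketch below assumes a longest-prefix lookup. The tables, the range and department identifiers, and the function `route` are all hypothetical.

```python
# Hypothetical lookup tables; the real assignment of codes to ranges is not in the slides.
RANGE_BY_PREFIX = {
    "A61K": "range-0001",
    "A61K31": "range-0002",
    "B60K": "range-0003",
}
DEPARTMENT_BY_RANGE = {
    "range-0001": "dept-A",
    "range-0002": "dept-A",
    "range-0003": "dept-B",
}


def route(cpc_code):
    """Map a CPC code to (range, department) using the longest matching prefix."""
    code = cpc_code.replace(" ", "")
    # Try the full code first, then ever shorter prefixes.
    for end in range(len(code), 0, -1):
        cpc_range = RANGE_BY_PREFIX.get(code[:end])
        if cpc_range is not None:
            return cpc_range, DEPARTMENT_BY_RANGE[cpc_range]
    return None  # no range covers this code
```

With these toy tables, `route("A61K 31/00")` resolves to the more specific range, while `route("A61K 9/20")` falls back to the range registered for the subclass prefix.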
CLASSIFICATION PROCESS
PATENTS → CLASSIFICATION → DEPARTMENTS
NEW CLASSIFICATION PROCESS
PATENTS → CLASSIFICATION → DEPARTMENTS
SOME FACTS
• About 650k training documents from 2005-2013
• Supervised learning: lightweight and fast linear support vector machine
• Training time (16 cores, 128 GB RAM)
– Feature extraction: ~1 hour
– Training of classifiers: ~1 hour
– 90/10 tests with a look-ahead of 3 levels, reporting the 3 best candidates: ~1 hour
• Prediction: 5 docs in 5 sec
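One plausible reading of "look-ahead of 3 levels, reporting the 3 best candidates" is a top-down search that scores whole paths three levels deep instead of committing to the best child at each level. The sketch below follows that reading, which is an assumption. The toy hierarchy excerpt, the score values (standing in for linear SVM decision values), and the function `top_candidates` are hypothetical.

```python
import heapq

# Toy excerpt of a class hierarchy: parent -> children.
TREE = {
    "root": ["A", "B"],
    "A": ["A61", "A01"],
    "B": ["B60"],
    "A61": ["A61K", "A61P"],
    "A01": ["A01B"],
    "B60": ["B60K"],
}

# Hypothetical per-node classifier scores for one document.
SCORES = {
    "A": 0.2, "B": 0.5,
    "A61": 0.9, "A01": 0.1, "B60": 0.1,
    "A61K": 0.8, "A61P": 0.7, "A01B": 0.3, "B60K": 0.4,
}


def top_candidates(scores, tree=TREE, root="root", look_ahead=3, k=3):
    """Score every path up to `look_ahead` levels below `root`; return the k best ends."""
    frontier = [(0.0, [root])]
    for _ in range(look_ahead):
        expanded = []
        for total, path in frontier:
            children = tree.get(path[-1], [])
            if not children:
                expanded.append((total, path))  # path already ended; keep it
            for child in children:
                expanded.append((total + scores.get(child, 0.0), path + [child]))
        frontier = expanded
    best = heapq.nlargest(k, frontier, key=lambda item: item[0])
    return [(path[-1], round(total, 3)) for total, path in best]


candidates = top_candidates(SCORES)
```

In this toy data a greedy descent would pick section B first (0.5 beats 0.2) and end at B60K, whereas the look-ahead ranks A61K and A61P above it because their full paths score higher.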
HIERARCHICAL CLASSIFICATION
STATUS & OUTLOOK
✓ Range-specific quality evaluation
✓ Going live with best ranges
• Continuous optimization
PATENT CLASSIFICATION AT EPO
Tender No. 1585
2) Re-classification of published patents when the category system changes
Challenges and Facts:
– 250,000 CPC codes, regular changes/refinements
– Several re-classification projects at any one time, with great variation in size; a class is split into 5-20(?) subclasses
– No training material available
NEW RE-CLASSIFICATION PROCESS
Training Data
• A human annotator starts by labeling about 20% of the documents with the new subclasses
Statistical Models
• are generated on the fly, and
• cross-validation tests are carried out
Threshold
• If cross-validation reaches a certain threshold (e.g. 90%), the remaining documents are classified fully automatically without further review
• Otherwise, more training data is generated
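The loop described above (label a share, cross-validate, either auto-classify the rest or label more) can be sketched as follows. This is not the deployed system: a nearest-centroid classifier stands in for the statistical model to keep the example dependency-free, documents are assumed to be numeric feature vectors given in arbitrary order, and the 10% step size and all function names are assumptions.

```python
def centroid_fit(X, y):
    """Nearest-centroid model: one mean vector per label."""
    sums, counts = {}, {}
    for vec, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def centroid_predict(model, vec):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def dist(center):
        return sum((a - b) ** 2 for a, b in zip(vec, center))
    return min(model, key=lambda lab: dist(model[lab]))


def cross_val_accuracy(X, y, folds=5):
    """k-fold cross-validation accuracy on the labeled documents."""
    pairs = list(zip(X, y))
    correct = 0
    for f in range(folds):
        train = [p for i, p in enumerate(pairs) if i % folds != f]
        test = [p for i, p in enumerate(pairs) if i % folds == f]
        model = centroid_fit(*zip(*train))
        correct += sum(centroid_predict(model, x) == lab for x, lab in test)
    return correct / len(pairs)


def reclassify(docs, annotate, start=0.2, step=0.1, threshold=0.9):
    """Label a growing share by hand until cross-validation reaches `threshold`."""
    fraction = start
    while True:
        n = min(len(docs), max(2, int(len(docs) * fraction)))
        X = docs[:n]
        y = [annotate(doc) for doc in X]        # human annotation step
        accuracy = cross_val_accuracy(X, y)
        if accuracy >= threshold or n == len(docs):
            model = centroid_fit(X, y)
            # Remaining documents are classified without further review.
            auto = {i: centroid_predict(model, docs[i]) for i in range(n, len(docs))}
            manual = dict(zip(range(n), y))
            return manual, auto, accuracy
        fraction += step                        # otherwise: label more documents
```

`annotate` plays the role of the human annotator here; in practice the labeled share would be sampled rather than taken from the front of the list.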
STATUS & OUTLOOK
✓ Currently in evaluation phase
• Going live in the next weeks
…NOT ONLY PATENTS
Solutions: Libraries, Pharma, Patents, Healthcare, Social Media, Automotive
Terminology Management, Text Mining, Search & Analytics, NoSQL, Categorization & Clustering
For further questions, please contact:
David Baehrens
 + 49 (0)761 203 97690
 info@averbis.com

 
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
 
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
 
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
 
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
 
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
 
Kings of Saudi Arabia, information about them
Kings of Saudi Arabia, information about themKings of Saudi Arabia, information about them
Kings of Saudi Arabia, information about them
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 
20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf
 
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham Ware
 
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
 
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
 
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptxRESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt
 
Computer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdfComputer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdf
 

David Baehrens: Large-Scale Patent Classification at the European Patent Office

  • 1. David Baehrens Large-Scale Patent Classification at the European Patent Office
  • 2. ABOUT AVERBIS Founded: 2007 Location: Freiburg im Breisgau Team: Domain & IT-Experts Focus: Leverage structured & unstructured information Current Sectors: Pharma, Health, Automotive, Publishers & Libraries
  • 3. PORTFOLIO Solutions: Libraries, Pharma, Patents, Healthcare, Social Media, Automotive. Components: Terminology Management, Text Mining, Search & Analytics, NoSQL, Categorization & Clustering
  • 4. TERMINOLOGY MANAGEMENT Terminology management software Provision of terminologies Mappings between terminologies Building terminology-based applications
  • 5. TEXT MINING Term Detection: synonyms (dimethyl sulfoxide, dimethylsulfoxide, Domoso, Infiltrina) and hierarchies (cancer, carcinoma, melanoma, lymphoma, glioblastoma…). Regular Expressions: patterns such as dates, citations, mail addresses… Rule Engine: rule-based extraction of all different kinds of complex information. Named Entities: persons, locations, genes, … Relations: co-occurrences and typed relations, e.g. Genes / Diseases / Modification Type. Syntax Detection: sentences, tokens, POS tags, chunks, paragraphs, sections, stemming, decompounding…
  • 6. RULE ENGINE Text, Rule, Result. Text: "1. NAME OF THE MEDICINAL PRODUCT Desloratadine ratiopharm 5 mg film-coated tablets". Result (Primary Field Name / Secondary Field Name / Field Value): MedicalProductName / coveredText / Desloratadine ratiopharm 5 mg film-coated tablets; MedicalProductName / inventedPartName / DESLORATADINE; MedicalProductName / strengthPart / 5 mg; MedicalProductName / pharmaceuticalDoseFormPart / FILM-COATED TABLET
  • 7. SEARCH & NOSQL Free-text + concept-based search Text mining integration Guided navigation / facets NoSQL functionalities Multilingual & cross-lingual search Related documents Based on Apache Solr • Extended Query Syntax • JSON-API • Scalability …
  • 10. INFORMATION DISCOVERY Terminology Management Text Mining Search & Analytics NoSQL Categorization & Clustering Delivery / Deployment / Runtime Environment Integration Tests / Continuous Integration Extensive Documentation Common Architecture / Application Design User & Role Management, Security Communication Bus Project Management
  • 11. PATENT CLASSIFICATION AT EPO Tender No. 1585 1) Pre-classification of unpublished patents into departments 2) Re-classification of published patents if the category system changes
  • 12. ABOUT EPO • The European Patent Office (EPO) grants European patents for the Contracting States to the European Patent Convention • Second largest intergovernmental institution in Europe • Not an EU institution • Self-financing, i.e. revenue from fees covers operating and capital expenditure
  • 13. NUMBER OF STAFF Status: December 2008
  • 16. COOPERATIVE PATENT CLASSIFICATION • Patent classification system based on ECLA / IPC • jointly developed by the European Patent Office (EPO) and the United States Patent and Trademark Office (USPTO) • used by both the EPO and USPTO since 1 January 2013 • currently contains about 250,000 classes
  • 22. PATENT CLASSIFICATION AT EPO Tender No. 1585 1) Pre-classification of unpublished patents into departments Our Motivation: • Great classification use case – Big data (80 million patents available) – Large-scale category system (>250,000 CPC codes) – Tough classification quality and response time constraints • Text mining success story
  • 23. OLD CLASSIFICATION PROCESS PATENTS CLASSIFICATION DEPARTMENTS
  • 25. CLASSIFICATION PROCESS PATENTS CLASSIFICATION DEPARTMENTS
  • 26. NEW CLASSIFICATION PROCESS PATENTS CLASSIFICATION DEPARTMENTS
  • 27. SOME FACTS • about 650k training documents from 2005-2013 • supervised learning: lightweight and fast linear support vector machine • Training time (16 cores, 128 GB RAM) – Feature extraction: ~1 hour – Training of classifiers: ~1 hour – 90/10 tests with a look-ahead of 3 levels and reporting the 3 best candidates: ~1 hour • Prediction: 5 docs in 5 sec
  • 29. STATUS & OUTLOOK ✓ Range-specific quality evaluation ✓ Going live with best ranges • Continuous optimization
  • 30. PATENT CLASSIFICATION AT EPO Tender No. 1585 1) Re-classification of published patents if the category system changes Challenges and Facts: – 250,000 CPC codes, regular changes/refinements – Several re-classification projects at any one time, with great variation in size; a class is split into 5-20(?) subclasses – No training material available
  • 31. NEW RE-CLASSIFICATION PROCESS Training Data: a human annotator starts by labeling about 20% of the documents with the new subclasses. Statistical Models: models are generated on the fly and cross-validation tests are carried out. Threshold: if cross-validation reaches a certain threshold (e.g. 90%), the remaining documents are classified fully automatically without further review; otherwise, more training data is generated.
  • 32. STATUS & OUTLOOK ✓ Currently in evaluation phase • Going live in the next weeks
  • 33. …NOT ONLY PATENTS Solutions: Libraries, Pharma, Patents, Healthcare, Social Media, Automotive. Components: Terminology Management, Text Mining, Search & Analytics, NoSQL, Categorization & Clustering
  • 34. For further questions, please contact: David Baehrens, Phone: +49 (0)761 203 97690, Email: info@averbis.com
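
The re-classification loop described on slide 31 (label a share of the documents, cross-validate, then either classify the rest automatically or label more) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration, not the Averbis implementation: the function names, the bag-of-words nearest-centroid classifier standing in for the unspecified "statistical models", the interleaved fold assignment, and the choice to add another 20% of the documents per round when the threshold is missed.

```python
from collections import Counter, defaultdict


def train(docs, labels):
    """Sum the term counts of all documents per subclass (one centroid each)."""
    centroids = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        centroids[label].update(doc.split())
    return centroids


def predict(centroids, doc):
    """Return the subclass whose normalised centroid overlaps most with doc."""
    words = Counter(doc.split())

    def score(label):
        centroid = centroids[label]
        total = sum(centroid.values()) or 1
        return sum(n * centroid[w] for w, n in words.items()) / total

    # sorted() makes tie-breaking deterministic.
    return max(sorted(centroids), key=score)


def cross_val_accuracy(docs, labels, folds=5):
    """k-fold cross-validation accuracy; document i goes to fold i % folds."""
    if len(docs) < 2:
        return 0.0
    folds = min(folds, len(docs))
    correct = 0
    for f in range(folds):
        train_idx = [i for i in range(len(docs)) if i % folds != f]
        test_idx = [i for i in range(len(docs)) if i % folds == f]
        model = train([docs[i] for i in train_idx],
                      [labels[i] for i in train_idx])
        correct += sum(predict(model, docs[i]) == labels[i] for i in test_idx)
    return correct / len(docs)


def reclassify(docs, ask_annotator, threshold=0.9, step=0.2):
    """Label in rounds until cross-validation reaches the threshold.

    Returns (index -> subclass for every document, number of human labels).
    """
    labelled = {}          # index -> subclass assigned by the human annotator
    texts, labels = [], []
    while len(labelled) < len(docs):
        # Hypothetical policy: label another `step` share of all documents.
        budget = max(1, round(step * len(docs)))
        for i in range(len(docs)):
            if budget and i not in labelled:
                labelled[i] = ask_annotator(docs[i])
                budget -= 1
        idx = sorted(labelled)
        texts = [docs[i] for i in idx]
        labels = [labelled[i] for i in idx]
        if cross_val_accuracy(texts, labels) >= threshold:
            break          # good enough: classify the rest automatically
    model = train(texts, labels)
    result = dict(labelled)
    for i, doc in enumerate(docs):
        if i not in result:
            result[i] = predict(model, doc)
    return result, len(labelled)
```

Calling `reclassify(docs, ask_annotator)` with a list of document strings and a callback that returns the new subclass for one document yields the final assignment together with the number of documents a human had to label. In practice the first batch would be sampled rather than taken in input order, and the stand-in classifier would be replaced by whatever model the project actually uses.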