“Extra” by Jeremy Brooks https://flic.kr/p/4aKH3c
An Update on EXTRA
Stuart Myles * Associated Press * 24th October 2016
© 2016 IPTC (www.iptc.org) All rights reserved
https://flic.kr/p/HMQ514
EXTRA
EXTraction Rules Apparatus
Rules-based classification of text
Open source software
EXTRA is being developed by the IPTC
Grant from the Digital News Initiative
https://iptc.github.io/extra/
Google DNI
• Google’s €150 million Digital News Initiative fund
– Stimulate innovation among European news organizations
– https://www.digitalnewsinitiative.com/fund/
• Multiple funding rounds
– First funding of €27 million to projects in 23 countries
– http://googlepolicyeurope.blogspot.gr/2016/02/digital-news-initiative-first-funding_24.html
• IPTC’s EXTRA project funded in first round - October 2015
– Developer to create the engine ~ €35,000
– Linguists to develop sample rules ~ €14,000
– Marketing to promote the work ~ €1,000
– Total grant to IPTC from DNI = €50,000
EXTRA
EXTraction Rules Apparatus
• Open source
– IPTC always uses open licenses – in this case, the MIT license
• Rules-based
– Better for breaking news than statistical methods
– More consistent and scalable than hand tagging
– Easier to explain why rules classify content
• Multilingual
– Developing rules for two IPTC Media Topics Languages
• News classification
– Rules will be developed using news content corpora
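
To make the rules-based idea concrete, here is a minimal sketch in Python. It is not the EXTRA engine or its rule language, which had not yet been written at the time of these slides. The `Rule` fields (`any_of`, `all_of`, `none_of`), the `classify` function and the topic labels are all hypothetical names chosen for illustration. The point it shows is the explainability claim above: a rule that fires can report exactly which terms triggered it.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Rule:
    # Hypothetical rule shape, not the EXTRA rule language.
    topic: str                                   # illustrative label, not a real Media Topics code
    any_of: list = field(default_factory=list)   # at least one of these terms must appear
    all_of: list = field(default_factory=list)   # every one of these terms must appear
    none_of: list = field(default_factory=list)  # none of these terms may appear


def tokenize(text):
    """Lower-case the text and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def classify(text, rules):
    """Return {topic: sorted list of terms that made the rule fire}."""
    tokens = tokenize(text)
    results = {}
    for rule in rules:
        if any(term in tokens for term in rule.none_of):
            continue  # an excluded term is present
        if not all(term in tokens for term in rule.all_of):
            continue  # a required term is missing
        hits = [term for term in rule.any_of if term in tokens]
        if rule.any_of and not hits:
            continue  # none of the optional terms matched
        results[rule.topic] = sorted(set(rule.all_of) | set(hits))
    return results


# Two made-up sample rules.
RULES = [
    Rule(topic="sport/football",
         any_of=["goal", "striker", "penalty"],
         none_of=["election"]),
    Rule(topic="politics/election",
         all_of=["election"],
         any_of=["vote", "ballot", "poll"]),
]

sport = classify("The striker scored a late goal from a penalty.", RULES)
politics = classify("Voters cast a ballot in the election.", RULES)
print(sport)
print(politics)
```

Because each result carries the terms that matched, an editor can see why a story was tagged, which is harder to recover from a statistical model.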
EXTRA Requirements
Weekly teleconferences and emails to document requirements
https://iptc.org/events/
https://groups.yahoo.com/neo/groups/iptc-extra/info
https://goo.gl/EY4pMP
– Use Cases
– Performance
– Internationalization and Character Encoding
– Rule Language Operators and Functions
– Input and Output Formats
– Hit and miss highlighting
– Relevance
– Machine Learning
– Sample Rules
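
As one possible reading of the "Hit and miss highlighting" requirement, the sketch below marks the rule terms found in a text and lists the ones that were not found. The function name `highlight` and the `[[...]]` marker syntax are assumptions made for this example. They are not taken from the EXTRA requirements document.

```python
import re


def highlight(text, terms):
    """Wrap each matched term in [[...]] and report hits and misses.

    Matching is case-insensitive and on whole words only.
    """
    hits, misses = [], []
    marked = text
    for term in terms:
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        if pattern.search(marked):
            hits.append(term)
            marked = pattern.sub(lambda m: "[[" + m.group(0) + "]]", marked)
        else:
            misses.append(term)
    return marked, hits, misses


marked, hits, misses = highlight(
    "Striker scores late goal", ["goal", "striker", "penalty"]
)
print(marked)
print("hits:", hits, "misses:", misses)
```

Showing the misses alongside the hits helps a rule writer see why a rule did not fire on a given story.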
Seeking Developers
Know anyone who might be qualified to develop EXTRA?
Send them our way
https://goo.gl/nUGrGT
Qualifications?
Proposed technical approach?
Particular frameworks/languages/tools?
Apache UIMA Ruta
UIMA - Unstructured Information Management Architecture
Ruta - Rule-based Text Annotation – consists of two parts:
1. Analysis Engine for executing the rules
2. Eclipse-based rule-writing workbench
https://uima.apache.org/ruta.html
Has many – but not all – of the features we require for EXTRA
UIMA has a reputation for a steep learning curve
Apache License 2.0 (the ASF license) is slightly more restrictive than the MIT License
Rules and News
Securing news corpora in two+ Media Topics languages
• English from Thomson Reuters
• German from APA
• French from AFP
• English+ from Signal http://research.signalmedia.co/
• Agreeing on licensing remains the stumbling block
How Can You Get Involved?
In order of increasing effort (and potential reward):
1. Join the (low frequency) email list to stay up-to-date
https://groups.yahoo.com/neo/groups/iptc-extra/info
2. Suggest to someone they should apply to develop EXTRA
https://goo.gl/nUGrGT
3. Read and comment on the requirements
https://goo.gl/EY4pMP
4. Join the weekly teleconferences
https://iptc.org/events/
Date and Place of Next Meeting
London, UK 15 – 17 May 2017
https://flic.kr/p/suXCVH
Thank you and goodbye!
