Unlocking the Data
in BBC News
ISKO Conference July 8th 2013
www.bbc.co.uk/news
moving to linked data
• moving from static HTML to dynamic,
responsive site
• introducing linked data to power content
aggregations around related topics
• starting to embed linked open data in every
page as RDFa
• using the IPTC rNews vocabulary to
describe content in a machine-readable way
impact on journalists
• annotating (“tagging”) content
with topics
• tool embedded into existing
CMS
• concept extraction/NLP for
topic suggestion
• journalists accept/reject
suggested topics for
annotation
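The suggest-then-confirm workflow above can be sketched roughly as follows. This is only an illustration: the in-memory `TOPICS` dictionary, the `suggest_topics` and `annotate` names, and the example.org URIs are all hypothetical, and simple phrase matching stands in for real concept extraction.

```python
import re

# Hypothetical data dictionary: label -> topic URI (illustrative values only).
TOPICS = {
    "Andy Murray": "http://example.org/things/andy-murray",
    "Wimbledon": "http://example.org/things/wimbledon",
    "Cheshire": "http://example.org/things/cheshire",
}


def suggest_topics(text, topics=TOPICS):
    """Return, sorted, the topic labels that appear as whole phrases in the text."""
    found = []
    for label in topics:
        pattern = r"\b" + re.escape(label) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(label)
    return sorted(found)


def annotate(suggestions, accepted, topics=TOPICS):
    """Keep only the suggestions the journalist accepted; return their URIs."""
    return [topics[label] for label in suggestions if label in accepted]


article = "Andy Murray wins at Wimbledon after a tense final."
suggested = suggest_topics(article)                     # tool proposes topics
tags = annotate(suggested, accepted={"Andy Murray"})    # journalist accepts one
print(suggested)
print(tags)
```

The point of the split is that the journalist never types a tag: the tool proposes, and the only human action is accept or reject.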
pilot - local indexes
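A rough sketch of how a local index could be assembled from tagged articles: select every article that is about, or mentions, a place belonging to the region. The region identifiers, the place-to-region mapping, the article records and the `articles_for_region` name are all hypothetical; the real pilot ran against the triple store rather than Python dictionaries.

```python
# Hypothetical mapping of places to a news region (illustrative values only).
PLACE_TO_REGION = {
    "Birmingham": "region-a",
    "Wolverhampton": "region-a",
    "Chester": "region-b",
}

# Hypothetical tagged articles: each lists the places it is about or mentions.
ARTICLES = [
    {"uri": "a1", "about": ["Birmingham"], "mentions": []},
    {"uri": "a2", "about": [], "mentions": ["Wolverhampton"]},
    {"uri": "a3", "about": ["Chester"], "mentions": ["Birmingham"]},
    {"uri": "a4", "about": ["London"], "mentions": []},
]


def articles_for_region(region, articles=ARTICLES, place_to_region=PLACE_TO_REGION):
    """Return URIs of articles about or mentioning any place in the region."""
    selected = []
    for article in articles:
        places = article["about"] + article["mentions"]
        if any(place_to_region.get(place) == region for place in places):
            selected.append(article["uri"])
    return selected


print(articles_for_region("region-a"))
print(articles_for_region("region-b"))
```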
learning from the pilot
• generally - it works
• but duplication for
big events
• also need pinning
• concept extraction
poor
• journalists gaming
the system
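One way the pinning requirement could be met is to let an editor name the articles that must lead the index and fall back to newest-first for everything else. This is a sketch under assumptions: the `running_order` function, the `pinned_uris` parameter and the article records are hypothetical and not taken from the pilot.

```python
from datetime import datetime


def running_order(articles, pinned_uris=()):
    """Pinned articles first (in the order given), then the rest newest first."""
    by_uri = {a["uri"]: a for a in articles}
    pinned = [by_uri[uri] for uri in pinned_uris if uri in by_uri]
    pinned_set = {a["uri"] for a in pinned}
    rest = sorted(
        (a for a in articles if a["uri"] not in pinned_set),
        key=lambda a: a["published"],
        reverse=True,
    )
    return [a["uri"] for a in pinned + rest]


# Hypothetical index contents.
index = [
    {"uri": "a1", "published": datetime(2013, 5, 1, 9, 0)},
    {"uri": "a2", "published": datetime(2013, 5, 1, 12, 0)},
    {"uri": "a3", "published": datetime(2013, 5, 1, 10, 0)},
]

print(running_order(index))                      # default: newest first
print(running_order(index, pinned_uris=["a1"]))  # a1 pinned to the top
```

An explicit pin removes the incentive to republish an article just to push it back to the top of a chronological list.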
corenews model
pilot - publishing RDFa
• using RDFa + rNews to embed machine-
readable metadata in article source code
• discoverability: rich snippets + better
ranking
• publish Linked Open Data:
<articleURI> rdf:type rnews:Article
<articleURI> rnews:about <thingURI>
etc...
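The two statements above can be produced mechanically from an article URI and its list of tagged things. The sketch below uses only the standard library and emits them both as N-Triples and as a small RDFa fragment. The function names and example.org URIs are hypothetical, the rNews namespace URI is assumed to be the IPTC's published one, and the markup is an illustration rather than the BBC's actual page template.

```python
from html import escape

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Assumed rNews namespace; check the IPTC documentation before relying on it.
RNEWS = "http://iptc.org/std/rNews/2011-10-07#"


def article_triples(article_uri, thing_uris):
    """Return (subject, predicate, object) triples mirroring the slide."""
    triples = [(article_uri, RDF_TYPE, RNEWS + "Article")]
    for thing in thing_uris:
        triples.append((article_uri, RNEWS + "about", thing))
    return triples


def to_ntriples(triples):
    """Serialise URI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


def to_rdfa(article_uri, thing_uris):
    """Render an RDFa fragment carrying the same statements."""
    links = "\n".join(
        f'  <link property="rnews:about" href="{escape(t, quote=True)}" />'
        for t in thing_uris
    )
    return (
        f'<div prefix="rnews: {RNEWS}" '
        f'about="{escape(article_uri, quote=True)}" typeof="rnews:Article">\n'
        f"{links}\n</div>"
    )


article = "http://example.org/news/article-1"   # hypothetical article URI
things = ["http://example.org/things/cheshire"]  # hypothetical thing URI
print(to_ntriples(article_triples(article, things)))
print(to_rdfa(article, things))
```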
learning from the pilot
learning from the pilot
next steps
• rolling out tagging to journalists throughout
BBC News
• making better use of rNews/RDFa - full
mark-up integration
• piloting the use of organising content by
storylines
more info
• http://www.bbc.co.uk/blogs/internet/posts/News-L
• http://www.bbc.co.uk/ontologies/news/2013-05-01.shtml
• jeremy.tarling@bbc.co.uk
• twitter: @jeremytarling
BBC News Labs
At ISKO
BBC News Labs
• Explore opportunities for BBC News
• Using real data
• Prototype quickly
• …which is normally hard in big Orgs…
Unlocking the Data in BBC News
• All we have is a bunch of articles...
• What does a “tagged” world look like?
• The Juicer does [badly] what Journalists will do
The News Juicer
1. Grab BBC News & Sport Articles
2. Extract Concepts
3. Match to DBpedia
4. Annotate Article
5. Push to Triplestore
6. Expose via API
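The six Juicer steps can be sketched end to end as below. This is not the Juicer's code: every function name is hypothetical, a small dictionary stands in for the DBpedia lookup, the articles are hard-coded, the "triplestore" is a plain list, and the "API" is a single query function.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for a DBpedia lookup: label -> DBpedia resource URI.
DBPEDIA = {
    "Andy Murray": "http://dbpedia.org/resource/Andy_Murray",
    "Cheshire": "http://dbpedia.org/resource/Cheshire",
}


@dataclass
class Article:
    uri: str
    text: str
    concepts: list = field(default_factory=list)


def grab_articles():
    """Step 1: fetch articles (hard-coded here instead of calling a real feed)."""
    return [
        Article("http://example.org/news/1", "Andy Murray trains in Cheshire."),
        Article("http://example.org/news/2", "Weather warning issued for Cheshire."),
    ]


def extract_concepts(article):
    """Step 2: naive extraction, matching known labels in the text."""
    return [label for label in DBPEDIA if label in article.text]


def match_to_dbpedia(labels):
    """Step 3: map each extracted label to a DBpedia URI."""
    return [DBPEDIA[label] for label in labels]


def annotate(article, uris):
    """Step 4: attach the matched URIs to the article."""
    article.concepts = uris
    return article


def push_to_triplestore(store, article):
    """Step 5: record (article, 'about', concept) triples in a plain list."""
    for uri in article.concepts:
        store.append((article.uri, "about", uri))


def articles_about(store, concept_uri):
    """Step 6: the 'API', returning article URIs annotated with a concept."""
    return [s for s, p, o in store if p == "about" and o == concept_uri]


store = []
for art in grab_articles():
    uris = match_to_dbpedia(extract_concepts(art))
    push_to_triplestore(store, annotate(art, uris))

print(articles_about(store, "http://dbpedia.org/resource/Cheshire"))
```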
Demo
• Juicer : http://staging.juicer.bbcnewslabs.co.uk/
• Person : http://staging.juicer.bbcnewslabs.co.uk/demo/person?q=Andy_Murray
• Place : http://staging.juicer.bbcnewslabs.co.uk/demo/place?q=Cheshire
• News Near Me : http://newsnearme2.herokuapp.com/
Next
• “Juice” more of BBC Archive
• Build prototypes
• See what works
• Storyline : News Org Partnerships
More info
• http://www.bbc.co.uk/blogs/internet/posts/BBC-News-Lab
• Matt.shearer@bbc.co.uk
• twitter: @completedespair
• @BBC_News_Labs
In case network blows up

BBC News Labs at ISKO Conference, UCL, London - July 2013


Editor's Notes

  1. UK's most popular news website: 6 million unique browsers every day (3rd biggest site in the UK after Google and Facebook); publishes around 500 articles every day (local, national, global); publishes in 27 languages as World Service (+ 2 UK languages alongside English); hundreds of journalists, many working cross-media (TV/radio/online).
  2. Articles are created in a home-grown Content Management System; flat page publishing via FTP is good for high-load events but limits our UX and data potential; we are migrating to a dynamic publishing platform with a typical three-tier architecture (presentation, service, data); the data layer is a content store (MarkLogic) plus a triple store (BigOWLIM) that holds annotations made by journalists about content in the content store.
  3. Need to minimize impact on journalists: integrate with existing tools and workflow as much as possible; tagging rather than semantic annotation; suggest concepts rather than free-hand annotation; use Sheffield University's GATE framework for Natural Language Processing to identify the 'things' in an article; use the concepts in the triple store as a data dictionary; journalists should mostly just have to accept or reject tags.
  4. Pilot: can we automate the production of the 58 local news region sub-index pages (old transmitter locations)? Maintaining these pages is currently an entirely manual task. GET articles about or mentioning places that fall within the BBC News region.
  5. Generally worked well: journalists' tagging did not cause too much disruption, and we were able to generate aggregations of topic by concept. BUT we saw some problems: duplication where multiple articles were written about large events; journalists wanted the ability to set the running order (it defaults to chronological, most recent first); quality of concept extraction was poor (may improve over time?); journalists gaming the system by adding tags to get on specific indexes and republishing to effect pinning.
  6. A simple ontology for people, organisations, places and intangibles (themes) and their intersection with events; based on rNews, the Event ontology and PA's SNaP Stuff ontology; annotate articles with events, where the event:place is Birmingham etc.
  7. IPTC rNews terms in RDFa; basic publishing metadata in the <head> for rich snippets; linked open data in the body.
  8. Immediate results: rich snippets for articles; apparently better ranking by topic (anecdotal).
  9. We introduced the change in the first week of May; by the end of May we were seeing some positive press coverage, and people were noticing.