
Semantic Technology in Publishing & Finance


Triplestores and inference, applications in Finance, text-mining. Projects and solutions for financial media and publishers.
Keystone Industrial Panel, ISWC 2014, Riva del Garda, 18 Oct 2014.
Thanks to Atanas Kiryakov for this presentation, I just cut it to size.


  1. Semantic Technology in Publishing & Finance
     Triplestores and inference, applications in Finance, the GraphDB engine, text-mining, projects and solutions for financial media and publishers
     Keystone Industrial Panel, ISWC 2014, Riva del Garda, 18 Oct 2014
     Footer: Semantic Technology in Publishing & Finance, Oct 2014
  2. Outline
     • Introduction to Ontotext
     • Clients, cases
     • Text Mining, Media and Publishing Solution
     • SemTech applications in Finance
     • Wrap-up
  3. Ontotext
     • Information management company providing text analysis, data management and state-of-the-art semantic technology
     • 75 employees, headquartered in Sofia, Bulgaria
     • Sales presence in London, Washington DC, and Boston
     • Clients include BBC, AstraZeneca, US DoD, OUP, Wiley, Getty…
     • Over 400 person-years in R&D to create a one-stop shop for:
       – Content enrichment
       – Data management
       – Graph database engine
     • Open and standards-compliant technology:
       – RDF(S), OWL, GATE, Sesame
  4. Interlinking Text and Data
  5. Semantic Annotation
     [Figure: semantic annotation of pmid:17714090 (Clinical and experimental pharmacology …); labels: Bronchial Diseases, Respiration Disorders, COPD, Chronic Obstructive Airway Diseases, Asthma, Ian A Yang; identifiers: umls:C0035204, umls:C0006261, umls:C000496]
  6. Semantic Annotation
     Semantic Annotation goes far beyond tagging. It allows search using enrichment, linking and rules to return explicit and implicit results – complete intelligence.
     Content Enrichment
     • Text Mining & Classification
     • Curation
     • Quality Monitoring
     Data Management
     • Ontologies and Semantic Annotation
     • Web mining
     • Identity Resolution
     Graph Database
     • Standards Based
     • 24-7 Resiliency
     • Hybrid Semantic Queries & Search
  7. What is RDF Good for?
     • Metadata-based content management
       – Metadata represents a re-usable result of content analytics
       – It can be repurposed, allowing for a wide range of applications
       – Most search engines do analytics, but the results are not explicit, so they cannot be validated, refined and used by other applications
     • Linking text and structured data
       – Allows structured, uniform and efficient access to diverse domain models, taxonomies, dictionaries, reference databases
     • Reference data management
       – E.g. product catalogs and taxonomies that are too structured to be managed with NoSQL, but too diverse and interconnected for SQL
     • Using linked open data (LOD)
       – A growing amount of diverse public data can be used in enterprise Knowledge Management applications
  8. LOD: Growing Exponentially
     Linked Data Datasets (chart values):
     Year      2007  2008  2009  2010  2011  2012  2013
     Datasets    27    43    89   162   295   822  2,289
     • July 2013 stats: 2,289 datasets (http://stats.lod2.eu/)
     • Growing exponentially (see the dotted trend line)
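The "growing exponentially" claim can be checked against the chart values with a few lines of arithmetic. The sketch below is only an illustration: it assumes the seven numbers transcribed from the slide are the yearly dataset counts, and it compares year-over-year factors, the average annual factor, and a least-squares fit on the logarithm of the counts.

```python
import math

# Dataset counts per year, as transcribed from the slide's chart (assumed pairing).
datasets = {2007: 27, 2008: 43, 2009: 89, 2010: 162, 2011: 295, 2012: 822, 2013: 2289}
years = sorted(datasets)

# Year-over-year growth factors; factors that stay well above 1 point to exponential growth.
factors = {y: datasets[y] / datasets[y - 1] for y in years[1:]}

# Average annual growth factor over the whole period (geometric mean).
span = years[-1] - years[0]
avg_factor = (datasets[years[-1]] / datasets[years[0]]) ** (1 / span)

# Least-squares slope of log(count) against year: a straight line in log space
# corresponds to exponential growth, and exp(slope) is the fitted annual factor.
n = len(years)
mean_x = sum(years) / n
mean_y = sum(math.log(datasets[y]) for y in years) / n
slope = sum((y - mean_x) * (math.log(datasets[y]) - mean_y) for y in years) / sum(
    (y - mean_x) ** 2 for y in years
)
fitted_factor = math.exp(slope)

for y, f in factors.items():
    print(f"{y - 1} -> {y}: x{f:.2f}")
print(f"average annual factor: {avg_factor:.2f}")
print(f"fitted annual factor:  {fitted_factor:.2f}")
```

Both summary figures come out at roughly a doubling per year, which is consistent with the trend line the slide mentions.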
  9. How Does Inference Help?
     • Intelligent mapping of queries to data
       – This matters a lot when an application has to query a dataset combined from 10+ sources, which evolve independently
       – There is no way an application developer can stay on top of all schemata and all datasets, all the time
     • Finding patterns and inferring new relationships
       – Think of someone constantly looking for patterns that elicit new relations, which can match patterns that elicit other relations …
       – Or someone who goes deeper and deeper into finding new ways to rewrite a query, over and over again, until all alternatives are exhausted
     • Get deeper and more complete results
     • Cheaper data integration, easier querying
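A minimal sketch of what "intelligent mapping of queries to data" can mean in practice: a query for a general class also returns instances that were only ever typed with a more specific class. This is not GraphDB's implementation; it is a toy forward-chaining loop over two RDFS-style rules, and all `ex:` names are hypothetical.

```python
# Explicit facts as (subject, predicate, object) triples. All ex: names are made up.
facts = {
    ("ex:Bank", "rdfs:subClassOf", "ex:FinancialInstitution"),
    ("ex:FinancialInstitution", "rdfs:subClassOf", "ex:Organization"),
    ("ex:AcmeBank", "rdf:type", "ex:Bank"),
    ("ex:ExamplePress", "rdf:type", "ex:Organization"),
}


def infer_types(triples):
    """Forward-chain two RDFS-style rules until no new triple appears."""
    closed = set(triples)
    while True:
        new = set()
        for s, p, o in closed:
            for s2, p2, o2 in closed:
                if o != s2 or p2 != "rdfs:subClassOf":
                    continue
                if p == "rdfs:subClassOf":
                    # Rule 1: subClassOf is transitive.
                    new.add((s, "rdfs:subClassOf", o2))
                elif p == "rdf:type":
                    # Rule 2: an instance of a class is an instance of its superclasses.
                    new.add((s, "rdf:type", o2))
        if new <= closed:
            return closed
        closed |= new


def instances_of(triples, cls):
    """Return the sorted subjects typed with the given class."""
    return sorted(s for s, p, o in triples if p == "rdf:type" and o == cls)


print(instances_of(facts, "ex:Organization"))               # explicit facts only
print(instances_of(infer_types(facts), "ex:Organization"))  # with inferred types
```

Without inference the query sees only the entity typed directly as an organization; with it, the bank is found too, and the application never needs to know that the `ex:Bank` class exists.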
  10. Ontotext Technology Portfolio
  11. Outline
      • Introduction to Ontotext
      • Clients, cases
      • Text Mining, Media and Publishing Solution
      • SemTech applications in Finance
      • Wrap-up
  12. BBC: The Perfect Application
      Since 2000, semantic technology has been striving for:
      • Pertinent applications, a really good use case
      • Real high-profile projects to prove its maturity
      The “Dynamic Semantic Publishing” architecture implemented by the BBC for its FIFA World Cup 2010 website filled this gap! It demonstrates:
      • How an RDF database serves well where RDBMSs fail
      • How text-mining and triplestores complement one another
      • How inference adds value at a decent scale
      • 24/7 live operation that cannot work without a functional triplestore
  13. Ontotext and BBC
      Profile
      • Mass media broadcaster founded in 1922
      • 23,000 employees and over 5 billion pounds in annual revenue
      Goals
      • Create a dynamic semantic publishing platform that assembles web pages on the fly using a variety of data sources
      • Deliver highly relevant data to website visitors with sub-second response
      Challenges
      • BBC journalists author and publish content, which is then statically rendered. The costs and time to do this were high.
      • Diverse content was difficult to navigate, content re-use was not flexible
      • User experience needed to be improved with relevant content
      "The goal is to be able to more easily and accurately aggregate content, find it and share it across many sources. From these simple relationships and building blocks you can dynamically build up incredibly rich sites and navigation on any platform."
      John O’Donovan, Chief Technical Architect
  14. BBC: The Perfect Application (ctd)
      • The BBC’s FIFA World Cup 2010 project was widely recognized as the best showcase for SemTech
        – It used OWLIM as a triplestore (chosen after a thorough evaluation)
        – It triggered a wave of adoption of the technology
      • The next milestone: London 2012 Olympic Games
        – The two most important websites used the DSP architecture: the official one, operated by Press Association, and the one of the BBC
        – Ontotext text-mining technology was used for content enrichment
      • Four years later this application pattern is still the best use case
        – And there are still no other triplestores that can survive such load, judging by the LDBC Semantic Publishing Benchmark, public information and feedback from the industry
  15. Ontotext and AstraZeneca
      Profile
      • Global bio-pharma company
      • $28 billion in sales in 2012
      • $4 billion in R&D across three continents
      Goals
      • Efficient design of new clinical studies
      • Quick access to all of the data
      • Improved evidence-based decision-making
      • Strengthen the knowledge feedback loop
      • Enable predictive science
      Challenges
      • Over 7,000 studies and 23,000 documents are difficult to obtain
      • Searches returning 1,000 – 10,000 results
      • Document repositories not designed for reuse
      • Tedious process to arrive at evidence-based decisions
  16. Context-based Disambiguation
  17. Ontotext and LMI
      Profile
      • Established in 1961 to enable federal agencies
      • Specializes in logistics, financial, infrastructure & information management
      Goals
      • Unlock large collections of complex documents
      • Improve analyst productivity
      • Create an application they can sell to US Federal agencies
      Challenges
      • Analysts taking hours to find, download and search documents, using inaccurate keyword searches
      • Needed a knowledge base to search quickly and guide the analysts – highly relevant searches

      • Extracts knowledge from collection of documents
      • Uses GraphDB to intuitively search and filter
      • Knowledge base used to suggest searches
      • Hyper speed performance
      • Huge savings in analyst time
      • Accurate results
  18. Some of our clients
      [Client logos; caption: The most popular financial newspaper]
  19. Outline
      • Introduction to Ontotext
      • Clients, cases
      • Text Mining, Media and Publishing Solution
      • SemTech applications in Finance
      • Wrap-up
  20. Publishing and Media Solution
  21. Solution Features
      • Dedicated solutions for media and publishing
      • Based on the Ontotext Semantic Platform
      • Mature implementation and continuous adaptation methodology
      • Introducing advanced features to the authoring, editorial and publishing phases of content and data workflows
  22. Methodology
  23. Architecture Overview
  24. Authoring
      • Related assets – as you type
      • Related entities and concepts
      • Entity profiles and facts on the fly
      • Create higher value content at the same cost
  25. Contextual Authoring
  26. Curation
      • Continuous adaptation through editorial feedback
      • Query driven publishing templates
      • Dynamic re-purposing and reuse
      • New publishing products with the same content
  27. Example of Client Integrated Curation
  28. Example Curation Tool: Press Association
  29. Monitoring and Curation Curation Tool
  30. Continuous Adaptation
  31. Publishing
      • Dynamic construction of products (e.g. topic pages)
      • Personalized content streams
      • Semantics driven trend and user analytics
      • Behavior driven personal asset streams
  32. User Behavior Tracking
      [Diagram; labels: perform, comments, votes, posts, preview, read, Article, contains, leads to, Search, Action, Result, Date, FTS Q., Tag, Cat, Tag set, results, cat taxonomy, Search Log]
  33. Personalized Recommendations
      [Diagram; labels: User Profile, Behavioural and Contextual Similarity, Reads]
  34. Methodology
  35. Methodology
  36. Methodology
  37. Methodology
  38. Complete Domain Ontology
  39. Example KB for 50 daily publications
  40. Methodology
  41. Design of Machine Learning Pipeline
  42. Outline
      • Introduction to Ontotext
      • Clients, cases
      • Text Mining, Media and Publishing Solution
      • SemTech applications in Finance
      • Wrap-up
  43. Discovering Suspicious Relationships
  44. What It Takes to Make It Work?
      • Have a database of locations, with part-of info
      • Have a database with companies, with dependencies
      • Define semantics for the relevant relationships:
        – sub-region and control are transitive relationships
        – located-in is transitive over sub-region
      • Define the semantics of suspicious relationships:

          # PREFIX declarations for my:, ptop: and fibo: are not shown on the slide
          CONSTRUCT { ?orgA my:suspiciousLink ?orgB }
          WHERE {
            ?orgA ptop:locatedIn ?x ;
                  fibo:controls ?y .
            ?y    fibo:controls ?orgB ;
                  ptop:locatedIn ?z .
            ?orgB ptop:locatedIn ?x .
            ?z    a ptop:OffshoreZone .
          }
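The interplay between the transitivity rules and the CONSTRUCT pattern on this slide can be traced with a small in-memory sketch. It is an assumption-laden illustration, not how GraphDB evaluates the query: relations are plain Python sets of pairs, the entity names (`OrgA`, `Shell1`, `IslandState`, and so on) are invented, and the `infer` flag is a hypothetical switch that turns the three rules on or off.

```python
def transitive_closure(pairs):
    """Return the transitive closure of a binary relation given as (a, b) pairs."""
    closure = set(pairs)
    while True:
        new = {(a, d) for a, b in closure for c, d in closure if b == c}
        if new <= closure:
            return closure
        closure |= new


def suspicious_links(located_in, sub_region_of, controls, offshore_zones, infer=True):
    """Find (orgA, orgB) pairs matching the slide's pattern: orgA controls some
    entity y located in an offshore zone, y controls orgB, and orgA and orgB
    share a location."""
    located_in = set(located_in)
    controls = set(controls)
    if infer:
        regions = transitive_closure(sub_region_of)  # sub-region is transitive
        controls = transitive_closure(controls)      # control is transitive
        # located-in is transitive over sub-region
        located_in |= {
            (entity, outer)
            for entity, inner in located_in
            for region, outer in regions
            if region == inner
        }
    links = set()
    for org_a, middle in controls:
        # The intermediate entity must be located in an offshore zone.
        if not any(e == middle and z in offshore_zones for e, z in located_in):
            continue
        places_a = {x for e, x in located_in if e == org_a}
        for middle2, org_b in controls:
            if middle2 != middle:
                continue
            places_b = {x for e, x in located_in if e == org_b}
            if places_a & places_b:
                links.add((org_a, org_b))
    return links


# Invented example data: two companies in different cities of the same country,
# linked through a chain of two shell entities, one of them offshore.
located_in = {("OrgA", "CityX"), ("OrgB", "CityY"), ("Shell1", "IslandTown")}
sub_region_of = {("CityX", "CountryC"), ("CityY", "CountryC"), ("IslandTown", "IslandState")}
controls = {("OrgA", "Shell1"), ("Shell1", "Shell2"), ("Shell2", "OrgB")}
offshore_zones = {"IslandState"}

print(suspicious_links(located_in, sub_region_of, controls, offshore_zones))
print(suspicious_links(located_in, sub_region_of, controls, offshore_zones, infer=False))
```

With the rules switched off, the pattern finds nothing in this data: the two companies sit in different cities, the offshore entity does not control the second company directly, and its town is not itself tagged as an offshore zone. The link only appears once the closures are computed, which is the point the slide makes about defining semantics before querying.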
  45. Use Cases
      • Investigating networks of linked entities
        – As prerequisite for risk assessment and compliance research
      • Risk assessment
        – Tracing information about suspicious entities
        – Identifying risk-indicators across multiple sources
        – Identifying risks related to linked entities
        – Determining exposure against a group of linked entities
      • Compliance-related research
        – Fraud detection, insider trading, etc.
      • Searching in large policies and regulations
        – See Open Policy
  46. How: Semantic BI/Data-warehouses
      • Imagine an integrated database that allows querying across siloed databases
        – E.g. bond market data vs. risk assessment vs. equity markets vs. M&A
        – A lot of data is duplicated across databases in different departments of banks, and it is simply not linked or organized in a unified data model
      • Benefits compared to the mainstream technology:
        – Lower cost of development and maintenance
        – Direct benefit from industry standards, using inference
        – Real-time updates, unlike traditional data-warehousing, where updates often have to be scheduled overnight
        – Support for a wide variety of analytical queries, which are far more flexible than traditional approaches
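As a rough illustration of querying across silos, the sketch below maps two departmental record sets into one set of triples and answers a question neither silo can answer alone. Everything in it is hypothetical: the record layouts, the predicate names (`issuedBy`, `coupon`, `rating`) and the shared company identifiers, which in a real integration would have to come from identity resolution.

```python
# Two hypothetical departmental silos with different record layouts.
bond_desk = [
    {"bond": "B-2020", "issuer_id": "C1", "coupon": 4.5},
    {"bond": "B-2021", "issuer_id": "C2", "coupon": 3.0},
]
risk_desk = [
    {"company": "C1", "rating": "BB"},
    {"company": "C2", "rating": "AA"},
]


def bond_triples(rows):
    """Map bond records into (subject, predicate, object) triples."""
    for row in rows:
        yield (row["bond"], "issuedBy", row["issuer_id"])
        yield (row["bond"], "coupon", row["coupon"])


def risk_triples(rows):
    """Map risk records into triples that use the same company identifiers."""
    for row in rows:
        yield (row["company"], "rating", row["rating"])


# One unified graph; adding another silo only means adding another mapping.
graph = set(bond_triples(bond_desk)) | set(risk_triples(risk_desk))


def bonds_by_issuer_rating(graph, ratings):
    """Return bonds whose issuer carries one of the given ratings."""
    rated = {s for s, p, o in graph if p == "rating" and o in ratings}
    return sorted(s for s, p, o in graph if p == "issuedBy" and o in rated)


print(bonds_by_issuer_rating(graph, {"BB"}))
```

The query joins bond data with risk data through the shared company node, without either department changing its own schema.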
  47. Ontotext and top 3 Business Media
      Profile
      • Top 3 business media
      • Focused both on B2C publishing and B2B services
      Goals
      • Create a horizontal platform for both data and content based on semantics and serve all functionality through it
      Challenges
      • Critical part of the entire workflow
      • Multiple development projects in parallel with up to 2 months between inception and go-live

      • GraphDB used not only for data, but for content storage as well
      • Horizontal platform with focus on organizations, people, GPEs and relations between them
      • Automatic extraction of all these concepts and relationships
      • Separate stream of work for a user behavior based recommendation of relevant content and data across the entire media
  48. Reference Projects: BCA/Euromoney
      • BCA/Euromoney Macroeconomics Reports
        – Implementation of the Euromoney Semantic Platform
      • Automatically generate metadata about:
        – Markets, geo-political entities, economies, currencies, indicators, indices
        – Themes of the report
        – Economic and market conditions
        – Views of the economist with horizon, focus of the view, and prediction
        – Suggested trades of tradable objects (bonds, commodities, equities)
      • Semantic indices powering various services:
        – Live Charts: serving macroeconomic charts with the possibility to add additional data series/indices
        – Macroeconomists' dashboard of views with their objects, sentiment, horizon, and agreement/disagreement
  49. Ontotext and Euromoney
      Profile
      • Euromoney Institutional Investor PLC, the international online information and events group
      Goals
      • Create a horizontal platform to serve 100 different publications
      • Create a new publishing and information platform that includes the latest authoring, storing, and display technologies, including semantic annotation, search and a triplestore repository
      Challenges
      • Different domains covered
      • Sophisticated content analytics incl. relation, template and scenario extraction
      • Analytics of reports and news of various domains
      • Extraction of sophisticated macroeconomic views on markets and market conditions; trades, condition and trade horizons, assets, asset allocations, etc.
      • Multi-faceted search
      • Completely new content and data infrastructure
  50. Wrap up
      • Ontotext has a full stack of semantic technologies
      • Triplestores combine the strengths of NoSQL and SQL
      • Inference fosters discovery in diverse dynamic data
      • GraphDB is in a league of its own:
        – Standards-compliant – comprehensive support for OWL and SPARQL
        – Efficient inference through the entire life-cycle of the data
        – High-availability cluster architecture – proven and mature
        – FTS and NoSQL Connectors for seamless integration
      • End-to-end solution for Media and Publishing
        – Authoring, curation and publishing through adaptive text-mining
      • All the above proven with industry leaders