COAR Next Generation Repositories WG - Text mining and Recommender system stories


One of the key aims of the COAR NGR group is to help us to overcome the challenges that still make it difficult to move beyond repositories as document silos. The group wants to see a globally interoperable network of repositories and global services built on top of repositories fulfilling the expectations of users in the 21st century. During this talk, I will address two use cases the COAR NGR working group aims to enable: text and data mining and recommender systems.


  1. User stories: Text and data mining & Recommender systems (Petr Knoth)
  2. User story 1: Text and data mining (TDM)
  3. User story 1: What do we want to achieve?
     “As a human or machine user, I want to be able to mine text or data from the collective content of repositories.”
  4. User story 1: What do we want to achieve?
     » Repositories as an enabling infrastructure for text and data mining of scholarly literature.
     » The TDM activity needs to happen outside of the repository.
     » What data?
       › Metadata and content!
       › But also datasets and anonymous user-interaction data.
  5. User story 1: Why is it important?
     “No one has the time to read all scientific literature, nor even to keep track of what is being created.”
  6. User story 1: Why is it important?
     “With the recent advancements in AI, machines can read literature at scale.”
  7. User story 1: Why is it important?
     “New discoveries by text mining research literature have already been simulated.”
  8. User story 1: What can we apply it to?
     » Information retrieval systems
     » Recommender systems
     » Literature-based discovery
     » Metadata enrichment (document types, subject classification, etc.)
     » Expert search
     » Exploratory search
     » Fact checking
     » Domain-specific TDM (life science, agriculture, social science, etc.)
     » Research evaluation (e.g. semantometrics)
     » Cross-linking (papers <-> patents, news, books, blogs, Wikipedia, etc.)
     » Many others ...
  9. User story 1: Why should repositories enable this?
     “TDM of research literature at scale could only be done by the privileged few.”
  10. User story 1: Problem
     Seamless and efficient access to the collective content from repositories for harvesting metadata and content.
     » One of the focuses of CORE is to act as a bridge between the repository and the TDM professional.
     Diagram labels: OA Repositories; OA Journals; Mostly OAI-PMH.
  11. User story 1: Problem
     Seamless and efficient access to the collective content from repositories for harvesting metadata and content.
     » Why not OAI-PMH?
       › Slow and very inefficient for big repositories.
       › Standardised for metadata transfer but not for content transfer.
       › Very difficult to represent the richness of metadata from a broad range of data providers.
     Diagram labels: OA Repositories; OA Journals; Mostly OAI-PMH.
  12. User story 1: Problem
     Same statement and "Why not OAI-PMH?" points as slide 11.
     Diagram labels: OA Repositories; OA Journals; Mostly OAI-PMH; ResourceSync.
  13. User story 1: Examples of what is happening
     » CORE and OpenAIRE are content sources in the OpenMinTeD TDM platform (an EU infrastructure project), which is being developed to enable the mining of scholarly literature.
     Diagram labels: OA Repositories; OA Journals; Key publishers (OA + hybrid OA); Publisher connector; ResourceSync; Mostly OAI-PMH; OMTD-SHARE (over REST); A range of bespoke APIs; + many others.
  14. User story 1: Examples of what is happening
     » A lot of interest in scientific papers from both text analytics companies and researchers, across a number of domains.
     » Workshops on mining scholarly literature at major conferences: WOSP, BIR, SBD, SWM, etc.
     » Bulk access to content is the number one barrier.
  15. User story 2: Recommender systems
  16. User story 2: What do we want to achieve?
     “As a user, I want to receive recommendations about content (papers, datasets, software, people to follow, grant opportunities, methods, conferences, etc.) that is of interest to me, so I can continuously increase knowledge in my field.”
  17. User story 2: Why is it important?
     » Increases the accessibility of resources in repositories.
     » People access resources on CORE via its recommender system twice as often as via search.
  18. User story 2: Why is it important?
     » An essential glue for linking related content across a globally distributed network of repositories.
  19. User story 2: Why is it important?
     » A basis for building social networking functionality over repository content.
  20. User story 2: Example (CORE recommender)
     » Recommendations drawn from over 70 million metadata records and over 8 million full-text research papers.
     » Features: text, recency, metrics, quality of the records, etc.
     » Easy to install in a repository (plugin for EPrints, but also applicable to DSpace, OJS, etc.).
     » All recommended content is OA.
     OR2017 paper: https://arxiv.org/abs/1705.00578
  21. User story 2: What are the problems we are facing?
     » 1. Personalised vs non-personalised
  22. User story 2: What are the problems we are facing?
     » Collaborative filtering (CF) vs content-based filtering (CBF)
  23. User story 2: How does COAR aim to overcome them?
     » Repositories should:
       1. Support voluntary global sign-on.
       2. Openly release anonymised user-interaction data, to enable the creation of effective research recommender systems in the future.
  24. Conclusions
     » Text and data mining from repositories:
       › Repositories are a key infrastructure enabling TDM. Repositories should support effective means of harvesting.
     » Recommender systems:
       › Key for increasing accessibility of content and developing social functionality in repositories.
       › Need for global sign-on and interaction data.
     » COAR NGR WG wants to see repositories succeed. To achieve that, repository technology needs to be competitive with commercial offerings.
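
Slides 11 and 12 call OAI-PMH slow and inefficient for big repositories without saying why. One contributing factor in the protocol itself is that `ListRecords` results are paged, and each page can only be requested with the `resumptionToken` returned by the previous one, so a harvest proceeds strictly in sequence. The following is a minimal Python sketch of that loop, not CORE's harvester. The names `harvest_identifiers`, `fetch` and `fake_fetch` are hypothetical, and the in-memory `fake_fetch` stands in for a real HTTP call so the example runs without a network.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterator

# Namespace defined by the OAI-PMH 2.0 protocol.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def harvest_identifiers(fetch: Callable[[Dict[str, str]], str],
                        metadata_prefix: str = "oai_dc") -> Iterator[str]:
    """Yield record identifiers page by page, following resumption tokens.

    `fetch` is a hypothetical callable that takes the request parameters
    and returns the XML response body as a string.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(params))
        for header in root.iter(OAI_NS + "header"):
            identifier = header.find(OAI_NS + "identifier")
            if identifier is not None and identifier.text:
                yield identifier.text.strip()
        token = root.find(f".//{OAI_NS}resumptionToken")
        if token is None or not (token.text or "").strip():
            return  # a missing or empty token marks the last page
        # The next page cannot be requested until this token is known.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}


# Assumption: a two-page fake repository used only for illustration.
_PAGE = ('<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">'
         '<ListRecords>{records}{token}</ListRecords></OAI-PMH>')


def fake_fetch(params: Dict[str, str]) -> str:
    if params.get("resumptionToken") == "page2":
        ids, token = ["oai:example:3"], "<resumptionToken/>"
    else:
        ids = ["oai:example:1", "oai:example:2"]
        token = "<resumptionToken>page2</resumptionToken>"
    records = "".join(
        f"<record><header><identifier>{i}</identifier></header></record>"
        for i in ids
    )
    return _PAGE.format(records=records, token=token)


harvested = list(harvest_identifiers(fake_fetch))
print(harvested)
```

Because every request depends on the previous response, the loop cannot be parallelised within a single list request. The sketch also only touches metadata headers, which matches the slides' point that OAI-PMH standardises metadata transfer but not content transfer.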

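
Slide 20 lists text and recency among the features of the CORE recommender, and slide 22 contrasts collaborative filtering with content-based filtering. The sketch below illustrates the content-based side only, combining TF-IDF cosine similarity with a simple recency bonus. It is an assumption-laden illustration, not the CORE recommender's actual method: the scoring formula, the `recency_weight` value, the function names and the sample `PAPERS` are all hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(texts):
    """Build one sparse TF-IDF vector (term -> weight) per text."""
    tokenized = [text.lower().split() for text in texts]
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n_docs = len(texts)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Smoothed IDF so that no term gets a zero or negative weight.
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0


def recommend(target, papers, current_year, recency_weight=0.2, top_k=3):
    """Rank the other papers by text similarity to `target` plus recency."""
    vectors = tfidf_vectors([paper["text"] for paper in papers])
    scored = []
    for index, paper in enumerate(papers):
        if index == target:
            continue  # never recommend the paper being viewed
        similarity = cosine(vectors[target], vectors[index])
        recency = 1.0 / (1 + max(0, current_year - paper["year"]))
        score = (1 - recency_weight) * similarity + recency_weight * recency
        scored.append((paper["title"], score))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical sample records, for illustration only.
PAPERS = [
    {"title": "Paper A", "year": 2016,
     "text": "text mining of open access research papers"},
    {"title": "Paper B", "year": 2017,
     "text": "mining text from repository research papers at scale"},
    {"title": "Paper C", "year": 2017,
     "text": "soil chemistry in alpine meadows"},
]

for title, score in recommend(0, PAPERS, current_year=2017):
    print(f"{title}: {score:.3f}")
```

A content-based approach like this needs no user data, which is why it works for non-personalised recommendations. Collaborative filtering would instead require the anonymised user-interaction data that slide 23 asks repositories to release.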