Our Mission
Elsevier helps researchers and
healthcare professionals advance
science and improve health
outcomes for the benefit of society.
• 2,700+ digitized journals,
including The Lancet (1823)
and Cell
• 43,000+ eBook titles, including
iconic works such as Gray's Anatomy
• Since the year 2000, more than
99% of the Nobel Laureates in
science have published in
Elsevier journals
• 600k+ peer-reviewed articles in
2020 - 89% more than a decade
ago
Trusted in research and health for over 140 years
As a shared service, KD doesn’t go to market directly. We build
collaborative partnerships with products, and share objectives.
We help products grow by enabling:
1. Improve Discovery experiences with Embeddings at scale
2. Access linked data more quickly with Structured Search
3. Increase engagement by using reusable Recommenders
Knowledge Discovery core services
KD enables Elsevier products to lead the market in academic discovery services
Research Process - Simplified
Discover
Find existing research and
experts to refine areas of
focus. Stay up to date.
Secure Funding
Publish
Establish in the system a
record of the hypothesis and
conclusions of research.
Carefully document
Methods & Protocols
Assess
Evaluate personal academic
output, compare against
peers, compare institutions.
Get hired/promoted.
Scopus
Editorially Curated
A&I database
The most trusted source for
measuring and assessing
academic output
Key Use Cases
• Find and assess literature
• Assess my academic output
• Assess my institution's
academic output
• Find Experts
Structured Queries Use Cases
1. Find all Papers by Author
2. Find all Citations that reference a paper
3. Find all Metadata about a paper
Introducing the graph
Neo4j solved our Structured Query problems, allowing us to move away from
a search engine. Using GraphQL, we are enabling data-driven applications
throughout the portfolio.
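To make the three structured query use cases concrete, here is a minimal sketch in plain Python that answers them with direct lookups over a tiny in-memory graph. It is only an illustration: the identifiers, sample records, and function names are hypothetical, and it does not show the Neo4j or GraphQL interfaces used in production.

```python
from collections import defaultdict

# Hypothetical toy data standing in for Works and their metadata.
WORKS = {
    "w1": {"title": "Graph search at scale", "journal": "j1", "year": 2019},
    "w2": {"title": "Embeddings for discovery", "journal": "j1", "year": 2021},
    "w3": {"title": "Recommenders in practice", "journal": "j2", "year": 2022},
}
AUTHORED = [("a1", "w1"), ("a1", "w2"), ("a2", "w2"), ("a2", "w3")]  # (author, work)
CITES = [("w2", "w1"), ("w3", "w1"), ("w3", "w2")]  # (citing work, cited work)

# Build adjacency lists once so each query is a direct lookup.
works_by_author = defaultdict(list)
authors_by_work = defaultdict(list)
for author, work in AUTHORED:
    works_by_author[author].append(work)
    authors_by_work[work].append(author)

cited_by = defaultdict(list)
for citing, cited in CITES:
    cited_by[cited].append(citing)


def papers_by_author(author_id):
    """Use case 1: all papers written by an author."""
    return sorted(works_by_author.get(author_id, []))


def citations_of(work_id):
    """Use case 2: all works that cite the given paper."""
    return sorted(cited_by.get(work_id, []))


def metadata_for(work_id):
    """Use case 3: all metadata about a paper, gathered from its neighbours."""
    record = dict(WORKS[work_id])
    record["authors"] = sorted(authors_by_work.get(work_id, []))
    record["citation_count"] = len(cited_by.get(work_id, []))
    return record
```

Each query follows relationships outward from a single starting entity, which is the access pattern a graph serves more naturally than a search index.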
Our graph by the numbers
References
Grants
Works: 311M
Abstracts: 85M
Authors: 47M
Topics: 56K
Journals: 163K
Organizations: 8.8M
Use case 1: Find all Papers by Author
Use case 2: Find all Citations that reference a paper
Use case 3: Find all Metadata about a paper
Graphs help us build new product experiences
Scopus
Societal Impact
Article Sustainable Development Goals (SDGs)
Editorial Manager
Conflict of Interest
Find Reviewer
Scopus and ScienceDirect
Showcase my work
Author Profiles
ScienceDirect
Read Literature
Enhanced PDF Reader
Author Connections
ScienceDirect
Find and Assess Literature
Search Results
Citation counts on SERP / Profiles
Scopus
Societal Impact
Organization SDGs
Practically speaking, we can now take
the data that we have in the graph and
create a much more precise view of our
data. Combined with Embeddings, we
can now get a much deeper
understanding of our Author profiles:
• Are they really an expert in a field?
• Are they still working in this field?
• Have they changed fields?
More sophisticated ways to understand Experts
Accelerating Data and Analytics
PAGERANK TO EVALUATE ACADEMIC IMPACT
CONVENIENT AND EFFICIENT SUPPORT FOR DATA SCIENCE
GRAPH DATA SCIENCE (GDS) LIBRARIES FOR EXPLORATORY EXPERIMENTS
Where are we in our Graph Journey?
Evaluation
Neo4j was the best
performing Graph
DB on the market
Integration
Connected Graphs
to our data pipelines
with near real time
performance
Scaling
Ensuring that the
Graph can meet our
performance and
scale requirements
Decision
Selected Enterprise
for current and future
projects
Accelerate
Solving existing and
new use cases
You are
here
Speaker Biographies
• Erik M. Schwartz
• Elsevier, 5 years
• e.schwartz@elsevier.com
• m. +44 (0) 7880 300319
• o. +44 (0) 2074 244309
• Before joining Elsevier, Erik spent 25+ years building search product
experiences at Convera, FAST, Microsoft, and Comcast
@Eschwaa
https://www.linkedin.com/in/eschwaa/
Editor's notes
Good morning, everyone. My name is Erik Schwartz, and I am a knowledge discovery (KD) guy. Today, I would like to share my journey of building a knowledge graph and the lessons we have learned along the way.
In 1995, I built my first search application at a Navy research facility in Washington, D.C. The library where I worked was running out of space, so we started receiving academic journals in TIFF format on CD-ROMs. We created a digital library by OCRing the TIFF images and making them fully text searchable. That was the beginning of my journey into knowledge discovery.
The Naval Research Laboratory (NRL) is a historic research facility, credited with discovering radar by sending radio signals across the Potomac River and detecting passing ships. Seated across the river from Reagan National Airport in Washington, DC, its iconic radar dish sits atop the building that holds the base commander and the library. In DC they lovingly refer to the dish as the world’s largest bird bath.
The library was responsible for receiving journals in paper format for the researchers at the lab. The fundamental challenge was that the library had run out of physical space.
We ripped the images off the disks, OCR’d them, wrapped them into PDFs, and made them fully text searchable.
A bit about me: after leaving the NRL, I worked for search engine companies, was acquired twice in 2007, and then spent 8 years at Comcast before coming overseas to London to change the search experiences at Elsevier.
[Script:]
As it has for so many, this pandemic has brought a lot into focus. For the people at Elsevier, our mission has never been clearer. We help researchers and healthcare professionals advance science and improve health outcomes for the benefit of society. It is the scientists, the researchers and healthcare professionals who are leading us out of this global health crisis.
[Script:]
Of course, you know Elsevier as a publisher, and the pace of research and knowledge creation is accelerating. Last year we published more than 600,000 peer-reviewed articles, 89% more than a decade ago. Every month, more than 18 million users visit ScienceDirect®. In 2020, more than 1.6 billion articles were downloaded.
[Script:]
While our publishing continues to grow, Elsevier does much more than produce content. We combine Machine Learning and Natural Language Processing with vast quantities of quality structured data to help researchers, engineers and clinicians perform their work better. It’s this unique delta of data, analytics and evidence that’s taking us in exciting directions. They say that “innovation happens at the intersections.” For example, in this chord graph we’re able to visualize the state of research in artificial intelligence and to identify connections, relationships, and emerging fields: the intersections of science.
Today, I work at Elsevier, which is part of RELX, a company made up of four segments: STM, Legal, Risk, and RX. In the STM segment, we provide three core services: text search, structured search, and recommenders. Our team serves A&G, and our primary focus is to modernize Scopus, an A&I database containing enriched titles and abstracts for almost 90 million journal articles from Elsevier and hundreds of other publishers.
Who we are and what we do: we support A&G products globally and at scale.
We focus on three key areas (Search, Graph, and Recommenders) to grow products while staying strategically aligned with their outcomes.
But let me tell you, the path to getting here has not been easy. Our team was faced with a daunting challenge - modernizing Scopus, an A&I database that contains enriched titles and abstracts for almost 90 million journal articles from Elsevier and hundreds of other publishers. Customers use it primarily to evaluate academic output and to find and assess literature.
Our search engine was receiving 750 billion requests per year, and 95% of those queries were structured queries. The primary objective of using a graph was to move those structured queries to a more suitable infrastructure, away from a search engine. And that's where the drama begins.
780 billion: roughly ¾ of a trillion requests handled by our search engine per year.
By comparison, Google does about 8.5 billion searches per day.
95% of our requests are structured queries. These include requests like: give me all of the metadata about a document, give me all of the information about an author, give me all of the information about my institution. This is supported today by almost 200 nodes of search indexes (Solr).
So why Neo4j? We wanted a graph so that we could solve for structured queries now and leverage graph relationships for KD in the future. Neo4j was the fastest graph database on the market for both ingest and query.
We built a GraphQL-based system to handle structured queries. Our KD graph consists of the following services: ingestion, metrics service, taxonomy service, graph query service, and hydration. The graph data model consists of the relationships between the core entities in our academic literature, which include works (articles, books, and book chapters), abstracts, authors, topics, journals, and organizations.
The total number of relationships that we have in our graph connecting our core entities.
It starts with Works. Works are articles, books, book chapters. It’s the content that is the core to our business.
Associated with the Works are Abstracts. Not every article has an abstract but roughly 75% of all articles in Scopus have an abstract. This jumps to over 85% when we look at content published after 1985.
Authors are associated with a Work, as are Topics.
Works belong to a Journal.
Authors are affiliated with an Organization. But there is an important temporal nature here: the association is with the organization at the time of publication, and it can change over time.
Grants are associated with Authors.
As you can see this graph now allows us to start answering some pretty interesting questions.
How much is a given topic worth?
What is the societal impact of an Organization?
What is this organization best at?
By adding embeddings of Abstracts, we can enable natural-language and semantic queries against this data model.
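The data model walked through above can be sketched in a few lines of Python. This is an illustrative toy, not our schema: the class names, field names, and sample records are assumptions. It shows two ideas from the notes: the affiliation is stored on the authorship itself, so it reflects the organization at the time of publication, and a question such as "What is this organization best at?" can be approximated by counting the topics of the works credited to that organization.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Work:
    work_id: str
    journal_id: str
    year: int
    topics: tuple


@dataclass(frozen=True)
class Authorship:
    author_id: str
    work_id: str
    org_id: str  # affiliation at the time of publication (assumed field)


# Hypothetical sample data.
WORKS = {
    "w1": Work("w1", "j1", 2012, ("radar",)),
    "w2": Work("w2", "j1", 2018, ("search", "graphs")),
    "w3": Work("w3", "j2", 2021, ("graphs",)),
}
AUTHORSHIPS = [
    Authorship("a1", "w1", "org_a"),
    Authorship("a1", "w2", "org_b"),
    Authorship("a1", "w3", "org_b"),
    Authorship("a2", "w3", "org_b"),
]


def affiliation_history(author_id):
    """Return (year, organization) pairs for an author, oldest first."""
    return sorted(
        (WORKS[a.work_id].year, a.org_id)
        for a in AUTHORSHIPS
        if a.author_id == author_id
    )


def top_topics(org_id, limit=1):
    """Return the most frequent topics among works credited to an organization."""
    work_ids = {a.work_id for a in AUTHORSHIPS if a.org_id == org_id}
    counts = Counter(t for w in work_ids for t in WORKS[w].topics)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [topic for topic, _ in ranked[:limit]]
```

Because the organization lives on the authorship rather than on the author, a move between institutions never rewrites the credit for earlier work.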
We have learned many lessons throughout our journey of building a knowledge graph. We have defined our metrics of success, which include expert finding use cases, using PageRank as a new way to rank academic impact, providing convenient and efficient data support for data science work, and using graph data science libraries for exploratory experiments.
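As an illustration of ranking academic impact with PageRank, here is a minimal power-iteration sketch over a toy citation graph. It is written in plain Python and is not the Neo4j Graph Data Science library; the damping factor of 0.85, the iteration count, and the work identifiers are assumptions chosen for the example.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Compute PageRank over (citing, cited) pairs by power iteration."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for citing, cited in edges:
        out_links[citing].append(cited)

    count = len(nodes)
    rank = {node: 1.0 / count for node in nodes}
    for _ in range(iterations):
        # Works that cite nothing spread their rank evenly over all works.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1.0 - damping) / count + damping * dangling / count
        new_rank = {node: base for node in nodes}
        for node in nodes:
            for target in out_links[node]:
                new_rank[target] += damping * rank[node] / len(out_links[node])
        rank = new_rank
    return rank


# Hypothetical citation edges: w2 cites w1, w3 cites w1 and w2.
scores = pagerank([("w2", "w1"), ("w3", "w1"), ("w3", "w2")])
ranking = sorted(scores, key=scores.get, reverse=True)
```

Unlike a raw citation count, the score of a work depends on the score of the works citing it, so a citation from a highly ranked paper counts for more.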
New technology is hard, but graph thinking enables a new way of problem-solving. We have applied graph thinking to solve problems, such as conflicts of interest and user-curated organization hierarchies, and we have found success. We have also learned that combining hierarchies and taxonomies with graph data allows us to use user-curated organization hierarchies to detect conflicts of interest at various levels of organization structures.
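A rough sketch of how an organization hierarchy can surface conflicts of interest at different levels: walk up from each person's organization and report the first organization the two lineages share. The hierarchy, the identifiers, and the rule that any shared ancestor counts as a conflict are all assumptions made for illustration, and the sketch assumes the hierarchy has no cycles.

```python
# Hypothetical user-curated hierarchy: child organization -> parent (None = top level).
PARENT = {
    "dept_physics": "univ_x",
    "dept_chemistry": "univ_x",
    "univ_x": None,
    "lab_y": None,
}


def lineage(org_id):
    """Return the organization followed by its ancestors, nearest first."""
    chain = []
    while org_id is not None:
        chain.append(org_id)
        org_id = PARENT.get(org_id)
    return chain


def shared_organization(org_a, org_b):
    """Return the lowest organization both lineages have in common, or None."""
    seen = set(lineage(org_a))
    for org in lineage(org_b):
        if org in seen:
            return org
    return None
```

For example, `shared_organization("dept_physics", "dept_chemistry")` returns `"univ_x"`, flagging a university-level conflict between two people in different departments, while a department and an unrelated lab share nothing and return `None`.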
We are setting ourselves up for success in the future. Conflicts of interest enable expert finding use cases. GraphQL and federated graphs accelerate innovation. We are building hybrid recommenders that leverage our data.
In conclusion, our journey of building a knowledge graph has taught us many valuable lessons. We have defined our metrics of success, applied graph thinking to solve problems, and set up for success for the future. Thank you for listening.