David Leeming, 67 Bricks
AI and machine learning have generated a lot of attention over the past couple of years, but they still raise many questions for our industry. How should publishers, librarians and researchers engage with these technologies? Are they a threat to the current scholarly ecosystem or an opportunity? Can they really help us drive the discovery and dissemination of research? How have they already become an essential part of the scholarly ecosystem? After a short introduction to the concepts of AI and machine learning, we will address these questions through a live, interactive demonstration in which the audience works with us to train a machine learning algorithm on scholarly content. We will share the areas of opportunity we have uncovered from working with these technologies within the industry, and discuss how publishers, librarians and researchers might use them to advance the future of scholarly communication.
4. Artificial Intelligence
The use of computers to simulate human intelligent behaviour in order to tackle complex problems that are difficult to solve using traditional computational approaches
5. What is machine learning?
Where a computer programme’s performance in a task improves with experience
(Diagram: experience data leads to improved task performance)
6.
“Those who can imagine anything,
can create the impossible.”
― Alan Turing
11. THE TRANSPORTATION EXPERIENCE
1999 | 2019
Go outside (in the cold) | Take out your phone (from anywhere)
Hail a cab | Request an Uber
Don’t know how long it will take | Know how long it will take
Don’t know your vehicle or what passengers think of your driver | Know your vehicle and passenger feedback ratings about your driver
Enter a (mostly) dirty taxi | Enter a (mostly) clean car
Must converse with your driver about where you are going, etc. | Not necessary to speak with your driver; it’s up to you
Upon arrival, pay driver via cash | Upon arrival, payment including pre-determined tip
Ask for, and wait for, a handwritten receipt | Receipt automatically sent via email
12. THE SCHOLARLY PUBLISHING EXPERIENCE
1999 | 2019
Research written up as human-readable ‘paper’ | Research written up as human-readable ‘paper’ (data and video in early stages)
Painful, non-user-friendly submission process | Painful, non-user-friendly submission process (mostly)
Subjective, untrained human peer review process | Subjective, untrained human peer review process
Research packaged in periodical journals (this structure dates from the 17th century) | Research packaged in periodical journals
Scientific research behind paywalls | Most scientific content still behind paywalls
Researchers’ value mostly determined by journal impact factor | Researchers’ value still mostly determined by journal impact factor
Powerful incentives to withhold results for months or years until research is published | Powerful incentives to withhold results for months or years until research is published (not all fields)
14.
AI will be “either the best, or the worst thing, ever to happen to humanity”
— Stephen Hawking
20. AI opportunities: research
◎Ask questions instead of inputting search terms
◎Identify researchers and institutions for collaboration
◎Reveal trends that are important to your work
◎Discover the research you should be reading
◎Use images to diagnose medical conditions
◎Improve the quality of science
◎Automated or semi-automated content creation
◎Understand the links between research objects
◎Discover interdisciplinary connections
◎Predicting emerging subject areas
◎Automated labs
◎Knowledge graphs and relationship maps
◎Predict disease and drug combinations
◎Generate scientific hypotheses
◎Understand grant and funding trends
◎Faster time to publication
21. AI opportunities: publishing
◎Predicting high impact research
◎Fighting plagiarism
◎Delivering customised / personalised experiences
◎Helping researchers discover content
◎Delivering knowledge rather than documents
◎Selling data to machine learning companies
◎Automated or semi-automated content creation
◎Detecting fraudulent image manipulation or data modification
◎Augmenting / automating peer review
◎Predicting emerging subject areas
◎Automating internal processes
◎Adaptive learning products
◎Identifying flawed reporting and statistics
◎Unlocking value in legacy content
◎Improved and/or automated marketing
◎Creating automated content collections
26. Artificial Intelligence and machine learning: the opportunities
◎Excels at well-defined tasks
◎Supports researchers, does not replace them
◎Cost-effective in resource-intensive activities
◎Provides new insights previously hidden away
◎Frees up workers to do more productive jobs
◎Provides consumers with access to better products and services
28. Data maturity pyramid (figure; axis labels run from Low to High), from top to bottom:
Insights (predictive and prescriptive)
Knowledge (what has happened)
Personalisation (personalised search and discovery experience)
Smart ‘granular’ content (improved search and discovery)
Raw content assets (document based storage and access)
30. Product development data maturity model
Levels: 1 Raw content assets; 2 Smart ‘granular’ content; 3 Personalisation; 4 Knowledge; 5 Insights
Store and manage (level 1 to level 5): Document level metadata; User access rights data; Granular content ‘chunk’ level metadata; Semantic fingerprints for users; ‘Integrated’ usage data; Data relationships (e.g. as triples); Large primary and supplementary data sets
Extract and create (level 1 to level 5): Manual metadata creation during editorial processes; Semantic fingerprints for content items; User interest metadata; User ‘cohort’ analysis; Knowledge extraction (as relationships); Predictions and recommendations
Data use (level 1 to level 5): Access control; Document collections; Granular search; Faceted search; Proximity matching; Usage trending; Personalised proximity matching; User type analytics; Powerful knowledge query capabilities; Predictive and prescriptive analytics
Product value (level 1 to level 5): Find and access documents online; Enhanced search and discovery; Slice and dice content products; Personalised discovery experience; Answer questions; Explain what has happened; Explain what will happen and what to do about it
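To make “semantic fingerprints” and “proximity matching” concrete, here is a deliberately simple sketch. It assumes a fingerprint is just a bag-of-words frequency vector and that proximity is measured with cosine similarity; real systems would typically use richer representations. The function names and example documents are hypothetical.

```python
import math
import re
from collections import Counter


def fingerprint(text):
    """Hypothetical, very simple 'semantic fingerprint': word frequencies in the text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity of two sparse frequency vectors; 0.0 if either is empty."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def proximity_match(interest_text, documents, top_n=2):
    """Return the IDs of the top_n documents whose fingerprints are closest to the interest text."""
    interest = fingerprint(interest_text)
    scored = sorted(
        ((cosine(interest, fingerprint(body)), doc_id) for doc_id, body in documents.items()),
        reverse=True,
    )
    return [doc_id for _, doc_id in scored[:top_n]]


# Hypothetical content items
documents = {
    "doc-1": "Laser pulse amplification in optical fibres",
    "doc-2": "Grant funding trends in ecology",
    "doc-3": "Ultrafast laser spectroscopy",
}
print(proximity_match("Recent reading on laser pulse physics", documents))  # ['doc-1', 'doc-3']
```

Fingerprinting a user’s reading history instead of a single query string is what would turn this into the level 3 “personalised proximity matching”.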
32. Summary
1. The increase in data, faster processors and AI are rapidly changing our world
2. We can use AI as an opportunity rather than a threat
3. AI can improve how researchers, librarians and publishers discover and use content and data, for better results
4. Better understanding of your content and data (data maturity) improves discovery and your ability to remain relevant
34. David Leeming
Head of client services
67 Bricks
www.67bricks.com
David.leeming@67bricks.com
+44 (0) 7454 734401
@67bricks
Speaker notes
How many of you use AI today? Ask the audience for a show of hands; expect that everyone does.
Things that use AI:
Personal assistants in our homes
Financial institutions checking for bank fraud
Google Maps
It’s not new thinking; it goes back to the days of Alan Turing and his intelligent machines. I like this quote, and I think it captures what makes this technology exciting: there are probably things we previously thought were impossible that are now possible.
Single jet engine: 20 TB every hour
Twitter: 4.8 TB in a year
A single jet engine creates more data in an hour than Twitter does in a year
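Taking the two figures above at face value, the comparison is a single division: one engine-hour produces roughly four Twitter-years of data.

```latex
\frac{20\ \text{TB per engine-hour}}{4.8\ \text{TB per Twitter-year}} \approx 4.2
```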
The jet engine industry is rapidly changing as a result
GE started working on this problem to understand why some engines developed problems before others. They were able to solve it and, in the process, developed a whole new business: the new data business is more profitable than the original business of building engines
We have seen a one-trillion-fold increase in the power of computer processors since 1956
In 1969 the Apollo Guidance Computer helped land astronauts on the Moon, yet it was 100 times slower than a Game Boy
Moore’s law states that the number of transistors in a processor doubles about every two years; this has held roughly true since it was first observed in 1965
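As a worked example of the doubling rule, assume (as a simplification) a strict two-year doubling period starting from the 1965 transistor count, written here as the hypothetical symbol N_{1965}. The 54 years from 1965 to 2019 then give 27 doublings:

```latex
N(t) = N_{1965} \cdot 2^{(t - 1965)/2}
\qquad\Longrightarrow\qquad
\frac{N(2019)}{N_{1965}} = 2^{(2019 - 1965)/2} = 2^{27} \approx 1.3 \times 10^{8}
```

Real processors only roughly followed this curve, so the result indicates scale rather than an exact figure.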
We already have supercomputers that compute faster than the human brain and hold 10 times as much data
It is predicted that in the next few years we will have a $1,000 computer that is as fast as the human brain
More surprising still, by 2050 we expect to have a $1,000 computer that will exceed the processing power of the entire human race
We have not been very good at focusing on the user experience
An interesting contrast is the change in the transportation experience
Minimal change
It is time for scholarly publishing to evolve
We need to move out of the water and onto land
We need to understand the huge content and data assets we have, understand what is possible and get better at using data
We need to get ready for the coming changes
Recent predictions claim that AI will contribute $15.7 trillion to the global economy by 2030. From self-driving cars to personal assistants and chatbots taking on roles as wide-ranging as customer service, medicine, law and therapy, AI continues to generate hype, excitement and even fear. AI will clearly shape our future, but what will that future look like? And what, if anything, does it mean for scholarly publishing?
Stephen Hawking told us that “AI will be either the best, or the worst thing, ever to happen to humanity”. There are certainly some big issues to tackle, including privacy, transparency, security, ethical concerns around training data that contains human bias, and even what jobs the future holds for humans in an automated world. While we need to keep these factors in mind, I personally believe that AI, if applied appropriately and diligently in the right places, will bring more positives than negatives to scholarly publishing. Although each organisation is unique, there are many shared problems to solve and areas of opportunity.
A test was carried out to see how an AI would compare to a human expert when diagnosing breast cancer
It turns out that the AI was substantially better at correctly identifying breast cancer from scans
What is interesting for us is that it was the data work that was the challenge in this activity
The hard work was to turn the scans into data that the computer could use
Google claims 99% accuracy in metastatic breast cancer detection
Interesting work featured in Nature:
Faced with mountains of image and audio data, researchers are turning to artificial intelligence to answer pressing ecological questions.
When researchers collect audio recordings of birds, they are usually listening for the animals’ calls. But conservation biologist Marc Travers is interested in the noise produced when a bird collides with a power line.
In 2011, Travers wanted to know how many of these collisions were occurring on the Hawaiian island of Kauai.
With some 600 hours of audio collected — a full 25 days’ worth — counting the laser blasts manually was impractical. So, Travers sent the audio files (as well as metadata, such as times and locations) to Conservation Metrics, a firm in Santa Cruz, California, that uses artificial intelligence (AI) to assist wildlife monitoring. The company’s software was able to detect the collisions automatically and, over the next several years, Travers’ team increased its data harvest to about 75,000 hours per field season.
Changes are happening
As part of a three-month trial, PA will generate thousands of localised variations of stories sourced by reporters, on subjects ranging from health and crime to housing and employment. It marks the next stage in PA’s RADAR (Reporters and Data and Robots) project, looking at how artificial intelligence can be leveraged in the newsroom.
Let’s take a look at some of the opportunities available to us when we have more data
The Nobel Prize in Physics 2018 was awarded “for groundbreaking inventions in the field of laser physics”
For a scholarly publisher with a back catalogue of laser content, this is a great opportunity to share content on laser physics
At the moment you’d probably look in a particular journal, perform searches of your own content to find relevant things, or use keywords assigned either by the author or a production editor. You are likely to miss things. It would be nice if you could push a button and see all your content that fits into this category instantly.
Demo: tripod
This is a quick demo to show you how AI can support this activity
All AIs have a name; ours is called Al-bot Einstein, but unlike Albert Einstein he currently knows absolutely nothing.
So let’s train him...
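The notes do not say which algorithm the live demo used. Purely as an illustration, here is a minimal sketch of “training” a classifier to spot laser physics content, using multinomial naive Bayes (our substitution, not necessarily what Al-bot Einstein runs). The class name `AlBot`, the labels and the example titles are hypothetical. Before training it returns `None`, mirroring a bot that knows nothing; each labelled example is the “experience” that improves its performance.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class AlBot:
    """Hypothetical stand-in for the demo bot: a tiny multinomial naive Bayes classifier."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies seen in training
        self.doc_counts = Counter()              # label -> number of training examples
        self.vocabulary = set()                  # every word seen in any training example

    def train(self, text, label):
        """Learn from one labelled example (the 'experience')."""
        words = tokenize(text)
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocabulary.update(words)

    def classify(self, text):
        """Return the most probable label, or None before any training."""
        if not self.doc_counts:
            return None  # knows absolutely nothing yet
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            label_total = sum(self.word_counts[label].values())
            # Log prior plus log likelihood of each word, with add-one (Laplace) smoothing
            score = math.log(n_docs / total_docs)
            for word in tokenize(text):
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (label_total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


bot = AlBot()
print(bot.classify("Chirped pulse amplification"))  # None: untrained

# Hypothetical labelled titles; a real demo would use the publisher's own content
examples = [
    ("Ultrashort laser pulses and chirped pulse amplification", "laser physics"),
    ("Optical tweezers trap particles with a focused laser beam", "laser physics"),
    ("Femtosecond laser spectroscopy of molecules", "laser physics"),
    ("Protein folding pathways in yeast cells", "other"),
    ("Soil carbon storage under grassland management", "other"),
    ("Monetary policy and inflation expectations", "other"),
]
for text, label in examples:
    bot.train(text, label)

print(bot.classify("High power laser pulse compression"))  # laser physics
```

Run over a back catalogue, a trained model like this is one way to get the “push a button” view of everything that fits the laser physics category, though with six examples it is a toy rather than a production tagger.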
Data maturity: your ability to store, manage, create and use data to deliver value to your users
There are different ways to think about data maturity and how it will affect you and your organisation
We could consider how it will drive internal processes and increase efficiencies
We could think about how it will affect the submission process, or the researcher’s experience of finding resources
Primary data sources
Secondary or supplementary data sources
Let’s take a look at what data maturity is required to deliver these different levels of information product
As an example, let’s consider a researcher deciding where to focus his next piece of research
1. Find and read content from multiple sources
2. Improved search and discovery
3. Personal recommendations: you might be interested in this
4. Which organisations are doing research in this area, what has been trending, who else is doing research in this area, what are the most important papers in this area
5. What will the trends be over the next 3 years, what factors might affect the success of my research output, which institution should I apply to, who should I contact because they have just finished some research...
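Level 4 questions such as “which organisations are doing research in this area?” become answerable once relationships are stored as data, for example as (subject, predicate, object) triples. Below is a minimal in-memory sketch; the identifiers, predicates and helper functions are hypothetical, and a real deployment would more likely use an RDF triple store queried with SPARQL.

```python
# Hypothetical triples: (subject, predicate, object)
TRIPLES = {
    ("paper:A", "hasTopic", "laser physics"),
    ("paper:A", "hasAuthor", "researcher:1"),
    ("researcher:1", "affiliatedWith", "org:X"),
    ("paper:B", "hasTopic", "laser physics"),
    ("paper:B", "hasAuthor", "researcher:2"),
    ("researcher:2", "affiliatedWith", "org:Y"),
    ("paper:C", "hasTopic", "ecology"),
    ("paper:C", "hasAuthor", "researcher:3"),
    ("researcher:3", "affiliatedWith", "org:Z"),
}


def objects(subject, predicate):
    """All objects linked from subject by predicate."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def subjects(predicate, obj):
    """All subjects linked to obj by predicate."""
    return {s for s, p, o in TRIPLES if p == predicate and o == obj}


def organisations_researching(topic):
    """Walk paper -> author -> organisation relationships for every paper on a topic."""
    orgs = set()
    for paper in subjects("hasTopic", topic):
        for author in objects(paper, "hasAuthor"):
            orgs |= objects(author, "affiliatedWith")
    return orgs


print(sorted(organisations_researching("laser physics")))  # ['org:X', 'org:Y']
```

The same walk, stopped one step earlier, answers “who else is doing research in this area?”.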
The increase in data, faster processors and AI are rapidly changing our world
There are substantial drivers for change in scholarly publishing
Your data maturity will determine your ability to remain relevant in this new environment