A presentation on smart content (what it is, how it is produced, why it is useful, and its relevance to the future of scholarly publishing) for the Association of American Publishers Professional and Scholarly Publishing Pre-Conference in Washington, D.C. on 2012-02-01.
Smart Content AAP PSP 2012 02-01 rev 1
1. Smart Content
Bradley P. Allen
Elsevier Labs
Presentation to AAP PSP Pre-Conference 2012
Washington, D.C.
2012-02-01 (revised version of 2012-02-21)
2. What is smart content?
• Smart content is content structured to make it
easier to do your work
– Discover it faster
– Understand it better
– Integrate it more cheaply with other solutions
17. Methods for adding structure to content
Manual: annotation, curation
• Very mature, but hard to scale
• Crowdsourcing is a possible solution, but quality control is a challenge
Automated, modeling: classification, regression, clustering, collaborative filtering, topic modeling
• Variable degrees of maturity, but huge strides through machine learning research and practical application on the consumer Internet
• Data-driven, so the more data the better
• Models can be used to build applications, can be a new type of publication
Automated, extraction: entity extraction, relation extraction
• Language-driven, so challenging to generalize and scale
• Crucial to realize promise of ease of integration
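To make the extraction column concrete, here is a minimal sketch of dictionary-based entity extraction. It is an illustration only, not the method used in the presentation: the `GAZETTEER` terms, the entity types, and the `extract_entities` helper are all hypothetical. The hand-built, English-only word list also hints at why language-driven methods are hard to generalize and scale.

```python
import re

# Hypothetical gazetteer mapping surface forms to entity types (illustrative only).
GAZETTEER = {
    "aspirin": "Drug",
    "ibuprofen": "Drug",
    "headache": "Condition",
}

# One case-insensitive pattern that matches any gazetteer term as a whole word.
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(term) for term in GAZETTEER) + r")\b",
    re.IGNORECASE,
)

def extract_entities(text):
    """Return (surface form, entity type, start offset) for each gazetteer match."""
    return [
        (match.group(0), GAZETTEER[match.group(0).lower()], match.start())
        for match in PATTERN.finditer(text)
    ]

entities = extract_entities("Aspirin is often taken for headache.")
for surface, entity_type, offset in entities:
    print(f"{offset:3d}  {entity_type:10s} {surface}")
```

Each tuple is a small piece of added structure: a typed mention anchored to a position in the text, which downstream tools can index or link.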
18. Elsevier’s approach
• Embrace linked data principles while leveraging
our existing content production workflow and
infrastructure
– Find the right balance between production/QA and
online delivery
• Leverage partners for content enhancement and
knowledge organization
– Reuse Web-standard vocabularies, taxonomies,
ontologies and entity resources where possible
• Start with asset metadata and subjects
• Deliver benefits across the complementary use
cases of researcher and practitioner
19. What is linked data?
“Linked data is just a term for how to publish
data on the web while working with the web.
And the web is the best architecture we
know for publishing information in a hugely
diverse and distributed environment, in a
gradual and sustainable way.”
Jeni Tennison. 2010. Why Linked Data for data.gov.uk?
http://www.jenitennison.com/blog/node/140
20. The four principles of linked data
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up
those names
3. When someone looks up a URI, provide
useful information, using the standards
(RDF, SPARQL)
4. Include links to other URIs, so that they can
discover more things
Tim Berners-Lee. 2006. Linked Data
http://www.w3.org/DesignIssues/LinkedData.html
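A minimal sketch of how the four principles look in practice, assuming a hypothetical article and author under `example.org`. The Dublin Core and FOAF property URIs are real vocabularies; the resource URIs, the `term` and `to_ntriples` helpers, and the data are illustrative only. Things are named with HTTP URIs (principles 1 and 2), described in a standard RDF syntax, N-Triples (principle 3), and linked to one another (principle 4).

```python
# Hypothetical resource URIs (assumptions for illustration).
ARTICLE = "http://example.org/article/123"
AUTHOR = "http://example.org/person/jane-doe"

# Each statement is a (subject, predicate, object) triple.
triples = [
    (ARTICLE, "http://purl.org/dc/terms/title", "Smart Content"),
    (ARTICLE, "http://purl.org/dc/terms/creator", AUTHOR),  # link to another URI
    (AUTHOR, "http://xmlns.com/foaf/0.1/name", "Jane Doe"),
]

def term(value):
    """Format a value as an N-Triples term: URIs in angle brackets, else a string literal."""
    if value.startswith(("http://", "https://")):
        return f"<{value}>"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'

def to_ntriples(statements):
    """Serialize triples as N-Triples, one statement per line."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in statements)

ntriples = to_ntriples(triples)
print(ntriples)
```

A server that returns this description when `ARTICLE` is looked up would satisfy the third principle; the `creator` statement is what lets a client go on to discover the author.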
28. Challenges in publishing smart content
• URI and namespace management and
governance
• Globalization/localization of knowledge
organization systems
• Registries for resolving identity of named
entities for accreditation, provenance and
trust
29. What smart content means for the publishing business
“Our new knowledge does not consist of a
careful set of works that have passed through a
series of gates. … Our new knowledge is not
even a set of works. It is an infrastructure of
connection.”
David Weinberger. 2011. Too Big to Know: Rethinking Knowledge Now That
the Facts Aren't the Facts, Experts Are Everywhere, and the Smartest
Person in the Room Is the Room, Basic Books, New York, NY
30. The digital library is a horseless carriage
Print era: 1600s - 1980
• Packaged as books and articles
• Physically distributed
• Access and discovery through libraries
Digital Library era: 1980 - 2010s
• Packaged as books and articles
• Digitally distributed
• Access and discovery through search engines
Platform-as-a-Service era: 2010s
• Packaged as apps and APIs
• Digitally distributed
• Access and discovery through social networks
33. Content publishing is becoming business intelligence
Surajit Chaudhuri, Umeshwar Dayal, and Vivek Narasayya. 2011. An
overview of business intelligence technology. Commun. ACM 54, 8
(August 2011), 88-98. http://doi.acm.org/10.1145/1978542.1978562
36. Smart content is a bridge to the future of publishing
• Smart content allows publishers to create new
products and services through structuring
content for better discovery, insight and utility
– The value is in the structure, not the content
– Creating that structure is hard work
– The kind of hard work that publishers have
traditionally focused on
• New consumer Internet businesses are using
open source software and the cloud to add
structure to content today… quickly and on the
cheap
• Publishers and societies both large and small can
use the same techniques to follow suit
37. The skills to do this are increasingly accessible
38. Things you can do now
• Expose existing asset and subject metadata as
linked data in Web pages to aid discovery
– E.g. schema.org
• Embrace and extend authoritative ontologies
and repositories on the Web
• Collaborate in building needed authoritative
resources for identity resolution and metrics
– E.g. ORCID
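As one possible reading of the first and third suggestions, the sketch below builds schema.org metadata for an article and wraps it for embedding in a Web page. The choice of JSON-LD is an assumption; schema.org markup can also be written as microdata or RDFa. The type and property names (`ScholarlyArticle`, `name`, `author`, `about`, `datePublished`, `sameAs`) are schema.org terms, but every value, including the ORCID iD, is a hypothetical placeholder, and `to_jsonld_script` is an illustrative helper.

```python
import json

# All values are hypothetical placeholders; only the property names come from schema.org.
article = {
    "@context": "https://schema.org",
    "@type": "ScholarlyArticle",
    "name": "An Example Article Title",
    "datePublished": "2012-02-01",
    "about": ["Linked data", "Scholarly publishing"],
    "author": {
        "@type": "Person",
        "name": "Jane Doe",
        # Placeholder iD, not a real author's record.
        "sameAs": "https://orcid.org/0000-0000-0000-0000",
    },
}

def to_jsonld_script(data):
    """Wrap a dict as a JSON-LD <script> element for inclusion in an HTML page."""
    body = json.dumps(data, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"

snippet = to_jsonld_script(article)
print(snippet)
```

The asset metadata (`name`, `datePublished`) and subject metadata (`about`) come from records a publisher already holds, and the `sameAs` link is where an identity-resolution resource such as ORCID would plug in.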