ON BEYOND
KEYWORD SEARCH:
THE THINKING BEHIND
JSTOR LABS’ TEXT ANALYZER
NFAIS Webinar: Shifting Patterns in Search and Discovery
June 15, 2017
@abhumphreys
Alex Humphreys, JSTOR Labs
ITHAKA is a not-for-profit organization that helps the academic
community use digital technologies to preserve the scholarly record
and to advance research and teaching in sustainable ways.
JSTOR is a not-for-profit
digital library of academic
journals, books, and
primary sources.
Ithaka S+R is a not-for-profit
research and consulting
service that helps academic,
cultural, and publishing
communities thrive in the
digital environment.
Portico is a not-for-profit
preservation service for
digital publications, including
electronic journals, books,
and historical collections.
Artstor provides 2+ million
high-quality images and
digital asset management
software to enhance
scholarship and teaching.
JSTOR Labs works with partner publishers, libraries and
labs to create tools for researchers, teachers and students
that are immediately useful – and a little bit magical.
WHAT’S A
TEXT ANALYZER?
LET’S JUST START WITH A DEMO
www.jstor.org/analyze
WHAT’S IT
GOOD FOR?
SCHOLARS DOING LITERATURE REVIEWS
FINDING KEYWORDS IN UNFAMILIAR FIELDS
https://publish.illinois.edu/commonsknowledge/2017/04/04/spotlight-jstor-labs-text-analyzer/
ESL RESEARCHERS FINDING KEYWORDS
OK, I GUESS IT’S KINDA COOL.
SO HOW’D YOU COME UP WITH
IT?
THIS IS HOW:
The Design Squiggle
Damien Newman:
http://cargocollective.com/central/The-Design-Squiggle/
THE SEED…
CONCEPT TESTING
ITERATING ON
INTERACTION
DESIGN
RELEASE AS JSTOR BETA
STILL A LONG WAY TO GO!
A BRIEF ASIDE
(OR YOU COULD CALL IT A RANT)
#devops is great.
Can we please try
#userresearchproddesigndevopscustservice?
WHAT HAVE WE
LEARNED SO FAR?
Combining semantic indexing with
topic modeling can be powerful.
THREE STEPS FOR EACH SEARCH
1. Extract text
• From many textual formats (PDF, Word, HTML, etc.)
• OCR, if needed (e.g. a picture of a page in a magazine)
2. Identify terms
• Topics: JSTOR Thesaurus & an LDA Topic Model
• Entities: Alchemy (Watson), OpenCalais, Stanford, Apache
• TF-IDF to select 5 terms
3. Generate results
• “OR” search
• Relevance ranked based on “equalizer”
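The "TF-IDF to select 5 terms" and "OR search" bullets can be sketched roughly as follows. This is a minimal illustration, not the Text Analyzer's actual code: the function names, the smoothed IDF formula, the sample document frequencies and the quoted query syntax are all assumptions made for the example.

```python
import math
from collections import Counter


def top_terms_by_tfidf(doc_terms, corpus_doc_freq, corpus_size, k=5):
    """Return the k candidate terms with the highest TF-IDF score.

    doc_terms: terms identified in the uploaded document (repeats allowed).
    corpus_doc_freq: hypothetical map of term -> number of corpus documents
        that contain it.
    corpus_size: total number of documents in the corpus.
    """
    term_freq = Counter(doc_terms)
    scores = {}
    for term, count in term_freq.items():
        doc_freq = corpus_doc_freq.get(term, 0)
        # Smoothed IDF (an assumed variant): rare terms score higher.
        idf = math.log((1 + corpus_size) / (1 + doc_freq)) + 1.0
        scores[term] = count * idf
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


def build_or_query(terms):
    """Join the selected terms into a single OR query string."""
    return " OR ".join(f'"{t}"' for t in terms)


if __name__ == "__main__":
    terms = (["redistricting"] * 3 + ["gerrymandering"] * 2
             + ["court"] * 2 + ["policy"])
    doc_freq = {"redistricting": 10, "gerrymandering": 5,
                "court": 400, "policy": 500}
    selected = top_terms_by_tfidf(terms, doc_freq, corpus_size=1000, k=2)
    print(build_or_query(selected))
```

Terms that are frequent in the uploaded text but rare in the corpus rise to the top, which is why a common word such as "court" loses out to "gerrymandering" here even though both appear twice.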
WHERE DO THE TOPICS COME FROM?
• A controlled vocabulary containing 40,000+ terms, representing
concepts (no entities, currently) found in the JSTOR corpus
• Constructed from 20 thesauri obtained from various sources, including
ERIC, MeSH, and NASA
• Developed in collaboration with Access Innovations
• Key branches in the thesaurus are reviewed and corrected by subject
matter experts
THE JSTOR THESAURUS
JSTOR THESAURUS
WHY THESE TOPICS?
AND, WHERE DID THEY COME FROM?
Human-curated tagging rules have been developed for each concept in the
JSTOR Thesaurus, enabling concepts to be extracted from unstructured
text.
All documents in the JSTOR corpus have been tagged with thesaurus
concepts using a rules-based indexer.
THESAURUS TAGGER
RULE BUILDER
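As a rough illustration of rules-based tagging, the sketch below maps each concept to a few hand-written patterns and counts matches in a text. The rule set, the regular expressions and the function name are hypothetical; the real tagging rules are curated per concept and are not shown in this deck.

```python
import re

# Hypothetical rules: concept name -> patterns that signal the concept.
RULES = {
    "Redistricting": [r"\bredistrict\w*", r"\breapportionment\b"],
    "Gerrymandering": [r"\bgerrymander\w*"],
}


def tag_concepts(text, rules=RULES):
    """Return {concept: match count} for every concept whose rules fire."""
    lowered = text.lower()
    tags = {}
    for concept, patterns in rules.items():
        hits = sum(len(re.findall(pattern, lowered)) for pattern in patterns)
        if hits:
            tags[concept] = hits
    return tags


if __name__ == "__main__":
    sample = "The court rejected the redistricting plan as a partisan gerrymander."
    print(tag_concepts(sample))
```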
This tagged corpus is then used to select training documents for building
an LDA topic model.
The LDA topic model enables us to identify latent topics found in text, in
addition to those explicitly identified with the human-generated rules.
TOPIC MODEL
• Labeled LDA Topic model
• Model trained using documents
selected from JSTOR corpus
with tagged thesaurus concepts
• Built using the open-source MALLET tool
• Current version of model
includes approximately 11,000
topics
• Each topic represents a
distribution of word probabilities
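To make "a distribution of word probabilities" concrete, the toy sketch below stores two topics as word-to-probability maps and picks the topic under which a short word list is most likely. It is not LDA inference and does not use MALLET: the probabilities are invented, and the average log-likelihood scoring with a small floor for unseen words is an assumption chosen for simplicity.

```python
import math

# Invented probabilities for illustration; each topic's values sum to 1.
TOPICS = {
    "Redistricting": {
        "redistricting": 0.4, "district": 0.3,
        "gerrymandering": 0.2, "court": 0.1,
    },
    "Congressional districts": {
        "district": 0.4, "congressional": 0.3,
        "congress": 0.2, "house": 0.1,
    },
}


def average_log_likelihood(words, word_probs, floor=1e-6):
    """Mean log-probability of the words under one topic.

    Words the topic does not list get a small floor probability
    (an assumed smoothing choice) so the logarithm stays defined.
    """
    if not words:
        return float("-inf")
    total = sum(math.log(word_probs.get(word, floor)) for word in words)
    return total / len(words)


def best_topic(words, topics=TOPICS):
    """Name of the topic that assigns the words the highest likelihood."""
    return max(topics, key=lambda name: average_log_likelihood(words, topics[name]))


if __name__ == "__main__":
    print(best_topic(["gerrymandering", "district", "court"]))
```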
redistricting district congressional minority political majority house legislative racial
gerrymandering court republican plan electoral districting seat representative black voter
democrat partisan election democratic representation line supreme legislature drawn control
population voting drawing policy texas draw map claim boundary following commission outcome
shaw race census legal principle creation decision create finding elect lublin polarization optimal
elected composition affect member measure vote gain previous legislator geographic southern
section every approach controlled round note gerrymander reapportionment compactness
decennial bipartisan constitutional find substantive california roll competitive county competition
party requirement federal north post redrawn incumbent criterion consequence likely formal safe
delegation georgia justice influence shotts equal favor might scholar equality south power law
judicial bias king carolina call according voss baker panel professor rule mandate creating
increased determine constraint politics argue standard redis grofman reno cain redrawing margin
share ing tricting decrease congress geographical requires simple held critic empirical david
niemi perverse latino analyze examine debate rather impact next provides give balance affected
subsequent possible take practice community robbins constitution computer evenly fraction
constituent illinois supporter shape responsiveness typically various proposed despite either
focus conclusion african opportunity redistrict mcdonald white numerous test statewide percent
suggests thus choice largely develop decade conclude fact four reached
Redistricting
district congressional congress house representative member federal districting seat majority
plan representation population congressman apportionment elected court president washington
columbia legislative census party interest political gerrymandering redistricting home thomas
affect every black democrat dis foley carolina find reapportionment constituency supreme
constitution voting geographic active dinner responsiveness south force john gingrich legislature
equal membership neighborhood testimony north james service decennial constituent passed
boundary law creation firm charles spending congruent election politically addition april contact
proportion con assistant position following york land unconstitutional resident miller voter pledge
stephen city official minority respective mainland kentucky post clause better divisor perimeter
yao secretary republican senate moderate congruence map county grant senior drawing portion
speaker feature decision professor became gerrymander swain trict leapfrog federalist partisan
senator vote captain compelling lucas candidate race create harm require fourth shape you
traditional purpose shaped concern people shaw historical simply policy henry david allocation
vetoed arkansas smiley serra carl volunteer politician budget burden electoral leaf education
reduced principle proximity november significant just represented second gathered fiorina
representa gressional glazer apportion gerrymandered boris bronx issn rank redrawing twice
refused eliminates provincial jefferson returned witness campaign fletcher georgia empirically
personnel size maximize half reserve read demographic percent contrary required determining
throughout …
Congressional districts
Top words from some sample topics
Keyword searching is great, but it ain’t
perfect. There’s more we can do for
users.
THANKS, DESIGN THINKING!
FOCUS ON A USER’S GOALS…
This article needs
to pass peer
review.
I need more sources
to back up my
argument.
I need to make sure
I’m not missing
anything.
THANKS, DESIGN THINKING!
…AND WHAT’S STANDING IN THEIR WAY
This research touches
on disciplines I’m
new to. How do I
know if I’m finding
everything?
I know what I’m
interested in, but
the search terms
I’m using aren’t
working.
Blergh, boolean
search is too
complicated.
THANKS, DESIGN THINKING!
UNDERSTAND THE USER’S CONTEXT
Hey, I’ve got my
first draft right
here.
At least I’ve found
ONE article I can
use.
All I have to work with is
the assignment my
teacher handed out.
I’m nowhere near
my laptop.
WHAT ARE WE
STILL LEARNING?
Can we improve the topic model & the
recommendations?
How can we embed this deeper within a
user’s workflow?
Is this a feature, a product or a
business?
Thank you
Alex Humphreys
Director, JSTOR Labs
ITHAKA
labs.jstor.org
@abhumphreys
alex.humphreys@ithaka.org
APPENDIX
(OPEN IN CASE OF
NO INTERNET CONNECTION)
(BUT THIS IS A BIT SILLY,
SINCE THIS IS A WEBINAR)
Pune Airport ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready...Pune Airport ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready...
Pune Airport ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready...
 
Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝
 
Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝
 
VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting High Prof...
VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting  High Prof...VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting  High Prof...
VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting High Prof...
 
✂️ 👅 Independent Andheri Escorts With Room Vashi Call Girls 💃 9004004663
✂️ 👅 Independent Andheri Escorts With Room Vashi Call Girls 💃 9004004663✂️ 👅 Independent Andheri Escorts With Room Vashi Call Girls 💃 9004004663
✂️ 👅 Independent Andheri Escorts With Room Vashi Call Girls 💃 9004004663
 
On Starlink, presented by Geoff Huston at NZNOG 2024
On Starlink, presented by Geoff Huston at NZNOG 2024On Starlink, presented by Geoff Huston at NZNOG 2024
On Starlink, presented by Geoff Huston at NZNOG 2024
 
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark WebGDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
 
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRL
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRLLucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRL
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRL
 
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝
 

On Beyond Keyword Search: The Thinking Behind JSTOR Labs' Text Analyzer - NFAIS Webinar 2017

  • 1. ON BEYOND KEYWORD SEARCH: THE THINKING BEHIND JSTOR LABS’ TEXT ANALYZER NFAIS Webinar: Shifting Patterns in Search and Discovery June 15, 2017 @abhumphreys Alex Humphreys, JSTOR Labs
  • 2. ITHAKA is a not-for-profit organization that helps the academic community use digital technologies to preserve the scholarly record and to advance research and teaching in sustainable ways. JSTOR is a not-for-profit digital library of academic journals, books, and primary sources. Ithaka S+R is a not-for-profit research and consulting service that helps academic, cultural, and publishing communities thrive in the digital environment. Portico is a not-for-profit preservation service for digital publications, including electronic journals, books, and historical collections. Artstor provides 2+ million high-quality images and digital asset management software to enhance scholarship and teaching.
  • 3. JSTOR Labs works with partner publishers, libraries and labs to create tools for researchers, teachers and students that are immediately useful – and a little bit magical.
  • 5. LET’S JUST START WITH A DEMO www.jstor.org/analyze
  • 8. FINDING KEYWORDS IN UNFAMILIAR FIELDS https://publish.illinois.edu/commonsknowledge/2017/04/04/spotlight-jstor-labs-text-analyzer/:
  • 10. OK, I GUESS IT’S KINDA COOL. SO HOW’D YOU COME UP WITH IT?
  • 12. The Design Squiggle Damien Newman: http://cargocollective.com/central/The-Design-Squiggle/
  • 17. STILL A LONG WAY TO GO!
  • 18. A BRIEF ASIDE (OR YOU COULD CALL IT A RANT)
  • 20. #devops is great. Can we please try #userresearchproddesigndevopscustservice?
  • 22. Combining semantic indexing with topic modeling can be powerful.
  • 23. THREE STEPS FOR EACH SEARCH: 1. Extract text 2. Identify terms 3. Generate results • From many textual formats (PDF, Word, HTML, etc.) • OCR, if needed (e.g. a picture of a page in a magazine) • Topics: JSTOR Thesaurus & an LDA topic model • Entities: Alchemy (Watson), OpenCalais, Stanford, Apache • TF-IDF to select 5 terms • “OR” search • Relevance ranked based on “equalizer”
  • 24. WHERE DO THE TOPICS COME FROM? THE JSTOR THESAURUS • A controlled vocabulary containing 40,000+ terms, representing concepts (no entities, currently) found in the JSTOR corpus • Constructed from 20 thesauri obtained from various sources, including ERIC, MeSH, and NASA • Developed in collaboration with Access Innovations • Key branches in the thesaurus are reviewed and corrected by subject matter experts
  • 26. WHY THESE TOPICS? AND, WHERE DID THEY COME FROM? Human-curated tagging rules have been developed for each concept in the JSTOR Thesaurus, enabling concepts to be extracted from unstructured text. All documents in the JSTOR corpus have been tagged with thesaurus concepts using a rules-based indexer.
  • 28. WHY THESE TOPICS? AND, WHERE DID THEY COME FROM? This tagged corpus is then used to select training documents for building an LDA topic model. The LDA topic model enables us to identify latent topics found in text in addition to those explicitly identified with the human-generated rules.
  • 29. TOPIC MODEL • Labeled LDA topic model • Model trained using documents selected from the JSTOR corpus with tagged thesaurus concepts • Using the OSS MALLET tool • Current version of the model includes approximately 11,000 topics • Each topic represents a distribution of word probabilities. Top words from some sample topics:
    Redistricting: redistricting district congressional minority political majority house legislative racial gerrymandering court republican plan electoral districting seat representative black voter democrat partisan election democratic representation line supreme legislature drawn control population voting drawing policy texas draw map claim boundary following commission outcome shaw race census legal principle creation decision create finding elect lublin polarization optimal elected composition affect member measure vote gain previous legislator geographic southern section every approach controlled round note gerrymander reapportionment compactness decennial bipartisan constitutional find substantive california roll competitive county competition party requirement federal north post redrawn incumbent criterion consequence likely formal safe delegation georgia justice influence shotts equal favor might scholar equality south power law judicial bias king carolina call according voss baker panel professor rule mandate creating increased determine constraint politics argue standard redis grofman reno cain redrawing margin share ing tricting decrease congress geographical requires simple held critic empirical david niemi perverse latino analyze examine debate rather impact next provides give balance affected subsequent possible take practice community robbins constitution computer evenly fraction constituent illinois supporter shape responsiveness typically various proposed despite either focus conclusion african opportunity redistrict mcdonald white numerous test statewide percent suggests thus choice largely develop decade conclude fact four reached
    Congressional districts: district congressional congress house representative member federal districting seat majority plan representation population congressman apportionment elected court president washington columbia legislative census party interest political gerrymandering redistricting home thomas affect every black democrat dis foley carolina find reapportionment constituency supreme constitution voting geographic active dinner responsiveness south force john gingrich legislature equal membership neighborhood testimony north james service decennial constituent passed boundary law creation firm charles spending congruent election politically addition april contact proportion con assistant position following york land unconstitutional resident miller voter pledge stephen city official minority respective mainland kentucky post clause better divisor perimeter yao secretary republican senate moderate congruence map county grant senior drawing portion speaker feature decision professor became gerrymander swain trict leapfrog federalist partisan senator vote captain compelling lucas candidate race create harm require fourth shape you traditional purpose shaped concern people shaw historical simply policy henry david allocation vetoed arkansas smiley serra carl volunteer politician budget burden electoral leaf education reduced principle proximity november significant just represented second gathered fiorina representa gressional glazer apportion gerrymandered boris bronx issn rank redrawing twice refused eliminates provincial jefferson returned witness campaign fletcher georgia empirically personnel size maximize half reserve read demographic percent contrary required determining throughout …
  • 30. Keyword searching is great, but it ain’t perfect. There’s more we can do for users.
  • 31. THANKS, DESIGN THINKING! FOCUS ON A USER’S GOALS… This article needs to pass peer review. I need more sources to back up my argument. I need to make sure I’m not missing anything.
  • 32. THANKS, DESIGN THINKING! …AND WHAT’S STANDING IN THEIR WAY This research touches on disciplines I’m new to. How do I know if I’m finding everything? I know what I’m interested in, but the search terms I’m using aren’t working. Blergh, boolean search is too complicated.
  • 33. THANKS, DESIGN THINKING! UNDERSTAND THE USER’S CONTEXT Hey, I’ve got my first draft right here. At least I’ve found ONE article I can use. All I have to work with is the assignment my teacher handed out. I’m nowhere near my laptop.
  • 34. WHAT ARE WE STILL LEARNING?
  • 35. Can we improve the topic model & the recommendations?
  • 36. How can we embed this deeper within a user’s workflow?
  • 37. Is this a feature, a product or a business?
  • 38. Thank you Alex Humphreys Director, JSTOR Labs ITHAKA labs.jstor.org @abhumphreys alex.humphreys@ithaka.org
  • 39. APPENDIX (OPEN IN CASE OF NO INTERNET CONNECTION) (BUT THIS IS A BIT SILLY, SINCE THIS IS A WEBINAR)
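
The "TF-IDF to select 5 terms" and "OR" search steps on slide 23 can be illustrated with a minimal sketch. This is not JSTOR Labs' implementation: the slides say the candidate terms are thesaurus topics and named entities, whereas this sketch scores plain lowercase words, and the function names (`tokenize`, `top_terms`, `or_query`), the smoothed IDF formula, and the quoted query syntax are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase alphabetic tokens (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def top_terms(document, corpus, k=5):
    """Return the k terms of `document` with the highest TF-IDF score.

    `corpus` is a list of background documents used for document frequency.
    The IDF is smoothed so that terms unseen in the corpus do not divide by zero.
    """
    doc_tokens = tokenize(document)
    term_counts = Counter(doc_tokens)
    corpus_vocab = [set(tokenize(d)) for d in corpus]
    n_docs = len(corpus)

    scores = {}
    for term, count in term_counts.items():
        doc_freq = sum(1 for vocab in corpus_vocab if term in vocab)
        idf = math.log((1 + n_docs) / (1 + doc_freq)) + 1.0
        scores[term] = (count / len(doc_tokens)) * idf

    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


def or_query(terms):
    """Join the selected terms into a single boolean "OR" query string."""
    return " OR ".join(f'"{t}"' for t in terms)


if __name__ == "__main__":
    background = ["the court ruled", "the house voted", "the map of the district"]
    draft = "gerrymandering gerrymandering redistricting the the court"
    print(or_query(top_terms(draft, background, k=5)))
```

Ranking the results of such a query (the "equalizer" on the slide) is a separate step that this sketch does not attempt.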

Editor's notes

  1. Better training data; pointing beyond JSTOR
  2. APIs, Chrome extensions, and elsewhere within the JSTOR UX