© Copyright 2013 LucidWorks
Solr Powered Libraries:
A survey of the world's knowledge bases
May 2, 2013
Presented by Erik Hatcher
Thursday, May 2, 13
Abstract
Apache Lucene and Solr search technologies have made information and
knowledge vastly more searchable, findable, and accessible.
Because scholars and researchers are some of the most demanding users of
search systems, the problems implementers face are complex.
For example, many of the applications built on these technologies also thrive on
deliberately designed-in serendipitous discovery capabilities, which bring to light
previously unknown, yet related and potentially interesting, content.
Libraries and other public knowledge-sharing environments, such as
Wikipedia, generally embrace open source and community-contributed
improvements as core principles, making a lovely synergy with the power,
features, and community-driven ecosystem provided by Lucene and Solr.
This talk will introduce you to several Solr-powered library-related systems,
detail how they work, and leave you with lessons learned that you can apply to
your own applications.
2
Real Solar-Powered Library!
•http://www.ktsm.com/news/texas-library-runs-sunshine
3
Card-carrying library geek
•Applied Research in Patacriticism (ARP)
- Rossetti Archive: http://www.rossettiarchive.org
- NINES: http://www.nines.org/
- Collex: http://www.collex.org
•Blacklight
- originated as an implementation of Solr Flare
•Presentations
- http://code4lib.org/conference: 2007, 2009, 2010, 2011, 2013
- Library of Congress: "Solr Powered Libraries" (2007)
»http://www.loc.gov/today/cyberlc/feature_wdesc.php?rec=4113
- EBTI/CBETA Conference 2008
- Publication: “Library 2.0 Initiatives in Academic Libraries”
•Windsor Lucene Summit
•eIFL-FOSS
4
Rossetti Archive
5
NINES/Collex
6
Card catalog
•the original inverted index
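The card-catalog analogy maps directly onto code. Below is a minimal Python sketch of an inverted index, where each term points to the set of "cards" (documents) that contain it, plus a boolean AND lookup. It is a toy illustration only: Lucene's real index stores compressed postings with document IDs, term frequencies, and positions, and the tokenizing regex and sample catalog here are assumptions.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of document ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)


def search_all(index: dict[str, set[str]], query: str) -> set[str]:
    """Boolean AND: ids of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical "cards": one short catalog entry per document id.
catalog = {
    "card-1": "Rossetti, Dante Gabriel. Poems",
    "card-2": "Rossetti, Christina. Goblin Market",
    "card-3": "Hatcher, Erik. Lucene in Action",
}
index = build_inverted_index(catalog)
print(sorted(index["rossetti"]))            # ['card-1', 'card-2']
print(search_all(index, "Rossetti poems"))  # {'card-1'}
```

Like a drawer of subject cards, the lookup never scans the documents themselves; it only reads and intersects the lists filed under each term.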
7
http://commons.wikimedia.org/wiki/File:Copyright_Card_Catalog_Files.jpg
•http://openlibrary.org/
- project of the Internet Archive
•Goal: "A (community editable) web page for every book"
8
dp.la - Digital Public Library of America
•Lucene/ElasticSearch powered
9
Wikimedia/Wikipedia/MediaWiki
•Solr-powered: translation memory service, GeoData extension, etc.
•Main site search is currently powered by a "heavily modified Lucene"
10
HathiTrust
• "partnership of major research institutions and libraries working to ensure
that the cultural record is preserved and accessible long into the future."
• 10.5M books, 12TB OCR+metadata, hundreds of languages
- "Books are different"
- http://code4lib.org/conference/2013/burton-west
• http://www.hathitrust.org/blogs/large-scale-search
- http://www.hathitrust.org/blogs/large-scale-search/too-many-words
- "org.apache.solr.common.SolrException: Impossible Exception"
- CommonGrams
- word segmentation: autoGeneratePhraseQueries="false"
• HathiTrust Research Center
- The infrastructure includes an entrance portal, search and collection-building tools (using
Blacklight), ... analysis algorithms that can be run against the HathiTrust public domain corpus
(more than 3 million volumes). In addition to the production services, the HTRC offers a
development “sandbox”. The sandbox runs against non-Google scanned content (about
260,000 volumes) and provides a test-bed for interested researchers to experiment with writing
their own algorithms for use in the HTRC infrastructure.
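CommonGrams, listed above, indexes very frequent words paired with their neighbours, so a phrase such as "the rain" can be matched against one short posting list ("the_rain") instead of intersecting the enormous position list for "the", which occurs in nearly every book. The Python sketch below approximates what Lucene's CommonGramsFilter (index side) and CommonGramsQueryFilter (query side) emit; the real filters operate on token streams with position increments, and the common-word list and sample text here are assumptions.

```python
def common_grams(tokens: list[str], common: set[str], sep: str = "_") -> list[tuple[int, str]]:
    """Index-time sketch: emit every token, plus a bigram wherever a token or
    its right-hand neighbour is a common word. Returns (position, token) pairs;
    a bigram shares the position of its first word."""
    out = []
    for i, tok in enumerate(tokens):
        out.append((i, tok))
        if i + 1 < len(tokens) and (tok in common or tokens[i + 1] in common):
            out.append((i, tok + sep + tokens[i + 1]))
    return out


def common_grams_query(tokens: list[str], common: set[str], sep: str = "_") -> list[str]:
    """Query-time sketch for phrase queries: keep the bigrams, and keep a
    single word only when it is not part of any bigram."""
    grams, covered = [], set()
    for i in range(len(tokens) - 1):
        if tokens[i] in common or tokens[i + 1] in common:
            grams.append((i, tokens[i] + sep + tokens[i + 1]))
            covered.update((i, i + 1))
    singles = [(i, t) for i, t in enumerate(tokens) if i not in covered]
    # Grams and singles never share a position, so sorting restores word order.
    return [t for _, t in sorted(grams + singles)]


COMMON = {"the", "in"}  # assumed list; real deployments tune this to the corpus
print(common_grams("the rain in spain".split(), COMMON))
# [(0, 'the'), (0, 'the_rain'), (1, 'rain'), (1, 'rain_in'), (2, 'in'), (2, 'in_spain'), (3, 'spain')]
print(common_grams_query("the rain in spain falls mainly".split(), COMMON))
# ['the_rain', 'rain_in', 'in_spain', 'falls', 'mainly']
```

The index grows (extra bigram terms), but phrase queries that touch common words read far smaller posting lists, which is the trade-off that matters on a corpus of millions of OCR'd volumes.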
11
Smithsonian Institution
•http://collections.si.edu
•Many disparate data sources:
- 19 museums, 20 libraries, 14 archives, 1 National Zoo, 1 Astrophysical
Observatory, research centers in Panama, Boston, New York, Maryland, and
Virginia
•"Documents" of all varieties:
- Photographs, paintings, manuscripts, letters, postage stamps, scientific
specimens, rockets, airplanes, postcards, sound recordings, posters,
decorative arts, ceramics, maps, sculptures, publication papers, books, trade
catalogs, etc.
•User tagging, negative/exclude filtering, DIH SolrEntityProcessor
•http://bit.ly/13P41YJ
- http://www.basistech.com/pdf/events/open-source-search-conference/oss-2011-wang-steps-toward-open-government.pdf
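Negative/exclude filtering maps naturally onto Solr filter queries (fq): a purely negative clause such as -field:value keeps everything except the matching documents. A minimal Python sketch, assuming hypothetical field names (data_source, object_type); it only builds the request parameters and does not reflect how collections.si.edu actually constructs its queries.

```python
from collections.abc import Sequence
from urllib.parse import urlencode


def solr_phrase(value: str) -> str:
    """Quote a value as a Solr phrase, escaping backslashes and double quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_params(
    q: str,
    include: Sequence[tuple[str, str]] = (),
    exclude: Sequence[tuple[str, str]] = (),
    rows: int = 10,
) -> list[tuple[str, str]]:
    """Build Solr request parameters with positive and negative filter queries."""
    params = [("q", q), ("rows", str(rows))]
    for field, value in include:
        # Positive filter: restrict results to documents matching field:value.
        params.append(("fq", f"{field}:{solr_phrase(value)}"))
    for field, value in exclude:
        # Purely negative filter: everything except documents matching field:value.
        params.append(("fq", f"-{field}:{solr_phrase(value)}"))
    return params


# Hypothetical field names and values, for illustration only.
params = build_params(
    "rockets",
    include=[("data_source", "National Air and Space Museum")],
    exclude=[("object_type", "Postcards")],
)
print(urlencode(params))  # query string to append to a Solr /select URL
```

Keeping each exclusion in its own fq, rather than folding it into q, keeps it out of relevance scoring and lets Solr cache it separately in its filter cache.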
12
•SerialsSolutions Summon
•http://www.serialssolutions.com/en/services/summon
•SaaS, single unified index, match & merge
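"Match & merge" here means collapsing duplicate records for the same item, arriving from different content providers, into one entry in the single unified index. Summon's actual matching rules are not described on the slide; the Python sketch below is a generic illustration that assumes a DOI-or-normalized-title match key and a first-non-empty-value-wins merge policy, both hypothetical, as are the sample records.

```python
import re


def match_key(rec: dict) -> tuple:
    """Hypothetical matching rule: DOI when present, else normalized title + year."""
    doi = (rec.get("doi") or "").strip().lower()
    if doi:
        return ("doi", doi)
    title = re.sub(r"[^a-z0-9 ]", "", (rec.get("title") or "").lower())
    return ("title", " ".join(title.split()), rec.get("year"))


def match_and_merge(records: list[dict]) -> list[dict]:
    """Collapse records that share a match key into one merged record."""
    merged: dict[tuple, dict] = {}
    for rec in records:
        target = merged.setdefault(match_key(rec), {"sources": []})
        for field, value in rec.items():
            if field == "source":
                target["sources"].append(value)  # remember every provider
            elif value and not target.get(field):
                target[field] = value  # first non-empty value wins
    return list(merged.values())


records = [
    {"source": "Provider A", "doi": "10.1000/example.1",
     "title": "Full-Text Search at Scale", "year": 2013},
    {"source": "Provider B", "doi": "10.1000/EXAMPLE.1",
     "title": "Full-text search at scale", "abstract": "An abstract."},
    {"source": "Provider C", "title": "Another Article", "year": 2012},
]
for rec in match_and_merge(records):
    print(rec["sources"], rec.get("title"))
# ['Provider A', 'Provider B'] Full-Text Search at Scale
# ['Provider C'] Another Article
```

A production matcher would need fuzzier rules; for example, this key never matches a record that has a DOI against a copy of the same article that lacks one.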
15
Astrophysics Data System Labs
•Smithsonian, NASA, Harvard
•http://adslabs.org
16
http://code4lib.org/conference/2013/luker
•vufind.org
•Currently powers the main HathiTrust UI, plus many other sites
- see http://vufind.org/wiki/installation_status
17
• "Blacklight is an open source Ruby on Rails gem that provides a discovery interface for
any Solr index. Blacklight provides a default user interface which is customizable via the
standard Rails (templating) mechanisms. Blacklight accommodates heterogeneous
data, allowing different information displays for different types of objects."
- http://projectblacklight.org
• Founded at the University of Virginia (2007): search.lib.virginia.edu
- UV-A solar radiation == blacklight
• Initial contributors: UVa, Stanford, JHU, WGBH
• University of Hull, United States Holocaust Memorial Museum, University of Wisconsin-Madison,
Tufts, Australian gov't (Natural Resource Management), Penn State's
ScholarSphere, Northwestern, New York Public Library, NCSU, Columbia University,
Agriculture Network Information Center (USDA), alicelaw.org (American Legislative and
Issue Campaign Exchange, a one-stop web-based public library of progressive state
and local laws), and many more
• http://projecthydra.org/ uses Blacklight as UI component
19
SearchWorks at Stanford
20
Advanced search in Stanford's SearchWorks
21
SearchWorks:
Mapping text boxes to Solr query pieces
•http://code4lib.org/conference/2010/dushay_keck
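One way to implement that mapping, assumed here rather than taken from the talk, is Solr's nested-query syntax: each text box becomes a `_query_` clause that runs dismax over its own field list, passed by parameter reference (`qf=$title_qf`). The Python sketch below only builds the request; the box names, parameter names, and field lists are hypothetical.

```python
# Hypothetical mapping: box name -> (Solr parameter holding its qf, field list).
BOX_QF = {
    "title": ("title_qf", "title_t^3 title_alt_t"),
    "author": ("author_qf", "author_t"),
    "subject": ("subject_qf", "subject_t"),
}


def box_query(box: str, text: str) -> str:
    """Wrap one box's input in a nested dismax query over that box's fields."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    param = BOX_QF[box][0]
    return f'_query_:"{{!dismax qf=${param}}}{escaped}"'


def advanced_params(form: dict[str, str], op: str = "AND") -> dict[str, str]:
    """Combine the non-empty boxes into a single Solr request."""
    used = [box for box, text in form.items() if box in BOX_QF and text.strip()]
    params = {"q": f" {op} ".join(box_query(box, form[box]) for box in used)}
    for box in used:
        name, fields = BOX_QF[box]
        params[name] = fields  # dereferenced by $name inside the local params
    return params


print(advanced_params({"title": "house of life", "author": "rossetti", "subject": ""})["q"])
# _query_:"{!dismax qf=$title_qf}house of life" AND _query_:"{!dismax qf=$author_qf}rossetti"
```

Passing field lists by reference keeps the q string short and lets each box carry its own boosts without the user-typed text ever touching field names.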
22
•https://catalyst.library.jhu.edu/
23
Rock and Roll!
•m/
24
Community and Resources
•code4lib:
- http://www.code4lib.org/
•HathiTrust folks
- http://www.hathitrust.org/blogs/large-scale-search
- http://robotlibrarian.billdueber.com/
•http://bighumanities.net/
- The Workshop on Big Humanities will be held in conjunction with the 2013
IEEE International Conference on Big Data (IEEE BigData 2013), which will
take place from 6 to 9 October 2013 in Silicon Valley, California, USA, and
which provides a leading international forum for disseminating the latest
research in the growing field of “big data.”
25
26
http://heatherbrewer.com/blog/2013/04/15/libraries-rock/