THE NEED FOR AND FUNDAMENTALS OF
AN OPEN WEB INDEX
Prof. Dr. Dirk Lewandowski
Hamburg University of Applied Sciences, Hamburg, Germany
dirk.lewandowski@haw-hamburg.de
First International Symposium on Open Search Technology
Garching, 23 October, 2019
Proposal for an Open Web Index (OWI)
Prof. Dr. Dirk Lewandowski
ABOUT ME
• Professor of Information Research and
Information Retrieval at Hamburg
University of Applied Sciences
• Author of 100+ scholarly articles on
search engines
• German-language book “Suchmaschinen
verstehen” (Springer, 2nd edition, 2018)
• Editor, Aslib Journal of Information
Management (Emerald Publishing)
• Served as expert for the High Court of
Justice (UK) and Deutscher Bundestag
(German parliament)
https://searchstudies.org/dirk
WHY WE NEED AN OPEN WEB INDEX
GOOGLE SERVES MORE THAN
2,000,000,000,000 QUERIES PER YEAR.
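To give a feel for the scale of two trillion queries per year, the figure can be converted into per-day and per-second rates. This is a minimal back-of-the-envelope sketch: it assumes the load is spread evenly over a 365-day year, which real query traffic is not, and the variable names are illustrative only.

```python
# Rough conversion of the yearly query figure into daily and per-second rates.
# Assumption: queries are spread evenly over a non-leap year.
QUERIES_PER_YEAR = 2_000_000_000_000   # figure quoted on the slide
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds

queries_per_day = QUERIES_PER_YEAR / 365
queries_per_second = QUERIES_PER_YEAR / SECONDS_PER_YEAR

print(f"{queries_per_day:,.0f} queries per day")
print(f"{queries_per_second:,.0f} queries per second")
```

Under these assumptions the result is roughly 5.5 billion queries per day, or a little over 63,000 per second.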
PROBLEM STATEMENT
• As there is no central directory of the Web, private search engine companies
have built large indexes of its contents
• Companies operating Web-scale indexes do not give other interested parties
sufficient access to their data
• The difficulties in building a Web index lie in technical issues, operating costs,
Web size, and freshness
• Due to these difficulties, there is no Web index built by a European company
(or other entity)
IDEA
VISION
To build a public library of the Web
TECHNICAL IDEA
Separate the index from the services that are built on the index
PUBLIC VS. PRIVATE
While the index should be public, the services can be proprietary
STRUCTURE
[Diagram of the OWI components: OWI Crawler, OWI Basic Indexer, OWI Advanced Indexer, OWI Web Index, OWI Usage Data Index, and the OWI Interface / API, on which Services 1 to 4 are built, each with its own users.]
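The separation between the index and the services built on it can be sketched in a few lines of code. The following is a toy illustration under stated assumptions, not the proposed OWI design: the class and function names (`Document`, `WebIndex`, `OwiApi`, `web_search_service`, `term_frequency_service`) are hypothetical, the "crawler" reads from an in-memory dictionary instead of the Web, the "advanced indexer" adds a made-up annotation, and the usage data index is left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    url: str
    text: str
    tokens: list[str] = field(default_factory=list)            # set by the basic indexer
    annotations: dict[str, str] = field(default_factory=dict)  # set by the advanced indexer


def crawl(pages: dict[str, str]) -> list[Document]:
    """Stand-in for the crawler: reads an in-memory 'Web' instead of fetching URLs."""
    return [Document(url=url, text=text) for url, text in pages.items()]


def basic_index(doc: Document) -> Document:
    """Basic indexing: lowercase and split on whitespace."""
    doc.tokens = doc.text.lower().split()
    return doc


def advanced_index(doc: Document) -> Document:
    """Advanced indexing: a hypothetical enrichment step (here, a length label)."""
    doc.annotations["length_class"] = "long" if len(doc.tokens) > 5 else "short"
    return doc


class WebIndex:
    """Shared index: maps each term to the URLs of documents containing it."""

    def __init__(self) -> None:
        self._postings: dict[str, set[str]] = {}

    def add(self, doc: Document) -> None:
        for token in doc.tokens:
            self._postings.setdefault(token, set()).add(doc.url)

    def lookup(self, term: str) -> set[str]:
        return set(self._postings.get(term.lower(), set()))


class OwiApi:
    """Public interface: services see only this, never the index internals."""

    def __init__(self, index: WebIndex) -> None:
        self._index = index

    def urls_for_term(self, term: str) -> list[str]:
        return sorted(self._index.lookup(term))


def web_search_service(api: OwiApi, query: str) -> list[str]:
    """Service 1: URLs of documents that contain every query term."""
    terms = query.split()
    if not terms:
        return []
    hits = set(api.urls_for_term(terms[0]))
    for term in terms[1:]:
        hits &= set(api.urls_for_term(term))
    return sorted(hits)


def term_frequency_service(api: OwiApi, term: str) -> int:
    """Service 2: number of indexed documents that mention a term."""
    return len(api.urls_for_term(term))


pages = {
    "https://example.org/a": "Open web index proposal",
    "https://example.org/b": "A public library of the web",
}
index = WebIndex()
for doc in crawl(pages):
    index.add(advanced_index(basic_index(doc)))

api = OwiApi(index)
search_hits = web_search_service(api, "web index")
web_mentions = term_frequency_service(api, "web")
print(search_hits, web_mentions)
```

Both services are written against `OwiApi` alone, so either could be replaced or kept proprietary without touching the crawler, the indexers, or the index.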
POSSIBLE APPLICATIONS
N.B.: This list of ideas is far from being complete and only serves illustrative purposes.
SEARCH
• Web search
• Vertical search, e.g., video or scholarly content
SCIENCE / RESEARCH
• Trend analysis, e.g., political trends
• Language use on the Web
• Research evaluation, e.g., Altmetrics
DATA ANALYSIS
• Data aggregation, e.g., company or person dossiers
• Opinion mining (“Who says what about whom?”)
• Market research
ARTIFICIAL INTELLIGENCE
OWI could build the foundation for large-scale AI applications, e.g.,
• Machine translation
• Question answering
COMBINING OWI DATA WITH PROPRIETARY DATA
• Company profiles + OWI data = enriched company dossiers
• Product data + OWI data = enriched product descriptions
• Geospatial data + OWI data = enriched map applications
WHY DON’T WE JUST START BUILDING IT?
WHAT SIZE SHOULD A WEB INDEX HAVE?
• 1.71 billion websites
• How many pages/URLs does this mean?
→ There is no such thing as a complete index.
→ However, without representing a major part of the Web, an index is more or less useless.
WHY ARE INITIATIVES LIKE COMMON CRAWL NOT ENOUGH?
They are not comprehensive
- Common Crawl: 2.6 billion pages (not websites!)
They are static
- Crawling once a month is very different from keeping an index current at any time
They do not provide search functionality
- No (basic) indexing as needed to build applications on top of the index
- No SPAM control as needed to build applications
- No human raters to control for the quality of the index
→ The use of initiatives like Common Crawl is more or less restricted to analysing Web
content. Due to the sampling procedure applied, it may not even be too useful for that.
CRAWLING IS NOT THE PROBLEM, ANYWAY
Crawling is just the beginning of a long process. Indexing is required for making the index
searchable.
The real problems are
1) Avoiding SPAM (= excluding it from the index) – SPAM makes up A LOT of the Web’s
content
2) Keeping the index fresh
3) Providing indexing (basic and advanced)
4) Making the index searchable
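The four problems listed above can be made concrete with a toy example. This is a minimal sketch under loudly stated assumptions: the spam check is a deliberately naive heuristic (pages dominated by one repeated token), the one-day freshness threshold is arbitrary, and all function names are hypothetical. Real spam control and re-crawl scheduling are far more involved.

```python
from datetime import datetime, timedelta


def looks_like_spam(text: str, max_repeat_ratio: float = 0.5) -> bool:
    """Toy heuristic (assumption): flag pages dominated by one repeated token."""
    tokens = text.lower().split()
    if not tokens:
        return True
    most_common = max(tokens.count(token) for token in set(tokens))
    return most_common / len(tokens) > max_repeat_ratio


def due_for_recrawl(last_crawled: datetime, now: datetime,
                    max_age: timedelta = timedelta(days=1)) -> bool:
    """Freshness check: a page is stale once its last crawl is older than max_age."""
    return now - last_crawled > max_age


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Indexing: map each term to the URLs containing it, excluding spam pages."""
    index: dict[str, set[str]] = {}
    for url, text in pages.items():
        if looks_like_spam(text):
            continue  # spam never enters the index
        for token in set(text.lower().split()):
            index.setdefault(token, set()).add(url)
    return index


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Searching: URLs of pages that contain every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    hits = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(hits)


pages = {
    "https://example.org/news": "fresh news about the open web index",
    "https://example.org/spam": "buy buy buy buy now",
}
index = build_index(pages)
print(search(index, "web index"))
print(search(index, "buy"))
print(due_for_recrawl(datetime(2019, 10, 21), now=datetime(2019, 10, 23)))
```

In this sketch the spam page is dropped before indexing, so a query for "buy" returns nothing, and a page last crawled two days ago is reported as due for a re-crawl.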
BIAS ON THE WEB
Baeza-Yates, R. (2018). Bias on the web. Communications of the ACM, 61(6), 54–61. https://doi.org/10.1145/3209581
WHO CONTROLS THE RESULT RANKINGS?
[Diagram relating the Search Engine Result Page to four groups: Search Engine Providers, Content Providers, Users, and Search Engine Optimizers.]
HOW TO PROCEED
- A comprehensive and fresh Web index is a societal/political project, not a mere
technical problem.
- Therefore, we need to approach policymakers. They should decide to build the index (and
to finance it).
- To make the index independent of governments, a European foundation should be
established to govern it.
- The technical implementation of the index should lie in the hands of those
(companies/institutions) best capable of building it.
THANK YOU
Dirk Lewandowski
Hamburg University of Applied Sciences, Hamburg, Germany
dirk.lewandowski@haw-hamburg.de
www.searchstudies.org/dirk

Más contenido relacionado

La actualidad más candente

Introducing the Linked Data Research Centre
Introducing the Linked Data Research CentreIntroducing the Linked Data Research Centre
Introducing the Linked Data Research Centre
Michael Hausenblas
 
Linked Data (1st Linked Data Meetup Malmö)
Linked Data (1st Linked Data Meetup Malmö)Linked Data (1st Linked Data Meetup Malmö)
Linked Data (1st Linked Data Meetup Malmö)
Anja Jentzsch
 

La actualidad más candente (20)

Why do they call it Linked Data when they want to say...?
Why do they call it Linked Data when they want to say...?Why do they call it Linked Data when they want to say...?
Why do they call it Linked Data when they want to say...?
 
Data Citation: A Critical Role for Publishers
Data Citation: A Critical Role for PublishersData Citation: A Critical Role for Publishers
Data Citation: A Critical Role for Publishers
 
Slow-cooked data and APIs in the world of Big Data: the view from a city per...
Slow-cooked data and APIs in the world of Big Data: the view from a city per...Slow-cooked data and APIs in the world of Big Data: the view from a city per...
Slow-cooked data and APIs in the world of Big Data: the view from a city per...
 
THOR Workshop - Services EBI
THOR Workshop - Services EBITHOR Workshop - Services EBI
THOR Workshop - Services EBI
 
ICIC 2017: New product presentations CAS
ICIC 2017: New product presentations CASICIC 2017: New product presentations CAS
ICIC 2017: New product presentations CAS
 
Code4 lib2012
Code4 lib2012Code4 lib2012
Code4 lib2012
 
From Structured Data to Linked Open Governmental Data
From Structured Data to Linked Open Governmental DataFrom Structured Data to Linked Open Governmental Data
From Structured Data to Linked Open Governmental Data
 
Maintaining scholarly standards in the digital age: Publishing historical gaz...
Maintaining scholarly standards in the digital age: Publishing historical gaz...Maintaining scholarly standards in the digital age: Publishing historical gaz...
Maintaining scholarly standards in the digital age: Publishing historical gaz...
 
Making Open the Default - Bjorn Brembs
Making Open the Default - Bjorn BrembsMaking Open the Default - Bjorn Brembs
Making Open the Default - Bjorn Brembs
 
Imperial College ORCID project
Imperial College ORCID projectImperial College ORCID project
Imperial College ORCID project
 
Open data and linked data
Open data and linked dataOpen data and linked data
Open data and linked data
 
Introducing the Linked Data Research Centre
Introducing the Linked Data Research CentreIntroducing the Linked Data Research Centre
Introducing the Linked Data Research Centre
 
ICIC 2017: Publication Analysis and Publication Strategy
ICIC 2017: Publication Analysis and Publication Strategy  ICIC 2017: Publication Analysis and Publication Strategy
ICIC 2017: Publication Analysis and Publication Strategy
 
Introducing ORCID at Imperial College London
Introducing ORCID at Imperial College LondonIntroducing ORCID at Imperial College London
Introducing ORCID at Imperial College London
 
II-SDV 2016 RightsDirect
II-SDV 2016 RightsDirectII-SDV 2016 RightsDirect
II-SDV 2016 RightsDirect
 
Publishing Open Research Data
Publishing Open Research DataPublishing Open Research Data
Publishing Open Research Data
 
ORCID - A University Perspective
ORCID - A University PerspectiveORCID - A University Perspective
ORCID - A University Perspective
 
Research Data Publishing
Research Data PublishingResearch Data Publishing
Research Data Publishing
 
The Shift to Open Access Publishing
The Shift to Open Access PublishingThe Shift to Open Access Publishing
The Shift to Open Access Publishing
 
Linked Data (1st Linked Data Meetup Malmö)
Linked Data (1st Linked Data Meetup Malmö)Linked Data (1st Linked Data Meetup Malmö)
Linked Data (1st Linked Data Meetup Malmö)
 

Similar a The Need for and fundamentals of an Open Web Index

Keeping up to date & comparing journal apps. the stockholm workshop 2016
Keeping up to date &  comparing journal apps. the stockholm workshop 2016Keeping up to date &  comparing journal apps. the stockholm workshop 2016
Keeping up to date & comparing journal apps. the stockholm workshop 2016
Guus van den Brekel
 
#ALAAC15 Linked Data Love
#ALAAC15 Linked Data Love #ALAAC15 Linked Data Love
#ALAAC15 Linked Data Love
Kristi Holmes
 

Similar a The Need for and fundamentals of an Open Web Index (20)

From Open Access to Open Standards, (Linked) Data and Collaborations
From Open Access to Open Standards, (Linked) Data and CollaborationsFrom Open Access to Open Standards, (Linked) Data and Collaborations
From Open Access to Open Standards, (Linked) Data and Collaborations
 
Wiser Pku Lecture@Life Science School Pku
Wiser Pku Lecture@Life Science School PkuWiser Pku Lecture@Life Science School Pku
Wiser Pku Lecture@Life Science School Pku
 
Wiserpku Lecture@Life Science School Pku
Wiserpku Lecture@Life Science School PkuWiserpku Lecture@Life Science School Pku
Wiserpku Lecture@Life Science School Pku
 
OpenAIRE: Science. Set Free, Iryna Kuchma, EIFL
OpenAIRE: Science. Set Free, Iryna Kuchma, EIFLOpenAIRE: Science. Set Free, Iryna Kuchma, EIFL
OpenAIRE: Science. Set Free, Iryna Kuchma, EIFL
 
W3 c semantic web activity
W3 c semantic web activityW3 c semantic web activity
W3 c semantic web activity
 
Towards Semantic APIs for Research Data Services (Invited Talk)
Towards Semantic APIs for Research Data Services (Invited Talk)Towards Semantic APIs for Research Data Services (Invited Talk)
Towards Semantic APIs for Research Data Services (Invited Talk)
 
Semantic web & structured data - #SMT Search Marketing Thursday - Jan-Willem ...
Semantic web & structured data - #SMT Search Marketing Thursday - Jan-Willem ...Semantic web & structured data - #SMT Search Marketing Thursday - Jan-Willem ...
Semantic web & structured data - #SMT Search Marketing Thursday - Jan-Willem ...
 
Linked Data for Biopharma
Linked Data for BiopharmaLinked Data for Biopharma
Linked Data for Biopharma
 
Linked Open Data_mlanet13
Linked Open Data_mlanet13Linked Open Data_mlanet13
Linked Open Data_mlanet13
 
Open ILRI
Open ILRIOpen ILRI
Open ILRI
 
20171003 lancaster data conversations Chue-Hong
20171003 lancaster data conversations Chue-Hong20171003 lancaster data conversations Chue-Hong
20171003 lancaster data conversations Chue-Hong
 
Open Access Tracking Project How it work ?
Open Access Tracking Project How it work ?Open Access Tracking Project How it work ?
Open Access Tracking Project How it work ?
 
How OATP Work?
How OATP Work?How OATP Work?
How OATP Work?
 
Keeping up to date & comparing journal apps. the stockholm workshop 2016
Keeping up to date &  comparing journal apps. the stockholm workshop 2016Keeping up to date &  comparing journal apps. the stockholm workshop 2016
Keeping up to date & comparing journal apps. the stockholm workshop 2016
 
#ALAAC15 Linked Data Love
#ALAAC15 Linked Data Love #ALAAC15 Linked Data Love
#ALAAC15 Linked Data Love
 
SoundSoftware: Software Sustainability for audio and Music Researchers
SoundSoftware: Software Sustainability for audio and Music Researchers SoundSoftware: Software Sustainability for audio and Music Researchers
SoundSoftware: Software Sustainability for audio and Music Researchers
 
Session 1 and 2 "Challenges and Opportunities with Big Linked Data Visualiza...
Session 1 and 2  "Challenges and Opportunities with Big Linked Data Visualiza...Session 1 and 2  "Challenges and Opportunities with Big Linked Data Visualiza...
Session 1 and 2 "Challenges and Opportunities with Big Linked Data Visualiza...
 
ODIN: Connecting research and researchers
ODIN: Connecting research and researchersODIN: Connecting research and researchers
ODIN: Connecting research and researchers
 
OpenAIRE services and tools - 6th National Open Access Conference and OpenAIR...
OpenAIRE services and tools - 6th National Open Access Conference and OpenAIR...OpenAIRE services and tools - 6th National Open Access Conference and OpenAIR...
OpenAIRE services and tools - 6th National Open Access Conference and OpenAIR...
 
OpenAIRE services and tools - 6th National Open Access Conference and OpenAIR...
OpenAIRE services and tools - 6th National Open Access Conference and OpenAIR...OpenAIRE services and tools - 6th National Open Access Conference and OpenAIR...
OpenAIRE services and tools - 6th National Open Access Conference and OpenAIR...
 

Más de Dirk Lewandowski

In a World of Biased Search Engines
In a World of Biased Search EnginesIn a World of Biased Search Engines
In a World of Biased Search Engines
Dirk Lewandowski
 
Künstliche Intelligenz bei Suchmaschinen
Künstliche Intelligenz bei SuchmaschinenKünstliche Intelligenz bei Suchmaschinen
Künstliche Intelligenz bei Suchmaschinen
Dirk Lewandowski
 
Analysing search engine data on socially relevant topics
Analysing search engine data on socially relevant topicsAnalysing search engine data on socially relevant topics
Analysing search engine data on socially relevant topics
Dirk Lewandowski
 
Neue Trends: Google, SEO und Co.?
Neue Trends: Google, SEO und Co.?Neue Trends: Google, SEO und Co.?
Neue Trends: Google, SEO und Co.?
Dirk Lewandowski
 
Internet-Suchmaschinen: Aktueller Stand und Entwicklungsperspektiven
Internet-Suchmaschinen: Aktueller Stand und EntwicklungsperspektivenInternet-Suchmaschinen: Aktueller Stand und Entwicklungsperspektiven
Internet-Suchmaschinen: Aktueller Stand und Entwicklungsperspektiven
Dirk Lewandowski
 
Ordinary Search Engine Users Assessing Difficulty, Effort and Outcome for Sim...
Ordinary Search Engine Users Assessing Difficulty, Effort and Outcome for Sim...Ordinary Search Engine Users Assessing Difficulty, Effort and Outcome for Sim...
Ordinary Search Engine Users Assessing Difficulty, Effort and Outcome for Sim...
Dirk Lewandowski
 
Verwendung von Skalenbewertungen in der Evaluierung von Suchmaschinen
Verwendung von Skalenbewertungen in der Evaluierung von SuchmaschinenVerwendung von Skalenbewertungen in der Evaluierung von Suchmaschinen
Verwendung von Skalenbewertungen in der Evaluierung von Suchmaschinen
Dirk Lewandowski
 
Neue Entwicklungen bei Suchmaschinen und deren Relevanz für Bibliotheken (3)
Neue Entwicklungen bei Suchmaschinen und deren Relevanz für Bibliotheken (3)Neue Entwicklungen bei Suchmaschinen und deren Relevanz für Bibliotheken (3)
Neue Entwicklungen bei Suchmaschinen und deren Relevanz für Bibliotheken (3)
Dirk Lewandowski
 
Neue Entwicklungen bei Suchmaschinen und deren Relevanz für Bibliotheken (2)
Neue Entwicklungen bei Suchmaschinen und deren Relevanz für Bibliotheken (2)Neue Entwicklungen bei Suchmaschinen und deren Relevanz für Bibliotheken (2)
Neue Entwicklungen bei Suchmaschinen und deren Relevanz für Bibliotheken (2)
Dirk Lewandowski
 
Neue Entwicklungen bei Suchmaschinen und deren Relevanz für Bibliotheken (1)
Neue Entwicklungen bei Suchmaschinen und deren Relevanz für Bibliotheken (1)Neue Entwicklungen bei Suchmaschinen und deren Relevanz für Bibliotheken (1)
Neue Entwicklungen bei Suchmaschinen und deren Relevanz für Bibliotheken (1)
Dirk Lewandowski
 

Más de Dirk Lewandowski (20)

In a World of Biased Search Engines
In a World of Biased Search EnginesIn a World of Biased Search Engines
In a World of Biased Search Engines
 
EIN ANDERER BLICK AUF GOOGLE: Wie interpretieren Nutzer/innen die Suchergebni...
EIN ANDERER BLICK AUF GOOGLE: Wie interpretieren Nutzer/innen die Suchergebni...EIN ANDERER BLICK AUF GOOGLE: Wie interpretieren Nutzer/innen die Suchergebni...
EIN ANDERER BLICK AUF GOOGLE: Wie interpretieren Nutzer/innen die Suchergebni...
 
Künstliche Intelligenz bei Suchmaschinen
Künstliche Intelligenz bei SuchmaschinenKünstliche Intelligenz bei Suchmaschinen
Künstliche Intelligenz bei Suchmaschinen
 
Analysing search engine data on socially relevant topics
Analysing search engine data on socially relevant topicsAnalysing search engine data on socially relevant topics
Analysing search engine data on socially relevant topics
 
Google Assistant, Alexa & Co.: Wie sich die Welt der Suche verändert
Google Assistant, Alexa & Co.: Wie sich die Welt der Suche verändertGoogle Assistant, Alexa & Co.: Wie sich die Welt der Suche verändert
Google Assistant, Alexa & Co.: Wie sich die Welt der Suche verändert
 
Suchverhalten und die Grenzen von Suchdiensten
Suchverhalten und die Grenzen von SuchdienstenSuchverhalten und die Grenzen von Suchdiensten
Suchverhalten und die Grenzen von Suchdiensten
 
Können Nutzer echte Suchergebnisse von Werbung in Suchmaschinen unterscheiden?
Können Nutzer echte Suchergebnisse von Werbung in Suchmaschinen unterscheiden?Können Nutzer echte Suchergebnisse von Werbung in Suchmaschinen unterscheiden?
Können Nutzer echte Suchergebnisse von Werbung in Suchmaschinen unterscheiden?
 
Are Ads on Google search engine results pages labeled clearly enough?
Are Ads on Google search engine results pages labeled clearly enough?Are Ads on Google search engine results pages labeled clearly enough?
Are Ads on Google search engine results pages labeled clearly enough?
 
Search Engine Bias - sollen wir Googles Suchergebnissen vertrauen?
Search Engine Bias - sollen wir Googles Suchergebnissen vertrauen?Search Engine Bias - sollen wir Googles Suchergebnissen vertrauen?
Search Engine Bias - sollen wir Googles Suchergebnissen vertrauen?
 
Wie Suchmaschinen die Inhalte des Web interpretieren
Wie Suchmaschinen die Inhalte des Web interpretierenWie Suchmaschinen die Inhalte des Web interpretieren
Wie Suchmaschinen die Inhalte des Web interpretieren
 
Perspektiven eines Open Web Index
Perspektiven eines Open Web IndexPerspektiven eines Open Web Index
Perspektiven eines Open Web Index
 
Wie entwickeln sich Suchmaschinen heute, was kommt morgen?
Wie entwickeln sich Suchmaschinen heute, was kommt morgen?Wie entwickeln sich Suchmaschinen heute, was kommt morgen?
Wie entwickeln sich Suchmaschinen heute, was kommt morgen?
 
Suchmaschinen verstehen
Suchmaschinen verstehenSuchmaschinen verstehen
Suchmaschinen verstehen
 
Neue Trends: Google, SEO und Co.?
Neue Trends: Google, SEO und Co.?Neue Trends: Google, SEO und Co.?
Neue Trends: Google, SEO und Co.?
 
Internet-Suchmaschinen: Aktueller Stand und Entwicklungsperspektiven
Internet-Suchmaschinen: Aktueller Stand und EntwicklungsperspektivenInternet-Suchmaschinen: Aktueller Stand und Entwicklungsperspektiven
Internet-Suchmaschinen: Aktueller Stand und Entwicklungsperspektiven
 
Ordinary Search Engine Users Assessing Difficulty, Effort and Outcome for Sim...
Ordinary Search Engine Users Assessing Difficulty, Effort and Outcome for Sim...Ordinary Search Engine Users Assessing Difficulty, Effort and Outcome for Sim...
Ordinary Search Engine Users Assessing Difficulty, Effort and Outcome for Sim...
 
Verwendung von Skalenbewertungen in der Evaluierung von Suchmaschinen
Verwendung von Skalenbewertungen in der Evaluierung von SuchmaschinenVerwendung von Skalenbewertungen in der Evaluierung von Suchmaschinen
Verwendung von Skalenbewertungen in der Evaluierung von Suchmaschinen
 
Neue Entwicklungen bei Suchmaschinen und deren Relevanz für Bibliotheken (3)
Neue Entwicklungen bei Suchmaschinen und deren Relevanz für Bibliotheken (3)Neue Entwicklungen bei Suchmaschinen und deren Relevanz für Bibliotheken (3)
Neue Entwicklungen bei Suchmaschinen und deren Relevanz für Bibliotheken (3)
 
Neue Entwicklungen bei Suchmaschinen und deren Relevanz für Bibliotheken (2)
Neue Entwicklungen bei Suchmaschinen und deren Relevanz für Bibliotheken (2)Neue Entwicklungen bei Suchmaschinen und deren Relevanz für Bibliotheken (2)
Neue Entwicklungen bei Suchmaschinen und deren Relevanz für Bibliotheken (2)
 
Neue Entwicklungen bei Suchmaschinen und deren Relevanz für Bibliotheken (1)
Neue Entwicklungen bei Suchmaschinen und deren Relevanz für Bibliotheken (1)Neue Entwicklungen bei Suchmaschinen und deren Relevanz für Bibliotheken (1)
Neue Entwicklungen bei Suchmaschinen und deren Relevanz für Bibliotheken (1)
 

Último

Dwarka Sector 26 Call Girls | Delhi | 9999965857 🫦 Vanshika Verma More Our Se...
Dwarka Sector 26 Call Girls | Delhi | 9999965857 🫦 Vanshika Verma More Our Se...Dwarka Sector 26 Call Girls | Delhi | 9999965857 🫦 Vanshika Verma More Our Se...
Dwarka Sector 26 Call Girls | Delhi | 9999965857 🫦 Vanshika Verma More Our Se...
Call Girls In Delhi Whatsup 9873940964 Enjoy Unlimited Pleasure
 
Low Sexy Call Girls In Mohali 9053900678 🥵Have Save And Good Place 🥵
Low Sexy Call Girls In Mohali 9053900678 🥵Have Save And Good Place 🥵Low Sexy Call Girls In Mohali 9053900678 🥵Have Save And Good Place 🥵
Low Sexy Call Girls In Mohali 9053900678 🥵Have Save And Good Place 🥵
Chandigarh Call girls 9053900678 Call girls in Chandigarh
 
Call Girls In Pratap Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Pratap Nagar Delhi 💯Call Us 🔝8264348440🔝Call Girls In Pratap Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Pratap Nagar Delhi 💯Call Us 🔝8264348440🔝
soniya singh
 
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRL
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRLLucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRL
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRL
imonikaupta
 
6.High Profile Call Girls In Punjab +919053900678 Punjab Call GirlHigh Profil...
6.High Profile Call Girls In Punjab +919053900678 Punjab Call GirlHigh Profil...6.High Profile Call Girls In Punjab +919053900678 Punjab Call GirlHigh Profil...
6.High Profile Call Girls In Punjab +919053900678 Punjab Call GirlHigh Profil...
@Chandigarh #call #Girls 9053900678 @Call #Girls in @Punjab 9053900678
 

Último (20)

(+971568250507 ))# Young Call Girls in Ajman By Pakistani Call Girls in ...
(+971568250507  ))#  Young Call Girls  in Ajman  By Pakistani Call Girls  in ...(+971568250507  ))#  Young Call Girls  in Ajman  By Pakistani Call Girls  in ...
(+971568250507 ))# Young Call Girls in Ajman By Pakistani Call Girls in ...
 
Ganeshkhind ! Call Girls Pune - 450+ Call Girl Cash Payment 8005736733 Neha T...
Ganeshkhind ! Call Girls Pune - 450+ Call Girl Cash Payment 8005736733 Neha T...Ganeshkhind ! Call Girls Pune - 450+ Call Girl Cash Payment 8005736733 Neha T...
Ganeshkhind ! Call Girls Pune - 450+ Call Girl Cash Payment 8005736733 Neha T...
 
All Time Service Available Call Girls Mg Road 👌 ⏭️ 6378878445
All Time Service Available Call Girls Mg Road 👌 ⏭️ 6378878445All Time Service Available Call Girls Mg Road 👌 ⏭️ 6378878445
All Time Service Available Call Girls Mg Road 👌 ⏭️ 6378878445
 
Dwarka Sector 26 Call Girls | Delhi | 9999965857 🫦 Vanshika Verma More Our Se...
Dwarka Sector 26 Call Girls | Delhi | 9999965857 🫦 Vanshika Verma More Our Se...Dwarka Sector 26 Call Girls | Delhi | 9999965857 🫦 Vanshika Verma More Our Se...
Dwarka Sector 26 Call Girls | Delhi | 9999965857 🫦 Vanshika Verma More Our Se...
 
Call Now ☎ 8264348440 !! Call Girls in Rani Bagh Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Rani Bagh Escort Service Delhi N.C.R.Call Now ☎ 8264348440 !! Call Girls in Rani Bagh Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Rani Bagh Escort Service Delhi N.C.R.
 
Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.
 
Low Sexy Call Girls In Mohali 9053900678 🥵Have Save And Good Place 🥵
Low Sexy Call Girls In Mohali 9053900678 🥵Have Save And Good Place 🥵Low Sexy Call Girls In Mohali 9053900678 🥵Have Save And Good Place 🥵
Low Sexy Call Girls In Mohali 9053900678 🥵Have Save And Good Place 🥵
 
WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)
WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)
WhatsApp 📞 8448380779 ✅Call Girls In Mamura Sector 66 ( Noida)
 
VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...
VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...
VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...
 
Russian Call girl in Ajman +971563133746 Ajman Call girl Service
Russian Call girl in Ajman +971563133746 Ajman Call girl ServiceRussian Call girl in Ajman +971563133746 Ajman Call girl Service
Russian Call girl in Ajman +971563133746 Ajman Call girl Service
 
Call Girls In Pratap Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Pratap Nagar Delhi 💯Call Us 🔝8264348440🔝Call Girls In Pratap Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Pratap Nagar Delhi 💯Call Us 🔝8264348440🔝
 
'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...
'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...
'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...
 
Shikrapur - Call Girls in Pune Neha 8005736733 | 100% Gennuine High Class Ind...
Shikrapur - Call Girls in Pune Neha 8005736733 | 100% Gennuine High Class Ind...Shikrapur - Call Girls in Pune Neha 8005736733 | 100% Gennuine High Class Ind...
Shikrapur - Call Girls in Pune Neha 8005736733 | 100% Gennuine High Class Ind...
 
𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...
𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...
𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...
 
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRL
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRLLucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRL
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRL
 
Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...
Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...
Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...
 
Top Rated Pune Call Girls Daund ⟟ 6297143586 ⟟ Call Me For Genuine Sex Servi...
Top Rated  Pune Call Girls Daund ⟟ 6297143586 ⟟ Call Me For Genuine Sex Servi...Top Rated  Pune Call Girls Daund ⟟ 6297143586 ⟟ Call Me For Genuine Sex Servi...
Top Rated Pune Call Girls Daund ⟟ 6297143586 ⟟ Call Me For Genuine Sex Servi...
 
6.High Profile Call Girls In Punjab +919053900678 Punjab Call GirlHigh Profil...
6.High Profile Call Girls In Punjab +919053900678 Punjab Call GirlHigh Profil...6.High Profile Call Girls In Punjab +919053900678 Punjab Call GirlHigh Profil...
6.High Profile Call Girls In Punjab +919053900678 Punjab Call GirlHigh Profil...
 
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark WebGDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
 
VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting High Prof...
VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting  High Prof...VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting  High Prof...
VIP Model Call Girls Hadapsar ( Pune ) Call ON 9905417584 Starting High Prof...
 

The Need for and fundamentals of an Open Web Index

  • 1. THE NEED FOR AND FUNDAMENTALS OF AN OPEN WEB INDEX Prof. Dr. Dirk Lewandowski Hamburg University of Applied Sciences, Hamburg, Germany dirk.lewandowski@haw-hamburg.de First International Symposium on Open Search Technology Garching, 23 October, 2019
  • 2. Proposal for an Open Web Index (OWI) Prof. Dr. Dirk Lewandowski ABOUT ME • Professor of Information Research and Information Retrieval at Hamburg University of Applied Sciences • Author of 100+ scholarly articles on search engines • German-language book “Suchmaschinen verstehen” (Springer, 2nd edition, 2018) • Editor, Aslib Journal of Information Management (Emerald Publishing) • Served as expert for the High Court of Justice (UK) and Deutscher Bundestag (German parliament) 1 https://searchstudies.org/dirk
  • 3. WHY WE NEED AN OPEN WEB INDEX
  • 4. GOOGLE SERVES MORE THAN 2.000.000.000.000 QUERIES PER YEAR.
  • 5. Proposal for an Open Web Index (OWI) Prof. Dr. Dirk Lewandowski PROBLEM STATEMENT • As there is no central directory of the Web, private search engine companies have built large indexes of its contents • Companies operating Web-scale indexes do not allow sufficient access to their data to other parties interested • The difficulties in building a Web index lie in technical issues, operating costs, Web size, and freshness • Due to these difficulties, there is no Web index built by a European company (or other entity) 4
  • 6. Proposal for an Open Web Index (OWI) Prof. Dr. Dirk Lewandowski IDEA 5 VISION To build a public library of the Web TECHNICAL IDEA Separate the index from the services that are built on the index PUBLIC VS. PRIVATE While the index should be public, the services can be proprietary Separate the index from the services that are built on the index TECHNICAL IDEA Separate the index from the services that are built on the index TECHNICAL IDEA Separate the index from the services that are built on the index PUBLIC VS. PRIVATE While the index should be public, the services can be proprietary TECHNICAL IDEA Separate the index from the services that are built on the index
  • 7. STRUCTURE (diagram; components shown: OWI Crawler, OWI Basic Indexer, OWI Advanced Indexer, OWI Web Index, OWI Usage Data Index, OWI Interface / API, Services 1 to 4, each service with several users)
  • 8. POSSIBLE APPLICATIONS N.B.: This list of ideas is far from complete and serves illustrative purposes only. SEARCH • Web Search • Vertical Search, e.g., video or scholarly content SCIENCE / RESEARCH • Trend analysis, e.g., political trends • Language use on the Web • Research evaluation, e.g., Altmetrics DATA ANALYSIS • Data aggregation, e.g., company or person dossiers • Opinion mining (“Who says what about whom?”) • Market research ARTIFICIAL INTELLIGENCE OWI could build the foundation for large-scale AI applications, e.g., • Machine translation • Question answering COMBINING OWI DATA WITH PROPRIETARY DATA • Company profiles + OWI data = enriched company dossiers • Product data + OWI data = enriched product descriptions • Geospatial data + OWI data = enriched map applications
  • 9. WHY DON’T WE JUST START BUILDING IT?
  • 10. WHAT SIZE SHOULD A WEB INDEX HAVE? • 1.71 billion websites • How many pages/URLs does this mean? → There is no such thing as a complete index. → However, without representing a major part of the Web, an index is more or less useless.
  • 11. WHY ARE INITIATIVES LIKE COMMON CRAWL NOT ENOUGH? • They are not comprehensive: Common Crawl holds 2.6 billion pages (not websites!) • They are static: crawling once a month is very different from keeping an index current at all times • They do not provide search functionality: no (basic) indexing as needed to build applications on top of the index; no spam control as needed to build applications; no human raters to control for the quality of the index → The use of initiatives like Common Crawl is more or less restricted to analysing Web content. Due to the sampling procedure applied, they may not even be too useful for that.
  • 12. CRAWLING IS NOT THE PROBLEM, ANYWAY Crawling is just the beginning of a long process. Indexing is required for making the index searchable. The real problems are: 1) Avoiding spam (= excluding it from the index); spam makes up A LOT of the Web’s content 2) Keeping the index fresh 3) Providing indexing (basic and advanced) 4) Making the index searchable
  • 13. BIAS ON THE WEB Baeza-Yates, R. (2018). Bias on the web. Communications of the ACM, 61(6), 54–61. https://doi.org/10.1145/3209581
  • 14. WHO CONTROLS THE RESULT RANKINGS? (diagram; elements shown: Search Engine Result Page, Search Engine Providers, Content Providers, Users, Search Engine Optimizers)
  • 16. HOW TO PROCEED • A comprehensive and fresh Web index is a societal/political project, not a mere technical problem. • Therefore, we need to approach policymakers, who should decide to build the index (and to finance it). • To make the index independent of governments, a European foundation should be established to govern it. • The technical implementation of the index should lie in the hands of those (companies/institutions) best capable of building it.
  • 17. THANK YOU Dirk Lewandowski Hamburg University of Applied Sciences, Hamburg, Germany dirk.lewandowski@haw-hamburg.de www.searchstudies.org/dirk
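
The technical idea on slide 6 (separate the index from the services built on it) and the structure on slide 7 can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the class `OpenIndex`, the functions `web_search` and `term_trend`, the example URLs, and the `is_spam` flag are hypothetical and are not part of the OWI proposal. A real index would need crawling, advanced indexing, and actual spam detection, none of which is shown here.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class OpenIndex:
    """Hypothetical stand-in for the public index and its interface/API."""
    postings: dict = field(default_factory=lambda: defaultdict(set))
    documents: dict = field(default_factory=dict)

    def add(self, url: str, text: str, is_spam: bool = False) -> None:
        """Basic indexing: map each term to the URLs that contain it."""
        if is_spam:
            # Assumption: spam is flagged upstream and simply not indexed.
            return
        self.remove(url)  # re-adding a URL replaces its old content
        self.documents[url] = text
        for term in text.lower().split():
            self.postings[term].add(url)

    def remove(self, url: str) -> None:
        """Drop a URL and all of its postings from the index."""
        old_text = self.documents.pop(url, None)
        if old_text is None:
            return
        for term in old_text.lower().split():
            self.postings[term].discard(url)

    def lookup(self, term: str) -> set:
        """The only call services use: URLs containing a single term."""
        return set(self.postings.get(term.lower(), set()))


def web_search(index: OpenIndex, query: str) -> list:
    """Service 1 (hypothetical): URLs containing every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    hits = set.intersection(*(index.lookup(t) for t in terms))
    return sorted(hits)


def term_trend(index: OpenIndex, term: str) -> int:
    """Service 2 (hypothetical): number of documents mentioning a term."""
    return len(index.lookup(term))


def build_demo_index() -> OpenIndex:
    """Three made-up documents, one of them flagged as spam."""
    index = OpenIndex()
    index.add("https://example.org/a", "open web index proposal")
    index.add("https://example.org/b", "web search engines")
    index.add("https://example.org/spam", "web web cheap pills", is_spam=True)
    return index


if __name__ == "__main__":
    demo = build_demo_index()
    print(web_search(demo, "web index"))
    print(term_trend(demo, "web"))
```

Both services read the same index through `lookup` only, so either could be replaced by a proprietary implementation without touching the index itself, which is the point of the public/private split on slide 6.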