The Technology That Supports Google
The Technology That Supports Google – chapter1.ppt
1. First Appearance of Google
2. Main Concepts
3. Search Engine Structure
    - Each Component's Role
    - Back-end Structure
    - Index Structure
4. Total Structure
First Appearance of Google


• Why?
           Get useful results


• Who?
           Sergey Brin & Larry Page
Main Concepts



Hardware expands


Ranking Function
         – PageRank
         – Anchor Text
         – Word
Search Engine Structure

[Diagram: Search Engine and Internet]
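Search Engine Structure

Search Server's Role

[Diagram: Search Server, Index, Back-end]

• Manages communication

• Interprets each request and decides what needs to be processed

• Finds the required information in the index

• Edits the results and sends them to the user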
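Search Engine Structure

Back-end's Role

[Diagram: Search Server, Index, Back-end]

• Crawling

     • The technique of collecting web pages

     • Takes a long time -> uses multiple crawlers

     • Stores what is collected in the Repository

• Creating Index

     • Builds the index from the web pages stored in the Repository

     • Structure analysis, word processing, link processing, ranking, etc.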
Search Engine Structure

Index's Role

[Diagram: Search Server, Index, Back-end]

• Stores the given data safely

• Finds the requested data

• Acts as the search engine's database
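Search Engine Structure
Back-end Structure

Crawling

The technique of collecting web pages

Early Google had 24 million web pages registered

To sustain an average of 40 pages per second,
hundreds of downloads must be kept running at the same time

-> And now?

A Google search returns 3,070,000,000 results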
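
As a rough worked example of the crawl figures on this slide: the 24 million pages and the 40 pages per second come from the slide, while the value of 300 concurrent downloads is only an assumed stand-in for "hundreds". The sketch estimates how long one full crawl would take at that rate, and uses Little's law (downloads in flight = rate × time per download) to show what per-page download time that concurrency would imply.

```python
# Figures taken from the slide.
PAGES = 24_000_000          # web pages registered by early Google
RATE = 40                   # average pages fetched per second

# Assumption for illustration only: "hundreds" of downloads taken as 300.
CONCURRENT_DOWNLOADS = 300


def crawl_days(pages: int, rate: float) -> float:
    """Days needed to fetch `pages` at a steady `rate` in pages/second."""
    return pages / rate / 86_400


def implied_latency(concurrency: int, rate: float) -> float:
    """Little's law: seconds per download = downloads in flight / rate."""
    return concurrency / rate


if __name__ == "__main__":
    print(f"full crawl: {crawl_days(PAGES, RATE):.1f} days")
    print(f"time per download: {implied_latency(CONCURRENT_DOWNLOADS, RATE):.1f} s")
```

Under these assumptions a full pass takes about a week, which is why a single sequential downloader would not be enough.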
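Search Engine Structure
Back-end Structure

Crawler

[Diagram: the URL server directs several crawlers, which fetch pages from the Internet and write them to the Repository]

The URL server directs all the crawlers

Each crawler downloads web pages as instructed

Pages are stored temporarily in the Repository

• docID – unique numeric value
• url  – URL
• text – compressed content
• etc. – date, page length…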
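
A minimal sketch of what one Repository entry with these fields could look like. The field names follow the slide; the use of `zlib` for compression, the names `RepositoryRecord`, `make_record` and `read_text`, and the string date format are assumptions made for illustration, not details given in the slides.

```python
import zlib
from dataclasses import dataclass


@dataclass
class RepositoryRecord:
    doc_id: int          # unique numeric value
    url: str
    text: bytes          # compressed page body
    fetched_on: str      # "etc." field: crawl date (format is an assumption)
    page_length: int     # "etc." field: uncompressed length in bytes


def make_record(doc_id: int, url: str, html: str, fetched_on: str) -> RepositoryRecord:
    """Compress the page body and wrap it with its metadata."""
    raw = html.encode("utf-8")
    return RepositoryRecord(doc_id, url, zlib.compress(raw), fetched_on, len(raw))


def read_text(record: RepositoryRecord) -> str:
    """Recover the original page body from a stored record."""
    return zlib.decompress(record.text).decode("utf-8")
```

Keeping the uncompressed length next to the compressed text lets later stages size buffers without decompressing first.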
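Search Engine Structure
Back-end Structure

Crawler

[Diagram: the URL server directs several crawlers, which fetch pages from the Internet and write them to the Repository]

Address resolution takes a long time
-> each crawler maintains an internal DNS cache

After a page is stored in the Repository,
the URL server assigns the next address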
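
The two ideas on this slide can be sketched as follows: a per-crawler DNS cache that resolves each host only once, and a URL server that hands out the next address on request. The class names `DnsCache` and `UrlServer` are hypothetical, and the injectable `resolver` parameter is an assumption added so the cache can be exercised without network access; by default it calls the standard `socket.gethostbyname`.

```python
import socket
from collections import deque
from typing import Callable, Deque, Dict, Iterable, Optional


class DnsCache:
    """Remembers host -> IP so each host is resolved only once."""

    def __init__(self, resolver: Callable[[str], str] = socket.gethostbyname):
        self._resolver = resolver
        self._cache: Dict[str, str] = {}
        self.misses = 0  # number of real lookups performed

    def resolve(self, host: str) -> str:
        if host not in self._cache:
            self.misses += 1
            self._cache[host] = self._resolver(host)
        return self._cache[host]


class UrlServer:
    """Hands out one pending URL at a time to whichever crawler asks."""

    def __init__(self, urls: Iterable[str]):
        self._pending: Deque[str] = deque(urls)

    def next_url(self) -> Optional[str]:
        """Return the next URL to crawl, or None when nothing is pending."""
        return self._pending.popleft() if self._pending else None
```

A crawler loop would call `next_url()`, resolve the host through its `DnsCache`, download the page, store it, and then ask for the next URL.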
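Search Engine Structure
Back-end Structure

Creating Index

Analyzing Web Page structures

[Diagram: a web page is parsed into a DocIndex entry]

    <html>
    <head>
    <title>Sejong University</title>
    </head>
    <body>
    <h1>Academic Information</h1>
    ….

    docID   1
    url     Sejong.ac.kr
    Title   Sejong University
    etc.    …

DocIndex
– Stores the basic information of each web page
– Uses docID as the key
– Fields: docID, url, title, etc.

URLlist
– Uses url as the key
– Used to look up the docID
– Fields: url, docID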
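
A minimal sketch of the structure-analysis step, assuming plain Python dictionaries stand in for DocIndex and URLlist. It uses the standard library's `html.parser` to pull the title out of a page; the names `TitleParser` and `index_document` are hypothetical, and the sample page is the slide's Sejong University example rendered in English.

```python
from html.parser import HTMLParser
from typing import Dict


class TitleParser(HTMLParser):
    """Collects the text found inside <title> elements."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


doc_index: Dict[int, dict] = {}   # docID -> basic page information
url_list: Dict[str, int] = {}     # url -> docID


def index_document(doc_id: int, url: str, html: str) -> None:
    """Parse one page and record it in both DocIndex and URLlist."""
    parser = TitleParser()
    parser.feed(html)
    doc_index[doc_id] = {"url": url, "title": parser.title.strip()}
    url_list[url] = doc_id
```

The two maps are inverses of each other on the url/docID pair: DocIndex answers "what is document 1?", while URLlist answers "which docID does this URL have?".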
Search Engine Structure
Back-end Structure

Creating Index

Word Index

Lexicon
 – word -> wordID

     word          wordID
     Sejong        101
     University    102
     Academic      201
     Information   202

Barrels
 – docID wordID position size etc.

     docID    wordID#1   Position#1   Size#1    Etc.#1
                         Position#2   Size#2    Etc.#2
              wordID#2   Position#1   Size#1    Etc.#1
                         Position#2   Size#2    Etc.#2
              …

Inverted Index
 – Uses wordID as the key
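
The relationship between the Lexicon, the Barrels and the Inverted Index can be sketched as below. This is a simplified illustration: it assigns wordIDs sequentially rather than using the slide's 101/102/201/202 values, splits words on whitespace, and records only positions (the slide's size and etc. fields are omitted). The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

lexicon: Dict[str, int] = {}                           # word -> wordID
forward_barrel: Dict[int, Dict[int, List[int]]] = {}   # docID -> wordID -> positions


def word_id(word: str) -> int:
    """Return the wordID for `word`, assigning the next free ID if it is new."""
    if word not in lexicon:
        lexicon[word] = len(lexicon) + 1
    return lexicon[word]


def add_document(doc_id: int, text: str) -> None:
    """Record every word occurrence of one document, keyed by docID."""
    hits: Dict[int, List[int]] = defaultdict(list)
    for position, word in enumerate(text.lower().split()):
        hits[word_id(word)].append(position)
    forward_barrel[doc_id] = dict(hits)


def invert(barrel: Dict[int, Dict[int, List[int]]]) -> Dict[int, List[Tuple[int, List[int]]]]:
    """Re-key the barrel by wordID: wordID -> [(docID, positions), ...]."""
    inverted: Dict[int, List[Tuple[int, List[int]]]] = defaultdict(list)
    for doc_id in sorted(barrel):
        for wid, positions in barrel[doc_id].items():
            inverted[wid].append((doc_id, positions))
    return dict(inverted)
```

Keying by wordID is what makes a search fast: given a query word, the matching documents are read directly instead of scanning every document.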
Search Engine Structure
Back-end Structure

Creating Index

Link Index

[Diagram: the page with docID 1 (url Sejong.ac.kr) links to the page with docID 3 (url Cyworld.com)]

URLlist
     url             docID
     Sejong.ac.kr    1
     Cyworld.com     3

Links
     source docID    target docID
     1               3

Anchortext
- Information about the linked page
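
A minimal sketch of link processing under the assumptions that URLlist is a plain dictionary and that link targets appear in `href` attributes exactly as they are keyed in URLlist. It collects (source docID, target docID) pairs for Links and keeps each anchor text with the page it points to. `AnchorParser` and `extract_links` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple


class AnchorParser(HTMLParser):
    """Collects (href, anchor text) pairs from <a> elements."""

    def __init__(self):
        super().__init__()
        self._href: Optional[str] = None
        self._text: List[str] = []
        self.anchors: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.anchors.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(src_doc_id: int, html: str, url_list: Dict[str, int]):
    """Return (links, anchor_texts) for one page.

    links:        list of (source docID, target docID)
    anchor_texts: list of (target docID, anchor text)
    Targets whose URL is not in url_list are skipped.
    """
    parser = AnchorParser()
    parser.feed(html)
    links, anchor_texts = [], []
    for href, text in parser.anchors:
        target = url_list.get(href)
        if target is not None:
            links.append((src_doc_id, target))
            anchor_texts.append((target, text))
    return links, anchor_texts
```

The anchor text is filed under the target docID because it describes the page being linked to, not the page that contains the link.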
Search Engine Structure
Back-end Structure

Creating Index

Ranking Index

PageRank - Link
                       Analyzes the links between web pages as a kind of vote
                       -> a document that receives more links = a better document
Anchortext
Word       - Barrels
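
The voting idea can be sketched with a small power-iteration PageRank. Note that the slide's summary (more links = better document) is a simplification: in PageRank as published by Brin and Page, each vote is also weighted by the rank of the page casting it, which is what the loop below does. The damping factor of 0.85, the fixed 50 iterations, and the handling of pages without out-links are assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple


def pagerank(links: List[Tuple[int, int]], damping: float = 0.85,
             iterations: int = 50) -> Dict[int, float]:
    """Power-iteration PageRank over (source docID, target docID) pairs."""
    nodes = sorted({n for edge in links for n in edge})
    out_links: Dict[int, List[int]] = {n: [] for n in nodes}
    for src, dst in links:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by pages with no out-links is spread evenly over all pages.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n
                    for node in nodes}
        # Each page splits its current rank equally among the pages it links to.
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank
```

With links `[(1, 3), (2, 3), (3, 1)]`, document 3 receives two votes and ranks highest; document 1 receives only one vote, but from the top-ranked page, so it still ranks well above document 2, which receives none.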
Search Engine Structure
Index Structure

[Diagram: DocIndex, Lexicon, Barrels]

DocIndex
– Stores the basic information of each web page
– Uses docID as the key

Lexicon
– word -> wordID

Barrels
– Storage
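
To illustrate how these three structures could cooperate at query time, here is a minimal sketch of a lookup: words are mapped to wordIDs through the Lexicon, wordIDs to documents through barrels keyed by wordID, and docIDs to page information through DocIndex. The toy data, the wordID 301, the set-based barrel layout and the `search` function are all assumptions for illustration; ranking is left out.

```python
from typing import Dict, List, Optional, Set

# Toy data in the shape the slides describe; all values are illustrative.
doc_index: Dict[int, dict] = {
    1: {"url": "Sejong.ac.kr", "title": "Sejong University"},
    3: {"url": "Cyworld.com", "title": "Cyworld"},
}
lexicon: Dict[str, int] = {"sejong": 101, "university": 102, "cyworld": 301}
inverted_barrels: Dict[int, Set[int]] = {101: {1}, 102: {1}, 301: {3}}


def search(query: str) -> List[dict]:
    """Return DocIndex entries for documents containing every query word."""
    matches: Optional[Set[int]] = None
    for word in query.lower().split():
        wid = lexicon.get(word)
        # An unknown word has no wordID and therefore matches no document.
        docs = inverted_barrels.get(wid, set()) if wid is not None else set()
        matches = docs if matches is None else matches & docs
    return [doc_index[d] for d in sorted(matches or ())]
```

For example, `search("Sejong University")` returns the DocIndex entry for docID 1, while a query that mixes words from different pages returns an empty list.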
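Total Structure

[Diagram: the User talks to the Search Server, which reads the Index (DocIndex, Lexicon, Barrels). In the Back-end, the URL server directs the crawlers, which fetch pages from the Internet into the Repository; Structure, Word, Link and Ranking processing then build DocIndex, Lexicon, Barrels, URLlist and Links.]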
Thanks for your attention