The Anatomy of a Large-Scale Social Search Engine, WWW 2010
•  Damon Horowitz, Sepandar D. Kamvar
•  The Anatomy of a Large-Scale Social Search Engine
•  WWW 2010

•  Aardvark QA
•  web
•  QA

•  Google
•  Aardvark
•  Google
“Do you have any good babysitter recommendations in Palo Alto for my 6-year-old twins? I’m looking for somebody that won’t let them watch TV.”
•  Crawler and Indexer
•  Query Analyzer
•  Ranking Function
•  UI
$$
s(u_i, u_j, q) = p(u_i \mid u_j) \cdot p(u_i \mid q)
              = p(u_i \mid u_j) \sum_{t \in T} p(u_i \mid t)\, p(t \mid q)
$$

•  p(ui|uj): quality score
•  p(ui|q): relevance score
•  u: user, q: question, t: topic
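The scoring formula on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`relevance`, `score`) and the dictionary representation of the distributions are hypothetical and do not come from the slides or the paper's implementation.

```python
from typing import Dict


def relevance(p_user_given_topic: Dict[str, float],
              p_topic_given_query: Dict[str, float]) -> float:
    """p(u_i | q) = sum over topics t of p(u_i | t) * p(t | q).

    Both arguments map a topic name to a probability (assumed layout).
    Topics missing from the user's map contribute 0.
    """
    return sum(
        p_user_given_topic.get(topic, 0.0) * p_tq
        for topic, p_tq in p_topic_given_query.items()
    )


def score(p_ui_given_uj: float,
          p_user_given_topic: Dict[str, float],
          p_topic_given_query: Dict[str, float]) -> float:
    """s(u_i, u_j, q) = p(u_i | u_j) * p(u_i | q)."""
    return p_ui_given_uj * relevance(p_user_given_topic, p_topic_given_query)


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
example = score(
    p_ui_given_uj=0.5,
    p_user_given_topic={"babysitting": 0.5, "palo alto": 0.25},
    p_topic_given_query={"babysitting": 0.5, "palo alto": 0.5},
)
```

With these made-up inputs the relevance term is 0.5 * 0.5 + 0.25 * 0.5 = 0.375, and multiplying by the quality score 0.5 gives 0.1875.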
  
P(ui|t)

$$
p(u_i \mid t) = \frac{p(t \mid u_i)\, p(u_i)}{p(t)}
$$

$$
s(t \mid u_i) = p(t \mid u_i) + \gamma \sum_{u \in U} p(t \mid u)
$$

$$
\sum_{t \in T} p(t \mid u_i) = 1
$$

•  facebook
•  blog
•  twitter
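The three formulas on this slide (Bayes inversion, smoothing with weight γ, and normalization over topics) can be sketched as follows. This is a minimal sketch, not the authors' code. Two things are assumptions made here because the slide does not state them: the set U is taken to be a list of other users' topic distributions passed in by the caller, and p(t) is computed by the law of total probability over the users supplied. All function and variable names are hypothetical.

```python
from typing import Dict, Iterable

Topics = Dict[str, float]  # topic -> probability


def smooth(own: Topics, others: Iterable[Topics], gamma: float) -> Topics:
    """s(t|u_i) = p(t|u_i) + gamma * sum_{u in U} p(t|u), renormalized to sum to 1.

    `others` plays the role of U (assumed: supplied by the caller).
    """
    others = list(others)
    topics = set(own).union(*(o.keys() for o in others))
    raw = {
        t: own.get(t, 0.0) + gamma * sum(o.get(t, 0.0) for o in others)
        for t in topics
    }
    total = sum(raw.values())
    if total == 0:
        return {}
    # Normalize so that the smoothed values sum to 1 over all topics.
    return {t: v / total for t, v in raw.items()}


def users_given_topic(p_t_given_u: Dict[str, Topics],
                      p_u: Dict[str, float],
                      topic: str) -> Dict[str, float]:
    """p(u_i|t) = p(t|u_i) p(u_i) / p(t), with p(t) = sum_u p(t|u) p(u) (assumption)."""
    joint = {u: p_t_given_u.get(u, {}).get(topic, 0.0) * p for u, p in p_u.items()}
    p_t = sum(joint.values())
    if p_t == 0:
        return {u: 0.0 for u in p_u}
    return {u: j / p_t for u, j in joint.items()}


# Hypothetical data for illustration only.
smoothed = smooth({"cooking": 1.0}, [{"travel": 1.0}], gamma=0.5)
posterior = users_given_topic(
    {"a": {"cooking": 0.5}, "b": {"cooking": 0.25}},
    {"a": 0.5, "b": 0.5},
    "cooking",
)
```

In the example, smoothing lets the user pick up a little probability for "travel" from the other user, and the Bayes step ranks user "a" above user "b" for the topic "cooking".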
P(ui|uj)
P(t|q):
•  Non Question Classifier
•  Inappropriate Question Classifier
•  Trivial Question Classifier
•  Location Sensitive Classifier
P(t|q):
     –  Keyword Match Topic Mapper
     –  Taxonomy Topic Mapper
         •  SVM 3000
     –  Salient Term Topic Mapper
         •  tf-idf
     –  User Tag Topic Mapper
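The slide only names the topic mappers, so the following is a guess at the simplest of them rather than a description of Aardvark's implementation. It assumes a keyword-match mapper checks which known single-word topics appear verbatim in the question and spreads p(t|q) uniformly over the matches. The function name, the tokenizer, and the uniform weighting are all hypothetical.

```python
import re
from typing import Dict, Iterable


def keyword_match_topics(question: str, known_topics: Iterable[str]) -> Dict[str, float]:
    """Return an assumed p(t|q): uniform over known topics found in the question.

    Only single-word topics are handled; matching is case-insensitive.
    """
    # Lowercased word tokens; punctuation such as "?" or "-" splits tokens.
    words = set(re.findall(r"[a-z0-9']+", question.lower()))
    matched = sorted({t for t in known_topics if t.lower() in words})
    if not matched:
        return {}
    weight = 1.0 / len(matched)
    return {t: weight for t in matched}


# The question is the example quoted earlier in the slides;
# the topic list is made up for illustration.
p_t_given_q = keyword_match_topics(
    "Do you have any good babysitter recommendations in Palo Alto "
    "for my 6-year-old twins?",
    ["babysitter", "twins", "cooking"],
)
```

A distribution produced this way could be fed into the sum over topics in the scoring formula. A real system would presumably combine the output of several mappers rather than rely on exact string matches alone.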
     –  Topic Expertise: p(ui|q)
     –  Connectedness: p(ui|uj)
     –  Availability
                                            	
  
     –  Google PC
•  Mobile Google Aardvark
     –  Google Aardvark
Aardvark      18.6 words       98.1%
              2.2–2.9 words    57–63%
•  fact
•  57.2% 10
     –  facebook 15.7% 15
•  6 37
•  87.7%
•  2.08
•  97.7% 3
•  174,605
•  1,199,323
•  Google
     –  200 Aardvark
     –  Aardvark google 5
     –  10

Aardvark      5      71.5%      3.93 ± 1.23
Google        2      70.5%      3.07 ± 1.46
Lab seminar20100604
