The Death of the Keyword: In Search of NLP -- Presented at Ungagged London Apr 2019.

This was my talk at Ungagged London on the history of Google's Knowledge Graph and how it is being used as the first step towards scaling natural language understanding and machine learning as we move past the simple keyword and into the era of queries, questions, and voice search.

  1. #GetUngagged @schachin Kristine Schachinger The Death of the Keyword: In Search of NLP Kristine@SitesWithoutWalls.com
  2. In the beginning, there was a … Large-Scale Hypertextual Web Search Engine
  3. What?
  4. Link Profiles http://infolab.stanford.edu/pub/papers/google.pdf
  5. The Web 1998
  7. Google Goes To Work http://infolab.stanford.edu/pub/papers/google.pdf
  8. Today in 2019 … Roughly half of the world's population, or 3.8 billion people, use the internet every day.
  9. Google processes TRILLIONS of queries a year and has indexed BILLIONS of websites.
  10. Yesterday, there were over 438,662,584 Google searches. http://www.internetlivestats.com/google-search-statistics/
  12. Dealing With The Data.
  13. Google Search was founded on unstructured data.
  14. Unstructured data (or unstructured information) is information that either does not have a pre-defined data model or is not organized in a pre-defined manner. Unstructured information is typically text-heavy, but may contain data such as dates, numbers, and facts as well. https://www.google.co.uk/search?q=definition+unstructured+data&oq=definition+unstructured+data&aqs=chrome..69i57j0l5.5175j0j7&sourceid=chrome&ie=UTF-8
  15. Handling text this way, as an unordered collection of words, is known as the “Bag of Words” approach.
  16. Otherwise known as keywords.
  17. Unstructured data uses keywords. https://moz.com/blog/7-advanced-seo-concepts
  18. TF-IDF (Term Frequency-Inverse Document Frequency), i.e. how often a keyword appears in a document, weighted against how common that keyword is across the other documents in the set. https://moz.com/blog/7-advanced-seo-concepts
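
To make the bag-of-words and TF-IDF idea concrete, here is a minimal sketch in Python. It assumes the common textbook weighting (term frequency times the log of inverse document frequency); the function name `tf_idf` and the three sample "documents" are made up for illustration and say nothing about how Google actually weights terms.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: score} dict per document (textbook TF-IDF)."""
    # Bag of words: each document becomes an unordered count of its terms.
    bags = [Counter(doc.lower().split()) for doc in docs]
    n_docs = len(bags)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for bag in bags for term in bag)
    scores = []
    for bag in bags:
        total = sum(bag.values())
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in bag.items()
        })
    return scores

# Hypothetical mini-corpus, invented for this example.
docs = [
    "london hotels cheap hotels",
    "london weather today",
    "cheap flights to london",
]
scores = tf_idf(docs)
for term, score in sorted(scores[0].items(), key=lambda kv: -kv[1]):
    print(f"{term}: {score:.3f}")
```

In this sketch "london" scores zero because it appears in every document, while "hotels" scores highest in the first document because it is frequent there and absent elsewhere. Word order and meaning play no part, which is the limitation the following slides address.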
  19. But as queries number in the trillions, unstructured data becomes inefficient. Data needed structure.
  20. https://twitter.com/manishrjain https://blog.dgraph.io/post/why-google-needed-graph-serving-system/
  21. https://blog.dgraph.io/post/why-google-needed-graph-serving-system/
  22. Semantics: Keywords to Queries.
  23. So Google moved from relational databases to knowledge graphs.
  24. NOTE: knowledge graphs DO NOT EQUAL THE Knowledge Graph.
  25. “Graph-based knowledge representation has been researched for decades and the term knowledge graph does not constitute a new technology. Rather, it is a buzzword reinvented by Google and adopted by other companies and academia to describe different knowledge representation applications.” http://ceur-ws.org/Vol-1695/paper4.pdf
  26. Enter Semantic Search https://web.archive.org/web/20090516213508/http://blog.searchenginewatch.com/090512-201139
  27. What is Semantic Search? https://web.archive.org/web/20090516213508/http://blog.searchenginewatch.com/090512-201139
  28. Semantic Search = Understanding Intent
  29. Why?
  30. The Holy Grail of Search? NLP (Natural Language Processing)
  31. Welcome THE Knowledge Graph, 2012.
  32. Knowledge graphs are based on known relationships. THE Knowledge Graph is Google’s graph database.
  33. Before the Knowledge Graph: Wonder Wheel https://searchengineland.com/up-close-google-squared-19313
  34. (Knowledge Graphs) “…quite possibly ... one of Google's significant achievements” Nathania Johnson of Search Engine Watch https://web.archive.org/web/20090516213508/http://blog.searchenginewatch.com/090512-201139
  35. The Knowledge Graph (Google) is seeded by things known. Instead of just text without meaning, the KG is a relational graph with known objects and mapped relationships.
  36. Why the Knowledge Graph? To help better match user intent. To understand what users want.
  37. Why? Google doesn’t truly process natural language (NLP), but it does use Natural Language Understanding (NLU). The Knowledge Graph was the first step towards language understanding.
  38. NLU is a subset of NLP.
  39. Enter Machine Learning
  40. MACHINE LEARNING https://www.tensorflow.org/guide/
  41. The Knowledge Graph enables you to search for things, people or places that Google knows about (landmarks, celebrities, cities, sports teams, buildings, geographical features, movies, celestial objects, works of art and more) and instantly get information that’s relevant to your query.
  42. In other words: NOUNS
  43. NOUNS = ENTITIES
  44. Google moves to ENTITY SEARCH
  45. Knowledge Graph entities: The Knowledge Graph has millions of entries that describe real-world entities like people, places, and things. These entities form the nodes of the graph. The following are some of the types of entities found in the Knowledge Graph: Book, BookSeries, EducationalOrganization, Event, GovernmentOrganization, LocalBusiness, Movie, MovieSeries, MusicAlbum, MusicGroup, MusicRecording, Organization, Periodical, Person, Place, SportsTeam, TVEpisode, TVSeries, VideoGame, VideoGameSeries, WebSite.
  46. Entities (nouns, the nodes) + Relationships (the edges between nodes) = THE Knowledge Graph
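
As a rough illustration of "entities plus relationships", the sketch below stores a few facts as (subject, relationship, object) triples, with entities acting as nodes and relationships as the edges between them. The facts, the relationship names, and the `related` helper are all invented for this example; this is not the Knowledge Graph's actual schema or API.

```python
from collections import defaultdict

# Hypothetical facts, written as (subject, relationship, object) triples.
triples = [
    ("Big Ben", "located_in", "London"),
    ("Big Ben", "instance_of", "Landmark"),
    ("London", "capital_of", "United Kingdom"),
]

# Index the edges by the entity they start from.
graph = defaultdict(list)
for subject, relation, obj in triples:
    graph[subject].append((relation, obj))

def related(entity, relation):
    """Return every object linked to `entity` by `relation`."""
    return [obj for rel, obj in graph.get(entity, []) if rel == relation]

print(related("Big Ben", "located_in"))
print(related("London", "capital_of"))
```

Because the relationships are explicit, a question such as "where is Big Ben?" becomes a lookup along an edge rather than a search for matching strings.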
  47. Knowledge Graph = the Answer Engine
  49. Google as an Answer Engine https://www.google.com/search/howsearchworks/responses/#?modal_active=none
  50. THE Knowledge Graph Seeds.
  51. “Four years ago this July, Google acquired Metaweb, bringing Freebase and linked open data to Google,” wrote Google software engineer Barak Michener. http://www.eweek.com/database/google-releases-cayley-open-source-graph-database
  52. The seeds also include trusted sources such as the CIA World Factbook, Wikipedia, and Wikidata. http://www.eweek.com/database/google-releases-cayley-open-source-graph-database
  53. Hummingbird: “Strings to Things”. The name was derived from the speed and accuracy of the hummingbird.
  54. Hummingbird arrives, 2013. Google moves from matching keyword terms to trying to process natural language queries using the Knowledge Graph.
  55. But Google doesn’t process natural language very well.
  56. Hummingbird was the first step in moving search from strings (unstructured data, the “bag of words” approach) to “things” (structured data).
  57. “Things” are known objects with known (or learned) relationships.
  58. Things Need Context. Moving to Vector Relationships.
  59. KEY FACTOR word2vec: Vector space models (VSMs) represent (embed) words in a continuous vector space where semantically similar words are mapped to nearby points ('are embedded nearby each other'). https://www.tensorflow.org/tutorials/representation/word2vec
  60. Embedded Word Model https://www.tensorflow.org/tutorials/representation/word2vec
  61. “…words that appear in the same contexts share semantic meaning. The different approaches that leverage this principle can be divided into two categories: count-based methods (e.g. Latent Semantic Analysis), and predictive methods (e.g. neural probabilistic language models).” https://www.tensorflow.org/tutorials/representation/word2vec
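
A small sketch of what "mapped to nearby points" means in practice: words become lists of numbers, and closeness is measured with cosine similarity. The three-dimensional vectors below are made up by hand for illustration; real word2vec embeddings are learned from text and have hundreds of dimensions.

```python
import math

# Made-up 3-dimensional vectors; real embeddings are learned and far larger.
vectors = {
    "hotel": [0.9, 0.1, 0.0],
    "hostel": [0.8, 0.2, 0.1],
    "banana": [0.0, 0.1, 0.9],
}

def cosine(a, b):
    """Cosine similarity: close to 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(word):
    """Return the other word whose vector is most similar to `word`."""
    others = (w for w in vectors if w != word)
    return max(others, key=lambda w: cosine(vectors[word], vectors[w]))

print(nearest("hotel"))
print(round(cosine(vectors["hotel"], vectors["banana"]), 3))
```

With these toy values "hotel" sits next to "hostel" and far from "banana", which is the property that lets a search system treat related words as related even when the strings differ.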
  62. Hummingbird
  63. Hummingbird added a semantic layer to the search algorithms.
  64. Not Just Keywords Anymore … Semantic Interpretations.
  65. Hummingbird adds a semantic layer to the search algorithms that handles synonyms and close variants. https://moz.com/blog/7-advanced-seo-concepts
  66. Hummingbird adds a semantic layer to the search algorithms that uses “semantic distance and term relationships”. https://moz.com/blog/7-advanced-seo-concepts
  67. Hummingbird adds a semantic layer to the search algorithms that uses “phrase-based indexing and co-occurrence”. https://moz.com/blog/7-advanced-seo-concepts
  68. Page Segmentation. This part of the algorithm determines meaning through placement. https://moz.com/blog/7-advanced-seo-concepts
  69. Entity Salience. This part of the algorithm determines meaning through known relationships. https://moz.com/blog/7-advanced-seo-concepts
  70. Entity Salience, plus: in 2018-19 Google adds the “topic layer” to the Knowledge Graph (categorical classification). https://moz.com/blog/7-advanced-seo-concepts
  71. So Hummingbird moves from strict word-count-based modeling (i.e. keyword counts) to probabilistic modeling (i.e. predictive interpretation) via known word vectors and graph relationships.
  72. What does this look like mathematically?
  73. BUT … Google Search still doesn’t process natural language in the true sense.
  74. This means we must add an “interpreter”.
  75. Enter Structured Data and Schema.
  76. What is Structured Data? https://developers.google.com/search/docs/guides/intro-structured-data
  77. What is Structured Data? Structured data for SEO purposes is on-page markup that enables search engines to better understand the information currently on your site’s web pages, and then use this information to improve search result listings by better matching user intent.
  78. What is Structured Data? This structured data is defined by using schema to act as the interpreter. This is the definition we add to the page using schema code. Google supports three formats: RDFa, Microdata, and JSON-LD (preferred).
  79. Schema: JSON-LD is the recommended schema format. JSON-LD stands for JavaScript Object Notation for Linked Data. It is a way to implement schema outside the HTML mark-up structure, whereas RDFa and Microdata require the code to be written into the HTML itself.
  80. Schema: The benefit is that it can be kept out of the HTML structure, which makes it easier to write, implement, and maintain. Resources: for a good breakdown of what JSON-LD is at the code level, Portent’s JSON-LD Implementation Guide is very helpful: https://www.portent.com/blog/seo/json-ld-implementation-guide.htm Google also has a section in its developer guides: https://developers.google.com/search/docs/guides/intro-structured-data
  81. JSON-LD Schema
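
For readers who have not seen JSON-LD, the sketch below builds a small schema.org `Article` description in Python and wraps it in the `<script type="application/ld+json">` tag that goes in the page. The field values are placeholders taken from this talk's title slide, and the helper name `to_script_tag` is made up; which properties a given rich result actually requires should be checked against Google's structured data guides.

```python
import json

# Placeholder values for illustration; use what is actually on your page.
article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "The Death of the Keyword: In Search of NLP",
    "author": {"@type": "Person", "name": "Kristine Schachinger"},
}

def to_script_tag(data):
    """Serialize a dict as a JSON-LD script block for a page's HTML."""
    body = json.dumps(data, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"

snippet = to_script_tag(article)
print(snippet)
```

The markup lives in one self-contained block rather than being threaded through the HTML tags, which is the maintenance benefit described above.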
  82. Schema: IMPORTANT! Test your JSON-LD. Use Google's Structured Data Testing Tool. https://search.google.com/structured-data/testing-tool
  83. Schema: NOTE this tool only tells you if the markup is syntactically valid, NOT if you are using the proper schema. Make sure to check Google’s guides on schema implementation. Improper use or implementation can result in a manual action. • https://developers.google.com/search/docs/guides/intro-structured-data • https://developers.google.com/search/docs/guides/prototype
  84. Schema: IMPORTANT! Your JSON-LD content MUST match what is on the page exactly. If they differ, you will likely get a manual action, as Google sees this as cloaking.
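
One way to catch mismatches before Google does is a quick automated check. The sketch below is a rough, assumption-heavy example: it pulls JSON-LD blocks out of an HTML string with regular expressions (crude, but dependency-free) and reports any `headline` or `name` value that does not appear in the visible text. The function names and the sample page are invented for illustration, and a production check would use a real HTML parser.

```python
import json
import re

def jsonld_blocks(html):
    """Extract and parse every JSON-LD script block in the HTML."""
    pattern = r'<script[^>]*type="application/ld\+json"[^>]*>(.*?)</script>'
    return [json.loads(m) for m in re.findall(pattern, html, flags=re.S | re.I)]

def visible_text(html):
    """Very rough visible text: drop scripts, strip tags, collapse spaces."""
    no_scripts = re.sub(r"<script.*?</script>", " ", html, flags=re.S | re.I)
    text = re.sub(r"<[^>]+>", " ", no_scripts)
    return " ".join(text.split())

def missing_values(html, fields=("headline", "name")):
    """Return (field, value) pairs present in JSON-LD but not on the page."""
    text = visible_text(html)
    missing = []
    for block in jsonld_blocks(html):
        if not isinstance(block, dict):
            continue  # this sketch only handles a single top-level object
        for field in fields:
            value = block.get(field)
            if isinstance(value, str) and value not in text:
                missing.append((field, value))
    return missing

# Hypothetical page used only to exercise the check.
page = """<html><body><h1>The Death of the Keyword</h1>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Article", "headline": "The Death of the Keyword"}
</script></body></html>"""

print(missing_values(page))
```

An empty list means every checked value was found in the page text; anything else is worth fixing before the markup goes live.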
  85. Why Does Schema Matter?
  86. We can act as the interpreter and help “teach” Google what our site is about.
  87. Adding semantic mark-up (structured data via schema) allows us to tell Google what WE SAY our site is about and WHAT RELATIONSHIPS we define within it.
  88. We can act as the interpreter and help “teach” Google the context of our content.
  90. We can help give Google a clearer understanding. That helps Google better answer the questions users ask and better surface our content for those users. We give our data meaning that Google understands.
  91. Ranking Without Links
  92. RankBrain
  94. RankBrain is used for unknown queries where entity meanings/relationships are unclear or unknown.
  95. RankBrain
  96. RankBrain: one of two algorithms that use AI on the live results.
  97. The presence of RankBrain means Google is confused …
  98. RankBrain
  99. Why? Google does not use true NLP (Natural Language Processing) in Search.
  100. RankBrain and known relationships:
       • Words go in.
       • Words get assigned a mathematical address in a vector.
       • Similar and related words sit close to each other in the vector space.
       • Words are retrieved based on your query and the words it locates in the “best fit” vector.
       • These word “interpretations” are used to return results.
       • If the relationships are weak or unknown, enter RankBrain.
       • Behind the scenes, data is continually fed into the machine learning process, so as to make those results more relevant the next time.
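
The steps above can be sketched as a toy lookup. To be clear, this is not how RankBrain works internally, which Google has not published in this detail; it only illustrates the flow the slide describes. The two-dimensional vectors, the 0.95 threshold, and the `interpret` function are all invented for this example.

```python
import math

# Toy 2-dimensional word vectors; every value is invented for illustration.
WORD_VECTORS = {
    "cheap": [0.9, 0.1],
    "budget": [0.8, 0.2],
    "hotel": [0.7, 0.3],
    "sweets": [0.1, 0.9],
    "candy": [0.2, 0.8],
    "jaguar": [0.5, -0.5],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def interpret(word, threshold=0.95):
    """Return (closest known word, similarity, needs_fallback).

    needs_fallback is True when the word is unknown or its best match is
    weak, standing in for the point where the slide says "enter RankBrain".
    """
    if word not in WORD_VECTORS:
        return (None, 0.0, True)
    others = [w for w in WORD_VECTORS if w != word]
    best = max(others, key=lambda w: cosine(WORD_VECTORS[word], WORD_VECTORS[w]))
    similarity = cosine(WORD_VECTORS[word], WORD_VECTORS[best])
    return (best, similarity, similarity < threshold)

for query_word in ("sweets", "jaguar", "glorp"):
    print(query_word, interpret(query_word))
```

Here "sweets" finds a strong neighbour in "candy", while "jaguar" has only weak matches and "glorp" is unknown, so both of those are flagged for the fallback path.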
  101. RankBrain also uses users' queries and clicks to help it understand query intent.
  102. Remember this result?
  103. A year later … (yellow = poor intent match)
  104. Google + RankBrain also use users' queries and clicks to help understand query intent, plus GEO LOCATION.
  105. A year later in London …
  106. Why location? Semantic Relevancy
  107. Remember this result?
  108. A year later … (yellow = poor intent match)
  109. Why location? Semantic Relevancy. “Sweets” has no definitive entity in the US.
  110. Neural Matching: RankBrain has a friend now.
  112. What is the difference between RankBrain and Neural Matching?
  113. RankBrain vs Neural Matching. RankBrain = concepts. Neural Matching = linking words to concepts. “…neural matching, AI method to better connect words to concepts.” (Google)
  114. RankBrain vs Neural Matching. A Google patent related to RankBrain and Neural Matching describes a system that uses traditional ranking factors to decide what is relevant, but NOT what is in the top 10, which may be re-ordered post-retrieval according to “ad hoc retrieval” methods and “dynamic relevancy”. https://www.searchenginejournal.com/google-neural-matching/271125/
  115. AND IT DOES NOT USE LINKS https://www.oncrawl.com/technical-seo/neural-matching-seo-content-creation-rules/
  116. So should you optimize for RankBrain and Neural Matching?
  117. Why would you optimize to rank with AI? AI is ever-changing and unfixed. Don’t waste the time and resources.
  118. Google does not even understand what RankBrain (and neural matching) is actually doing.
  119. (Gary Illyes)
  121. Simple answer to a very complex issue? Do your normal query research, then just write in natural, conversational language. Create holistic content.
  122. Holistic Content?
  123. Write holistic content. Use terms that are semantically related. For a detailed explanation, see Google's talk: https://www.youtube.com/watch?v=vzoe2G5g-w4&feature=youtu.be&t=32m19s
  124. Write holistic content. DOES YOUR CONTENT HAVE DEPTH AND WIDTH? For a detailed explanation, see Google's talk: https://www.youtube.com/watch?v=vzoe2G5g-w4&feature=youtu.be&t=32m19s
  125. And Use Well Formed Text.
  126. Well Formed Text. Ray Kurzweil on Google NLU: http://www.kurzweilai.net/google-open-sources-natural-language-understanding-tools
  127. Questions = Well Formed Text https://ai.google/research/pubs/pub47323
  128. THINK in Intent, Query Terms, & Context.
  129. And Questions!
  130. Takeaways:
       • Think search queries, NOT simple keywords.
       • Write in natural, conversational language.
       • Write holistic content.
       • Focus on depth and breadth with related terms.
       • Add structured data.
       • Use well formed text (i.e. questions) when you can.
  131. Google is an Answer Engine.
  132. https://cloud.google.com/natural-language/
  134. Google Cloud Natural Language Tool
  136. BONUS.
  137. https://codelabs.developers.google.com/codelabs/cloud-tensorflow-mnist/#0
  139. The Death of the Keyword: In Search of NLP Kristine@SitesWithoutWalls.com
