
BigML Education - Topic Models

Learn how to process natural language using Topic Models to automatically discover relevant relationships.

Published in: Data & Analytics

  1. BigML Education: Topic Models (July 2017)
  2. In This Video
     • Introduction to Topic Models
     • Exploration of Topic Models in the BigML Interface
     • Inference of topic distributions using a trained topic model
     • Parameterization of topic models
  3. Data For Topic Models
     • Unstructured text data
       • Short stories, novels, newspaper articles
       • Web pages
       • Customer reviews or surveys
       • E-mail messages
     • Data is not like most machine learning data
       • Often no fields in each row (i.e., no “columns”)
       • Each instance is just the text of the document
  4. Categorizing Instances
     • Often, many instances will have words indicating they are about the same thing (the same topic)
     • It may be useful to identify instances corresponding to a certain topic
     • Topic modeling automatically discovers common topics in the data
     • Can assign a score to each instance indicating how much that instance is “about” a given topic
  5. Generative Modeling
     • Decision trees / Logistic regression are discriminative models
       • Aggressively model the classification boundary
       • Parsimonious: Don’t consider anything you don’t have to
     • Topic models are generative models
       • Posit a theory of how the data was generated
       • Tweak the theory to fit the data
  6. [Figure: a document and one of its terms. Labels: Document, Term. Document text: “Be not afraid of greatness: some are born great, some achieve greatness, and some have greatness thrust upon 'em.”]
  7. Topics
     [Figure: a vocabulary (cat, shoe, zebra, ball, tree, jump, pen, asteroid, cable, box, step, cabinet, yellow, plate, flashlight, …) from which term sequences are generated: “shoe asteroid flashlight pizza…”, “plate giraffe purple jump…”, “Be not afraid of greatness: some are born great, some achieve greatness…”]

     term         probability
     shoe         ϵ
     asteroid     ϵ
     flashlight   ϵ
     pizza        ϵ
     …            ϵ

     • A topic is a term generator
     • Invoke it a bunch of times to get a document
     • Most will be nonsense, but eventually you’ll generate your dataset
  8. Topic Models
     [Figure: two topics, each drawing from the same vocabulary (cat, shoe, zebra, ball, tree, jump, pen, asteroid, cable, box, step, cabinet, yellow, plate, flashlight, …)]

     Topic: travel
     word       probability
     travel     23.55%
     airplane   2.33%
     mars       0.003%
     mantle     ϵ
     …          ϵ

     Topic: space
     word       probability
     space      38.94%
     airplane   ϵ
     mars       13.43%
     mantle     0.05%
     …          ϵ

     Generate Document: airplane passport pizza … mars quasar lightyear soda
  9. Review
     • Topic models are generative models for unstructured text data
     • The BigML interface provides an intuitive way to explore your topic model
     • You can get the topic distribution for an instance by using the “topic distribution” or “batch topic distribution” options in the model resource view
     • Changing the “number of topics” and specifying “excluded terms” may give you a much different and possibly better topic model
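
The idea from the slides that “a topic is a term generator” can be sketched in a few lines of Python. This is only a toy illustration, not BigML's implementation: the two topics, their terms, and all probabilities below are made-up values, and the sketch assumes an LDA-style process in which each word position first picks a topic from the document's topic mix and then draws a term from that topic.

```python
import random

# Hypothetical toy topics: each maps terms to probabilities.
# The values are illustrative only and are not taken from the slides.
TOPICS = {
    "travel": {"travel": 0.5, "airplane": 0.3, "passport": 0.15, "mars": 0.05},
    "space": {"space": 0.5, "mars": 0.3, "quasar": 0.15, "airplane": 0.05},
}


def sample_term(topic, rng):
    """Invoke a topic once: draw one term from its term distribution."""
    terms = list(topic)
    weights = [topic[t] for t in terms]
    return rng.choices(terms, weights=weights, k=1)[0]


def generate_document(topic_mix, length, rng):
    """Generate a bag of terms: pick a topic per position, then a term from it."""
    names = list(topic_mix)
    mix_weights = [topic_mix[n] for n in names]
    words = []
    for _ in range(length):
        name = rng.choices(names, weights=mix_weights, k=1)[0]
        words.append(sample_term(TOPICS[name], rng))
    return words


rng = random.Random(0)  # fixed seed so repeated runs give the same document
doc = generate_document({"travel": 0.8, "space": 0.2}, 10, rng)
print(" ".join(doc))
```

Most documents generated this way are nonsense, as the slides note. Fitting a topic model amounts to adjusting the topic-term probabilities until the real documents become likely outputs of this process.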

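The “topic distribution” of an instance, meaning a score for how much a document is “about” each topic, can be illustrated with a small self-contained sketch. It assumes the topics are already trained and fixed, and estimates the document's topic mix with a simple expectation-maximization loop. The topics, probabilities, `EPSILON` value, and function name are hypothetical, and this is not necessarily the inference method BigML uses.

```python
# Stand-in for the slides' "ϵ": a tiny probability for terms a topic almost never emits.
EPSILON = 1e-6

# Hypothetical trained topics (term -> probability); values are illustrative only.
TOPICS = {
    "travel": {"travel": 0.5, "airplane": 0.3, "passport": 0.15, "mars": 0.05},
    "space": {"space": 0.5, "mars": 0.3, "quasar": 0.15, "airplane": 0.05},
}


def topic_distribution(words, topics, iterations=50):
    """Estimate how much a document is "about" each topic, keeping topics fixed."""
    names = list(topics)
    theta = {n: 1.0 / len(names) for n in names}  # start from a uniform mix
    if not words:
        return theta
    for _ in range(iterations):
        totals = {n: 0.0 for n in names}
        for w in words:
            # E-step: share of this word attributed to each topic.
            scores = {n: theta[n] * topics[n].get(w, EPSILON) for n in names}
            norm = sum(scores.values())
            for n in names:
                totals[n] += scores[n] / norm
        # M-step: the new mix is the average share across all words.
        theta = {n: totals[n] / len(words) for n in names}
    return theta


dist = topic_distribution("mars quasar space".split(), TOPICS)
for name, score in dist.items():
    print(f"{name}: {score:.3f}")
```

The returned scores sum to 1, so each can be read as the fraction of the document attributed to that topic.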