Supervised and Relational Topic Models

                  David M. Blei

            Department of Computer Science
                 Princeton University

                October 5, 2009

Joint work with Jonathan Chang and Jon McAuliffe
Topic modeling

  • Large electronic archives of document collections require new
    statistical tools for analyzing text.

  • Topic models have emerged as a powerful technique for
    unsupervised analysis of large document collections.

  • Topic models posit latent topics in text using hidden random
    variables, and uncover that structure with posterior inference.

  • Useful for tasks like browsing, search, information retrieval, etc.
Examples of topic modeling

  contractual   employment         female        markets     criminal
 expectation      industrial        men         earnings    discretion
      gain           local         women        investors     justice
   promises           jobs           see           sec          civil
 expectations    employees         sexual       research     process
    breach         relations        note        structure     federal
   enforcing         unfair       employer      managers        see
     supra       agreement     discrimination      firm         officer
      note       economic       harassment         risk       parole
    perform          case          gender         large      inmates
Examples of topic modeling
  [Figure: a graph of topics fit to computer science articles. Each node
   lists a topic's top words, such as (quantum, automata, automaton,
   languages), (routing, adaptive, network, protocols), (graph, graphs,
   edge, vertices), (database, constraints, algebra, relational), and
   (queuing, asymptotic, server). Example article titles are attached to
   topics, such as "Quantum lower bounds by polynomials", "How bad is
   selfish routing?", "A new approach to the maximum-flow problem", and
   "Dynamic functional dependencies and database aging".]
Examples of topic modeling

   1880             1890             1900                1910              1920               1930              1940
  electric         electric       apparatus               air           apparatus             tube               air
 machine           power            steam               water              tube            apparatus            tube
  power           company           power            engineering            air               glass          apparatus
  engine           steam           engine             apparatus          pressure              air              glass
  steam           electrical     engineering             room              water            mercury          laboratory
    two           machine           water             laboratory           glass           laboratory          rubber
 machines            two         construction          engineer             gas             pressure          pressure
    iron           system         engineer              made              made               made               small
  battery           motor           room                  gas           laboratory             gas            mercury
   wire            engine            feet                tube            mercury              small              gas

        1950               1960            1970                1980             1990                2000
        tube               tube             air                high          materials            devices
     apparatus            system           heat              power              high              device
        glass          temperature        power              design            power             materials
          air                air          system               heat           current             current
      chamber              heat        temperature           system         applications            gate
     instrument          chamber         chamber            systems         technology              high
        small             power            high             devices           devices               light
     laboratory            high            flow            instruments         design               silicon
      pressure          instrument         tube              control          device              material
       rubber             control         design              large             heat            technology
Examples of topic modeling
  [Figure: a graph of topics fit to scientific articles. Each node lists
   a topic's top words, such as (synapses, ltp, glutamate, synaptic,
   neurons), (rna, dna, rna polymerase), (wild type, mutant, mutations),
   (virus, hiv, aids, infection), (stars, astronomers, universe,
   galaxies), (earthquake, earthquakes, fault), and (co2, carbon,
   carbon dioxide, methane).]
Supervised topic models

  • These applications of topic modeling work in the same way.

  • Fit a model using a likelihood criterion. Then, hope that the
    resulting model is useful for the task at hand.

  • Supervised topic models and relational topic models fit
    topics explicitly to perform prediction.

  • Useful for building topic models that can
      • Predict the rating of a review
      • Predict the category of an image
      • Predict the links emitted from a document

  1   Unsupervised topic models

  2   Supervised topic models

  3   Relational topic models
Probabilistic modeling

  1   Treat data as observations that arise from a generative
      probabilistic process that includes hidden variables
        • For documents, the hidden variables reflect the thematic
          structure of the collection.
  2   Infer the hidden structure using posterior inference
         • What are the topics that describe this collection?
  3   Situate new data into the estimated model.
         • How does this query or new document fit into the estimated
           topic structure?
Intuition behind LDA

       Simple intuition: Documents exhibit multiple topics.
Generative model

                                              Topic proportions and
       Topics             Documents
    gene      0.04
    dna       0.02
    genetic   0.01

    life     0.02
    evolve   0.01
    organism 0.01

    brain     0.04
    neuron    0.02
    nerve     0.01

    data     0.02
    number   0.02
    computer 0.01

  • Each document is a random mixture of corpus-wide topics
  • Each word is drawn from one of those topics
The posterior distribution

                                                  Topic proportions and
     Topics                 Documents

  • In reality, we only observe the documents
  • Our goal is to infer the underlying topic structure
Latent Dirichlet allocation

  [Graphical model: α → θd → Zd,n → Wd,n ← βk ← η, with plates over the
   D documents and the K topics]

    θd      per-document topic proportions
    Zd,n    topic assignment
    Wd,n    observed word
    βk      topics
    η       topic hyperparameter

            Each piece of the structure is a random variable.
Latent Dirichlet allocation

                         βk   ∼ Dir(η) k = 1 . . . K
                         θd   ∼ Dir(α) d = 1 . . . D
                Zd,n | θd     ∼ Mult(1, θd ) d = 1 . . . D,        n = 1...N
   Wd,n | θd , zd,n , β1:K    ∼ Mult(1, βzd,n )   d = 1 . . . D,    n = 1...N
Latent Dirichlet allocation

  1   Draw each topic βk ∼ Dir(η), for k ∈ {1, . . . , K }.
  2   For each document:
        1 Draw topic proportions θd ∼ Dir(α).
        2 For each word:
            1 Draw Zd,n ∼ Mult(θd ).
            2 Draw Wd,n ∼ Mult(βzd,n ).
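
The generative process above can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the talk: the function name `sample_lda_corpus`, the symmetric hyperparameter values, and the fixed document length are all assumptions made for the example.

```python
import numpy as np

def sample_lda_corpus(num_docs, doc_len, num_topics, vocab_size,
                      alpha=0.5, eta=0.5, seed=0):
    """Sample a synthetic corpus from the LDA generative process.

    Assumes symmetric Dirichlet priors and a fixed document length.
    """
    rng = np.random.default_rng(seed)
    # Step 1: draw each topic beta_k ~ Dir(eta), a distribution over words.
    beta = rng.dirichlet(np.full(vocab_size, eta), size=num_topics)
    thetas, assignments, docs = [], [], []
    for _ in range(num_docs):
        # Step 2.1: draw topic proportions theta_d ~ Dir(alpha).
        theta = rng.dirichlet(np.full(num_topics, alpha))
        # Step 2.2.1: draw a topic assignment for every word position.
        z = rng.choice(num_topics, size=doc_len, p=theta)
        # Step 2.2.2: draw each word from the topic assigned to it.
        w = np.array([rng.choice(vocab_size, p=beta[k]) for k in z])
        thetas.append(theta)
        assignments.append(z)
        docs.append(w)
    return beta, np.array(thetas), np.array(assignments), np.array(docs)

beta, theta, z, w = sample_lda_corpus(num_docs=5, doc_len=20,
                                      num_topics=3, vocab_size=30)
```

Only `w` would be observed in practice; `beta`, `theta`, and `z` are the hidden structure that posterior inference tries to recover.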
Latent Dirichlet allocation

  • From a collection of documents, infer
      • Per-word topic assignment zd,n
      • Per-document topic proportions θd
      • Per-corpus topic distributions βk
  • Use posterior expectations to perform the task at hand, e.g.,
    information retrieval, document similarity, etc.
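
As one illustration of the last bullet, estimated topic proportions can be compared directly to score document similarity. The sketch below uses the Hellinger distance, which is an assumption made for this example (the slides do not name a metric), and the `theta_*` values are made up.

```python
import numpy as np

def hellinger(theta_a, theta_b):
    """Hellinger distance between two topic-proportion vectors (range 0 to 1)."""
    theta_a = np.asarray(theta_a, dtype=float)
    theta_b = np.asarray(theta_b, dtype=float)
    return float(np.sqrt(0.5 * np.sum((np.sqrt(theta_a) - np.sqrt(theta_b)) ** 2)))

# Hypothetical posterior expectations of theta_d for three documents.
theta_genetics_1 = np.array([0.80, 0.15, 0.05])
theta_genetics_2 = np.array([0.70, 0.20, 0.10])
theta_computing = np.array([0.05, 0.10, 0.85])

d_close = hellinger(theta_genetics_1, theta_genetics_2)
d_far = hellinger(theta_genetics_1, theta_computing)
```

Documents that share topics get a small distance, regardless of whether they share specific words.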
Latent Dirichlet allocation

  • Computing the posterior is intractable:

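\[
\frac{p(\theta \mid \alpha) \prod_{n=1}^{N} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta_{1:K})}
     {\int_{\theta} p(\theta \mid \alpha) \prod_{n=1}^{N} \sum_{z_n=1}^{K} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta_{1:K})\, d\theta}
\]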

  • Several approximation techniques have been developed.
Latent Dirichlet allocation

  • Mean field variational methods (Blei et al., 2001, 2003)
  • Expectation propagation (Minka and Lafferty, 2002)
  • Collapsed Gibbs sampling (Griffiths and Steyvers, 2002)
  • Collapsed variational inference (Teh et al., 2006)
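
To make one of these techniques concrete, here is a minimal sketch of collapsed Gibbs sampling for LDA. It is not the code from any of the cited papers: the function name, the symmetric hyperparameters, the toy corpus, and the point estimates computed from the final sample are all assumptions made for illustration.

```python
import numpy as np

def collapsed_gibbs_lda(docs, num_topics, vocab_size, alpha=0.5, eta=0.1,
                        num_sweeps=50, seed=0):
    """Collapsed Gibbs sampler: theta and beta are integrated out, z is sampled."""
    rng = np.random.default_rng(seed)
    doc_topic = np.zeros((len(docs), num_topics))   # topic counts per document
    topic_word = np.zeros((num_topics, vocab_size)) # word counts per topic
    topic_total = np.zeros(num_topics)              # total words per topic
    # Random initialization of the topic assignments.
    z = [rng.integers(num_topics, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):
        for n, w in enumerate(doc):
            k = z[d][n]
            doc_topic[d, k] += 1
            topic_word[k, w] += 1
            topic_total[k] += 1
    for _ in range(num_sweeps):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                k = z[d][n]
                # Remove the current assignment from the counts.
                doc_topic[d, k] -= 1
                topic_word[k, w] -= 1
                topic_total[k] -= 1
                # Conditional of z_{d,n} given all other assignments.
                weights = ((doc_topic[d] + alpha) * (topic_word[:, w] + eta)
                           / (topic_total + vocab_size * eta))
                k = rng.choice(num_topics, p=weights / weights.sum())
                # Record the new assignment.
                z[d][n] = k
                doc_topic[d, k] += 1
                topic_word[k, w] += 1
                topic_total[k] += 1
    # Point estimates from the final sample's counts.
    theta_hat = (doc_topic + alpha) / (doc_topic + alpha).sum(axis=1, keepdims=True)
    beta_hat = (topic_word + eta) / (topic_word + eta).sum(axis=1, keepdims=True)
    return theta_hat, beta_hat, z

# Toy corpus of word ids (hypothetical data).
toy_docs = [[0, 1, 0, 1, 2], [3, 4, 3, 4, 5], [0, 1, 3, 4]]
theta_hat, beta_hat, z_samples = collapsed_gibbs_lda(toy_docs, num_topics=2, vocab_size=6)
```

A practical sampler would average over several samples after burn-in rather than use a single final state.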
Example inference


  [Figure: example inference plot; horizontal axis ticks run from 1 to 96.]

Example topics

       “Genetics”    “Evolution” “Disease”        “Computers”
           human        evolution     disease       computer
          genome      evolutionary     host           models
             dna         species     bacteria      information
          genetic      organisms     diseases          data
            genes           life    resistance      computers
         sequence         origin     bacterial        system
            gene         biology        new          network
         molecular       groups       strains        systems
        sequencing   phylogenetic     control          model
            map           living    infectious        parallel
       information      diversity     malaria        methods
          genetics        group      parasite        networks
          mapping          new       parasites       software
          project           two       united            new
         sequences      common     tuberculosis    simulations
Used in exploratory tools of document collections
LDA summary

  • LDA is a powerful model for
      • Visualizing the hidden thematic structure in large corpora
      • Generalizing new data to fit into that structure

  • LDA is a mixed membership model (Erosheva, 2004) that builds
    on the work of Deerwester et al. (1990) and Hofmann (1999).

  • For document collections and other grouped data, this might be
    more appropriate than a simple finite mixture.

  • The same model was independently invented for population
    genetics analysis (Pritchard et al., 2000).
LDA summary

  • Modular : It can be embedded in more complicated models.
  • General: The data generating distribution can be changed.
  • Variational inference is fast; it lets us analyze large data sets.

  • See Blei et al. (2003) for details and a quantitative comparison.
    See my website for code and other papers.
  • Jonathan Chang’s excellent R package “lda” contains Gibbs
    sampling code for this model and many others.
Supervised topic models

  • But LDA is an unsupervised model. How can we build a topic
    model that is good at the task we care about?

  • Many data are paired with response variables.
      • User reviews paired with a number of stars
      • Web pages paired with a number of “diggs”
      • Documents paired with links to other documents
      • Images paired with a category

  • Supervised topic models are topic models of documents and
    responses, fit to find topics predictive of the response.
Supervised LDA

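  [Graphical model: α → θd → Zd,n → Wd,n ← βk; the response Yd depends on
   Zd,1:N and on the parameters η, σ². Plates over the D documents and the
   K topics.]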

  1   Draw topic proportions θ | α ∼ Dir(α).
  2   For each word
        • Draw topic assignment zn | θ ∼ Mult(θ).
        • Draw word wn | zn , β1:K ∼ Mult(βzn ).
  3   Draw response variable $y \mid z_{1:N}, \eta, \sigma^2 \sim \mathcal{N}(\eta^\top \bar{z}, \sigma^2)$, where
      $\bar{z} = (1/N) \sum_{n=1}^{N} z_n$.
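
The three steps above can be sketched for a single document as follows. This is an illustrative sketch only: the function name `sample_slda_document` and all parameter values are assumptions, and the topics are drawn at random rather than fit to data.

```python
import numpy as np

def sample_slda_document(beta, eta, sigma2, alpha, doc_len, rng):
    """Sample one document and its response from the sLDA generative process."""
    num_topics, vocab_size = beta.shape
    # Step 1: topic proportions theta ~ Dir(alpha).
    theta = rng.dirichlet(alpha)
    # Step 2: a topic assignment and a word for each position.
    z = rng.choice(num_topics, size=doc_len, p=theta)
    w = np.array([rng.choice(vocab_size, p=beta[k]) for k in z])
    # Step 3: z_bar is the empirical frequency of each topic in the document,
    # and the response is Gaussian with mean eta^T z_bar and variance sigma2.
    z_bar = np.bincount(z, minlength=num_topics) / doc_len
    y = rng.normal(loc=eta @ z_bar, scale=np.sqrt(sigma2))
    return w, z, z_bar, y

rng = np.random.default_rng(1)
beta = rng.dirichlet(np.ones(8), size=3)   # hypothetical topics: 3 topics, 8 words
eta = np.array([-1.0, 0.0, 2.0])           # hypothetical regression coefficients
w, z, z_bar, y = sample_slda_document(beta, eta, sigma2=0.25,
                                      alpha=np.full(3, 0.5), doc_len=40, rng=rng)
```

Because the response is regressed on `z_bar` rather than on `theta`, it depends on the topics that actually generated the words of the document.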
Supervised LDA

  • The response variable y is drawn after the document because it
    depends on z1:N , an assumption of partial exchangeability.

  • Consequently, y is necessarily conditioned on the words.

  • In a sense, this blends generative and discriminative modeling.
Supervised LDA

  • Given a set of document-response pairs, fit the model
    parameters by maximum likelihood.

  • Given a new document, compute a prediction of its response.

  • Both of these activities hinge on variational inference.
Variational inference (in general)

  • Variational methods are a deterministic alternative to MCMC.

  • Let x1:N be observations and z1:M be latent variables

  • Our goal is to compute the posterior distribution

\[
p(z_{1:M} \mid x_{1:N}) = \frac{p(z_{1:M}, x_{1:N})}{\int p(z_{1:M}, x_{1:N})\, dz_{1:M}}
\]

  • For many interesting distributions, the marginal likelihood of the
    observations is difficult to efficiently compute
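
A small example shows where the difficulty comes from. In the sketch below the latent variables are binary, so the integral in the denominator becomes a sum over all 2^M configurations; the toy model (independent fair-coin latents and one Gaussian observation) and every name in it are assumptions made for illustration.

```python
import itertools
import numpy as np

def exact_posterior(log_joint, num_latent):
    """Posterior over binary z_{1:M} by enumerating all 2^M configurations."""
    configs = list(itertools.product([0, 1], repeat=num_latent))
    log_p = np.array([log_joint(np.array(c)) for c in configs])
    # Marginal likelihood: sum of the joint over every latent configuration.
    log_marginal = np.logaddexp.reduce(log_p)
    posterior = np.exp(log_p - log_marginal)
    return configs, posterior, log_marginal

# Hypothetical model: z_m ~ Bernoulli(0.5), x | z ~ Normal(weights . z, 1).
weights = np.array([1.0, -2.0, 0.5])
x_obs = 1.2

def log_joint(z):
    log_prior = len(z) * np.log(0.5)
    log_lik = -0.5 * np.log(2 * np.pi) - 0.5 * (x_obs - weights @ z) ** 2
    return log_prior + log_lik

configs, posterior, log_marginal = exact_posterior(log_joint, num_latent=3)
```

With M = 3 the sum has 8 terms; the number of terms doubles with each added latent variable, which is why exact enumeration does not scale.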
Variational inference

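  • Use Jensen’s inequality to bound the log probability of the
    observations:

\begin{align*}
\log p(x_{1:N}) &= \log \int p(z_{1:M}, x_{1:N})\, dz_{1:M} \\
                &= \log \int p(z_{1:M}, x_{1:N})\, \frac{q_\nu(z_{1:M})}{q_\nu(z_{1:M})}\, dz_{1:M} \\
                &\geq E_{q_\nu}[\log p(z_{1:M}, x_{1:N})] - E_{q_\nu}[\log q_\nu(z_{1:M})]
\end{align*}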

  • We have introduced a distribution qν over the latent variables with
    free variational parameters ν.

  • We optimize those parameters to tighten this bound.

  • This is the same as finding the member of the family qν that is
    closest in KL divergence to p(z1:M | x1:N ).
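
A small numerical check of the last two bullets, as a sketch under invented numbers: for a single discrete latent variable, the gap between log p(x) and the bound equals KL(q || p(z | x)), and the bound is tight when q is the posterior. The joint table and the trial distribution q are made up for illustration.

```python
import math

# Joint probabilities p(z, x) for one fixed observation x and a latent z in {0, 1, 2}.
# These numbers are invented for illustration.
joint = [0.10, 0.25, 0.05]

log_evidence = math.log(sum(joint))            # log p(x)
posterior = [p / sum(joint) for p in joint]    # p(z | x)

def elbo(q):
    """E_q[log p(z, x)] - E_q[log q(z)] for a discrete q with full support."""
    return sum(qz * (math.log(pz) - math.log(qz)) for qz, pz in zip(q, joint))

def kl(q, p):
    """KL(q || p) for discrete distributions with full support."""
    return sum(qz * math.log(qz / pz) for qz, pz in zip(q, p))

q = [0.5, 0.3, 0.2]                # an arbitrary variational distribution
gap = log_evidence - elbo(q)       # equals KL(q || posterior)
```

Tightening the bound over q and minimizing the KL divergence to the posterior are therefore the same optimization.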
Mean-field variational inference

  • Factorization of qν determines complexity of optimization

  • In mean field variational inference qν is fully factored

        $$q_\nu(z_{1:M}) = \prod_{m=1}^{M} q_{\nu_m}(z_m).$$

  • Under qν, the latent variables are independent.
      • Each is governed by its own variational parameter νm.

  • In the true posterior they can exhibit dependence
    (often, this is what makes exact inference difficult).
MFVI and conditional exponential families

  • Suppose the distribution of each latent variable conditional on all
    other variables is in the exponential family:

        $$p(z_m \mid z_{-m}, x) = h_m(z_m) \exp\{ g_m(z_{-m}, x)^\top z_m - a_m(g_m(z_{-m}, x)) \}$$

  • Assume qν is fully factorized, and each factor is in the same
    exponential family as the corresponding conditional:

        $$q_{\nu_m}(z_m) = h_m(z_m) \exp\{ \nu_m^\top z_m - a_m(\nu_m) \}$$
MFVI and conditional exponential families

  • Variational inference is the following coordinate ascent algorithm

                          νm = Eqν [gm (Z−m , x)]

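  • Notice the relationship to Gibbs sampling: Gibbs draws zm from the
    conditional with natural parameter gm(z−m, x), whereas this update
    sets νm to the expectation of that parameter under q.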
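
As a concrete sketch of the update, consider a toy model that is not from the talk: binary variables with p(z) ∝ exp(Σ b_m z_m + Σ_{m<j} W_mj z_m z_j) and no observations. Each conditional is Bernoulli with natural parameter g_m(z−m) = b_m + Σ_{j≠m} W_mj z_j, so the coordinate update replaces each z_j by its mean under q. The values of b and W are made up.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def mean_field(b, W, sweeps=200):
    """Coordinate ascent: nu_m <- E_q[g_m(Z_-m)] = b_m + sum_{j != m} W[m][j] * E_q[Z_j].

    Each factor q_m is Bernoulli with natural parameter nu_m, so E_q[Z_j] = sigmoid(nu_j).
    """
    nu = [0.0] * len(b)
    for _ in range(sweeps):
        for m in range(len(b)):
            nu[m] = b[m] + sum(W[m][j] * sigmoid(nu[j])
                               for j in range(len(b)) if j != m)
    return nu

# Toy parameters (hypothetical): three weakly coupled binary variables.
b = [0.2, -0.4, 0.1]
W = [[0.0, 0.5, -0.3],
     [0.5, 0.0, 0.8],
     [-0.3, 0.8, 0.0]]
nu = mean_field(b, W)
marginals = [sigmoid(v) for v in nu]   # approximate posterior means E_q[Z_m]
```

A Gibbs sampler for the same model would use the identical expression for g_m, but with sampled values of z_j in place of their means.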
Variational inference

  • Alternative to MCMC; replace sampling with optimization.

  • Deterministic approximation to posterior distribution.

  • Uses established optimization methods
    (block coordinate ascent; Newton-Raphson; interior-point).

  • Faster, more scalable than MCMC for large problems.

  • Biased, whereas MCMC is asymptotically exact.

  • Emerging as a useful framework for fully Bayesian and empirical
    Bayesian inference problems. Many open issues!

  • Good papers: Beal’s Ph.D. thesis, Wainwright and Jordan (2009)
Variational inference in sLDA

  [Figure: sLDA graphical model with nodes α, θd, Zd,n, Wd,n, βk, Yd, and η, σ²; plates labeled D and K.]

  • In sLDA the variational bound is

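        \begin{align*}
        & E[\log p(\theta \mid \alpha)] + \sum_{n=1}^{N} E[\log p(Z_n \mid \theta)] \\
        & \quad + \sum_{n=1}^{N} E[\log p(w_n \mid Z_n, \beta_{1:K})] + E[\log p(y \mid Z_{1:N}, \eta, \sigma^2)] + H(q)
        \end{align*}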

  • As in Blei, Ng, and Jordan (2003), we use the fully-factorized
    variational distribution
        $$q(\theta, z_{1:N} \mid \gamma, \phi_{1:N}) = q(\theta \mid \gamma) \prod_{n=1}^{N} q(z_n \mid \phi_n).$$
Variational inference in sLDA

  • The distinguishing term is

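        $$E[\log p(y \mid Z_{1:N}, \eta, \sigma^2)]
            = -\frac{1}{2} \log(2\pi\sigma^2)
              - \frac{y^2 - 2y\,\eta^\top E[\bar{Z}] + \eta^\top E[\bar{Z}\bar{Z}^\top]\,\eta}{2\sigma^2}$$

  • The first expectation is

        $$E[\bar{Z}] = \bar{\phi} := \frac{1}{N} \sum_{n=1}^{N} \phi_n.$$

  • The second expectation is

        $$E[\bar{Z}\bar{Z}^\top] = \frac{1}{N^2} \Big( \sum_{n=1}^{N} \sum_{m \neq n} \phi_n \phi_m^\top + \sum_{n=1}^{N} \mathrm{diag}\{\phi_n\} \Big).$$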

  • Linear in φn , which leads to an easy coordinate ascent algorithm.
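
The second expectation can be checked by brute force on a tiny document. The sketch below compares the closed form with an exhaustive sum over every topic assignment under the factorized q; the function names and the values of φ are invented for the example.

```python
import itertools

def expected_zbar_outer(phi):
    """Closed form: (1/N^2) (sum_n sum_{m != n} phi_n phi_m^T + sum_n diag(phi_n))."""
    N, K = len(phi), len(phi[0])
    total = [[0.0] * K for _ in range(K)]
    for n in range(N):
        for m in range(N):
            if m != n:
                for i in range(K):
                    for j in range(K):
                        total[i][j] += phi[n][i] * phi[m][j]
        for i in range(K):
            total[i][i] += phi[n][i]
    return [[v / N ** 2 for v in row] for row in total]

def brute_force_zbar_outer(phi):
    """Same expectation, by enumerating every assignment z_{1:N} under q."""
    N, K = len(phi), len(phi[0])
    total = [[0.0] * K for _ in range(K)]
    for z in itertools.product(range(K), repeat=N):
        prob = 1.0
        for n, k in enumerate(z):
            prob *= phi[n][k]
        z_bar = [z.count(k) / N for k in range(K)]
        for i in range(K):
            for j in range(K):
                total[i][j] += prob * z_bar[i] * z_bar[j]
    return total

# Hypothetical variational parameters: N = 3 words, K = 3 topics.
phi = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3], [0.3, 0.3, 0.4]]
closed = expected_zbar_outer(phi)
exact = brute_force_zbar_outer(phi)
```

The diagonal term arises because a one-hot vector satisfies Z_n Z_n^T = diag(Z_n), so the n = m terms contribute diag{φn} rather than φn φn^T.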
Maximum likelihood estimation

  • The M-step is an MLE under expected sufficient statistics.

  • Define
      • y = y1:D is the response vector
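      • A is the D × K matrix whose rows are $\bar{Z}_d^\top$.

  • The MLE of the coefficients solves the expected normal equations

        $$E[A^\top A]\,\eta = E[A]^\top y \quad\Rightarrow\quad \hat{\eta}_{\text{new}} \leftarrow \big(E[A^\top A]\big)^{-1} E[A]^\top y$$

  • The MLE of the variance is

        $$\hat{\sigma}^2_{\text{new}} \leftarrow \frac{1}{D} \Big\{ y^\top y - y^\top E[A] \big(E[A^\top A]\big)^{-1} E[A]^\top y \Big\}$$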
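
A minimal sketch of this M-step in plain Python, assuming the per-document variational parameters φ are already available from the E-step. The names `phis`, `m_step`, `expected_outer`, and `solve`, and the toy data, are hypothetical; the linear solve is ordinary Gaussian elimination rather than anything specific to sLDA.

```python
def expected_outer(phi):
    """E[z_bar z_bar^T] for one document under the factorized q (phi is N x K)."""
    N, K = len(phi), len(phi[0])
    col = [sum(row[k] for row in phi) for k in range(K)]
    # sum_n sum_{m != n} phi_n phi_m^T = (sum_n phi_n)(sum_m phi_m)^T - sum_n phi_n phi_n^T
    out = [[col[i] * col[j] - sum(row[i] * row[j] for row in phi)
            for j in range(K)] for i in range(K)]
    for i in range(K):
        out[i][i] += col[i]
    return [[v / N ** 2 for v in row] for row in out]

def solve(M, rhs):
    """Solve M x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(M[i]) + [rhs[i]] for i in range(n)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(c + 1, n):
            factor = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= factor * aug[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x

def m_step(phis, y):
    """Return (eta_new, sigma2_new) from the expected normal equations."""
    D, K = len(phis), len(phis[0][0])
    # Rows of E[A] are the per-document averages phi_bar_d.
    EA = [[sum(row[k] for row in phi) / len(phi) for k in range(K)] for phi in phis]
    EAtA = [[0.0] * K for _ in range(K)]
    for phi in phis:
        block = expected_outer(phi)
        for i in range(K):
            for j in range(K):
                EAtA[i][j] += block[i][j]
    EAty = [sum(EA[d][k] * y[d] for d in range(D)) for k in range(K)]
    eta = solve(EAtA, EAty)
    sigma2 = (sum(v * v for v in y) - sum(e * b for e, b in zip(eta, EAty))) / D
    return eta, sigma2

# Toy data (hypothetical): 3 documents, 2 topics.
phis = [[[0.9, 0.1], [0.8, 0.2]],
        [[0.2, 0.8], [0.3, 0.7], [0.1, 0.9]],
        [[0.5, 0.5], [0.6, 0.4]]]
y = [1.0, 4.0, 2.5]
eta, sigma2 = m_step(phis, y)
```

Because E[A^T A] is a sum of E[Z̄d Z̄d^T] terms rather than E[A]^T E[A], this differs from regressing y on the φ̄d directly.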

  • We have fit SLDA parameters to a corpus, using variational EM.

  • We have a new document w1:N with unknown response value.

  • First, run variational inference in the unsupervised LDA model, to
    obtain γ and φ1:N for the new document.
    (LDA ⇔ integrating unobserved Y out of SLDA.)

  • Predict y using SLDA expected value:

        $$E[Y \mid w_{1:N}, \alpha, \beta_{1:K}, \eta, \sigma^2] \approx \eta^\top E_q[\bar{Z}] = \eta^\top \bar{\phi}.$$
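
Once φ1:N is in hand, the prediction itself is a dot product. The sketch below assumes φ has already been computed by LDA variational inference (not shown); the function name and the numbers are illustrative only.

```python
def predict_response(phi, eta):
    """Approximate E[Y | w_{1:N}] by eta^T phi_bar, where phi_bar averages the phi_n."""
    n_words = len(phi)
    phi_bar = [sum(row[k] for row in phi) / n_words for k in range(len(eta))]
    return sum(e * p for e, p in zip(eta, phi_bar))

# Hypothetical two-word document: one word assigned to each of two topics.
phi = [[1.0, 0.0], [0.0, 1.0]]
eta = [2.0, 4.0]
prediction = predict_response(phi, eta)   # 0.5 * 2.0 + 0.5 * 4.0 = 3.0
```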
Example: Movie reviews

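  [Figure: the top words of each of the ten topics, placed along an axis
   (ticks at −30, −20, −10, 0, 10, 20) by the topic's estimated coefficient.]

  Topics in the upper row of the figure, left to right:
      least, problem, unfortunately, supposed, worse, flat, dull
      bad, guys, watchable, its, not, one, movie
      more, has, than, films, director, will, characters
      awful, featuring, routine, dry, offered, charlie, paris
      his, their, character, many, while, performance, between
      both, motion, simple, perfect, fascinating, power, complex

  Topics in the lower row of the figure, left to right:
      have, like, you, was, just, some, out
      not, about, movie, all, would, they, its
      one, from, there, which, who, much, what
      however, cinematography, screenplay, performances, pictures, effective, picture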

  • 10-topic sLDA model on movie reviews (Pang and Lee, 2005).
      •         Response: Number of stars associated with each review

  • Each component of coefficient vector η is associated with a topic.
Predictive R2
(SLDA is red.)

  [Figure: predictive R2 (vertical axis) against the number of topics (horizontal axis, 5 to 50).]
Held out likelihood
(SLDA is red.)


  [Figure: per-word held-out log likelihood (vertical axis) against the number of topics (horizontal axis, 5 to 50).]
Diverse response types with GLMs

  • Want to work with response variables that don’t live in the reals.
      • binary / multiclass classification
      • count data
      • waiting time

  • Model the response with a generalized linear model

        $$p(y \mid \zeta, \delta) = h(y, \delta) \exp\left\{ \frac{\zeta y - A(\zeta)}{\delta} \right\},$$

    where $\zeta = \eta^\top \bar{z}$.

  • Complicates inference, but allows for flexible modeling.
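
Two standard instances of this family, written as a sketch: a Gaussian response (ζ is the mean, δ the variance) and a Poisson response for count data (ζ is the log rate, δ = 1). The function names and the values of η and z̄ are assumptions for the example; the base measures h and log-normalizers A are the textbook ones for these two distributions.

```python
import math

def glm_log_density(y, zeta, delta, log_h, A):
    """log p(y | zeta, delta) = log h(y, delta) + (zeta * y - A(zeta)) / delta."""
    return log_h(y, delta) + (zeta * y - A(zeta)) / delta

# Gaussian response: zeta is the mean, delta is the variance.
def gaussian_log_h(y, delta):
    return -0.5 * math.log(2 * math.pi * delta) - y * y / (2 * delta)

def gaussian_A(zeta):
    return zeta * zeta / 2

# Poisson response (count data): zeta is the log rate, delta is fixed at 1.
def poisson_log_h(y, delta):
    return -math.lgamma(y + 1)     # log(1 / y!)

def poisson_A(zeta):
    return math.exp(zeta)

# Hypothetical coefficients and empirical topic frequencies.
eta = [1.0, -0.5]
z_bar = [0.25, 0.75]
zeta = sum(e * z for e, z in zip(eta, z_bar))   # zeta = eta^T z_bar

gauss_lp = glm_log_density(0.3, zeta, 0.5, gaussian_log_h, gaussian_A)
pois_lp = glm_log_density(2, zeta, 1.0, poisson_log_h, poisson_A)
```

Swapping h and A changes the response type while leaving the topic side of the model untouched.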
Example: Multi-class classification

  [Figure: two panels plotting average accuracy against the number of topics (20 to 120).
   Left: image classification on the LabelMe dataset.
   Right: image classification on the UIUC-Sport dataset.
   Curves: multi-class sLDA with annotations; multi-class sLDA;
   Fei-Fei and Perona, 2005; Bosch et al., 2006.]

  Caption: Comparisons of average accuracy over all classes based on 5 random
  train/test subsets. Multi-class sLDA with annotations and multi-class sLDA
  (red curves) are both our models.

                  (SLDA for image classification, with Chong Wang)
Supervised topic models

  • SLDA enables model-based regression where the predictor
    “variable” is a text document.

  • It can easily be used wherever LDA is used in an unsupervised
    fashion (e.g., images, genes, music).

  • SLDA is a supervised dimension-reduction technique, whereas
    LDA performs unsupervised dimension reduction.

  • LDA + regression compared to sLDA is like principal components
    regression compared to partial least squares.

  • Paper: Blei and McAuliffe, NIPS 2007.
Relational topic models
  [Figure: a network of linked documents shown as numbered nodes, with
   abstracts displayed for a few of them, including "Irrelevant features and
   the subset selection problem", "Utilizing prior concepts for ...",
   "Learning with many irrelevant features", "Evaluation and selection of
   biases in machine learning", and "An evolutionary approach to learning
   in robots".]
           1355                                       1047
                                                                                                                                                                                                                                                                                            consistent hypotheses definable       term bias as it is used in machine            of intelligent robots. In the
                                                             75                            1089                                                        478                                                                                               1010                               over as few features as               learning systems. We motivate                 approach described here,
                                                                      1420                                                                                                                                                                                                                  possible...                           the importance of automated                   evolutionary...
                                                                                                                 479                                               585                                                                                                                                                            methods for evaluating...
                         2122                                                                                                                                                       227                                                                                  1651
                   1345                                                                                                                                                                                                                692
                                                                                                                                                                          396           218
                                                                                                                                                                                                                                                                                                                                                                          Using a genetic algorithm to                   ...
                        2299                                                                                                                                                                                                                      960
                                                                                                                                                                                                           378                                                         1578                                                                                               learn strategies for collision
                                                                                                                                                                             2291                                                                                               ...
                                                                                              418                                                              1539
                                                                                                                                                                                                                                   286                     1963                                                                                                               avoidance and local
                                                      1138                                                                    449                303
                                                                                                                                                                                                                                                                                                                                                                                     navigation                          ...
                                                                                                                                                                                                              1290                 1678                                                                     Improving tactical plans with                                Navigation through obstacles
                                                                                    2300          147
                                                                                                                       1627                                                     1275
                                                                                                                                                                                                                                                   2195                         ...                               genetic algorithms                                     such as mine fields is an
                                                                                                  2636                                         2091                                                                       1027          1238                                                               The problem of learning decision                              important capability for
                                                                                                                                                                                                                                  1644                                                                     rules for sequential tasks is                                 autonomous underwater vehicles.
                                                                                                                                                       344                                                   2583        2012                                                                              addressed, focusing on the                                    One way to produce robust
                                                                                                                                                                                                                                                                                                           problem of learning tactical plans                            behavior...
                                                                                                                                                                                                                                                                                                           from a simple flight simulator
                                                                                                                                                                                                                                                                                                           where a plane must avoid a

  • Many data sets contain connected observations.

  • For example:
      • Citation networks of documents
      • Hyperlinked networks of web pages
      • Friend-connected social network profiles
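
As a rough illustration of what "connected observations" can look like as data, the sketch below stores each document as a list of words together with a set of links between document pairs. The class names `Document` and `LinkedCorpus`, the toy word lists, and the choice to treat links as undirected are all assumptions made for this example; they are not taken from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """One observation: an identifier plus its words (hypothetical layout)."""
    doc_id: int
    words: list


@dataclass
class LinkedCorpus:
    """Documents plus links between pairs of them, e.g. citations.

    Links are stored as unordered pairs, which is a simplifying assumption;
    citations and hyperlinks are directed in practice.
    """
    docs: dict = field(default_factory=dict)
    links: set = field(default_factory=set)

    def add_document(self, doc):
        self.docs[doc.doc_id] = doc

    def add_link(self, a, b):
        # Only allow links between two distinct documents already in the corpus.
        if a == b or a not in self.docs or b not in self.docs:
            raise ValueError("links must join two distinct known documents")
        self.links.add(frozenset((a, b)))

    def neighbors(self, doc_id):
        # Every document that shares a link with doc_id, in sorted order.
        return sorted(
            other
            for link in self.links
            if doc_id in link
            for other in link
            if other != doc_id
        )


# Toy data, invented for illustration only.
corpus = LinkedCorpus()
corpus.add_document(Document(1, ["genetic", "algorithms", "plans"]))
corpus.add_document(Document(2, ["evolutionary", "learning", "robots"]))
corpus.add_document(Document(3, ["features", "inductive", "bias"]))
corpus.add_link(1, 2)
corpus.add_link(2, 3)
print(corpus.neighbors(2))  # [1, 3]
```

The same layout would cover the other two examples in the list: web pages with hyperlinks, or profiles with friend connections, in place of documents with citations.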

supervised and relational topic models

  • 1. Supervised and Relational Topic Models David M. Blei Department of Computer Science Princeton University October 5, 2009 Joint work with Jonathan Chang and Jon McAuliffe
  • 2. Topic modeling • Large electronic archives of document collections require new statistical tools for analyzing text. • Topic models have emerged as a powerful technique for unsupervised analysis of large document collections. • Topic models posit latent topics in text using hidden random variables, and uncover that structure with posterior inference. • Useful for tasks like browsing, search, information retrieval, etc.
  • 3. Examples of topic modeling contractual employment female markets criminal expectation industrial men earnings discretion gain local women investors justice promises jobs see sec civil expectations employees sexual research process breach relations note structure federal enforcing unfair employer managers see supra agreement discrimination firm officer note economic harassment risk parole perform case gender large inmates
  • 4. Examples of topic modeling online scheduling quantum task Quantum lower bounds by polynomials competitive On the power of bounded concurrency I: finite automata automata approximation nc tasks Dense quantum coding and quantum finite automata s Classical physics and the Church--Turing Thesis automaton points languages distance convex n routing Nearly optimal algorithms and bounds for multilayer channel routing machine functions adaptive How bad is selfish routing? domain polynomial networks network Authoritative sources in a hyperlinked environment networks Balanced sequences and optimal routing degree log protocol protocols degrees polynomials algorithm network packets link learning learnable statistical constraint examples dependencies Module algebra classes local On XML integrity constraints in the presence of DTDs An optimal algorithm for intersecting line segments in the plane graph Closure properties of constraints Recontamination does not help to search a graph graphs consistency Dynamic functional dependencies and database aging A new approach to the maximum-flow problem edge tractable The time complexity of maximum matching by simulated annealing minimum the,of database vertices constraints a, is algebra and boolean logic m relational logics merging n query networks algorithm theories sorting languages multiplication time log bound system learning consensus systems knowledge objects logic performance reasoning messages protocol programs analysis verification circuit asynchronous systems distributed language trees regular sets networks Single-class bounds of multi-class queuing networks tree queuing The maximum concurrent flow problem search asymptotic Contention in shared memory algorithms compression database productform Linear probing with a nonuniform address distribution transactions server retrieval concurrency Magic Functions: In Memoriam: Bernard M. 
Dwork 1923--1998 proof restrictions property formulas A mechanical proof of the Church-Rosser theorem program firstorder Timed regular expressions On the power and limitations of strictness analysis resolution decision abstract temporal queries
  • 5. Examples of topic modeling 1880 1890 1900 1910 1920 1930 1940 electric electric apparatus air apparatus tube air machine power steam water tube apparatus tube power company power engineering air glass apparatus engine steam engine apparatus pressure air glass steam electrical engineering room water mercury laboratory two machine water laboratory glass laboratory rubber machines two construction engineer gas pressure pressure iron system engineer made made made small battery motor room gas laboratory gas mercury wire engine feet tube mercury small gas 1950 1960 1970 1980 1990 2000 tube tube air high materials devices apparatus system heat power high device glass temperature power design power materials air air system heat current current chamber heat temperature system applications gate instrument chamber chamber systems technology high small power high devices devices light laboratory high flow instruments design silicon pressure instrument tube control device material rubber control design large heat technology
  • 6. Examples of topic modeling [Figure: a large map of topics, each shown by its most probable words and phrases, including neurons, synapses, p53, cell cycle, rna polymerase, galaxies, superconductivity, upper mantle, fossil record, carbon dioxide, and climate change.]
  • 7. Supervised topic models • These applications of topic modeling work in the same way. • Fit a model using a likelihood criterion. Then, hope that the resulting model is useful for the task at hand. • Supervised topic models and relational topic models fit topics explicitly to perform prediction. • Useful for building topic models that can • Predict the rating of a review • Predict the category of an image • Predict the links emitted from a document
  • 8. Outline 1 Unsupervised topic models 2 Supervised topic models 3 Relational topic models
  • 9. Probabilistic modeling 1 Treat data as observations that arise from a generative probabilistic process that includes hidden variables • For documents, the hidden variables reflect the thematic structure of the collection. 2 Infer the hidden structure using posterior inference • What are the topics that describe this collection? 3 Situate new data into the estimated model. • How does this query or new document fit into the estimated topic structure?
  • 10. Intuition behind LDA Simple intuition: Documents exhibit multiple topics.
  • 11. Generative model [Figure with three panels: Topics; Documents; Topic proportions and assignments. Example topics with word probabilities: (gene 0.04, dna 0.02, genetic 0.01, ...), (life 0.02, evolve 0.01, organism 0.01, ...), (brain 0.04, neuron 0.02, nerve 0.01, ...), (data 0.02, number 0.02, computer 0.01, ...).] • Each document is a random mixture of corpus-wide topics • Each word is drawn from one of those topics
  • 12. The posterior distribution [Figure with three panels: Topics; Documents; Topic proportions and assignments.] • In reality, we only observe the documents • Our goal is to infer the underlying topic structure
  • 13. Latent Dirichlet allocation [Graphical model with plates N, D, K. Nodes: α (Dirichlet parameter), θd (per-document topic proportions), Zd,n (per-word topic assignment), Wd,n (observed word), βk (topics), η (topic hyperparameter).] Each piece of the structure is a random variable.
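  • 14. Latent Dirichlet allocation
      \[
      \begin{aligned}
      \beta_k &\sim \mathrm{Dir}(\eta), & k &= 1, \dots, K \\
      \theta_d &\sim \mathrm{Dir}(\alpha), & d &= 1, \dots, D \\
      Z_{d,n} \mid \theta_d &\sim \mathrm{Mult}(1, \theta_d), & d &= 1, \dots, D, \; n = 1, \dots, N \\
      W_{d,n} \mid \theta_d, z_{d,n}, \beta_{1:K} &\sim \mathrm{Mult}(1, \beta_{z_{d,n}}), & d &= 1, \dots, D, \; n = 1, \dots, N
      \end{aligned}
      \]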
  • 15. Latent Dirichlet allocation
      1 Draw each topic \( \beta_k \sim \mathrm{Dir}(\eta) \), for \( k \in \{1, \dots, K\} \).
      2 For each document:
          1 Draw topic proportions \( \theta_d \sim \mathrm{Dir}(\alpha) \).
          2 For each word:
              1 Draw \( Z_{d,n} \sim \mathrm{Mult}(\theta_d) \).
              2 Draw \( W_{d,n} \sim \mathrm{Mult}(\beta_{z_{d,n}}) \).
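The generative process above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not code from the talk; the function names, the symmetric scalar hyperparameters, and the fixed document length are all assumptions made for the example.

```python
import random

def dirichlet(alpha, rng):
    """Draw from Dir(alpha) by normalizing independent Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [g / total for g in draws]

def categorical(probs, rng):
    """Draw one index from a discrete distribution, i.e. Mult(1, probs)."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_corpus(num_docs, doc_len, num_topics, vocab_size, alpha, eta, seed=0):
    """Sample topics and documents from the LDA generative process.

    alpha and eta are symmetric (scalar) Dirichlet parameters, an assumption
    made here for brevity.
    """
    rng = random.Random(seed)
    # Step 1: draw each topic beta_k ~ Dir(eta), a distribution over the vocabulary.
    topics = [dirichlet([eta] * vocab_size, rng) for _ in range(num_topics)]
    corpus = []
    for _ in range(num_docs):
        # Step 2.1: draw per-document topic proportions theta_d ~ Dir(alpha).
        theta = dirichlet([alpha] * num_topics, rng)
        # Step 2.2: for each word, draw a topic assignment, then a word from that topic.
        z = [categorical(theta, rng) for _ in range(doc_len)]
        w = [categorical(topics[k], rng) for k in z]
        corpus.append({"theta": theta, "z": z, "w": w})
    return topics, corpus

topics, corpus = generate_corpus(
    num_docs=3, doc_len=8, num_topics=4, vocab_size=10, alpha=0.5, eta=0.5
)
```

Each entry of `corpus` holds the hidden variables (`theta`, `z`) next to the observed words `w`; posterior inference works in the opposite direction, recovering the hidden variables from `w` alone.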
  • 16. Latent Dirichlet allocation • From a collection of documents, infer • Per-word topic assignment zd,n • Per-document topic proportions θd • Per-corpus topic distributions βk • Use posterior expectations to perform the task at hand, e.g., information retrieval, document similarity, etc.
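  • 17. Latent Dirichlet allocation • Computing the posterior is intractable:
      \[
      p(\theta, z_{1:N} \mid w_{1:N}, \alpha, \beta_{1:K}) =
      \frac{p(\theta \mid \alpha) \prod_{n=1}^{N} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta_{1:K})}
           {\int_{\theta} p(\theta \mid \alpha) \prod_{n=1}^{N} \sum_{z_n=1}^{K} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta_{1:K})\, d\theta}
      \]
    • Several approximation techniques have been developed.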
  • 18. Latent Dirichlet allocation • Mean field variational methods (Blei et al., 2001, 2003) • Expectation propagation (Minka and Lafferty, 2002) • Collapsed Gibbs sampling (Griffiths and Steyvers, 2002) • Collapsed variational inference (Teh et al., 2006)
  • 19. Example inference [Figure: bar chart of probability (y-axis, 0.0 to 0.4) against topics (x-axis, 1 to 96).]
  • 20. Example topics
      “Genetics”: human, genome, dna, genetic, genes, sequence, gene, molecular, sequencing, map, information, genetics, mapping, project, sequences
      “Evolution”: evolution, evolutionary, species, organisms, life, origin, biology, groups, phylogenetic, living, diversity, group, new, two, common
      “Disease”: disease, host, bacteria, diseases, resistance, bacterial, new, strains, control, infectious, malaria, parasite, parasites, united, tuberculosis
      “Computers”: computer, models, information, data, computers, system, network, systems, model, parallel, methods, networks, software, new, simulations
  • 21. Used in exploratory tools of document collections
  • 22. LDA summary • LDA is a powerful model for • Visualizing the hidden thematic structure in large corpora • Generalizing new data to fit into that structure • LDA is a mixed membership model (Erosheva, 2004) that builds on the work of Deerwester et al. (1990) and Hofmann (1999). • For document collections and other grouped data, this might be more appropriate than a simple finite mixture. • The same model was independently invented for population genetics analysis (Pritchard et al., 2000).
  • 23. LDA summary • Modular: It can be embedded in more complicated models. • General: The data generating distribution can be changed. • Variational inference is fast; it lets us analyze large data sets. • See Blei et al., 2003 for details and a quantitative comparison. See my web-site for code and other papers. • Jonathan Chang’s excellent R package “lda” contains Gibbs sampling code for this model and many others.
  • 24. Supervised topic models • But LDA is an unsupervised model. How can we build a topic model that is good at the task we care about? • Many data are paired with response variables. • User reviews paired with a number of stars • Web pages paired with a number of “diggs” • Documents paired with links to other documents • Images paired with a category • Supervised topic models are topic models of documents and responses, fit to find topics predictive of the response.
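  • 25. Supervised LDA [Graphical model: the LDA model with an added per-document response Yd, which depends on the topic assignments Zd,n and on parameters η, σ².]
      1 Draw topic proportions \( \theta \mid \alpha \sim \mathrm{Dir}(\alpha) \).
      2 For each word
          • Draw topic assignment \( z_n \mid \theta \sim \mathrm{Mult}(\theta) \).
          • Draw word \( w_n \mid z_n, \beta_{1:K} \sim \mathrm{Mult}(\beta_{z_n}) \).
      3 Draw response variable \( y \mid z_{1:N}, \eta, \sigma^2 \sim \mathcal{N}(\eta^\top \bar{z}, \sigma^2) \), where \( \bar{z} = (1/N) \sum_{n=1}^{N} z_n \).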
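Step 3 is the only part that differs from LDA, so it is worth seeing in isolation. The sketch below is illustrative only: the helper names and the toy values of the topic assignments, `eta`, and `sigma2` are assumptions. It computes the empirical topic frequencies of a document and draws a Gaussian response whose mean is their inner product with the coefficients.

```python
import random

def empirical_topic_frequencies(z, num_topics):
    """Return z_bar: the fraction of words in the document assigned to each topic."""
    counts = [0] * num_topics
    for k in z:
        counts[k] += 1
    return [c / len(z) for c in counts]

def draw_response(z, eta, sigma2, rng):
    """Draw y ~ N(eta^T z_bar, sigma2); also return the mean for inspection."""
    z_bar = empirical_topic_frequencies(z, len(eta))
    mean = sum(e * zb for e, zb in zip(eta, z_bar))
    return rng.gauss(mean, sigma2 ** 0.5), mean

rng = random.Random(0)
z = [0, 0, 2, 1, 0, 2]       # hypothetical topic assignments for a 6-word document
eta = [1.0, -2.0, 4.0]       # hypothetical regression coefficients, one per topic
y, mean = draw_response(z, eta, sigma2=0.25, rng=rng)
```

Because the response depends on the assignments actually used in the document rather than on θ, topics that are never assigned to any word cannot influence y.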
  • 26. Supervised LDA • The response variable y is drawn after the document because it depends on z1:N , an assumption of partial exchangeability. • Consequently, y is necessarily conditioned on the words. • In a sense, this blends generative and discriminative modeling.
  • 27. Supervised LDA • Given a set of document-response pairs, fit the model parameters by maximum likelihood. • Given a new document, compute a prediction of its response. • Both of these activities hinge on variational inference.
  • 28. Variational inference (in general) • Variational methods are a deterministic alternative to MCMC. • Let x1:N be observations and z1:M be latent variables. • Our goal is to compute the posterior distribution
      \[
      p(z_{1:M} \mid x_{1:N}) = \frac{p(z_{1:M}, x_{1:N})}{\int p(z_{1:M}, x_{1:N})\, dz_{1:M}}
      \]
    • For many interesting distributions, the marginal likelihood of the observations is difficult to compute efficiently.
  • 29. Variational inference • Use Jensen’s inequality to bound the log prob of the observations:
      \[
      \begin{aligned}
      \log p(x_{1:N}) &= \log \int p(z_{1:M}, x_{1:N})\, dz_{1:M} \\
                      &= \log \int p(z_{1:M}, x_{1:N}) \frac{q_\nu(z_{1:M})}{q_\nu(z_{1:M})}\, dz_{1:M} \\
                      &\geq E_{q_\nu}[\log p(z_{1:M}, x_{1:N})] - E_{q_\nu}[\log q_\nu(z_{1:M})]
      \end{aligned}
      \]
    • We have introduced a distribution of the latent variables with free variational parameters ν. • We optimize those parameters to tighten this bound. • This is the same as finding the member of the family qν that is closest in KL divergence to p(z1:M | x1:N ).
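The bound and its relationship to KL divergence can be checked numerically on a toy problem. The sketch below assumes a single latent variable with three states and a made-up table of joint probabilities p(z, x) for one fixed observation x; none of these numbers come from the talk. It confirms that the bound never exceeds log p(x), that the gap equals KL(q || p(z | x)), and that the bound is tight when q is the exact posterior.

```python
import math

def elbo(q, joint):
    """E_q[log p(z, x)] - E_q[log q(z)] for a discrete latent variable z."""
    return sum(qz * (math.log(pz) - math.log(qz)) for qz, pz in zip(q, joint) if qz > 0)

def kl(q, p):
    """KL(q || p) for two discrete distributions on the same support."""
    return sum(qz * math.log(qz / pz) for qz, pz in zip(q, p) if qz > 0)

# Hypothetical joint probabilities p(z, x) for z = 0, 1, 2 at the observed x.
joint = [0.10, 0.05, 0.02]
log_evidence = math.log(sum(joint))            # log p(x)
posterior = [p / sum(joint) for p in joint]    # p(z | x)

q = [0.5, 0.3, 0.2]                            # an arbitrary variational distribution
gap = log_evidence - elbo(q, joint)            # equals KL(q || posterior)
```

Since log p(x) does not depend on q, raising the bound and lowering the KL divergence are the same optimization.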
  • 30. Mean-field variational inference • Factorization of qν determines complexity of optimization • In mean field variational inference qν is fully factored
      \[
      q_\nu(z_{1:M}) = \prod_{m=1}^{M} q_{\nu_m}(z_m).
      \]
    • The latent variables are independent. • Each is governed by its own variational parameter νm . • In the true posterior they can exhibit dependence (often, this is what makes exact inference difficult).
  • 31. MFVI and conditional exponential families • Suppose the distribution of each latent variable conditional on all other variables is in the exponential family:
      \[
      p(z_m \mid z_{-m}, x) = h_m(z_m) \exp\{ g_m(z_{-m}, x)^\top z_m - a_m(g_m(z_{-m}, x)) \}
      \]
    • Assume qν is fully factorized, and each factor is in the same exponential family as the corresponding conditional:
      \[
      q_{\nu_m}(z_m) = h_m(z_m) \exp\{ \nu_m^\top z_m - a_m(\nu_m) \}
      \]
  • 32. MFVI and conditional exponential families • Variational inference is the following coordinate ascent algorithm νm = Eqν [gm (Z−m , x)] • Notice the relationship to Gibbs sampling.
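A standard textbook illustration of this update, not a model from the talk, is a mean-field approximation to a correlated bivariate Gaussian. The conditional of each coordinate given the other is Gaussian, with a natural parameter that is linear in the other coordinate, so taking its expectation under q only requires the other factor's mean. The function name and the example mean and precision matrix below are assumptions.

```python
def cavi_gaussian(mu, prec, iters=50):
    """Coordinate ascent for q(z1) q(z2) approximating N(mu, prec^{-1}).

    Each factor is Gaussian with fixed precision prec[i][i]; only the means
    change. The update sets each factor's first natural parameter to the
    expected natural parameter of the corresponding conditional.
    """
    m = [0.0, 0.0]  # variational means, arbitrary initialization
    for _ in range(iters):
        # Expected natural parameter of p(z1 | z2), with z2 replaced by E_q[z2].
        nat1 = prec[0][0] * mu[0] - prec[0][1] * (m[1] - mu[1])
        m[0] = nat1 / prec[0][0]
        # Expected natural parameter of p(z2 | z1), with z1 replaced by E_q[z1].
        nat2 = prec[1][1] * mu[1] - prec[1][0] * (m[0] - mu[0])
        m[1] = nat2 / prec[1][1]
    return m

mu = [1.0, -2.0]                   # hypothetical target mean
prec = [[2.0, 0.9], [0.9, 1.0]]    # hypothetical positive definite precision matrix
m = cavi_gaussian(mu, prec)
```

The variational means converge to the true mean. A Gibbs sampler for the same target would draw z1 from p(z1 | z2) at the current sample of z2; the variational update instead plugs in the expectation of z2, which is the relationship the slide points to.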
  • 33. Variational inference • Alternative to MCMC; replace sampling with optimization. • Deterministic approximation to posterior distribution. • Uses established optimization methods (block coordinate ascent; Newton-Raphson; interior-point). • Faster, more scalable than MCMC for large problems. • Biased, whereas MCMC is not. • Emerging as a useful framework for fully Bayesian and empirical Bayesian inference problems. Many open issues! • Good papers: Beal’s Ph.D. thesis, Wainwright and Jordan (2009)
  • 34. Variational inference in sLDA • In sLDA the variational bound is
      \[
      E[\log p(\theta \mid \alpha)] + \sum_{n=1}^{N} E[\log p(Z_n \mid \theta)] + \sum_{n=1}^{N} E[\log p(w_n \mid Z_n, \beta_{1:K})] + E[\log p(y \mid Z_{1:N}, \eta, \sigma^2)] + H(q)
      \]
    • As in Blei, Ng, and Jordan (2003), we use the fully-factorized variational distribution
      \[
      q(\theta, z_{1:N} \mid \gamma, \phi_{1:N}) = q(\theta \mid \gamma) \prod_{n=1}^{N} q(z_n \mid \phi_n).
      \]
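  • 35. Variational inference in sLDA • The distinguishing term is
      \[
      E[\log p(y \mid Z_{1:N}, \eta, \sigma^2)] = -\frac{1}{2} \log(2\pi\sigma^2) - \frac{y^2 - 2y\, \eta^\top E[\bar{Z}] + \eta^\top E[\bar{Z}\bar{Z}^\top]\, \eta}{2\sigma^2}
      \]
    • The first expectation is
      \[
      E[\bar{Z}] = \bar{\phi} := \frac{1}{N} \sum_{n=1}^{N} \phi_n .
      \]
    • The second expectation is
      \[
      E[\bar{Z}\bar{Z}^\top] = \frac{1}{N^2} \left( \sum_{n=1}^{N} \sum_{m \neq n} \phi_n \phi_m^\top + \sum_{n=1}^{N} \mathrm{diag}\{\phi_n\} \right).
      \]
    • Linear in φn , which leads to an easy coordinate ascent algorithm.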
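The two expectations can be verified by brute force on a very small document. The sketch below assumes the fully factorized q from the previous slide, with one-hot topic indicators Z_n that are independent across words; the function names and the 3-word, 3-topic `phi` are made up for the example. It compares the closed form for the second expectation with an exact enumeration over every possible assignment.

```python
import itertools

def expected_zbar(phi):
    """E[Z_bar]: the average of the per-word variational distributions."""
    n_words, k = len(phi), len(phi[0])
    return [sum(row[j] for row in phi) / n_words for j in range(k)]

def expected_zbar_outer(phi):
    """E[Z_bar Z_bar^T] = (1/N^2)(sum_n sum_{m != n} phi_n phi_m^T + sum_n diag(phi_n))."""
    n_words, k = len(phi), len(phi[0])
    total = [[0.0] * k for _ in range(k)]
    for n in range(n_words):
        for m in range(n_words):
            for i in range(k):
                if n == m:
                    total[i][i] += phi[n][i]       # E[Z_n Z_n^T] = diag(phi_n)
                else:
                    for j in range(k):
                        total[i][j] += phi[n][i] * phi[m][j]  # independence under q
    return [[v / n_words ** 2 for v in row] for row in total]

def brute_force_outer(phi):
    """Exact E[Z_bar Z_bar^T] by enumerating all K^N topic assignments."""
    n_words, k = len(phi), len(phi[0])
    total = [[0.0] * k for _ in range(k)]
    for z in itertools.product(range(k), repeat=n_words):
        prob = 1.0
        for n, topic in enumerate(z):
            prob *= phi[n][topic]
        z_bar = [z.count(j) / n_words for j in range(k)]
        for i in range(k):
            for j in range(k):
                total[i][j] += prob * z_bar[i] * z_bar[j]
    return total

phi = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]  # hypothetical q(z_n)
closed_form = expected_zbar_outer(phi)
exact = brute_force_outer(phi)
```

The diagonal term appears because a one-hot vector times its own transpose is a diagonal matrix, so the n = m case cannot be written as a product of two independent expectations.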
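  • 36. Maximum likelihood estimation • The M-step is an MLE under expected sufficient statistics. • Define • y = y1:D is the response vector • A is the D × K matrix whose rows are \( \bar{Z}_d^\top \). • The MLE of the coefficients solves the expected normal equations
      \[
      E[A^\top A]\, \eta = E[A]^\top y \quad \Rightarrow \quad \hat{\eta}_{\text{new}} \leftarrow \left( E[A^\top A] \right)^{-1} E[A]^\top y
      \]
    • The MLE of the variance is
      \[
      \hat{\sigma}^2_{\text{new}} \leftarrow (1/D) \left\{ y^\top y - y^\top E[A] \left( E[A^\top A] \right)^{-1} E[A]^\top y \right\}
      \]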
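As a rough sketch of this M-step, the code below solves the expected normal equations with a small Gaussian-elimination helper so that it needs no external libraries. The function names and the toy data are assumptions. The example uses the degenerate case in which every topic assignment is known with certainty, so that E[AᵀA] equals AᵀA, and the responses are generated without noise; the update should then recover the coefficients exactly with zero variance.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def m_step(e_a, e_ata, y):
    """Return (eta_new, sigma2_new) from E[A] (D x K), E[A^T A] (K x K), and y (length D)."""
    d, k = len(e_a), len(e_ata)
    rhs = [sum(e_a[i][j] * y[i] for i in range(d)) for j in range(k)]  # E[A]^T y
    eta = solve(e_ata, rhs)
    # y^T E[A] (E[A^T A])^{-1} E[A]^T y equals rhs^T eta.
    sigma2 = (sum(v * v for v in y) - sum(r * e for r, e in zip(rhs, eta))) / d
    return eta, sigma2

# Degenerate toy case: assignments known exactly, so rows of A are observed z_bar.
a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
eta_true = [2.0, -1.0]
y = [sum(e * v for e, v in zip(eta_true, row)) for row in a]   # noiseless responses
ata = [[sum(row[i] * row[j] for row in a) for j in range(2)] for i in range(2)]
eta_new, sigma2_new = m_step(a, ata, y)
```

In the real algorithm E[AᵀA] is the sum over documents of the second-moment matrices E[Z̄Z̄ᵀ], which is not the same as E[A]ᵀE[A] when the assignments are uncertain.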
  • 37. Prediction • We have fit SLDA parameters to a corpus, using variational EM. • We have a new document w1:N with unknown response value. • First, run variational inference in the unsupervised LDA model, to obtain γ and φ1:N for the new document. (LDA ⇔ integrating unobserved Y out of SLDA.) • Predict y using SLDA expected value:
      \[
      E[Y \mid w_{1:N}, \alpha, \beta_{1:K}, \eta, \sigma^2] \approx \eta^\top E_q[\bar{Z}] = \eta^\top \bar{\phi}.
      \]
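Once φ1:N is available for the new document, the prediction itself is a single inner product. The sketch below assumes the per-word variational distributions have already been computed by some LDA inference routine, which is not shown; `predict_response` and the toy numbers are hypothetical.

```python
def predict_response(eta, phi):
    """Approximate E[Y | w] by eta^T phi_bar, where phi_bar averages the rows of phi."""
    n_words = len(phi)
    phi_bar = [sum(row[k] for row in phi) / n_words for k in range(len(eta))]
    return sum(e * p for e, p in zip(eta, phi_bar))

phi = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]   # hypothetical q(z_n) for a 3-word document
eta = [3.0, -1.0]                            # hypothetical fitted coefficients
y_hat = predict_response(eta, phi)
```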
  • 38. Example: Movie reviews [Figure: the top words of each topic, arranged along a horizontal axis running from −30 to 20.] • 10-topic sLDA model on movie reviews (Pang and Lee, 2005). • Response: Number of stars associated with each review • Each component of coefficient vector η is associated with a topic.
  • 39. Predictive R2 (SLDA is red.) [Figure: predictive R2 (y-axis, 0.0 to 0.5) against number of topics (x-axis, 5 to 50).]
  • 40. Held out likelihood (SLDA is red.) [Figure: per-word held out log likelihood (y-axis, −6.42 to −6.37) against number of topics (x-axis, 5 to 50).]
  • 41. Diverse response types with GLMs • Want to work with response variables that don’t live in the reals. • binary / multiclass classification • count data • waiting time • Model the response with a generalized linear model
      \[
      p(y \mid \zeta, \delta) = h(y, \delta) \exp\left\{ \frac{\zeta y - A(\zeta)}{\delta} \right\},
      \]
    where \( \zeta = \eta^\top \bar{z} \). • Complicates inference, but allows for flexible modeling.
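To make the GLM form concrete, the sketch below evaluates its log density with pluggable `log_h` and `log_normalizer` functions, then checks two familiar special cases against their usual formulas. The choices shown are the standard ones: a Gaussian with ζ as the mean, δ as the variance, and A(ζ) = ζ²/2, and a Poisson with δ = 1, ζ as the log rate, and A(ζ) = exp(ζ). The function names and numeric values are assumptions for illustration.

```python
import math

def glm_log_density(y, zeta, delta, log_h, log_normalizer):
    """log p(y | zeta, delta) = log h(y, delta) + (zeta * y - A(zeta)) / delta."""
    return log_h(y, delta) + (zeta * y - log_normalizer(zeta)) / delta

# Gaussian response: h(y, delta) = exp(-y^2 / (2 delta)) / sqrt(2 pi delta), A(zeta) = zeta^2 / 2.
def gaussian_log_h(y, delta):
    return -0.5 * math.log(2 * math.pi * delta) - y * y / (2 * delta)

def gaussian_log_normalizer(zeta):
    return 0.5 * zeta * zeta

# Poisson response (delta fixed at 1): h(y, delta) = 1 / y!, A(zeta) = exp(zeta).
def poisson_log_h(y, delta):
    return -math.lgamma(y + 1)

def poisson_log_normalizer(zeta):
    return math.exp(zeta)

eta = [1.0, -2.0, 4.0]        # hypothetical coefficients
z_bar = [0.5, 0.25, 0.25]     # hypothetical empirical topic frequencies
zeta = sum(e * z for e, z in zip(eta, z_bar))   # linear predictor eta^T z_bar

gauss = glm_log_density(0.3, zeta, 0.25, gaussian_log_h, gaussian_log_normalizer)
pois = glm_log_density(3, zeta, 1.0, poisson_log_h, poisson_log_normalizer)
```

Swapping the pair (`log_h`, `log_normalizer`) changes the response type while the linear predictor stays the same, which is the flexibility the slide refers to.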
  • 42. Example: Multi-class classification [Figure: example images with correct and incorrect classifications and their predicted annotations; plots of average accuracy against number of topics for image classification on the LabelMe and UIUC-Sport datasets, comparing multi-class sLDA, multi-class sLDA with annotations, Fei-Fei and Perona (2005), and Bosch et al. (2006).] (SLDA for image classification, with Chong Wang)
  • 43. Supervised topic models • SLDA enables model-based regression where the predictor “variable” is a text document. • It can easily be used wherever LDA is used in an unsupervised fashion (e.g., images, genes, music). • SLDA is a supervised dimension-reduction technique, whereas LDA performs unsupervised dimension reduction. • LDA + regression compared to sLDA is like principal components regression compared to partial least squares. • Paper: Blei and McAuliffe, NIPS 2007.
  • 44. Relational topic models [Figure: a network of linked documents shown as numbered nodes, with titles and abstract excerpts displayed for several of them, including "Irrelevant features and the subset selection problem", "Learning with many irrelevant features", "Evaluation and selection of biases in machine learning", and "Improving tactical plans with genetic algorithms".] • Many data sets contain connected observations. • For example: • Citation networks of documents • Hyperlinked networks of web-pages • Friend-connected social network profiles