10 XML retrieval
10.0 Introduction
10.1 Basic XML concepts
10.2 Challenges in XML retrieval
10.3 A vector space model for XML retrieval
10.4 Evaluation of XML retrieval

http://nlp.stanford.edu/IR-book/ppt/10xml.pptx
10.0 Introduction
RDB
• RDB
• IR
• RDB
Structured Retrieval
• or DB (named entity tagging)
• 4,405,829 RSA
RDB
• 3
     – (DB)
     –
          • tours AND (COUNTRY: Vatican OR LANDMARK: Coliseum)?
          • tour AND (STATE: Vatican OR BUILDING: Coliseum)?
• XML
     – → XML
     – (HTML, SGML, …)
10.1 Basic XML concepts
XML
• XML elements, written with opening and closing tags (e.g. <title…>, </title…>)
• XML attributes (e.g. number)
• Attribute values (e.g. vii)
• Child elements (e.g. title, verse)

   <play>
     <author>Shakespeare</author>
     <title>Macbeth</title>
     <act number="I">
       <scene number="vii">
         <title>Macbeth's castle</title>
         <verse>Will I with wine …</verse>
       </scene>
     </act>
   </play>
XML
root element: play
   element: author
      text: Shakespeare
   element: act
      attribute: number="I"
      element: scene
         attribute: number="vii"
         element: verse
            text: Will I with wine …
         element: title
            text: Macbeth's castle
   element: title
      text: Macbeth
XML
• XML Document Object Model (XML DOM):
     • DOM
     • DOM API, XML
• XPath: XML (e.g. NEXI)
     • XML
• Schema: XML
     • E.g.: (scene), (act)
     • XML: XML DTD (document type definition), XML Schema
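
The DOM bullet above refers to processing an XML document through a DOM API. As a minimal sketch of that idea, the following uses Python's standard `xml.dom.minidom` to start at the root element of the Macbeth example and descend from parents to children, printing the path to each element. The helper name `element_paths` and the path notation are assumptions made for illustration, not part of the slides.

```python
from xml.dom import minidom

# The example document from the slides (the verse is elided there as well).
DOC = """<play>
<author>Shakespeare</author>
<title>Macbeth</title>
<act number="I">
<scene number="vii">
<title>Macbeth's castle</title>
<verse>Will I with wine ...</verse>
</scene>
</act>
</play>"""


def element_paths(node, prefix=""):
    """Yield (path, text) for every element, descending from parent to child."""
    path = f"{prefix}/{node.tagName}"
    # Concatenate only the text nodes that are direct children of this element.
    text = "".join(
        child.data for child in node.childNodes
        if child.nodeType == child.TEXT_NODE
    ).strip()
    yield path, text
    for child in node.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            yield from element_paths(child, path)


root = minidom.parseString(DOC).documentElement
paths = list(element_paths(root))
for path, text in paths:
    print(path, repr(text))

# Attributes are read from the element node that carries them.
scene = root.getElementsByTagName("scene")[0]
scene_number = scene.getAttribute("number")
```

Each printed path is one XML context in the sense used later in the deck; XPath offers a standard notation for the same kind of path.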
10.2 Challenges in XML retrieval
XML
• /XML: (i.e., XML)
• Macbeth's castle: scene, act
     – scene
     – Macbeth: play
• E.g. query: title:Macbeth
  Macbeth: tragedy, Macbeth title; Act I, Scene vii, Macbeth's castle title (10.2)
     – tragedy title
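
To make the title:Macbeth example concrete, here is a small sketch that lists every `title` element whose text contains the query term, together with its depth in the tree. It uses Python's standard `xml.etree.ElementTree`; the function name `match_field` and the choice to sort hits by depth (so the higher node comes first) are assumptions for illustration, not a ranking method stated in the slides.

```python
import xml.etree.ElementTree as ET

DOC = """<play>
<author>Shakespeare</author>
<title>Macbeth</title>
<act number="I">
<scene number="vii">
<title>Macbeth's castle</title>
<verse>Will I with wine ...</verse>
</scene>
</act>
</play>"""


def match_field(root, tag, term):
    """Return (depth, path, text) for each <tag> element whose text contains term."""
    hits = []

    def walk(elem, path):
        path = path + [elem.tag]
        if elem.tag == tag and term.lower() in (elem.text or "").lower():
            hits.append((len(path), "/".join(path), elem.text))
        for child in elem:
            walk(child, path)

    walk(root, [])
    # Shallowest (highest) node first.
    return sorted(hits)


hits = match_field(ET.fromstring(DOC), "title", "Macbeth")
for depth, path, text in hits:
    print(depth, path, text)
```

Both the play's title and the scene's title match, which is exactly why a system needs a policy for choosing between them.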
• E.g. book
• book
•
     – XML, E.g. ISBN
• Macbeth's castle: play, act, scene, title
• XML
• (idf)
  author, Gates, gate
  Gates
  XML-context/term
 • XML-context, df
10.3 A vector space model for XML retrieval
• XML
[Figure: a Book document (Title: Microsoft; Author: Bill Gates) and its lexicalized subtrees: Microsoft, Bill, Gates; Title/Microsoft, Author/Bill, Author/Gates; Book/Title/Microsoft, Book/Author/Bill, Book/Author/Gates; ...]
1. Bill Gates: Bill, Gates
2.
• E.g. vs. XML
•
     – e.g. author, title, Gates
• XML-context/term pair: a structural term <c, t>, i.e. a pair of XML context c and vocabulary term t
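
As a rough sketch of what structural terms <c, t> look like in practice, the code below extracts one (context, term) pair per word from a small Book document and counts document frequency per pair rather than per bare term. It uses Python's standard `xml.etree.ElementTree`. The function names, the lower-casing of terms, and the second sample document are assumptions added for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def structural_terms(xml_text):
    """Return (context, term) pairs, one per word of each element's leading text."""
    pairs = []

    def walk(elem, path):
        path = path + (elem.tag,)
        # elem.text is the text before the first child; tail text is ignored here.
        for word in (elem.text or "").split():
            pairs.append(("/".join(path), word.lower()))
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), ())
    return pairs


def pair_document_frequency(docs):
    """Count, for each (context, term) pair, the number of documents containing it."""
    df = Counter()
    for doc in docs:
        df.update(set(structural_terms(doc)))
    return df


# The first document mirrors the slide's example; the second is hypothetical.
DOCS = [
    "<Book><Title>Microsoft</Title><Author>Bill Gates</Author></Book>",
    "<Book><Title>Gates</Title><Author>Jane Doe</Author></Book>",
]

terms = structural_terms(DOCS[0])
df = pair_document_frequency(DOCS)
print(terms)
print(df[("Book/Author", "gates")], df[("Book/Title", "gates")])
```

Because "gates" under Book/Author and "gates" under Book/Title are different pairs, each gets its own document frequency, which is the point of computing df on XML-context/term pairs.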
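Context resemblance
• The similarity between a path cq in a query and a path cd in a document is measured by the context resemblance function CR:

$$\mathrm{CR}(c_q, c_d) = \begin{cases} \dfrac{1 + |c_q|}{1 + |c_d|} & \text{if } c_q \text{ matches } c_d \\ 0 & \text{otherwise} \end{cases}$$

• |cq| and |cd| are the numbers of nodes in the query path and the document path
• cq matches cd if cq can be transformed into cd by inserting additional nodes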
CR(cq4, cd2) = 3/4 = 0.75.
If q and d are identical, CR(cq, cd) = 1.0.
CR(cq4, cd3) = 3/5 = 0.6.
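
The context resemblance values above can be reproduced with a short sketch. Paths are modeled as lists of node names, and "matches" is taken to mean that the query path is an order-preserving subsequence of the document path. The concrete paths below are hypothetical stand-ins with the same lengths as cq4, cd2 and cd3 (2, 3 and 4 nodes); the actual paths come from a figure that is not reproduced in this text.

```python
def matches(cq, cd):
    """True if cq can be turned into cd by inserting nodes (cq is a subsequence of cd)."""
    remaining = iter(cd)
    # 'node in remaining' advances the iterator, so order is respected.
    return all(node in remaining for node in cq)


def context_resemblance(cq, cd):
    """CR(cq, cd) = (1 + |cq|) / (1 + |cd|) if cq matches cd, else 0."""
    if not matches(cq, cd):
        return 0.0
    return (1 + len(cq)) / (1 + len(cd))


# Hypothetical paths chosen only to have 2, 3 and 4 nodes.
cq = ["book", "author"]
cd_three_nodes = ["book", "chapter", "author"]
cd_four_nodes = ["book", "part", "chapter", "author"]

print(context_resemblance(cq, cd_three_nodes))
print(context_resemblance(cq, cd_four_nodes))
print(context_resemblance(cq, cq))
print(context_resemblance(cq, ["article", "title"]))
```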
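• The document score is computed with SIMNOMERGE:

$$\mathrm{SIMNOMERGE}(q, d) = \sum_{c_k \in B} \sum_{c_l \in B} \mathrm{CR}(c_k, c_l) \sum_{t \in V} \mathrm{weight}(q, t, c_k) \, \frac{\mathrm{weight}(d, t, c_l)}{\sqrt{\sum_{c \in B,\, t \in V} \mathrm{weight}^2(d, t, c)}}$$

•  V: vocabulary of non-structural terms
•  B: set of all XML contexts
•  weight(q, t, c), weight(d, t, c): weights of term t in XML context c in query q and document d (e.g. idf_t * wf_{t,d}, where idf_t depends on which elements are used to compute df_t)
•  SIMNOMERGE(q, d) is not a true cosine measure, since its value can be larger than 1.0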
SimNoMerge
SCOREDOCUMENTSWITHSIMNOMERGE(q, B, V, N, normalizer)
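
The body of the scoring algorithm is not preserved in this text, so the following is only a sketch that evaluates the SimNoMerge definition directly for one query and one document. It is not the postings-based procedure named above. Queries and documents are assumed to be dictionaries mapping (context, term) to a weight, and all names in the code are hypothetical.

```python
from math import sqrt


def context_resemblance(cq, cd):
    """CR(cq, cd) = (1 + |cq|) / (1 + |cd|) if cq is a subsequence of cd, else 0."""
    remaining = iter(cd)
    if not all(node in remaining for node in cq):
        return 0.0
    return (1 + len(cq)) / (1 + len(cd))


def sim_no_merge(query, doc):
    """Direct evaluation of SimNoMerge.

    query, doc: dict mapping (context tuple, term) -> weight.
    Only the document side is length-normalized, as in the definition.
    """
    normalizer = sqrt(sum(w * w for w in doc.values()))
    if normalizer == 0:
        return 0.0
    score = 0.0
    for (cq, tq), wq in query.items():
        for (cd, td), wd in doc.items():
            if tq == td:  # only identical terms contribute
                score += context_resemblance(cq, cd) * wq * wd
    return score / normalizer


# Hypothetical weights, for illustration only.
query = {(("book", "author"), "gates"): 1.0}
doc_same_context = {(("book", "author"), "gates"): 1.0}
doc_longer_context = {(("book", "chapter", "author"), "gates"): 2.0}
doc_other_term = {(("book", "author"), "jobs"): 1.0}

print(sim_no_merge(query, doc_same_context))
print(sim_no_merge(query, doc_longer_context))
print(sim_no_merge(query, doc_other_term))
```

The double loop is quadratic in the number of structural terms; an inverted index keyed by structural term avoids that cost in a real system.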
10.4 Evaluation of XML retrieval
XML (INEX)

•  INEX: XML
   INEX 2002: IEEE, 12,000
   2006: Wikipedia

INEX 2002 collection statistics
 12,107      number of documents
 494 MB      size
 1995–2002   time of publication of articles
 1,532       average number of XML nodes per document
 6.9         average depth of a node
 30          number of CAS topics
 30          number of CO topics
INEX
•  2
     1.  Content-only or CO
     2.  Content-and-structure or CAS
     CAS
INEX
•  INEX 2002
•  Component coverage:
     1.  Exact coverage (E)
     2.  Too small (S)
     3.  Too large (L)
     4.  No coverage (N)
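INEX
•  Topical relevance: highly relevant (3), fairly relevant (2), marginally relevant (1), nonrelevant (0)
•  Relevance and coverage are combined into a digit-letter code (ex. 3E → highly relevant (3), exact coverage (E)); 2S, 3E
•  3N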
INEX
•  XML: Q / A
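
The slide above mentions a quantization function Q and a retrieved set A, but the formula itself is not preserved. As a hedged sketch, the code below sums a quantized score over the digit-letter codes of the components in A. The table values are an assumption taken from the quantization given in the textbook chapter this deck summarizes, and the function and variable names are hypothetical.

```python
# Assumed quantization table (from the textbook chapter, not from the slide text):
# each relevance-coverage code maps to a graded score in [0, 1].
DEFAULT_QUANTIZATION = {
    "3E": 1.00,
    "2E": 0.75, "3L": 0.75,
    "1E": 0.50, "2L": 0.50, "2S": 0.50,
    "1S": 0.25, "1L": 0.25,
    "0N": 0.00,
}


def relevant_items_retrieved(assessments, quantization=DEFAULT_QUANTIZATION):
    """Sum Q(rel(c), cov(c)) over the assessed components c of a retrieved set A.

    assessments: iterable of digit-letter codes such as "3E" or "2S".
    Raises KeyError for a code that is missing from the table.
    """
    return sum(quantization[code] for code in assessments)


# A hypothetical retrieved set with three assessed components.
retrieved = ["3E", "2S", "0N"]
graded_count = relevant_items_retrieved(retrieved)
print(graded_count)
```

Passing the table as a parameter keeps the sketch usable with a different quantization if the assumed values do not match the one intended on the slide.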
INEX
•  2
•  INEX
•  INEX
XML
•
     –  XML
     –  XML
•  XML: XQuery (W3C)
•  XML, IR
•  ex.
•  10
Introduction to Information Retrieval Chapter 10
