Lecture: Automata

Finite-State Automata (FSA or FA)
Deterministic vs Non-Deterministic Finite-State Automata

Published in: Education

  1. Automata
     slideshare: http://-45326059
     Mathematics for Language Technology
     http://
     Last Updated: 6 March 2015
     Marina Santini
     santi
     Department of Linguistics and Philology
     Uppsala University, Uppsala, Sweden
     Spring 2015
  2. Acknowledgements
     • Several slides borrowed from Jurafsky and Martin (2009).
     • Practical activities by Mats Dahllöf and Jurafsky and Martin (2009).
  3. Reading
     • Required Reading:
       – E&G (2013): Ch. 9 (pp. 243-252)
       – Compendium (3): 6.1, 7.1
       – M. Dahllöf: Ändliga automater
         • http://
     • Further Reading:
       – Chapter 2 in Jurafsky D. & Martin J. (2009) Speech and Language Processing: An introduction to natural language processing, computational linguistics, and speech recognition. Online draft version: http://ft %202007.pdf
     Advanced Level
       * Coursera – Automata, by Jeff Ullman (https://)
  4. Outline
     • Automata Theory
     • Finite-State Automata (FSA or FA)
     • Deterministic vs Non-Deterministic
     • Practical Activities
  5. What is an Automaton?
     • Generally speaking, an automaton (plural: automata) is a self-operating machine.
     Figure: a mechanical man designed to write with a pen (source: Hugo by Martin Scorsese, 2011)
  6. In Mathematics…
     • An automaton is an abstract machine, i.e. a mathematical model.
     • In particular, a finite-state machine, or finite-state automaton, is a mathematical model of computer hardware and software, mostly used for compilers (cf. programming languages), natural language processing (NLP), and computational linguistics.
  7. What is Automata Theory?
     • Study of abstract computing devices, or "machines"
     • Automaton = an abstract computing device
       – Note: a "device" need not even be physical hardware!
  8. Alan Turing (1912-1954) (a pioneer of automata theory)
     • Father of Modern Computer Science
     • English mathematician
     • Studied abstract machines called Turing machines even before computers existed
       » Heard of the Turing test?
  9. Turing Machine
     • Turing machines, first described by Alan Turing (Turing 1937), are simple abstract computational devices intended to help investigate the extent and limitations of what can be computed.
     • Turing was interested in the question of what it means for a task to be computable, which is one of the foundational questions in the philosophy of computer science.
     • Intuitively, a task is computable if it is possible to specify a sequence of instructions which will result in the completion of the task when they are carried out by some machine.
     • http://-machine/
  10. Turing Test
      • The Turing test is a test of a machine's ability to exhibit intelligent behavior equivalent to, or indistinguishable from, that of a human.
      • Alan Turing defined the test in 1950; it is regarded as a foundation of the philosophy of artificial intelligence.
      • Turing put forward the idea of an 'imitation game', in which a human being and a computer would be interrogated under conditions where the interrogator would not know which was which, the communication being entirely by textual messages.
      • Turing argued that if the interrogator could not distinguish them by questioning, then it would be unreasonable not to call the computer intelligent, because we judge other people's intelligence from external observation in just this way.
      • Turing's 'imitation game' is now usually called 'the Turing test' for intelligence.
  11. Module 3…
      3.1 Deterministic finite-state automata
      3.2 Regular expressions
      3.3 Context-free grammars
  12. Let's build a sheeptalk recognizer
      The sheep language contains any string from the following infinite set: … baaa! baaaa! baaaaa! …
      Figure: Fåret Shaun (Shaun the Sheep)
  13. FSAs (graph notation)
      • Directed graph:
        – A finite set of vertices (aka nodes)
        – A set of directed links between pairs of vertices (aka arcs)
      Example strings: … baaa! baaaa! baaaaa! …
      The machine starts in the start state (q0) and iterates the following process:
      • Check the next letter of the input.
      • If it matches the symbol on an arc leaving the current state, then cross that arc, move to the next state, and also advance one symbol in the input. If we are in the accepting state (q4) when we run out of input, the machine has successfully recognized an instance of sheeptalk.
      • If the machine never gets to the final state, either because it runs out of input, or it gets some input that doesn't match an arc, or it just happens to get stuck in some non-final state, we say the machine REJECTS or fails to accept an input.
  14. Tape notation for baaa!
      • Traditionally (Turing's notion), this process is depicted with a tape.
  15. State-Transition Table
      Example strings: … baaa! baaaa! baaaaa! …
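
The table on this slide did not survive as text. As a minimal sketch, the sheeptalk machine can be written as a Python dictionary, assuming the transitions that the walkthrough of baaa! on slide 22 describes. The names `SHEEP_TABLE` and `recognize` are illustrative and do not come from the slides.

```python
# State-transition table for the sheeptalk FSA (states q0..q4 written as 0..4).
# A missing (state, symbol) entry means "no legal transition".
SHEEP_TABLE = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,   # the loop on q3 allows any number of extra a's
    (3, "!"): 4,
}
START = 0
ACCEPT = {4}


def recognize(text, table=SHEEP_TABLE, start=START, accept=ACCEPT):
    """Return True if the machine is in an accept state after reading text."""
    state = start
    for symbol in text:
        if (state, symbol) not in table:
            return False          # no matching arc: reject
        state = table[(state, symbol)]
    return state in accept        # out of input: accept only in q4


for s in ["baaa!", "baaaaa!", "ba!", "baaa", "abaaa!"]:
    print(s, recognize(s))
```

A dictionary keyed on (state, symbol) keeps the code close to the table layout: one entry per filled cell, and an empty cell is simply an absent key.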
  16. Sheep FSA
      • We can say the following things about this machine:
        – It has 5 states (the start state is included in the count)
        – b, a, and ! are in its alphabet
        – q0 is the start state
        – q4 is an accept state
        – It has transitions
  17. About Alphabets
      • Do not take the term alphabet too narrowly; it just means we need a finite set of symbols in the input.
      • These symbols can and will stand for bigger objects that can have internal structure.
  18. More Formally
      • You can specify an FSA by enumerating the following things.
        – The set of states: Q
        – A finite alphabet: Σ
        – A start state
        – A set of accept/final states
        – A transition function that maps Q × Σ to Q
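
The five components listed above can be bundled into one value. The following is a sketch only: the class name `FSA`, its field names, and the helper `is_well_formed` are assumptions made for illustration, and the sheep transitions are taken from the walkthrough on slide 22.

```python
from typing import NamedTuple


class FSA(NamedTuple):
    states: frozenset      # Q
    alphabet: frozenset    # Sigma
    start: str             # the start state, a member of Q
    accept: frozenset      # the accept/final states, a subset of Q
    delta: dict            # transition function: (state, symbol) -> state


def is_well_formed(m):
    """Check that the five components fit together as the definition requires."""
    return (
        m.start in m.states
        and m.accept <= m.states
        and all(
            q in m.states and a in m.alphabet and r in m.states
            for (q, a), r in m.delta.items()
        )
    )


sheep = FSA(
    states=frozenset({"q0", "q1", "q2", "q3", "q4"}),
    alphabet=frozenset({"b", "a", "!"}),
    start="q0",
    accept=frozenset({"q4"}),
    delta={
        ("q0", "b"): "q1",
        ("q1", "a"): "q2",
        ("q2", "a"): "q3",
        ("q3", "a"): "q3",
        ("q3", "!"): "q4",
    },
)
print(is_well_formed(sheep))
```

Note that `delta` here is partial: pairs with no entry have no legal transition, which matches the sheep machine before a sink state is added.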
  19. Even more formally
      A sequence of states q0, q1, q2, …, qn, where qi ∈ Q, such that q0 is the start state and qi = δ(qi−1, ai) for 0 < i ≤ n, is a run of the automaton on an input word w = a1 a2 … an ∈ Σ*.
      In other words, at first the automaton is at the start state q0, and then the automaton reads symbols of the input word in sequence. When the automaton reads symbol ai it jumps to state qi = δ(qi−1, ai). qn is said to be the final state of the run.
      Legend: qi = state, ai = input symbol, transition = jump
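
As a worked instance of this definition, assuming the sheeptalk transitions described in the walkthrough on slide 22, the run on the input word w = baaa! (n = 5) can be written out step by step. The subscripts below name machine states, so q3 appears twice in the run.

```latex
\[
\begin{aligned}
\delta(q_0, \texttt{b}) &= q_1, &
\delta(q_1, \texttt{a}) &= q_2, &
\delta(q_2, \texttt{a}) &= q_3, \\
\delta(q_3, \texttt{a}) &= q_3, &
\delta(q_3, \texttt{!}) &= q_4
\end{aligned}
\]
\[
q_0 \xrightarrow{\texttt{b}} q_1 \xrightarrow{\texttt{a}} q_2
    \xrightarrow{\texttt{a}} q_3 \xrightarrow{\texttt{a}} q_3
    \xrightarrow{\texttt{!}} q_4
\]
```

The final state of the run is q4, which is an accept state, so the word is accepted.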
  20. Recognition…
      • Recognition is the process of determining if a string should be accepted by a machine
      • Or… it is the process of determining if a string is in the language we want to define with the machine
  21. Deterministic
      • Deterministic means that at each point in processing there is always one unique thing to do (no choices).
  22. A Deterministic FSA…
      • … has no choice points. It always knows what to do for any input.
      • Reasoning (input: baaa!):
        -- Before examining the beginning of the tape, the machine is in state q0.
        -- Finding a b on the input tape, it changes to state q1, as indicated by the contents of transition-table[q0,b].
        -- It then finds an a and switches to state q2, another a puts it in state q3, a third a leaves it in state q3, where it reads the "!" and switches to state q4.
        -- Since there is no more input, the end-of-input condition at the beginning of the loop is satisfied for the first time and the machine halts in q4. State q4 is an accepting state, so the machine has accepted the string as an instance of the sheep language.
      The automaton will fail whenever there is no legal transition for a given combination of state and input!
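
The reasoning above can be replayed mechanically. This is a small sketch, with the hypothetical helper name `trace`, that records every state the machine visits; the table entries are exactly the transitions named in the walkthrough.

```python
SHEEP_TABLE = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,
    (3, "!"): 4,
}


def trace(text, table, start=0):
    """Return the list of states visited, or None if the machine gets stuck."""
    states = [start]
    for symbol in text:
        nxt = table.get((states[-1], symbol))
        if nxt is None:
            return None           # no legal transition: the automaton fails
        states.append(nxt)
    return states


print(trace("baaa!", SHEEP_TABLE))   # [0, 1, 2, 3, 3, 4]
print(trace("ba!", SHEEP_TABLE))     # None
```

The first result lists q0, q1, q2, q3, q3, q4, matching the four steps of the walkthrough; the input is accepted because the last state is q4.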
  23. empty state = fail state or sink state
      We always have somewhere to go from any state on any possible input.
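
One way to picture the sink state is as a routine that fills every empty cell of the table. The sketch below assumes the sheep transitions from slide 22 and numbers the sink state 5 (an arbitrary label chosen here); `add_sink` and `recognize_total` are hypothetical names.

```python
SHEEP_TABLE = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,
    (3, "!"): 4,
}


def add_sink(table, states, alphabet, sink=5):
    """Return a total table: every missing (state, symbol) entry goes to the sink."""
    total = dict(table)
    for q in list(states) + [sink]:
        for a in alphabet:
            total.setdefault((q, a), sink)   # the sink also loops on itself
    return total


TOTAL = add_sink(SHEEP_TABLE, states=range(5), alphabet="ba!")


def recognize_total(text, table=TOTAL, start=0, accept=frozenset({4})):
    """Recognizer that never gets stuck on symbols from the alphabet."""
    state = start
    for symbol in text:
        state = table[(state, symbol)]
    return state in accept


print(len(TOTAL), recognize_total("baaa!"), recognize_total("baaa!a"))
```

With the sink added, the table has one entry for each of the 6 states and 3 symbols, so rejection happens by ending in a non-accepting state rather than by a missing entry.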
  24. Example (Q, Σ, δ?)
      • Modeling recognition of the word "then"
      Figure labels: start state, transition, intermediate state, final state
  25. A more complete version
  26. Another Example: Recognizing strings ending in "ing"
      Figure labels: states nothing (start), Saw i, Saw in, Saw ing; arcs labeled i, n, g, Not i, Not i or n, Not i or g
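
The diagram itself is lost, so the following sketch is a reading of its labels rather than a copy of the slide: it assumes that any i leads to "Saw i", that n and g advance only from "Saw i" and "Saw in" respectively, and that every other symbol returns to "nothing". The function names are illustrative.

```python
def step(state, ch):
    """One transition of the assumed 'ends in ing' automaton."""
    if ch == "i":
        return "saw_i"                 # an i always restarts the pattern
    if state == "saw_i" and ch == "n":
        return "saw_in"
    if state == "saw_in" and ch == "g":
        return "saw_ing"
    return "nothing"                   # "Not i", "Not i or n", "Not i or g"


def ends_in_ing(word):
    state = "nothing"                  # start state
    for ch in word:
        state = step(state, ch)
    return state == "saw_ing"          # the only accept state


for w in ["ing", "singing", "ringi", "iing", "in"]:
    print(w, ends_in_ing(w))
```

The transition function is written as code instead of a table because the "Not i" style arcs each stand for many symbols.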
  27. When modelling an automaton…
      • It is YOU who decide the optimal number of states that account for the input strings you want to process for the purpose you have in mind… always motivate your choices and be ready for any plausible amendment or upgrade…
  28. Formal Languages
      The usefulness of an automaton for defining a language is that it can express an infinite set in a closed form.
  29. Natural Languages
      • Formal languages are not the same as natural languages, which are the kind of languages that real people speak.
      • A formal language may bear no resemblance at all to a real language (e.g., a formal language can be used to model the different states of a soda machine).
      • We often use a formal language to model part of a natural language, such as parts of the phonology, morphology, or syntax.
  30. Generative Formalisms
      • Formal languages are sets of strings composed of symbols from a finite set of symbols.
      • Finite-state automata define formal languages (without having to enumerate all the strings in the language)
      • The term generative is based on the view that you can run the machine as a generator to get strings from the language.
  31. Generative Formalisms: Acceptors and Recognizers
      • FSAs can be viewed from two perspectives:
        – Acceptors that can tell you if a string is in the language
        – Generators to produce all and only the strings in the language
      We can also use our automaton for generating sheeptalk!
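
To illustrate the generator view, the sketch below walks the sheeptalk machine breadth-first and emits every accepted string up to a length bound. The bound `max_len` and the function name `generate` are assumptions introduced here, since the full language is infinite; the table follows the walkthrough on slide 22.

```python
from collections import deque

SHEEP_TABLE = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,
    (3, "!"): 4,
}


def generate(table, start, accept, max_len):
    """Yield every accepted string of length <= max_len, shortest first."""
    queue = deque([(start, "")])
    while queue:
        state, prefix = queue.popleft()
        if state in accept:
            yield prefix
        if len(prefix) < max_len:
            for (src, symbol), dst in table.items():
                if src == state:
                    queue.append((dst, prefix + symbol))


print(list(generate(SHEEP_TABLE, 0, {4}, 7)))
# ['baa!', 'baaa!', 'baaaa!', 'baaaaa!']
```

The same table serves both perspectives: the acceptor follows arcs chosen by the input, while the generator follows every arc and writes the symbols down.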
  32. Deterministic vs Non-Deterministic
  33. Deterministic FSA vs Non-Deterministic FSA
      When we get to state 2, if we see an a we do not know whether to remain in state 2 or to go on to state 3.
      Example strings: … baaa! baaaa! baaaaa! …
  34. Non-Determinism cont.
      • Yet another technique
        – epsilon transitions
        – Key point: these transitions do not examine or advance the tape during recognition
      Example strings: baaa! baaaa! baaaaa! …
      If we are in state 3, we are allowed to move to state 2 without looking at the input or advancing our input pointer.
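
A sketch of how epsilon transitions can be handled in code, assuming the machine described above (an epsilon arc from state 3 back to state 2) and the remaining arcs of the sheeptalk machine. Representing epsilon by the empty string `EPS` is a convention chosen here, and the recognizer tracks all possible states at once.

```python
EPS = ""   # label used here for epsilon arcs; it can never match an input symbol

# Non-deterministic table: (state, symbol) -> set of possible next states
SHEEP_EPS = {
    (0, "b"): {1},
    (1, "a"): {2},
    (2, "a"): {3},
    (3, EPS): {2},   # move back to state 2 without reading the tape
    (3, "!"): {4},
}


def epsilon_closure(states, table):
    """All states reachable from `states` by following epsilon arcs only."""
    closure = set(states)
    stack = list(states)
    while stack:
        q = stack.pop()
        for r in table.get((q, EPS), ()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure


def accepts(text, table=SHEEP_EPS, start=0, accept=frozenset({4})):
    current = epsilon_closure({start}, table)
    for symbol in text:
        moved = set()
        for q in current:
            moved |= table.get((q, symbol), set())
        current = epsilon_closure(moved, table)
    return bool(current & accept)


print(epsilon_closure({3}, SHEEP_EPS), accepts("baaaa!"), accepts("ba!"))
```

Only reading a symbol advances the loop over `text`; the closure step is where the machine moves without consuming input.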
  35. Solutions
      • Backup
      • Look-ahead
      • Parallelism
  36. Backup: Example
  37. Example
  38. Example
  39. Example
  40. Example
  41. Example
  42. Example
  43. Example
  44. Key Points
      • States in the search space are pairings of tape positions and states in the machine.
      • By keeping track of as yet unexplored states, a recognizer can systematically explore all the paths through the machine given an input.
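
The two key points can be sketched as an agenda-based recognizer, assuming the non-deterministic sheep machine from slide 33 (two arcs labeled a leaving state 2). The names `SHEEP_NFA` and `nd_recognize` are illustrative, and the depth-first order is one possible choice.

```python
SHEEP_NFA = {
    (0, "b"): [1],
    (1, "a"): [2],
    (2, "a"): [2, 3],   # the choice point: stay in 2 or go on to 3
    (3, "!"): [4],
}


def nd_recognize(text, table=SHEEP_NFA, start=0, accept=frozenset({4})):
    agenda = [(start, 0)]      # unexplored (machine state, tape position) pairs
    seen = set()
    while agenda:
        state, pos = agenda.pop()          # last in, first out: depth-first
        if (state, pos) in seen:
            continue
        seen.add((state, pos))
        if pos == len(text):
            if state in accept:
                return True                # one successful path is enough
            continue                       # dead end: back up to the agenda
        for nxt in table.get((state, text[pos]), []):
            agenda.append((nxt, pos + 1))
    return False                           # every path failed


print(nd_recognize("baaa!"), nd_recognize("ba!"))
```

Backing up is implicit here: when a path dies, the loop simply takes the next pair from the agenda. Taking pairs from the front of the list instead would explore the paths breadth-first.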
  45. Non-Deterministic Recognition: Search
      • In a ND FSA there exists at least one path through the machine for a string that is in the language defined by the machine.
      • But not all paths directed through the machine for an accept string lead to an accept state.
      • No paths through the machine lead to an accept state for a string not in the language.
  46. Non-Deterministic Recognition
      • So success in non-deterministic recognition occurs when a path is found through the machine that ends in an accept.
      • Failure occurs when all of the possible paths for a given string lead to failure.
  47. Equivalence
      • Non-deterministic machines can be converted to deterministic ones with a fairly simple construction;
      • That means that deterministic and non-deterministic automata have the same power (for mathematical proofs see Hopcroft et al. 2007);
      • Non-deterministic machines are not more powerful than deterministic ones in terms of the languages they can accept.
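
The slide does not name the construction; the standard one is the subset construction, sketched below for machines without epsilon arcs. The function name `determinize` and the example table (the non-deterministic sheep machine from slide 33) are assumptions made for illustration.

```python
SHEEP_NFA = {
    (0, "b"): {1},
    (1, "a"): {2},
    (2, "a"): {2, 3},
    (3, "!"): {4},
}


def determinize(nfa, start, accept, alphabet):
    """Subset construction: each DFA state is a frozenset of NFA states."""
    start_set = frozenset({start})
    dfa = {}
    todo = [start_set]
    seen = {start_set}
    while todo:
        current = todo.pop()
        for a in alphabet:
            target = frozenset(r for q in current for r in nfa.get((q, a), ()))
            if not target:
                continue              # no arc on this symbol: leave it undefined
            dfa[(current, a)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    dfa_accept = {s for s in seen if s & accept}
    return dfa, start_set, dfa_accept


dfa, dfa_start, dfa_accept = determinize(SHEEP_NFA, 0, {4}, "ba!")
for (src, symbol), dst in sorted(dfa.items(), key=lambda kv: sorted(kv[0][0])):
    print(sorted(src), symbol, sorted(dst))
```

For this input the result has five reachable states, {0}, {1}, {2}, {2, 3} and {4}, which has the same shape as the deterministic sheep machine.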
  49. Questions
  50. Q1: Automaton = pre-coded path
      • Look at the automaton below:
        1. What is the minimum length of a string that is accepted by this automaton?
        2. What is the maximum length of a string that is accepted by this automaton?
  51. Flow-Chart
  52. The loop
      At Q3 we have 2 possibilities. Depending on the input symbol, we can take a different path. At Q3:
      -- if the input symbol is an "a", stay on Q3
      -- if the input symbol is a "!", move to Q4
  53. Q2: how can we modify the automaton to make it accept "ba!"?
  54. Repetition (1)
  55. Repetition (2)
      A sequence of states q0, q1, q2, …, qn, where qi ∈ Q, such that q0 is the start state and qi = δ(qi−1, ai) for 0 < i ≤ n, is a run of the automaton on an input word w = a1 a2 … an ∈ Σ*.
      In other words, at first the automaton is at the start state q0, and then the automaton reads symbols of the input word in sequence. When the automaton reads symbol ai it jumps to state qi = δ(qi−1, ai). qn is said to be the final state of the run.
      Legend: qi = state, ai = input symbol, transition = jump
  56. Repetition: dead state (sink state)
      • Is the dead state (q5) included in Q? Controversial!
      • Some say that it should be included
      • Some say it should not be included
      • Some suggest that dead states should be removed from DFAs
      • Some suggest that a DFA should absolutely have a dead state…
      • At this stage, we are flexible about it… :)
      This example comes from Chalmers Uni
  57. Practical Activity 1
      • The language L contains all strings over the alphabet {a,b} that begin with a and end with b.
      • Draw a deterministic finite-state automaton that accepts the language L.
  58. Practical Activity 1: Possible Solution
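
The solution diagram is not preserved. The sketch below is one possible deterministic automaton for L, not necessarily the one on the slide; the state names "start", "last_a", "last_b" and "dead" are hypothetical.

```python
# Total transition table over the alphabet {a, b}.
DELTA = {
    ("start", "a"): "last_a",
    ("start", "b"): "dead",      # a string starting with b can never be accepted
    ("last_a", "a"): "last_a",
    ("last_a", "b"): "last_b",
    ("last_b", "a"): "last_a",
    ("last_b", "b"): "last_b",
    ("dead", "a"): "dead",
    ("dead", "b"): "dead",
}


def in_L(word):
    """True if word (over {a, b}) begins with a and ends with b."""
    state = "start"
    for symbol in word:
        state = DELTA[(state, symbol)]   # symbols outside {a, b} raise KeyError
    return state == "last_b"             # accept only if the last symbol was b


for w in ["ab", "aab", "abab", "a", "ba", "abba"]:
    print(w, in_L(w))
```

After the first symbol the machine only needs to remember the most recent symbol, which is why two live states plus a dead state are enough.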
  59. Any random sequence of 0,1: what is the minimum length with this automaton?
  60. Important:
      • Every transition must carry at least an input symbol
      • When we have an epsilon (or lambda) transition, we must mark the transition with ε (or λ)
  61. Deterministic vs Non-Deterministic
      • In a Deterministic Finite Automaton, only one transition out of a state is possible on the same input symbol.
      • On the other hand, in a Non-Deterministic Finite Automaton more than one transition may be possible for the same input symbol.
      • In a DFA no state has an epsilon transition.
      • Although the NDFA seems to be the more flexible technique, the DFA is chosen for many purposes because it is much easier to implement.
      • They have exactly the same expressive power.
  62. Good example in the compendium: apa
      Deterministic vs non-deterministic (strangely …)
  63. Practical Activity 2
      • Build a deterministic/non-deterministic finite-state automaton that accounts for numbers from 1 to 99.
  64. Practical Activity 2: Possible Solution
  65. Practical Activity 3
      • The language L = {ab…ab | n ≥ 1} contains the strings over the alphabet {a, b} in which the sequence ab is repeated at least twice.
  66. Practical Activity 3: Possible Solution
      shortest string: "abab"
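
The solution diagram is not preserved. As one possible automaton, assuming L consists of the strings abab, ababab, and so on, the sketch below uses five states numbered 0 to 4; the table name `ABAB` and the function `accepts_abab` are hypothetical.

```python
ABAB = {
    (0, "a"): 1,
    (1, "b"): 2,
    (2, "a"): 3,
    (3, "b"): 4,   # first point at which ab has been seen twice
    (4, "a"): 3,   # each further ab leads back to the accept state
}


def accepts_abab(word):
    """True if word is ab repeated at least twice."""
    state = 0
    for symbol in word:
        state = ABAB.get((state, symbol))
        if state is None:
            return False       # no legal transition
    return state == 4


for w in ["abab", "ababab", "ab", "aba", "abba"]:
    print(w, accepts_abab(w))
```

The shortest accepted string is abab, in line with the note on the slide.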
  67. Exercises: E&G (2013)
      • Exercise (Övning) 9.38
      • Optional: as many as you can
      • After having completed the exercises, check out the solutions at the end of the book.
  68. The End