Gradient Algorithms, Robustness, and Partial Observability
- In the context of cortical neural control using a rat model

Jennie Si
Department of Electrical Engineering
Arizona State University
si@asu.edu                NSF ADP 2006
Motivation/Challenge/Societal Impact

• Introduce an interesting platform to study higher brain functions (in the frontal cortical area and the motor area) in decision and control, using designed control tasks
• Use systems tools (ADP, MDP, CI…) to understand fundamental scientific questions
• Need to develop new tools: technology-centered designs and theory-centered analysis
• Inspire new ways of thinking about complex systems




Background on cortical motor control

• Center-out task and preferred direction
• Population coding of movement direction and speed
• Motor cortical neural activity as a predictive signal, preceding
  movement onset
• Brain-machine interface: open-loop vs. closed-loop solutions




Cortical neural signal extraction: non-invasive vs. invasive recording


• EEG
   – Rhythms: β and μ, P300, slow cortical potential (SCP)
   – Sampling rate: 200–1000 Hz
   – Number of channels: from 1 or 2 up to 128 or 256
• Electrodes
   – Bioactive (allowing nerve growth), or bio-inactive multiple microwires or multichannel electrode arrays
   – Superficial motor areas or deep brain structures
   – Primary motor, parietal, premotor, frontoparietal, basal ganglia




Cortical neural signal extraction: ECoG

Electrodes used for online control are circled; spectral correlations of ECoG with target location are shown (color encodes patients).
[Figure panels: resting; imagining saying the word ‘move’. (d) Imagery is associated with a decrease in the µ (8–12 Hz) and β (18–26 Hz) bands.]
Leuthardt et al. (2004), A brain–computer interface using electrocorticographic signals in humans, J. Neural Eng. 1: 63–71

• Motor and thalamic regions
• Used a large number (40–60) of neurons
• Regressed the position of a water dripper arm
• Used a recurrent neural network




Chapin, J.K.; Moxon, K.A.; Markowitz, R.S.; and Nicolelis, M.A.L. (1999) Real-time control of a
robot arm using simultaneously recorded neurons in the motor cortex. Nature Neurosci.,
2:664-670.


a, b, Trial examples showing the movement by hand (green) and by neural reconstruction (blue) of a cursor
to a target (red). Dotted outlines represent the actual circumference of the target and cursor on the screen.
In a, hand motion resembles the neurally controlled cursor path; in b, no manipulandum motion occurred,
but the neurally controlled cursor reached the target. Each dot represents an estimate of position, updated
at 50-ms intervals. Axes are in x, y screen coordinates (1,000 units corresponds to a visual angle of 3.5°);
note that the two trials take place in different parts of the workspace.

•    Serruya, Hatsopoulos, Paninski, Fellows & Donoghue. Instant neural control of a movement signal, Nature 416 (6877): 141–142, Mar 14 2002
       –   Monkey, Utah array, motor cortex
       –   2D cursor position and velocity, linear and Kalman filters
       –   A few (7–30) MI neurons
       –   Careful calibration can lead to reasonable control without excessive training

•   Taylor, Dawn M., Tillery, Stephen I. Helms, Schwartz, Andrew B., Direct Cortical Control of 3D Neuroprosthetic Devices, Science 2002 296: 1829–1832
     – Monkey, microwires, motor and premotor cortex
     – 3D cursor velocity, adaptive version of population vectors
     – Showed that small numbers of neurons can control a three-dimensional cursor, and that neurons trained to control a cursor can also control a real robot for feeding


• Carmena JM, Lebedev MA, Crist RE, et al., Learning to control a brain-machine interface for reaching and grasping by primates, PLoS Biology 1 (2): 193–208, Nov 2003
    – Monkey
    – High-density array of 128 microwires in motor, premotor, supplementary motor, posterior parietal, and sensory cortex
    – 2D cursor position and velocity and gripping force, linear filters


-   Parietal reach region (PRR)
-   Cognition-based prosthetics: decode the goal rather than the trajectory
-   Performance improved over a period of weeks
-   Expected-value signals related to fluid preference, the expected magnitude, or the probability of reward were decoded simultaneously with the intended goal
Musallam, S., Corneil, B. D., Greger, B., Scherberger, H., and Andersen, R. A. (2004). "Cognitive Control Signals for Neural Prosthetics", Science, Vol 305, Issue 5681, 258–262



Driving tasks




• The arena for training rats to drive the robot toward one of the lights




Question asked



• How does the rat develop a control strategy to complete the driving tasks (under different time scales and levels of spatial complexity)?




Neuroscientific evidence

• Multimodal association area - anterior association area
  (prefrontal cortex) integrating different sensory modalities and
  linking them to action
• Macaque and rat prefrontal cortex receives multimodal
  cortico-cortical projections from motor, somatosensory, visual,
  auditory, gustatory, and limbic cortices
• Prefrontal areas provide cognitive, sensory, or motivational inputs for motor behavior (rostral region in the rat)
• Motor areas are concerned with more concrete aspects of movement (caudal region in the rat)


One step at a time…




First, a directional control task with only high-level control commands




The Brain-Controlled Vehicle

[Block diagram: Neural Interface → Neural Signals → Signal Processing Algorithms/Command Extraction → Control Command → Vehicle → Sensors → Vehicle State Signal and Environmental Feedback, closing the directional control loop]

Goals

• To decode the directional control decision as a predictive signal from motor cortical neural activity
• To associate motor cortical neural activity with motor behavior, and thus to develop models that may help interpret the neural mechanisms of cortical directional motor control




•   Male Sprague-Dawley rats
•   2×4 arrays of 50 µm tungsten wires coated with polyimide
•   Wires spaced 500 µm apart, for an array size of approximately 1.5 mm × 0.5 mm
•   The implant site targets the rostral region




(Figure from Kolb, The Cerebral Cortex of the Rat, 1990)
Brain Control Diagram

[Diagram: Neural Signals → Recording System → spike times → binned data → Neural Activity Vector (NAV), a K × L dimensional vector (Neuron 1: Bin 1 … K, ···, Neuron L: Bin 1 … K) → Computation of Directional Control Decision (−1: Left, +1: Right) → Task Execution; Feedback: visual, auditory & reward]
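The NAV construction in the diagram can be sketched in Python. This is a minimal illustration, assuming each neuron's spike times are given in seconds relative to the cue and that all L neurons share the same K bins. The function names, the window start, the bin width, and the linear decision rule (hypothetical weights `w` and bias `b`) are assumptions; the slides do not say how the directional decision is actually computed.

```python
from typing import List, Sequence


def neural_activity_vector(spike_times: Sequence[Sequence[float]],
                           t_start: float, bin_width: float,
                           n_bins: int) -> List[int]:
    """Bin each neuron's spikes into n_bins counts and concatenate them
    neuron by neuron: [neuron 1 bins 1..K, ..., neuron L bins 1..K]."""
    nav: List[int] = []
    for times in spike_times:                    # one spike-time list per neuron
        counts = [0] * n_bins
        for t in times:
            k = int((t - t_start) // bin_width)  # index of the bin containing t
            if 0 <= k < n_bins:                  # drop spikes outside the window
                counts[k] += 1
        nav.extend(counts)
    return nav                                   # length K * L


def directional_decision(nav: Sequence[int], w: Sequence[float],
                         b: float = 0.0) -> int:
    """Hypothetical linear rule: +1 (right) if w.x + b >= 0, else -1 (left)."""
    score = sum(wi * xi for wi, xi in zip(w, nav)) + b
    return 1 if score >= 0 else -1


# Two neurons, K = 4 bins of 100 ms covering the 0.4 s before the cue at t = 0.
spikes = [[-0.35, -0.31, -0.05, 0.02], [-0.25, -0.15, -0.12]]
nav = neural_activity_vector(spikes, t_start=-0.4, bin_width=0.1, n_bins=4)
decision = directional_decision(nav, w=[1.0] * 4 + [-0.5] * 4)
print(nav, "Right" if decision == 1 else "Left")
```

Concatenating neuron by neuron keeps the K × L layout of the diagram, so any classifier that takes a fixed-length vector can be swapped in for the placeholder linear rule.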



Perievent Histograms
[Figure: perievent histograms for data set Rdar36, Left Hits (left two columns) and Right Hits (right two columns), units sig001a through sig008a; y-axis counts/bin, x-axis Time (sec) from −2 to 2]
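A perievent histogram like those above counts spikes in fixed bins around each event and sums the counts over trials. The sketch below assumes spike and event times in seconds, a ±2 s window matching the plotted time axis, and a 0.5 s bin width chosen only for illustration, since the slides do not state the actual bin size.

```python
from typing import List, Sequence


def perievent_histogram(spike_times: Sequence[float],
                        event_times: Sequence[float],
                        window: float = 2.0,
                        bin_width: float = 0.5) -> List[int]:
    """Spike counts per bin in [-window, +window) around each event,
    summed over all events (counts/bin, as on the plotted y-axes)."""
    n_bins = int(round(2 * window / bin_width))
    counts = [0] * n_bins
    for ev in event_times:
        for t in spike_times:
            rel = t - ev                         # spike time relative to the event
            if -window <= rel < window:
                k = int((rel + window) // bin_width)
                counts[min(k, n_bins - 1)] += 1  # guard against float round-up
    return counts


spikes = [9.1, 10.2, 10.3, 19.8, 20.4, 25.0]
events = [10.0, 20.0]                            # e.g., two "Left Hit" trials
print(perievent_histogram(spikes, events))
```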




Cross-validation accuracy boxplots for manual and brain control, 5 rats, 8 data sets
[Figure: CV accuracy (0.4 to 1) for each rat/day: R1, R2, R3, R4/1, R4/2, R4/3, R5/1, R5/2; legend: Calib 25/75, Calib median, Brain 25/75, Brain median]
•   Each box shows the 25–75% quartile range and the median accuracy.
•   For R3, R5/1, and R5/2, there are fewer than 30 trials in each brain-control data set.
Typically, 20 runs of randomized 5-fold cross-validation were performed for each data set.
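The randomized 5-fold cross-validation, repeated 20 times, can be sketched as follows. The nearest-class-mean classifier, the synthetic two-dimensional features standing in for NAVs, and all function names are assumptions for illustration only; the slides do not say which classifier produced these accuracies.

```python
import random
import statistics
from typing import Dict, List, Sequence


def fit_means(X: Sequence[Sequence[float]], y: Sequence[int]) -> Dict[int, List[float]]:
    """Mean feature vector of each class (hypothetical stand-in classifier)."""
    means = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        means[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return means


def predict(means: Dict[int, List[float]], x: Sequence[float]) -> int:
    """Assign x to the class whose mean is closest (squared Euclidean distance)."""
    return min(means, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, means[lab])))


def repeated_kfold(X, y, k: int = 5, runs: int = 20, seed: int = 0) -> List[float]:
    """Accuracy of every test fold over `runs` randomized k-fold partitions."""
    rng = random.Random(seed)
    accs: List[float] = []
    for _ in range(runs):
        idx = list(range(len(y)))
        rng.shuffle(idx)                         # new random partition each run
        folds = [idx[f::k] for f in range(k)]
        for test in folds:
            held_out = set(test)
            train = [i for i in idx if i not in held_out]
            means = fit_means([X[i] for i in train], [y[i] for i in train])
            hits = sum(predict(means, X[i]) == y[i] for i in test)
            accs.append(hits / len(test))
    return accs


# Synthetic data: 15 left (-1) and 15 right (+1) trials with well-separated means.
gen = random.Random(1)
X = ([[gen.gauss(5, 0.5), gen.gauss(1, 0.5)] for _ in range(15)]
     + [[gen.gauss(1, 0.5), gen.gauss(5, 0.5)] for _ in range(15)])
y = [-1] * 15 + [1] * 15
accs = repeated_kfold(X, y)
q1, median, q3 = statistics.quantiles(accs, n=4)   # box edges and median
print(len(accs), q1, median, q3)
```

The 25th, 50th, and 75th percentiles returned by `statistics.quantiles` are the quantities each box in the plot summarizes.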

Modeling the rat’s directional control using an MDP?

MDPs:

 Finite state space $S = \{1, 2, \dots, n\}$
 Finite action space $A_i = \{a_1^i, a_2^i, \dots, a_{m_i}^i\}$
 Infinite decision horizon $T = \{0, 1, 2, 3, \dots\}$
 Cost function $c(i, a)$; discount factor $\gamma$ $(0 < \gamma < 1)$
 Action mapping $a: S \to \bigcup_{i \in S} A_i$ with $a(i) \in A_i$
 Stationary controller policy $\pi = (a, a, \dots)$, $\pi \in \Pi_s$


Manual lever press following cue
Brain control - “imaginary lever press” following cue




Possible implementation
Define 6 possible states:
• Idle – between two trials
• Ready – right before trial start
• Reward – success of a trial
• No-Reward – failure of a trial
• Left experiment state – left cue experiment
• Right experiment state – right cue experiment

The action (control) is the rat’s volition, represented by the corresponding neural activity.

The transition from one state to another depends on the current state as well as on the action taken.

•    The reward can be stated as
    $r(LL) = 1$; $r(LR) = -1$ …
    $r(RR) = 1$; $r(RL) = -1$ …
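The six states and the reward rule can be written down directly. This sketch assumes that in r(LL), r(LR), … the first letter is the cue and the second is the decoded decision, and it uses a deterministic, hypothetical transition rule purely for illustration; in the proposed MDP the transitions would be probabilistic and estimated from data.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"            # between two trials
    READY = "ready"          # right before trial start
    LEFT = "left"            # left-cue experiment state
    RIGHT = "right"          # right-cue experiment state
    REWARD = "reward"        # success of a trial
    NO_REWARD = "no_reward"  # failure of a trial


def reward(cue: str, decision: str) -> int:
    """r(LL) = r(RR) = 1 and r(LR) = r(RL) = -1 (assumed: cue first, decision second)."""
    return 1 if cue == decision else -1


def step(state: State, cue: str, decision: str) -> State:
    """Hypothetical deterministic transition rule, for illustration only."""
    if state is State.IDLE:
        return State.READY
    if state is State.READY:                 # the experimenter's cue picks the branch
        return State.LEFT if cue == "L" else State.RIGHT
    if state in (State.LEFT, State.RIGHT):   # the rat's decision decides the outcome
        return State.REWARD if reward(cue, decision) > 0 else State.NO_REWARD
    return State.IDLE                        # REWARD / NO_REWARD -> next trial


# One left-cue trial in which the decoded decision is "L".
s = State.IDLE
trace = [s]
for _ in range(4):
    s = step(s, cue="L", decision="L")
    trace.append(s)
print([t.value for t in trace])
```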


Does this tell us more?

• “Open loop” discrimination and CV analysis provide a baseline
  of relating neural activity (spike trains) to behavioral
  parameters (left/right decision)
• As a decoding tool, can an MDP model tell us more than “open
  loop” analysis?
• Use an MDP model to explain the experiment as a decision process




Technicalities

• How to represent control (start/stop and bin size)
  – Chosen by trial and error; hard to formulate theoretically
• How to compute the transition matrix given uncertain, partially observed sequences of spike trains
  – We can try to formulate this theoretically…
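One simple way to see where the uncertainty comes from is to estimate transition probabilities by counting observed (state, action, next state) triples. The sketch below assumes the states are fully observed, which sidesteps the partial observability raised above, and uses hypothetical data with abstract states and actions. With only a few trials per (state, action) pair the estimated rows are coarse, which is one motivation for treating the transition matrix as uncertain rather than known.

```python
from collections import Counter, defaultdict


def estimate_transitions(triples):
    """Empirical (maximum-likelihood) estimate of P(next_state | state, action)
    from observed (state, action, next_state) triples."""
    counts = defaultdict(Counter)
    for s, a, s_next in triples:
        counts[(s, a)][s_next] += 1
    return {
        sa: {s2: c / sum(cnt.values()) for s2, c in cnt.items()}
        for sa, cnt in counts.items()
    }


# Hypothetical observations with abstract states 1, 2 and actions "a1", "a2".
observed = [(1, "a1", 2), (1, "a1", 2), (1, "a1", 1), (2, "a2", 2)]
P_hat = estimate_transitions(observed)
print(P_hat[(1, "a1")], P_hat[(2, "a2")])
```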




• Uncertain transition matrices
   – Robust value iteration (Nilim & El Ghaoui, 2005)
   – Robust policy iteration (Satia & Lave, 1973)




Problem formulation

• Classification of uncertain transition matrices
   – Expression of uncertain transition matrices


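$$P = \begin{pmatrix} P_1^{a_1^1} \\ \vdots \\ P_i^{a_j^i} \\ \vdots \\ P_n^{a_{m_n}^n} \end{pmatrix} = \begin{pmatrix} f_1^{a_1^1}(U) \\ \vdots \\ f_i^{a_j^i}(U) \\ \vdots \\ f_n^{a_{m_n}^n}(U) \end{pmatrix}, \qquad P^\pi = \begin{pmatrix} P_1^{a(1)} \\ \vdots \\ P_n^{a(n)} \end{pmatrix} = \begin{pmatrix} f_1^{a(1)}(U) \\ \vdots \\ f_n^{a(n)}(U) \end{pmatrix}$$

$$\mathcal{P} = \{ P : U \in \mathcal{U} \}$$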




Problem formulation
• Classification of uncertain transition matrices
         – Definition of uncertain transition matrices

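The set of transition matrices $\mathcal{P}$ is correlated if it is a strict subset of the product of its projections:

$$\mathcal{P} \subset \mathcal{P}_1^{a_1^1} \times \cdots \times \mathcal{P}_i^{a_j^i} \times \cdots \times \mathcal{P}_n^{a_{m_n}^n}$$

The set of transition matrices $\mathcal{P}$ is independent if

$$\mathcal{P} = \mathcal{P}_1^{a_1^1} \times \cdots \times \mathcal{P}_i^{a_j^i} \times \cdots \times \mathcal{P}_n^{a_{m_n}^n}$$

Here $\mathcal{P}_i^{a_j^i}$ is the projection of $\mathcal{P}$ on the direction of $P_i^{a_j^i}$ ($i \in S$, $a_j^i \in A_i$), and $\mathcal{P}^\pi$ is the projection of $\mathcal{P}$ on the direction of $\{P_1^{a(1)}, P_2^{a(2)}, \dots, P_n^{a(n)}\}$.
[Figure: in the x–y plane, with intervals I1 and I2 on the axes, S1 = I1 × I2 (independent) and S2 ⊂ I1 × I2 (correlated)]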


Problem formulation
• Classification of MDPs
   – MDPs with independent transition matrices
   – MDPs with correlated transition matrices

• Optimality criterion
   – Minimizing maximum value function for any initial state

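$$\min_{\pi \in \Pi_s} \max_{P \in \mathcal{P}} v_P^\pi(i) = v^*(i) \quad \forall i \in S$$

• Stationary optimal policy pair
   $(\pi^*, P^*)$ is optimal if

$$v_{P^*}^{\pi^*}(i) = \max_{P \in \mathcal{P}} v_P^{\pi^*}(i) = \min_{\pi \in \Pi_s} \max_{P \in \mathcal{P}} v_P^\pi(i) \quad \text{for any initial state } i \in S$$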



Problem formulation

• MDPs with independent transition matrices
   – An optimal policy pair exists
   – Robust value iteration and robust policy iteration are applicable

• MDPs with correlated transition matrices
   – An optimal policy pair exists and both iterations are applicable
   – An optimal policy pair exists but both iterations are no longer
     applicable
   – An optimal policy pair does not exist




Questions to be answered

• Sufficient conditions to guarantee that robust value iteration and robust policy iteration are applicable
• An optimality criterion under which a stationary optimal policy pair exists under a weak condition
• An efficient algorithm




Sufficient conditions
Lemma
For any given $\pi = (a, a, \dots) \in \Pi_s$ and any given $q \in \Re_+^{1\times n}$,

$$\max_{v \in \Re^{n\times 1}} \left\{ qv : v(i) \le \left(g^\pi(v)\right)_i := c\big(i, a(i)\big) + \gamma \max_{P_i^{a(i)} \in \mathcal{P}_i^{a(i)}} P_i^{a(i)} v,\ i \in S \right\} \qquad (1)$$

For any given $q \in \Re_+^{1\times n}$,

$$\max_{v \in \Re^{n\times 1}} \left\{ qv : v(i) \le \left(g(v)\right)_i := \min_{a \in A_i} \left[ c(i, a) + \gamma \max_{P_i^a \in \mathcal{P}_i^a} P_i^a v \right],\ i \in S \right\} \qquad (2)$$

The functions $g^\pi$ and $g$ are monotone non-decreasing and contractive. Problems (1) and (2) have unique optimal solutions, denoted $v_\infty^\pi$ and $v_\infty$, which are the unique solutions to the fixed-point equations $v = g^\pi(v)$ and $v = g(v)$, respectively. The optimal transition probability rows are given by

$$\left(P_i^{a(i)}\right)^* \in \arg\max_{P_i^{a(i)} \in \mathcal{P}_i^{a(i)}} \left\{ P_i^{a(i)} v_\infty^\pi \right\},\ i \in S, \text{ which constitute } (P^\pi)^* \qquad (3)$$

$$\left(P_i^a\right)^* \in \arg\max_{P_i^a \in \mathcal{P}_i^a} \left\{ P_i^a v_\infty \right\},\ i \in S,\ a \in A_i, \text{ which constitute } P^* \qquad (4)$$

Sufficient conditions


Iterations for obtaining $v_\infty^\pi$
(1) select $v_0^\pi \in \Re^{n\times 1}$ and set $k = 0$;
(2) compute $v_{k+1}^\pi$ by $v_{k+1}^\pi = g^\pi(v_k^\pi)$;
(3) terminate if $v_{k+1}^\pi = v_k^\pi$ and output $v_\infty^\pi = v_k^\pi$;
   otherwise, set $k = k + 1$ and go to (2)

Iterations for obtaining $v_\infty$
(1) select $v_0 \in \Re^{n\times 1}$ and set $k = 0$;
(2) compute $v_{k+1}$ by $v_{k+1} = g(v_k)$;
(3) terminate if $v_{k+1} = v_k$ and output $v_\infty = v_k$;
   otherwise, set $k = k + 1$ and go to (2)
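The first iteration can be sketched for finite uncertainty sets, where each $\mathcal{P}_i^{a(i)}$ is a finite list of candidate rows so that the inner max is a plain enumeration. The finite sets, the stopping tolerance (used in place of the exact test $v_{k+1}^\pi = v_k^\pi$, which floating-point arithmetic rarely meets), the start $v_0^\pi = 0$, and the function names are assumptions of this sketch.

```python
from typing import List, Sequence

Row = Sequence[float]


def g_pi(v: List[float], costs: Sequence[float],
         rows: Sequence[Sequence[Row]], gamma: float) -> List[float]:
    """(g^pi(v))_i = c(i, a(i)) + gamma * max over candidate rows of P_i^{a(i)} v."""
    return [c + gamma * max(sum(p * x for p, x in zip(r, v)) for r in cand)
            for c, cand in zip(costs, rows)]


def robust_policy_value(costs, rows, gamma: float,
                        tol: float = 1e-10, max_iter: int = 100_000) -> List[float]:
    """Iterate v_{k+1} = g^pi(v_k) from v_0 = 0 until successive iterates agree to tol."""
    v = [0.0] * len(costs)
    for _ in range(max_iter):
        v_next = g_pi(v, costs, rows, gamma)
        if max(abs(a - b) for a, b in zip(v_next, v)) < tol:
            return v_next
        v = v_next
    raise RuntimeError("no convergence; check that 0 < gamma < 1")


# Two states under a fixed policy: from state 1 nature may keep the process
# in state 1 or move it to state 2; state 2 always stays. Nature picks the costlier row.
costs = [1.0, 3.0]
rows = [[(1.0, 0.0), (0.0, 1.0)], [(0.0, 1.0)]]
print(robust_policy_value(costs, rows, gamma=0.5))
```

Because $g^\pi$ is a $\gamma$-contraction, the loop converges from any start; here the fixed point is $v_\infty^\pi = (4, 6)'$.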


Sufficient conditions

Theorem
     If, for any $\pi \in \Pi_s$, there exists a $(P^\pi)^*$ defined by (3) that lies in the set $\mathcal{P}^\pi$, and there exists a $P^*$ defined by (4) that lies in the set $\mathcal{P}$, then
     i) a stationary optimal policy pair exists under the optimality criterion of minimizing the maximum value function for any initial state;
     ii) robust value iteration is applicable;
     iii) robust policy iteration is applicable.



Robust value iteration

1. Select $v_0 \in \Re^n$ and set $k = 0$;
2. Compute $v_{k+1}$ by

$$v_{k+1}(i) = \min_{a \in A_i} \left[ c(i, a) + \gamma \max_{P_i^a \in \mathcal{P}_i^a} P_i^a v_k \right]$$

3. If $v_{k+1} = v_k$, go to 4; otherwise, increment $k$ by 1 and go to 2;
4. Compute $\pi^* = (a^*, a^*, \dots)$ and $P^*$ defined by

$$a^*(i) \in \arg\min_{a \in A_i} \left[ c(i, a) + \gamma \max_{P_i^a \in \mathcal{P}_i^a} P_i^a v_k \right]$$

$$\left(P_i^a\right)^* \in \arg\max_{P_i^a \in \mathcal{P}_i^a} \left\{ P_i^a v_k \right\}$$

5. If $P^* \in \mathcal{P}$, output a stationary optimal policy pair $(\pi^*, P^*)$; otherwise, the algorithm cannot be applied.
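Robust value iteration can be sketched on the two-state example that follows, with each row set replaced by its projection (the finite set W). The discount factor γ = 0.9 and the tolerance-based stopping test are assumptions, since the slides do not fix them; step 5 is implemented by checking the example's correlation constraints u1 = u3 and u2 = u4.

```python
from typing import Dict, List, Tuple

W = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
ACTIONS = (0, 1)                       # 0 ~ a1, 1 ~ a2; states 0, 1 ~ states 1, 2
COST = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0}
ROWS = {                               # projected row sets P_i^a
    (0, 0): [(u, 1 - u) for u in W],   # P_1^{a1} = (u1, 1 - u1)
    (0, 1): [(u, 1 - u) for u in W],   # P_1^{a2} = (u3, 1 - u3)
    (1, 0): [(1 - u, u) for u in W],   # P_2^{a1} = (1 - u2, u2)
    (1, 1): [(1 - u, u) for u in W],   # P_2^{a2} = (1 - u4, u4)
}


def q(i: int, a: int, v: List[float], gamma: float) -> float:
    """c(i, a) + gamma * max over P_i^a of P_i^a v (two states)."""
    return COST[i, a] + gamma * max(r[0] * v[0] + r[1] * v[1] for r in ROWS[i, a])


def robust_value_iteration(gamma: float = 0.9, tol: float = 1e-10):
    v = [0.0, 0.0]                                           # step 1
    while True:                                              # steps 2-3
        v_next = [min(q(i, a, v, gamma) for a in ACTIONS) for i in (0, 1)]
        converged = max(abs(x - y) for x, y in zip(v_next, v)) < tol
        v = v_next
        if converged:
            break
    policy = tuple(min(ACTIONS, key=lambda a: q(i, a, v, gamma)) for i in (0, 1))  # step 4
    nature: Dict[Tuple[int, int], Tuple[float, float]] = {
        (i, a): max(ROWS[i, a], key=lambda r: r[0] * v[0] + r[1] * v[1])
        for i in (0, 1) for a in ACTIONS
    }
    return v, policy, nature


def in_correlated_set(nature) -> bool:
    """Step 5 for this example: P* is in the set iff u1 == u3 and u2 == u4."""
    return nature[0, 0][0] == nature[0, 1][0] and nature[1, 0][1] == nature[1, 1][1]


v, policy, nature = robust_value_iteration()
print(v, policy, in_correlated_set(nature))
```

Under these assumptions the sketch returns a*(1) = a*(2) = a1 and every worst-case row equal to (0, 1), matching the optimal pair stated in the example.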



Robust policy iteration

1. Initialization: select $\pi_0 = (a_0, a_0, \dots) \in \Pi_s$ and set $k = 0$;
2. Policy evaluation: run the iteration for $v_\infty^{\pi_k}$;
3. Policy improvement: find $\pi_{k+1} = (a_{k+1}, a_{k+1}, \dots)$ with

$$a_{k+1}(i) \in \arg\min_{a \in A_i} \left[ c(i, a) + \gamma \max_{P_i^a \in \mathcal{P}_i^a} P_i^a v_\infty^{\pi_k} \right]$$

4. If $\pi_{k+1} = \pi_k$, compute $P^*$ by

$$\left(P_i^a\right)^* \in \arg\max_{P_i^a \in \mathcal{P}_i^a} \left\{ P_i^a v_\infty^{\pi_k} \right\} \quad \forall i \in S,\ a \in A_i$$

   and go to 5; otherwise, increment $k$ by 1 and go to 2;
5. If $P^* \in \mathcal{P}$, output a stationary optimal policy pair $(\pi^*, P^*)$; otherwise, the algorithm cannot be applied.
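The same two-state example can be run through robust policy iteration. As before, γ = 0.9, the finite projected row sets, the tolerance-based policy evaluation, and the starting policy π0 = (a2, a2) are assumptions of this sketch. The improvement step keeps the current action on ties, a common safeguard so the loop cannot cycle between equally good policies.

```python
W = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
ACTIONS, STATES = (0, 1), (0, 1)        # 0 ~ a1 / state 1, 1 ~ a2 / state 2
COST = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0}
ROWS = {(0, a): [(u, 1 - u) for u in W] for a in ACTIONS}        # P_1^a rows
ROWS.update({(1, a): [(1 - u, u) for u in W] for a in ACTIONS})  # P_2^a rows


def worst(i, a, v):
    """Nature's row: argmax over P_i^a of P_i^a v."""
    return max(ROWS[i, a], key=lambda r: r[0] * v[0] + r[1] * v[1])


def q(i, a, v, gamma):
    r = worst(i, a, v)
    return COST[i, a] + gamma * (r[0] * v[0] + r[1] * v[1])


def evaluate(policy, gamma, tol=1e-10):
    """Step 2: iterate v <- g^pi(v) until it settles at v_inf^pi."""
    v = [0.0, 0.0]
    while True:
        v_next = [q(i, policy[i], v, gamma) for i in STATES]
        if max(abs(x - y) for x, y in zip(v_next, v)) < tol:
            return v_next
        v = v_next


def robust_policy_iteration(gamma=0.9, policy=(1, 1)):
    while True:
        v = evaluate(policy, gamma)
        improved = []
        for i in STATES:                                  # step 3
            best = min(ACTIONS, key=lambda a: q(i, a, v, gamma))
            if q(i, policy[i], v, gamma) <= q(i, best, v, gamma) + 1e-12:
                best = policy[i]                          # keep current action on ties
            improved.append(best)
        if tuple(improved) == policy:                     # step 4
            nature = {(i, a): worst(i, a, v) for i in STATES for a in ACTIONS}
            return policy, v, nature
        policy = tuple(improved)


policy, v, nature = robust_policy_iteration()
print(policy, v)
```

Starting from (a2, a2), one improvement step reaches (a1, a1), after which the policy no longer changes.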



Sufficient conditions

Example
  $S = \{1, 2\}$, $A_1 = A_2 = \{a_1, a_2\}$

$$P = \begin{pmatrix} P_1^{a_1} \\ P_1^{a_2} \\ P_2^{a_1} \\ P_2^{a_2} \end{pmatrix} = \begin{pmatrix} u_1 & 1 - u_1 \\ u_3 & 1 - u_3 \\ 1 - u_2 & u_2 \\ 1 - u_4 & u_4 \end{pmatrix}, \qquad c(1, a_1) = 1,\ c(1, a_2) = 2,\ c(2, a_1) = 3,\ c(2, a_2) = 4$$

  $U = \{u_1, u_2, u_3, u_4\}$, $W = \{0, 0.2, 0.4, 0.6, 0.8, 1\}$
  $\mathcal{U} = \{U : u_1 = u_3,\ u_2 = u_4;\ u_1, u_4 \in W\}$ ⇒ correlated transition matrix set $\mathcal{P}$; independent transition matrix set $\mathcal{P}^\pi$ for $\pi$
  Optimal controller policy $\pi^* = (a^*, a^*, \dots)$ with $a^*(1) = a_1$, $a^*(2) = a_1$
  Optimal nature policy

$$P^* = \begin{pmatrix} 0 & 1 \\ 0 & 1 \\ 0 & 1 \\ 0 & 1 \end{pmatrix} \in \mathcal{P}$$

New optimality criterion

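• Minimizing the maximum squared total value function

$$\min_{\pi \in \Pi_s} \max_{P \in \mathcal{P}} \left\| V_P^\pi \right\|^2 \qquad (5)$$

    where the total value function is $V_P^\pi = \left(v_P^\pi(1), \dots, v_P^\pi(i), \dots, v_P^\pi(n)\right)'$ and $\left\|V_P^\pi\right\|^2 = \left(V_P^\pi\right)' V_P^\pi$

• Stationary optimal policy pair

 $(\pi^*, P^*)$ is optimal if

$$\left\|V_{P^*}^{\pi^*}\right\|^2 = \max_{P \in \mathcal{P}} \left\|V_P^{\pi^*}\right\|^2 = \min_{\pi \in \Pi_s} \max_{P \in \mathcal{P}} \left\|V_P^\pi\right\|^2$$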




New optimality criterion


• Existence of a stationary optimal policy pair
  Theorem:
    Assuming that, for any $\pi$, $\max_{P \in \mathcal{P}} \left\|V_P^\pi\right\|^2$ exists, a stationary optimal policy pair $(\pi^*, P^*)$ exists in terms of (5).

• Relationship between the two optimality criteria
    The optimality criterion of minimizing the maximum squared total value function generalizes the optimality criterion of minimizing the maximum value function for any initial state.




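The theorem and the stated relationship can be illustrated (not proved) by brute force on the two-state example with correlated transition matrices from the earlier "Sufficient conditions" slide, where $u_1 = u_3$, $u_2 = u_4$ and the free parameters range over $W = \{0, 0.2, \ldots, 1\}$. With finitely many policies and matrices every maximum exists, so criterion (5) has an optimal pair; the sketch then checks that this pair also attains the per-state min-max value for each initial state. γ = 0.9 is an assumption (the example does not give a discount factor), states and actions are 0-based, and all function names are hypothetical.

```python
import itertools

GAMMA = 0.9  # assumption: the example slide does not state a discount factor
W = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
COST = [[1.0, 2.0], [3.0, 4.0]]  # COST[state][action]: c(1,a1)=1, c(1,a2)=2, c(2,a1)=3, c(2,a2)=4
POLICIES = list(itertools.product(range(2), repeat=2))  # (a(1), a(2)) as action indices
NATURE = list(itertools.product(W, repeat=2))           # (u1, u2); u3 = u1 and u4 = u2


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    aug = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def transition_rows(u1, u2):
    """Rows P_i^a of the correlated example, keyed by (state, action)."""
    u3, u4 = u1, u2  # correlation constraint: u1 = u3, u2 = u4
    return {(0, 0): [u1, 1 - u1], (0, 1): [u3, 1 - u3],
            (1, 0): [1 - u2, u2], (1, 1): [1 - u4, u4]}


def values(pi, u):
    """V_P^pi = (I - GAMMA * P^pi)^{-1} C^pi for policy pi and nature parameters u."""
    rows = transition_rows(*u)
    n = len(pi)
    A = [[(1.0 if i == j else 0.0) - GAMMA * rows[(i, pi[i])][j] for j in range(n)]
         for i in range(n)]
    return solve(A, [COST[i][pi[i]] for i in range(n)])


def sq(pi, u):
    """Squared total value ||V_P^pi||^2."""
    return sum(v * v for v in values(pi, u))


def squared_minmax():
    """Brute-force criterion (5): min over pi of max over P of ||V_P^pi||^2."""
    def worst(pi):
        return max(NATURE, key=lambda u: sq(pi, u))
    pi_star = min(POLICIES, key=lambda pi: sq(pi, worst(pi)))
    return pi_star, worst(pi_star)


def per_state_robust_value(i):
    """min over pi of max over P of v_P^pi(i) for a single initial state i."""
    return min(max(values(pi, u)[i] for u in NATURE) for pi in POLICIES)


pi_star, u_star = squared_minmax()
v_star = values(pi_star, u_star)
print(pi_star, u_star)  # (0, 0) (0.0, 1.0): a*(1) = a*(2) = a1, every row of P* is (0, 1)
for i in range(2):
    print(i, v_star[i], per_state_robust_value(i))  # the two values agree in this example
```

The pair found this way matches the optimal controller and nature policies reported for this example on the earlier slide.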
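Robust policy iteration under total value function

• Policy evaluation
   – Direct method
     $$\max_{P \in \mathcal{P}} \|V_P^\pi\|^2 = \max_{P \in \mathcal{P}} (C^\pi)' \left[I - \gamma (P^\pi)'\right]^{-1} \left(I - \gamma P^\pi\right)^{-1} C^\pi$$
     where $C^\pi$ is the vector of one-step costs under π, so that $V_P^\pi = (I - \gamma P^\pi)^{-1} C^\pi$
   – Iterative method
     Iteration for $v_\infty^\pi$
   [Figure: policy sets Π_0, Π_1, Π_2, Π_3 and the optimal policy π*]
• Policy improvement
   – Policy improvement in robust policy iteration
     $$a_{k+1}(i) \in \arg\min_{a \in A_i} \Big\{ c(i,a) + \gamma \max_{P_i^a \in \mathcal{P}_i^a} P_i^a v_k \Big\}$$
   – Controller policy elimination
     Necessary condition for π to be an optimal policy at the k-th iteration:
     $$\|V_{P^{\pi_k}}^{\pi}\|^2 \le \|V_{P^{\pi_k}}^{\pi_k}\|^2$$
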
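A sketch of the direct method and the elimination test on the same two-state example, assuming a finite uncertainty set so that the maximum over $\mathcal{P}$ can be found by enumeration. Rather than forming inverses, the quadratic form $(C^\pi)'[I - \gamma (P^\pi)']^{-1}(I - \gamma P^\pi)^{-1} C^\pi$ is evaluated with two linear solves; since $[I - \gamma (P^\pi)']^{-1}$ is the transpose of $(I - \gamma P^\pi)^{-1}$, it equals $(V_P^\pi)'V_P^\pi$. γ = 0.9 and every function name are assumptions.

```python
import itertools

GAMMA = 0.9  # assumption, as in the two-state example
W = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
COST = [[1.0, 2.0], [3.0, 4.0]]  # COST[state][action]
POLICIES = list(itertools.product(range(2), repeat=2))
NATURE = list(itertools.product(W, repeat=2))  # (u1, u2); u3 = u1, u4 = u2


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    aug = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def transition_rows(u1, u2):
    """Rows P_i^a keyed by (state, action), with u3 = u1 and u4 = u2."""
    u3, u4 = u1, u2
    return {(0, 0): [u1, 1 - u1], (0, 1): [u3, 1 - u3],
            (1, 0): [1 - u2, u2], (1, 1): [1 - u4, u4]}


def system_matrix(P_pi, gamma, transpose=False):
    """I - gamma * P_pi, or I - gamma * P_pi' when transpose is True."""
    n = len(P_pi)
    return [[(1.0 if i == j else 0.0) - gamma * (P_pi[j][i] if transpose else P_pi[i][j])
             for j in range(n)] for i in range(n)]


def direct_squared_value(P_pi, c_pi, gamma):
    """Direct method: C'[I - gamma P']^{-1}(I - gamma P)^{-1} C using two solves."""
    x = solve(system_matrix(P_pi, gamma), c_pi)               # x = (I - gamma P)^{-1} C = V
    y = solve(system_matrix(P_pi, gamma, transpose=True), x)  # y = (I - gamma P')^{-1} x
    return sum(ci * yi for ci, yi in zip(c_pi, y))            # C' y


def restrict(rows, pi):
    """P^pi and C^pi: the row and cost of the action pi picks in each state."""
    n = len(pi)
    return [rows[(i, pi[i])] for i in range(n)], [COST[i][pi[i]] for i in range(n)]


def worst_case(pi):
    """Nature parameters maximizing ||V_P^pi||^2 (enumeration over a finite set)."""
    return max(NATURE, key=lambda u: direct_squared_value(*restrict(transition_rows(*u), pi), GAMMA))


def eliminate(candidates, pi_k):
    """Keep pi with ||V^pi_{P^{pi_k}}||^2 <= ||V^{pi_k}_{P^{pi_k}}||^2."""
    rows_k = transition_rows(*worst_case(pi_k))
    bound = direct_squared_value(*restrict(rows_k, pi_k), GAMMA)
    return [pi for pi in candidates
            if direct_squared_value(*restrict(rows_k, pi), GAMMA) <= bound]


print(direct_squared_value([[0.0, 1.0], [0.0, 1.0]], [1.0, 3.0], GAMMA))  # approximately 1684.0
print(eliminate(POLICIES, (1, 0)))  # [(0, 0), (1, 0)]
```

Starting from $\pi_k = (a_2, a_1)$, only $(a_1, a_1)$ and $\pi_k$ itself survive the necessary condition; the other two policies are discarded without computing their own worst case.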
Algorithm of robust policy iteration under total value function

1. Initialization: set $k = 0$, $\Pi_0 = \Pi_s$, $M = +\infty$, and select $\pi_0 = (a_0, a_0, \ldots)$.
2. Policy evaluation:
     If the condition of iteration for $\pi_k$ is satisfied
       (a) use the "iterative method" to compute $P^{\pi_k} \in \mathcal{P}$ and $\|V_{P^{\pi_k}}^{\pi_k}\|^2$ such that $\|V_{P^{\pi_k}}^{\pi_k}\|^2 = \max_{P \in \mathcal{P}} \|V_P^{\pi_k}\|^2$
     Else
       (b) use the "direct method"
3. Policy improvement:
     (a) eliminate controller policies:
         $$\Pi_k' = \left\{ \pi \in \Pi_k : \|V_{P^{\pi_k}}^{\pi}\|^2 \le \|V_{P^{\pi_k}}^{\pi_k}\|^2 \right\}$$
     If $|\Pi_k'| > 1$
       If the condition in the Theorem is satisfied
         (b) set $\Pi_{k+1} = \Pi_k'$ and $M = \|V_{P^{\pi_k}}^{\pi_k}\|^2$, and select $\pi_{k+1} = (a_{k+1}, a_{k+1}, \ldots) \in \Pi_{k+1}$ by
             $$a_{k+1}(i) \in \arg\min_{a \in A_i} \Big\{ c(i,a) + \gamma \max_{P_i^a \in \mathcal{P}_i^a} P_i^a v_k \Big\}$$
             If $\pi_{k+1} = \pi_k$, go to 4; otherwise, set $k = k + 1$ and go to 2.
       Else
         (c) if $\|V_{P^{\pi_k}}^{\pi_k}\|^2 < M$, set $M = \|V_{P^{\pi_k}}^{\pi_k}\|^2$ and $\Pi_{k+1} = \Pi_k'$, then select $\pi_{k+1} \in \Pi_{k+1}$ with $\pi_{k+1} \ne \pi_k$, set $k = k + 1$, and go to 2; otherwise, select $\pi_k' \in \Pi_k' - \{\pi_k\}$, set $\Pi_k = \Pi_k' - \{\pi_k\}$ and $\pi_k = \pi_k'$, and go to 2.
     Else
       (d) go to 4.
4. Termination: output $(\pi_k, P^{\pi_k})$ as a stationary optimal policy pair.
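A simplified sketch of the elimination path of the algorithm (steps 1, 2(b), 3(a), 3(c) and 3(d)), assuming the finite correlated uncertainty set of the two-state example so that step 2 can be carried out by enumeration. It is not a full implementation: the robust policy-improvement update of step 3(b) and the "condition of iteration" and "condition in the Theorem" tests are omitted, and the search simply moves to the next surviving candidate. In this finite setting each pass removes $\pi_k$, and an optimal $\pi^*$ satisfies $\|V^{\pi^*}_{P^{\pi_k}}\|^2 \le \max_P \|V^{\pi^*}_P\|^2 \le \|V^{\pi_k}_{P^{\pi_k}}\|^2$, so it is never discarded before being evaluated and the recorded best pair attains the min-max value. γ = 0.9 and all names are assumptions.

```python
import itertools

GAMMA = 0.9  # assumption, as in the two-state example
W = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
COST = [[1.0, 2.0], [3.0, 4.0]]  # COST[state][action]
POLICIES = list(itertools.product(range(2), repeat=2))  # the finite set Pi_s
NATURE = list(itertools.product(W, repeat=2))           # (u1, u2); u3 = u1, u4 = u2


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    aug = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def transition_rows(u1, u2):
    """Rows P_i^a keyed by (state, action), with u3 = u1 and u4 = u2."""
    u3, u4 = u1, u2
    return {(0, 0): [u1, 1 - u1], (0, 1): [u3, 1 - u3],
            (1, 0): [1 - u2, u2], (1, 1): [1 - u4, u4]}


def sq_value(pi, u):
    """||V_P^pi||^2 with V = (I - GAMMA * P^pi)^{-1} C^pi."""
    rows = transition_rows(*u)
    n = len(pi)
    A = [[(1.0 if i == j else 0.0) - GAMMA * rows[(i, pi[i])][j] for j in range(n)]
         for i in range(n)]
    V = solve(A, [COST[i][pi[i]] for i in range(n)])
    return sum(v * v for v in V)


def worst_case(pi):
    """P^{pi}: nature parameters maximizing ||V_P^pi||^2, found by enumeration."""
    return max(NATURE, key=lambda u: sq_value(pi, u))


def robust_pi_elimination(start):
    """Simplified elimination search; returns (pi*, u*, M)."""
    candidates = list(POLICIES)   # step 1: Pi_0 = Pi_s
    M, best = float("inf"), None  # step 1: M = +inf
    pi_k = start
    while True:
        u_k = worst_case(pi_k)    # step 2: worst-case P^{pi_k}
        val_k = sq_value(pi_k, u_k)
        if val_k < M:             # as in step 3(c): record an improvement of M
            M, best = val_k, (pi_k, u_k)
        # step 3(a): keep policies meeting the necessary condition, and drop pi_k itself
        candidates = [pi for pi in candidates
                      if pi != pi_k and sq_value(pi, u_k) <= val_k]
        if not candidates:        # step 3(d): nothing left to examine
            return best[0], best[1], M
        pi_k = candidates[0]


pi_star, u_star, M = robust_pi_elimination(start=(1, 1))
print(pi_star, u_star, M)  # (0, 0) (0.0, 1.0) and M approximately 1684.0
```

Starting from $(a_2, a_2)$, the first pass keeps three candidates and the second pass, at $(a_1, a_1)$, eliminates the rest, so only two worst-case evaluations are needed instead of four.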
Remaining issues toward an MDP model of the rat's neural control strategy

How can the uncertain stationary transition matrices of a Markov decision process be estimated from the experimental data collected from the rat's cortical motor areas while he performed his control tasks?

Proposed solution:
Dempster-Shafer (D-S) theory of evidence is proposed as a new model for obtaining a set estimate of the stationary transition matrix.

The mathematics has been worked out; it still needs to be implemented as algorithms and compared with existing models.

Is a POMDP model more feasible? How?

More work is needed to give the rat's cortical neural control mechanism a reasonable mathematical model.

Acknowledgement

• Supported by NSF under ECS-0002098 and ECS-0233529, and partially by
  General Dynamics
• Supported by ASU infrastructural funds
• Byron Olson and Jing Hu for work on rat experiment and analysis
• Baohua Li for robust dynamic programming results
• Jiping He for help with experiments
• Useful discussions with many (Dankert, L. Yang, C. Yang, Raghunathan …)
• Lab support by many (Silver, Scanlan, Tian…)





Jennie sinsfadp06

  • 1. Gradient Algorithms, Robustness, and Partial Observability - In the context of Cortical Neural Control using Rat Model Jennie Si Department of Electrical Engineering Arizona State University si@asu.edu NSF ADP 2006
  • 2. Motivation/Challenge/Societal Impact • Introduce an interesting platform to study the higher function of the brain (the frontal cortical area and the motor area) in decision and control using designed control tasks • Use systems tools (ADP, MDP, CI…) to understand some fundamental science questions • Need to develop new tools: technology centered designs and theory centered analysis • Inspire new ways of thinking about complex systems si@asu.edu NSF ADP 2006
  • 3. Background on cortical motor control • Center-out task and preferred direction • Population coding of movement direction and speed • Motor cortical neural activity as a predictive signal, preceding movement onset • Brain-machine interface: open loop vs. close loop solution si@asu.edu NSF ADP 2006
  • 4. Cortical neural signal extraction: non-invasive vs. invasive recording • EEG – Rhythms β and μ, P300, Slow cortical potential (SCP) – Sampling rate 200-1000Hz, – # of channels, from 1 or 2 to 128 or 256 • Electrodes – Bioactive, allowing growth of nerve, or bio-inactive multiple mircowires or multichannel electrode arrays – Superficial motor areas or deep brain structures – Primary motor, parietal, premotor, frontoparietal, basal ganglia si@asu.edu NSF ADP 2006
  • 5. Cortical neural signal extraction: ECoG electrodes for online control are circled spectral correlations of ECoG with target location (color encodes patients) resting imagining saying the word ‘move’ (d) Imagery is associated with decrease in µ (8–12 Hz) and β (18–26 Hz) bands. A brain–computer interface using electrocorticographic signals in humans* Leuthardt et al 2004 J. Neural Eng. 1 63-71 si@asu.edu NSF ADP 2006
  • 6. •Motor and Thalamic Regions •Used large number (40-60) of neurons •Regress the position of a water dripper arm •Used recurrent Neural Network Chapin, J.K.; Moxon, K.A.; Markowitz, R.S.; and Nicolelis, M.A.L. (1999) Real-time control of a robot arm using simultaneously recorded neurons in the motor cortex. Nature Neurosci., 2:664-670. si@asu.edu NSF ADP 2006
  • 7. a, b, Trial examples showing the movement by hand (green) and by neural reconstruction (blue) of a cursor to a target (red). Dotted outlines represent the actual circumference of the target and cursor on the screen. In a, hand motion resembles the neurally controlled cursor path; in b, no manipulandum motion occurred, but the neurally controlled cursor reached the target. Each dot represents an estimate of position, updated at 50-ms intervals. Axes are in x, y screen coordinates (1,000 units corresponds to a visual angle of 3.5°); note that the two trials take place in different parts of the workspace. • SERRUYA, HATSOPOULOS, PANINSKI, FELLOWS & DONOGHUE. Instant neural control of a movement signal, NATURE 416 (6877): 141-142 MAR 14 2002 – Monkey, Utah array, motor cortex, – 2D cursor position and velocity, Linear and Kalman Filters, – a few (7–30) MI neurons – careful calibration can lead to reasonable control without excessive training si@asu.edu NSF ADP 2006
  • 8. Taylor, Dawn M., Tillery, Stephen I. Helms, Schwartz, Andrew B.,Direct Cortical Control of 3D Neuroprosthetic Devices, Science 2002 296: 1829-1832 – Monkey, microwire, motor and pre-motor cortex – 3D cursor velocity, adaptive version of Population Vectors – Showed small numbers of neurons can be used to control a three dimensional cursor and that neurons trained to control a cursor can control a real robot for feeding si@asu.edu NSF ADP 2006
  • 9. • Carmena JM, Lebedev MA, Crist RE, et al., Learning to control a brain- machine interface for reaching and grasping by primates, PLOS BIOLOGY 1 (2): 193-208 NOV 2003 – Monkey, – high density array of 128 microwires, Motor, Premotor, Supplimentary Motor, Posterior Parietal, and Sensory Cortex – 2D cursor position and velocity and gripping force, Linear Filters si@asu.edu NSF ADP 2006
  • 10. - Parietal reach region (PRR) - Cognition-based prosthetic goal rather than trajectory - Performance improved over a period of weeks. - Expected value signals related to fluid preference, the expected magnitude, or probability of reward were decoded simultaneously with the intended goal. Musallam, S., Corneil, B. D., Greger, B., Scherberger, H., and Andersen, R. A. (2004). "Cognitive Control Signals for Neural Prosthetics", Science, Vol 305, Issue 5681, 258-262 si@asu.edu NSF ADP 2006
  • 11. Driving tasks • The arena for training rats to drive the robot towards one of the light si@asu.edu NSF ADP 2006
  • 12. Question asked • How does the rat develop a control strategy to complete the driving tasks (under different time scale and spatial complexity)? si@asu.edu NSF ADP 2006
  • 13. Neuroscientific evidence • Multimodal association area - anterior association area (prefrontal cortex) integrating different sensory modalities and linking them to action • Macaque and rat prefrontal cortex receives multimodal cortico-cortical projections from motor, somatosensory, visual, auditory, gustatory, and limbic cortices • Prefrontal areas provide cognitive, sensory or motivational inputs for motor behavior (rastral region in rat) • Motor areas are concerned with more concrete aspects of movement (caudal region in rat) si@asu.edu NSF ADP 2006
  • 14. One step at a time… First, a directional control task with only high level control commands si@asu.edu NSF ADP 2006
  • 15. The Brain-Controlled Vehicle Neural Interface Signal Processing Neural Signals Algorithms/Command Extraction Directiona l control Control Command Vehicle State Signal Environmental Feedback Vehicle Sensors si@asu.edu NSF ADP 2006
  • 16. Goals • To decode the directional control decision as a predictive signal from motor cortical neural activities • To associate motor neural activities with motor behavior and thus to develop models to possibly interpret neural mechanism of cortical motor directional control si@asu.edu NSF ADP 2006
  • 17. male Sprague-Dawley rats • 2×4 arrays of 50µm tungsten wires coated with polyimide • spaced 500µm apart for a size of approximately 1.5mm×0.5mm. • The implant site targets the rostral region From Kolbe The Cerebral Cortex of the Rat, 1990 si@asu.edu NSF ADP 2006
  • 18. Brain Control Diagram Feedback - Visual, Neural Signals Auditory & Reward Task Recording Execution System NAV - K × L dimensional − 1, Left vector  + 1, Right Neuron 1 ··· Neuron L Bin 1 ... K Bin 1 ... K Computation Binned of Directional Spike times Data Neural Activity Vector Decision Control (NAV) Decision si@asu.edu NSF ADP 2006
  • 19. Perievent Histograms Rdar36 Left Hits Right Hits sig001a sig005a sig001a sig005a 200 40 200 40 100 20 100 20 0 0 0 0 -2 -1 0 1 2 -2 -1 0 1 2 -2 -1 0 1 2 -2 -1 0 1 2 sig002a sig005b sig002a sig005b 60 120 30 40 20 80 20 20 10 40 10 0 0 0 0 -2 -1 0 1 2 -2 -1 0 1 2 -2 -1 0 1 2 -2 -1 0 1 2 sig003a sig006a sig003a sig006a 80 80 40 40 40 40 counts/bin 0 0 0 0 -2 -1 0 1 2 -2 -1 0 1 2 -2 -1 0 1 2 -2 -1 0 1 2 sig003b sig007a sig003b sig007a 80 60 80 80 40 40 40 20 40 0 0 0 0 -2 -1 0 1 2 -2 -1 0 1 2 -2 -1 0 1 2 -2 -1 0 1 2 sig004a sig007b sig004a sig007b 120 150 8 80 4 100 4 40 50 0 0 0 0 -2 -1 0 1 2 -2 -1 0 1 2 -2 -1 0 1 2 -2 -1 0 1 2 sig004b sig008a sig004b sig008a 40 80 40 80 20 0 0 0 0 -2 -1 0 1 2 -2 -1 0 1 2 -2 -1 0 1 2 -2 -1 0 1 2 Time (sec) Time (sec) Time (sec) Time (sec) si@asu.edu NSF ADP 2006
  • 20. Cross validation accuracy boxplots for manual and brain control respectively, 5 rats, 8 data sets C a c r c , C liba na dBa c n o a n uo s V c ua y a r tio n r in o tr l, ll e r n 1 • Each box shows the 25-75 quartile, 0.9 median values of accuracy. 0.8 • R3, R5/1, R5/2, V c ua y C Ac r c there are fewer than 0.7 30 trials in each brain control data 0.6 set. C lib2 /7 a 5 5 C libm d n a e ia 0.5 Ba 2 /7 r in 5 5 Ba m d n r in e ia 0.4 R1 R2 R3 R /1 4 R /2 4 R /3 4 R /1 5 R /2 5 R t/D y a a Typically 20 runs of randomized 5 fold cross- validation were performed for each data set. si@asu.edu NSF ADP 2006
  • 21. Modeling rat’s directional control using MDP? MDPs: Finite state space S = {1,2,  , n} { Finite action space A i = a1i , a2i ,  , ami } Infinite decision horizon T = { 0,1,2,3, } Cost function c(i, a ) discount factor γ (0 < γ < 1) Action mapping a : S → Ai a(i ) ∈ A i Stationary controller policy π = (a, a, ) π ∈ Π s si@asu.edu NSF ADP 2006
  • 22. Manual lever press following cue Brain control - “imaginary lever press” following cue si@asu.edu NSF ADP 2006
  • 23. Possible implementation Define 6 possible states: • Idle – between two trials • Ready – right before trial start • Reward – success of a trial • No-Reward – failure of a trial • Left experiment state – left cue experiment • Right experiment state – right cue experiment The action (control) is the rat’s volition represented by corresponding neural activities Going from one state to another depends on the current state as well as the action taken. • The reward can be stated as r (LL) = 1; r(LR)=-1 … r (RR) = 1; r(RL)=-1 … si@asu.edu NSF ADP 2006
  • 24. Does this tell us more? • “Open loop” discrimination and CV analysis provide a baseline of relating neural activity (spike trains) to behavioral parameters (left/right decision) • As a decoding tool, can an MDP model tell us more than “open loop” analysis? • MDP model to explain the experiment as a decision process si@asu.edu NSF ADP 2006
  • 25. Technicalities • How to represent control (start/stop and bin size) Trial and error, hard to formulate theoretically • How to compute the transition matrix given uncertainty, partially observed sequences of spike trains We can try to formulate this theoretically… si@asu.edu NSF ADP 2006
  • 26. • Uncertain transition matrices – Robust value iteration (Nilim & El Ghaoui, 2005) – Robust policy iteration (Satia & Lave, 1973) si@asu.edu NSF ADP 2006
  • 27. Problem formulation • Classification of uncertain transition matrices – Expression of uncertain transition matrices  P a11   f1a11 (U)   P a (1)   f1a (1) ( U )   1    π  1     M   M  P = M = M   a   a ji   P a( n )   f a( n ) ( U)  P =  Pi ji  =  fi (U)   n   n   M   M       P amn   f (U)  P = { P : U ∈ U } amn  n   1  si@asu.edu NSF ADP 2006
  • 28. Problem formulation • Classification of uncertain transition matrices – Definition of uncertain transition matrices The transition matrix P is correlated if y a a a P ⊂ P1 11 × × Pi ji × × P1 mn [ The transition matrix P is independent if a a a I1 S1 S2 P = P1 11 × × Pi ji × × P1 mn a Pi ji is the projection of P on the direction ] a ji of Pi (i ∈ S a ji ∈ A i ) P π is the projection of P on the direction I2 [ ] of { P 1 a (1) ,P2 a (2) , , P n a( n) } x S1 = I1 × I 2 S 2 ⊂ I1 × I 2 si@asu.edu NSF ADP 2006
  • 29. Problem formulation • Classification of MDPs – MDPs with independent transition matrices – MDPs with correlated transition matrices • Optimality criterion – Minimizing maximum value function for any initial state π min max vP (i ) = v* (i ) ∀i ∈ S π ∈Π s P∈P • Stationary optimal policy pair (π * , P * ) is optimal if π* π* π v (i ) = max v (i ) = min max v P (i ) for any initial state i ∈ S P* P P∈P π ∈Π s P∈P si@asu.edu NSF ADP 2006
  • 30. Problem formulation • MDPs with independent transition matrices – An optimal policy pair exists – Robust value iteration and robust policy iteration are applicable • MDPs with correlated transition matrices – An optimal policy pair exists and both iterations are applicable – An optimal policy pair exists but both iterations are no longer applicable – An optimal policy pair does not exist si@asu.edu NSF ADP 2006
  • 31. Questions to be answered • Sufficient conditions to guarantee that robust value iteration and robust policy iteration are applicable; • Optimality criterion to make a stationary optimal policy pair exist in a weak condition; • Efficient algorithm. si@asu.edu NSF ADP 2006
  • 32. Sufficient conditions Lemma For any given π = (a, a,) ∈ Π s and any given q ∈ ℜ1×n , + n×1 v∈ℜ ( ) max qv : v (i ) ≤ g π (v) := c ( i, a(i ) ) + γ amax( i ) Pi a (i ) v i (i ) a Pi ∈Pi i∈S (1) For any given q ∈ℜ1×n , + max qv : v (i ) ≤ ( g (v) ) i := min  c ( i, a ) + γ max Pi a v    i∈S (2) v∈ℜn×1 a∈ A i  Pi a ∈Pi a  The functions g π and g are monotone non - decreasing and contractive. The problems (1) and (2) have the unique optimal solutions denoted as π v∞ and v∞ , which are the unique solutions to the fixed - point equations v = g π (v ) and v = g (v), respectively. The optimal transition probility rows are given by ( ) { } * π Pi a ( i ) ∈ arg amax( i ) Pi a ( i ) v∞ (i) a i ∈ S , which constitute ( Pπ )* (3) Pi ∈Pi ( ) { } * Pi a ∈ arg max Pi a v∞ i ∈ S , a ∈ A i , which constitute ( P)* a a (4) Pi ∈Pi si@asu.edu NSF ADP 2006
  • 33. Sufficient conditions π Iterations for obtaining v∞ π (1) select v0 ∈ℜn×1 and set k = 0; (2) compute vk +1 by vk +1 = g π (vk ) π π π π π π π (3) terminate if vk +1 = vk and output v∞ = vk ; otherwise, set k = k + 1 and go to (2) Iterations for obtaining v∞ (1) select v0 ∈ ℜn×1 and set k = 0; (2) compute vk +1 by vk +1 = g (vk ) (3) terminate if vk +1 = vk and output v∞ = vk ; otherwise, set k = k + 1 and go to (2) si@asu.edu NSF ADP 2006
  • 34. Sufficient conditions Theorem When there exist, for any π ∈ Π s , ( Pπ )* defined by (3) is in the set P π , and P* defined by (4) is in the set P i) A stationary optimal policy pair exists under the optimality criterion of minimizing maximum value function for any initial state ii) Robust value iteration is applicable; iii) Robust policy iteration is applicable. si@asu.edu NSF ADP 2006
  • 35. Robust value iteration 1. Select v0 ∈ℜn and set k = 0; 2. Compute vk +1 by vk +1 (i ) = min  c(i, a ) + γ max Pi a vk    a∈ A i  Pi a ∈Pi a  3. If vk +1 = vk , then go to 4; otherwise increment k by 1 and go to 2 4. Compute π * = (a* , a* ,) and P* defined by a* (i ) ∈ arg min  c(i, a ) + γ max Pi a vk    a∈A i  Pi a ∈Pi a  ( ) a P* ∈ arg max{Pi a vk } i a a Pi ∈Pi 5. If P* ∈ P, output a stationary optimal policy pair (π * , P* ); otherwise, the algorithm can not be applied. si@asu.edu NSF ADP 2006
  • 36. Robust policy iteration
    1. Initialization: select $\pi_0 = (a_0, a_0, \dots) \in \Pi_s$ and set $k = 0$;
    2. Policy evaluation: run the iterations for $v_\infty^{\pi_k}$;
    3. Policy improvement: find $\pi_{k+1} = (a_{k+1}, a_{k+1}, \dots)$ with
       $$a_{k+1}(i) \in \arg\min_{a\in A_i} \Big[ c(i, a) + \gamma \max_{P_i^a \in \mathcal{P}_i^a} P_i^a v_\infty^{\pi_k} \Big];$$
    4. If $\pi_{k+1} = \pi_k$, compute $P^*$ by
       $$\big(P_i^a\big)^* \in \arg\max_{P_i^a \in \mathcal{P}_i^a} P_i^a v_\infty^{\pi_k} \quad \forall i \in S,\ a \in A_i$$
       and go to 5; otherwise, increment $k$ by 1 and go to 2;
    5. If $P^* \in \mathcal{P}$, output the stationary optimal policy pair $(\pi^*, P^*)$; otherwise, the algorithm cannot be applied.
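Robust policy iteration alternates worst-case policy evaluation with greedy improvement. The sketch below assumes finite row-wise uncertainty sets, hypothetical model data, $\gamma = 0.9$, and a hypothetical membership hook `in_joint_set` for $\mathcal{P}$; it evaluates each policy by iterating $v \leftarrow g^\pi(v)$ to a tolerance. It also keeps the current action whenever that action ties with the best one, a common safeguard against cycling that the slide does not spell out.

```python
GAMMA = 0.9  # assumed discount factor

COST = [[1.0, 2.0], [3.0, 4.0]]  # hypothetical c(i, a)
ROWS = [                         # hypothetical finite uncertainty sets P_i^a
    [[[0.2, 0.8], [0.6, 0.4]], [[0.5, 0.5], [0.9, 0.1]]],
    [[[0.3, 0.7], [0.7, 0.3]], [[0.1, 0.9], [0.4, 0.6]]],
]


def dot(p, v):
    return sum(p_j * v_j for p_j, v_j in zip(p, v))


def nature(i, a, v):
    """Return (max value, maximizing row) of P_i^a v over the finite set."""
    return max(((dot(p, v), p) for p in ROWS[i][a]), key=lambda t: t[0])


def q_value(i, a, v):
    return COST[i][a] + GAMMA * nature(i, a, v)[0]


def evaluate(policy, tol=1e-10, max_iter=10_000):
    """Step 2: iterate v <- g^pi(v) to approximate v_inf^pi."""
    v = [0.0] * len(COST)
    for _ in range(max_iter):
        v_next = [q_value(i, policy[i], v) for i in range(len(COST))]
        if max(abs(x - y) for x, y in zip(v_next, v)) < tol:
            return v_next
        v = v_next
    raise RuntimeError("policy evaluation did not converge")


def robust_policy_iteration(policy, in_joint_set):
    policy = list(policy)                            # step 1: pi_0
    while True:
        v = evaluate(policy)                         # step 2
        new_policy = []
        for i in range(len(COST)):                   # step 3
            qs = [q_value(i, a, v) for a in range(len(COST[i]))]
            best = min(qs)
            # Keep the current action on (near-)ties so the loop cannot cycle.
            new_policy.append(policy[i] if qs[policy[i]] <= best + 1e-9
                              else qs.index(best))
        if new_policy == policy:                     # step 4
            p_star = {(i, a): nature(i, a, v)[1]
                      for i in range(len(COST)) for a in range(len(COST[i]))}
            # Step 5: accept only if P* lies in the (possibly correlated) set P.
            return (policy, p_star) if in_joint_set(p_star) else None
        policy = new_policy


print(robust_policy_iteration([1, 1], in_joint_set=lambda p: True)[0])  # [0, 0]
```

Starting from $\pi_0 = (a_2, a_2)$ on this toy model, one improvement step reaches $(a_1, a_1)$, and the next evaluation confirms it.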
  • 37. Sufficient conditions
    Example
    $S = \{1, 2\}$, $A_1 = A_2 = \{a_1, a_2\}$
    $$P = \begin{bmatrix} P_1^{a_1} \\ P_1^{a_2} \\ P_2^{a_1} \\ P_2^{a_2} \end{bmatrix} = \begin{bmatrix} u_1 & 1-u_1 \\ u_3 & 1-u_3 \\ 1-u_2 & u_2 \\ 1-u_4 & u_4 \end{bmatrix}, \qquad c(1, a_1) = 1,\ c(1, a_2) = 2,\ c(2, a_1) = 3,\ c(2, a_2) = 4$$
    $U = \{u_1, u_2, u_3, u_4\}$, $W = \{0, 0.2, 0.4, 0.6, 0.8, 1\}$
    $\mathcal{U} = \{U : u_1 = u_3,\ u_2 = u_4;\ u_1, u_4 \in W\}$ ⇒ correlated transition matrix set $\mathcal{P}$; independent transition matrix set for $\pi$: $\mathcal{P}^\pi$
    Optimal controller policy $\pi^* = (a^*, a^*, \dots)$ with $a^*(1) = a_1$, $a^*(2) = a_1$
    Optimal nature policy
    $$P^* = \begin{bmatrix} 0 & 1 \\ 0 & 1 \\ 0 & 1 \\ 0 & 1 \end{bmatrix} \in \mathcal{P}$$
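The example can be checked by brute force: for each of the four stationary controller policies, enumerate all 36 matrices in the correlated set ($u_1 = u_3$, $u_2 = u_4$, free parameters $u_1, u_4 \in W$), compute the value function by solving $(I - \gamma P^\pi)v = c^\pi$, and take the worst case in each state. The slide does not state $\gamma$; this sketch assumes $\gamma = 0.9$, and the helper names are illustrative.

```python
from itertools import product

GAMMA = 0.9  # assumed: the slide does not state gamma
W = [k / 5 for k in range(6)]  # W = {0, 0.2, 0.4, 0.6, 0.8, 1}
COST = {(1, "a1"): 1.0, (1, "a2"): 2.0, (2, "a1"): 3.0, (2, "a2"): 4.0}


def rows(u1, u2, u3, u4):
    """Transition rows P_i^a of the example, keyed by (state, action)."""
    return {(1, "a1"): (u1, 1 - u1), (1, "a2"): (u3, 1 - u3),
            (2, "a1"): (1 - u2, u2), (2, "a2"): (1 - u4, u4)}


def value(policy, P):
    """Solve (I - gamma P^pi) v = c^pi for the 2-state chain (Cramer's rule)."""
    (p11, p12), (p21, p22) = P[(1, policy[0])], P[(2, policy[1])]
    m11, m12 = 1 - GAMMA * p11, -GAMMA * p12
    m21, m22 = -GAMMA * p21, 1 - GAMMA * p22
    c1, c2 = COST[(1, policy[0])], COST[(2, policy[1])]
    det = m11 * m22 - m12 * m21
    return ((c1 * m22 - m12 * c2) / det, (m11 * c2 - m21 * c1) / det)


# Correlated set: u1 = u3 and u2 = u4, with the free parameters u1, u4 in W.
CORRELATED = [rows(u1, u4, u1, u4) for u1, u4 in product(W, W)]

results = {}
for policy in product(["a1", "a2"], repeat=2):
    vals = [(value(policy, P), P) for P in CORRELATED]
    worst = tuple(max(v[s] for v, _ in vals) for s in range(2))
    # Matrices in the correlated set that attain the worst case in every state.
    attained = [P for v, P in vals
                if all(abs(v[s] - worst[s]) < 1e-9 for s in range(2))]
    results[policy] = (worst, attained)

# Optimal controller: worst-case value no larger than any other policy's, state by state.
best = [pi for pi in results
        if all(results[pi][0][s] <= results[o][0][s] + 1e-9
               for o in results for s in range(2))]
print(best, results[best[0]][1] == [rows(0.0, 1.0, 0.0, 1.0)])
# [('a1', 'a1')] True
```

Under that assumption the worst case for $(a_1, a_1)$ is attained by the single matrix with $u_1 = u_3 = 0$ and $u_2 = u_4 = 1$, i.e., every row equal to $(0\ 1)$, matching $P^*$ on the slide. Intuitively, nature pushes the process into state 2, whose costs are higher, and the controller then picks the cheaper action in each state.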
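  • 38. New optimality criterion
    • Minimizing the maximum squared total value function
      $$\min_{\pi\in\Pi_s} \max_{P\in\mathcal{P}} \big\|V_P^\pi\big\|^2 \qquad (5)$$
      where the total value function is $V_P^\pi = \big(v_P^\pi(1), \dots, v_P^\pi(i), \dots, v_P^\pi(n)\big)'$ and $\big\|V_P^\pi\big\|^2 = \big(V_P^\pi\big)' V_P^\pi$.
    • Stationary optimal policy pair: $(\pi^*, P^*)$ is optimal if
      $$\big\|V_{P^*}^{\pi^*}\big\|^2 = \max_{P\in\mathcal{P}} \big\|V_P^{\pi^*}\big\|^2 = \min_{\pi\in\Pi_s} \max_{P\in\mathcal{P}} \big\|V_P^\pi\big\|^2$$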
  • 39. New optimality criterion
    • Existence of a stationary optimal policy pair
      Theorem: Assuming that, for any $\pi$, $\max_{P\in\mathcal{P}} \big\|V_P^\pi\big\|^2$ exists, a stationary optimal policy pair $(\pi^*, P^*)$ exists in terms of (5).
    • Relationship between the two optimality criteria
      The optimality criterion of minimizing the maximum squared total value function generalizes the optimality criterion of minimizing the maximum value function for any initial state.
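When $\Pi_s$ and $\mathcal{P}$ are finite, the inner maximum in (5) always exists, so the Theorem's assumption holds and the optimal pair can be found by enumeration. A brute-force Python sketch, assuming deterministic stationary policies, a hypothetical 2-state model with two candidate transition matrices, and $\gamma = 0.9$ (none taken from the slides):

```python
from itertools import product

GAMMA = 0.9  # assumed discount factor

# Hypothetical 2-state, 2-action model. COST[i][a] = c(i, a).
COST = [[1.0, 2.0], [3.0, 4.0]]
# Hypothetical finite (possibly correlated) set of transition matrices: each
# entry maps (state, action) to the row P_i^a that nature uses.
NATURE = [
    {(0, 0): [0.2, 0.8], (0, 1): [0.5, 0.5], (1, 0): [0.3, 0.7], (1, 1): [0.1, 0.9]},
    {(0, 0): [0.6, 0.4], (0, 1): [0.9, 0.1], (1, 0): [0.7, 0.3], (1, 1): [0.4, 0.6]},
]


def total_value(policy, P, tol=1e-10):
    """V_P^pi by iterating V <- c^pi + gamma P^pi V (a gamma-contraction)."""
    n = len(COST)
    v = [0.0] * n
    while True:
        v_next = [COST[i][policy[i]]
                  + GAMMA * sum(p * x for p, x in zip(P[(i, policy[i])], v))
                  for i in range(n)]
        if max(abs(x - y) for x, y in zip(v_next, v)) < tol:
            return v_next
        v = v_next


def sq_norm(v):
    """||V||^2 = V'V."""
    return sum(x * x for x in v)


def solve_criterion_5():
    """Brute-force min over pi of max over P of ||V_P^pi||^2 on finite sets."""
    policies = list(product(range(2), repeat=len(COST)))  # all deterministic stationary policies
    best = None
    for pi in policies:
        # A finite P guarantees the inner max exists (the Theorem's assumption).
        worst_P = max(NATURE, key=lambda P: sq_norm(total_value(pi, P)))
        worst = sq_norm(total_value(pi, worst_P))
        if best is None or worst < best[2]:
            best = (pi, worst_P, worst)
    return best


pi_star, P_star, value_star = solve_criterion_5()
print(pi_star, round(value_star, 1))  # (0, 0) 1166.0
```

Enumeration cost grows as $|A|^n \cdot |\mathcal{P}|$, so it is only practical for toy models like this one.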
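  • 40. Robust policy iteration under total value function
    • Policy evaluation
      – Direct method
        $$\max_{P\in\mathcal{P}} \big\|V_P^\pi\big\|^2 = \max_{P\in\mathcal{P}} \big(C^\pi\big)' \Big[\big(I - \gamma P^\pi\big)^{-1}\Big]' \big(I - \gamma P^\pi\big)^{-1} C^\pi$$
      – Iterative method
        Iterations for $v_\infty^\pi$
      [Figure: nested controller policy sets $\Pi_0 \supseteq \Pi_1 \supseteq \Pi_2 \supseteq \Pi_3$ narrowing to $\pi^*$]
    • Policy improvement
      – Policy improvement as in robust policy iteration
        $$a_{k+1}(i) \in \arg\min_{a\in A_i} \Big[ c(i, a) + \gamma \max_{P_i^a \in \mathcal{P}_i^a} P_i^a v_k \Big]$$
      – Controller policy elimination
        Necessary condition for an optimal policy at the k-th iteration: $\big\|V_{P^{\pi_k}}^{\pi}\big\|^2 \le \big\|V_{P^{\pi_k}}^{\pi_k}\big\|^2$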
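The direct method is the identity $\|V_P^\pi\|^2 = (V_P^\pi)'V_P^\pi$ with $V_P^\pi = (I - \gamma P^\pi)^{-1} C^\pi$; in practice one solves the linear system rather than forming the inverse. The sketch below, for a hypothetical fixed policy on 3 states with three candidate matrices and an assumed $\gamma = 0.9$, computes the squared norm both ways and then takes the maximum over the finite candidate set.

```python
GAMMA = 0.9  # assumed discount factor

# Hypothetical fixed policy pi on 3 states: cost vector C^pi and a finite set
# of candidate transition matrices P^pi (rows sum to 1).
C_PI = [1.0, 2.0, 3.0]
CANDIDATES = [
    [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]],
    [[0.1, 0.1, 0.8], [0.1, 0.1, 0.8], [0.0, 0.2, 0.8]],
    [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
]


def solve(A, b):
    """Gaussian elimination with partial pivoting for A x = b."""
    n = len(A)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def sq_norm_direct(P):
    """Direct method: V = (I - gamma P)^{-1} C, so ||V||^2 = V'V."""
    n = len(P)
    A = [[(1.0 if i == j else 0.0) - GAMMA * P[i][j] for j in range(n)]
         for i in range(n)]
    V = solve(A, C_PI)
    return sum(v * v for v in V)


def sq_norm_iterative(P, tol=1e-12):
    """Iterative method: V <- C + gamma P V until the update is below tol."""
    V = [0.0] * len(P)
    while True:
        V_next = [c + GAMMA * sum(p * v for p, v in zip(row, V))
                  for c, row in zip(C_PI, P)]
        if max(abs(x - y) for x, y in zip(V_next, V)) < tol:
            return sum(v * v for v in V_next)
        V = V_next


direct = [sq_norm_direct(P) for P in CANDIDATES]
iterative = [sq_norm_iterative(P) for P in CANDIDATES]
worst = max(range(len(CANDIDATES)), key=lambda k: direct[k])
print(worst)  # 1
```

Both methods agree to within the iteration tolerance; the direct method costs one $O(n^3)$ solve per candidate matrix, while the iterative method needs more sweeps as $\gamma$ approaches 1. Since $I - \gamma P^\pi$ is strictly diagonally dominant for $\gamma < 1$, the elimination never meets a zero pivot.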
  • 41. Algorithm of robust policy iteration under total value function
    1. Initialization: set $k = 0$, $\Pi_0 = \Pi_s$, $M = +\infty$, and select $\pi_0 = (a_0, a_0, \dots)$.
    2. Policy evaluation:
       If the condition for the iterations for $\pi_k$ is satisfied,
         (a) use the "iterative method" to compute $P^{\pi_k} \in \mathcal{P}$ and $\big\|V_{P^{\pi_k}}^{\pi_k}\big\|^2$ such that $\big\|V_{P^{\pi_k}}^{\pi_k}\big\|^2 = \max_{P\in\mathcal{P}} \big\|V_P^{\pi_k}\big\|^2$;
       Else
         (b) use the "direct method".
    3. Policy improvement:
       (a) Eliminate controller policies:
           $$\Pi_k' = \Big\{ \pi \in \Pi_k : \big\|V_{P^{\pi_k}}^{\pi}\big\|^2 \le \big\|V_{P^{\pi_k}}^{\pi_k}\big\|^2 \Big\}$$
       If $|\Pi_k'| > 1$:
         If the condition in the Theorem is satisfied,
           (b) set $\Pi_{k+1} = \Pi_k'$ and $M = \big\|V_{P^{\pi_k}}^{\pi_k}\big\|^2$, and select $\pi_{k+1} = (a_{k+1}, a_{k+1}, \dots) \in \Pi_{k+1}$ by
               $$a_{k+1}(i) \in \arg\min_{a\in A_i} \Big[ c(i, a) + \gamma \max_{P_i^a \in \mathcal{P}_i^a} P_i^a v_k \Big];$$
               if $\pi_{k+1} = \pi_k$, go to 4; otherwise, set $k = k + 1$ and go to 2.
         Else
           (c) If $\big\|V_{P^{\pi_k}}^{\pi_k}\big\|^2 < M$, set $M = \big\|V_{P^{\pi_k}}^{\pi_k}\big\|^2$ and $\Pi_{k+1} = \Pi_k'$, select $\pi_{k+1} \in \Pi_{k+1}$ with $\pi_{k+1} \ne \pi_k$, set $k = k + 1$, and go to 2; otherwise, select $\pi_k' \in \Pi_k' - \{\pi_k\}$, set $\Pi_k = \Pi_k' - \{\pi_k\}$ and $\pi_k = \pi_k'$, and go to 2.
       Else
         (d) go to 4.
    4. Termination: output $(\pi_k, P^{\pi_k})$ as a stationary optimal policy pair.
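Step 3(a) is safe because $P^{\pi_k} \in \mathcal{P}$: for any $\pi$, $\max_{P}\|V_P^\pi\|^2 \ge \|V_{P^{\pi_k}}^\pi\|^2$, so a policy with $\|V_{P^{\pi_k}}^\pi\|^2 > \|V_{P^{\pi_k}}^{\pi_k}\|^2 = \max_P \|V_P^{\pi_k}\|^2$ is already worse than $\pi_k$ and cannot be optimal. A minimal sketch of that filter, assuming a hypothetical 2-state model, $\gamma = 0.9$, and a hypothetical matrix `P_WORST` standing in for the worst case found in step 2:

```python
from itertools import product

GAMMA = 0.9  # assumed discount factor
COST = [[1.0, 2.0], [3.0, 4.0]]  # hypothetical c(i, a)
# Hypothetical worst-case matrix P^{pi_k} from step 2, as a map (i, a) -> row.
P_WORST = {(0, 0): [0.2, 0.8], (0, 1): [0.5, 0.5],
           (1, 0): [0.3, 0.7], (1, 1): [0.1, 0.9]}


def sq_norm(policy, P):
    """||V_P^pi||^2 for 2 states, solving (I - gamma P^pi) V = c^pi by Cramer's rule."""
    (p11, p12), (p21, p22) = P[(0, policy[0])], P[(1, policy[1])]
    m11, m12 = 1 - GAMMA * p11, -GAMMA * p12
    m21, m22 = -GAMMA * p21, 1 - GAMMA * p22
    c1, c2 = COST[0][policy[0]], COST[1][policy[1]]
    det = m11 * m22 - m12 * m21
    v1 = (c1 * m22 - m12 * c2) / det
    v2 = (m11 * c2 - m21 * c1) / det
    return v1 * v1 + v2 * v2


def eliminate(candidates, pi_k, P_k):
    """Step 3(a): keep pi only if ||V^pi_{P^{pi_k}}||^2 <= ||V^{pi_k}_{P^{pi_k}}||^2."""
    threshold = sq_norm(pi_k, P_k)
    return [pi for pi in candidates if sq_norm(pi, P_k) <= threshold]


pi_k = (0, 0)
survivors = eliminate(list(product(range(2), repeat=2)), pi_k, P_WORST)
print(survivors)  # [(0, 0)]
```

Here only $\pi_k$ itself survives, so $|\Pi_k'| = 1$ and the algorithm goes to step 4 via (d).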
  • 42. Remaining issues toward an MDP model of the rat’s neural control strategy
    • How can the uncertain stationary transition matrices of a Markov decision process be estimated from the experimental data collected from the rat’s cortical motor areas while he performed his control tasks?
      – Proposed solution: Dempster-Shafer (D-S) theory of evidence, as a new model for obtaining set estimates of the stationary transition matrix
      – The mathematics has been worked out; it still needs to be implemented as algorithms and compared with existing models
    • Is a POMDP model more feasible? How?
    • More work is needed to give the rat’s cortical neural control mechanism a reasonable mathematical model
  • 43. Acknowledgement
    • Support by NSF under ECS-0002098 and ECS-0233529, and partially by General Dynamics
    • Support by ASU infrastructural funds
    • Byron Olson and Jing Hu for work on rat experiment and analysis
    • Baohua Li for robust dynamic programming results
    • Jiping He for help with experiments
    • Useful discussions with many (Dankert, L. Yang, C. Yang, Raghunathan …)
    • Lab support by many (Silver, Scanlan, Tian…)

Editor's Notes

  1. Cross-validation accuracy boxplots for both calibration and brain control data sets. Typically, 20 runs of randomized 5-fold cross-validation were performed for each data set. The filled boxes are for brain control; the unfilled ones are for calibration. Each box shows the lower quartile, median, and upper quartile of accuracy. Note that for R3, R5/1, and R5/2, each brain control data set has fewer than 30 trials, so the range of accuracy is large.