SlideShare una empresa de Scribd logo
1 de 44
Patrick Bellasi
                   Dipartimenti di Elettronica ed Informazione
                   Politecnico di Milano                         bellasi@elet.polimi.it




                                     Linux Power Management 
                                                  Architecture
                 An updated review on Linux PM 
                   frameworks




Last update: Nov 11, 2008
Patrick Bellasi bellasi@elet.polimi.it




            Agenda
                                                             2


    Outline the architecture for advanced power management 



    support in recent Linux kernels
            key points for effective PM and overall picture view
        ️



    Review of main subsystems


                  ­ Advanced timekeeping framework
              ️


                  ­ Clock framework
              ️


                  ­ Voltage/Power control Framework
              ️


                  ­ QoS Framework
              ️


                  ­ CPUIdle
              ️


                  ­ CPUFreq
              ️


                  ­ Linux PM
              ️


            Why we need them?
        ️


            What they do?
        ️


            How we can exploit them?
        ️
Patrick Bellasi bellasi@elet.polimi.it



              Modern PM architectures
              Key points for effective power management          3


    Exploit partial activity



         disable parts of the system when not needed
     ️


    SW does part of the work, HW dependencies do the rest



         exploit existing system framework
     ️


               track dependencies (producer­consumer)
          ️
                                                               Fine grained tracking
               track usage (reference counting)
          ️
                                                                  and constraints
               system constraints assertion
          ️


               notification­chains
          ️



         drivers support required
     ️


               aggressively get and release resources
          ️


               support efficiently OFF mode (0 volts)
          ️


                     full context loss, low­latency wakeup,
                 ️


                     provide context saving­restore routines
                 ️


               use constraints to require operational restrictions
          ️
Patrick Bellasi bellasi@elet.polimi.it



            Modern PM architectures
            Key points for effective power management       4


    Control power sources



              clocks, power domains, voltage domains, memories and power IC 
        ️


             resources
             device drivers should exploit interfaces to control power resources
        ️



    Constraints aware drivers and subsystems



    Inactive state power saving in OS idle



             automatic choice between multiple C­States
        ️


                   OFF and retention modes
               ️


             manual suspend/resume
        ️


             system­wide sleep states
        ️



    Active state dynamic power management



             automatic choice between multiple P­States
        ️
Patrick Bellasi bellasi@elet.polimi.it



                    Modern PM architectures
                    What we should (already) have                                                                                               5

                                         MM Resource Manager
      Legend
                  User
    Arch Indep                                                                                                                                                                                Policy management
                  Kernel                                         CPUIdle Goevernor                                                        CPUFreq Goevernor
                                                                                                                                                                                                     Layer
 Arch Dependant                                                   (menu governor)                                                        (on­demand governor)
                                   LDM
   OMAP34xx
                                                                 CPUIdle Framework                                                       CPUFreq Framework
Control
Notifications
Constraints
                                                                                   Constraints Framework

                                                                                                                                                                                                Device Driver
   DSPBridge          DSP Bridge                                      CPUIdle Driver                                                       CPUFreq Driver
                                          Drivers                                                                                                                                                  Layer

    Workload                                                                                                                                                                                  Resource Tracking
                                                                     Resource Tracking
    Predictor                                                                                                                                                                                      Layer

                                                                                                      Shared Resource Framework

                                        Clock 




                                                                                   Power domain




                                                                                                                          Memory Logic
                                                                                                                                                                                               Power Resource
                                     Framework




                                                                                                                                              Latency Con
                                                                                                            DPLL State
    Workload




                                                                                                                                                            Frequ. Con
                                                                                                                                                                                                Management




                                                                                                                                                                                   DPLL Con
                                                                                                                                                                         OPP Con
    Predictor
                                                                                                                                                                                                   Layer


                                                                                                  Logical
                                                   Voltage 
                                                 Framework
     PWRM

   DSPBridge

                                                                PRCM API                                                 SmartRelfex
                                                                 Layer 2                                                   Driver                                                                 PRCM API
                                                                                                                                                                                                  Layer 1&2
                                                                                                                                                                                                 (OMAP34xx)
                                                                                                                                                             PowerIC
                                                  PRCM API Layer 1                                                                                            Driver
Patrick Bellasi bellasi@elet.polimi.it



             Linux frameworks to support PM
             Introduction                                      6


    Typical mainline Linux Power Management features:



        Platform­specific code
    ️


              idle­loop, timekeeping (dynamic tick) & clock tree, latency
         ️



        Dynamic Power Management
    ️


              Low­power states activation for unused devices
         ️


              C­States, LDM, Suspension (RAM), Hibernation (Disk)
         ️



        Dynamic Voltage and Frequency Scaling
    ️


              Adapting device performances to application needs
         ️


              P­States, OPP
         ️



    Main focus on x86 arch



              custom and different PM development for SoC embedded devices
         ️


              2nd Linux PM Summit was only on 2007
         ️


              increasing emphasis on embedded systems
         ️
Ve
  nk
    at
      es
        h 
          Pa
            l   lip
                   ad
                     i
                                    C




                         2.6.9
                                  G PU
                                   ov Fr
                                     er eq
                                       no  O
                                         r  nD
       Jo                                     em
         hn                                     an
            S
                                                                                            Timeline



                                                  d 
             tu
               l   tz
                                   G
                                     en
                                          er
     In                                      ic
                                                 T
       go
                                   H              im
           M                         rti
                                         m            e­
             ol                                          of
                na
 Th                                         er             ­D
                  r                C           s 
   om                                                          ay
                                                 ba
                                     lo
      as                                             se
                                        ck
         G                                               ­
                                             S
           le                                  ou cod
                                   G
             ix                                                e
                                                   rc
               ne
                         2.6.16      en
                  r                                   e 
                                          er            an
                                   C ic D                   d 
                                                              C
                                      lo           yn
                                         ck           am lock
                                             fr
Ve                                                                    E
                                                am ic
                                   D
  nk                                                          T        ve
                                      yn                        ic
                                                    ew
    at
                         2.6.21



                                                                         nt
                                                                   k        
      es                                  tic           or
                                                           k
                                              ks
        h 
          Pa
            l   lip
                   ad
                     i
                                    C
        M                            PU
                         2.6.24
                                                                                                       Linux frameworks to support PM




                                       Id
         ar
                                         l
            k                                e 
                 G                             fra
                  ro                              m
                    ss                             ew
                                    Q               or
                                                       k
                                     oS
                         2.6.25




                                        F
                                                                                            7




                                         ra
                                                                               April 2008




                                           m
     Li
       am                                   ew
          G                                    or
                                                  k
           rid
              w
                   oo
                     d
                                   R
                         2.6.2x




                                       eg
                                         ul
                                            at
                                              or
                                                s 
                                                  Fr
                                                    am
                                                      ew
                                                            or
                                                               k
                                                                                                                                        Patrick Bellasi bellasi@elet.polimi.it
Patrick Bellasi bellasi@elet.polimi.it



                 The Linux Time System
                 The old tick­based implementation                                8

                                                                                              Strongly associated with 
        Prior to 2.6.16
   
                                                                                               individual HW devices
                                                               Arch 1

                                                                                              HW
                                                                        Clock source
                                                              TOD

                                                                                              HW
                              Lake of a generic                      Clock event device
                                                              ISR
                              abstraction layer

                                                               Arch 2

                                  Timekeeping                                                 HW
                                                              TOD       Clock source

                                                                                              HW
                                      Tick                    ISR    Clock event device


                   Process acc.                                Arch 3
Clock related 
                                                                                              HW
                                                                        Clock source
                                                              TOD
  services                Profiling

                                                                                              HW
                                      Jiffies                        Clock event device
                                                              ISR


                                                Timer whell
                                                                     Time tracked using
                                                                    periodic timer ticks
Patrick Bellasi bellasi@elet.polimi.it



         The Linux Time System
         Towards a new implementation                            9


    Required abstractions



        Clock source (CS) management
    ️

                provide read access to monotonically increasing time values
            ️


                use human­oriented values (nanoseconds)
            ️



        Clock synchronization
    ️

                system time value and clock source drifts correction
            ️


                use reference time sources (e.g. NTP or GPS/GSM)
            ️



        Time­of­day representation
    ️

                applying drifts corrections to clock source
            ️



        Clock event device (CED) management
    ️

                Schedule next event interrupt using clock event device
            ️


                minimize arch dependent code, allow easy run­time addition of new CED
            ️



        Removing tick dependencies
    ️

                better support high resolution timers and precise timer scheduling
            ️


                by replace the CTW (Cascading Timer Wheel) mechanism
            ️
Patrick Bellasi bellasi@elet.polimi.it



             The Linux Time System
             hrtimers – a complementary time subsystem            10


    Why a new Timer Subsystem?



        CTW (1997) has unacceptable worst case performances
    ️


              Insertion is O(1), but cascading require interrupt disabling...
         ️


              CTW is the better solution for long­term protocol­timeout related 
         ️


              timers:
                    ­ expiration time within the first category
                ️


                    ­ removed before they expire or have to be cascaded
                ️



        is based on periodic interrupt (jiffies)
    ️


              support only low­resolution time values
         ️



    Propose solution: use two category for timers



              Timeouts – low­resolution, almost always removed
         ️


                    using the existing CTW
                ️


              Timers – high­resolution, usually expire
         ️


                    using the new hrtimers framework, based on human time units [ns], kept 
                ️


                    in per­CPU time sorted list implemented using red­black tree (O(log(N))
                ️
Patrick Bellasi bellasi@elet.polimi.it



              The Linux Time System
              hrtimers – a complementary time subsystem                      11


     Since 2.6.16
 


                                                                              Arch 1

Using human­based                   Better data structures to support                                                HW
                                                                                         Clock source
                                                                             TOD
   time units [ns]                       other timer code rework
                                     (e.g. high­resolution timers and 
                                                                                                                     HW
                                                                                     Clock event device
                                                                             ISR
                                               dynamic­tick)

                                                                              Arch 2
          hrtimers
                                                 Timekeeping                                                         HW
                                                                             TOD         Clock source

                                                                                                                     HW
                                                     Tick                    ISR     Clock event device


                                  Process acc.                                Arch 3
     Timer queues run from the 
                                                                                                                     HW
                                                                                         Clock source
                                                                             TOD
                                         Profiling
       normal timer softirq

                                                                                                                     HW
                                                     Jiffies                         Clock event device
                                                                             ISR
       Low­resolution jiffies 
          based timers
                                                               Timer whell
Patrick Bellasi bellasi@elet.polimi.it



             The Linux Time System [5]
              The Generic Time­of­Day (GTOD)                12


    Why a Generic Time of Day Implementation?



        allow sharing of clock source code across archs
    ️


              move large portion of code out of arch­specific areas
         ️


              limit arch­dependent code to HW interface and setup
         ️



        avoid interpolation based computation of human­time values
    ️


              compensate clock drifts as clock source offsets to get TOD
         ️


              previous implementation track compensated TOD directly and derive 
         ️


              increasing human­time values from it (error addition problem)
    Proposed solution: a new generic TOD framework



        presented by John Stultz at OLS 2005, merged in 2.6.16
    ️


              cleanup and simplify by arch­independent common code
         ️


              use nanoseconds as fundamental time unit
         ️


              modular design supporting run­time add/remove of time source
         ️


              break tick­based dependency to avoid interpolation issues
         ️
Patrick Bellasi bellasi@elet.polimi.it



           The Linux Time System
            The Generic Time­of­Day (GTOD)                                     13

                    Arch independent and dynamic
                      clock source management                     Human­based time units
                                                                time­of­day representation


               Shared HW              Clock Source
                                                                       Arch­dependent code limited 
                                                                        to direct HW interface and 
                                                                              setup procedure
                           Clock synch.
                                                                          Arch 2
hrtimers
                                      Timekeeping                                                             HW
                                                                        TOD         Clock source

                                                                                                              HW
                                            Tick                         ISR    Clock event device


                      Process acc.                                        Arch 3

                                                                                                              HW
                                                                                    Clock source
                                                                        TOD
                                Profiling

                                                                                                              HW
                                            Jiffies                             Clock event device
                                                                         ISR


                                                      Timer whell
Patrick Bellasi bellasi@elet.polimi.it



           The Linux Time System
            The Generic Time­of­Day (GTOD)                                      14

                    Arch independent and dynamic
                      clock source management                      Human­based time units
                                                                 time­of­day representation


               Shared HW              Clock Source
                                                                        Arch­dependent code limited 
                                                                         to direct HW interface and 
                                                                               setup procedure
                           Clock synch.               TOD
                                                                           Arch 2
hrtimers
                                      Timekeeping                                                              HW

                                                                                                               HW
                                            Tick                          ISR    Clock event device


                      Process acc.                                         Arch 3

                                                                                                               HW
                                Profiling

                                                                                                               HW
                                            Jiffies                              Clock event device
                                                                          ISR


                                                       Timer whell
Patrick Bellasi bellasi@elet.polimi.it



             The Linux Time System [6]
             The Clock Event Device Abstraction                       15


    Why a Clock Event Device Abstraction?



        provide an abstraction level for timekeeping and time­related 
    ️


        activities
              substantial reduction in arch­dependent code
         ️


              support either periodic or individual programmable events
         ️


              build the base for a generic dynamic tick implementation
         ️



    Proposed solution: by Thomas Gleixner, merged in 2.6.21



              ­ Registration interface: run­time configuration of clock event device
         ️


                    property based definition (e.g. capabilities, max/min delta, mult/shift)
                ️


                    support generic interrupt setup code (i.e. call­back based handlers)
                ️


                    support for power­management
                ️


              ­ Event distribution interface: bind clock events related services to 
         ️


              clock event sources
                    classical time services (e.g. process accounting, profiling, periodic tick)
                ️


                    next event interrupt (e.g. high­resolution timers, dynamic tick)
                ️
Patrick Bellasi bellasi@elet.polimi.it



             The Linux Time System [6]
             High Resolution Timers Support                        16


    Why high­resolution timers?



        exploit brand­new time subsystems reworks (GTOD and 
    ️

        hrtimers)
            require just arch­independent code by modifying hrtimers framework
          ️



        accurate delivery of high­resolution timers (e.g. itimers and 
    ️

        POSIX)
    Proposed solution:



        presented by Thomas Gleixner at OLS 2006
    ️


              switch to high­resolution mode late in the boot process
         ️


                    exploiting clock source and clock event device dynamic registration
                ️


                    ensure kernel compatibility on non high­resolution platforms
                ️


              support next event scheduling and next event interrupt handler
         ️


                    softirq based execution of hrtimer queues, independent from tick bound
                ️


                    callbacks from next­event handler (with some limitations, e.g. for 
                ️


                    nanosleep)
                ️
Patrick Bellasi bellasi@elet.polimi.it



             The Linux Time System [7]
              The Dynamic Tick Support                                17


    Why a Tick­less Kernel?



        processors are quite good about saving power when idle
    ️


              HW has support to reduce leakage on idle states
         ️



        avoid processor wakeups when nothing is happening
    ️


              by turning off the period timer tick when in idle state
         ️


              looking at the timer queue to see when the next timer expires
         ️


                    CPUIdle can exploit this information
                ️


                    in the absence of other events (e.g. hardware interrupts), the system will 
                ️


                    sleep until the nearest timer is due
              enable the periodic tick once the processor goes out of the idle state
         ️



    Proposed solution: merged in 2.6.21



              room for further development (i.e. full tickless systems)
         ️


                    time slice is controlled by the scheduler, variable frequency profiling, and 
                ️


                    a complete removal of jiffies in the future
Patrick Bellasi bellasi@elet.polimi.it



             The Linux Time System
             Deferrable Timers                                      18


    Why deferrable timers?



        better exploit dynamic­tick, by allowing to sleep longer
    ️


              not all timers has to run as soon as the requested period has expired
         ️


              non­critical timeouts can run some fraction of a second later
         ️


                    i.e. when the processor wakes up for other reasons
                ️


                               Intr




                                            250        500
                                      0                      [ms]


    Proposed solution: new function added to the internal 



    kernel API 
              init_timer_deferrable(struct timer_list *timer)
         ️


              timers defined as deferrable are recognized by the kernel
         ️


              will not be considered when the kernel makes its quot;when should the 
         ️


              next timer interrupt be?quot; decision
              timer_list, workqueue, 
         ️
Patrick Bellasi bellasi@elet.polimi.it



                       The Linux Time System
                       Deferrable Timers                                    19


    Why deferrable timers?



           better exploit dynamic­tick, by allowing to sleep longer
    ️


                        not all timers has to run as soon as the requested period has expired
                   ️


                        non­critical timeouts can run some fraction of a second later
                   ️


                              i.e. when the processor wakes up for other reasons
                          ️


        Intr                                                  Intr




                                 250        500                            250
               0                                                     0                        500
                                                  [ms]                                                   [ms]


    Proposed solution: new function added to the internal 



    kernel API 
                        init_timer_deferrable(struct timer_list *timer)
                   ️


                        timers defined as deferrable are recognized by the kernel
                   ️


                        will not be considered when the kernel makes its quot;when should the 
                   ️


                        next timer interrupt be?quot; decision
                        timer_list, workqueue, 
                   ️
Patrick Bellasi bellasi@elet.polimi.it



              Linux Timekeeping Subsystem
              Complete support for RT and PM                                20

Shared HW              Clock Source


             Clock synch.        TOD
                                                                                 Arch 1

                                                                                                   HW
                       Timekeeping
  hrtimers

                        Shared HW               Clock events                                       HW

 Next event                                                   ISR
                                                                                 Arch 2
                       Dynamic tick
                                                                                                   HW
                                           Event distribution
                                                                                                   HW
                                 Process acc.
                                                                                 Arch 3
                                        Profiling
                                                                                                   HW
                                                    Jiffies
                                                                                      ISR          HW
                                                              Timer whell
Patrick Bellasi bellasi@elet.polimi.it


              The Clock Framework
              centralized control for all clock related 
                                                                  21
              functionalities

    Define an (abstract) API for all clock related functionalities



               dependencies (clock tree), rates (get/set), status (enable/disable)
          ️



    Since 2.6.16



         standard open source solution
     ️


               Initially added for ARM, now multiple architectures use it
          ️


                     In 2.6.25: ARM (OMAP, PXA, SA1100, AT91, ...), Avr32, PowerPC, ...
                 ️


               recent proposals on LKML for a generic clocklib implementation
          ️



    Used by device drivers



         reference counting
     ️


         re­entrant code
     ️


    May support layered structure



         platform generic layer and board specific low­level
     ️
Patrick Bellasi bellasi@elet.polimi.it



              Power/Voltage Framework
              track device's power dependencies           22


    provide a framework analogues to the clock framework but 



    for power/voltage regulators control
         voltage regulators dependency tracking
     ️


         satisfy client devices needs depending on their state
     ️


               optimize generators usage and efficiency
          ️



    no support in mainline, but different patch­set available



         ARM patch by Nokia (no too much general)
     ️


         Platform independent framework by Wolfson Microelectronics
     ️


     ️
Patrick Bellasi bellasi@elet.polimi.it


            Power/Voltage Framework
            Wolfson Microelectronics ­ Linux Voltage and 
                                                                           23
            Current Regulator Framework [2]

    Announced on LKML in Feb 2008, presented on ELC 2008



             Standard and generic kernel interfaces to dynamically control voltage 
        ️


             regulators and current sinks
    Better exploit regulators dynamics



             e.g. consumer with 10mA load:
        ️

                                        100
             70% @ Normal =~ 13mA
        ️
                                         90
             90% @ Sleep =~ 11mA
        ️




                                        Efficiency (%)
                                         80                   idle mode
                                                         70
        ️
                                                                                         normal mode
                                                         60
                      Saving ~2mA
        ️
                                                         50
    ️
                                                         40
                                                         30
    ️
                                                         20
                                                         10
                                                          0
                                                                1         10          100                1000
                                                                               Load (mA)
Patrick Bellasi bellasi@elet.polimi.it


             Power/Voltage Framework
             Wolfson Microelectronics – Real  World 
                                                               24
             Examples

    Some real world examples:



        CPUFreq: voltage control matching operating frequency
    ️


        CPUIdle: voltage control matching idle state
    ️


        LCD back lighting: control via white LED current reduction
    ️


        Audio: analog supply control, components control
    ️

                    FM­Tuner, Speaker Amplifier when using Headphone
                ️



        NAND & NOR: idle power control
    ️


              e.g. R/W=35mA, Erase=40mA, Erase+R/W=55mA, Idle=1mA
         ️


    ️


    ️
Patrick Bellasi bellasi@elet.polimi.it


        Power/Voltage Framework
        Wolfson Microelectronics ­ Linux Voltage and 
                                                                  25
        Current Regulator Framework [2]

    Four separate interfaces



        Regulator driver API
           Allows regulator drivers to register and provide operations to the core
           Notifier call chain for propagating regulator events to clients
        Consumer (Client) driver API
           complete control over their supply voltage and current
               ­ static clients: only enable/disable support
               ­ dynamic clients: voltage/current limit control
           notify client V/I requirements to regulator driver
        Platform API
           creation of voltage/current domains (with constraints) for each regulator
           creation of a regulator tree
        Userspace API
           exports a lot of useful voltage/current/opmode data to userspace via sysfs
           help monitor device power consumption and status
Patrick Bellasi bellasi@elet.polimi.it


                  The Latency Framework
                  Explicit system­wide latency­expectation 
                                                               26
                  infrastructure

    include/linux/latency.h





              The system can tune its operations to the minimum latency 
         ️


                        requirements in effect at the moment

             multiple drivers can announce their maximum accepted 
     ️


             latency
                   drivers know device operating modes at any given moment and the 
              ️


                   corresponding expected system response time
             collect and summarize these expectations globally
     ️


                   cumulated result can be used by power management and similar 
              ️


                   users
                   to make decisions that have trade­offs
              ️
Patrick Bellasi bellasi@elet.polimi.it


              The Latency Framework
              Explicit system­wide latency­expectation 
                                                             27
              infrastructure

    Since 2.6.19, an interface where drivers can:



               ­ announce the maximum latency [us] that they can deal with
          ️


               ­ modify this latency
          ️


               ­ give up their constraint
          ️


               ­ a function where the code that decides on power saving strategy 
          ️


               can query the current global desired maximum
               ­ a notifier chain allow interested subsystems know when the 
          ️


               maximum acceptable latency has changed
         Currently used by the generic cpuidle driver on x86 arch
     ️
Patrick Bellasi bellasi@elet.polimi.it


             The Latency Framework
             Explicit system­wide latency­expectation 
                                                                                28
             infrastructure

    Example:



        ­ user: idle loop code
    ️


              higher C­state saves more power, but has a higher exit latency
         ️


              idle loop code use these informations to make a good decision which 
         ️


              C­state to use
        ­ announcer: audio driver
    ️


              knowns it will get an interrupt when the hardware has 200 usec of 
         ️


              samples left in the DMA buffer
              set a latency constraint of, say, 150 usec          Register notifier
         ️



                                                              user interface
                         Latency Framework Core                                Idle Loop
                           Announcer interface

                                     ...
                     Audio Drv                     WiFi Drv

                                                                                  Update latency 
                                 Assert latency                                     constraint
                                   constraint
Patrick Bellasi bellasi@elet.polimi.it



          LatencyTOP
          measuring and fixing Linux latency        29


    tool for software developers (both kernel and userspace), 



        identifying where in the system latency is happening
    ️


        what kind of operation/action is causing the latency to 
    ️


        happen
        both at system level or on a per process level
    ️


    focuses on the cases



        the applications want to run and execute useful code, but 
    ️


        there's some resource that's not currently available (and the 
        kernel then blocks the process)
    Kernel patch needed



        keep track of what high level operation it is doing
    ️


        limited number of annotations
    ️


        output on procfs or using a ncurses based GUI
    ️


    ️
Patrick Bellasi bellasi@elet.polimi.it



             LatencyTOP
             measuring and fixing Linux latency                30

              Here's some output of LatencyTOP, collected for a make ­j4 of a 
         ️


              kernel on a quad core system
Cause                            Maximum [msec]   Average [msec]
process fork                         1097.7            2.5
Reading from file                    1097.0            0.1
updating atime                        850.4           60.1
Locking buffer head                   433.1           94.3
Writing to file                       381.8            0.6
Synchronous bufferhead read           318.5           16.3
Waiting for buffer IO                 298.8            7.8                      With a small change to 
                                                                                 the EXT3 journaling 
                                                                                 layer to fix a priority 
                                                                                   inversion problem
Cause                            Maximum [msec]   Average [msec]
Writing to file                       814.9            0.8
Reading from file                     441.1            0.1
Waiting for buffer IO                 419.0            3.4
Locking buffer head                   360.5           75.7
Unlinking file                        292.7            5.9
EXT3 (jbd) do_get_write_access        274.0           36.0
Filename lookup                       260.0            0.5
Patrick Bellasi bellasi@elet.polimi.it



                 Quality of Service Framework [4]
                 the latency framework reloaded                   31


    linux/pm_qos_params.h





            Kernel infrastructure to implement a coordination mechanism 
        ️


             to facilitate communication among devices (drivers), users 
                       (applications) and system (power manager)

            devices have specific power management capabilities
    ️


                  devices talk in terms of latencies, time outs, throughput
             ️


                  drivers could better address power management
             ️


                  expose power management mechanisms
             ️



            applications have specific performances needs
    ️


    Since 2.6.25



            work­on­progress, used for iwl4965 WiFi driver on x86
    ️
Patrick Bellasi bellasi@elet.polimi.it



              Quality of Service Framework
              the latency framework reloaded                              32


    Provide infrastructure for the registration of:



         Dependents: register their QoS needs
     ️


               e.g. 
          ️


         Watchers: keep track of the current QoS needs of the system
     ️


               e.g. cpuidle, wifi drivers, ...
          ️



    3 basic classes of QoS parameter



               latency [us], timeout [us], throughput [kbs]
          ️


                     platform customizable constraints set
                 ️


                           Interrupt latency, power domain latency, frequency/OPP
                       ️



               maintain lists of pm_qos requests and aggregated requirements
          ️


               kernel notification tree for each parameter
          ️



    User mode interface for request QoS



               using simple char device node
          ️


     ️


     ️
Patrick Bellasi bellasi@elet.polimi.it



            The CPUIdle Framework [1]
            Do nothing, efficiently...                             33


    Generic processor idle power management framework



             support for multiple idle states with different characteristics
        ️


                   power consumptions, state preservation, state constraints, entry/exit 
               ️


                   latency
    Support trade­off efficiently between application 



    expectations and idle state power saving
    Clean interface



             abstract between idle­driver and idle­governor
        ️


                   separate arch­specific idle driver (mechanisms) from arch­independent 
               ️


                   power management governors  (policies)
             provide a convenient user­space interface
        ️



    Since 2.6.24



             X86 ­ ACPI, for OMAP mapping to ACPI states
        ️
Patrick Bellasi bellasi@elet.polimi.it



             The CPUIdle Framework [1]
             Do nothing, efficiently...                                             34

                /sys/devices/system/cpu/cpuX/cpuidle
                                                                     struct cpuidle_governor {
User­level
                  /sys/devices/system/cpu/cpuidle                           init();
interfaces                                                                  exit();                           Decide the target
                                                                                                                  C­State
                                                                            scan();
                                                                            select();
                Step­wise         Latency­based
                                                                            reflect();
governors         ladder                  menu
                                                                     }
                     governor interface
                                                                                 data structures
                  Generic CPUIdle Infrastructure
                                                                                 initialization and registration
                                                                                 idle handling
                       driver interface                     cpuidle core
                                                                                 system state change handling
               acpi­cpuidle          halt_idle
drivers
                                                                                              struct cpuidle_state {
                   ACPI driver                                                                       exit_latency;             [us]
                                                                   Populate supported
                                                                       C­States                      power_usage;              [mW]
                                                                                                     target_residency;         [us]
                                                       struct cpuidle_driver {
                                                                                                     usage;
                                                              init();
                                                                                                     time;                     [us]
                                                              exit();
                                                                                                     enter();
                                                              redetect();
                                                                                              }
                                                              bm_check();
                                                       }
                                                                                           Implement functions
                                                                                            to enter C­States
Patrick Bellasi bellasi@elet.polimi.it



        The CPUIdle Framework [1]
        The OMAP43xx implementation                              35


    The following C states are supported in Cpuidle driver:


                   C0  – System executing code
              ️


                   C1  – MPU WFI + Core active + No tick suppression
              ️


                   C2  – MPU WFI + Core active + Tick suppression
              ️


                   C3  – MPU CSWR + Core active + Tick suppression
              ️


                   C4  – MPU OFF + Core active + Tick suppression
              ️


                   C5  – MPU RET + Core RET + Tick suppression
              ️


                   C6  – MPU OFF + Core RET + Tick suppression
              ️


                   C7  – MPU OFF + Core OFF + Tick suppression
              ️



    Menu governor takes the following into account to decide 



    the target sleep state:
                  Next timer expiry in the system
              ️


                  Comparing target residency with available sleep time
              ️


                  Comparing exit latency with system wide latency constraints
              ️


                  Checking for activity in the core domain
              ️



    Dynamic tick based on support in kernel

Patrick Bellasi bellasi@elet.polimi.it



             PowerTOP
             measuring and fixing Linux tick­less systems           36


    identifying what is causing system wake­ups



              what kind of operation/action prevent long­time permanence in low­
         ️


              power consumption states?
              support software developers, both kernel­space and user­space
         ️



        main features
    ️


              show how well your system is using various HW power­saving 
         ️


              features
                    both C­ and P­States statistics are collected
                ️


              show you the culprit software components that are preventing optimal 
         ️


              usage of your HW power savings
                    userspace should prefere deferrable timers
                ️


              provide you with tuning suggestions to achieve low power 
         ️


              consumption
    support for CPUIdle since v1.10



              previsou version used the ACPI interface
         ️


         ️
Patrick Bellasi bellasi@elet.polimi.it



PowerTOP
example session running on OMAP3430   37
Patrick Bellasi bellasi@elet.polimi.it



            The CPUFreq Framework [9]
            CPU active power consumption optimization                           38


    Generic processor active power management framework



             support for multiple performance states with different characteristics
        ️


                   power consumptions
               ️



    Clean interface



             abstract between driver and governor
        ️


                   separate arch­specific cpu driver (mechanisms) from arch­independent 
               ️


                   power management governors  (policies)
             provide a convenient user­space interface
        ️



    Since 2.6.9                                                       [W]                     0.25s*34W+ 0.75s*1W = 9.25J

                                                          Full­speed 34

             on­demand governor
        ️                                                 Half­speed 24
                                                                                                    0.5s*24W+ 0.5s*1W = 12.5J


                   scaling based on load
               ️

                                                               Idle    1
                   exploit race­to­idle
               ️
                                                                            0                0.5                   1            [s]

             more recently added support for conservative governor
        ️


                   implementing a better battery­fair policy
               ️
Patrick Bellasi bellasi@elet.polimi.it



               CPUFreq
               optimized CPU power consumptions                                                39

                                                                                                                          Decide the target
                                                                                                                              P­State
                                                                                struct cpufreq_governor {
 User­level          powersaved           cpuspeed                                     governor();
 governors                                                                      }

               aggressive                          battery­fair
 In­kernel      on­demand        userspace       conservative
 governors                  governor interface
                                                                                            data structures
                      Generic CPUFreq Framework
                                                                                            initialization and registration
                                                                                            transition handling
                             driver interface                          cpufreq core
                                                                                            policy and transition notifiers
                  acpi­cpufreq             speedstep
CPU­specific
  drivers                                                                                                 struct cpufreq_policy {
                             ACPI processor                                                                      min_freq;           [kHz]
                                                                              Define supported
                                 driver                                        policy values                     max_freq;           [kHz]
                                                                                                                 transition_latency; [us]
                                                                  struct cpufreq_driver {
                                                                                                                 ...
                                                                         init();
                                                                                                          }
                                                                         exit();
                                                                         verify();                                 Compile frequency
                                                                         set_policy();                                  tables
                                                                         resume();
                                                                  }
Patrick Bellasi bellasi@elet.polimi.it



             Linux PM
             device runtime suspend/resume support   40


    Every driver should implement suspend/resume calls



        register to the LDM (Linux Device Model)
    ️


        release clocks and save context in suspend call and restore 
    ️


        these when resume is called
        drivers which have already released their clocks and have 
    ️


        saved their context need not do anything in their suspend call
    Drivers should support OFF modes



        all registers in the power domain are reset when the power 
    ️


        domain goes to OFF
        OFF mode could introduce considerable latency for wakeup
    ️


        the system can enter chip off through two paths:
    ️


              ­ Idle loop
         ️


              ­ Suspend/Resume
         ️
Patrick Bellasi bellasi@elet.polimi.it



              Linux PM
              device runtime suspend/resume support             41


    Device drivers responsibilities



         aggressively manage request/release of clocks
     ️


               through clock framework => control their clocks on a request basis
          ️


               not transaction based => use  inactivity timer to cut off clocks after a 
          ️


               period of inactivity
         need to register with the LDM
     ️


               implement suspend() and resume() calls
          ️



         specify constraints when required for its functionalities
     ️


               e.g. min. frequency, max latency, ...
          ️



         need to implement context save/restore
     ️


               be aware of power OFF mode
          ️



         configure optimal power settings
     ️


               according to standing system constraints
          ️


     ️
Patrick Bellasi bellasi@elet.polimi.it



          Linux PM
          Examples of changes being made                     42

    Transaction based drivers will control clocks based on activity



          e.g. i2c driver enables clocks when there are pending requests and 
      ️


          disables them when there are no pending requests
    Camera – Clocks will be enabled as long as the driver is required



    Display – The fbdev inactivity timer (which is tied to user activity) can be used 



    to turn off display clock
    MMC – Clocks are controlled on a per command basis



    GPTimer – Clocks are controlled as per requirement



          i.e. when a timer is in use, they will be enabled and will be disabled when 
        ️


          they are not in use
    UART – Console clocks are cut in the idle loop (before putting core domain to 



    retention) and other UART clocks could be controlled on a need basis
    USB – Clocks can be controlled as per requirement (only when transfers are 



    going on)
Patrick Bellasi bellasi@elet.polimi.it



            Linux PM
            Context save and restore                                43


    When all drivers in a power domain release their clocks, 



    the power domain can go to RET or OFF state
             the shared resource framework programs the power domain to target 
        ️


             state depending on the latency constraints in the system
    Drivers can follow any of the following methods to save and 



    restore their context
             ­ Always save/ always restore
        ️


                   drivers which do not have lots of registers can always save context and 
               ️


                   restore context because it will not cause a lot of overhead
             ­ Early save/ restore on demand
        ️


                   drivers save context every time they release their clocks but restore it 
               ️


                   only if the power domain has actually gone to off after saving
                   this makes sense for drivers which have a large restore time with save 
               ️


                   time being minimal
Patrick Bellasi bellasi@elet.polimi.it




          References
                                                                    44

1. V. Pallipadi, S. Li, A. Belay, cpuidle ­ Do nothing, efficiently... Proceedings of the Linux 
    Symposium, Vol. 2, June 2007. Ottawa, Ontario Canada.
2. Walfson Microelectronics, Linux Voltage and Current Regulator Framework
3. An API for specifying latency constraints, LWN Article, August 2006.
4. M. Gross, Power Management QoS: How You Could Use it  in Your Embedded Application, Intel, 
    Open Source Technology Center.
5. J. Stults, N. Aravamuda, D. Hart, 
    We Are Not Getting Any Younger: A New Approach to Time and Timers, Proceedings of the 
    Linux Symposium, Vol. 1, July 2005. Ottawa, Ontario Canada.
6. T. Gleixner, D. Niehaus, Hrtimers and Beyond: Transforming the Linux Time Subsystems, 
    Proceedings of the Linux Symposium, Vol. 1, July 2006. Ottawa, Ontario Canada.
7. Clockevents and dyntick, LWN Article, February 2007.
8. Deferrable timers, LWN Article, March 2007.
9. V. Pallipandi, A. Starikovskiy, The Ondemand Governor, Proceedings of the Linux Symposium, 
    Vol. 2, July 2006. Ottawa, Ontario Canada.
10. R. Woodruff, Linux system power management on OMAP3430, Embedded Linux Conference, 
    April 2008. Silicon Valley, US.
11. PowerTOP, 

Más contenido relacionado

La actualidad más candente

Dell UPS Management Software Networking Examples
Dell UPS Management Software Networking ExamplesDell UPS Management Software Networking Examples
Dell UPS Management Software Networking ExamplesDell TechCenter
 
Dell UPS | Power Management in the Virtualized World
Dell UPS | Power Management in the Virtualized WorldDell UPS | Power Management in the Virtualized World
Dell UPS | Power Management in the Virtualized WorldDell TechCenter
 
Ticket to Ride - Bus Fleet Operated and Managed with OSGi - C Larsson
Ticket to Ride - Bus Fleet Operated and Managed with OSGi - C LarssonTicket to Ride - Bus Fleet Operated and Managed with OSGi - C Larsson
Ticket to Ride - Bus Fleet Operated and Managed with OSGi - C Larssonmfrancis
 
E zcall web access client sales doc
E zcall web access client sales docE zcall web access client sales doc
E zcall web access client sales docQBsoft Solutions
 
Meeting SEP 2.0 Compliance: Developing Power Aware Embedded Systems for the M...
Meeting SEP 2.0 Compliance: Developing Power Aware Embedded Systems for the M...Meeting SEP 2.0 Compliance: Developing Power Aware Embedded Systems for the M...
Meeting SEP 2.0 Compliance: Developing Power Aware Embedded Systems for the M...mentoresd
 
Eco4Cloud - Company Presentation
Eco4Cloud - Company PresentationEco4Cloud - Company Presentation
Eco4Cloud - Company PresentationEco4Cloud
 
AMD PowerTune & ZeroCore Power Technologies
AMD PowerTune & ZeroCore Power TechnologiesAMD PowerTune & ZeroCore Power Technologies
AMD PowerTune & ZeroCore Power TechnologiesAMD
 
OpenEye Client Software Training
OpenEye Client Software TrainingOpenEye Client Software Training
OpenEye Client Software Trainingopeneyevideo
 
E Plex Presentation Piece 6 2012
E Plex Presentation Piece 6 2012E Plex Presentation Piece 6 2012
E Plex Presentation Piece 6 2012Bass Products
 
MENTOR MP/QUANTUM MP
MENTOR MP/QUANTUM MPMENTOR MP/QUANTUM MP
MENTOR MP/QUANTUM MPArve
 
CPN208 Failures at Scale & How to Ride Through Them - AWS re: Invent 2012
CPN208 Failures at Scale & How to Ride Through Them - AWS re: Invent 2012CPN208 Failures at Scale & How to Ride Through Them - AWS re: Invent 2012
CPN208 Failures at Scale & How to Ride Through Them - AWS re: Invent 2012Amazon Web Services
 
CarDAQ Plus Manual from Clark Heintz Tools & Equipment LLC
CarDAQ Plus Manual from Clark Heintz Tools & Equipment LLCCarDAQ Plus Manual from Clark Heintz Tools & Equipment LLC
CarDAQ Plus Manual from Clark Heintz Tools & Equipment LLCClark Heintz
 
Vampire Mice: How USB PM Impacts You
Vampire Mice: How USB PM Impacts YouVampire Mice: How USB PM Impacts You
Vampire Mice: How USB PM Impacts YouSage Sharp
 
Benefits of Using FPGAs for Embedded Processing: Embedded World 2010
Benefits of Using FPGAs for Embedded Processing: Embedded World 2010Benefits of Using FPGAs for Embedded Processing: Embedded World 2010
Benefits of Using FPGAs for Embedded Processing: Embedded World 2010Altera Corporation
 
Cots moves to multicore: Wind River
Cots moves to multicore: Wind RiverCots moves to multicore: Wind River
Cots moves to multicore: Wind RiverKonrad Witte
 
Single Console for viewing OpManager & DeviceExpert Alarms
Single Console for viewing OpManager & DeviceExpert AlarmsSingle Console for viewing OpManager & DeviceExpert Alarms
Single Console for viewing OpManager & DeviceExpert AlarmsManageEngine, Zoho Corporation
 

La actualidad más candente (20)

Dell UPS Management Software Networking Examples
Dell UPS Management Software Networking ExamplesDell UPS Management Software Networking Examples
Dell UPS Management Software Networking Examples
 
Dell UPS | Power Management in the Virtualized World
Dell UPS | Power Management in the Virtualized WorldDell UPS | Power Management in the Virtualized World
Dell UPS | Power Management in the Virtualized World
 
Ticket to Ride - Bus Fleet Operated and Managed with OSGi - C Larsson
Ticket to Ride - Bus Fleet Operated and Managed with OSGi - C LarssonTicket to Ride - Bus Fleet Operated and Managed with OSGi - C Larsson
Ticket to Ride - Bus Fleet Operated and Managed with OSGi - C Larsson
 
E zcall web access client sales doc
E zcall web access client sales docE zcall web access client sales doc
E zcall web access client sales doc
 
Meeting SEP 2.0 Compliance: Developing Power Aware Embedded Systems for the M...
Meeting SEP 2.0 Compliance: Developing Power Aware Embedded Systems for the M...Meeting SEP 2.0 Compliance: Developing Power Aware Embedded Systems for the M...
Meeting SEP 2.0 Compliance: Developing Power Aware Embedded Systems for the M...
 
Eco4Cloud - Company Presentation
Eco4Cloud - Company PresentationEco4Cloud - Company Presentation
Eco4Cloud - Company Presentation
 
AMD PowerTune & ZeroCore Power Technologies
AMD PowerTune & ZeroCore Power TechnologiesAMD PowerTune & ZeroCore Power Technologies
AMD PowerTune & ZeroCore Power Technologies
 
OpenEye Client Software Training
OpenEye Client Software TrainingOpenEye Client Software Training
OpenEye Client Software Training
 
Blackfin Device Drivers
Blackfin Device DriversBlackfin Device Drivers
Blackfin Device Drivers
 
ARM and SoC Traning Part I -- Overview
ARM and SoC Traning Part I -- OverviewARM and SoC Traning Part I -- Overview
ARM and SoC Traning Part I -- Overview
 
Minimizing I/O Latency in Xen-ARM
Minimizing I/O Latency in Xen-ARMMinimizing I/O Latency in Xen-ARM
Minimizing I/O Latency in Xen-ARM
 
E Plex Presentation Piece 6 2012
E Plex Presentation Piece 6 2012E Plex Presentation Piece 6 2012
E Plex Presentation Piece 6 2012
 
MENTOR MP/QUANTUM MP
MENTOR MP/QUANTUM MPMENTOR MP/QUANTUM MP
MENTOR MP/QUANTUM MP
 
CPN208 Failures at Scale & How to Ride Through Them - AWS re: Invent 2012
CPN208 Failures at Scale & How to Ride Through Them - AWS re: Invent 2012CPN208 Failures at Scale & How to Ride Through Them - AWS re: Invent 2012
CPN208 Failures at Scale & How to Ride Through Them - AWS re: Invent 2012
 
CarDAQ Plus Manual from Clark Heintz Tools & Equipment LLC
CarDAQ Plus Manual from Clark Heintz Tools & Equipment LLCCarDAQ Plus Manual from Clark Heintz Tools & Equipment LLC
CarDAQ Plus Manual from Clark Heintz Tools & Equipment LLC
 
Vampire Mice: How USB PM Impacts You
Vampire Mice: How USB PM Impacts YouVampire Mice: How USB PM Impacts You
Vampire Mice: How USB PM Impacts You
 
Sharam salamian
Sharam salamianSharam salamian
Sharam salamian
 
Benefits of Using FPGAs for Embedded Processing: Embedded World 2010
Benefits of Using FPGAs for Embedded Processing: Embedded World 2010Benefits of Using FPGAs for Embedded Processing: Embedded World 2010
Benefits of Using FPGAs for Embedded Processing: Embedded World 2010
 
Cots moves to multicore: Wind River
Cots moves to multicore: Wind RiverCots moves to multicore: Wind River
Cots moves to multicore: Wind River
 
Single Console for viewing OpManager & DeviceExpert Alarms
Single Console for viewing OpManager & DeviceExpert AlarmsSingle Console for viewing OpManager & DeviceExpert Alarms
Single Console for viewing OpManager & DeviceExpert Alarms
 

Destacado

Linaro Connect 2016 (BKK16) - Introduction to LISA
Linaro Connect 2016 (BKK16) - Introduction to LISALinaro Connect 2016 (BKK16) - Introduction to LISA
Linaro Connect 2016 (BKK16) - Introduction to LISAPatrick Bellasi
 
Embedded Systems Power Management
Embedded Systems Power ManagementEmbedded Systems Power Management
Embedded Systems Power ManagementPatrick Bellasi
 
VTTEK _ Bảng mô tả vị trí tuyển dụng
VTTEK _ Bảng mô tả vị trí tuyển dụngVTTEK _ Bảng mô tả vị trí tuyển dụng
VTTEK _ Bảng mô tả vị trí tuyển dụngDang Xuan Trung
 
Constrained Power Management
Constrained Power ManagementConstrained Power Management
Constrained Power ManagementPatrick Bellasi
 
Evoluzione dei Sistemi Embedded: Verso architetture multi-core
Evoluzione dei Sistemi Embedded: Verso architetture multi-coreEvoluzione dei Sistemi Embedded: Verso architetture multi-core
Evoluzione dei Sistemi Embedded: Verso architetture multi-corePatrick Bellasi
 
Cross-Layer Frameworks for Constrained Power and Resources Management of Embe...
Cross-Layer Frameworks for Constrained Power and Resources Management of Embe...Cross-Layer Frameworks for Constrained Power and Resources Management of Embe...
Cross-Layer Frameworks for Constrained Power and Resources Management of Embe...Patrick Bellasi
 
LCA13: Power State Coordination Interface
LCA13: Power State Coordination InterfaceLCA13: Power State Coordination Interface
LCA13: Power State Coordination InterfaceLinaro
 
U boot source clean up project how-to
U boot source clean up project how-toU boot source clean up project how-to
U boot source clean up project how-toMacpaul Lin
 
U boot 程式碼打掃計畫
U boot 程式碼打掃計畫U boot 程式碼打掃計畫
U boot 程式碼打掃計畫Macpaul Lin
 
How to build a community in a company blue&macpaul coscup2015
How to build a community in a company blue&macpaul coscup2015How to build a community in a company blue&macpaul coscup2015
How to build a community in a company blue&macpaul coscup2015Macpaul Lin
 
Bootstrap process of u boot (NDS32 RISC CPU)
Bootstrap process of u boot (NDS32 RISC CPU)Bootstrap process of u boot (NDS32 RISC CPU)
Bootstrap process of u boot (NDS32 RISC CPU)Macpaul Lin
 
Presentation linux on power
Presentation   linux on powerPresentation   linux on power
Presentation linux on powersolarisyougood
 
USB Specification 2.0 - Chapter 9 - Device Framework
USB Specification 2.0 - Chapter 9 - Device FrameworkUSB Specification 2.0 - Chapter 9 - Device Framework
USB Specification 2.0 - Chapter 9 - Device FrameworkMacpaul Lin
 
U boot porting guide for SoC
U boot porting guide for SoCU boot porting guide for SoC
U boot porting guide for SoCMacpaul Lin
 
Linux Porting to a Custom Board
Linux Porting to a Custom BoardLinux Porting to a Custom Board
Linux Porting to a Custom BoardPatrick Bellasi
 
Exploiting Linux Control Groups for Effective Run-time Resource Management
Exploiting Linux Control Groups for Effective Run-time Resource ManagementExploiting Linux Control Groups for Effective Run-time Resource Management
Exploiting Linux Control Groups for Effective Run-time Resource ManagementPatrick Bellasi
 
The linux networking architecture
The linux networking architectureThe linux networking architecture
The linux networking architecturehugo lu
 
Linux architecture
Linux architectureLinux architecture
Linux architecturemcganesh
 
Distributed computing ).ppt him
Distributed computing ).ppt himDistributed computing ).ppt him
Distributed computing ).ppt himHimanshu Saini
 

Destacado (20)

Linaro Connect 2016 (BKK16) - Introduction to LISA
Linaro Connect 2016 (BKK16) - Introduction to LISALinaro Connect 2016 (BKK16) - Introduction to LISA
Linaro Connect 2016 (BKK16) - Introduction to LISA
 
Embedded Systems Power Management
Embedded Systems Power ManagementEmbedded Systems Power Management
Embedded Systems Power Management
 
VTTEK _ Bảng mô tả vị trí tuyển dụng
VTTEK _ Bảng mô tả vị trí tuyển dụngVTTEK _ Bảng mô tả vị trí tuyển dụng
VTTEK _ Bảng mô tả vị trí tuyển dụng
 
Constrained Power Management
Constrained Power ManagementConstrained Power Management
Constrained Power Management
 
Evoluzione dei Sistemi Embedded: Verso architetture multi-core
Evoluzione dei Sistemi Embedded: Verso architetture multi-coreEvoluzione dei Sistemi Embedded: Verso architetture multi-core
Evoluzione dei Sistemi Embedded: Verso architetture multi-core
 
Cross-Layer Frameworks for Constrained Power and Resources Management of Embe...
Cross-Layer Frameworks for Constrained Power and Resources Management of Embe...Cross-Layer Frameworks for Constrained Power and Resources Management of Embe...
Cross-Layer Frameworks for Constrained Power and Resources Management of Embe...
 
LCA13: Power State Coordination Interface
LCA13: Power State Coordination InterfaceLCA13: Power State Coordination Interface
LCA13: Power State Coordination Interface
 
U boot source clean up project how-to
U boot source clean up project how-toU boot source clean up project how-to
U boot source clean up project how-to
 
U boot 程式碼打掃計畫
U boot 程式碼打掃計畫U boot 程式碼打掃計畫
U boot 程式碼打掃計畫
 
How to build a community in a company blue&macpaul coscup2015
How to build a community in a company blue&macpaul coscup2015How to build a community in a company blue&macpaul coscup2015
How to build a community in a company blue&macpaul coscup2015
 
Bootstrap process of u boot (NDS32 RISC CPU)
Bootstrap process of u boot (NDS32 RISC CPU)Bootstrap process of u boot (NDS32 RISC CPU)
Bootstrap process of u boot (NDS32 RISC CPU)
 
Presentation linux on power
Presentation   linux on powerPresentation   linux on power
Presentation linux on power
 
USB Specification 2.0 - Chapter 9 - Device Framework
USB Specification 2.0 - Chapter 9 - Device FrameworkUSB Specification 2.0 - Chapter 9 - Device Framework
USB Specification 2.0 - Chapter 9 - Device Framework
 
U boot porting guide for SoC
U boot porting guide for SoCU boot porting guide for SoC
U boot porting guide for SoC
 
Linux Porting to a Custom Board
Linux Porting to a Custom BoardLinux Porting to a Custom Board
Linux Porting to a Custom Board
 
Exploiting Linux Control Groups for Effective Run-time Resource Management
Exploiting Linux Control Groups for Effective Run-time Resource ManagementExploiting Linux Control Groups for Effective Run-time Resource Management
Exploiting Linux Control Groups for Effective Run-time Resource Management
 
Linux kernel architecture
Linux kernel architectureLinux kernel architecture
Linux kernel architecture
 
The linux networking architecture
The linux networking architectureThe linux networking architecture
The linux networking architecture
 
Linux architecture
Linux architectureLinux architecture
Linux architecture
 
Distributed computing ).ppt him
Distributed computing ).ppt himDistributed computing ).ppt him
Distributed computing ).ppt him
 

Similar a Linux Power Management Slideshare

Plugin-able POS Solutions by Javascript @HDM9 Taiwan
Plugin-able POS Solutions by Javascript @HDM9 TaiwanPlugin-able POS Solutions by Javascript @HDM9 Taiwan
Plugin-able POS Solutions by Javascript @HDM9 TaiwanRack Lin
 
Advanced Oracle Troubleshooting
Advanced Oracle TroubleshootingAdvanced Oracle Troubleshooting
Advanced Oracle TroubleshootingHector Martinez
 
Use Cases and Integration Scenarios with SAP Adaptive Computing Virtualization
Use Cases and Integration Scenarios with SAP Adaptive Computing VirtualizationUse Cases and Integration Scenarios with SAP Adaptive Computing Virtualization
Use Cases and Integration Scenarios with SAP Adaptive Computing VirtualizationGunther_01
 
Preventing the Next Deployment Issue with Continuous Performance Testing and ...
Preventing the Next Deployment Issue with Continuous Performance Testing and ...Preventing the Next Deployment Issue with Continuous Performance Testing and ...
Preventing the Next Deployment Issue with Continuous Performance Testing and ...Correlsense
 
Galera Multi Master Synchronous My S Q L Replication Clusters
Galera  Multi Master  Synchronous  My S Q L  Replication  ClustersGalera  Multi Master  Synchronous  My S Q L  Replication  Clusters
Galera Multi Master Synchronous My S Q L Replication ClustersPerconaPerformance
 
Using LCDS to Power Live REAs
Using LCDS to Power Live REAsUsing LCDS to Power Live REAs
Using LCDS to Power Live REAsShailesh Mangal
 
MPLS/SDN Intersections Next Generation Access Networks at MPLS & Ethernet Wor...
MPLS/SDN Intersections Next Generation Access Networks at MPLS & Ethernet Wor...MPLS/SDN Intersections Next Generation Access Networks at MPLS & Ethernet Wor...
MPLS/SDN Intersections Next Generation Access Networks at MPLS & Ethernet Wor...ADVA
 
The sFlow Standard: Scalable, Unified Monitoring of Networks, Systems and App...
The sFlow Standard: Scalable, Unified Monitoring of Networks, Systems and App...The sFlow Standard: Scalable, Unified Monitoring of Networks, Systems and App...
The sFlow Standard: Scalable, Unified Monitoring of Networks, Systems and App...netvis
 
OOW09 Ebs Tuning Final
OOW09 Ebs Tuning FinalOOW09 Ebs Tuning Final
OOW09 Ebs Tuning Finaljucaab
 
Joe Honan Virtualization Trends
Joe Honan   Virtualization TrendsJoe Honan   Virtualization Trends
Joe Honan Virtualization Trends1velocity
 
Introduction to Cloud Data Center and Network Issues
Introduction to Cloud Data Center and Network IssuesIntroduction to Cloud Data Center and Network Issues
Introduction to Cloud Data Center and Network IssuesJason TC HOU (侯宗成)
 
Run Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin OrchestrateRun Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin OrchestrateNovell
 
Run Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin OrchestrateRun Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin OrchestrateNovell
 
Run Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin OrchestrateRun Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin OrchestrateNovell
 
Run Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin OrchestrateRun Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin OrchestrateNovell
 
Run Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin OrchestrateRun Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin OrchestrateNovell
 
Oracle Exadata Management with Oracle Enterprise Manager
Oracle Exadata Management with Oracle Enterprise ManagerOracle Exadata Management with Oracle Enterprise Manager
Oracle Exadata Management with Oracle Enterprise ManagerEnkitec
 
Fy09 Sask Tel Learn It System Centre Garth Jones
Fy09 Sask Tel Learn It   System Centre   Garth JonesFy09 Sask Tel Learn It   System Centre   Garth Jones
Fy09 Sask Tel Learn It System Centre Garth Jonessim100
 

Similar a Linux Power Management Slideshare (20)

Plugin-able POS Solutions by Javascript @HDM9 Taiwan
Plugin-able POS Solutions by Javascript @HDM9 TaiwanPlugin-able POS Solutions by Javascript @HDM9 Taiwan
Plugin-able POS Solutions by Javascript @HDM9 Taiwan
 
XS Boston 2008 Network Topology
XS Boston 2008 Network TopologyXS Boston 2008 Network Topology
XS Boston 2008 Network Topology
 
Advanced Oracle Troubleshooting
Advanced Oracle TroubleshootingAdvanced Oracle Troubleshooting
Advanced Oracle Troubleshooting
 
Use Cases and Integration Scenarios with SAP Adaptive Computing Virtualization
Use Cases and Integration Scenarios with SAP Adaptive Computing VirtualizationUse Cases and Integration Scenarios with SAP Adaptive Computing Virtualization
Use Cases and Integration Scenarios with SAP Adaptive Computing Virtualization
 
Preventing the Next Deployment Issue with Continuous Performance Testing and ...
Preventing the Next Deployment Issue with Continuous Performance Testing and ...Preventing the Next Deployment Issue with Continuous Performance Testing and ...
Preventing the Next Deployment Issue with Continuous Performance Testing and ...
 
Galera Multi Master Synchronous My S Q L Replication Clusters
Galera  Multi Master  Synchronous  My S Q L  Replication  ClustersGalera  Multi Master  Synchronous  My S Q L  Replication  Clusters
Galera Multi Master Synchronous My S Q L Replication Clusters
 
saurabh soni rac
saurabh soni racsaurabh soni rac
saurabh soni rac
 
Using LCDS to Power Live REAs
Using LCDS to Power Live REAsUsing LCDS to Power Live REAs
Using LCDS to Power Live REAs
 
MPLS/SDN Intersections Next Generation Access Networks at MPLS & Ethernet Wor...
MPLS/SDN Intersections Next Generation Access Networks at MPLS & Ethernet Wor...MPLS/SDN Intersections Next Generation Access Networks at MPLS & Ethernet Wor...
MPLS/SDN Intersections Next Generation Access Networks at MPLS & Ethernet Wor...
 
The sFlow Standard: Scalable, Unified Monitoring of Networks, Systems and App...
The sFlow Standard: Scalable, Unified Monitoring of Networks, Systems and App...The sFlow Standard: Scalable, Unified Monitoring of Networks, Systems and App...
The sFlow Standard: Scalable, Unified Monitoring of Networks, Systems and App...
 
OOW09 Ebs Tuning Final
OOW09 Ebs Tuning FinalOOW09 Ebs Tuning Final
OOW09 Ebs Tuning Final
 
Joe Honan Virtualization Trends
Joe Honan   Virtualization TrendsJoe Honan   Virtualization Trends
Joe Honan Virtualization Trends
 
Introduction to Cloud Data Center and Network Issues
Introduction to Cloud Data Center and Network IssuesIntroduction to Cloud Data Center and Network Issues
Introduction to Cloud Data Center and Network Issues
 
Run Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin OrchestrateRun Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin Orchestrate
 
Run Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin OrchestrateRun Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin Orchestrate
 
Run Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin OrchestrateRun Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin Orchestrate
 
Run Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin OrchestrateRun Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin Orchestrate
 
Run Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin OrchestrateRun Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin Orchestrate
 
Oracle Exadata Management with Oracle Enterprise Manager
Oracle Exadata Management with Oracle Enterprise ManagerOracle Exadata Management with Oracle Enterprise Manager
Oracle Exadata Management with Oracle Enterprise Manager
 
Fy09 Sask Tel Learn It System Centre Garth Jones
Fy09 Sask Tel Learn It   System Centre   Garth JonesFy09 Sask Tel Learn It   System Centre   Garth Jones
Fy09 Sask Tel Learn It System Centre Garth Jones
 

Linux Power Management Slideshare

  • 1. Patrick Bellasi Dipartimenti di Elettronica ed Informazione Politecnico di Milano bellasi@elet.polimi.it Linux Power Management  Architecture An updated review on Linux PM  frameworks Last update: Nov 11, 2008
  • 2. Patrick Bellasi bellasi@elet.polimi.it Agenda 2 Outline the architecture for advanced power management   support in recent Linux kernels key points for effective PM and overall picture view ️ Review of main subsystems  ­ Advanced timekeeping framework ️ ­ Clock framework ️ ­ Voltage/Power control Framework ️ ­ QoS Framework ️ ­ CPUIdle ️ ­ CPUFreq ️ ­ Linux PM ️ Why we need them? ️ What they do? ️ How we can exploit them? ️
  • 3. Patrick Bellasi bellasi@elet.polimi.it Modern PM architectures Key points for effective power management 3 Exploit partial activity  disable parts of the system when not needed ️ SW does part of the work, HW dependencies do the rest  exploit existing system framework ️ track dependencies (producer­consumer) ️ Fine grained tracking track usage (reference counting) ️ and constraints system constraints assertion ️ notification­chains ️ drivers support required ️ aggressively get and release resources ️ support efficiently OFF mode (0 volts) ️ full context loss, low­latency wakeup, ️ provide context saving­restore routines ️ use constraints to require operational restrictions ️
  • 4. Patrick Bellasi bellasi@elet.polimi.it Modern PM architectures Key points for effective power management 4 Control power sources   clocks, power domains, voltage domains, memories and power IC  ️ resources device drivers should exploit interfaces to control power resources ️ Constraints aware drivers and subsystems  Inactive state power saving in OS idle  automatic choice between multiple C­States ️ OFF and retention modes ️ manual suspend/resume ️ system­wide sleep states ️ Active state dynamic power management  automatic choice between multiple P­States ️
  • 5. Patrick Bellasi bellasi@elet.polimi.it Modern PM architectures What we should (already) have 5 MM Resource Manager Legend User Arch Indep Policy management Kernel CPUIdle Goevernor CPUFreq Goevernor Layer Arch Dependant (menu governor) (on­demand governor) LDM OMAP34xx CPUIdle Framework CPUFreq Framework Control Notifications Constraints Constraints Framework Device Driver DSPBridge DSP Bridge CPUIdle Driver CPUFreq Driver Drivers Layer Workload Resource Tracking Resource Tracking Predictor Layer Shared Resource Framework Clock  Power domain Memory Logic Power Resource Framework Latency Con DPLL State Workload Frequ. Con Management DPLL Con OPP Con Predictor Layer Logical Voltage  Framework PWRM DSPBridge PRCM API SmartRelfex Layer 2 Driver PRCM API Layer 1&2 (OMAP34xx) PowerIC PRCM API Layer 1 Driver
  • 6. Patrick Bellasi bellasi@elet.polimi.it Linux frameworks to support PM Introduction 6 Typical mainline Linux Power Management features:  Platform­specific code ️ idle­loop, timekeeping (dynamic tick) & clock tree, latency ️ Dynamic Power Management ️ Low­power states activation for unused devices ️ C­States, LDM, Suspension (RAM), Hibernation (Disk) ️ Dynamic Voltage and Frequency Scaling ️ Adapting device performances to application needs ️ P­States, OPP ️ Main focus on x86 arch  custom and different PM development for SoC embedded devices ️ 2nd Linux PM Summit was only on 2007 ️ increasing emphasis on embedded systems ️
  • 7. Ve nk at es h  Pa l lip ad i C 2.6.9 G PU ov Fr er eq no  O r nD Jo em hn an  S Timeline d  tu l tz G en er In ic  T go H im  M rti m e­ ol of na Th er ­D r C s  om ay ba lo as se ck  G ­  S le ou cod G ix e rc ne 2.6.16 en r e  er an C ic D d  C lo yn ck am lock  fr Ve  E am ic D nk  T ve yn ic ew at 2.6.21 nt k   es tic or k ks h  Pa l lip ad i C M PU 2.6.24 Linux frameworks to support PM Id ar l k e   G fra ro m ss ew Q or k oS 2.6.25  F 7 ra April 2008 m Li am ew  G or k rid w oo d R 2.6.2x eg ul at or s  Fr am ew or k Patrick Bellasi bellasi@elet.polimi.it
  • 8. Patrick Bellasi bellasi@elet.polimi.it The Linux Time System The old tick­based implementation 8 Strongly associated with  Prior to 2.6.16  individual HW devices Arch 1 HW Clock source TOD HW Lake of a generic Clock event device ISR abstraction layer Arch 2 Timekeeping HW TOD Clock source HW Tick ISR Clock event device Process acc. Arch 3 Clock related  HW Clock source TOD services Profiling HW Jiffies Clock event device ISR Timer whell Time tracked using periodic timer ticks
  • 9. Patrick Bellasi bellasi@elet.polimi.it The Linux Time System Towards a new implementation 9 Required abstractions  Clock source (CS) management ️ provide read access to monotonically increasing time values ️ use human­oriented values (nanoseconds) ️ Clock synchronization ️ system time value and clock source drifts correction ️ use reference time sources (e.g. NTP or GPS/GSM) ️ Time­of­day representation ️ applying drifts corrections to clock source ️ Clock event device (CED) management ️ Schedule next event interrupt using clock event device ️ minimize arch dependent code, allow easy run­time addition of new CED ️ Removing tick dependencies ️ better support high resolution timers and precise timer scheduling ️ by replace the CTW (Cascading Timer Wheel) mechanism ️
  • 10. Patrick Bellasi bellasi@elet.polimi.it The Linux Time System hrtimers – a complementary time subsystem 10 Why a new Timer Subsystem?  CTW (1997) has unacceptable worst case performances ️ Insertion is O(1), but cascading require interrupt disabling... ️ CTW is the better solution for long­term protocol­timeout related  ️ timers: ­ expiration time within the first category ️ ­ removed before they expire or have to be cascaded ️ is based on periodic interrupt (jiffies) ️ support only low­resolution time values ️ Propose solution: use two category for timers  Timeouts – low­resolution, almost always removed ️ using the existing CTW ️ Timers – high­resolution, usually expire ️ using the new hrtimers framework, based on human time units [ns], kept  ️ in per­CPU time sorted list implemented using red­black tree (O(log(N)) ️
  • 11. Patrick Bellasi bellasi@elet.polimi.it The Linux Time System hrtimers – a complementary time subsystem 11 Since 2.6.16  Arch 1 Using human­based  Better data structures to support  HW Clock source TOD time units [ns] other timer code rework (e.g. high­resolution timers and  HW Clock event device ISR dynamic­tick) Arch 2 hrtimers Timekeeping HW TOD Clock source HW Tick ISR Clock event device Process acc. Arch 3 Timer queues run from the  HW Clock source TOD Profiling normal timer softirq HW Jiffies Clock event device ISR Low­resolution jiffies  based timers Timer whell
  • 12. Patrick Bellasi bellasi@elet.polimi.it The Linux Time System [5]  The Generic Time­of­Day (GTOD) 12 Why a Generic Time of Day Implementation?  allow sharing of clock source code across archs ️ move large portion of code out of arch­specific areas ️ limit arch­dependent code to HW interface and setup ️ avoid interpolation based computation of human­time values ️ compensate clock drifts as clock source offsets to get TOD ️ previous implementation track compensated TOD directly and derive  ️ increasing human­time values from it (error addition problem) Proposed solution: a new generic TOD framework  presented by John Stultz at OLS 2005, merged in 2.6.16 ️ cleanup and simplify by arch­independent common code ️ use nanoseconds as fundamental time unit ️ modular design supporting run­time add/remove of time source ️ break tick­based dependency to avoid interpolation issues ️
  • 13. Patrick Bellasi bellasi@elet.polimi.it The Linux Time System  The Generic Time­of­Day (GTOD) 13 Arch independent and dynamic clock source management Human­based time units time­of­day representation Shared HW Clock Source Arch­dependent code limited  to direct HW interface and  setup procedure Clock synch. Arch 2 hrtimers Timekeeping HW TOD Clock source HW Tick ISR Clock event device Process acc. Arch 3 HW Clock source TOD Profiling HW Jiffies Clock event device ISR Timer whell
  • 14. Patrick Bellasi bellasi@elet.polimi.it The Linux Time System  The Generic Time­of­Day (GTOD) 14 Arch independent and dynamic clock source management Human­based time units time­of­day representation Shared HW Clock Source Arch­dependent code limited  to direct HW interface and  setup procedure Clock synch. TOD Arch 2 hrtimers Timekeeping HW HW Tick ISR Clock event device Process acc. Arch 3 HW Profiling HW Jiffies Clock event device ISR Timer whell
  • 15. Patrick Bellasi bellasi@elet.polimi.it The Linux Time System [6] The Clock Event Device Abstraction 15 Why a Clock Event Device Abstraction?  provide an abstraction level for timekeeping and time­related  ️ activities substantial reduction in arch­dependent code ️ support either periodic or individual programmable events ️ build the base for a generic dynamic tick implementation ️ Proposed solution: by Thomas Gleixner, merged in 2.6.21  ­ Registration interface: run­time configuration of clock event device ️ property based definition (e.g. capabilities, max/min delta, mult/shift) ️ support generic interrupt setup code (i.e. call­back based handlers) ️ support for power­management ️ ­ Event distribution interface: bind clock events related services to  ️ clock event sources classical time services (e.g. process accounting, profiling, periodic tick) ️ next event interrupt (e.g. high­resolution timers, dynamic tick) ️
  • 16. Patrick Bellasi bellasi@elet.polimi.it The Linux Time System [6] High Resolution Timers Support 16 Why high­resolution timers?  exploit brand­new time subsystems reworks (GTOD and  ️ hrtimers) require just arch­independent code by modifying hrtimers framework ️ accurate delivery of high­resolution timers (e.g. itimers and  ️ POSIX) Proposed solution:  presented by Thomas Gleixner at OLS 2006 ️ switch to high­resolution mode late in the boot process ️ exploiting clock source and clock event device dynamic registration ️ ensure kernel compatibility on non high­resolution platforms ️ support next event scheduling and next event interrupt handler ️ softirq based execution of hrtimer queues, independent from tick bound ️ callbacks from next­event handler (with some limitations, e.g. for  ️ nanosleep) ️
  • 17. Patrick Bellasi bellasi@elet.polimi.it The Linux Time System [7]  The Dynamic Tick Support 17 Why a Tick­less Kernel?  processors are quite good about saving power when idle ️ HW has support to reduce leakage on idle states ️ avoid processor wakeups when nothing is happening ️ by turning off the period timer tick when in idle state ️ looking at the timer queue to see when the next timer expires ️ CPUIdle can exploit this information ️ in the absence of other events (e.g. hardware interrupts), the system will  ️ sleep until the nearest timer is due enable the periodic tick once the processor goes out of the idle state ️ Proposed solution: merged in 2.6.21  room for further development (i.e. full tickless systems) ️ time slice is controlled by the scheduler, variable frequency profiling, and  ️ a complete removal of jiffies in the future
  • 18. Patrick Bellasi bellasi@elet.polimi.it The Linux Time System Deferrable Timers 18 Why deferrable timers?  better exploit dynamic­tick, by allowing to sleep longer ️ not all timers has to run as soon as the requested period has expired ️ non­critical timeouts can run some fraction of a second later ️ i.e. when the processor wakes up for other reasons ️ Intr 250 500 0 [ms] Proposed solution: new function added to the internal   kernel API  init_timer_deferrable(struct timer_list *timer) ️ timers defined as deferrable are recognized by the kernel ️ will not be considered when the kernel makes its quot;when should the  ️ next timer interrupt be?quot; decision timer_list, workqueue,  ️
  • 19. Patrick Bellasi bellasi@elet.polimi.it The Linux Time System Deferrable Timers 19 Why deferrable timers?  better exploit dynamic­tick, by allowing to sleep longer ️ not all timers has to run as soon as the requested period has expired ️ non­critical timeouts can run some fraction of a second later ️ i.e. when the processor wakes up for other reasons ️ Intr Intr 250 500 250 0 0 500 [ms] [ms] Proposed solution: new function added to the internal   kernel API  init_timer_deferrable(struct timer_list *timer) ️ timers defined as deferrable are recognized by the kernel ️ will not be considered when the kernel makes its quot;when should the  ️ next timer interrupt be?quot; decision timer_list, workqueue,  ️
  • 20. Patrick Bellasi bellasi@elet.polimi.it Linux Timekeeping Subsystem Complete support for RT and PM 20 Shared HW Clock Source Clock synch. TOD Arch 1 HW Timekeeping hrtimers Shared HW Clock events HW Next event ISR Arch 2 Dynamic tick HW Event distribution HW Process acc. Arch 3 Profiling HW Jiffies ISR HW Timer whell
  • 21. Patrick Bellasi bellasi@elet.polimi.it The Clock Framework centralized control for all clock related  21 functionalities Define an (abstract) API for all clock related functionalities  dependencies (clock tree), rates (get/set), status (enable/disable) ️ Since 2.6.16  standard open source solution ️ Initially added for ARM, now multiple architectures use it ️ In 2.6.25: ARM (OMAP, PXA, SA1100, AT91, ...), Avr32, PowerPC, ... ️ recent proposals on LKML for a generic clocklib implementation ️ Used by device drivers  reference counting ️ re­entrant code ️ May support layered structure  platform generic layer and board specific low­level ️
  • 22. Patrick Bellasi bellasi@elet.polimi.it Power/Voltage Framework track device's power dependencies 22 provide a framework analogues to the clock framework but   for power/voltage regulators control voltage regulators dependency tracking ️ satisfy client devices needs depending on their state ️ optimize generators usage and efficiency ️ no support in mainline, but different patch­set available  ARM patch by Nokia (no too much general) ️ Platform independent framework by Wolfson Microelectronics ️ ️
  • 23. Patrick Bellasi bellasi@elet.polimi.it Power/Voltage Framework Wolfson Microelectronics ­ Linux Voltage and  23 Current Regulator Framework [2] Announced on LKML in Feb 2008, presented on ELC 2008  Standard and generic kernel interfaces to dynamically control voltage  ️ regulators and current sinks Better exploit regulators dynamics  e.g. consumer with 10mA load: ️ 100 70% @ Normal =~ 13mA ️ 90 90% @ Sleep =~ 11mA ️ Efficiency (%) 80 idle mode 70 ️ normal mode 60          Saving ~2mA ️ 50 ️ 40 30 ️ 20 10 0 1 10 100 1000 Load (mA)
  • 24. Patrick Bellasi bellasi@elet.polimi.it Power/Voltage Framework Wolfson Microelectronics – Real  World  24 Examples Some real world examples:  CPUFreq: voltage control matching operating frequency ️ CPUIdle: voltage control matching idle state ️ LCD back lighting: control via white LED current reduction ️ Audio: analog supply control, components control ️ FM­Tuner, Speaker Amplifier when using Headphone ️ NAND & NOR: idle power control ️ e.g. R/W=35mA, Erase=40mA, Erase+R/W=55mA, Idle=1mA ️ ️ ️
  • 25. Patrick Bellasi bellasi@elet.polimi.it Power/Voltage Framework Wolfson Microelectronics ­ Linux Voltage and  25 Current Regulator Framework [2] Four separate interfaces  Regulator driver API Allows regulator drivers to register and provide operations to the core Notifier call chain for propagating regulator events to clients Consumer (Client) driver API complete control over their supply voltage and current ­ static clients: only enable/disable support ­ dynamic clients: voltage/current limit control notify client V/I requirements to regulator driver Platform API creation of voltage/current domains (with constraints) for each regulator creation of a regulator tree Userspace API exports a lot of useful voltage/current/opmode data to userspace via sysfs help monitor device power consumption and status
  • 26. Patrick Bellasi bellasi@elet.polimi.it The Latency Framework Explicit system­wide latency­expectation  26 infrastructure include/linux/latency.h  The system can tune its operations to the minimum latency  ️ requirements in effect at the moment multiple drivers can announce their maximum accepted  ️ latency drivers know device operating modes at any given moment and the  ️ corresponding expected system response time collect and summarize these expectations globally ️ cumulated result can be used by power management and similar  ️ users to make decisions that have trade­offs ️
  • 27. Patrick Bellasi bellasi@elet.polimi.it The Latency Framework Explicit system­wide latency­expectation  27 infrastructure Since 2.6.19, an interface where drivers can:  ­ announce the maximum latency [us] that they can deal with ️ ­ modify this latency ️ ­ give up their constraint ️ ­ a function where the code that decides on power saving strategy  ️ can query the current global desired maximum ­ a notifier chain allow interested subsystems know when the  ️ maximum acceptable latency has changed Currently used by the generic cpuidle driver on x86 arch ️
  • 28. Patrick Bellasi bellasi@elet.polimi.it The Latency Framework Explicit system­wide latency­expectation  28 infrastructure Example:  ­ user: idle loop code ️ higher C­state saves more power, but has a higher exit latency ️ idle loop code use these informations to make a good decision which  ️ C­state to use ­ announcer: audio driver ️ knowns it will get an interrupt when the hardware has 200 usec of  ️ samples left in the DMA buffer set a latency constraint of, say, 150 usec Register notifier ️ user interface Latency Framework Core Idle Loop Announcer interface ... Audio Drv WiFi Drv Update latency  Assert latency  constraint constraint
  • 29. Patrick Bellasi bellasi@elet.polimi.it LatencyTOP measuring and fixing Linux latency 29 tool for software developers (both kernel and userspace),   identifying where in the system latency is happening ️ what kind of operation/action is causing the latency to  ️ happen both at system level or on a per process level ️ focuses on the cases  the applications want to run and execute useful code, but  ️ there's some resource that's not currently available (and the  kernel then blocks the process) Kernel patch needed  keep track of what high level operation it is doing ️ limited number of annotations ️ output on procfs or using a ncurses based GUI ️ ️
  • 30. Patrick Bellasi bellasi@elet.polimi.it LatencyTOP measuring and fixing Linux latency 30 Here's some output of LatencyTOP, collected for a make ­j4 of a  ️ kernel on a quad core system Cause Maximum [msec] Average [msec] process fork 1097.7  2.5 Reading from file 1097.0  0.1 updating atime  850.4 60.1 Locking buffer head  433.1 94.3 Writing to file  381.8  0.6 Synchronous bufferhead read    318.5 16.3 Waiting for buffer IO  298.8  7.8 With a small change to  the EXT3 journaling  layer to fix a priority  inversion problem Cause Maximum [msec] Average [msec] Writing to file  814.9  0.8 Reading from file  441.1  0.1 Waiting for buffer IO  419.0  3.4 Locking buffer head  360.5 75.7 Unlinking file  292.7  5.9 EXT3 (jbd) do_get_write_access  274.0 36.0 Filename lookup  260.0  0.5
  • 31. Patrick Bellasi bellasi@elet.polimi.it Quality of Service Framework [4] the latency framework reloaded 31 linux/pm_qos_params.h  Kernel infrastructure to implement a coordination mechanism  ️ to facilitate communication among devices (drivers), users  (applications) and system (power manager) devices have specific power management capabilities ️ devices talk in terms of latencies, time outs, throughput ️ drivers could better address power management ️ expose power management mechanisms ️ applications have specific performances needs ️ Since 2.6.25  work­on­progress, used for iwl4965 WiFi driver on x86 ️
  • 32. Patrick Bellasi bellasi@elet.polimi.it Quality of Service Framework the latency framework reloaded 32 Provide infrastructure for the registration of:  Dependents: register their QoS needs ️ e.g.  ️ Watchers: keep track of the current QoS needs of the system ️ e.g. cpuidle, wifi drivers, ... ️ 3 basic classes of QoS parameter  latency [us], timeout [us], throughput [kbs] ️ platform customizable constraints set ️ Interrupt latency, power domain latency, frequency/OPP ️ maintain lists of pm_qos requests and aggregated requirements ️ kernel notification tree for each parameter ️ User mode interface for request QoS  using simple char device node ️ ️ ️
  • 33. Patrick Bellasi bellasi@elet.polimi.it The CPUIdle Framework [1] Do nothing, efficiently... 33 Generic processor idle power management framework  support for multiple idle states with different characteristics ️ power consumptions, state preservation, state constraints, entry/exit  ️ latency Support trade­off efficiently between application   expectations and idle state power saving Clean interface  abstract between idle­driver and idle­governor ️ separate arch­specific idle driver (mechanisms) from arch­independent  ️ power management governors  (policies) provide a convenient user­space interface ️ Since 2.6.24  X86 ­ ACPI, for OMAP mapping to ACPI states ️
  • 34. Patrick Bellasi bellasi@elet.polimi.it The CPUIdle Framework [1] Do nothing, efficiently... 34 /sys/devices/system/cpu/cpuX/cpuidle struct cpuidle_governor { User­level /sys/devices/system/cpu/cpuidle init(); interfaces exit(); Decide the target C­State scan(); select(); Step­wise Latency­based reflect(); governors ladder menu } governor interface data structures Generic CPUIdle Infrastructure initialization and registration idle handling driver interface cpuidle core system state change handling acpi­cpuidle halt_idle drivers struct cpuidle_state { ACPI driver exit_latency; [us] Populate supported C­States power_usage; [mW] target_residency; [us] struct cpuidle_driver { usage; init(); time; [us] exit(); enter(); redetect(); } bm_check(); } Implement functions to enter C­States
  • 35. Patrick Bellasi bellasi@elet.polimi.it The CPUIdle Framework [1] The OMAP43xx implementation 35 The following C states are supported in Cpuidle driver:   C0  – System executing code ️  C1  – MPU WFI + Core active + No tick suppression ️  C2  – MPU WFI + Core active + Tick suppression ️  C3  – MPU CSWR + Core active + Tick suppression ️  C4  – MPU OFF + Core active + Tick suppression ️  C5  – MPU RET + Core RET + Tick suppression ️  C6  – MPU OFF + Core RET + Tick suppression ️  C7  – MPU OFF + Core OFF + Tick suppression ️ Menu governor takes the following into account to decide   the target sleep state: Next timer expiry in the system ️ Comparing target residency with available sleep time ️ Comparing exit latency with system wide latency constraints ️ Checking for activity in the core domain ️ Dynamic tick based on support in kernel 
  • 36. Patrick Bellasi bellasi@elet.polimi.it PowerTOP measuring and fixing Linux tick­less systems 36 identifying what is causing system wake­ups  what kind of operation/action prevent long­time permanence in low­ ️ power consumption states? support software developers, both kernel­space and user­space ️ main features ️ show how well your system is using various HW power­saving  ️ features both C­ and P­States statistics are collected ️ show you the culprit software components that are preventing optimal  ️ usage of your HW power savings userspace should prefere deferrable timers ️ provide you with tuning suggestions to achieve low power  ️ consumption support for CPUIdle since v1.10  previsou version used the ACPI interface ️ ️
  • 38. Patrick Bellasi bellasi@elet.polimi.it The CPUFreq Framework [9] CPU active power consumption optimization 38 Generic processor active power management framework  support for multiple performance states with different characteristics ️ power consumptions ️ Clean interface  abstract between driver and governor ️ separate arch­specific cpu driver (mechanisms) from arch­independent  ️ power management governors  (policies) provide a convenient user­space interface ️ Since 2.6.9 [W] 0.25s*34W+ 0.75s*1W = 9.25J  Full­speed 34 on­demand governor ️ Half­speed 24 0.5s*24W+ 0.5s*1W = 12.5J scaling based on load ️ Idle 1 exploit race­to­idle ️ 0 0.5 1 [s] more recently added support for conservative governor ️ implementing a better battery­fair policy ️
  • 39. Patrick Bellasi bellasi@elet.polimi.it CPUFreq optimized CPU power consumptions 39 Decide the target P­State struct cpufreq_governor { User­level powersaved cpuspeed governor(); governors } aggressive battery­fair In­kernel on­demand userspace conservative governors governor interface data structures Generic CPUFreq Framework initialization and registration transition handling driver interface cpufreq core policy and transition notifiers acpi­cpufreq speedstep CPU­specific drivers struct cpufreq_policy { ACPI processor min_freq; [kHz] Define supported driver policy values max_freq; [kHz] transition_latency; [us] struct cpufreq_driver { ... init(); } exit(); verify(); Compile frequency set_policy(); tables resume(); }
  • 40. Patrick Bellasi bellasi@elet.polimi.it Linux PM device runtime suspend/resume support 40 Every driver should implement suspend/resume calls  register to the LDM (Linux Device Model) ️ release clocks and save context in suspend call and restore  ️ these when resume is called drivers which have already released their clocks and have  ️ saved their context need not do anything in their suspend call Drivers should support OFF modes  all registers in the power domain are reset when the power  ️ domain goes to OFF OFF mode could introduce considerable latency for wakeup ️ the system can enter chip off through two paths: ️ ­ Idle loop ️ ­ Suspend/Resume ️
  • 41. Patrick Bellasi bellasi@elet.polimi.it Linux PM device runtime suspend/resume support 41 Device drivers responsibilities  aggressively manage request/release of clocks ️ through clock framework => control their clocks on a request basis ️ not transaction based => use  inactivity timer to cut off clocks after a  ️ period of inactivity need to register with the LDM ️ implement suspend() and resume() calls ️ specify constraints when required for its functionalities ️ e.g. min. frequency, max latency, ... ️ need to implement context save/restore ️ be aware of power OFF mode ️ configure optimal power settings ️ according to standing system constraints ️ ️
  • 42. Patrick Bellasi bellasi@elet.polimi.it Linux PM Examples of changes being made 42 Transaction based drivers will control clocks based on activity  e.g. i2c driver enables clocks when there are pending requests and  ️ disables them when there are no pending requests Camera – Clocks will be enabled as long as the driver is required  Display – The fbdev inactivity timer (which is tied to user activity) can be used   to turn off display clock MMC – Clocks are controlled on a per command basis  GPTimer – Clocks are controlled as per requirement  i.e. when a timer is in use, they will be enabled and will be disabled when  ️ they are not in use UART – Console clocks are cut in the idle loop (before putting core domain to   retention) and other UART clocks could be controlled on a need basis USB – Clocks can be controlled as per requirement (only when transfers are   going on)
  • 43. Patrick Bellasi bellasi@elet.polimi.it Linux PM Context save and restore 43 When all drivers in a power domain release their clocks,   the power domain can go to RET or OFF state the shared resource framework programs the power domain to target  ️ state depending on the latency constraints in the system Drivers can follow any of the following methods to save and   restore their context ­ Always save/ always restore ️ drivers which do not have lots of registers can always save context and  ️ restore context because it will not cause a lot of overhead ­ Early save/ restore on demand ️ drivers save context every time they release their clocks but restore it  ️ only if the power domain has actually gone to off after saving this makes sense for drivers which have a large restore time with save  ️ time being minimal
  • 44. Patrick Bellasi bellasi@elet.polimi.it References 44 1. V. Pallipadi, S. Li, A. Belay, cpuidle ­ Do nothing, efficiently... Proceedings of the Linux  Symposium, Vol. 2, June 2007. Ottawa, Ontario Canada. 2. Walfson Microelectronics, Linux Voltage and Current Regulator Framework 3. An API for specifying latency constraints, LWN Article, August 2006. 4. M. Gross, Power Management QoS: How You Could Use it  in Your Embedded Application, Intel,  Open Source Technology Center. 5. J. Stults, N. Aravamuda, D. Hart,  We Are Not Getting Any Younger: A New Approach to Time and Timers, Proceedings of the  Linux Symposium, Vol. 1, July 2005. Ottawa, Ontario Canada. 6. T. Gleixner, D. Niehaus, Hrtimers and Beyond: Transforming the Linux Time Subsystems,  Proceedings of the Linux Symposium, Vol. 1, July 2006. Ottawa, Ontario Canada. 7. Clockevents and dyntick, LWN Article, February 2007. 8. Deferrable timers, LWN Article, March 2007. 9. V. Pallipandi, A. Starikovskiy, The Ondemand Governor, Proceedings of the Linux Symposium,  Vol. 2, July 2006. Ottawa, Ontario Canada. 10. R. Woodruff, Linux system power management on OMAP3430, Embedded Linux Conference,  April 2008. Silicon Valley, US. 11. PowerTOP,