Caching techniques in Python
Michael Domanski
EuroPython 2010


Thursday, 22 July 2010
who I am

                     • Python developer, professionally for a few years now
                     • also experienced in C and Objective-C
                     • currently working for 10clouds.com


Interesting intro

                     • a bit of theory
                     • common patterns
                     • common problems
                     • common solutions

How I think about cache

                     • imagine a giant dict storing all your data
                     • you have to manage all data manually
                     • or provide some automated behaviour


similar to....

                     • manual memory management in C
                     • cache is memory
                     • and you have to control it manually


profits


                     • improved performance
                     • ...?


problems


                     • managing any type of memory is hard
                     • automation often has to be custom-built each time




common patterns



memoization



                     • very old pattern (circa 1968)
                     • we owe the name to Donald Michie



how it works


                     • we associate input with output and store it somewhere
                     • based on the assumption that for a given input, the output is always the same




code example
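                 import functools

                 CACHE_DICT = {}

                 def cached(key):
                     """Cache the decorated function's result under a fixed key."""
                     def func_wrapper(func):
                         @functools.wraps(func)
                         def arg_wrapper(*args, **kwargs):
                             # compute only on a miss; later calls reuse the stored value
                             if key not in CACHE_DICT:
                                 CACHE_DICT[key] = func(*args, **kwargs)
                             return CACHE_DICT[key]
                         return arg_wrapper
                     return func_wrapper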
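The decorator on this slide stores a single value per fixed key, whatever arguments the function receives. A minimal sketch of memoization keyed on the arguments themselves could look like the following. It assumes all arguments are hashable, and the names `memoize`, `square` and `calls` are illustrative only; they do not come from the talk.

```python
import functools

def memoize(func):
    """Cache results per argument tuple (arguments must be hashable)."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Build a hashable key from positional and keyword arguments.
        key = (args, tuple(sorted(kwargs.items())))
        if key not in cache:
            cache[key] = func(*args, **kwargs)
        return cache[key]

    wrapper.cache = cache  # exposed so callers can inspect or clear it
    return wrapper

calls = []  # records every real invocation of square()

@memoize
def square(n):
    calls.append(n)
    return n * n

square(4)
square(4)  # served from the cache, the function body does not run again
square(5)
```

After these three calls, `calls` holds only `4` and `5`, because the second `square(4)` never reached the function body.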




what if output can change?

                     • our pattern is still useful
                     • we simply need to add something


cache invalidation



There are only two hard problems in Computer
                           Science: cache invalidation and naming things
                                                                  Phil Karlton


                     • basically, we update data in cache
                     • we need to know when and what to change
                     • the more granular you want to be, the harder it gets




code example
                 def invalidate(key):
                     """Remove a key from the cache; report keys that were never cached."""
                     try:
                         del CACHE_DICT[key]
                     except KeyError:
                         print("someone tried to invalidate a key that is not present: %s" % key)




common problems



invalidating too much / not enough

                     • flushing all data any time something changes
                     • not flushing cache at all
                     • tragic effects


                 # db_get / db_set stand for the application's database helpers
                 @cached('key1')
                 def simple_function1():
                     return db_get(id=1)

                 @cached('key2')
                 def simple_function2():
                     return db_get(id=2)

                 # SUPPOSE THIS IS IN ANOTHER MODULE

                 @cached('big_key1')
                 def some_bigger_function():
                     """
                     this function depends on big_key1, key1 and key2
                     """
                     def inner_workings():
                         # writes to the db, but nobody invalidates 'key1'
                         db_set(1, 'something totally new')
                     #######
                     ##   imagine 100 lines of code here :)
                     ######
                     inner_workings()
                     return [simple_function1(), simple_function2()]

                 if __name__ == '__main__':
                     simple_function1()
                     simple_function2()
                     a, b = some_bigger_function()
                     assert a == db_get(id=1), "this fails because we didn't invalidate the cache properly"
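The failure on this slide can be reproduced without a real database. The sketch below assumes an in-memory dict named `DB` as a stand-in for the database, and its `db_get` and `db_set` are hypothetical replacements for the helpers the slides leave undefined. It shows the stale read first, then the same write followed by an invalidation.

```python
CACHE_DICT = {}
DB = {1: 'old value', 2: 'other value'}  # hypothetical stand-in for a database

def db_get(id):
    return DB[id]

def db_set(id, value):
    DB[id] = value

def cached(key):
    def func_wrapper(func):
        def arg_wrapper(*args, **kwargs):
            if key not in CACHE_DICT:
                CACHE_DICT[key] = func(*args, **kwargs)
            return CACHE_DICT[key]
        return arg_wrapper
    return func_wrapper

def invalidate(key):
    CACHE_DICT.pop(key, None)

@cached('key1')
def simple_function1():
    return db_get(id=1)

first = simple_function1()          # fills the cache with 'old value'
db_set(1, 'something totally new')  # write without invalidation
stale = simple_function1()          # the cache still answers with 'old value'

db_set(1, 'newer still')
invalidate('key1')                  # invalidate right after the write
fresh = simple_function1()          # recomputed from the database
```

Pairing every write with the matching invalidation is the simplest discipline, but it only works if the writer knows which keys were derived from the row it changes.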




invalidating too soon / too late

                     • your cache has to be synchronised with your db
                     • sometimes very hard to spot
                     • leads to tragic mistakes


                 @cached('key1')
                 def simple_function1():
                     return db_get(id=1)

                 @cached('key2')
                 def simple_function2():
                     return db_get(id=2)

                 # SUPPOSE THIS IS IN ANOTHER MODULE

                 def some_bigger_function():
                     db_set(1, 'something')
                     value = simple_function1()
                     db_set(2, 'something else')
                     #### now we know we used 2 cached functions so....
                     invalidate('key1')
                     invalidate('key2')
                     #### now we know we are safe, but for a price
                     return simple_function2()

                 if __name__ == '__main__':
                     some_bigger_function()




superposition of dependencies
                     • a somewhat less obvious problem
                     • eventually you will start caching the results of computations
                     • you have to know very precisely what your data depends on



                 @cached('key1')
                 def simple_function1():
                     return db_get(id=1)

                 @cached('key2')
                 def simple_function2():
                     return db_get(id=2)

                 # SUPPOSE THIS IS IN ANOTHER MODULE

                 @cached('key')
                 def some_bigger_function():
                     # the cached result silently depends on 'key1' and 'key2' as well
                     return {
                         '1': simple_function1(),
                         '2': simple_function2(),
                         '3': db_get(id=3)
                     }

                 if __name__ == '__main__':
                     simple_function1()
                     # somewhere else
                     db_set(1, 'foobar')
                     # and again
                     db_set(3, 'bazbar')
                     invalidate('key')
                     # ooops, we forgot something
                     data = some_bigger_function()
                     assert data['1'] == db_get(id=1), \
                         "this fails because we didn't manage to invalidate all the keys"
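One possible way to keep such nested dependencies manageable is to record them explicitly and let invalidation cascade. The sketch below is an assumption and not something shown in the talk: the `depends_on` parameter and the `DEPENDENTS` registry are hypothetical, the dict `DB` stands in for the database, and the dependency graph is assumed to be free of cycles.

```python
CACHE_DICT = {}
# Hypothetical registry: cache key -> keys of cached values built from it.
DEPENDENTS = {}

def cached(key, depends_on=()):
    for parent in depends_on:
        DEPENDENTS.setdefault(parent, set()).add(key)

    def func_wrapper(func):
        def arg_wrapper(*args, **kwargs):
            if key not in CACHE_DICT:
                CACHE_DICT[key] = func(*args, **kwargs)
            return CACHE_DICT[key]
        return arg_wrapper
    return func_wrapper

def invalidate(key):
    """Drop a key and, recursively, everything computed from it."""
    CACHE_DICT.pop(key, None)
    for child in DEPENDENTS.get(key, ()):
        invalidate(child)

DB = {1: 'a', 2: 'b'}  # stand-in for the database

@cached('key1')
def simple_function1():
    return DB[1]

@cached('key', depends_on=['key1'])
def some_bigger_function():
    return {'1': simple_function1(), '2': DB[2]}

some_bigger_function()   # caches both 'key1' and 'key'
DB[1] = 'foobar'
invalidate('key1')       # also drops 'key', which was built from it
data = some_bigger_function()
```

With this arrangement the writer only has to invalidate the key closest to the changed data; the cached results built on top of it follow automatically.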




summing up
                     • know your data....
                     • be aware of what you cache and when
                     • take care when using cached data in computations




common solutions



process level cache



why?

                     • very fast access
                     • simple to implement
                     • very effective as long as you're using a single process




clever tricks with dicts



code example
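                 import functools

                 CACHE_DICT = {}

                 def cached(key):
                     """Cache the decorated function's result under a fixed key."""
                     def func_wrapper(func):
                         @functools.wraps(func)
                         def arg_wrapper(*args, **kwargs):
                             # compute only on a miss; later calls reuse the stored value
                             if key not in CACHE_DICT:
                                 CACHE_DICT[key] = func(*args, **kwargs)
                             return CACHE_DICT[key]
                         return arg_wrapper
                     return func_wrapper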
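Since Python 3.2, which was released after this talk, the standard library ships a process-level cache of this kind as `functools.lru_cache`. The short sketch below shows a hit and a full invalidation with `cache_clear()`. The function `load_user` and the list `calls` are illustrative names only.

```python
import functools

calls = []  # records every real invocation

@functools.lru_cache(maxsize=128)
def load_user(user_id):
    calls.append(user_id)  # stands in for an expensive lookup
    return {'id': user_id}

load_user(1)
load_user(1)                                     # cache hit
hits_before_clear = load_user.cache_info().hits
load_user.cache_clear()                          # drops every entry for this function
load_user(1)                                     # recomputed after the clear
```

Note that `cache_clear()` is all-or-nothing per function, so it corresponds to the "flush everything" end of the invalidation spectrum discussed earlier.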




invalidation



code example
                 def invalidate(key):
                     """Remove a key from the cache; report keys that were never cached."""
                     try:
                         del CACHE_DICT[key]
                     except KeyError:
                         print("someone tried to invalidate a key that is not present: %s" % key)




application level cache



memcache



                     • battle tested
                     • scales
                     • fast
                     • supports a few cool features
                     • behaves a lot like dict
                     • supports time-based expiration

libraries?

                     • python-memcache
                     • python-libmemcache
                     • python-cmemcache
                     • pylibmc

why no benchmarks

                     • not the point of this talk :)
                     • benchmarks are generic, caching is specific
                     • pick your flavour, think for yourself


code example
                 import memcache

                 cache = memcache.Client(['localhost:11211'])

                 def memcached(key):
                     def func_wrapper(func):
                         def arg_wrapper(*args, **kwargs):
                             value = cache.get(str(key))
                             # a miss returns None; test for None so falsy results (0, '', []) stay cached
                             if value is None:
                                 value = func(*args, **kwargs)
                                 cache.set(str(key), value)
                             return value
                         return arg_wrapper
                     return func_wrapper




invalidation



code example
                 def mem_invalidate(key):
                     # delete the entry so the next lookup is a miss
                     cache.delete(str(key))
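Manual invalidation is one model; the time-based expiration mentioned among memcache's features is the other. The sketch below does not talk to memcached at all. `ExpiringCache` is a hypothetical dict-backed stand-in, written only to illustrate what "set with an expiry time" means, and it takes an injectable clock so the behaviour can be shown without sleeping.

```python
import time

class ExpiringCache:
    """Dict-backed stand-in that mimics set-with-expiry semantics."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, deadline or None)

    def set(self, key, value, exp_time=0):
        # exp_time == 0 means "never expire"
        deadline = self._clock() + exp_time if exp_time else None
        self._data[key] = (value, deadline)

    def get(self, key):
        value, deadline = self._data.get(key, (None, None))
        if deadline is not None and self._clock() >= deadline:
            del self._data[key]  # lazily drop the expired entry
            return None
        return value

now = [0.0]                                   # fake clock for the demo
cache = ExpiringCache(clock=lambda: now[0])
cache.set('greeting', 'hello', exp_time=30)
before = cache.get('greeting')                # still within 30 seconds
now[0] = 31.0
after = cache.get('greeting')                 # past the deadline, reported as a miss
```

Expiry by time trades precision for simplicity: nobody has to remember to invalidate, but readers may see data that is up to `exp_time` seconds old.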




batch key management



                     • what if I don't want to expire each key manually?
                     • that's a lot to remember
                     • and we have to be careful :(


groups?

                     • group keys into sets
                     • which are tied to one key per set
                     • expire one key, instead of twenty


how to get there?

                     • store some extra data
                     • you can store dicts in cache
                     • and cache behaves like dict
                     • so it’s a case of comparing keys and values

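                 # we start with a specified key and group
                 key = 'some_key'
                 group = 'some_group'

                 # hypothetical wrapper function, added so the return statements are valid
                 def get_grouped_value(key, group):
                     # retrieve both entries from memcached in one round trip
                     data = memcached_client.get_multi([key, group])
                     # data is a dict that should look like
                     # {'some_key': {'group_key': '1234',
                     #               'value': 'some_value'},
                     #  'some_group': '1234'}
                     if data and (key in data) and (group in data):
                         # valid only if the stored group key matches the current one
                         if data[key]['group_key'] == data[group]:
                             return data[key]['value']
                     return None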




# cache, tools.make_key and make_group_value are defined elsewhere
def cached(key, group_key='', exp_time=0):
    # we don't want to mix time-based and event-based expiration models
    if group_key:
        assert exp_time == 0, "can't set expiration time for grouped keys"

    def f_wrapper(func):
        def arg_wrapper(*args, **kwargs):
            value = None
            cache_key = tools.make_key(key)
            if group_key:
                cache_group = tools.make_key(group_key)
                data = cache.get_multi([cache_group, cache_key])
                data_dict = data.get(cache_key)
                # the entry is valid only while its stored group value matches the current one
                if data_dict and data_dict['group_value'] == data.get(cache_group):
                    value = data_dict['value']
            else:
                value = cache.get(cache_key)
            if value is None:
                value = func(*args, **kwargs)
                if exp_time:
                    cache.set(cache_key, value, exp_time)
                elif not group_key:
                    cache.set(cache_key, value)
                else:  # exp_time not set and we have a group key
                    # reuse the current group value so other keys in the group stay valid
                    group_value = data.get(cache_group) or make_group_value(group_key)
                    data_dict = {'value': value, 'group_value': group_value}
                    cache.set_multi({cache_key: data_dict, cache_group: group_value})
            return value
        arg_wrapper.__name__ = func.__name__
        return arg_wrapper
    return f_wrapper
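The slides do not show how a whole group is expired. One plausible reading of the scheme is that replacing the value stored under the group key orphans every entry that recorded the old value. The sketch below demonstrates that idea under stated assumptions: `FakeClient` is a hypothetical dict-backed stand-in for a memcached client, and `get_grouped`, `set_grouped` and `invalidate_group` are illustrative helpers, not the author's code.

```python
import uuid

class FakeClient:
    """Dict-backed stand-in exposing only the calls this demo needs."""

    def __init__(self):
        self.store = {}

    def get_multi(self, keys):
        return {k: self.store[k] for k in keys if k in self.store}

    def set_multi(self, mapping):
        self.store.update(mapping)

    def set(self, key, value):
        self.store[key] = value

cache = FakeClient()

def get_grouped(key, group):
    data = cache.get_multi([key, group])
    entry = data.get(key)
    # Valid only if the entry's recorded token matches the group's current one.
    if entry and entry['group_value'] == data.get(group):
        return entry['value']
    return None

def set_grouped(key, group, value):
    # Reuse the group's current token so sibling entries stay valid.
    token = cache.get_multi([group]).get(group) or uuid.uuid4().hex
    cache.set_multi({key: {'value': value, 'group_value': token}, group: token})

def invalidate_group(group):
    # Replacing the token orphans every entry that stored the old one.
    cache.set(group, uuid.uuid4().hex)

set_grouped('user:1', 'users', 'alice')
set_grouped('user:2', 'users', 'bob')
before = (get_grouped('user:1', 'users'), get_grouped('user:2', 'users'))
invalidate_group('users')
after = (get_grouped('user:1', 'users'), get_grouped('user:2', 'users'))
```

The orphaned entries are never deleted explicitly here; in a real memcached deployment they would simply age out under memory pressure.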



questions?



code samples @ http://github.com/mdomans/europython2010

follow me

                 twitter: mdomans
                 blog:    blog.mdomans.com

