This document provides an overview of different collection data types in Python including tuples, dictionaries, and sets. It discusses the key properties and uses of each type. Tuples are immutable sequences, dictionaries store key-value pairs and allow fast lookup by key, and sets only allow unique elements and support mathematical set operations. The document also covers performance considerations and recommends sets for fast membership checking of hashable elements.
1. Welcome to the Brixton Library
Technology Initiative
(Coding for Adults)
ZCDixon@lambeth.gov.uk
BasilBibi@hotmail.com
January 30th 2016
Week 4 – Collections 2
2. Collections – Part 2
• This week we cover three very important
abstract data types – tuples, associative arrays
and Sets.
3. Collections – Tuples
• A tuple is an immutable sequence of Python
objects.
• Tuples are just like lists but cannot be
changed.
• Tuples use parentheses, whereas lists use
square brackets.
5. Collections – Tuples
• Single-value tuple: you have to include a trailing
comma.
tup1 = (50,)
• Otherwise Python treats the parentheses as plain grouping, and (50) is
just the integer 50.
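A minimal sketch of the difference the trailing comma makes. The name not_a_tuple is made up for illustration, and print() is written with parentheses so the snippet also runs on Python 3.

```python
# Parentheses alone only group an expression; the trailing comma makes the tuple.
not_a_tuple = (50)
tup1 = (50,)

print(type(not_a_tuple).__name__)  # int
print(type(tup1).__name__)         # tuple
print(len(tup1))                   # 1
```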
6. Collections – Tuples
• Any comma-separated sequence of objects written
without enclosing symbols (brackets for lists,
parentheses for tuples, etc.) defaults to a tuple.
t = 'abc', -4.24e93, 18+6.6j, 'xyz'
print t
('abc', -4.24e+93, (18+6.6j), 'xyz')
7. Collections – Tuples
Nice programming idiom:
Unpack a tuple into several variables at the same time
x, y = 1, 2
print "Value of x, y:", x, y
Value of x, y: 1 2
The number of names must match the number of values:
x, y = 1, 2, 3
ValueError: too many values to unpack
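A short sketch of the same unpacking idiom, including the common swap trick and a way to catch the error instead of crashing. The names a, b and message are illustrative only.

```python
x, y = 1, 2      # the right-hand side is really the tuple (1, 2)
x, y = y, x      # swapping two values uses the same unpacking mechanism

try:
    a, b = 1, 2, 3   # three values, only two names
except ValueError as err:
    message = str(err)

print(x)  # 2
print(y)  # 1
print(message)
```

The exact wording of the error message differs slightly between Python versions, so the last line is shown without an expected output.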
8. Tuples - Immutability
• You cannot update or change the values of tuple
elements. (See "Immutability - Partial Truth" for the exception.)
• You can combine or slice existing tuples to
create new tuples.
tup1 = (12, 34.56)
tup2 = ('abc', 'xyz')
# tup1[0] = 100 is not allowed: it raises a TypeError
tup3 = tup1 + tup2
# tup3 is (12, 34.56, 'abc', 'xyz')
9. Tuples - Immutability
Remember list example replacing elements :
myList = [10,20,30,40,50]
myList[2:4] = ['C', 'D', 'E']
[10, 20, 'C', 'D', 'E', 50]
tup = (10,20,30,40,50)
tup[2:4] = ('C','D','E')
TypeError: 'tuple' object does not support item assignment
Use slices to do the same thing :
tup[:2] + ('C','D','E') + tup[4:]
(10, 20, 'C', 'D', 'E', 50)
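The slicing trick above can be wrapped in a small helper. This is only a sketch: replace_slice is a hypothetical function name, not something built into Python.

```python
def replace_slice(tup, start, stop, new_items):
    """Return a NEW tuple with tup[start:stop] replaced by new_items."""
    return tup[:start] + tuple(new_items) + tup[stop:]


tup = (10, 20, 30, 40, 50)
result = replace_slice(tup, 2, 4, ('C', 'D', 'E'))

print(result)  # (10, 20, 'C', 'D', 'E', 50)
print(tup)     # (10, 20, 30, 40, 50)  - the original tuple is untouched
```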
10. Tuples – Immutability - Partial Truth
• There are circumstances where a tuple’s contents can change.
• If the tuple contains a List – the List’s contents can change.
a = ["apple",]
b = ["banana",]
c = ["cucumber",]
t = (a,b,c)
t
(['apple'], ['banana'], ['cucumber'])
b.append("A new banana in my immutable tuple! WAT?!")
t
(['apple'], ['banana', 'A new banana in my immutable tuple! WAT?!'],
['cucumber'])
Beware of Tuples containing
collections – those collections could
change.
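One possible way to guard against this, sketched under the assumption that you only need a read-only snapshot: copy the inner lists into tuples. The name frozen is illustrative.

```python
a = ["apple"]
b = ["banana"]
t = (a, b)

# Copy each inner list into a tuple so later changes to a or b cannot leak in.
frozen = tuple(tuple(inner) for inner in t)

b.append("another banana")

print(t)       # (['apple'], ['banana', 'another banana'])
print(frozen)  # (('apple',), ('banana',))
```

The snapshot contains only immutable objects, so it can also be used as a dict key or set element.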
11. Tuples – Operations Like Lists
Python Expression        Results
len( (1, 2, 3) )         3
min( (1, 2, 3) )         1
max( (1, 2, 3) )         3
(1, 2, 3) + (4, 5, 6)    (1, 2, 3, 4, 5, 6)
('Hi!',) * 4             ('Hi!', 'Hi!', 'Hi!', 'Hi!')
3 in (1, 2, 3)           True
tuple( myList )          Converts a list to a tuple
All your favourite list
operations are also
available for Tuples.
Remember : Create
List with []
Tuple with ()
Map with {k : v}
Set with {a,b,c}
12. Tuple – basic operation summary
Expression          Result
tup[ 3 ]            40
tup[ 0 ]            10
index = 4
tup[ index ]        50
tup[ -1 ]           50
tup[ -3 ]           30
tup[ 20 ]           IndexError: tuple index out of range
tup[ 1 : 4 ]        (20, 30, 40)
tup[ : 4 ]          (10, 20, 30, 40)
tup[ 1 : 4 : 2 ]    (20, 40)
tup[ :: 2 ]         (10, 30, 50)
tup[ :: -1 ]        (50, 40, 30, 20, 10)
Given tup = (10, 20, 30, 40, 50)
13. Collections – Associative Arrays
• In computer science, an associative
array, map, symbol table, or dictionary is
an abstract data type composed of
a collection of key-value pairs, such that each
possible key appears just once in the
collection – Wikipedia
14. Python Associative Array = Dict
• That means we put a value into an associative array
using a key and then get the value back using the same
key.
• Python has a dict type that is defined like this :
{ key0 : value0 , key1 : value1 , ... keyn : valuen }
• Here is a dict containing key value pairs that map
names to email addresses :
emails = { 'basil' : 'basilbibi@xyz.com' , 'joel' : 'joel@xyz.com' }
15. Dict Operations
• We can get the value using the associated key :
>>> emails = { 'basil' : 'basilbibi@xyz.com' , 'joel' : 'joel@xyz.com' }
>>> print emails['basil']
basilbibi@xyz.com
• We can replace a value using the same key :
>>> emails['basil'] = 'BBB@abc.com'
>>> print emails['basil']
BBB@abc.com
16. Dict – Adding and Removing
• We can add a new value:
>>> emails['tony'] = 'tony@abc.com'
>>> print emails['tony']
tony@abc.com
• We can remove an entry using del
>>> del emails['basil']
>>> print emails['basil']
KeyError: 'basil'
Can also clear the entire dict.
>>> emails.clear()
Or del it.
>>> del emails
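A small sketch of adding and removing entries without risking a KeyError. It uses dict.pop with a default value, which is a standard dict method; the variable names are illustrative.

```python
emails = {'basil': 'basilbibi@xyz.com', 'joel': 'joel@xyz.com'}

emails['tony'] = 'tony@abc.com'   # add a new entry
del emails['basil']               # remove an entry that exists

try:
    emails['basil']               # the key is gone, so this raises KeyError
    lookup_failed = False
except KeyError:
    lookup_failed = True

# pop() removes a key and returns its value; the second argument is
# returned instead of raising KeyError when the key is missing.
removed = emails.pop('basil', None)
joel_address = emails.pop('joel', None)

print(lookup_failed)   # True
print(removed)         # None
print(joel_address)    # joel@xyz.com
print(sorted(emails))  # ['tony']
```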
17. Dict – Summary
Python Expression Results
myDict.clear()                             Removes every entry
myDict.copy()                              Returns a shallow copy
emails = dict.fromkeys( [1,2,3], 'BBB')    {1: 'BBB', 2: 'BBB', 3: 'BBB'}
emails[2]                                  'BBB'
emails.get(2, "Not Found")                 'BBB'
emails.get(5, "Not Found")                 'Not Found' (the default is returned if the key is not there)
print emails.get(5)                        None
emails.has_key(5)                          False (Python 2 only; 5 in emails works in every version)
emails.keys()                              [1, 2, 3]
emails.values()                            ['BBB', 'BBB', 'BBB']
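The lookups in the summary can be tried out as follows. This is a sketch using the in operator rather than has_key(), because in works on both Python 2 and Python 3.

```python
emails = dict.fromkeys([1, 2, 3], 'BBB')

found = emails.get(2, "Not Found")     # key exists: its value is returned
missing = emails.get(5, "Not Found")   # key missing: the default is returned
no_default = emails.get(5)             # key missing, no default: None
has_five = 5 in emails                 # membership test on the keys

print(found)       # BBB
print(missing)     # Not Found
print(no_default)  # None
print(has_five)    # False
```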
18. Dict – Other Rules
• Only one value per key : last one wins
• >>> emails = { 'basil' : 'basilbibi@xyz.com' , 'basil' : 'BBB@abc.com' }
• >>> print emails['basil']
• BBB@abc.com
19. Dict – Other Rules
• A key must already be in the Dict when you look it up.
• You get a KeyError otherwise.
>>> myDodgyKey = "dodgy"
>>> emails[ myDodgyKey ] = 'no@way.com'
>>> print emails[ myDodgyKey ]
no@way.com
>>> myDodgyKey = 'HAHA'
>>> print emails[ myDodgyKey ]
KeyError: 'HAHA'
20. Dict – Other Rules
• Keys must be hashable, which in practice means immutable: String, Number, Tuple*.
• If a key could change after it was stored in the dict, its
hashcode would no longer point to the correct bucket.
• So you can't use a List as a key :
>>> myDodgyKey = ['dodge']
>>> print emails[ myDodgyKey ]
TypeError: unhashable type: 'list'
- "it's complaining because there's no built-in hash function for lists (by
design), and dictionaries are implemented as hash tables" -
stackoverflow.com
* Even though tuples are
immutable, they can't be used as
dictionary keys if they contain
mutable objects.
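A brief sketch of the tuple rule in the footnote. The dict name grid and the keys are made up for illustration.

```python
grid = {}

grid[(0, 1)] = "a tuple of numbers is a fine key"

try:
    grid[(0, [1])] = "a tuple holding a list is not"
except TypeError as err:
    error_text = str(err)

print(len(grid))    # 1
print(error_text)   # unhashable type: 'list'
```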
21. Dict – Hashing
• What is a hash function and what are hash tables?
• Hashing is at the core of associative arrays.
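• Every hashable object in Python has a hash code.
• It is a way to turn a value into a numeric code. Equal values always get
the same code, but the code is not guaranteed to be unique.
• The exact numbers below vary between Python versions and machines.
>>> hash('basil')
-1740749512
>>> hash(2)
2
>>> hash(2.0) # Equal numbers hash the same: 2 == 2.0
2
>>> hash (3.14159)
-1684683840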
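A sketch of why equal hashes matter for dicts. The comment about -1 and -2 describes behaviour of the standard CPython interpreter and is an assumption about your setup; other implementations may differ.

```python
# Equal values must have equal hash codes.
same_hash = hash(2) == hash(2.0)

# Because 2 == 2.0 and their hashes match, they are the SAME dict key.
d = {2: 'stored with int key'}
d[2.0] = 'stored with float key'   # replaces the value, adds no new entry

# Hash codes are not unique: on CPython, -1 and -2 share the hash code -2.
collision = hash(-1) == hash(-2)

print(same_hash)   # True
print(len(d))      # 1
print(d[2])        # stored with float key
print(collision)
```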
22. Dict – Hashing
• When you add a key value pair to a dict, the key’s hashcode is used to
store the value in the dict.
23. Dict – Internal Storage
• The internal storage of the dict is actually a collection of buckets :
• When you insert a value, the hashcode is used to find a bucket to put it
into.
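To make the bucket idea concrete, here is a deliberately simplified toy hash table. It is NOT how Python's dict is really implemented (CPython uses open addressing and resizes the table); ToyDict, put and get are hypothetical names invented for this sketch.

```python
class ToyDict(object):
    """A toy hash table: a fixed list of buckets, each a list of (key, value) pairs."""

    def __init__(self, bucket_count=8):
        self.buckets = [[] for _ in range(bucket_count)]

    def _bucket_for(self, key):
        # The hash code, reduced modulo the bucket count, picks the bucket.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket_for(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)   # same key: replace the value
                return
        bucket.append((key, value))        # new key: add to the bucket

    def get(self, key):
        for existing_key, value in self._bucket_for(key):
            if existing_key == key:
                return value
        raise KeyError(key)


toy = ToyDict()
toy.put('basil', 'basilbibi@xyz.com')
toy.put(1, 'one')
toy.put(9, 'nine')   # 1 and 9 land in the same bucket, because 9 % 8 == 1

print(toy.get('basil'))     # basilbibi@xyz.com
print(toy.get(9))           # nine
print(len(toy.buckets[1]))  # 2
```

Only one bucket has to be searched for any key, which is why lookups stay fast however many entries there are, as long as the buckets stay short.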
24. Dict – Buckets
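• A Dict starts with 8 buckets and grows as new elements are added.
• In CPython 2.x, once the table is about two-thirds full it is resized to
roughly 4 times the number of entries in use.
• Removing items does not shrink the table straight away; the size is only
recalculated at the next resize.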
25. Dict – Recommended Viewing
• Very good presentation on the internal workings of Python Dicts.
• http://pyvideo.org/video/276/the-mighty-dictionary-55
26. Python Sets
• A special kind of dict where the key is the
value.
• A set contains only one occurrence of an
object.
• You can't add mutable (unhashable) objects to a set.
• Sets have mathematical operations – union,
intersection, difference, superset.
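A minimal sketch of the first three points. The list of names is made up for illustration.

```python
names = ['basil', 'joel', 'basil', 'tony', 'joel']

unique_names = set(names)          # duplicates collapse to one occurrence
print(sorted(unique_names))        # ['basil', 'joel', 'tony']

unique_names.add(('a', 'tuple'))   # an immutable tuple is accepted

try:
    unique_names.add(['a', 'list'])   # a mutable list is rejected
    list_rejected = False
except TypeError:
    list_rejected = True

print(list_rejected)               # True
print(len(unique_names))           # 4
```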
29. Python Sets - Union
Python Expression Results
A | B {1, 2, 3, 4, 5, 6, 7, 8}
A.union(B) {1, 2, 3, 4, 5, 6, 7, 8}
B.union(A) {1, 2, 3, 4, 5, 6, 7, 8}
Given : A = {1, 2, 3, 4, 5}
B = {4, 5, 6, 7, 8}
30. Python Sets - Intersection
Python Expression Results
A & B {4, 5}
A.intersection(B) {4, 5}
B.intersection(A) {4, 5}
Given : A = {1, 2, 3, 4, 5}
B = {4, 5, 6, 7, 8}
31. Python Sets - Difference
Python Expression Results
A - B {1, 2, 3}
A.difference(B) {1, 2, 3}
B.difference(A) {6, 7, 8}
Given : A = {1, 2, 3, 4, 5}
B = {4, 5, 6, 7, 8}
32. Python Sets – Symmetric Difference
Python Expression Results
A ^ B {1, 2, 3, 6, 7, 8}
A.symmetric_difference(B) {1, 2, 3, 6, 7, 8}
B.symmetric_difference(A) {1, 2, 3, 6, 7, 8}
(A|B) - (A & B) {1, 2, 3, 6, 7, 8}
Given : A = {1, 2, 3, 4, 5}
B = {4, 5, 6, 7, 8}
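The four operations can be checked against each other in a few lines. This sketch also shows the superset and subset tests mentioned earlier; the result variable names are illustrative.

```python
A = {1, 2, 3, 4, 5}
B = {4, 5, 6, 7, 8}

union = A | B
intersection = A & B
difference = A - B
symmetric = A ^ B

# Symmetric difference is "in either set, but not in both".
assert symmetric == union - intersection
assert symmetric == (A - B) | (B - A)

# The union contains both sets; the intersection is contained in both.
print(union.issuperset(A))       # True
print(intersection.issubset(B))  # True
print(sorted(symmetric))         # [1, 2, 3, 6, 7, 8]
```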
33. When to use Tuple, List, Dict Or Set?
• A tuple is an immutable list.
• A list keeps order, dict and set don't: when you care about order, use a list.
• A dict associates a value with each key; list and set just contain values.
• Set items must be hashable, list items need not be: if you have non-hashable items, use a list.
• A tuple is hashable, provided everything it contains is hashable.
• A set forbids duplicates, a list does not - a crucial distinction.
• Checking whether a value is in a set (or a key is in a dict) is very fast.
Searching a list takes time proportional to the list's length.
• So, if you have hashable items, don't care about order or duplicates, and want
speedy membership checking, a set is better than a list.
• Stacktrace.com http://tinyurl.com/q8f2g3g
34. Performance of List, Dict Or Set.
• How each collection performs under different circumstances will determine
which collection you should choose.
• The performance or 'complexity' of a data structure is described with 'Big O'
notation. There are other notations, but 'Big O' is the one programmers
usually talk about.
• https://wiki.python.org/moin/TimeComplexity
• https://justin.abrah.ms/computer-science/big-o-notation-explained.html
• http://bigocheatsheet.com/
• Complexity is not just a measure of speed (time complexity); it also measures
how much memory the data structure consumes (space complexity).
35. Testing Performance Of Your Code
• Consider the time complexity of the data structure first, unless you
are dealing with very large collections and memory is a constraint.
• Experiment with the performance of your own data using the timeit module.
• https://docs.python.org/2/library/timeit.html
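A rough sketch of such an experiment, comparing membership checks in a list and a set. The collection size (100000) and repeat count (100) are arbitrary choices for illustration, and the timings will differ from machine to machine.

```python
import timeit

# Build the same 100000 numbers as a list and as a set (done once, not timed).
setup = "items_list = list(range(100000)); items_set = set(items_list)"

# Look for the last element, the worst case for a list search.
list_time = timeit.timeit("99999 in items_list", setup=setup, number=100)
set_time = timeit.timeit("99999 in items_set", setup=setup, number=100)

print("list: %.6f seconds" % list_time)
print("set:  %.6f seconds" % set_time)
```

The list has to be scanned element by element, while the set jumps straight to the right bucket, so the set time should come out far smaller.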