2. About Me
• Open Source Developer
• Founder of Open Source Web Application
and CMS service provider: Scryent -
www.scryent.com
• Founder of Toronto Plone Users Group -
www.torontoplone.ca
3. Agenda
• About Python
• Show me your CODE
• A Spell Checker in 21 lines of code
• Why Python ROCKS
• Resources for further exploration
7. Significant Whitespace
• less code clutter
• eliminates many common syntax errors
• proper code layout
• use an indentation aware editor or IDE
• Get over it!
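A small sketch of what significant whitespace means in practice. Indentation, not braces, decides which statements belong to the loop, the `if`, or the function. Python also refuses to compile a body that is not indented. The names `classify` and `indentation_is_enforced` are made up for illustration.

```python
def classify(numbers):
    """Split numbers into (evens, odds); indentation marks every block."""
    evens = []
    odds = []
    for n in numbers:
        if n % 2 == 0:
            evens.append(n)   # inside the if
        else:
            odds.append(n)    # inside the else
    return evens, odds        # dedented: runs once, after the loop


def indentation_is_enforced():
    """Return True if Python rejects a function body that is not indented."""
    bad_source = "def f():\nreturn 1\n"
    try:
        compile(bad_source, "<example>", "exec")
    except IndentationError:
        return True
    return False


print(classify([1, 2, 3, 4]))      # ([2, 4], [1, 3])
print(indentation_is_enforced())   # True
```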
8. Python is Interactive
Python 2.6.1 (r261:67515, Jul 7 2009, 23:51:51)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for
more information.
>>>
10. FIZZ BUZZ
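def fizzbuzz(n):
    # Start at 1: range(n + 1) would also print 0 as "Fizz Buzz"
    for i in range(1, n + 1):
        if not i % 3:
            print "Fizz",
        if not i % 5:
            print "Buzz",
        if i % 3 and i % 5:
            print i,
        print  # Python 2: the trailing commas above suppress the newline; this ends the line
fizzbuzz(50)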
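To check FizzBuzz in a test instead of reading it off the console, one option is to build each line as a string and return the list. `fizzbuzz_lines` is a hypothetical helper, not part of the slide. Unlike the slide's version, it writes "FizzBuzz" as one word, the usual spelling of the puzzle.

```python
def fizzbuzz_lines(n):
    """Hypothetical helper: return the FizzBuzz output for 1..n as strings."""
    lines = []
    for i in range(1, n + 1):
        word = ""
        if i % 3 == 0:
            word += "Fizz"
        if i % 5 == 0:
            word += "Buzz"
        lines.append(word or str(i))  # fall back to the number itself
    return lines


for line in fizzbuzz_lines(15):
    print(line)
```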
12. FIZZ BUZZ (OO)
class FizzBuzzWriter(object):
    def __init__(self, limit):
        self.limit = limit
    def run(self):
        for n in range(1, self.limit + 1):
            self.write_number(n)
    def write_number(self, n):
        if not n % 3:
            print "Fizz",
        if not n % 5:
            print "Buzz",
        if n % 3 and n % 5:
            print n,
        print
fizzbuzz = FizzBuzzWriter(50)
fizzbuzz.run()
13. A Spell Checker in 21 Lines of Code
• Written by Peter Norvig
• Duplicated in many languages
• Simple Spellchecking algorithm based on
probability
• http://norvig.com/spell-correct.html
14. The Approach
• Census: count how often each word occurs in a large body of text
• Morph the word (werd)
• Insertions: waerd, wberd, werzd
• Deletions: wrd, wed, wer
• Transpositions: ewrd, wred, wedr
• Replacements: aerd, ward, wbrd, word, wzrd,
werz
• Keep the candidates that are real words and pick the most frequent: were
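The selection step can be sketched with a toy census. The counts below are invented for illustration, not taken from any corpus. The morphs are the ones listed on the slide plus "were", which is another replacement (d → e) that the slide's list leaves out.

```python
# Hypothetical counts standing in for a census of a large text.
counts = {"were": 50, "word": 30, "ward": 5, "wed": 2}

# Some edits of "werd"; "were" (d -> e) is one more replacement.
morphs = [
    "waerd", "wberd", "werzd",                                # insertions
    "wrd", "wed", "wer",                                      # deletions
    "ewrd", "wred", "wedr",                                   # transpositions
    "aerd", "ward", "wbrd", "word", "wzrd", "werz", "were",   # replacements
]

# Keep only the morphs that are known words, then pick the most frequent one.
candidates = [w for w in morphs if w in counts]
best = max(candidates, key=counts.get)
print(best)  # were
```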
15. Norvig Spellchecker
import re, collections
def words(text):
    return re.findall('[a-z]+', text.lower())
def train(words):
    model = collections.defaultdict(int)
    for w in words:
        model[w] += 1
    return model
NWORDS = train(words(file('big.txt').read()))
alphabet = 'abcdefghijklmnopqrstuvwxyz'
def edits1(word):
    s = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in s if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in s if len(b)>1]
    replaces = [a + c + b[1:] for a, b in s for c in alphabet if b]
    inserts = [a + c + b for a, b in s for c in alphabet]
    return set(deletes + transposes + replaces + inserts)
def known_edits2(word):
    return set(e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in NWORDS)
def known(words):
    return set(w for w in words if w in NWORDS)
def correct(word):
    candidates = known([word]) or known(edits1(word)) or known_edits2(word) or [word]
    return max(candidates, key=NWORDS.get)
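To try the listing without downloading big.txt, the same functions can be trained on a short in-memory string. The three-sentence corpus is an assumption made so the sketch runs anywhere; the functions themselves follow the slide, except that `file()` is not needed at all.

```python
import collections
import re

def words(text):
    return re.findall('[a-z]+', text.lower())

def train(words):
    model = collections.defaultdict(int)
    for w in words:
        model[w] += 1
    return model

# Assumption: a tiny in-memory corpus replaces big.txt.
CORPUS = """The computer is on the desk. The computer is fast.
Spell the word correctly."""
NWORDS = train(words(CORPUS))

alphabet = 'abcdefghijklmnopqrstuvwxyz'

def edits1(word):
    s = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in s if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in s if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in s for c in alphabet if b]
    inserts = [a + c + b for a, b in s for c in alphabet]
    return set(deletes + transposes + replaces + inserts)

def known_edits2(word):
    return set(e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in NWORDS)

def known(words):
    return set(w for w in words if w in NWORDS)

def correct(word):
    candidates = known([word]) or known(edits1(word)) or known_edits2(word) or [word]
    return max(candidates, key=NWORDS.get)

print(correct("computr"))  # computer (one insertion)
print(correct("teh"))      # the (one transposition)
```

With only a few sentences of training text the model knows very few words; for real use, train on the big.txt file linked from Norvig's article instead of `CORPUS`.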
16. Regular Expressions
def words(text):
    return re.findall('[a-z]+', text.lower())
>>> words("The cat in the hat!")
['the', 'cat', 'in', 'the', 'hat']
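`[a-z]+` matches only runs of ASCII letters after lowercasing, so everything else acts as a separator. The sample sentence below is made up; it shows two side effects worth knowing: contractions split in two and digits disappear.

```python
import re

def words(text):
    return re.findall('[a-z]+', text.lower())

print(words("Don't panic: it's 42!"))  # ['don', 't', 'panic', 'it', 's']
```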
17. Dictionaries
>>> d = {'cat':1}
>>> d
{'cat': 1}
>>> d['cat']
1
>>> d['cat'] += 1
>>> d
{'cat': 2}
>>> d['dog'] += 1
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 'dog'
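With a plain dict, a missing key has to be handled explicitly. A short sketch of three standard ways to avoid the `KeyError` above; the keys are arbitrary examples.

```python
d = {'cat': 1}

# dict.get returns a default (0 here) instead of raising KeyError
d['dog'] = d.get('dog', 0) + 1

# setdefault inserts the default only if the key is missing
d.setdefault('bird', 0)
d['bird'] += 1

# an explicit membership test also works
if 'fish' not in d:
    d['fish'] = 0
d['fish'] += 1

print(sorted(d.items()))  # [('bird', 1), ('cat', 1), ('dog', 1), ('fish', 1)]
```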
18. defaultdict
# Has a factory for missing keys
>>> d = collections.defaultdict(int)
>>> d['dog'] += 1
>>> d
defaultdict(<type 'int'>, {'dog': 1})
>>> int
<type 'int'>
>>> int()
0
def train(words):
    model = collections.defaultdict(int)
    for w in words:
        model[w] += 1
    return model
>>> train(words("The cat in the hat!"))
defaultdict(<type 'int'>, {'cat': 1, 'the': 2, 'hat': 1, 'in': 1})
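On newer Pythons the standard library also has `collections.Counter`, a dict subclass built for exactly this kind of counting. It arrived in Python 2.7 and 3.1, so it is not available in the 2.6 interpreter shown earlier. A sketch of the same training step with it:

```python
import collections
import re

def words(text):
    return re.findall('[a-z]+', text.lower())

model = collections.Counter(words("The cat in the hat!"))

print(model['the'])          # 2
print(model['dog'])          # 0: missing keys count as zero
print('dog' in model)        # False: unlike defaultdict, the lookup added no key
print(model.most_common(1))  # [('the', 2)]
```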
20. Training the Probability Model
import re, collections
def words(text):
    return re.findall('[a-z]+', text.lower())
def train(words):
    model = collections.defaultdict(int)
    for w in words:
        model[w] += 1
    return model
NWORDS = train(words(file('big.txt').read()))
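The built-in `file()` exists only in Python 2. `open()` works in both Python 2 and 3, and a `with` block closes the file when it finishes. A sketch under those assumptions: `load_model` is a hypothetical helper, and the demo writes a tiny temporary file instead of using big.txt.

```python
import collections
import os
import re
import tempfile

def words(text):
    return re.findall('[a-z]+', text.lower())

def train(words):
    model = collections.defaultdict(int)
    for w in words:
        model[w] += 1
    return model

def load_model(path):
    """Hypothetical helper: read a corpus file and return its word counts."""
    with open(path) as f:  # the file is closed when the block exits
        return train(words(f.read()))

# Demo only: a tiny corpus in a temporary file stands in for big.txt.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("The cat in the hat")
NWORDS = load_model(path)
os.remove(path)

print(NWORDS['the'])  # 2
```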
21. List Comprehensions
# These two are equivalent:
result = []
for v in iter:
    if cond:
        result.append(expr)
[ expr for v in iter if cond ]
# You can nest loops also:
result = []
for v1 in iter1:
    for v2 in iter2:
        if cond:
            result.append(expr)
[ expr for v1 in iter1 for v2 in iter2 if cond ]
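The template above filled in with concrete values, which are arbitrary examples. In the nested form the `for` clauses read left to right, so the first one is the outer loop.

```python
names = ["spam", "eggs", "ham"]

# Loop form
result = []
for v in names:
    if len(v) > 3:
        result.append(v.upper())

# Comprehension form: same expression, loop and condition
same = [v.upper() for v in names if len(v) > 3]
print(result == same)  # True

# Nested loops: the first `for` is the outer loop
pairs = [(v1, v2) for v1 in "ab" for v2 in "xy"]
print(pairs)  # [('a', 'x'), ('a', 'y'), ('b', 'x'), ('b', 'y')]
```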
23. Deletions
>>> word = "spam"
>>> s = [(word[:i], word[i:]) for i in range(len(word) + 1)]
>>> deletes = [a + b[1:] for a, b in s if b]
>>> deletes
['pam', 'sam', 'spm', 'spa']
>>> a, b = ('s', 'pam')
>>> a
's'
>>> b
'pam'
>>> bool('pam')
True
>>> bool('')
False
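It helps to print the splits themselves. Every edit is built from one `(a, b)` pair, and `if b` skips the last pair, whose right half is empty, because there is nothing left to delete there.

```python
word = "spam"
s = [(word[:i], word[i:]) for i in range(len(word) + 1)]
print(s)  # [('', 'spam'), ('s', 'pam'), ('sp', 'am'), ('spa', 'm'), ('spam', '')]

# Drop the first letter of the right half at each split point
deletes = [a + b[1:] for a, b in s if b]
print(deletes)  # ['pam', 'sam', 'spm', 'spa']
```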
24. Transpositions
For example: teh => the
>>> transposes = [a + b[1] + b[0] + b[2:] for a, b in s if len(b)>1]
>>> transposes
['psam', 'sapm', 'spma']
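The slide's example, teh => the, worked through with the same expression. `len(b) > 1` is needed because swapping requires two letters to the right of the split.

```python
word = "teh"
s = [(word[:i], word[i:]) for i in range(len(word) + 1)]

# Swap the first two letters of the right half
transposes = [a + b[1] + b[0] + b[2:] for a, b in s if len(b) > 1]
print(transposes)  # ['eth', 'the']
```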
25. Replacements
>>> alphabet = "abcdefghijklmnopqrstuvwxyz"
>>> replaces = [a + c + b[1:] for a, b in s for c in alphabet if b]
>>> replaces
['apam', 'bpam', ..., 'zpam', 'saam', ..., 'szam', ..., 'spaz']
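Replacements try all 26 letters at each of the `len(word)` positions. That list includes the original word once per position, because the loop also replaces a letter with itself.

```python
word = "spam"
alphabet = "abcdefghijklmnopqrstuvwxyz"
s = [(word[:i], word[i:]) for i in range(len(word) + 1)]
replaces = [a + c + b[1:] for a, b in s for c in alphabet if b]

print(len(replaces))           # 104 = 26 * len(word)
print(replaces.count("spam"))  # 4: one per position
```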
26. Insertion
>>> alphabet = "abcdefghijklmnopqrstuvwxyz"
>>> inserts = [a + c + b for a, b in s for c in alphabet]
>>> inserts
['aspam', ..., 'zspam', 'sapam', ..., 'szpam', 'spaam', ..., 'spamz']
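Insertions try all 26 letters at each of the `len(word) + 1` gaps. Some results repeat: inserting an "a" just before or just after the existing "a" both give "spaam". That is one reason `edits1` wraps everything in a `set`.

```python
word = "spam"
alphabet = "abcdefghijklmnopqrstuvwxyz"
s = [(word[:i], word[i:]) for i in range(len(word) + 1)]
inserts = [a + c + b for a, b in s for c in alphabet]

print(len(inserts))                      # 130 = 26 * (len(word) + 1)
print(inserts.count("spaam"))            # 2
print(len(set(inserts)) < len(inserts))  # True
```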
27. Find all Edits
alphabet = 'abcdefghijklmnopqrstuvwxyz'
def edits1(word):
    s = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in s if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in s if len(b)>1]
    replaces = [a + c + b[1:] for a, b in s for c in alphabet if b]
    inserts = [a + c + b for a, b in s for c in alphabet]
    return set(deletes + transposes + replaces + inserts)
>>> edits1("spam")
set(['sptm', 'skam', 'spzam', 'vspam', 'spamj', 'zpam', 'sbam',
'spham', 'snam', 'sjpam', 'spma', 'swam', 'spaem', 'tspam', 'spmm',
'slpam', 'upam', 'spaim', 'sppm', 'spnam', 'spem', 'sparm', 'spamr',
'lspam', 'sdpam', 'spams', 'spaml', 'spamm', 'spamn', 'spum',
'spamh', 'spami', 'spatm', 'spamk', 'spamd', ..., 'spcam', 'spamy'])
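Adding up the four lists gives n deletes, n - 1 transposes, 26n replaces and 26(n + 1) inserts, or 54n + 25 strings for a word of length n. That is 241 for "spam" before the set removes duplicates. `raw_edit_count` is a hypothetical helper added only to show that arithmetic.

```python
alphabet = 'abcdefghijklmnopqrstuvwxyz'

def edits1(word):
    s = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in s if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in s if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in s for c in alphabet if b]
    inserts = [a + c + b for a, b in s for c in alphabet]
    return set(deletes + transposes + replaces + inserts)

def raw_edit_count(word):
    """Hypothetical helper: number of edits before duplicates are removed."""
    n = len(word)
    # deletes + transposes + replaces + inserts
    return n + max(n - 1, 0) + 26 * n + 26 * (n + 1)

print(raw_edit_count("spam"))   # 241 = 54 * 4 + 25
e = edits1("spam")
print(len(e) < 241)             # True: the set drops duplicates
print("spam" in e)              # True: replacing a letter with itself
```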
28. Known Words
def known(words):
    """ Return the known words from `words`. """
    return set(w for w in words if w in NWORDS)
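`known` is a filter against the trained model. A sketch with a hypothetical three-word model in place of the NWORDS trained from big.txt:

```python
# Hypothetical word counts in place of NWORDS.
NWORDS = {'spam': 3, 'ham': 2, 'eggs': 1}

def known(words):
    """ Return the known words from `words`. """
    return set(w for w in words if w in NWORDS)

print(sorted(known(['spam', 'spma', 'ham', 'hamm'])))  # ['ham', 'spam']
print(known(['xyz']) == set())                          # True: an empty, falsy set
```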
29. Correct
def known(words):
    """ Return the known words from `words`. """
    return set(w for w in words if w in NWORDS)
def correct(word):
    candidates = known([word]) or known(edits1(word)) or [word]
    return max(candidates, key=NWORDS.get)
>>> bool(set([]))
False
>>> correct("computr")
'computer'
>>> correct("computor")
'computer'
>>> correct("computerr")
'computer'
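`correct` relies on `or` returning its first truthy operand and stopping there. An empty set is falsy, so the chain only moves on to a costlier step when the cheaper one found nothing. `expensive_candidates` is a hypothetical stand-in for such a step.

```python
calls = []

def expensive_candidates():
    """Hypothetical stand-in for a costly step such as generating more edits."""
    calls.append("called")
    return set(["fallback"])

# A non-empty set is truthy: `or` returns it and never calls the function.
result = set(["word"]) or expensive_candidates()
print(sorted(result))  # ['word']
print(calls)           # []

# An empty set is falsy: `or` evaluates the next operand.
result2 = set() or expensive_candidates()
print(sorted(result2))  # ['fallback']
print(calls)            # ['called']
```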
30. Edit Distance 2
def known_edits2(word):
    return set(
        e2
        for e1 in edits1(word)
        for e2 in edits1(e1)
        if e2 in NWORDS
    )
def correct(word):
    candidates = (known([word]) or known(edits1(word)) or
                  known_edits2(word) or [word])
    return max(candidates, key=NWORDS.get)
>>> correct("conpuler")
'computer'
>>> correct("cmpuler")
'computer'
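"cmpuler" is two edits away from "computer" (insert "o", then replace "l" with "t"), so no known word appears at distance 1 and the chain falls through to `known_edits2`. Applying `edits1` to every distance-1 edit produces a very large number of strings, roughly the square of edits1's size. The `if e2 in NWORDS` filter inside the generator is what keeps the result small. The two-word NWORDS below is a hypothetical stand-in for the trained model.

```python
alphabet = 'abcdefghijklmnopqrstuvwxyz'
# Hypothetical counts: a two-word vocabulary instead of big.txt.
NWORDS = {'computer': 5, 'the': 10}

def edits1(word):
    s = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in s if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in s if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in s for c in alphabet if b]
    inserts = [a + c + b for a, b in s for c in alphabet]
    return set(deletes + transposes + replaces + inserts)

def known(words):
    return set(w for w in words if w in NWORDS)

def known_edits2(word):
    # Only distance-2 edits that are known words are kept
    return set(e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in NWORDS)

def correct(word):
    candidates = (known([word]) or known(edits1(word)) or
                  known_edits2(word) or [word])
    return max(candidates, key=NWORDS.get)

print(len(known(edits1("cmpuler"))))  # 0: nothing known at distance 1
print(correct("cmpuler"))             # computer
```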
35. IDE for Python
• IDEs for Python include:
• PyDev for Eclipse
• WingIDE
• IDLE for Windows, Linux and Mac
• and many more
36. Why Python ROCKS
• Elegant and readable language - “Executable
Pseudocode”
• Standard Libraries - “Batteries Included”
• Very high-level built-in data types (lists, dicts, sets)
• Dynamically Typed
• It’s FUN!
37. An Open Source Community
• Projects: Plone, Zope, Grok, BFG, Django,
SciPy & NumPy, Google App Engine,
PyGame
• PyCon
38. Resources
• PyGTA
• Toronto Plone Users
• Toronto Django Users
• Stackoverflow
• Dive into Python
• Python Tutorial
39. Thanks
• I’d love to hear your questions or
comments on this presentation. Reach me
at:
• jbb@scryent.com
• http://twitter.com/hexsprite