A Taste of Python - DevDays Toronto 2009

Presented by Jordan Baker
October 23, 2009
DevDays Toronto

About Me

• Open Source Developer
• Founder of Open Source Web Application
and CMS service provider: Scryent -
www.scryent.com
• Founder of Toronto Plone Users Group -
www.torontoplone.ca

Agenda

• About Python
• Show me your CODE
• A Spell Checker in 21 lines of code
• Why Python ROCKS
• Resources for further exploration

About Python

http://www.flickr.com/photos/schoffer/196079076/

About Python

• Gotta love a language named after Monty
Python’s Flying Circus
• Used in more places than you might know

Significant Whitespace
C-like

if(x == 2) {
    do_something();
}
do_something_else();

Python

if x == 2:
    do_something()
do_something_else()

Significant Whitespace

• less code clutter
• eliminates many common syntax errors
• proper code layout
• use an indentation aware editor or IDE
• Get over it!
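
A small sketch of why the indentation matters, written for Python 3 (the slides use Python 2.6). The function names are made up for illustration. The two functions differ only in the indentation of one line, and that alone decides whether the line belongs to the if block.

```python
def inside_block(x):
    """Both appends are indented under the if, so both run only when x == 2."""
    calls = []
    if x == 2:
        calls.append("do_something")
        calls.append("do_something_else")
    return calls


def outside_block(x):
    """The second append is dedented, so it runs whatever x is."""
    calls = []
    if x == 2:
        calls.append("do_something")
    calls.append("do_something_else")
    return calls


print(inside_block(3))   # []
print(outside_block(3))  # ['do_something_else']
```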

Python is Interactive

Python 2.6.1 (r261:67515, Jul 7 2009, 23:51:51)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for
more information.
>>>

FIZZ BUZZ
1
2
FIZZ
4
BUZZ
...
14
FIZZ BUZZ

FIZZ BUZZ
def fizzbuzz(n):
    # Start at 1 so the output matches the expected listing (0 would print "Fizz Buzz").
    for i in range(1, n + 1):
        if not i % 3:
            print "Fizz",
        if not i % 5:
            print "Buzz",
        if i % 3 and i % 5:
            print i,
        print

fizzbuzz(50)
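
For readers on Python 3, where print is a function, the same idea might look like the sketch below. Returning the lines instead of printing them is a choice made here so the result is easy to check; the helper name fizzbuzz_line is not from the talk.

```python
def fizzbuzz_line(i):
    """Return the Fizz Buzz text for a single number."""
    parts = []
    if i % 3 == 0:
        parts.append("Fizz")
    if i % 5 == 0:
        parts.append("Buzz")
    # Fall back to the number itself when neither rule applies.
    return " ".join(parts) or str(i)


def fizzbuzz(n):
    """Return the lines for 1..n inclusive."""
    return [fizzbuzz_line(i) for i in range(1, n + 1)]


for line in fizzbuzz(15):
    print(line)
```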

FIZZ BUZZ (OO)
class FizzBuzzWriter(object):
    def __init__(self, limit):
        self.limit = limit

    def run(self):
        for n in range(1, self.limit + 1):
            self.write_number(n)

    def write_number(self, n):
        if not n % 3:
            print "Fizz",
        if not n % 5:
            print "Buzz",
        if n % 3 and n % 5:
            print n,
        print

fizzbuzz = FizzBuzzWriter(50)
fizzbuzz.run()

A Spell Checker in 21
Lines of Code
• Written by Peter Norvig
• Duplicated in many languages
• Simple Spellchecking algorithm based on
probability
• http://norvig.com/spell-correct.html

The Approach
• Census by frequency

• Morph the word (werd)

• Insertions: waerd, wberd, werzd

• Deletions: wrd, wed, wer

• Transpositions: ewrd, wred, wedr

• Replacements: aerd, ward, wbrd, word, wzrd,
werz

• Find the one with the highest frequency: were
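
The last step above can be sketched in a few lines of Python 3. The word counts here are invented for illustration; a real model would come from a large corpus.

```python
# Hypothetical word counts, not taken from any real corpus.
counts = {"were": 4000, "word": 900, "ward": 60, "wed": 40}

# A few of the one-edit variants of "werd" listed above.
variants = ["waerd", "wrd", "wed", "ewrd", "ward", "word", "were", "werz"]

# Keep only the variants that are known words, then pick the most frequent.
known_variants = [w for w in variants if w in counts]
best = max(known_variants, key=counts.get)
print(best)  # were
```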

Norvig Spellchecker
import re, collections

def words(text):
    return re.findall('[a-z]+', text.lower())

def train(words):
    model = collections.defaultdict(int)
    for w in words:
        model[w] += 1
    return model

NWORDS = train(words(file('big.txt').read()))

alphabet = 'abcdefghijklmnopqrstuvwxyz'

def edits1(word):
    s = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in s if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in s if len(b)>1]
    replaces = [a + c + b[1:] for a, b in s for c in alphabet if b]
    inserts = [a + c + b for a, b in s for c in alphabet]
    return set(deletes + transposes + replaces + inserts)

def known_edits2(word):
    return set(e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in NWORDS)

def known(words):
    return set(w for w in words if w in NWORDS)

def correct(word):
    candidates = known([word]) or known(edits1(word)) or known_edits2(word) or [word]
    return max(candidates, key=NWORDS.get)

Regular Expressions

def words(text):
    return re.findall('[a-z]+', text.lower())

>>> words("The cat in the hat!")
['the', 'cat', 'in', 'the', 'hat']
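
One consequence of the '[a-z]+' pattern is worth seeing: anything outside a-z acts as a separator. The sketch below (Python 3) repeats the words function and feeds it a second sentence made up for this example.

```python
import re


def words(text):
    """Lowercase the text and return every run of the letters a-z."""
    return re.findall('[a-z]+', text.lower())


print(words("The cat in the hat!"))     # ['the', 'cat', 'in', 'the', 'hat']
# Apostrophes and digits are not in a-z, so they split or drop tokens.
print(words("Don't panic, it's 2009"))  # ['don', 't', 'panic', 'it', 's']
```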

Dictionaries
>>> d = {'cat':1}
>>> d
{'cat': 1}
>>> d['cat']
1

>>> d['cat'] += 1
>>> d
{'cat': 2}

>>> d['dog'] += 1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'dog'
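
With a plain dict, one way around the KeyError is dict.get with a default. This is a Python 3 sketch of that alternative, not something shown in the talk.

```python
d = {'cat': 1}

# dict.get returns the default (0) instead of raising for a missing key.
d['dog'] = d.get('dog', 0) + 1
d['cat'] = d.get('cat', 0) + 1
print(d)  # {'cat': 2, 'dog': 1}

# The += form still fails for a key that was never set.
try:
    d['bird'] += 1
except KeyError as exc:
    print("missing key:", exc)
```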

defaultdict
# Has a factory for missing keys
>>> d = collections.defaultdict(int)
>>> d['dog'] += 1
>>> d
defaultdict(<type 'int'>, {'dog': 1})

>>> int
<type 'int'>
>>> int()
0

def train(words):
    model = collections.defaultdict(int)
    for w in words:
        model[w] += 1
    return model

>>> train(words("The cat in the hat!"))
defaultdict(<type 'int'>, {'cat': 1, 'the': 2, 'hat': 1, 'in': 1})
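
As a point of comparison, collections.Counter (added to the standard library after Python 2.6, so not available to the slides as written) produces the same counts in one call. The sketch below is Python 3. It also shows a difference: looking up a missing key inserts it into a defaultdict but not into a Counter.

```python
import collections
import re


def words(text):
    return re.findall('[a-z]+', text.lower())


def train(features):
    """Count occurrences with defaultdict(int), as on the slide."""
    model = collections.defaultdict(int)
    for f in features:
        model[f] += 1
    return model


sample = words("The cat in the hat!")
model = train(sample)
counter = collections.Counter(sample)

print(dict(model) == dict(counter))          # True
print(model['the'], counter['the'])          # 2 2
# Both report 0 for an unseen word...
print(model['zebra'], counter['zebra'])      # 0 0
# ...but only the defaultdict stores the key as a side effect.
print('zebra' in model, 'zebra' in counter)  # True False
```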

Reading the File
>>> text = file('big.txt').read()
>>> NWORDS = train(words(text))
>>> NWORDS
{'nunnery': 3, 'presnya': 1, 'woods': 22, 'clotted': 1, 'spiders': 1,
'hanging': 42, 'disobeying': 2, 'scold': 3, 'originality': 6,
'grenadiers': 8, 'pigment': 16, 'appropriation': 6, 'strictest': 1,
'bringing': 48, 'revelers': 1, 'wooded': 8, 'wooden': 37,
'wednesday': 13, 'shows': 50, 'immunities': 3, 'guardsmen': 4,
'sooty': 1, 'inevitably': 32, 'clavicular': 9, 'sustaining': 5,
'consenting': 1, 'scraped': 21, 'errors': 16, 'semicircular': 1,
'cooking': 6, 'spiroch': 25, 'designing': 1, 'pawed': 1,
'succumb': 12, 'shocks': 1, 'crouch': 2, 'chins': 1, 'awistocwacy': 1,
'sunbeams': 1, 'perforations': 6, 'china': 43, 'affiliated': 4,
'chunk': 22, 'natured': 34, 'uplifting': 1, 'slaveholders': 2,
'climbed': 13, 'controversy': 33, 'natures': 2, 'climber': 1,
'lency': 2, 'joyousness': 1, 'reproaching': 3, 'insecurity': 1,
'abbreviations': 1, 'definiteness': 1, 'music': 56, 'therefore': 186,
'expeditionary': 3, 'primeval': 1, 'unpack': 1, 'circumstances': 107,
... (about 6500 more lines) ...

>>> NWORDS['the']
80030
>>> NWORDS['unusual']
32
>>> NWORDS['cephalopod']
0
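
The file() builtin used above exists only in Python 2. A Python 3 sketch of the same step might use open() in a with block. The helper name load_counts and the temporary file standing in for big.txt are assumptions made so the example runs anywhere.

```python
import collections
import os
import re
import tempfile


def words(text):
    return re.findall('[a-z]+', text.lower())


def load_counts(path):
    """Read a text file and count its words."""
    # The with block closes the file even if reading fails.
    with open(path, encoding='utf-8') as f:
        return collections.Counter(words(f.read()))


# Stand-in for big.txt: a tiny temporary file with made-up content.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False,
                                 encoding='utf-8') as tmp:
    tmp.write("The cat in the hat. The end.")
    path = tmp.name

NWORDS = load_counts(path)
os.remove(path)
print(NWORDS['the'], NWORDS['cephalopod'])  # 3 0
```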

Training the Probability
Model

def words(text):
    return re.findall('[a-z]+', text.lower())

def train(words):
    model = collections.defaultdict(int)
    for w in words:
        model[w] += 1
    return model


List Comprehensions
# These two are equivalent:

result = []
for v in iter:
    if cond:
        result.append(expr)

[ expr for v in iter if cond ]

# You can nest loops also:

result = []
for v1 in iter1:
    for v2 in iter2:
        if cond:
            result.append(expr)

[ expr for v1 in iter1 for v2 in iter2 if cond ]
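
A concrete instance of both patterns, as a Python 3 sketch. The data (even squares, letter/number pairs) is chosen only for illustration.

```python
# Loop form: squares of the even numbers below 10.
result = []
for v in range(10):
    if v % 2 == 0:
        result.append(v * v)

# Comprehension form of the same loop.
squares = [v * v for v in range(10) if v % 2 == 0]
print(result == squares, squares)  # True [0, 4, 16, 36, 64]

# Nested loops read left to right, outermost first.
pairs = [(a, b) for a in 'ab' for b in range(2)]
print(pairs)  # [('a', 0), ('a', 1), ('b', 0), ('b', 1)]
```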

String Slicing
>>> word = "spam"
>>> word[:1]
's'
>>> word[1:]
'pam'

>>> (word[:1], word[1:])
('s', 'pam')

>>> range(len(word) + 1)
[0, 1, 2, 3, 4]

>>> [(word[:i], word[i:]) for i in range(len(word) + 1)]
[('', 'spam'), ('s', 'pam'), ('sp', 'am'), ('spa', 'm'),
('spam', '')]
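
Two properties of these splits are easy to check in a short Python 3 sketch: every pair rejoins to the original word, and slicing past the end of a string yields an empty string where indexing would raise. The second property is why expressions such as b[1:] are safe on short strings.

```python
word = "spam"
splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]

# Every (a, b) pair concatenates back to the original word.
print(all(a + b == word for a, b in splits))  # True

# Slicing past the end gives '' instead of an error.
print(repr(word[10:]))  # ''

# Indexing past the end does raise.
try:
    word[10]
except IndexError:
    print("IndexError")
```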

Deletions
>>> word = "spam"
>>> s = [(word[:i], word[i:]) for i in range(len(word) + 1)]

>>> deletes = [a + b[1:] for a, b in s if b]

>>> deletes
['pam', 'sam', 'spm', 'spa']

>>> a, b = ('s', 'pam')
>>> a
's'
>>> b
'pam'

>>> bool('pam')
True
>>> bool('')
False

Transpositions

For example: teh => the

>>> transposes = [a + b[1] + b[0] + b[2:] for a, b in s if len(b)>1]

>>> transposes
['psam', 'sapm', 'spma']
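
Applied to the "teh" example, the same comprehension recovers "the". The Python 3 sketch below wraps it in a function named transposes, a name chosen here for illustration.

```python
def transposes(word):
    """Return every string made by swapping two adjacent letters."""
    s = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    # len(b) > 1 ensures there are two letters left to swap.
    return [a + b[1] + b[0] + b[2:] for a, b in s if len(b) > 1]


print(transposes("teh"))   # ['eth', 'the']
print(transposes("spam"))  # ['psam', 'sapm', 'spma']
```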

Replacements

>>> alphabet = "abcdefghijklmnopqrstuvwxyz"

>>> replaces = [a + c + b[1:] for a, b in s for c in alphabet if b]
>>> replaces
['apam', 'bpam', ..., 'zpam', 'saam', ..., 'szam', ..., 'spaz']

Insertion

>>> alphabet = "abcdefghijklmnopqrstuvwxyz"

>>> inserts = [a + c + b for a, b in s for c in alphabet]
>>> inserts
['aspam', ..., 'zspam', 'sapam', ..., 'szpam', 'spaam', ..., 'spamz']

Find all Edits

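def edits1(word):
    s = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in s if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in s if len(b)>1]
    replaces = [a + c + b[1:] for a, b in s for c in alphabet if b]
    inserts = [a + c + b for a, b in s for c in alphabet]
    return set(deletes + transposes + replaces + inserts)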

>>> edits1("spam")
set(['sptm', 'skam', 'spzam', 'vspam', 'spamj', 'zpam', 'sbam',
'spham', 'snam', 'sjpam', 'spma', 'swam', 'spaem', 'tspam', 'spmm',
'slpam', 'upam', 'spaim', 'sppm', 'spnam', 'spem', 'sparm', 'spamr',
'lspam', 'sdpam', 'spams', 'spaml', 'spamm', 'spamn', 'spum',
'spamh', 'spami', 'spatm', 'spamk', 'spamd', ..., 'spcam', 'spamy'])
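
To get a feel for how many candidates one edit produces, the Python 3 sketch below counts each kind separately. For a word of length n the four lists hold n, n - 1, 26n and 26(n + 1) strings, or 54n + 25 in total before the set removes duplicates. The helper name edit_lists is an assumption of this example.

```python
alphabet = 'abcdefghijklmnopqrstuvwxyz'


def edit_lists(word):
    """Return the four kinds of single edits as separate lists."""
    s = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in s if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in s if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in s for c in alphabet if b]
    inserts = [a + c + b for a, b in s for c in alphabet]
    return deletes, transposes, replaces, inserts


word = "spam"
n = len(word)
deletes, transposes, replaces, inserts = edit_lists(word)
print(len(deletes), len(transposes), len(replaces), len(inserts))  # 4 3 104 130

total = len(deletes) + len(transposes) + len(replaces) + len(inserts)
print(total == 54 * n + 25)  # True

# The set is smaller: e.g. replacing a letter with itself yields "spam" again.
unique = len(set(deletes + transposes + replaces + inserts))
print(unique < total)  # True
```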

Known Words
def known(words):
    """ Return the known words from `words`. """
    return set(w for w in words if w in NWORDS)

Correct
def known(words):
    """ Return the known words from `words`. """
    return set(w for w in words if w in NWORDS)

def correct(word):
    candidates = known([word]) or known(edits1(word)) or [word]
    return max(candidates, key=NWORDS.get)

>>> bool(set([]))
False

>>> correct("computr")
'computer'

>>> correct("computor")
'computer'

>>> correct("computerr")
'computer'
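
The chain of or operators in correct relies on two things: an empty set is falsy, and or stops at the first truthy operand without evaluating the rest. A Python 3 sketch with made-up values:

```python
exact = set()              # the word itself is unknown
one_edit = {'computer'}    # one known word at edit distance 1
fallback = ['computr']     # last resort: the original word

# or returns the first truthy operand, so the empty set is skipped.
candidates = exact or one_edit or fallback
print(candidates)  # {'computer'}

# If every set is empty, the fallback list is returned.
print(set() or set() or ['zzzz'])  # ['zzzz']

# Later operands are never evaluated once a truthy one is found.
calls = []


def expensive():
    calls.append('expensive')
    return {'x'}


result = {'cheap'} or expensive()
print(result, calls)  # {'cheap'} []
```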

Edit Distance 2
def known_edits2(word):
    return set(
        e2
        for e1 in edits1(word)
        for e2 in edits1(e1)
        if e2 in NWORDS
    )

def correct(word):
    candidates = (known([word]) or known(edits1(word)) or
                  known_edits2(word) or [word])
    return max(candidates, key=NWORDS.get)

>>> correct("conpuler")
'computer'
>>> correct("cmpuler")
'computer'
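
Putting the pieces together, here is a Python 3 sketch of the whole corrector that runs without big.txt. The one-sentence corpus is invented for this example, and Counter stands in for the defaultdict-based train function, so the results only illustrate the mechanism.

```python
import collections
import re

# Made-up stand-in for big.txt; "computer" appears twice, "compiler" once.
CORPUS = ("the computer and the compiler were computing "
          "while the computer slept")

alphabet = 'abcdefghijklmnopqrstuvwxyz'


def words(text):
    return re.findall('[a-z]+', text.lower())


NWORDS = collections.Counter(words(CORPUS))


def edits1(word):
    s = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in s if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in s if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in s for c in alphabet if b]
    inserts = [a + c + b for a, b in s for c in alphabet]
    return set(deletes + transposes + replaces + inserts)


def known_edits2(word):
    """Known words that are two single edits away from word."""
    return set(e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in NWORDS)


def known(candidates):
    return set(w for w in candidates if w in NWORDS)


def correct(word):
    # Prefer the word itself, then distance 1, then distance 2, then give up.
    candidates = (known([word]) or known(edits1(word))
                  or known_edits2(word) or [word])
    return max(candidates, key=NWORDS.get)


print(correct("computr"))   # computer
print(correct("conpuler"))  # computer
print(correct("zzzzzz"))    # zzzzzz
```

With this corpus both "computer" and "compiler" are two edits from "conpuler"; the higher count is what makes "computer" win.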



Comparing Python &
Java Versions

• http://raelcunha.com/spell-correct.php
• 35 lines of Java

import java.io.*;
import java.util.*;
import java.util.regex.*;

class Spelling {

    private final HashMap<String, Integer> nWords = new HashMap<String, Integer>();

    public Spelling(String file) throws IOException {
        BufferedReader in = new BufferedReader(new FileReader(file));
        Pattern p = Pattern.compile("\\w+");
        for(String temp = ""; temp != null; temp = in.readLine()){
            Matcher m = p.matcher(temp.toLowerCase());
            while(m.find()) nWords.put((temp = m.group()), nWords.containsKey(temp) ? nWords.get(temp) + 1 : 1);
        }
        in.close();
    }

    private final ArrayList<String> edits(String word) {
        ArrayList<String> result = new ArrayList<String>();
        for(int i=0; i < word.length(); ++i) result.add(word.substring(0, i) + word.substring(i+1));
        for(int i=0; i < word.length()-1; ++i) result.add(word.substring(0, i) + word.substring(i+1, i+2) + word.substring(i, i+1) + word.substring(i+2));
        for(int i=0; i < word.length(); ++i) for(char c='a'; c <= 'z'; ++c) result.add(word.substring(0, i) + String.valueOf(c) + word.substring(i+1));
        for(int i=0; i <= word.length(); ++i) for(char c='a'; c <= 'z'; ++c) result.add(word.substring(0, i) + String.valueOf(c) + word.substring(i));
        return result;
    }

    public final String correct(String word) {
        if(nWords.containsKey(word)) return word;
        ArrayList<String> list = edits(word);
        HashMap<Integer, String> candidates = new HashMap<Integer, String>();
        for(String s : list) if(nWords.containsKey(s)) candidates.put(nWords.get(s),s);
        if(candidates.size() > 0) return candidates.get(Collections.max(candidates.keySet()));
        for(String s : list) for(String w : edits(s)) if(nWords.containsKey(w)) candidates.put(nWords.get(w),w);
        return candidates.size() > 0 ? candidates.get(Collections.max(candidates.keySet())) : word;
    }

    public static void main(String args[]) throws IOException {
        if(args.length > 0) System.out.println((new Spelling("big.txt")).correct(args[0]));
    }

}

IDEs for Python

• IDEs for Python include:
• PyDev for Eclipse
• WingIDE
• IDLE for Windows/Linux/Mac
• and more

Why Python ROCKS
• Elegant and readable language - “Executable
Pseudocode”
• Standard Libraries - “Batteries Included”
• Very High-Level Datatypes
• Dynamically Typed
• It’s FUN!

An Open Source
Community

• Projects: Plone, Zope, Grok, BFG, Django,
SciPy & NumPy, Google App Engine,
PyGame
• PyCon

Resources
• PyGTA
• Toronto Plone Users
• Toronto Django Users
• Stack Overflow
• Dive into Python
• Python Tutorial

Thanks

• I’d love to hear your questions or
comments on this presentation. Reach me
at:
• jbb@scryent.com
• http://twitter.com/hexsprite
