A Taste of Python

Presented by Jordan Baker
October 23, 2009
DevDays Toronto
About Me

• Open Source Developer
• Founder of Open Source Web Application
  and CMS service provider: Scryent -
  www.scryent.com
• Founder of Toronto Plone Users Group -
  www.torontoplone.ca
Agenda

• About Python
• Show me your CODE
• A Spell Checker in 21 lines of code
• Why Python ROCKS
• Resources for further exploration
About Python

Image: http://www.flickr.com/photos/schoffer/196079076/

• Gotta love a language named after Monty
  Python’s Flying Circus
• Used in more places than you might know
Significant Whitespace
C-like

if(x == 2) {
    do_something();
}
do_something_else();

Python

if x == 2:
    do_something()
do_something_else()
Significant Whitespace

• less code clutter
• eliminates many common syntax errors
• proper code layout
• use an indentation aware editor or IDE
• Get over it!
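
A minimal sketch of what significant whitespace means in practice, written for Python 3 (the slides use Python 2.6). The name `calls_made` and the recorded strings are illustrative and not from the talk. The first part shows that dedenting a line moves it out of the `if` block; the second shows that a missing indent is rejected before any code runs.

```python
def calls_made(x):
    """Mirror the slide's if layout, recording which calls would run."""
    calls = []
    if x == 2:
        calls.append("do_something")       # indented: part of the if block
    calls.append("do_something_else")      # dedented: runs regardless of x
    return calls

print(calls_made(2))
print(calls_made(3))

# A block body that is not indented is rejected at compile time.
bad_source = "if True:\nprint('never runs')\n"
try:
    compile(bad_source, "<example>", "exec")
    rejected = False
except IndentationError:
    rejected = True
print(rejected)
```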
Python is Interactive

Python 2.6.1 (r261:67515, Jul 7 2009, 23:51:51)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for
more information.
>>>
FIZZ BUZZ
1
2
FIZZ
4
BUZZ
...
14
FIZZ BUZZ
FIZZ BUZZ
def fizzbuzz(n):
    for i in range(1, n + 1):
        if not i % 3:
            print "Fizz",
        if not i % 5:
            print "Buzz",
        if i % 3 and i % 5:
            print i,
        print

fizzbuzz(50)
FIZZ BUZZ (OO)
class FizzBuzzWriter(object):
    def __init__(self, limit):
        self.limit = limit

    def run(self):
        for n in range(1, self.limit + 1):
            self.write_number(n)

    def write_number(self, n):
        if not n % 3:
            print "Fizz",
        if not n % 5:
            print "Buzz",
        if n % 3 and n % 5:
            print n,
        print

fizzbuzz = FizzBuzzWriter(50)
fizzbuzz.run()
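
One possible variant, sketched for Python 3, builds each line as a string instead of printing piece by piece, which makes the logic easy to check. The names `fizzbuzz_line` and `fizzbuzz_lines` are made up for this example and do not appear in the talk.

```python
def fizzbuzz_line(n):
    """Return the Fizz Buzz text for a single number."""
    parts = []
    if n % 3 == 0:
        parts.append("Fizz")
    if n % 5 == 0:
        parts.append("Buzz")
    # An empty join is falsy, so plain numbers fall through to str(n).
    return " ".join(parts) or str(n)

def fizzbuzz_lines(limit):
    """Return the lines for 1..limit inclusive."""
    return [fizzbuzz_line(n) for n in range(1, limit + 1)]

for line in fizzbuzz_lines(15):
    print(line)
```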
A Spell Checker in 21 Lines of Code
• Written by Peter Norvig
• Ported to many languages
• Simple spell-checking algorithm based on
  probability
• http://norvig.com/spell-correct.html
The Approach
• Census by frequency: count how often each word appears in a large body of text
• Morph the misspelled word (werd)
    • Insertions: waerd, wberd, werzd
    • Deletions: wrd, wed, wer
    • Transpositions: ewrd, wred, wedr
    • Replacements: aerd, ward, wbrd, word, wzrd, werz
• Pick the known candidate with the highest frequency: were
Norvig Spellchecker
import re, collections

def words(text):
    return re.findall('[a-z]+', text.lower())

def train(words):
    model = collections.defaultdict(int)
    for w in words:
        model[w] += 1
    return model

NWORDS = train(words(file('big.txt').read()))

alphabet = 'abcdefghijklmnopqrstuvwxyz'

def edits1(word):
    s = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes    = [a + b[1:] for a, b in s if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in s if len(b)>1]
    replaces   = [a + c + b[1:] for a, b in s for c in alphabet if b]
    inserts    = [a + c + b     for a, b in s for c in alphabet]
    return set(deletes + transposes + replaces + inserts)

def known_edits2(word):
    return set(e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in NWORDS)

def known(words):
    return set(w for w in words if w in NWORDS)

def correct(word):
    candidates = known([word]) or known(edits1(word)) or known_edits2(word) or [word]
    return max(candidates, key=NWORDS.get)
Regular Expressions

def words(text):
    return re.findall('[a-z]+', text.lower())

>>> words("The cat in the hat!")
['the', 'cat', 'in', 'the', 'hat']
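
A small Python 3 sketch of what the `[a-z]+` pattern does with input that is not plain letters. The sample sentence is made up for illustration. Because only runs of lowercase letters match, apostrophes split words and digits are dropped.

```python
import re

def words(text):
    # Same definition as on the slide: runs of a-z after lowercasing.
    return re.findall('[a-z]+', text.lower())

sample = words("Don't panic, it's 2009!")
print(sample)
```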
Dictionaries
>>> d = {'cat':1}
>>> d
{'cat': 1}
>>> d['cat']
1

>>> d['cat'] += 1
>>> d
{'cat': 2}

>>> d['dog'] += 1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'dog' 
defaultdict
# Has a factory for missing keys
>>> d = collections.defaultdict(int)
>>> d['dog'] += 1
>>> d
defaultdict(<type 'int'>, {'dog': 1})

>>> int
<type 'int'>
>>> int()
0

def train(words):
    model = collections.defaultdict(int)
    for w in words:
        model[w] += 1
    return model

>>> train(words("The cat in the hat!"))
defaultdict(<type 'int'>, {'cat': 1, 'the': 2, 'hat': 1, 'in': 1})
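
As an aside, `collections.Counter` (available from Python 2.7 and 3.1, so newer than the 2.6 interpreter shown in the talk) does the same tally in one call. The sketch below, written for Python 3, compares it with the hand-written `train`; the variable names `tokens`, `by_hand` and `by_counter` are illustrative.

```python
import collections
import re

def words(text):
    return re.findall('[a-z]+', text.lower())

def train(word_list):
    # Same counting loop as on the slide.
    model = collections.defaultdict(int)
    for w in word_list:
        model[w] += 1
    return model

tokens = words("The cat in the hat!")
by_hand = train(tokens)
by_counter = collections.Counter(tokens)

print(dict(by_hand) == dict(by_counter))   # same counts either way
print(by_counter.most_common(1))           # the most frequent word
```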
Reading the File
     >>> text = file('big.txt').read()
     >>> NWORDS = train(words(text))
     >>> NWORDS
     {'nunnery': 3, 'presnya': 1, 'woods': 22, 'clotted': 1, 'spiders': 1,
     'hanging': 42, 'disobeying': 2, 'scold': 3, 'originality': 6,
     'grenadiers': 8, 'pigment': 16, 'appropriation': 6, 'strictest': 1,
     'bringing': 48, 'revelers': 1, 'wooded': 8, 'wooden': 37,
     'wednesday': 13, 'shows': 50, 'immunities': 3, 'guardsmen': 4,
     'sooty': 1, 'inevitably': 32, 'clavicular': 9, 'sustaining': 5,
     'consenting': 1, 'scraped': 21, 'errors': 16, 'semicircular': 1,
     'cooking': 6, 'spiroch': 25, 'designing': 1, 'pawed': 1,
     'succumb': 12, 'shocks': 1, 'crouch': 2, 'chins': 1, 'awistocwacy': 1,
     'sunbeams': 1, 'perforations': 6, 'china': 43, 'affiliated': 4,
     'chunk': 22, 'natured': 34, 'uplifting': 1, 'slaveholders': 2,
     'climbed': 13, 'controversy': 33, 'natures': 2, 'climber': 1,
     'lency': 2, 'joyousness': 1, 'reproaching': 3, 'insecurity': 1,
     'abbreviations': 1, 'definiteness': 1, 'music': 56, 'therefore': 186,
     'expeditionary': 3, 'primeval': 1, 'unpack': 1, 'circumstances': 107,
     ... (about 6500 more lines) ...

     >>> NWORDS['the']
     80030
     >>> NWORDS['unusual']
     32
     >>> NWORDS['cephalopod']
     0
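
The 0 for 'cephalopod' comes from the `int` factory, and it has a side effect worth knowing about: looking up a missing key with `[]` stores that key in the `defaultdict`. That matters here because the spell checker later tests membership with `w in NWORDS`. A minimal Python 3 sketch with a toy model (the keys are illustrative):

```python
import collections

model = collections.defaultdict(int)
model["the"] += 1

before = "cephalopod" in model      # not seen during training
count = model["cephalopod"]         # [] lookup calls int() and stores the key
after = "cephalopod" in model       # now the key exists

# .get() reads without inserting, so membership tests stay honest.
safe_count = model.get("squid", 0)
squid_added = "squid" in model

print(before, count, after, safe_count, squid_added)
```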
Training the Probability Model
import re, collections

def words(text):
    return re.findall('[a-z]+', text.lower())

def train(words):
    model = collections.defaultdict(int)
    for w in words:
        model[w] += 1
    return model

NWORDS = train(words(file('big.txt').read()))
List Comprehensions
# These two are equivalent:

result = []
for v in iter:
    if cond:
        result.append(expr)


[ expr for v in iter if cond ]


# You can nest loops also:

result = []
for v1 in iter1:
    for v2 in iter2:
        if cond:
            result.append(expr)


[ expr for v1 in iter1 for v2 in iter2 if cond ]
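
A concrete version of the two patterns above, sketched for Python 3 with made-up data: each comprehension is built next to its loop equivalent so the results can be compared.

```python
numbers = range(6)

# Single loop with a condition.
loop_squares = []
for n in numbers:
    if n % 2 == 0:
        loop_squares.append(n * n)
comp_squares = [n * n for n in numbers if n % 2 == 0]

# Nested loops: the for clauses read left to right, outermost first.
loop_pairs = []
for a in "ab":
    for c in "xyz":
        if c != "y":
            loop_pairs.append(a + c)
comp_pairs = [a + c for a in "ab" for c in "xyz" if c != "y"]

print(comp_squares, comp_pairs)
```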


 
String Slicing
>>> word = "spam"
>>> word[:1]
's'
>>> word[1:]
'pam'

>>> (word[:1], word[1:])
('s', 'pam')

>>> range(len(word) + 1)
[0, 1, 2, 3, 4]

>>> [(word[:i], word[i:]) for i in range(len(word) + 1)]
[('', 'spam'), ('s', 'pam'), ('sp', 'am'), ('spa', 'm'),
('spam', '')]
Deletions
>>> word = "spam"
>>> s = [(word[:i], word[i:]) for i in range(len(word) + 1)]

>>> deletes = [a + b[1:] for a, b in s if b]

>>> deletes
['pam', 'sam', 'spm', 'spa']

>>> a, b = ('s', 'pam')
>>> a
's'
>>> b
'pam'

>>> bool('pam')
True
>>> bool('')
False
Transpositions

For example: teh => the

>>> transposes = [a + b[1] + b[0] + b[2:] for a, b in s if len(b)>1]

>>> transposes
['psam', 'sapm', 'spma']
Replacements

>>> alphabet = "abcdefghijklmnopqrstuvwxyz"

>>> replaces = [a + c + b[1:]  for a, b in s for c in alphabet if b]
>>> replaces
['apam', 'bpam', ..., 'zpam', 'saam', ..., 'szam', ..., 'spaz']
Insertions

>>> alphabet = "abcdefghijklmnopqrstuvwxyz"

>>> inserts = [a + c + b  for a, b in s for c in alphabet]
>>> inserts
['aspam', ..., 'zspam', 'sapam', ..., 'szpam', 'spaam', ..., 'spamz']
Find all Edits
alphabet = 'abcdefghijklmnopqrstuvwxyz'

def edits1(word):
    s = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in s if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in s if len(b)>1]
    replaces = [a + c + b[1:] for a, b in s for c in alphabet if b]
    inserts = [a + c + b  for a, b in s for c in alphabet]
    return set(deletes + transposes + replaces + inserts)

>>> edits1("spam")
set(['sptm', 'skam', 'spzam', 'vspam', 'spamj', 'zpam', 'sbam',
'spham', 'snam', 'sjpam', 'spma', 'swam', 'spaem', 'tspam', 'spmm',
'slpam', 'upam', 'spaim', 'sppm', 'spnam', 'spem', 'sparm', 'spamr',
'lspam', 'sdpam', 'spams', 'spaml', 'spamm', 'spamn', 'spum',
'spamh', 'spami', 'spatm', 'spamk', 'spamd', ..., 'spcam', 'spamy'])
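
To get a feel for how many candidates `edits1` produces, the Python 3 sketch below keeps the four lists separate and counts them. For a word of length n the list comprehensions yield n deletions, n-1 transpositions, 26n replacements and 26(n+1) insertions, or 54n+25 strings in total; the set is smaller because some strings repeat. The helper name `edit_lists` is an assumption made for this example.

```python
alphabet = 'abcdefghijklmnopqrstuvwxyz'

def edit_lists(word):
    """Same comprehensions as edits1, returned per edit type."""
    s = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    return {
        "deletes": [a + b[1:] for a, b in s if b],
        "transposes": [a + b[1] + b[0] + b[2:] for a, b in s if len(b) > 1],
        "replaces": [a + c + b[1:] for a, b in s for c in alphabet if b],
        "inserts": [a + c + b for a, b in s for c in alphabet],
    }

lists = edit_lists("spam")
n = len("spam")
sizes = {kind: len(items) for kind, items in lists.items()}
total = sum(sizes.values())
distinct = len(set().union(*lists.values()))

print(sizes)
print(total == 54 * n + 25)   # raw count before duplicates are removed
print(distinct < total)       # the set drops repeats such as 'spam' itself
```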
Known Words
def known(words):
    """ Return the known words from `words`. """
    return set(w for w in words if w in NWORDS)
Correct
def known(words):
    """ Return the known words from `words`. """
    return set(w for w in words if w in NWORDS)

def correct(word):
    candidates = known([word]) or known(edits1(word)) or [word]
    return max(candidates, key=NWORDS.get)

>>> bool(set([]))
False

>>> correct("computr")
'computer'

>>> correct("computor")
'computer'

>>> correct("computerr")
'computer'
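
Two idioms carry `correct`: `or` returns its first truthy operand (an empty set is falsy), and `max` with `key=` picks the candidate with the highest count. A minimal Python 3 sketch, using a tiny made-up frequency table named `counts` in place of NWORDS and hand-picked candidate lists in place of real `edits1` output:

```python
# Hypothetical counts, for illustration only.
counts = {"the": 5, "then": 2, "ten": 1}

def known(candidates):
    return set(w for w in candidates if w in counts)

exact = known(["tehn"])                  # unknown word: empty set, falsy
near = known(["then", "ten", "tehm"])    # pretend these are its edits
candidates = exact or near or ["tehn"]   # first non-empty option wins
best = max(candidates, key=counts.get)   # highest count among candidates

# With nothing known at all, the chain falls back to the word itself.
fallback = set() or set() or ["qwzx"]

print(candidates, best, fallback)
```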
Edit Distance 2
def known_edits2(word):
    return set(
        e2
            for e1 in edits1(word)
                for e2 in edits1(e1)
                    if e2 in NWORDS
        )

def correct(word):
    candidates = (known([word]) or known(edits1(word)) or
                  known_edits2(word) or [word])
    return max(candidates, key=NWORDS.get)

>>> correct("conpuler")
'computer'
>>> correct("cmpuler")
'computer'
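
For readers on Python 3, where `file()` no longer exists, here is a rough port of the whole checker. To stay self-contained it trains on a one-sentence stand-in corpus (`CORPUS`, an assumption for this sketch) rather than big.txt, so it only knows a handful of words; with the real file, `open('big.txt').read()` would take the place of `CORPUS`.

```python
import collections
import re

# Stand-in corpus (an assumption); the talk trains on big.txt instead.
CORPUS = "the computer is on the desk and the computer works"

def words(text):
    return re.findall('[a-z]+', text.lower())

def train(word_list):
    model = collections.defaultdict(int)
    for w in word_list:
        model[w] += 1
    return model

NWORDS = train(words(CORPUS))
alphabet = 'abcdefghijklmnopqrstuvwxyz'

def edits1(word):
    s = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in s if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in s if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in s for c in alphabet if b]
    inserts = [a + c + b for a, b in s for c in alphabet]
    return set(deletes + transposes + replaces + inserts)

def known_edits2(word):
    return set(e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in NWORDS)

def known(candidates):
    return set(w for w in candidates if w in NWORDS)

def correct(word):
    # Prefer the word itself, then distance 1, then distance 2.
    candidates = (known([word]) or known(edits1(word))
                  or known_edits2(word) or [word])
    return max(candidates, key=NWORDS.get)

print(correct("computr"))    # one edit away from a corpus word
print(correct("conpuler"))   # two edits away
print(correct("xyzzy"))      # nothing close: returned unchanged
```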
Comparing Python & Java Versions

• http://raelcunha.com/spell-correct.php
• 35 lines of Java
import java.io.*;
import java.util.*;
import java.util.regex.*;


class Spelling {

    private final HashMap<String, Integer> nWords = new HashMap<String, Integer>();

    public Spelling(String file) throws IOException {
        BufferedReader in = new BufferedReader(new FileReader(file));
        Pattern p = Pattern.compile("\\w+");
        for(String temp = ""; temp != null; temp = in.readLine()){
            Matcher m = p.matcher(temp.toLowerCase());
            while(m.find()) nWords.put((temp = m.group()), nWords.containsKey(temp) ? nWords.get(temp) + 1 : 1);
        }
        in.close();
    }

    private final ArrayList<String> edits(String word) {
        ArrayList<String> result = new ArrayList<String>();
        for(int i=0; i < word.length(); ++i) result.add(word.substring(0, i) + word.substring(i+1));
        for(int i=0; i < word.length()-1; ++i) result.add(word.substring(0, i) + word.substring(i+1, i+2) +
            word.substring(i, i+1) + word.substring(i+2));
        for(int i=0; i < word.length(); ++i) for(char c='a'; c <= 'z'; ++c) result.add(word.substring(0, i) +
            String.valueOf(c) + word.substring(i+1));
        for(int i=0; i <= word.length(); ++i) for(char c='a'; c <= 'z'; ++c) result.add(word.substring(0, i) +
            String.valueOf(c) + word.substring(i));
        return result;
    }

    public final String correct(String word) {
        if(nWords.containsKey(word)) return word;
        ArrayList<String> list = edits(word);
        HashMap<Integer, String> candidates = new HashMap<Integer, String>();
        for(String s : list) if(nWords.containsKey(s)) candidates.put(nWords.get(s),s);
        if(candidates.size() > 0) return candidates.get(Collections.max(candidates.keySet()));
        for(String s : list) for(String w : edits(s)) if(nWords.containsKey(w)) candidates.put(nWords.get(w),w);
        return candidates.size() > 0 ? candidates.get(Collections.max(candidates.keySet())) : word;
    }

    public static void main(String args[]) throws IOException {
        if(args.length > 0) System.out.println((new Spelling("big.txt")).correct(args[0]));
    }

}
IDE for Python

• IDEs for Python include:
 • PyDev for Eclipse
 • WingIDE
 • IDLE for Windows/Linux/Mac
 • and more
Why Python ROCKS
• Elegant and readable language - “Executable
  Pseudocode”
• Standard Libraries - “Batteries Included”
• Very High level Datatypes
• Dynamically Typed
• It’s FUN!
An Open Source Community

• Projects: Plone, Zope, Grok, BFG, Django,
  SciPy & NumPy, Google App Engine,
  PyGame
• PyCon
Resources
• PyGTA
• Toronto Plone Users
• Toronto Django Users
• Stack Overflow
• Dive into Python
• Python Tutorial
Thanks

• I’d love to hear your questions or
  comments on this presentation. Reach me
  at:
  • jbb@scryent.com
  • http://twitter.com/hexsprite

How to Troubleshoot Apps for the Modern Connected Worker
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 

A Taste of Python - DevDays Toronto 2009

  • 1. a taste of Presented by Jordan Baker October 23, 2009 DevDays Toronto
  • 2. About Me • Open Source Developer • Founder of Open Source Web Application and CMS service provider: Scryent - www.scryent.com • Founder of Toronto Plone Users Group - www.torontoplone.ca
  • 3. Agenda • About Python • Show me your CODE • A Spell Checker in 21 lines of code • Why Python ROCKS • Resources for further exploration
  • 5. About Python • Gotta love a language named after Monty Python’s Flying Circus • Used in more places than you might know
  • 6. Significant Whitespace C-like if(x == 2) { do_something(); } do_something_else(); Python if x == 2: do_something() do_something_else()
  • 7. Significant Whitespace • less code clutter • eliminates many common syntax errors • proper code layout • use an indentation aware editor or IDE • Get over it!
  • 8. Python is Interactive Python 2.6.1 (r261:67515, Jul 7 2009, 23:51:51) [GCC 4.2.1 (Apple Inc. build 5646)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>>
  • 10. FIZZ BUZZ def fizzbuzz(n):     for i in range(n + 1):         if not i % 3:             print "Fizz",         if not i % 5:             print "Buzz",         if i % 3 and i % 5:             print i,         print fizzbuzz(50)
  • 11. FIZZ BUZZ def fizzbuzz(n):     for i in range(n + 1):         if not i % 3:             print "Fizz",         if not i % 5:             print "Buzz",         if i % 3 and i % 5:             print i,         print fizzbuzz(50)
  • 12. FIZZ BUZZ (OO) class FizzBuzzWriter(object):     def __init__(self, limit):         self.limit = limit             def run(self):         for n in range(1, self.limit + 1):             self.write_number(n)         def write_number(self, n):         if not n % 3:             print "Fizz",         if not n % 5:             print "Buzz",         if n % 3 and n % 5:             print n,         print         fizzbuzz = FizzBuzzWriter(50) fizzbuzz.run()
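The FizzBuzz slides use the Python 2 `print` statement, which does not run under Python 3. A minimal Python 3 sketch of the same logic follows. The name `fizzbuzz_lines` is hypothetical, and returning a list instead of printing is a choice made here so the result is easy to check.

```python
def fizzbuzz_lines(limit):
    """Return the FizzBuzz lines for 1..limit as a list of strings."""
    lines = []
    for n in range(1, limit + 1):
        parts = []
        if n % 3 == 0:
            parts.append("Fizz")
        if n % 5 == 0:
            parts.append("Buzz")
        # An empty join is falsy, so plain numbers fall through to str(n).
        lines.append(" ".join(parts) or str(n))
    return lines


if __name__ == "__main__":
    print("\n".join(fizzbuzz_lines(15)))
```

Starting the range at 1, as the OO slide does, avoids printing "Fizz Buzz" for 0.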
  • 13. A Spell Checker in 21 Lines of Code • Written by Peter Norvig • Duplicated in many languages • Simple Spellchecking algorithm based on probability • http://norvig.com/spell-correct.html
  • 14. The Approach • Census by frequency • Morph the word (werd) • Insertions: waerd, wberd, werzd • Deletions: wrd, wed, wer • Transpositions: ewrd, wred, wedr • Replacements: aerd, ward, wbrd, word, wzrd, werz • Find the one with the highest frequency: were
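The last step of the approach, picking the candidate with the highest frequency, can be sketched with `max` and a key function. The counts below are made-up toy values, not taken from any real corpus; each candidate is one edit away from "werd".

```python
# Toy frequency table (hypothetical numbers, for illustration only).
counts = {"were": 400, "word": 300, "ward": 20, "wed": 5}

# Known words reachable from "werd" with a single edit:
# ward/word/were by replacement, wed by deleting the "r".
candidates = ["ward", "word", "were", "wed"]

# max() with key=counts.get returns the candidate with the largest count.
best = max(candidates, key=counts.get)
print(best)
```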
  • 15. Norvig Spellchecker import re, collections def words(text):    return re.findall('[a-z]+', text.lower()) def train(words):    model = collections.defaultdict(int)     for w in words:        model[w] += 1     return model NWORDS = train(words(file('big.txt').read())) alphabet = 'abcdefghijklmnopqrstuvwxyz' def edits1(word):    s = [(word[:i], word[i:]) for i in range(len(word) + 1)]    deletes    = [a + b[1:] for a, b in s if b]    transposes = [a + b[1] + b[0] + b[2:] for a, b in s if len(b)>1]    replaces   = [a + c + b[1:] for a, b in s for c in alphabet if b]    inserts    = [a + c + b     for a, b in s for c in alphabet]    return set(deletes + transposes + replaces + inserts) def known_edits2(word):    return set(e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in NWORDS) def known(words):    return set(w for w in words if w in NWORDS) def correct(word):    candidates = known([word]) or known(edits1(word)) or known_edits2(word) or [word]    return max(candidates, key=NWORDS.get)
  • 16. Regular Expressions def words(text): return re.findall('[a-z]+', text.lower()) >>> words("The cat in the hat!") ['the', 'cat', 'in', 'the', 'hat']
  • 17. Dictionaries >>> d = {'cat':1} >>> d {'cat': 1} >>> d['cat'] 1 >>> d['cat'] += 1 >>> d {'cat': 2} >>> d['dog'] += 1 Traceback (most recent call last):  File "<stdin>", line 1, in <module> KeyError: 'dog' 
  • 18. defaultdict # Has a factory for missing keys >>> d = collections.defaultdict(int) >>> d['dog'] += 1 >>> d {'dog': 1} >>> int <type 'int'> >>> int() 0 def train(words):    model = collections.defaultdict(int)    for w in words:        model[w] += 1    return model >>> train(words("The cat in the hat!")) {'cat': 1, 'the': 2, 'hat': 1, 'in': 1}              
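As a possible alternative to the `defaultdict(int)` counting loop, the standard library's `collections.Counter` does the same census in one call. This is a Python 3 sketch, not the slide's code; `train` is renamed to take `word_list` to avoid shadowing the `words` function. In Python 3 the interactive repr of a `defaultdict` also includes the factory, so the output differs from the simplified dictionaries shown on the slide.

```python
import collections
import re


def words(text):
    """Lowercase the text and split it into runs of a-z."""
    return re.findall(r"[a-z]+", text.lower())


def train(word_list):
    """Count occurrences with defaultdict(int), as on the slide."""
    model = collections.defaultdict(int)
    for w in word_list:
        model[w] += 1
    return model


sample = "The cat in the hat!"
model = train(words(sample))
counter = collections.Counter(words(sample))

print(dict(model) == dict(counter))
print(counter.most_common(1))
```

One difference worth knowing: looking up a missing key in a `defaultdict` inserts it, while a `Counter` returns 0 without storing the key.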
  • 19. Reading the File    >>> text = file('big.txt').read()    >>> NWORDS = train(words(text))    >>> NWORDS    {'nunnery': 3, 'presnya': 1, 'woods': 22, 'clotted': 1, 'spiders': 1,    'hanging': 42, 'disobeying': 2, 'scold': 3, 'originality': 6,    'grenadiers': 8, 'pigment': 16, 'appropriation': 6, 'strictest': 1,    'bringing': 48, 'revelers': 1, 'wooded': 8, 'wooden': 37,    'wednesday': 13, 'shows': 50, 'immunities': 3, 'guardsmen': 4,    'sooty': 1, 'inevitably': 32, 'clavicular': 9, 'sustaining': 5,    'consenting': 1, 'scraped': 21, 'errors': 16, 'semicircular': 1,    'cooking': 6, 'spiroch': 25, 'designing': 1, 'pawed': 1,    'succumb': 12, 'shocks': 1, 'crouch': 2, 'chins': 1, 'awistocwacy': 1,    'sunbeams': 1, 'perforations': 6, 'china': 43, 'affiliated': 4,    'chunk': 22, 'natured': 34, 'uplifting': 1, 'slaveholders': 2,    'climbed': 13, 'controversy': 33, 'natures': 2, 'climber': 1,    'lency': 2, 'joyousness': 1, 'reproaching': 3, 'insecurity': 1,    'abbreviations': 1, 'definiteness': 1, 'music': 56, 'therefore': 186,    'expeditionary': 3, 'primeval': 1, 'unpack': 1, 'circumstances': 107,    ... (about 6500 more lines) ...    >>> NWORDS['the']    80030    >>> NWORDS['unusual']    32    >>> NWORDS['cephalopod']    0
  • 20. Training the Probability Model import re, collections def words(text):    return re.findall('[a-z]+', text.lower()) def train(words):    model = collections.defaultdict(int)    for w in words:        model[w] += 1    return model NWORDS = train(words(file('big.txt').read()))
  • 21. List Comprehensions # These two are equivalent: result = [] for v in iter:    if cond:        result.append(expr) [ expr for v in iter if cond ] # You can nest loops also: result = [] for v1 in iter1:    for v2 in iter2:        if cond:            result.append(expr) [ expr for v1 in iter1 for v2 in iter2 if cond ]
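A concrete instance of the two equivalences on the list comprehension slide, written for Python 3. The variable names and the sample data are illustrative only.

```python
# Single loop with a condition: squares of the even numbers below 10.
evens_squared_loop = []
for v in range(10):
    if v % 2 == 0:
        evens_squared_loop.append(v * v)

evens_squared = [v * v for v in range(10) if v % 2 == 0]

# Nested loops: the leftmost "for" is the outer loop.
pairs_loop = []
for letter in "ab":
    for number in (1, 2):
        pairs_loop.append((letter, number))

pairs = [(letter, number) for letter in "ab" for number in (1, 2)]

print(evens_squared == evens_squared_loop, pairs == pairs_loop)
```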
  • 22. String Slicing >>> word = "spam" >>> word[:1] 's' >>> word[1:] 'pam' >>> (word[:1], word[1:]) ('s', 'pam') >>> range(len(word) + 1) [0, 1, 2, 3, 4] >>> [(word[:i], word[i:]) for i in range(len(word) + 1)] [('', 'spam'), ('s', 'pam'), ('sp', 'am'), ('spa', 'm'), ('spam', '')]
  • 23. Deletions >>> word = "spam" >>> s = [(word[:i], word[i:]) for i in range(len(word) + 1)] >>> deletes = [a + b[1:] for a, b in s if b] >>> deletes ['pam', 'sam', 'spm', 'spa'] >>> a, b = ('s', 'pam') >>> a 's' >>> b 'pam' >>> bool('pam') True >>> bool('') False
  • 24. Transpositions For example: teh => the >>> transposes = [a + b[1] + b[0] + b[2:] for a, b in s if len(b)>1] >>> transposes ['psam', 'sapm', 'spma']
  • 25. Replacements >>> alphabet = "abcdefghijklmnopqrstuvwxyz" >>> replaces = [a + c + b[1:]  for a, b in s for c in alphabet if b] >>> replaces ['apam', 'bpam', ..., 'zpam', 'saam', ..., 'szam', ..., 'spaz']
  • 26. Insertion >>> alphabet = "abcdefghijklmnopqrstuvwxyz" >>> inserts = [a + c + b  for a, b in s for c in alphabet] >>> inserts ['aspam', ..., 'zspam', 'sapam', ..., 'szpam', 'spaam', ..., 'spamz']
  • 27. Find all Edits alphabet = 'abcdefghijklmnopqrstuvwxyz' def edits1(word):    s = [(word[:i], word[i:]) for i in range(len(word) + 1)]    deletes = [a + b[1:] for a, b in s if b]    transposes = [a + b[1] + b[0] + b[2:] for a, b in s if len(b)>1]    replaces = [a + c + b[1:] for a, b in s for c in alphabet if b]    inserts = [a + c + b  for a, b in s for c in alphabet]    return set(deletes + transposes + replaces + inserts) >>> edits1("spam") set(['sptm', 'skam', 'spzam', 'vspam', 'spamj', 'zpam', 'sbam', 'spham', 'snam', 'sjpam', 'spma', 'swam', 'spaem', 'tspam', 'spmm', 'slpam', 'upam', 'spaim', 'sppm', 'spnam', 'spem', 'sparm', 'spamr', 'lspam', 'sdpam', 'spams', 'spaml', 'spamm', 'spamn', 'spum', 'spamh', 'spami', 'spatm', 'spamk', 'spamd', ..., 'spcam', 'spamy'])
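To get a feel for how many candidates `edits1` generates, the sketch below counts each class of edit separately. For a word of length n the four comprehensions produce n deletions, n - 1 transpositions, 26n replacements, and 26(n + 1) insertions, so 54n + 25 strings before the `set` removes duplicates. The helper name `edit_counts` is hypothetical and the code targets Python 3.

```python
import string


def edit_counts(word):
    """Count the strings produced by each edit class for `word`."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits for c in letters if b]
    inserts = [a + c + b for a, b in splits for c in letters]
    everything = deletes + transposes + replaces + inserts
    return {
        "deletes": len(deletes),
        "transposes": len(transposes),
        "replaces": len(replaces),
        "inserts": len(inserts),
        "total": len(everything),
        "unique": len(set(everything)),
    }


if __name__ == "__main__":
    print(edit_counts("spam"))
```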
  • 28. Known Words def known(words):        """ Return the known words from `words`. """        return set(w for w in words if w in NWORDS)
  • 29. Correct def known(words):    """ Return the known words from `words`. """    return set(w for w in words if w in NWORDS) def correct(word):    candidates = known([word]) or known(edits1(word)) or [word]    return max(candidates, key=NWORDS.get) >>> bool(set([])) False >>> correct("computr") 'computer' >>> correct("computor") 'computer' >>> correct("computerr") 'computer'
  • 30. Edit Distance 2 def known_edits2(word):    return set(e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in NWORDS) def correct(word):    candidates = known([word]) or known(edits1(word)) or known_edits2(word) or [word]    return max(candidates, key=NWORDS.get) >>> correct("conpuler") 'computer' >>> correct("cmpuler") 'computer'
  • 31. import re, collections def words(text):    return re.findall('[a-z]+', text.lower()) def train(words):    model = collections.defaultdict(int)     for w in words:        model[w] += 1     return model NWORDS = train(words(file('big.txt').read())) alphabet = 'abcdefghijklmnopqrstuvwxyz' def edits1(word):    s = [(word[:i], word[i:]) for i in range(len(word) + 1)]    deletes    = [a + b[1:] for a, b in s if b]    transposes = [a + b[1] + b[0] + b[2:] for a, b in s if len(b)>1]    replaces   = [a + c + b[1:] for a, b in s for c in alphabet if b]    inserts    = [a + c + b     for a, b in s for c in alphabet]    return set(deletes + transposes + replaces + inserts) def known_edits2(word):    return set(e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in NWORDS) def known(words):    return set(w for w in words if w in NWORDS) def correct(word):    candidates = known([word]) or known(edits1(word)) or known_edits2(word) or [word]    return max(candidates, key=NWORDS.get)
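The listing above is Python 2: `file()` no longer exists in Python 3, where `open()` is used instead. Below is a rough Python 3 port, offered as a sketch rather than the presenter's code. Two assumptions are made so it runs without external files: the one-sentence `CORPUS` string stands in for `big.txt`, and `collections.Counter` replaces the `train` function.

```python
import collections
import re

# Hypothetical stand-in for big.txt; a real model needs a large corpus.
CORPUS = "the computer ran the program and the computer printed the word"
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def words(text):
    return re.findall(r"[a-z]+", text.lower())


NWORDS = collections.Counter(words(CORPUS))


def edits1(word):
    """All strings one edit away from `word`."""
    s = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in s if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in s if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in s for c in ALPHABET if b]
    inserts = [a + c + b for a, b in s for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def known(candidates):
    return {w for w in candidates if w in NWORDS}


def known_edits2(word):
    """Known words two edits away from `word`."""
    return {e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in NWORDS}


def correct(word):
    # Prefer the word itself, then distance 1, then distance 2.
    candidates = (known([word]) or known(edits1(word))
                  or known_edits2(word) or [word])
    return max(candidates, key=NWORDS.get)


if __name__ == "__main__":
    for typo in ("computr", "conpuler", "zzzzzz"):
        print(typo, "->", correct(typo))
```

With such a small corpus only words from the sample sentence can be suggested; unknown words with no nearby match are returned unchanged.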
  • 32. Comparing Python & Java Versions • http://raelcunha.com/spell-correct.php • 35 lines of Java
  • 33. import java.io.*; import java.util.*; import java.util.regex.*; class Spelling { private final HashMap<String, Integer> nWords = new HashMap<String, Integer>(); public Spelling(String file) throws IOException { BufferedReader in = new BufferedReader(new FileReader(file)); Pattern p = Pattern.compile("\\w+"); for(String temp = ""; temp != null; temp = in.readLine()){ Matcher m = p.matcher(temp.toLowerCase()); while(m.find()) nWords.put((temp = m.group()), nWords.containsKey(temp) ? nWords.get(temp) + 1 : 1); } in.close(); } private final ArrayList<String> edits(String word) { ArrayList<String> result = new ArrayList<String>(); for(int i=0; i < word.length(); ++i) result.add(word.substring(0, i) + word.substring(i+1)); for(int i=0; i < word.length()-1; ++i) result.add(word.substring(0, i) + word.substring(i+1, i+2) + word.substring(i, i+1) + word.substring(i+2)); for(int i=0; i < word.length(); ++i) for(char c='a'; c <= 'z'; ++c) result.add(word.substring(0, i) + String.valueOf(c) + word.substring(i+1)); for(int i=0; i <= word.length(); ++i) for(char c='a'; c <= 'z'; ++c) result.add(word.substring(0, i) + String.valueOf(c) + word.substring(i)); return result; } public final String correct(String word) { if(nWords.containsKey(word)) return word; ArrayList<String> list = edits(word); HashMap<Integer, String> candidates = new HashMap<Integer, String>(); for(String s : list) if(nWords.containsKey(s)) candidates.put(nWords.get(s),s); if(candidates.size() > 0) return candidates.get(Collections.max(candidates.keySet())); for(String s : list) for(String w : edits(s)) if(nWords.containsKey(w)) candidates.put(nWords.get(w),w); return candidates.size() > 0 ? candidates.get(Collections.max(candidates.keySet())) : word; } public static void main(String args[]) throws IOException { if(args.length > 0) System.out.println((new Spelling("big.txt")).correct(args[0])); } }
  • 34. import re, collections def words(text):    return re.findall('[a-z]+', text.lower()) def train(words):    model = collections.defaultdict(int)     for w in words:        model[w] += 1     return model NWORDS = train(words(file('big.txt').read())) alphabet = 'abcdefghijklmnopqrstuvwxyz' def edits1(word):    s = [(word[:i], word[i:]) for i in range(len(word) + 1)]    deletes    = [a + b[1:] for a, b in s if b]    transposes = [a + b[1] + b[0] + b[2:] for a, b in s if len(b)>1]    replaces   = [a + c + b[1:] for a, b in s for c in alphabet if b]    inserts    = [a + c + b     for a, b in s for c in alphabet]    return set(deletes + transposes + replaces + inserts) def known_edits2(word):    return set(e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in NWORDS) def known(words):    return set(w for w in words if w in NWORDS) def correct(word):    candidates = known([word]) or known(edits1(word)) or known_edits2(word) or [word]    return max(candidates, key=NWORDS.get)
  • 35. IDEs for Python • IDEs for Python include: • PyDev for Eclipse • WingIDE • IDLE for Windows/Linux/Mac • and more
  • 36. Why Python ROCKS • Elegant and readable language - “Executable Pseudocode” • Standard Libraries - “Batteries Included” • Very High level Datatypes • Dynamically Typed • It’s FUN!
  • 37. An Open Source Community • Projects: Plone, Zope, Grok, BFG, Django, SciPy & NumPy, Google App Engine, PyGame • PyCon
  • 38. Resources • PyGTA • Toronto Plone Users • Toronto Django Users • Stack Overflow • Dive into Python • Python Tutorial
  • 39. Thanks • I’d love to hear your questions or comments on this presentation. Reach me at: • jbb@scryent.com • http://twitter.com/hexsprite