Python tutorial

Introduction to Python
Chen Lin
clin@brandeis.edu
COSI 134a
Volen 110
Office Hour: Thurs. 3-5

For More Information?
http://python.org/
- documentation, tutorials, beginners guide, core distribution, ...
Books include:
- Learning Python by Mark Lutz
- Python Essential Reference by David Beazley
- Python Cookbook, ed. by Martelli, Ravenscroft and Ascher (online at http://code.activestate.com/recipes/langs/python/)
- http://wiki.python.org/moin/PythonBooks

Python Videos
http://showmedo.com/videotutorials/python
- "5 Minute Overview (What Does Python Look Like?)"
- "Introducing the PyDev IDE for Eclipse"
- "Linear Algebra with Numpy"
- And many more

4 Major Versions of Python
"Python" or "CPython" is written in C
- Version 2.7 came out in mid-2010
- Version 3.1.2 came out in early 2010
"Jython" is written in Java for the JVM
"IronPython" is written in C# for the .NET environment

Development Environments
What IDE to use? http://stackoverflow.com/questions/81584
1. PyDev with Eclipse
2. Komodo
3. Emacs
4. Vim
5. TextMate
6. Gedit
7. Idle
8. PIDA (Linux)(VIM Based)
9. NotePad++ (Windows)
10. BlueFish (Linux)

PyDev with Eclipse

Python Interactive Shell
% python
Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
You can type things directly into a running Python session
>>> 2+3*4
14
>>> name = "Andrew"
>>> name
'Andrew'
>>> print "Hello", name
Hello Andrew
>>>

- Background
- Data Types/Structure
- Control flow
- File I/O
- Modules
- Class
- NLTK

List
A compound data type:
[0]
[2.3, 4.5]
[5, "Hello", "there", 9.8]
[]
Use len() to get the length of a list
>>> names = ["Ben", "Chen", "Yaqin"]
>>> len(names)
3

Use [ ] to index items in the list
>>> names[0]
'Ben'
>>> names[1]
'Chen'
>>> names[2]
'Yaqin'
>>> names[3]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range
>>> names[-1]
'Yaqin'
>>> names[-2]
'Chen'
>>> names[-3]
'Ben'
[0] is the first item.
[1] is the second item.
...
Out-of-range values raise an exception.
Negative values go backwards from the last element.
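
The indexing rules above can be sketched as a short script. This is a minimal illustration written so that it runs under Python 3 as well as Python 2; the fallback value None for a missing item is an assumption made for the example, not something the slides prescribe.

```python
# Sample data mirroring the list used on the slides.
names = ["Ben", "Chen", "Yaqin"]

first = names[0]    # index 0 is the first item
last = names[-1]    # negative indices count back from the end

# An index past either end raises IndexError; catch it when a fallback makes sense.
try:
    missing = names[3]
except IndexError:
    missing = None  # assumed fallback, chosen only for this sketch
```

Catching IndexError is usually reserved for cases where an out-of-range index is expected; otherwise letting the exception propagate makes the bug visible.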

Strings share many features with lists
>>> smiles = "C(=N)(N)N.C(=O)(O)O"
>>> smiles[0]
'C'
>>> smiles[1]
'('
>>> smiles[-1]
'O'
>>> smiles[1:5]
'(=N)'
>>> smiles[10:-4]
'C(=O)'
Use "slice" notation to get a substring.

String Methods: find, split
>>> smiles = "C(=N)(N)N.C(=O)(O)O"
>>> smiles.find("(O)")
15
>>> smiles.find(".")
9
>>> smiles.find(".", 10)
-1
>>> smiles.split(".")
['C(=N)(N)N', 'C(=O)(O)O']
>>>
Use "find" to find the start of a substring.
smiles.find(".", 10) starts looking at position 10.
find returns -1 if it couldn't find a match.
Use "split" to split the string into parts with "." as the delimiter.
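
As a rough sketch of how find and split relate, the snippet below cuts the same SMILES string at the first "." by hand and then with split. The variable names left, right and parts are made up for this example.

```python
smiles = "C(=N)(N)N.C(=O)(O)O"

dot = smiles.find(".")      # index of the first ".", or -1 if there is none
if dot != -1:
    # Slice around the delimiter; the "." itself is dropped.
    left, right = smiles[:dot], smiles[dot + 1:]

parts = smiles.split(".")   # same pieces, without manual index arithmetic
```

Checking for -1 before slicing matters: a slice such as smiles[:-1] is still valid Python, so a missed check would silently return the wrong substring.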

String operators: in, not in
if "Br" in "Brother":
    print "contains brother"
email_address = "clin"
if "@" not in email_address:
    email_address += "@brandeis.edu"

String Method: "strip", "rstrip", "lstrip" are ways to remove whitespace or selected characters
>>> line = " # This is a comment line \n"
>>> line.strip()
'# This is a comment line'
>>> line.rstrip()
' # This is a comment line'
>>> line.rstrip("\n")
' # This is a comment line '
>>>
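
The difference between the three calls is easier to see side by side. The sketch below is an illustration only; the "##title##" string is an invented example of stripping selected characters rather than whitespace.

```python
line = "  # This is a comment line \n"

both = line.strip()               # whitespace removed from both ends
right_only = line.rstrip()        # trailing whitespace, including "\n", removed
newline_only = line.rstrip("\n")  # only trailing newlines removed
hashes = "##title##".strip("#")   # an argument names the characters to strip
```

Note that the argument is treated as a set of characters, not as a prefix or suffix string.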

More String methods
email.startswith("c") and email.endswith("u") return True/False
>>> "%s@brandeis.edu" % "clin"
'clin@brandeis.edu'
>>> ", ".join(names)
'Ben, Chen, Yaqin'
>>> "chen".upper()
'CHEN'

Unexpected things about strings
>>> s = "andrew"
>>> s[0] = "A"
TypeError: 'str' object does not support item assignment
>>> s = "A" + s[1:]
>>> s
'Andrew'
Strings are read-only (immutable).

"\" is for special characters
\n -> newline
\t -> tab
\\ -> backslash
...
But Windows uses backslash for directories!
filename = "M:\nickel_project\reactive.smi" # DANGER!
filename = "M:\\nickel_project\\reactive.smi" # Better!
filename = "M:/nickel_project/reactive.smi" # Usually works
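
A small sketch of why the first form is dangerous. Raw strings (the r"..." prefix) are not mentioned on the slide; they are added here as one more common way to write Windows paths.

```python
bad = "M:\nickel_project\reactive.smi"        # "\n" and "\r" become control characters
escaped = "M:\\nickel_project\\reactive.smi"  # doubled backslashes are literal backslashes
raw = r"M:\nickel_project\reactive.smi"       # raw string: backslashes kept as typed

# The broken path silently contains a newline and a carriage return.
has_control_chars = "\n" in bad and "\r" in bad
```

The escaped and raw spellings produce the same string, so the choice between them is mostly one of readability.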

Lists are mutable - some useful methods
>>> ids = ["9pti", "2plv", "1crn"]
>>> ids.append("1alm")
>>> ids
['9pti', '2plv', '1crn', '1alm']
>>> ids.extend(L)
Extend the list by appending all the items in the given list L; equivalent to ids[len(ids):] = L.
>>> del ids[0]
>>> ids
['2plv', '1crn', '1alm']
>>> ids.sort()
>>> ids
['1alm', '1crn', '2plv']
>>> ids.reverse()
>>> ids
['2plv', '1crn', '1alm']
>>> ids.insert(0, "9pti")
>>> ids
['9pti', '2plv', '1crn', '1alm']
append: append an element
del: remove an element
sort: sort by default order
reverse: reverse the elements in a list
insert: insert an element at some specified position (slower than .append())
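
One detail worth seeing in code is that these methods change the list in place and return None. The sketch below assumes two extra identifiers ("4hhb", "1ubq") purely as sample data, and brings in the built-in sorted() for contrast, which the slide does not cover.

```python
ids = ["9pti", "2plv", "1crn"]

ids.append("1alm")            # one item added at the end
ids.extend(["4hhb", "1ubq"])  # every item of another list added at the end

result = ids.sort()           # sorts in place and returns None
in_order = sorted(ids, reverse=True)  # sorted() builds a new list instead
```

A common slip is writing ids = ids.sort(), which replaces the list with None.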

Tuples: sort of an immutable list
>>> yellow = (255, 255, 0) # r, g, b
>>> one = (1,)
>>> yellow[0]
255
>>> yellow[1:]
(255, 0)
>>> yellow[0] = 0
TypeError: 'tuple' object does not support item assignment
Very common in string interpolation:
>>> "%s lives in %s at latitude %.1f" % ("Andrew", "Sweden", 57.7056)
'Andrew lives in Sweden at latitude 57.7'

Zipping lists together
>>> names
['ben', 'chen', 'yaqin']
>>> gender = [0, 0, 1]
>>> zip(names, gender)
[('ben', 0), ('chen', 0), ('yaqin', 1)]
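
The session above shows Python 2, where zip returns a list. As a hedged sketch for readers on Python 3, where zip returns a lazy iterator, wrapping the call in list() or dict() gives the same result on both versions.

```python
names = ["ben", "chen", "yaqin"]
gender = [0, 0, 1]

pairs = list(zip(names, gender))   # list() is needed on Python 3, harmless on Python 2
lookup = dict(zip(names, gender))  # paired lists convert directly to a dictionary
```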

Dictionaries
- Dictionaries are lookup tables.
- They map from a "key" to a "value".
symbol_to_name = {
    "H": "hydrogen",
    "He": "helium",
    "Li": "lithium",
    "C": "carbon",
    "O": "oxygen",
    "N": "nitrogen"
}
- Duplicate keys are not allowed
- Duplicate values are just fine

Keys can be any immutable value
numbers, strings, tuples, frozenset,
not list, dictionary, set, ...
atomic_number_to_name = {
    1: "hydrogen",
    6: "carbon",
    7: "nitrogen",
    8: "oxygen",
}
nobel_prize_winners = {
    (1979, "physics"): ["Glashow", "Salam", "Weinberg"],
    (1962, "chemistry"): ["Hodgkin"],
    (1984, "biology"): ["McClintock"],
}
A set is an unordered collection with no duplicate elements.
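
A minimal sketch of what "immutable keys only" means in practice. The dictionary name winners and the flag list_key_ok are invented for this example.

```python
winners = {}
winners[(1979, "physics")] = ["Glashow", "Salam", "Weinberg"]  # tuple key: fine

try:
    winners[[1979, "physics"]] = []   # list key: lists are not hashable
    list_key_ok = True
except TypeError:
    list_key_ok = False

unique = set(["C", "N", "C", "O", "N"])  # a set drops the duplicates
```

The values, unlike the keys, can be any object, which is why the lists of names are accepted without complaint.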

Dictionary
>>> symbol_to_name["C"]
'carbon'
>>> "O" in symbol_to_name, "U" in symbol_to_name
(True, False)
>>> "oxygen" in symbol_to_name
False
>>> symbol_to_name["P"]
KeyError: 'P'
>>> symbol_to_name.get("P", "unknown")
'unknown'
>>> symbol_to_name.get("C", "unknown")
'carbon'
[] gets the value for a given key.
"in" tests if the key exists ("in" only checks the keys, not the values).
[] lookup failures raise an exception.
Use ".get()" if you want to return a default value.
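
The two lookup styles can be compared in a short script. The helper name_for is hypothetical and exists only to show .get() with a default; the dictionary is a shortened copy of the one on the slides.

```python
symbol_to_name = {"H": "hydrogen", "C": "carbon", "O": "oxygen"}

def name_for(symbol):
    """Return the element name, or 'unknown' when the symbol is missing."""
    return symbol_to_name.get(symbol, "unknown")

# Square-bracket lookup raises KeyError for a missing key.
try:
    symbol_to_name["P"]
    raised = False
except KeyError:
    raised = True
```

Which style fits depends on whether a missing key is an error in the program or an expected case.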

Some useful dictionary methods
>>> symbol_to_name.keys()
['C', 'H', 'O', 'N', 'Li', 'He']
>>> symbol_to_name.values()
['carbon', 'hydrogen', 'oxygen', 'nitrogen', 'lithium', 'helium']
>>> symbol_to_name.update( {"P": "phosphorus", "S": "sulfur"} )
>>> symbol_to_name.items()
[('C', 'carbon'), ('H', 'hydrogen'), ('O', 'oxygen'), ('N', 'nitrogen'), ('P', 'phosphorus'), ('S', 'sulfur'), ('Li', 'lithium'), ('He', 'helium')]
>>> del symbol_to_name['C']
>>> symbol_to_name
{'H': 'hydrogen', 'O': 'oxygen', 'N': 'nitrogen', 'P': 'phosphorus', 'S': 'sulfur', 'Li': 'lithium', 'He': 'helium'}

- Background
- Data Types/Structure: list, string, tuple, dictionary
- Control flow
- File I/O
- Modules
- Class
- NLTK

Control Flow
Things that are False
- The boolean value False
- The numbers 0 (integer), 0.0 (float) and 0j (complex).
- The empty string "".
- The empty list [], empty dictionary {} and empty set set().
Things that are True
- The boolean value True
- All non-zero numbers.
- Any string containing at least one character.
- A non-empty data structure.
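
The two lists above can be checked directly with bool(). This is a small sketch; the function describe is a made-up example of relying on truthiness instead of comparing len() to zero.

```python
falsy = [False, 0, 0.0, 0j, "", [], {}, set()]
truthy = [True, 1, -0.5, " ", [0], {"a": 1}]

all_false = not any(bool(v) for v in falsy)
all_true = all(bool(v) for v in truthy)

def describe(items):
    # "if items" tests emptiness directly; no len() comparison is needed.
    if items:
        return "has %d item(s)" % len(items)
    return "empty"
```

Note that " " (a single space) and [0] count as true: only the empty string and the empty container are false.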

If
>>> smiles = "BrC1=CC=C(C=C1)NN.Cl"
>>> bool(smiles)
True
>>> not bool(smiles)
False
>>> if not smiles:
...     print "The SMILES string is empty"
...
- The "else" case is always optional

Use "elif" to chain subsequent tests
>>> mode = "absolute"
>>> if mode == "canonical":
...     smiles = "canonical"
... elif mode == "isomeric":
...     smiles = "isomeric"
... elif mode == "absolute":
...     smiles = "absolute"
... else:
...     raise TypeError("unknown mode")
...
>>> smiles
'absolute'
>>>
"raise" is the Python way to raise exceptions
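
The same chain reads naturally inside a function, where the caller can catch the exception. This is a sketch under two assumptions: the function name smiles_for is invented, and ValueError is swapped in for the slide's TypeError because the argument has the right type but an unsupported value.

```python
def smiles_for(mode):
    """Hypothetical helper: map a mode name to a label, rejecting anything else."""
    if mode == "canonical":
        return "canonical"
    elif mode == "isomeric":
        return "isomeric"
    elif mode == "absolute":
        return "absolute"
    else:
        raise ValueError("unknown mode: %r" % mode)

# The caller decides how to handle an unsupported mode.
try:
    smiles_for("random")
    error = None
except ValueError as exc:
    error = str(exc)
```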

Boolean logic
Python expressions can have "and"s and "or"s:
if (ben <= 5 and chen >= 10 or
    chen == 500 and ben != 5):
    print "Ben and Chen"

Range Test
if (3 <= Time <= 5):
    print "Office Hour"
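
Two points in these slides are easy to misread: how "and" and "or" group, and what a chained comparison means. The sketch below uses made-up values for ben, chen and time (lowercased from the slide's Time) to make both explicit.

```python
ben, chen = 5, 500

# "and" binds tighter than "or", so these two expressions are equivalent.
implicit = ben <= 5 and chen >= 10 or chen == 500 and ben != 5
explicit = (ben <= 5 and chen >= 10) or (chen == 500 and ben != 5)

time = 4
chained = 3 <= time <= 5                   # chained comparison
spelled_out = (3 <= time) and (time <= 5)  # what the chain means
```

Adding the parentheses is optional, but it saves the reader from having to recall the precedence rule.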

For
>>> for name in names:
...     print name
...
Ben
Chen
Yaqin

Tuple assignment in for loops
data = [ ("C20H20O3", 308.371),
         ("C22H20O2", 316.393),
         ("C24H40N4O2", 416.6),
         ("C14H25N5O3", 311.38),
         ("C15H20O2", 232.3181)]
for (formula, mw) in data:
    print "The molecular weight of %s is %s" % (formula, mw)
The molecular weight of C20H20O3 is 308.371
The molecular weight of C22H20O2 is 316.393
The molecular weight of C24H40N4O2 is 416.6
The molecular weight of C14H25N5O3 is 311.38
The molecular weight of C15H20O2 is 232.3181

Break, continue
>>> for value in [3, 1, 4, 1, 5, 9, 2]:
...     print "Checking", value
...     if value > 8:
...         print "Exiting for loop"
...         break
...     elif value < 3:
...         print "Ignoring"
...         continue
...     print "The square is", value**2
...
Checking 3
The square is 9
Checking 1
Ignoring
Checking 4
The square is 16
Checking 1
Ignoring
Checking 5
The square is 25
Checking 9
Exiting for loop
>>>
Use "break" to stop the for loop.
Use "continue" to stop processing the current item.

Range()
- "range" creates a list of numbers in a specified range
- range([start,] stop[, step]) -> list of integers
- When step is given, it specifies the increment (or decrement).
>>> range(5)
[0, 1, 2, 3, 4]
>>> range(5, 10)
[5, 6, 7, 8, 9]
>>> range(0, 10, 2)
[0, 2, 4, 6, 8]
How to get every second element in a list?
for i in range(0, len(data), 2):
    print data[i]
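
For comparison, the every-second-element question can also be answered with a stepped slice, which the slide does not show. The sketch uses an invented sample list, and wraps range in list() because on Python 3 range is lazy rather than a list.

```python
data = ["a", "b", "c", "d", "e"]

by_index = [data[i] for i in range(0, len(data), 2)]  # the slide's approach
by_slice = data[::2]                                  # slice with a step of 2
first_five = list(range(5))   # list() needed on Python 3, where range is lazy
```

The index-based loop is still the better fit when the position itself is needed inside the loop body.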

Reading files
>>> f = open("names.txt")
>>> f.readline()
'Yaqin\n'

Quick Way
>>> lst = [ x for x in open("text.txt","r").readlines() ]
>>> lst
['Chen Lin\n', 'clin@brandeis.edu\n', 'Volen 110\n', 'Office Hour: Thurs. 3-5\n', '\n', 'Yaqin Yang\n', 'yaqin@brandeis.edu\n', 'Volen 110\n', 'Office Hour: Tues. 3-5\n']
Ignore the header?
for (i,line) in enumerate(open('text.txt',"r").readlines()):
    if i == 0: continue
    print line

Using dictionaries to count occurrences
>>> name_count = {}
>>> for line in open('names.txt'):
...     name = line.strip()
...     name_count[name] = name_count.get(name, 0) + 1
...
>>> for (name, count) in name_count.items():
...     print name, count
...
Chen 3
Ben 3
Yaqin 3
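
The counting idiom can be tried without a file on disk. In this sketch the list lines stands in for the contents of names.txt (its values are invented), and collections.Counter from the standard library is shown as an alternative that the slides do not mention.

```python
from collections import Counter

# Stand-in for the lines of names.txt (hypothetical contents).
lines = ["Chen\n", "Ben\n", "Yaqin\n", "Chen\n", "Ben\n", "Chen\n"]

name_count = {}                      # must exist before the loop starts
for line in lines:
    name = line.strip()
    # .get() supplies 0 the first time a name is seen.
    name_count[name] = name_count.get(name, 0) + 1

counter = Counter(line.strip() for line in lines)  # standard-library shortcut
most_common = counter.most_common(1)
```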

File Output
input_file = open("in.txt")
output_file = open("out.txt", "w")
for line in input_file:
    output_file.write(line)
"w" = "write mode"
"a" = "append mode"
"wb" = "write in binary"
"r" = "read mode" (default)
"rb" = "read in binary"
"U" = "read files with Unix or Windows line endings"
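
The copy loop above never closes its files. One common way to handle that is the "with" statement, sketched here under the assumption of Python 2.7 or 3 (Python 2.6 does not accept two files in one "with"). Temporary files stand in for in.txt and out.txt, and the header-skipping step borrows the enumerate idea from the "Ignore the header?" slide.

```python
import os
import tempfile

# Temporary files stand in for the slide's in.txt and out.txt.
folder = tempfile.mkdtemp()
in_path = os.path.join(folder, "in.txt")
out_path = os.path.join(folder, "out.txt")

with open(in_path, "w") as f:
    f.write("header\nChen\nBen\n")

# "with" closes both files even if an error occurs part-way through.
with open(in_path) as input_file, open(out_path, "w") as output_file:
    for i, line in enumerate(input_file):
        if i == 0:
            continue          # skip the header line
        output_file.write(line)

with open(out_path) as f:
    copied = f.read()
```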

Modules
When a Python program starts it only has access to a few basic functions and classes.
("int", "dict", "len", "sum", "range", ...)
"Modules" contain additional functionality.
Use "import" to tell Python to load a module.
>>> import math
>>> import nltk

import the math module
>>> import math
>>> math.pi
3.1415926535897931
>>> math.cos(0)
1.0
>>> math.cos(math.pi)
-1.0
>>> dir(math)
['__doc__', '__file__', '__name__', '__package__', 'acos', 'acosh',
'asin', 'asinh', 'atan', 'atan2', 'atanh', 'ceil', 'copysign', 'cos',
'cosh', 'degrees', 'e', 'exp', 'fabs', 'factorial', 'floor', 'fmod',
'frexp', 'fsum', 'hypot', 'isinf', 'isnan', 'ldexp', 'log', 'log10',
'log1p', 'modf', 'pi', 'pow', 'radians', 'sin', 'sinh', 'sqrt', 'tan',
'tanh', 'trunc']
>>> help(math)
>>> help(math.cos)

"import" and "from ... import ..."
>>> import math
math.cos
>>> from math import cos, pi
cos
>>> from math import *
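
The import forms differ only in which names end up in the current namespace. A short sketch follows; the comment on "import *" reflects common practice rather than anything stated on the slide.

```python
import math                    # names are reached through the module: math.cos
from math import cos, pi       # selected names are bound directly

same_function = math.cos is cos   # both spellings refer to one function object
value = cos(pi)

# "from math import *" also works, but it makes it hard to see where a name
# came from, so it is usually kept to interactive sessions.
```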

Classes
class ClassName(object):
    <statement-1>
    . . .
    <statement-N>
class MyClass(object):
    """A simple example class"""
    i = 12345
    def f(self):
        return self.i
class DerivedClassName(BaseClassName):
    <statement-1>
    . . .
    <statement-N>
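
To make the derived-class template concrete, here is a minimal sketch that fills it in. MyClass is taken from the slide; the subclass Doubler and its behaviour are invented for illustration.

```python
class MyClass(object):
    """A simple example class"""
    i = 12345                      # class attribute shared by all instances

    def f(self):
        return self.i


class Doubler(MyClass):            # hypothetical subclass, not from the slides
    def f(self):
        # Reuse the base-class method, then extend its result.
        return 2 * MyClass.f(self)


base = MyClass()
derived = Doubler()
```

Calling derived.f() runs the overriding method, which in turn calls the base-class version explicitly.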

http://www.nltk.org/book
NLTK is on berry patch machines!
>>> from nltk.book import *
>>> text1
<Text: Moby Dick by Herman Melville 1851>
>>> text1.name
'Moby Dick by Herman Melville 1851'
>>> text1.concordance("monstrous")
>>> dir(text1)
>>> text1.tokens
>>> text1.index("my")
4647
>>> sent2
['The', 'family', 'of', 'Dashwood', 'had', 'long', 'been', 'settled', 'in',
'Sussex', '.']

Classify Text
>>> def gender_features(word):
...     return {'last_letter': word[-1]}
>>> gender_features('Shrek')
{'last_letter': 'k'}
>>> from nltk.corpus import names
>>> import random
>>> names = ([(name, 'male') for name in names.words('male.txt')] +
... [(name, 'female') for name in names.words('female.txt')])
>>> random.shuffle(names)

Featurize, train, test, predict
>>> featuresets = [(gender_features(n), g) for (n,g) in names]
>>> train_set, test_set = featuresets[500:], featuresets[:500]
>>> classifier = nltk.NaiveBayesClassifier.train(train_set)
>>> print nltk.classify.accuracy(classifier, test_set)
0.726
>>> classifier.classify(gender_features('Neo'))
'male'

from nltk.corpus import reuters
- Reuters Corpus: 10,788 news documents, 1.3 million words.
- The documents have been classified into 90 topics
- Grouped into 2 sets, "training" and "test"
- Categories overlap with each other
http://nltk.googlecode.com/svn/trunk/doc/book/ch02.html

Reuters
>>> from nltk.corpus import reuters
>>> reuters.fileids()
['test/14826', 'test/14828', 'test/14829', 'test/14832', ...]
>>> reuters.categories()
['acq', 'alum', 'barley', 'bop', 'carcass', 'castor-oil', 'cocoa', 'coconut',
'coconut-oil', 'coffee', 'copper', 'copra-cake', 'corn', 'cotton', 'cotton-oil',
'cpi', 'cpu', 'crude', 'dfl', 'dlr', ...]
