2. Homework:
TranslateProtein.py
● Input files are in
/projects/temporary/cees-python-course/Karin
● translationtable.txt - tab separated
● dna31.fsa
● Script should:
● Open the translationtable.txt file and read it into a
dictionary
● Open the dna31.fsa file and read the contents.
● Translate the DNA into protein using the dictionary
● Print the translation in fasta format to the file
TranslateProtein.fsa. Each protein line should be 60
characters long.
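Two of the steps above, translating codon by codon and splitting the result into fixed-width lines, could be sketched as follows. This is only an illustration under assumptions: the tiny codon_table, the helper names translate and wrap, and the decision to ignore a trailing incomplete codon are made up for the example, and the real table comes from translationtable.txt. print is written with parentheses so the sketch runs under both the Python 2 used in these slides and Python 3.

```python
# Toy codon table for illustration only; the homework reads the full
# table from translationtable.txt instead.
codon_table = {"ATG": "M", "GCC": "A", "TGG": "W", "TAA": "*"}

def translate(table, dna):
    """Translate dna three bases at a time using table."""
    protein = ""
    # Step through the string in jumps of 3; the range stops early
    # enough that a trailing incomplete codon is ignored.
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        protein = protein + table[codon]
    return protein

def wrap(text, width=60):
    """Return text as a list of pieces at most width characters long."""
    pieces = []
    for i in range(0, len(text), width):
        pieces.append(text[i:i + width])
    return pieces

protein = translate(codon_table, "ATGGCCTGGTAA")
print(protein)                      # MAW*
print(wrap("ABCDEFGH", width=3))    # ['ABC', 'DEF', 'GH']
```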
3. Modularization
● Programs can get big
● Risk of doing the same thing many times
● Functions and modules encourage
● re-usability
● readability
● maintainability
4. Functions
● Most common way to modularize a
program
● Takes values as parameters, executes
code on them, returns results
● Python also has built-in functions:
● open(filename, mode)
● sum([list of numbers])
● These do something with their parameters
and return the results
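A minimal sketch of calling built-in functions: each one takes a value as its parameter and hands a result back. len is a further built-in that is not listed on the slide, and the variable names are chosen only for the example.

```python
numbers = [1, 2, 3, 4]

# sum() takes a list of numbers as its parameter and returns the total.
total = sum(numbers)
print(total)          # 10

# len() takes a sequence and returns how many items it holds.
count = len("ATGGCC")
print(count)          # 6
```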
5. Functions – how to define
def FunctionName(param1, param2, ...):
    """ Optional Function desc (Docstring) """
    FUNCTION CODE ...
    return DATA
● keyword: def – says this is a function
● functions need names
● parameters are optional, but common
● docstring useful, but not mandatory
● FUNCTION CODE does something
● keyword: return – sends the results back to the caller
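The template above could be filled in like this. The function name gc_count and its task are hypothetical, chosen only to show where the name, parameter, docstring, function code and return value go.

```python
def gc_count(dna):
    """Return the number of G and C bases in dna."""
    count = dna.count("G") + dna.count("C")
    return count

result = gc_count("ATGGCC")
print(result)             # 4

# The docstring is stored on the function and can be looked up later.
print(gc_count.__doc__)   # Return the number of G and C bases in dna.
```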
6. Function example
>>> def hello(name):
...     results = "Hello World to " + name + "!"
...     return results
...
>>> hello()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: hello() takes exactly 1 argument (0 given)
>>> hello("Lex")
'Hello World to Lex!'
>>>
● Task: make script from this – take name
from command line
● Print results to screen
7. Function example script
import sys
def hello(name):
    results = "Hello World to " + name + "!"
    return results
name = sys.argv[1]
functionresult = hello(name)
print functionresult
[karinlag@freebee]% python hello.py
Traceback (most recent call last):
File "hello.py", line 8, in ?
name = sys.argv[1]
IndexError: list index out of range
[karinlag@freebee]% python hello.py Lex
Hello World to Lex!
[karinlag@freebee]%
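The IndexError above appears because sys.argv holds only the script name when no argument is given. One possible way to guard against that is sketched below; the helper name_from_args and the usage message are assumptions for illustration and not part of the original script.

```python
import sys

def hello(name):
    results = "Hello World to " + name + "!"
    return results

def name_from_args(args):
    """Return the first command-line argument, or None if it is missing."""
    # args[0] is the script name, so a name was given only if
    # the list holds at least two items.
    if len(args) < 2:
        return None
    return args[1]

name = name_from_args(sys.argv)
if name is None:
    print("Usage: python hello.py NAME")
else:
    print(hello(name))
```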
8. Returning values
● Returning is not mandatory; if there is no return,
None is returned by default
● Can return more than one value – the results
are returned as a tuple
>>> def test(x, y):
...     a = x*y
...     return x, a
...
>>> test(1,2)
(1, 2)
>>>
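Both points can be seen in a short sketch: a function without return gives back None, and a returned tuple can be unpacked into separate variables. The function names no_return and min_and_max are hypothetical.

```python
def no_return(x):
    doubled = x * 2    # computed, but never returned

def min_and_max(numbers):
    # Two values after return are packed into one tuple.
    return min(numbers), max(numbers)

nothing = no_return(3)
print(nothing)               # None

# A returned tuple can be unpacked into separate variables.
lowest, highest = min_and_max([4, 1, 7])
print(lowest)                # 1
print(highest)               # 7
```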
9. Function scope
● Variables defined inside a function can
only be seen there!
● To access the value of a variable defined
inside a function: return the variable
10. Scope example
>>> def test(x):
...     z = 10
...     print "the value of z is " + str(z)
...     return x*2
...
>>> z = 50
>>> test(3)
the value of z is 10
6
>>> z
50
>>> x
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'x' is not defined
>>>
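The same idea as a script-style sketch: the value computed inside the function reaches the outside only through return. The function count_codons is hypothetical, and try/except is used here only to show the NameError without stopping the script.

```python
def count_codons(dna):
    n_codons = len(dna) // 3    # n_codons exists only inside the function
    return n_codons

# The returned value is stored under a new name outside the function.
total = count_codons("ATGGCCTGG")
print(total)                    # 3

# Outside the function the name n_codons is unknown.
try:
    n_codons
except NameError:
    print("n_codons is not defined here")
```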
11. Parameters
● Functions can take parameters – not
mandatory
● Arguments are matched to parameters in the
order in which they are given
>>> def test(x, y):
...     print x*2
...     print y + str(x)
...
>>> test(2, "y")
4
y2
>>> test("y", 2)
yy
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 3, in test
TypeError: unsupported operand type(s) for +: 'int' and 'str'
>>>
12. Named parameters
● Can use named parameters – then the order does not matter
>>> def test(x, y):
...     print x*2
...     print y + str(x)
...
>>> test(2, "y")
4
y2
>>> test("y", 2)
yy
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 3, in test
TypeError: unsupported operand type(s) for +: 'int' and 'str'
>>> test(y="y", x=2)
4
y2
>>>
13. Default parameters
● Parameters can be given a default value
● With default, parameter does not have to
be specified, default will be used
● Can still name parameter in parameter list
>>> def hello(name = "Everybody"):
...     results = "Hello World to " + name + "!"
...     return results
...
>>> hello("Anna")
'Hello World to Anna!'
>>> hello()
'Hello World to Everybody!'
>>> hello(name = "Annette")
'Hello World to Annette!'
>>>
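Defaults become more useful when a function has several parameters, because any of them can be left out or named. A small sketch, where the function greet and its second parameter greeting are hypothetical:

```python
def greet(name="Everybody", greeting="Hello"):
    return greeting + " " + name + "!"

print(greet())                          # Hello Everybody!
print(greet("Anna"))                    # Hello Anna!

# Naming a parameter lets us skip the ones before it.
print(greet(greeting="Goodbye"))        # Goodbye Everybody!
print(greet("Anna", greeting="Hi"))     # Hi Anna!
```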
14. Exercise
TranslateProteinFunctions.py
● Use script from homework
● Create the following functions:
● get_translation_table(filename)
– return dict with codons and protein codes
● read_dna_string(filename)
– return tuple with (descr, DNA_string)
● translate_protein(dictionary, DNA_string)
– return the protein version of the DNA string
● pretty_print(descr, protein_string, outname)
– write result to outname in fasta format
20. Modules
● A module is a file with functions, constants
and other code in it
● Module name = filename without .py
● Can be used inside another program
● Needs to be import-ed into program
● Lots of builtin modules: sys, os, os.path....
● Can also create your own
21. Using module
● One of two import statements:
1: import modulename
2: from module import function/constant
● If method 1:
● modulename.function(arguments)
● If method 2:
● function(arguments) – module name not
needed
● beware of function name collision
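The two import styles and the name collision risk can be sketched with os.path.basename from the standard library. The path is hypothetical; basename only works on the string and never touches the disk.

```python
# Method 1: import the module, then use modulename.function(arguments).
import os.path
print(os.path.basename("/tmp/dna31.fsa"))    # dna31.fsa

# Method 2: import the function itself; no module name needed.
from os.path import basename
print(basename("/tmp/dna31.fsa"))            # dna31.fsa

# Name collision: defining our own basename replaces the imported one.
def basename(path):
    return "my own basename"

print(basename("/tmp/dna31.fsa"))            # my own basename
# The version reached through the module name is unaffected.
print(os.path.basename("/tmp/dna31.fsa"))    # dna31.fsa
```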
22. Operating system modules – os and os.path
● Modules dealing with files and operating
system interaction
● Commonly used methods:
● os.getcwd() - get working directory
● os.chdir(path) – change working directory
● os.listdir(path) – get a list of all files and
directories in the directory path
● os.mkdir(path) – create directory
● os.path.join(dirname, dirname/filename...) – join
parts into one path
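The functions listed above can be tried together in a short sketch. To avoid touching real data it works in a throwaway directory made with tempfile.mkdtemp, which is not on the slide and is used here only as a convenience; the directory name results and the file contents are made up.

```python
import os
import tempfile

start_dir = os.getcwd()                  # remember where we started
work_dir = tempfile.mkdtemp()            # throwaway directory for the demo

os.chdir(work_dir)                       # change working directory
os.mkdir("results")                      # create a directory inside it

# os.path.join glues the parts together with the right separator.
out_path = os.path.join("results", "TranslateProtein.fsa")
fh = open(out_path, "w")
fh.write(">demo\nMAW\n")
fh.close()

print(os.listdir("."))                   # ['results']
print(os.listdir("results"))             # ['TranslateProtein.fsa']

os.chdir(start_dir)                      # go back to where we started
```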
23. Your own modules
● Three steps:
1. Create file with functions in it. Module
name is same as filename without .py
2. In other script, do
import modulename
3. In other script, use function like this:
modulename.functionname(args)
24. Separating module use and main use
● Files containing python code can be:
● script file
● module file
● Module functions can be used in scripts
● But: modules can also be scripts
● Question is – how do you know if the code
is being executed in the module script or
an external script?
25. Module use / main use
● When a script is being run, within that
script a variable called __name__ will be
set to the string "__main__"
● Can test on this string to see if this script is
being run
● Benefit: can define functions in script that
can be used in module mode later
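A minimal sketch of the test on __name__, assuming the code is saved in a hypothetical file double.py. Run directly, it prints a result; imported from another script, it only makes the function double available.

```python
def double(x):
    """Return x multiplied by two."""
    return x * 2

def main():
    print(double(21))    # 42

# True only when this file is run directly, e.g. python double.py;
# False when another script does: import double
if __name__ == "__main__":
    main()
```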
26. Module mode / main mode
import sys
<code as before>
# When this script is being used, this will always run, no matter what!
translationtable = sys.argv[1]
fastafile = sys.argv[2]
outfile = sys.argv[3]
translation_dict = get_translation_table(translationtable)
description, DNA_string = read_dna_string(fastafile)
protein_string = translate_protein(translation_dict, DNA_string)
pretty_print(description, protein_string, outfile)
27. Module use / main use
# this is a script
import sys
import TranslateProteinFunctions
description, DNA_string = TranslateProteinFunctions.read_dna_string(sys.argv[1])
print description
[karinlag@freebee]% python modtest.py dna31.fsa
Traceback (most recent call last):
File "modtest.py", line 2, in ?
import TranslateProteinFunctions
File "TranslateProteinFunctions.py", line 44, in ?
fastafile = sys.argv[2]
IndexError: list index out of range
[karinlag@freebee]Karin%
28. TranslateProteinFunctions.py with main
import sys
<code as before>
if __name__ == "__main__":
    translationtable = sys.argv[1]
    fastafile = sys.argv[2]
    outfile = sys.argv[3]
    translation_dict = get_translation_table(translationtable)
    description, DNA_string = read_dna_string(fastafile)
    protein_string = translate_protein(translation_dict, DNA_string)
    pretty_print(description, protein_string, outfile)
29. ConcatFasta.py
● Create a script that has the following:
● function get_fastafiles(dirname)
– gets all the files in the directory, checks if they are
fasta files (end in .fsa), returns list of fasta files
– hint: you need os.path to create full relative file
names
● function concat_fastafiles(filelist, outfile)
– takes a list of fasta files, opens and reads each of
them, writes them to outfile
● if __name__ == "__main__":
– do what needs to be done to run script
● Remember imports!
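The hint about os.path can be illustrated without touching the disk. os.listdir returns bare file names, so the directory has to be put in front of each one before the file can be opened. In this sketch the directory name data and the list names are hypothetical stand-ins for what os.listdir might return.

```python
import os.path

# Hypothetical directory name and file names, as os.listdir might return.
dirname = "data"
names = ["dna31.fsa", "notes.txt", "dna32.fsa"]

fasta_paths = []
for name in names:
    # endswith() checks the end of the string.
    if name.endswith(".fsa"):
        # listdir gives bare names; join adds the directory in front.
        fasta_paths.append(os.path.join(dirname, name))

print(fasta_paths)
```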