File handling
     Karin Lagesen

karin.lagesen@bio.uio.no
Homework
●   ATCurve.py
      ●   take an input string from the user
      ●   check if the sequence only contains DNA – if
          not, prompt for new sequence.
      ●   calculate a running average of AT content
          along the sequence. Window size should be
          3, and the step size should be 1. Print one
          value per line.
●   Note: you need to include several runtime
    examples to show that all parts of the code
    work.
ATCurve.py - thinking
●   Take input from user:
     ●   raw_input
●   Check for the presence of !ATCG
     ●   use sets – very easy
●   Calculate AT – window = 3, step = 1
     ●   iterate over string in slices of three
ATCurve.py
# variable valid is used to see if the string is ok or not.
valid = False
while not valid:
    # prompt user for input using raw_input() and store in string,
    # convert all characters into uppercase
    test_string = raw_input("Enter string: ")
    upper_string = test_string.upper()

    # Figure out if anything other than ATGC is present
    dnaset = set("ATGC")
    upper_string_set = set(upper_string)

    if len(upper_string_set - dnaset) > 0:
        print "Non-DNA present in your string, try again"
    else:
        valid = True

if valid:
    # window size 3, step size 1: the last window starts at len - 3
    for i in range(0, len(upper_string) - 3 + 1, 1):
        at_sum = 0.0
        # count(x, start, end) excludes end, so i + 3 covers three bases
        at_sum += upper_string.count("A", i, i + 3)
        at_sum += upper_string.count("T", i, i + 3)
        print at_sum / 3
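The homework asks for runtime examples covering every part of the code. One way to make the windowing easy to check is to put it in a function and call it on short sequences whose answers can be worked out by hand. The sketch below is an illustration, not the course solution: the function name `at_curve` is made up, and the code is written so that it also runs under Python 3 (the slides use Python 2).

```python
def at_curve(sequence, window=3):
    """Return the AT fraction of every window, moving one base at a time."""
    sequence = sequence.upper()
    values = []
    # The last window starts at len(sequence) - window, hence the "+ 1".
    for i in range(0, len(sequence) - window + 1):
        chunk = sequence[i:i + window]
        at_count = chunk.count("A") + chunk.count("T")
        values.append(at_count / float(window))
    return values


# One value per line, as the homework asks.
for value in at_curve("ATGCAT"):
    print(value)
```

A six-base sequence gives four windows; a sequence shorter than the window gives none.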
Homework
●   CodonFrequency.py
     ●   take an input string from the user
     ●   if the sequence only contains DNA
           –   find a start codon in your string
           –   if startcodon is present
                  ●   count the occurrences of each three-mer from start
                      codon and onwards
                  ●   print the results
CodonFrequency.py - thinking
●   First part – same as earlier
●   Find start codon: locate index of AUG
      ●   Note, can simplify and find ATG
●   If start codon is found:
      ●   create dictionary
      ●   for slice of three in input[StartCodon:]:
            –   get codon
            –   if codon is in dict:
                    ●   add to count
            –   if not:
                    ●   create key-value pair in dict
CodonFrequency.py
input = raw_input("Type a piece of DNA here: ")

if len(set(input) - set("ATGC")) > 0:
    print "Not a valid DNA sequence"
else:
    atg = input.find("ATG")
    if atg == -1:
        print "Start codon not found"
    else:
        codondict = {}
        # len(input) - 2 so that a codon ending at the last base is included
        for i in xrange(atg, len(input) - 2, 3):
            codon = input[i:i+3]
            if codon not in codondict:
                codondict[codon] = 1
            else:
                codondict[codon] += 1

        for codon in codondict:
            print codon, codondict[codon]
CodonFrequency.py w/
     stopcodon
input = raw_input("Type a piece of DNA here: ")

if len(set(input) - set("ATGC")) > 0:
    print "Not a valid DNA sequence"
else:
    atg = input.find("ATG")
    if atg == -1:
        print "Start codon not found"
    else:
        codondict = {}
        for i in xrange(atg, len(input) - 2, 3):
            codon = input[i:i+3]
            # the input is DNA, so the stop codons are TAA, TAG and TGA
            if codon in ['TAA', 'TAG', 'TGA']:
                break
            elif codon not in codondict:
                codondict[codon] = 1
            else:
                codondict[codon] += 1

        for codon in codondict:
            print codon, codondict[codon]
Results

[karinlag@freebee]/projects/temporary/cees-python-course/Karin% python CodonFrequency2.py
Type a piece of DNA here: ATGATTATTTAAATG
ATG 1
ATT 2
TAA 1
[karinlag@freebee]/projects/temporary/cees-python-course/Karin% python CodonFrequency2.py
Type a piece of DNA here: ATGATTATTTAAATGT
ATG 2
ATT 2
TAA 1
[karinlag@freebee]/projects/temporary/cees-python-course/Karin%

●   Note: TAA is counted in the runs above, so they were made
    without a working stop codon check; with the check in the
    script, counting stops at the first TAA, TAG or TGA.
Working with files
●   Reading – get info into your program
●   Parsing – processing file contents
●   Writing – get info out of your program
Reading and writing
●   Three-step process
     ●   Open file
           –   create file handle – reference to file
     ●   Read or write to file
     ●   Close file
           –   will be automatically closed on program end, but
               bad form not to close
Opening files
●   Opening modes:
     ●   "r" - read file
     ●   "w" - write file
     ●   "a" - append to end of file
●   fh = open("filename", "mode")
●   fh = filehandle, reference to a file, NOT the
    file itself
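The three modes can be tried out on a scratch file. This is a minimal sketch: the file name `example.txt` and its location in a temporary directory are assumptions made for illustration, and the code is written so that it also runs under Python 3.

```python
import os
import tempfile

# Hypothetical scratch file, created in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "example.txt")

fh = open(path, "w")      # "w" creates the file, or empties an existing one
fh.write("first line\n")
fh.close()

fh = open(path, "a")      # "a" keeps the contents and adds to the end
fh.write("second line\n")
fh.close()

fh = open(path, "r")      # "r" reads; fh is only a handle to the file
contents = fh.read()
fh.close()

print(contents)
```

Opening the same file with "w" a second time would have discarded the first line, which is why the second open uses "a".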
Reading a file
●   Three ways to read
     ●   read([n]) - n = bytes to read, default is all
     ●   readline() - read one line, incl. newline
     ●   readlines() - read file into a list, one element
         per line, including newline
Reading example
●   Log on to freebee, and go to your area
●   do cp ../Karin/fastafile.fsa .
●   open python
       >>> fh = open("fastafile.fsa", "r")
       >>> fh



●   Q: what does the response mean?
Read example
●   Use all three methods to read the file. Print
    the results.
     ●   read
     ●   readlines
     ●   readline
●   Q: what happens after you have read the
    file?
●   Q: What is the difference between the
    three?
Read example
>>> fh = open("fastafile.fsa", "r")
>>> withread = fh.read()
>>> withread
'>This is the description line\nATGCGCTTAGGATCGATAGCGATTTAGA\nTTAGCGGA\n'
>>> withreadlines = fh.readlines()
>>> withreadlines
[]
>>> fh = open("fastafile.fsa", "r")
>>> withreadlines = fh.readlines()
>>> withreadlines
['>This is the description line\n', 'ATGCGCTTAGGATCGATAGCGATTTAGA\n', 'TTAGCGGA\n']
>>> fh = open("fastafile.fsa", "r")
>>> withreadline = fh.readline()
>>> withreadline
'>This is the description line\n'
>>>
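The empty list in the session above comes from the file handle remembering how far it has read: after read() the position is at the end of the file. Reopening the file is one way to start over; seek(0) on the same handle is another. The sketch below assumes a small stand-in for fastafile.fsa written to a temporary directory, and is written so that it also runs under Python 3.

```python
import os
import tempfile

# Hypothetical stand-in for fastafile.fsa, written to a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "fastafile.fsa")
fh = open(path, "w")
fh.write(">This is the description line\nATGCGC\nTTAGCGGA\n")
fh.close()

fh = open(path, "r")
first_read = fh.read()        # reads everything; position is now at the end
second_read = fh.read()       # nothing left, so this is an empty string
fh.seek(0)                    # move back to the start of the file
lines = fh.readlines()        # now the lines are available again
fh.close()

print(repr(second_read))
print(lines)
```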
Parsing
●   Getting information out of a file
●   Commonly used string methods
      ●   split([character]) – default is whitespace
      ●   replace("in string", "put into instead")
      ●   "string character".join(list)
            –   joins all elements in the list with string
                character as a separator
            –   common construction: ''.join(list)
      ●   slicing
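The string methods listed above can be combined on a single line of text. The header line and sequence pieces below are made up for illustration; the code is written so that it also runs under Python 3.

```python
line = ">seq1 Homo sapiens\n"         # made-up fasta-style header line

header = line[1:]                     # slicing drops the leading ">"
header = header.replace("\n", "")     # remove the trailing newline
fields = header.split()               # split on whitespace

description = " ".join(fields[1:])    # rejoin everything after the name
tab_fields = "ATG\tM".split("\t")     # split on a specific character

chunks = ["ATGC", "GCTT", "AGGA"]
sequence = "".join(chunks)            # empty separator glues pieces together

print(fields)
print(description)
print(sequence)
```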
Type conversions
●   Everything that comes on the command
    line or from a file is a string
●   Conversions:
     ●   int(X)
           –   a string with decimals (e.g. "2.5") cannot be converted
           –   floats are truncated toward zero, not rounded
     ●   float(X)
     ●   str(X)
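A few conversions, as a quick sketch of what works and what raises an error. The values are arbitrary, and the code is written so that it also runs under Python 3.

```python
window = int("3")          # "3" -> 3
fraction = float("0.25")   # "0.25" -> 0.25
label = str(window) + " " + str(fraction)

# int() drops the decimals of a float; it does not round.
truncated = int(2.9)

# A string with decimals has to go through float() first.
try:
    int("2.9")
    failed = False
except ValueError:
    failed = True
from_string = int(float("2.9"))

print(label)
```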
Parsing example
●   Continue using fastafile.fsa
●   Print only the description line to screen
●   Print the whole DNA string
    >>> fh = open("fastafile.fsa", "r")
    >>> firstline = fh.readline()
    >>> print firstline[1:-1]
    This is the description line
    >>> sequence = ''
    >>> for line in fh:
    ...     sequence += line.replace("\n", "")
    ...
    >>> print sequence
    ATGCGCTTAGGATCGATAGCGATTTAGA
    >>>
Accepting input from
             command line
●   Need to be able to specify file name on
    command line
●   Command line parameters stored in list
    called sys.argv – program name is 0
●   Usage:
      ●   python pythonscript.py arg1 arg2 arg3....
●   In script:
      ●   at the top of the file, write import sys
      ●   arg1 = sys.argv[1]
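Because sys.argv is an ordinary list of strings, the argument handling can be written as a function and tried with a hand-made list. The function name `parse_arguments` and the usage message are made up for this sketch; in a real script the list passed in would be sys.argv. The code is written so that it also runs under Python 3.

```python
def parse_arguments(argv):
    """Return (filename, windowsize) from an argv-style list."""
    # argv[0] is the script name, so two arguments means three elements.
    if len(argv) != 3:
        raise ValueError("usage: python pythonscript.py filename windowsize")
    filename = argv[1]
    windowsize = int(argv[2])   # arguments always arrive as strings
    return filename, windowsize


# Stand-in for sys.argv after "python pythonscript.py fastafile.fsa 3".
filename, windowsize = parse_arguments(["pythonscript.py", "fastafile.fsa", "3"])
print(filename)
print(windowsize)
```

Checking the length first gives a readable message instead of an IndexError when an argument is missing.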
Batch example
●   Read fastafile.fsa with all three methods
●   Per method, print method, name and
    sequence
●   Remember to close the file at the end!
Batch example
import sys
filename = sys.argv[1]

# using readline() for the description, then looping over the rest
fh = open(filename, "r")
firstline = fh.readline()
name = firstline[1:-1]
sequence = ''
for line in fh:
    sequence += line.replace("\n", "")
fh.close()
print "Readline", name, sequence

# using readlines()
fh = open(filename, "r")
inputlines = fh.readlines()
fh.close()
name = inputlines[0][1:-1]
sequence = ''
for line in inputlines[1:]:
    sequence += line.replace("\n", "")
print "Readlines", name, sequence

# using read()
fh = open(filename, "r")
inputlines = fh.read()
fh.close()
name = inputlines.split("\n")[0][1:-1]
sequence = "".join(inputlines.split("\n")[1:])
print "Read", name, sequence
Classroom exercise
●   Modify ATCurve.py script so that it accepts
    the following input on the command line:
      ●   fasta filename
      ●   window size
●   Let the user input an alternate filename if it
    contains !ATGC
●   Print results to screen
ATCurve2.py
import sys
# Define filename and window size
filename = sys.argv[1]
windowsize = int(sys.argv[2])

# variable valid is used to see if the string is ok or not.
valid = False
while not valid:
    fh = open(filename, "r")
    inputlines = fh.readlines()
    fh.close()
    name = inputlines[0][1:-1]
    sequence = ''
    for line in inputlines[1:]:
        sequence += line.replace("\n", "")
    upper_string = sequence.upper()

    # Figure out if anything other than ATGC is present
    dnaset = set("ATGC")
    upper_string_set = set(upper_string)

    if len(upper_string_set - dnaset) > 0:
        print "Non-DNA present in your file, try again"
        filename = raw_input("Type in filename: ")
    else:
        valid = True

if valid:
    for i in range(0, len(upper_string) - windowsize + 1, 1):
        at_sum = 0.0
        at_sum += upper_string.count("A", i, i + windowsize)
        at_sum += upper_string.count("T", i, i + windowsize)
        print i + 1, at_sum / windowsize
Writing to files
●   Similar procedure as for read
     ●   Open file, mode is "w" or "a"
     ●   fh.write(string)
           –   Note: one single string
           –   No newlines are added
     ●   fh.close()
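Since write() takes one string and adds nothing to it, numbers have to be converted with str() and each line needs its own "\n". The sketch below uses a made-up output file name and made-up values, and is written so that it also runs under Python 3.

```python
import os
import tempfile

# Hypothetical output file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "atcontent.txt")

values = [0.667, 0.333, 1.0]   # made-up numbers, for illustration only

fh = open(path, "w")
for position, value in enumerate(values):
    # write() takes one string, so convert and join the pieces first,
    # and add "\n" by hand because write() does not.
    fh.write(str(position + 1) + " " + str(value) + "\n")
fh.close()

# Read the file back to see exactly what was written.
fh = open(path, "r")
written = fh.readlines()
fh.close()

print(written)
```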
ATCurve3.py
●   Modify previous script so that you have the
    following on the command line
     ●   fasta filename for input file
     ●   window size
     ●   output file
●   Output should be in the format
     ●   number, AT content
     ●   number is the 1-based position of the first
         nucleotide in the window
ATCurve3.py

import sys
# Define filenames and window size
filename = sys.argv[1]
windowsize = int(sys.argv[2])
outputfile = sys.argv[3]

# (read and validate the input file as in ATCurve2.py;
#  this sets valid and upper_string)

if valid:
    fh = open(outputfile, "w")
    for i in range(0, len(upper_string) - windowsize + 1, 1):
        at_sum = 0.0
        at_sum += upper_string.count("A", i, i + windowsize)
        at_sum += upper_string.count("T", i, i + windowsize)
        fh.write(str(i + 1) + " " + str(at_sum / windowsize) + "\n")
    fh.close()
Homework:
            TranslateProtein.py
●   Input files are in
    /projects/temporary/cees-python-course/Karin
      ●   translationtable.txt - tab separated
      ●   dna31.fsa
●   Script should:
      ●   Open the translationtable.txt file and read it into a
          dictionary
      ●   Open the dna31.fsa file and read the contents
      ●   Translate the DNA into protein using the dictionary
      ●   Print the translation in fasta format to the file
          TranslateProtein.fsa. Each protein line should be 60
          characters long.
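Two building blocks that may help with this homework are sketched below: turning tab-separated lines into a dictionary, and cutting a long string into 60-character lines. This is not a full solution. The function names `read_table` and `wrap` are made up, and the sketch assumes each table line holds exactly two tab-separated fields (for example a codon and an amino acid), which may not match the real translationtable.txt. The code is written so that it also runs under Python 3.

```python
def read_table(lines):
    """Build a dictionary from lines of the form 'key<TAB>value'."""
    table = {}
    for line in lines:
        fields = line.replace("\n", "").split("\t")
        if len(fields) == 2:          # skip blank or malformed lines
            table[fields[0]] = fields[1]
    return table


def wrap(text, width=60):
    """Cut text into pieces of at most `width` characters."""
    pieces = []
    for i in range(0, len(text), width):
        pieces.append(text[i:i + width])
    return pieces


# Made-up two-entry table; the real file may be laid out differently.
example_table = read_table(["ATG\tM\n", "TAA\t*\n"])
example_lines = wrap("M" * 130)

print(example_table)
for piece in example_lines:
    print(piece)
```

In a script, the list passed to `read_table` would come from readlines() on the open table file, and each piece returned by `wrap` would be written with a trailing "\n".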
