3. Bioinformatics.be
• Communicating practical matters: where and
when the classes take place
• Making course material available
• Supplementary educational material (FAQ, web links)
• Practical assignments and program code
Advantages
• Use of web technology in assimilating
the course
• Many questions and answers are of interest to
several people; this avoids recurring
questions
• Ongoing discussion (throughout the year) among
students and the professor, but also thesis and
PhD students
4.
5. Practical sessions
• Practical session arrangements
– 45-minute introduction to the editor,
programming language and websites used
– 15-minute explanation of the assignments
– Normally in PC room D (check bioinformatics.be!)
Perl for Bioinformatics
Part 1: Beginning
Part 2: Mastering
6. Bioinformatics practical
• Practical
– Introduction to Perl
– Write your first Perl program!
– Execute your first.pl
7. What is Perl ?
• Perl is a High-level Scripting language
• Larry Wall created Perl in 1987
– Practical Extraction and Report Language
– (or Pathologically Eclectic Rubbish Lister)
• Born from a system administration tool
• Faster than sh or csh
• Slower than C
• No need for sed, awk, tr, wc, cut, …
• Perl is open and free
• http://conferences.oreillynet.com/eurooscon/
8. What is Perl ?
• Perl is available for most computing
platforms: all flavors of UNIX (Linux),
MS-DOS/Win32, Macintosh, VMS, OS/2, Amiga,
AS/400, Atari
• Perl is a computer language that is:
– Interpreted: compiled at run time (so the perl
interpreter, perl.exe, must be installed!)
– Loosely “typed”
– String/text oriented
– Capable of using multiple syntax formats
• In Perl, “there's more than one way to do it”
9. Why use Perl for bioinformatics ?
• Ease of use by novice programmers
• Flexible language: Fast software prototyping (quick
and dirty creation of small analysis programs)
• Expressiveness. Compact code, Perl Poetry:
@{$_[$#_]||[]}
• Glutility: Read disparate files and parse the relevant
data into a new format
• Powerful pattern matching via “regular expressions”
(Best Regular Expressions on Earth)
• With the advent of the WWW, Perl has become the
language of choice to create Common Gateway
Interface (CGI) scripts to handle form submissions
and create compute servers on the WWW.
• Open Source – Free. Availability of Perl modules
for Bioinformatics and Internet.
10. Why NOT use Perl for bioinformatics ?
• Some tasks are still better done with other
languages (heavy computations / graphics)
– C(++), C#, Fortran, Java (Pascal, Visual Basic)
• With Perl you can write simple programs
fast, and it is also suitable for large and
complex programs (yet it is not adequate
for very large projects)
– Python
• Larry Wall: “For programmers, laziness is
a virtue”
11. What bioinformatics tasks are suited to Perl ?
• Sequence manipulation and analysis
• Parsing results of sequence analysis
programs (BLAST, Genscan, HMMER, etc.)
• Parsing database (e.g. GenBank) files
• Obtaining multiple database entries
over the internet
• …
12. Example of problems we will be solving
• Primary Sequence analysis
• Perform alignments
• Simulation experiments to explain
Blast statistics
• Predicting protein topology
• Predicting secondary structures
• “Real-life” problems
– Proteomics: Given aa masses find protein
in database
–…
13. Perl installation
• Perl (on USB):
– Perl is available for various operating systems. To
download Perl and install it on your computer, have a
look at the following resources:
– www.perl.com (O'Reilly).
• Downloading Perl Software
– ActiveState. ActivePerl for Windows, as well as for
Linux and Solaris.
• ActivePerl binary packages.
– CPAN
• http://www.bioinformatics.be/new/faq/setup/
14. Check installation
• Command-line flags for perl
– perl -v
• Gives the current version of Perl
– perl -e
• Executes Perl statements from the command
line.
– perl -e "print 42;"
– perl -e "print \"Two\nlines\n\";"
– perl -we
• Executes and prints warnings
– perl -we "print 'hello'; $x++;"
15. How to enter your first program ?
• Use an editor
– DOS: EDIT
– Windows:
• NOTEPAD (careful!)
• Word(Pad) -> save as TEXT FILE
– SciTE:
http://www.scintilla.org/SciTE.html
– Textpad
– Others
• VIM
• Eclipse
16. Brief Introduction to Subdirectories—The Path
Path:
Route followed by OS to
locate, save, and/or
retrieve a file
17. The absolute path problem …
• Problem
– Either perl cannot be started
– Or the script cannot be found
– Or a file needed by the script cannot be
found
• Solution
– Don't panic!
– Use absolute path names
• D:\Perl\bin\perl.exe D:\temp\Test.pl
– Note that inside your script you must “escape” the backslash
• $filename = "d:\\Temp\\pdb.fasta";
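The escaping rule can be sketched as follows. This is a minimal illustration that assumes a Windows-style path; pdb.fasta is the file name from the slide and no file is actually opened.

```perl
use strict;
use warnings;

# In double quotes a backslash starts an escape sequence, so it must be doubled.
my $escaped = "d:\\Temp\\pdb.fasta";

# In single quotes a backslash before an ordinary character stays literal.
my $single = 'd:\Temp\pdb.fasta';

# Both notations give exactly the same string.
print "same path\n" if $escaped eq $single;
print "$escaped\n";
```

Perl on Windows generally also accepts forward slashes in file names ("d:/Temp/pdb.fasta"), which avoids the escaping question altogether.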
18. • Solutions (II)
– Copy all the files into the same directory!
– So if you start perl from D:\Perl\bin, you can
still refer to D:\Temp\test.pl, but then $filename
must also use an absolute path, or else you must
copy pdb.fasta to D:\Perl\bin
– Adjust the search path so that you can start perl
from anywhere
• path (shows the search path)
• set path (changes the path; be careful!). Use the
DOS environment variable %path% to append a
directory
• set path=%path%;d:\Perl\bin
• (afterwards you can check the change by running
“path”)
19. Redirection
Keyboard:
Standard input device
Screen:
Standard output device
Redirection . . .
changes output from monitor to
somewhere else (usually file or
printer).
20. Textpad
Minimal install: via Minerva, save the file
textpad.be to your folder. Create a folder named
system in the same location. In the
system folder, save plumb.exe
(from Minerva) and the Perl syntax files
(from textpad.com)
• Syntax Highlighting
– Document Class
• Launch Perl
– Tools
22. General Remarks
• Perl is mostly a free format language: add
spaces, tabs or new lines wherever you
want.
• For clarity, it is recommended to write
each statement in a separate line, and use
indentation in nested structures.
• Comments: Anything from the # sign to
the end of the line is a comment. (There
are no multi-line comments).
• A perl program consists of all of the Perl
statements of the file taken collectively as
one big routine to execute.
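A small sketch of the free-format and comment rules above; the variable names are illustrative only.

```perl
use strict;
use warnings;

# Whitespace is free: this single statement is spread over several lines.
my $sum =
    1
  + 2
  + 3;

my $product = 2*3*4;   # a comment may follow a statement on the same line

# The statements of the file are executed from top to bottom.
print "sum = $sum, product = $product\n";
```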
23. What does a real Perl program look like:
#!/usr/local/bin/perl
Mandatory first line (on UNIX)
print "Hello everyone\n";
How to run it:
1. Save the text of your code as a file -- program.pl
2. Execute it:
perl program.pl
Hello everyone
24. Three Basic Data Types
• Scalars - $
• Arrays of scalars - @
• Associative arrays of
scalars or Hashes - %
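The three data types side by side, as a minimal sketch; the gene name and the complement table are illustrative values, not taken from the course material.

```perl
use strict;
use warnings;

# Scalar: a single value (number or string).
my $gene = "BRCA1";

# Array: an ordered list of scalars, indexed from 0.
my @bases = ("A", "C", "G", "T");

# Hash: scalars looked up by a string key.
my %complement = (A => "T", C => "G", G => "C", T => "A");

print "gene: $gene\n";
print "first base: $bases[0], number of bases: ", scalar(@bases), "\n";
print "complement of G: $complement{G}\n";
```

Note that a single element of an array or hash is itself a scalar, which is why it is written with $ ($bases[0], $complement{G}).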
25. 2+2 = ?
$ - indicates a variable
$a = 2;
$b = 2;
$c = $a + $b;
; - ends every command
= - assigns a value to a variable
or $c = 2 + 2;
or $c = 2 * 2;
or $c = 2 / 2;
or $c = 2 ** 4; 2**4 <-> 2 to the power 4 = 16 (in Perl, ^ is bitwise XOR, so 2 ^ 4 gives 6)
or $c = 1.35 * 2 - 3 / (0.12 + 1);
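The operators can be checked with a short script. Note that Perl writes exponentiation as **, while ^ is the bitwise XOR operator; the variable names below are illustrative.

```perl
use strict;
use warnings;

my $sum   = 2 + 2;    # 4
my $power = 2 ** 4;   # 16: ** is exponentiation
my $xor   = 2 ^ 4;    # 6: ^ is bitwise XOR (010 XOR 100 = 110)

# Multiplication and division bind tighter than subtraction;
# parentheses are evaluated first.
my $mixed = 1.35 * 2 - 3 / (0.12 + 1);

print "2 + 2  = $sum\n";
print "2 ** 4 = $power\n";
print "2 ^ 4  = $xor\n";
printf "mixed  = %.4f\n", $mixed;
```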
26. Ok, $c is 4. How do we know it?
$c = 4;
print "$c";
print command:
" " - delimit the string to print
print "Hello \n";
\n - prints an end-of-line character
(equivalent to pressing 'Enter')
String concatenation:
print "Hello everyone\n";
print "Hello" . " everyone" . "\n";
Expressions and strings together:
print "2 + 2 = " . (2+2) . "\n";    output: 2 + 2 = 4
(2+2) is the expression
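A short sketch contrasting interpolation and concatenation; the variable names are illustrative.

```perl
use strict;
use warnings;

my $c = 2 + 2;

# Variables are interpolated inside double quotes ...
my $interpolated = "2 + 2 = $c\n";

# ... but expressions are not, so they are joined with the . operator.
my $concatenated = "2 + 2 = " . (2 + 2) . "\n";

# Single quotes switch interpolation off entirely.
my $literal = '2 + 2 = $c';

print $interpolated;
print $concatenated;
print $literal, "\n";
```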
27. Loops and cycles (for statement):
# Output all the numbers from 1 to 100
for ($n=1; $n<=100; $n+=1) {
print "$n \n";
}
1. Initialization:
for ( $n=1 ; ; ) { … }
2. Increment:
for ( ; ; $n+=1 ) { … }
3. Termination (the loop repeats as long as the condition holds):
for ( ; $n<=100 ; ) { … }
4. Body of the loop - command inside curly brackets:
for ( ; ; ) { … }
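The three parts of the for statement working together, followed by the same loop written with a range, which is a common Perl idiom. Summing the numbers instead of printing them is a choice made here to keep the output short.

```perl
use strict;
use warnings;

# C-style loop: initialization; condition; increment.
my $total = 0;
for (my $n = 1; $n <= 100; $n += 1) {
    $total += $n;
}
print "sum of 1..100 = $total\n";

# The same loop written with the range operator.
my $total_range = 0;
for my $n (1 .. 100) {
    $total_range += $n;
}
print "sum with range = $total_range\n";
```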
28. FOR & IF -- all the even numbers from 1 to 100:
for ($n=1; $n<=100; $n+=1) {
if (($n % 2) == 0) {
print "$n";
}
}
Note: $a % $b -- Modulus
-- Remainder when $a is divided by $b
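The same FOR and IF combination, wrapped in a small subroutine so that the result can be reused; even_numbers is a hypothetical helper name.

```perl
use strict;
use warnings;

# Collect the even numbers between $from and $to (inclusive).
sub even_numbers {
    my ($from, $to) = @_;
    my @even;
    for (my $n = $from; $n <= $to; $n += 1) {
        if (($n % 2) == 0) {    # remainder 0 means divisible by 2
            push @even, $n;
        }
    }
    return @even;
}

my @even = even_numbers(1, 100);
print "found ", scalar(@even), " even numbers\n";
print "first: $even[0], last: $even[-1]\n";
```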
29. Two brief diversions (warnings & strict)
• use warnings; (turns on warning messages)
• strict – forces you to 'declare' a variable the
first time you use it.
– usage: use strict; (somewhere near the top of
your script)
• declare variables with 'my'
– usage: my $variable;
– or: my $variable = 'value';
• my sets the 'scope' of the variable. The variable
exists only within the current block of code
• use strict and my both help you to debug
errors, and help prevent mistakes.
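A minimal sketch of how my limits a variable to its block; the variable name $count is illustrative.

```perl
use strict;
use warnings;

my $count = 10;          # declared at file level

{
    my $count = 99;      # a new, separate variable that lives only in this block
    print "inside block: $count\n";
}

print "outside block: $count\n";   # the outer variable is untouched

# With 'use strict', a misspelled name such as $cuont stops compilation
# with an error instead of silently creating a new variable.
```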
30. Unary Arithmetic Operators eg. Autoincrement ++
• If you place one of the auto operators before the variable, it is
known as a pre-incremented (pre-decremented) variable. Its
value will be changed before it is referenced. If it is placed
after the variable, it is known as a post-incremented (post-decremented)
variable and its value is changed after it is used.
For example:
• $a = 5; # $a is assigned 5
• $b = ++$a; # $b is assigned the incremented value of $a, 6
• $c = $a--; # $c is assigned 6, then $a is decremented to 5
#!e:\perl\bin\perl.exe
• $getal1 = 5;
• print $getal1."\n"; # prints 5
• print $getal1++."\n"; # prints 5, then $getal1 becomes 6
• print ++$getal1."\n"; # $getal1 becomes 7, then prints 7
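Both slide examples combined into one runnable script under strict and warnings; $n, $pre and $post are illustrative names, while $getal1 is kept from the slide.

```perl
use strict;
use warnings;

my $n    = 5;
my $pre  = ++$n;   # $n becomes 6 first, then $pre gets 6
my $post = $n--;   # $post gets 6, then $n drops back to 5

print "pre = $pre, post = $post, n = $n\n";

my $getal1 = 5;
print $getal1 . "\n";     # value is 5
print $getal1++ . "\n";   # value 5 is used, then $getal1 becomes 6
print ++$getal1 . "\n";   # $getal1 becomes 7, then 7 is used
```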
31. Logical and Comparison operators
• Equal (True if $a is equal to $b)
– Numeric: ==
– String: eq
• And: &&
• Or: ||
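A short sketch of why the numeric and string comparisons are separate operators, and how && and || combine conditions; the test values are illustrative.

```perl
use strict;
use warnings;

# Numeric comparison: 1.0 and 1 are the same number.
my $numeric_equal = (1.0 == 1) ? "yes" : "no";

# String comparison: "1.0" and "1" are different strings.
my $string_equal = ("1.0" eq "1") ? "yes" : "no";

# Or: true if at least one side is true.
my $base = "A";
my $is_purine = ($base eq "A" || $base eq "G") ? "yes" : "no";

# And: true only if both sides are true.
my $length = 12;
my $in_range = ($length >= 10 && $length <= 20) ? "yes" : "no";

print "numeric equal: $numeric_equal\n";
print "string equal:  $string_equal\n";
print "A is a purine: $is_purine\n";
print "12 in 10..20:  $in_range\n";
```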
33. Text Processing Functions
The substr function
• Definition
• The substr function extracts a substring out of a
string and returns it. The function receives 3
arguments: a string value, a position on the string
(starting to count from 0) and a length.
Example:
• $a = "university";
• $k = substr ($a, 3, 5);
• $k is now "versi"; $a remains unchanged.
• If length is omitted, everything to the end of the
string is returned.
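The substr example as a runnable script, including the form without a length; the variable names are illustrative.

```perl
use strict;
use warnings;

my $word = "university";

my $middle = substr($word, 3, 5);   # 5 characters starting at position 3
my $tail   = substr($word, 3);      # no length: everything to the end

print "$middle\n";   # versi
print "$tail\n";     # versity
print "$word\n";     # university (unchanged)
```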
34. Random
#!c:\perl\bin\perl.exe -w
#srand(time|$$);
$x = rand(1);
• srand
– The default seed for srand, which used to be time, has
been changed. Now it's a heady mix of difficult-to-predict
system-dependent values, which should be sufficient for
most everyday purposes. Previous to version
5.004, calling rand without first calling srand would yield
the same sequence of random numbers on most or all
machines. Now, when perl sees that you're calling rand
and haven't yet called srand, it calls srand with the default
seed. You should still call srand manually if your code
might ever be run on a pre-5.004 system, of course, or if
you want a seed other than the default.
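A minimal sketch of seeding by hand: calling srand with the same seed reproduces the same sequence, which is useful when a simulation has to be repeatable. The helper random_numbers and the seed 42 are hypothetical choices for illustration.

```perl
use strict;
use warnings;

# Seed the generator with $seed, then draw $count numbers in [0, 1).
sub random_numbers {
    my ($seed, $count) = @_;
    srand($seed);
    return map { rand(1) } 1 .. $count;
}

my @first  = random_numbers(42, 3);
my @second = random_numbers(42, 3);

# Same seed, same sequence.
print "reproducible\n" if "@first" eq "@second";

# rand(1) returns a value in the half-open interval [0, 1).
my @out_of_range = grep { $_ < 0 || $_ >= 1 } @first;
print "all values in [0, 1)\n" unless @out_of_range;
```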
35. • Exercise: how good are the random
numbers?
• If they are good, you can use them to
calculate Pi …
• A good random generator is
important for good
random sequences that we can later
use in simulations
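One possible way to approach the exercise is a Monte Carlo estimate. The slide does not specify the method, so this is a sketch under that assumption: random points are thrown into the unit square, and the fraction that lands inside the quarter circle of radius 1 approximates Pi/4. The helper name estimate_pi, the seed and the number of throws are hypothetical choices.

```perl
use strict;
use warnings;

# Estimate Pi from $throws random points in the unit square.
sub estimate_pi {
    my ($throws) = @_;
    my $inside = 0;
    for (my $i = 0; $i < $throws; $i += 1) {
        my $x = rand(1);
        my $y = rand(1);
        # The point lies inside the quarter circle if x^2 + y^2 <= 1.
        $inside += 1 if $x * $x + $y * $y <= 1;
    }
    return 4 * $inside / $throws;
}

srand(2024);   # fixed seed so that a run can be repeated
my $pi = estimate_pi(100_000);
printf "estimate of Pi: %.4f\n", $pi;
```

The estimate improves only slowly with the number of throws (the error shrinks roughly with the square root of the count), so a poor generator or too few points shows up as a visibly wrong value.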