These are the lecture slides for the BITS training session "Introduction to programming in Bioperl".
For more material, see: http://www.bits.vib.be/index.php?option=com_content&view=article&id=17203793:bioperl-additional-material&catid=84&Itemid=610
2. Perl “A script is what you give an actor, but a program is what you give an audience.”
3. Goals Perl Positioning System: find your way in the Perl World Write Once, Use Many Times Object Oriented Perl Consumer Developer Thou shalt not be afraid of the Bioperl Beast
5. Agenda Day1 Perl refresher Scalars Arrays and lists Hashes Subroutines and functions Perldoc Creating and running a Perl script References and advanced data structures Packages and modules Objects, (multiple) inheritance, polymorphism
6. Agenda Day 2 What is bioperl ? Taming the Bioperl Beast Finding modules Finding methods Data::Dumper Sequence processing One image says more than 1000 words
7. Variables Data of any type may be stored within three basic types of variables: Scalar (strings, numbers, references) Array (aka list but not quite the same) Hash (aka associative array) Variable names are always preceded by a “dereferencing symbol” or prefix (commonly called a sigil), with {} around the name if needed: $ - Scalar variables @ - Array variables % - Associative array aka hash variables
8. Variables You do NOT have to Declare the variable before using it Define the variable’s data type Allocate memory for new data values
9. Scalar variables A scalar variable stores a string, a number, a character, a reference or undef $name, ${name}, ${'name'} More magic: $_
10. Array variables An array variable stores a list of scalars @name, @{name}, @{'name'} Index Map: index => scalar value zero-indexed (distance from start)
12. Array variables Access multiple values via array slice: print @array[3,2,4,1,0,-1]; Assign multiple values via array slice: @array[3,2,4,1,0,-1] = @new_values;
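A minimal sketch of both slice forms from this slide, using a made-up five-element array (the element values are assumptions for illustration only):

```perl
use strict;
use warnings;

my @array = qw(a b c d e);

# Read several elements in one go; index -1 counts from the end.
my @picked = @array[3, 2, 4, 1, 0, -1];
print "@picked\n";    # d c e b a e

# Assign to several positions in one go.
@array[0, 1] = qw(X Y);
print "@array\n";     # X Y c d e
```

Note the @ prefix on a slice: the result is a list, even though the source is the same array that $array[0] reads from.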
13. Lists List = temporary sequence of comma separated values, usually in () or the result of the qw operator Array = container for a list Use: Array initialization Extract values from array my @array = qw/blood sweat tears/; my ($var1, $var2, $var3, @var4) = @args; my ($var5, $var6) = @args;
15. Hash variables Hash variables are denoted by the % dereferencing symbol. A hash variable is a list of key-value pairs Both keys and values must be scalar Notice the ‘=>’ aka ‘quotifying comma’ (or fat comma) my %fruit_color = ("apple", "red", "banana", "yellow"); # or equivalently: my %fruit_color = ( apple => "red", banana => "yellow", );
17. Non data types Filehandle There are several predefined filehandles, including STDIN, STDOUT, STDERR and DATA (default opened). No prefix Code value aka subroutine Dereferencing symbol “&”
19. Subroutines We can reuse a segment of Perl code by placing it within a subroutine. The subroutine is defined using the sub keyword and a name (= variable name !!). The subroutine body is defined by placing code statements within the {} code block symbols. sub MySubroutine{ #Perl code goes here. my @args = @_; }
20. Subroutines To call a subroutine, prepend the name with the & symbol: &MySubroutine; # w/o explicit arguments (the caller's current @_ is passed along) Or: MySubroutine(); # with or w/o arguments
21. Subroutines Arguments in underscore array variable (@_) List flattening !! my @results = MySubroutine(@arg1, 'arg2', ('arg3', 'arg4')); sub MySubroutine{ #Perl code goes here. my ($thingy, @args) = @_; }
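To make the list flattening visible, here is a small sketch. The subroutine name describe_args and the argument values are hypothetical; only the call shape is taken from the slide.

```perl
use strict;
use warnings;

# Hypothetical helper: reports the first argument and how many followed it.
sub describe_args {
    my ($thingy, @args) = @_;
    return "first=$thingy rest=" . scalar(@args);
}

my @arg1 = (1, 2);

# The array and the nested parentheses are flattened into one list of five.
my $summary = describe_args(@arg1, 'arg2', ('arg3', 'arg4'));
print "$summary\n";    # first=1 rest=4
```

The subroutine cannot tell where @arg1 ended; if that boundary matters, pass a reference instead.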
22. Subroutines Return value Nothing Scalar value List value Return value Explicit with return function Implicit: value of the last statement sub MySubroutine{ #Perl code goes here. my ($thingy, @args) = @_; do_something(@args); }
23. Subroutines Calling contexts Void Scalar List wantarray function Void => undef Scalar => false (but defined) List => true getFiles($dir); my $num = getFiles($dir); my @files = getFiles($dir);
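A sketch of how a subroutine can react to its calling context with wantarray. The subroutine get_files is a hypothetical stand-in for getFiles, and the file names are made up; nothing is read from disk.

```perl
use strict;
use warnings;

my @seen;    # records the context of each call

# Hypothetical stand-in for getFiles(); the file names are made up.
sub get_files {
    my ($dir) = @_;
    my $want    = wantarray();
    my $context = !defined($want) ? 'void' : $want ? 'list' : 'scalar';
    push @seen, $context;
    my @found = ("$dir/a.txt", "$dir/b.txt");
    return $want ? @found : scalar @found;
}

get_files('/tmp');                  # void context
my $num   = get_files('/tmp');      # scalar context: number of files
my @files = get_files('/tmp');      # list context: the files themselves

print "@seen\n";                    # void scalar list
print "$num\n";                     # 2
```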
24. Functions and operators Built-in routines Function Arguments at right hand side Sensible name (defined, open, print, ...)
25. Functions Perl provides a rich set of built-in functions to help you perform common tasks. Several categories of useful built-in functions include Arithmetic functions (sqrt, sin, … ) List functions (push, chomp, … ) String functions (length, substr, … ) Existence functions (defined, undef)
26. Array functions Array as queue: push/shift (FIFO) Array as stack: push/pop (LIFO) [Diagram: @row1 = (1, 2, 3); shift and unshift work on the front of the array, push and pop on the end]
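A short sketch of the queue and stack idioms, with made-up values:

```perl
use strict;
use warnings;

# Queue (FIFO): add at the end, take from the front.
my @queue = (1, 2, 3);
push @queue, 4;
my $first = shift @queue;     # 1; @queue is now (2, 3, 4)

# Stack (LIFO): add at the end, take from the end.
my @stack = (1, 2, 3);
push @stack, 4;
my $last = pop @stack;        # 4; @stack is now (1, 2, 3)

# unshift is the counterpart of shift: it adds at the front.
unshift @stack, 0;            # @stack is now (0, 1, 2, 3)

print "$first | @queue | $last | @stack\n";    # 1 | 2 3 4 | 4 | 0 1 2 3
```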
27. List functions chomp: remove newline from every element in the list map: kind of loop without escape, every element ($_) is ‘processed’ grep: kind of filter sort join
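The list functions on this slide chain naturally. A sketch with made-up input lines (the sequence strings are assumptions for illustration):

```perl
use strict;
use warnings;

my @lines = ("atg\n", "ttgaca\n", "cc\n");

chomp @lines;                                   # strips the newline from every element
my @upper  = map  { uc $_ } @lines;             # transform every element
my @long   = grep { length($_) >= 3 } @lines;   # keep only matching elements
my @sorted = sort @lines;                       # default sort is string-wise
my $joined = join ',', @sorted;

print "@upper\n";     # ATG TTGACA CC
print "@long\n";      # atg ttgaca
print "$joined\n";    # atg,cc,ttgaca
```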
28. Hash functions keys: returns the hash keys in apparently random order values: returns the values of the hash in apparently random order, but in the same order as the keys function returns the keys each: returns (key, value) pairs delete: remove a particular key (and associated value) from a hash
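A sketch of the four hash functions on a made-up hash. Because the order of keys is not predictable, the example sorts the keys whenever it prints them.

```perl
use strict;
use warnings;

my %fruit_color = (apple => 'red', banana => 'yellow', cherry => 'red');

# keys and values come back in matching order, whatever that order is.
my @fruits = keys %fruit_color;
my @colors = values %fruit_color;
my $paired = 1;
for my $i (0 .. $#fruits) {
    $paired = 0 if $fruit_color{ $fruits[$i] } ne $colors[$i];
}
print "keys and values line up\n" if $paired;

delete $fruit_color{banana};    # removes the key and its value

# each hands back one (key, value) pair per call.
while (my ($fruit, $color) = each %fruit_color) {
    print "$fruit is $color\n";
}

my @sorted = sort keys %fruit_color;
print "@sorted\n";    # apple cherry
```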
29. Operators Operator Complex and subtle (=,<>, <=>, ?:, ->,=>,...) Symbolic name (+,<,>,&,!, ...)
40. References (and referents) A reference is a special scalar value which “refers to” or “points to” any value. A variable name is one kind of reference that you are already familiar with. It’s a given name. A reference is a kind of private, internal, computer generated name A referent is the value that the reference is pointing to
41. Creating References Method 1: references to variables are created by using the backslash (\) operator. $name = 'bioperl'; $reference = \$name; $array_reference = \@array_name; $hash_reference = \%hash_name; $subroutine_ref = \&sub_name;
42. Creating References Method 2: [ ITEMS ] makes a new, anonymous array and returns a reference to that array. { ITEMS } makes a new, anonymous hash, and returns a reference to that hash my $array_ref = [ 1, 'foo', undef, 13 ]; my $hash_ref = {one => 1, two => 2};
43. Dereferencing a Reference Use the appropriate dereferencing symbol Scalar: $ Array: @ Hash: % Subroutine: &
44. Dereferencing a Reference Remember $name, ${'name'} ? Means: give me the scalar value where the variable ‘name’ is pointing to. A reference $reference is a name, so $$reference, ${$reference} Means: give me the scalar value where the reference $reference is pointing to
45. Dereferencing a Reference The arrow operator: -> Arrays and hashes Subroutines my $array_ref = [ 1, 'foo', undef, 13 ]; my $hash_ref = {one => 1, two => 2}; ${$array_ref}[1] = ${$hash_ref}{'two'}; # can be written as: $array_ref->[1] = $hash_ref->{two}; &{$sub_ref}($arg1,$arg2); # can be written as: $sub_ref->($arg1, $arg2);
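The two notations can be compared side by side. In this sketch the anonymous subroutine behind $sub_ref is an assumption added for illustration; the array and hash are the ones from the slide.

```perl
use strict;
use warnings;

my $array_ref = [ 1, 'foo', undef, 13 ];
my $hash_ref  = { one => 1, two => 2 };
my $sub_ref   = sub { my ($x, $y) = @_; return $x + $y };   # hypothetical sub

# Block notation and arrow notation reach the same element.
my $elem_long  = ${$array_ref}[1];
my $elem_short = $array_ref->[1];

my $value_long  = ${$hash_ref}{'two'};
my $value_short = $hash_ref->{two};

my $sum_long  = &{$sub_ref}(2, 3);
my $sum_short = $sub_ref->(2, 3);

# Dereferencing the whole container uses the container's own symbol.
my $count = scalar @{$array_ref};
my @names = sort keys %{$hash_ref};

print "$elem_short $value_short $sum_short $count @names\n";   # foo 2 5 4 one two
```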
47. References Why do we need references ??? Create complex data structures !! Arrays and hashes can only store scalar values Pass arrays, hashes, subroutines, ... as arguments to subroutines and functions !! List flattening
48. Complex data structures Remember: A reference is a scalar value Arrays and hashes are sets of scalar values my $array_ref = [ 1, 2, 3 ]; my $hash_ref = {one => 1, two => 2}; my %data = ( arrayref => $array_ref, hash_ref => $hash_ref); In one go: my %data = ( arrayref => [ 1, 2, 3 ], hash_ref => {one => 1, two => 2} );
49. Complex data structures Individual access my %data = ( arrayref => [ 1, 2, 3 ], hash_ref => {one => 1, two => ['a','b']}); How to access this value ? my $wanted_value = $data{hash_ref}->{two}->[1];
50. Complex data structures my @row1 = (1..3); my @row2 = (2,4,6); my @row3 = (3,6,9); my @rows = (\@row1, \@row2, \@row3); my $table = \@rows; [Diagram: $table points to @rows, whose elements point to @row1 (1 2 3), @row2 (2 4 6) and @row3 (3 6 9)]
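A sketch of how such a table of rows can be built and read. The variable names follow the slide; the cell being looked up is an arbitrary choice.

```perl
use strict;
use warnings;

my @row1 = (1 .. 3);
my @row2 = (2, 4, 6);
my @row3 = (3, 6, 9);
my @rows  = (\@row1, \@row2, \@row3);   # an array of array references
my $table = \@rows;

# Second row, third column; the arrow between subscripts is optional.
my $cell = $table->[1][2];
print "$cell\n";    # 6

# Walk the table row by row.
my @printed;
for my $row (@{$table}) {
    push @printed, join(' ', @{$row});
}
print "$_\n" for @printed;

# The table holds references, not copies: changing @row1 shows through.
$row1[0] = 100;
print $table->[0][0], "\n";    # 100
```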
53. Packages and modules 2 types of variables: Global aka package variables Lexical variables
54. Packages and modules Global / package variables Visible everywhere in every program You get them if you don’t say otherwise !! Autovivification Name has 2 parts: family name + given name Default family name is ‘main’. $John is actually $main::John $Cleese::John has nothing to do with $Wayne::John Family name = package name $var1 = 42; print "$var1, ", ++$var2; # results in: 42, 1
56. Packages and modules Lexical / private variables Explicitly declared with my Only visible within the boundaries of a code block or file. They cease to exist as soon as the program leaves the code block or the program ends They do not have a family name aka they do not belong to a package ALWAYS USE LEXICAL VARIABLES (except for subroutines ...) my $var1 = 42; #!/usr/bin/perl use strict; my $var1 = 42;
57. Packages Family where the (global!) variables (incl. subroutines) live (remember $John) Wikipedia: In general, a namespace is a container that provides context for the identifiers (variable names) it holds, and allows the disambiguation of homonym identifiers residing in different namespaces.
58. Packages Family has a: name, defined via package declaration House, block or blocks of code that follow the package declaration package Bio::SeqIO::genbank; # welcome to the Bio::SeqIO::genbank family sub write_seq{} package Bio::SeqIO::fasta; # welcome to the Bio::SeqIO::fasta family sub write_seq{}
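A sketch of two packages living in one file, each with its own write_seq. The package names Demo::SeqIO::genbank and Demo::SeqIO::fasta and the $extension variable are hypothetical; they only mirror the naming on the slide and are not the real Bioperl modules.

```perl
use strict;
use warnings;

{
    package Demo::SeqIO::genbank;    # hypothetical package
    sub write_seq { return 'written by ' . __PACKAGE__ }
}
{
    package Demo::SeqIO::fasta;      # hypothetical package
    our $extension = '.fa';          # a package variable
    sub write_seq { return 'written by ' . __PACKAGE__ }
}

# Outside the blocks we are back in package main, so the
# fully qualified names are needed.
my $from_genbank = Demo::SeqIO::genbank::write_seq();
my $from_fasta   = Demo::SeqIO::fasta::write_seq();
my $extension    = $Demo::SeqIO::fasta::extension;

print "$from_genbank\n";    # written by Demo::SeqIO::genbank
print "$from_fasta\n";      # written by Demo::SeqIO::fasta
print "$extension\n";       # .fa
```

The two subroutines share a given name but not a family name, so they never collide.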
59. Packages Why do we need packages ??? To organize code To improve maintainability To avoid name space collisions
60. Modules What ? A text file (with a .pm suffix) containing Perl source code, that can contain any number of namespaces. It must evaluate to a true value. Loading At compile time: use <module> At run time: require <expr> For a bareword <module> the compiler translates each double-colon '::' into a path separator and appends '.pm'. E.g. Data::Dumper yields Data/Dumper.pm An <expr> given as a string or variable is taken as a file name as it is. use Data::Dumper; require Data::Dumper; require 'my_file.pl'; require $class;
61. Modules A module can contain multiple packages, but convention dictates that each module contains a package of the same name. easy to quickly locate the code in any given package (perldoc -m <module>) not obligatory !! A module name is unique 1 to 1 mapping to file system !! Should start with capital letter
62. Module files Module files are stored in a subdirectory hierarchy that parallels the module name hierarchy. All module files must have an extension of .pm.
63. Modules Module path is relative. So, where is Perl searching for that module ? Possible module roots: @INC []$ perl -V … @INC: /etc/perl /usr/local/lib/perl/5.10.1 /usr/local/share/perl/5.10.1 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.10 /usr/share/perl/5.10 /usr/local/lib/site_perl .
64. Modules Alternative module roots (perldoc -q library) In script: use lib '/my/alternative/module/path'; Command line: []$ perl -I/my/alternative/module/path script.pl Environment: export PERL5LIB=$PERL5LIB:/my/alternative/module/path
65. Modules Test/Speak.pm: package My::Package::Says::Hello; sub speak { print __PACKAGE__, " says: 'Hello'"; } package My::Package::Says::Blah; sub speak { print __PACKAGE__, " says: 'Blah'"; } 1; Test.pl: #!/usr/bin/perl use strict; use Test::Speak; My::Package::Says::Hello->speak; My::Package::Says::Blah->speak;
66. Modules Why do we need modules??? To organize packages into files/folders Code reuse (ne copy & paste !) Module repository: CPAN http://search.cpan.org https://metacpan.org/ Pragma Special module that influences the code (compilation) Lowercase Lexically scoped
67. Modules Module information In standard distribution: perldoc perlmodlib Manually installed: perldoc perllocal All modules: perldoc -q installed Documentation: perldoc <module name> Location: perldoc -l <module name> Source: perldoc -m <module name>
68. Packages and Modules - Summary A package is a separate namespace within Perl code. A module can have more than one package defined within it. The default package is main. We can get to variables (and subroutines) within packages by using the fully qualified name To write a package, just write package <package name> where you want the package to start. Package declarations last until the end of the enclosing block, file or until the next package statement The require and use keywords can be used to import the contents of other files for use in a program. Files which are included must end with a true value. Perl looks for modules in a list of directories stored in @INC Module names map to the file system
70. Exercises Bioperl Training Exercise 1: perldoc Bioperl Training Exercise 2: thou shalt not forget Bioperl Training Exercise 3: arrays Bioperl Training Exercise 4: hashes Bioperl Training Exercise 5: packages and modules 1 Bioperl Training Exercise 6: packages and modules 2 Bioperl Training Exercise 7: complex data structures
73. Object Oriented Programming in Perl What is an object ? An object is a (complex) data structure representing a new, user defined type with a collection of behaviors (functions aka methods) and a collection of attributes Developer’s perspective: 3 little make rules To create a class, build a package To create a method, write a subroutine To create an object, bless a referent
74. Rule 1: To create a class, build a package Defining a class A class is simply a package with subroutines that function as methods. Class name = type = label = namespace package Cat; 1;
75. Rule 2: To create a method, write a subroutine First argument of methods is always class name or object itself (or rather: reference) Subroutine call the OO way (method invocation arrow operator) package Cat; sub meow { my $self = shift; print __PACKAGE__, " says: meow !"; } 1; Cat->meow; $cat->meow;
76. Rule 3: To create an object, bless a referent ‘Special’ method: constructor Any name will do, in most cases new Object can be anything, in most cases hash Reference to object is stored in variable bless Arguments: reference (+ class). Does not change !! Underlying referent is blessed (= typed, labelled) Returns reference package Cat; sub new { my ($class, @args) = @_; my $self = { _name => $args[0] }; bless $self, $class; }
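Putting the three rules together, a minimal sketch of the Cat class. The accessor name() and the cat's name 'Tom' are assumptions added for illustration; the constructor follows the slide.

```perl
use strict;
use warnings;

{
    package Cat;                            # rule 1: a class is a package

    sub new {                               # constructor
        my ($class, @args) = @_;
        my $self = { _name => $args[0] };
        return bless $self, $class;         # rule 3: bless the referent
    }

    sub name {                              # hypothetical accessor
        my $self = shift;
        return $self->{_name};
    }

    sub meow {                              # rule 2: a method is a subroutine
        my $self = shift;
        return $self->name . ' says: meow !';
    }
}

my $cat   = Cat->new('Tom');
my $type  = ref $cat;                       # the label that bless attached
my $sound = $cat->meow;

print "$type\n";     # Cat
print "$sound\n";    # Tom says: meow !
```

The object is still an ordinary hash underneath; bless only labels it so that method calls know which package to search.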
77. Objects Perl objects are data structures ( a collection of attributes). To create an object we have to take 3 rules into account: Classes are just packages Methods are just subroutines Blessing a referent creates an object
78. Objects Objects are passed around as references Calling an object method can be done using the method invocation arrow: $object_ref->method() Constructor functions in Perl are conventionally called new() and can be called by writing: $object_ref = ClassName->new()
79. Inheritance Concept Way to extend functionality of a class by deriving a (more specific) sub-class from it In Perl: Way of specifying where to look for methods store the name of 1 or more classes in the package variable @ISA Multiple inheritance !! package NorthAmericanCat; use Cat; @ISA = qw(Cat); package NorthAmericanCat; use Cat; use Animal; @ISA = qw(Cat Animal);
80. Inheritance UNIVERSAL, parent of all classes Predefined methods isa(‘<class name>’): check if the object inherits from a particular class can(‘<method name>’): check if <method name> is a callable method
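A sketch of @ISA together with the two UNIVERSAL methods. Both classes are defined in one file here, so no use Cat; line is needed; the method bodies are made up for illustration.

```perl
use strict;
use warnings;

{
    package Cat;
    sub new  { my ($class) = @_; return bless {}, $class }
    sub meow { return 'meow' }
}
{
    package NorthAmericanCat;
    our @ISA = ('Cat');          # where to look for methods not found here
}

my $cat = NorthAmericanCat->new;     # new() is found in Cat via @ISA

my $is_cat   = $cat->isa('Cat')  ? 'yes' : 'no';
my $can_meow = $cat->can('meow') ? 'yes' : 'no';
my $can_bark = $cat->can('bark') ? 'yes' : 'no';

print ref($cat), ": isa Cat=$is_cat can meow=$can_meow can bark=$can_bark\n";
# NorthAmericanCat: isa Cat=yes can meow=yes can bark=no
```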
81. Inheritance SUPER: superclass of the current package start looking in @ISA for a class that can() do_something explicitly call a method of a parental class often used by Bioperl to initialize object attributes $self->SUPER::do_something()
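A sketch of SUPER used for attribute initialization. The method name _initialize and the attributes name and state are hypothetical; the sketch illustrates the pattern and is not a copy of any Bioperl class.

```perl
use strict;
use warnings;

{
    package Cat;
    sub new {
        my ($class, %args) = @_;
        my $self = bless {}, $class;
        $self->_initialize(%args);
        return $self;
    }
    sub _initialize {                       # hypothetical initializer
        my ($self, %args) = @_;
        $self->{name} = $args{name};
        return;
    }
}
{
    package NorthAmericanCat;
    our @ISA = ('Cat');
    sub _initialize {
        my ($self, %args) = @_;
        $self->SUPER::_initialize(%args);   # let the parent set its attributes first
        $self->{state} = $args{state};      # then add our own
        return;
    }
}

my $cat = NorthAmericanCat->new(name => 'Tom', state => 'Ohio');
print "$cat->{name} from $cat->{state}\n";    # Tom from Ohio
```

SUPER is resolved relative to the package in which the calling subroutine was compiled, not relative to the class of the object.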
82. Polymorphism Concept methods defined in a derived class will override methods defined in its parent classes same method has different behaviours
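A sketch of overriding in action. The classes Animal, Cat and Dog and their method bodies are made up for illustration.

```perl
use strict;
use warnings;

{
    package Animal;
    sub new   { my ($class) = @_; return bless {}, $class }
    sub speak { return 'some generic noise' }
    sub describe {
        my $self = shift;
        # speak() is looked up starting from the object's own class.
        return ref($self) . ' says: ' . $self->speak;
    }
}
{
    package Cat;
    our @ISA = ('Animal');
    sub speak { return 'meow' }      # overrides Animal::speak
}
{
    package Dog;
    our @ISA = ('Animal');
    sub speak { return 'woof' }      # overrides Animal::speak
}

my @said = map { $_->describe } (Animal->new, Cat->new, Dog->new);
print "$_\n" for @said;
# Animal says: some generic noise
# Cat says: meow
# Dog says: woof
```

describe() is written once in the parent class, yet the same call gives a different result for each kind of object.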
84. Exercises Bioperl Training Exercise 8: OOP Bioperl Training Exercise 9: inheritance, polymorphism Bioperl Training Exercise 10: aggregation, delegation
Editor's notes
Remember to wear it !
Not often needed. Why might you need the braces? String interpolation: $name = 'Johnny'; print "$name1"; # => nothing printed print "${name}1"; # => 'Johnny1'
If there are more variables in the list than elements in the array, the extra variables are assigned the undefined value. If there are fewer variables than array elements, the extra elements are ignored. Distributivity: my ()
Comma is operator: flattens (‘concatenates’) lists/arrays
No parens needed: comma operators produce list
main should have been called 'our' ;-) There is no need to use the family name when you are with your family. If you call John for dinner, John will know it's him and you know who will come. But if your family has visitors from another family and they have a John in the family as well ... Family name + given name = fully qualified variable name