Basic Linux

Boyce Thompson Institute for Plant
            Research
           Tower Road
  Ithaca, New York 14853-1801
             U.S.A.

             by
 Aureliano Bombarely Gomez
Basic Linux:
   1. Introduction
       1.1 What is Linux ?
       1.2 Distributions.
   2. Terminal and Virtual Consoles.
       2.1 What is a console ?
       2.2 Commands, stdin, stdout and stderr.
       2.3 Typing shortcuts.
   3. Popular commands.
       3.1 Directories, files and compressions.
       3.2 Manual and help.
       3.3 Monitoring resources.
       3.4 Networking.
   4. Permissions and Owners.
   5. Installing programs.
   6. Piping and scripting.
   7. Environment variables.
   8. Popular combinations in bioinformatics.
1.1 What is Linux ?


     Linux:
        Unix-like computer operating system assembled
     under the model of free and open source software
     development and distribution




          operating system (OS):
              Set of programs that
          manage computer hardware
          resources and provide common
          services for application
          software.



                                                   Wikipedia
1.1 What is Linux ?


     [Figure: Unix and Linux. Image credit: Feduccia A, Trends Ecol. Evol. 2001]
1.1 What is Linux ?


     Unix:
         Is a multitasking, multi-user computer operating
     system originally developed in 1969.




                                                            Wikipedia
1.2 Distributions.


     Linux Distribution:

        Distributions (often called distros for short) are
     Operating Systems including a large collection of
     software applications such as word processors,
     spreadsheets, media players, and database
     applications.

        The operating system will consist of the Linux kernel
     and, usually, a set of libraries and utilities from the
     GNU Project, with graphics support from the X Window
     System.



                                                             Wikipedia
1.2 Distributions.


     Linux Distribution:

         Different libraries and utilities.
         Example: the web browser installed by default.




                                                     Wikipedia
1.2 Distributions.


     Linux Distribution: ... more than 600 distributions




                                                    Wikipedia
1.2 Distributions.


     Linux Distribution:

         One can distinguish between:

         1) Commercially-backed distributions, such as:
             Fedora (Red Hat),
             OpenSUSE (Novell),
             Ubuntu (Canonical Ltd.)
             Mandriva Linux (Mandriva)

         2) Entirely community-driven distributions, such as:
            Debian.
            Gentoo.


                                                          Wikipedia
2.1 What is a console ?


     What is a console ?

        Computer terminal or system consoles are the text entry
     and display device for system administration messages,
     particularly those from the BIOS or boot loader, the kernel,
     from the init system and from the system logger. It is a
     physical device consisting of a keyboard and a screen.




                                                             Wikipedia
2.1 What is a console ?


     So then...
       What is the typical black or white screen where
     programmers and system administrators type
     commands?




                                                Wikipedia
2.1 What is a console ?


     Command-line interface (CLI):

         Mechanism for interacting with a computer operating
     system or software by typing commands to perform specific
     tasks.
             The command-line interpreter may be run in a text
     terminal or in a terminal emulator window as a remote shell
     client such as PuTTY.

    Shell:
       Piece of software that provides an interface for users of an
    operating system. There are two categories:
           - Command-line interface (CLI), for example Bash (Bourne-Again Shell)
           - Graphical user interface (GUI)
                                                                 Wikipedia
2.2 Commands, stdin, stdout and stderr


     Command-line interface (CLI):                  Shell:        Bash:


     [Diagram: a command and its STDIN enter the shell (Bash), which sits
      on top of the operating system; results leave as STDOUT and STDERR.]



        Command is a directive to a computer program acting as an
        interpreter of some kind, in order to perform a specific task.




                                                                    Wikipedia
2.2 Commands, stdin, stdout and stderr


     Parts of a command:

        Name      Tag arguments         Arguments      Structure


          ls      -lh                   /home          Example


         List     l = long listing      target         Meaning
         info     h = human readable
         about
         the
         target


     ...and press the RETURN or ENTER key to execute the command.
2.2 Commands, stdin, stdout and stderr


     Special characters in bash:
       CHARACTER             MEANING
           SPACE             Separate commands and arguments
       # POUND               Comment
       ; SEMICOLON           Command separator, to run multiple commands
       . DOT                 Source command OR filename component
                             OR current directory
       .. DOUBLE DOTS        Parent directory
       ' ' SINGLE QUOTES     Use the expression between quotes literally
       , COMMA               Concatenate strings (inside brace expansion)
       \ BACKSLASH           Escape for a single character
       / SLASH               Filename path separator
       * ASTERISK            Wildcard for filename expansion (globbing)
       >, <, >> CHARACTERS   Redirect input/output
       |   PIPE              Pipe output between commands
2.2 Commands, stdin, stdout and stderr


     Special characters in bash:

        ls Solanum lycopersicum

    Bash understands spaces as separators between arguments, so this
    command looks for two targets: Solanum and lycopersicum.


        ls 'Solanum lycopersicum'
        ls Solanum\ lycopersicum


         Use single quotes or a backslash escape for special characters
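
     A minimal sketch of the quoting rule above. It assumes a throwaway
     directory made with mktemp, and the file name genome.fasta is
     hypothetical, used only for illustration.

```shell
# Work in a throwaway directory so none of your own files are touched.
workdir=$(mktemp -d)
cd "$workdir"

# Create a directory whose name contains a space, with one file inside.
mkdir 'Solanum lycopersicum'
touch 'Solanum lycopersicum/genome.fasta'

# Unquoted: bash passes two arguments (Solanum, lycopersicum) and ls fails.
ls Solanum lycopersicum 2>/dev/null || echo "unquoted: no such file or directory"

# Quoted or escaped: bash passes ONE argument and ls lists the directory.
quoted=$(ls 'Solanum lycopersicum')
escaped=$(ls Solanum\ lycopersicum)
echo "quoted:  $quoted"
echo "escaped: $escaped"
```

     Both forms give the same listing; quotes are usually easier to read
     when a name contains several spaces.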
2.3 Typing shortcuts.


     Typing shortcuts for bash:

       SHORTCUT         MEANING
       Tab              Autocomplete file or folder names
       ↑                Scroll up through the command history
       ↓                Scroll down through the command history
       Ctrl + A         Go to the beginning of the line that you are typing
       Ctrl + E         Go to the end of the line that you are typing
       Ctrl + U         Clear the line from the beginning to the cursor position
       Ctrl + R         Search previously used commands
       Ctrl + C         Kill the process that you are running
       Ctrl + D         Exit the current shell
       Ctrl + Z         Suspend the running process. Use the command fg to
                        resume it, or bg to let it continue in the background.
3.1 Directories, files and compressions.


     Just a reminder about directory structure:

         Tree structure.

                    /home
                       _ /user                 <- Default dir after user login
                           _ /Desktop
                           _ /Downloads
                           _ /Documents
                               _ /LinuxClass
             Complete path: /home/user/Documents/LinuxClass
3.1 Directories, files and compressions.


     Moving through working directories:

         COMMAND: cd

                /home
                   _ /user                 <- Default dir after user login
                       _ /Desktop
                       _ /Downloads
                       _ /Documents
                           _ /LinuxClass

         Move to the parent dir:      cd ..
         Move to a child dir:         cd Documents
         Move using complete path:    cd /home/user/Documents/LinuxClass
3.1 Directories, files and compressions.


     Directories commands:
    COMMAND           USE                              EXAMPLE
    cd                Change working dir               cd ../
    pwd               Print working dir                pwd
    ls                List information                 ls -lh /home
    mkdir             Create a new dir                 mkdir test
    rmdir             Remove empty dir                 rmdir test
    rm -Rf            Remove the dir and its content   rm -Rf test
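
     The directory commands above can be tried safely in a scratch
     directory. This is only a sketch: the mktemp scratch dir and the
     names project and data are assumptions for illustration.

```shell
# Scratch directory so that rm -Rf can only touch test data.
workdir=$(mktemp -d)
cd "$workdir"

# Create a dir, move into it and print where we are.
mkdir test
cd test
inside=$(pwd)
echo "now in: $inside"

# Go back to the parent and remove the (empty) dir.
cd ..
rmdir test

# rmdir refuses non-empty dirs; rm -Rf removes a dir and its content.
mkdir project
mkdir project/data
touch project/data/reads.txt
rm -Rf project

# Count what is left in the scratch dir.
remaining=$(ls | wc -l)
echo "entries left: $remaining"
```

     rm -Rf does not ask for confirmation, so check the target with ls
     before running it on real data.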
3.1 Directories, files and compressions.


     File commands:
    COMMAND      USE                                      EXAMPLE
    less         Open a file with less.                   less myfile
                 Q to exit. Arrows to scroll
    touch        Create an empty file                     touch myfile
    mv           Move file between dirs. Change name      mv myfile yourfile
    rm           Remove file                              rm yourfile
    cat          Print file content to STDOUT             cat myfile
    head         Print first 10 lines to STDOUT           head myfile
    tail         Print last 10 lines to STDOUT            tail myfile
    grep         Print matching lines to STDOUT           grep 'ATG' myfile
    cut          Cut columns and print to STDOUT          cut -f1 myfile
    sort         Sort lines and print to STDOUT           sort myfile
    sed          Replace occurrences, print to STDOUT     sed 's/ATG/CTG/' myfile
    wc           Count lines, words and bytes             wc myfile
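
     A short sketch of several file commands from the table working on
     the same file. The tab-separated content (gene IDs and sequences)
     is invented for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical two-column table: gene ID <TAB> sequence.
printf 'gene2\tATGCCC\ngene1\tATGAAA\ngene3\tCTGTTT\n' > myfile

# Look at the first two lines.
head -n 2 myfile

# Count lines, and count lines that contain ATG.
n_lines=$(wc -l < myfile)
n_atg=$(grep -c 'ATG' myfile)

# Take the first column, sort it and keep the first ID.
first_id=$(cut -f1 myfile | sort | head -n 1)

# Replace ATG by CTG (first occurrence per line) and count CTG lines.
n_ctg=$(sed 's/ATG/CTG/' myfile | grep -c 'CTG')

# Rename the file, then remove it.
mv myfile yourfile
rm yourfile

echo "lines=$n_lines atg=$n_atg first=$first_id ctg=$n_ctg"
```

     None of these commands change the input file; sed and sort only
     print the modified text to STDOUT.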
3.1 Directories, files and compressions.


     Compression and archiving commands:
    COMMAND           USE                            EXAMPLE
    gzip              Compress a file using gzip     gzip -c test.txt > test.txt.gz
    gunzip            Uncompress a file using gzip   gunzip test.txt.gz
    bzip2             Compress a file using bzip2    bzip2 -c test.txt > test.txt.bz2
    bunzip2           Uncompress a file using bzip2  bunzip2 test.txt.bz2
    tar               Archive files using tar        tar -cf sample.tar sample/*.txt
    tar -zcvf         Archive using tar and          tar -zcvf samples.tar.gz sample/*.txt
                      compress using gzip
    tar -zxvf         Unarchive using tar and        tar -zxvf samples.tar.gz
                      uncompress using gunzip
    tar -jcvf         Archive using tar and          tar -jcvf samples.tar.bz2 sample/*.txt
                      compress using bzip2
    tar -jxvf         Unarchive using tar and        tar -jxvf samples.tar.bz2
                      uncompress using bunzip2
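
     A possible round trip with gzip and tar, as a sketch. The file
     contents and the restore dir are assumptions; the -t (list) and
     -C (extract into a dir) options of tar are not in the table above.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Two small text files to archive.
mkdir sample
echo 'ATGC' > sample/a.txt
echo 'GGCC' > sample/b.txt

# gzip -c writes to STDOUT, so the original file is kept.
gzip -c sample/a.txt > a.txt.gz
roundtrip=$(gunzip -c a.txt.gz)

# Archive and compress in one step, then list the archive content.
tar -zcf samples.tar.gz sample/*.txt
n_archived=$(tar -ztf samples.tar.gz | wc -l)

# Extract into a separate dir and read one file back.
mkdir restore
tar -zxf samples.tar.gz -C restore
restored=$(cat restore/sample/b.txt)

echo "gzip round trip: $roundtrip, files in archive: $n_archived"
```

     Without -c, gzip replaces test.txt with test.txt.gz, which is why
     the table examples redirect the output to a new file.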
3.2 Manual and help.


     More information about the command usage:


        COMMAND: man
          Example: man ls


        ARGUMENTS: -h or --help
          Example: blastall --help
3.3 Monitoring resources.


     Monitoring resources

         How much disk space am I using?
            df -lh

         df displays the amount of disk space used and available on
         each filesystem (-l = local filesystems only, -h = human readable)



         How much RAM am I using?
            free -m

         free displays information about free and used memory on the
         system (-m = values in mebibytes)
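
     A small sketch of both checks. It assumes that free may be missing
     (it is Linux-specific), so the script tests for it first.

```shell
# Disk usage of the filesystem that holds the current dir
# (-h = human readable sizes).
df -h .

# The report has one header line plus at least one filesystem line.
df_lines=$(df -h . | wc -l)
echo "df printed $df_lines lines"

# Memory usage in mebibytes, only if free is installed.
if command -v free >/dev/null 2>&1; then
    free -m || echo "free could not read the memory information"
else
    echo "free is not available on this system"
fi
```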
3.3 Monitoring resources.


     Monitoring resources

         Which processes are running?
            top

         top provides an ongoing look at processor activity in real time.
         It displays a listing of the most CPU-intensive tasks on the
         system, and can provide an interactive interface for
         manipulating processes.

         Several single-key commands are recognized while top is
         running:
                - q, quit
                - P, sort by CPU usage
                - M, sort by memory usage
3.3 Monitoring resources.


     Monitoring resources using top:

 1. Global resource usage: CPU, Mem (memory) and Swap (swap space)

 2. Task resource usage, one row per process, with these columns:
        2.1 PID (process ID) and COMMAND
        2.2 USER
        2.3 RES, resident memory (physical memory used)
        2.4 %CPU and %MEM (percentage of CPU or memory used)
3.3 Monitoring resources.


     Monitoring resources

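          Which processes are running right now?
             ps aux

          ps gives a snapshot of the current processes.
          If you want a repetitive update of this status, use top.

          The output includes the USER, PID and STAT (status) columns.
          Status codes:
                                 R Running
                                 S Sleeping
                                 T Stopped
                                 Z Zombie

          Add --forest to draw the processes as a tree: ps aux --forest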
3.3 Monitoring resources.


     Managing resources

         How do I stop a process that is running in my terminal?
           CTRL + C

         How do I stop a process that is running outside my terminal?
           kill <PID>

         kill sends the specified signal to the specified process or
         process group.
         If no signal is specified, the TERM signal is sent.
         The TERM signal will kill processes which do not catch this
         signal (kill -s TERM <PID>).
         For other processes, it may be necessary to use the KILL (9)
         signal, since this signal cannot be caught (kill -s KILL <PID>).
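
     A sketch of kill on a harmless process. The sleep command stands
     in for a long-running job; $! (PID of the last background job),
     kill -0 (existence check, no signal sent) and wait are shell
     features that the slides do not cover.

```shell
# Start a long-running process in the background; $! holds its PID.
sleep 300 &
pid=$!

# kill -0 sends no signal, it only checks that the process exists.
kill -0 "$pid" && echo "process $pid is running"

# Ask the process to terminate (TERM is also the default signal).
kill -s TERM "$pid"

# wait collects the exit status: 128 + 15 (the number of SIGTERM).
status=0
wait "$pid" || status=$?
echo "exit status after TERM: $status"
```

     If a process ignores TERM, the same steps with kill -s KILL force
     it to stop, but the process gets no chance to clean up.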
3.4 Networking.


     Monitoring Network Connections:

        Can two machines communicate with each other?
            ping <hostname or IP address>

        ping checks whether a connection between two machines is possible
        by sending an ICMP ECHO_REQUEST datagram to elicit an ICMP
        ECHO_RESPONSE from a host or gateway.


        How can I find the IP address associated with a hostname?
            host <hostname>

        host is normally used to convert names to IP addresses and
        vice versa.
3.4 Networking.


     Downloading Files Non-Interactively :

        How can I download a file from a URL ?
            wget <url path to the file/s>
            curl <url path to the file/s>

        wget, free utility for non-interactive download of files from the Web.
        It supports HTTP, HTTPS, and FTP protocols and HTTP proxies

        curl is a client to get documents/files from or send documents to a
        server, using any of the supported protocols (HTTP, HTTPS, FTP,
        GOPHER, DICT, TELNET, LDAP or FILE).

                   wget                  Recursive download (-r argument)
                    vs
                   curl                  Piping capabilities (output goes to STDOUT by default)
3.4 Networking.


     Remote Connections:

        How can I log in remotely to another machine?
            ssh user@hostname

        ssh (SSH client) is a program for logging into a remote machine
        and for executing commands on a remote machine.

        X11 connections are possible using -X option.


        Example: ssh -X user1@cbsuwrkst1.tc.cornell.edu
3.4 Networking.


     Remote Connections,
     Transferring Files Non-Interactively:

        How can I copy data remotely from/to another machine?
            scp user@hostname:/filepath localpath
            scp localfilepath user@hostname:/path

        scp copies files between hosts on a network. It uses ssh.


        Example: scp -r mydir/ user1@cbsuwrkst1.tc.cornell.edu:/workdir/user1
3.4 Networking.


     Remote Connections,
     Transferring Files Interactively:

        How can I transfer a file from/to a remote machine?
                sftp user@hostname

        sftp is an interactive file transfer program, similar to ftp, which
        performs all operations over an encrypted ssh transport

                                  Useful SFTP commands
          ACTS ON LOCAL MACHINE               ACTS ON REMOTE MACHINE
          COMMAND   DESCRIPTION               COMMAND   DESCRIPTION
          lcd       Change dir                cd        Change dir
          lls       List files                ls        List files
          get       Copy remote → local       put       Copy local → remote
4. Permissions and Owners.


     File Permissions and Ownerships:

        All Unix systems are designed as multiuser operating
        systems. This means that different users could access,
        modify or delete the same files.

        To avoid problems, they have a file permission and ownership
        system. It restricts who can access and modify each of the
        files in the computer.

        This system has two parts:

           • Who is the owner of the file?

           • What type of access does each user of the system have?
4. Permissions and Owners.


     Ownership:

        Each file is assigned a user owner and a group owner.
        The user owner can be:

            + Real user (for example: aure).
            + Virtual user created by a program (for example: mysql).
            + Administrator user or root.

        Listing users in a computer:      cat /etc/passwd | cut -d ':' -f1

        Listing groups in a computer:     cat /etc/group | cut -d ':' -f1

        Information about a user:         id -u username; id -g username
                                          (user ID; primary group ID)
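
     A sketch of the user listing. cut needs -d to set the field
     delimiter and -f to pick the field. Running id without a username,
     as here, reports on the current user.

```shell
# /etc/passwd has one line per user, with fields separated by ':'.
# Field 1 is the user name; show the first three.
cut -d ':' -f1 /etc/passwd | head -n 3

# Count the entries.
n_users=$(cut -d ':' -f1 /etc/passwd | wc -l)
echo "entries in /etc/passwd: $n_users"

# Numeric user ID and primary group ID of the current user.
my_uid=$(id -u)
my_gid=$(id -g)
echo "my user ID is $my_uid and my primary group ID is $my_gid"
```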
4. Permissions and Owners.


     Permissions:

         Each file is assigned 9 different permissions: 3 for the file
        user-owner (u), 3 for the group-owner (g) and 3 for everyone
        else (o).

        There are 3 types of permissions or file attributes:

          + Readable (r), it has permission to read the file.
          + Writable (w), it has permission to write the file.
          + Executable (x), it has permission to execute as program.

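        10-character code for a Linux file:

            ----------     every switch OFF
            drwxrwxrwx     every switch ON

            character 1        d if the entry is a dir
            characters 2-4     permissions of the user-owner
            characters 5-7     permissions of the group-owner
            characters 8-10    permissions of everyone else (other)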
4. Permissions and Owners.


     Permissions:

      Information about the file:    ls -l;        ls -l <target>

      Examples:     -r--r--r--      Readable for everyone
                    -rwxr--r--      Readable for everyone, writable and
                                    executable only for the user-owner
                    drw-rw-r--      Dir readable and writable for user
                                    and group, readable for everyone.


      To change the owner/group:    chown owner:group file

      To change permissions:        chmod [ugo][+-=][rwx] file
                                    chmod [0-7][0-7][0-7] file
         |rwx|rwx|rwx|              Each digit is the sum of r=4, w=2, x=1
         |421|421|421|              (for example: 754 = rwxr-xr--)
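
     A sketch of both chmod forms on a scratch file. The file name
     myscript.sh is hypothetical; cut -c1-10 keeps only the 10-character
     permission code from the ls -l output.

```shell
workdir=$(mktemp -d)
cd "$workdir"
touch myscript.sh

# Numeric form: 6 = 4+2 (rw-), 4 = 4 (r--), 0 = no permission (---).
chmod 640 myscript.sh
numeric=$(ls -l myscript.sh | cut -c1-10)
echo "after chmod 640: $numeric"

# Symbolic form: add (+) execute (x) for the user-owner (u) only.
chmod u+x myscript.sh
symbolic=$(ls -l myscript.sh | cut -c1-10)
echo "after chmod u+x: $symbolic"
```

     The numeric form sets all nine permissions at once, while the
     symbolic form changes only the ones that are named.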
4. Permissions and Owners.


     Permissions:

      sudo, is a program for Unix-like computer operating systems
           that allows users to run programs with the security
           privileges of another user (normally the superuser, or
           root). Its name is a concatenation of the su command
           (which grants the user a shell for the superuser) and
           "do", or take action.


      Example:    sudo cp ./myscript.pl /usr/local/bin/
5. Installing programs.


     Ways to Install Programs:

         1) Using package managers

             1.1) Graphical package manager
                  (Example: Synaptic for Ubuntu).
             1.2) High level command-line package manager
                  (Example: apt for Debian, Yum for Red Hat)
             1.3) Low level command-line package manager
                   (Example: dpkg for Debian, rpm for Red Hat)

         2) Moving the program executable file into the PATH* and the
            libraries it needs to their corresponding locations.

             2.1) Precompiled.
             2.2) From-source.
5. Installing programs.


     Install Programs Using Package Managers:

     1.1) Graphical package manager (Synaptic for Ubuntu).

          [Screenshots: installing a package with Synaptic]
          1- Open Synaptic (root/sudo access is needed).
          2- Look for the package with the search box/button.
          3- Mark the package for installation.
          4- Accept the additional changes, like dependencies.
          5- Apply to install.
5. Installing programs.


     Install Programs Using Package Managers:

     1.2) High level command-line package manager (Apt).

          APT (Advanced Package Tool), is a free user interface that
          works with core libraries to handle the installation and
          removal of software on the Debian GNU/Linux distribution
          and its variants.
          APT simplifies the process of managing software on Unix-
          like computer systems by automating the retrieval,
          configuration and installation of software packages, either
          from binary files or by compiling source code.

            To search:      apt-cache search [package]
            To install:     apt-get install [package]
            To remove:      apt-get remove [package]
5. Installing programs.


     Install Programs Using Package Managers:

     1.2) High level command-line package manager (Apt).

          [Screenshots: terminal session using apt]
          1- Search for the package.
          2- Install the package (administrator privileges are needed).
5. Installing programs.


     Install Programs Using Package Managers:

     1.3) Low level command-line package manager (Dpkg).

          DPKG is the software at the base of the Debian package
          management system. dpkg is used to install, remove, and
          provide information about .deb packages.

            To install:     dpkg -i [package.deb]
            To remove:      dpkg -r [package]
5. Installing programs.


     Ways to Install Programs:

         2) Moving the program executable file into the PATH* and the
            libraries it needs to their corresponding locations.

             2.1) Precompiled.


                1- Download file with wget


                2- Unarchive and uncompress




                3- Copy** into program PATH (for example /usr/local/bin)
5. Installing programs.


     Ways to Install Programs:

         2) Moving the program executable file into the PATH* and the
            libraries it needs to their corresponding locations.

             2.2) From-source.

                  Generally it is more complex because it requires the
                  right version of one or more compilers (gcc, g++).
                  The compilation usually has three or four steps:
                        1) ./configure
                        2) make
                        3) (make check)
                        4) make install
6. Piping and scripting.


     Piping:

         Piping: Run different programs sequentially, where the output
                 of a program will be the input of the next one.


                  prog1  ->  prog2  ->  prog3  ->  prog4




         Bash uses the sign '|' (pipe) to pipe the output as the input
         of the next program. For example:
              ls /mydir | wc -l            # To count files in a folder
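
     A sketch of the same pipe on a scratch directory, plus a pipe of
     three programs. The file names are invented for illustration.

```shell
# Scratch directory with three empty files.
workdir=$(mktemp -d)
touch "$workdir/a.fasta" "$workdir/b.fasta" "$workdir/notes.txt"

# Two programs: ls prints one name per line, wc -l counts the lines.
n_files=$(ls "$workdir" | wc -l)

# Three programs: keep only the names that contain 'fasta', then count.
n_fasta=$(ls "$workdir" | grep 'fasta' | wc -l)

echo "files: $n_files, fasta files: $n_fasta"
```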
6. Piping and scripting.


     Piping:

         Another popular combination is to redirect the stdout (output)
         to a file using '>' (write, or overwrite if it exists) or '>>'
         (append):

              grep '>' fasta | sed 's/>//' > id.txt
              # To extract the id lines from a fasta file, remove the '>'
              # and redirect the output to the id.txt file
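
     A sketch that shows the difference between '>' and '>>'. The
     sequences and the IDs seq1, seq2 and seq3 are made up.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# '>' creates the file fasta (or overwrites it if it exists).
printf '>seq1\nATGC\n>seq2\nGGCC\n' > fasta

# Extract the id lines, remove the '>' and write them to id.txt.
grep '>' fasta | sed 's/>//' > id.txt

# '>>' appends one more line instead of overwriting the file.
echo 'seq3' >> id.txt

n_ids=$(wc -l < id.txt)
first_id=$(head -n 1 id.txt)
echo "id.txt has $n_ids ids, the first one is $first_id"
```

     The quotes around '>' in the grep and sed commands matter: without
     them bash would read the sign as a redirection.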
6. Piping and scripting.


     Scripting:

         Shell commands can be run from a file as a script using the
         shell program, for example: bash or sh

             Example: shell script fastaseq_count
                    #!/bin/sh
                    ## The argument will be used as the $1 variable
                    grep -c '>' "$1"


             To use it

                     sh ./fastaseq_count myfasta.file

             More info: http://tldp.org/LDP/abs/html/
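
     A sketch that writes the fastaseq_count script and runs it. The
     here-document (<<'EOF') is one possible way to create the file
     from the command line and is an assumption, as is the content of
     myfasta.file; a text editor works just as well.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical fasta file with three sequences.
printf '>seq1\nATGC\n>seq2\nGGCC\n>seq3\nTTAA\n' > myfasta.file

# Write the script. The quoted 'EOF' keeps $1 from being expanded now,
# so it reaches the script file unchanged.
cat > fastaseq_count <<'EOF'
#!/bin/sh
## The first argument is available as the $1 variable
grep -c '>' "$1"
EOF

# Run the script through the shell program.
count=$(sh ./fastaseq_count myfasta.file)
echo "sequences in myfasta.file: $count"
```

     Quoting "$1" inside the script keeps it working when the file name
     contains spaces.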
7. Environment variables.


     Environment Variables:

         There are two types of variables in a Unix shell:

            1) System variables.

                Variables defined by the system such as home dir
                or the executable path.

            2) User-defined variables.

                Variables defined by the user during a bash session.
                To define a user variable (no spaces around '='):

                    myvariable=test
                    a=$(grep -c '>' fasta)
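
                The sketch below shows both kinds of assignment and how the
                value is read back with '$'. The sample file and the names
                double and single are assumptions made for the example.

```shell
# No spaces are allowed around '=' in an assignment
myvariable=test
echo "$myvariable"

# $(...) stores the output of a command (command substitution)
fasta=$(mktemp)                          # hypothetical sample file
printf '>seq1\nATGC\n>seq2\nGGCC\n>seq3\nTTAA\n' > "$fasta"
a=$(grep -c '>' "$fasta")
echo "The file has $a sequences"

# Double quotes expand variables; single quotes do not
double="value: $a"
single='value: $a'
echo "$double"
echo "$single"
```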
7. Environment variables.


     System Variables:

           SYSTEM VARIABLE                     MEANING
         SHELL               Path to the user's login shell
         BASH                Path to the running bash executable
         BASH_VERSION        Bash version
         COLUMNS             Terminal width, in columns
         LINES               Terminal height, in lines
         HOME                Home dir.
         LOGNAME             Login name
         OSTYPE              Operating system type
         PATH                Colon-separated list of dirs searched for executables
         PS1                 Prompt settings
         PWD                 Current working dir
         USERNAME            Name of the user currently logged in (often USER instead)
7. Environment variables.


     Commands to interact with variables:

              COMMAND                                MEANING
       env                     Print all environment variables
       echo                    Print the value of a variable (echo $HOME)
       export                  Make a variable available to all child processes
       source                  Execute the commands in a file in the current shell


        Example:
           How to add a directory of executables
           (/home/user/shscripts) to the executable path.

                export PATH=/home/user/shscripts:$PATH
                # ':' is used to separate elements
                # $PATH keeps the previous value of PATH
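
        What export changes can be seen by asking a child shell for the
        variable. This is a minimal sketch: the variable names plain_var and
        exported_var are made up, and /home/user/shscripts is the example
        directory used above, not a directory that has to exist.

```shell
# A plain shell variable is not visible to child processes
plain_var=hello
in_child_plain=$(sh -c 'echo "${plain_var:-unset}"')

# After export, child processes inherit it
export exported_var=hello
in_child_exported=$(sh -c 'echo "${exported_var:-unset}"')

echo "plain: $in_child_plain, exported: $in_child_exported"

# Prepend the example script directory to PATH for this session only
export PATH="/home/user/shscripts:$PATH"
first_path_entry=${PATH%%:*}     # text before the first ':'
echo "$first_path_entry"
```

        The change to PATH lasts until the session ends; to make it
        permanent the export line is usually added to a startup file such
        as ~/.bashrc.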
8. Popular combinations in bioinformatics.


     Fasta and fastq file manipulation:
         Just a reminder:
            1) Fasta files.
                It is a text-based format for representing either
                nucleotide sequences or peptide sequences, in
                which nucleotides or amino acids are represented
                using single-letter codes.


               >SEQUENCE_ID1 DESCRIPTION
               ATGCGCGCGCGCGCGCGCGGGTAGCAGATGACGACACAGAGCGAGGATGCGCTGAGAGTA
               GTGTGACGACGATGACGGAAAATCAGATGGACCCGATGACAGCATGACGATGGGACGGGA
               AAGATTGGACCAGGACAGGACCAGGACCAGGACCAGGGATTAGA
               >SEQUENCE_ID2 DESCRIPTION
               ATGGGGGGGACGACGATGGACACAGAGACAGAGACGACGACAGCAGACAGATTTACCTTA
               GACGAGATAGGAGAGACGACAGATATATATATATAGCAGACAGACAGACATTTAGACGAG
               ACGACGATAGACGAT



                                             http://blast.ncbi.nlm.nih.gov/blastcgihelp.shtml
8. Popular combinations in bioinformatics.


     Fasta and fastq file manipulation:
         Just a reminder:
            1) Fasta files.
                I) Single-line description (defline) with a greater-than
                   (“>”) symbol in the first column.

                II) There should be no space between the “>” symbol
                    and the first letter of the identifier.

                III) The sequence follows the defline and can span
                     multiple lines.

                 FILE EXTENSION       MEANING
                 .fasta, .fas, .seq   Generic sequence fasta format
                 .fna                 Fasta nucleic acid
                 .faa                 Fasta amino acid
                 .ffn, .frn           Coding regions (nucleotide) / non-coding RNA regions
8. Popular combinations in bioinformatics.


       Fasta and fastq file manipulation:

                    COMMANDS COMBINATION                                             FUNCTION
  grep -c '>' file.fasta                                                     Count sequences
  grep '>' file.fasta | sed 's/>//' > defline.txt                            Extract IDs and descriptions
  grep '>' file.fasta | sed 's/>//' | sed 's/\s\+/\t/g' | cut -f1 > id.txt   Extract IDs (GNU sed)
  grep '>' file.fasta | sed 's/>//' | sed 's/\s\+/\t/g' | cut -f1 | sort -u  Extract unique IDs
  > unique_id.txt
  grep -v '>' file.fasta | tr -d '\n' | wc -m                                Total characters in sequences
  a=$(grep -c '>' fasta); b=$(grep '>' fasta | sort -u | wc -l); test        Check that no defline is
  $a -eq $b && echo "TRUE" || echo "FALSE"                                   repeated (TRUE = all unique)
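
       These combinations can be tried on a tiny file before running them on
       real data. The sketch below is an illustration under assumptions: the
       sample sequences are invented, and awk '{print $1}' is used in place
       of the sed/cut pair to take the first word of each defline.

```shell
fasta=$(mktemp)     # hypothetical sample file with one repeated ID
cat > "$fasta" <<'EOF'
>SEQ1 first description
ATGCGC
GGTA
>SEQ2 second description
ATGG
>SEQ1 duplicated ID
TTTT
EOF

n_seqs=$(grep -c '^>' "$fasta")
# Drop '>' and keep the first whitespace-separated word of each defline
n_unique=$(grep '^>' "$fasta" | sed 's/^>//' | awk '{print $1}' | sort -u | wc -l)
# tr removes the newlines so that only residues are counted
n_residues=$(grep -v '^>' "$fasta" | tr -d '\n' | wc -m)

echo "sequences=$n_seqs unique_ids=$n_unique residues=$n_residues"
[ "$n_seqs" -eq "$n_unique" ] && echo "no redundant IDs" || echo "redundant IDs found"
```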
8. Popular combinations in bioinformatics.


     Fasta and fastq file manipulation:
         Just a reminder:
            2) Fastq files.
                It is a text-based format for storing both a biological
                sequence (usually nucleotide sequence) and its
                corresponding quality scores. Both the sequence letter
                and quality score are encoded with a single ASCII
                character for brevity.

               @SEQUENCE_ID1
               ATGCGCGCGCGCGCGCGCGGGTAGCAGATGACGACACAGAGCGAGGATGCGCTGAGAGTA
               GTGTGACGACGATGACGGAAAATCAGA
               +
               BBBBBPPPPPXXXXX^^^^^^^^^^^^^^^^^^^^^^^__^^^^^^^^^^^^_eeeeeee
               [[[[[^^^^]]]]XXXXXPPPPPBBBB




                                             http://blast.ncbi.nlm.nih.gov/blastcgihelp.shtml
8. Popular combinations in bioinformatics.


     Fasta and fastq file manipulation:
            2) Fastq files.
                I) Single-line ID with an at symbol (“@”) in the first column.

                II) There should be no space between the “@” symbol
                    and the first letter of the identifier.

                III) The sequence follows the ID line and can span
                     multiple lines.

                IV) Single line with a plus symbol (“+”) in the first column
                    to mark the start of the quality values.

                V) The “+” line may optionally repeat the sequence ID.

                VI) The quality values follow the “+” line and can span
                    multiple lines, with one character per base.
8. Popular combinations in bioinformatics.


        Fasta and fastq file manipulation:
              2) Fastq files.
                        Quality score characters vary depending on the
                        sequencing method and sequencer version.

   SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS.....................................................
   ..........................XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX......................
   ...............................IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII......................
   .................................JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ......................
   LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLL....................................................
   !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
   |                         |    |        |                              |                     |
   33                        59   64       73                            104                   126

    S - Sanger        Phred+33, raw reads typically (0, 40)
    X - Solexa        Solexa+64, raw reads typically (-5, 40)
    I - Illumina 1.3+ Phred+64, raw reads typically (0, 40)
    J - Illumina 1.5+ Phred+64, raw reads typically (3, 40)
        with 0=unused, 1=unused, 2=Read Segment Quality Control Indicator
    L - Illumina 1.8+ Phred+33, raw reads typically (0, 41)




                                                                                        Wikipedia
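
    A quality character can be turned back into a score by subtracting the
    offset from its ASCII code. The function below is a sketch: the name
    qual_to_score is made up, and the offset (33 or 64) has to be known
    beforehand from the sequencer version.

```shell
# Convert one quality character to its score for a given offset
# (33 for Sanger / Illumina 1.8+, 64 for Illumina 1.3+ and 1.5+)
qual_to_score() {
    # A leading quote makes printf print the character's ASCII code
    code=$(printf '%d' "'$1")
    echo $((code - $2))
}

score_sanger=$(qual_to_score 'I' 33)       # ASCII 73 minus 33
score_illumina13=$(qual_to_score 'h' 64)   # ASCII 104 minus 64
echo "$score_sanger $score_illumina13"
```

    The same character means different scores under different offsets, which
    is why the encoding has to be identified before filtering by quality.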
8. Popular combinations in bioinformatics.


     Fasta and fastq file manipulation:
            2) Fastq files.
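                    Illumina sequence identifiers contain additional
                    information.

              ILLUMINA1.3+: @HWUSI-EAS100R:6:73:941:1973#0/1

                 FIELD            MEANING
                 HWUSI-EAS100R    Unique instrument name
                 6                Flowcell lane
                 73               Tile number within the flowcell lane
                 941              X coordinate of the cluster within the tile
                 1973             Y coordinate of the cluster within the tile
                 #0               Multiplexing number/tag (0 for no indexing)
                 /1               Member of a pair (/1 or /2)

                                                                       Wikipedia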
8. Popular combinations in bioinformatics.


     Fasta and fastq file manipulation:
            2) Fastq files.
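                    Illumina sequence identifiers contain additional
                    information.

       ILLUMINA1.8+: @EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG

                 FIELD            MEANING
                 EAS139           Unique instrument name
                 136              Run ID
                 FC706VJ          Flowcell ID
                 2                Flowcell lane
                 2104             Tile number within the flowcell lane
                 15343            X coordinate of the cluster within the tile
                 197393           Y coordinate of the cluster within the tile
                 1                Member of a pair (1 or 2)
                 Y                Filtered (Y if the read is filtered out, N otherwise)
                 18               Control bits (0 when no control bits are on)
                 ATCACG           Multiplexing number/tag (index sequence)

                                                                       Wikipedia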
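
       Because the fields are separated by ':' (and one space), they can be
       pulled out with cut. This sketch assumes the Illumina 1.8+ layout of
       the example identifier above; the variable names are made up.

```shell
header='@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG'

# Remove the leading '@' and turn the space into ':' to get 11 fields
fields=$(echo "$header" | sed 's/^@//' | tr ' ' ':')

instrument=$(echo "$fields" | cut -d: -f1)
lane=$(echo "$fields" | cut -d: -f4)
pair=$(echo "$fields" | cut -d: -f8)
index_seq=$(echo "$fields" | cut -d: -f11)

echo "instrument=$instrument lane=$lane pair=$pair index=$index_seq"
```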
8. Popular combinations in bioinformatics.


      Fasta and fastq file manipulation:

                   COMMANDS COMBINATION                                          FUNCTION
  grep -c '^@' file.fastq                                                 Count sequences
  a=$(grep -c '^@' fastq); b=$(grep -c '^+' fastq); test $a -eq $b &&     Check if there is the same
  echo "TRUE" || echo "FALSE"                                             number of @ and + lines
  grep '^@' pair1.fastq | sed 's/\/1$//' > pair1.id; grep '^@'            Check if the pairs have
  pair2.fastq | sed 's/\/2$//' > pair2.id; a=$(diff pair1.id pair2.id |   the same IDs
  grep -c '^[<>]'); echo "$a differences"; rm pair1.id; rm
  pair2.id;
  grep '^@' pair1.fastq | sed 's/\/1$//' > pair1.id; grep '^@'            Redirect the differences to
  pair2.fastq | sed 's/\/2$//' > pair2.id; diff pair1.id pair2.id >       a file
  diff.pair1.pair2.txt; rm pair1.id; rm pair2.id;

  Note: '@' and '+' are also valid quality characters, so a quality line
  can start with either of them and inflate these counts.
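
  When every record has exactly four lines, the header is always line 1, 5,
  9, ... and can be selected by position instead of by its first character.
  The sketch below relies on that assumption; the two small paired files are
  invented for the example.

```shell
work_dir=$(mktemp -d)
# Hypothetical paired files; the mate of read r2 is missing from pair2
printf '@r1/1\nACGT\n+\nIIII\n@r2/1\nGGCC\n+\nIIII\n' > "$work_dir/pair1.fastq"
printf '@r1/2\nTTAA\n+\nIIII\n' > "$work_dir/pair2.fastq"

# Keep every fourth line starting at line 1, then strip the /1 or /2 suffix
awk 'NR % 4 == 1' "$work_dir/pair1.fastq" | sed 's/\/1$//' > "$work_dir/pair1.id"
awk 'NR % 4 == 1' "$work_dir/pair2.fastq" | sed 's/\/2$//' > "$work_dir/pair2.id"

n_reads=$(awk 'END { print NR / 4 }' "$work_dir/pair1.fastq")
# diff marks lines present in only one file with '<' or '>'
n_diff=$(diff "$work_dir/pair1.id" "$work_dir/pair2.id" | grep -c '^[<>]')
echo "$n_reads reads in pair1, $n_diff differences"
```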
8. Popular combinations in bioinformatics.


     Blast and blat result file manipulation:
         Just a reminder:
            3) Blast m8 result files (hit table).
                It is a tabular (separator="\t") text-based format with 12
                columns: query_id, subject_id, %identity, align_length,
                mismatches, gap_openings, q_start, q_end, s_start,
                s_end, e_value, bit_score


         QID1    SID5   91.00   178   0   0   1     178   1001   1179   1e-40   156
         QID1    SID8   65.00   68    1   1   110   178   1050   1118   1e-25   87
8. Popular combinations in bioinformatics.


      Blast and blat result file manipulation:

                   COMMANDS COMBINATION                                     FUNCTION
  cut -f1 blast.m8 | wc -l                                           Count total hits
  cut -f1 blast.m8 | sort -u | wc -l                                 Count query sequences
                                                                     with at least one hit
  cut -f1 blast.m8 | sort | uniq -c > query.hits.txt                 Count number of hits for
                                                                     each query sequence
  cut -f2 blast.m8 | sort -u | wc -l                                 Count subject sequences
                                                                     with at least one hit
  awk '{if ($3 > 90) print $0}' blast.m8 | cut -f1 | wc -l           Count hits with % identity
                                                                     > 90 %
  awk '{if ($3>90 && $4>100) print $0}' blast.m8 | cut -f1 | wc -l   Count hits with % identity
                                                                     > 90 % and Length > 100
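
  The filters can be checked on a hand-made hit table. This is a sketch
  under assumptions: the third hit (QID2) is invented, and the awk condition
  is written as a bare pattern, which prints the matching lines just like
  the if/print form.

```shell
m8=$(mktemp)     # hypothetical hit table with three hits
printf 'QID1\tSID5\t91.00\t178\t0\t0\t1\t178\t1001\t1179\t1e-40\t156\n'  > "$m8"
printf 'QID1\tSID8\t65.00\t68\t1\t1\t110\t178\t1050\t1118\t1e-25\t87\n' >> "$m8"
printf 'QID2\tSID5\t98.50\t90\t1\t0\t1\t90\t200\t290\t1e-30\t120\n'     >> "$m8"

total_hits=$(wc -l < "$m8")
queries_with_hit=$(cut -f1 "$m8" | sort -u | wc -l)
# -F'\t' makes awk split on tabs only, like cut does
good_hits=$(awk -F'\t' '$3 > 90 && $4 > 100' "$m8" | wc -l)

echo "hits=$total_hits queries=$queries_with_hit good=$good_hits"
```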
8. Popular combinations in bioinformatics.


      Blast and blat result file manipulation:
           Just a reminder:
              4) Blat psl result files (hit table).
                      It is a tabular (separator="\t") text-based format with 21
                      columns: match, mismatch, repmatch, N's, Q_gap_count,
                      Q_gap_bases, T_gap_count, T_gap_bases, strand, Qname,
                      Qsize, Qstart, Qend, Tname, Tsize, Tstart, Tend,
                      block_count, block_sizes, qStarts, tStarts


                  COMMANDS COMBINATION                                    FUNCTION
grep '^[0-9]' blat.psl | wc -l                                     Count total hits
grep '^[0-9]' blat.psl | cut -f10 | sort -u | wc -l                Count query sequences
                                                                   with at least one hit
grep '^[0-9]' blat.psl | awk '{if ($1*100/$11 > 90) print $0}' |   Count query sequences
cut -f10 | sort -u | wc -l                                         with hit coverage > 90%
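
A psl file starts with header lines, so the hit lines (which begin with a
digit) have to be selected before any arithmetic on the columns. The sketch
below uses a made-up two-hit file with a shortened header; GENE1 and GENE2
are hypothetical query names.

```shell
psl=$(mktemp)    # hypothetical PSL file: two header lines, then two hits
{
    printf '%s\n' 'psLayout version 3'
    printf '%s\n' '--------------------'
    printf '300\t0\t0\t0\t0\t0\t2\t500\t+\tGENE1\t300\t0\t300\tChr1\t9000\t1000\t1800\t3\t100,100,100,\t0,100,200,\t1000,1300,1700,\n'
    printf '100\t5\t0\t0\t0\t0\t0\t0\t+\tGENE2\t400\t0\t105\tChr1\t9000\t5000\t5105\t1\t105,\t0,\t5000,\n'
} > "$psl"

total_hits=$(grep -c '^[0-9]' "$psl")
# Column 1 = matching bases, column 11 = query size, column 10 = query name
covered=$(grep '^[0-9]' "$psl" | awk -F'\t' '$1 * 100 / $11 > 90' | cut -f10 | sort -u | wc -l)
# For a full-length hit, introns = number of blocks (column 18) minus one
introns=$(grep '^[0-9]' "$psl" | awk -F'\t' '$1 == $11 { print $10 "\t" ($18 - 1) }')

echo "hits=$total_hits covered_queries=$covered"
echo "$introns"
```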
Exercises:
1) Create the LinuxExercises dir in your home dir.

2) Change dir to the LinuxExercises dir and print the complete path
   on the screen.

3) Download the Arabidopsis thaliana complete genome from:
   ftp://ftp.arabidopsis.org/home/tair/Sequences/whole_chromosomes/

4) Create a fasta file with all the chromosomes.

5) Download the Arabidopsis thaliana CDS set from:
  ftp://ftp.arabidopsis.org/home/tair/Sequences/blast_datasets/TAIR10_blastsets/
Exercises:
6) Install Blast on your computer using apt.

7) Install the latest executable version of Blat
   (http://hgwdev.cse.ucsc.edu/~kent/exe/)

8) Run a blast and a blat using the A. thaliana genome as database
   (subject) and the A. thaliana cds as query (object)

9) How many cds sequences have at least one match in each of the
   results?

10) Could you design a bash script to count how many introns each
    gene model has, based on the blat results?
Solutions:
1) Create the LinuxExercises dir in your home dir.

      mkdir ~/LinuxExercises

2) Change dir to the LinuxExercises dir and print the complete path
   on the screen.

      cd ~/LinuxExercises
      pwd

3) Download the Arabidopsis thaliana complete genome from:
   ftp://ftp.arabidopsis.org/home/tair/Sequences/whole_chromosomes/

      wget 'ftp://ftp.arabidopsis.org/home/tair/Sequences/whole_chromosomes/TAIR10_chr*'
Solutions:
4) Create a fasta file with all the chromosomes.

     cat TAIR10_chr* > TAIR10_genome.fasta

5) Download the Arabidopsis thaliana CDS set from:
  ftp://ftp.arabidopsis.org/home/tair/Sequences/blast_datasets/TAIR10_blastsets/

     wget ftp://ftp.arabidopsis.org/home/tair/Sequences/blast_datasets/TAIR10_blastsets/TAIR10_cds_20101214
Solutions:
6) Install Blast on your computer using apt.

      apt-cache search blast
      sudo apt-get install blast2

7) Install the latest executable version of Blat
   (http://hgwdev.cse.ucsc.edu/~kent/exe/linux/)

      wget http://hgwdev.cse.ucsc.edu/~kent/exe/linux/blatSuite.zip
      mkdir blatdir
      unzip blatSuite.zip -d blatdir
      sudo cp ./blatdir/* /usr/local/bin
Solutions:
8) Run a blast and a blat using the A. thaliana genome as database
   (subject) and the A. thaliana cds as query (object)

      formatdb -p F -i TAIR10_genome.fasta
      blastall -p blastn -d TAIR10_genome.fasta -i TAIR10_cds_20101214 \
           -o TAIR10_cds.blastn.TAIR10_genome.m8 -m 8 -a 2
      blat TAIR10_genome.fasta TAIR10_cds_20101214 \
           TAIR10_cds.blat.TAIR10_genome.psl

9) How many cds sequences have at least one match in each of the
   results?

      cut -f1 TAIR10_cds.blastn.TAIR10_genome.m8 | sort -u | wc -l
      grep '^[0-9]' TAIR10_cds.blat.TAIR10_genome.psl | cut -f10 | sort -u | wc -l
Solutions:
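10) Could you design a bash script to count how many introns each
    gene model has, based on the blat results?

     #!/bin/sh
     ## Keep full-length hits (matches == query size), then print the
     ## query name and the number of blocks minus one (the introns)
     grep '^[0-9]' "$1" | awk '{ if ($1 == $11) print $0 }' | cut -f10,18 |
       awk '{ print $1 "\t" $2-1 }'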

BasicLinux

  • 1. Basic Linux Boyce Thompson Institute for Plant Research Tower Road Ithaca, New York 14853-1801 U.S.A. by Aureliano Bombarely Gomez
  • 2. Basic Linux: 1. Introduction 1.1 What is Linux ? 1.2 Distributions. 2. Terminal and Virtual Consoles. 2.1 What is a console ? 2.2 Commands, stdin, stdout and stderr. 2.3 Typing shortcuts. 3. Popular commands. 3.1 Directories, files and compressions. 3.2 Manual and help. 3.3 Monitoring resources. 3.4 Networking. 4. Permissions and Owners. 5. Installing programs. 6. Piping and scripting. 7. Environment variables. 8. Popular combinations in bioinformatics.
  • 4. 1.1 What is Linux ? Linux: Unix-like computer operating system assembled under the model of free and open source software development and distribution. Operating system (OS): Set of programs that manage computer hardware resources and provide common services for application software. Wikipedia
  • 5. 1.1 What is Linux ? [Figure: Unix and Linux. Image source: Feduccia A, Trends Ecol. Evol. 2001]
  • 6. 1.1 What is Linux ? Unix: Is a multitasking, multi-user computer operating system originally developed in 1969. Wikipedia
  • 8. 1.2 Distributions. Linux Distribution: Distributions (often called distros for short) are Operating Systems including a large collection of software applications such as word processors, spreadsheets, media players, and database applications. The operating system will consist of the Linux kernel and, usually, a set of libraries and utilities from the GNU Project, with graphics support from the X Window System. Wikipedia
  • 9. 1.2 Distributions. Linux Distribution: Different libraries and utilities. Web Browser installed by default Wikipedia
  • 10. 1.2 Distributions. Linux Distribution: ... more than 600 distributions Wikipedia
  • 11. 1.2 Distributions. Linux Distribution: One can distinguish between: 1) Commercially-backed distributions, such as: Fedora (Red Hat), OpenSUSE (Novell), Ubuntu (Canonical Ltd.), Mandriva Linux (Mandriva). 2) Entirely community-driven distributions, such as: Debian, Gentoo. Wikipedia
  • 13. 2.1 What is a console ? A computer terminal or system console is the text entry and display device for system administration messages, particularly those from the BIOS or boot loader, the kernel, the init system and the system logger. It is a physical device consisting of a keyboard and a screen. Wikipedia
  • 14. 2.1 What is a console ? So then... what is the typical black or white screen where programmers and system administrators type commands? Wikipedia
  • 15. 2.1 What is a console ? Command-line interface (CLI): Mechanism for interacting with a computer operating system or software by typing commands to perform specific tasks. The command-line interpreter may be run in a text terminal or in a terminal emulator window as a remote shell client such as PuTTY. Software Shell: Piece of software that provides an interface for users of an operating system. There are two categories: - Command-line interface (CLI) Bash (Bourne-Again Shell) - Graphical user interface (GUI) Wikipedia
  • 17. 2.2 Commands, stdin, stdout and stderr [Diagram: Command-line interface (CLI), Shell (Bash), Command and Operating System, with the STDIN, STDOUT and STDERR streams.] A command is a directive to a computer program acting as an interpreter of some kind, in order to perform a specific task. Wikipedia
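The three streams named on this slide can be seen in a minimal sketch. The `emit` function and the file names are hypothetical, chosen only for illustration; it assumes a POSIX shell with `mktemp` available.

```shell
# Hypothetical helper that writes one line to each output stream.
emit() {
  echo "result line"         # stdout (file descriptor 1)
  echo "warning line" >&2    # stderr (file descriptor 2)
}

workdir=$(mktemp -d)                               # scratch directory for the demo
emit > "$workdir/out.txt" 2> "$workdir/err.txt"    # capture each output stream in its own file
echo "typed text" | cat > "$workdir/in_copy.txt"   # cat receives the piped text on stdin
```

Redirecting the two output streams to different files is a common way to keep results and error messages apart.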
  • 18. 2.2 Commands, stdin, stdout and stderr Parts of a command:
      STRUCTURE   Name        Tag arguments                             Arguments
      EXAMPLE     ls          -lh                                       /home
      MEANING     List info   l = long listing, h = human readable      about the target
      ...and press the RETURN or ENTER key to execute the command.
  • 19. 2.2 Commands, stdin, stdout and stderr Special characters in bash:
      CHARACTER    NAME            MEANING
      (space)      SPACE           Separates commands and arguments
      #            POUND           Comment
      ;            SEMICOLON       Command separator, to run multiple commands
      .            DOT             Source command OR filename component OR current directory
      ..           DOUBLE DOTS     Parent directory
      ' '          SINGLE QUOTES   Use the expression between quotes literally
      ,            COMMA           Concatenate strings
      \            BACKSLASH       Escape for a single character
      /            SLASH           Filename path separator
      *            ASTERISK        Wild card for filename expansion in globbing
      >, <, >>     CHARACTERS      Redirection of inputs/outputs
      |            PIPE            Pipe outputs between commands
  • 20. 2.2 Commands, stdin, stdout and stderr Special characters in bash: ls Solanum lycopersicum (Bash understands spaces as separators between arguments, so this looks for two targets). Use single quotes or a backslash escape for special characters: ls 'Solanum lycopersicum' or ls Solanum\ lycopersicum
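The effect of the space can be checked with a short sketch. The scratch directory and the `genome.fasta` file are hypothetical names used only for the demonstration.

```shell
workdir=$(mktemp -d)                                 # scratch directory for the demo
mkdir "$workdir/Solanum lycopersicum"                # directory name containing a space
touch "$workdir/Solanum lycopersicum/genome.fasta"
cd "$workdir"

quoted=$(ls 'Solanum lycopersicum')                  # single quotes: one argument
escaped=$(ls Solanum\ lycopersicum)                  # backslash escape: one argument
# Unquoted: ls receives two arguments, neither exists, so nothing reaches stdout.
unquoted=$(ls Solanum lycopersicum 2>/dev/null || true)
```

The quoted and escaped forms both list the directory content; the unquoted form fails because `Solanum` and `lycopersicum` are looked up as separate names.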
  • 22. 2.3 Typing shortcuts. Typing shortcuts for bash:
      SHORTCUT   MEANING
      Tab        Autocomplete file or folder names
      ↑          Scroll up through the command history
      ↓          Scroll down through the command history
      Ctrl + A   Go to the beginning of the line that you are typing
      Ctrl + E   Go to the end of the line that you are typing
      Ctrl + U   Clear the line from the cursor position back to the beginning
      Ctrl + R   Search previously used commands
      Ctrl + C   Kill the process that you are running
      Ctrl + D   Exit the current shell
      Ctrl + Z   Suspend the running process. Use fg to resume it in the foreground, or bg to let it continue in the background.
  • 24. 3.1 Directories, files and compressions. Just a reminder about directory structure: Tree structure.
      /home
        |_ /user                 (default dir after user login)
             |_ /Desktop
             |_ /Downloads
             |_ /Documents
                  |_ /LinuxClass
      Filename complete path: /home/user/Documents/LinuxClass
  • 25. 3.1 Directories, files and compressions. Moving through working directories: COMMAND: cd. Starting from /home/user (the default dir after user login): Move to the parent: cd .. Move to a child: cd Documents Move using the complete path: cd /home/user/Documents/LinuxClass
  • 26. 3.1 Directories, files and compressions. Directories commands:
      COMMAND   USE                              EXAMPLE
      cd        Change working dir               cd ../
      pwd       Print working dir                pwd
      ls        List information                 ls -lh /home
      mkdir     Create a new dir                 mkdir test
      rmdir     Remove empty dir                 rmdir test
      rm -Rf    Remove the dir and its content   rm -Rf test
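A short session using the directory commands above might look like the following sketch. It runs inside a scratch directory created with `mktemp -d`, and the `project/data` names are hypothetical.

```shell
base=$(mktemp -d)              # scratch directory so nothing real is touched
cd "$base"

mkdir test                     # create a new dir
cd test                        # move into it
inside=$(basename "$(pwd)")    # last component of the working dir
cd ..                          # back to the parent
rmdir test                     # works because the dir is empty

mkdir -p project/data          # -p also creates the missing parent dir
touch project/data/reads.txt
rm -Rf project                 # removes the dir and everything inside it
```

Since `rm -Rf` deletes without asking, it is worth double-checking the target path before pressing ENTER.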
  • 27. 3.1 Directories, files and compressions. Files commands:
      COMMAND   USE                                               EXAMPLE
      less      Open a file with less. Q to exit, arrows to scroll   less myfile
      touch     Create an empty file                              touch myfile
      mv        Move a file between dirs or change its name       mv myfile yourfile
      rm        Remove a file                                     rm yourfile
      cat       Print file content to STDOUT                      cat myfile
      head      Print the first 10 lines to STDOUT                head myfile
      tail      Print the last 10 lines to STDOUT                 tail myfile
      grep      Print matching lines to STDOUT                    grep 'ATG' myfile
      cut       Cut columns and print to STDOUT                   cut -f1 myfile
      sort      Sort lines and print to STDOUT                    sort myfile
      sed       Replace occurrences and print lines to STDOUT     sed 's/ATG/CTG/' myfile
      wc        Count lines, words and bytes                      wc myfile
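Several of the file commands above can be tried on a small tab-separated file. The file content below is made up for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir"
# Hypothetical two-column file: gene name, TAB, sequence
printf 'gene3\tATGAAA\ngene1\tATGCCC\ngene2\tTTTGGG\n' > myfile

first_line=$(head -n 1 myfile)                             # first line only
n_lines=$(wc -l < myfile | tr -d ' ')                      # number of lines
n_atg=$(grep -c 'ATG' myfile)                              # lines containing ATG
sorted_ids=$(cut -f1 myfile | sort | tr '\n' ' ')          # first column, sorted
replaced=$(sed 's/ATG/CTG/' myfile | head -n 1 | cut -f2)  # first ATG per line replaced
```

None of these commands modify `myfile`; each one only prints its result to STDOUT, which is what makes them easy to chain.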
  • 28. 3.1 Directories, files and compressions. Compression and archiving commands:
      COMMAND     USE                                                 EXAMPLE
      gzip        Compress a file using gzip                          gzip -c test.txt > test.txt.gz
      gunzip      Uncompress a gzip file                              gunzip test.txt.gz
      bzip2       Compress a file using bzip2                         bzip2 -c test.txt > test.txt.bz2
      bunzip2     Uncompress a bzip2 file                             bunzip2 test.txt.bz2
      tar         Archive files using tar                             tar -cf sample.tar sample/*.txt
      tar -zcvf   Archive using tar and compress using gzip           tar -zcvf samples.tar.gz sample/*.txt
      tar -zxvf   Unarchive using tar and uncompress using gunzip     tar -zxvf samples.tar.gz
      tar -jcvf   Archive using tar and compress using bzip2          tar -jcvf samples.tar.bz2 sample/*.txt
      tar -jxvf   Unarchive using tar and uncompress using bunzip2    tar -jxvf samples.tar.bz2
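A round trip through the archiving commands can be sketched as follows. The `sample` directory and its files are hypothetical, and the verbose flag `-v` is left out to keep the output quiet.

```shell
workdir=$(mktemp -d)
cd "$workdir"
mkdir sample
echo "ATGC" > sample/a.txt
echo "GGCC" > sample/b.txt

tar -zcf samples.tar.gz sample/*.txt                      # archive and compress with gzip
listing=$(tar -ztf samples.tar.gz | sort | tr '\n' ' ')   # -t lists the content without extracting

rm -Rf sample                                             # remove the originals
tar -zxf samples.tar.gz                                   # restore them from the archive

gzip -c sample/a.txt > a.txt.gz                           # -c keeps the original file
roundtrip=$(gunzip -c a.txt.gz)                           # uncompress to STDOUT
```

Listing an archive with `-t` before extracting is a cheap way to see where its files will land.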
  • 30. 3.2 Manual and help. More information about the command usage: COMMAND: man Example: man ls ARGUMENTS: -h or --help Example: blastall --help
  • 32. 3.3 Monitoring resources. Monitoring resources How much disk space am I using? df -lh (df displays the amount of disk space available on the filesystem). How much RAM am I using? free -m (free displays information about free and used memory on the system; -m reports it in mebibytes).
  • 33. 3.3 Monitoring resources. Monitoring resources Which processes are running? top (top provides an ongoing look at processor activity in real time. It displays a listing of the most CPU-intensive tasks on the system, and can provide an interactive interface for manipulating processes.) Several single-key commands are recognized while top is running: - q, quit - P, sort by CPU usage - M, sort by memory usage
  • 34. 3.3 Monitoring resources. Monitoring resources using top: 1. Global Resources Usage: CPU, Mem (Memory) and Swap (Swapping). 2. Task Resources Usage: 2.1 PID, process ID and COMMAND 2.2 User 2.3 RES, resident memory (physical memory used by the task) 2.4 %CPU and %MEM (percentage of CPU or memory used)
  • 35. 3.3 Monitoring resources. Monitoring resources Which processes are running right now? ps aux (ps gives a snapshot of the current processes. If you want a repetitive update of this status, use top.) Columns of interest: User, PID, Status. Status codes: R Running, S Sleeping, T Stopped, Z Zombie. The --forest option draws the process tree.
  • 36. 3.3 Monitoring resources. Managing resources How do I stop a process that is running in the foreground of the terminal? Ctrl + C How do I stop a process that is running outside the terminal? kill <PID> (kill sends the specified signal to the specified process or process group. If no signal is specified, the TERM signal is sent. The TERM signal will kill processes which do not catch this signal: kill -s TERM <PID>. For other processes, it may be necessary to use the KILL (9) signal, since this signal cannot be caught: kill -s KILL <PID>.)
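The kill workflow can be tried safely on a throwaway process. This sketch uses `sleep 60` as a stand-in for a long-running job.

```shell
sleep 60 &                  # start a harmless background process
pid=$!                      # $! holds the PID of the last background job

kill -s TERM "$pid"         # ask the process to terminate

status=0
wait "$pid" 2>/dev/null || status=$?   # collect its exit status (128 + 15 for TERM)
```

A process ended by a signal reports 128 plus the signal number as its exit status, so a job stopped with TERM (signal 15) reports 143.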
  • 38. 3.4 Networking. Monitoring Network Connections: Can two machines communicate with each other? ping <hostname or IP address> (ping checks whether a connection between two machines is possible by sending ICMP ECHO_REQUEST datagrams to elicit an ICMP ECHO_RESPONSE from a host or gateway.) How can I find the IP address associated with a host name? host <hostname> (host is normally used to convert names to IP addresses and vice versa.)
  • 39. 3.4 Networking. Downloading Files Non-Interactively: How can I download a file from a URL? wget <url path to the file/s> curl <url path to the file/s> wget is a free utility for non-interactive download of files from the Web. It supports HTTP, HTTPS, and FTP protocols and HTTP proxies. curl is a client to get documents/files from or send documents to a server, using any of the supported protocols (HTTP, HTTPS, FTP, GOPHER, DICT, TELNET, LDAP or FILE). Main difference: wget offers recursive download (-r argument), while curl offers piping capabilities.
  • 40. 3.4 Networking. Remote Connections: How can I log in remotely to another machine? ssh user@hostname ssh (SSH client) is a program for logging into a remote machine and for executing commands on a remote machine. X11 connections are possible using the -X option. Example: ssh -X user1@cbsuwrkst1.tc.cornell.edu
  • 41. 3.4 Networking. Remote Connections, Transferring Files Non-Interactively: How can I copy data remotely from/to another machine? scp user@hostname:/filepath localpath scp localfilepath user@hostname:/path scp copies files between hosts on a network. It uses ssh. Example: scp -r mydir/ user1@cbsuwrkst1.tc.cornell.edu:/workdir/user1
  • 42. 3.4 Networking. Remote Connections, Transferring Files Interactively: How can I transfer a file from/to a remote machine? sftp user@hostname sftp is an interactive file transfer program, similar to ftp, which performs all operations over an encrypted ssh transport. Useful SFTP commands:
      LOCAL MACHINE                     REMOTE MACHINE
      Command   Description             Command   Description
      lcd       Change dir              cd        Change dir
      lls       List files              ls        List files
      get       Copy remote → local     put       Copy local → remote
  • 44. 4. Permissions and Owners. File Permissions and Ownerships: All Unix systems are designed as multiuser operating systems. This means that different users could access, modify or delete the same files. To avoid problems, they have a file permission and ownership system. It restricts who can access and modify each of the files in the computer. This system has two parts: • Who is the owner of the file? • What type of access does each of the users in the system have?
  • 45. 4. Permissions and Owners. Ownership: Each file is assigned a user owner and a group owner. The user owner can be: + Real user (for example: aure). + Virtual user created by a program (for example: mysql). + Administrator user or root. Listing users in a computer: cat /etc/passwd | cut -d ':' -f1 Listing groups in a computer: cat /etc/group | cut -d ':' -f1 Information about a user and its primary group: id -u username; id -g username
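The listing commands rely on `cut -d ':'` to split each line on colons. The sketch below applies the same command to a small made-up file in the `/etc/passwd` layout, so the result does not depend on the machine; the three entries are hypothetical.

```shell
workdir=$(mktemp -d)
# Hypothetical sample in /etc/passwd format: name:password:UID:GID:comment:home:shell
printf 'root:x:0:0:root:/root:/bin/bash\n' > "$workdir/passwd.sample"
printf 'aure:x:1000:1000:Aure:/home/aure:/bin/bash\n' >> "$workdir/passwd.sample"
printf 'mysql:x:112:117:MySQL:/nonexistent:/bin/false\n' >> "$workdir/passwd.sample"

users=$(cut -d ':' -f1 "$workdir/passwd.sample" | tr '\n' ' ')   # first field = user name
my_uid=$(id -u)    # numeric user ID of whoever runs this
my_gid=$(id -g)    # numeric ID of that user's primary group
```

On a real system the same `cut` call can be pointed at `/etc/passwd` or `/etc/group` directly.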
  • 46. 4. Permissions and Owners. Permissions: Each file is assigned 9 different permissions, 3 for the file user-owner (u), 3 for the group-owner (g) and 3 for everyone else (o). There are 3 types of permissions or file attributes: + Readable (r), it has permission to read the file. + Writable (w), it has permission to write the file. + Executable (x), it has permission to execute as program. 10-letter code for a Linux file:
      ----------    all switched OFF
      drwxrwxrwx    all switched ON
      d             dir flag
       rwx          user
          rwx       group
             rwx    other
  • 47. 4. Permissions and Owners. Permissions: Information about the file: ls -l; ls -l <target> Examples:
      -r--r--r--    Readable for everyone
      -rwxr--r--    Readable for everyone, writable or executable only for the user-owner
      drw-rw-r--    Dir readable and writable for user and group, readable for everyone
      To change the owner/group: chown owner:group file
      To change permissions: chmod [ugo][+-=][rwx] file
                             chmod [0-7][0-7][0-7] file
      Each octal digit is the sum of the switched-on permissions:
      |rwx|rwx|rwx|
      |421|421|421|
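Both chmod notations can be compared on a throwaway file. In this sketch `myscript.sh` is a hypothetical file name, and `cut -c1-10` keeps only the 10-letter permission code from the `ls -l` output.

```shell
workdir=$(mktemp -d)
cd "$workdir"
touch myscript.sh

chmod 644 myscript.sh                            # 6 = r+w (4+2), 4 = r, 4 = r
before=$(ls -l myscript.sh | cut -c1-10)

chmod u+x myscript.sh                            # symbolic form: add execute for the user-owner
after=$(ls -l myscript.sh | cut -c1-10)

chmod 750 myscript.sh                            # 7 = r+w+x, 5 = r+x, 0 = nothing
final=$(ls -l myscript.sh | cut -c1-10)
```

The octal form sets all nine permissions at once, while the symbolic form changes only the ones named.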
  • 48. 4. Permissions and Owners. Permissions: sudo, is a program for Unix-like computer operating systems that allows users to run programs with the security privileges of another user (normally the superuser, or root). Its name is a concatenation of the su command (which grants the user a shell for the superuser) and "do", or take action. Example: sudo cp ./myscript.pl /usr/local/bin/
  • 50. 5. Installing programs. Ways to Install Programs: 1) Using Package Managers 1.1) Graphical package manager (Example: Synaptic for Ubuntu). 1.2) High level command-line package manager (Example: apt for Debian, yum for Red Hat) 1.3) Low level command-line package manager (Example: dpkg for Debian, rpm for Red Hat) 2) Moving the program's executable file to the PATH* and the libraries it needs to their corresponding locations. 2.1) Precompiled. 2.2) From-source.
  • 51. 5. Installing programs. Install Programs Using Package Managers: 1.1) Graphical package manager (Synaptic for Ubuntu). Root/Sudo access
  • 52. 5. Installing programs. Install Programs Using Package Managers: 1.1) Graphical package manager (Synaptic for Ubuntu). Search Box/Button Mark for installation
  • 53. 5. Installing programs. Install Programs Using Package Managers: 1.1) Graphical package manager (Synaptic for Ubuntu). Additional changes like dependencies Apply for install
  • 54. 5. Installing programs. Install Programs Using Package Managers: 1.2) High level command-line package manager (Apt). APT (Advanced Package Tool) is a free user interface that works with core libraries to handle the installation and removal of software on the Debian GNU/Linux distribution and its variants. APT simplifies the process of managing software on Unix-like computer systems by automating the retrieval, configuration and installation of software packages, either from binary files or by compiling source code. To search: apt-cache search [package] To install: apt-get install [package] To remove: apt-get remove [package]
  • 55. 5. Installing programs. Install Programs Using Package Managers: 1.2) High level command-line package manager (Apt). [Screenshot: 1- Search package. 2- Install package (admin privileges required).]
  • 56. 5. Installing programs. Install Programs Using Package Managers: 1.3) Low level command-line package manager (Dpkg). DPKG is the software at the base of the Debian package management system. dpkg is used to install, remove, and provide information about .deb packages. To install: dpkg -i [package.deb] To remove: dpkg -r [package] (the installed package name, not the .deb file)
  • 57. 5. Installing programs. Ways to Install Programs: 2) Moving the program's executable file to the PATH* and the libraries it needs to their corresponding locations. 2.1) Precompiled. 1- Download file with wget
  • 58. 5. Installing programs. Ways to Install Programs: 2) Moving the program's executable file to the PATH* and the libraries it needs to their corresponding locations. 2.1) Precompiled. 1- Download file with wget 2- Unarchive and uncompress 3- Copy** into program PATH (for example /usr/local/bin)
  • 59. 5. Installing programs. Ways to Install Programs: 2) Moving the program's executable file to the PATH* and the libraries it needs to their corresponding locations. 2.2) From-source. Generally it is more complex because it requires the right version of one or more compilers (gcc, g++). The compilation usually has three or four steps: 1) ./configure 2) make 3) (make check) 4) make install
  • 61. 6. Piping and scripting. Piping: Run different programs sequentially, where the output of a program will be the input of the next one: prog1 | prog2 | prog3 | prog4 Bash uses the sign '|' (pipe) to pipe the output as the input of the next program. For example: ls /mydir | wc -l # To count files in a folder
  • 62. 6. Piping and scripting. Piping: Another popular combination is to redirect the stdout (output) to a file using '>' (write, or overwrite if it exists) or '>>' (append): grep '>' fasta | sed 's/>//' > id.txt # To extract the id lines from a fasta file, remove the '>' symbol and redirect the output to an id.txt file
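Pipes and both redirection operators can be combined in one short session. The tiny `my.fasta` file below is made up for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir"
printf '>seq1 first\nATGC\n>seq2 second\nGGCC\n' > my.fasta

n_files=$(ls "$workdir" | wc -l | tr -d ' ')   # pipe: count the files in the folder

grep '>' my.fasta | sed 's/>//' > id.txt       # '>' creates or overwrites id.txt
echo 'seq3 third' >> id.txt                    # '>>' appends to the existing file
```

Running the `grep ... > id.txt` line a second time would discard the appended line, because '>' always starts the file from scratch.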
  • 63. 6. Piping and scripting. Scripting: Shell commands can be run from a file as a script using the shell program, for example: bash or sh. Example: shell script fastaseq_count
      #!/bin/sh
      ## The argument will be used as the $1 variable
      grep -c '>' "$1"
      To use it: sh ./fastaseq_count myfasta.file
      More info: http://tldp.org/LDP/abs/html/
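The script from the slide can be created and run end to end as in this sketch. The here-document (`<<'EOF'`) is just one convenient way to write the file from the command line, and the fasta content is made up; the pattern is anchored as `'^>'` here, a small variation on the slide's version.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Write the script file; the quoted 'EOF' keeps $1 from being expanded now.
cat > fastaseq_count <<'EOF'
#!/bin/sh
## The first argument is available as $1
grep -c '^>' "$1"
EOF

printf '>seq1\nATGC\n>seq2\nGGCC\n' > myfasta.file
count=$(sh ./fastaseq_count myfasta.file)   # run the script through sh
```

After `chmod +x fastaseq_count` the script could also be started as `./fastaseq_count myfasta.file`, since the `#!/bin/sh` line tells the system which interpreter to use.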
  • 65. 7. Environment variables. Environment Variables: There are two types of variables in a Unix shell: 1) System variables. Variables defined by the system, such as the home dir or the executable path. 2) User-defined variables. Variables defined by the user during a bash session. To define a user variable: myvariable=test a=$(grep -c '>' fasta)
  • 66. 7. Environment variables. System Variables:
      SYSTEM VARIABLE   MEANING
      SHELL             Path of the user's shell
      BASH              Path of the Bash executable
      BASH_VERSION      Shell version
      COLUMNS           Number of columns printed in the screen
      LINES             Number of lines printed in the screen
      HOME              Home dir
      LOGNAME           Login name
      OSTYPE            Operating system type
      PATH              Path dirs
      PS1               Prompt settings
      PWD               Current working dir
      USERNAME          Username currently logged into the system
  • 67. 7. Environment variables. Commands to interact with variables:
      COMMAND   MEANING
      env       Print environment variables
      echo      Print environment variable value
      export    Create an env. variable available to all child processes
      source    Execute commands from a filename
      Example: How to add a program executable dir (/home/user/shscripts) to the executable path.
      export PATH=/home/user/shscripts:$PATH
      # ':' is used to separate elements
      # $PATH adds the old PATH variable
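The difference between a plain shell variable and an exported one shows up when a child process is started. In this sketch `sh -c` plays the role of the child process, and `/home/user/shscripts` is the example directory from the slide (it does not need to exist for the demonstration).

```shell
myvariable=test                                        # defined, but not exported
child_before=$(sh -c 'echo "${myvariable:-unset}"')    # the child shell cannot see it

export myvariable                                      # now part of the environment
child_after=$(sh -c 'echo "$myvariable"')              # the child shell inherits it

scripts_dir=/home/user/shscripts
export PATH="$scripts_dir:$PATH"                       # prepend, keeping the old PATH
first_entry=$(echo "$PATH" | cut -d ':' -f1)           # first element of the new PATH
```

Directories are searched in the order they appear in PATH, so a prepended directory takes priority over system locations.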
  • 69. 8. Popular combinations in bioinformatics. Fasta and fastq file manipulation: Just a reminder: 1) Fasta files. It is a text-based format for representing either nucleotide sequences or peptide sequences, in which nucleotides or amino acids are represented using single-letter codes.
      >SEQUENCE_ID1 DESCRIPTION
      ATGCGCGCGCGCGCGCGCGGGTAGCAGATGACGACACAGAGCGAGGATGCGCTGAGAGTA
      GTGTGACGACGATGACGGAAAATCAGATGGACCCGATGACAGCATGACGATGGGACGGGA
      AAGATTGGACCAGGACAGGACCAGGACCAGGACCAGGGATTAGA
      >SEQUENCE_ID2 DESCRIPTION
      ATGGGGGGGACGACGATGGACACAGAGACAGAGACGACGACAGCAGACAGATTTACCTTA
      GACGAGATAGGAGAGACGACAGATATATATATATAGCAGACAGACAGACATTTAGACGAG
      ACGACGATAGACGAT
      http://blast.ncbi.nlm.nih.gov/blastcgihelp.shtml
  • 70. 8. Popular combinations in bioinformatics. Fasta and fastq file manipulation: Just a reminder: 1) Fasta files. I) Single line description (defline) with a greater-than (">") symbol in the first column. II) There should be no space between the ">" symbol and the first letter of the identifier. III) Sequences are in one or more lines after the defline.
      FILE EXTENSION       MEANING
      .fasta, .fas, .seq   Generic sequence fasta format
      .fna                 Fasta nucleic acid
      .faa                 Fasta amino acid
      .ffn, .frn           Sequence coding regions / non-coding regions
  • 71. 8. Popular combinations in bioinformatics. Fasta and fastq file manipulation: COMMANDS COMBINATION and FUNCTION:
      Count sequences:
        grep -c '>' file.fasta
      Extract IDs and descriptions:
        grep '>' file.fasta | sed 's/>//' > defline.txt
      Extract IDs (GNU sed syntax):
        grep '>' file.fasta | sed 's/>//' | sed -E 's/\s+/\t/g' | cut -f1 > id.txt
      Extract unique IDs (GNU sed syntax):
        grep '>' file.fasta | sed 's/>//' | sed -E 's/\s+/\t/g' | cut -f1 | sort -u > unique_id.txt
      Total characters in sequences (newlines removed before counting):
        grep -v '>' file.fasta | tr -d '\n' | wc -m
      Ask if all the deflines in a fasta file are unique (TRUE = no redundant IDs):
        a=$(grep -c '>' fasta); b=$(grep '>' fasta | sort -u | wc -l); test $a -eq $b && echo "TRUE" || echo "FALSE"
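The combinations above can be checked against a file whose answers are known in advance. This sketch uses a made-up fasta file with a deliberately repeated ID, and it swaps the `sed` whitespace step for `cut -d ' '` so that it does not depend on GNU sed.

```shell
workdir=$(mktemp -d)
cd "$workdir"
# Hypothetical fasta file: 3 deflines, SEQ1 appears twice, 12 bases in total
printf '>SEQ1 desc one\nATGC\nGG\n>SEQ2 desc two\nTTAA\n>SEQ1 duplicate\nCC\n' > file.fasta

n_seqs=$(grep -c '^>' file.fasta)                               # count sequences
n_unique=$(grep '^>' file.fasta | sed 's/^>//' | cut -d ' ' -f1 \
           | sort -u | wc -l | tr -d ' ')                       # count unique IDs
n_bases=$(grep -v '^>' file.fasta | tr -d '\n' | wc -m | tr -d ' ')   # sequence characters only

# Redundant IDs exist when there are more deflines than unique IDs.
if [ "$n_seqs" -eq "$n_unique" ]; then redundant=FALSE; else redundant=TRUE; fi
```

Comparing IDs instead of whole deflines catches the case shown here, where the same ID carries two different descriptions.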
  • 72. 8. Popular combinations in bioinformatics. Fasta and fastq file manipulation: Just a reminder: 2) Fastq files. It is a text-based format for storing both a biological sequence (usually nucleotide sequence) and its corresponding quality scores. Both the sequence letter and quality score are encoded with a single ASCII character for brevity.
      @SEQUENCE_ID1
      ATGCGCGCGCGCGCGCGCGGGTAGCAGATGACGACACAGAGCGAGGATGCGCTGAGAGTA
      GTGTGACGACGATGACGGAAAATCAGA
      +
      BBBBBPPPPPXXXXX^^^^^^^^^^^^^^^^^^^^^^^__^^^^^^^^^^^^_eeeeeee
      [[[[[^^^^]]]]XXXXXPPPPPBBBB
      http://blast.ncbi.nlm.nih.gov/blastcgihelp.shtml
  • 73. 8. Popular combinations in bioinformatics. Fasta and fastq file manipulation: 2) Fastq files. I) Single line ID with an at symbol ("@") in the first column. II) There should be no space between the "@" symbol and the first letter of the identifier. III) Sequences are in one or more lines after the ID line. IV) Single line with a plus symbol ("+") in the first column to introduce the quality values. V) The "+" line may or may not repeat the ID. VI) Quality values are in one or more lines after the "+" line.
  • 74. 8. Popular combinations in bioinformatics. Fasta and fastq file manipulation: 2) Fastq files. Quality score characters vary depending on the sequencing method and sequencer version.
      SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS.....................................................
      ..........................XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX......................
      ...............................IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII......................
      .................................JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ......................
      LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLL....................................................
      !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
      |                         |    |        |                              |                     |
      33                        59   64       73                             104                   126
      S - Sanger         Phred+33, raw reads typically (0, 40)
      X - Solexa         Solexa+64, raw reads typically (-5, 40)
      I - Illumina 1.3+  Phred+64, raw reads typically (0, 40)
      J - Illumina 1.5+  Phred+64, raw reads typically (3, 40) with 0=unused, 1=unused, 2=Read Segment Quality Control Indicator
      L - Illumina 1.8+  Phred+33, raw reads typically (0, 41)
      Wikipedia
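Decoding a quality character is a subtraction on its ASCII code: 33 for the Phred+33 encodings and 64 for the Phred+64 ones. The two helper functions below are hypothetical names written for this sketch; they rely on the POSIX `printf` rule that a leading quote in the argument yields the character's numeric code.

```shell
# Convert one quality character to its score, assuming Phred+33 (Sanger, Illumina 1.8+).
phred33() {
  ascii=$(printf '%d' "'$1")    # numeric ASCII code of the character
  echo $((ascii - 33))
}

# Same conversion, assuming Phred+64 (Illumina 1.3+ and 1.5+).
phred64() {
  ascii=$(printf '%d' "'$1")
  echo $((ascii - 64))
}

q_sanger=$(phred33 'I')     # 'I' is ASCII 73
q_illumina=$(phred64 'h')   # 'h' is ASCII 104
```

The same character therefore means different qualities under the two offsets, which is why the encoding of a file has to be known before filtering by quality.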
  • 75. 8. Popular combinations in bioinformatics. Fasta and fastq file manipulation: 2) Fastq files. Illumina sequence identifiers contain additional information. ILLUMINA 1.3+: @HWUSI-EAS100R:6:73:941:1973#0/1
      HWUSI-EAS100R   Unique instrument name
      6               Flow lane
      73              Tile number within the flow lane
      941             X coordinate within the flow lane tile
      1973            Y coordinate within the flow lane tile
      #0              Multiplexing number/tag
      /1              Pair
      Wikipedia
  • 76. 8. Popular combinations in bioinformatics. Fasta and fastq file manipulation: 2) Fastq files. Illumina sequence identifiers contain additional information. ILLUMINA 1.8+: @EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG
      EAS139    Unique instrument name
      136       Run ID
      FC706VJ   Flowcell ID
      2         Flow lane
      2104      Tile number within the flow lane
      15343     X coordinate within the flow lane tile
      197393    Y coordinate within the flow lane tile
      1         Pair
      Y         Filtered
      18        Control bits (0 = no control)
      ATCACG    Multiplexing number/tag
      Wikipedia
  • 77. 8. Popular combinations in bioinformatics. Fasta and fastq file manipulation: FUNCTION, followed by the COMMANDS COMBINATION
      Count sequences:
          grep -c '^@' file.fastq
      Check that there are as many "@" lines as "+" lines:
          a=$(grep -c '^@' file.fastq); b=$(grep -c '^+' file.fastq); test "$a" -eq "$b" && echo "TRUE" || echo "FALSE"
      Check that the pairs have the same IDs:
          grep '^@' pair1.fastq | sed 's/\/1$//' > pair1.id; grep '^@' pair2.fastq | sed 's/\/2$//' > pair2.id; a=$(diff pair1.id pair2.id | grep -c '^[<>]'); echo "$a differences"; rm pair1.id pair2.id
      Redirect the differences to a file:
          grep '^@' pair1.fastq | sed 's/\/1$//' > pair1.id; grep '^@' pair2.fastq | sed 's/\/2$//' > pair2.id; diff pair1.id pair2.id > diff.pair1.pair2.txt; rm pair1.id pair2.id
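Counting with grep can overestimate the number of sequences, because "@" and "+" are also valid quality characters and a quality line may start with either of them. A possible alternative, sketched below under the assumption that every record occupies exactly four lines, is to count lines with awk and check the record structure at the same time. The function name `count_fastq_records` is hypothetical.

```shell
#!/bin/sh
# count_fastq_records FILE
# Print the number of records in a 4-line-per-record FASTQ file,
# or "malformed" (exit status 1) if the structure looks wrong.
count_fastq_records() {
    awk '
        NR % 4 == 1 && !/^@/   { bad++ }   # ID line must start with "@"
        NR % 4 == 3 && !/^[+]/ { bad++ }   # separator line must start with "+"
        END {
            if (NR % 4 != 0 || bad > 0) { print "malformed"; exit 1 }
            print NR / 4
        }' "$1"
}

# Small demonstration file: two records, one quality line starting with "@"
demo=$(mktemp)
printf '%s\n' '@read1' 'ACGT' '+' '@III' '@read2' 'TTGA' '+' 'IIII' > "$demo"
grep -c '^@' "$demo"          # counts 3 lines, one of them a quality line
count_fastq_records "$demo"   # counts 2 records
rm "$demo"
```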
  • 78. 8. Popular combinations in bioinformatics. Blast and blat result file manipulation: Just a reminder: 3) Blast m8 result files (hit table). It is a tabular (separator="\t") text-based format with 12 columns: query_id, subject_id, %identity, align_length, mismatches, gap_openings, q_start, q_end, s_start, s_end, e_value, bit_score
      QID1   SID5   91.00   178   0   0   1     178   1001   1179   1e-40   156
      QID1   SID8   65.00   68    1   1   110   178   1050   1118   1e-25   87
  • 79. 8. Popular combinations in bioinformatics. Blast and blat result file manipulation: FUNCTION, followed by the COMMANDS COMBINATION
      Count total hits:
          cut -f1 blast.m8 | wc -l
      Count query sequences with at least one hit:
          cut -f1 blast.m8 | sort -u | wc -l
      Count number of hits for each query sequence:
          cut -f1 blast.m8 | sort | uniq -c > query.hits.txt
      Count subject sequences with at least one hit:
          cut -f2 blast.m8 | sort -u | wc -l
      Count hits with % identity > 90 %:
          awk '{if ($3 > 90) print $0}' blast.m8 | cut -f1 | wc -l
      Count hits with % identity > 90 % and length > 100:
          awk '{if ($3>90 && $4>100) print $0}' blast.m8 | cut -f1 | wc -l
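When the result file is large, the first counts in this table can also be gathered in a single pass. The following is a minimal sketch rather than a replacement for the one-liners; `m8_summary` is a hypothetical function name and the input is assumed to be a tab-separated m8 file without header lines.

```shell
#!/bin/sh
# m8_summary FILE
# Print total hits, distinct queries and distinct subjects of a BLAST m8 file.
m8_summary() {
    awk -F'\t' '
        { hits++; queries[$1] = 1; subjects[$2] = 1 }   # remember every ID seen
        END {
            nq = 0; for (k in queries)  nq++
            ns = 0; for (k in subjects) ns++
            printf "hits=%d queries=%d subjects=%d\n", hits, nq, ns
        }' "$1"
}

# Demonstration with the two example rows of the m8 reminder
demo=$(mktemp)
printf 'QID1\tSID5\t91.00\t178\t0\t0\t1\t178\t1001\t1179\t1e-40\t156\n'  > "$demo"
printf 'QID1\tSID8\t65.00\t68\t1\t1\t110\t178\t1050\t1118\t1e-25\t87\n' >> "$demo"
m8_summary "$demo"    # hits=2 queries=1 subjects=2
rm "$demo"
```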
  • 80. 8. Popular combinations in bioinformatics. Blast and blat result file manipulation: Just a reminder: 4) Blat psl result files (hit table). It is a tabular (separator="\t") text-based format with 21 columns: match, mismatch, repmatch, N's, Q_gap_count, Q_gap_bases, T_gap_count, T_gap_bases, strand, Qname, Qsize, Qstart, Qend, Tname, Tsize, Tstart, Tend, block_count, block_sizes, qStarts, tStarts. The grep '^[0-9]' step skips the header lines, which do not start with a digit.
      FUNCTION, followed by the COMMANDS COMBINATION
      Count total hits:
          grep '^[0-9]' blat.psl | wc -l
      Count query sequences with at least one hit:
          grep '^[0-9]' blat.psl | cut -f10 | sort -u | wc -l
      Count query sequences with hit coverage > 90%:
          grep '^[0-9]' blat.psl | awk '{if ($1*100/$11 > 90) print $0}' | cut -f10 | sort -u | wc -l
  • 81. Exercises:
      1) Create the LinuxExercises directory in your home directory.
      2) Change to the LinuxExercises directory and print its complete path on the screen.
      3) Download the Arabidopsis thaliana complete genome from: ftp://ftp.arabidopsis.org/home/tair/Sequences/whole_chromosomes/
      4) Create a fasta file with all the chromosomes.
      5) Download the Arabidopsis thaliana cDNA set from: ftp://ftp.arabidopsis.org/home/tair/Sequences/blast_datasets/TAIR10_blastsets/
  • 82. Exercises:
      6) Install Blast on your computer using apt.
      7) Install the latest executable version of Blat (http://hgwdev.cse.ucsc.edu/~kent/exe/).
      8) Run a blast and a blat using the A. thaliana genome as database (subject) and the A. thaliana cds as query (object).
      9) How many cds sequences have at least one match in each of the results?
      10) Could you design a bash script to count how many introns each gene model has, based on the blat results?
  • 83. Solutions:
      1) Create the LinuxExercises directory in your home directory.
          mkdir ~/LinuxExercises
      2) Change to the LinuxExercises directory and print its complete path on the screen.
          cd ~/LinuxExercises
          pwd
      3) Download the Arabidopsis thaliana complete genome from: ftp://ftp.arabidopsis.org/home/tair/Sequences/whole_chromosomes/
          wget 'ftp://ftp.arabidopsis.org/home/tair/Sequences/whole_chromosomes/TAIR10_chr*'
  • 84. Solutions:
      4) Create a fasta file with all the chromosomes.
          cat TAIR10_chr* > TAIR10_genome.fasta
      5) Download the Arabidopsis thaliana cDNA set from: ftp://ftp.arabidopsis.org/home/tair/Sequences/blast_datasets/TAIR10_blastsets/
          wget ftp://ftp.arabidopsis.org/home/tair/Sequences/blast_datasets/TAIR10_blastsets/TAIR10_cds_20101214
  • 85. Solutions:
      6) Install Blast on your computer using apt.
          apt-cache search blast
          sudo apt-get install blast2
      7) Install the latest executable version of Blat (http://hgwdev.cse.ucsc.edu/~kent/exe/linux/).
          wget http://hgwdev.cse.ucsc.edu/~kent/exe/linux/blatSuite.zip
          mkdir blatdir
          unzip -d blatdir blatSuite.zip
          sudo cp ./blatdir/* /usr/local/bin
  • 86. Solutions:
      8) Run a blast and a blat using the A. thaliana genome as database (subject) and the A. thaliana cds as query (object).
          formatdb -p F -i TAIR10_genome.fasta
          blastall -p blastn -d TAIR10_genome.fasta -i TAIR10_cds_20101214 -o TAIR10_cds.blastn.TAIR10_genome.m8 -m 8 -a 2
          blat TAIR10_genome.fasta TAIR10_cds_20101214 TAIR10_cds.blat.TAIR10_genome.psl
      9) How many cds sequences have at least one match in each of the results?
          cut -f1 TAIR10_cds.blastn.TAIR10_genome.m8 | sort -u | wc -l
          grep '^[0-9]' TAIR10_cds.blat.TAIR10_genome.psl | cut -f10 | sort -u | wc -l
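  • 87. Solutions:
      10) Could you design a bash script to count how many introns each gene model has, based on the blat results? For hits where the whole query matches (match = Qsize), the number of introns is the number of alignment blocks minus one.
          #!/bin/sh
          # Usage: sh script.sh file.psl
          # Keep full-length hits and print the query name and block_count - 1
          grep '^[0-9]' "$1" | awk '{ if ($1 == $11) print $0 }' | cut -f10,18 | awk '{ print $1 "\t" $2-1 }'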
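To see what such a script produces, it can be tried on a single hand-made psl line. The sketch below wraps the same logic in a function; the name `count_introns` and the toy hit (a hypothetical gene model geneA.1 aligned in three blocks) are invented for illustration and are not real TAIR10 results.

```shell
#!/bin/sh
# count_introns FILE.psl
# For full-length hits (match == Qsize), print query name and block_count - 1.
count_introns() {
    grep '^[0-9]' "$1" |
        awk -F'\t' '$1 == $11 { print $10 "\t" ($18 - 1) }'
}

# Toy psl line: 300 matching bases, query size 300, 3 alignment blocks
demo=$(mktemp)
printf '300\t0\t0\t0\t0\t0\t2\t500\t+\tgeneA.1\t300\t0\t300\tChr1\t10000\t100\t900\t3\t100,100,100,\t0,100,200,\t100,400,800,\n' > "$demo"
count_introns "$demo"    # prints geneA.1 and 2, separated by a tab
rm "$demo"
```

Three blocks separated by two gaps in the genome give two introns. Note that this simple count treats every gap between blocks as an intron, which may overestimate when a hit contains small alignment gaps.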