Lab#.

Data manipulation: Biostatistics & Gene expression data analysis
                       (Microarray, NGS & qRT-PCR)

        Theme: Transcriptional Program in the Response of Human
                        Fibroblasts to Serum.
                              Etienne Z. Gnimpieba
                                  BRIN WS 2012
                             Sioux Falls, May 30 2012
                          Etienne.gnimpieba@usd.edu
Data manipulation                Gene expression data analysis
                                                                           OMIC World

[Figure: the OMIC world. Labels: Genomics (DNA), Transcriptomics (mRNA), Proteomics (E), Metabolomics (S, P), Functional Genomics. Processes shown: transcription, translation, degradation, gene repression, catalysis. The following slide highlights GENOMICS.]
                                                                       Excel used in genomics




       • How to select columns
       • How to use functions
       • How to anchor a cell value in a function
       • How to copy the function result and not the
         function itself
       • How to sort data by columns
       • How to search and replace




           •   Frouin, V. & Gidrol, X. (2005)           •   Transcriptome ENS (France)           Etienne Z. Gnimpieba
                                                                                                     BRIN WS 2012
           •   CBB group (Berlin)                                                               Sioux Falls, May 31 2012
                                                         Excel used in genomics: Pre-treatment

Centering and scaling data
    1.   Open the file containing the experiment series (your expression matrix) in Excel, using the
         tab character as the column separator. Click on the second worksheet, named Fibroblast real,
         and look it over quickly: it is a realistic data set from a microarray experiment. Then click back on
         the first worksheet, named Fibroblast lab. We will be using this condensed version.

    2.   For one column (corresponding to one DNA microarray experiment), calculate the mean value using the
         AVERAGE Excel function. Check whether the value obtained is equal to zero.

    3.   If it is not, center each experiment (15MIN, 30MIN, 2HR, etc.) by subtracting the column mean
         from each log2(Ratio) value:

                    - Subtract the average value of each column from the corresponding individual values (for the
                    first example, =B2-$B$37). Place these values in the corresponding table on the right, starting
                    at cell R2. Drag the fill handle down to quickly complete a column.

                    - Continue to center the data for each column (each DNA microarray experiment), filling in the
                    blank table to the right. Again use the AVERAGE function to find the mean value of each column in
                    the new table. Each average should now be zero.

                    - Be careful: if there are missing values (empty cells), replace the empty contents with the text
                    NULL or NA, so that Excel does not treat the cell as a zero in calculations. A missing value
                    is not the same as a true zero!

                    - Be careful with decimal separator handling in Excel (dot or comma)!
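
The centering steps above can also be sketched outside Excel. The Python fragment below is only an illustration and not part of the original lab: the function names (parse_value, center_columns) and the sample values are hypothetical. It treats blank, NA and NULL cells as missing rather than as zero, and it accepts either a dot or a comma as the decimal separator (it assumes no thousands separators are present).

```python
def parse_value(cell):
    """Convert a spreadsheet cell to float; blank, NA and NULL become None."""
    text = str(cell).strip().replace(",", ".")  # accept a decimal comma
    if text == "" or text.upper() in ("NA", "NULL"):
        return None
    return float(text)


def center_columns(rows):
    """Subtract each column's mean from its values, ignoring missing cells."""
    parsed = [[parse_value(cell) for cell in row] for row in rows]
    n_cols = len(parsed[0])
    means = []
    for j in range(n_cols):
        present = [row[j] for row in parsed if row[j] is not None]
        means.append(sum(present) / len(present))
    centered = [
        [None if v is None else v - means[j] for j, v in enumerate(row)]
        for row in parsed
    ]
    return centered, means


# Hypothetical log2(ratio) values: 4 genes (rows) x 2 experiments (columns).
example = [
    ["0.5", "1,0"],
    ["-0.5", "NA"],
    ["1.5", "3.0"],
    ["0.5", ""],
]
centered, means = center_columns(example)
print(means)     # column means before centering
print(centered)  # centered values; missing cells stay None
```

Missing cells are kept as None in the output instead of being replaced by zero, which mirrors the warning in the last two points of the list.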

                                                Excel used in genomics: Differential expression analysis (1)
SAM (Significance Analysis of Microarrays) is an Excel macro that searches for differentially expressed
genes using a permutation-based resampling method. Website: http://www-stat.stanford.edu/~tibs/SAM/



Significance Analysis of Microarrays (SAM):
SAM is an Excel macro freely available to academics on the web. Because it runs inside an Excel spreadsheet,
it is easy to use for most microarray users. Using SAM requires several modifications to your
data file:

• The ratio or intensity values in the Excel sheet must use a point, not a comma, as the decimal
  separator.


• The header line depends on the type of analysis you want to perform. Refer to the SAM manual
  for more information. You must highlight (select) your header along with the data if you do not
  want to lose the experiment information.


• Two annotation columns are available. SAM always references its calculations to the row number in
  the source sheet.



                                                Excel used in genomics: Differential expression analysis (2)

• Under the Add-Ins tab, locate the “SAM” toolbar command. Highlight cells R2 to AF37, then select
  SAM. When the SAM macro is launched from the toolbar, a settings window appears. For further
  information on the various options you can choose, it is best to refer to the SAM manual. However,
  the first important thing to do is to indicate whether the data source has been log2-transformed. In
  this case we will select Unlogged. Then, as the resampling uses a random number generator, you need to
  initialize it by selecting “Generate Random Seed” several times.

• Click “OK”. Once all the chosen iterations have been done, SAM displays a plot representing each
  gene by its score in the real distribution compared with its score in the random distributions.
  The differentially expressed genes are the ones that deviate from the 45° line.

• The table that appears indicates, for each delta value, the number of genes called significant
  (putatively differentially expressed), the estimated number of false positive genes, and the False
  Discovery Rate (FDR). The user can change the delta value according to the number of false positive
  or significant genes he or she wants to obtain.

• Choose a delta value by selecting “Manually Enter Delta”. Enter your own delta value between 0 and
  0.25. Then, if you select the “List Significant Genes” button, SAM displays the list of differentially
  expressed genes in the “SAM output” sheet according to the delta value you chose.

• This sheet summarizes the selected parameters and gives you the list of induced and repressed
  genes.
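
To make the idea of comparing observed scores with randomized scores concrete, here is a small Python sketch. It is not the SAM macro and does not reproduce its exact algorithm: it is a simplified one-class variant, assuming a score of the form mean / (standard error + s0), random sign flips of the experiments as the randomization, and a plain "observed minus expected exceeds delta" rule. The names d_stat and sam_like, the constant s0 = 0.1, the toy data and the delta used here are all hypothetical, so the delta is not comparable to the values entered in SAM.

```python
import random
import statistics


def d_stat(values, s0=0.1):
    """Score of one gene: mean / (standard error + s0). s0 is an assumed constant."""
    std_err = statistics.stdev(values) / len(values) ** 0.5
    return statistics.mean(values) / (std_err + s0)


def sam_like(data, delta, n_perm=200, seed=0, s0=0.1):
    """Return (gene, observed score, expected score) for genes far from the 45-degree line."""
    rng = random.Random(seed)
    # Observed scores, sorted from most negative to most positive.
    observed = sorted((d_stat(values, s0), gene) for gene, values in data.items())
    n_cols = len(next(iter(data.values())))
    expected = [0.0] * len(observed)
    for _ in range(n_perm):
        # One random sign per experiment, applied to every gene.
        signs = [rng.choice((-1, 1)) for _ in range(n_cols)]
        permuted = sorted(
            d_stat([s * v for s, v in zip(signs, values)], s0)
            for values in data.values()
        )
        # Average the sorted randomized scores rank by rank.
        for rank, score in enumerate(permuted):
            expected[rank] += score / n_perm
    return [
        (gene, obs, exp)
        for (obs, gene), exp in zip(observed, expected)
        if abs(obs - exp) > delta
    ]


# Hypothetical centered log2(ratio) data: one induced, one repressed, four flat genes.
toy = {
    "UP": [2.0, 2.2, 1.8, 2.1, 1.9, 2.0],
    "DOWN": [-2.0, -2.2, -1.8, -2.1, -1.9, -2.0],
    "N1": [0.05, -0.08, 0.02, -0.03, 0.06, -0.01],
    "N2": [-0.04, 0.07, -0.02, 0.09, -0.05, 0.01],
    "N3": [0.03, 0.01, -0.06, 0.04, -0.02, -0.07],
    "N4": [-0.09, 0.02, 0.05, -0.01, 0.08, -0.03],
}
result = sam_like(toy, delta=2.0)
called = {gene for gene, _, _ in result}
print(sorted(called))
```

Raising delta in this sketch shortens the list of called genes and lowering it lengthens the list, which is the same trade-off the delta table in SAM lets you explore.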



                                               GEPAS: Gene Expression Pattern Analysis Suite

   Review this section. Familiarize yourself with the suite by reviewing each section listed
   under Tools.

   • Verify that the data file FibroGEPAS.txt is in your folder
   • Open the file
   • Open the GEPAS portal at
     http://www.transcriptome.ens.fr/gepas/index.html
   • Click on “Tools”
       - Preprocessing
           - Preprocess DNA array data files: log-transformation, replicate
             handling, missing value imputation, filtering and
             normalization
           - Filtering
       - Viewing
       - Clustering
       - Differential expression
       - Classification
       - Data mining
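
The preprocessing operations named under Tools can be illustrated with a short Python sketch. This is not how GEPAS implements them: the function name preprocess, the max_missing threshold, the use of the gene's own mean for imputation and the sample rows are all assumptions chosen to keep the example small, and normalization is left out.

```python
import math


def preprocess(rows, max_missing=0.5):
    """rows: list of (gene_id, [ratio or None, ...]). Returns {gene_id: log2 profile}."""
    # 1. Log-transformation: log2 of each positive ratio; anything else is missing.
    logged = [
        (gene, [math.log2(v) if v is not None and v > 0 else None for v in values])
        for gene, values in rows
    ]
    # 2. Replicate handling: average rows that share a gene id, column by column.
    grouped = {}
    for gene, values in logged:
        grouped.setdefault(gene, []).append(values)
    merged = {}
    for gene, replicates in grouped.items():
        profile = []
        for column in zip(*replicates):
            present = [v for v in column if v is not None]
            profile.append(sum(present) / len(present) if present else None)
        merged[gene] = profile
    # 3. Filtering: drop genes with too many missing values.
    # 4. Imputation: fill the remaining gaps with the gene's own mean.
    result = {}
    for gene, profile in merged.items():
        present = [v for v in profile if v is not None]
        n_missing = len(profile) - len(present)
        if not present or n_missing > max_missing * len(profile):
            continue
        gene_mean = sum(present) / len(present)
        result[gene] = [gene_mean if v is None else v for v in profile]
    return result


# Hypothetical expression ratios; G1 is spotted twice (replicates).
rows = [
    ("G1", [1.0, 2.0, 4.0]),
    ("G1", [1.0, 8.0, None]),
    ("G2", [None, None, 2.0]),
    ("G3", [0.5, None, 2.0]),
]
clean = preprocess(rows)
print(clean)
```

In this example G2 is filtered out because two of its three values are missing, while the single gap in G3 is filled with the mean of its other values.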

Gene Expression Data Analysis
  Context
  Statement of problem / Case study:
        The temporal program of gene expression during a model physiological response of human cells, the response of fibroblasts to serum, was explored with a
  complementary DNA microarray representing about 8600 different human genes. Genes could be clustered into groups on the basis of their temporal patterns of expression in
  this program. Many features of the transcriptional program appeared to be related to the physiology of wound repair, suggesting that fibroblasts play a larger and richer role in
  this complex multicellular response than had previously been appreciated.

 Specification & aims
Aim:
The purpose of this lab is to introduce the gene expression data analysis process.
We apply it to the data from “Transcriptional Program in the Response of
Human Fibroblasts to Serum”, to understand how a researcher can
identify significantly expressed genes from microarray datasets.

[Figure: microarray workflow. Target preparation, hybridization, slide scanning, data analysis, expression profile clustering.]

 Resolution process
T1. Gene expression overview
   T1.1. Review of the place of genomics in the OMIC world
   T1.2. Microarray data techniques and process
   T1.3. Data analysis cycle and tools
T2. Excel used in genomics
   Objective: use basic Excel functionalities to solve some gene expression data analysis needs
   T2.1. Column manipulation, functions, anchoring, copying function results, sorting data, search and replace
   T2.2. Experiment comparison: data pre-treatment
   T2.3. Differentially expressed genes from replicate experiments (SAM)
T3. GEPAS: Gene Expression Pattern Analysis Suite
   Objective: use the GEPAS suite to apply the whole microarray data analysis process to the fibroblast data
   -   Preprocessing
   -   Viewing
   -   Clustering
   -   Differential expression
   -   Classification
   -   Data mining
Conclusion: ?

 Acquired skills
 -   Gene expression data overview
 -   Excel used for genomics
 -   Microarray data analysis using GEPAS

Vishwanath R. Iyer, Science, 1999
END.

From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 

Lab Gene Expression Data Analysis

  • 1. Lab#. Data manipulation: Biostatistic & Gene expression data analysis (Microarray, NGS & qRT-PCR) Theme: Transcriptional Program in Response of Human Fibroblasts to Serum. Etienne Z. Gnimpieba BRIN WS 2012 Sioux Falls, May 30 2012 Etienne.gnimpieba@usd.edu
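  • 2. Data manipulation Gene expression data analysis OMIC World [Diagram: DNA is transcribed into mRNA, mRNA is translated into an enzyme E, and E catalyses the conversion of S into P; degradation of mRNA and of E and gene repression are also shown. The layers are labelled Genomics, Transcriptomics, Proteomics and Metabolomics, grouped under Functional Genomics.]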
  • 3. Data manipulation Gene expression data analysis OMIC World GENOMICS
  • 4. Data manipulation Gene expression data analysis Excel used in genomics • How to select columns • How to use functions • How to anchor a cell value in a function • How to copy the function result and not the function itself • How to sort data by columns • How to search and replace • Frouin, V. & Gidrol, X. (2005) • Transcriptome ENS (France) Etienne Z. Gnimpieba BRIN WS 2012 • CBB group (Berlin) Sioux Falls, May 31 2012
  • 5. Data manipulation Gene expression data analysis Excel used in genomics: Pre-treatment. Centering and scaling data
      1. Open the file containing the experiment series (your expression matrix) in Excel, using the tab character as the column separator. Click on the second worksheet, named Fibroblast real, and look it over quickly: it is a realistic data set from a microarray experiment. Then click back on the first worksheet, named Fibroblast lab; we will be using this condensed version.
      2. For one column (corresponding to one DNA microarray experiment), calculate the mean value using the Excel AVERAGE function. Check whether the value obtained is equal to zero.
      3. If it is not, center the log2(Ratio) values of each experiment (15MIN, 30MIN, 2HR, etc.) on the corresponding column mean:
         - Subtract the average value of each column from the corresponding individual values (for the first example, B2-$B$37). Place these values in the corresponding table on the right (starting at R2). Drag the fill handle down to quickly finish a column.
         - Continue to center the data for each column (each DNA microarray experiment), filling in the blank table on the right. Use the AVERAGE function again to find the mean of each column in the new table. Each average should now be zero.
         - Be careful: if there are missing values (empty cells), fill them with the text NULL or NA so that Excel does not introduce a zero into the calculations for that cell. A missing value is not the same as a true zero!
         - Be careful with decimal separator handling in Excel (dot or comma)!
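The centering step above can also be sketched outside Excel. The following Python function is only an illustration under stated assumptions: the name `center_columns`, the list-of-rows layout and the use of `None` for missing cells are choices made for this example, not part of the lab material.

```python
def center_columns(matrix):
    """Subtract each column's mean from its values, skipping missing cells.

    `matrix` is a list of rows; each row holds one log2(ratio) per experiment.
    Missing values are represented by None (an assumption of this sketch) and
    stay None, so they never count as zeros in the mean.
    """
    n_cols = len(matrix[0])
    means = []
    for j in range(n_cols):
        present = [row[j] for row in matrix if row[j] is not None]
        # A column with no values at all has no mean; use 0.0 so nothing shifts.
        means.append(sum(present) / len(present) if present else 0.0)
    return [
        [None if value is None else value - means[j] for j, value in enumerate(row)]
        for row in matrix
    ]


# Hypothetical 3-gene x 2-experiment matrix with one missing value.
example = [[1.0, 2.0], [3.0, None], [5.0, 4.0]]
centered = center_columns(example)
```

After centering, the mean of the non-missing values in every column is zero, which is the check that step 3 asks for with AVERAGE.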
  • 6. Data manipulation Gene expression data analysis Excel used in genomics: Differential expression analysis (1)
      SAM (Significance Analysis of Microarrays) is an Excel macro that searches for differentially expressed genes using a bootstrapping method. Website: http://www-stat.stanford.edu/~tibs/SAM/
      SAM is freely available to academics on the web. Because it runs inside an Excel spreadsheet, it is easier to use for most microarray users. Using SAM requires several modifications to your data file:
      - The ratio or intensity values in the Excel sheet must not contain any commas; only points are allowed as the decimal separator.
      - The header line depends on the type of analysis you want to perform. Refer to the SAM manual for more information. You must highlight your header if you do not want to lose the experiment information.
      - Two annotation columns are available. SAM always references its calculations to the line number in the original sheet.
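The decimal-separator and missing-value requirements can be checked before the data reach Excel or SAM. The helper below is a minimal sketch; the name `parse_ratio` and the set of tokens treated as missing are assumptions made for this example.

```python
def parse_ratio(cell):
    """Convert one text cell to a float, or to None when the value is missing.

    Accepts either a point or a comma as the decimal separator. This sketch
    assumes there are no thousands separators: a cell such as "1,234.5"
    would raise ValueError.
    """
    text = cell.strip()
    if text == "" or text.upper() in {"NA", "NULL"}:
        return None  # missing value, deliberately not turned into 0.0
    return float(text.replace(",", "."))


# Hypothetical row read from a tab-separated expression file.
raw_row = ["1,25", "-0.5", "", "NA"]
parsed_row = [parse_ratio(cell) for cell in raw_row]
```

Keeping missing cells as `None` rather than zero mirrors the warning in the pre-treatment slide that a missing value is different from a true zero.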
  • 7. Data manipulation Gene expression data analysis Excel used in genomics: Differential expression analysis (2)
      - Under the Add-Ins tab, find the “SAM” toolbar command. Highlight cells R2 to AF37, then select SAM. When the SAM macro is launched from the toolbar, a settings window appears. For details on the various options, it is best to refer to the SAM manual. The first important thing to do is to indicate whether or not the source data have been log2-transformed. In this case we will select Unlogged. Then, because data bootstrapping uses a random number generator, you need to initialize it several times by selecting “Generate Random Seed”.
      - Click “OK”. Once all the chosen iterations have been done, SAM displays a plot showing each gene's score in the real distribution against its score in the random distributions. The differentially expressed genes are the ones that move away from the 45° line.
      - The table that appears indicates, for each delta value, the number of putative differentially expressed genes (the significant genes) and the number of false positive genes estimated using the False Discovery Rate (FDR). The user can change the delta value according to the number of false positive or significant genes he or she wants to obtain.
      - Choose a delta value by selecting “Manually Enter Delta”. Enter your own delta value between 0 and 0.25. If you then select the “List Significant Genes” button, SAM displays the list of differentially expressed genes in the “SAM output” sheet according to the delta value you chose.
      - This sheet summarizes the selected parameters and gives you the list of induced and repressed genes.
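To make the idea behind the plot concrete, here is a simplified Python sketch of a SAM-like comparison between observed and permutation-based expected scores. It is not the SAM macro's actual code: it assumes a two-class unpaired design, uses a fixed fudge constant `s0`, and calls every gene whose observed score lies more than `delta` away from its expected score, whereas SAM applies a more careful cutoff rule and also estimates the FDR. All function and variable names are hypothetical.

```python
import random
import statistics


def d_statistic(values, labels, s0=0.1):
    """Relative difference between class 2 and class 1 for one gene.

    s0 is a small positive constant added to the denominator (fixed here
    for simplicity) so that low-variance genes do not get huge scores.
    """
    g1 = [v for v, lab in zip(values, labels) if lab == 1]
    g2 = [v for v, lab in zip(values, labels) if lab == 2]
    m1, m2 = statistics.fmean(g1), statistics.fmean(g2)
    ss = sum((v - m1) ** 2 for v in g1) + sum((v - m2) ** 2 for v in g2)
    s = ((1 / len(g1) + 1 / len(g2)) * ss / (len(g1) + len(g2) - 2)) ** 0.5
    return (m2 - m1) / (s + s0)


def sam_like_calls(matrix, labels, delta, n_perm=200, seed=0):
    """Return indices of genes whose observed score departs from the
    permutation-based expected score by more than delta."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    observed = sorted((d_statistic(row, labels), i) for i, row in enumerate(matrix))
    expected = [0.0] * len(matrix)
    for _ in range(n_perm):
        permuted = labels[:]
        rng.shuffle(permuted)
        scores = sorted(d_statistic(row, permuted) for row in matrix)
        for rank, score in enumerate(scores):
            expected[rank] += score / n_perm  # average of the ordered scores
    return sorted(i for (d, i), e in zip(observed, expected) if abs(d - e) > delta)


# Hypothetical data: gene 0 is strongly induced in class 2, the rest is noise.
demo_labels = [1, 1, 1, 2, 2, 2]
demo_matrix = [
    [0.0, 0.1, -0.1, 3.0, 3.1, 2.9],
    [0.1, -0.1, 0.0, 0.0, 0.1, -0.1],
    [-0.2, 0.1, 0.1, 0.0, -0.1, 0.1],
    [0.05, 0.0, -0.05, 0.1, -0.1, 0.0],
]
demo_calls = sam_like_calls(demo_matrix, demo_labels, delta=5.0)
```

Plotting the observed scores against the expected ones would give the scatter around the 45° line described above; genes returned by `sam_like_calls` are the points farthest from that line.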
  • 8. Data manipulation Gene expression data analysis GEPAS: Gene Expression Pattern Analysis Suite
      Review this section. Become familiar with the suite on your own by reviewing each section listed under Tools.
      - Verify that the data file FibroGEPAS.txt is in your folder
      - Open the file
      - Open the GEPAS portal at http://www.transcriptome.ens.fr/gepas/index.html
      - Click on “Tools”
      - Preprocessing: preprocess DNA array data files (log-transformation, replicate handling, missing value imputation, filtering and normalization)
      - Filtering
      - Viewing
      - Clustering
      - Differential expression
      - Classification
      - Data mining
  • 9. Gene Expression Data Analysis
      Context. Statement of problem / Case study: The temporal program of gene expression during a model physiological response of human cells, the response of fibroblasts to serum, was explored with a complementary DNA microarray representing about 8600 different human genes. Genes could be clustered into groups on the basis of their temporal patterns of expression in this program. Many features of the transcriptional program appeared to be related to the physiology of wound repair, suggesting that fibroblasts play a larger and richer role in this complex multicellular response than had previously been appreciated.
      Specification & aims. Aim: The purpose of this lab is to initiate a gene expression data analysis process. We simulated the application on “Transcriptional Program in the Response of Human Fibroblasts to Serum”. Now we can understand how a researcher can come to identify a significantly expressed gene from microarray datasets.
      Resolution process:
      T1. Gene expression overview
          T1.1. Review of the place of genomics in the OMIC world
          T1.2. Microarray data techniques and process
          T1.3. Data analysis cycle and tools
      T2. Excel used in genomics. Objective: use basic Excel functionalities to solve some gene expression data analysis needs
          T2.1. Column manipulation, functions used, anchor, copy with function, sort data, search and replace
          T2.2. Experiment comparison: data pre-treatment
          T2.3. Differentially expressed genes from replicate experiments (SAM)
      T3. GEPAS: Gene Expression Pattern Analysis Suite. Objective: use the GEPAS suite to apply the whole microarray data analysis process to the fibroblast data
          - Preprocessing
          - Viewing
          - Clustering
          - Differential expression
          - Classification
          - Data mining
      [Figure: microarray workflow with the steps target preparation, hybridization, slide scanning, data analysis and expression profile clustering. Source: Vishwanath R. Iyer, Science, 1999.]
      Acquired skills:
          - Gene expression data overview
          - Excel used for genomics
          - Microarray data analysis using GEPAS
      Conclusion: ?
  • 10. END.

Editor's Notes

  1. During this lab, we have: a brief review; the lab's template; genome exploration practice…
  2. Once you have your normalized data file, open it with Excel. You can filter out weak-intensity spots (eliminate the weakest intensities in both channels) and keep the spots with a ratio greater than 1 or lower than -1. Remember that we are working with log2(ratio), so log2(2) = 1. This method, called “fold change”, is the one used at the beginning of microarray analysis and is still useful if you do not have enough replicates to apply statistical treatments. The “fold change” method lacks accuracy regarding the significance threshold to be fixed. That is why it is useful to apply a statistical method able to take into account intensity variations and, most of all, the variability among experiments. Significance Analysis of Microarrays (SAM): SAM is an Excel macro freely available to academics on the web. The use of SAM in an Excel spreadsheet makes this tool easier to use for most microarray users. Using SAM requires several modifications to your data file: The ratio or intensity values in the Excel sheet must not contain any commas; only points are allowed as the decimal separator. The header line depends on the type of analysis you want to perform. You can refer to the SAM manual for more information. You must duplicate your header if you do not want to lose the experiment information (see image below). Two annotation columns are available. SAM always references its calculations to the line number in the original sheet. Before launching the macro, it is necessary to select the data precisely, because SAM rejects lines with too many missing values (such as empty lines).
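The “fold change” filter described in these notes can be sketched in a few lines of Python. The function name, the dictionary layout and the gene identifiers below are hypothetical; the threshold of 1 on the log2 scale (a twofold change) comes from the notes.

```python
def fold_change_filter(log2_ratios, threshold=1.0):
    """Keep genes whose log2(ratio) is above +threshold or below -threshold.

    `log2_ratios` maps a gene identifier to its log2(ratio); missing
    measurements are given as None and are dropped rather than read as 0.
    """
    return {
        gene: ratio
        for gene, ratio in log2_ratios.items()
        if ratio is not None and abs(ratio) > threshold
    }


# Hypothetical spots: two pass the twofold cutoff, two do not.
spots = {"geneA": 1.8, "geneB": -0.4, "geneC": -2.1, "geneD": None}
selected = fold_change_filter(spots)
```

As the notes point out, this cutoff ignores the variability among experiments, which is why a statistical method such as SAM is preferred when replicates are available.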