SlideShare a Scribd company logo
1 of 26
Download to read offline
Elegant Graphics for Data
  Analysis with ggplot2
      Yann Abraham
     BaselR 28.04.2010
Who is Yann Abraham
• Biochemist by training
• Bioinformatician by trade
• Pharma/Biotech
  – Cellzome AG
  – Novartis Pharma AG


• http://ch.linkedin.com/in/yannabraham
How to Represent Data
“A Picture is Worth a Thousand Words”
• Visualization is a critical component of data
  analysis
• Graphics are the most efficient way to digest
  large volumes of data & identify trends
• Graphical design is a mixture of mathematical
  and perceptual science
A Straightforward Way to Create
              Visualizations
• Grammar of Graphics provides a framework to
  streamline the description and creation of graphics
• For a given dataset to be displayed:
   – Map variables to aesthetics
   – Define Layers
      • A representation (a ‘geom’) ie line, boxplot, histogram,…
      • Associated statistical transformation ie counts, model,…
   – Define Scales
      • Color, Shape, axes,…
   – Define Coordinates
   – Define Facets
Why Use ggplot2?
• Simple yet powerful syntax
• Provides a framework for creating any type of
  graphics
• Implements basic graphical design rules by
  default
An Example
• 4 cell lines where treated with a compound
  active against a class of enzymes
• Proteins where extracted and quantified using
  mass spectrometry
• Is there anything interesting?!?
Excel…
R…
R…
R…
R… (this could go on for hours)
…ggplot2!
ggplot2!
ggplot2!
ggplot2!
ggplot2!



qplot(Experiment_1,Experiment_2,data=comp,color=CELLLINE,facets=ISTARGET~CELLLINE)+
         coord_equal()+
         scale_x_log10()+
         scale_y_log10()
When Visualization Alone is Not
              Enough
• Some datasets are large multidimensional
  data structures
• Representing data from such structure
  requires data transformation
• R is good at handling large sets
• R functions for handling multidimensional sets
  are complex to use
Easy Data Transformation With plyr
• plyr provides wrappers around typical R
  operations
  – Split
  – Apply
  – Combine
• plyr functions are similar to the by() function
Why use plyr?
• Simple syntax
• Predictable output
• Tightly integrated into ggplot2

• This comes at a price – somewhat slower than
  apply
An Example
• Given a set of raw data from a High
  Throughput Screen, compute the plate-
  normalized effect
The standard R way
    plate.mean <- aggregate(hts.data$RAW,
         list(hts.data$PLATE_ID),mean)

names(plate.mean) <- c(‘PLATE_ID’,’PLATE_MEAN’)

     hts.data <- merge(hts.data,plate.mean)

               hts.data$NORM<-
      hts.data$RAW/hts.data$PLATE_MEAN
The plyr way
             hts.data <-
ddply(hts.data,.(PLATE_ID),function(df) {
   df$NORM<-df$RAW/mean(df$RAW)
             return(df) }
                   )
Benefits of Using plyr & ggplot2
• Compact, straightforward syntax
  – Good basic output, complex options only required
    for polishing
• Shifts focus from plotting to exploring
  – Presentation graphics can be created from there
    at minimal cost
  – Data transformation is intuitive
• Powerful statistics available
  – It’s R!
Some links…
• The Grammar of Graphics book by Leland
  Wilkinson
• The ggplot2 book by Hadley Wickham
  – And the corresponding website
• A presentation about plyr by JD Long
  – And his initial blog post
THANK YOU FOR YOUR
    ATTENTION!

More Related Content

What's hot

R and Visualization: A match made in Heaven
R and Visualization: A match made in HeavenR and Visualization: A match made in Heaven
R and Visualization: A match made in HeavenEdureka!
 
모듈형 패키지를 활용한 나만의 기계학습 모형 만들기 - 회귀나무모형을 중심으로
모듈형 패키지를 활용한 나만의 기계학습 모형 만들기 - 회귀나무모형을 중심으로 모듈형 패키지를 활용한 나만의 기계학습 모형 만들기 - 회귀나무모형을 중심으로
모듈형 패키지를 활용한 나만의 기계학습 모형 만들기 - 회귀나무모형을 중심으로 r-kor
 
Vectors data frames
Vectors data framesVectors data frames
Vectors data framesFAO
 
Machine Learning - Neural Networks - Perceptron
Machine Learning - Neural Networks - PerceptronMachine Learning - Neural Networks - Perceptron
Machine Learning - Neural Networks - PerceptronAndrew Ferlitsch
 
Tutorial of topological data analysis part 3(Mapper algorithm)
Tutorial of topological data analysis part 3(Mapper algorithm)Tutorial of topological data analysis part 3(Mapper algorithm)
Tutorial of topological data analysis part 3(Mapper algorithm)Ha Phuong
 
RUCK 2017 MxNet과 R을 연동한 딥러닝 소개
RUCK 2017 MxNet과 R을 연동한 딥러닝 소개RUCK 2017 MxNet과 R을 연동한 딥러닝 소개
RUCK 2017 MxNet과 R을 연동한 딥러닝 소개r-kor
 
Pandas data transformational data structure patterns and challenges final
Pandas   data transformational data structure patterns and challenges  finalPandas   data transformational data structure patterns and challenges  final
Pandas data transformational data structure patterns and challenges finalRajesh M
 
Machine Learning in R
Machine Learning in RMachine Learning in R
Machine Learning in RSujaAldrin
 
5.4 randomized datastructures
5.4 randomized datastructures5.4 randomized datastructures
5.4 randomized datastructuresKrish_ver2
 
Python data structures - best in class for data analysis
Python data structures -   best in class for data analysisPython data structures -   best in class for data analysis
Python data structures - best in class for data analysisRajesh M
 
I Don't Want to Be a Dummy! Encoding Predictors for Trees
I Don't Want to Be a Dummy! Encoding Predictors for TreesI Don't Want to Be a Dummy! Encoding Predictors for Trees
I Don't Want to Be a Dummy! Encoding Predictors for TreesWork-Bench
 
5. working on data using R -Cleaning, filtering ,transformation, Sampling
5. working on data using R -Cleaning, filtering ,transformation, Sampling5. working on data using R -Cleaning, filtering ,transformation, Sampling
5. working on data using R -Cleaning, filtering ,transformation, Samplingkrishna singh
 
Broom: Converting Statistical Models to Tidy Data Frames
Broom: Converting Statistical Models to Tidy Data FramesBroom: Converting Statistical Models to Tidy Data Frames
Broom: Converting Statistical Models to Tidy Data FramesWork-Bench
 
Self taught clustering
Self taught clusteringSelf taught clustering
Self taught clusteringSOYEON KIM
 
3. R- list and data frame
3. R- list and data frame3. R- list and data frame
3. R- list and data framekrishna singh
 
DATA VISUALIZATION WITH R PACKAGES
DATA VISUALIZATION WITH R PACKAGESDATA VISUALIZATION WITH R PACKAGES
DATA VISUALIZATION WITH R PACKAGESFatma ÇINAR
 

What's hot (20)

R and Visualization: A match made in Heaven
R and Visualization: A match made in HeavenR and Visualization: A match made in Heaven
R and Visualization: A match made in Heaven
 
모듈형 패키지를 활용한 나만의 기계학습 모형 만들기 - 회귀나무모형을 중심으로
모듈형 패키지를 활용한 나만의 기계학습 모형 만들기 - 회귀나무모형을 중심으로 모듈형 패키지를 활용한 나만의 기계학습 모형 만들기 - 회귀나무모형을 중심으로
모듈형 패키지를 활용한 나만의 기계학습 모형 만들기 - 회귀나무모형을 중심으로
 
Chapter 9 ds
Chapter 9 dsChapter 9 ds
Chapter 9 ds
 
Vectors data frames
Vectors data framesVectors data frames
Vectors data frames
 
Machine Learning - Neural Networks - Perceptron
Machine Learning - Neural Networks - PerceptronMachine Learning - Neural Networks - Perceptron
Machine Learning - Neural Networks - Perceptron
 
Tutorial of topological data analysis part 3(Mapper algorithm)
Tutorial of topological data analysis part 3(Mapper algorithm)Tutorial of topological data analysis part 3(Mapper algorithm)
Tutorial of topological data analysis part 3(Mapper algorithm)
 
RUCK 2017 MxNet과 R을 연동한 딥러닝 소개
RUCK 2017 MxNet과 R을 연동한 딥러닝 소개RUCK 2017 MxNet과 R을 연동한 딥러닝 소개
RUCK 2017 MxNet과 R을 연동한 딥러닝 소개
 
Pandas data transformational data structure patterns and challenges final
Pandas   data transformational data structure patterns and challenges  finalPandas   data transformational data structure patterns and challenges  final
Pandas data transformational data structure patterns and challenges final
 
Machine Learning in R
Machine Learning in RMachine Learning in R
Machine Learning in R
 
5.4 randomized datastructures
5.4 randomized datastructures5.4 randomized datastructures
5.4 randomized datastructures
 
Python data structures - best in class for data analysis
Python data structures -   best in class for data analysisPython data structures -   best in class for data analysis
Python data structures - best in class for data analysis
 
I Don't Want to Be a Dummy! Encoding Predictors for Trees
I Don't Want to Be a Dummy! Encoding Predictors for TreesI Don't Want to Be a Dummy! Encoding Predictors for Trees
I Don't Want to Be a Dummy! Encoding Predictors for Trees
 
5. working on data using R -Cleaning, filtering ,transformation, Sampling
5. working on data using R -Cleaning, filtering ,transformation, Sampling5. working on data using R -Cleaning, filtering ,transformation, Sampling
5. working on data using R -Cleaning, filtering ,transformation, Sampling
 
Broom: Converting Statistical Models to Tidy Data Frames
Broom: Converting Statistical Models to Tidy Data FramesBroom: Converting Statistical Models to Tidy Data Frames
Broom: Converting Statistical Models to Tidy Data Frames
 
Self taught clustering
Self taught clusteringSelf taught clustering
Self taught clustering
 
chapter1
chapter1chapter1
chapter1
 
Chapter 3 ds
Chapter 3 dsChapter 3 ds
Chapter 3 ds
 
3. R- list and data frame
3. R- list and data frame3. R- list and data frame
3. R- list and data frame
 
DATA VISUALIZATION WITH R PACKAGES
DATA VISUALIZATION WITH R PACKAGESDATA VISUALIZATION WITH R PACKAGES
DATA VISUALIZATION WITH R PACKAGES
 
PCA and SVD in brief
PCA and SVD in briefPCA and SVD in brief
PCA and SVD in brief
 

Similar to Elegant Graphics for Data Analysis with ggplot2

Azure Databricks for Data Scientists
Azure Databricks for Data ScientistsAzure Databricks for Data Scientists
Azure Databricks for Data ScientistsRichard Garris
 
Data Analytics with R and SQL Server
Data Analytics with R and SQL ServerData Analytics with R and SQL Server
Data Analytics with R and SQL ServerStéphane Fréchette
 
Basic terminologies & asymptotic notations
Basic terminologies & asymptotic notationsBasic terminologies & asymptotic notations
Basic terminologies & asymptotic notationsRajendran
 
A Workshop on R
A Workshop on RA Workshop on R
A Workshop on RAjay Ohri
 
R Programming - part 1.pdf
R Programming - part 1.pdfR Programming - part 1.pdf
R Programming - part 1.pdfRohanBorgalli
 
Follow the money with graphs
Follow the money with graphsFollow the money with graphs
Follow the money with graphsStanka Dalekova
 
A Hands-on Intro to Data Science and R Presentation.ppt
A Hands-on Intro to Data Science and R Presentation.pptA Hands-on Intro to Data Science and R Presentation.ppt
A Hands-on Intro to Data Science and R Presentation.pptSanket Shikhar
 
R programming & Machine Learning
R programming & Machine LearningR programming & Machine Learning
R programming & Machine LearningAmanBhalla14
 
Machine learning for IoT - unpacking the blackbox
Machine learning for IoT - unpacking the blackboxMachine learning for IoT - unpacking the blackbox
Machine learning for IoT - unpacking the blackboxIvo Andreev
 
Lecture 1 Pandas Basics.pptx machine learning
Lecture 1 Pandas Basics.pptx machine learningLecture 1 Pandas Basics.pptx machine learning
Lecture 1 Pandas Basics.pptx machine learningmy6305874
 
The Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkThe Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkIvo Andreev
 
Week-3 – System RSupplemental material1Recap •.docx
Week-3 – System RSupplemental material1Recap •.docxWeek-3 – System RSupplemental material1Recap •.docx
Week-3 – System RSupplemental material1Recap •.docxhelzerpatrina
 
Machine learning and linear regression programming
Machine learning and linear regression programmingMachine learning and linear regression programming
Machine learning and linear regression programmingSoumya Mukherjee
 
Python for data science
Python for data sciencePython for data science
Python for data sciencebotsplash.com
 
Practice discovering biological knowledge using networks approach.
Practice discovering biological knowledge using networks approach.Practice discovering biological knowledge using networks approach.
Practice discovering biological knowledge using networks approach.Elena Sügis
 
DutchMLSchool. Automating Decision Making
DutchMLSchool. Automating Decision MakingDutchMLSchool. Automating Decision Making
DutchMLSchool. Automating Decision MakingBigML, Inc
 
Matplotlib Review 2021
Matplotlib Review 2021Matplotlib Review 2021
Matplotlib Review 2021Bhaskar J.Roy
 
Matplotlib_Complete review_2021_abridged_version
Matplotlib_Complete review_2021_abridged_versionMatplotlib_Complete review_2021_abridged_version
Matplotlib_Complete review_2021_abridged_versionBhaskar J.Roy
 
TDC2017 | São Paulo - Trilha Java EE How we figured out we had a SRE team at ...
TDC2017 | São Paulo - Trilha Java EE How we figured out we had a SRE team at ...TDC2017 | São Paulo - Trilha Java EE How we figured out we had a SRE team at ...
TDC2017 | São Paulo - Trilha Java EE How we figured out we had a SRE team at ...tdc-globalcode
 

Similar to Elegant Graphics for Data Analysis with ggplot2 (20)

Azure Databricks for Data Scientists
Azure Databricks for Data ScientistsAzure Databricks for Data Scientists
Azure Databricks for Data Scientists
 
Data Analytics with R and SQL Server
Data Analytics with R and SQL ServerData Analytics with R and SQL Server
Data Analytics with R and SQL Server
 
Basic terminologies & asymptotic notations
Basic terminologies & asymptotic notationsBasic terminologies & asymptotic notations
Basic terminologies & asymptotic notations
 
A Workshop on R
A Workshop on RA Workshop on R
A Workshop on R
 
R Programming - part 1.pdf
R Programming - part 1.pdfR Programming - part 1.pdf
R Programming - part 1.pdf
 
Follow the money with graphs
Follow the money with graphsFollow the money with graphs
Follow the money with graphs
 
A Hands-on Intro to Data Science and R Presentation.ppt
A Hands-on Intro to Data Science and R Presentation.pptA Hands-on Intro to Data Science and R Presentation.ppt
A Hands-on Intro to Data Science and R Presentation.ppt
 
R programming & Machine Learning
R programming & Machine LearningR programming & Machine Learning
R programming & Machine Learning
 
Machine learning for IoT - unpacking the blackbox
Machine learning for IoT - unpacking the blackboxMachine learning for IoT - unpacking the blackbox
Machine learning for IoT - unpacking the blackbox
 
R programmingmilano
R programmingmilanoR programmingmilano
R programmingmilano
 
Lecture 1 Pandas Basics.pptx machine learning
Lecture 1 Pandas Basics.pptx machine learningLecture 1 Pandas Basics.pptx machine learning
Lecture 1 Pandas Basics.pptx machine learning
 
The Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkThe Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it Work
 
Week-3 – System RSupplemental material1Recap •.docx
Week-3 – System RSupplemental material1Recap •.docxWeek-3 – System RSupplemental material1Recap •.docx
Week-3 – System RSupplemental material1Recap •.docx
 
Machine learning and linear regression programming
Machine learning and linear regression programmingMachine learning and linear regression programming
Machine learning and linear regression programming
 
Python for data science
Python for data sciencePython for data science
Python for data science
 
Practice discovering biological knowledge using networks approach.
Practice discovering biological knowledge using networks approach.Practice discovering biological knowledge using networks approach.
Practice discovering biological knowledge using networks approach.
 
DutchMLSchool. Automating Decision Making
DutchMLSchool. Automating Decision MakingDutchMLSchool. Automating Decision Making
DutchMLSchool. Automating Decision Making
 
Matplotlib Review 2021
Matplotlib Review 2021Matplotlib Review 2021
Matplotlib Review 2021
 
Matplotlib_Complete review_2021_abridged_version
Matplotlib_Complete review_2021_abridged_versionMatplotlib_Complete review_2021_abridged_version
Matplotlib_Complete review_2021_abridged_version
 
TDC2017 | São Paulo - Trilha Java EE How we figured out we had a SRE team at ...
TDC2017 | São Paulo - Trilha Java EE How we figured out we had a SRE team at ...TDC2017 | São Paulo - Trilha Java EE How we figured out we had a SRE team at ...
TDC2017 | São Paulo - Trilha Java EE How we figured out we had a SRE team at ...
 

Recently uploaded

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Bhuvaneswari Subramani
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 

Recently uploaded (20)

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 

Elegant Graphics for Data Analysis with ggplot2

  • 1. Elegant Graphics for Data Analysis with ggplot2 Yann Abraham BaselR 28.04.2010
  • 2. Who is Yann Abraham • Biochemist by training • Bioinformatician by trade • Pharma/Biotech – Cellzome AG – Novartis Pharma AG • http://ch.linkedin.com/in/yannabraham
  • 4. “A Picture is Worth a Thousand Words” • Visualization is a critical component of data analysis • Graphics are the most efficient way to digest large volumes of data & identify trends • Graphical design is a mixture of mathematical and perceptual science
  • 5. A Straightforward Way to Create Visualizations • Grammar of Graphics provides a framework to streamline the description and creation of graphics • For a given dataset to be displayed: – Map variables to aesthetics – Define Layers • A representation (a ‘geom’) ie line, boxplot, histogram,… • Associated statistical transformation ie counts, model,… – Define Scales • Color, Shape, axes,… – Define Coordinates – Define Facets
  • 6. Why Use ggplot2? • Simple yet powerful syntax • Provides a framework for creating any type of graphics • Implements basic graphical design rules by default
  • 7. An Example • 4 cell lines where treated with a compound active against a class of enzymes • Proteins where extracted and quantified using mass spectrometry • Is there anything interesting?!?
  • 10. R…
  • 11. R…
  • 12. R… (this could go on for hours)
  • 18. When Visualization Alone is Not Enough • Some datasets are large multidimensional data structures • Representing data from such structure requires data transformation • R is good at handling large sets • R functions for handling multidimensional sets are complex to use
  • 19. Easy Data Transformation With plyr • plyr provides wrappers around typical R operations – Split – Apply – Combine • plyr functions are similar to the by() function
  • 20. Why use plyr? • Simple syntax • Predictable output • Tightly integrated into ggplot2 • This comes at a price – somewhat slower than apply
  • 21. An Example • Given a set of raw data from a High Throughput Screen, compute the plate- normalized effect
  • 22. The standard R way plate.mean <- aggregate(hts.data$RAW, list(hts.data$PLATE_ID),mean) names(plate.mean) <- c(‘PLATE_ID’,’PLATE_MEAN’) hts.data <- merge(hts.data,plate.mean) hts.data$NORM<- hts.data$RAW/hts.data$PLATE_MEAN
  • 23. The plyr way hts.data <- ddply(hts.data,.(PLATE_ID),function(df) { df$NORM<-df$RAW/mean(df$RAW) return(df) } )
  • 24. Benefits of Using plyr & ggplot2 • Compact, straightforward syntax – Good basic output, complex options only required for polishing • Shifts focus from plotting to exploring – Presentation graphics can be created from there at minimal cost – Data transformation is intuitive • Powerful statistics available – It’s R!
  • 25. Some links… • The Grammar of Graphics book by Leland Wilkinson • The ggplot2 book by Hadley Wickham – And the corresponding website • A presentation about plyr by JD Long – And his initial blog post
  • 26. THANK YOU FOR YOUR ATTENTION!