14-2-2013




         Metabolomics: data acquisition,
         preprocessing & quality control
                          Theo Reijmers,
                Analytical BioSciences, Leiden University
                     Barcelona, 14-02-2013




[Figure labels: coenzymes (vitamins), amino acids, carbohydrates, hormones, nucleotides, lipids]








The metabolome
• Metabolites: chemical compounds with low molecular weight (mass < 1500 Da)
• Many chemical classes, with different chemical properties (different from proteomics); polarity ranges from log P –6 to 14
• Large differences in abundance: concentration dynamic range of 10^9




The metabolome
[Figure: the metabolome shown along three axes: concentration (dynamic range 10^9), polarity (log P –6 to 14) and mass (< 1500 Da). Labels in the figure: global screen, NMR, LC-MS, custom, targeted]








  Analytical strategies: 1H NMR
   Advantages
   • Straightforward sample preparation
   • High sample throughput (robotic control)
   • Chemical shifts stable (if pH kept constant)
   • Quantification without standards
   • Highly repeatable and reproducible
   • Very valuable for identification of isolated metabolites

   Disadvantages
   • Limited sensitivity
   • Identification in complex mixtures rather difficult




Analytical strategies: LC-MS and GC-MS

• Chromatography: separation of compounds in
  sample

• Mass-spectrometry: detection of ions based
  on mass-to-charge ratio (m/z)








   Chromatography
Separation of chemical compounds based on chemical properties
[Figure: chromatogram]

Types of interaction (figure panels A, B and C):
A. Surface adsorption
B. Solvent partitioning
C. Ion exchange




Mass spectrometer
  separation of charged particles in the gas phase

  separation based on mass-to-charge ratio (m/z)


  ionisation → mass analyser → mass analyser → detector
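
As a small worked example of the mass-to-charge ratio, the sketch below computes m/z for a protonated ion [M+nH]^n+ as seen in positive ionisation mode. The function name `mz_protonated` and the glucose example are illustrative assumptions, not taken from the slides; the proton mass is the standard value of about 1.007276 Da.

```python
PROTON_MASS = 1.007276  # Da, standard proton mass (rounded)


def mz_protonated(neutral_mass: float, charge: int = 1) -> float:
    """Return m/z of an [M + nH]^n+ ion for a neutral monoisotopic mass in Da."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    # Each added proton contributes its mass and one unit of charge.
    return (neutral_mass + charge * PROTON_MASS) / charge


if __name__ == "__main__":
    glucose = 180.06339  # Da, monoisotopic mass of C6H12O6 (illustrative analyte)
    print("singly charged:", round(mz_protonated(glucose), 4))
    print("doubly charged:", round(mz_protonated(glucose, 2), 4))
```

The same neutral molecule appears at a different m/z for each charge state, which is why the analyser separates on m/z rather than on mass alone.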








    LC-MS vs GC-MS

Liquid chromatography-MS (LC-MS)

Advantages:
• Fast
• Efficient
• Sensitive
• Wide range of compounds

Disadvantages:
• Unstable*
• Sensitivity is compound dependent
• Ion suppression gives rubbish data
• Only relative quantification (if no authentic standard is available)

Gas chromatography-MS (GC-MS)

Advantages:
• Highly reproducible retention times
• Sensitive detection for all metabolites
• Characteristic mass fingerprint (identification!)

Disadvantages:
• Derivatization is needed to include polar analytes

*About as stable as a chocolate teapot in a heatwave. (Wilson 2009)
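
One common way to do relative quantification is to report each analyte's peak area relative to an internal standard that is added in a fixed amount to every sample. The sketch below shows that calculation under this assumption; the function name, the dictionary layout and the peak areas are made up for illustration and do not come from the slides.

```python
def response_ratios(peak_areas: dict, internal_standard: str) -> dict:
    """Divide every analyte peak area by the internal standard's peak area."""
    is_area = peak_areas[internal_standard]
    if is_area <= 0:
        raise ValueError("internal standard peak area must be positive")
    return {
        name: area / is_area
        for name, area in peak_areas.items()
        if name != internal_standard
    }


if __name__ == "__main__":
    # Hypothetical peak areas for one sample (arbitrary units).
    areas = {"IS": 2000.0, "analyte_A": 500.0, "analyte_B": 3000.0}
    print(response_ratios(areas, "IS"))
```

The ratios are unitless and comparable between samples, but they are not concentrations; absolute values still need an authentic standard.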




    Demonstration & Competence Lab
    • Applying technology developed in core in associate projects
      with industry, academia, clinics, knowledge institutes
    • Validation and implementation of metabolomics platforms
          •   QA/QC system/error model per metabolite
    • Clinical & preclinical studies (projects with partners)
    • >15 000 samples/year
    • > 2000 metabolites
    • Identification pipeline
    • Training & hands-on-workshops








                                    Platforms
•    Lipid analysis by LC-MS (ca. 300 individual compounds)
•    Amine analysis by LC-MS/MS (ca. 120 compounds)
•    Oxylipin analysis (ca. 140 compounds)
•    Global profiling by RP-LC-MS (ca. 450 compounds identified)
•    Global profiling by GC-MS (ca. 150 compounds)
•    Global profiling by CE-MS (ca. 300 compounds)
•    And more under development




      Large Metabolomics Measurement Series (DCL)
    • IOP biomarkers for healthy aging
       – ±2500 samples, 28 batches
       – Measurement time ±28 weeks
          •   Matching project LUMC and NCHA (Netherlands Centre for Healthy Aging)
    • Dutch Twin Register (NTR)
       – ±3000 samples, 31 batches
       – Measurement time ±30 weeks
          •   Dutch Twin Register (Nederlands Tweeling Register, NTR)
    • DiOGenes (Diet, Obesity and Genes)
       – ±2000 samples, 27 batches
       – Measurement time ±14 weeks
          •   NMC Associate project N & H cluster
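
To put these series in perspective, the short sketch below derives approximate samples per batch and batches per week from the figures quoted above. All inputs are the rounded "±" values from the slide, so the results are rough; the names `STUDIES` and `throughput` are illustrative.

```python
# (samples, batches, weeks) as quoted on the slide; all values are approximate.
STUDIES = {
    "IOP biomarkers for healthy aging": (2500, 28, 28),
    "Dutch Twin Register (NTR)": (3000, 31, 30),
    "DiOGenes": (2000, 27, 14),
}


def throughput(samples: int, batches: int, weeks: int) -> dict:
    """Return simple throughput figures for one measurement series."""
    return {
        "samples_per_batch": samples / batches,
        "batches_per_week": batches / weeks,
        "samples_per_week": samples / weeks,
    }


if __name__ == "__main__":
    for name, figures in STUDIES.items():
        stats = throughput(*figures)
        print(name, {key: round(value, 1) for key, value in stats.items()})
```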








                     Measurement Design
   • Randomization, replication & blocking of
     measurements
   • Inclusion of compounds & samples to monitor (&
     eventually correct for) quality
        –   Internal Standards
        –   Calibration samples
        –   Quality Control (QC) samples
        –   Replicate samples (technical & analytical)
        –   Blanks
        –   System suitability samples
        –   Transfer samples
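
Randomization and blocking can be sketched as follows: samples that belong together (for example all time points of one subject) form a block that stays within one batch, while the order of blocks and the run order within each batch are randomized. This is a minimal illustration under those assumptions, not the lab's actual procedure; the function name, the `(sample_id, block_id)` input format and the fixed seed are hypothetical.

```python
import random
from collections import defaultdict


def randomized_blocked_design(samples, batch_size, seed=0):
    """Assign (sample_id, block_id) pairs to batches.

    Samples sharing a block_id end up in the same batch; block order and the
    run order inside each batch are shuffled. A block larger than batch_size
    gets a batch of its own.
    """
    rng = random.Random(seed)  # fixed seed keeps the design reproducible
    blocks = defaultdict(list)
    for sample_id, block_id in samples:
        blocks[block_id].append(sample_id)

    block_ids = sorted(blocks)
    rng.shuffle(block_ids)  # randomize which blocks share a batch

    batches, current = [], []
    for block_id in block_ids:
        members = blocks[block_id]
        if current and len(current) + len(members) > batch_size:
            batches.append(current)
            current = []
        current.extend(members)
    if current:
        batches.append(current)

    for batch in batches:
        rng.shuffle(batch)  # randomize run order within the batch
    return batches


if __name__ == "__main__":
    demo = [(f"S{i}_{t}", f"subj{i}") for i in range(6) for t in ("t0", "t1")]
    for number, batch in enumerate(randomized_blocked_design(demo, batch_size=4), 1):
        print(number, batch)
```

Keeping a block inside one batch means that between-batch differences cannot be mistaken for within-subject effects, while the shuffling guards against drift in run order lining up with the study groups.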




            Typical sample sequence list                         Orde r
                                                                   1
                                                                   2
                                                                   3
                                                                           Nam e
                                                                           Blank
                                                                           Blank
                                                                           Blank
                                                                                           Id
                                                                                         Blank
                                                                                         Blank
                                                                                         Blank
                                                                                                       Leve l Batch P repar atio n Injectio n isSamp le isSST isQC isd QC isBlan k isCal isOut lier isSuspe ct
                                                                                                         0
                                                                                                         0
                                                                                                         0
                                                                                                               5
                                                                                                               5
                                                                                                               5
                                                                                                                         1
                                                                                                                         1
                                                                                                                         1
                                                                                                                                                                              1
                                                                                                                                                                              1
                                                                                                                                                                              1
                                                                                                                                                                                                                             Co mmen t




                                                                   4       Blank         Blank           0     5         1                                                    1
                                                                   5      dSST.C2       dSST.C2          2     5         1                               1
                                                                   6       SST.C2        SST.C2          2     5         1                               1
                                                                   7        dQ C          dQ C           4     5         1                                           1




Technical samples: system cleaning, testing and equilibrating.
                                                                   8         QC            QC            4     5         1                                     1
                                                                   9      P5.C6.a          C6            6     5         1                                                           1
                                                                  10      P5.C7.a          C7            7     5         1                                                           1
                                                                  11      P5.C0.a          C0            0     5         1                                                           1
                                                                  12      P5.C1.a          C1            1     5         1                                                           1
                                                                  13      P5.C4.a          C4            4     5         1                                                           1
                                                                  14      P5.C5.a          C5            5     5         1                                                           1
                                                                  15      P5.C2.a          C2            2     5         1                                                           1
                                                                  16      P5.C3.a          C3            3     5         1                                                           1
                                                                  17       P5.C1     0543_090.3.01.0     4     5         1                      1
                                                                  18       P5.D1     0546_094.3.01.0     4     5         1                      1
                                                                  19       P5.E1     0550_076.3.01.0     4     5         1                      1
                                                                  20        QC            QC             4     5         1                                     1
                                                                  21       Blank         Blank           0     5         1                                                   1
                                                                  22       dQ C            QC            4     5         1                                     1     1
                                                                  23       P5.F 1    0553_015.3.15.0     4     5         1                      1
                                                                  24       P5.G1     0555_097.3.01.0     4     5         1                      1
                                                                  25       P5.H1     0556_097.3.01.1     4     5         1                      1                                                       1        There might be somethi ng wrong here
                                                                  26       P5.A2     0559_077.3.05.0     4     5         1                      1
                                                                  27       P5.B2     0561_103.3.01.1     4     5         1                      1                                            1                          Something wrong here
                                                                  28       P5.C2     0563_103.3.01.0     4     5         1                      1
                                                                  29       P5.D2     0564_093.3.03.0     4     5         1                      1
                                                                  30       P5.E2 0570_095.3.01.0         4     5         1                      1
                                                                  31      P5. bE1 0550_076.3.01.0        4     5         2                      1
                                                                  32      P5. bA7 0631_057.3.09.0        4     5         2                      1
                                                                  33        QC          QC               4     5         1                                     1
                                                                  34       Blank         Blank           0     5         1                                                   1
                                                                  35       dQ C          dQ C            4     5         1                                           1
                                                                  36       P5.F 2    0571_105.3.04.0     4     5         1                      1
                                                                  37       P5.G2     0573_105.3.03.0     4     5         1                      1
                                                                  38       P5.H2     0574_099.3.02.0     4     5         1                      1
                                                                  39       P5.A3     0575_099.3.01.0     4     5         1                      1
                                                                  40       P5.B3     0577_099.3.03.0     4     5         1                      1
                                                                  41       P5.C3     0578_099.3.01.1     4     5         1                      1
                                                                  42       P5.D3     0581_096.3.01.0     4     5         1                      1
                                                                  43       P5.E3     0582_101.3.01.0     4     5         1                      1
                                                                  44       P5.F 3    0584_123.3.01.0     4     5         1                      1
                                                                  45       P5.G3     0585_085.3.01.0     4     5         1                      1
                                                                  46        QC            QC             4     5         1                                     1
                                                                  47       Blank         Blank           0     5         1                                                   1
                                                                  48        dQ C          dQ C           4     5         1                                           1
                                                                  49       P5.H3     0587_085.3.01.1     4     5         1                      1
                                                                  50       P5.A4     0589_095.3.01.1     4     5         1                      1
                                                                  51       P5.B4     0590_105.3.01.0     4     5         1                      1
                                                                  52       P5.C4     0591_105.3.02.0     4     5         1                      1
                                                                  53       P5.D4     0593_077.3.12.1     4     5         1                      1




Running samples
                                                                  54       P5.E4     0594_077.3.12.0     4     5         1                      1
                                                                  55      P5. bF9 0664_130.3.20.1        4     5         2                      1
                                                                  56      P5. bF10 0678_118.3.01.0       4     5         2                      1
                                                                  57       P5.F 4    0597_117.3.02.1     4     5         1                      1
                                                                  58       P5.G4     0598_117.3.02.0     4     5         1                      1
                                                                  59        QC            QC             4     5         1                                     1
                                                                  60       Blank         Blank           0     5         1                                                   1
                                                                  61        dQ C          dQ C           4     5         1                                           1
                                                                  62       P5.H4     0599_117.3.01.1     4     5         1                      1
                                                                  63       P5.A5     0600_117.3.01.0     4     5         1                      1
                                                                  64       P5.B5     0603_098.3.04.0     4     5         1                      1
                                                                  65       P5.C5     0604_098.3.02.0     4     5         1                      1
                                                                  66       P5.D5     0605_098.3.01.0     4     5         1                      1
                                                                  67       P5.E5 0606_098.3.01.1         4     5         1                      1
                                                                  68      P5. bB3 0577_099.3.03.0        4     5         2                      1
                                                                  69      P5. bH3 0587_085.3.01.1        4     5         2                      1
                                                                  70       P5.F 5 0607_015.3.16.0        4     5         1                      1
                                                                  71       P5.G5     0608_078.3.02.0     4     5         1                      1
                                                                  72        QC             QC            4     5         1                                     1
                                                                  73       Blank         Blank           0     5         1                                                   1
                                                                  74       dQ C          dQ C            4     5         1                                           1
                                                                  75       P5.H5     0609_078.3.03.0     4     5         1                      1
                                                                  76       P5.A6     0611_078.3.01.0     4     5         1                      1
                                                                  77       P5.B6     0612_088.3.02.0     4     5         1                      1
                                                                  78       P5.C6     0613_088.3.01.0     4     5         1                      1
                                                                  79       P5.D6     0616_085.3.02.0     4     5         1                      1
                                                                  80       P5.E6     0618_094.3.05.0     4     5         1                      1
                                                                  81      P5. bE6 0618_094.3.05.0        4     5         2                      1
                                                                  82      P5. bB10 0673_107.3.05.0       4     5         2                      1
                                                                  83      P5. bG1 0555_097.3.01.0        4     5         2                      1
                                                                  84      P5. bC4 0591_105.3.02.0        4     5         2                      1
                                                                  85        QC          QC               4     5         1                                     1
                                                                  86       Blank         Blank           0     5         1                                                   1




Calibration blocks at regular intervals
                                                                  87       dQ C          dQ C            4     5         1                                           1
                                                                  88      P5.C3.b          C3            3     5         1                                                           1
                                                                  89      P5.C7.b          C7            7     5         1                                                           1
                                                                  90      P5.C2.b          C2            2     5         1                                                           1
                                                                  91      P5.C6.b          C6            6     5         1                                                           1
                                                                  92      P5.C5.b          C5            5     5         1                                                           1
                                                                  93      P5.C4.b          C4            4     5         1                                                           1
                                                                  94      P5.C0.b          C0            0     5         1                                                           1
                                                                  95      P5.C1.b          C1            1     5         1                                                           1
                                                                  96       P5.F 6    0620_107.3.01.0     4     5         1                      1
                                                                  97       P5.G6     0629_092.3.01.1     4     5         1                      1
                                                                  98       P5.H6     0630_092.3.01.0     4     5         1                      1
                                                                  99        QC             QC            4     5         1                                     1
                                                                  100      Blank         Blank           0     5         1                                                   1
                                                                  101      dQ C          dQ C            4     5         1                                           1
                                                                  102      P5.A7     0631_057.3.09.0     4     5         1                      1
                                                                  103      P5.B7     0632_057.3.09.1     4     5         1                      1
                                                                  104      P5.C7     0634_091.3.01.0     4     5         1                      1
                                                                  105      P5.D7     0635_015.3.17.0     4     5         1                      1
                                                                  106      P5.E7     0638_072.3.01.0     4     5         1                      1
                                                                  107      P5.F 7    0639_066.3.03.0     4     5         1                      1
                                                                  108      P5.G7     0640_066.3.03.1     4     5         1                      1
                                                                  109      P5.H7     0642_109.3.02.0     4     5         1                      1
                                                                  110      P5.A8     0643_109.3.01.0     4     5         1                      1
                                                                  111      P5.B8     0646_110.3.06.1     4     5         1                      1
                                                                  112       QC            QC             4     5         1                                     1
                                                                  113      Blank         Blank           0     5         1                                                   1
                                                                  114      dQ C          dQ C            4     5         1                                           1
                                                                  115      P5.C8     0647_110.3.01.0     4     5         1                      1
                                                                  116      P5.D8     0648_110.3.03.1     4     5         1                      1
                                                                  117      P5.E8     0649_110.3.03.0     4     5         1                      1
                                                                  118      P5.F 8    0650_110.3.06.0     4     5         1                      1
                                                                  119     P5. bH6 0630_092.3.01.0        4     5         2                      1
                                                                  120     P5. bF11 0689_065.3.22.0       4     5         2                      1
                                                                  121      P5.G8     0651_110.3.02.0     4     5         1                      1
                                                                  122      P5.H8     0655_108.3.01.1     4     5         1                      1
                                                                  123      P5.A9     0656_108.3.01.0     4     5         1                      1
                                                                  124      P5.B9     0658_111.3.01.0     4     5         1                      1
                                                                  125       QC            QC             4     5         1                                     1




QC-blank-(dummy) QC sequence at regular intervals
                                                                  126      Blank         Blank           0     5         1                                                   1
                                                                  127       dQ C          dQ C           4     5         1                                           1
                                                                  128      P5.C9     0659_111.3.02.0     4     5         1                      1
                                                                  129      P5.D9     0661_128.3.01.0     4     5         1                      1
                                                                  130     P5. bF4    0597_117.3.02.1     4     5         2                      1
                                                                  131     P5. bC10 0675_129.3.01.1       4     5         2                      1
                                                                  132      P5.E9 0663_130.3.20.0         4     5         1                      1
                                                                  133      P5.F 9    0664_130.3.20.1     4     5         1                      1
                                                                  134      P5.G9     0665_130.3.19.1     4     5         1                      1
                                                                  135      P5.H9     0666_130.3.19.0     4     5         1                      1
                                                                  136      P5.A10    0668_097.3.10.0     4     5         1                      1
                                                                  137      P5.B10    0673_107.3.05.0     4     5         1                      1
                                                                  138       QC             QC            4     5         1                                     1
                                                                  139      Blank         Blank           0     5         1                                                   1
                                                                  140      dQ C          dQ C            4     5         1                                           1
                                                                  141     P5. bB5 0603_098.3.04.0        4     5         2                      1
                                                                  142      P5.C10 0675_129.3.01.1        4     5         1                      1
                                                                  143      P5.D10    0676_129.3.01.0     4     5         1                      1
                                                                  144      P5.E10    0677_118.3.01.1     4     5         1                      1
                                                                  145      P5.F 10   0678_118.3.01.0     4     5         1                      1
                                                                  146      P5.G10 0681_118.3.02.0        4     5         1                      1
                                                                  147     P5. bH10 0683_078.3.05.0       4     5         2                      1
                                                                  148     P5. bD4 0593_077.3.12.1        4     5         2                      1                                            1                         Only integrated for TGs
                                                                  149     P5.H10 0683_078.3.05.0         4     5         1                      1
                                                                  150      P5.A11    0684_065.3.27.0     4     5         1                      1
                                                                  151       QC             QC            4     5         1                                     1
                                                                  152      Blank         Blank           0     5         1                                                   1
                                                                  153      dQ C          dQ C            4     5         1                                           1
                                                                  154      P5.B11    0685_065.3.28.0     4     5         1                      1
                                                                  155      P5.C11    0686_065.3.29.0     4     5         1                      1
                                                                  156      P5.D11    0687_065.3.26.0     4     5         1                      1
                                                                  157      P5.E11    0688_065.3.30.0     4     5         1                      1
                                                                  158      P5.F 11   0689_065.3.22.0     4     5         1                      1
                                                                  159      P5.G11    0690_065.3.20.0     4     5         1                      1
                                                                  160      P5.H11    0691_065.3.24.0     4     5         1                      1
                                                                  161      P5.A12    0693_065.3.23.0     4     5         1                      1




Possible outliers are flagged and, if confirmed, ignored
                                                                  162      P5.B12    0694_065.3.25.0     4     5         1                      1
                                                                  163      P5.C12    0696_112.3.04.0     4     5         1                      1
                                                                  164       QC            QC             4     5         1                                     1
                                                                  165      Blank         Blank           0     5         1                                                   1
                                                                  166       dQ C          dQ C           4     5         1                                           1
                                                                  167      P5.D12    0697_112.3.04.1     4     5         1                      1
                                                                  168      P5.E12    0699_072.3.02.1     4     5         1                      1
                                                                  169      P5.F 12   0692_065.3.21.0     4     5         1                      1
                                                                  170     P5.C0.c          C0            0     5         1                                                           1
                                                                  171     P5.C2.c          C2            2     5         1                                                           1
                                                                  172     P5.C4.c          C4            4     5         1                                                           1
                                                                  173     P5.C6.c          C6            6     5         1                                                           1
                                                                  174     P5.C5.c          C5            5     5         1                                                           1
                                                                  175     P5.C3.c          C3            3     5         1                                                           1
                                                                  176     P5.C7.c          C7            7     5         1                                                           1
                                                                  177     P5.C1.c       C1               1     5         1                                                           1
                                                                  178     P5. bH7 0642_109.3.02.0        4     5         2                      1
                                                                  179       QC            QC             4     5         1                                     1
                                                                  180      Blank         Blank           4     5         1                                                   1
                                                                  181      Blank         Blank           0     5         1                                                   1
                                                                  182      Blank         Blank           0     5         1                                                   1








  Data Acquisition, LC-MS & GC-MS
For one chemical compound, the pattern is
approximately the multiplication of a component
specific mass profile (intensity vs. m/z)
and the abundance at a certain retention time (intensity vs. retention time)

Component specific mass profile:

LC-MS: natural isotopes + adducts (soft ionization)

GC-MS: fragments (hard ionization)
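
The multiplication described on this slide can be sketched as an outer product. The snippet below is only an illustration: the function name and the numbers are hypothetical, not taken from the slides.

```python
def bilinear_signal(mass_profile, elution_profile):
    """Outer product: one row per retention-time scan, one column per m/z channel."""
    return [[abundance * intensity for intensity in mass_profile]
            for abundance in elution_profile]

# Hypothetical relative intensities of one compound's mass profile
mass_profile = [1.0, 0.5, 0.25]
# Hypothetical abundance of that compound at five consecutive retention times
elution_profile = [0.0, 2.0, 6.0, 2.0, 0.0]

signal = bilinear_signal(mass_profile, elution_profile)
for row in signal:
    print(row)
```

Every row of `signal` is the same mass profile scaled by the abundance at that scan, which is why a single compound gives an (approximately) rank-one pattern in the data.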








Raw Data, LC-MS

• Huge amount of data
     ~1000s mass spectra (retention time scans)
     ~10,000s ion chromatograms
     ~1,000,000s (m/z – retention time) pairs
                       For each sample!

[Figure: number of mass channels selected for processing vs scan number; x-axis: scan#, y-axis: # mass channels]

• Complex data
     - Noise (detector noise and chemical noise), spikes, background
     - Concentration differences between the compounds are rather large,
     and therefore so are the intensity differences








  Preprocessing, LC-MS
• Targeted platforms: vendor preprocessing software
   – Expert knowledge => optimized settings

• Untargeted platforms: in-house developed preprocessing software
   – Conversion of manufacturer formats to common formats (e.g. netCDF & mzXML)

   – Centroiding and binning

   – Baseline correction

   – Alignment

   – Peak extraction (requires an estimate of the noise level)

   – Matching of peaks over samples

• Result: feature/peak/compound list
   – m/z & rt: peak area
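
Peak extraction needs a noise estimate, as noted in the list above. The sketch below shows one possible way to obtain it; the median absolute deviation (MAD) estimator, the signal-to-noise factor and the toy chromatogram are all assumptions made for illustration, not the settings of the in-house software.

```python
import statistics

def estimate_noise(intensities):
    """Robust noise level: 1.4826 * median absolute deviation (assumed estimator)."""
    med = statistics.median(intensities)
    mad = statistics.median([abs(x - med) for x in intensities])
    return 1.4826 * mad

def extract_peaks(intensities, snr=3.0):
    """Return indices of local maxima that exceed median + snr * noise."""
    threshold = statistics.median(intensities) + snr * estimate_noise(intensities)
    peaks = []
    for i in range(1, len(intensities) - 1):
        y = intensities[i]
        if y > threshold and y >= intensities[i - 1] and y > intensities[i + 1]:
            peaks.append(i)
    return peaks

# Hypothetical ion chromatogram: small fluctuations plus two real peaks
chromatogram = [1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 30.0, 2.0,
                1.0, 2.0, 1.0, 2.0, 40.0, 1.0, 2.0]
peak_indices = extract_peaks(chromatogram)
print(peak_indices)
```

A robust estimator is used here so that the peaks themselves do not inflate the noise level; a plain standard deviation would be dominated by the two large values.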




  Centroiding




[Figure: the same spectrum shown RAW and CENTROIDED]
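
Centroiding can be sketched as replacing every profile-mode peak by one stick at its intensity-weighted mean m/z. The code below is a simplified illustration: it assumes that a peak is any run of consecutive points above a threshold, which real centroiding software refines considerably. All names and numbers are hypothetical.

```python
def centroid_spectrum(mz, intensity, threshold=0.0):
    """Collapse each run of points above threshold into (centroid m/z, summed intensity)."""
    centroids = []
    run = []  # (m/z, intensity) points of the peak currently being collected

    def flush():
        if run:
            total = sum(i for _, i in run)
            center = sum(m * i for m, i in run) / total
            centroids.append((center, total))
            run.clear()

    for m, i in zip(mz, intensity):
        if i > threshold:
            run.append((m, i))
        else:
            flush()  # a point at or below threshold closes the current peak
    flush()
    return centroids

# Hypothetical raw (profile mode) spectrum with two peaks
mz = [100.00, 100.01, 100.02, 100.03, 100.04, 100.05, 100.06]
intensity = [0.0, 1.0, 2.0, 1.0, 0.0, 3.0, 0.0]
sticks = centroid_spectrum(mz, intensity)
print(sticks)
```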








        m/z shifts within a sample




    Small m/z shifts are probably due to the centroid sampling mode of the MS
    spectra and mass fluctuations during recording




Binning
• Binning algorithm: sum intensities within
  predefined bins = mass ranges

• Definition of bins is a challenge, mostly related to
  the mass resolution (e.g. resolution = 10,000 =>
  define bin 100.00 – 100.01)

• When done incorrectly => large influence on peak
  extraction steps
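
A minimal sketch of the binning step, assuming fixed-width bins of 0.01 (100 bins per m/z unit), as in the example on the slide. The function names and the three data points are hypothetical.

```python
import math
from collections import defaultdict

def bin_width(mz, resolution):
    """Bin width suggested by the mass resolution: delta m = m / R."""
    return mz / resolution

def bin_spectrum(mz_values, intensities, bins_per_unit=100):
    """Sum intensities per bin; the key is the bin index floor(m/z * bins_per_unit)."""
    binned = defaultdict(float)
    for m, i in zip(mz_values, intensities):
        binned[math.floor(m * bins_per_unit)] += i
    return dict(binned)

print(bin_width(100.0, 10_000))  # width of the bin 100.00 - 100.01

# Two centroids fall into bin 100.00 - 100.01, one into bin 100.01 - 100.02
binned = bin_spectrum([100.002, 100.007, 100.013], [5.0, 3.0, 2.0])
print(binned)
```

The example also shows why the bin definition matters: a centroid that drifts across a bin edge ends up in a different mass channel, which then affects peak extraction.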








Background correction

[Figure: TIC (total ion chromatogram) and the background corrected TIC]
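
The slides do not state which background correction method was used. As a hedged illustration only, the sketch below assumes a moving-minimum baseline: the baseline at each scan is the minimum in a surrounding window, and it is subtracted from the signal. The window size and the toy TIC are hypothetical.

```python
def rolling_min_baseline(signal, half_window):
    """Baseline estimate: minimum of the signal in a window around each point."""
    n = len(signal)
    return [min(signal[max(0, i - half_window):min(n, i + half_window + 1)])
            for i in range(n)]

def correct_background(signal, half_window):
    """Subtract the moving-minimum baseline from the signal."""
    baseline = rolling_min_baseline(signal, half_window)
    return [y - b for y, b in zip(signal, baseline)]

# Hypothetical TIC: constant background of 5 with one peak on top
tic = [5.0, 5.0, 5.0, 9.0, 15.0, 9.0, 5.0, 5.0, 5.0]
corrected = correct_background(tic, half_window=3)
print(corrected)
```

The window has to be wider than the widest peak, otherwise part of the peak is treated as background and removed.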




Retention time alignment
[Figure: chromatograms before alignment, full range and detail]








Alignment algorithms
(a dataset to align is warped onto a target dataset)

•   Dynamic Time Warping (DTW)
     – Time point by time point mapping
       (dynamic programming)

•   Correlation Optimized Warping (COW)
     – Piecewise linear, segments instead of
       individual time points (dyn. progr.)
     – Optimization of correlation between
       the two pieces of each dataset
     – Does not allow large retention time
       variation (determined by the slack
       parameter t)

•   (Semi)-Parametric Warping (PTW, Eilers)
     – Global, nonlinear (parametric transfer
       function estimation)
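
Of the three approaches listed, Dynamic Time Warping is the easiest to sketch. The code below is a textbook DTW with an absolute-difference cost and no constraints on the warping path, written as an illustration only; the two short chromatograms are hypothetical.

```python
def dtw(target, sample):
    """Return (total cost, path); path pairs indices (target_index, sample_index)."""
    n, m = len(target), len(sample)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Dynamic programming: cheapest way to align the first i and j points
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(target[i - 1] - sample[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Backtrack from the end to recover the time point by time point mapping
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path

# Hypothetical chromatograms: the same peak, eluting one scan earlier in the sample
target = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]
sample = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0]
distance, path = dtw(target, sample)
print(distance, path)
```

Unconstrained DTW maps every time point individually, so it can also distort peak shapes. COW and parametric warping restrict the warping to segments or to a smooth global function.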




Alignment algorithms

•   Dynamic Time Warping (DTW)
     – Time point by time point mapping
       (dynamic programming)

•   Correlation Optimized Warping (COW)
     – Piecewise linear, segments instead of
       individual time points (dyn. progr.)

•   Parametric Warping (Eilers)
     – Global, nonlinear (parametric transfer
       function estimation)

[Figures: detail of the chromatograms for each of the three methods ("Warped, detail")]








        Peak/Feature extraction and peak integration

   • XCMS                http://metlin.scripps.edu/xcms/index.php


   • MetAlign            http://www.wageningenur.nl/en/show/MetAlign-1.htm


   • TNO-DECO Jellema et al., Chemom. Intell. Lab. Syst. 104 (2010) 132

   • MZExtract van der Kloet et al, submitted




   TNO-DECO
   Works with GC-MS and not too complex LC-MS

   Decomposes experimental data into the product of
   pure mass spectra and concentration profiles of all
   compounds in the sample

   Advantages:
   -Result is combined mass spectrum (identification!!)
   -All samples analyzed at once

   Problems / issues:
   -Least squares (abundant compounds have large
   influence on result)
   -Noise level estimation
   -Correct binning essential
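
The decomposition idea (data = concentration profiles times pure mass spectra, fitted by least squares) can be illustrated as follows. This is not the TNO-DECO algorithm: a generic non-negative factorisation with multiplicative updates stands in for it, and all names and data are hypothetical.

```python
import random

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def residual(x, c, s):
    """Sum of squared differences between the data X and the model C * S."""
    model = matmul(c, s)
    return sum((xv - mv) ** 2 for xr, mr in zip(x, model) for xv, mv in zip(xr, mr))

def decompose(x, n_components, n_iter=200, seed=0, eps=1e-12):
    """Factor X (scans x m/z) into C (scans x k) and S (k x m/z), all non-negative."""
    rng = random.Random(seed)
    c = [[rng.random() + 0.1 for _ in range(n_components)] for _ in x]
    s = [[rng.random() + 0.1 for _ in x[0]] for _ in range(n_components)]
    start = residual(x, c, s)
    for _ in range(n_iter):
        ct = transpose(c)  # update spectra: S *= (C'X) / (C'C S)
        num, den = matmul(ct, x), matmul(matmul(ct, c), s)
        s = [[sv * nv / (dv + eps) for sv, nv, dv in zip(sr, nr, dr)]
             for sr, nr, dr in zip(s, num, den)]
        st = transpose(s)  # update profiles: C *= (X S') / (C S S')
        num, den = matmul(x, st), matmul(c, matmul(s, st))
        c = [[cv * nv / (dv + eps) for cv, nv, dv in zip(cr, nr, dr)]
             for cr, nr, dr in zip(c, num, den)]
    return c, s, start, residual(x, c, s)

# Hypothetical data: two overlapping compounds, 6 scans x 4 mass channels
true_profiles = [[0.0, 1.0, 3.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 3.0, 1.0, 0.0]]
true_spectra = [[1.0, 0.5, 0.0, 0.0], [0.0, 0.0, 1.0, 0.2]]
x = matmul(transpose(true_profiles), true_spectra)

profiles, spectra, start_error, end_error = decompose(x, n_components=2)
print(start_error, end_error)
```

Because the fit minimises squared error, large intensities dominate the result. This is the reason abundant compounds have a large influence, as listed under the problems above.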




Jellema et al., Chemom. Intell. Lab. Syst. 104 (2010) 132-139.








                                Deconvolution

Deconvolution of LC-MS data

[Figure, four panels: baseline corrected data; extracted mass spectra (rt: 14.769, 14.3868, 13.9818 and 14.5777; m/z axis 100 to 1000; labelled peaks include m/z 184, 628, 704, 726, 757, 759, 761 and 785); extracted chromatographic profiles; reconstructed signal]








   MZExtract
   Per sample:
   • Feature extraction of recalibrated and centroided data (in-house)
   • Integration of features (areas)
   • Grouping of features to feature-sets
     (enrichment step, knowledge based: isotopes, adducts)

   Over samples:
   • Match feature-sets

   Advantage of two-step approach: fully scalable
   solution (parallel implementation)


van der Kloet, submitted.




  Grouping related features within a single sample




 No retention time window necessary to
 match features (only isotopic patterns or
 other known relations, e.g. adducts)
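
Grouping by known relations can be sketched as linking features whose m/z difference matches an isotope spacing or an adduct shift. The code below is an illustration, not the MZExtract implementation: it assumes singly charged ions, considers only two relations, and uses a hypothetical tolerance and hypothetical m/z values.

```python
C13_SPACING = 1.003355       # mass difference between 13C and 12C
NA_VS_H_SHIFT = 21.981943    # [M+Na]+ minus [M+H]+

def group_features(mz_values, tol=0.005):
    """Link features related by a known m/z difference; return the feature-sets."""
    relations = (C13_SPACING, NA_VS_H_SHIFT)
    ordered = sorted(mz_values)
    parent = {m: m for m in ordered}

    def find(m):
        # Follow parent links to the representative of the feature-set
        while parent[m] != m:
            m = parent[m]
        return m

    for i, low in enumerate(ordered):
        for high in ordered[i + 1:]:
            if any(abs((high - low) - delta) <= tol for delta in relations):
                parent[find(high)] = find(low)

    groups = {}
    for m in ordered:
        groups.setdefault(find(m), []).append(m)
    return list(groups.values())

# Hypothetical features: an isotope pattern, its sodium adduct and an unrelated feature
features = [760.585, 761.588, 762.592, 782.567, 496.340]
feature_sets = group_features(features)
print(feature_sets)
```

Features that end up alone, such as 496.340 here, correspond to the 'single' features mentioned in the validation.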








Validation
Target list from MassHunter (Agilent) used to
locate 174 known targets.
   – Mass window -> resolution 10,000
   – RT window -> +/- 10 seconds

   – 171 were found
   – 3 missing targets: no isotopic patterns were
     detected (they were found in the list of ‘single’
     features)
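
A minimal sketch of locating known targets in a feature list. It assumes that the mass window derived from a resolution of 10,000 is m/z divided by the resolution, which is one interpretation of the slide. The target names and all values are hypothetical.

```python
def find_targets(targets, features, resolution=10_000, rt_window=10.0):
    """targets: name -> (m/z, rt in s); features: list of (m/z, rt in s)."""
    found, missing = {}, []
    for name, (t_mz, t_rt) in targets.items():
        mz_tol = t_mz / resolution  # assumed mass window
        hits = [f for f in features
                if abs(f[0] - t_mz) <= mz_tol and abs(f[1] - t_rt) <= rt_window]
        if hits:
            # Keep the hit closest in m/z when several features qualify
            found[name] = min(hits, key=lambda f: abs(f[0] - t_mz))
        else:
            missing.append(name)
    return found, missing

# Hypothetical target list and extracted features
targets = {"A": (500.0, 120.0), "B": (760.585, 300.0), "C": (300.0, 60.0)}
features = [(500.02, 125.0), (760.60, 330.0), (300.01, 55.0)]

found, missing = find_targets(targets, features)
print(found, missing)
```

Target B is reported as missing because its only candidate elutes 30 seconds away from the expected retention time.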




How to validate unknown feature-sets?
here: selection based on QC presence


                    about 3,200 unknown feature-sets

                    Comparable: 1,175 feature-sets

                    Low abundant: 366 feature-sets








 PLS-DA, Selectivity ratio*, to quantify the
 variables' discriminatory ability




The low abundant feature-sets do have biological relevance!
The most important feature-set is an unknown!

*Anal. Chem. 2009, 81, 2581–2590




                        Quality Assessment
• Make use of all additional measured compounds
  and samples
   – Internal Standards
   – Replicates
   – Blanks
   – Quality Control samples


• Quality Assessment => QC report (in-house)








     Part of a measurement run
     [Figure: response vs. measurement order; legend: QC sample, study sample, replicate study sample]




            QC report overview table
                        ANOVA for batch to batch variation

    RSD values for
    • QC samples
    • Replicate samples (independent validation)

                 N        mean     std        RSDqc    RSD reps p-value     diffs
CholE02              58     0.0298     0.0079    26.4%   21.4%      0.000   (2-1,3-1,3-2,4-2,4-3)
CholE04              46     0.0240     0.0124    51.9%   40.6%
CholE05              58     0.0120     0.0024    20.4%   19.1%      0.000   (2-1,3-1,4-1,3-2,4-3)
CholE06              58     0.0085     0.0021    24.7%   19.5%      0.000   (3-1,3-2,4-3)
DG02                 58     0.0049     0.0011    23.4%   22.7%      0.000   (2-1,3-1,4-1,3-2,4-2,4-3)
LPC01                58     0.0183     0.0009     4.7%    4.8%      0.000   (4-1,4-2,4-3)
LPC02                58     0.0130     0.0015    11.7%   11.5%      0.000   (2-1,3-1,4-1)
LPC03                58     0.0101     0.0010     9.5%   12.1%      0.360
LPC04                58     0.0436     0.0019     4.4%    5.4%      0.000   (2-1,4-1,3-2,4-3)
LPC05                58     1.8684     0.1259     6.7%    6.8%      0.000   (2-1,3-1,4-1,3-2,4-2,4-3)
LPC07                58     0.0109     0.0007     6.1%    6.4%      0.004   (4-2)
LPC08                58     0.6096     0.0141     2.3%    3.2%      0.000   (2-1,3-1,4-1,3-2,4-2,4-3)
LPC09                58     0.4170     0.0200     4.8%    4.8%      0.000   (3-1,4-1,3-2,4-2,4-3)
LPC10                58     0.6625     0.0976    14.7%   13.8%      0.000   (2-1,3-1,4-1,3-2,4-2,4-3)
LPC11                58     0.0394     0.0446   113.1%   57.6%      0.000   (2-1,3-2,4-2,4-3)
LPC12                58     0.1126     0.0024     2.1%    3.6%      0.000   (2-1,3-1,3-2,4-2,4-3)
LPC13                58     0.0425     0.0049    11.5%    9.8%      0.000   (3-1,4-1,3-2,4-2)
LPC14                58     0.0311     0.0010     3.3%    3.7%      0.000   (2-1,3-1,4-2,4-3)
LPC16                58     0.0064     0.0016    24.9%   28.7%      0.000   (4-1,3-2,4-2,4-3)
LPC17                58     0.0033     0.0010    32.0%   36.4%      0.000   (3-1,4-1,3-2,4-2,4-3)
LPE02                58     0.0303     0.0056    18.6%   19.4%      0.000   (2-1,4-1,3-2,4-2,4-3)
LPE04                43     0.0034     0.0011    33.1%   21.9%
PC01                 58     0.0832     0.0105    12.6%   12.5%      0.000   (4-1,4-2,4-3)
PC02                 58     0.3333     0.0151     4.5%    4.6%      0.000   (2-1,4-1,4-2,4-3)
PC03                 58     0.2238     0.0077     3.4%    3.7%      0.000   (2-1,3-1,4-1,4-2,4-3)
PC04                 58     0.1257     0.0040     3.1%    4.8%      0.000   (3-1,4-1,3-2,4-3)
PC05                 58     0.0674     0.0248    36.8%   35.9%      0.000   (2-1,3-1,4-1,3-2,4-3)
PC06                 58     0.0667     0.0084    12.7%   10.1%      0.000   (2-1,4-1,3-2,4-3)
PC07                 58     0.0225     0.0026    11.5%   14.2%      0.000   (2-1,3-1,4-1,4-2,4-3)
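
The two RSD columns of the report can be computed along the following lines. The slides do not say how the replicate RSD was obtained, so the pooled duplicate formula below is an assumption, and the numbers are hypothetical rather than taken from the table.

```python
import math
import statistics

def rsd_percent(values):
    """Relative standard deviation in percent: 100 * std / mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)

def pooled_duplicate_rsd_percent(pairs):
    """Pooled RSD over duplicate measurements (assumed formula).

    For each pair the standard deviation is |a - b| / sqrt(2); the relative
    variances of all pairs are averaged before taking the square root.
    """
    relative_variances = []
    for a, b in pairs:
        sd = abs(a - b) / math.sqrt(2.0)
        relative_variances.append((sd / ((a + b) / 2.0)) ** 2)
    return 100.0 * math.sqrt(sum(relative_variances) / len(relative_variances))

# Hypothetical responses of one compound
qc_responses = [98.0, 100.0, 102.0]            # repeated injections of the QC sample
replicate_pairs = [(10.0, 10.0), (9.0, 11.0)]  # duplicate study samples

rsd_qc = rsd_percent(qc_responses)
rsd_reps = pooled_duplicate_rsd_percent(replicate_pairs)
print(rsd_qc, rsd_reps)
```

The replicate RSD serves as an independent check because the replicates are not used in any correction step, whereas the QC samples may be.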








Uncorrected Peak areas








QC samples only




                  Ratio (unc)Area: RSD QC = 25.8%








       Internal standard

                       RSDQC=25.8%




Internal Standard Corrected data




                             RSDQC=20.6%
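
Internal standard correction divides each analyte peak area by the area of the internal standard in the same injection. The sketch below uses hypothetical areas in which the two signals vary together, so the ratio removes all variation. In the real data shown above the improvement is smaller (25.8% to 20.6%).

```python
import statistics

def rsd_percent(values):
    return 100.0 * statistics.stdev(values) / statistics.mean(values)

def istd_correct(analyte_areas, istd_areas):
    """Ratio of analyte area to internal standard area, per injection."""
    return [a / s for a, s in zip(analyte_areas, istd_areas)]

# Hypothetical QC injections: both signals fluctuate by the same factor
analyte_areas = [100.0, 80.0, 120.0]
istd_areas = [50.0, 40.0, 60.0]

ratios = istd_correct(analyte_areas, istd_areas)
print(rsd_percent(analyte_areas), rsd_percent(ratios))
```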








     Intra and Inter batch variation
•   Analytical Column ‘aging’
•   Analytical Column replacement
•   Eluent ‘refills’ and small variations
•   Instrument malfunction/breakdown
    – Etc…




    Intra and Inter batch correction
• Instead of just monitoring QC sample
  responses use them to correct variation








                                  QC correction
[Figure: response vs. measurement order; legend: QC sample, study sample, penalized smoother]

Van der Kloet et al., Journal of Proteome Research 2009
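
The idea of QC correction can be sketched as estimating the drift from the QC responses and dividing every sample by the drift at its position in the measurement order. The code below is not the method of the cited paper: piecewise-linear interpolation between QC samples stands in for the penalized smoother, and the data are hypothetical.

```python
import statistics

def qc_correct(order, response, is_qc):
    """Divide each response by the interpolated QC drift, rescaled to the median QC."""
    qc = sorted((o, r) for o, r, q in zip(order, response, is_qc) if q)
    reference = statistics.median([r for _, r in qc])

    def drift(o):
        # Outside the QC range the nearest QC response is used
        if o <= qc[0][0]:
            return qc[0][1]
        if o >= qc[-1][0]:
            return qc[-1][1]
        for (o0, r0), (o1, r1) in zip(qc, qc[1:]):
            if o0 <= o <= o1:
                return r0 + (r1 - r0) * (o - o0) / (o1 - o0)

    return [r * reference / drift(o) for o, r in zip(order, response)]

# Hypothetical batch: QC response drifts from 100 to 60, study samples in between
order = [1, 2, 3, 4, 5]
response = [100.0, 45.0, 80.0, 35.0, 60.0]
is_qc = [True, False, True, False, True]

corrected = qc_correct(order, response, is_qc)
print(corrected)
```

After correction the QC samples coincide by construction, so their RSD is no longer an independent measure of quality. This is why replicate samples are needed for validation.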




                                  QC correction
[Figure: response vs. measurement order, before and after QC correction]


Van der Kloet et al., Journal of Proteome Research 2009








                                  QC correction




van der Kloet et al., Journal of Proteome Research 2009




                                  QC correction




van der Kloet et al., Journal of Proteome Research 2009








ISTD/QC corrected data




                   RSDQC=4.1%
                   RSDreplicates=10.0%




     All samples








                   All batches




                Correction charts




RSDQC




RSDReplicates








                    Scores plot based upon 93 lipids
                           Uncorrected Area

Differences between batches. Clear trends in QC samples.

[Figure: scores plot based on 93 components (Peak Area); x-axis: PC 1 (39.3%), y-axis: PC 2 (14%); legend: batch 1, batch 2, batch 3, batch 4, QC samples]




  Scores plot based upon 93 lipids
               ISTD correction

Smaller differences between batches. Spread in QC samples greatly reduced.
However, batch to batch differences remain present.

[Figure: scores plot based on 93 components (ISTD correction); x-axis: PC 1 (21.3%), y-axis: PC 2 (14.8%); legend: batch 1, batch 2, batch 3, batch 4, QC samples]








       Scores plot based upon 93 lipids

  [Figure: Scores plot based on 93 components with RSDqc < 0.15 and RSDreps < 0.15; x-axis PC 1 (22.9%), y-axis PC 2 (14.7%); legend: batch 1, batch 2, batch 3, batch 4, QC samples]
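
The selection criterion RSDqc < 0.15 can be sketched as follows. The relative standard deviation (RSD) is taken here as sample SD divided by mean over the QC injections, which is an assumption about the exact definition used. The replicate criterion (RSDreps) would be applied in the same way to an RSD computed from replicate samples. The function names and data are hypothetical.

```python
import numpy as np

def rsd(values, axis=0):
    """Relative standard deviation: sample SD divided by mean along an axis."""
    values = np.asarray(values, dtype=float)
    return values.std(axis=axis, ddof=1) / values.mean(axis=axis)

def select_features(qc_data, threshold=0.15):
    """Boolean mask of features whose RSD over the QC injections is below threshold."""
    return rsd(qc_data, axis=0) < threshold

# Hypothetical QC data: 4 QC injections (rows) x 3 features (columns).
qc = np.array([[10.0, 5.0, 1.0],
               [10.5, 9.0, 1.1],
               [ 9.5, 2.0, 0.9],
               [10.0, 7.0, 1.0]])

mask = select_features(qc)   # the second feature is too variable in the QCs
```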




                         Combining data in systems biology

Comprehensive view of patient, animal, …:
e.g. combine genomics, proteomics & metabolomics data
   Data integration / fusion:
   joining data from different measurement approaches, same objects
   [Diagram: blocks 1 and 2 side by side; same objects (rows), different variables (columns)]

Increase power of statistical analyses:
combine e.g. metabolomics batch datasets
   ‘Equating’ (*):
   making data comparable that come from the same measurement approach but different objects
   [Diagram: blocks 1 and 2 stacked; same variables (columns), different objects (rows)]

*Equating is a psychometric term
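
In array terms the two situations differ only in the direction in which the blocks are joined. The sketch below uses hypothetical block sizes and placeholder values.

```python
import numpy as np

# Hypothetical data blocks; rows are objects (samples), columns are variables.
metabolomics = np.ones((5, 3))       # 5 objects x 3 metabolite variables
proteomics = np.zeros((5, 4))        # the same 5 objects x 4 protein variables
batch_1 = np.ones((5, 3))            # 5 objects x 3 variables
batch_2 = np.full((7, 3), 2.0)       # 7 other objects x the same 3 variables

# Data integration / fusion: same objects, different variables -> join column-wise.
fused = np.hstack([metabolomics, proteomics])

# Equating scenario: same variables, different objects -> stack row-wise.
# This is only meaningful once both blocks are on a comparable scale.
stacked = np.vstack([batch_1, batch_2])
```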








         Why not just concatenate datasets?

  • ‘Omics data are typically batch data
  • Metabolomics is often not quantitative, so datasets are not comparable
  • Calibration model transfer would be a solution, but often no full calibration models can be made!*

  [Diagram: blocks 1 and 2 stacked (objects × variables), marked with a question mark]

 *Sangster et al, The Analyst 2006 (131): 1075-1078




       A proposed approach: QC samples
      Correction for structural differences between series
      using quality control (QC) samples (pooled samples
      or representative samples)*




                          (picture from reference below)

*van der Greef et al, J Proteome Res 2007 (6): 1540-1559








           Problem with QC sample approach
     • Rationale: make medians of QC data equal for all series
     • Unwanted side-effect: inflation of variation in rest of data

  [Figure: Inflation of MAD in series 2 relative to series 1; x-axis: lipid compounds, y-axis: MAD; legend: Series 1; Series 2, uncorrected; Series 2, QC-corrected]

MAD: median absolute deviation (robust SD)
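
A minimal sketch of why equalising QC medians can inflate the spread of the study samples. It assumes a multiplicative correction (one scale factor per series and per compound), which the slides do not state explicitly. The function names and numbers are hypothetical, and the MAD is left unscaled.

```python
import numpy as np

def mad(x):
    """Median absolute deviation from the median (unscaled)."""
    x = np.asarray(x, dtype=float)
    return np.median(np.abs(x - np.median(x)))

def qc_median_correct(samples, qc_this_series, qc_reference_series):
    """Scale one series so that its QC median equals the QC median of the reference series."""
    factor = np.median(qc_reference_series) / np.median(qc_this_series)
    return np.asarray(samples, dtype=float) * factor

# Hypothetical single-lipid data for two series.
qc_s1 = np.array([10.0, 10.0, 10.0])          # QC samples, series 1
qc_s2 = np.array([5.0, 5.0, 5.0])             # QC samples read half as high in series 2
s2 = np.array([8.0, 9.0, 10.0, 11.0, 12.0])   # study samples of series 2

s2_corrected = qc_median_correct(s2, qc_s2, qc_s1)   # scale factor is 2
```

Here the QC medians become equal, but the MAD of the series 2 study samples doubles from 1 to 2: any scale factor above 1 inflates the variation by the same factor.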




                    Alternative solution: equating
     • Combination of data from different measurement series
     • …in studies with limited number of internal standards (typically metabolomics!)
     • …or even from different studies
     • General: enables maximal flexibility in subsequent data analysis on combined datasets

  [Diagram: blocks 1 and 2 stacked; same variables, different objects]








          Illustration: LC–MS data
• 182 (54 + 128) healthy participants (Netherlands Twin Register)*
• Blood samples (overnight fasting)
• Plasma analyzed with liquid chromatography–MS method for lipids

  Target list for 59 lipids: LPC / PC / SPM / ChE / TG

  Data per lipid corrected for class-specific internal standard

  Measured in two series: year 1 (Y1) N=54 + year 2 (Y2) N=128

                                                *Draisma et al, OMICS 2008: 17–31




       PCA scores before equating

  [Figure: PCA scores plot with the Y1 and Y2 groups labelled]

              Data mean-centered prior to PCA
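
For reference, PCA scores of mean-centered data can be obtained from a singular value decomposition. The sketch below uses simulated data with the same dimensions as the two series (54 and 128 objects, 59 variables) and an artificial offset between them. It is not the lipid data, and the function name is hypothetical.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Mean-center X per variable; return PCA scores and explained-variance fractions."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                          # mean-centering per column
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]  # object coordinates on the PCs
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]

# Simulated stand-in data: two series with a constant offset on every variable.
rng = np.random.default_rng(0)
Y1 = rng.normal(0.0, 1.0, size=(54, 59))
Y2 = rng.normal(3.0, 1.0, size=(128, 59))

scores, explained = pca_scores(np.vstack([Y1, Y2]))
```

With such an offset the first component mainly separates the two series, which is the kind of between-series effect that equating is meant to remove.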








     Univariate quantile equating
• Quantiles: values marking boundaries between regular intervals of the cumulative distribution function (CDF)
• Example: 54 data values and associated CDF

  [Figure: CDF of 54 data values with the 0.48, 0.50 (= median) and 0.52 quantiles marked; adjacent quantiles are 1/54 apart]
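
A small sketch of the empirical CDF for 54 values, using simulated numbers rather than the data shown on the slide. Each sorted value raises the CDF by 1/54, and the 0.50 quantile is the median.

```python
import numpy as np

def empirical_cdf(values):
    """Return the sorted values and the empirical CDF evaluated at each of them."""
    x = np.sort(np.asarray(values, dtype=float))
    cdf = np.arange(1, x.size + 1) / x.size      # steps of 1/N
    return x, cdf

rng = np.random.default_rng(1)
data = rng.normal(size=54)                       # hypothetical 54 data values

x, cdf = empirical_cdf(data)
step = cdf[1] - cdf[0]                           # 1/54
median = np.quantile(data, 0.50)                 # the 0.50 quantile
```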




         Univariate quantile equating
Average values of corresponding quantiles

  [Figure: CDFs of Y1 and Y2; at CDF(x) = 0.50, x = 1.81 for Y1 and x = 2.64 for Y2]

                    Data from: Frisby & Clatworthy, Perception 1975: 173-178








                  Quantile equating
Algorithm:

    1. Number of quantiles = min {N1 , N2, …}

    2. Average values of corresponding quantiles by projection onto
       unit vector $(1/\sqrt{n}, \ldots, 1/\sqrt{n})$

    3. Substitute averaged values for original values belonging
       to each quantile

   Often applied for quantile normalization (*)
   of gene arrays, between arrays (objects) over probes (variables)

                                             *Bolstad et al, Bioinformatics 2003: 185–193
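
The three steps can be sketched for a single variable measured in several series. This is one possible reading, not the implementation behind the slides: the probability grid (evenly spaced from 0 to 1), the nearest-quantile substitution for the longer series and the handling of ties are all assumptions, and `quantile_equate` is a hypothetical name.

```python
import numpy as np

def quantile_equate(series):
    """Quantile-equate several 1-D series (one variable, possibly different lengths)."""
    series = [np.asarray(s, dtype=float) for s in series]
    n_q = min(s.size for s in series)                  # step 1: number of quantiles
    probs = np.linspace(0.0, 1.0, n_q)                 # assumed probability grid
    # step 2: average the values of corresponding quantiles across series
    averaged = np.mean([np.quantile(s, probs) for s in series], axis=0)
    equated = []
    for s in series:
        ranks = np.argsort(np.argsort(s))              # 0-based rank of every value
        # step 3: replace every value by the averaged value of its nearest quantile
        idx = np.rint(ranks / max(s.size - 1, 1) * (n_q - 1)).astype(int)
        equated.append(averaged[idx])
    return equated

y1 = np.array([1.0, 2.0, 3.0])
y2 = np.array([30.0, 10.0, 20.0])
eq1, eq2 = quantile_equate([y1, y2])                   # equal lengths

y3 = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
eq1b, eq3 = quantile_equate([y1, y3])                  # unequal lengths
```

For series of equal length this reduces to averaging the sorted values and putting the averages back in the original order, as in quantile normalization. For unequal lengths the longer series is mapped onto the shorter quantile grid.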




      Example univariate quantile equating

  [Figure: CDFs of Y1 and Y2 before and after equating (projection onto unit vector: averaging), with a Q-Q plot of CDF Y2 against CDF Y1]








             PCA scores after equating LC–MS data

  [Figure: PCA scores plots before and after equating; red: Y1, black: Y2]

                           Data mean-centered prior to PCA




             Y1–Y2 similarity in PCA score space*

direction: PCA loadings
location: Mahalanobis’ D²
variance: Box’s M statistic

  [Figure: PCA scores plot (PC3 axis shown) with the Y1 and Y2 groups labelled]

                                 *Jouan-Rimbaud et al, Chemom Intell Lab Syst 1998: 129-144
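
As an illustration of the location criterion only, the squared Mahalanobis distance between two group means in score space can be computed with a pooled covariance matrix. This sketch gives the raw D² and does not reproduce how the cited reference turns it into a similarity parameter. The function name and the data are hypothetical.

```python
import numpy as np

def mahalanobis_d2(scores_a, scores_b):
    """Squared Mahalanobis distance between two group means, using the pooled covariance."""
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    diff = a.mean(axis=0) - b.mean(axis=0)
    pooled = ((a.shape[0] - 1) * np.cov(a, rowvar=False)
              + (b.shape[0] - 1) * np.cov(b, rowvar=False)) / (a.shape[0] + b.shape[0] - 2)
    pooled = np.atleast_2d(pooled)                     # keeps the one-variable case valid
    return float(diff @ np.linalg.solve(pooled, diff))

# Hypothetical scores on two PCs: group b is group a shifted by 2 along the first PC.
a = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = a + np.array([2.0, 0.0])

d2 = mahalanobis_d2(a, b)
```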








          Y1–Y2 similarity in PCA score space

  [Figure: direction, variance and location similarity parameters, before equating and after equating]

                   All parameters: 0 = ‘dissimilar’, 1 = ‘similar’

    Jouan-Rimbaud et al, Chemom Intell Lab Syst (1998) 129-144




                 Effects on clustering results

No equating, Y1–Y2 datasets combined:
Obvious between-series effect

  [Figure: clustering result with both axes grouped into Y2 and Y1]

Draisma et al, Anal Chem (2010) 82 1039-1046








                 Effects on clustering results

After quantile equating, Y1–Y2 datasets combined:
Y1–Y2 effect removed
Biological information extractable from combined dataset

  [Figure: clustering result with both axes grouped into ♂ and ♀]

Draisma et al, Anal Chem (2010) 82 1039-1046




                                Conclusions
• ‘Garbage in = Garbage out’ so try to control data
  quality as much as possible

• Proper measurement design allows separation of
  unwanted experimental variation from biological
  variation (IS, QCs, replicates)

• Preprocessing: trade-off between data quality, speed
  (automation) and completeness (number of features)

• Road to high-quality data is balanced mix of data
  acquisition and data processing








             Acknowledgements
• DCL
  –   Jorne Troost
  –   Evelyne Steenvoorden
  –   Shanna Shi
  –   Faisa Galud
  –   Rob Vreeken
  –   Amy Harms
  –   Raymond Ramakers
  –   Irina Paliukovich
  –   Adrie Dane
• LACDR
  –   Frans van der Kloet
  –   Katrin Strassbourgh
  –   Vanessa Gonzalez
  –   Margriet Hendriks
  –   Harmen Draisma
  –   Thomas Hankemeier




Metabolomics: data acquisition, pre-processing and quality control

  • 1. 14-2-2013 Metabolomics: data acquisition, preprocessing & quality control Theo Reijmers, Analytical BioSciences, Leiden University Barcelona, 14-02-2013 Coenzymes (vitamines) Amino acids carbohydrates hormones nucleotides Amino acids lipids 1
  • 2. 14-2-2013 The metabolome • Metabolites chemical compounds with low molecular weight dynamic range 109 concentration • Many chemical classes, with different chemical properties (different from proteomics) polarity log P –6 to 14 • Large differences in mass < 1500 Da abundance The metabolome global screen dynamic range 109 NMR concentration LC-MS custom polarity log P –6 to 14 targeted mass < 1500 Da 2
  • 3. 14-2-2013 Analytical strategies: 1H NMR Advantages • Straightforward sample preparation • High sample throughput (robotic control) • Chemical shifts stable (if pH kept constant) • Quantification without standards • Highly repeatable and reproducible • Very valuable for identification of isolated metabolites Disadvantages • Limited sensitivity • Identification in complex mixtures rather difficult Analytical strategies: LC-MS and GC-MS • Chromatography: separation of compounds in sample • Mass-spectrometry: detection of ions based on mass-to-charge ratio (m/z) 3
  • 4. 14-2-2013 Chromatography Separation of chemical compounds based on chemical properties chromatogram Types of interaction: A B C A. Surface adsorption B. Solvent partitioning C. Ion exchange Mass spectrometer separation of charged particles in the gas phase separation based on mass-to-charge ratio (m/z) mass mass ionisation detector analyser analyser 4
  • 5. 14-2-2013 LC-MS vs GC-MS Liquid C-MS Gas C-MS Advantages: Advantages: •Fast • Highly reproducible retention times •Efficient • Sensitive detection for all metabolites •Sensitive • Characteristic mass fingerprint (identification!) •Wide range of compounds Disadvantages: Disadvantages: •Unstable* • Derivatization is needed to include •Sensitivity compound dependent polar analytes •Ion suppression gives rubbish data •Relative quantification (if no authentic standard is available) *About as stable as a chocolate teapot in a heatwave. (Wilson 2009) Demonstration & Competence Lab • Applying technology developed in core in associate projects with industry, academia, clinics, knowledge institutes • Validation and implementation of metabolomics platforms • QA/QC system/error model per metabolite • Clinical & preclinical studies (projects with partners) • >15 000 samples/year • > 2000 metabolites • Identification pipeline • Training & hands-on-workshops 5
  • 6. 14-2-2013 Platforms • Lipid analysis by LC-MS (ca. 300 individual compounds) • Amine analysis by LC-MS/MS (ca. 120 compounds) • Oxylipin analysis (ca. 140 compounds) • Global profiling by RP-LC-MS (ca. 450 compounds identified) • Global profiling by GC-MS (ca. 150 compounds) • Global profiling by CE-MS (ca. 300 compounds) • And more under development Large Metabolomics Measurement series DCL • IOP biomarkers for healthy aging – ±2500 samples, 28 batches – Measurement time ±28 weeks • Matching project LUMC and NCHA Netherlands centre for healthy Aging • Dutch Twin Register (NTR) – ±3000 samples, 31 batches – Measurement time ± 30 weeks • Dutch Twin Register (Nederlands Tweeling Register, NTR) • DiOGenes Diet, Obesity and Genes – ± 2000 samples, 27 batches – Measurement time ±14 weeks • NMC Associate project N & H cluster 6
  • 7. Measurement Design
    • Randomization, replication & blocking of measurements
    • Inclusion of compounds & samples to monitor (and eventually correct for) quality
      – Internal standards
      – Calibration samples
      – Quality Control (QC) samples
      – Replicate samples (technical & analytical)
      – Blanks
      – System suitability samples
      – Transfer samples
    Typical sample sequence list
    Annotations on the list:
      – Technical samples: system cleaning, testing and equilibrating
      – Running samples
      – Calibration blocks at regular intervals
      – QC-blank-(dummy) QC sequence at regular intervals
      – Possible outliers are flagged and, if confirmed, ignored
    Columns in the original list: Order, Name, Id, Level, Batch, Preparation, Injection, isSample, isSST, isQC, isdQC, isBlank, isCal, isOutlier, isSuspect, Comment (the flag shown below is the isOutlier/isSuspect mark)
    Order Name      Id               Level Batch Prep Inj  Flag / comment
    1     Blank     Blank            0     5     1    1
    2     Blank     Blank            0     5     1    1
    3     Blank     Blank            0     5     1    1
    4     Blank     Blank            0     5     1    1
    5     dSST.C2   dSST.C2          2     5     1    1
    6     SST.C2    SST.C2           2     5     1    1
    7     dQC       dQC              4     5     1    1
    8     QC        QC               4     5     1    1
    9     P5.C6.a   C6               6     5     1    1
    10    P5.C7.a   C7               7     5     1    1
    11    P5.C0.a   C0               0     5     1    1
    12    P5.C1.a   C1               1     5     1    1
    13    P5.C4.a   C4               4     5     1    1
    14    P5.C5.a   C5               5     5     1    1
    15    P5.C2.a   C2               2     5     1    1
    16    P5.C3.a   C3               3     5     1    1
    17    P5.C1     0543_090.3.01.0  4     5     1    1
    18    P5.D1     0546_094.3.01.0  4     5     1    1
    19    P5.E1     0550_076.3.01.0  4     5     1    1
    20    QC        QC               4     5     1    1
    21    Blank     Blank            0     5     1    1
    22    dQC       QC               4     5     1    1    1
    23    P5.F1     0553_015.3.15.0  4     5     1    1
    24    P5.G1     0555_097.3.01.0  4     5     1    1
    25    P5.H1     0556_097.3.01.1  4     5     1    1    1  There might be something wrong here
    26    P5.A2     0559_077.3.05.0  4     5     1    1
    27    P5.B2     0561_103.3.01.1  4     5     1    1    1  Something wrong here
    28    P5.C2     0563_103.3.01.0  4     5     1    1
    29    P5.D2     0564_093.3.03.0  4     5     1    1
    30    P5.E2     0570_095.3.01.0  4     5     1    1
    31    P5.bE1    0550_076.3.01.0  4     5     2    1
    32    P5.bA7    0631_057.3.09.0  4     5     2    1
    33    QC        QC               4     5     1    1
    34    Blank     Blank            0     5     1    1
    35    dQC       dQC              4     5     1    1
    36    P5.F2     0571_105.3.04.0  4     5     1    1
    37    P5.G2     0573_105.3.03.0  4     5     1    1
    38    P5.H2     0574_099.3.02.0  4     5     1    1
    39    P5.A3     0575_099.3.01.0  4     5     1    1
    40    P5.B3     0577_099.3.03.0  4     5     1    1
    41    P5.C3     0578_099.3.01.1  4     5     1    1
    42    P5.D3     0581_096.3.01.0  4     5     1    1
    43    P5.E3     0582_101.3.01.0  4     5     1    1
    44    P5.F3     0584_123.3.01.0  4     5     1    1
    45    P5.G3     0585_085.3.01.0  4     5     1    1
    46    QC        QC               4     5     1    1
    47    Blank     Blank            0     5     1    1
    48    dQC       dQC              4     5     1    1
    49    P5.H3     0587_085.3.01.1  4     5     1    1
    50    P5.A4     0589_095.3.01.1  4     5     1    1
    51    P5.B4     0590_105.3.01.0  4     5     1    1
    52    P5.C4     0591_105.3.02.0  4     5     1    1
    53    P5.D4     0593_077.3.12.1  4     5     1    1
    54    P5.E4     0594_077.3.12.0  4     5     1    1
    55    P5.bF9    0664_130.3.20.1  4     5     2    1
    56    P5.bF10   0678_118.3.01.0  4     5     2    1
    57    P5.F4     0597_117.3.02.1  4     5     1    1
    58    P5.G4     0598_117.3.02.0  4     5     1    1
    59    QC        QC               4     5     1    1
    60    Blank     Blank            0     5     1    1
    61    dQC       dQC              4     5     1    1
    62    P5.H4     0599_117.3.01.1  4     5     1    1
    63    P5.A5     0600_117.3.01.0  4     5     1    1
    64    P5.B5     0603_098.3.04.0  4     5     1    1
    65    P5.C5     0604_098.3.02.0  4     5     1    1
    66    P5.D5     0605_098.3.01.0  4     5     1    1
    67    P5.E5     0606_098.3.01.1  4     5     1    1
    68    P5.bB3    0577_099.3.03.0  4     5     2    1
    69    P5.bH3    0587_085.3.01.1  4     5     2    1
    70    P5.F5     0607_015.3.16.0  4     5     1    1
    71    P5.G5     0608_078.3.02.0  4     5     1    1
    72    QC        QC               4     5     1    1
    73    Blank     Blank            0     5     1    1
    74    dQC       dQC              4     5     1    1
    75    P5.H5     0609_078.3.03.0  4     5     1    1
    76    P5.A6     0611_078.3.01.0  4     5     1    1
    77    P5.B6     0612_088.3.02.0  4     5     1    1
    78    P5.C6     0613_088.3.01.0  4     5     1    1
    79    P5.D6     0616_085.3.02.0  4     5     1    1
    80    P5.E6     0618_094.3.05.0  4     5     1    1
    81    P5.bE6    0618_094.3.05.0  4     5     2    1
    82    P5.bB10   0673_107.3.05.0  4     5     2    1
    83    P5.bG1    0555_097.3.01.0  4     5     2    1
    84    P5.bC4    0591_105.3.02.0  4     5     2    1
    85    QC        QC               4     5     1    1
    86    Blank     Blank            0     5     1    1
    87    dQC       dQC              4     5     1    1
    88    P5.C3.b   C3               3     5     1    1
    89    P5.C7.b   C7               7     5     1    1
    90    P5.C2.b   C2               2     5     1    1
    91    P5.C6.b   C6               6     5     1    1
    92    P5.C5.b   C5               5     5     1    1
    93    P5.C4.b   C4               4     5     1    1
    94    P5.C0.b   C0               0     5     1    1
    95    P5.C1.b   C1               1     5     1    1
    96    P5.F6     0620_107.3.01.0  4     5     1    1
    97    P5.G6     0629_092.3.01.1  4     5     1    1
    98    P5.H6     0630_092.3.01.0  4     5     1    1
    99    QC        QC               4     5     1    1
    100   Blank     Blank            0     5     1    1
    101   dQC       dQC              4     5     1    1
    102   P5.A7     0631_057.3.09.0  4     5     1    1
    103   P5.B7     0632_057.3.09.1  4     5     1    1
    104   P5.C7     0634_091.3.01.0  4     5     1    1
    105   P5.D7     0635_015.3.17.0  4     5     1    1
    106   P5.E7     0638_072.3.01.0  4     5     1    1
    107   P5.F7     0639_066.3.03.0  4     5     1    1
    108   P5.G7     0640_066.3.03.1  4     5     1    1
    109   P5.H7     0642_109.3.02.0  4     5     1    1
    110   P5.A8     0643_109.3.01.0  4     5     1    1
    111   P5.B8     0646_110.3.06.1  4     5     1    1
    112   QC        QC               4     5     1    1
    113   Blank     Blank            0     5     1    1
    114   dQC       dQC              4     5     1    1
    115   P5.C8     0647_110.3.01.0  4     5     1    1
    116   P5.D8     0648_110.3.03.1  4     5     1    1
    117   P5.E8     0649_110.3.03.0  4     5     1    1
    118   P5.F8     0650_110.3.06.0  4     5     1    1
    119   P5.bH6    0630_092.3.01.0  4     5     2    1
    120   P5.bF11   0689_065.3.22.0  4     5     2    1
    121   P5.G8     0651_110.3.02.0  4     5     1    1
    122   P5.H8     0655_108.3.01.1  4     5     1    1
    123   P5.A9     0656_108.3.01.0  4     5     1    1
    124   P5.B9     0658_111.3.01.0  4     5     1    1
    125   QC        QC               4     5     1    1
    126   Blank     Blank            0     5     1    1
    127   dQC       dQC              4     5     1    1
    128   P5.C9     0659_111.3.02.0  4     5     1    1
    129   P5.D9     0661_128.3.01.0  4     5     1    1
    130   P5.bF4    0597_117.3.02.1  4     5     2    1
    131   P5.bC10   0675_129.3.01.1  4     5     2    1
    132   P5.E9     0663_130.3.20.0  4     5     1    1
    133   P5.F9     0664_130.3.20.1  4     5     1    1
    134   P5.G9     0665_130.3.19.1  4     5     1    1
    135   P5.H9     0666_130.3.19.0  4     5     1    1
    136   P5.A10    0668_097.3.10.0  4     5     1    1
    137   P5.B10    0673_107.3.05.0  4     5     1    1
    138   QC        QC               4     5     1    1
    139   Blank     Blank            0     5     1    1
    140   dQC       dQC              4     5     1    1
    141   P5.bB5    0603_098.3.04.0  4     5     2    1
    142   P5.C10    0675_129.3.01.1  4     5     1    1
    143   P5.D10    0676_129.3.01.0  4     5     1    1
    144   P5.E10    0677_118.3.01.1  4     5     1    1
    145   P5.F10    0678_118.3.01.0  4     5     1    1
    146   P5.G10    0681_118.3.02.0  4     5     1    1
    147   P5.bH10   0683_078.3.05.0  4     5     2    1
    148   P5.bD4    0593_077.3.12.1  4     5     2    1    1  Only integrated for TGs
    149   P5.H10    0683_078.3.05.0  4     5     1    1
    150   P5.A11    0684_065.3.27.0  4     5     1    1
    151   QC        QC               4     5     1    1
    152   Blank     Blank            0     5     1    1
    153   dQC       dQC              4     5     1    1
    154   P5.B11    0685_065.3.28.0  4     5     1    1
    155   P5.C11    0686_065.3.29.0  4     5     1    1
    156   P5.D11    0687_065.3.26.0  4     5     1    1
    157   P5.E11    0688_065.3.30.0  4     5     1    1
    158   P5.F11    0689_065.3.22.0  4     5     1    1
    159   P5.G11    0690_065.3.20.0  4     5     1    1
    160   P5.H11    0691_065.3.24.0  4     5     1    1
    161   P5.A12    0693_065.3.23.0  4     5     1    1
    162   P5.B12    0694_065.3.25.0  4     5     1    1
    163   P5.C12    0696_112.3.04.0  4     5     1    1
    164   QC        QC               4     5     1    1
    165   Blank     Blank            0     5     1    1
    166   dQC       dQC              4     5     1    1
    167   P5.D12    0697_112.3.04.1  4     5     1    1
    168   P5.E12    0699_072.3.02.1  4     5     1    1
    169   P5.F12    0692_065.3.21.0  4     5     1    1
    170   P5.C0.c   C0               0     5     1    1
    171   P5.C2.c   C2               2     5     1    1
    172   P5.C4.c   C4               4     5     1    1
    173   P5.C6.c   C6               6     5     1    1
    174   P5.C5.c   C5               5     5     1    1
    175   P5.C3.c   C3               3     5     1    1
    176   P5.C7.c   C7               7     5     1    1
    177   P5.C1.c   C1               1     5     1    1
    178   P5.bH7    0642_109.3.02.0  4     5     2    1
    179   QC        QC               4     5     1    1
    180   Blank     Blank            4     5     1    1
    181   Blank     Blank            0     5     1    1
    182   Blank     Blank            0     5     1    1
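The design principles on this slide (randomized study samples, with QC, blank and dummy-QC injections at regular intervals) can be sketched in a few lines of Python. This is only an illustration: the function name `build_sequence`, the block size of 10 and the sample labels are assumptions, and calibration blocks and replicate injections are left out for brevity.

```python
import random

def build_sequence(sample_ids, block_size=10, seed=0):
    """Return a run order: randomized study samples with a QC / Blank / dQC
    triplet after every `block_size` study samples (hypothetical layout)."""
    rng = random.Random(seed)          # fixed seed so the order is reproducible
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)              # randomization of the study samples
    sequence = ["Blank", "dQC", "QC"]  # technical injections before the first block
    for start in range(0, len(shuffled), block_size):
        sequence.extend(shuffled[start:start + block_size])
        sequence.extend(["QC", "Blank", "dQC"])  # QC block at regular intervals
    return sequence

samples = [f"S{i:02d}" for i in range(1, 26)]  # 25 made-up study samples
order = build_sequence(samples, block_size=10)
```

Keeping the seed with the sequence list makes it possible to regenerate the exact run order later.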
  • 8. Data Acquisition, LC-MS & GC-MS
    For one chemical compound, the measured pattern is approximately the product of a component-specific mass profile and the abundance at each retention time.
    Component-specific mass profile:
      – LC-MS: natural isotopes + adducts (soft ionization)
      – GC-MS: fragments (hard ionization)
    [Figure: intensity as a function of m/z and retention time for a single compound]
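A minimal sketch of that product for one compound: the signal matrix is the outer product of an elution profile and a mass profile. The numbers and the helper name `outer_product` are made up for illustration.

```python
def outer_product(elution_profile, mass_profile):
    """Signal matrix for one compound: rows are retention times, columns are m/z channels."""
    return [[c * s for s in mass_profile] for c in elution_profile]

elution = [0.0, 1.0, 3.0, 1.0, 0.0]  # abundance at five retention times (made-up)
spectrum = [100.0, 11.0, 1.0]        # relative intensities of three m/z channels (made-up)
signal = outer_product(elution, spectrum)

for row in signal:
    print(row)
```

Every row of `signal` is the same mass profile scaled by the abundance at that retention time, which is why all ion chromatograms of one compound share the same peak shape.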
  • 9. Raw Data, LC-MS
    [Figure: number of mass channels selected for processing versus scan number]
    • Huge amount of data, for each sample:
      – ~1,000s of mass spectra (retention time scans)
      – ~10,000s of ion chromatograms
      – ~1,000,000s of (m/z, retention time) pairs
    • Complex data
      – Noise (detector noise and chemical noise), spikes, background
      – Concentration differences between the compounds are rather large, and therefore so are the intensity differences
  • 10. Preprocessing, LC-MS
    • Targeted platforms: vendor preprocessing software
      – Expert knowledge => optimized settings
    • Untargeted platforms: in-house developed preprocessing software
      – Conversion of manufacturer formats to common formats (e.g. netCDF & mzXML)
      – Centroiding and binning
      – Baseline correction
      – Alignment
      – Peak extraction (requires an estimate of the noise level)
      – Matching of peaks over samples
    • Result: feature/peak/compound list
      – m/z & rt: peak area
    Centroiding
    [Figure: raw versus centroided spectrum]
  • 11. m/z shifts within a sample
    Small m/z shifts, probably due to centroid sampling mode MS spectra and mass fluctuations during recording
    Binning
    • Binning algorithm: sum intensities within predefined bins (= mass ranges)
    • Defining the bins is a challenge and is mostly related to the mass resolution (e.g. at resolution 10,000, define a bin as 100.00 – 100.01)
    • When done incorrectly, binning has a large influence on the peak extraction steps
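The binning step can be sketched as follows. Fixed-width bins of 0.01 are an assumption taken from the slide's example at m/z 100. In practice the bin width would follow the mass resolution across the m/z range, and the function name `bin_spectrum` is hypothetical.

```python
import math
from collections import defaultdict

def bin_spectrum(peaks, bin_width=0.01):
    """Sum the intensities of (m/z, intensity) pairs that fall in the same fixed-width bin.
    Keys of the result are the lower edges of the bins."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        index = math.floor(mz / bin_width + 1e-9)  # small guard against float round-off
        binned[index] += intensity
    return {round(i * bin_width, 6): total for i, total in sorted(binned.items())}

centroids = [(100.003, 5.0), (100.007, 7.0), (100.012, 2.0)]  # made-up centroided peaks
binned = bin_spectrum(centroids)
```

The first two centroids fall in the bin 100.00 – 100.01 and are summed. A peak whose m/z fluctuates around a bin edge would be split over two bins, which is the failure mode the slide warns about.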
  • 12. Background correction
    [Figure: TIC and background corrected signal]
    Retention time alignment
    [Figure: chromatograms, full range and detail]
  • 13. Alignment algorithms
    • Dynamic Time Warping (DTW)
      – Time point by time point mapping (dynamic programming)
    • Correlation Optimized Warping (COW)
      – Piecewise linear, segments instead of individual time points (dynamic programming)
      – Optimizes the correlation between corresponding segments of the two datasets
      – Does not allow large retention time variation (determined by the slack parameter t)
    • (Semi-)Parametric Warping (PTW, Eilers)
      – Global, nonlinear (parametric transfer function estimation)
    [Figure: target dataset and dataset to align]
    Alignment algorithms (continued)
    [Figure: chromatogram details illustrating the three algorithms; last panel: warped, detail]
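The dynamic programming behind DTW can be sketched as below. This is the textbook formulation with an absolute-difference cost and no constraints on the warping path. It is not necessarily the variant used in the talk, and the name `dtw_path` and the toy chromatograms are assumptions.

```python
def dtw_path(reference, query):
    """Return (total cost, warping path) mapping time points of `query` onto `reference`."""
    n, m = len(reference), len(query)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(reference[i - 1] - query[j - 1])
            # cheapest way to reach (i, j): from the left, from below, or diagonally
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end to recover the time point by time point mapping.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [(cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1)]
        _, i, j = min(candidates)
    path.reverse()
    return cost[n][m], path

target = [0.0, 0.0, 1.0, 3.0, 1.0, 0.0, 0.0]    # made-up target chromatogram
to_align = [0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 0.0]  # same peak, eluting one scan earlier
distance, path = dtw_path(target, to_align)
```

Because every time point can be mapped freely, DTW can also distort peak shapes. Working on segments, as COW does, or using a global warping function, as parametric warping does, restricts that freedom.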
  • 14. Peak/Feature extraction and peak integration
    • XCMS http://metlin.scripps.edu/xcms/index.php
    • MetAlign http://www.wageningenur.nl/en/show/MetAlign-1.htm
    • TNO-DECO Jellema et al., Chemom. Intell. Lab. Syst. 104 (2010) 132
    • MZExtract van der Kloet et al., submitted
    TNO-DECO
    Works with GC-MS and not too complex LC-MS data
    Decomposes experimental data into the product of pure mass spectra and concentration profiles of all compounds in the sample
    Advantages:
      – Result is a combined mass spectrum (identification!)
      – All samples analyzed at once
    Problems / issues:
      – Least squares (abundant compounds have a large influence on the result)
      – Noise level estimation
      – Correct binning essential
    Jellema, Chemom. Intell. Lab. Syst. (2010) 104, 132-139.
  • 15. Deconvolution
    Deconvolution of LC-MS data
    [Figure panels: baseline corrected data; extracted mass spectra (rt 14.769, 14.3868, 13.9818 and 14.5777; labeled m/z values include 184, 628, 704, 726, 757, 759, 761 and 785); extracted chromatographic profiles; reconstructed signal]
  • 16. MZExtract
    Per sample:
    • Feature extraction of recalibrated and centroided data (in-house)
    • Integration of features (areas)
    • Grouping of features into feature-sets (knowledge-based enrichment step: isotopes, adducts)
    Over samples:
    • Match feature-sets
    Advantage of the two-step approach: fully scalable solution (parallel implementation)
    van der Kloet, submitted.
    Grouping related features within a single sample
    No retention time window is necessary to match features (only isotopic patterns or other known relations, e.g. adducts)
  • 17. Validation
    Target list from MassHunter (Agilent) used to locate 174 known targets.
      – Mass window -> resolution 10,000
      – RT window -> +/- 10 seconds
      – 171 were found
      – 3 missing targets: no isotopic patterns were detected (they were found in the list of ‘single’ features)
    How to validate unknown feature-sets?
    Here: selection based on QC presence
      – About 3,200 unknown feature-sets
      – Comparable: 1,175 feature-sets
      – Low abundant: 366 feature-sets
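The target lookup described on the validation slide can be sketched as a window match. Reading "mass window -> resolution 10,000" as a tolerance of m/z divided by 10,000 is an assumption, as are the function name `match_targets`, the data layout and the example values.

```python
def match_targets(targets, features, resolution=10000, rt_window=10.0):
    """targets: {name: (mz, rt_seconds)}; features: list of (mz, rt_seconds, area).
    Returns, per located target, the matching feature with the largest area."""
    found = {}
    for name, (t_mz, t_rt) in targets.items():
        mz_tol = t_mz / resolution  # assumed meaning of "resolution 10,000"
        hits = [f for f in features
                if abs(f[0] - t_mz) <= mz_tol and abs(f[1] - t_rt) <= rt_window]
        if hits:
            found[name] = max(hits, key=lambda f: f[2])
    return found

targets = {"A": (500.0, 120.0), "B": (760.5, 300.0)}  # made-up target list
features = [(500.03, 125.0, 1e5),   # inside both windows of target A
            (500.2, 121.0, 9e5),    # m/z too far from target A
            (760.52, 315.0, 4e4)]   # m/z fits target B, but 15 s away in rt
found = match_targets(targets, features)
missing = sorted(set(targets) - set(found))
```

Targets that end up in `missing` would then be looked up by hand, as was done for the three targets that appeared only among the ‘single’ features.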
  • 18. PLS-DA, selectivity ratio*, to quantify the discriminatory ability of the variables
    The low abundant feature-sets do contain biologically relevant information!
    The most important feature-set is an unknown!
    *Anal. Chem. 2009, 81, 2581–2590
    Quality Assessment
    • Make use of all additionally measured compounds and samples
      – Internal Standards
      – Replicates
      – Blanks
      – Quality Control samples
    • Quality Assessment => QC report (in-house)
  • 19. Part of a measurement run
    [Figure: response versus measurement order for QC samples, study samples and replicate study samples]
    QC report overview table
    RSD values for:
    • QC samples
    • Replicate samples (independent validation)
    p-value and diffs columns: ANOVA for batch to batch variation
    Compound N   mean    std     RSDqc   RSDreps  p-value  diffs
    CholE02  58  0.0298  0.0079  26.4%   21.4%    0.000    (2-1,3-1,3-2,4-2,4-3)
    CholE04  46  0.0240  0.0124  51.9%   40.6%
    CholE05  58  0.0120  0.0024  20.4%   19.1%    0.000    (2-1,3-1,4-1,3-2,4-3)
    CholE06  58  0.0085  0.0021  24.7%   19.5%    0.000    (3-1,3-2,4-3)
    DG02     58  0.0049  0.0011  23.4%   22.7%    0.000    (2-1,3-1,4-1,3-2,4-2,4-3)
    LPC01    58  0.0183  0.0009  4.7%    4.8%     0.000    (4-1,4-2,4-3)
    LPC02    58  0.0130  0.0015  11.7%   11.5%    0.000    (2-1,3-1,4-1)
    LPC03    58  0.0101  0.0010  9.5%    12.1%    0.360
    LPC04    58  0.0436  0.0019  4.4%    5.4%     0.000    (2-1,4-1,3-2,4-3)
    LPC05    58  1.8684  0.1259  6.7%    6.8%     0.000    (2-1,3-1,4-1,3-2,4-2,4-3)
    LPC07    58  0.0109  0.0007  6.1%    6.4%     0.004    (4-2)
    LPC08    58  0.6096  0.0141  2.3%    3.2%     0.000    (2-1,3-1,4-1,3-2,4-2,4-3)
    LPC09    58  0.4170  0.0200  4.8%    4.8%     0.000    (3-1,4-1,3-2,4-2,4-3)
    LPC10    58  0.6625  0.0976  14.7%   13.8%    0.000    (2-1,3-1,4-1,3-2,4-2,4-3)
    LPC11    58  0.0394  0.0446  113.1%  57.6%    0.000    (2-1,3-2,4-2,4-3)
    LPC12    58  0.1126  0.0024  2.1%    3.6%     0.000    (2-1,3-1,3-2,4-2,4-3)
    LPC13    58  0.0425  0.0049  11.5%   9.8%     0.000    (3-1,4-1,3-2,4-2)
    LPC14    58  0.0311  0.0010  3.3%    3.7%     0.000    (2-1,3-1,4-2,4-3)
    LPC16    58  0.0064  0.0016  24.9%   28.7%    0.000    (4-1,3-2,4-2,4-3)
    LPC17    58  0.0033  0.0010  32.0%   36.4%    0.000    (3-1,4-1,3-2,4-2,4-3)
    LPE02    58  0.0303  0.0056  18.6%   19.4%    0.000    (2-1,4-1,3-2,4-2,4-3)
    LPE04    43  0.0034  0.0011  33.1%   21.9%
    PC01     58  0.0832  0.0105  12.6%   12.5%    0.000    (4-1,4-2,4-3)
    PC02     58  0.3333  0.0151  4.5%    4.6%     0.000    (2-1,4-1,4-2,4-3)
    PC03     58  0.2238  0.0077  3.4%    3.7%     0.000    (2-1,3-1,4-1,4-2,4-3)
    PC04     58  0.1257  0.0040  3.1%    4.8%     0.000    (3-1,4-1,3-2,4-3)
    PC05     58  0.0674  0.0248  36.8%   35.9%    0.000    (2-1,3-1,4-1,3-2,4-3)
    PC06     58  0.0667  0.0084  12.7%   10.1%    0.000    (2-1,4-1,3-2,4-3)
    PC07     58  0.0225  0.0026  11.5%   14.2%    0.000    (2-1,3-1,4-1,4-2,4-3)
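The RSD columns of the report are the standard deviation divided by the mean. The sketch below computes an RSD for made-up QC responses and then filters four compounds from the table with the thresholds RSDqc < 0.15 and RSDreps < 0.15 that appear on a later scores-plot slide. How the replicate RSD itself is computed in the in-house report is not stated in the talk, so only the filtering step is shown.

```python
from statistics import mean, stdev

def rsd(values):
    """Relative standard deviation: sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)

qc_areas = [9.0, 10.0, 11.0]  # made-up QC responses for one compound
rsd_qc = rsd(qc_areas)

# (RSDqc, RSDreps) pairs copied from the QC report table
report = {"CholE02": (0.264, 0.214), "LPC01": (0.047, 0.048),
          "LPC11": (1.131, 0.576), "PC02": (0.045, 0.046)}
reliable = sorted(name for name, (r_qc, r_reps) in report.items()
                  if r_qc < 0.15 and r_reps < 0.15)
```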
  • 21. QC samples only
    [Figure: Ratio (unc)Area of the QC samples; RSDQC = 25.8%]
  • 22. Internal standard
    RSDQC = 25.8%
    Internal standard corrected data
    RSDQC = 20.6%
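Internal standard correction amounts to dividing each peak area by the area of the internal standard measured in the same injection. The sketch below uses made-up areas in which the analyte and the internal standard drift together, so the ratio is perfectly stable. Real data improve less, as the RSD values on the slide show.

```python
from statistics import mean, stdev

def rsd(values):
    """Relative standard deviation: sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)

def istd_correct(areas, istd_areas):
    """Ratio of analyte area to internal standard area, per injection."""
    return [a / s for a, s in zip(areas, istd_areas)]

analyte = [100.0, 120.0, 80.0]  # made-up QC peak areas of one lipid
istd = [50.0, 60.0, 40.0]       # made-up areas of its internal standard
ratios = istd_correct(analyte, istd)

rsd_before = rsd(analyte)
rsd_after = rsd(ratios)
```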
  • 23. Intra and inter batch variation
    • Analytical column ‘aging’
    • Analytical column replacement
    • Eluent ‘refills’ and small variations
    • Instrument malfunction/breakdown
    • Etc.
    Intra and inter batch correction
    • Instead of just monitoring the QC sample responses, use them to correct the variation
  • 24. QC correction
    [Figure: response versus measurement order for QC samples and study samples, with a penalized smoother through the QC samples]
    Van der Kloet et al., Journal of Proteome Research 2009
    QC correction
    [Figure: response versus measurement order, before and after correction]
    Van der Kloet et al., Journal of Proteome Research 2009
  • 25. QC correction
    [Figures]
    van der Kloet et al., Journal of Proteome Research 2009
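The idea of QC-based correction can be sketched as follows: estimate the drift from the QC responses as a function of measurement order, then divide every response by that trend and rescale to the QC median. Note that this sketch swaps in plain linear interpolation between QC injections for the penalized smoother of the cited paper. The function name `qc_drift_correct` and all numbers are assumptions.

```python
from statistics import median

def qc_drift_correct(order, response, is_qc):
    """Divide each response by the QC trend at its measurement order and
    rescale to the median QC response (linear interpolation between QCs)."""
    qc_points = sorted((o, r) for o, r, q in zip(order, response, is_qc) if q)
    qc_median = median(r for _, r in qc_points)

    def trend(o):
        if o <= qc_points[0][0]:   # before the first QC: hold its value
            return qc_points[0][1]
        if o >= qc_points[-1][0]:  # after the last QC: hold its value
            return qc_points[-1][1]
        for (o0, r0), (o1, r1) in zip(qc_points, qc_points[1:]):
            if o0 <= o <= o1:
                return r0 + (r1 - r0) * (o - o0) / (o1 - o0)

    return [r * qc_median / trend(o) for o, r in zip(order, response)]

order = [1, 2, 3, 4, 5, 6, 7]
is_qc = [True, False, False, True, False, False, True]
# Made-up run with a steady loss of sensitivity: QC responses fall from 100 to 40,
# and the study samples all contain half the QC concentration.
response = [100.0, 45.0, 40.0, 70.0, 30.0, 25.0, 40.0]
corrected = qc_drift_correct(order, response, is_qc)
```

After correction the QC samples all sit at their median (70) and the four study samples at 35, so the drift no longer masks the fact that they are identical.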
  • 26. ISTD/QC corrected data, all samples
    RSDQC = 4.1%
    RSDreplicates = 10.0%
  • 27. All batches
    Correction charts
    [Figure: RSDQC and RSDReplicates]
  • 28. Scores plot based upon 93 lipids: uncorrected area
    Differences between batches. Clear trends in QC samples.
    [Figure: scores plot based on 93 components (peak area); PC 1 (39.3%) versus PC 2 (14%); batches 1 to 4 and QC samples]
    Scores plot based upon 93 lipids: ISTD correction
    Smaller differences between batches. Spread in QC samples greatly reduced. However, batch to batch differences remain present.
    [Figure: scores plot based on 93 components (ISTD correction); PC 1 (21.3%) versus PC 2 (14.8%); batches 1 to 4 and QC samples]
  • 29. Scores plot based upon 93 lipids
    [Figure: scores plot based on 93 components, RSDqc<0.15 and RSDreps<0.15; PC 1 (22.9%) versus PC 2 (14.7%); batches 1 to 4 and QC samples]
    Combining data in systems biology
    1. Comprehensive view of patient, animal, …: e.g. combine genomics, proteomics & metabolomics data
       Data integration / fusion: joining data from different measurement approaches, same objects
    2. Increase power of statistical analyses: combine e.g. metabolomics batch datasets
       ‘Equating’ (*): making data from the same measurement approach, but different objects, comparable
    *Equating is a psychometric term
  • 30. Why not just concatenate datasets?
    • ‘Omics data are typically batch data
    • Metabolomics is often not quantitative, so datasets are not comparable
    • Calibration model transfer would be a solution, but often no full calibration models can be made!*
    *Sangster et al, The Analyst 2006 (131): 1075-1078
    A proposed approach: QC samples
    Correction for structural differences between series using quality control (QC) samples (pooled samples or representative samples)*
    (picture from reference below)
    *van der Greef et al, J Proteome Res 2007 (6): 1540-1559
  • 31. Problem with QC sample approach
    • Rationale: make medians of QC data equal for all series
    • Unwanted side-effect: inflation of variation in rest of data
    [Figure: MAD per lipid compound for series 1, series 2 uncorrected and series 2 QC-corrected, showing inflation of the MAD in series 2 relative to series 1]
    MAD: median absolute deviation (robust SD)
    Alternative solution: equating
    • Combination of data from different measurement series
    • …in studies with a limited number of internal standards (typical for metabolomics!)
    • …or even from different studies
    • General: enables maximal flexibility in subsequent data analysis on combined datasets
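The MAD used above to compare the spread of the two series is simple to compute. A minimal sketch follows, without the scaling factor that is often applied to make the MAD comparable to a standard deviation. The example values are made up.

```python
from statistics import median

def mad(values):
    """Median absolute deviation from the median (unscaled)."""
    center = median(values)
    return median(abs(v - center) for v in values)

series = [1.0, 2.0, 3.0, 4.0, 100.0]  # made-up responses with one outlier
spread = mad(series)
```

A single extreme value barely moves the MAD, which is why it is a convenient robust measure when comparing the variation of whole measurement series.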
  • 32. Illustration: LC–MS data
    • 182 (54 + 128) healthy participants (Netherlands Twin Register)*
    • Blood samples (overnight fasting)
    • Plasma analyzed with a liquid chromatography–MS method for lipids
    • Target list for 59 lipids: LPC / PC / SPM / ChE / TG
    • Data per lipid corrected for class-specific internal standard
    Measured in two series: year 1 (Y1), N=54, and year 2 (Y2), N=128
    *Draisma et al, OMICS 2008: 17–31
    PCA scores before equating
    [Figure: PCA scores of Y1 and Y2; data mean-centered prior to PCA]
  • 33. Univariate quantile equating
    • Quantiles: values marking boundaries between regular intervals of the cumulative distribution function (CDF)
    • Example: 54 data values and associated CDF
    [Figure: CDF in steps of 1/54, with the 0.48, 0.50 (= median) and 0.52 quantiles marked]
    Univariate quantile equating
    Average values of corresponding quantiles
    [Figure: CDFs of Y1 and Y2; at CDF(x) = 0.50, x = 1.81 for Y1 and x = 2.64 for Y2]
    Data from: Frisby & Clatworthy, Perception 1975: 173-178
  • 34. Quantile equating
    Algorithm:
    1. Number of quantiles = min {N1, N2, …}
    2. Average values of corresponding quantiles by projection onto the unit vector $\left(\tfrac{1}{\sqrt{n}}, \ldots, \tfrac{1}{\sqrt{n}}\right)$
    3. Substitute averaged values for original values belonging to each quantile
    Often applied for quantile normalization (*) of gene arrays, between arrays (objects) over probes (variables)
    *Bolstad et al, Bioinformatics 2003: 185–193
    Example univariate quantile equating
    [Figure: Q-Q plot of Y1 versus Y2 with projection onto the unit vector (averaging); CDFs of Y1 and Y2 before and after equating]
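The three steps of the algorithm can be sketched for one variable measured in several series. The slides do not say how quantile values are obtained when the series have different sizes. Here each series is split by rank into n groups and the group mean is used as the quantile value, which is an assumption, as is the function name `quantile_equate`.

```python
from statistics import mean

def quantile_equate(series_list):
    """Univariate quantile equating of several series of one variable."""
    n = min(len(s) for s in series_list)  # step 1: number of quantiles
    groups_per_series, quantiles_per_series = [], []
    for s in series_list:
        ranked = sorted(range(len(s)), key=lambda i: s[i])
        members = [[] for _ in range(n)]
        group_of = [0] * len(s)
        for rank, i in enumerate(ranked):
            g = rank * n // len(s)        # rank-based quantile group (assumed rule)
            members[g].append(s[i])
            group_of[i] = g
        groups_per_series.append(group_of)
        quantiles_per_series.append([mean(m) for m in members])
    # step 2: average the corresponding quantiles over the series
    averaged = [mean(q[g] for q in quantiles_per_series) for g in range(n)]
    # step 3: substitute the averaged value for every original value in that quantile
    return [[averaged[g] for g in group_of] for group_of in groups_per_series]

y1 = [1.0, 3.0, 2.0]                       # made-up series 1 (N = 3)
y2 = [10.0, 30.0, 20.0, 40.0, 50.0, 60.0]  # made-up series 2 (N = 6)
eq1, eq2 = quantile_equate([y1, y2])
```

After equating, both series take values from the same set of averaged quantiles, so their distributions coincide while the rank order of the objects within each series is preserved.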
  • 35. PCA scores after equating LC–MS data
    [Figure: PCA scores before and after equating; red: Y1, black: Y2; data mean-centered prior to PCA]
    Y1–Y2 similarity in PCA score space*
    • direction: PCA loadings
    • location: Mahalanobis’ D²
    • variance: Box’s M statistic
    [Figure: Y1 and Y2 in PCA score space]
    *Jouan-Rimbaud et al, Chemom Intell Lab Syst 1998: 129-144
  • 36. Y1–Y2 similarity in PCA score space
    [Figure: direction, variance and location parameters before and after equating]
    All parameters: 0 = ‘dissimilar’, 1 = ‘similar’
    Jouan-Rimbaud et al, Chemom Intell Lab Syst (1998) 129-144
    Effects on clustering results
    No equating, Y1–Y2 datasets combined: obvious between-series effect
    [Figure: clustering result with separate Y1 and Y2 clusters]
    Draisma et al, Anal Chem (2010) 82 1039-1046
  • 37. Effects on clustering results
    After quantile equating, Y1–Y2 datasets combined: Y1–Y2 effect removed
    Biological information extractable from the combined dataset
    [Figure: clustering result with clusters labeled ♂ and ♀]
    Draisma et al, Anal Chem (2010) 82 1039-1046
    Conclusions
    • ‘Garbage in = garbage out’, so try to control data quality as much as possible
    • Proper measurement design allows separation of unwanted experimental variation from biological variation (IS, QCs, replicates)
    • Preprocessing: trade-off between data quality, speed (automation) and completeness (number of features)
    • The road to high quality data is a balanced mix of data acquisition and data processing
  • 38. Acknowledgements
    • DCL
      – Jorne Troost
    • LACDR
      – Evelyne Steenvoorden
      – Frans van der Kloet
      – Shanna Shi
      – Katrin Strassbourgh
      – Faisa Galud
      – Vanessa Gonzalez
      – Rob Vreeken
      – Margriet Hendriks
      – Amy Harms
      – Harmen Draisma
      – Raymond Ramakers
      – Thomas Hankemeier
      – Irina Paliukovich
      – Adrie Dane