Heuristic PCA Based Feature Extraction
and
Its Application to Bioinformatics
Y-h. Taguchi, Dept. Phys., Chuo Univ.
Y. Murakami, Grad. Sch. Med., Osaka City Univ.

M. Iwadate, Dept. Biol. Sci., Chuo Univ.
H. Umeyama, Dept. Biol. Sci., Chuo Univ.
A. Okamoto, Dept. Sch. Health Sci.,
Aichi Univ. Edu.
0. Why PCA?
PCA = principal component analysis
Motivation:
Unsupervised Feature Selection
How PCA?
Toy data: 100 features ✕ 20 samples (class 1, class 2)
10 ordered features:
11111111110000000000
11111111110000000000
.
.
11111111110000000000
90 random features:
01000000110110011111
00011110000101011101
.
.
.
01000011000110101111
How can we select the 10 ordered features
without classification information?
Embedding 100 features into 2D using PCA
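[Figure: 2D PCA embedding of the 100 features; the 10 ordered features and the 90 random features are labeled]
PC1 represents the discrimination
between class 1 and class 2
[Figure: PC1 over the 20 samples, class 1 vs class 2]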
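
The toy example above can be sketched in Python with NumPy. This is only a minimal sketch under assumptions that the slides do not state: the random features are drawn as independent 0/1 bits, each sample is centered across features, PCA is computed with an SVD so that the features are the embedded points, and the 10 features with the largest absolute PC1 score are taken as outliers. All function names are hypothetical.

```python
import numpy as np

def make_toy_data(seed=0):
    """100 features x 20 samples: 10 ordered rows, then 90 random 0/1 rows."""
    rng = np.random.default_rng(seed)
    ordered_row = np.r_[np.ones(10), np.zeros(10)]   # class 1 = 1, class 2 = 0
    ordered = np.tile(ordered_row, (10, 1))
    random_part = rng.integers(0, 2, size=(90, 20)).astype(float)
    return np.vstack([ordered, random_part])

def embed_features(X, n_components=2):
    """PCA with features (rows) as points; returns (feature scores, sample loadings)."""
    Xc = X - X.mean(axis=0, keepdims=True)           # center each sample over features
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * s[:n_components], Vt[:n_components]

def select_outliers(scores, k=10):
    """Indices of the k features farthest from the origin along PC1 (assumed rule)."""
    return np.sort(np.argsort(np.abs(scores[:, 0]))[-k:])

X = make_toy_data()
scores, loadings = embed_features(X)
selected = select_outliers(scores)
print("selected features:", selected)
print("PC1 loading per sample:", np.round(loadings[0], 2))
```

The 10 ordered rows are identical, so they share one PC1 score. Because the 90 random features also scatter along PC1, a few of them may rank among the top 10 for some seeds.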
Applying a “weak” unitary transformation to
the space spanned by the 20 samples...
[Figure: the 100 features ✕ 20 samples data matrix (10 ordered + 90 random features; class 1, class 2) before and after the transformation]
The same 2D embedding.
Thus we can select 10 features.

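[Figure: 2D PCA embedding of the transformed data; the 10 ordered features and the 90 random features are labeled]
PC1 “weakly” represents the discrimination
between class 1 and class 2
[Figure: PC1 over the 20 samples, class 1 vs class 2]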
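
One way to see why the embedding of the features does not change is sketched below. It assumes that the “weak” unitary transformation is a real orthogonal matrix Q close to the identity acting on the sample axis (built here with a Cayley transform, which is my choice and not stated on the slides), and that PCA centers each sample across features. Under these assumptions the feature scores are unchanged up to sign, while the PC1 sample loadings are rotated by Q. Function names and the `eps` parameter are hypothetical.

```python
import numpy as np

def weak_rotation(n, eps=0.05, seed=0):
    """Orthogonal n x n matrix near the identity (Cayley transform of a skew matrix)."""
    rng = np.random.default_rng(seed)
    G = rng.normal(size=(n, n))
    A = eps * (G - G.T) / 2                  # skew-symmetric generator
    I = np.eye(n)
    return np.linalg.solve(I - A / 2, I + A / 2)

def feature_scores(X, n_components=2):
    """PCA with features (rows) as points; returns (feature scores, sample loadings)."""
    Xc = X - X.mean(axis=0, keepdims=True)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * s[:n_components], Vt[:n_components]

# Toy data as in the slides: 10 ordered rows plus 90 random 0/1 rows (assumed random model).
rng = np.random.default_rng(1)
X = np.vstack([np.tile(np.r_[np.ones(10), np.zeros(10)], (10, 1)),
               rng.integers(0, 2, size=(90, 20)).astype(float)])

Q = weak_rotation(20)
scores, load = feature_scores(X)
scores_q, load_q = feature_scores(X @ Q)     # mix the 20 samples with Q

print("max |score| difference:", np.max(np.abs(np.abs(scores) - np.abs(scores_q))))
print("PC1 loading before:", np.round(load[0], 2))
print("PC1 loading after: ", np.round(load_q[0], 2))
```

Since the centered matrix becomes Xc·Q, its left singular vectors and singular values stay the same and only the right singular vectors change, which matches the statement that PC1 only “weakly” reflects the two classes after the transformation.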
Linear discriminant analysis
+ leave-one-out cross-validation
using the 10 ordered features...

                  True
Predicted    class 1   class 2
class 1         8         2
class 2         2         8
Accuracy=Sensitivity=Specificity=80%
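
The evaluation scheme (LDA with leave-one-out cross-validation) can be sketched as follows. This is a minimal two-class LDA written with NumPy under my own assumptions: equal class priors, a pooled covariance with a small ridge term, and hypothetical noisy features in place of the slide data, so it does not reproduce the 80% figure. The names `lda_fit` and `loocv_confusion` are hypothetical.

```python
import numpy as np

def lda_fit(X, y, ridge=1e-3):
    """Two-class LDA; returns (w, b) so that x is assigned to class 1 if x @ w > b."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # pooled within-class covariance, with a small ridge for numerical stability
    S = ((X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)) / (len(X) - 2)
    S += ridge * np.eye(X.shape[1])
    w = np.linalg.solve(S, m1 - m0)
    b = w @ (m0 + m1) / 2                    # midpoint threshold (equal priors)
    return w, b

def loocv_confusion(X, y):
    """Leave-one-out CV; returns a 2x2 count matrix indexed [predicted, true]."""
    conf = np.zeros((2, 2), dtype=int)
    for i in range(len(X)):
        train = np.arange(len(X)) != i       # hold out sample i
        w, b = lda_fit(X[train], y[train])
        pred = int(X[i] @ w > b)
        conf[pred, y[i]] += 1
    return conf

# Hypothetical data: 20 samples x 10 features, labels 0/1 stand for class 1/class 2.
rng = np.random.default_rng(0)
y = np.r_[np.zeros(10, dtype=int), np.ones(10, dtype=int)]
X = y[:, None] + rng.normal(scale=0.3, size=(20, 10))

conf = loocv_confusion(X, y)
accuracy = np.trace(conf) / conf.sum()
print(conf)
print("accuracy:", accuracy)
```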

How about real examples?
1. Real example 1: Disease-associated
aberrant promoter methylation
[Figure: methylation of a gene promoter]
Three autoimmune diseases: SLE, RA, DM
[MZ twins (healthy + sick) + 2 healthy controls] ✕ 5 = 20 samples per disease
→ ✕ 3 diseases = 60 samples
vs ≈ 1000 potential methylation sites
Embedding of ~1000 promoters within the 20
RA samples into 2D with PCA (PC2 vs PC3)
[Figure: scatter plot of promoters, PC2 vs PC3; outlier promoters are selected]
[Figure: PC2 for RA over the 20 samples, grouped into male and female. Legend: ◯ sick twin, △ healthy twin, + healthy control 1, ☓ healthy control 2]
Twins: Healthy > Sick
Controls: No
The 4th set: No
→ This is why
unsupervised feature
selection is needed.
Scatter plots between healthy/RA twins.
Red dots = selected promoters
[Figure: scatter plots of promoter methylation, healthy twins vs RA twins]
P < 2.2 ✕ 10^-16, P = 2.2 ✕ 10^-12, P = 3.7 ✕ 10^-12, P = 3.9 ✕ 10^-1, P < 2.2 ✕ 10^-16
The selected promoters are individually significantly aberrantly
methylated. Thus, feature selection is successful.
After repeating the same procedure for the two additional
diseases (SLE and DM)...
Among the three autoimmune diseases,
the selected promoters are mostly common.

No other method can achieve such excellent
agreement among the three autoimmune diseases.
Lessons to learn:
A predefined class definition (e.g., 'sick
twin' vs 'healthy twin + two healthy
controls') is not a good strategy for
extracting “important” features, which can
exhibit much more complicated behavior
(e.g., upregulated in males while
downregulated in females).
Additional Remarks
Similar procedures were applied to
squamous cell carcinoma(*), and genes with
genotype-specific DNA methylation were
extracted. These genes were identified as
cancer-related genes through literature
searches, and in silico drug screening was
performed for them (BMC Syst. Biol.,
in press; to be presented at APBC2014).
(*) Esophageal cancer
2. Real example 2: Circulating biomarker
findings for liver diseases
Why “circulating biomarker”?
→ non-invasive, thus less stressful.
Circulating = blood, etc.
Target in this talk:
microRNAs in blood
→ A microRNA is a non-protein-coding
RNA that regulates other transcripts.
Data set: 14 diseases + healthy controls
For example,
2D embedding of ~900 blood miRNAs using PCA
in 32 lung cancer samples + 70 healthy controls
[Figure: PC1 vs PC2 scatter plot of miRNAs; 10 outlier miRNAs are marked]
However, PC1 no longer exhibits a clear
distinction between lung cancer and
normal controls... (not shown here)
Prediction

Control vs Lung Cancer
LDA with PCA, leave-one-out cross-validation
(using 10 miRNAs, up to the 5th PC)
                      True
Predicted        control   lung cancer
control             56          8
lung cancer         14         24
Accuracy 0.784
Specificity 0.800
Sensitivity 0.750
Precision 0.632
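
The four summary figures follow directly from the confusion counts. The short sketch below recomputes them, assuming lung cancer is treated as the positive class and the table rows are predictions; the function name `metrics` is hypothetical.

```python
def metrics(tp, fp, fn, tn):
    """Accuracy, specificity, sensitivity and precision from confusion counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "specificity": tn / (tn + fp),
        "sensitivity": tp / (tp + fn),
        "precision": tp / (tp + fp),
    }

# Counts read from the table, with lung cancer as the positive class:
# predicted control:      56 true controls (TN),  8 true lung cancers (FN)
# predicted lung cancer:  14 true controls (FP), 24 true lung cancers (TP)
m = metrics(tp=24, fp=14, fn=8, tn=56)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With these counts the columns sum to 70 controls and 32 lung cancer samples, which agrees with the data set size given earlier.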
What is the advantage of PCA based
feature extraction? → stability
Cross-validation test (10 folds) of the stability of
feature extraction (100 trials):
14 diseases vs normal control ✕ 10 miRNAs
= 140 miRNAs selected.
Ideally, the same 140 miRNAs are selected in all
100 trials.
Result: 129 out of 140 miRNAs are
selected with 100% probability.
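
A stability test of this kind can be sketched as follows. The exact protocol is not given on the slides, so this sketch makes assumptions: in each trial the samples are split at random into 10 folds, the selection is rerun with one fold left out, and a feature counts as stable if it is selected in every run. The selection rule (top 10 features by absolute PC1 score), the synthetic data, and all names are hypothetical.

```python
import numpy as np

def select_top_k(X, k=10):
    """Unsupervised selection: k features (rows) with the largest |PC1| score."""
    Xc = X - X.mean(axis=0, keepdims=True)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return np.argsort(np.abs(U[:, 0] * s[0]))[-k:]

def selection_frequency(X, select, n_trials=100, n_folds=10, seed=0):
    """Fraction of leave-one-fold-out runs in which each feature is selected."""
    rng = np.random.default_rng(seed)
    n_features, n_samples = X.shape
    counts = np.zeros(n_features)
    for _ in range(n_trials):
        folds = np.array_split(rng.permutation(n_samples), n_folds)
        for held_out in folds:
            keep = np.setdiff1d(np.arange(n_samples), held_out)
            counts[select(X[:, keep])] += 1
    return counts / (n_trials * n_folds)

# Hypothetical data: 200 features x 40 samples; 10 features carry a class signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 40))
X[:10, :20] += 3.0

freq = selection_frequency(X, select_top_k, n_trials=20)
stable = np.flatnonzero(freq == 1.0)
print("features selected in every run:", stable)
```

Passing the selection rule as a function makes it possible to plug in a supervised method (for example a t-test ranking) and compare the number of always-selected features on the same folds.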
Comparison of stabilities with other feature
extraction methods
UFF(*) : 111 out of 140 miRNAs
t-test based : 40 out of 140 miRNAs
SAM : 30 out of 140 miRNAs
gsMMD : 5 and 1 out of 140 miRNAs
RFE : 1 out of 140 miRNAs
ensemble RFE : 0 out of 140 miRNAs
(*) the only other unsupervised FE method
Lessons to learn:
A predefined class definition (e.g., 'sick
twin' vs 'healthy twin + two healthy
controls') is not a good strategy for
extracting “stable” features. Relying too heavily
on classification information may harm the
stability of the selected features.
Additional remarks:
The sets of 10 miRNAs selected as biomarkers that
discriminate each of the 14 diseases from normal controls
largely overlapped (every set of 10 miRNAs
was chosen from a common pool of 12 miRNAs).
In addition, these 12 miRNAs
discriminate seven additional diseases from
healthy controls, even across different
measurement methodologies, samples, and studies
(submitted).
3. Real example 3: Analysis of proteome
during bacterial incubation
Purpose:
Antibiotics are nothing but a disaster for bacteria.
They kill even non-toxic bacteria and thus cause
drug resistance. Drugs that target proteins
more specific to each bacterium would be
much better and more effective.
To achieve this, we first need to know how
the proteome changes in response to
environmental changes.
Data set:
Two incubation conditions:
stable (normal) and shaking (oxidative stress)
Two fractions:
cellular and supernatant
Four time points:
From early to final, through the middle growth phase
Three biological replicates.
In total:
2 ✕2 ✕4 ✕ 3 = 48 samples are available
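
The sample count can be checked by enumerating the experimental design. The time-point and replicate labels below are placeholders chosen for illustration, not the labels used in the study.

```python
from itertools import product

conditions = ["stable", "shaking"]        # normal vs oxidative stress
fractions = ["cellular", "supernatant"]
time_points = [1, 2, 3, 4]                # early ... final (placeholder labels)
replicates = [1, 2, 3]                    # biological replicates

# One tuple per sample: (condition, fraction, time point, replicate)
samples = list(product(conditions, fractions, time_points, replicates))
print(len(samples))
```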
2D embedding of 48 samples using PCA
[Figure: PC1 vs PC2 scatter plot of the 48 samples; groups labeled cellular, early supernatant, and late supernatant]
PCA embedding of proteins
23 proteins selected
(underlined are ribosomal proteins)
[Figure: PC1 vs PC2 scatter plot of proteins]

SPy1489:hlpA
SPy2039:speB
Spy1073:rplL
SPy2005
SPy2018:emm1
Spy0059:rpmC
Spy0611:tufA
Spy0274:plr
Spy0062:rplX
SPy2043:mf
Spy0613:tpi
Spy2079:AhpC
SPy1831:rpsF
Spy2160:rpmG
SPy1373:ptsH
SPy0731:eno
Spy1371:gapN
Spy1881:pgk
SPy0711:speC
Spy0071:rpmD
SPy2070:groEL
Spy0019
SPy0712:mf2
[Figure: PC1 vs PC2 embedding using the 23 proteins extracted via PCA]
Lessons to learn:
Even when there is no criterion for what
kind of classification to assume,
unsupervised feature extraction can select
prominent features.
4. Discussion
Real example 1:
Commonly methylated promoters between three
autoimmune diseases were found by
unsupervised feature extraction.
Real example 2:
Stable circulating biomarkers were selected for
14 diseases using unsupervised feature extraction.
Real example 3:
Successful extraction of prominent features with
unsupervised feature extraction
Unsupervised feature extraction seems
to be the best method, however...
When does PCA based feature extraction work?
Is PCA based feature extraction the best?
Are there any better unsupervised feature
extraction methods?
How can we evaluate unsupervised feature
extraction?
Is there any quantity to be maximized?
I believe that people here
are experts on this topic.
Help me....

 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajanpragatimahajan3
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfchloefrazer622
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room servicediscovermytutordmt
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 

Último (20)

Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdf
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room service
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 

Heuristic PCA Based Feature Extraction and Its Application to Bioinformatics

  • 1. Heuristic PCA Based Feature Extraction and Its Application to Bioinformatics Y-h. Taguchi, Dept. Phys., Chuo Univ., Y. Murakami, Grad. Sch. Med., Osaka City Univ. M. Iwadate, Dept. Biol. Sci., Chuo Univ. H. Umeyama, Dept. Biol. Sci., Chuo Univ. A. Okamoto, Dept. Sch. Health Sci., Aichi Univ. Edu.
  • 2. 0. Why PCA? PCA = principal component analysis Motivation: Unsupervised Feature Selection How PCA?
  • 3. 10 Ordered Features 90 random Features 100 Features 20 samples Class 1 Class 2 11111111110000000000 11111111110000000000 . . 11111111110000000000 01000000110110011111 00011110000101011101 . . . 01000011000110101111 How to select 10 ordered features, without classification information?
  • 4. Embedding 100 features into 2D using PCA 90 random Features 10 Ordered Features
  • 5. PC1 represents discrimination between class 1 and class 2 Class 1 Class 2 20 samples
  • 6. Applying “weak” unitary transformation to the space spanned by 20 samples... 20 samples 20 samples 100 Features Class 1 Class 2 10 Ordered Features 90 random Features Class 1 Class 2
  • 7. The same 2D embedding. Thus we can select 10 features. 10 Ordered Features 90 random Features
  • 8. PC1 “weakly” represents discrimination between class 1 and class 2 Class 1 Class 2 20 samples
  • 9. Linear discriminant analysis + leave-one-out cross validation using 10 ordered features … Confusion matrix (rows = predicted class, columns = true class): predicted 1: 8, 2; predicted 2: 2, 8. Accuracy = Sensitivity = Specificity = 80%. How about real examples?
  • 10. 1. Real example 1: Disease associated aberrant promoter methylation methylation gene promoter three autoimmune diseases SLE RA DM [ MZ twins (healthy+sick) + 2 healthy controls] ✕ 5 = 20 samples → ✕3 diseases = 60 samples vs ≈ 1000 potential methylation sites
  • 11. Embedding of 〜1000 promoters within 20 RA samples into 2D with PCA (PC2 vs PC3) PC3 Outlier promoters, Selected PC2
  • 12. PC2:RA Male Female ◯:Sick Twin △:Healthy Twin +:Healthy Control 1 ☓:Healthy Control 2 Twins: Healthy > Sick Controls: No The 4th set: No → The reason why unsupervised feature selection is needed. 20 samples
  • 13. Scatter plots between healthy/RA twins. Red dots = selected promoters. Healthy twins vs RA twins: P < 2.2✕10^-16, P = 2.2✕10^-12, P = 3.7✕10^-12, P = 3.9✕10^-1, P < 2.2✕10^-16. Individual promoters are significantly aberrantly methylated. Thus, feature selection is successful. After repeating the same procedure for two additional diseases (SLE and DM)....
  • 14. Among three autoimmune diseases, selected promoters are mostly common. No other methods can achieve such an excellent coincidence between three autoimmune diseases.
  • 15. Lessons to learn: A predefined class definition (e.g., 'sick twin' vs 'healthy twin + two healthy controls') is not a good strategy for extracting “important” features that can exhibit much more complicated behavior (e.g., upregulated in males while downregulated in females).
  • 16. Additional Remarks: Similar procedures were applied to squamous cell carcinoma(*), and genes with genotype-specific DNA methylation were extracted. These genes were identified as cancer-related genes using literature searches, and in silico drug screening was performed for these genes (BMC Syst. Biol., in press; to be presented at APBC2014). (*) Esophageal cancer
  • 17. 2. Real example 2: Circulating biomarker findings for liver diseases Why “circulating biomarker”? → Non-invasive, thus less stressful. Circulating = blood, etc. Target in this talk: microRNAs in blood → microRNA is non-protein-coding RNA that regulates other transcripts.
  • 18. Data set: 14 diseases + healthy control. For example, 2D embeddings of 〜900 blood miRNAs using PCA in 32 lung cancer + 70 healthy controls: PC2 vs PC1, 10 outlier miRNAs. However, PC1 no longer exhibits a clear distinction between lung cancer and normal control (not shown here).
  • 19. Prediction: Control vs Lung Cancer. LDA with PCA, leave-one-out cross validation (using 10 miRNAs, up to the 5th PC). Confusion matrix (rows = predicted, columns = true): predicted control: 56 true control, 8 true lung cancer; predicted lung cancer: 14 true control, 24 true lung cancer. Accuracy 0.784, Specificity 0.800, Sensitivity 0.750, Precision 0.632
  • 20.
  • 21. What is the advantage of PCA based feature extraction? → Stability. Cross validation test (10 folds) of stability of feature extraction (100 trials): 14 diseases vs normal control ✕ 10 miRNAs = 140 miRNAs selected. Ideally, the same 140 miRNAs are selected in all 100 trials. As a result, 129 out of 140 miRNAs are selected with 100% probability.
  • 22. Comparison of stabilities with other feature extraction methods: UFF(*): 111 out of 140 miRNAs; t-test based: 40 out of 140 miRNAs; SAM: 30 out of 140 miRNAs; gsMMD: 5 and 1 out of 140 miRNAs; RFE: 1 out of 140 miRNAs; ensemble RFE: 0 out of 140 miRNAs. (*) The only other unsupervised FE method
  • 23. Lessons to learn: A predefined class definition (e.g., 'sick twin' vs 'healthy twin + two healthy controls') is not a good strategy for extracting “stable” features. Relying too heavily on classification information may harm the stability of the selected features.
  • 24. Additional remarks: The sets of 10 miRNAs selected as biomarkers that discriminate 14 diseases from normal control largely overlapped (every set of 10 miRNAs was chosen from 12 common miRNAs). In addition, these 12 miRNAs discriminate seven additional diseases from healthy controls, even with different measuring methodologies, samples and studies (submitted).
  • 25. 3. Real example 3: Analysis of proteome during bacterial incubation Purpose: Antibiotics are nothing but a disaster for bacteria. They kill even non-toxic bacteria and thus cause drug resistance. Drugs that target proteins more specific to each bacterium would be much better and more effective. To achieve this, we first need to know how the proteome changes in response to environmental changes.
  • 26. Data set: Two incubation conditions: stable (normal) and shaking (oxidative stress). Two fractions: cellular and supernatant. Four time points: from early to final through the middle growth phase. Three biological replicates. In total: 2 ✕ 2 ✕ 4 ✕ 3 = 48 samples are available
  • 27. 2D embedding of 48 samples using PCA Cellular PC2 early supernatant PC1 late supernatant
  • 28. PCA embeddings of proteins 23 proteins selected (underlined are ribosomal proteins) PC2 PC1 SPy1489:hlpA SPy2039:speB Spy1073:rplL SPy2005 SPy2018:emm1 Spy0059:rpmC Spy0611:tufA Spy0274:plr Spy0062:rplX SPy2043:mf Spy0613:tpi Spy2079:AhpC SPy1831:rpsF Spy2160:rpmG SPy1373:ptsH SPy0731:eno Spy1371:gapN Spy1881:pgk SPy0711:speC Spy0071:rpmD SPy2070:groEL Spy0019 SPy0712:mf2
  • 29. using 23 proteins extracted via PCA PC2 PC1
  • 30.
  • 31. Lessons to learn: Even if there is no criterion about what kind of classification is assumed, unsupervised feature extraction can select prominent features.
  • 32. 4. Discussion Real example 1: Commonly methylated promoters between three autoimmune diseases were found by unsupervised feature extraction. Real example 2: Stable circulating biomarkers were selected for 14 diseases using unsupervised feature extraction. Real example 3: Successful extraction of prominent features with unsupervised feature extraction
  • 33. Unsupervised feature extraction seems to be the best method, however... When does PCA based feature extraction work? Is PCA based feature extraction the best? Are there any better unsupervised feature extraction methods? How can we evaluate unsupervised feature extraction? Are there any variables to be maximized?
  • 34. I believe that people here are experts on these topics. Help me....
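
The toy example in slides 3 to 7 (10 ordered features plus 90 random features over 20 samples) can be sketched roughly as follows. This is only a minimal illustration, not the authors' code: the function names, the random seed, the per-sample standardization, and the rule "keep the 10 features with the largest absolute PC1 score" are all assumptions made for the sketch. The slides only say that outlier features in the 2D embedding are selected.

```python
import numpy as np


def make_toy_data(n_ordered=10, n_random=90, n_per_class=10, seed=0):
    """Build a (features x samples) 0/1 matrix like the one on slide 3.

    The first n_ordered rows are 1 for class 1 samples and 0 for class 2
    samples; the remaining rows are random 0/1 values (assumed Bernoulli 0.5).
    """
    rng = np.random.default_rng(seed)
    pattern = np.r_[np.ones(n_per_class), np.zeros(n_per_class)]
    ordered = np.tile(pattern, (n_ordered, 1))
    random_part = rng.integers(0, 2, size=(n_random, 2 * n_per_class)).astype(float)
    return np.vstack([ordered, random_part])


def pca_feature_scores(x, n_components=2):
    """Embed features (rows) into a low-dimensional space with PCA.

    Each sample (column) is standardized across features, an assumption
    of this sketch, and the PCA is computed through an SVD.
    """
    z = (x - x.mean(axis=0)) / x.std(axis=0)
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]  # one point per feature
    loadings = vt[:n_components]                     # one weight per sample
    return scores, loadings


def select_outlier_features(scores, n_select=10):
    """Pick the features lying farthest from the origin along PC1.

    Ranking by |PC1 score| is a hypothetical stand-in for the visual
    outlier selection shown on the slides.
    """
    order = np.argsort(-np.abs(scores[:, 0]))
    return np.sort(order[:n_select])


x = make_toy_data()
scores, loadings = pca_feature_scores(x)
selected = select_outlier_features(scores)
print("selected feature indices:", selected)
print("PC1 loadings per sample:", np.round(loadings[0], 2))
```

No class labels enter the selection step. If PC1 captures the class structure, the PC1 loadings take opposite signs for the two halves of the samples and the ordered features (indices 0 to 9) appear among the selected ones; with a different seed or a weaker signal this is not guaranteed.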