8. 2014/12/20 Data Mining + WEB @ Tokyo
The p >> n problem
in omics analysis
• Fan C et al. Concordance among gene-expression-based predictors for breast cancer. N Engl J Med 2006; 355: 560–569
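• Re-examined five earlier papers on breast cancer prognosis prediction
• The gene sets had almost no overlap
• Increased the sample size to several hundred and re-analyzed with the same procedure
• Overlap among the gene sets was found for four of those papers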
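• The p >> n problem
• The number of explanatory variables (p) is far larger than the number of samples (n)
• Omics data analysis constantly runs up against the p >> n problem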
• Effective approaches: LASSO / Boosting / Random Forests
(from Prof. Fox's keynote at useR! 2008)
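To make the p >> n setting concrete, here is a minimal sketch of the LASSO fitted by cyclic coordinate descent on synthetic data with n = 20 samples and p = 200 variables. Everything in it is an assumption made for illustration and does not come from the talk: the helper names (`soft_threshold`, `lasso_coordinate_descent`, `make_data`, `lambda_max`), the simulated data in which only the first three variables affect the response, the lack of an intercept or standardization, and the choice of penalty at half of `lambda_max`. It uses only the Python standard library and is not a substitute for an established implementation.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used by the LASSO coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(p)]
    col_sq = [sum(v * v for v in col) / n for col in cols]
    beta = [0.0] * p
    resid = list(y)  # residual y - X b, kept up to date after every update
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            col = cols[j]
            # Covariance of column j with the residual that excludes variable j.
            rho = sum(col[i] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            if new != beta[j]:
                delta = new - beta[j]
                for i in range(n):
                    resid[i] -= delta * col[i]
                beta[j] = new
    return beta


def make_data(n=20, p=200, seed=0):
    """Synthetic p >> n data (hypothetical): only the first three variables affect y."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    true_beta = [3.0, -2.0, 2.5]
    y = [sum(x[j] * true_beta[j] for j in range(3)) + rng.gauss(0.0, 0.1) for x in X]
    return X, y


def lambda_max(X, y):
    """Smallest penalty at which every LASSO coefficient is exactly zero."""
    n, p = len(X), len(X[0])
    return max(abs(sum(X[i][j] * y[i] for i in range(n)) / n) for j in range(p))


X, y = make_data()
lam = 0.5 * lambda_max(X, y)  # illustrative penalty, not a tuned value
beta = lasso_coordinate_descent(X, y, lam)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print(f"n = {len(X)}, p = {len(X[0])}, nonzero coefficients: {len(selected)}")
print("selected variable indices:", selected)
```

Ordinary least squares has no unique solution when p > n, whereas the L1 penalty sets most coefficients exactly to zero and so selects a small subset of variables. In practice the penalty would be chosen by cross-validation rather than fixed as it is here.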
16. Ryota Suzuki
R AnalyticFlow: A flowchart-style GUI for R
Kensuke Okada, Kazuo Shigemasu
BMDS: A Collection of R Functions for Bayesian Multidimensional Scaling
Junji Nakano, Ei-ji Nakama
Speeding up R by using ISM-like calls
Tomoaki Nakatani
ccgarch: An R package for modelling multivariate GARCH models with conditional correlations
Bioinformatics II (Room: E29, Chair: Ramón Díaz-Uriarte)
Jacob Michaelson, Andreas Beyer
Random Forests for eQTL Analysis: A Performance Comparison
Chihiro Higuchi, Shigeo Takenaka
Metabolome data mining of mass spectrometry measurements with random forests
Matteo Pardo, Giorgio Sberveglieri
Random Forests and Nearest Shrunken Centroids for the Classification of eNose data
Carolin Strobl, Achim Zeileis
Why and how to use random forest variable importance measures (and how you shouldn't)