Contents

• ·········· 5

SAS

• Density-Based Graph Partitioning Algorithm ·········· 35

• Scalable Variational Bayesian Matrix Factorization (POSTECH) ·········· 59

• A Hybrid Genetic Algorithm for Accelerating Feature Selection and Parameter Optimization of SVM ·········· 71

• Documents Recommendation Using Large Citation Data ·········· 81

1: Text Mining Applications

• ·········· 103
• ·········· 119
• Document Indexing by Ensemble Model (Yanshan Wang) ·········· 135
• ·········· 145

2: Feature Selection & Efficient Computing

• Fused Lasso ·········· 139
• Classification with discrete and continuous variables via Markov Blanket Feature Selection (POSTECH) ·········· 175
• R ·········· 195
• Revisiting the Bradley-Terry Model and Its Application for Information Retrieval ·········· 205

3: Visualization & Text Analytics

• ·········· 235
• ·········· 249
• ·········· 265

4: SNS and Bibliography Analytics

• ·········· 285
• Modified LDA with Bibliography Information ·········· 293
• ·········· 301

5: Recommendation Systems

• ·········· 347
• MovieRank: Combining Structural and Feature Information Ranking Measure ·········· 375
• A New Approach to Recommend Novel Items ·········· 371

6: Data Mining Applications

• ·········· 389
• ·········· 393
• ·········· 405
Data Science Center (DSC)

2010 → 2020

The future belongs to the one
who rules the data

-5-

Data are widely available;
what is scarce is the ability to discover
wisdom from them.
- 32 -
SAS

- 33 -
- 34 -
- 35 -
[1] Pang-Ning, T., Steinbach, M., & Kumar, V. (2006). Introduction to data mining. In Library of Congress.

[2] Jain, A. K. (2010). Data clustering: 50 years beyond K-means. Pattern Recognition Letters, 31(8), 651-666.

- 36 -
[3] Schaeffer, S. E. (2007). Graph clustering. Computer Science Review, 1(1), 27-64.

• Graph partitioning: find clusters C_1, ..., C_k minimizing the total weight of edges cut between clusters,

    min Σ_{i=1}^{k} Σ_{j ∈ C_i, l ∉ C_i} w_jl

where k is the number of clusters.
•
[4] Boutin, F., & Hascoet, M. (2004, July). Cluster validity indices for graph partitioning. In Information Visualisation, 2004. IV 2004. Proceedings.
Eighth International Conference on (pp. 376-381). IEEE.
[5] Tibshirani, R., Walther, G., & Hastie, T. (2001). Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal
Statistical Society: Series B (Statistical Methodology), 63(2), 411-423.
[6] Patkar, S. B., & Narayanan, H. (2003, January). An efficient practical heuristic for good ratio-cut partitioning. In VLSI Design, 2003.
Proceedings. 16th International Conference on (pp. 64-69). IEEE.
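The cut objective above can be evaluated directly from a weight matrix. A minimal Python sketch (the example graph and the function name are illustrative, not from the talk):

```python
import numpy as np

def cut_cost(W, labels):
    """Sum of edge weights w_jl crossing cluster boundaries:
    sum over i of sum_{j in C_i, l not in C_i} w_jl
    (each undirected cross edge is counted from both sides)."""
    W = np.asarray(W, dtype=float)
    labels = np.asarray(labels)
    cross = labels[:, None] != labels[None, :]  # True where j and l lie in different clusters
    return W[cross].sum()

# Two tight pairs {0,1} and {2,3} joined by one weak edge (0-2).
W = np.array([[0, 1.0, 0.1, 0],
              [1.0, 0, 0, 0],
              [0.1, 0, 0, 1.0],
              [0, 0, 1.0, 0]])
print(cut_cost(W, [0, 0, 1, 1]))  # 0.2: the single 0-2 edge, counted in both directions
```

Splitting along the natural pairs cuts only the weak edge, so the objective is minimized by that partition.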

- 37 -
[7] Sibson, R. (1973), “SLINK: an optimally efficient algorithm for the single-link cluster method”, The Computer Journal, Vol. 116, No. 1, pp. 30-34.
[8] Defays, D. (1977). An efficient algorithm for a complete link method. The Computer Journal, 20(4), 364-366.

• The unnormalized graph Laplacian L is defined as

    L = D − A

where A = [a_ij], i, j = 1, 2, ..., n is the adjacency (weight) matrix and D is the diagonal degree matrix with entries d_1, d_2, ..., d_n,

    d_i = Σ_{j=1}^{n} a_ij

[9] Von Luxburg, U. (2007). A tutorial on spectral clustering. Statistics and computing, 17(4), 395-416.
[10] Ng, A. Y., Jordan, M. I., & Weiss, Y. (2002). On spectral clustering: Analysis and an algorithm. Advances in neural information processing
systems, 2, 849-856.
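The Laplacian construction above is a one-liner in NumPy; a small sketch (example graph is illustrative):

```python
import numpy as np

def unnormalized_laplacian(A):
    """L = D - A, where D = diag(d_1, ..., d_n) and d_i = sum_j a_ij."""
    A = np.asarray(A, dtype=float)
    D = np.diag(A.sum(axis=1))
    return D - A

# A path graph on three nodes: 1 - 0 - 2.
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]])
L = unnormalized_laplacian(A)
# Row sums of L are zero; for a symmetric A, L is positive semidefinite,
# which is what spectral clustering relies on.
print(L)
```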

- 38 -
• Modularity of a partition C_1, ..., C_k, with 2m the total degree:

    Q = (1/2m) Σ_{i=1}^{k} Σ_{j,l ∈ C_i} ( A_jl − d_j d_l / 2m )
[11] Newman, M. E., & Girvan, M. (2004). Finding and evaluating community structure in networks. Physical review E, 69(2), 026113.
[12] Clauset, A., Newman, M. E., & Moore, C. (2004). Finding community structure in very large networks. Physical review E, 70(6), 066111.
[13] Kehagias, A. (2012). Bad Communities with High Modularity. arXiv preprint arXiv:1209.2678.
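The modularity formula above translates directly to NumPy; a minimal sketch (graph and names are illustrative):

```python
import numpy as np

def modularity(A, labels):
    """Newman-Girvan modularity Q = (1/2m) * sum over within-cluster pairs
    of (A_jl - d_j * d_l / 2m), for an undirected weighted graph."""
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)               # degrees d_j
    two_m = d.sum()                 # 2m: each edge counted from both ends
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    B = A - np.outer(d, d) / two_m  # A_jl - d_j d_l / 2m
    return B[same].sum() / two_m

# Two triangles joined by a single bridge edge: the natural split scores high.
A = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1
print(modularity(A, [0, 0, 0, 1, 1, 1]))  # about 0.357 (= 5/14)
```

Putting everything in one community gives Q = 0, so the two-triangle split is clearly preferred.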

[14] Daszykowski, M., Walczak, B., & Massart, D. L. (2001). Looking for natural patterns in data: Part 1. Density-based approach. Chemometrics
and Intelligent Laboratory Systems, 56(2), 83-92.

- 39 -

- 40 -
• Edge weight: a locally scaled similarity restricted to mutual k-nearest neighbors,

    w_ij = exp( − d(x_i, x_j)² / (d_i^k d_j^k) )   if x_j ∈ x_i^k and x_i ∈ x_j^k
    w_ij = 0                                        otherwise

where x_i^k is the k-nearest set of point i and d_i^k is the distance between point i and the k-th neighbor of point i.


[15] Zelnik-Manor, L., & Perona, P. (2004). Self-tuning spectral clustering. In Advances in neural information processing systems (pp. 1601-1608).
[16] Ertoz, L., Steinbach, M., & Kumar, V. (2002, April). A new shared nearest neighbor clustering algorithm and its applications. In Workshop
on Clustering High Dimensional Data and its Applications at 2nd SIAM International Conference on Data Mining (pp. 105-115).
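A sketch of the mutual k-nearest-neighbor, locally scaled weight defined above (function name and toy data are illustrative):

```python
import numpy as np

def mutual_knn_weights(X, k):
    """w_ij = exp(-d(x_i, x_j)^2 / (d_i^k * d_j^k)) when i and j are in each
    other's k-nearest sets, and 0 otherwise."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    order = np.argsort(D, axis=1)                  # column 0 is the point itself
    sigma = np.take_along_axis(D, order[:, k:k + 1], axis=1).ravel()  # d_i^k
    knn = np.zeros((n, n), dtype=bool)
    for i in range(n):
        knn[i, order[i, 1:k + 1]] = True           # k nearest, excluding self
    mutual = knn & knn.T
    W = np.exp(-D**2 / np.outer(sigma, sigma))
    W[~mutual] = 0.0
    np.fill_diagonal(W, 0.0)
    return W

X = np.array([[0, 0], [0, 1], [5, 5], [5, 6]], dtype=float)
W = mutual_knn_weights(X, k=1)
print(W)  # nonzero only inside the two mutual pairs (0,1) and (2,3)
```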

- 41 -
• Density of point i: total connectivity of its k-nearest set,

    d_i = Σ_{j ∈ x_i^k} w_ij + Σ_{j,l ∈ x_i^k} w_jl

where x_i^k is the k-nearest set of point i.
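The density above can be computed from a weight matrix and the k-nearest sets; a minimal sketch (the second sum is taken over unordered pairs, which the slide's notation leaves implicit, and the toy weights are illustrative):

```python
import numpy as np

def knn_density(W, knn_sets):
    """d_i = sum of weights from i to its k-nearest set, plus the weights
    among the members of that set (a measure of local connectivity)."""
    d = np.zeros(len(W))
    for i, nbrs in enumerate(knn_sets):
        nbrs = list(nbrs)
        sub = W[np.ix_(nbrs, nbrs)]
        d[i] = W[i, nbrs].sum() + np.triu(sub, 1).sum()  # pairs within x_i^k once
    return d

W = np.array([[0, 1.0, 0.5],
              [1.0, 0, 0.2],
              [0.5, 0.2, 0]])
print(knn_density(W, [[1, 2], [0, 2], [0, 1]]))  # [1.7, 1.7, 1.7]: every point sees the full triangle
```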

- 42 -
- 43 -
- 44 -
- 45 -

- 46 -

- 47 -
- 48 -
• Example:

    Σ_{j∈C1} w_ij = 0.9,   Σ_{j∈C2} w_ij = 0

- 49 -

• Example:

    Σ_{j∈C1} w_ij = 0.2,   Σ_{j∈C2} w_ij = 0.3

- 50 -
- 51 -

• Example:

    Σ_{j∈C1} w_ij = 2.1,   Σ_{j∈C2} w_ij = 0.5

- 52 -

• Example:

    Σ_{j∈C1} w_ij = 0.7,   Σ_{j∈C2} w_ij = 1.4

- 53 -
- 54 -
[Figures: clustering results on 2-D synthetic datasets; scatter plots with points colored by cluster (legends: Cluster 1 through Cluster 7)]

- 55 -

- 56 -

- 57 -

Thank you for Listening

Any Questions?

- 58 -
Scalable Variational Bayesian
Matrix Factorization

KDMS 2013

Outline
1. Matrix Factorization
for Collaborative Prediction
2. Regularized Matrix Factorization
vs. Bayesian Matrix Factorization
3. Scalable Variational Bayesian
Matrix Factorization
4. Related Works
5. Numerical Experiments
6. Conclusion

- 59 -

2
Matrix Factorization for Collaborative Prediction

[Figure: a user-item rating matrix (Users 1-4, Items 1-4) with missing entries marked "?", factorized into a user factor matrix and an item factor matrix]

• Collaborative prediction:
filling missing entries of the user-item rating matrix
• Matrix factorization:
predicting an unknown rating by the inner product
of a user factor vector and an item factor vector
3

Regularized Matrix Factorization
• Minimize the regularized squared error loss

Alternating Least Squares (ALS)
  Time complexity:   O(2|Ω|K² + (I+J)K³)
  Parallelization:   easy
  Tuning parameter:  λ (regularization)

- 60 -

4
Regularized Matrix Factorization
• Minimize the regularized squared error loss

Stochastic Gradient Descent (SGD)
  Time complexity:   O(2|Ω|K)
  Parallelization:   possible, but not easy
  Tuning parameters: λ (regularization) and the learning rate
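The SGD update the slide refers to can be sketched in a few lines; this is a minimal illustration of the O(2|Ω|K)-per-epoch scheme, with toy ratings and hyperparameter values chosen for the example only:

```python
import numpy as np

def sgd_mf(ratings, I, J, K=4, lam=0.02, lr=0.01, epochs=2000, seed=0):
    """SGD for the regularized squared loss
    sum_{(i,j) in Omega} (X_ij - u_i . v_j)^2 + lam (||u_i||^2 + ||v_j||^2).
    Each observed rating costs O(K) work per epoch."""
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((I, K))
    V = 0.1 * rng.standard_normal((J, K))
    for _ in range(epochs):
        for i, j, x in ratings:
            e = x - U[i] @ V[j]                 # prediction error on (i, j)
            ui = U[i].copy()                    # use pre-update u_i for both steps
            U[i] += lr * (e * V[j] - lam * ui)
            V[j] += lr * (e * ui - lam * V[j])
    return U, V

ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 1, 2.0)]
U, V = sgd_mf(ratings, I=2, J=2)
print(U[0] @ V[0])  # approaches the observed rating 5.0
```

The two knobs the slide lists, λ and the learning rate, appear explicitly here; mis-setting either reproduces the over/underfitting behavior discussed next.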

5

Problem of parameter tuning
• Too small a regularization parameter: overfitting
• Too large a regularization parameter: underfitting

- 61 -

6
Problem of parameter tuning
• The optimal regularization parameter differs depending on
the dataset and the rank K.

Regularization parameters chosen by cross-validation on various
datasets and ranks K (Kim & Choi, IEEE SPL 2013)
7

Problem of parameter tuning
• SGD requires tuning of the regularization parameter, the
learning rate, and even the number of epochs.

          0.005        0.007        0.010        0.015        0.020
0.005     0.9061/13    0.9079/15    0.9117/19    0.9168/28    0.9168/44
0.007     0.9056/10    0.9074/11    0.9112/13    0.9168/19    0.9169/31
0.010     0.9064/7     0.9077/8     0.9113/10    0.9174/13    0.9186/21
0.015     0.9099/5     0.9011/6     0.9152/6     0.9257/7     0.9390/7
0.020     0.9166/4     0.9175/4     0.9217/4     0.9314/4     0.9431/3

Netflix probe10 RMSE / optimal number of epochs of BRISMF for
various regularization and learning-rate values (K = 40). (Takács et al., JMLR 2009)
- 62 -

8
Bayesian Matrix Factorization
Prior

P(U), P(V)

Likelihood
P(X |U,V)

Posterior

P(U,V |X)

Approximate the posterior by
MCMC (Salakhutdinov & Mnih, ICML 2008)
Variational method (Lim & Teh, KDDcup 2007)

MCMC on Netflix

No parameter tuning
No overfitting
High accuracy
Huge computational cost
O(2|Ω|K2+(I+J)K3)
9

Scalable Variational Bayesian Matrix Factorization
• No parameter tuning
• Linear space complexity: O(2(I+J)K)
• Linear time complexity: O(6|Ω|K)
• Easily parallelized on multi-core systems
• Optimize
element-wisely factorized variational distribution
with coordinate descent method.
- 63 -

10
Variational Bayesian Matrix Factorization
• Likelihood is given by
• Gaussian priors on factor matrices U and V:

• Approximate posterior by variational distribution by
maximizing the variational lower bound,
or equivalently minimizing the KL-divergence

11

VBMF-BCD (Lim & Teh, KDD Cup 2007)
• Matrix-wisely factorized variational distribution

VBMF-BCD
Space complexity
O((I+J)(K+K2))
Time complexity
O(2|Ω|K2+(I+J)K3)
Parallelization
Easy

- 64 -

12
Scalable VBMF: linear space complexity

Element-wisely factorized variational distribution (K = 100):

                                           Matrix-wise           Element-wise
                                           O((I+J)(K+K²))        O(2(I+J)K)
Netflix (I = 480,189, J = 17,770)          4.4 GB                0.8 GB
Yahoo-music (I = 1,000,990, J = 624,961)   131 GB                2.6 GB
13

Scalable VBMF: quadratic time complexity
Updating rules for q(uki)

Updating all variational parameters

- 65 -

14
Scalable VBMF: linear time complexity
Let R_ij denote the residual on the (i, j)-th observation.
With R_ij, the updating rule can be rewritten in terms of the residuals.

15

Scalable VBMF: linear time complexity
When a variational parameter is changed, R_ij can be easily
updated incrementally rather than recomputed.

- 66 -
16
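The residual trick above is what turns each element update into constant-time work. The sketch below illustrates it with plain least-squares coordinate descent (the full variational update also tracks posterior variances, which is omitted here; all names and values are illustrative):

```python
import numpy as np

def cd_mf(X, mask, K=2, lam=0.1, sweeps=20, seed=0):
    """Coordinate descent for regularized matrix factorization, maintaining
    the residual matrix R = X - U V^T on observed entries so that each
    element update touches only the observed entries in its row or column."""
    rng = np.random.default_rng(seed)
    I, J = X.shape
    U = 0.1 * rng.standard_normal((I, K))
    V = 0.1 * rng.standard_normal((J, K))
    R = np.where(mask, X - U @ V.T, 0.0)
    for _ in range(sweeps):
        for k in range(K):
            for i in range(I):
                obs = mask[i]
                r = R[i, obs] + U[i, k] * V[obs, k]   # residual without u_ik's term
                new = (r @ V[obs, k]) / (V[obs, k] @ V[obs, k] + lam)
                R[i, obs] = r - new * V[obs, k]       # incremental residual update
                U[i, k] = new
            for j in range(J):
                obs = mask[:, j]
                r = R[obs, j] + V[j, k] * U[obs, k]
                new = (r @ U[obs, k]) / (U[obs, k] @ U[obs, k] + lam)
                R[obs, j] = r - new * U[obs, k]
                V[j, k] = new
    return U, V, R

X = np.array([[5., 3.], [4., 2.]])
mask = np.ones_like(X, dtype=bool)
U, V, R = cd_mf(X, mask)
print(np.abs(R).max())  # residuals shrink as the factorization fits
```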
Scalable VBMF: parallelization
I

K
• Each column of variational parameters can be updated
independently from the updates of other columns.
• Parallelization can be easily done in a column-by-column
manner.
• Easy implementation with the OpenMP library on multi-core
system.
17

Related work

(Pilászy et al., RecSys 2010)

• Similar idea is used to reduce the cubic time
complexity of ALS to linear one.
RMF vs. Scalable VBMF: with small extra effort, a more accurate
model is obtainable without tuning of the regularization parameter.

- 67 -

18
Related Work

(Raiko et al., ECML 2007)

• Consider element-wisely factorized variational
distribution

• Update U and V by scaled gradient descent method

• Require tuning of learning rate
• Learning speed is slower than our algorithm
19

Numerical Experiments
• Compare VBMF-CD, VBMF-BCD (Lim & Teh, KDD Cup 2007),
VBMF-GD (Raiko et al., ECML 2007)

• Experimental environment
– Quad-core Intel® core™ i7-3820 @ 3.6GHz
– 64 GB memory
– Implemented in Matlab 2011a, where main computational
modules are implemented in C++ as mex files
– Parallelized with the OpenMP library

• Datasets

                MovieLens10M   Netflix        Yahoo-music
# of users      69,878         480,189        1,000,990
# of items      10,677         17,770         624,961
# of ratings    10,000,054     100,480,507    262,810,275

- 68 -

20
Numerical Experiments: K = 20

RMSE versus computation time on a quad-core system for each dataset:
(a) MovieLens10M, (b) Netflix, (c) Yahoo-music

            MovieLens10M   Netflix   Yahoo-music
VBMF-CD     0.8589         0.9065    22.3425
VBMF-BCD    0.8671         0.9070    22.3671
VBMF-GD     0.8591         0.9167    22.5883

21

Numerical Experiments: Netflix, K = 50

Time per iteration:
  VBMF-BCD   66 min.
  VBMF-CD    77 sec.
  VBMF-GD    29 sec.

RMSE reached vs. iterations and wall-clock time:

          VBMF-BCD           VBMF-CD
RMSE      Iter.   Time       Iter.   Time
0.9005    19      21 h       63      74 m
0.9004    21      23 h       70      82 m
0.9003    22      24 h       84      98 m
0.9002    25      28 h       108     2 h
0.9001    27      31 h       680     13 h
0.9000    30      33 h       -       -

- 69 -

22
Conclusion
• We presented a scalable learning algorithm for VBMF, VBMF-CD.
• VBMF-CD optimizes element-wisely factorized variational
distributions with a coordinate descent method.
• Space and time complexity of VBMF-CD are linear.
• VBMF-CD can be easily parallelized.
• Experimental results confirmed the desirable properties of VBMF-CD,
such as scalability, fast learning, and prediction accuracy.

23

- 70 -
A hybrid genetic algorithm for accelerating feature selection and
parameter optimization of support vector machine
2013. 11. 29.

Introduction
• Support Vector Machine (SVM)
– One of the most popular state-of-the-art classification algorithms.
– efficiently finds non-linear solutions by exploiting kernel functions.
– Takes O(N³) training time.

• “Very important” issues on training SVM
– Feature selection
• SVM is a distance based algorithm (kernel matrix computation), and doesn’t include
any feature selection mechanism.
• Irrelevant features degrade the model performance.

– Parameter optimization
• Model Tradeoff parameter C, Kernel parameter σ (for the RBF kernel).
• SVM is very sensitive to the parameter settings.

– For SVM, feature selection and parameter optimization should be performed
simultaneously.
- 71 -

2
Introduction
• Genetic algorithm (GA)
– A stochastic algorithm that mimics natural evolution.
– easy, but very effective!
Selection

Parents
Genetic operation
(Crossover, Mutation)

Population
p

Replacement

Offspring

• GA-based feature selection and parameter selection of SVM [1-4]
– GA effectively finds near-optimal feature subsets and parameters.
– But, Slow. (But, MUCH better than Grid-search mechanism.)
3

Introduction
If the SVM has to be re-trained periodically, fast feature selection and
parameter optimization is required.
This study aims to avoid producing a bad offspring in the “Genetic Operation”
step of GA.
This study proposes a chromosome filtering method for faster convergence of
GA using Decision Tree (DT) for feature selection and parameter optimization
of SVM.

- 72 -

4
The proposed method
• Flowchart

Initialization

Population
Population Replacement
Evaluate fitness

no
yes

Chromosome
Filtering
Termination
condition?

no

yes

Do genetic operations

Optimized
parameters and
feature subset

5

The proposed method
• Chromosome design
– Parameters: binary representation, where each bit corresponds to a
candidate magnitude (10⁻², 10⁻¹, 1, 10¹, 10², 10³ for C; 2⁻⁵, ..., 2⁵ for σ);
e.g., C = 1 × 10⁻² + 1 × 10¹

– Feature subset: binary representation

1  0  0  1  0  ...  1  0
f1 f2 f3 f4 f5 ... fp-1 fp   →   {f1, f4, ..., fp-1}

Genotype → Phenotype

- 73 -

6
The proposed method
• Fitness evaluation
– Decode chromosome and obtain C, σ, and a feature subset.
• Genotype → Phenotype

– Train SVM for a dataset
given the selected C, σ, and feature
subset.
– Fitness value: Cross Validation Accuracy

7

The proposed method
• Genetic operation
– Parent selection
• Roulette-wheel scheme - Fitness proportional selection (FPS)
• Probability of the i-th chromosome ci in the population being selected
= f(i) / Σ_j f(j), where f(i) is the fitness of ci

– Crossover: N-point crossover
• Choose N random crossover points, split along those points.

– Mutation: Bit-flipping mutation
• Bitwise bit-flipping with fixed probability.

- 74 -

8
The proposed method
• Chromosome Filtering
– For each generation, chromosomes and their fitness are stored in the
knowledgebase. A DT is trained periodically based on the knowledgebase.
Using the DT, the offspring chromosomes that are likely to have bad fitness are
removed before the fitness evaluation step.
– Assumption
• Some features and parameter settings improve (or degrade) the model
performance.
• DT can find these rules.

9

The proposed method
• Chromosome Filtering (continued)
– Why DT?
• Effectively deals with categorical features.
• Finds non-linear relationships.
• Uses a few, relevant features in the classification
procedure.

Knowledgebase (sorted by fitness):

– DT Training
• Each ci (the i-th chromosome) in the knowledgebase is
labeled by
– the M highest fitness values → GOOD
(likely to yield a good fitness value)
– the next M highest fitness values → NORMAL
– the remaining → BAD
(likely to yield a bad fitness value)

c1, c2, c3, ..., cM          → GOOD
cM+1, cM+2, cM+3, ..., c2M   → NORMAL
c2M+1, c2M+2, c2M+3, ...     → BAD

• Input feature: chromosome (in phenotype)
• Output feature: label {GOOD, NORMAL, BAD}

- 75 -

10
The proposed method
• Chromosome Filtering (continued)
– Filtering
• A DT gives rules that assess a chromosome before fitness evaluation.
: Is a chromosome GOOD or NORMAL or BAD?
• Each chromosome has a different survival probability.
ex) GOOD: 1.0, NORMAL: 0.5, BAD: 0.2
• The DT is periodically updated, so the criteria of good chromosome changes
through the generations.
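The labeling and filtering steps above are simple to express in code. The sketch below abstracts the trained DT as a `classify` callable; all names and the survival probabilities follow the slide's example:

```python
import random

def label_by_fitness(fitnesses, M):
    """Label the M best chromosomes GOOD, the next M NORMAL, the rest BAD."""
    order = sorted(range(len(fitnesses)), key=lambda i: -fitnesses[i])
    labels = {}
    for rank, i in enumerate(order):
        labels[i] = 'GOOD' if rank < M else 'NORMAL' if rank < 2 * M else 'BAD'
    return [labels[i] for i in range(len(fitnesses))]

def filter_offspring(offspring, classify, survival, rng):
    """Keep each offspring with the survival probability of its predicted
    label, e.g. {'GOOD': 1.0, 'NORMAL': 0.5, 'BAD': 0.2}."""
    return [c for c in offspring if rng.random() < survival[classify(c)]]

print(label_by_fitness([0.9, 0.5, 0.8, 0.1], M=1))  # ['GOOD', 'BAD', 'NORMAL', 'BAD']
```

A real run would train the DT on (phenotype, label) pairs and pass its prediction function as `classify`.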

11

The proposed method
• Chromosome Filtering (continued)
– DT example

C>100

Contain
F1?

BAD

σ>1

GOOD

σ>0.25

BAD

Contain
F3?

GOOD

NORMAL

- 76 -

BAD

12
The proposed method
• Population Replacement: steady-state model
(chosen to verify the effectiveness of the proposed method in the initial period of GA)

– Only one chromosome in the population is updated in a generation.
– Replacement scheme [5, 6]: The offspring replaces one of its parents or the
lowest fitness chromosome in the population.
• If the offspring is superior to both parents, it replaces the similar parent.
• If it is in between the two parents, it replaces the inferior parent.
• otherwise, the most inferior chromosome in the population is replaced.
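The replacement rules above can be sketched as one function; the similarity measure and toy population are illustrative assumptions:

```python
def steady_state_replace(pop, fit, i1, i2, child, f_child, sim):
    """Steady-state replacement: the child replaces the more similar parent
    if it beats both, the inferior parent if it lies between them, and
    otherwise the worst chromosome in the population."""
    f1, f2 = fit[i1], fit[i2]
    if f_child >= max(f1, f2):                       # superior to both parents
        victim = i1 if sim(pop[i1], child) >= sim(pop[i2], child) else i2
    elif f_child >= min(f1, f2):                     # between the two parents
        victim = i1 if f1 <= f2 else i2
    else:                                            # inferior to both
        victim = min(range(len(pop)), key=fit.__getitem__)
    pop[victim], fit[victim] = child, f_child
    return victim

pop = [[0, 0], [1, 1], [0, 1]]
fit = [0.2, 0.6, 0.4]
hamming = lambda a, b: -sum(x != y for x, y in zip(a, b))  # higher = more similar
v = steady_state_replace(pop, fit, 0, 1, [1, 0], 0.9, hamming)
print(v)  # 0: the child beats both parents and replaces the similar one
```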

13

Experiments
• Experimental Design
– 10 datasets from the UCI repository; all datasets were normalized to [-1, 1].
– 5 independent runs; a fixed random seed set was used for fairness.
– In SVM training, 10-fold cross-validation was used.
– Parameter Settings
• GA parameters
– population size Npop = 30
– crossover probability pc = 0.9
– mutation probability pm = 0.05
– max iteration = 300
– pgood = 1; pnormal = 0.5; pbad = 0.2

• DT parameters
– CART
– Labeling: good=10, normal=10, bad=remaining
– Training starting point: 30th generation / period=10

- 77 -

14
Experiments
• Results
– Maximum fitness in the population (CV accuracy, %)

Dataset       #data  #feature  #class   50th gen.         100th gen.        200th gen.
                                        GA      GA+DT     GA      GA+DT     GA      GA+DT
Iris          150    4         3        97.067  97.067    97.333  97.200    97.333  97.333
Wine          178    13        3        98.876  98.539    99.213  99.101    99.551  99.438
Sonar         208    60        2        87.596  87.596    88.462  90.769    91.731  92.788
Glass         214    9         6        71.682  71.963    72.336  73.364    73.645  74.112
Ionosphere    351    34        2        93.333  94.017    94.131  94.758    95.100  95.499
BreastCancer  683    9         2        97.160  97.160    97.247  97.277    97.306  97.365
Vehicle       846    18        4        81.773  81.939    82.648  83.948    85.201  85.721
Vowel         990    10        11       96.990  97.253    98.000  99.030    99.051  99.434
Yeast         1484   8         10       57.318  57.547    58.814  59.111    59.690  60.000
Segment       2310   19        7        96.736  97.004    97.212  97.411    97.740  97.818

- 77 -

15

Experiments
• Results
– Maximum fitness in the population

[Figure: CV accuracy (%) vs. generation (50-300) for GA+DT and GA on each
dataset: iris, wine, sonar, glass, ionosphere, breastcancer, vehicle, vowel,
yeast, segment]

- 78 -

16
Concluding Remarks
We presented a chromosome filtering method for GA-based feature selection
and parameter optimization of SVM.

The proposed method employed a DT as a chromosome filter to remove the
offspring chromosomes that are likely to have bad fitness before the fitness
evaluation step of GA.
On most datasets, the proposed method showed faster improvement of fitness
than standard GA.

17

Acknowledgements
This work was supported by the National Research Foundation of Korea (NRF)
grant funded by the Korea government (MSIP) (No. 2011-0030814), and the
Brain Korea 21 Program for Leading Universities & Students. This work was
also supported by the Engineering Research Institute of SNU.

- 79 -

18
References
1. Frohlich, H., Chapelle, O., & Scholkopf, B. (2003, November). Feature selection for support vector machines by means of genetic algorithm. In Tools with Artificial Intelligence, 2003. Proceedings. 15th IEEE International Conference on (pp. 142-148). IEEE.
2. Huang, C. L., & Wang, C. J. (2006). A GA-based feature selection and parameters optimization for support vector machines. Expert Systems with Applications, 31(2), 231-240.
3. Min, S. H., Lee, J., & Han, I. (2006). Hybrid genetic algorithms and support vector machines for bankruptcy prediction. Expert Systems with Applications, 31(3), 652-660.
4. Zhao, M., Fu, C., Ji, L., Tang, K., & Zhou, M. (2011). Feature selection and parameter optimization for support vector machines: A new approach based on genetic algorithm with feature chromosomes. Expert Systems with Applications, 38(5), 5197-5204.
5. Bui, T. N., & Moon, B. R. (1996). Genetic algorithm and graph partitioning. Computers, IEEE Transactions on, 45(7), 841-855.
6. Oh, I. S., Lee, J. S., & Moon, B. R. (2004). Hybrid genetic algorithms for feature selection. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 26(11), 1424-1437.

19

End of Document

- 80 -

20
- 81 -
1: Text Mining Applications

- 101 -
- 102 -
Document Indexing by Ensemble Model
Yanshan Wang and In-Chan Choi
Korea University
System Optimization Lab
yansh.wang@gmail.com

November 25, 2013

Yanshan Wang and In-Chan Choi (KU)

Indexing by EnM

November 25, 2013

1 / 18


Overview

1

The Basics
Information Retrieval and Document Indexing
Topic Modelling
Indexing by Latent Dirichlet Allocation

2

Indexing by Ensemble Model
Introduction to Ensemble Model
Algorithms
Experimental Results

3

Conclusions and Discussion

- 135 -
The problem in Information Retrieval

As more information (Big
Data) becomes available, it is
more difficult to access what
users are looking for.
We need new tools to help us
understand and search among
vast amounts of information.

Source: www.betaversion.org/ stefano/linotype/news/26/


Document Indexing is Important

Users can get desired information by indexing (or ranking)
documents (or items). The higher position the document has, the
more valuable to users.
- 136 -
Problems in Conventional Methods: Word
Representation

The majority of rule-based and statistical Natural Language
Processing (NLP) models regards words as atomic symbols.
In Vector Space Models (VSM), a word is represented by a single 1
and a lot of zeros. For example,
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0]
Its problem:
motel [0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0] AND
hotel [0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0] = 0
The conceptual meaning of words is ignored.


Topic Modeling

Latent Dirichlet Allocation (LDA)
[Blei et al. (2003)].
Uncover the hidden topics that
generate the collection.
Words and Documents can be
represented according to those
topics.
Use the representation to organize,
index and search the text.

For example, a word is represented as a dense vector over topics:

    apple = [0.325, 0.792, 0.214, 0.107, 0.109, 0.612, 0.314, 0.245]ᵀ

- 137 -
LDA [Blei et al. (2003)]

[Figure: LDA plate diagram with hyperparameters α and β, topic mixture θ, topic assignments z, and words w]

Choose the number of words N ∼ Poisson(ξ).
Choose θ ∼ Dirichlet(α).
For n = 1, 2, ..., N:
    Choose a topic zn ∼ Multinomial(θ);
    Choose a word wn ∼ Multinomial(wn | zn, β), a multinomial
    distribution conditioned on the topic zn.

Joint distribution: p(θ, z, d | α, β) = p(θ | α) ∏_{n=1}^{N} p(zn | θ) p(wn | zn, β)

Indexing by LDA (LDI) [Choi and Lee (2010)]
With adequate assumptions, the probability of a word wj
embodying the concept z^k is

    Wj^k = p(z^k = 1 | wj = 1) = βjk / Σ_{h=1}^{K} βjh

The document (or query) probability can be defined within the
topic space:

    Di^k (Qi^k) = Σ_{j=1}^{V} Wj^k n_ij / N_di

where n_ij denotes the number of occurrences of word wj in
document di and N_di denotes the number of words in
document di, i.e. N_di = Σ_{j=1}^{V} n_ij.
Similarity between document and query:

    ρ(D, Q) = (D / ||D||) · (Q / ||Q||)
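The LDI quantities above reduce to row normalizations and an inner product; a minimal sketch with an illustrative 2-word, 2-topic β:

```python
import numpy as np

def word_topic_weights(beta):
    """W_jk = beta_jk / sum_h beta_jh: probability that word j embodies topic k."""
    beta = np.asarray(beta, dtype=float)
    return beta / beta.sum(axis=1, keepdims=True)

def document_vector(W, counts):
    """D_i^k = sum_j W_jk * n_ij / N_di, with N_di the document length."""
    counts = np.asarray(counts, dtype=float)
    return counts @ W / counts.sum()

def cosine(d, q):
    """rho(D, Q) = (D/||D||) . (Q/||Q||)."""
    return (d @ q) / (np.linalg.norm(d) * np.linalg.norm(q))

beta = np.array([[0.6, 0.2], [0.1, 0.3]])  # rows: words, columns: topics (beta_jk)
W = word_topic_weights(beta)
print(W)  # each row sums to 1
```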
- 138 -
Indexing by Ensemble Model (EnM)
[Wang et al. (2013)]
Motivation: there exist optimal weights over constituent models.
Table: A toy example. The values in the table represent similarities of
documents with respect to a given query. The scores of Ensemble 1 and
2 are defined by 0.5*Model 1 + 0.5*Model 2 and 0.7*Model 1 + 0.3*Model
2, respectively. The relevant document list is assumed to be {2, 3}.

              Model 1   Model 2   Ensemble 1   Ensemble 2
Document 1    0.35      0.2       0.55         0.305
Document 2    0.4       0.1       0.5          0.31
Document 3    0.25      0.7       0.95         0.385
(M)AP         0.72      0.72      0.72         0.89

AP and MAP
Average Precision (AP) and Mean Average Precision (MAP)

Notation:
|Q|             the number of queries in the query set;
|Di|            the number of documents in the relevant document set w.r.t. the ith query;
dij ∈ Di        the jth document in Di;
φi^k            the relevance score returned by the kth model w.r.t. the ith query;
R(dij, φi^k)    the indexing position of the jth document for the ith query returned by the kth model;
H = Σ αk φk     the ensemble model, a linear combination of the constituent models, where αk ≥ 0.

Definition:

    E(H, Q) = (1/|Q|) Σ_{i=1}^{|Q|} AP(H, Di),
    AP(H, Di) = (1/|Di|) Σ_{j=1}^{|Di|} j / R(dij, H)

10 / 18
Formulation
Formulation of the Optimization Problem
Since 0 ≤ AP ≤ 1, we can define the empirical loss as follows:

    min Σ_{i=1}^{|Q|} (1 − AP(H, Di)),   or

    min Σ_{i=1}^{|Q|} ( 1 − (1/|Di|) Σ_{j=1}^{|Di|} j / R(dij, H) ).

Our goal is to uncover optimal weights α that minimize the
empirical loss.

Difficulty
The position function R(dij, H) is nonconvex, nondifferentiable and
noncontinuous w.r.t. the α's.

Boosting Scheme

1. Select model:

    φ̂j = argmax_j Σ_{i=1}^{|Q|} Di AP(φi^j);

2. Update the weight:

    α̂j^t = δ̂j^t,  where  δj = (1/2) log( Σ_{i=1}^{|Q|} Di (1 + AP(φi^j)) / Σ_{i=1}^{|Q|} Di (1 − AP(φi^j)) );

3. Update distribution on queries:

    Di = exp(−AP(Hi)) / Z,

where Z is a normalizer.

- 140 -
12 / 18
Coordinate Descent

Since the objective is nonconvex, not every coordinate update will
reduce the loss.

1. Select model:

    φ̂j = argmax_j E(Q, φj);

2. Update the weight:

    αj = (1/2) log( (1 + AP(φi^j)) / (1 − AP(φi^j)) );

3. If E^t ≤ E^{t−1}, delete this coordinate.

Parallel Coordinate Descent
The coordinate descent algorithm can be parallelized over cores.

1: parfor p = 1, 2, ..., Kφ do
2:     Update the weight using α_p = (1/2) log [ (1 + AP(φ_p, i)) / (1 − AP(φ_p, i)) ];
3: end parfor
4: return Ensemble model H.
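Each α_p depends only on its own model's AP, so the loop parallelizes directly. A thread-pool stand-in for the parfor over cores (a sketch, not the authors' implementation):

```python
import math
from concurrent.futures import ThreadPoolExecutor

def parallel_weights(ap_per_model):
    """Compute alpha_p = 0.5 * log((1 + AP_p) / (1 - AP_p)) for every
    constituent model independently and in parallel."""
    def weight(ap):
        return 0.5 * math.log((1 + ap) / (1 - ap))
    with ThreadPoolExecutor() as pool:
        return list(pool.map(weight, ap_per_model))
```

The independence that enables the parallelism is also why, as the discussion below notes, this variant ignores the coupling between variables and so cannot guarantee even a local optimum.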

Experimental Results on EnM
Data: MED corpus¹.
  1033 documents from the National Library of Medicine.
  30 queries.

Results.

Table: MAP of various methods for the MED corpus.

  Method     MAP      improvement (%)
  TFIDF      0.4605   -
  LSI        0.5026   9.1
  pLSI       0.5334   15.8
  LDI        0.5738   24.6
  EnM.B      0.6420   39.4
  EnM.CD     0.6461   40.3
  EnM.PCD    0.6414   39.3

Figure: Precision-Recall curves for the various methods (TFIDF, LSA, pLSI, LDI, EnM).

1: ftp://ftp.cs.cornell.edu/pub/smart.


Conclusions and Discussion
Conclusion
An ensemble model (EnM) is proposed, and three algorithms are introduced for solving the optimization problem.
The EnM outperformed all basis models across the overall recall range.

Discussion
The algorithms cannot be guaranteed to converge to the global optimum, due to the nonconvexity of the objective.
The parallel coordinate descent algorithm cannot guarantee an optimum, even a local optimum, due to the coupling between variables.

Future Work
Approximate the objective with convex functions.
Use stochastic gradient descent for stochastic sequences and large-scale data sets.

References
Yanshan Wang and In-Chan Choi (2013)
Indexing by ensemble model
Working Paper. arXiv preprint arXiv:1309.3421.
David M. Blei, Andrew Y. Ng and Michael I. Jordan (2003)
Latent Dirichlet allocation
Journal of Machine Learning Research, 3, 993-1022.
In-Chan Choi and Jae-Sung Lee (2010)
Document indexing by latent Dirichlet allocation
DMIN, 409-414.
Y. Freund and R. E. Schapire (1995)
A decision-theoretic generalization of on-line learning and an application to boosting
Computational Learning Theory, Springer, 23-37.
My Homepage: http://optlab.korea.ac.kr/~sam/

The End

[Introductory slides (pages 2-7): the Korean text did not survive extraction. Remaining fragments: mobile devices and context-aware computing; communication and web-history data; current app-based services ("as is"); a sample daily timeline (10:00-18:30); two motivating scenarios, A (5511) and B (eTL).]

[Data-collection design (pages 8-9): Korean labels lost in extraction. Recoverable items: logged signals include GPS, screen on/off, SNS, and TV usage; the collection/processing pipeline is numbered 1-8, with sub-steps 6-1 and 6-2.]

[Collection campaigns (page 10): two rounds, round 1 with 10 participants (Sep.-Oct. 2012) and round 2 with 50 participants (Nov.-Dec. 2012); Android devices (OS 2.3 or later); the remaining figures (10-30, 5 Mbytes) lost their labels in extraction.]
[Label statistics (page 11): a breakdown of logged event counts per activity category (surviving counts include 9562, 3350, 2908, 2388, 906, 766, 744, 717, 692, ...); the Korean category names were lost in extraction.]

Figure (page 12): GPS trajectory scatter plot, latitude 36.7-37.6 vs. longitude 126.4-127.4.
[Preprocessing (pages 13-14): Korean text lost in extraction. Remaining fragments: a pipeline with steps 1-3 using an API; inputs include GPS trajectories, date/time, and miscellaneous context from sensors; nearby location points are merged (the figures 50 and 518 survive without labels).]
[Clustering (page 15): Korean commentary lost in extraction. Candidate approaches listed include DBSCAN, CHAMELEON, density estimation (multivariate Gaussian, Gaussian mixture, kernel density estimation, ...), and trajectory clustering (CB-SMoT).]
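CB-SMoT detects stops in a trajectory from low-speed segments. A toy stop detector in that spirit (a sketch under simplified assumptions, not the published algorithm; points are given as (t_seconds, x, y) in a local metric frame):

```python
import math

def stops_by_speed(points, speed_limit, min_duration):
    """Consecutive GPS fixes whose speed stays under `speed_limit`
    for at least `min_duration` seconds form one stop; returns a
    list of (start_time, end_time) intervals."""
    stops = []
    start = end = None
    for (t0, x0, y0), (t1, x1, y1) in zip(points, points[1:]):
        dt = t1 - t0
        speed = math.hypot(x1 - x0, y1 - y0) / dt if dt > 0 else float("inf")
        if speed < speed_limit:
            if start is None:
                start = t0
            end = t1
        else:
            if start is not None and end - start >= min_duration:
                stops.append((start, end))
            start = end = None
    # Flush a stop that runs to the end of the trajectory.
    if start is not None and end - start >= min_duration:
        stops.append((start, end))
    return stops
```

The detected stop intervals would then be the "places" that downstream clustering groups into significant locations.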

[HMM & AdaBoost (page 16): Korean details lost in extraction. Remaining fragments: a hidden Markov model; labels obtained via an API (a 15-20% figure is noted); AdaBoost classification with reported precision of 80%, 74.6%, and 64.7% for three classes whose names were lost.]
[Multiple instance learning (pages 17-18): in multiple instance learning*, a supervised learner is given labels on bags of instances rather than on every instance; mi-SVM* adapts a standard SVM to this setting. Here, chunks of sensor data form the bags (a 10-20 figure survives without its unit), and RBF and linear kernels were compared (a 63:37 ratio is reported). Korean details lost in extraction.]
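The bag-level constraint that mi-SVM enforces can be sketched in a few lines of plain Python (illustrative helper names, not from the talk):

```python
def bag_label(instance_scores):
    """MIL rule: a bag is positive iff at least one instance scores positive."""
    return max(instance_scores) > 0

def relabel_positive_bag(instance_scores):
    """One mi-SVM-style relabeling pass for a positive bag: instances
    keep the classifier's sign, but if every instance came out
    negative, the top-scoring one is flipped to positive so the
    bag-level constraint still holds."""
    labels = [1 if s > 0 else -1 for s in instance_scores]
    if all(l == -1 for l in labels):
        labels[max(range(len(instance_scores)),
                   key=instance_scores.__getitem__)] = 1
    return labels
```

mi-SVM alternates such relabeling passes with retraining the SVM on the imputed instance labels until the labels stabilize.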
[mi-SVM cross-validation results (pages 18-19): the slide tabulates per-class sensitivity (recall), precision, and F-measure; surviving values range from 0.0% to 100.0% recall, 34.0% to 91.7% precision, and 26.8% to 90.8% F-measure, but the Korean class labels were lost in extraction. A Viterbi pass over the mi-SVM outputs is also reported (the figures 52.2% and 43.9% / 60.8% survive without labels).]
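The Viterbi smoothing mentioned above is standard max-product decoding over the chunk sequence; a self-contained sketch (an illustrative reimplementation, not the authors' code):

```python
def viterbi(obs_loglik, trans_loglik, init_loglik):
    """Most likely state path. obs_loglik[t][s]: log-likelihood of
    observation t under state s; trans_loglik[p][s]: log-probability
    of moving from state p to state s."""
    n_states = len(init_loglik)
    score = [init_loglik[s] + obs_loglik[0][s] for s in range(n_states)]
    back = []
    for t in range(1, len(obs_loglik)):
        new_score, ptr = [], []
        for s in range(n_states):
            best = max(range(n_states),
                       key=lambda p: score[p] + trans_loglik[p][s])
            ptr.append(best)
            new_score.append(score[best] + trans_loglik[best][s]
                             + obs_loglik[t][s])
        score = new_score
        back.append(ptr)
    # Backtrack from the best final state.
    path = [max(range(n_states), key=score.__getitem__)]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]
```

With sticky self-transitions, an isolated chunk that the per-chunk classifier mislabels gets smoothed back to its neighbors' state, which is the effect the slides report.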

[Discussion and future work (pages 20-21): Korean text lost in extraction. Surviving keywords: activity vs. behavior; a 1.7% coverage figure; transfer learning; reusable models; the cold-start problem; noise/error handling.]

THANK YOU!

2: Feature Selection & Efficient Computing

Fused Lasso

Contents
  Motivation
  Introduction
  Algorithm
  Results
  Conclusion
Motivation
[Korean slide text lost in extraction.]
Reference: http://ghanahealthnest.com/2013/04/23/ghana-revenue-authority-rolls-out-a-new-malaria-control-strategy/
Introduction
[Korean commentary lost in extraction.] The data set covers six mosquito classes distinguished by sex, age, and mating status (cf. Suarez et al., 2011): VMales1, MMales7, VMales14, VFmales1, MFmales7, VFmales14. Each sample carries a Y/N label for its class.
Suarez, Estrella, et al. "Matrix-assisted laser desorption/ionization-mass spectrometry of cuticular lipid profiles can differentiate sex, age, and mating status of Anopheles gambiae mosquitoes." Analytica Chimica Acta 706.1 (2011): 157-163.

Li, Lihua, et al. "Data mining techniques for cancer detection using serum proteomic profiling." Artificial Intelligence in Medicine 32.2 (2004): 71-83.
1. Sparsity
2. Smoothness
3. Better performance

Algorithm

Fused Lasso (1)
Tibshirani, Robert, et al. "Sparsity and smoothness via the fused lasso." Journal of the Royal Statistical Society: Series B (Statistical Methodology) 67.1 (2005): 91-108.
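The fused lasso augments the lasso's L1 sparsity term with a fusion term over adjacent coefficients, λ1 Σ_j |β_j| + λ2 Σ_j |β_j − β_{j−1}|. A toy illustration (hypothetical numbers, not from the data) of why this favors piecewise-constant coefficient profiles over ordered features such as m/z:

```python
def lasso_penalty(beta):
    """L1 sparsity term: sum_j |beta_j|."""
    return sum(abs(b) for b in beta)

def fusion_penalty(beta):
    """Total-variation term sum_j |beta_j - beta_{j-1}| that fused
    lasso adds on top of the L1 term."""
    return sum(abs(b1 - b0) for b0, b1 in zip(beta, beta[1:]))

# Two coefficient profiles over 6 adjacent m/z bins with equal L1 norm:
flat  = [0.0, 1.8, 1.8, 1.8, 1.8, 0.0]   # fused-lasso-like plateau
spiky = [0.0, 3.6, 0.0, 3.6, 0.0, 0.0]   # lasso-like isolated spikes
print(fusion_penalty(flat), fusion_penalty(spiky))
```

Both profiles pay the same L1 penalty, but the plateau pays a much smaller fusion penalty, so the fused lasso prefers it; this is exactly the plateau pattern visible in the coefficient table that follows.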

(1) Fused lasso vs. lasso coefficients over adjacent m/z values:

  m/z     Fused Lasso  Lasso   |  m/z     Fused Lasso  Lasso   |  m/z     Fused Lasso  Lasso
  654.9   0.002        0.000   |  656.3   1.647        0.000   |  657.7   1.403        0.000
  655     0.002        0.000   |  656.4   1.647        0.000   |  657.8   1.403        0.000
  655.1   1.123        0.662   |  656.5   1.647        0.000   |  657.9   1.403        0.000
  655.2   4.159        3.169   |  656.6   1.647        0.000   |  658     1.403        0.010
  655.3   1.791        0.002   |  656.7   1.647        0.000   |  658.1   1.403        -0.015
  655.4   1.791        0.000   |  656.8   1.647        0.000   |  658.2   1.403        0.225
  655.5   1.791        0.000   |  656.9   1.647        0.000   |  658.3   0.252        0.000
  655.6   1.791        1.300   |  657     1.650        0.000   |  658.4   0.252        0.000
  655.7   1.791        0.000   |  657.1   1.650        0.344   |  658.5   0.252        0.000
  655.8   1.791        0.000   |  657.2   1.639        2.609   |  658.6   0.252        0.000
  655.9   1.791        0.143   |  657.3   1.403        0.010   |  658.7   0.252        0.000
  656     1.791        0.001   |  657.4   1.403        0.000   |  658.8   0.252        0.000
  656.1   1.791        1.495   |  657.5   1.403        0.000   |  658.9   0.252        0.000
  656.2   1.647        0.149   |  657.6   1.403        1.613   |  659     0.252        0.000
Fused Lasso (1)
Tibshirani, Robert, et al. "Sparsity and smoothness via the fused lasso." Journal of the Royal Statistical Society: Series B (Statistical Methodology) 67.1 (2005): 91-108.

Fused Lasso (2)
Liu, Jun, Lei Yuan, and Jieping Ye. "An efficient algorithm for a class of fused lasso problems." Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2010.

Results
Performance comparison: average misclassification rate and average number of selected features.

Figure A: absolute values of the fused-lasso coefficients vs. m/z (0-12000).
Figure B: intensity of the fused-lasso-selected features vs. m/z, and a scatter of the first two principal components highlighting MFemale7 vs. the other classes.
Figure: fused lasso vs. lasso coefficient profiles over m/z (0-12000).
Conclusion
1. More accurate
2. More appropriate
3. More sophisticated

Thank You
Where New Challenges Begin!
[The remaining slides of this session (pp. 175-194) are unrecoverable: their embedded fonts were lost in extraction, leaving only mojibake.]
Data Step, Statistical Summary, Tables/Cubes, Covariance, Linear & Logistic Regression, GLM, K-means clustering, …

Más contenido relacionado

La actualidad más candente

TERRIAN IDENTIFICATION USING CO-CLUSTERED MODEL OF THE SWARM INTELLEGENCE & S...
TERRIAN IDENTIFICATION USING CO-CLUSTERED MODEL OF THE SWARM INTELLEGENCE & S...TERRIAN IDENTIFICATION USING CO-CLUSTERED MODEL OF THE SWARM INTELLEGENCE & S...
TERRIAN IDENTIFICATION USING CO-CLUSTERED MODEL OF THE SWARM INTELLEGENCE & S...cscpconf
 
Data Hiding Method With High Embedding Capacity Character
Data Hiding Method With High Embedding Capacity CharacterData Hiding Method With High Embedding Capacity Character
Data Hiding Method With High Embedding Capacity CharacterCSCJournals
 
EFFICIENT INDEX FOR A VERY LARGE DATASETS WITH HIGHER DIMENSION
EFFICIENT INDEX FOR A VERY LARGE DATASETS WITH HIGHER DIMENSIONEFFICIENT INDEX FOR A VERY LARGE DATASETS WITH HIGHER DIMENSION
EFFICIENT INDEX FOR A VERY LARGE DATASETS WITH HIGHER DIMENSIONAM Publications,India
 
Principle Component Analysis Based on Optimal Centroid Selection Model for Su...
Principle Component Analysis Based on Optimal Centroid Selection Model for Su...Principle Component Analysis Based on Optimal Centroid Selection Model for Su...
Principle Component Analysis Based on Optimal Centroid Selection Model for Su...ijtsrd
 
Accelerated Particle Swarm Optimization and Support Vector Machine for Busine...
Accelerated Particle Swarm Optimization and Support Vector Machine for Busine...Accelerated Particle Swarm Optimization and Support Vector Machine for Busine...
Accelerated Particle Swarm Optimization and Support Vector Machine for Busine...Xin-She Yang
 
A PSO-Based Subtractive Data Clustering Algorithm
A PSO-Based Subtractive Data Clustering AlgorithmA PSO-Based Subtractive Data Clustering Algorithm
A PSO-Based Subtractive Data Clustering AlgorithmIJORCS
 
B colouring
B colouringB colouring
B colouringxs76250
 
A fuzzy clustering algorithm for high dimensional streaming data
A fuzzy clustering algorithm for high dimensional streaming dataA fuzzy clustering algorithm for high dimensional streaming data
A fuzzy clustering algorithm for high dimensional streaming dataAlexander Decker
 
Integral Solutions of the Ternary Cubic Equation 3(x2+y2)-4xy+2(x+y+1)=972z3
Integral Solutions of the Ternary Cubic Equation 3(x2+y2)-4xy+2(x+y+1)=972z3Integral Solutions of the Ternary Cubic Equation 3(x2+y2)-4xy+2(x+y+1)=972z3
Integral Solutions of the Ternary Cubic Equation 3(x2+y2)-4xy+2(x+y+1)=972z3IRJET Journal
 
Carpita metulini 111220_dssr_bari_version2
Carpita metulini 111220_dssr_bari_version2Carpita metulini 111220_dssr_bari_version2
Carpita metulini 111220_dssr_bari_version2University of Salerno
 
17 manjula aakunuri final_paper--185-190
17 manjula aakunuri final_paper--185-19017 manjula aakunuri final_paper--185-190
17 manjula aakunuri final_paper--185-190Alexander Decker
 
Juha vesanto esa alhoniemi 2000:clustering of the som
Juha vesanto esa alhoniemi 2000:clustering of the somJuha vesanto esa alhoniemi 2000:clustering of the som
Juha vesanto esa alhoniemi 2000:clustering of the somArchiLab 7
 
IRJET- Customer Segmentation from Massive Customer Transaction Data
IRJET- Customer Segmentation from Massive Customer Transaction DataIRJET- Customer Segmentation from Massive Customer Transaction Data
IRJET- Customer Segmentation from Massive Customer Transaction DataIRJET Journal
 
Multi modal medical image fusion using weighted
Multi modal medical image fusion using weightedMulti modal medical image fusion using weighted
Multi modal medical image fusion using weightedeSAT Publishing House
 
Privacy preserving clustering on centralized data through scaling transf
Privacy preserving clustering on centralized data through scaling transfPrivacy preserving clustering on centralized data through scaling transf
Privacy preserving clustering on centralized data through scaling transfIAEME Publication
 
SCAF – AN EFFECTIVE APPROACH TO CLASSIFY SUBSPACE CLUSTERING ALGORITHMS
SCAF – AN EFFECTIVE APPROACH TO CLASSIFY SUBSPACE CLUSTERING ALGORITHMSSCAF – AN EFFECTIVE APPROACH TO CLASSIFY SUBSPACE CLUSTERING ALGORITHMS
SCAF – AN EFFECTIVE APPROACH TO CLASSIFY SUBSPACE CLUSTERING ALGORITHMSijdkp
 
Feature Subset Selection for High Dimensional Data Using Clustering Techniques
Feature Subset Selection for High Dimensional Data Using Clustering TechniquesFeature Subset Selection for High Dimensional Data Using Clustering Techniques
Feature Subset Selection for High Dimensional Data Using Clustering TechniquesIRJET Journal
 

La actualidad más candente (20)

TERRIAN IDENTIFICATION USING CO-CLUSTERED MODEL OF THE SWARM INTELLEGENCE & S...
TERRIAN IDENTIFICATION USING CO-CLUSTERED MODEL OF THE SWARM INTELLEGENCE & S...TERRIAN IDENTIFICATION USING CO-CLUSTERED MODEL OF THE SWARM INTELLEGENCE & S...
TERRIAN IDENTIFICATION USING CO-CLUSTERED MODEL OF THE SWARM INTELLEGENCE & S...
 
Data Hiding Method With High Embedding Capacity Character
Data Hiding Method With High Embedding Capacity CharacterData Hiding Method With High Embedding Capacity Character
Data Hiding Method With High Embedding Capacity Character
 
EFFICIENT INDEX FOR A VERY LARGE DATASETS WITH HIGHER DIMENSION
EFFICIENT INDEX FOR A VERY LARGE DATASETS WITH HIGHER DIMENSIONEFFICIENT INDEX FOR A VERY LARGE DATASETS WITH HIGHER DIMENSION
EFFICIENT INDEX FOR A VERY LARGE DATASETS WITH HIGHER DIMENSION
 
Principle Component Analysis Based on Optimal Centroid Selection Model for Su...
Principle Component Analysis Based on Optimal Centroid Selection Model for Su...Principle Component Analysis Based on Optimal Centroid Selection Model for Su...
Principle Component Analysis Based on Optimal Centroid Selection Model for Su...
 
A0360109
A0360109A0360109
A0360109
 
Accelerated Particle Swarm Optimization and Support Vector Machine for Busine...
Accelerated Particle Swarm Optimization and Support Vector Machine for Busine...Accelerated Particle Swarm Optimization and Support Vector Machine for Busine...
Accelerated Particle Swarm Optimization and Support Vector Machine for Busine...
 
A PSO-Based Subtractive Data Clustering Algorithm
A PSO-Based Subtractive Data Clustering AlgorithmA PSO-Based Subtractive Data Clustering Algorithm
A PSO-Based Subtractive Data Clustering Algorithm
 
B colouring
B colouringB colouring
B colouring
 
A fuzzy clustering algorithm for high dimensional streaming data
A fuzzy clustering algorithm for high dimensional streaming dataA fuzzy clustering algorithm for high dimensional streaming data
A fuzzy clustering algorithm for high dimensional streaming data
 
Integral Solutions of the Ternary Cubic Equation 3(x2+y2)-4xy+2(x+y+1)=972z3
Integral Solutions of the Ternary Cubic Equation 3(x2+y2)-4xy+2(x+y+1)=972z3Integral Solutions of the Ternary Cubic Equation 3(x2+y2)-4xy+2(x+y+1)=972z3
Integral Solutions of the Ternary Cubic Equation 3(x2+y2)-4xy+2(x+y+1)=972z3
 
Carpita metulini 111220_dssr_bari_version2
Carpita metulini 111220_dssr_bari_version2Carpita metulini 111220_dssr_bari_version2
Carpita metulini 111220_dssr_bari_version2
 
17 manjula aakunuri final_paper--185-190
17 manjula aakunuri final_paper--185-19017 manjula aakunuri final_paper--185-190
17 manjula aakunuri final_paper--185-190
 
Juha vesanto esa alhoniemi 2000:clustering of the som
Juha vesanto esa alhoniemi 2000:clustering of the somJuha vesanto esa alhoniemi 2000:clustering of the som
Juha vesanto esa alhoniemi 2000:clustering of the som
 
Presentation1.1
Presentation1.1Presentation1.1
Presentation1.1
 
40120130406009
4012013040600940120130406009
40120130406009
 
IRJET- Customer Segmentation from Massive Customer Transaction Data
IRJET- Customer Segmentation from Massive Customer Transaction DataIRJET- Customer Segmentation from Massive Customer Transaction Data
IRJET- Customer Segmentation from Massive Customer Transaction Data
 
Multi modal medical image fusion using weighted
Multi modal medical image fusion using weightedMulti modal medical image fusion using weighted
Multi modal medical image fusion using weighted
 
Privacy preserving clustering on centralized data through scaling transf
Privacy preserving clustering on centralized data through scaling transfPrivacy preserving clustering on centralized data through scaling transf
Privacy preserving clustering on centralized data through scaling transf
 
SCAF – AN EFFECTIVE APPROACH TO CLASSIFY SUBSPACE CLUSTERING ALGORITHMS
SCAF – AN EFFECTIVE APPROACH TO CLASSIFY SUBSPACE CLUSTERING ALGORITHMSSCAF – AN EFFECTIVE APPROACH TO CLASSIFY SUBSPACE CLUSTERING ALGORITHMS
SCAF – AN EFFECTIVE APPROACH TO CLASSIFY SUBSPACE CLUSTERING ALGORITHMS
 
Feature Subset Selection for High Dimensional Data Using Clustering Techniques
Feature Subset Selection for High Dimensional Data Using Clustering TechniquesFeature Subset Selection for High Dimensional Data Using Clustering Techniques
Feature Subset Selection for High Dimensional Data Using Clustering Techniques
 

Similar a 2013추계학술대회 인쇄용2

2013추계학술대회 인쇄용
2013추계학술대회 인쇄용2013추계학술대회 인쇄용
2013추계학술대회 인쇄용Byung Kook Ha
 
Network analysis lecture
Network analysis lectureNetwork analysis lecture
Network analysis lectureSara-Jayne Terp
 
Ability Study of Proximity Measure for Big Data Mining Context on Clustering
Ability Study of Proximity Measure for Big Data Mining Context on ClusteringAbility Study of Proximity Measure for Big Data Mining Context on Clustering
Ability Study of Proximity Measure for Big Data Mining Context on ClusteringKamleshKumar394
 
4 image segmentation through clustering
4 image segmentation through clustering4 image segmentation through clustering
4 image segmentation through clusteringIAEME Publication
 
4 image segmentation through clustering
4 image segmentation through clustering4 image segmentation through clustering
4 image segmentation through clusteringprjpublications
 
Exploring temporal graph data with Python: 
a study on tensor decomposition o...
Exploring temporal graph data with Python: 
a study on tensor decomposition o...Exploring temporal graph data with Python: 
a study on tensor decomposition o...
Exploring temporal graph data with Python: 
a study on tensor decomposition o...André Panisson
 
FINGERPRINT CLASSIFICATION BASED ON ORIENTATION FIELD
FINGERPRINT CLASSIFICATION BASED ON ORIENTATION FIELDFINGERPRINT CLASSIFICATION BASED ON ORIENTATION FIELD
FINGERPRINT CLASSIFICATION BASED ON ORIENTATION FIELDijesajournal
 
Fractal Image Compression By Range Block Classification
Fractal Image Compression By Range Block ClassificationFractal Image Compression By Range Block Classification
Fractal Image Compression By Range Block ClassificationIRJET Journal
 
A comprehensive survey of contemporary
A comprehensive survey of contemporaryA comprehensive survey of contemporary
A comprehensive survey of contemporaryprjpublications
 
Volume 2-issue-6-1930-1932
Volume 2-issue-6-1930-1932Volume 2-issue-6-1930-1932
Volume 2-issue-6-1930-1932Editor IJARCET
 
Volume 2-issue-6-1930-1932
Volume 2-issue-6-1930-1932Volume 2-issue-6-1930-1932
Volume 2-issue-6-1930-1932Editor IJARCET
 
Subgraph relative frequency approach for extracting interesting substructur