# Programming Collective Intelligence - Chapter 6: Document Filtering

Seminar material

### Programming Collective Intelligence - Chapter 6: Document Filtering

1. Document Filtering. Programming Collective Intelligence, Ch. 6. 허윤
2. Document Filtering
    - Filtering == classification problem. Data mining problems: Estimation, Classification, Prediction, Clustering, Description, Affinity Grouping.
    - Document? A set of features -> text document, image, etc. p( document ) = ?
3. Spam Filtering
    - Binary classification problem: ‘Spam’ or ‘Ham’
    - Techniques: Naïve Bayesian Classifier, Support Vector Machine, Decision Tree
    - Rule vs. Model: pros and cons
4. Spam Filtering in Practice. Source: Sahil Puri et al., “Comparison and Analysis of Spam Detection Algorithms”, 2013, IJAIEM
5. Source: Rene, “New insights into Gmail’s spam filtering”, 2012, emailmarketingtipps.de
6. Naïve Bayesian Classifier
    - Bayes theorem
    - Naïve? Bayes theorem with a strong independence assumption
    - Classifier: ignore the evidence term and compare the posteriors (posterior1 > posterior2 vs. posterior1 < posterior2)
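7. Example
    1. Probability that box A is selected: P( A ) = 7 / 10
    2. Probability of drawing a white ball from box A: P( white | A ) = 2 / 10
    3. Probability of picking A from the bag and then drawing a white ball from box A: P( A, white ) = P( A ) * P( white | A )
    4. Probability of a white ball: P( white )
8. Example: a white ball turned up from somewhere… What is the probability that it came from A, P( A | white )? That it came from B, P( B | white )? P( A | white ) = ?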
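9. Bayes Rule
    1. Conditional probability of A given B: P( A | B ) = P( A, B ) / P( B )
    2. Conditional probability of B given A: P( B | A ) = P( A, B ) / P( A )
    3. Bayes rule: P( A | B ) = P( B | A ) * P( A ) / P( B )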
10. Implementation: Preparation. Document representation: extracting words from a document.
11. Implementation: Preparation. Representation of the classifier: `{'python': {'bad': 0, 'good': 6}, 'the': {'bad': 3, 'good': 3}}` `# getwords`
12. Implementation: Preparation. How to access the dict.
13. Implementation: Preparation. Training.
14. Implementation: Preparation. Result.
15. Recall: Bayes theorem. p( category | doc ) = p( doc | category ) * p( category ) / p( doc )
16. Implementation: Classifier. P( feature | category ) as prior.
17. Implementation: Classifier. Assumed probability to resolve data sparseness.
18. Implementation: Classifier. Results.
19. Implementation: Classifier. P( document | category ) as likelihood.
20. Implementation: Classifier. P( document | category ) * p( category ).
21. Implementation: Classifier. Classifying.
22. Implementation: Classifier. Result.
23. Fisher’s Method
    - Recall the Naïve Bayesian Classifier: p( document | category ) = p( feature_1 | category ) * p( feature_2 | category ) * … * p( feature_N | category ). p( category | document ) ??
    - Fisher’s Method: first, p( category | feature ) = (# of documents having feature in category) / (# of documents having feature)
24. Q&A. Thank You
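
The box example on slides 7 to 9 can be checked numerically. The sketch below is only an illustration, not code from the deck: the function name `posterior_a` is hypothetical, it assumes the bag holds just the two boxes A and B (so P( B ) = 3 / 10), and the value used for P( white | B ) is a made-up placeholder because the slides do not state it.

```python
from fractions import Fraction


def posterior_a(prior_a, white_given_a, white_given_b):
    """Return P(A | white) for two boxes A and B using Bayes' rule."""
    prior_b = 1 - prior_a  # assumes A and B are the only boxes
    # Evidence term P(white), summed over both boxes.
    evidence = prior_a * white_given_a + prior_b * white_given_b
    return prior_a * white_given_a / evidence


p_a = Fraction(7, 10)              # P(A), from slide 7
p_white_given_a = Fraction(2, 10)  # P(white | A), from slide 7
p_white_given_b = Fraction(5, 10)  # ASSUMPTION: not given on the slides

p_a_given_white = posterior_a(p_a, p_white_given_a, p_white_given_b)
print(p_a_given_white)  # 14/29 under the assumed P(white | B)
```

Exact fractions are used so that the result can be compared by hand with the formula on slide 9.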
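
Slides 10 to 22 walk through the classifier, but the code they showed is not part of this transcript. A minimal Python 3 sketch of the same steps follows: word extraction, per-category feature counts, P( feature | category ), the assumed probability for sparse features, P( document | category ) * p( category ), and classification. The class and method names, the 3 to 19 character word filter, the weight of 1.0, the assumed probability of 0.5, and the training sentences are all illustrative assumptions, not taken from the slides.

```python
import re


def getwords(doc):
    """Return the distinct lowercase words in doc (3 to 19 characters long)."""
    words = (w.lower() for w in re.split(r"\W+", doc))
    return {w for w in words if 2 < len(w) < 20}


class NaiveBayes:
    def __init__(self, getfeatures=getwords):
        self.getfeatures = getfeatures
        self.fc = {}  # feature -> {category: count}, in the style of slide 11
        self.cc = {}  # category -> number of training documents

    def train(self, item, cat):
        for f in self.getfeatures(item):
            counts = self.fc.setdefault(f, {})
            counts[cat] = counts.get(cat, 0) + 1
        self.cc[cat] = self.cc.get(cat, 0) + 1

    def fprob(self, f, cat):
        """P(feature | category): share of the category's documents containing f."""
        n = self.cc.get(cat, 0)
        return self.fc.get(f, {}).get(cat, 0) / n if n else 0.0

    def weightedprob(self, f, cat, weight=1.0, ap=0.5):
        """Blend fprob with an assumed probability ap to soften data sparseness."""
        totals = sum(self.fc.get(f, {}).values())
        return (weight * ap + totals * self.fprob(f, cat)) / (weight + totals)

    def docprob(self, item, cat):
        """P(document | category) under the independence assumption."""
        p = 1.0
        for f in self.getfeatures(item):
            p *= self.weightedprob(f, cat)
        return p

    def prob(self, item, cat):
        """P(document | category) * P(category); the evidence term is ignored."""
        total = sum(self.cc.values())
        catprob = self.cc.get(cat, 0) / total if total else 0.0
        return self.docprob(item, cat) * catprob

    def classify(self, item, default=None):
        if not self.cc:
            return default
        return max(self.cc, key=lambda cat: self.prob(item, cat))


cl = NaiveBayes()
for text, cat in [
    ("Nobody owns the water.", "good"),
    ("the quick rabbit jumps fences", "good"),
    ("buy pharmaceuticals now", "bad"),
    ("make quick money at the online casino", "bad"),
    ("the quick brown fox jumps", "good"),
]:
    cl.train(text, cat)

print(cl.classify("quick rabbit"))  # good
print(cl.classify("quick money"))   # bad
```

This sketch picks the category with the largest score. A practical spam filter would more likely require the "bad" score to beat the "good" score by some margin before filtering a message; that threshold is left out here.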
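
Slide 23 only gives the first step of Fisher’s method, the ratio p( category | feature ). The sketch below computes that ratio from the count dict shown on slide 11 and then, as an assumption that goes beyond the slide, combines the per-feature values in the usual Fisher way: -2 times the sum of the logs, scored against a chi-squared distribution with 2N degrees of freedom. The function names and the `eps` floor that guards against log(0) are hypothetical.

```python
import math


def cprob(fc, feature, cat):
    """Slide 23's ratio: docs in cat containing the feature / all docs containing it."""
    counts = fc.get(feature, {})
    total = sum(counts.values())
    return counts.get(cat, 0) / total if total else 0.0


def invchi2(chi, df):
    """Upper-tail probability of a chi-squared variable; df must be even."""
    m = chi / 2.0
    term = total = math.exp(-m)
    for i in range(1, df // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def fisherprob(probs, eps=1e-9):
    """Combine per-feature probabilities; eps is an assumed floor to avoid log(0)."""
    score = -2.0 * sum(math.log(max(p, eps)) for p in probs)
    return invchi2(score, 2 * len(probs))


# Feature counts copied from slide 11.
fc = {"python": {"bad": 0, "good": 6}, "the": {"bad": 3, "good": 3}}

good = fisherprob([cprob(fc, f, "good") for f in ("python", "the")])
bad = fisherprob([cprob(fc, f, "bad") for f in ("python", "the")])
print(good > bad)  # True: both features lean toward 'good'
```

A fuller version would smooth `cprob` with an assumed probability, as slide 17 does for the naive Bayes case, instead of relying on a floor.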