Anomaly Detection
Advanced Intelligence Technology Research Society
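Samples that deviate from the mainstream of the data
In data mining, anomaly detection means identifying items, events, or observations that do not conform to an expected pattern or normal range.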
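Min:Max ≠ Outlier (the minimum and maximum are not necessarily outliers)
1.5 x IQR rule: values below Q1 - 1.5 x IQR or above Q3 + 1.5 x IQR are treated as outliers
IQR (Interquartile Range) = Q3 - Q1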
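The 1.5 x IQR rule can be sketched in a few lines of Python. This is only an illustration: the sample values and the function name `iqr_outliers` are made up, and the quartiles use the "inclusive" method of the standard `statistics` module, which is one of several common quartile conventions.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) under the k * IQR rule."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers

# Hypothetical sample: the minimum (10) is inside the fences, the maximum (102) is not.
data = [10, 12, 12, 13, 12, 11, 14, 13, 15, 102]
lower, upper, outliers = iqr_outliers(data)
print(lower, upper, outliers)
```

In this sample the minimum is not flagged, which is the point of "Min:Max ≠ Outlier": an extreme value is only an outlier if it falls outside the fences.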
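An anomalous value is typically interpreted as a symptom of a problem
A rare phenomenon that does not follow the usual statistical definition
Clustering algorithms can detect the micro-clusters formed by anomalous patterns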
Anomaly detection was proposed for intrusion
detection systems (IDS) by Dorothy Denning in 1986.
Early approaches used normal thresholds, statistical preprocessing, soft computing, and inductive learning
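Applications: cyber intrusion detection, credit card fraud, fault detection, system health monitoring, IoT, etc.
Detecting ecosystem disturbances
Often used to remove anomalous values from data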
Three categories
1. Unsupervised anomaly detection
- Detects anomalies in unlabeled data
- e.g., anomaly detection with the K-means clustering algorithm
2. Supervised anomaly detection
- Both Normal and Abnormal labels exist
- Uses classification models (SVM, Random forests, Logistic regression, Robust, KNN, etc.)
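As a rough illustration of the unsupervised case, the sketch below clusters unlabeled points with a plain K-means loop and uses the distance to the nearest cluster center as an anomaly score. The data, the starting centers, and the function names are assumptions made for the example; the slides do not specify how K-means is applied.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's algorithm; `centroids` are caller-supplied starting centers."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Recompute each center as the mean of its members (keep it if the cluster is empty).
        centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else center
            for members, center in zip(clusters, centroids)
        ]
    return centroids

def anomaly_scores(points, centroids):
    """Score = distance to the nearest cluster center; larger means more anomalous."""
    return [min(math.dist(p, c) for c in centroids) for p in points]

# Hypothetical data: two tight groups plus one point far from both.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 25)]
centers = kmeans(points, centroids=[points[0], points[4]])
scores = anomaly_scores(points, centers)
most_anomalous = points[scores.index(max(scores))]
print(most_anomalous)
```

Note that the outlier still pulls its cluster center toward itself, so in practice a threshold on the score has to be chosen with that distortion in mind.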
Three categories (cont.)
3. Semi-supervised anomaly detection
- Only Normal labels exist; anomalies are extracted by comparing the likelihood produced by a model of normal behavior
- NKIA’s LRSTSD based Anomaly Detection
- Twitter’s Seasonal Hybrid ESD (S-H-ESD) based Anomaly Detection
Figures: NKIA’s Anomaly Detection; Twitter’s Anomaly Detection
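A minimal sketch of the semi-supervised idea, assuming a single Gaussian as the model of normal behavior: fit the model on normal-only data, then flag new values to which the model assigns a very low likelihood. The training values and the cutoff `min_likelihood` are invented for the example and are not taken from either NKIA's or Twitter's method.

```python
from statistics import NormalDist

# Training data contains only observations labelled "normal" (made-up values).
normal_train = [50.1, 49.8, 50.3, 50.0, 49.9, 50.2, 49.7, 50.4]
model = NormalDist.from_samples(normal_train)  # the model of normal behavior

def is_anomaly(x, model, min_likelihood=1e-3):
    """Flag x when the normal model assigns it a density below the cutoff."""
    return model.pdf(x) < min_likelihood

new_points = [50.05, 49.6, 53.0]
flags = [is_anomaly(x, model) for x in new_points]
print(flags)
```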
Input data
Univariate, Multivariate
Input data (cont.)
- Binary
- Categorical
- Continuous
- Hybrid
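Types of anomalies
Point Anomalies
- A value that falls outside the bulk of the dataset
Types of anomalies (cont.)
Contextual Anomalies
- A value that is out of place in its context
- Requires a notion of context
- Also referred to as conditional anomalies (Rules)
Types of anomalies (cont.)
Collective Anomalies
- A collection of related values that is anomalous as a group, even when each value looks normal on its own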
Output of Anomaly Detection
- Label of normal or anomaly
- In the classification approach: true|false or a class
- Rank
- Score in the range 0 to 1
- Requires a threshold parameter
Evaluating anomaly detection
- Evaluated as a supervised learning (classification) problem
- Formula:
Recall(R) = TP / (TP + FN)
Precision(P) = TP / (TP + FP)
F-measure = 2*R*P/(R+P)
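The three formulas above can be checked on a small hand-made example. The labels below are invented, and the sketch treats "anomaly" as the positive class through the `positive` parameter; that choice is an assumption, since the positive class depends on how the confusion matrix is laid out.

```python
def precision_recall_f(actual, predicted, positive="anomaly"):
    """Compute precision, recall and F-measure from paired label lists."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    total = recall + precision
    f_measure = 2 * recall * precision / total if total else 0.0
    return precision, recall, f_measure

actual    = ["normal", "anomaly", "normal", "anomaly", "normal", "anomaly"]
predicted = ["normal", "anomaly", "anomaly", "normal", "normal", "anomaly"]
precision, recall, f_measure = precision_recall_f(actual, predicted)
print(precision, recall, f_measure)
```

Here TP = 2, FP = 1, and FN = 1, so precision, recall, and F-measure all come out to 2/3.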
The Area Under an ROC Curve
- AUC(Area Under the Curve)
- Detection rate (TP rate), false alarm rate (FP rate)
- Range: 0 to 1
- Equation:
Confusion matrix
                     Actual class
                     Normal    Anomaly
Predicted Normal     TP        FP
Predicted Anomaly    FN        TN
Evaluation table
Score         Label
.90 ~ 1       Excellent (A)
.80 ~ .90     Good (B)
.70 ~ .80     Fair (C)
.60 ~ .70     Poor (D)
.50 ~ .60     Fail (F)
Figure: ROC (Receiver Operating Characteristic) curves
m = # of TP, n = # of TN, p_i = TP rate (detection rate), p_j = FP rate (false alarm rate)
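The AUC equation itself did not survive in this transcript. One common way to compute the area, offered here as an assumption rather than as the slide's formula, is the rank (Wilcoxon-Mann-Whitney) formulation: the fraction of (anomaly, normal) pairs in which the anomaly receives the higher score. The scores and labels below are made up.

```python
def auc(scores, labels):
    """AUC as the fraction of (anomaly, normal) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]  # anomaly scores
    neg = [s for s, y in zip(scores, labels) if y == 0]  # normal scores
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]   # higher = more anomalous
labels = [1, 1, 1, 0, 0, 0]                # 1 = anomaly, 0 = normal
area = auc(scores, labels)
print(round(area, 3))
```

Eight of the nine pairs are ordered correctly, so the area is 8/9, which lands in the .80 ~ .90 band of the evaluation table.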
Well-known anomaly detection techniques
Twitter’s Anomaly Detection R pack.
Twitter open-sourced their R package for anomaly detection.
They call their algorithm Seasonal Hybrid ESD (S-H-ESD), which is built on Generalized ESD.
Sometimes anomalies can mess up your modeling.
Twitter’s Anomaly Detection R pack.(cont.)
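library(AnomalyDetection)
data(raw_data)  # example time series shipped with the package
res <- AnomalyDetectionTs(raw_data, max_anoms = 0.02, direction = "both", plot = TRUE)
res$plot        # show the series with the detected anomalies marked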
Twitter’s Anomaly Detection R pack.(cont.)
v <- read.csv("D:/r/tsd_paper/cpu_5m_02.csv")
res2 = AnomalyDetectionVec(v, max_anoms=0.02, period=72,
direction='both', plot=TRUE)
Twitter’s Anomaly Detection R pack.(cont.)
AnomalyDetectionTs(x, max_anoms = 0.1, direction = "pos", alpha = 0.05, only_last = NULL, threshold = "None", e_value = FALSE, longterm = FALSE, piecewise_median_period_weeks = 2, plot = FALSE, y_log = FALSE, xlabel = "", ylabel = "count", title = NULL, verbose = FALSE)
x : Time series as a two-column data frame where the first column consists of the timestamps and the second column consists of the observations.
max_anoms : Maximum number of anomalies that S-H-ESD will detect as a percentage of the data.
direction : Directionality of the anomalies to be detected. Options are: 'pos' | 'neg' | 'both'.
alpha : The level of statistical significance with which to accept or reject anomalies.
only_last : Find and report anomalies only within the last day or hr in the time series. NULL | 'day' | 'hr'.
threshold : Only report positive going anoms above the threshold specified. Options are: 'None' | 'med_max' | 'p95' | 'p99'.
e_value : Add an additional column to the anoms output containing the expected value.
longterm : Increase anom detection efficacy for time series that are longer than a month.
piecewise_median_period_weeks : The piecewise median time window as described in Vallis, Hochenbaum, and Kejariwal (2014). Defaults to 2.
Twitter’s Anomaly Detection R pack.(cont.)
plot : A flag indicating if a plot with both the time series and the estimated anoms, indicated by circles, should also be returned.
y_log : Apply log scaling to the y-axis. This helps with viewing plots that have extremely large positive anomalies relative to the
rest of the data.
xlabel : X-axis label to be added to the output plot.
ylabel : Y-axis label to be added to the output plot.
title : Title for the output plot.
verbose : Enable debug messages
Twitter’s Anomaly Detection R pack.(cont.)
To understand how Twitter’s algorithm works, you need to know:
- Student t-distribution
- Extreme Studentized Deviate (ESD) test
- Generalized ESD
- Linear regression
- STL(Seasonal Trend LOESS)
Twitter’s Anomaly Detection R pack.(cont.)
Student t-distribution
A distribution mainly used when estimating the mean of a normally distributed population
Twitter’s Anomaly Detection R pack.(cont.)
Extreme Studentized Deviate (ESD) test
Twitter’s Anomaly Detection R pack.(cont.)
Generalized ESD
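The statistics for this slide did not survive in the transcript. For reference, the generalized ESD test is usually stated as follows (Rosner's formulation; the symbols n, r, alpha and s are standard notation, not taken from the slides). For i = 1, ..., r, with the i - 1 most extreme points already removed, compute the test statistic R_i and the critical value lambda_i:

```latex
R_i = \frac{\max_j \lvert x_j - \bar{x} \rvert}{s},
\qquad
\lambda_i = \frac{(n-i)\, t_{p,\,n-i-1}}
                 {\sqrt{\left(n-i-1+t_{p,\,n-i-1}^{2}\right)(n-i+1)}},
\qquad
p = 1 - \frac{\alpha}{2\,(n-i+1)}
```

Here \bar{x} and s are the mean and standard deviation of the remaining points, r is the assumed upper bound on the number of outliers, and t_{p, \nu} is the p-quantile of the Student t-distribution with \nu degrees of freedom. The number of outliers is the largest i for which R_i > \lambda_i. The plain ESD test is the special case with a single suspected outlier.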
Twitter’s Anomaly Detection R pack.(cont.)
Seasonality(linear regression, LOESS, STL)
The generalized ESD works when you have a set of points from a normal distribution,
but real data has some seasonality. This is where STL comes in. It decomposes the data
into a season part, a trend and whatever’s left over using local regression (LOESS), which
fits a low order polynomial to a subset of the data and stitches them together by
weighting them. Since you can remove the trend and seasonal part with loess, you
should be left with something that is more or less normally distributed. You can apply
generalized ESD on what’s left over to detect anomalies.
STL: “Seasonal and Trend decomposition using Loess”
Figures: Seasonality; Local regression (LOESS); Polynomial regression
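The decompose-then-test idea can be sketched without any libraries. The version below is a deliberate simplification and not Twitter's implementation: a per-phase median stands in for the STL seasonal component, the overall median stands in for the trend, and a fixed cutoff on a MAD-based robust z-score stands in for the generalized ESD critical values. The series, the period, and the threshold of 3.5 are all assumptions.

```python
import statistics

def seasonal_residuals(series, period):
    """Remove a per-phase median (crude seasonal term) and then the overall median."""
    seasonal = [statistics.median(series[phase::period]) for phase in range(period)]
    deseasoned = [x - seasonal[i % period] for i, x in enumerate(series)]
    center = statistics.median(deseasoned)
    return [d - center for d in deseasoned]

def robust_outliers(residuals, threshold=3.5):
    """Indices whose MAD-based robust z-score exceeds the threshold."""
    mad = statistics.median([abs(r) for r in residuals])  # residuals are already centred
    if mad == 0:
        return []  # no spread to measure against; a real detector needs a fallback here
    return [i for i, r in enumerate(residuals) if 0.6745 * abs(r) / mad > threshold]

# Hypothetical series with period 4 (roughly 10, 20, 30, 20) and one spike at index 13.
series = [10, 21, 30, 19,
          11, 20, 29, 20,
          10, 19, 31, 21,
           9, 35, 30, 20,
          10, 20, 30, 19,
          11, 21, 29, 20]
residuals = seasonal_residuals(series, period=4)
outliers = robust_outliers(residuals)
print(outliers)
```

The value 35 is not the largest in the series region around the seasonal peak of 30 by much, but it is far from what its phase normally shows, so it stands out only after the seasonal pattern is removed.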
Twitter: Introducing practical and robust
anomaly detection in a time series
At Twitter, we observe distinct seasonal patterns in most of the time series.
Global: global anomalies typically extend above or below expected seasonality and are
therefore not subject to seasonality and underlying trend
Local: anomalies which occur inside seasonal patterns, are masked and thus are much
more difficult to detect in a robust fashion.
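Positive: e.g., a surge in Tweets during the Super Bowl (used for capacity planning around events)
Negative: e.g., a point-in-time decrease in queries per second (QPS); used to uncover potential hardware or data collection issues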
Subspace- and correlation-based outlier
detection for high-dimensional data.
Reduce dimensionality using principal component analysis (PCA) or factor analysis (dimension reduction)
Detect anomalies by computing the contrast of subspaces
Subspace- and correlation-based outlier
detection for high-dimensional data.(cont.)
HiCS: High Contrast Subspaces for Density-Based Outlier Ranking
RNN(Replicator neural networks)
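A method that reproduces the input pattern while minimizing the error
Builds a model of normal behavior and uses it to extract anomalous values
Figure: A schematic view of a fully connected Replicator Neural Network.
OF_i = Anomaly Factor score of the i-th record
n = # of features
x_ij = observed value in column j of the i-th record
o_ij = normal value in column j of the i-th record, as reproduced by the RNN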
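The formula these symbols belong to is missing from the transcript. Assuming it is the usual outlier factor for replicator neural networks, the mean squared reconstruction error over the n features, OF_i = (1/n) * sum_j (x_ij - o_ij)^2, it can be computed as below. The records and their reconstructions are invented; no network is trained here.

```python
def outlier_factor(observed, reconstructed):
    """Mean squared reconstruction error over the n features of one record."""
    n = len(observed)
    return sum((x - o) ** 2 for x, o in zip(observed, reconstructed)) / n

# Hypothetical records and the outputs an already-trained network produced for them.
x_normal, o_normal = [0.2, 0.5, 0.9], [0.25, 0.45, 0.85]   # reproduced closely
x_odd, o_odd = [0.2, 0.5, 0.9], [0.6, 0.1, 0.3]            # reproduced poorly
of_normal = outlier_factor(x_normal, o_normal)
of_odd = outlier_factor(x_odd, o_odd)
print(of_normal, of_odd)
```

A record the network cannot reproduce well gets a high score, which is what makes the reconstruction error usable as an anomaly ranking.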
LOF(Local Outlier Factor)
Density-based anomaly detection by KNN
Provides a score, which makes the result easy to interpret, but computation takes some time.
Unsupervised anomaly detection
Basic idea of LOF: comparing the local density of a point with the densities of its neighbors. A has a much lower
density than its neighbors
LOF(Local Outlier Factor)(cont.)
Illustration of the reachability distance. Objects B and C have the same reachability distance (k=3), while D is not a k-nearest neighbor.
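A brute-force sketch of LOF, built from the k-distance, the reachability distance, and the local reachability density. It is a simplified illustration: each neighborhood is cut at exactly k points even when distances tie, whereas the full definition keeps every point within the k-distance. The sample points are made up.

```python
import math

def lof_scores(points, k=3):
    """Local Outlier Factor for every point (brute force, O(n^2))."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # k nearest neighbours of each point (excluding itself) and its k-distance.
    neighbors, k_distance = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbors.append(order[:k])
        k_distance.append(dist[i][order[k - 1]])

    def reach_dist(a, b):
        # Reachability distance of a from b: never smaller than b's k-distance.
        return max(k_distance[b], dist[a][b])

    # Local reachability density: inverse of the mean reachability distance to the neighbours.
    lrd = [k / sum(reach_dist(i, j) for j in neighbors[i]) for i in range(n)]
    # LOF: mean ratio of the neighbours' density to the point's own density.
    return [sum(lrd[j] for j in neighbors[i]) / (k * lrd[i]) for i in range(n)]

# Hypothetical data: a small dense group and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (8, 8)]
scores = lof_scores(points, k=3)
print([round(s, 2) for s in scores])
```

Points inside the group score close to 1, while the isolated point scores far above 1 because its density is much lower than that of its neighbors.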
LOF(Local Outlier Factor)(cont.)
LOF scores as visualized by ELKI. While the upper right cluster has a
comparable density to the outliers close to the bottom left cluster, they
are detected correctly.
LOF(Local Outlier Factor)(cont.)
LOF scores of cpu util. vs. Time by Rlof
LRSTSD(Log regression seasonality based
approach of time series decomposition)
Anomaly score formula:
Anomaly score
Figures: 1-day network traffic (Tx); 7-day network traffic (Tx)
E_i = i-th error
A_i = i-th observed value
U_i = i-th predicted upper bound
L_i = i-th predicted lower bound
P = overall value (Parameter)
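Anomaly detection can remove noise when building a predictive model
-> Expected to improve prediction accuracy
It detects false detections and data collection failures
-> Enables appropriate responses such as resampling and correction
It supports analysis of the correlation between observed anomalies and problems
-> Can be used for early detection of problems
-> Failure prediction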

What To Do For World Nature Conservation Day by Slidesgo.pptx
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data Visualization
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded data

Anomaly detection

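  • 1. Anomaly Detection. Advanced Intelligence Technology Research Society (고등 지능 기술 연구회). 김철( 2016-07-09
  • 2. What is anomaly detection? A sample that deviates from the mainstream of the data. In data mining, anomaly detection means identifying items, events, or observations that do not conform to an expected pattern or normal range. (Figure label: outlier)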
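  • 3. What is anomaly detection? (cont.) The minimum and maximum are not necessarily outliers (Min:Max ≠ Outlier). 1.5×IQR rule: IQR (Interquartile Range) = Q3 – Q1; values below Q1 – 1.5×IQR or above Q3 + 1.5×IQR are treated as outliers. (Box plot labels: Max, Min)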
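
The 1.5×IQR rule on this slide can be sketched in a few lines of base R. The sample vector `x` is made up for illustration, and the quartiles use R's default `quantile()` interpolation, which is an assumption since the slide does not say how Q1 and Q3 are computed.

```r
# Hypothetical sample: ten observations, one of them far from the rest
x <- c(12, 14, 15, 15, 16, 17, 18, 19, 21, 48)

# Quartiles and interquartile range (default type-7 interpolation)
q1  <- unname(quantile(x, 0.25))
q3  <- unname(quantile(x, 0.75))
iqr <- q3 - q1

# Fences of the 1.5 x IQR rule
lower <- q1 - 1.5 * iqr
upper <- q3 + 1.5 * iqr

# Values outside the fences are flagged as outliers
outliers <- x[x < lower | x > upper]
print(outliers)
```

Here the maximum (48) is flagged while the minimum (12) is not, which is the point of "Min:Max ≠ Outlier": being an extreme value is not the same as lying outside the fences.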
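  • 4. What is anomaly detection? (cont.) Anomalous values are typically interpreted as a symptom of a problem. They are rare phenomena that do not follow the usual statistical definitions.
  • 5. What is anomaly detection? (cont.) Clustering algorithms detect the micro-clusters formed by anomalous patterns.
  • 6. History. Anomaly detection was proposed for intrusion detection systems (IDS) by Dorothy Denning in 1986. Early approaches: normal thresholds, preprocessing of statistics, soft computing, and inductive learning.
  • 8. Applications. Cyber intrusion detection, credit card fraud, fault detection, system health monitoring, IoT, etc. Detecting disturbances in ecosystems. Often used to remove anomalous values from data.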
  • 9. Three categories
      1. Unsupervised anomaly detection
         - Detects anomalies in unlabeled data
         - Anomalies detected with the K-means clustering algorithm
      2. Supervised anomaly detection
         - Normal and Abnormal labels are available
         - Uses classification models (SVM, Random forests, Logistic, Robust, KNN, etc.)
  • 10. Three categories (cont.)
      3. Semi-supervised anomaly detection
         - Only Normal labels are available; anomalies are extracted by comparing the likelihood generated by the normal model
         - NKIA’s LRSTSD based Anomaly Detection
         - Twitter’s Seasonal Hybrid ESD (S-H-ESD) based Anomaly Detection
      (Figure labels: NKIA’s Anomaly Detection, Twitter’s Anomaly Detection)
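
As a rough illustration of the unsupervised case, the sketch below clusters unlabeled points with base R's `kmeans()` and flags points that lie unusually far from their cluster centre. The synthetic data, the injected point in row 101, and the "mean + 3 standard deviations" cut-off are all assumptions made for the example; the slides do not specify how K-means output is turned into an anomaly decision.

```r
set.seed(42)

# Two hypothetical "normal" clusters of 50 points each
normal <- rbind(
  matrix(rnorm(100, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(100, mean = 5, sd = 0.5), ncol = 2)
)
x <- rbind(normal, c(2.5, 9))   # row 101: injected anomaly (assumption)

# Cluster without labels; several restarts reduce the risk of a poor local optimum
km <- kmeans(x, centers = 2, nstart = 10)

# Euclidean distance of every point to the centre of its own cluster
dist_to_center <- sqrt(rowSums((x - km$centers[km$cluster, ])^2))

# Flag points whose distance is far above the typical distance (assumed cut-off)
threshold <- mean(dist_to_center) + 3 * sd(dist_to_center)
anomalies <- which(dist_to_center > threshold)
print(anomalies)
```

Distance to the assigned centre is only one possible score; very small clusters (the micro-clusters mentioned earlier in the deck) are another signal that would need a larger `centers` value to show up.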
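  • 12. Input data (cont.) Data structure - Binary - Categorical - Continuous - Hybrid
  • 13. Types of anomalies. Point Anomalies - Values that fall outside the bulk of the data set
  • 14. Types of anomalies (cont.) Contextual Anomalies - Values that are out of place in their context - Requires a notion of context - Refer to conditional anomalies (Rules)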
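  • 15. Types of anomalies (cont.) Collective Anomalies - A set of related values that is anomalous as a group, for example a run of values produced by a data collection problem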
  • 16. Output of Anomaly Detection
      Label
      - Label of normal or anomaly
      - In the classification approach: true|false or a class
      Score
      - Rank
      - Range: 0 to 1
      - Requires a threshold parameter
  • 17. Evaluating anomaly detection
      F-Measure
      - Evaluation for supervised learning / classification problems
      - Formula:
        Recall (R) = TP / (TP + FN)
        Precision (P) = TP / (TP + FP)
        F-measure = 2*R*P / (R + P)
      The Area Under an ROC Curve
      - AUC (Area Under the Curve)
      - Detection Rate (TP rate), False Alarm Rate (FP rate)
      - Range: 0 to 1
      - Equation: m = # of TP, n = # of TN, p_i = TP Rate (Detection Rate), p_j = FP Rate (False Alarm Rate)
      Crosstable (confusion matrix):
                            Actual Normal   Actual Anomaly
        Predicted Normal    TP              FP
        Predicted Anomaly   FN              TN
      Rating table:
        Score        Label
        .90 ~ 1      Excellent (A)
        .80 ~ .90    Good (B)
        .70 ~ .80    Fair (C)
        .60 ~ .70    Poor (D)
        .50 ~ .60    Fail (F)
      (Figure: ROC (Receiver Operating Characteristic) Curves)
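
The formulas on this slide can be checked with a small base-R sketch. The labels and scores are invented, the anomaly class is treated as the positive class (an assumption; the crosstable on the slide is laid out with Normal first), and the AUC is computed with the rank-based Mann-Whitney formulation, which may not be the exact equation shown on the original slide.

```r
# Hypothetical ground truth (1 = anomaly, 0 = normal) and detector scores in 0..1
actual <- c(1, 1, 1, 0, 0, 0, 0, 1)
scores <- c(0.9, 0.4, 0.8, 0.1, 0.7, 0.2, 0.3, 0.6)

# Turn scores into labels with an assumed threshold of 0.5
predicted <- as.integer(scores > 0.5)

# Confusion-matrix counts with "anomaly" as the positive class
tp <- sum(predicted == 1 & actual == 1)
fp <- sum(predicted == 1 & actual == 0)
fn <- sum(predicted == 0 & actual == 1)

recall    <- tp / (tp + fn)
precision <- tp / (tp + fp)
f_measure <- 2 * recall * precision / (recall + precision)

# AUC as the probability that a random anomaly scores higher than a random normal
m <- sum(actual == 1)            # number of anomalies
n <- sum(actual == 0)            # number of normal points
auc <- (sum(rank(scores)[actual == 1]) - m * (m + 1) / 2) / (m * n)

print(c(recall = recall, precision = precision, f = f_measure, auc = auc))
```

Unlike the F-measure, the AUC does not depend on the chosen threshold, which is why the slide pairs it with score-type output.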
  • 20. Twitter’s Anomaly Detection R package. Twitter open-sourced their R package for anomaly detection. They call their algorithm Seasonal Hybrid ESD (S-H-ESD), which is built on Generalized ESD. Sometimes anomalies can mess up your modeling.
  • 21. Twitter’s Anomaly Detection R package (cont.)
      # Install the package from GitHub, plus the plotting dependencies
      install.packages("devtools")
      devtools::install_github("twitter/AnomalyDetection")
      install.packages("gtable")
      install.packages("scales")
      library(AnomalyDetection)
      # Run S-H-ESD on the example time series shipped with the package
      data(raw_data)
      res = AnomalyDetectionTs(raw_data, max_anoms=0.02, direction='both', plot=TRUE)
      res$plot
  • 22. Twitter’s Anomaly Detection R package (cont.)
      # Run on a plain vector of observations; period is the number of points per season
      v <- read.csv("D:/r/tsd_paper/cpu_5m_02.csv")
      res2 = AnomalyDetectionVec(v, max_anoms=0.02, period=72, direction='both', plot=TRUE)
      res2$plot
  • 23. Twitter’s Anomaly Detection R package (cont.)
      Usage
        AnomalyDetectionTs(x, max_anoms = 0.1, direction = "pos", alpha = 0.05,
          only_last = NULL, threshold = "None", e_value = FALSE, longterm = FALSE,
          piecewise_median_period_weeks = 2, plot = FALSE, y_log = FALSE,
          xlabel = "", ylabel = "count", title = NULL, verbose = FALSE)
      Arguments
        x : Time series as a two column data frame where the first column consists of the timestamps and the second column consists of the observations.
        max_anoms : Maximum number of anomalies that S-H-ESD will detect as a percentage of the data.
        direction : Directionality of the anomalies to be detected. Options are: 'pos' | 'neg' | 'both'.
        alpha : The level of statistical significance with which to accept or reject anomalies.
        only_last : Find and report anomalies only within the last day or hr in the time series. NULL | 'day' | 'hr'.
        threshold : Only report positive going anoms above the threshold specified. Options are: 'None' | 'med_max' | 'p95' | 'p99'.
        e_value : Add an additional column to the anoms output containing the expected value.
        longterm : Increase anom detection efficacy for time series that are greater than a month. See Details below.
        piecewise_median_period_weeks : The piecewise median time window as described in Vallis, Hochenbaum, and Kejariwal (2014). Defaults to 2.
  • 24. Twitter’s Anomaly Detection R package (cont.)
      Arguments (cont.)
        plot : A flag indicating if a plot with both the time series and the estimated anoms, indicated by circles, should also be returned.
        y_log : Apply log scaling to the y-axis. This helps with viewing plots that have extremely large positive anomalies relative to the rest of the data.
        xlabel : X-axis label to be added to the output plot.
        ylabel : Y-axis label to be added to the output plot.
        title : Title for the output plot.
        verbose : Enable debug messages.
  • 25. Twitter’s Anomaly Detection R package (cont.) To understand how Twitter’s algorithm works, you need to know: - Student t-distribution - Extreme Studentized Deviate (ESD) test - Generalized ESD - Linear regression - LOESS - STL (Seasonal Trend LOESS)
  • 26. Twitter’s Anomaly Detection R package (cont.) Student t-distribution: a distribution mainly used when estimating the mean of a normal distribution. (Figure: PDF of the t-distribution)
  • 27. Twitter’s Anomaly Detection R package (cont.) Extreme Studentized Deviate (ESD) test
  • 28. Twitter’s Anomaly Detection R package (cont.) Generalized ESD
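
The formulas for these two slides did not survive in the transcript, so the following is a sketch of the generalized ESD test as it is usually stated (Rosner's procedure), not a copy of the slide. The function name `generalized_esd` and the sample data are hypothetical. At each step the most extreme point is removed, the test statistic R_i = max|x - mean| / sd is compared with a critical value derived from the Student t-distribution, and the number of outliers is the largest step at which the statistic exceeds its critical value.

```r
# Generalized ESD test: returns the positions of the detected outliers.
# Assumes roughly normal data and max_outliers < length(x) - 1.
generalized_esd <- function(x, max_outliers, alpha = 0.05) {
  n <- length(x)
  idx <- seq_len(n)        # positions of the points still in the sample
  removed <- integer(0)    # positions removed so far, in removal order
  n_out <- 0               # largest step i with R_i > lambda_i
  for (i in seq_len(max_outliers)) {
    dev <- abs(x[idx] - mean(x[idx]))
    r_i <- max(dev) / sd(x[idx])                 # test statistic R_i
    worst <- idx[which.max(dev)]
    p <- 1 - alpha / (2 * (n - i + 1))
    t_crit <- qt(p, df = n - i - 1)              # Student t quantile
    lambda_i <- (n - i) * t_crit /
      sqrt((n - i - 1 + t_crit^2) * (n - i + 1)) # critical value lambda_i
    if (r_i > lambda_i) n_out <- i
    removed <- c(removed, worst)
    idx <- setdiff(idx, worst)
  }
  removed[seq_len(n_out)]
}

# Hypothetical sample with one obvious outlier at position 10
x <- c(10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7, 10.1, 9.9, 25.0, 10.0, 10.2)
esd_outliers <- generalized_esd(x, max_outliers = 3)
print(esd_outliers)
```

The plain ESD test corresponds to the single step i = 1; the generalized version only needs an upper bound on the number of outliers, which is the role `max_anoms` plays in Twitter's package.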
  • 29. Twitter’s Anomaly Detection R package (cont.) Seasonality (linear regression, LOESS, STL). The generalized ESD works when you have a set of points from a normal distribution, but real data has some seasonality. This is where STL ("Seasonal and Trend decomposition using Loess") comes in. It decomposes the data into a seasonal part, a trend, and whatever is left over, using local regression (LOESS), which fits a low-order polynomial to a subset of the data and stitches the fits together by weighting them. Since you can remove the trend and seasonal part with LOESS, you should be left with something that is more or less normally distributed. You can apply generalized ESD to what is left over to detect anomalies. (Figure labels: Seasonality, Local regression (LOESS), Polynomial regression)
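
The decompose-then-test idea can be tried with base R's `stl()`. Everything about the data below is an assumption: a synthetic series with a period of 24 points, a mild trend, Gaussian noise, and one spike injected at position 200. To keep the sketch short, a robust z-score (median and MAD) on the remainder stands in for the generalized ESD test; this is a simplification, not Twitter's exact S-H-ESD procedure.

```r
set.seed(1)

# Synthetic series: trend + daily-style seasonality + noise (all hypothetical)
period <- 24
n <- period * 14
t_idx <- seq_len(n)
y <- 10 + 0.01 * t_idx + 3 * sin(2 * pi * t_idx / period) + rnorm(n, sd = 0.3)
y[200] <- y[200] + 8             # injected anomaly

# STL splits the series into seasonal, trend and remainder components
fit <- stl(ts(y, frequency = period), s.window = "periodic", robust = TRUE)
remainder <- as.numeric(fit$time.series[, "remainder"])

# Robust z-score of the remainder; 5 is an assumed cut-off
z <- abs(remainder - median(remainder)) / mad(remainder)
stl_anomalies <- which(z > 5)
print(stl_anomalies)
```

Median and MAD are used instead of mean and standard deviation so that the spike itself does not inflate the scale estimate.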
  • 30. Twitter: Introducing practical and robust anomaly detection in a time series
      Global/Local: At Twitter, we observe distinct seasonal patterns in most of the time series.
      - Global: global anomalies typically extend above or below expected seasonality and are therefore not subject to seasonality and underlying trend.
      - Local: anomalies that occur inside seasonal patterns are masked and thus much more difficult to detect in a robust fashion.
      Positive/Negative
      - Positive: e.g. a surge in Tweets during the Super Bowl (used for capacity planning around events)
      - Negative: e.g. a drop in queries per second (QPS); used to discover potential hardware or data collection issues
  • 31. Subspace- and correlation-based outlier detection for high-dimensional data. Reduce dimensionality with principal component analysis (PCA) or factor analysis. Detect anomalies by computing the contrast of subspaces.
  • 32. Subspace- and correlation-based outlier detection for high-dimensional data.(cont.) HiCS: High Contrast Subspaces for Density-Based Outlier Ranking
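  • 33. RNN (Replicator neural networks). A method that reproduces the input pattern while minimizing the error. A model of normal behavior is built and used to extract anomalous values. (Figure: A schematic view of a fully connected Replicator Neural Network.)
      OF_i = Anomaly Factor score of the i-th element
      n = # of features
      x_ij = observed value of column j for the i-th element
      o_ij = value of column j for the i-th element as reproduced by the RNN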
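
The score formula itself is missing from the transcript. With the symbols defined on the slide, the replicator-network outlier factor is commonly written as the mean squared reconstruction error over the n features, OF_i = (1/n) * sum_j (x_ij - o_ij)^2; the sketch below assumes that form. The matrices `x` and `o` are made-up numbers standing in for observed inputs and network outputs; no network is trained here.

```r
# Hypothetical observed values (rows = elements, columns = features)
x <- matrix(c(0.1, 0.2, 0.3,
              0.9, 0.1, 0.5), nrow = 2, byrow = TRUE)

# Hypothetical values reproduced by the replicator network for the same rows
o <- matrix(c(0.1, 0.2, 0.4,
              0.2, 0.6, 0.5), nrow = 2, byrow = TRUE)

# Outlier factor: mean squared reconstruction error per element
of <- rowMeans((x - o)^2)
print(of)
```

The second row is reproduced poorly and therefore gets the larger score, which is how a model trained on normal patterns exposes the patterns it cannot replicate.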
  • 34. LOF (Local Outlier Factor). Density-based anomaly detection using KNN. It provides a score, which makes the result easy to interpret, but it involves some delay time. Unsupervised anomaly detection. Basic idea of LOF: comparing the local density of a point with the densities of its neighbors. (Figure: A has a much lower density than its neighbors.)
  • 35. LOF (Local Outlier Factor) (cont.) Formula: (Figure: Illustration of the reachability distance. Objects B and C have the same reachability distance (k=3), while D is not a k nearest neighbor.)
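
Since the LOF formula is not preserved here, the sketch below follows the standard definitions: reach-dist_k(p, o) = max(k-distance(o), d(p, o)), the local reachability density lrd(p) is the inverse of the mean reachability distance from p to its k nearest neighbours, and LOF(p) is the mean of lrd(o) / lrd(p) over those neighbours. The function name `lof_scores` and the toy data are hypothetical, and the code assumes there are no duplicate points and takes exactly k neighbours per point even when distances tie.

```r
# Local Outlier Factor with base R only (no duplicate points assumed)
lof_scores <- function(x, k) {
  d <- as.matrix(dist(x))        # pairwise Euclidean distances
  n <- nrow(d)
  # k nearest neighbours of each point, one column per point (self excluded)
  nn <- sapply(seq_len(n), function(i) order(d[i, ])[2:(k + 1)])
  nn <- matrix(nn, nrow = k)     # keep matrix shape when k = 1
  # k-distance: distance to the k-th nearest neighbour
  k_dist <- sapply(seq_len(n), function(i) d[i, nn[k, i]])
  # local reachability density
  lrd <- sapply(seq_len(n), function(i) {
    o <- nn[, i]
    1 / mean(pmax(k_dist[o], d[i, o]))
  })
  # LOF: neighbours' density relative to the point's own density
  sapply(seq_len(n), function(i) mean(lrd[nn[, i]]) / lrd[i])
}

# Hypothetical data: a 3 x 3 grid of points plus one distant point (row 10)
grid <- as.matrix(expand.grid(x = c(0, 0.1, 0.2), y = c(0, 0.1, 0.2)))
pts <- rbind(grid, c(2, 2))
lof <- lof_scores(pts, k = 3)
print(round(lof, 2))
```

Scores near 1 mean a point is about as dense as its neighbours; scores well above 1 mark points in sparser regions than their neighbours, like point A in the figure on the previous slide.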
  • 36. LOF(Local Outlier Factor)(cont.) LOF scores as visualized by ELKI. While the upper right cluster has a comparable density to the outliers close to the bottom left cluster, they are detected correctly.
  • 37. LOF (Local Outlier Factor) (cont.) LOF scores of CPU utilization vs. time, computed with Rlof
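  • 38. LRSTSD (Log regression seasonality based approach of time series decomposition). Anomaly score formula: (Figure panels: Anomaly score; 1-day network traffic Tx; 7-day network traffic Tx)
      E_i = i-th error
      A_i = i-th observed value
      U_i = i-th predicted upper bound
      L_i = i-th predicted lower bound
      P = overall value (Parameter)
  • 39. Conclusion
      - Anomaly detection can remove noise when building a predictive model → better prediction accuracy can be expected
      - It detects false detections and collection failures in the data → appropriate responses such as resampling or correction become possible
      - Analyzing the relationship between observed anomalies and problems → use as an early detection technique for problems → failure prediction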
