8. Gogolook Confidential
★Instant Caller Identification
LINE whoscall identifies background information on unknown incoming calls in seconds, using tags reported by other users, Internet search results, and our comprehensive global database.
9. Gogolook Confidential
★Database with over 600 Million Phone Numbers
LINE whoscall boasts an online database with over 600 million phone numbers. The database covers yellow pages, spammers, telemarketers, customer services, etc., with numerous community tags contributed by users and comments based on real users’ experiences.
Database & Number Details
11. Gogolook Confidential
★Community Tag
Contributions from the global user community have always been the pillar of LINE whoscall’s service. LINE whoscall users can tag a phone number and share it with others, which creates an integrated phone number database and a reliable communication network for everyone.
★Block unwanted calls & SMSs
Block calls and SMSs intelligently to ensure a harassment-free calling experience.
Tag & Block
12. Gogolook Confidential
★World’s Largest Yellow Page Database
LINE whoscall owns one of the world’s largest online phone number databases, which covers most of the numbers of businesses and service providers essential to your daily life.
★Offline Database Available for Free
The database is available both online and offline, and both are completely free. Unlimited use of a database with over 600 million phone numbers is available only on LINE whoscall.
Database Usage
13. Gogolook Confidential
3 of every 5 calls from strangers can be identified by LINE whoscall.
Over 400 million phone calls are identified by LINE whoscall every month. (2014.07)
3,000 spammer numbers are reported by LINE whoscall users every day. (2014.07)
Number Identification
36. Gogolook Confidential
★Problem we want to solve
For an unknown phone number:
• No Google result
• No user tag / report
• Not a whoscall user
Can we determine if it’s a spam number?
Telemarketing call? Scam call?
Harassing call?
Wrong number?
(I’m not a god, after all!!)
38. Gogolook Confidential
★We think it should work because…
whoscall user base (= potential sensors)
• > 10 million installations
• > 10 thousand tags (daily)
• > 30 million phone calls (daily)
39. Gogolook Confidential
Analysis procedures
1. Collect call logs
2. Compare with user tags
3. Explore call behaviors
4. Extract features
5. Classify unknown numbers using machine learning techniques
40. Gogolook Confidential
★Collect call logs
• Recruit a group of volunteer whoscall users as our sensors.
• Collect phone call logs from these sensors for a month.
41. Gogolook Confidential
★User privacy
User privacy is our highest priority.
Phone numbers are stored as one-way hash codes, so they cannot be reversed.
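A minimal sketch of how phone numbers might be stored as one-way hash codes. The slides do not say which hash function was used; HMAC-SHA256 with a secret key is an assumption of this sketch, chosen because the space of phone numbers is small enough to enumerate against an unkeyed hash. The names `SECRET_KEY`, `normalize`, and `hash_number` are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret key (an assumption); it would be kept outside the dataset.
SECRET_KEY = b"example-secret-key"


def normalize(number: str) -> str:
    """Keep digits only so that '0987-991-000' and '0987991000' match."""
    return "".join(ch for ch in number if ch.isdigit())


def hash_number(number: str, key: bytes = SECRET_KEY) -> str:
    """Return a keyed one-way hash (HMAC-SHA256) of a phone number."""
    digest = hmac.new(key, normalize(number).encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()
```

Two spellings of the same number map to the same code, so call logs can still be grouped per number without storing the number itself.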
44. Gogolook Confidential
★Compare with user tags
• Compare these phone numbers with user reports from the whoscall database (block records)
Normal numbers
0987-991-XXX
0986-225-XXX
02-2675-XXXX
03-862-XXXX
...
02-2543-XXXX
03-556-XXXX
886-XXXX
…
Telemarketing calls
02-2783-XXXX
886-903-XXXX
0800-000-XXX
…
Malicious calls
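The comparison with user reports can be sketched as a lookup that assigns each collected number its most frequently reported tag. This is only an illustration: the majority vote, the `label_numbers` name, and the choice to treat numbers without any report as "normal" are assumptions, not something the slides specify.

```python
from collections import Counter


def label_numbers(numbers, reports, default="normal"):
    """Map each number to its most frequently reported tag.

    numbers: iterable of (hashed) phone numbers seen in the call logs.
    reports: iterable of (number, tag) pairs taken from user reports.
    Numbers without any report receive `default` (an assumption).
    """
    votes = {}
    for number, tag in reports:
        votes.setdefault(number, Counter())[tag] += 1
    return {
        n: votes[n].most_common(1)[0][0] if n in votes else default
        for n in numbers
    }
```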
57. Gogolook Confidential
Features for call patterns
[Chart: ratio of recurring opponents for Fraud, Marketing, and Normal numbers; axis from 0 to 0.7]
58. Gogolook Confidential
Features for call patterns
[Chart: ratio of missed out calls for Fraud, Marketing, and Normal numbers; axis from 0 to 0.6]
59. Gogolook Confidential
Features for call patterns
[Chart: ratio of working time calls for Fraud, Marketing, and Normal numbers; axis from 0 to 0.7]
60. Gogolook Confidential
Features for call patterns
[Chart: median of call durations for Fraud, Marketing, and Normal numbers; axis from 0 to 60 seconds]
61. Gogolook Confidential
Features for call patterns
[Chart: ratio of out calls in contact book for Fraud, Marketing, and Normal numbers; axis from 0 to 0.35]
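The five call-pattern features named on the preceding slides could be computed from one number's call log roughly as follows. The exact definitions are not given in the slides, so each formula here is an assumption, as are the `Call` record, its fields, and the 09:00 to 18:00 working hours.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import median


@dataclass
class Call:
    other: str          # hashed number of the other party
    outgoing: bool      # True if the analysed number placed the call
    answered: bool
    duration: float     # talk time in seconds (0 if missed)
    hour: int           # local hour of day, 0-23
    in_contacts: bool   # other party has the analysed number in contacts


def ratio(part, whole):
    """Safe division that returns 0.0 for an empty denominator."""
    return part / whole if whole else 0.0


def extract_features(calls, work_hours=range(9, 18)):
    """Compute assumed versions of the five charted features."""
    out_calls = [c for c in calls if c.outgoing]
    per_party = Counter(c.other for c in calls)
    return {
        # share of distinct counterparts that appear more than once
        "recurring_opponents": ratio(
            sum(1 for n in per_party.values() if n > 1), len(per_party)),
        # share of outgoing calls that were not answered
        "missed_out_calls": ratio(
            sum(1 for c in out_calls if not c.answered), len(out_calls)),
        "working_time_calls": ratio(
            sum(1 for c in calls if c.hour in work_hours), len(calls)),
        "median_duration": median(c.duration for c in calls) if calls else 0.0,
        "out_calls_in_contacts": ratio(
            sum(1 for c in out_calls if c.in_contacts), len(out_calls)),
    }
```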
63. Gogolook Confidential
★Naïve method
Intuitively, we can classify an unknown number with rules such as: if
• the ratio of recurring opponents is less than 40%,
• the ratio of out calls is more than 60%, and
• the ratio of in calls is less than 20%,
then we claim the number is a spam number.
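The naïve rule set on this slide translates directly into a threshold check. The thresholds are the ones stated on the slide; requiring all three conditions at once, and the function name `is_spam_naive`, are assumptions.

```python
def is_spam_naive(recurring_ratio, out_ratio, in_ratio):
    """Apply the three threshold rules from the slide; all must hold."""
    return recurring_ratio < 0.40 and out_ratio > 0.60 and in_ratio < 0.20
```

For example, `is_spam_naive(0.1, 0.9, 0.1)` flags the number, while raising the recurring ratio to 0.5 does not.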
79. Gogolook Confidential
★Real-life scenario
When do we need a spam number prediction?
Ans: The moment a phone call reaches a whoscall user.
We want to predict whether a number is spam as EARLY as possible in order to prevent further victims…
85. Gogolook Confidential
★Reduce the number of features
Feature computation is time-consuming, so we want to reduce the number of features before classification.
Of course, we do not select them by hand…
86. Gogolook Confidential
★Reduce the number of features
Feature computation is time-consuming, so we want to reduce the number of features before classification.
Feature selection methods:
• Regularization methods
• Backward, forward, and stepwise methods
• Bayesian feature selection
• Random forest method
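Of the methods listed, forward selection is the simplest to illustrate. The sketch below greedily adds the feature column that most improves a score and stops when nothing improves. The scoring function (training accuracy of a nearest-centroid classifier) is a stand-in chosen to keep the example self-contained; it is not the model used in the slides, and a real run would score on held-out data with scaled features.

```python
def centroid_accuracy(rows, labels, cols):
    """Training accuracy of a nearest-centroid classifier on columns `cols`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append([row[c] for c in cols])
    centroids = {
        label: [sum(col) / len(pts) for col in zip(*pts)]
        for label, pts in groups.items()
    }

    def predict(row):
        point = [row[c] for c in cols]
        return min(
            centroids,
            key=lambda lab: sum(
                (p - q) ** 2 for p, q in zip(point, centroids[lab])),
        )

    hits = sum(predict(row) == label for row, label in zip(rows, labels))
    return hits / len(rows)


def forward_select(rows, labels, max_features):
    """Greedily add the feature that most improves the score."""
    selected, best = [], 0.0
    remaining = list(range(len(rows[0])))
    while remaining and len(selected) < max_features:
        scored = [
            (centroid_accuracy(rows, labels, selected + [c]), c)
            for c in remaining
        ]
        score, col = max(scored, key=lambda sc: sc[0])
        if score <= best:  # stop when no candidate improves the score
            break
        selected.append(col)
        remaining.remove(col)
        best = score
    return selected
```

Backward selection works the same way in reverse: start from all features and drop the one whose removal hurts the score least.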
90. Gogolook Confidential
★Selected features
Ratio of out calls
Rate of out calls
Ratio of out calls in contact book
Ratio of reciprocal opponents
Ratio of recurring opponents
Median call duration of in calls
Ring duration of answered calls
Ratio of missed calls
Rate of new opponents
Ratio of in calls in contact book
and more…
95. Gogolook Confidential
★What is power?
Power of class A: the probability of correctly classifying a class A sample as class A.
Gender classifier
97.5%: this is a male
96. Gogolook Confidential
★Power of our classifier
[Chart: power (0.8 to 1.0) versus number of recent calls (3 to 10)]
103. Gogolook Confidential
★Power of SVM for multi-classification
[Chart: power (0.8 to 1.0) versus number of recent calls (3 to 10)]
104. Gogolook Confidential
★Power of SVM for binary classification
[Chart: power (0.8 to 1.0) versus number of recent calls (3 to 10)]
105. Gogolook Confidential
★What is type I error rate?
Type I error: the probability of misclassifying a class B sample as class A.
Gender classifier
5%: this is a male
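Both power and type I error rate, as defined in these slides, can be estimated from a list of (true class, predicted class) pairs. The sketch below follows those two definitions; the function names and the input format are assumptions.

```python
def power(pairs, cls):
    """Share of true `cls` samples that were predicted as `cls`."""
    preds = [pred for true, pred in pairs if true == cls]
    return sum(p == cls for p in preds) / len(preds)


def type_one_error(pairs, cls):
    """Share of samples from other classes that were predicted as `cls`."""
    preds = [pred for true, pred in pairs if true != cls]
    return sum(p == cls for p in preds) / len(preds)
```

With 4 spam samples of which 3 are caught, and 5 normal samples of which 1 is flagged as spam, the power for spam is 0.75 and its type I error rate is 0.2.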
107. Gogolook Confidential
★Type I error comparison
[Chart: type I error (0 to 0.3) versus number of recent calls (3 to 10)]
116. Gogolook Confidential
★Naïve method
Similarly, without machine learning we can design rules such as:
Rule 1: The mean ringing duration is less than 7 seconds.
and
Rule 2: The mean duration of out calls is less than 3 seconds.
Then we claim that it is a one-ring spam call.
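The two one-ring rules can be written as a single check over the observed durations. The 7-second and 3-second thresholds come from the slide; the function name and the choice to return `False` when there is no data are assumptions.

```python
from statistics import mean


def is_one_ring_spam(ring_durations, out_call_durations):
    """Both rules must hold: mean ring < 7 s and mean out call < 3 s."""
    if not ring_durations or not out_call_durations:
        return False  # not enough data to apply the rules (an assumption)
    return mean(ring_durations) < 7 and mean(out_call_durations) < 3
```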
122. Gogolook Confidential
★Power of SVM for multi-classification
[Chart: power (0.8 to 1.0) versus number of recent calls (3 to 10)]
127. Gogolook Confidential
★Improvements of the classification model
1. Fraud number analysis
2. Fuzzy classification algorithm
3. Spam-category scores
4. Cooperate with more reliable outside sources
5. Generalize to other countries
Much more…
128. Gogolook Confidential
★Future perspectives
1. User tag correction mechanisms
2. Personalized penalty settings
3. Anti-countermeasures
4. Extend to SMS spam detection
5. Clustering vs. user tags
6. Spam detection → Scam detection