Jiang Zhu and Sean Wang

Dec 5th, 2011




•  Monitor and track user behavior on smartphones using various on-device sensors
•  Convert sensor traces and other context information into Personal Behavior Features
•  Build Risk Analysis Trees from these features and use them to calculate Certainty Scores
•  Trigger different Authentication Schemes when certain applications are launched




[Figure: bar chart, "Mobile Device Loss or Theft"; vertical axis 0% to 60%]

Strategy One Survey conducted among a U.S. sample of 3017 adults age 18 years and older, September 21-28, 2010, with an oversample in the top 20 cities (based on population).

•  "The 329 organizations polled had collectively lost more than 86,000 devices … with average cost of lost data at $49,246 per device, worth $2.1 billion or $6.4 million per organization."
   "The Billion Dollar Lost-Laptop Study," conducted by Intel Corporation and the Ponemon Institute, analyzed the scope and circumstances of missing laptop PCs.
•  Password: a major source of security vulnerabilities. Easy to guess, reused, forgotten, shared
•  Application: different applications may have different sensitivities
•  Usability: authentication is requested too often, or is sometimes too loose
Application Access Control
•  MobiSens app collects sensor data
   •  Motion sensors
   •  GPS and WiFi scanning
   •  In-use applications and their traffic patterns

•  SenSec module builds user behavior models
   •  Unsupervised activity segmentation, with the resulting sequence modeled by a language model
   •  Build a Risk Analysis Tree (decision tree, DT) to detect anomalies
   •  Combine the two to estimate risk online as a certainty score

•  SenSec broadcasts the certainty score to other applications

•  Application Access Control Module receives the score through a broadcast receiver
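
One way to picture how an access-control module might act on a received certainty score is sketched below in Python. The function name, the [0, 1] score range, the per-application sensitivity value, and the three scheme names are all illustrative assumptions; the slides do not specify a policy.

```python
# Hypothetical policy, not taken from the slides: compare the latest
# certainty score with an assumed per-application sensitivity.
def required_authentication(certainty: float, sensitivity: float) -> str:
    """Pick an authentication scheme for an application launch.

    certainty   -- latest certainty score, assumed in [0, 1] (1 = surely the owner)
    sensitivity -- assumed per-application requirement in [0, 1]
    """
    if certainty >= sensitivity:
        return "none"       # confident enough: launch without prompting
    if certainty >= sensitivity / 2:
        return "pin"        # moderately unsure: lightweight challenge
    return "password"       # very unsure: strongest challenge


# A sensitive app (0.8) and a casual app (0.2) under the same score.
print(required_authentication(0.5, 0.8))
print(required_authentication(0.5, 0.2))
```

Scaling the challenge to the application's sensitivity is one way to address the usability concern raised earlier, where prompts are either too frequent or too loose.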



•  A feature vector calculated from a step window represents the behavior state within that time window
   •  Surrounding environment: GPS location, WiFi signal
   •  Activity: motions, applications in use
   •  Communication: network traffic

•  Use a decision tree to detect anomalies in behavior
   •  Each node represents a feature dimension
   •  Leaves can be one of the following
    •  Owner Detection: owner [0,1], 0: Anomaly, 1: Normal
    •  User Identification: user id [0,1,…, N], the user's identification, e.g. IMEI

•  Multiple trees can be built on subsets of the feature space and combined by
   •  Weighted average
   •  Voting
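
The two combination rules named above can be sketched in a few lines of Python. This is a minimal illustration, assuming each trained tree is available as a function from a feature vector to a label; the toy trees, weights, and thresholds are hypothetical and are not the models from the experiments.

```python
from collections import Counter
from typing import Callable, Sequence

# Hypothetical stand-in for a trained tree: maps a feature vector to a label
# (for owner detection, 1 = normal and 0 = anomaly).
Tree = Callable[[Sequence[float]], int]


def majority_vote(trees: Sequence[Tree], x: Sequence[float]) -> int:
    """Return the label predicted by the largest number of trees."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]


def weighted_average(trees: Sequence[Tree], weights: Sequence[float],
                     x: Sequence[float]) -> float:
    """Return the weight-normalized average of the trees' 0/1 outputs."""
    total = sum(weights)
    return sum(w * tree(x) for tree, w in zip(trees, weights)) / total


# Three toy "trees", each looking at a different feature dimension.
trees = [
    lambda x: 1 if x[0] < 0.5 else 0,
    lambda x: 1 if x[1] < 0.5 else 0,
    lambda x: 1 if x[2] < 0.5 else 0,
]
weights = [0.5, 0.3, 0.2]   # assumed per-tree weights
sample = [0.2, 0.9, 0.1]

vote = majority_vote(trees, sample)
score = weighted_average(trees, weights, sample)
```

Voting yields a hard label, while the weighted average yields a graded value that could serve as a certainty score.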

•  Convert the series of feature vectors to label streams (dimension reduction)

•  Use n-grams to model the label-stream sequence for each sensory dimension; this captures both the current state and the transitions
•  Step window with assigned length


                 A1           A2           A1          A4

                    G2             G5           G2          G2

               W2                  W1                  W2

                    P1          P3       P6          P1


                         A2 G2G5 W1 P1P3 A1A4 G2 W1W2 P1
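
The conversion from feature vectors to a label stream, and from a label stream to n-grams, might look like the following Python sketch. It assumes that each sensory dimension has a set of cluster centroids (for example from offline clustering) and that a vector is labeled with its nearest centroid; the helper names, the centroids, and the "A" prefix are illustrative only.

```python
import math


def nearest_label(vector, centroids, prefix):
    """Label a feature vector with the index of its nearest centroid, e.g. 'A2'."""
    best = min(range(len(centroids)),
               key=lambda i: math.dist(vector, centroids[i]))
    return f"{prefix}{best + 1}"


def to_label_stream(vectors, centroids, prefix):
    """Reduce a series of feature vectors to a stream of short labels."""
    return [nearest_label(v, centroids, prefix) for v in vectors]


def ngrams(labels, n):
    """List the n-grams (as tuples) contained in a label stream."""
    return [tuple(labels[i:i + n]) for i in range(len(labels) - n + 1)]


# Hypothetical 2-D feature vectors and centroids for one sensory dimension.
centroids = [(0.0, 0.0), (10.0, 10.0)]
vectors = [(1.0, 1.0), (9.0, 8.0), (0.0, 2.0)]

stream = to_label_stream(vectors, centroids, "A")
bigrams = ngrams(stream, 2)
```

Each label stands for the current state, and each bigram for one observed transition between states.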

•  User behavior at time t depends only on the last n-1 behaviors

•  The sequence of behaviors can therefore be predicted from n consecutive behaviors in the past


•  Maximum Likelihood Estimation (MLE) from training data by counting n-grams

•  MLE assigns zero probability to unseen n-grams
   Incorporate a smoothing function (Katz)
    Discount probability for observed n-grams
    Reserve probability for unseen n-grams
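
The dependence assumption and the counting estimate described above are conventionally written as follows. The symbols b_t (behavior label at time t) and C(·) (occurrence count in the training data) are notation chosen here for illustration, not taken from the slides.

```latex
% Markov assumption of order n-1 over behavior labels b_1, ..., b_T
P(b_t \mid b_1, \dots, b_{t-1}) \approx P(b_t \mid b_{t-n+1}, \dots, b_{t-1})

% Maximum likelihood estimate by counting n-grams in the training data
P_{\mathrm{MLE}}(b_t \mid b_{t-n+1}, \dots, b_{t-1})
  = \frac{C(b_{t-n+1}, \dots, b_{t-1}, b_t)}{C(b_{t-n+1}, \dots, b_{t-1})}
```

Whenever the numerator count is zero the estimate is zero, which is why smoothing is needed to move some probability mass from observed to unseen n-grams.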




•  Feed the sequence of past behaviors, in a stepping window of size N, to the n-gram model for testing
•  For a testing sequence of behavior labels, estimate the average log probability that the sequence was generated by the n-gram model
•  If this likelihood drops below a threshold, flag an anomaly alert
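
The scoring step can be illustrated with a small Python sketch. It is a simplification under stated assumptions: it uses add-one (Laplace) smoothing in place of the Katz smoothing named on the previous slide, and the class name, vocabulary size, training stream, and threshold are all hypothetical.

```python
import math
from collections import Counter


class NGramModel:
    """Count-based n-gram model over behavior labels with add-one smoothing.

    Note: add-one smoothing is a simpler stand-in for Katz smoothing.
    """

    def __init__(self, n, vocab_size):
        self.n = n
        self.vocab_size = vocab_size        # number of distinct labels assumed
        self.ngram_counts = Counter()       # C(context + label)
        self.context_counts = Counter()     # C(context)

    def train(self, labels):
        for i in range(len(labels) - self.n + 1):
            gram = tuple(labels[i:i + self.n])
            self.ngram_counts[gram] += 1
            self.context_counts[gram[:-1]] += 1

    def prob(self, gram):
        # Smoothed estimate: unseen n-grams get a small non-zero probability.
        return ((self.ngram_counts[gram] + 1)
                / (self.context_counts[gram[:-1]] + self.vocab_size))

    def avg_log_prob(self, labels):
        grams = [tuple(labels[i:i + self.n])
                 for i in range(len(labels) - self.n + 1)]
        if not grams:
            raise ValueError("window is shorter than n")
        return sum(math.log(self.prob(g)) for g in grams) / len(grams)


def is_anomaly(model, window, threshold):
    """Flag the window when its average log probability is below the threshold."""
    return model.avg_log_prob(window) < threshold


model = NGramModel(n=2, vocab_size=4)
model.train(["A1", "A2"] * 50)              # toy "owner" behavior
THRESHOLD = -1.0                            # assumed; would be tuned on data

usual = ["A1", "A2", "A1", "A2"]
unusual = ["A4", "A3", "A4", "A3"]
```

Averaging over the window keeps the score comparable across window sizes, so a single threshold can be applied.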


[Figure: SenSec processing pipeline. Sensing: MobiSens trace, feature extraction. Preprocessing: behavior text generation, fusion. Anomaly detection: n-gram model and decision trees, compared against a threshold to produce an anomaly yes/no decision.]
Dataset
   Number of users                              50
   Device                                       Android phones
   Location                                     Bay Area
   Average period                               30 days
   Number of data types                         7
   Finest sampling interval (motion sensors)    200 ms

•  Total data set size: 4GB
•  Remove 2 heavy users
•  Remove users with very limited data duration
•  Remove users that don't have application and traffic data due to an older MobiSens version
•  25 users with comparable dataset size
•  Data duration: 4 hours to 2.5 days
•  Motion Sensors (100)
   •  Used to summarize the acceleration stream
   •  Calculated separately for each dimension [x,y,z,m]

•  GPS: location label via density-based clustering (1)

•  WiFi: (SSIDs, RSSIs) pairs ranked by signal strength (6)

•  Applications: Bitmap of well-known applications (60 + 1)

•  Application Traffic Pattern: Tx/Rx traffic vectors (120 + 2)

•  Step Window Size: 5 seconds
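
As an illustration of per-dimension summarization, the Python sketch below computes a few summary statistics over one step window of accelerometer samples. It assumes that m denotes the magnitude of the (x, y, z) vector and that mean, standard deviation, minimum, and maximum are among the summaries; the slides do not list the actual statistics. At the finest sampling interval of 200 ms, a 5-second window holds 25 samples.

```python
import math
import statistics


def magnitude(sample):
    """Euclidean norm of one (x, y, z) accelerometer sample."""
    x, y, z = sample
    return math.sqrt(x * x + y * y + z * z)


def summarize(values):
    """Assumed summary statistics for one dimension of the window."""
    return {
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),
        "min": min(values),
        "max": max(values),
    }


def motion_features(samples):
    """Summarize a window of (x, y, z) samples separately for x, y, z and m."""
    columns = {
        "x": [s[0] for s in samples],
        "y": [s[1] for s in samples],
        "z": [s[2] for s in samples],
        "m": [magnitude(s) for s in samples],
    }
    return {
        f"{axis}_{name}": value
        for axis, values in columns.items()
        for name, value in summarize(values).items()
    }


# Toy window with two samples; a real window would hold about 25.
features = motion_features([(0.0, 0.0, 1.0), (0.0, 0.0, 3.0)])
```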



•  User Identification Test and Owner Detection Test on a randomly selected partial data set (4 users) with a 1:1 training/test split
   •  ~99% accuracy
   •  Number of leaves: 56, size of tree: 111

•  Using only non-motion attributes yields lower accuracy (96%)
   •  Significant tree size reduction: number of leaves: 3, size of tree: 5
   •  Cross entropy may be significant enough that some features distinguish users easily

•  Using only motion attributes can still distinguish different users
   •  ~98% accuracy
   •  Very large tree: number of leaves: 267, size of tree: 533
   •  May cause performance issues on mobile platforms



•  Apply a cross-entropy filter to remove users that could be identified easily using a small set of features
•  12 users with 210k data instances

•  User identification: train the RAT model on 66% of the instances and use the rest for testing

   Feature set     Accuracy    Size factor
   All             84.8%       221
   Non-Motion      83.5%       35
   Motion-Only     79.3%       7649
•  Experiments detect anomalous usage with ~80% accuracy using only days of training data
•  Extended data set for feature construction
   TCP, UDP traffic; sound; ambient lighting; battery status, etc.

•  Data and Modeling
   Gain more insights into the data, features and factorized relationships among
   various sensors
   Try other classification methods and compare results: LR, SVM, Random
   Forest, etc

•  Enhanced security of SenSec components
   Integration with Android security framework and other applications

•  Privacy challenges
   Data collection, model training, privacy policy, etc.

•  Energy efficiency


Thank you.
•  Data Collection
   •  Running app list
   •  Per-app traffic pattern

•  IPC Interface
   •  Certainty Score
   •  Broadcast mechanism

[Figure: MobiSens architecture in three panels: (a) MobiSens Mobile Application, (b) Tier 1, (c) Tier 2. Component labels: MobiSens Mobile Application UI, Sensing Control, Activity Segmentation, Device Controller, Model Pushing, Data Aggregator, Data Upload, Sensor Widgets, Device Profile API, Message Pool, Web Service, Storage System, Data Exchange API, MobiSens Backend, MobiSens Data Exchange API, Lifelogger Applications, Activity Summarization Interface, Behavior Modeling Algorithms, Data Preprocessor.]

•  Offline-Model Push via Data Exchange API
   •  The Risk Analysis Tree can be trained using global data on the MobiSens Server and pushed back to the mobile device
•  MobiSens Server
   •  Offline Clustering
    •  K-means package from Weka Data Mining Toolkit
    •  Using aggregated data from all users
   •  Offline RAT training
    •  Decision Tree package from Weka Data Mining Toolkit
    •  Construct training data set and design evaluation strategy

•  MobiSens Client
   •  Retrieve RAT model from MobiSens Server
   •  On-device n-gram label sequence construction (n = 1, 2, 3; window size = 5 s)
   •  RAT inference using Weka Toolkit on device
   •  Status bar notification based on certainty value




•  Reactive API to Team Access
   API call from Team Access to SenSec to retrieve the current Certainty Score
   given the context

   getCertaintyScore(SenSecContextType ctx, count)


•  Proactive API to Team Access and other equivalent modules
   Broadcast Receiver on Certainty Score

   certaintyScore{
       CertaintyScoreType scores[];
       WindowSizeType window_size;
       SenSecContextType ctx;
   }
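
The reactive (pull) and proactive (push) styles can be contrasted with a small in-process Python sketch. It is only an analogy for the Android broadcast mechanism: the class `SenSecStub`, the `register_receiver` and `publish` helpers, the string context values, and the reading of `count` as "number of most recent scores" are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CertaintyScore:
    """Mirrors the certaintyScore structure above; field types are assumed."""
    scores: List[float]
    window_size: float      # seconds
    ctx: str                # hypothetical context identifier


class SenSecStub:
    """In-process stand-in for SenSec; not the real Android component."""

    def __init__(self, window_size: float = 5.0):
        self.window_size = window_size
        self._history: Dict[str, List[float]] = {}
        self._receivers: List[Callable[[CertaintyScore], None]] = []

    def register_receiver(self, receiver: Callable[[CertaintyScore], None]) -> None:
        """Proactive style: the receiver is called on every new score."""
        self._receivers.append(receiver)

    def publish(self, ctx: str, score: float) -> None:
        """Record a new score and push it to all registered receivers."""
        self._history.setdefault(ctx, []).append(score)
        message = CertaintyScore([score], self.window_size, ctx)
        for receiver in self._receivers:
            receiver(message)

    def get_certainty_score(self, ctx: str, count: int) -> CertaintyScore:
        """Reactive style: return up to `count` most recent scores for ctx."""
        history = self._history.get(ctx, [])
        recent = history[-count:] if count > 0 else []
        return CertaintyScore(list(recent), self.window_size, ctx)


sensec = SenSecStub()
received: List[CertaintyScore] = []
sensec.register_receiver(received.append)   # proactive subscription

sensec.publish("email", 0.9)
sensec.publish("email", 0.4)
latest = sensec.get_certainty_score("email", 1)   # reactive query
```

The pull call suits a one-off check at application launch, while the push subscription lets a module react as soon as the score changes.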

SenSec: Mobile Application Security through Passive Sensing
