Semantically Enriched Machine Learning Approach to
Filter YouTube Comments for Socially Augmented User Models
Ahmad Ammari, Vania Dimitrova, Dimoklis Despotakis
School of Computing, University of Leeds, Leeds, UK

Presented By:
Ahmad Ammari
User and Community Modelling
School of Computing, University of Leeds, UK
Outline
• The ImREAL Project
• Socially Augmented User Modelling
• Research Objective, Roadmap,
  Challenges
• The Social Noise Filtering Approach
  –   Machine Learning – Based
  –   Methodology
  –   Comment Content Pre-Processing
  –   Semantic Enrichment
  –   Scoring and Labelling the Training Dataset
• Experimental Description / Results
• Evaluation
• Conclusions & Future Work
Immersive Reflective Experience-based Adaptive Learning (ImREAL)
Specific Targeted Research Project (STReP), FP7
Partners
  University of Leeds, UK
  Trinity College Dublin, Ireland
  Graz University of Technology, Austria
  University of Erlangen-Nuremberg, Germany
  Delft University of Technology, NL
  Imaginary SRL (IMA), Italy
  Empower The User (ETU), Ireland
Problem:
Experience in a simulated world is disconnected from the ‘real-world’
[Figure: the ImREAL Approach placed between REALITY and VIRTUALITY, spanning Augmented Reality and Augmented Virtuality]
Augmented Simulated Experiential Learning
[Figure: two sides linked by Augmented User Modelling.
 Simulated Learning Environment: Simulated Experiential Learning Environment; Interactive; Adaptive; User model; coach; Practice; Provide content.
 Linking components: Augmented user modelling; Real world activity modelling; Meta-cognitive scaffolding.
 Real World Experience: Records of Real Job-related Experiences; Other participants (e.g. customers, managers).]
Socially Augmented User Modelling
[Figure: User Profiles from the Simulated Environment are combined with Social Profiles from Open Social Spaces (e.g. Psychology, Sports, Diseases, Politics).]
• Existing User Model: Limited Scope!!
• Socially Augmented User Model: Weighted Social Interests
Broad Research Objective
Mining Social Media Content generated by Users having Awareness of and/or Interest in an Activity Domain, to Derive Social Profiles that Augment Existing User Models
Research Roadmap / Challenges
• Three-Phase Research Roadmap towards achieving the Broad Objective
[Figure: roadmap with Phase One, Phase Two, and Phase Three; labelled box: Social Noise Filtration]
The Social Noise Filtering Approach
• Supervised Machine Learning Model
  – Historic content with known relevance states is used for training
  – The Machine Learning Model learns the underlying rules
  – The Model is then used to predict the unknown relevance states of new content, with a certain prediction confidence
The Social Noise Filtration Service: Methodology
[Figure labels: Experimentally Controlled Comments; Analyze; Semantically Enriched Job Interview Bag of Words (JIBoW); Public Comments on YouTube; Pre-Process; Term – Comment Matrix (Training Corpus); SCORE; SCORES]

CASE STUDY: Filtering YouTube Comments
  Social Media Source: YouTube
  Subject Content: Public Comments on Shared Videos
  Activity Domain: Job Interview
YouTube Video Selection
• Selected as part of a research study by
  [Despotakis, Lau & Dimitrova, 2011]
• Four Job Interview-related categories are
  manually identified from video content
  – Guides / Best Practices
  – Interviewee’s Stories
  – Interviewer’s Stories
  – Interview Mock Examples
• Videos from all categories are selected to
  retrieve the comment set for ML training
Comment Content Pre-Processing
• Objective: Deriving dataset for Classification
• Steps:
  1. Stop Word Removal
  2. Stemming
  3. tfidf Weighting
  4. Comment – Term Matrix (CTM)
• Example:
  – Comment: “I think most Americans are like the first example”
  – Resulting terms: think – Americans – like – first – example
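
The four steps above can be sketched in a few lines of Python. This is only an illustration: the stop-word list, the crude suffix stripper `naive_stem`, and the helper names are assumptions made here (the slides name Lucene for pre-processing but give no configuration), and the sketch lowercases terms, which the slide example does not.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the list used in the study).
STOP_WORDS = {"i", "most", "are", "the", "a", "an", "is", "of", "to", "and", "in"}


def naive_stem(token):
    """Crude suffix stripper standing in for a real stemmer (hypothetical helper)."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(comment):
    """Steps 1 and 2: lowercase, tokenize, drop stop words, stem."""
    tokens = re.findall(r"[a-z]+", comment.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


def comment_term_matrix(comments):
    """Steps 3 and 4: one row per comment, one tf-idf weight per vocabulary term."""
    docs = [preprocess(c) for c in comments]
    vocab = sorted({t for d in docs for t in d})
    n_docs = len(docs)
    # Document frequency: in how many comments each term occurs.
    doc_freq = Counter(t for d in docs for t in set(d))
    matrix = []
    for d in docs:
        term_freq = Counter(d)
        matrix.append([term_freq[t] * math.log(n_docs / doc_freq[t]) for t in vocab])
    return vocab, matrix


terms = preprocess("I think most Americans are like the first example")
# terms == ["think", "american", "like", "first", "example"]
```

With this plain tf × log(N/df) weighting, a term that occurs in every comment gets weight 0; other tf-idf variants smooth this differently.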
Semantically Enriched Job Interview Bag of Words
• A Semantically Enriched Job Interview Bag of Words (JIBoW) is used as a Novel Means to Score and Label the Training YouTube Comment Set
• Collection of Textual Comments on Job Interview Videos [*]
  – Experimentally controlled
  – Closed social space
• Text and Semantic Pre-Processing Phases
• Semantically Expanded by the WordNet Lexicon and DISCO with Word Synonyms, Antonyms, Derivations, and semantically similar words

[*] Despotakis, Lau, Dimitrova (2011): A Semantic Approach to Extract Individual Viewpoints from User Comments on An Activity, AUM Workshop, UMAP 2011, Girona, Spain
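
The expansion step can be pictured with a toy example. The sketch below does not call WordNet or DISCO; `TOY_LEXICON` is a hand-written, hypothetical stand-in for their lookups, and `expand_bag_of_words` is a name invented here. It only shows the idea of growing a seed bag of words with related terms.

```python
# Hypothetical hand-written lexicon standing in for WordNet / DISCO lookups.
TOY_LEXICON = {
    "job": {
        "synonyms": {"occupation", "employment"},
        "similar": {"career"},
    },
    "confident": {
        "synonyms": {"assured"},
        "antonyms": {"nervous"},
        "derivations": {"confidence"},
    },
}


def expand_bag_of_words(seed_terms, lexicon):
    """Return the seed terms plus every related word the lexicon lists for them."""
    expanded = set(seed_terms)
    for term in seed_terms:
        # Each relation type (synonyms, antonyms, ...) maps to a set of words.
        for related_words in lexicon.get(term, {}).values():
            expanded |= related_words
    return expanded


jibow = expand_bag_of_words({"job", "interview"}, TOY_LEXICON)
# jibow == {"job", "interview", "occupation", "employment", "career"}
```

Terms without a lexicon entry ("interview" here) are simply kept unchanged.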
Scoring and Labelling Training Corpus
• A Novel Term Frequency – based Mathematical Model
• Computes a Relevance Score for each observation in the training comment dataset
  – Intersection Size between Comment BoW and JIBoW
  – Score is Normalized by the Average Intersection Size
• A Threshold is used to classify the comments for training a binary classifier
• Labels each observation (noisy, relevant) accordingly
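
The formula itself is not preserved in this transcript, so the following is only one plausible reading of the bullets above: score each comment by the size of its overlap with the JIBoW, divide by the average overlap across the corpus, and label by a threshold. The function names, the use of sets (rather than term frequencies), and the threshold value 1.0 are all assumptions.

```python
def relevance_scores(comment_bows, jibow):
    """Overlap of each comment's bag of words with the JIBoW,
    normalized by the average overlap over all comments."""
    sizes = [len(set(bow) & jibow) for bow in comment_bows]
    average = sum(sizes) / len(sizes) if sizes else 0.0
    if average == 0:
        # No comment shares any term with the JIBoW.
        return [0.0 for _ in sizes]
    return [size / average for size in sizes]


def label_comments(scores, threshold=1.0):
    """Binary labelling; the threshold of 1.0 is a placeholder, not the study's value."""
    return ["relevant" if s >= threshold else "noisy" for s in scores]


jibow = {"interviewee", "confident", "job", "experience", "work", "life"}
bows = [
    ["interviewee", "looks", "confident", "job", "experience", "work", "life"],
    ["lol", "nice", "video"],
    ["good", "job"],
]
scores = relevance_scores(bows, jibow)   # overlaps 6, 0, 1; average 7/3
labels = label_comments(scores)          # ["relevant", "noisy", "noisy"]
```

Normalizing by the average makes a score of 1.0 mean "as much overlap as a typical comment in this corpus", which is why 1.0 is used as the illustrative cut-off.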
Example Scoring & Labelling
C1: “The interviewee looks confident, he should have some job experience in his work life”

  Comment BoW: interviewee, confident, job, experience, work, life
  JIBoW: w10, w21, w34, w4, w57, w113, wn
Example Scored & Labelled Comments
Datasets
• YouTube API for Retrieval, Lucene API for Pre-Processing
• Post-Analysis Data Description: YouTube Corpus vs. Experimentally Controlled Corpus
  [Table values not preserved in this transcript]
• Training Corpus: 1159 Instances
   – Classified by the scoring model for Training C4.5 & Naïve Bayes Multinomial (NBM) Classifiers
   – {724 Noisy, 435 Relevant}
• Derived a Comment Term Matrix: 1159 Instances × 903 tfidf Term Weights + 1 Discrete Class Column
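
To make the classification step concrete, here is a minimal multinomial Naive Bayes written from scratch. It is a sketch under several assumptions: it uses raw term counts rather than the tfidf weights described above, Laplace smoothing with `alpha=1.0`, and four invented toy comments. The slides do not name the toolkit that was actually used, and C4.5 is not shown.

```python
import math
from collections import Counter, defaultdict


class MultinomialNaiveBayes:
    """Toy multinomial Naive Bayes over bags of words (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {t for d in docs for t in d}
        self.class_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.term_counts[label].update(doc)
        return self

    def _log_posterior(self, doc, label):
        total_terms = sum(self.term_counts[label].values())
        denom = total_terms + self.alpha * len(self.vocab)
        n_docs = sum(self.class_counts.values())
        log_p = math.log(self.class_counts[label] / n_docs)  # class prior
        for term in doc:
            if term in self.vocab:  # ignore terms never seen in training
                count = self.term_counts[label][term]
                log_p += math.log((count + self.alpha) / denom)
        return log_p

    def predict(self, doc):
        return max(self.classes, key=lambda c: self._log_posterior(doc, c))


# Invented toy training set; the real corpus has 1159 labelled comments.
train_docs = [
    ["job", "interview", "experience"],
    ["interviewee", "confident", "job"],
    ["lol", "funny", "video"],
    ["nice", "music", "lol"],
]
train_labels = ["relevant", "relevant", "noisy", "noisy"]
model = MultinomialNaiveBayes().fit(train_docs, train_labels)
prediction = model.predict(["job", "experience"])  # "relevant"
```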
Experimental Results
• For each classifier, models were trained & tested with three variations of Training-to-Testing ratio
  [Chart: ROC Area; see Evaluation Results]
• The Two Classifiers show good performance in predicting relevant & noisy comments in the testing data sets
• C4.5 is slightly better at identifying noisy comments out of the total noise in the data
• NBM shows less risk of misclassifying relevant comments as noise
Evaluation
A Human-based Evaluation Experiment was conducted to measure how well the service:
Goal 1: Considers the comments that show awareness of the application domain (Job Interviews) (See Example Question and Records)
Goal 2: Considers the comments whose authors are likely interested in the application domain (See Example Question and Records)
Evaluation Results
  Number of Evaluators                                   2
  Number of Evaluated Comments (15% of Whole Dataset)    180
  Number of Comments Scored as Relevant                  90
  Number of Comments Scored as Noisy                     90

[Pie charts per evaluator and goal, with slices Noisy / Relevant / Doesn't know. The slice-to-label mapping is not recoverable from this transcript. Slice values: Evaluator 2, Goal 2: 17%, 24%, 59%; Evaluator 2, Goal 1: 3%, 42%, 55%; Evaluator 1, Goal 2: 9%, 46%, 45%; Evaluator 1, Goal 1: 15%, 19%, 66%.]

Evaluator 2
  Metric                  Goal 2    Goal 1
  Total Match Rate        51.1%     68.3%
  Total Mismatch Rate     48.9%     31.7%
  Precision (Noisy)       42.2%     76.7%
  Precision (Relevant)    76.7%     63.3%
  Recall (Noisy)          73.1%     67.6%

Evaluator 1
  Metric                  Goal 2    Goal 1
  Total Match Rate        32.2%     60.0%
  Total Mismatch Rate     67.8%     40.0%
  Precision (Noisy)       36.7%     90.6%
  Precision (Relevant)    73.3%     44.4%
  Recall (Noisy)          84.6%     68.2%
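
For readers unfamiliar with the metrics in these tables, the sketch below computes a match rate, precision, and recall from paired labels using the standard definitions. How the study treated "Doesn't know" answers is not stated in the slides; here such an answer simply counts as a mismatch, which is an assumption, as are the function name and the toy labels.

```python
def evaluation_metrics(service_labels, evaluator_labels, positive):
    """Compare the service's labels with one evaluator's labels.

    positive: the class for which precision and recall are computed,
              e.g. "noisy" or "relevant".
    """
    pairs = list(zip(service_labels, evaluator_labels))
    matches = sum(1 for s, e in pairs if s == e)
    true_pos = sum(1 for s, e in pairs if s == positive and e == positive)
    predicted_pos = sum(1 for s, _ in pairs if s == positive)
    actual_pos = sum(1 for _, e in pairs if e == positive)
    return {
        "match_rate": matches / len(pairs),
        "mismatch_rate": 1 - matches / len(pairs),
        "precision": true_pos / predicted_pos if predicted_pos else 0.0,
        "recall": true_pos / actual_pos if actual_pos else 0.0,
    }


# Invented toy labels, not data from the study.
service = ["noisy", "noisy", "relevant", "relevant"]
evaluator = ["noisy", "relevant", "relevant", "doesnt_know"]
metrics = evaluation_metrics(service, evaluator, positive="noisy")
# match_rate 0.5, precision 0.5, recall 1.0
```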
Summary
• Conclusions
  – A high proportion of YouTube video comments are noisy
  – ML Models are good at Predicting and Filtering out Comments that show neither author awareness of nor interest in the Activity Domain of Interest
• Future Work
  – Add more filters to improve the Scoring and Labelling Mechanism, based on the Evaluation Baseline
  – Exploit an Activity Modelling Ontology to Derive the JIBoW
  – Evaluate the Impact of Semantic Enrichment
YouTube-based Social Profiling Service: Methodology
[Figure labels: YouTube / SM Comments; Noise Filtration Service; Comments Predicted as Relevant (RC1 … RCn); Profiling Source Authors; YT User Profiles (Uploaded YT Video meta data, Favored YT Video meta data, Comments on the YT Videos); Social Profiling Corpus; Clusters of Social Profiles (Profile1, Profile2, ProfileN); Associations of Frequent Characteristics; ImREAL Simulators]
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 

Aum workshop paper_presentation

  • 1. Semantically Enriched Machine Learning Approach to Filter YouTube Comments for Socially Augmented User Models Ahmad Ammari, Vania Dimitrova, Dimoklis Despotakis School of Computing, University of Leeds, Leeds, UK Presented By: Ahmad Ammari User and Community Modelling School of Computing, University of Leeds, UK
  • 2. Outline • The ImREAL Project • Socially Augmented User Modelling • Research Objective, Roadmap, Challenges • The Social Noise Filtering Approach – Machine Learning – Based – Methodology – Comment Content Pre-Processing – Semantic Enrichment – Scoring and Labelling the Training Dataset • Experimental Description / Results • Evaluation • Conclusions & Future Work
  • 3. Immersive Reflective Experience-based Adaptive Learning. Specific Targeted Research Project (STReP), FP7. Partners: University of Leeds, UK; Trinity College Dublin, Ireland; Graz University of Technology, Austria; University of Erlangen-Nuremberg, Germany; Delft University of Technology, NL; Imaginary SRL - IMA, Italy; Empower The User, ETU, Ireland. Problem: Experience in a simulated world is disconnected from the ‘real world’. [Diagram: Reality to Virtuality continuum with the labels Augmented Reality, ImREAL Approach, Augmented Virtuality]
  • 4. Augmented Simulated Experiential Learning [Diagram linking a Simulated Learning Environment to Real World Experience. Labels: Simulated Experiential Learning Environment; Interactive; Adaptive; User model; coach; Augmented user modelling; Real world activity modelling; Practice; Provide content; Meta-cognitive scaffolding; Records of Real Job-related Experiences; Other participants (e.g. customers, managers)]
  • 5. Socially Augmented User Modelling [Diagram labels: Open Social Spaces; Simulated Environment; User Profiles; Social Profiles (Sports, Psychology, Diseases, Politics); Weighted Social Interests; Existing User Model (Limited Scope!!); Socially Augmented User Model]
  • 6. Broad Research Objective Mining Social Media Content generated by Users having awareness and/or Interest in an Activity Domain to Derive Social Profiles that Augment Existing User Models
  • 7. Research Roadmap / Challenges • Three-Phase Research Roadmap towards achieving the Broad Objective [Diagram: Phase One, Phase Two, Phase Three; label: Social Noise Filtration]
  • 8. The Social Noise Filtering Approach • Supervised Machine Learning Model – Historic Content with known relevance states is used for training – The Machine Learning Model learns the underlying rules – The Model is used to predict unknown relevance states for new content with a certain prediction confidence
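The train-then-predict loop on slide 8 can be illustrated with a very small multinomial Naive Bayes classifier, one of the two classifier families the deck names on its Datasets slide. This is only a sketch under assumptions: the class name `TinyMultinomialNB`, the add-one smoothing, and the toy training comments are all made up for illustration and are not the authors' implementation or data.

```python
import math
from collections import Counter, defaultdict


class TinyMultinomialNB:
    """Minimal multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, docs, labels):
        # Count documents per class and term occurrences per class.
        self.class_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.term_counts[label].update(tokens)
        self.vocab = {t for counts in self.term_counts.values() for t in counts}
        return self

    def predict(self, tokens):
        """Return (label, confidence); confidence is the posterior of the chosen label."""
        total_docs = sum(self.class_counts.values())
        log_post = {}
        for label, n_docs in self.class_counts.items():
            counts = self.term_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for t in tokens:
                if t in self.vocab:  # terms never seen in training are ignored
                    score += math.log((counts[t] + 1) / denom)
            log_post[label] = score
        top = max(log_post.values())
        norm = sum(math.exp(v - top) for v in log_post.values())
        best = max(log_post, key=log_post.get)
        return best, 1.0 / norm


# Hypothetical, already pre-processed training comments with known relevance states.
train_docs = [
    ["job", "interview", "experience"],
    ["interviewer", "question", "job"],
    ["lol", "funny", "video"],
    ["nice", "music", "lol"],
]
train_labels = ["relevant", "relevant", "noisy", "noisy"]

model = TinyMultinomialNB().fit(train_docs, train_labels)
label, confidence = model.predict(["job", "interview", "tips"])
```

The confidence returned here is simply the normalised posterior of the winning class, which is one plausible reading of "prediction confidence" on the slide.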
  • 9. The Social Noise Filtration Service: Methodology • CASE STUDY: Filtering YouTube Comments – Social Media Source: YouTube – Subject Content: Public Comments on Shared Videos – Activity Domain: Job Interview [Diagram labels: Experimentally Controlled Comments; Analyze; Semantically Enriched Job Interview Bag of Words (JIBoW); Public Comments On YouTube; Pre-Process; SCORE; SCORES; Term – Comment Matrix (Training Corpus)]
  • 10. YouTube Video Selection • Selected as part of a research study by [Despotakis, Lau & Dimitrova, 2011] • Four Job Interview-related categories are manually identified from video content – Guides / Best Practices – Interviewee’s Stories – Interviewer’s Stories – Interview Mock Examples • Videos from all categories are selected to retrieve the comment set for ML training
  • 11. Comment Content Pre-Processing • Objective: Deriving dataset for Classification • Steps: (1) Stop Word Removal, (2) Stemming, (3) tfidf Weighting, (4) Comment – Term Matrix (CTM) • Example: “I think most Americans are like the first example” becomes think – Americans – like – first – example
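The four pre-processing steps on slide 11 can be sketched in plain Python as follows. The deck states that the Lucene API was used for pre-processing; the code below is not Lucene. The stop list, the crude suffix-stripping `stem` function, and the tf-idf variant (raw term count times log of inverse document frequency) are all assumptions chosen to keep the example self-contained.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop list (an assumption, not the list used in the study).
STOP_WORDS = {"i", "most", "are", "the", "a", "an", "is", "in", "he", "his", "of", "to"}


def tokenize(comment):
    """Step 1: lower-case the comment and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", comment.lower()) if t not in STOP_WORDS]


def stem(token):
    """Step 2: crude suffix stripper standing in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def comment_term_matrix(comments):
    """Steps 3 and 4: return (vocabulary, rows), rows[i][j] = tf-idf of term j in comment i."""
    docs = [Counter(stem(t) for t in tokenize(c)) for c in comments]
    vocab = sorted(set().union(*docs))
    n = len(docs)
    df = {term: sum(1 for d in docs if term in d) for term in vocab}
    rows = [[d[term] * math.log(n / df[term]) for term in vocab] for d in docs]
    return vocab, rows


comments = [
    "I think most Americans are like the first example",
    "The first example is funny",
]
tokens = tokenize(comments[0])
vocab, ctm = comment_term_matrix(comments)
```

With this toy stop list, `tokenize` reproduces the slide's example apart from lower-casing. Terms that occur in every comment get a weight of zero under this tf-idf variant, so they carry no signal for the classifier.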
  • 12. Semantically Enriched Job Interview Bag of Words • A Semantically Enriched Job Interview Bag of Words (JIBoW) used as Novel Means to Score and Label Training YouTube Comment Set • Collection of Textual Comments on Job Interview Videos [*] – Experimentally controlled – Closed social space • Text and Semantic Pre-Processing Phases • Semantically Expanded by the WordNet Lexicon and DISCO with Word Synonyms, Antonyms, Derivations, and semantically similar words [*] Despotakis, Lau, Dimitrova (2011): A Semantic Approach to Extract Individual Viewpoints from User Comments on An Activity, AUM Workshop, UMAP 2011, Girona, Spain
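The semantic expansion step on slide 12 amounts to adding lexically related words to a seed bag of words. A minimal sketch, assuming the lexicon lookups are already available as a dictionary: `toy_lexicon` below is a hand-written, hypothetical stand-in for WordNet and DISCO queries, and the relation names are illustrative keys rather than any library's API.

```python
def expand_bag_of_words(seed_terms, lexicon):
    """Return the seed terms plus every related word the lexicon lists for them."""
    expanded = set(seed_terms)
    for term in seed_terms:
        entry = lexicon.get(term, {})
        for relation in ("synonyms", "antonyms", "derivations", "similar"):
            expanded.update(entry.get(relation, []))
    return expanded


# Hypothetical hand-written entries standing in for WordNet / DISCO lookups.
toy_lexicon = {
    "job": {"synonyms": ["occupation", "employment"], "derivations": ["jobless"]},
    "confident": {
        "synonyms": ["assured"],
        "antonyms": ["diffident"],
        "derivations": ["confidence"],
    },
}

jibow = expand_bag_of_words({"job", "confident", "interview"}, toy_lexicon)
```

Seed terms with no lexicon entry, such as "interview" here, are kept unchanged.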
  • 13. Scoring and Labelling Training Corpus • A Novel Term Frequency – based Mathematical Model • Computes a Relevance Score for each observation in the training comment dataset – Intersection Size between Comment BoW and JIBoW – Score is Normalized by the Average Intersection Size • A Threshold is used to classify the comments for training a binary classifier • Labels observation (noisy, relevant) accordingly
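One way to read the scoring model on slide 13 is sketched below: each comment is scored by the size of the intersection between its bag of words and the JIBoW, divided by the average intersection size over the corpus, and a threshold turns the score into a label. The slide gives neither the exact formula nor the threshold, so the set-based intersection, the default `threshold=1.0`, and the toy data are assumptions.

```python
def relevance_scores(comment_bows, jibow):
    """Score each comment as |BoW ∩ JIBoW| divided by the average intersection size."""
    sizes = [len(set(bow) & jibow) for bow in comment_bows]
    average = sum(sizes) / len(sizes) if sizes else 0.0
    if average == 0:
        return [0.0 for _ in sizes]
    return [size / average for size in sizes]


def label_comments(scores, threshold=1.0):
    """Label a comment 'relevant' when its score reaches the threshold, else 'noisy'."""
    return ["relevant" if s >= threshold else "noisy" for s in scores]


# Toy JIBoW and pre-processed comments, for illustration only.
jibow = {"interviewee", "confident", "job", "experience", "work", "life"}
comments = [
    ["interviewee", "look", "confident", "job", "experience", "work", "life"],
    ["lol", "nice", "video"],
    ["good", "job"],
]
scores = relevance_scores(comments, jibow)
labels = label_comments(scores)
```

Here the intersection sizes are 6, 0 and 1, the average is 7/3, and only the first comment scores above the assumed threshold.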
  • 14. Example Scoring & Labelling C1: “The interviewee looks confident, he should have some job experience in his work life” Comment JIBOW BOW w10 interviewee w21 confident w34 job w4 experience w57 work w113 life wn
  • 15. Example Scored & Labelled Comments
  • 16. Datasets • YouTube API for Retrieval, Lucene API for Pre-Processing • Corpus Description: [Diagram labels: YouTube; Post – Analysis Data; Experimentally Controlled Corpus] • Training Corpus: 1159 Instances – Classified by the scoring model for Training C4.5 & Naïve Bayes Multinomial (NBM) Classifiers – {724 Noisy, 435 Relevant} • Derived a Comment Term Matrix: 1159 Instances × 903 tfidf Term Weights + 1 Discrete Class Column
  • 17. Experimental Results • For each classifier, models with three variations of the Training-to-Testing ratio have been trained & tested (See Evaluation ROC Area Results) • The Two Classifiers show good performance in predicting relevant & noisy comments in the testing data sets • C4.5 is slightly better in predicting noisy comments from within the total noise in the data • NBM shows less risk in misclassifying relevant comments as noise
  • 18. Evaluation • A Human-based Evaluation Experiment was conducted to measure how well the service: – Goal 1: Considers the comments that show awareness of the application domain (Job Interviews) (See Example Question and Records) – Goal 2: Considers the comments whose authors are likely interested in the application domain (See Example Question and Records)
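Slide 17 points to ROC area results without showing how the area is obtained. A common way to compute it, sketched here under assumptions, is the rank-based formulation: the ROC area equals the probability that a randomly chosen positive instance receives a higher classifier score than a randomly chosen negative one, with ties counted as one half. The function name, the choice of "relevant" as the positive class, and the scores are hypothetical.

```python
def roc_auc(scores, labels, positive="relevant"):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    pos = [s for s, l in zip(scores, labels) if l == positive]
    neg = [s for s, l in zip(scores, labels) if l != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up classifier scores for four test comments.
test_scores = [0.9, 0.8, 0.4, 0.3]
test_labels = ["relevant", "noisy", "relevant", "noisy"]
auc = roc_auc(test_scores, test_labels)
```

In this toy case three of the four positive-negative pairs are ranked correctly, so the area is 0.75. A value of 0.5 corresponds to random ranking and 1.0 to perfect separation.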
  • 19. Evaluation Results
      Number of Evaluators: 2
      Number of Evaluated Comments (15% of Whole Dataset): 180
      Number of Comments Scored as Relevant: 90
      Number of Comments Scored as Noisy: 90
      [Pie charts for Evaluator 2 and Evaluator 1, Goal 2 and Goal 1, with slices Noisy / Relevant / Doesn't know. Slice values as extracted: 9%, 3%, 15%, 17%, 24%, 46%, 19%, 42%, 45%, 66%, 59%, 55%]
      Two metric tables side by side, in slide order (evaluator headings appear as Evaluator 2, Evaluator 1):
      Metric                  Goal 2   Goal 1   |   Goal 2   Goal 1
      Total Match Rate        51.1%    68.3%    |   32.2%    60.0%
      Total Mismatch Rate     48.9%    31.7%    |   67.8%    40.0%
      Precision (Noisy)       42.2%    76.7%    |   36.7%    90.6%
      Precision (Relevant)    76.7%    63.3%    |   73.3%    44.4%
      Recall (Noisy)          73.1%    67.6%    |   84.6%    68.2%
  • 20. Summary • Conclusions – A High Rate of YouTube Video Comments is Noisy – ML Models are good at Predicting and Filtering out Comments that show neither author awareness of nor interest in the Activity Domain of Interest • Future Work – Add more filters to improve the Scoring and Labelling Mechanism based on Evaluation Baseline – Exploit Activity Modelling Ontology to Derive JIBoW – Evaluate Impact of Semantic Enrichment
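The metrics reported on slide 19 compare the service's label with each evaluator's judgement. The slide does not define them, so the sketch below assumes the usual definitions: match rate is the share of comments where both labels agree, precision for a class is agreement among comments the service put in that class, and recall is agreement among comments the evaluator put in that class. A "Doesn't know" judgement is treated as a non-match, which is also an assumption. The pairs are made-up numbers, not the study's data.

```python
def agreement_metrics(pairs, positive):
    """pairs: (service_label, evaluator_label) tuples; positive: the class to report on."""
    total = len(pairs)
    matches = sum(1 for s, e in pairs if s == e)
    predicted = sum(1 for s, _ in pairs if s == positive)
    actual = sum(1 for _, e in pairs if e == positive)
    true_pos = sum(1 for s, e in pairs if s == e == positive)
    return {
        "match_rate": matches / total if total else 0.0,
        "precision": true_pos / predicted if predicted else 0.0,
        "recall": true_pos / actual if actual else 0.0,
    }


# Hypothetical judgements for 20 comments (10 scored noisy, 10 scored relevant).
pairs = (
    [("noisy", "noisy")] * 6
    + [("noisy", "relevant")] * 2
    + [("noisy", "unknown")] * 2
    + [("relevant", "relevant")] * 7
    + [("relevant", "noisy")] * 3
)
noisy_metrics = agreement_metrics(pairs, "noisy")
relevant_metrics = agreement_metrics(pairs, "relevant")
```

Under these definitions the match rate and mismatch rate always sum to one, which is consistent with the paired rows in the slide's tables.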
  • 21. YouTube-based Social Profiling Service: Methodology [Diagram labels: YouTube / SM Comments; Noise Filtration Service; Comments Predicted as Relevant (RC1 … RCn); Clusters of Social Profiles (Profile1, Profile2, … ProfileN); Associations of Frequent Characteristics (shown as rules over placeholder symbols such as x, y, u, o, p, q); Profiling Source; Authors; YT User Profiles; Uploaded YT Video meta data; Favored YT Video meta data; Comments on the YT Videos; Social Profiling Corpus; ImREAL Simulators]
  • 22. Presented By: Ahmad Ammari User and Community Modelling School of Computing, University of Leeds, UK