Statistical Affect Detection in Collaborative Chat
CSCW 2013: Mining Social Media Data, Feb. 23

 Michael Brooks, Katie Kuksenok, Megan K. Torkildson,
Daniel Perry, John J. Robinson, Taylor Jackson Scott, Ona
Anicello, Ariana Zukowski, Paul Harris, Cecilia R. Aragon

Scientific Collaboration & Creativity Lab   2/27/2013
June, 2007
  6:07:57     Ray cool, it worked                          amusement, relief
  6:08:04    Matt woot                                      excitement, joy
  6:08:07     Ray awesome, I don't think he needs that    acceptance, no affect
                  long of a sleep after turning it off

  6:08:47          We enhanced eready to detect the             no affect
                   sticking

  6:08:58    Matt good job                               supportive, acceptance
  6:09:21          seems it did well there                happiness, no affect
  6:09:26     Ray yeah, pretty cool huh?                  interest, agreement,
                                                               happiness

  6:09:43    Matt helps keep me from having to stopaic          no affect
                  and restart

  6:09:55     Ray indeed, that was the point                   agreement



Nearby Supernova Factory
• 30 astrophysicists
• US / France
• Daily remote operation of
  telescope
• Rely on chat to communicate




SNfactory Chat Logs
• Four years of logs - 449,684 messages
• Manual coding for affective expressions
  –   27,344 chat messages coded
  –   1-5 coders per message
  –   30 affect codes
  –   Multiple codes allowed



Scott et al. SIGDOC 2012. Adapting Grounded Theory to Construct a Taxonomy
of Affect in Collaborative Online Chat.
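
Because several codes may apply to one message and each message may have several coders, the annotations need to be turned into one yes/no label per affect code before a binary classifier can be trained. The sketch below shows one way to do that. The data layout and the rule "a code applies if any coder used it" are assumptions for illustration; the slides do not say how coders were combined.

```python
def binary_labels(coded_messages, code):
    """Return 1 for each message where any coder applied `code`, else 0.

    `coded_messages` is a list of messages; each message is a list with
    one entry per coder; each entry is that coder's list of affect codes.
    The "any coder" rule is an assumed aggregation, not taken from the talk.
    """
    return [
        int(any(code in coder_codes for coder_codes in coders))
        for coders in coded_messages
    ]


# Hypothetical annotations for three messages.
coded = [
    [["amusement", "relief"]],            # one coder, two codes
    [["excitement"], ["joy"]],            # two coders
    [["acceptance"], ["no affect"]],      # two coders who disagree
]

amusement = binary_labels(coded, "amusement")
joy = binary_labels(coded, "joy")
```

Each affect code then gets its own label vector, so a separate classifier can be trained per code.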


Top 13 Affect Codes
Affect code      Times Used   Reliability (Kappa)
interest         4351         0.808
amusement        3213         0.611
considering      1763         0.49
agreement        1623         0.491
annoyance        1212         0.77
confusion        1125         0.615
acceptance        975         0.657
apprehension      799         0.529
frustration       541         0.55
supportive        518         0.583
surprise          464         0.543
anticipation      426         0.424
serenity          369         0.602
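
Kappa measures agreement between coders after correcting for agreement expected by chance. The slides do not state which kappa variant was used, and with 1-5 coders per message a multi-rater statistic may have been involved. As a minimal sketch, the two-coder case (Cohen's kappa) can be computed as follows; the example labels are invented.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders labelling the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(coder_a)
    # Fraction of items on which the two coders gave the same label.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected if each coder labelled at random with their own rates.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical data: did each coder apply "interest" to 10 messages?
coder_a = [1, 1, 0, 0, 1, 0, 0, 1, 0, 0]
coder_b = [1, 0, 0, 0, 1, 0, 0, 1, 0, 1]
kappa = cohens_kappa(coder_a, coder_b)
```

Here the coders agree on 8 of 10 messages, chance agreement is 0.52, and kappa comes out near 0.58.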


Linguistic Inquiry and Word Count (LIWC)
• Detects words for Positive / Negative Emotions


     I wish every day could be sunny and warm. Rain makes me angry.

     Positive: 15%
     Negative: 8%
     …
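
A dictionary-based tool of this kind reports the share of words in a text that fall into each category. The sketch below reproduces the example percentages with two tiny word sets. These sets are hypothetical stand-ins chosen for this sentence; they are not the actual LIWC dictionaries, and which words LIWC counts as positive here is an assumption.

```python
import re

# Hypothetical toy lexicons, not the real LIWC categories.
POSITIVE = {"sunny", "warm"}
NEGATIVE = {"angry"}


def emotion_percentages(text):
    """Percentage of words in `text` found in each toy lexicon."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {"positive": 0.0, "negative": 0.0}
    positive = sum(word in POSITIVE for word in words)
    negative = sum(word in NEGATIVE for word in words)
    return {
        "positive": 100.0 * positive / len(words),
        "negative": 100.0 * negative / len(words),
    }


scores = emotion_percentages(
    "I wish every day could be sunny and warm. Rain makes me angry."
)
```

The sentence has 13 words, so 2 positive words give about 15% and 1 negative word gives about 8%.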




June, 2005
 11:44:08    Gabri ok that's better                                        relief, serenity
 11:44:17   Marcel GREAT !                                             excitement, happiness,
                                                                             relief, joy
 11:44:17    Gabri let's start aic and see                             anticipation, no affect
 11:44:23   Marcel yes ...                                                    no affect
 11:44:31   Derek Great what?                                                confusion
 11:44:32    Gabri can you do that?                                      interest, no affect
 11:44:50           derek.. it seems that now the focus is ok                 no affect
 11:45:04           and we can finally start observing                        no affect
 11:45:23   Derek Oh good!                                              happiness, relief, joy
 11:45:48           I have been waiting for this moment, because I          amusement
                    want to leave the room and get my midnight
                    snack. ;)
 11:46:54    Gabri go...                                                amusement, no affect
 11:47:02           and enjoy your snack                                amusement, no affect
 11:47:13   Derek HEhe.                                                     amusement
 11:47:18           I will bring it back here of course.                    amusement


The telescope is stuck! >:(
   frustration


The telescope is stuuuuuuuuuck...
   annoyance


The telescope is stuck??
   confusion




• Word counts
• Emoticons
• Word sets
   –   Swear words
   –   Pronouns
   –   Negations
   –   Participant names
• Characters
   – Capitalization
   – Letter repetition
   – Punctuation
• Metadata
   – segment duration, length, rate
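
The character-level features in this list can be approximated with a few regular expressions. This is a minimal sketch; the exact definitions (for example, counting fully capitalized words rather than capital letters, or treating three or more repeated letters as repetition) are assumptions and may differ from the features used in the talk.

```python
import re


def character_features(message):
    """Count a few character-level cues in one chat message."""
    return {
        # Words of two or more letters written entirely in capitals.
        "capitals": len(re.findall(r"\b[A-Z]{2,}\b", message)),
        # Runs where the same letter appears three or more times in a row.
        "repetition": len(re.findall(r"([A-Za-z])\1{2,}", message)),
        "question_marks": message.count("?"),
        "exclamation_marks": message.count("!"),
        # Two or more consecutive periods count as one ellipsis.
        "ellipses": len(re.findall(r"\.{2,}", message)),
        "length": len(message),
    }


stuck = character_features("The telescope is stuuuuuuuuuck...")
puzzled = character_features("The telescope is stuck??")
on_target = character_features("ON TARGET !")
```

The three telescope messages share the same words but differ in exactly these counts, which is what lets a classifier tell annoyance from confusion.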


Emoticons
Naomi: I think we'd better stopaic... :(              sadness
Matt: today was a gym + laundry day :)                amusement, happiness
Marcel: and she can't teach over an ssh-channel ;-)   amusement




Word Sets
                              Swear Words
Ray: why the **** doesn't stop_script *******       rage
STOP THE ******* SCRIPT
Matt: ******* ******* ******* I think I broke it    frustration, anger,
                                                    apprehension,
                                                    embarrassment


                                 Negations
Paul: but I wouldn't hazzard a guess                apprehension
Ray: cannot talk to camera                          frustration, no-affect




Character Features
                       Letter Repetition
Ray: noooooooooooooooo, it must be stopped        annoyance, anger, fear
Marcel: AAaah too late, they will find meeee      amusement


                           Punctuation
Rick: looks like something bad happened here...     apprehension
Rene: 1 month before max??!?                        surprise, confusion,
                                                    considering


                         Capitalization
Marcel: ON TARGET !                                   relief, joy
Paul: we must set-up adopt an EXPLODING STAR          amusement, no-affect



Alice: ok, so where was the ******* SN on the image?

Feature                Value
“ok”                   1
“telescope”            0
“where”                1
“SN”                   1
“image”                1
question marks         1
swears                 1
emoticon :)            0
1st person pronouns    0
capitals               2
repetition             0
punctuation            1
length                 45
…
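
The word-count and word-set rows of this table can be sketched as follows. The vocabulary and pronoun list are hypothetical, and treating a run of asterisks as one masked swear word is an assumption based on how the slides censor the logs.

```python
import re

# Hypothetical vocabulary and word set, for illustration only.
VOCABULARY = ["ok", "telescope", "where", "sn", "image"]
FIRST_PERSON = {"i", "me", "my", "we", "us", "our"}


def word_features(text):
    """Word counts and word-set counts for one chat message."""
    # A token is either a run of asterisks (a masked swear) or a word.
    tokens = re.findall(r"\*+|[a-z']+", text.lower())
    features = {word: tokens.count(word) for word in VOCABULARY}
    features["swears"] = sum(token.startswith("*") for token in tokens)
    features["1st person pronouns"] = sum(t in FIRST_PERSON for t in tokens)
    return features


features = word_features("ok, so where was the ******* SN on the image?")
```

For the example message this yields the same values as the table for the rows it covers: 1 for "ok", "where", "SN", "image", and swears, and 0 for "telescope" and first-person pronouns.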
Feature importance

   Confusion
      ???? length
      # question marks
      "understand"
      "confus_"
      "why"
      "what"
      "nothing"
      "wrong"
      msg. length
      "thought"

   Messages labeled Confusion
      Ben: ??? - the answer is likely found in the otsim code
      Marcel: well ... I'm not so sure ...
      Gary: Why do we care at all then?
      Ray: ummm I mean how does it get to the header


Feature importance

   Apprehension
      "bad"
      "something"
      "problem"
      "we"
      "seem"
      "too"
      msg. length
      "not"
      # 3rd sg. Pronouns
      # swearing

   Messages labeled Apprehension
      Pascal: the problem is than the automated detection will not work ... too much galaxy
      Ray: But now bad stuff in window
      Ben: pascal, we had a problem with do_fchart
      Gabriel: So something is completely wrong


Feature importance

   Amusement
      emoticon ";)"
      emoticon ":)"
      laughter
      emoticon ";-)"
      "fun"
      laughter length
      "p"
      # people names
      "sleep"
      "of"

   Messages labeled Amusement
      Kevin: hehe
      Ray: hahahaah
      Stef: lol ok derek :)
      Ray: He never sleeps -- you know that.
      Pascal: but I think it could be interesting for Extreeeeeeeeeeme photometry study ;-)


Specialized Features
• Count words based on the data
• Medium-specific features
   – Emoticons, punctuation…
• Context-specific features
   – People names, jargon…
• Affect-specific features
   – Swearing vs. emoticons




September, 2006
5:17:48   Marcel ok, so let's cycle the stuff
5:18:04     Rick ok…
5:18:40   Marcel damn mouse cutandpast
5:19:03      Ray off 1 right? then on 1?
5:19:32   Marcel have you telnet sdsugreen ??
5:19:58      Ray director on lbl2 looks dead
5:20:34   Marcel ok, one thind at a time. have you cycled the baytech on sdsugreen ?
5:20:36      Ray what is best way to revive it
5:20:39            baytech
5:20:40            yes
5:20:46            not sdsu
5:21:08            go ahead and do it I am not evneon this **** shift...grrr
5:21:22   Marcel ok, maybe we have to kill director and restart it mkanually
5:21:32      Ray yeah but that's tricky; all these damn arguments
5:23:53     Rick emile, I have no idea what's going on here
5:23:57            only that it is bad


Classifier       F-measure   Precision   Recall   Accuracy
Naïve Bayes      0.650       0.637       0.691    0.637
Logistic Reg.    0.730       0.731       0.731    0.730
SVM (SMO)        0.759       0.766       0.751    0.761
C4.5 (J48)       0.700       0.724       0.680    0.710
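
For reference, the four column metrics can be computed for a single binary affect code as sketched below. The labels are invented, and the slide does not say how its figures were averaged across codes or folds, so this shows only the per-code definitions.

```python
def evaluate(true_labels, predicted_labels):
    """Precision, recall, F-measure, and accuracy for 0/1 labels."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    # F-measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / total if total else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


# Hypothetical gold labels and predictions for one affect code.
true_labels = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 1, 0, 0, 1, 1, 1, 0]
metrics = evaluate(true_labels, predicted)
```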




Support Vector Machine
• Accurate
• Fast
• Transparent

[Figure: messages plotted by # “ok” and # swear words; legend: “frustration” applies / “frustration” does not apply]


[Figure: precision and recall (scale 0.0 to 1.0) for each of the top 13 affect codes: interest, amusement, considering, agreement, annoyance, confusion, acceptance, apprehension, frustration, supportive, surprise, anticipation, serenity]

Interpretability
• How is the classifier making decisions?
• What features are important in the model?

[Figure: messages plotted by # “ok” and # swear words; legend: “frustration” applies / “frustration” does not apply]
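
With a linear model, both questions have a direct answer: the decision is the sign of a weighted sum of feature values, and the size of each weight indicates how much that feature matters. The sketch below uses made-up weights for a "frustration" classifier. They are not values from the talk, and it assumes a linear decision function, which the slides suggest but do not confirm.

```python
# Hypothetical weights for a linear "frustration" classifier.
WEIGHTS = {"# swear words": 1.4, '# "ok"': -0.9, "# question marks": 0.2}
BIAS = -0.5


def decision_score(features):
    """Weighted sum of feature values plus bias; missing features count as 0."""
    return BIAS + sum(
        weight * features.get(name, 0) for name, weight in WEIGHTS.items()
    )


def applies(features):
    """True if the score falls on the "frustration applies" side."""
    return decision_score(features) > 0


def ranked_features():
    """Feature names ordered from largest to smallest absolute weight."""
    return sorted(WEIGHTS, key=lambda name: abs(WEIGHTS[name]), reverse=True)


swearing = applies({"# swear words": 2})
agreeing = applies({'# "ok"': 1})
ranking = ranked_features()
```

Ranking by absolute weight is one simple way to produce per-code feature lists like the ones shown for confusion, apprehension, and amusement.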


Interpretable Classifiers
• Explain classification errors
• Suggest improvement strategies
• Discover interesting anomalies




Future Work




Sequential Modeling
5:19:58      Ray director on lbl2 looks dead
5:20:34   Marcel ok, one thind at a time. have you cycled the baytech on sdsugreen ?
5:20:36      Ray what is best way to revive it
5:20:39           baytech
5:20:40           yes
5:20:46           not sdsu
5:21:08           go ahead and do it I am not evneon this **** shift...grrr
5:21:22   Marcel ok, maybe we have to kill director and restart it mkanually
5:21:32      Ray yeah but that's tricky; all these damn arguments
5:23:53     Rick emile, I have no idea what's going on here
5:23:57           only that it is bad




Interactive Visual Analysis




Affect in Twitter
[Figure: number of tweets (0 to 45,000) over time (EST) on 2/3/2013, broken down as positive, negative, and neutral; annotated events: kickoff, halftime, game resumes, blackout, game resumes, game over]




Classify…
• Positive/negative/neutral sentiment
• Highly granular emotions
• Anything else you can label

In…
• longer, formal documents (blog posts, reviews)
• individual sentences
• instant messages
• tweets
• Anything else you can put in CSV

github.com/etcgroup/aloe
Download it, use it, & tell us what you think!

Michael Brooks
mjbrooks@uw.edu
http://depts.washington.edu/sccl


         Scientific Collaboration & Creativity Lab   2/27/2013        36
Statistical Affect Detection in Collaborative Chat

Más contenido relacionado

Destacado

1.1 menyatakan kuantiti secara intuitif
1.1 menyatakan kuantiti secara intuitif1.1 menyatakan kuantiti secara intuitif
1.1 menyatakan kuantiti secara intuitifshahfira
 
Presentacion historia 1
Presentacion historia 1Presentacion historia 1
Presentacion historia 1salon36ulsa
 
Pokok krismas
Pokok krismasPokok krismas
Pokok krismasshahfira
 
Numerasi k1 (mengenal_nombor)
Numerasi k1 (mengenal_nombor)Numerasi k1 (mengenal_nombor)
Numerasi k1 (mengenal_nombor)shahfira
 
Numerasi k2 (membilang_11_-_20)
Numerasi k2 (membilang_11_-_20)Numerasi k2 (membilang_11_-_20)
Numerasi k2 (membilang_11_-_20)shahfira
 
Numerasi k2 (membilang_0_-_10)
Numerasi k2 (membilang_0_-_10)Numerasi k2 (membilang_0_-_10)
Numerasi k2 (membilang_0_-_10)shahfira
 
Numerasi k2 (membilang 0 -10)
Numerasi k2 (membilang 0 -10)Numerasi k2 (membilang 0 -10)
Numerasi k2 (membilang 0 -10)shahfira
 
Numerasi k2 (membilang_11_-_20)
Numerasi k2 (membilang_11_-_20)Numerasi k2 (membilang_11_-_20)
Numerasi k2 (membilang_11_-_20)shahfira
 
Numerasi k1 (pra_nombor)
Numerasi k1 (pra_nombor)Numerasi k1 (pra_nombor)
Numerasi k1 (pra_nombor)shahfira
 

Destacado (13)

IPL
IPLIPL
IPL
 
Unit 1
Unit 1Unit 1
Unit 1
 
1.1 menyatakan kuantiti secara intuitif
1.1 menyatakan kuantiti secara intuitif1.1 menyatakan kuantiti secara intuitif
1.1 menyatakan kuantiti secara intuitif
 
Urok1
Urok1Urok1
Urok1
 
Presentacion historia 1
Presentacion historia 1Presentacion historia 1
Presentacion historia 1
 
Pokok krismas
Pokok krismasPokok krismas
Pokok krismas
 
Robocop
RobocopRobocop
Robocop
 
Numerasi k1 (mengenal_nombor)
Numerasi k1 (mengenal_nombor)Numerasi k1 (mengenal_nombor)
Numerasi k1 (mengenal_nombor)
 
Numerasi k2 (membilang_11_-_20)
Numerasi k2 (membilang_11_-_20)Numerasi k2 (membilang_11_-_20)
Numerasi k2 (membilang_11_-_20)
 
Numerasi k2 (membilang_0_-_10)
Numerasi k2 (membilang_0_-_10)Numerasi k2 (membilang_0_-_10)
Numerasi k2 (membilang_0_-_10)
 
Numerasi k2 (membilang 0 -10)
Numerasi k2 (membilang 0 -10)Numerasi k2 (membilang 0 -10)
Numerasi k2 (membilang 0 -10)
 
Numerasi k2 (membilang_11_-_20)
Numerasi k2 (membilang_11_-_20)Numerasi k2 (membilang_11_-_20)
Numerasi k2 (membilang_11_-_20)
 
Numerasi k1 (pra_nombor)
Numerasi k1 (pra_nombor)Numerasi k1 (pra_nombor)
Numerasi k1 (pra_nombor)
 

Último

99.99% of Your Traces Are (Probably) Trash (SRECon NA 2024).pdf
99.99% of Your Traces  Are (Probably) Trash (SRECon NA 2024).pdf99.99% of Your Traces  Are (Probably) Trash (SRECon NA 2024).pdf
99.99% of Your Traces Are (Probably) Trash (SRECon NA 2024).pdfPaige Cruz
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioChristian Posta
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Adtran
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesMd Hossain Ali
 
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...Daniel Zivkovic
 
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfJamie (Taka) Wang
 
20230202 - Introduction to tis-py
20230202 - Introduction to tis-py20230202 - Introduction to tis-py
20230202 - Introduction to tis-pyJamie (Taka) Wang
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IES VE
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1DianaGray10
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaborationbruanjhuli
 
Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxMatsuo Lab
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdfPedro Manuel
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsSafe Software
 
9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding TeamAdam Moalla
 
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDEADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDELiveplex
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfAijun Zhang
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfinfogdgmi
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsSeth Reyes
 

Último (20)

99.99% of Your Traces Are (Probably) Trash (SRECon NA 2024).pdf
99.99% of Your Traces  Are (Probably) Trash (SRECon NA 2024).pdf99.99% of Your Traces  Are (Probably) Trash (SRECon NA 2024).pdf
99.99% of Your Traces Are (Probably) Trash (SRECon NA 2024).pdf
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and Istio
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
 
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
 
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
 
201610817 - edge part1
201610817 - edge part1201610817 - edge part1
201610817 - edge part1
 
20230202 - Introduction to tis-py
20230202 - Introduction to tis-py20230202 - Introduction to tis-py
20230202 - Introduction to tis-py
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
 
Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptx
 
20230104 - machine vision
20230104 - machine vision20230104 - machine vision
20230104 - machine vision
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdf
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
 
9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team
 
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDEADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdf
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdf
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and Hazards
 

Statistical Affect Detection in Collaborative Chat

  • 1. Statistical Affect Detection in Collaborative Chat CSCW 2013: Mining Social Media Data, Feb. 23 Michael Brooks, Katie Kuksenok, Megan K. Torkildson, Daniel Perry, John J. Robinson, Taylor Jackson Scott, Ona Anicello, Ariana Zukowski, Paul Harris, Cecilia R. Aragon Scientific Collaboration & Creativity Lab
  • 2. Scientific Collaboration & Creativity Lab 2/27/2013 2
  • 3. June, 2007 6:07:57 Ray cool, it worked amusement, relief 6:08:04 Matt woot excitement, joy 6:08:07 Ray awesome, I don't think he needs that acceptance, no affect long of a sleep after turning it off 6:08:47 We enhanced eready to detect the no affect sticking 6:08:58 Matt good job supportive, acceptance 6:09:21 seems it did well there happiness, no affect 6:09:26 Ray yeah, pretty cool huh? interest, agreement, happiness 6:09:43 Matt helps keep me from having to stopaic no affect and restart 6:09:55 Ray indeed, that was the point agreement Scientific Collaboration & Creativity Lab 2/27/2013 3
  • 4. Nearby Supernova Factory • 30 astrophysicists • US / France • Daily remote operation of telescope • Rely on chat to communicate Scientific Collaboration & Creativity Lab 2/27/2013 4
  • 5. 5
  • 6. 6
  • 7. SNfactory Chat Logs • Four years of logs - 449,684 messages • Manual coding for affective expressions – 27,344 chat messages coded – 1-5 coders per message – 30 affect codes – Multiple codes allowed Scott et al. SIGDOC 2012. Adapting Grounded Theory to Construct a Taxonomy of Affect in Collaborative Online Chat. Scientific Collaboration & Creativity Lab 2/27/2013 7
  • 8. June, 2007 6:07:57 Ray cool, it worked amusement, relief 6:08:04 Matt woot excitement, joy 6:08:07 Ray awesome, I don't think he needs that acceptance, no affect long of a sleep after turning it off 6:08:47 We enhanced eready to detect the no affect sticking 6:08:58 Matt good job supportive, acceptance 6:09:21 seems it did well there happiness, no affect 6:09:26 Ray yeah, pretty cool huh? interest, agreement, happiness 6:09:43 Matt helps keep me from having to stopaic no affect and restart 6:09:55 Ray indeed, that was the point agreement Scientific Collaboration & Creativity Lab 2/27/2013 8
  • 9. Top 13 Affect Codes
       Code          Times Used  Reliability (Kappa)
       interest      4351        0.808
       amusement     3213        0.611
       considering   1763        0.49
       agreement     1623        0.491
       annoyance     1212        0.77
       confusion     1125        0.615
       acceptance    975         0.657
       apprehension  799         0.529
       frustration   541         0.55
       supportive    518         0.583
       surprise      464         0.543
       anticipation  426         0.424
       serenity      369         0.602
  • 10. Linguistic Inquiry and Word Count (LIWC) • Detects words for Positive / Negative Emotions
        Text: I wish every day could be sunny and warm. Rain … makes me angry.
        Positive: 15%
        Negative: 8%
  • 11. June, 2005
        11:44:08  Gabri   ok that's better  [relief, serenity]
        11:44:17  Marcel  GREAT !  [excitement, happiness, relief, joy]
        11:44:17  Gabri   let's start aic and see  [anticipation, no affect]
        11:44:23  Marcel  yes ...  [no affect]
        11:44:31  Derek   Great what?  [confusion]
        11:44:32  Gabri   can you do that?  [interest, no affect]
        11:44:50          derek.. it seems that now the focus is ok  [no affect]
        11:45:04          and we can finally start observing  [no affect]
        11:45:23  Derek   Oh good!  [happiness, relief, joy]
        11:45:48          I have been waiting for this moment, because I want to leave the room and get my midnight snack. ;)  [amusement]
        11:46:54  Gabri   go...  [amusement, no affect]
        11:47:02          and enjoy your snack  [amusement, no affect]
        11:47:13  Derek   HEhe.  [amusement]
        11:47:18          I will bring it back here of course.  [amusement]
  • 12. The telescope is stuck! >:(  [frustration]
        The telescope is stuuuuuuuuuck...  [annoyance]
        The telescope is stuck??  [confusion]
  • 13. • Word counts • Emoticons • Word sets – Swear words – Pronouns – Negations – Participant names • Characters – Capitalization – Letter repetition – Punctuation • Metadata – segment duration, length, rate
  • 15. Emoticons
        Naomi: I think we'd better stopaic... :(  [sadness]
        Matt: today was a gym + laundry day :)  [amusement, happiness]
        Marcel: and she can't teach over an ssh-channel ;-)  [amusement]
  • 16. Word Sets
        Swear Words
          Ray: why the **** doesn't stop_script ******* STOP THE ******* SCRIPT  [rage]
          Matt: ******* ******* ******* I think I broke it  [frustration, anger, apprehension, embarrassment]
        Negations
          Paul: but I wouldn't hazzard a guess  [apprehension]
          Ray: cannot talk to camera  [frustration, no-affect]
  • 17. Character Features
        Letter Repetition
          Ray: noooooooooooooooo, it must be stopped  [annoyance, anger, fear]
          Marcel: AAaah too late, they will find meeee  [amusement]
        Punctuation
          Rick: looks like something bad happened here...  [apprehension]
          Rene: 1 month before max??!?  [surprise, confusion, considering]
        Capitalization
          Marcel: ON TARGET !  [relief, joy]
          Paul: we must set-up adopt an EXPLODING STAR  [amusement, no-affect]
  • 18. Alice: ok, so where was the ******* SN on the image?
        Feature               Value
        “ok”                  1
        “telescope”           0
        “where”               1
        “SN”                  1
        “image”               1
        question marks        1
        swears                1
        emoticon :)           0
        1st person pronouns   0
        capitals              2
        repetition            0
        punctuation           1
        length                45
        …
  • 19. Feature importance: Confusion
        Top features: ???? length; # question marks; "understand"; "confus_"; "why"; "what"; "nothing"; "wrong"; msg. length; "thought"
        Messages labeled Confusion
          Ben: ??? - the answer is likely found in the otsim code
          Marcel: well ... I'm not so sure ...
          Gary: Why do we care at all then?
          Ray: ummm I mean how does it get to the header
  • 20. Feature importance: Apprehension
        Top features: "bad"; "something"; "problem"; "we"; "seem"; "too"; msg. length; "not"; # 3rd sg. Pronouns; # swearing
        Messages labeled Apprehension
          Pascal: the problem is than the automated detection will not work ... too much galaxy
          Ray: But now bad stuff in window
          Ben: pascal, we had a problem with do_fchart
          Gabriel: So something is completely wrong
  • 21. Feature importance: Amusement
        Top features: emoticon ";)"; emoticon ":)"; laughter; emoticon ";-)"; "fun"; laughter length; "p"; # people names; "sleep"; "of"
        Messages labeled Amusement
          Kevin: hehe
          Ray: hahahaah
          Stef: lol ok derek :)
          Ray: He never sleeps -- you know that.
          Pascal: but I think it could be interesting for Extreeeeeeeeeeme photometry study ;-)
  • 22. Specialized Features • Count words based on the data • Medium-specific features – Emoticons, punctuation… • Context-specific features – People names, jargon… • Affect-specific features – Swearing vs. emoticons
  • 23. September, 2006
        5:17:48  Marcel  ok, so let's cycle the stuff
        5:18:04  Rick    ok…
        5:18:40  Marcel  damn mouse cutandpast
        5:19:03  Ray     off 1 right? then on 1?
        5:19:32  Marcel  have you telnet sdsugreen ??
        5:19:58  Ray     director on lbl2 looks dead
        5:20:34  Marcel  ok, one thind at a time. have you cycled the baytech on sdsugreen ?
        5:20:36  Ray     what is best way to revive it
        5:20:39          baytech
        5:20:40          yes
        5:20:46          not sdsu
        5:21:08          go ahead and do it I am not evneon this **** shift...grrr
        5:21:22  Marcel  ok, maybe we have to kill director and restart it mkanually
        5:21:32  Ray     yeah but that's tricky; all these damn arguments
        5:23:53  Rick    emile, I have no idea what's going on here
        5:23:57          only that it is bad
  • 25. Classifier      F-measure  Precision  Recall  Accuracy
        Naïve Bayes     0.650      0.637      0.691   0.637
        Logistic Reg.   0.730      0.731      0.731   0.730
        SVM (SMO)       0.759      0.766      0.751   0.761
        C4.5 (J48)      0.700      0.724      0.680   0.710
  • 26. Support Vector Machine • Accurate • Fast • Transparent [Figure: messages plotted by # “ok” and # swear words; legend: “frustration” applies / “frustration” does not apply]
  • 27. Support Vector Machine • Accurate • Fast • Transparent [Figure: messages plotted by # “ok” and # swear words, with an unlabeled message marked “?”; legend: “frustration” applies / “frustration” does not apply]
  • 28. [Chart: Precision and Recall, on a 0.0 to 1.0 scale, for each code: interest, amusement, considering, agreement, annoyance, confusion, acceptance, apprehension, frustration, supportive, surprise, anticipation, serenity]
  • 29. Interpretability • How is the classifier making decisions? • What features are important in the model? [Figure: messages plotted by # “ok” and # swear words; legend: “frustration” applies / “frustration” does not apply]
  • 31. Interpretable Classifiers • Explain classification errors • Suggest improvement strategies • Discover interesting anomalies
  • 32. Future Work
  • 33. Sequential Modeling
        5:19:58  Ray     director on lbl2 looks dead
        5:20:34  Marcel  ok, one thind at a time. have you cycled the baytech on sdsugreen ?
        5:20:36  Ray     what is best way to revive it
        5:20:39          baytech
        5:20:40          yes
        5:20:46          not sdsu
        5:21:08          go ahead and do it I am not evneon this **** shift...grrr
        5:21:22  Marcel  ok, maybe we have to kill director and restart it mkanually
        5:21:32  Ray     yeah but that's tricky; all these damn arguments
        5:23:53  Rick    emile, I have no idea what's going on here
        5:23:57          only that it is bad
  • 34. Interactive Visual Analysis
  • 35. Affect in Twitter [Chart: Number of Tweets (0 to 45,000) by Time (EST), 2/3/2013, for positive, negative, and neutral tweets; annotated events: kickoff, halftime, game resumes, blackout, game resumes, game over]
  • 36. Classify… • Positive/negative/neutral sentiment • Highly granular emotions • Anything else you can label
        In… • longer, formal documents (blog posts, reviews) • individual sentences • instant messages • tweets • Anything else you can put in CSV
        github.com/etcgroup/aloe
        Download it, use it, & tell us what you think!
        Michael Brooks, mjbrooks@uw.edu, http://depts.washington.edu/sccl

Editor's notes

  1. Researchers working with social media have more data available than ever before. There is great potential for new insights, but the data sets are very large and complex. How can we help people understand data sets collected from social media and other online communication? Our research group is studying how a combination of visualization and machine learning can be integrated into a qualitative research workflow to help researchers dig into these new data sources in a rich but also scalable way.
  2. In this paper, we focus on a large collection of chat logs from scientists working together on a specific project. Our group is doing ongoing qualitative research to understand how, when, and why the scientists express emotion, or affect, and how affect relates to creativity and problem solving in this data set. The data set is too large for us to code manually, and privacy concerns and the need for specialized domain knowledge prevent us from using something like Mechanical Turk. In this talk, I will present some of the issues we have explored around using machine learning to automatically label the data, in support of scalable, rich analysis. I will focus on the importance of developing a diverse, specialized feature set and the use of interpretable classification algorithms.
  3. I’ll start by giving a bit of background about the data…
  4. Ray and Matt are discussing a new program that Ray created to automatically un-stick the telescope, saving the scientists a lot of time. Many lines have multiple types of affect, while some lines have no affect.
  5. Most affect codes are very rare. Reliability ranges from 0.4 to 0.8.
  6. Before I go on… LIWC is a popular text analysis tool that can be used for finding emotions or sentiment in text. LIWC processes blocks of text, counting words that belong to specific sets of dictionary words that have been previously determined to have particular meanings. This is called a lexicon-based approach. The words sunny and warm are part of LIWC’s Positive Affect lexicon, while angry is part of its Negative Affect lexicon. So, LIWC would output that this text has two positive words and one negative word.
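The lexicon-based counting described in this note can be sketched roughly as follows. This is a minimal illustration, not LIWC itself: the two word sets are toy stand-ins that contain only the three words named in the note, the function name `lexicon_counts` is hypothetical, and the example sentence omits the elided "…" that appears on the slide.

```python
import re

# Toy lexicons for illustration only; these are NOT the real LIWC dictionaries.
POSITIVE = {"sunny", "warm"}
NEGATIVE = {"angry"}

def lexicon_counts(text):
    """Count how many tokens fall in each toy lexicon (hypothetical helper)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return {
        "tokens": len(tokens),
        "positive": sum(t in POSITIVE for t in tokens),
        "negative": sum(t in NEGATIVE for t in tokens),
    }

counts = lexicon_counts(
    "I wish every day could be sunny and warm. Rain makes me angry."
)
for label in ("positive", "negative"):
    share = round(100 * counts[label] / counts["tokens"])
    print(f"{label}: {counts[label]} word(s), about {share}% of tokens")
```

Because a counter like this only sees words that are already in its lists, a message such as "The telescope is stuuuuuuuuuck..." would score zero on both, which is the limitation the next notes discuss.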
  7. For data sets like ours, we believe that this kind of approach is not appropriate. While LIWC’s validity has been carefully studied for very narrow domains of English writing, informal online communications such as chat messages and tweets use a lot of domain-specific vocabulary and non-standard textual cues to communicate affect, almost becoming another language entirely. The medium and the context of communication are often critical to correctly understanding emotional content.
  8. Let me illustrate this with a quick example. This is a chat message rewritten three ways. LIWC is not built to recognize expressions such as emoticons or intentionally misspelled words. Punctuation cues are not taken into account. Furthermore, in general English, a word like stuck may not have strong emotional connotations, but in our data set, it is used when scientists are struggling with telescope problems. It is therefore quite an effective signal of frustration, for example. LIWC and other tools that use standard English lexicons will miss out on these signals. So if we aren’t going to use a predefined, validated lexicon of affect-laden words, what will we use to recognize affect?
  9. We based our features on a combination of previous literature and our knowledge of this chat data set we were working with.
  10. We look at all of the words that occur anywhere in the training data and select the most common 400 to 600 of those. Each becomes a feature that our classifiers can use to recognize affect. The words do not come from a predefined list, but from the data itself. This helps us pick up on jargon and other unconventional word usage.
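One way to picture this data-driven vocabulary is the sketch below. It is an assumption-laden illustration rather than the ALOE implementation: the tokenizer, the function names `build_vocabulary` and `word_count_features`, and the tiny vocabulary size are all hypothetical, and the training messages are a handful of lines borrowed from the chat excerpts on the slides.

```python
import re
from collections import Counter

def tokenize(message):
    # Hypothetical tokenizer: lowercase runs of letters, digits, "_" and "'"
    return re.findall(r"[a-z0-9_']+", message.lower())

def build_vocabulary(training_messages, size):
    """Return the `size` most frequent tokens seen in the training messages."""
    counts = Counter()
    for message in training_messages:
        counts.update(tokenize(message))
    return [word for word, _ in counts.most_common(size)]

def word_count_features(message, vocabulary):
    """One count feature per vocabulary word (0 when the word is absent)."""
    counts = Counter(tokenize(message))
    return {word: counts[word] for word in vocabulary}

training = [
    "cool, it worked",
    "good job",
    "ok, so let's cycle the stuff",
    "ok…",
    "ok, maybe we have to kill director and restart it mkanually",
    "director on lbl2 looks dead",
]
vocabulary = build_vocabulary(training, size=3)  # the talk used several hundred
print(vocabulary)
print(word_count_features("ok ok, restart it", vocabulary))
```

Since the vocabulary comes from the corpus, project jargon such as "director" can become a feature without anyone listing it in advance.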
  11. Using a list of over 2000 punctuation patterns recognized as emoticons, we also add the most frequently occurring emoticons to the feature set.
  12. In addition to these corpus-based features, we have several specific types of words that we look for. So, we have a feature for the number of swear words in the message, or the number of negation words.
  13. We look at character-level features like the number of repeated consecutive letters, sequences of exclamation points, or the number of capital letters. These are used extensively in chat messages and other informal online communication to signal emotion, mood, or affect.
  14. Here’s an example to illustrate how this works. On the right is a subset of the features that we extract from the message. In reality, the list is about 800 features long.
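Character-level features of this kind could be computed along the following lines. The exact definitions are assumptions made for illustration (for example, counting a "repetition" as the same letter three or more times in a row); the talk does not specify thresholds, and `character_features` is a hypothetical name.

```python
import re

def character_features(message):
    """Simple character-level cues; definitions here are illustrative guesses."""
    lowered = message.lower()
    return {
        # Runs of one letter repeated 3+ times, e.g. the "uuuu" in "stuuuuck"
        "letter_repetition": len(re.findall(r"([a-z])\1{2,}", lowered)),
        "exclamation_marks": message.count("!"),
        "question_marks": message.count("?"),
        "capitals": sum(ch.isupper() for ch in message),
    }

for text in (
    "The telescope is stuck! >:(",
    "The telescope is stuuuuuuuuuck...",
    "The telescope is stuck??",
    "ON TARGET !",
):
    print(text, character_features(text))
```

The three "telescope" variants from the slides share the same words but differ in these features, which is what lets a classifier tell them apart.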
  15. I’m going to skip ahead for a moment to some results. Once we train and evaluate classifiers for the affect codes that we want to automatically label, one thing we can do is look and see which of those 800+ features were actually important. This example shows the top 10 most highly weighted features for the classifier trained to recognize confusion. On the right are a few example messages that our coders labeled with confusion. Clearly, the presence of question marks and certain key words (understand, why, what…) is useful for knowing when someone is confused.
  16. Compare that to the top features for Apprehension. A different set of key words has risen to the top, in addition to the number of third-person singular pronouns and swear words. The examples on the right can help you see how those words are used and why they might be associated with apprehension.
  17. And for amusement, emoticons and laughter expressions were the most useful features. Note that the presence of names of specific scientists was also an important factor in labeling for amusement.
  18. The conclusion we want to stress is that for communication that resembles chat, specialized features are critical for recognizing a wide range of affect codes. Features that were intimately based on the data (word counts and emoticons), as well as features specific to the communication medium (emoticons and punctuation), were heavily used. And the usefulness of each feature varied greatly from one type of affect to another.
  19. Now, I’ll explain in more detail how those features are used in classification, and why we strongly recommend using interpretable, transparent classification algorithms for automated or partially automated coding as part of qualitative research. As I’ve said, we focused only on the 13 most frequently used types of affect. We created one binary classifier for each affect code.
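Training one binary classifier per code means turning each multi-label annotation into a separate yes/no target for every code. A minimal sketch of that conversion, assuming the coded data is available as (message, set of codes) pairs; the function name `binary_targets` and the data layout are hypothetical, and the examples are taken from the chat excerpt on the slides:

```python
def binary_targets(coded_messages, codes):
    """For each code, build a 0/1 label list aligned with the messages."""
    return {
        code: [1 if code in labels else 0 for _, labels in coded_messages]
        for code in codes
    }

coded_messages = [
    ("cool, it worked", {"amusement", "relief"}),
    ("woot", {"excitement", "joy"}),
    ("good job", {"supportive", "acceptance"}),
    ("indeed, that was the point", {"agreement"}),
]
targets = binary_targets(coded_messages, ["amusement", "agreement", "supportive"])
for code, labels in targets.items():
    print(code, labels)
```

Each label list can then be paired with the same feature vectors to train an independent classifier, so a message may receive several codes or none.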
  20. This means that the problem facing the classifier is the following: Given Ray’s message “what is the best way to revive it”, does the code frustration apply?
  21. We compared the performance of a wide variety of classification algorithms, a few of which are shown here. We selected a linear support vector machine because it had very promising performance characteristics, but also because it is fast to train and use, and provides a level of transparency into its inner workings not afforded by many other algorithms.
  22. I’ll explain a little about how linear SVMs are used to classify text. Let’s say that you have only two features, #ok and #swears. The messages in your training data can each be plotted in this 2D space. In this example, there is a pretty clear separation between those that were manually labeled with the frustration code and those that were not. When you train an SVM classifier on this data, it finds a line, such as this one, that best separates the frustrated messages from the non-frustrated messages (according to a particular definition of “best separates”).
  23. Then, given a new unlabeled message with few swear words and a medium number of “ok”s, the classifier can label it as non-frustrated because it falls on that side of the line.
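Once a linear SVM has been trained, labeling a new message reduces to checking which side of the line it falls on. The sketch below shows only that decision step, not the training; the weights and bias are made-up numbers chosen to resemble the two-feature picture on the slide, and the feature names `n_ok` and `n_swears` are hypothetical.

```python
def decision_value(features, weights, bias):
    """Signed score w . x + b; its sign says which side of the line x is on."""
    return sum(w * features.get(name, 0) for name, w in weights.items()) + bias

def frustration_applies(features, weights, bias):
    return decision_value(features, weights, bias) > 0

# Hypothetical model: NOT learned from the SNfactory data.
weights = {"n_ok": -0.2, "n_swears": 1.5}
bias = -1.0

new_message = {"n_ok": 2, "n_swears": 0}    # some "ok"s, no swearing
angry_message = {"n_ok": 0, "n_swears": 2}  # no "ok"s, two swear words

print(frustration_applies(new_message, weights, bias))
print(frustration_applies(angry_message, weights, bias))
```

With these illustrative numbers, the first message lands on the non-frustrated side and the second on the frustrated side.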
  24. This chart shows precision and recall from 10-fold cross-validation for each of our 13 affect codes, using balanced data. Precision is the percentage of the messages the classifier labeled as positive that were truly supposed to be positive. Recall is the percentage of the truly positive messages that the classifier successfully labeled as positive. So, performance is between 60 and 80% for most codes, with a high of 93% for interest. But how can we know if these classifiers are actually useful for automatically coding chat messages for our research?
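The two definitions in this note translate directly into counts of true positives, false positives, and false negatives. A small sketch with made-up labels (these are not results from the study; `precision_recall` is a hypothetical helper):

```python
def precision_recall(true_labels, predicted_labels):
    """Precision and recall for 0/1 labels; returns 0.0 when undefined."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # flagged in error
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Illustrative labels for one affect code over eight messages.
truth     = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 1, 0, 0, 0]
precision, recall = precision_recall(truth, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here the classifier flags three messages and two of them are right, while it finds only two of the four truly positive messages.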
  25. Now, this is what I meant when I said the SVM is relatively transparent or interpretable. Suppose we learned the following model from the data. From this, we can see that swear words have more predictive power for frustration, while the number of “ok”s hardly makes any difference. In other words, by looking at the slope of the line, we can find out which features were the most important.
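Reading importance off a linear model can be sketched as ranking features by the absolute size of their weights. The weights below are invented for illustration, and the ranking is only meaningful under the assumption that the features are on comparable scales; `top_features` is a hypothetical helper, not part of ALOE or Weka.

```python
def top_features(weights, n):
    """Feature names ordered by absolute weight, largest first."""
    return sorted(weights, key=lambda name: abs(weights[name]), reverse=True)[:n]

# Hypothetical weights for a "frustration" classifier.
frustration_weights = {
    "n_swears": 1.5,
    "n_ok": -0.2,
    "n_question_marks": 0.4,
    "length": 0.05,
}
print(top_features(frustration_weights, 2))
```

The sign of each weight also carries information: a positive weight pushes a message toward the code, a negative one away from it.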
  26. This is exactly where these tables from earlier came from. Examination of the SVM feature weights gives us a very easy way to gain a measure of insight into how and why the classifier behaves the way it does, which can help us understand how useful it might be for automatic coding.
  27. And in general, we believe that for this kind of application, understanding how and why the classifier does or doesn’t work may be far more important than optimizing specific classification performance metrics (like precision, recall, accuracy, or F1 score).
  28. One direction is sequential modeling approaches such as hidden Markov models. Context is clearly important to understanding the emotion communicated in chat messages. Looking at messages in isolation can only get you so far. Sequential modeling techniques can take contextual information into account more directly.
  29. Further, we are studying how visual analytics and interactive machine learning can be combined to create powerful tools for analyzing large social communication data sets.
  30. Finally, we are extending this work by developing new features and algorithms for processing tweets, where data set size can easily extend into the millions of messages, and different signals are used to communicate affect.
  31. We have published the code from this study on GitHub, as a Java program called ALOE. ALOE uses the Weka machine learning library, and can easily be extended and used for affect classification and other text classification work. We invite you to try it out and let us know what you think. Questions?