Detecting, Modeling, & Predicting
    User Temporal Intention
         in Social Media
          Hany M. SalahEldeen
          Old Dominion University

        Advisor: Dr. Michael L. Nelson

       JCDL ‘12 Doctoral Consortium
Michael Jackson Dies




                   Snapshot on: June 25th 2009
http://web.archive.org/web/20090625232522/http://www.cnn.com/
Jeff tweets about it…




          Published on: June 25th 2009
https://twitter.com/mdnitehk/status/2333993907
Jenny is off the grid
Jeff’s friend Jenny was on a vacation in Hawaii
for a month…
Jenny starts catching up a month later




                                             Read on: July 26th 2009


When she came back she checked Jeff’s tweets and was
shocked!
          https://twitter.com/mdnitehk/status/2333993907
Jenny follows the link on July 26th




                     CNN page on: July 26th 2009
 http://web.archive.org/web/20090726234411/http://www.cnn.com/
Jenny is confused!
• Implication:
  – Jenny thought Jeff was making a joke about her
    favorite singer, and she got mad at him


• Problem:
  – The tweet and the resource the tweet links to
    have become unsynchronized.
The Egyptian Revolution
Reading about it on Storify in
       March 2012….




     http://storify.com/maq4sure/egypts-revolution
I noticed some shared images are missing




       http://storify.com/maq4sure/egypts-revolution
Some tweets are still intact…




https://twitter.com/miss_amy_qb/status/32477898581483521
…and some lost their meaning with the
    disappearance of the images



       https://twitter.com/aishes/status/32485352102952960
                                                                Missing ?




    https://twitter.com/omar_chaaban/status/32203697597452289
The tweet remains but the shared
      image disappeared…




       http://yfrog.com/h5923xrvbqqvgzj
Cairo….we have a problem
• Implication:
  – The reader cannot understand what the author of
    the tweet meant because the image is not
    available.


• Problem:
  – The post is available but the linked resource
    (image) is completely missing.
The Anatomy of a Tweet
                                      Author’s username
                                      Other user mention
Social
 Post                                                Tweet Body




   Interaction Publishing Shortened URL   Hash Tag
   options     timestamp to resource

                        Shared Resource
3 URIs = 3 Chances to fail
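A minimal sketch of the "3 URIs" idea, assuming the three URIs are the post itself, the shortened URL, and the shared resource it resolves to. `LinkChain` and the `status_of` callback are hypothetical names introduced for illustration; in practice `status_of` would issue an HTTP request and return the final status code after following redirects.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class LinkChain:
    """The three URIs behind one shared post (field names are illustrative)."""
    post_uri: str       # the tweet itself
    short_uri: str      # the shortened URL embedded in the tweet
    resource_uri: str   # the resource the shortened URL resolves to


def failed_links(chain: LinkChain, status_of: Callable[[str], int]) -> List[str]:
    """Return the URIs in the chain whose final HTTP status is not 200.

    `status_of` is an assumed callback: it maps a URI to the status code
    obtained after following any redirects.
    """
    uris = [chain.post_uri, chain.short_uri, chain.resource_uri]
    return [uri for uri in uris if status_of(uri) != 200]
```

Passing the status lookup in as a callback keeps the check independent of any particular HTTP library and lets it run against recorded statuses.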
Explanation in MJ’s example
  [Figure: timeline of events t1, t2, …, tn]
User’s Temporal Intention
The Focus of our research              Instrumented shortener



  Share time               Implicit       Explicit

   Click time              Implicit       Explicit
                                      Instrumented web client
      Out of our scope: purview of Facebook, Twitter, Google, etc.
      Engineering problem: solved by providing tools
Sometimes you want a
       previous version




                 The Correct Temporal
                      Intention

CNN.com at the closest time to the tweet: 25th June 2009 ~ 7pm
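A small sketch of how a request for "the version closest to the tweet" could be written as a Wayback Machine URI, following the pattern of the archive links shown earlier in the deck (`http://web.archive.org/web/<14-digit timestamp>/<original URI>`). The function name `wayback_uri` is an assumption for illustration, and the timestamp is assumed to be in UTC.

```python
from datetime import datetime

WAYBACK_PREFIX = "http://web.archive.org/web"


def wayback_uri(original_uri: str, when: datetime) -> str:
    """Build a Wayback Machine URI for `original_uri` at time `when`.

    The archive uses a 14-digit YYYYMMDDhhmmss timestamp in the path.
    `when` is assumed to already be expressed in UTC.
    """
    stamp = when.strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}/{stamp}/{original_uri}"


# Example: the CNN front page around the time of the tweet.
cnn_then = wayback_uri("http://www.cnn.com/", datetime(2009, 6, 25, 23, 25, 22))
```

If no capture exists at exactly that timestamp, the Wayback Machine generally redirects to the nearest capture it holds, which is why a share-time timestamp is a reasonable starting point.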
Sometimes you want the
      current version




                The Correct Temporal
                     Intention


In this case the current state of the press releases page
Research Question

  Can we estimate the users’
intention at the time of posting
   and reading to predict and
maintain temporal consistency?
Research Goals
• Detect the temporal intention of:
    1.   The author at sharing time
    2.   The reader at dereferencing time
• Model this intention as a function of time, nature of the resource,
   and its context.
• Predict how resources change with time and the intention behind
   sharing them to minimize inconsistency.
• Implement the prediction model to automatically preserve
   vulnerable social content that is prone to change or loss.
• Create an environment implementing this framework that
   provides a smooth temporal navigation of the social web.
Related Work
•   User’s Web Search Intention
     –   A. Ashkan ECIR ’09
     –   C. Lee AINA ’05
     –   A. Loser IRSW ’08
     –   L. Azzopardi ECIR ’09
     –   R. Baeza-Yates SPIR ’06
     –   N. Dai HT ’11
•   Commercial Intention
     –   Q. Guo SIGIR ’10
     –   A. Benczur AIRWeb ’07
•   Sentiment Analysis
     –   G. Mishne AAAI ’06
     –   J. Bollen JCS ’11
•   Access to Archives
     –   H. Van de Sompel OR ’09
•   Persistence of shared resources
     –   M. Nelson D-Lib ’02
     –   R. Sanderson OR ’11
     –   F. McCown JCDL ’07
•   URL Shortening
     –   D. Antoniades WWW ’11
•   Tweeting, Micro-blogging and Popularity
     –   S. Wu WWW ’11
     –   A. Java SNA-KDD ’07
     –   H. Kwak WWW ’10
•   Social Networks Growth and Evolution
     –   B. Meeder WWW ’11
Dissertation Plan
  BEGIN
          Read Literature
          Collect Datasets
          Analyze Archives Coverage
          Analyze Shortened URIs
          Prototype Application
          Analyze Shared Resources Persistence and Coverage
          Analyze Contextual Intention    (Current State)

          Create Intention-based dataset
          Extract Intention Features
          Train a Parametric Model to predict intention
          Evaluate, test, cross-validate the model
          Create a mockup application
          Extend the model to induce preservation
          Finish Writing the Dissertation


PhD Defense
Estimating Web Archiving Coverage
• Goal: Estimate how much of the public web is present in the public archives
  and how many copies are available.
• Action:
   – Collect 4 datasets from 4 different sources:
          •   Search Engines Indices
          •   Bit.ly
          •   DMOZ
          •   Delicious.
• Results:                                         *




• Publications:
     – How much of the web is archived? JCDL '11
* Table Courtesy of Ahmed AlSum JCDL 2011
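To make the two quantities in the goal concrete (how much is archived, and how many copies exist), here is a minimal sketch. It assumes each sampled URI has already been mapped to the number of archived copies (mementos) found for it; the function name and the input shape are illustrative and not taken from the JCDL '11 paper.

```python
from typing import Dict


def coverage_summary(memento_counts: Dict[str, int]) -> Dict[str, float]:
    """Summarize archive coverage for a sample of URIs.

    `memento_counts` maps each sampled URI to the number of archived
    copies found for it (0 means the URI is not archived at all).
    """
    total = len(memento_counts)
    if total == 0:
        return {"archived_fraction": 0.0, "mean_copies": 0.0}

    archived = [n for n in memento_counts.values() if n > 0]
    return {
        # Share of sampled URIs with at least one archived copy.
        "archived_fraction": len(archived) / total,
        # Average number of copies among the URIs that are archived.
        "mean_copies": sum(archived) / len(archived) if archived else 0.0,
    }
```

Running the same summary separately on each of the four samples would show how coverage differs by source.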
Shortened URI analysis
•   Goal: Better understand URI shortening and resolution, the effect of time
    on this process, and the correlation between a page’s features and
    characteristics and its resolution.

•   Action:
     – Collect fresh bit.ly URIs
     – Get hourly click logs, rate of change, social networking spread, and other
       contextual information
     – Conduct a longitudinal study

•   Evaluation:
     – Compare results with the frequency-of-change analysis of Cho and
       Garcia-Molina.
     – Compare results with Antoniades et al., WWW 2011.
Estimating Loss of Shared Resources
               in Social Media
•   Goal: Estimate how many of the resources shared on social media are still
    available on the current web or in the public archives.
•   Action:
     – Sampling from 6 public events
     – Events spanning 3 years
     – Existence in the current web
     – Existence in the public archives
     – Find relation with time
•   Results:
     – After the first year, ~11% of shared resources are lost
     – After that, loss continues at about 0.02% per day
•   Publications:
     – A year after the Egyptian revolution, 10% of the social media documentation is gone.
       http://ws-dl.blogspot.com/2012/02/2012-02-11-losing-my-revolution-year.html
     – Losing my revolution: How Many Resources Shared on Social Media Have Been Lost?
       TPDL '12
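The two reported rates can be combined into a rough estimate of loss over time. The sketch below is only an illustration under stated assumptions: it reads "0.02% daily" as 0.02 percentage points added per day after the first year, and it assumes loss grows linearly up to 11% during the first year, which the slide does not state.

```python
FIRST_YEAR_DAYS = 365
FIRST_YEAR_LOSS = 0.11      # ~11% lost after the first year (from the slide)
DAILY_LOSS_AFTER = 0.0002   # 0.02% per day after that (from the slide)


def estimated_loss_fraction(days_since_shared: int) -> float:
    """Estimate the fraction of shared resources lost after a number of days."""
    if days_since_shared < 0:
        raise ValueError("days_since_shared must be non-negative")

    if days_since_shared <= FIRST_YEAR_DAYS:
        # Assumption (not from the slide): linear growth within year one.
        return FIRST_YEAR_LOSS * days_since_shared / FIRST_YEAR_DAYS

    extra_days = days_since_shared - FIRST_YEAR_DAYS
    # A fraction lost can never exceed 1.0.
    return min(FIRST_YEAR_LOSS + DAILY_LOSS_AFTER * extra_days, 1.0)
```

Under these assumptions, two years after sharing the estimate is 0.11 + 0.0002 × 365 ≈ 0.183, or roughly 18% of resources lost.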
User Intention Analysis
•   Goal: Better understand user intention and the factors that affect it, and
    create a new training and testing set.

•   Action:
     –   Get a sample set of tweets selected at random
     –   Extract the URIs
     –   Get closest Memento
     –   Download the snapshot & current version
     –   Use Amazon’s Mechanical Turk workers to choose the best version
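The "closest Memento" step in the actions above can be sketched as picking, from the list of capture times an archive reports for a URI, the one nearest to the tweet's timestamp. The function name and the plain list of datetimes are assumptions for illustration; a real client would first obtain the capture times from an archive.

```python
from datetime import datetime
from typing import Iterable


def closest_memento(memento_times: Iterable[datetime], target: datetime) -> datetime:
    """Return the capture time nearest to `target` (before or after it)."""
    times = list(memento_times)
    if not times:
        raise ValueError("no mementos available for this URI")
    # Compare absolute time differences so earlier and later captures
    # are treated alike.
    return min(times, key=lambda t: abs(t - target))
```

For the CNN example, a tweet posted on the evening of June 25th 2009 would select the June 25th capture rather than the July 26th one.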

•   Evaluation:
     – Measure inter-rater agreement and confidence.
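The slide does not say which agreement statistic is used. As one common option when several workers label each item, the sketch below computes Fleiss' kappa; the choice of statistic and the label names in the example are assumptions, not the author's stated method.

```python
from collections import Counter
from typing import Sequence


def fleiss_kappa(item_labels: Sequence[Sequence[str]]) -> float:
    """Fleiss' kappa for items that were each labeled by the same number of raters.

    `item_labels[i]` holds the labels the raters gave to item i,
    for example ["archived", "archived", "current"].
    """
    if not item_labels:
        raise ValueError("at least one item is required")
    n_raters = len(item_labels[0])
    if n_raters < 2 or any(len(labels) != n_raters for labels in item_labels):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    n_items = len(item_labels)
    category_totals: Counter = Counter()
    agreement_sum = 0.0
    for labels in item_labels:
        counts = Counter(labels)
        category_totals.update(counts)
        # Share of rater pairs that agree on this item.
        agreeing = sum(c * c for c in counts.values()) - n_raters
        agreement_sum += agreeing / (n_raters * (n_raters - 1))

    observed = agreement_sum / n_items
    # Agreement expected by chance, from overall category proportions.
    expected = sum(
        (total / (n_items * n_raters)) ** 2 for total in category_totals.values()
    )
    if expected == 1.0:
        return 1.0  # every rating is the same category; agreement is trivial
    return (observed - expected) / (1.0 - expected)
```

A value near 1 indicates that workers largely agree on which version is the better match; values near or below 0 indicate agreement no better than chance.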
Proposed Work
•   Data Gathering
•   Feature Extraction
•   Modeling the intention engine
•   Evaluation
•   Application: Prediction and Preservation
Possible Solution for Jenny



       The resource has changed since it was last shared.
       Do you wish to see the version the author intended or
       the current version?

                      Current Version     Intended Version
Proposed Framework


                                               Archived Version




                 Feature
                                  Classifier
                Extraction

              Example Features:                Current Version

              - Tweet Content
              - Click Logs
              - Other Tweets
              - Shared Resource
              - Timemaps
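A toy sketch of the feature-extraction-to-classifier flow in the framework above, ending in a choice between the archived and the current version. Everything specific here is hypothetical: the feature names, the weights, and the use of a logistic function are placeholders for whatever a trained parametric model would provide.

```python
import math
from typing import Dict

# Hypothetical weights; a real model would learn these from labeled data.
WEIGHTS = {
    "resource_changed": 2.0,        # 1.0 if the page differs from share time
    "mentions_breaking_news": 1.5,  # 1.0 if the tweet text looks time-sensitive
    "days_since_shared": 0.01,      # age of the post in days
}
BIAS = -2.0


def archived_probability(features: Dict[str, float]) -> float:
    """Score how likely the author intended the share-time (archived) version."""
    score = BIAS + sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-score))  # logistic function


def choose_version(features: Dict[str, float], threshold: float = 0.5) -> str:
    """Return "archived" or "current" depending on the predicted intention."""
    return "archived" if archived_probability(features) >= threshold else "current"
```

In the Jenny scenario, a changed news page linked from a time-sensitive tweet would lean toward the archived version, while an unchanged page would default to the current one.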
Extra Slides
Archive Shortener Application
Estimating Shared Resources Loss in Social Media
My Publications
•   S. G. Ainsworth, A. Alsum, H. SalahEldeen, M. C. Weigle, and M. L. Nelson. How
    much of the web is archived? In Proceedings of the 11th annual international
    ACM/IEEE joint conference on Digital libraries, JCDL '11, pages 133-136, 2011.

•   H. SalahEldeen and M. L. Nelson. Losing my revolution: How much social media
    content has been lost? Accepted in TPDL 2012


•   H. SalahEldeen and M. L. Nelson. Losing my revolution: A year after the Egyptian
    revolution, 10% of the social media documentation is gone.
    http://ws-dl.blogspot.com/2012/02/2012-02-11-losing-my-revolution-year.html
References
•   D. Antoniades, I. Polakis, G. Kontaxis, E. Athanasopoulos, S. Ioannidis, E. P. Markatos, and T. Karagiannis. we.b: the web of short
    urls. In Proceedings of the 20th international conference on World wide web, WWW '11, pages 715-724, New
    York, NY, USA, 2011. ACM.
•   A. Ashkan, C. L. Clarke, E. Agichtein, and Q. Guo. Classifying and characterizing query intent. In Proceedings of the 31st
    European Conference on IR Research on Advances in Information Retrieval, ECIR '09, pages 578-586, Berlin, Heidelberg, 2009.
    Springer-Verlag.
•   L. Azzopardi and M. de Rijke. Query intention acquisition: A case study on automatically inferring structured queries. In
    Proceedings DIR-2006, 2006.
•   R. Baeza-Yates, L. Calderon-Benavides, and C. Gonzalez-Caro. The intention behind web queries. In F. Crestani, P. Ferragina, and
    M. Sanderson, editors, String Processing and Information Retrieval, volume 4209 of Lecture Notes in Computer Science, pages
    98-109. Springer Berlin / Heidelberg, 2006. 10.1007/11880561_9.
•   A. Benczur, I. Bro, K. Csalogany, and T. Sarlos. Web spam detection via commercial intent analysis. In Proceedings of the 3rd
    international workshop on Adversarial information retrieval on the web, AIRWeb '07, pages 89-92, New York, NY, USA, 2007.
    ACM.
•   J. Bollen, H. Mao, and X.-J. Zeng. Twitter mood predicts the stock market. CoRR, abs/1010.3003, 2010.
•   N. Dai, X. Qi, and B. D. Davison. Bridging link and query intent to enhance web search. In Proceedings of the 22nd ACM
    conference on Hypertext and hypermedia, HT '11, pages 17-26, New York, NY, USA, 2011. ACM.
•   N. Dai, X. Qi, and B. D. Davison. Enhancing web search with entity intent. In Proceedings of the 20th international conference
    companion on World wide web, WWW '11, pages 29-30, New York, NY, USA, 2011. ACM.
•   K. Durant and M. Smith. Predicting the political sentiment of web log posts using supervised machine learning techniques
    coupled with feature selection. In O. Nasraoui, M. Spiliopoulou, J. Srivastava, B. Mobasher, and B. Masand, editors, Advances in
    Web Mining and Web Usage Analysis, volume 4811 of Lecture Notes in Computer Science, pages 187-206. Springer Berlin /
    Heidelberg, 2007. 10.1007/978-3-540-77485-3_11.
•   Q. Guo and E. Agichtein. Ready to buy or just browsing?: detecting web searcher goals from interaction data. In Proceedings of the 33rd
    international ACM SIGIR conference on Research and development in information retrieval, SIGIR '10, pages 130-137, New
    York, NY, USA, 2010. ACM.
•   A. Java, X. Song, T. Finin, and B. Tseng. Why we twitter: understanding microblogging usage and communities. In Proceedings of the 9th
    WebKDD and 1st SNA-KDD 2007 workshop on Web mining and social network analysis, WebKDD/SNA-KDD '07, pages 56-65, New
    York, NY, USA, 2007. ACM.
•   H. Kwak, C. Lee, H. Park, and S. Moon. What is twitter, a social network or a news media? In Proceedings of the 19th international
    conference on World wide web, WWW '10, pages 591-600, New York, NY, USA, 2010. ACM.
•   C.-H. L. Lee and A. Liu. Modeling the query intention with goals. In Proceedings of the 19th International Conference on Advanced
    Information Networking and Applications - Volume 2, AINA '05, pages 535-540, Washington, DC, USA, 2005. IEEE Computer Society.
•   A. Loser, W. M. Barczynski, and F. Brauer. What's the intention behind your query? a few observations from a large developer community.
    In IRSW, 2008.
•   F. McCown, N. Diawara, and M. L. Nelson. Factors affecting website reconstruction from the web infrastructure. In JCDL '07: Proceedings of
    the 7th ACM/IEEE-CS Joint Conference on Digital Libraries, pages 39-48, 2007.
•   B. Meeder, B. Karrer, A. Sayedi, R. Ravi, C. Borgs, and J. Chayes. We know who you followed last summer: inferring social link creation times
    in twitter. In Proceedings of the 20th international conference on World wide web, WWW '11, pages 517-526, New York, NY, USA, 2011.
    ACM.
•   G. Mishne. Predicting movie sales from blogger sentiment. In AAAI 2006 Spring Symposium on Computational Approaches to Analysing
    Weblogs (AAAI-CAAW), 2006.
•   M. L. Nelson and B. D. Allen. Object persistence and availability in digital libraries. D-Lib Magazine, 8(1), 2002.
•   R. Sanderson, M. Phillips, and H. Van de Sompel. Analyzing the persistence of referenced web resources with memento.
    CoRR, abs/1105.3459, 2011.
•   H. Van de Sompel, M. L. Nelson, R. Sanderson, L. Balakireva, S. Ainsworth, and H. Shankar. Memento: Time travel for the web.
    CoRR, abs/0911.1112, 2009.
•   S. Wu, J. M. Hofman, W. A. Mason, and D. J. Watts. Who says what to whom on twitter. In Proceedings of the 20th international conference
    on World wide web, WWW '11, pages 705-714, New York, NY, USA, 2011. ACM.

TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

Hany's Doctoral Consortium

  • 1. Detecting, Modeling, & Predicting User Temporal Intention in Social Media Hany M. SalahEldeen Old Dominion University Advisor: Dr. Michael L. Nelson JCDL ‘12 Doctoral Consortium
  • 2. Michael Jackson Dies Snapshot on: June 25th 2009 http://web.archive.org/web/20090625232522/http://www.cnn.com/
  • 3. Jeff tweets about it… Published on: June 25th 2009 https://twitter.com/mdnitehk/status/2333993907
  • 4. Jenny is off the grid Jeff’s friend Jenny was on a vacation in Hawaii for a month…
  • 5. Jenny starts catching up a month later Read on: July 26th 2009 When she came back she checked Jeff’s tweets and was shocked! https://twitter.com/mdnitehk/status/2333993907
  • 6. Jenny follows the link on July 26th CNN page on: July 26th 2009 http://web.archive.org/web/20090726234411/http://www.cnn.com/
  • 7. Jenny is confused! • Implication: – Jenny thought Jeff was making a joke about her favorite singer, and she got mad at him. • Problem: – The tweet and the resource the tweet links to have become unsynchronized.
  • 9. Reading about it on Storify in March 2012…. http://storify.com/maq4sure/egypts-revolution
  • 10. I noticed some shared images are missing http://storify.com/maq4sure/egypts-revolution
  • 11. Some tweets are still intact… https://twitter.com/miss_amy_qb/status/32477898581483521
  • 12. …and some lost their meaning with the disappearance of the images https://twitter.com/aishes/status/32485352102952960 Missing ? https://twitter.com/omar_chaaban/status/32203697597452289
  • 13. The tweet remains but the shared image disappeared… http://yfrog.com/h5923xrvbqqvgzj
  • 14. Cairo….we have a problem • Implication: – The reader cannot understand what the author of the tweet meant because the image is not available. • Problem: – The post is available but the linked resource (image) is completely missing.
  • 15. The Anatomy of a Tweet
  • 16. The Anatomy of a Tweet [Figure labels: Social Post; Author’s username; Other user mention; Tweet Body; Hash Tag; Shortened URL to resource; Interaction options; Publishing timestamp; Shared Resource]
  • 17. 3 URIs = 3 Chances to fail
  • 18. Explanation in MJ’s example [Timeline figure with points t1, t2, t3, t4, t5, t6, t7, t8, t9, … tn]
  • 19. User’s Temporal Intention [Diagram: Share time (Implicit, Explicit) and Click time (Implicit, Explicit). Labels: The Focus of our research; Instrumented shortener; Instrumented web client; Out of our scope; Purview of Facebook, Twitter, Google, …etc; Engineering problem: Solved by providing tools]
  • 20. Sometimes you want a previous version The Correct Temporal Intention CNN.com at the closest time to the tweet: 25th June 2009 ~ 7pm
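The idea of "the closest time to the tweet" can be illustrated with a minimal Python sketch. It assumes the archived copies are Wayback Machine URIs, whose path carries a 14-digit `YYYYMMDDhhmmss` timestamp, and it reuses the two CNN snapshot URIs from the Michael Jackson example. The helper names and the exact tweet time (the slide only says "~ 7pm", with no time zone) are assumptions made for illustration, not part of the presented work.

```python
from datetime import datetime


def parse_wayback_timestamp(uri):
    """Extract the 14-digit capture timestamp from a Wayback Machine URI."""
    # Expected shape: http://web.archive.org/web/<YYYYMMDDhhmmss>/<original URI>
    stamp = uri.split("/web/", 1)[1].split("/", 1)[0]
    return datetime.strptime(stamp, "%Y%m%d%H%M%S")


def closest_memento(memento_uris, target_time):
    """Return the archived copy whose capture time is nearest to target_time."""
    return min(
        memento_uris,
        key=lambda uri: abs(parse_wayback_timestamp(uri) - target_time),
    )


# The two snapshots of cnn.com shown earlier in the deck.
mementos = [
    "http://web.archive.org/web/20090625232522/http://www.cnn.com/",
    "http://web.archive.org/web/20090726234411/http://www.cnn.com/",
]

# Assumed tweet time: the slide says "25th June 2009 ~ 7pm" without a time zone.
tweet_time = datetime(2009, 6, 25, 19, 0)

best = closest_memento(mementos, tweet_time)
print(best)
```

With these inputs the June 25th snapshot is selected, since it is a few hours from the assumed tweet time while the July 26th snapshot is a month away.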
  • 21. Sometimes you want the current version The Correct Temporal Intention In this case the current state of the press releases page
  • 22. Research Question Can we estimate the users’ intention at the time of posting and reading to predict and maintain temporal consistency?
  • 23. Research Goals • Detect the temporal intention of the: 1. Author upon sharing time 2. The reader upon dereferencing time • Model this intention as a function of time, nature of the resource, and its context. • Predict how resources change with time and the intention behind sharing them to minimize inconsistency. • Implement the prediction model to automatically preserve vulnerable social content that is prone to change or loss. • Create an environment implementing this framework that provides a smooth temporal navigation of the social web.
  • 24. Related Work • User’s Web Search Intention – A. Ashkan ECIR ’09 – C. Lee AINA ‘05 – A. Loser IRSW ‘08 – L. Azzopardi ECIR ‘09 – R. Baeza-Yates SPIR ‘06 – N. Dai HT ’11 • Commercial Intention – Q. Guo SIGIR ’10 – A. Benczur AIRWeb ’07 • Sentiment Analysis – G. Mishne AAAI ‘06 – J. Bollen JCS ‘11 • Access to Archives – H. Van de Sompel OR ‘09 • Persistence of shared resources – M. Nelson D-Lib ‘02 – R. Sanderson OR ’11 – F. McCown JCDL ‘07 • URL Shortening – D. Antoniades WWW ’11 • Tweeting, Micro-blogging and Popularity – S. Wu WWW ’11 – A. Java SNA-KDD ’07 – H. Kwak WWW ’10 • Social Networks Growth and Evolution – B. Meeder WWW ’11
  • 25. Dissertation Plan: BEGIN; Read Literature; Collect Datasets; Analyze Archives Coverage; Analyze Shortened URIs; Prototype Application; Analyze Shared Resources Persistence and Coverage; Analyze Contextual Intention [Current State]; Create Intention-based dataset; Extract Intention Features; Train a Parametric Model to predict intention; Evaluate, test, cross-validate the model; Create a mockup application; Extend the model to induce preservation; Finish Writing the Dissertation; PhD Defense
  • 27. Estimating Web Archiving Coverage • Goal: Estimate how much of the public web is present in the public archives and how many copies are available. • Action: – Getting 4 different datasets from 4 different sources: • Search Engine Indices • Bit.ly • DMOZ • Delicious • Results: [table not captured in this transcript]* • Publications: – How much of the web is archived? JCDL '11 * Table courtesy of Ahmed AlSum, JCDL 2011
  • 29. Shortened URI analysis • Goal: Better understand URI shortening and resolution, the effect of time on this process, and the correlation between a page’s features and characteristics and its resolution. • Action: – Fresh Bit.lys – Get hourly clicklogs, rate of change, social networking spread, and other contextual information – Longitudinal study • Evaluation: – Compare results with the frequency-of-change analysis of Cho and Garcia-Molina. – Compare results with Antoniades et al. WWW 2011.
  • 31. Estimating Loss of Shared Resources in Social Media • Goal: Estimate how much of the public web is present in the public archives and how many copies are available. • Action: – Sampling from 6 public events – Events spanning 3 years – Existence in the current web – Existence in the public archives – Find relation with time • Results: – After the 1st year, ~11% will be lost – After that, we continue to lose 0.02% daily • Publications: – A year after the Egyptian revolution, 10% of the social media documentation is gone. http://ws-dl.blogspot.com/2012/02/2012-02-11-losing-my-revolution-year.html – Losing my revolution: How Many Resources Shared on Social Media Have Been Lost? TPDL '12
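The two result figures on this slide can be combined into a rough worked estimate. The sketch below is one possible reading, not the fitted model from the study: it assumes the 0.02% daily figure means percentage points added linearly after day 365, and it declines to estimate anything inside the first year because the slide gives no rate for that period. The function name is hypothetical.

```python
def estimated_loss_percent(days_since_share):
    """Rough share (in percent) of shared resources lost after a number of days.

    Assumed linear reading of the slide: about 11% lost after the first year,
    then 0.02 percentage points per additional day.
    """
    if days_since_share < 365:
        # The slide reports no figure for the first year, so none is guessed here.
        raise ValueError("no estimate available before the end of the first year")
    loss = 11.0 + 0.02 * (days_since_share - 365)
    return min(100.0, loss)  # a loss share cannot exceed 100%


print(estimated_loss_percent(365))      # end of year one
print(estimated_loss_percent(2 * 365))  # end of year two
```

Under this reading, the second year adds 365 × 0.02 = 7.3 percentage points, giving roughly 18.3% lost two years after sharing.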
  • 32. Dissertation Plan: BEGIN; Read Literature; Collect Datasets; Analyze Archives Coverage; Analyze Shortened URIs; Prototype Application; Analyze Shared Resources Persistence and Coverage; User Intention Analysis; Create Intention-based dataset; Extract Intention Features; Train a Parametric Model to predict intention; Evaluate, test, cross-validate the model; Create a mockup application; Extend the model to induce preservation; Finish Writing the Dissertation; PhD Defense
  • 33. User Intention Analysis • Goal: Better understand User Intention and the factors that affect it. Also create a new testing and training set. • Action: – Get a sample set of tweets selected at random – Extract the URIs – Get closest Memento – Download the snapshot & current version – Use Amazon’s Mechanical Turk to choose the best version • Evaluation: – Measure cross-rater agreement and confidence.
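The slide does not say which statistic is used to measure cross-rater agreement. As one common option for several raters choosing between fixed categories, the sketch below computes Fleiss' kappa. The two categories (archived snapshot vs. current version), the number of raters, and the vote counts are all hypothetical and serve only to show the calculation.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table where counts[i][j] is the number of raters
    who assigned item i to category j. Every row must sum to the same
    number of raters (at least 2)."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])

    # Overall proportion of assignments that went to each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement per item: fraction of rater pairs that agree.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    observed = sum(p_item) / n_items
    expected = sum(p * p for p in p_cat)  # agreement expected by chance
    # Note: undefined (division by zero) if every vote falls in one category.
    return (observed - expected) / (1 - expected)


# Hypothetical votes from 5 workers per tweet:
# column 0 = "archived snapshot", column 1 = "current version".
votes = [
    [5, 0],
    [4, 1],
    [0, 5],
    [1, 4],
]
print(round(fleiss_kappa(votes), 3))
```

A value near 1 indicates that workers largely agree on which version matches the author's intention, while a value near 0 indicates agreement no better than chance.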
  • 34. Proposed Work • Data Gathering • Feature Extraction • Modeling the intention engine • Evaluation • Application: Prediction and Preservation
  • 36. Possible Solution for Jenny The resource has changed since the last time it was shared. Do you wish to see the version the author intended or the current version? [Options: Current Version | Intended Version]
  • 37. Proposed Framework [Diagram labels: Feature Extraction; Classifier; Archived Version; Current Version] Example Features: - Tweet Content - Click Logs - Other Tweets - Shared Resource - Timemaps
  • 41. Estimating Shared Resources Loss in Social Media
  • 42. Estimating Shared Resources Loss in Social Media
  • 43. My Publications • S. G. Ainsworth, A. Alsum, H. SalahEldeen, M. C. Weigle, and M. L. Nelson. How much of the web is archived? In Proceedings of the 11th annual international ACM/IEEE joint conference on Digital libraries, JCDL '11, pages 133-136, 2011. • H. SalahEldeen and M. L. Nelson. Losing my revolution: How much social media content has been lost? Accepted in TPDL 2012. • H. SalahEldeen and M. L. Nelson. Losing my revolution: A year after the Egyptian revolution, 10% of the social media documentation is gone. http://ws-dl.blogspot.com/2012/02/2012-02-11-losing-my-revolution-year.html.
  • 44. References • D. Antoniades, I. Polakis, G. Kontaxis, E. Athanasopoulos, S. Ioannidis, E. P. Markatos, and T. Karagiannis. we.b: the web of short urls. In Proceedings of the 20th international conference on World wide web, WWW '11, pages 715-724, New York, NY, USA, 2011. ACM. • A. Ashkan, C. L. Clarke, E. Agichtein, and Q. Guo. Classifying and characterizing query intent. In Proceedings of the 31th European Conference on IR Research on Advances in Information Retrieval, ECIR '09, pages 578-586, Berlin, Heidelberg, 2009. Springer-Verlag. • L. Azzopardi and M. de Rijke. Query intention acquisition: A case study on automatically inferring structured queries. In Proceedings DIR-2006, 2006. • R. Baeza-Yates, L. Calderon-Benavides, and C. Gonzalez-Caro. The intention behind web queries. In F. Crestani, P. Ferragina, and M. Sanderson, editors, String Processing and Information Retrieval, volume 4209 of Lecture Notes in Computer Science, pages 98-109. Springer Berlin / Heidelberg, 2006. 10.1007/11880561_9. • A. Benczur, I. Bro, K. Csalogany, and T. Sarlos. Web spam detection via commercial intent analysis. In Proceedings of the 3rd international workshop on Adversarial information retrieval on the web, AIRWeb '07, pages 89-92, New York, NY, USA, 2007. ACM. • J. Bollen, H. Mao, and X.-J. Zeng. Twitter mood predicts the stock market. CoRR, abs/1010.3003, 2010. • N. Dai, X. Qi, and B. D. Davison. Bridging link and query intent to enhance web search. In Proceedings of the 22nd ACM conference on Hypertext and hypermedia, HT '11, pages 17-26, New York, NY, USA, 2011. ACM. • N. Dai, X. Qi, and B. D. Davison. Enhancing web search with entity intent. In Proceedings of the 20th international conference companion on World wide web, WWW '11, pages 29-30, New York, NY, USA, 2011. ACM. • K. Durant and M. Smith. Predicting the political sentiment of web log posts using supervised machine learning techniques coupled with feature selection. In O. Nasraoui, M. Spiliopoulou, J. Srivastava, B. Mobasher, and B. Masand, editors, Advances in Web Mining and Web Usage Analysis, volume 4811 of Lecture Notes in Computer Science, pages 187-206. Springer Berlin / Heidelberg, 2007. 10.1007/978-3-540-77485-3_11.
  • 45. References • Q. Guo and E. Agichtein. Ready to buy or just browsing?: detecting web searcher goals from interaction data. In Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval, SIGIR '10, pages 130-137, New York, NY, USA, 2010. ACM. • A. Java, X. Song, T. Finin, and B. Tseng. Why we twitter: understanding microblogging usage and communities. In Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 workshop on Web mining and social network analysis, WebKDD/SNA-KDD '07, pages 56-65, New York, NY, USA, 2007. ACM. • H. Kwak, C. Lee, H. Park, and S. Moon. What is twitter, a social network or a news media? In Proceedings of the 19th international conference on World wide web, WWW '10, pages 591-600, New York, NY, USA, 2010. ACM. • C.-H. L. Lee and A. Liu. Modeling the query intention with goals. In Proceedings of the 19th International Conference on Advanced Information Networking and Applications - Volume 2, AINA '05, pages 535-540, Washington, DC, USA, 2005. IEEE Computer Society. • A. Loser, W. M. Barczynski, and F. Brauer. What's the intention behind your query? a few observations from a large developer community. In IRSW, 2008. • F. McCown, N. Diawara, and M. L. Nelson. Factors affecting website reconstruction from the web infrastructure. In JCDL '07: Proceedings of the 7th ACM/IEEE-CS Joint Conference on Digital Libraries, pages 39-48, 2007. • B. Meeder, B. Karrer, A. Sayedi, R. Ravi, C. Borgs, and J. Chayes. We know who you followed last summer: inferring social link creation times in twitter. In Proceedings of the 20th international conference on World wide web, WWW '11, pages 517-526, New York, NY, USA, 2011. ACM. • G. Mishne. Predicting movie sales from blogger sentiment. In AAAI 2006 Spring Symposium on Computational Approaches to Analysing Weblogs (AAAI-CAAW), 2006. • M. L. Nelson and B. D. Allen. Object persistence and availability in digital libraries. D-Lib Magazine, 8(1), 2002. • R. Sanderson, M. Phillips, and H. Van de Sompel. Analyzing the persistence of referenced web resources with memento. CoRR, abs/1105.3459, 2011. • H. Van de Sompel, M. L. Nelson, R. Sanderson, L. Balakireva, S. Ainsworth, and H. Shankar. Memento: Time travel for the web. CoRR, abs/0911.1112, 2009. • S. Wu, J. M. Hofman, W. A. Mason, and D. J. Watts. Who says what to whom on twitter. In Proceedings of the 20th international conference on World wide web, WWW '11, pages 705-714, New York, NY, USA, 2011. ACM.