Mapping big data science

  1. Virtual Knowledge Studio (VKS)
  2. Google in 1998
  3. Google and PageRank
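Slide 3 names PageRank without showing how it is computed. Below is a minimal power-iteration sketch of the textbook formulation. It assumes a damping factor d = 0.85 and spreads the rank held by pages without outlinks evenly over all pages, which is one common convention among several. The function name `pagerank` and the four-page link graph are hypothetical, and this is not a description of Google's production system.

```python
def pagerank(links, d=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank.

    links: dict mapping each page to the list of pages it links to.
    Returns a dict of scores that sum to 1.
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}  # start from a uniform distribution
    for _ in range(max_iter):
        # Rank held by pages with no outlinks is spread evenly over all pages.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new = {v: (1.0 - d) / n + d * dangling / n for v in nodes}
        for src, targets in links.items():
            if targets:
                share = d * rank[src] / len(targets)  # split rank over outlinks
                for t in targets:
                    new[t] += share
        diff = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if diff < tol:  # stop once scores no longer change meaningfully
            break
    return rank


# Hypothetical four-page web: every other page links to C, and C links back to A.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
scores = pagerank(toy_web)
for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

Under these assumptions C, which every other page links to, gets the highest score, while D, which nothing links to, keeps only the (1 − d)/n baseline.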
  4. Contemporaries (birth years)
     • Pre-Google Guys: Bill Gates & Steve Jobs (1955); Jeff Bezos (1964)
     • Google Guys: Sergey Brin & Larry Page (1973); Elon Musk, Evan Williams & Jack Dorsey (1971, 1972, 1976)
     • Post-Google Guys: Kevin Systrom & Mark Zuckerberg (1983, 1984)
  5. Park (2003)
  6. Graph structure in the web. Source: Bonacich, P. (2004). The Invasion of the Physicists. Social Networks, 26(3), 285–288.
  7. Introduction: Webometrics is broadly defined as the study of web-based content (e.g., text, images, audio-visual objects, and hyperlinks) for social science research goals, using primarily quantitative indicators and visualization techniques derived from information science and social network analysis.
  8. Han Woo Park: “hidden” and “relational” data about many people as well as about a few individuals or small groups
     Lev Manovich: “surface” data about many people (analyzed with statistical, mathematical, or computational techniques) and “deep” data about a few individuals or small groups (analyzed through hermeneutics, participant observation, thick description, semiotics, and close reading)
  9. First type of Webometrics: Hyperlink Network Analysis
     • Inter-linkage: a who-links-to-whom matrix
     • Co-inlink: two different nodes both receive a link from a third node
     • Co-outlink: two different nodes both link to a third node
     Source: Björneborn (2003)
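All three measures on slide 9 can be computed from a binary link matrix. The sketch below is minimal and makes a few assumptions: links have already been crawled into a dictionary, self-links are dropped, and repeated links between the same pair of sites count as a single 0/1 tie. The site names and function names are hypothetical.

```python
def interlinkage_matrix(sites, links):
    """Binary who-links-to-whom matrix: m[i][j] = 1 if sites[i] links to sites[j]."""
    index = {site: k for k, site in enumerate(sites)}
    m = [[0] * len(sites) for _ in sites]
    for src, targets in links.items():
        for dst in targets:
            if src != dst:  # self-links are ignored, so the diagonal stays 0
                m[index[src]][index[dst]] = 1
    return m


def co_inlinks(m):
    """c[j][k] = number of third nodes linking to both j and k (off-diagonal of M^T M)."""
    n = len(m)
    return [[0 if j == k else sum(m[i][j] * m[i][k] for i in range(n))
             for k in range(n)] for j in range(n)]


def co_outlinks(m):
    """c[i][k] = number of third nodes that both i and k link to (off-diagonal of M M^T)."""
    n = len(m)
    return [[0 if i == k else sum(m[i][j] * m[k][j] for j in range(n))
             for k in range(n)] for i in range(n)]


# Hypothetical four-site example.
sites = ["A", "B", "C", "D"]
links = {"A": ["B", "C"], "B": ["C"], "D": ["B", "C"]}
m = interlinkage_matrix(sites, links)
print(co_inlinks(m))   # B and C are co-inlinked by A and D
print(co_outlinks(m))  # A and D both link to B and C
```

Because the diagonal of the matrix is zero, j and k themselves never count as the third node. Only genuine third parties contribute.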
  10. My First SSCI Research: 44 Websites
     • Categorically selected sites
     • Financial sites were the most central
     • Revenue sources: advertising and e-commerce
     • Common payment method: credit card
  11. The future of social relations
     • The social benefits of internet use will far outweigh the negatives over the next decade. Respondents say this is because email, social networks, and other online tools offer ‘low-friction’ opportunities to create, enhance, and rediscover social ties that make a difference in people’s lives.
     • Some 85% agreed with the statement: “In 2020, when I look at the big picture and consider my personal friendships, marriage and other relationships, I see that the internet has mostly been a positive force on my social world. And this will only grow more true in the future.”
     • “There's no escaping people anymore, and I believe that will yield better relationships.” (Jeff Jarvis)
  12. M. Castells (2009), Communication Power
     1) Networking power: the power over who and what is included in the network. ‘Mass self-communication’ is the use of new media for private messages that are able to reach the masses.
     2) Network power: the power of the protocols of network communication. In mass self-communication, diversity of formats is the rule, and this amplifies the diffusion of messages.
     3) Networked power: the power of certain nodes over other nodes inside the network. This is the managerial, agenda-setting, editorial, and decision-making power in the organizations that own or operate networks.
     4) Network-making power: the capacity of owners and controllers to set up and program a network (of multimedia or traditional mass communication).
  13. Given that social media connect individuals in dramatically different ways, research questions look like these: What do people talk about? Who can see what? Who can reply to whom? How long is content visible? What can link to what? Who can link to whom? Webometrics and hyperlink network analysis can be particularly useful for answering these questions!
  14. Big Data and Social Webometrics Network Analysis: increasing data size in terms of the number of nodes
     Scale          No. of nodes    Possible ties (n × n)
     Micro          ≤ 100           10K
     Meso           ≤ 1,000         1,000K
     Macro          ≤ 10,000        100,000K
     Super-macro    > 10,000        → ∞
     Source: Park, H. W. (2014)
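The right-hand figures on slide 14 appear to be n², the number of cells in an n × n who-links-to-whom matrix. That reading is an assumption, because the slide gives only the numbers. The short sketch below reproduces the figures. The tier labels come from the slide, and the function names are hypothetical.

```python
TIERS = [("Micro", 100), ("Meso", 1_000), ("Macro", 10_000)]


def matrix_cells(n):
    """Cells in an n x n who-links-to-whom matrix, i.e. n squared."""
    return n * n


def possible_directed_ties(n):
    """Possible directed ties once self-links (the diagonal) are excluded."""
    return n * (n - 1)


for label, n in TIERS:
    print(f"{label:<6} n={n:>6,}  cells={matrix_cells(n) // 1000:,}K  "
          f"directed ties without self-links={possible_directed_ties(n):,}")
```

If self-links are excluded, the count becomes n(n − 1) instead. The table's point, that possible ties grow quadratically with the number of nodes, holds either way.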
  15. “Those studies perpetuate the idea that linking behaviour is not random, and that links are ‘socially significant in some way’. In this perspective, links have an ‘information side-effect’: they can be used to understand other facts even though they were not individually designed to do so. ‘Information side-effects are by-products of data intended for one use which can be mined in order to understand some tangential, and possibly larger scale, phenomena’.”
  16. Park and his colleagues were extensively cited: 9 times!
     • Barnett GA, Chung CJ and Park HW (2011) Uncovering transnational hyperlink patterns and web mediated contents: a new approach based on domain. Social Science Computer Review 29(3): 369–384.
     • Hsu C and Park HW (2011) Sociology of hyperlink networks of Web 1.0, Web 2.0, and Twitter: a case study of South Korea. Social Science Computer Review 29(3): 354–368.
     • Park HW (2003) Hyperlink network analysis: a new method for the study of social structure on the web. Connections 25(1): 49–61.
     • Park HW (2010) Mapping the e-science landscape in South Korea using the webometrics method. Journal of Computer-Mediated Communication 15(2): 211–229.
     • Park HW and Jankowski NW (2008) A hyperlink network analysis of citizen blogs in South Korean politics. Javnost: The Public 15(2): 5–16.
     • Park HW and Thelwall M (2003) Hyperlink analyses of the World Wide Web: a review. Journal of Computer-Mediated Communication 8(4).
     • Park HW and Thelwall M (2008) Developing network indicators for ideological landscapes from the political blogosphere in South Korea. Journal of Computer-Mediated Communication 13(4): 856–879.
     • Park HW, Kim C and Barnett GA (2004) Socio-communicational structure among political actors on the web in South Korea. New Media & Society 6(3): 403–423.
     • Park HW, Thelwall M and Kluver R (2005) Political hyperlinking in South Korea: technical indicators of ideology and content. Sociological Research Online 12(3).
  17. A comment from those who are NOT doing hyperlink analysis
     • In a chapter of The Sage Handbook of Online Research Methods edited by Fielding et al. (2008), Horgan emphasizes that ‘link analysis’ has become an active research domain for examining social behavior online.
  18. php?title=Online_Research
  19. 2nd type of Webometrics: Web Visibility
     • Web mentions as an indicator of online viral power and reputation
     • The presence or appearance on the web of actors or issues being discussed by the public (Internet users)
     • Tracking web visibility is a powerful way to gain insight into public reactions to actors or issues.
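In its simplest form, web visibility as described on slide 19 is a count of how often an actor or issue is mentioned. The sketch below is minimal and assumes the texts have already been collected, for example from search results or blog posts. It counts documents rather than total occurrences and matches whole words case-insensitively. The sample texts, actor names, and the function `web_visibility` are all hypothetical.

```python
import re


def web_visibility(documents, actors):
    """Number of documents mentioning each actor (case-insensitive, whole words)."""
    counts = {}
    for actor in actors:
        pattern = re.compile(r"\b" + re.escape(actor) + r"\b", re.IGNORECASE)
        counts[actor] = sum(1 for doc in documents if pattern.search(doc))
    return counts


# Hypothetical snippets standing in for collected web pages.
docs = [
    "Candidate Kim visited the market today.",
    "Debate recap: Kim and Lee clashed over housing.",
    "Lee's campaign released a new video.",
    "Housing prices keep rising, Kim says.",
]
print(web_visibility(docs, ["Kim", "Lee", "housing"]))
```

Counting documents instead of raw occurrences keeps a single page that repeats a name many times from inflating that name's visibility. Whether that trade-off suits a given study is a design decision.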
  20. Construct validity of webometrics data. Source: Ackland, R. (2013). Web Social Science: Concepts, Data and Tools for Social Scientists in the Digital Age. Sage, p. 16.
  21. How can the construct validity of web data for social science research be demonstrated, either empirically or theoretically?
     • By testing whether the online network displays structural signatures consistent with those displayed by real-world actors.
       – For example: does Facebook friendship network data display homophily on the basis of race, ethnicity, etc.?
     • By testing whether variables constructed from web data are correlated with other accepted measures of the construct.
       – For example: if counts of inbound hyperlinks to academic project websites are correlated with other characteristics of academic teams (e.g., publications, industry connections) that are used as proxies of academic authority or performance, then this is evidence of the construct validity of hyperlink data in the context of scientometrics.
     • By showing that an actor's position in an online network influences his or her performance or outcomes in a manner consistent with what is found offline.
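The second test on slide 21, correlating a web-derived variable with an accepted measure, can be sketched as follows. The inlink and publication counts for five project websites are made up for illustration. The sketch reports both Pearson's r and Spearman's rank correlation. Rank correlation is often preferred for skewed link counts, but which to report is a judgment call. All function names are hypothetical.

```python
from math import sqrt
from statistics import fmean


def pearson(x, y):
    """Pearson correlation coefficient."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data for five academic project websites.
inlinks = [120, 45, 300, 15, 80]   # inbound hyperlinks
pubs = [30, 12, 55, 5, 20]         # publications by the team
print(f"Pearson r = {pearson(inlinks, pubs):.2f}, "
      f"Spearman rho = {spearman(inlinks, pubs):.2f}")
```

With only five sites, even a strong correlation is suggestive at best. A construct-validity argument would normally rest on a larger sample and a significance test.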
  22. How different across disciplines?
  23. WCU WEBOMETRICS INSTITUTE INVESTIGATING INTERNET-BASED POLITICS WITH E-RESEARCH TOOLS
     Two major strands exist in computational science (also called e-Science):
     • A computational perspective based on the use of high-performance computing to facilitate high-speed processing of large volumes of digital data
     • A networking perspective based on virtual collaboration through the Grid
     A third, alternative strand? e-Science in the humanities and social sciences
     Source: Park, H. W. (2010). Mapping the e-science landscape in South Korea using the webometrics method. Journal of Computer-Mediated Communication, 15(2), 211–229.
  24. Computational Social Science (CSS): a minor but growing approach to the study of society, focusing on the methodological perspective based on the use of new digital tools to manage the data deluge
  25. Computational (Social) Science
     • Focus on the methodological perspective based on the use of new digital tools to manage the data deluge
     • Development of e-science tools to automate the research process
     • Experimentation with new types of data visualization
  26. Measuring information exposure in dynamic and dependent networks (ExpoNet): According to the OECD's Global Science Forum 2013 report, social scientists' inability to anticipate the Arab Spring was partly due to a failure to understand 'the new ways in which humans communicate' via social media and the ways they are exposed to information. And social media's mixed record in predicting the results of recent UK elections suggests that better tools and a unified methodology are needed to analyse and extract political meaning from this new type of data.
  27. Why Data Science? Savage and Burrows (2007, p. 886) lament: “Fifty years ago, academic social scientists might be seen as occupying the apex of the – generally limited – social science research ‘apparatus’. Now they occupy an increasingly marginal position in the huge research infrastructure”. See also Bonacich, P. (2004). The Invasion of the Physicists. Social Networks, 26(3), 285–288.
  28. All models are wrong, but some are useful. Emergence of the data author on Dataverse.
  29. Anderson's claims
     • Data is everything we need.
     • We don't have to settle for models.
     • Agnostic statistics.
     • Out with every theory of human behavior.
     • This approach to science (hypothesize, model, test) is becoming obsolete.
     • Petabytes allow us to say: "Correlation is enough." We can stop looking for models.
     • What can science learn from Google? E-Science.
  30. Big data and the end of theory?
     • Does big data have the answers? Maybe some, but not all, says Mark Graham.
     • In 2008, Chris Anderson, then editor of Wired, wrote a provocative piece titled The End of Theory. Anderson was referring to the ways that computers, algorithms, and big data can potentially generate more insightful, useful, accurate, or true results than specialists or domain experts who traditionally craft carefully targeted hypotheses and research strategies.
     • We may one day get to the point where sufficient quantities of big data can be harvested to answer all of the social questions that most concern us. I doubt it, though. There will always be digital divides, always be uneven data shadows, and always be biases in how information and technology are used and produced.
     • And so we shouldn't forget the important role of specialists in contextualizing and offering insights into what our data do, and maybe more importantly, don't tell us.
  31. The Coming of a Triple Divide? There are three main gaps I’d like to emphasize in the present and future of the Big Data research community:
     1) developing/transitional vs. developed/advanced countries;
     2) researchers in academia vs. researchers in the commercial sector;
     3) researchers with computational skills vs. less computational scholars.
  32. Number of articles in each category of methods, by the developed/developing division
     Method used            Developed country/region    Developing country/region    Mixed region
                            N      %                    N      %                     N      %
     Social-informetrics    114    74.51                30     83.33                 9      52.94
     Scientometrics         28     18.30                6      16.67                 8      47.06
     Webometrics            11     7.19                 0      0                     0      0
     Total                  153    100                  36     100                   17     100
     Source: Skoric, M. M. (2013, Online First). The implications of big data for developing and transitional economies: Extending the Triple Helix? Scientometrics.
  34. Nature, 4 September 2008, Volume 455, Number 7209, pp. 1–136: “What Big Data sets mean for contemporary science”
  35. This approach to science is attributed to the late Jim Gray of Microsoft, one of the most influential computer scientists.
  36. Science published a special issue (February 11, 2011) that looked broadly at increasingly data-driven research efforts as a scientific domain (Science staff, 2011). Data Science is composed of interrelated clusters of research tasks. For example, technologies for data collection, curation, and access, together with unique skill sets, have become increasingly central to Data Science (Science staff, 2011).
  37. Phrase map of frequently occurring keywords, 1999–2005. Source: Halevi, G., & Moed, H. F. (2012).
  38. Phrase map of frequently occurring keywords, 2006–2012. Source: Halevi, G., & Moed, H. F. (2012).
  39. Park, H. W., & Leydesdorff, L. (2013, work in progress). Decomposing a Data-Driven Science Using a Scientometric Method.
     • However, Halevi and Moed (2012) and Rousseau (2012) are based on descriptive statistics. We therefore add the network perspective in both the social (co-authorship) and semantic networks.
     • Furthermore, we extend the search queries to various terminologies related to Data Science, because the term “big data” is regarded as only one among a list of policy priority issues.
     • We show where the research system in Data Science is “hot” in terms of international collaborations and prevailing semantics.
  40. Park, H. W., & Leydesdorff, L. (2013). Decomposing Social and Semantic Networks in Emerging “Big Data” Research. Journal of Informetrics, 7(3), 756–765.
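The network perspective on slides 39 and 40 begins with co-occurrence counts. The sketch below is minimal and assumes each record lists the author countries attached to one paper. The same code works for author names (co-authorship) or title keywords (a semantic network). It uses simple whole counting, whereas the published study may normalize differently. The records and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def collaboration_ties(papers):
    """Whole-count weighted ties: each distinct pair on the same paper adds 1."""
    ties = Counter()
    for members in papers:
        # set() drops repeats such as two authors from the same country
        for a, b in combinations(sorted(set(members)), 2):
            ties[(a, b)] += 1
    return ties


def degree(ties):
    """Number of distinct partners per node."""
    deg = Counter()
    for a, b in ties:
        deg[a] += 1
        deg[b] += 1
    return deg


# Hypothetical records: the author countries of five papers.
papers = [["US", "UK", "CN"], ["US", "CN"], ["KR", "US"], ["KR", "KR"], ["UK", "NL"]]
ties = collaboration_ties(papers)
print(ties.most_common(2))
print(degree(ties).most_common(2))
```

A purely domestic paper, such as the KR-only record, adds no international tie. That is the behavior wanted when mapping where international collaboration is "hot".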
  41. The Signal and the Noise: Why Most Predictions Fail but Some Don't (Nate Silver): “I do not go as far as Popper in asserting that such theories are therefore unscientific or that they lack any value. However, the fact that the few theories we can test have produced quite poor results suggests that many of the ideas we haven’t tested are very wrong as well. We are undoubtedly living with many delusions that we do not even realize.” (p. 15)
  42. OECD (2012). OECD Technology Foresight Forum 2012: Harnessing data as a new source of growth: Big data analytics and policies. OECD Headquarters, Paris, France, 22 October 2012.
  44. Algorithmic management of socially shared information: Facebook as a designed social system
     • Which features should be deployed? [Ugander-Karrer-Backstrom-Kleinberg 2013]
     • Which discussions will be most active? [Backstrom-Kleinberg-Lee-Danescu-Niculescu-Mizil 2013]
     • Which memes will receive the most reshares? [Cheng-Adamic-Dow-Kleinberg-Leskovec 2014]
     • Which links should be emphasized? [Backstrom-Kleinberg 2014]
  45. A typical Facebook user writes 60–70% of his or her comments to about 15 people. [Backstrom-Bakshy-Kleinberg-Lento-Rosenn 2011]
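The statistic on slide 45 is a concentration measure: the share of one user's comments that goes to that user's top-k recipients. The sketch below is minimal, uses k = 15, and runs on a made-up comment log. It illustrates the measure only and does not reproduce the data or method of Backstrom et al. The function name `top_k_share` is hypothetical.

```python
from collections import Counter


def top_k_share(recipients, k=15):
    """Share of all comments that go to the k most frequently addressed people."""
    if not recipients:
        return 0.0
    counts = Counter(recipients)
    top = sum(count for _, count in counts.most_common(k))
    return top / len(recipients)


# Hypothetical comment log for one user: 15 close ties get 40 comments each,
# 100 weaker ties get 3 comments each.
log = [f"close_{i}" for i in range(15) for _ in range(40)]
log += [f"weak_{i}" for i in range(100) for _ in range(3)]
print(f"{top_k_share(log):.1%}")  # 600 of 900 comments
```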
  46. Economics in the age of big data
  49. Big Data for the 2030 SDGs
  50. A more recent development was the establishment of journals devoted to data science, several with the term “Data Science” in their titles:
     • Data Science Journal (2002)
     • Journal of Data Science (2003)
     • EPJ Data Science (2012)
     • GigaScience (2012)
     • Big Data & Society (2014)
  51. Ying Huang, Jannik Schuehle, Alan L. Porter, Jan Youtie
  56. innovation-observatory/files/infographics/big-data_en.pdf
  57. 6/80-of-marketers-will-run-cross-channel-marketing-campaigns-in-2014-study
  58. ttp:// Winter Bridge: A Global View of Big Data
  59. The chart Tim Cook doesn’t want you to see
  60. Kim, G. H., Trimi, S., & Chung, J. H. (2014). Big-data applications in the government sector. Communications of the ACM, 57(3), 78–85.
  63. Yet there are still serious problems to overcome. A trenchant critique of the big data field as it stands today came in the form of six statements intended to temper unbridled enthusiasm [42]. These six provocative statements are:
     • Big data changes the definition of knowledge;
     • Claims to accuracy and objectivity are misleading;
     • More data are not always better data;
     • Taken out of context, big data loses its meaning;
     • Just because it is accessible does not make it ethical; and
     • (Limited) access to big data creates a new digital divide.
     Source: Rousseau (2012)
  64. Big Data's Slippery Issue of Causation vs. Correlation
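The correlation-versus-causation problem named on slide 64 can be shown with a small simulation. The data are synthetic: a hidden common cause z drives both x and y, and neither of them affects the other. The raw correlation between x and y is still clearly positive, and it largely disappears once z is controlled for by correlating the regression residuals (a partial correlation). This is an illustration only. Real causal inference needs the confounder to be known and measured, which big data does not guarantee. The function names are hypothetical.

```python
import random
from math import sqrt
from statistics import fmean


def pearson(x, y):
    """Pearson correlation coefficient."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def residuals(values, control):
    """Residuals of an ordinary least-squares fit of `values` on `control`."""
    mc, mv = fmean(control), fmean(values)
    slope = (sum((c - mc) * (v - mv) for c, v in zip(control, values))
             / sum((c - mc) ** 2 for c in control))
    intercept = mv - slope * mc
    return [v - (intercept + slope * c) for c, v in zip(control, values)]


rng = random.Random(42)  # fixed seed so the simulation is repeatable
n = 10_000
z = [rng.gauss(0, 1) for _ in range(n)]  # hidden common cause
x = [zi + rng.gauss(0, 1) for zi in z]   # driven by z only
y = [zi + rng.gauss(0, 1) for zi in z]   # driven by z only, never by x

raw_r = pearson(x, y)
partial_r = pearson(residuals(x, z), residuals(y, z))
print(f"raw r = {raw_r:.2f}; r after controlling for z = {partial_r:.2f}")
```

By construction, the raw correlation should be near 0.5 (a covariance of 1 over a variance of 2), and the partial correlation should be near 0.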
  66. Winter Bridge: A Global View of Big Data
  72. questions-trope-conservatives-are-happier-liberals
  73. Kobayashi, T., & Boase, J. (2012). No Such Effect? The Implications of Measurement Error in Self-Report Measures of Mobile Communication Use. Communication Methods and Measures, 6, 1–18. DOI: 10.1080/19312458.2012.679243
  74. Christakis, N. A., & Fowler, J. H. (2009). Connected: The Surprising Power of Our Social Networks and How They Shape Our Lives. NY Times
  76. Christakis, N. A., & Fowler, J. H. (2014). Friendship and natural selection. Proceedings of the National Academy of Sciences, 111(3), 10796–10801.
  77. Creativity requires a moderately small world: financial success of Broadway musicals, 1945 to 1989
  78. Small worlds and artistic success: artistic success of a show
  79. Structural holes (Borgatti et al., 2009)
  80. Using Big Data to Fight Range Anxiety in Electric Vehicles
     • The software acquires data from five sources: Google Maps (for route, terrain, and traffic data), (for weather), driver history (through driving behavior measurements), vehicle manufacturers (for vehicle modeling data), and battery manufacturers (for battery modeling data).
  81. Winter Bridge: A Global View of Big Data
  82. Mike Thelwall: Webometric Analyst (WA) 2.0
  83. Marc Smith: NodeXL
  84. Han Woo Park: KrKWIC, WeboNaver, WeboDaum
  94. An open-data tool built on ArcGIS, with data such as the World Bank's. Cool.
  96. O'Reilly: 10 data trends on our radar for 2016
     (1) Metadata
     (2) Systems optimization via deep neural networks
     For example, a search on Google for "let it be lyrics" returns the lyrics of the classic Beatles song at the top of the search results. But a search for "let it go lyrics" doesn't return such an interface element, despite the immense popularity of this Disney song and the wide availability of its lyrics.
  97. Help users ask good questions, rather than attempt to answer bad ones. You can see this in action on LinkedIn, where typing "micr" into a search box triggers search suggestions like "Jobs at Microsoft" and "People who work at Microsoft".
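The idea on slide 97, turning a fragment like "micr" into well-formed questions, can be sketched as prefix matching against known entities combined with question templates. This is not LinkedIn's implementation. The entity list (assumed to be pre-sorted by popularity), the templates, and the function `suggest` are all hypothetical.

```python
# Hypothetical entity list, assumed to be pre-sorted by popularity.
ENTITIES = ["Microsoft", "Micron", "Microchip", "Mozilla"]
# Hypothetical question templates offered for each matching entity.
TEMPLATES = ["Jobs at {}", "People who work at {}"]


def suggest(typed, entities=ENTITIES, templates=TEMPLATES, limit=4):
    """Turn a partial query into well-formed questions about matching entities."""
    prefix = typed.strip().lower()
    if not prefix:
        return []
    matches = [e for e in entities if e.lower().startswith(prefix)]
    suggestions = [t.format(e) for e in matches for t in templates]
    return suggestions[:limit]


print(suggest("micr"))
```

Because the suggestions are complete questions, the user picks a query the system can answer well, instead of the system guessing at an ambiguous fragment.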
  98. Artificial Intelligence and Intelligence Augmentation: Very Different Approaches Yield Very Different Results
     • “Artificial intelligence” is the idea of a computer system that, by reproducing human cognition, allows that system to function autonomously and effectively in a given domain. An AI system demonstrates a kind of intentionality: it initiates action in its environment and pursues goals.
     • “Intelligence augmentation,” on the other hand, is the idea of a computer system that supplements and supports human thinking, analysis, and planning, leaving the intentionality of a human actor at the heart of the human-computer interaction. Because intelligence augmentation focuses on the interaction of humans and computers, rather than on computers alone, it is also referred to as “HCI.”
     intelligence-augmentation-debate
  99. Twitter taught Microsoft’s AI chatbot to be a racist asshole in less than a day
  101. Prof. Han Woo PARK, Department of Media and Communication, Yeungnam University, Korea. WCU WEBOMETRICS INSTITUTE INVESTIGATING INTERNET-BASED POLITICS WITH E-RESEARCH TOOLS