Naver Search Engine as a Research Tool_Cyber Communication Society (4_march2014)

Naver as a Tool for Surveying Cyber-Social Public Opinion
- Webometrics and Big Data Analysis
Virtual Knowledge Studio (VKS)
Prof. Dr. Han Woo PARK
CyberEmotions Research Institute
Dept. of Media & Communication
YeungNam University
214-1 Dae-dong, Gyeongsan-si,
Gyeongsangbuk-do 712-749
Republic of Korea
www.hanpark.net
cerc.yu.ac.kr
eastasia.yu.ac.kr
asia-triplehelix.org
Korean Cyber Communication Society (한국사이버커뮤니케이션학회), 2014 Special Seminar on 'Naver and Portals'

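Web Big Data and Search Engines
• A web without search engines is chaos: the search engine is the compass
• Search engines index more web documents than private crawlers can cover
• Anyone can collect data through a search engine
• Findings obtained through a search engine are easy to replicate
• APIs allow automated analysis tools to be developed and accessed
• Near-real-time social network analysis and visualization
• Can be combined with other API services for integrated social research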

http://www.worldwidewebsize.com/

http://www.statisticbrain.com/total-number-of-pages-indexed-by-google/

How Google searches 30 trillion web pages,
100 billion times a month
Search starts, of course, with crawling and
indexing, and Google says that the web now has
30 trillion unique individual pages. That's up an
astonishing 30 times in five years:
Google reported in 2008 that the web had just
one trillion pages.
Google says that it stores information about
those 30 trillion pages in the Google Index,
which is now at 100 million gigabytes. That's
about 100,000 terabytes, and you'd need over
three million 32GB USB thumb drives to store all
that data.
http://venturebeat.com/2013/03/01/how-google-searches-30-trillion-web-pages-100-billion-times-a-month/
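
The figures in the quoted passage can be checked with simple arithmetic. This is a minimal sketch, assuming decimal units (1 TB = 1,000 GB); the variable names are illustrative. Under that assumption, 100 million gigabytes is 100,000 terabytes, which takes a little over three million 32GB drives.

```python
# Decimal (SI) units are assumed here: 1 TB = 1,000 GB.
INDEX_SIZE_GB = 100_000_000   # "100 million gigabytes" from the quote
DRIVE_SIZE_GB = 32            # one 32GB USB thumb drive
GB_PER_TB = 1_000

index_size_tb = INDEX_SIZE_GB // GB_PER_TB
drives_needed = -(-INDEX_SIZE_GB // DRIVE_SIZE_GB)  # ceiling division

# Growth from one trillion pages (2008) to 30 trillion pages.
growth_factor = 30_000_000_000_000 // 1_000_000_000_000

print(index_size_tb, drives_needed, growth_factor)
```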

Introduction
• Webometrics is broadly defined as the study of web-based content (e.g., text, images, audio-visual objects, and
hyperlinks) with primarily quantitative indicators for
social science research goals and visualization techniques
derived from information science and social network
analysis.

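One moment! An Internet connection is required.
• [Interview] Daegu Broadcasting TBC, "Reporter Seo Eun-jin's SNS World" (서은진 기자의 SNS 세상), 2012
• 'SNS Evolves into an Academic Discipline' http://bit.ly/wIsvjz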

• Han Woo Park
- “hidden” and “relational” data about
lots of people as well as the few
individuals, or small groups
• Lev Manovich
- “surface” data about lots of people (i.e.,
statistical, mathematical or computational
techniques for analyzing data)
- “deep” data about the few individuals or small
groups (i.e., hermeneutics, participant
observation, thick description, semiotics, and
close reading)

First type of Webometrics
• Hyperlink Network Analysis
- Inter-linkage: who linked to whom matrix
- Co-inlink: a link to two different nodes from a third node
- Co-outlink: A link from two different nodes to a third node
Björneborn (2003)
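
The three link measures listed above can be sketched on a toy network. This is a minimal illustration, not a crawler: the site names and links are hypothetical, and the helper names `co_inlinks` and `co_outlinks` are introduced here only for the example.

```python
# Hypothetical "who linked to whom" data: source site -> set of sites it links to.
links = {
    "a.example": {"c.example", "d.example"},
    "b.example": {"c.example", "d.example"},
    "c.example": {"d.example"},
    "d.example": set(),
}
sites = sorted(links)

# Inter-linkage: adjacency matrix with rows as sources and columns as targets.
adjacency = [[1 if target in links[source] else 0 for target in sites]
             for source in sites]


def co_inlinks(x, y):
    """Count third nodes that link to both x and y."""
    return sum(1 for z in sites
               if z not in (x, y) and x in links[z] and y in links[z])


def co_outlinks(x, y):
    """Count third nodes that both x and y link to."""
    return len((links[x] & links[y]) - {x, y})


print(adjacency[0])                          # outlinks of a.example
print(co_inlinks("c.example", "d.example"))
print(co_outlinks("a.example", "b.example"))
```

Because the counts are symmetric in `x` and `y`, they can be used directly as edge weights in an undirected co-link network.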

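Hyperlink Data Collection and Search Engines
• AltaVista
- Mainly used around 2000
- But shut down after Yahoo acquired it in 2004
• Yahoo
- Relaunched AltaVista's hyperlink search as the "Site Explorer" service in September 2005
- Also offered an API service so that researchers could download data automatically
- However, Yahoo discontinued the API service in April 2011
- Site Explorer was also discontinued in November 2011
- Yahoo's search operations were handed over to Bing
• Bing
- Recently announced a link-data option, but it is hard to use in practice
• Google: Find pages that are similar to, or link to, a URL
- A link-data option exists, but because the service is limited, only some researchers use it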

http://cybermetrics.wlv.ac.uk/QueriesForWebometrics.htm
M. Thelwall's perspective: links and search engines

R. Ackland's perspective: links and search engines

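2nd type of Webometrics: Web Visibility
• Web visibility: an indicator of online power
- Examines the current status of actors and issues under public discussion and the way they appear
- Yields insight into public reactions to actors and issues and allows them to be tracked continuously
• Web visibility research using search engines
- Word of mouth (WOM): observation and monitoring
- Measuring web influence: search frequency, number of web documents, hyperlinks
- Social network research: co-occurrence frequency of search terms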
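
One way to turn co-occurrence frequencies into network ties is to normalize the hit count of a paired query by the hit counts of the single queries. The sketch below uses the Jaccard index for that step. Jaccard is an assumption made for illustration, not a method named in the slides, and the hit counts are made up.

```python
from itertools import combinations

# Made-up hit counts for illustration only (not real search results).
single_hits = {"actor A": 1000, "actor B": 400, "issue X": 250}
pair_hits = {
    frozenset({"actor A", "actor B"}): 200,
    frozenset({"actor A", "issue X"}): 50,
    frozenset({"actor B", "issue X"}): 0,
}


def jaccard(a, b):
    """Pages mentioning both terms divided by pages mentioning either term."""
    both = pair_hits[frozenset({a, b})]
    either = single_hits[a] + single_hits[b] - both
    return both / either if either else 0.0


# Weighted edge list for a co-occurrence network.
edges = {(a, b): jaccard(a, b) for a, b in combinations(sorted(single_hits), 2)}
for pair, weight in edges.items():
    print(pair, round(weight, 3))
```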

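Park, Han Woo (박한우). The practice and challenges of social opinion polling: reading public opinion with 'low-cost, high-efficiency' SNS. Monthly <신문과 방송> (Newspaper and Broadcasting), July 2012, pp. 84-88.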

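To learn more about webometrics
• http://www.hanpark.net
• Full-text papers are available in the Research section
• PowerPoint slides are available under Slideshow at the bottom of the homepage
• Tools for non-profit academic research are available in the Software section
• https://www.facebook.com/groups/asiatriplehelix/
* Because these slides focus on Naver, the discussion of search engines and webometric big data analysis is kept to a minimum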

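The Naver Search Engine and Webometric Research
• The portal with the largest user base in Korea: an information producer
• Number of web documents indexed by Naver: no information available
• 1st type of webometrics: hyperlink search is not possible
• 2nd type of webometrics: web visibility research is possible
• An API is provided, so automated research tools can be developed and accessed
- Up to 25,000 searches per account per day
- But date-range and advanced search are not supported, which makes rich analysis difficult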
http://blog.naver.com/mu1tong?Redirect=Log&logNo=20203387135
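
The daily limit of 25,000 searches per account constrains how large a co-occurrence study can be. This is a rough planning sketch, assuming one query per term plus one query per unordered pair of terms in each data source. The function names and the example of 300 terms across 5 sources are hypothetical.

```python
from math import ceil, comb

DAILY_QUOTA = 25_000  # searches per account per day, as stated above


def queries_needed(n_terms, n_sources):
    """One query per term plus one per unordered pair, for each data source."""
    return (n_terms + comb(n_terms, 2)) * n_sources


def days_needed(n_terms, n_sources, accounts=1):
    """Whole days required to stay within the daily quota."""
    return ceil(queries_needed(n_terms, n_sources) / (DAILY_QUOTA * accounts))


print(queries_needed(300, 5))
print(days_needed(300, 5))
print(days_needed(300, 5, accounts=3))
```

The pairwise term grows quadratically, so doubling the number of terms roughly quadruples the collection time.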

Web Visibility Research: The Case of National Assembly Member Kim Moo-sung (김무성)

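Search Query Validity Test
• On Naver
- News (뉴스): 김무성
- Images (이미지): 김무성
- Videos (동영상): 김무성
- Blogs (블로그): 김무성
- Web documents (웹문서): 김무성 < 김무성 의원
- Knowledge Encyclopedia (지식백과): 김무성
- Knowledge iN (지식iN): 김무성 < 김무성 의원
- Sites (사이트): 김무성
- Cafes (카페): 김무성
* Query validity for News Library (뉴스라이브러리) and OpenCast (오픈캐스트) to be checked later
* Scope of semantic network analysis: News, Blogs, Web documents, Cafes, Knowledge iN
- Both per-platform and combined analyses are performed
* Hyperlink network analysis: Sites is set as the scope of analysis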
Trend search for 김무성: PC
http://trend.naver.com/

Trend search for 김무성: Mobile

김무성: PC vs. Mobile comparison
[Chart: trend values for 김무성, PC vs. Mobile; y-axis 0 to 120; x-axis weekly periods from 20100628~20100704 to 20140217~20140223]

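Developing e-Research Tools with the Naver Open API
• David Stuart
- Student of M. Thelwall
- Collaboration during a visiting professorship at the Oxford Internet Institute, UK (2008-2009)
- Analyzed the Naver Open API with the help of Google Translate
• Development and refinement of WeboNaver
- 박한우 (Han Woo Park), 박세정, David Stuart, 이승욱 (2009). Understanding and applying WeboNaver, an API-based search program: web visibility analysis of members of the 18th National Assembly and association analysis of swine-flu-related words. Journal of the Korean Data
Analysis Society. 11(6) (B). 3427-3440.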

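WeboNaver-Related Papers
• 박한우 (Han Woo Park), 박세정, David Stuart, 이승욱 (2009). Understanding and applying WeboNaver, an API-based search program: web visibility analysis of members of the 18th National Assembly and association analysis of swine-flu-related words. Journal of the Korean Data Analysis Society. 11(6) (B). 3427-3440.
• 박한우 (Han Woo Park) (December 2010). Doing humanities and social science research in the age of e-science: focusing on Internet research methods. 사회과학연구 (Social Science Research). 30(2), 195-211.
• 임연수, 박한우 (Han Woo Park) (February 2010). A webometric analysis of blog campaigns in the October 28 by-elections. Journal of the Korean Data Analysis Society, 12(1) (B), 539-551.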
• Khan, G. F., & Park, H. W. @ (2011). Measuring the Triple Helix on the Web: Longitudinal Trends in the University-Industry-Government Relationship in Korea. Journal of the American Society for Information Science and Technology*. 62 (12), 2443-2455.
• Khan, G.F., Cho, S.E., & Park, H. W. @ (2012). A Comparison of the Daegu and Edinburgh Musical Industries: A Triple
Helix Approach. Scientometrics*. 90 (1), 85-99.
• Lim, Y. S., & Park, H. W. @ (2011). How Do Congressional Members Appear on the Web?: Tracking the Web Visibility of
South Korean Politicians. Government Information Quarterly*. 28 (4), 514-521.
• Lim, Y. S., & Park, H.W. @ (2013). The Structural Relationship between Politicians' Web Visibility and Political Finance
Networks: A Case Study of South Korea's National Assembly Members. New Media & Society*. 15(1), 93-108.
• Nam, Y., Lee, Y.-O., Park, H.W. @ (2013). Can web ecology provide a clearer understanding of people’s information
behavior during election campaigns?. Social Science Information*. 52(1), 91-109.
• Nam, Y., Lee, Y., & Park, H.W.@ (2014 Accepted). Measuring web ecology by Facebook, Twitter, Blog and online news:
2012 general election in South Korea. Quality & Quantity*. DOI: 10.1007/s11135-014-0016-9.
• Ozel, B., & Park, H. W. @ (2012). Examining Korean political figures using co-word analysis in agreement with facial expressions in posted self-images. COLLNET JOURNAL OF SCIENTOMETRICS & INFORMATION MANAGEMENT, 6 (1), 43-60.
• Ozel, B., & Park, H. W. @ (2012). Online Image Content Analysis of Political Figures: An Exploratory Study, Quality &
Quantity*. 46 (4), 1013–1024. DOI 10.1007/s11135-011-9445-x
• Sams, S., Lim, Y. S., & Park, H. W. @ (2011). E-research applications for tracking online socio-political capital in the Asia-
Pacific region. Asian Journal of Communication*. 21 (5), 450-466.
• Vergeer, M., Lim, Y. S., & Park, H. W. (2011). Mediated relations: New methods to study online social capital. Asian
Journal of Communication*. 21 (5), 430-449.

Interface
WeboNaver API (ver. 2012-03-26)
Save Data Type
-> Select how results are recorded
Data Sources
-> Select the categories to be searched
OutPut Format
-> Select the format in which the data are saved
Query File
-> Select the TXT file that contains the search terms
Naver API, Authentication Key
-> Enter your own key
Run Queries -> Run the search
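
The same inputs (a data source, a query, and credentials) can be sketched as an HTTP request. This is a hedged illustration, not WeboNaver's code. It assumes the present-day Naver Open API search endpoint, with `X-Naver-Client-Id` and `X-Naver-Client-Secret` headers, which differs from the single authentication key used by the 2012 tool. `build_request` and the placeholder credentials are hypothetical, and the request is only built, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: current Naver Open API search endpoint and source names.
API_BASE = "https://openapi.naver.com/v1/search"
DATA_SOURCES = ("news", "blog", "webkr", "cafearticle", "kin")


def build_request(query, source, client_id, client_secret):
    """Build (but do not send) a search request for one query and one source."""
    if source not in DATA_SOURCES:
        raise ValueError(f"unknown data source: {source}")
    # display=1 keeps the response small when only the hit count is needed.
    params = urlencode({"query": query, "display": 1})
    request = Request(f"{API_BASE}/{source}.json?{params}")
    request.add_header("X-Naver-Client-Id", client_id)
    request.add_header("X-Naver-Client-Secret", client_secret)
    return request


req = build_request("김무성", "blog", "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET")
print(req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it; that step is left out so the sketch runs without credentials or network access.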

Input file
The file must be saved with UTF-8 encoding
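
The encoding requirement matters because Korean search terms are corrupted if the file is read with the wrong codec. The sketch below reads a query file explicitly as UTF-8. It assumes one search term per line, which the slide does not state, and `load_queries` is a hypothetical helper.

```python
import os
import tempfile


def load_queries(path):
    """Read non-empty lines as search terms, decoding the file as UTF-8."""
    # "utf-8-sig" also accepts files that begin with a byte-order mark.
    with open(path, encoding="utf-8-sig") as handle:
        return [line.strip() for line in handle if line.strip()]


# Demonstration with a temporary file standing in for the real query file.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "queries.txt")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("김무성\n김무성 의원\n\n")
    queries = load_queries(path)

print(queries)
```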

Output file – Hit Count
When the option to collect only hit counts is ticked, the output shows
the number of web pages containing each search query
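
A hit count can be read from a search response and written out as one row per query. This is a minimal sketch, assuming a JSON response whose `total` field holds the number of matching documents. The sample body is made up, and the tab-separated layout is an assumption rather than WeboNaver's actual output format.

```python
import json

# Made-up response body for illustration; "total" is assumed to be the hit count.
sample_body = '{"total": 12345, "start": 1, "display": 1, "items": []}'


def hit_count(body):
    """Extract the hit count from a JSON response body."""
    return int(json.loads(body)["total"])


rows = [("김무성", hit_count(sample_body))]
for query, hits in rows:
    print(f"{query}\t{hits}")
```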

Results – Web Visibility (co-occurrence)

Output file
WCU WEBOMETRICS INSTITUTE: INVESTIGATING INTERNET-BASED POLITICS WITH E-RESEARCH TOOLS

Manipulate
WeboNaver API (ver. 2012-03-26)
Manipulate
Parsed Records
-> Load the text file in which the API URLs (or short URLs) were saved
Converted_Count
-> Shows the progress
Run the program

Webometric Analyst
Domains are the most commonly studied unit, so Domain is used as the
example here. Click 'Domain' to view the results.

Webometric Analyst
For the query '김연아', the domain dreamlive.tistory.com accounts for the
largest share of search results at 19.6%, followed by
www.youtube.com (16.6%) and blog.daum.net (2.5%).
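
A domain breakdown of this kind can be computed from a list of result URLs. This is a minimal sketch, not Webometric Analyst's implementation: the URLs are hypothetical, `domain_shares` is a name introduced for the example, and the resulting percentages are not the figures reported above.

```python
from collections import Counter
from urllib.parse import urlparse


def domain_shares(urls):
    """Return (domain, percentage of results) pairs, largest share first."""
    counts = Counter(urlparse(url).netloc.lower() for url in urls)
    total = sum(counts.values())
    return [(domain, 100 * n / total) for domain, n in counts.most_common()]


# Hypothetical result URLs for illustration only.
urls = [
    "http://blog-one.example/post/1",
    "http://blog-one.example/post/2",
    "http://video.example/watch?v=x",
    "http://blog-two.example/entry/1",
]
shares = domain_shares(urls)
for domain, share in shares:
    print(f"{domain}: {share:.1f}%")
```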

To get noticed on the Internet, post a smiling photo!

Detailed procedures for using WeboNaver
require a special lecture and workshop
• WeboNaver-related PowerPoint slides
• http://www.slideshare.net/hanpark/understanding-webonaver
• http://www.slideshare.net/WcuAtYeungNam/webo-naver-manual24-dec2009sj
• http://www.slideshare.net/goharferozkhan/sskbusanworkshop
• http://www.slideshare.net/goharferozkhan/standfordthconferencepresentation
• Unpublished internal materials

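Using Naver as an e-Research Tool: Applications and Issues
• Advantage of web visibility (WebVisibility) analysis
- It makes it easy to gauge the presence of the actors, events,
or issues that Internet users (the public) are paying attention to online
(Ackland, Gibson, Lusoli, & Ward, 2010; Gauvin, 2010).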

Using Naver as an e-Research Tool: Applications and Issues
• Park, H.W. (2012). How do social scientists use link
data from search engines to understand Internet-based
political and electoral communication?
Quality & Quantity, Volume 46, Number 2, 679-693,
DOI: 10.1007/s11135-010-9421-x
http://www.springerlink.com/content/m5922633j2235586/

• Our claim, however, is that a search engine does not need to be
exhaustive, reliable, and objective. The essential purpose of the search
engine lies in returning useful information in a short period of time,
not in providing comprehensive and unbiased coverage.
• As emphasized by Thelwall (2008), the search engine should be viewed
as an engineering product, not as a mathematical tool. Further,
Elgesem (2008, p.239) argued that “search engines are objective in the
sense that these engines try to be consistent with their own stated
policies.” With respect to coverage and consistency, problems may
occur due to the nature of the unstructured web. In other words, the
lack of reliability may not be caused by the search engine.
• Science organizes, structures, and evaluates information to develop a
systematic body of knowledge. It is up to the researcher to draw the
appropriate conclusions, using his or her expertise, about the
information gathered from the web using search engines. While search
engines collect data from the entire web, finding the truth from the
information is the business of academics (Caldas et al., 2008).

Prof. Han Woo PARK
CyberEmotions Research Center
Department of Media and Communication,
YeungNam University, Korea
hanpark@ynu.ac.kr
http://www.hanpark.net
Formerly,
World Class University Webometrics Institute
