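Paper survey @ research meeting, 2012.04.19:

N. Shibata, Y. Kajikawa, I. Sakata,
"Link Prediction in Citation Networks"
Journal of the American Society for Information Science and Technology, 63(1): 78-85, 2012

Hajime SASAKI (佐々木一)
Project Researcher, Policy Alternatives Research Institute
Collaborative Researcher, Innovation Policy Research Center, Institute of Engineering Innovation, School of Engineering
Cooperative Researcher, Ichiro Sakata Laboratory, Department of Technology Management for Innovation
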
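Overview and Conclusions
•  Overview: The problem of predicting citation relationships between academic papers is treated as a link prediction problem on a graph. Models were built for five academic fields, using 11 features and an SVM as the classifier.

•  Conclusion 1: Judging by the F-value, a performance measure for classifiers, the link prediction models performed well.

•  Conclusion 2: The features that are effective differ with the structure of the field. Different models therefore need to be applied to fields with different structures.
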
Introduction
•  The number of academic papers increases exponentially (Price, 1965), and each academic area becomes specialized and segmented.
•  The individual scientist has to focus on or specialize in only a few scientific subdomains to keep up with the growth of the domains, which means that researchers must focus on increasingly narrow domains.

Research Question: What factors affect the existence of links, using features intrinsic to the network itself? This is the link prediction problem, and solving it will help scholars to know which paper to cite and managers to identify future core papers.

•  In this article, the authors utilize textual, topological, and attribute features for link prediction, which are considered to influence citing behaviors.

Previous Research
•  Liben-Nowell and Kleinberg (2003): proposed a model for link prediction in large coauthorship networks.
•  Clauset, Moore, and Newman (2008): investigated the hierarchical structure of social networks to predict missing connections in partially known networks with high accuracy.
•  Popescul and Ungar (2003): proposed a new approach for Statistical Relational Learning to build link prediction models.
•  Hasan, Chaoji, Salem, and Zaki (2006): tested several supervised learning models (decision tree, k-nearest neighbor, multilayer perceptron, support vector machine [SVM], radial basis function [RBF] network) for link prediction.
•  Murata and Moriyasu (2008): applied the model of Liben-Nowell and Kleinberg to social networks of Question-Answering Bulletin Boards.
•  Caragea, Bahirwani, Aljandal, and Hsu (2009): proposed an algorithm to predict potential friendships based on a clustering approach in LiveJournal, a social network journal service with a focus on user interactions.
•  Lu, Jin, and Zhou (2009): presented a local path index to estimate the likelihood of the existence of a link between two nodes.
•  Seglen (1994): analysed the trends of papers in the journals with large impact factors.
•  Vinkler and Davidson (2002): indicated that papers in journals that are growing in terms of the number of papers are more likely to be cited.
•  Hwang, Wylie, Wei, and Liao (2010): proposed recommendation engines based on coauthorship networks.

Characteristics of This Study
•  1: The focus is on citation networks.

•  2: The authors apply SVMs as their supervised learning method (the classifier), as SVM is the best learner according to Hasan et al. (2006).

•  3: The authors use more comprehensive features optimized for citation networks.

Significance of This Study
•  Helps us decide more accurately whether to link, even with a huge amount of data.

•  Application: building a citation recommendation system for authors of scientific publications and patents.
     –  First, the reviewers of scientific papers can reduce the time needed to check whether the references in those papers are adequate.
     –  Second, well-organized link prediction can reveal how and why authors cite other scientific papers.
     –  Finally, link prediction can bond different research fields with similar topics but from different disciplines.

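Margin Maximization in SVM

There are infinitely many straight lines that separate the red circles from the blue circles. To choose the most suitable one among them, an SVM considers "margin maximization".

The margin is the distance between the separating line and the circle closest to that line. Because data are noisy, a larger margin seems better for avoiding wrong decisions.

In the example in the figure, the red line has a larger margin than the blue line, so the red line is considered the better separator. By finding the line with the largest margin, an SVM tries to classify unseen data correctly as well.

[Figure: f(x) = 0 is the separating hyperplane. The method for determining the parameters that satisfy this line is the Support Vector Machine.]
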
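Supplement: why is the margin 2/||w||?

[Figure: geometric sketch of the margin around the separating line.]

Setting b = 0:
d = 1/a
Margin: 2d = 2/a (here a plays the role of ||w||)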
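
The step from d = 1/a to the general expression 2/||w|| can be written out as below. This is a sketch under the usual scaling assumption, which the slide does not state explicitly: w and b are scaled so that the points closest to the hyperplane satisfy |w^T x + b| = 1. The symbol x_c (a closest point) is a name introduced here for illustration only.

```latex
% Assumption: canonical scaling, |w^T x_c + b| = 1 for a closest point x_c.
\begin{align}
f(x) &= w^\top x + b, \\
\operatorname{dist}\bigl(x_c, \{x : f(x) = 0\}\bigr)
  &= \frac{\lvert w^\top x_c + b \rvert}{\lVert w \rVert}
   = \frac{1}{\lVert w \rVert} = d, \\
\text{margin} &= 2d = \frac{2}{\lVert w \rVert}.
\end{align}
```

Maximizing the margin is therefore equivalent to minimizing ||w||, which is why a squared-norm term appears in the SVM objective.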
  
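When the Data Are Not Linearly Separable
Overfitting

[Figure: three panels labelled A, B and C.]

An overfitted model does not approach the true solution even if the number of samples (parameters) is increased.
This is handled by imposing constraints such as smoothness (regularization).
We want the prediction model to be simple.
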
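Excerpt from the paper (p. 79, truncated at the start):

"... and w = (w_1, w_2, ..., w_d) is the parameter vector of the same dimension that specifies the model. A positive value of w_j indicates that the j-th feature x_j positively contributes to the prediction, while a negative value contributes to it negatively. The sign function returns +1 when its argument is positive, and returns −1 otherwise. Given the data set X and Y, the SVM learning algorithm finds the optimal parameter w* that minimizes the following objective function:"

$$\sum_i \max\{1 - y_i h(x_i),\, 0\} + c\,\lVert w \rVert_2^2$$

Notes on the objective:
•  We want to minimize the sum of two terms: a term that asks for as few mistakes as possible with as much confidence as possible (the loss), and a term that asks for as simple a model as possible (the regularization term).
•  Loss function (first term): a penalty for wrong classifications.
•  Regularization term (second term): if the model adapts too closely to the training data, its performance on unseen data (generalization performance) drops instead. This term prevents overfitting.
•  We want to determine the parameters (weights) that minimize the whole expression.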
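
To make the two terms concrete, the objective shown on the slide can be evaluated directly. The following is a minimal sketch, assuming a linear scoring function h(x) = w · x without a bias term; the function names and the toy data are hypothetical and not from the paper. It follows the hinge-loss formula printed on the slide; the learner described elsewhere in the paper excerpt (L2-loss) may differ in the exact loss.

```python
def linear_score(w, x):
    """h(x) = w . x (assumed linear model without a bias term)."""
    return sum(wj * xj for wj, xj in zip(w, x))


def svm_objective(w, X, y, c):
    """Sum of hinge losses plus c * ||w||_2^2, as written on the slide."""
    # Loss term: zero when y_i * h(x_i) >= 1, grows linearly otherwise.
    hinge_total = sum(
        max(1.0 - yi * linear_score(w, xi), 0.0) for xi, yi in zip(X, y)
    )
    # Regularization term: penalizes large weights (complex models).
    l2_penalty = c * sum(wj * wj for wj in w)
    return hinge_total + l2_penalty


# Toy data (hypothetical): three 2-dimensional feature vectors with labels.
X = [[2.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
y = [1.0, 1.0, -1.0]
w = [1.0, 0.5]

value = svm_objective(w, X, y, c=0.1)
print(value)
```

Only the second sample falls inside the margin (y · h(x) = 0.5), so it is the only one that contributes to the loss term; the remaining part of the value comes from the regularization term.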
Features (11 in total)
Topological Features
•  (1) The number of common neighbours. (number of shared nodes)
•  (2) Link-based Jaccard coefficient. (proportion of shared nodes)
•  (3) Difference in betweenness centrality. (nodes with high betweenness centrality are cited)
•  (4) Difference in the number of in-links. (nodes with many links are cited)
•  (5) Is same cluster. (whether both papers are in the same cluster)
Semantic Features
•  (6) Cosine similarity of term frequency–inverse document frequency (tf–idf) vectors. (whether the papers share the same semantic characteristics)
Attribute Features
•  (7) Difference in publication year. (recent papers are cited often)
•  (8) The number of common authors. (number of shared authors)
•  (9) Is self citation. (same author)
•  (10) Is published in same journal. (whether both papers are in the same journal)
•  (11) Number of times "to" cited. (the rich get richer)
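
Features (1) and (2) can be illustrated on a tiny graph. This is a sketch under an assumption the slides do not spell out: a paper's neighbours are taken to be all papers linked to it in either direction, stored as a dictionary of sets. The paper identifiers and helper names are hypothetical.

```python
def common_neighbours(adj, a, b):
    """Feature (1): number of nodes linked to both a and b."""
    return len(adj[a] & adj[b])


def jaccard_coefficient(adj, a, b):
    """Feature (2): shared neighbours divided by all neighbours of a or b."""
    union = adj[a] | adj[b]
    if not union:
        return 0.0  # neither paper has any neighbour
    return len(adj[a] & adj[b]) / len(union)


# Hypothetical toy citation graph, treated as undirected.
adj = {
    "p1": {"p3", "p4"},
    "p2": {"p3", "p4", "p5"},
    "p3": {"p1", "p2"},
    "p4": {"p1", "p2"},
    "p5": {"p2"},
}

n_common = common_neighbours(adj, "p1", "p2")
jaccard = jaccard_coefficient(adj, "p1", "p2")
print(n_common, jaccard)
```

Here p1 and p2 share two neighbours (p3 and p4) out of three distinct neighbours in total, so the Jaccard coefficient normalizes the raw count by how well connected the two papers are.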
Dataset

TABLE 2.    Datasets of citation networks.

Dataset                      Published through   No. of papers   No. of citations   Query
A Innovation                 2009                20,564          106,619            innovation*
B Nano Bio                   2009                33,830          175,875            nano* and bio*
C Organic LED                2009                19,486          196,123            ((organic* or polymer*) and (electroluminescen* or electro-luminescen* or electro luminescen* or light emitting or LED*)) or OLED*
D Solar Cells                2008                18,587          111,051            solar cell*
E Secondary Batteries (*)    2008                20,430          145,008            ((secondary or storage or rechargeable or reserve) and cell*) or batter*



Data and Experiment

In this article, five large-scale citation datasets, Innovation, Nano Bio, Organic LED, Solar Cells, and Secondary Batteries, are collected as shown in Table 2. We searched databases of academic papers and patents using the same query for each domain. The databases of academic papers used are the Science Citation Index Expanded (SCI-EXPANDED), the Social Sciences Citation Index (SSCI), and the Arts & Humanities Citation Index (A&HCI) compiled by the Institute for Scientific Information (ISI). After collecting data, we extracted the papers and citations in the largest-graph component to [excerpt truncated]

Cross Validation
   •  1. These existing citations are divided into five groups (positive instances, namely, P[1] to P[5]).
   •  2. We randomly created the same number of pairs where citations did not exist (negative instances, namely, N[1] to N[5]).
   •  3. In the first experiment, P[2] to P[5] and N[2] to N[5] were used as the training data and P[1] and N[1] were used as the test data.
   •  4. We repeated step 3 five times in total with a different choice of answer set.

[Figure: five-fold cross validation. The data with citations (P1 to P5) and the data without citations (N1 to N5) are each split into test data and training data, for rounds 1 to 5.]
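
The four steps above can be sketched as a small generator. This is an illustrative sketch only: the fold assignment by shuffling, the seeds, the +1/−1 labels and the toy paper pairs are assumptions made here, not details taken from the paper.

```python
import random


def make_folds(items, k, seed):
    """Shuffle the items and deal them into k groups of near-equal size."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def cross_validation_rounds(positives, negatives, k=5, seed=0):
    """Yield (train, test) lists of (pair, label) for each of the k rounds."""
    p_folds = make_folds(positives, k, seed)       # P[1] .. P[k]
    n_folds = make_folds(negatives, k, seed + 1)   # N[1] .. N[k]
    for test_idx in range(k):
        # One positive group and one negative group form the test data.
        test = [(pair, 1) for pair in p_folds[test_idx]]
        test += [(pair, -1) for pair in n_folds[test_idx]]
        # All remaining groups form the training data.
        train = []
        for i in range(k):
            if i != test_idx:
                train += [(pair, 1) for pair in p_folds[i]]
                train += [(pair, -1) for pair in n_folds[i]]
        yield train, test


# Hypothetical toy data: 10 cited pairs and 10 non-cited pairs.
positives = [("p%d" % i, "q%d" % i) for i in range(10)]
negatives = [("p%d" % i, "r%d" % i) for i in range(10)]

rounds = list(cross_validation_rounds(positives, negatives))
for train, test in rounds:
    print(len(train), len(test))
```

Each pair appears in the test data of exactly one round, so every prediction is evaluated on data the model was not trained on.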
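Evaluation Metrics: Precision, Recall, F-value

Confusion matrix

                              True result
                              Positive    Negative
Prediction    Positive        TP          FP
              Negative        FN          TN

Precision:
$$\text{Precision} = \frac{TP}{TP + FP}$$

Recall:
$$\text{Recall} = \frac{TP}{TP + FN}$$

F-value (the harmonic mean of precision and recall):
$$\text{F-value} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$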
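
As a worked example of the three formulas, the sketch below computes them from confusion-matrix counts. The counts are made up for illustration and are not results from the paper; returning 0.0 when a denominator is zero is a convention chosen here.

```python
def precision_recall_f(tp, fp, fn):
    """Return (precision, recall, F-value) from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    # Harmonic mean of precision and recall.
    f_value = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_value


# Hypothetical counts: 8 true positives, 2 false positives, 4 false negatives.
precision, recall, f_value = precision_recall_f(tp=8, fp=2, fn=4)
print(precision, recall, f_value)
```

Because the F-value is a harmonic mean, it stays close to the smaller of the two inputs, so a model cannot score well by maximizing precision or recall alone.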
Result

TABLE 3.      Prediction results.

Dataset                    Precision   Recall   F1
A Innovation               0.75        0.91     0.82
B Nano Bio                 0.83        0.76     0.79
C Organic LED              0.79        0.71     0.74
D Solar Cells              0.76        0.72     0.74
E Secondary Batteries      0.80        0.77     0.77

F-value: 0.74 to 0.82.
Based on the results, we obtained the learning model on our training data.

Paper excerpt visible on the slide (truncated): "As a learner, we employed L2-regularized and L2-loss [...]"

Weights of Features

Positive contribution: >= 0.5
Negative contribution: <= -0.5
No contribution: -0.5 to 0.5

TABLE 4.     Weights of features.

Features                                     A. Innovation   B. Nano Bio   C. Organic LED   D. Solar Cells   E. Secondary Batteries
1. No. common neighbors                       0.566           0.889          0.520            0.683             0.987
2. Link-based Jaccard coefficient             1.354           2.198         −6.150           −0.703            −4.742
3. Difference in betweenness centrality      −1.446          −6.107         −2.175           −5.468           −10.049
4. Difference in the number of in-links       0.052           0.033          0.034            0.045             0.047
5. Is same cluster                            0.018           0.086         −0.308           −0.160            −0.062
6. Cosine similarity of tf-idf vectors      −19.897         −17.817        −15.527            1.624             1.519
7. Difference in publication year             0.018           0.046          0.032            0.009             0.008
8. The number of common authors              −0.112           0.476          0.403            0.152             0.036
9. Is self-citation                           1.975           0.756          0.605            0.865             0.918
10. Is published in same journal              0.726           0.614          0.198            0.027            −0.108
11. Number of times "to" cited               −0.018          −0.019         −0.015           −0.031            −0.033
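
The reading rule in the legend (a threshold of 0.5 in absolute value) can be applied mechanically to any row of Table 4. The sketch below does this for the link-based Jaccard coefficient row; the weights are copied from the table, while the function name and label strings are hypothetical.

```python
def contribution(weight, threshold=0.5):
    """Classify a feature weight using the legend's threshold of 0.5."""
    if weight >= threshold:
        return "positive"
    if weight <= -threshold:
        return "negative"
    return "none"


# Weights of feature (2), link-based Jaccard coefficient, from Table 4.
jaccard_weights = {
    "A. Innovation": 1.354,
    "B. Nano Bio": 2.198,
    "C. Organic LED": -6.150,
    "D. Solar Cells": -0.703,
    "E. Secondary Batteries": -4.742,
}

labels = {name: contribution(w) for name, w in jaccard_weights.items()}
for name, label in labels.items():
    print(name, label)
```

For this feature the sign flips between datasets A and B on one side and C, D and E on the other, which is the pattern the discussion of feature (2) relies on.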


・Especially, (2), (3) and (6) largely affected the predictions of citations.
・(2): (A) and (B) comprise multiple research fields and most citations are within each research field, so that papers often cite locally. (C), (D) and (E) are each contained in a research field with a single issue.
・(3): it is [...] that core nodes and peripheral nodes, which have different values of betweenness centrality, are linked in the citation networks.
・(6): same as (3)
・(1): the more common neighbours two papers have, the more related they are.
・(9): because authors tend to cite their own papers.
・(10): same as (3), (6)

Paper excerpts visible behind the slide text (two columns, both truncated):

"[...] the existence of a citation with a probability from 74% to 82%, given a pair of papers and the entire citation network. Especially three features, (2) link-based Jaccard coefficient, (3) difference in betweenness centrality, and (6) cosine similarity of tf–idf vectors, largely affected the predictions of citations. Link-based Jaccard coefficient contributed positively in the cases of (A) Innovations (weight: 1.354) and (B) Nano Bio (2.198) but negatively in the cases of (C) Organic LED (−6.150), (D) Solar Cells (−0.703) and (E) Secondary Batteries (−4.742). These results indicate that the former research areas, such as (A) Innovations and (B) Nano Bio, [...]"

"[...] of common neighbours positively affected all cases, because the more common neighbours two papers have, the more related they are. That the self-citation result had a positive effect is reasonable because authors tend to cite their own papers. The feature of is published in the same journal affected only (A) Innovations (0.726) and (B) Nano Bio (0.614) positively. Similar to the result of link-based Jaccard coefficient, papers tend to cite in each research field in the case of research fields with multiple issues. In summary, different models are required for different types of research areas: research fields with a single issue or research fields with multiple issues. In the case [...]"

Summary
•  It is difficult to build a universal learner for link prediction, and we need to build learners based on the characteristics of each research domain.

•  Different models are required for different types of research areas: research fields with a single issue or research fields with multiple issues.
     –  The first one is the research field with multiple issues such as (A) Innovations and (B) Nano Bio.
     –  The second one is a simple research field type with commonly understood targets of research and development such as (C) Organic LED, (D) Solar Cells and (E) Secondary Batteries.

Más contenido relacionado

La actualidad más candente

Semi-supervised learning approach using modified self-training algorithm to c...
Semi-supervised learning approach using modified self-training algorithm to c...Semi-supervised learning approach using modified self-training algorithm to c...
Semi-supervised learning approach using modified self-training algorithm to c...IJECEIAES
 
A linear-Discriminant-Analysis-Based Approach to Enhance the Performance of F...
A linear-Discriminant-Analysis-Based Approach to Enhance the Performance of F...A linear-Discriminant-Analysis-Based Approach to Enhance the Performance of F...
A linear-Discriminant-Analysis-Based Approach to Enhance the Performance of F...CSCJournals
 
184816386 x mining
184816386 x mining184816386 x mining
184816386 x mining496573
 
Reflectivity Parameter Extraction from RADAR Images Using Back Propagation Al...
Reflectivity Parameter Extraction from RADAR Images Using Back Propagation Al...Reflectivity Parameter Extraction from RADAR Images Using Back Propagation Al...
Reflectivity Parameter Extraction from RADAR Images Using Back Propagation Al...IJECEIAES
 
Bhadale group of companies ai neural networks and algorithms catalogue
Bhadale group of companies ai neural networks and algorithms catalogueBhadale group of companies ai neural networks and algorithms catalogue
Bhadale group of companies ai neural networks and algorithms catalogueVijayananda Mohire
 
Task Adaptive Neural Network Search with Meta-Contrastive Learning
Task Adaptive Neural Network Search with Meta-Contrastive LearningTask Adaptive Neural Network Search with Meta-Contrastive Learning
Task Adaptive Neural Network Search with Meta-Contrastive LearningMLAI2
 
CCIA'2008: On the dimensions of data complexity through synthetic data sets
CCIA'2008: On the dimensions of data complexity through synthetic data setsCCIA'2008: On the dimensions of data complexity through synthetic data sets
CCIA'2008: On the dimensions of data complexity through synthetic data setsAlbert Orriols-Puig
 
Feature Subset Selection for High Dimensional Data using Clustering Techniques
Feature Subset Selection for High Dimensional Data using Clustering TechniquesFeature Subset Selection for High Dimensional Data using Clustering Techniques
Feature Subset Selection for High Dimensional Data using Clustering TechniquesIRJET Journal
 
X-TREPAN : A Multi Class Regression and Adapted Extraction of Comprehensible ...
X-TREPAN : A Multi Class Regression and Adapted Extraction of Comprehensible ...X-TREPAN : A Multi Class Regression and Adapted Extraction of Comprehensible ...
X-TREPAN : A Multi Class Regression and Adapted Extraction of Comprehensible ...csandit
 
Vensoft IEEE 2014 2015 Matlab Projects tiltle Image Processing Wireless Signa...
Vensoft IEEE 2014 2015 Matlab Projects tiltle Image Processing Wireless Signa...Vensoft IEEE 2014 2015 Matlab Projects tiltle Image Processing Wireless Signa...
Vensoft IEEE 2014 2015 Matlab Projects tiltle Image Processing Wireless Signa...Vensoft Technologies
 
Learning to Balance: Bayesian Meta-Learning for Imbalanced and Out-of-distrib...
Learning to Balance: Bayesian Meta-Learning for Imbalanced and Out-of-distrib...Learning to Balance: Bayesian Meta-Learning for Imbalanced and Out-of-distrib...
Learning to Balance: Bayesian Meta-Learning for Imbalanced and Out-of-distrib...MLAI2
 
IRJET - Object Detection using Deep Learning with OpenCV and Python
IRJET - Object Detection using Deep Learning with OpenCV and PythonIRJET - Object Detection using Deep Learning with OpenCV and Python
IRJET - Object Detection using Deep Learning with OpenCV and PythonIRJET Journal
 
20141003.journal club
20141003.journal club20141003.journal club
20141003.journal clubHayaru SHOUNO
 

La actualidad más candente (16)

Semi-supervised learning approach using modified self-training algorithm to c...
Semi-supervised learning approach using modified self-training algorithm to c...Semi-supervised learning approach using modified self-training algorithm to c...
Semi-supervised learning approach using modified self-training algorithm to c...
 
A linear-Discriminant-Analysis-Based Approach to Enhance the Performance of F...
A linear-Discriminant-Analysis-Based Approach to Enhance the Performance of F...A linear-Discriminant-Analysis-Based Approach to Enhance the Performance of F...
A linear-Discriminant-Analysis-Based Approach to Enhance the Performance of F...
 
184816386 x mining
184816386 x mining184816386 x mining
184816386 x mining
 
Reflectivity Parameter Extraction from RADAR Images Using Back Propagation Al...
Reflectivity Parameter Extraction from RADAR Images Using Back Propagation Al...Reflectivity Parameter Extraction from RADAR Images Using Back Propagation Al...
Reflectivity Parameter Extraction from RADAR Images Using Back Propagation Al...
 
Bhadale group of companies ai neural networks and algorithms catalogue
Bhadale group of companies ai neural networks and algorithms catalogueBhadale group of companies ai neural networks and algorithms catalogue
Bhadale group of companies ai neural networks and algorithms catalogue
 
Task Adaptive Neural Network Search with Meta-Contrastive Learning
Task Adaptive Neural Network Search with Meta-Contrastive LearningTask Adaptive Neural Network Search with Meta-Contrastive Learning
Task Adaptive Neural Network Search with Meta-Contrastive Learning
 
CCIA'2008: On the dimensions of data complexity through synthetic data sets
CCIA'2008: On the dimensions of data complexity through synthetic data setsCCIA'2008: On the dimensions of data complexity through synthetic data sets
CCIA'2008: On the dimensions of data complexity through synthetic data sets
 
Bc34333339
Bc34333339Bc34333339
Bc34333339
 
Feature Subset Selection for High Dimensional Data using Clustering Techniques
Feature Subset Selection for High Dimensional Data using Clustering TechniquesFeature Subset Selection for High Dimensional Data using Clustering Techniques
Feature Subset Selection for High Dimensional Data using Clustering Techniques
 
C42021115
C42021115C42021115
C42021115
 
Ds2 statistics
Ds2 statisticsDs2 statistics
Ds2 statistics
 
X-TREPAN : A Multi Class Regression and Adapted Extraction of Comprehensible ...
X-TREPAN : A Multi Class Regression and Adapted Extraction of Comprehensible ...X-TREPAN : A Multi Class Regression and Adapted Extraction of Comprehensible ...
X-TREPAN : A Multi Class Regression and Adapted Extraction of Comprehensible ...
 
Vensoft IEEE 2014 2015 Matlab Projects tiltle Image Processing Wireless Signa...
Vensoft IEEE 2014 2015 Matlab Projects tiltle Image Processing Wireless Signa...Vensoft IEEE 2014 2015 Matlab Projects tiltle Image Processing Wireless Signa...
Vensoft IEEE 2014 2015 Matlab Projects tiltle Image Processing Wireless Signa...
 
Learning to Balance: Bayesian Meta-Learning for Imbalanced and Out-of-distrib...
Learning to Balance: Bayesian Meta-Learning for Imbalanced and Out-of-distrib...Learning to Balance: Bayesian Meta-Learning for Imbalanced and Out-of-distrib...
Learning to Balance: Bayesian Meta-Learning for Imbalanced and Out-of-distrib...
 
IRJET - Object Detection using Deep Learning with OpenCV and Python
IRJET - Object Detection using Deep Learning with OpenCV and PythonIRJET - Object Detection using Deep Learning with OpenCV and Python
IRJET - Object Detection using Deep Learning with OpenCV and Python
 
20141003.journal club
20141003.journal club20141003.journal club
20141003.journal club
 

Similar a 論文サーベイ(Sasaki)

Advanced machine learning for metabolite identification
Advanced machine learning for metabolite identificationAdvanced machine learning for metabolite identification
Advanced machine learning for metabolite identificationDai-Hai Nguyen
 
NS-CUK Seminar: H.B.Kim, Review on "metapath2vec: Scalable representation le...
NS-CUK Seminar: H.B.Kim,  Review on "metapath2vec: Scalable representation le...NS-CUK Seminar: H.B.Kim,  Review on "metapath2vec: Scalable representation le...
NS-CUK Seminar: H.B.Kim, Review on "metapath2vec: Scalable representation le...ssuser4b1f48
 
Indexing data on the web a comparison of schema level indices for data search
Indexing data on the web a comparison of schema level indices for data searchIndexing data on the web a comparison of schema level indices for data search
Indexing data on the web a comparison of schema level indices for data searchTill Blume
 
240325_JW_labseminar[node2vec: Scalable Feature Learning for Networks].pptx
240325_JW_labseminar[node2vec: Scalable Feature Learning for Networks].pptx240325_JW_labseminar[node2vec: Scalable Feature Learning for Networks].pptx
240325_JW_labseminar[node2vec: Scalable Feature Learning for Networks].pptxthanhdowork
 
Garbage Classification Using Deep Learning Techniques
Garbage Classification Using Deep Learning TechniquesGarbage Classification Using Deep Learning Techniques
Garbage Classification Using Deep Learning TechniquesIRJET Journal
 
Overview of DuraMat software tool development
Overview of DuraMat software tool developmentOverview of DuraMat software tool development
Overview of DuraMat software tool developmentAnubhav Jain
 
Predicting Molecular Properties
Predicting Molecular PropertiesPredicting Molecular Properties
Predicting Molecular PropertiesYassin Youssfi
 
AN IMPROVED METHOD FOR IDENTIFYING WELL-TEST INTERPRETATION MODEL BASED ON AG...
AN IMPROVED METHOD FOR IDENTIFYING WELL-TEST INTERPRETATION MODEL BASED ON AG...AN IMPROVED METHOD FOR IDENTIFYING WELL-TEST INTERPRETATION MODEL BASED ON AG...
Paper Survey (Sasaki)

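  • 1. Paper survey @ research meeting, 2012.04.19:
      N. Shibata, Y. Kajikawa, I. Sakata, "Link Prediction in Citation Networks," Journal of the American Society for Information Science and Technology, 63(1): 78-85, 2012
      Presenter: Hajime SASAKI (佐々木一)
      Project Researcher, Policy Alternatives Research Institute
      Cooperative Researcher, Innovation Policy Research Center, Institute of Engineering Innovation, School of Engineering
      Collaborating Researcher, Ichiro Sakata Laboratory, Department of Technology Management for Innovation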
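  • 2. Overview and conclusions
      • Overview: The problem of predicting citation relationships among academic papers is treated as a link prediction problem on a graph. For five academic fields, 11 features were applied and models were built with an SVM as the classifier.
      • Conclusion 1: Judging by the F-value, a performance measure of the classifier, good link prediction models were obtained.
      • Conclusion 2: The features that are effective differ with the structure of the field. Different models therefore need to be applied to fields with different structures.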
  • 3. Introduction
      • As the number of academic papers increases exponentially (Price, 1965), each academic area becomes specialized and segmented.
      • The individual scientist has to focus on or specialize in only a few scientific subdomains to keep up with the growth of the domains, which means that researchers must focus on increasingly narrow domains.
      Research Question: Which factors, among features intrinsic to the network itself, affect the existence of links? Answering this is the link prediction problem, which will help scholars know which papers to cite and managers identify future core papers.
      • In this article, the authors utilize textual, topological, and attribute features for link prediction, which are considered to influence citing behaviors.
  • 4. Previous research
      • Liben-Nowell and Kleinberg (2003): proposed a model for link prediction in large coauthorship networks.
      • Clauset, Moore, and Newman (2008): investigated the hierarchical structure of social networks to predict missing connections in partially known networks with high accuracy.
      • Popescul and Ungar (2003): proposed a new approach for Statistical Relational Learning to build link prediction models.
      • Hasan, Chaoji, Salem, and Zaki (2006): tested several supervised learning models (decision tree, k-nearest neighbor, multilayer perceptron, support vector machine [SVM], radial basis function [RBF] network) for link prediction.
      • Murata and Moriyasu (2008): applied the model of Liben-Nowell and Kleinberg to social networks of Question-Answering Bulletin Boards.
      • Caragea, Bahirwani, Aljandal, and Hsu (2009): proposed an algorithm to predict potential friendships based on a clustering approach in LiveJournal, a social network journal service with a focus on user interactions.
      • Lu, Jin, and Zhou (2009): presented a local path index to estimate the likelihood of the existence of a link between two nodes.
      • Seglen (1994): analysed the trends of papers in the journals with large impact factors.
      • Vinkler and Davidson (2002): indicated that papers in journals whose number of papers is growing are more likely to be cited.
      • Hwang, Wylie, Wei, and Liao (2010): proposed recommendation engines based on coauthorship networks.
  • 5. Features of this study
      • 1: The focus is on citation networks.
      • 2: The authors apply SVMs as the supervised learning method (the classifier), as SVM is the best learner according to Hasan et al. (2006).
      • 3: The authors use more comprehensive features optimized for citation networks.
  • 6. Significance of this study
      • Helps decide more accurately whether a link should exist, even with a huge amount of data.
      • Application: building a citation recommendation system for authors of scientific publications and patents.
        - First, reviewers of scientific papers can reduce the time needed to check whether the references in those papers are adequate.
        - Second, well-organized link prediction can reveal how and why authors cite other scientific papers.
        - Finally, link prediction can bond research fields that treat similar topics but belong to different disciplines.
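  • 7. Margin maximization in SVMs
      Method: Support Vector Machine. f(x) = 0 is the separating hyperplane; the task is to determine the parameters of this line.
      There are infinitely many straight lines that separate the red circles from the blue circles. To choose the most suitable one among them, an SVM considers "margin maximization".
      The margin is the distance between the separating line and the circle closest to it. Because data are scattered, a larger margin seems better for avoiding wrong decisions.
      In the example in the figure, the red line has a larger margin than the blue line, so the red line is considered the better separator. By finding the line with the largest margin, an SVM tries to classify unseen data correctly as well.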
  • 8. Supplement: why is the margin 2/||ω||?
      Setting b = 0 gives d = 1/a, so the margin is 2d = 2/a.
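The figure for this slide is not available, so the following is only a sketch of the usual argument, under the assumption that the slide's a stands for ||ω|| and that the closest points on each side are scaled so that |f(x)| = 1:

```latex
% Separating hyperplane: f(x) = \omega^\top x + b = 0.
% Assumed canonical scaling: the closest points x_\pm satisfy
%   \omega^\top x_\pm + b = \pm 1 .
% Distance from a point x_0 to the hyperplane:
\[
  d(x_0) = \frac{\lvert \omega^\top x_0 + b \rvert}{\lVert \omega \rVert}
  \quad\Longrightarrow\quad
  d = d(x_\pm) = \frac{1}{\lVert \omega \rVert}.
\]
% The margin spans both sides of the hyperplane:
\[
  \text{margin} = 2d = \frac{2}{\lVert \omega \rVert}.
\]
```

Under this reading, maximizing the margin is the same as minimizing ||ω||, which is why a norm term appears in the SVM objective.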
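  • 10. Overfitting
      (Figure: panels A, B, and C.)
      • When a model overfits, adding more samples (parameters) does not bring it closer to the true solution.
      • This is handled by imposing constraints such as smoothness (regularization).
      • The predictive model should be kept simple.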
  • 11. Excerpt from the paper (January 2012, p. 79, DOI: 10.1002/asi):
      "... and w = (w_1, w_2, ..., w_d) is the parameter vector of the same dimension that specifies the model. A positive value of w_j indicates that the j-th feature x_j positively contributes to the prediction, while a negative value contributes to it negatively. The sign function returns +1 when its argument is positive, and returns −1 otherwise. Given the data set X and Y, the SVM learning algorithm finds the optimal parameter w* that minimizes the following objective function:"
      $$\sum_i \max\{1 - y_i h(x_i),\, 0\} + c\,\lVert w \rVert_2^2$$
      • The goal is to minimize the sum of two terms: a loss term, which asks for as few errors as possible with as much confidence as possible, and a regularization term, which asks for as simple a model as possible.
      • Loss function: a penalty incurred for wrong classifications.
      • Regularization term: prevents overfitting. Adapting too closely to the training data lowers performance on unseen data (generalization performance).
      • The parameters (weights) are chosen to minimize the whole expression.
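As a rough illustration of the objective on this slide, the sketch below evaluates the hinge loss plus the squared-norm penalty for a given weight vector. It assumes a linear scorer h(x) = w · x with no bias term (the excerpt starts mid-sentence, so the exact definition of h is not visible here); the function name `svm_objective` and the toy data are hypothetical.

```python
def svm_objective(w, xs, ys, c):
    """Sum of hinge losses plus c * ||w||_2^2, assuming h(x) = w . x."""
    def h(x):
        return sum(wj * xj for wj, xj in zip(w, x))

    # Loss term: zero when y * h(x) >= 1, growing linearly otherwise.
    loss = sum(max(1.0 - y * h(x), 0.0) for x, y in zip(xs, ys))
    # Regularization term: squared L2 norm of the weights, scaled by c.
    reg = c * sum(wj * wj for wj in w)
    return loss + reg


# Toy data (hypothetical): two samples are classified with margin >= 1,
# the third one is on the wrong side and is penalized.
xs = [(2.0, 0.0), (0.0, 2.0), (0.5, 0.0)]
ys = [1, -1, -1]
w = (1.0, -1.0)
value = svm_objective(w, xs, ys, c=0.1)
print(value)
```

Only the third sample contributes to the loss here; the other two already satisfy y · h(x) ≥ 1, so only the penalty on w remains for them.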
  • 12. Features (11 in total)
      Topological Features
      • (1) The number of common neighbours (number of shared nodes).
      • (2) Link-based Jaccard coefficient (proportion of shared nodes).
      • (3) Difference in betweenness centrality (nodes with high betweenness centrality get cited).
      • (4) Difference in the number of in-links (nodes with many links get cited).
      • (5) Is same cluster (whether both papers are in the same cluster).
      Semantic Features
      • (6) Cosine similarity of term frequency–inverse document frequency (tf–idf) vectors (whether the papers share semantic characteristics).
      Attribute Features
      • (7) Difference in publication year (recent papers are cited often).
      • (8) The number of common authors.
      • (9) Is self-citation (same author).
      • (10) Is published in same journal.
      • (11) Number of times "to" cited (the rich get richer).
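Features (1) and (2) can be sketched on a toy graph as follows. This is only an illustration: it treats citation links as undirected when collecting neighbours, which is an assumption (the slides do not say how the paper defines neighbours), and the paper IDs and helper names are hypothetical.

```python
def build_neighbours(edges):
    """Map each node to the set of nodes it is linked to (links treated as undirected)."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def common_neighbours(graph, a, b):
    """Feature (1): number of nodes linked to both a and b."""
    return len(graph[a] & graph[b])


def jaccard(graph, a, b):
    """Feature (2): shared neighbours divided by all neighbours of a or b."""
    union = graph[a] | graph[b]
    if not union:
        return 0.0
    return len(graph[a] & graph[b]) / len(union)


# Toy citation links (hypothetical paper IDs).
edges = [("p1", "p3"), ("p1", "p4"), ("p2", "p3"), ("p2", "p4"), ("p2", "p5")]
graph = build_neighbours(edges)
print(common_neighbours(graph, "p1", "p2"), jaccard(graph, "p1", "p2"))
```

Here p1 and p2 share two of the three papers that either of them links to, so the Jaccard coefficient normalizes the raw count of feature (1) by neighbourhood size.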
  • 13. Dataset
      TABLE 2. Datasets of citation networks.
      Dataset | Query | Published through | No. of papers | No. of citations
      A Innovation | innovation* | 2009 | 20,564 | 106,619
      B Nano Bio | nano* and bio* | 2009 | 33,830 | 175,875
      C Organic LED | ((organic* or polymer*) and (electroluminescen* or electro-luminescen* or electro luminescen* or light emitting or LED*)) or OLED* | 2009 | 19,486 | 196,123
      D Solar Cells | solar cell* | 2008 | 18,587 | 111,051
      E Secondary Batteries (*) | ((secondary or storage or rechargeable or reserve) and cell*) or batter* | 2008 | 20,430 | 145,008

      Data and Experiment (excerpt from the paper):
      "In this article, five large-scale citation datasets, Innovation, Nano Bio, Organic LED, Solar Cells, and Secondary Batteries, are collected as shown in Table 2. We searched databases of academic papers and patents using the same query for each domain. The databases of academic papers used are the Science Citation Index Expanded (SCI-EXPANDED), the Social Sciences Citation Index (SSCI), and the Arts & Humanities Citation Index (A&HCI) compiled by the Institute for Scientific Information (ISI). After collecting data, we extracted the papers and citations in the largest-graph component to ..."
      "4. We repeated step 3 five times in total with different choice of answer set."

      TABLE 3. Prediction results.
      Dataset | Precision | Recall | F1
      A Innovation | 0.75 | 0.91 | 0.82
      B Nano Bio | 0.83 | 0.76 | 0.79
      C Organic LED | 0.79 | 0.71 | 0.74
      D Solar Cells | 0.76 | 0.72 | 0.74
      E Secondary Batteries | 0.80 | 0.77 | 0.77
  • 14. Cross Validation
      • 1. The existing citations are divided into five groups (positive instances, namely, P[1] to P[5]).
      • 2. The same number of pairs where citations did not exist are created at random (negative instances, namely, N[1] to N[5]).
      • 3. In the first experiment, P[2] to P[5] and N[2] to N[5] were used as the training data and P[1] and N[1] were used as the test data.
      • 4. Step 3 was repeated five times in total with a different choice of answer set.
      (Figure: rounds 1 to 5, each showing the groups of pairs with citations, P1 to P5, and without citations, N1 to N5, split into test data and training data.)
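The four steps above can be sketched as follows. This is a minimal illustration, not the authors' code: it takes the negative pairs as given rather than sampling them from the network, and the function name `make_rounds`, the seed, and the toy pairs are hypothetical.

```python
import random


def make_rounds(positives, negatives, k=5, seed=0):
    """Split positive and negative pairs into k groups and build k train/test rounds."""
    rng = random.Random(seed)
    pos, neg = list(positives), list(negatives)
    rng.shuffle(pos)
    rng.shuffle(neg)
    # P[1]..P[k] and N[1]..N[k]: every k-th shuffled item goes to the same group.
    p_groups = [pos[i::k] for i in range(k)]
    n_groups = [neg[i::k] for i in range(k)]

    rounds = []
    for t in range(k):
        # Group t is held out as test data; label +1 = citation, -1 = no citation.
        test = [(pair, +1) for pair in p_groups[t]]
        test += [(pair, -1) for pair in n_groups[t]]
        train = []
        for i in range(k):
            if i != t:
                train += [(pair, +1) for pair in p_groups[i]]
                train += [(pair, -1) for pair in n_groups[i]]
        rounds.append((train, test))
    return rounds


# Toy pairs (hypothetical): 10 cited pairs and 10 non-cited pairs.
positives = [("citing%d" % i, "cited%d" % i) for i in range(10)]
negatives = [("citing%d" % i, "cited%d" % (i + 1)) for i in range(10)]
rounds = make_rounds(positives, negatives)
for train, test in rounds:
    print(len(train), len(test))
```

Each pair appears in the test data of exactly one round, so the five scores together cover every labelled pair once.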
  • 15. Evaluation measures: Precision, Recall, F-value
      Confusion matrix
      | True result: Positive | True result: Negative
      Prediction: Positive | TP | FP
      Prediction: Negative | FN | TN

      Precision = TP / (TP + FP)
      Recall = TP / (TP + FN)
      F-value = 2 * Precision * Recall / (Precision + Recall), the harmonic mean of precision and recall
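The three measures can be computed directly from the confusion-matrix counts. The sketch below is illustrative only; the function name and the counts in the usage line are hypothetical, not values from the paper.

```python
def precision_recall_f(tp, fp, fn):
    """Return (precision, recall, F-value) from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    # F-value: harmonic mean of precision and recall.
    f_value = 2 * precision * recall / (precision + recall)
    return precision, recall, f_value


# Hypothetical counts: 8 correct positive predictions, 2 false alarms, 4 misses.
p, r, f = precision_recall_f(tp=8, fp=2, fn=4)
print(round(p, 3), round(r, 3), round(f, 3))
```

True negatives do not enter any of the three formulas, which is why the measures stay informative even when non-cited pairs vastly outnumber cited ones.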
  • 16. Result
      TABLE 3. Prediction results.
      Dataset | Precision | Recall | F1
      A Innovation | 0.75 | 0.91 | 0.82
      B Nano Bio | 0.83 | 0.76 | 0.79
      C Organic LED | 0.79 | 0.71 | 0.74
      D Solar Cells | 0.76 | 0.72 | 0.74
      E Secondary Batteries | 0.80 | 0.77 | 0.77

      • F-value: 0.74 to 0.82.
      • Based on the results, we obtained the learning model on our training data.
      Excerpt from the paper: "4. We repeated step 3 five times in total with different choice of answer set. As a learner, we employed L2-regularized and L2-loss ..."
  • 17. Weights of features
      Positive contribution: >= 0.5. Negative contribution: <= -0.5. No contribution: between -0.5 and 0.5.

      TABLE 4. Weights of features.
      Features | A. Innovation | B. Nano Bio | C. Organic LED | D. Solar Cells | E. Secondary Batteries
      1. No. common neighbors | 0.566 | 0.889 | 0.520 | 0.683 | 0.987
      2. Link-based Jaccard coefficient | 1.354 | 2.198 | −6.150 | −0.703 | −4.742
      3. Difference in betweenness centrality | −1.446 | −6.107 | −2.175 | −5.468 | −10.049
      4. Difference in the number of in-links | 0.052 | 0.033 | 0.034 | 0.045 | 0.047
      5. Is same cluster | 0.018 | 0.086 | −0.308 | −0.160 | −0.062
      6. Cosine similarity of tf-idf vectors | −19.897 | −17.817 | −15.527 | 1.624 | 1.519
      7. Difference in publication year | 0.018 | 0.046 | 0.032 | 0.009 | 0.008
      8. The number of common authors | −0.112 | 0.476 | 0.403 | 0.152 | 0.036
      9. Is self-citation | 1.975 | 0.756 | 0.605 | 0.865 | 0.918
      10. Is published in same journal | 0.726 | 0.614 | 0.198 | 0.027 | −0.108
      11. Number of times "to" cited | −0.018 | −0.019 | −0.015 | −0.031 | −0.033

      • Especially, (2), (3) and (6) largely affected the predictions of citations.
      • (2): (A) and (B) comprise multiple research fields, and most citations stay within each research field, so papers cite locally. (C), (D) and (E) are each contained in a research field with a single issue.
      • (3): it is rare that core nodes and peripheral nodes, which have different values of betweenness centrality, are linked in the citation networks.
      • (6): same as (3).
      • (1): the more common neighbours two papers have, the more related they are.
      • (9): because authors tend to cite their own papers.
      • (10): same as (3), (6).

      Excerpt from the paper:
      "... the existence of a citation with a probability from 74% to 82%, given a pair of papers and the entire citation network. Especially three features, (2) link-based Jaccard coefficient, (3) difference in betweenness centrality, and (6) cosine similarity of tf–idf vectors, largely affected the predictions of citations. Link-based Jaccard coefficient contributed positively in the cases of (A) Innovations (weight: 1.354) and (B) Nano Bio (2.198) but negatively in the cases of (C) Organic LED (−6.150), (D) Solar Cells (−0.703) and (E) Secondary Batteries (−4.742). These results indicate that the former research areas, such as (A) Innovations and (B) Nano Bio, ..."
      "... of common neighbours positively affected all cases, because the more common neighbours two papers have, the more related they are. That the self-citation result had a positive effect is reasonable because authors tend to cite their own papers. The feature of is published in the same journal only affected (A) Innovations (0.726) and (B) Nano Bio (0.614) positively. Similar to the result of link-based Jaccard coefficient, papers tend to cite in each research field in the case of research fields with multiple issues. In summary, different models are required for different types of research areas: research fields with a single issue or research fields with multiple issues. In the case ..."
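The ±0.5 reading rule stated on this slide can be applied mechanically to a row of Table 4. The sketch below uses the row for feature (2), copied from the table; the function name `contribution` and the label strings are hypothetical.

```python
def contribution(weight, threshold=0.5):
    """Label a feature weight using the slide's rule: >= 0.5 positive, <= -0.5 negative."""
    if weight >= threshold:
        return "positive"
    if weight <= -threshold:
        return "negative"
    return "none"


# Weights of feature (2), link-based Jaccard coefficient, from Table 4.
jaccard_weights = {
    "A. Innovation": 1.354,
    "B. Nano Bio": 2.198,
    "C. Organic LED": -6.150,
    "D. Solar Cells": -0.703,
    "E. Secondary Batteries": -4.742,
}
labels = {name: contribution(w) for name, w in jaccard_weights.items()}
for name, label in labels.items():
    print(name, label)
```

For this feature the sign flips between datasets A and B on one side and C, D, and E on the other, which is the pattern the slide uses to argue for separate models per field type.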
  • 18. Summary
      • It is difficult to build a universal learner for link prediction; learners need to be built based on the characteristics of each research domain.
      • Different models are required for different types of research areas: research fields with a single issue or research fields with multiple issues.
        - The first type is the research field with multiple issues, such as (A) Innovations and (B) Nano Bio.
        - The second type is a simple research field with commonly understood targets of research and development, such as (C) Organic LED, (D) Solar Cells and (E) Secondary Batteries.