14. 4.1.2 Lexical Features
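Two ways of expressing bag-of-words (BoW) similarity between queries
• N-word Jaccard
• Splitting “the car james bond drive” into 2-words gives
[“the car”, “car james”, “james bond”, “bond drive”]
• N-char Jaccard
• Defined the same way, at the character level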
v_i : the N-word set of query q_i
v_i^k : the term frequency of the k-th N-word in set v_i
m, n : the sizes of sets v_i and v_j
k_i, k_j : the indexes of a common N-word in sets v_i and v_j
v_i^{k_i}, v_j^{k_j} : the term frequencies of that common N-word in sets v_i and v_j
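
A minimal sketch of the N-word and N-char splitting described above, paired with a plain set-based Jaccard score. This is an assumption for illustration: the term-frequency weighting from the definitions above is not applied, and the function names `word_ngrams`, `char_ngrams`, and `jaccard` are hypothetical.

```python
def word_ngrams(query, n):
    """Split a query on whitespace and return its consecutive N-word sequences."""
    words = query.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def char_ngrams(query, n):
    """Return the consecutive N-character substrings of a query."""
    return [query[i:i + n] for i in range(len(query) - n + 1)]


def jaccard(a, b):
    """Plain set Jaccard: |A ∩ B| / |A ∪ B| (term frequencies are ignored)."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0  # assumption: two empty sets are scored as 0
    return len(set_a & set_b) / len(union)


q1 = "the car james bond drive"
q2 = "james bond car"
word_score = jaccard(word_ngrams(q1, 2), word_ngrams(q2, 2))
char_score = jaccard(char_ngrams(q1, 3), char_ngrams(q2, 3))
```

Here the only shared 2-word is “james bond”, so `word_score` is 1 shared item out of 5 distinct ones.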
15. 4.1.3 Template Features
Method of Huang et al. (2009)
substring/superstring, add/remove words, stemming, spelling correction, acronym and abbreviation, etc.
In short: edit distance covering typos and derived word forms
Levenshtein edit distance
ed(qi , qj) : the Levenshtein edit distance between query qi and qj
len(qi): the length of query qi
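
A minimal sketch of the Levenshtein edit distance ed(qi, qj) between two query strings. The length normalization in `normalized_edit_similarity` is an assumption (1 minus the distance divided by the longer query's length), not necessarily the formula used on the slide; both function names are hypothetical.

```python
def levenshtein(a, b):
    """Levenshtein edit distance via dynamic programming over two rows."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # deleting the first i characters of a
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_edit_similarity(qi, qj):
    """Assumed normalization: 1 - ed(qi, qj) / max(len(qi), len(qj))."""
    longest = max(len(qi), len(qj))
    if longest == 0:
        return 1.0  # assumption: two empty queries count as identical
    return 1.0 - levenshtein(qi, qj) / longest


distance = levenshtein("kitten", "sitting")
similarity = normalized_edit_similarity("james bond", "james bonds")
```

A typo or an added plural changes only a character or two, so such query pairs score close to 1 under this normalization.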
16. 4.1.4 Temporal Features
Time interval between consecutive queries
The closer two queries are in time, the higher the probability that they belong to the same task
t(qi) : the time query qi is issued
d(qi): the dwelltime of query qi (the sum of dwelltimes of clicks after qi)
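
A minimal sketch of how t(q) and d(q) could be turned into features for a pair of queries. The `Query` record and both function names are hypothetical, and `idle_gap` (the interval minus the first query's dwell time) is an assumed way of combining the two quantities, not a formula taken from the slide.

```python
from dataclasses import dataclass


@dataclass
class Query:
    text: str
    issued_at: float          # t(q): time the query was issued, in seconds
    dwell_time: float = 0.0   # d(q): summed dwell time of clicks after q


def time_interval(qi, qj):
    """Absolute time between the moments the two queries were issued."""
    return abs(qj.issued_at - qi.issued_at)


def idle_gap(qi, qj):
    """Assumed feature: interval left after removing the earlier query's dwell time."""
    first, second = sorted((qi, qj), key=lambda q: q.issued_at)
    return max(0.0, second.issued_at - first.issued_at - first.dwell_time)


qi = Query("james bond car", issued_at=100.0, dwell_time=30.0)
qj = Query("aston martin db5", issued_at=160.0)
interval = time_interval(qi, qj)
gap = idle_gap(qi, qj)
```

In this example the queries are 60 seconds apart, 30 of which were spent on results clicked after the first query.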