HomePage of Takehiro Yamamoto

Information Access Design

Task-oriented Web Search

This paper tackles the problem of mining subgoals of a given search goal from data. For example, when a searcher wants to travel to London, she may need to accomplish several subtasks such as “book flights,” “book a hotel,” “find good restaurants,” and “decide which sightseeing spots to visit.” As another example, if a searcher wants to lose weight, there may be several alternative solutions such as “do physical exercise,” “take diet pills,” and “control calorie intake.” In this paper, we refer to such subtasks or solutions as subgoals, and propose to use sponsored search data to find the subgoals of a given query by means of query clustering. Advertisements (ads) reflect advertisers’ tremendous efforts to match a given query with implicit user needs. Moreover, ads are usually associated with a particular action or transaction. We therefore hypothesized that they are useful for subgoal mining. To our knowledge, our work is the first to use sponsored search data for this purpose. Our experimental results show that sponsored search data is a good resource for obtaining related queries and for identifying subgoals via query clustering. In particular, our method, which combines ad impressions from sponsored search data with query co-occurrences from session data, outperforms a state-of-the-art query clustering method that relies on document clicks rather than ad impressions, in terms of purity, NMI, Rand Index, $F_{1}$-measure, and subgoal recall.
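To make two of the evaluation measures named above concrete, here is a minimal sketch of purity and the Rand Index for a query clustering. It is not the paper's implementation. The queries and the cluster assignment are hypothetical toy data, the function names `purity` and `rand_index` are illustrative, and the gold subgoal labels are borrowed from the London example.

```python
from collections import Counter
from itertools import combinations


def purity(clusters, gold):
    """Fraction of queries that belong to the majority gold subgoal of their cluster."""
    total = sum(len(c) for c in clusters)
    majority = sum(
        Counter(gold[q] for q in c).most_common(1)[0][1] for c in clusters if c
    )
    return majority / total


def rand_index(clusters, gold):
    """Fraction of query pairs on which the clustering and the gold labels agree."""
    assigned = {q: i for i, c in enumerate(clusters) for q in c}
    pairs = list(combinations(sorted(assigned), 2))
    agree = sum(
        (assigned[a] == assigned[b]) == (gold[a] == gold[b]) for a, b in pairs
    )
    return agree / len(pairs)


# Hypothetical gold standard: query -> subgoal (labels taken from the London example).
gold = {
    "cheap flights london": "book flights",
    "london airfare": "book flights",
    "london hotels": "book a hotel",
    "london hostel": "book a hotel",
    "london restaurants": "find good restaurants",
}

# Hypothetical system output: one list of queries per cluster.
clusters = [
    ["cheap flights london", "london airfare", "london hotels"],
    ["london hostel"],
    ["london restaurants"],
]

print(purity(clusters, gold))      # 4 of 5 queries match their cluster's majority subgoal
print(rand_index(clusters, gold))  # 7 of 10 query pairs are treated consistently
```

Purity rewards clusters dominated by a single subgoal but does not penalize splitting one subgoal across several clusters, which is one reason it is typically reported alongside pair-based measures such as the Rand Index.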

Subgoal Mining via Query Clustering


  • Takehiro Yamamoto, Tetsuya Sakai, Mayu Iwata, Yu Chen, Ji-Rong Wen and Katsumi Tanaka:
    “The Wisdom of Advertisers: Mining Subgoals via Query Clustering”
    Proceedings of the 21st ACM Conference on Information and Knowledge Management (CIKM 2012), pp.505-514, October 2012. [pdf]


This research was conducted while I was an intern at Microsoft Research Asia, under the supervision of Dr. Tetsuya Sakai, and was supported in part by the Microsoft Research CORE Project (April 2016 – March 2017).