Optimizing Search Engine Query Processing for Complex Ranking Functions
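Speaker: Torsten Suel, Professor, Department of Computer Science and Engineering, NYU Polytechnic School of Engineering
Time: April 1, 09:00-10:30
Venue: Conference Room 202, Office Building, Software Park Campus, 0638太阳集团
Host: Yu Xiaohui (禹晓辉)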
Abstract: Current search engines use highly complex ranking functions based on hundreds of features, which are often derived using machine-learning techniques. While such functions return high-quality results, they create efficiency challenges, as we cannot afford to fully evaluate them on all documents in the union, or even the intersection, of the query terms. To address this, search engines use a series of cascading rankers, starting with a very simple ranking function and then applying increasingly complex and expensive ranking functions to smaller and smaller sets of candidate results. Some recent work has started to look at how to optimize query processing under this approach.
In this talk, we first give a brief introduction to query processing in search engines and discuss the cascading approach. We then focus on one problem arising under this approach: the design of the initial cascade. Here, the goal is to very quickly identify a set of good candidate documents that should be passed to the second and further cascades. Recent work by Asadi and Lin showed that while top-k computations on either the union or the intersection work well, a further optimization using a global document ordering based on spam scores resulted in significant reductions in quality. We propose a framework that builds specialized single-term and pairwise index structures, and then at query time selectively accesses these structures based on a cost budget and early-termination techniques. Using an end-to-end evaluation with a complex machine-learned ranker, we show that we can find candidates about an order of magnitude faster than a conjunctive top-k computation while almost matching the quality. If time permits, we may discuss some other related problems.
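For readers unfamiliar with the cascading approach described in the abstract, the following is a minimal illustrative sketch in Python, not the speaker's framework. It assumes a toy in-memory index mapping terms to sets of document IDs, and the names `conjunctive_candidates`, `cascade`, and the two scoring tables are hypothetical. It shows only the general idea: a cheap first stage prunes the candidate set, and a more expensive scorer is applied to the survivors.

```python
from typing import Callable, Dict, Iterable, List, Sequence, Set, Tuple

# Toy inverted index (assumption): term -> set of document IDs containing it.
Index = Dict[str, Set[int]]
# One cascade stage: (scoring function, number of documents to keep).
Stage = Tuple[Callable[[int], float], int]


def conjunctive_candidates(index: Index, terms: Sequence[str]) -> Set[int]:
    """Return documents that contain every query term (intersection)."""
    postings = [index.get(t, set()) for t in terms]
    if not postings:
        return set()
    return set.intersection(*postings)


def cascade(candidates: Iterable[int], stages: List[Stage]) -> List[int]:
    """Apply each stage in order, keeping only the top `keep` documents.

    Ties are broken by document ID so the result is deterministic.
    """
    current = list(candidates)
    for score, keep in stages:
        current = sorted(current, key=lambda d: (-score(d), d))[:keep]
    return current


# Hypothetical data: two query terms and two scorers of increasing cost.
index: Index = {"search": {1, 2, 3, 4}, "engine": {2, 3, 4, 5}}
cheap_scores = {2: 1.0, 3: 3.0, 4: 2.0}      # stands in for a simple ranker
costly_scores = {2: 0.9, 3: 0.2, 4: 0.7}     # stands in for a complex ranker

initial = conjunctive_candidates(index, ["search", "engine"])
result = cascade(
    initial,
    [(lambda d: cheap_scores[d], 2), (lambda d: costly_scores[d], 1)],
)
print(result)  # [4]
```

In this sketch the first stage keeps documents 3 and 4, and the second stage, which sees only those two, selects document 4. The expensive scorer is never evaluated on document 2, which is the efficiency argument behind cascading.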
Bio: Torsten Suel is a Professor in the Department of Computer Science and Engineering at the NYU Polytechnic School of Engineering, where he directs a research group working on search engines, databases, and web mining. He holds a Diplom degree from the Technical University of Braunschweig (Germany), and a Ph.D. from the University of Texas at Austin. He joined the department in 1998 after postdoctoral and visiting positions at the NEC Research Institute, UC Berkeley, and Bell Labs. During 2008, he was a Principal Research Scientist at Yahoo! Research in Santa Clara, CA. He is currently on a sabbatical stay at NYU Shanghai, until May 2015.