Research Areas
Domain-Specific Search
Search engines are not only for the web. Many in-house search engines are built on top of offline documents to serve domain-specific purposes. Classical retrieval models, such as the vector space model, BM25, language models, and learning-to-rank methods, supply a basic but highly effective set of methods and off-the-shelf tools for practitioners building in-house search engines. The users of these domain-specific search engines are usually professional searchers, including law enforcement officers, patent examiners, lawyers, physicians, and writers, who demand unique search functionality to support their in-depth investigation and analysis. Their searches are usually more complex, take longer, and require flexible and reliable switching among collection browsing, keyword search, and structured search. In the past, we have worked on domains such as prior art, the dark web, human trafficking, illicit goods, counterfeit electronics, and disease control. In the future, we expect to explore new demands from a variety of domains for the greater good.
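To illustrate how compact these classical building blocks are, here is a minimal sketch of BM25 scoring over a toy in-memory corpus (the function name, tokenized-list input format, and the common default parameters k1=1.5, b=0.75 are our illustrative choices, not a reference implementation):

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each document (a list of tokens) against the query terms."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N          # average document length
    df = {t: sum(1 for d in docs if t in d) for t in query_terms}
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query_terms:
            if df[t] == 0:
                continue
            # inverse document frequency: rarer terms weigh more
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            # saturated term frequency with document-length normalization
            s += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores
```

A document matching more (and rarer) query terms scores higher, which is the basic behavior an in-house engine inherits for free from these classical models.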
Dynamic Search
When a search engine interacts with a human user, both enter a series of dynamically changing states. Governed by the goal of satisfying the user’s information need, dynamic search aims to statistically model this information-seeking process. Dynamic search distinguishes itself from ad-hoc search and relevance feedback models by admitting and handling the temporal dependency among individual searches and by assuming a long-term goal focused on accomplishing a task (in contrast to killing time by being entertained with a chatbot).
Here, the user and the search engine form a partnership to explore the possible information space and find documents that are rewarding. The family of reinforcement learning (RL) methods fits well into this trial-and-error setting. Thus, much of our effort is inspired by, but not directly derived from, reinforcement learning. Ultimately, we focus on bridging the understanding of human users, where the Information Retrieval (IR) community is strong, with the mathematical modeling of dynamic systems that excites a wide audience in artificial intelligence and machine learning.
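The trial-and-error framing can be made concrete with a toy sketch. Everything below is a hypothetical illustration, not one of our actual models: a session is an episode, the abstract "state" summarizes the user's progress, the "actions" are retrieval strategies, and tabular Q-learning learns which strategy pays off over the whole session rather than for a single query:

```python
import random

# Hypothetical session dynamics: refining moves the user toward
# satisfaction; broadening keeps the user exploring with no reward.
STATES = ["exploring", "narrowing", "satisfied"]
ACTIONS = ["broaden", "refine"]

def step(state, action):
    """Return (next_state, reward) for this toy environment."""
    if state == "exploring":
        return ("narrowing", 0.5) if action == "refine" else ("exploring", 0.0)
    if state == "narrowing":
        return ("satisfied", 2.0) if action == "refine" else ("exploring", 0.0)
    return ("satisfied", 0.0)  # terminal state

def q_learning(episodes=500, alpha=0.5, gamma=0.5, eps=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(episodes):
        s = "exploring"
        while s != "satisfied":
            if rng.random() < eps:
                a = rng.choice(ACTIONS)          # explore
            else:
                a = max(ACTIONS, key=lambda a: Q[(s, a)])  # exploit
            s2, r = step(s, a)
            target = r + gamma * max(Q[(s2, a2)] for a2 in ACTIONS)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q
```

In this sketch the learned values favor refining at every state because its long-term (not just immediate) payoff is higher, which is exactly the temporal dependency that separates dynamic search from ad-hoc retrieval.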
TREC
Oftentimes, complex information needs require more than one query in a search session to adequately satisfy the user’s search task. Given a series of queries that a user enters in the same session, how do the earlier queries, returned search results, and click information interact with the user and the targeted search goal? This research investigates techniques in query formulation, query expansion, user interactions, and relevance feedback to gain an in-depth understanding of user behaviors in search sessions and to better model search activities with complex information needs.
This research participated in the TREC 2012 Session track evaluation and placed 2nd in whole-session search (RL2-RL4). The work was published at SIGIR 2013.
Evaluation
Evaluation gives an idea of how well or poorly a system works. Evaluations can be manual or based on testbeds and metrics; our Lab is experienced in the latter. We have organized or helped with evaluations for the National Institute of Standards and Technology (NIST), the Defense Advanced Research Projects Agency (DARPA), and the U.S. Patent and Trademark Office (USPTO). A complete search engine evaluation campaign involves defining tasks, providing standard datasets, collecting human annotations, designing the evaluation schemes and metrics, and managing participation.
Although we do all of the above, our Lab primarily focuses on the scientific aspects of an evaluation. We design human-centric evaluation metrics that model complex user behavior in the metric itself. We also investigate how these metrics can act as optimization objectives for machine learning algorithms. Conducting evaluation campaigns is hard work but definitely a rewarding experience.
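A well-known example of a metric that builds a user model into the metric itself is Expected Reciprocal Rank (ERR, Chapelle et al., CIKM 2009): it assumes a cascade user who scans the ranked list top-down and stops with a probability set by each document's relevance grade. A minimal sketch, assuming graded labels on a 0-4 scale:

```python
def err(grades, max_grade=4):
    """Expected Reciprocal Rank over a ranked list of relevance grades.

    Cascade user model: the user reads rank by rank and stops at rank i
    with probability R_i = (2**g_i - 1) / 2**max_grade.
    """
    p_continue = 1.0  # probability the user reaches this rank
    score = 0.0
    for rank, g in enumerate(grades, start=1):
        r = (2 ** g - 1) / 2 ** max_grade  # prob. of being satisfied here
        score += p_continue * r / rank     # utility 1/rank if stopping here
        p_continue *= 1 - r                # otherwise keep scanning
    return score
```

Because the stopping probability compounds down the list, placing a highly relevant document at rank 1 scores far better than burying it at rank 3, which is the user-behavior sensitivity that simpler position-based metrics lack.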
Evaluation for Interactive Systems
We propose a new track focused on domain-specific search tasks in which professional searchers explore complex content spread across a corpus. To help such users, we need retrieval algorithms that can dynamically adjust as the user makes sense of the entities and relationships mentioned in the corpus. While TREC hosts evaluations in several domains, e.g. TREC Medical and TREC Legal, we propose to create domain-agnostic evaluation protocols for studying retrieval systems that “hang in there” and evolve along with the user’s own understanding.
For details, please visit TREC Dynamic Domain Track Website.
New Interfaces
For a long time, a search engine interface has meant a query box and ten blue links. We are not in a position to judge its effectiveness or aesthetic value, but we are skeptical that the long-standing lack of innovation in user input means this design represents the apex of search engine interface performance.
In our Lab, we experiment with and study new types of interfaces for information seeking and sense-making. We apply novel mathematical algorithms and experiment with equipping human users with virtual reality (VR), augmented reality (AR), voice input, and smart glasses. We are interested in discovering new and more natural interactions between humans and machines, with an enduring focus on how these new interfaces can enhance the algorithms’ effectiveness.
Privacy
Privacy and personalization seem naturally opposed. While users enjoy personalized services from search engines, recommender systems, social media, transportation, and delivery, they grant those companies entry into their personal lives without a complete understanding of the risks. Privacy has become a battlefield for governments, companies, innocent users, and those companies’ competitors, including small businesses and professors. As academic researchers, we cannot change current policies, but we can research and improve the situation from a technical perspective.
Our Lab is interested in creating privacy-preserving information retrieval algorithms that perform information-seeking tasks while protecting users’ privacy. We are also interested in revealing privacy risks to users before they submit any data to companies. Ultimately, we hope to help every user manage their own data and the services they deserve, breaking the curse of centralized data ownership.
Search Engines as Bots
Search engines are perhaps the most successful application to have changed how people seek information and acquire knowledge. We view search engines as intelligent bots that interact with human users and provide answers to them. For now, the user may only see lists of relevant documents being returned, but as AI and search engine researchers, we envision a much richer mode of interaction between humans and search engines. Search engines, which already serve this role in their current primitive form, will continue to be bots that assist humans in finding answers. The range of interaction, communication, and mutual growth between the two would cover collaboratively finishing a task (e.g., collecting information and making decisions to purchase a home), exploring an unknown knowledge field, life-long learning, and much more. The key feature distinguishing search engines from other AI fields is that we will always have humans in the loop; humans play a central role in our research, and search engines will always focus on human users.
Other
- Algorithms for Machine Learning & Reinforcement Learning
- Conversational Search
- Deep Reinforcement Learning
- Dialogue Systems
- Game Design & Evaluation
- Graphical Models
- Information Seeking
- Knowledge Discovery & Ontology Construction
- Natural Language Processing & Understanding
- Optimization & Inference
- Privacy; Machine Learning vs. Privacy
- Privacy-Preserving Information Retrieval
- Question Answering
- Representation Learning
- Self-Driving Cars
- Virtual Reality & Augmented Reality