Research Spotlight: Learning to interact with learning agents
Many real-world systems involve repeatedly making decisions under uncertainty, for instance, choosing one of several products to recommend to a user in an online recommendation service, or dynamically allocating resources among available stock options in a financial market. Machine learning (ML) algorithms driving these systems typically operate under the assumption that they are interacting with static components, e.g., users' preferences are fixed, trading tools providing stock recommendations are static, and data distributions are stationary. This assumption is often violated in modern systems, as these algorithms increasingly interact with and seek information from learning agents, including people, robots, and adaptive adversaries. Consequently, many well-studied ML frameworks and algorithmic techniques fail to provide desirable theoretical guarantees in these settings: for instance, algorithms might converge to a sub-optimal solution or fail arbitrarily badly.
Researchers at the Machine Teaching Group, MPI-SWS are designing novel ML algorithms that have to interact with agents that are adaptive or learning over time, especially in situations where the algorithm's decisions directly affect the state dynamics of these agents. In recent work [1], they have studied this problem in the context of two fundamental machine learning frameworks: (i) online learning using experts' advice and (ii) active learning using labeling oracles. In particular, they consider a setting where the experts/oracles are themselves learning agents. For instance, active learning algorithms typically query labels from an oracle, e.g., a (possibly noisy) domain expert; however, in emerging crowd-powered systems, these experts are being replaced by inexpert participants who could themselves be learning over time (e.g., volunteers in citizen science projects). They have shown that when the experts/oracles are learning agents, well-studied algorithms (like the EXP3 algorithm) fail to converge to the optimal solution and can perform arbitrarily badly in this new problem setting. Furthermore, they provide an impossibility result showing that convergence guarantees cannot be achieved without sharing any information across experts. This calls for developing novel algorithms with practical ways of coordinating between the central algorithm and the learning agents to achieve the desired guarantees.
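To make the setting more concrete, the following is a minimal Python sketch of the standard EXP3 algorithm selecting among experts that are themselves learning. It is an illustration under stated assumptions, not the construction or the analysis from the work described above. The `LearningExpert` class, its `start`, `ceiling`, and `rate` parameters, and the rule that an expert improves only when it is selected are all hypothetical choices made for this example.

```python
import math
import random


class LearningExpert:
    """Hypothetical expert whose expected reward improves each time it is selected."""

    def __init__(self, start, ceiling, rate):
        self.quality = start    # current expected reward in [0, 1]
        self.ceiling = ceiling  # reward level it approaches with enough practice
        self.rate = rate        # fraction of the remaining gap closed per selection

    def act(self, rng):
        # Bernoulli reward with the current quality as success probability.
        return 1.0 if rng.random() < self.quality else 0.0

    def learn(self):
        # Assumption: the expert only updates when the central algorithm picks it.
        self.quality += self.rate * (self.ceiling - self.quality)


def exp3(experts, horizon, gamma, rng):
    """Run standard EXP3 over the experts; return how often each one was picked."""
    k = len(experts)
    weights = [1.0] * k
    counts = [0] * k
    for _ in range(horizon):
        total = sum(weights)
        # Mix the weight-proportional distribution with uniform exploration.
        probs = [(1 - gamma) * w / total + gamma / k for w in weights]
        i = rng.choices(range(k), weights=probs)[0]
        reward = experts[i].act(rng)
        experts[i].learn()
        counts[i] += 1
        # Importance-weighted reward estimate, applied to the chosen expert only.
        weights[i] *= math.exp(gamma * (reward / probs[i]) / k)
        # Rescale to avoid overflow; the selection probabilities are unchanged.
        top = max(weights)
        weights = [w / top for w in weights]
    return counts


if __name__ == "__main__":
    rng = random.Random(0)
    # A fast learner with a low ceiling versus a slow learner with a high ceiling.
    experts = [LearningExpert(0.5, 0.6, 0.1), LearningExpert(0.1, 0.9, 0.01)]
    print(exp3(experts, horizon=5000, gamma=0.1, rng=rng))
```

In this toy setup, the second expert would eventually be the better choice, but it starts out weak and improves only when selected. Comparing the returned counts against each expert's `ceiling` for different parameter values gives an informal feel for why a selection rule designed for static experts can be problematic when the experts learn.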
Currently, researchers at the Machine Teaching Group are studying these challenges in the context of designing next-generation human-AI collaborative systems. As a concrete application setting, consider a car driving scenario where the goal is to develop an assistive AI agent that drives the car in an auto-pilot mode while giving control back to the human driver in safety-critical situations. They study this setting by casting it as a multi-agent reinforcement learning problem. When the human agent has a stationary policy (i.e., the actions taken by the human driver in different states/scenarios are fixed), it is trivial to learn an optimal policy for the AI agent that maximizes the overall performance of this collaborative system. However, in real-life settings where a human driver would adapt their behavior in response to the presence of an auto-pilot mode, they show that the problem of learning an optimal policy for the AI agent becomes computationally intractable. This work is one of the recent additions to an expanding set of results and algorithmic techniques developed by MPI-SWS researchers in the nascent area of Machine Teaching [2, 3].
[1] Adish Singla, Hamed Hassani, and Andreas Krause. Learning to Interact with Learning Agents. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI'18), 2018.
[2] Xiaojin Zhu, Adish Singla, Sandra Zilles, and Anna N. Rafferty. An Overview of Machine Teaching. arXiv:1801.05927, 2018.
[3] Maya Cakmak, Anna N. Rafferty, Adish Singla, Xiaojin Zhu, and Sandra Zilles. Workshop on Teaching Machines, Robots, and Humans. NIPS 2017.