Abstract: A core element in sequential decision-making problems, such as contextual bandits and reinforcement learning, is the feedback on the quality of the performed actions. However, in many real-world applications, such feedback is restricted. In this work, we study decision-making problems with a querying budget, that is, problems in which the total amount of feedback is limited by a hard budget and the agent can choose when to query for feedback. We propose a simple algorithmic principle, which we refer to as Confidence Budget Matching (CBM), analyze its performance on a variety of sequential budgeted learning problems, and establish its robustness relative to more naive approaches.