Online learning with graph-structured feedback against adaptive adversaries
Abstract
We derive upper and lower bounds for the policy regret of $T$-round online learning problems with graph-structured feedback, where the adversary is nonoblivious but assumed to have a bounded memory. We obtain upper bounds of $\tilde{O}(T^{2/3})$ and $\tilde{O}(T^{3/4})$ for strongly-observable and weakly-observable graphs, respectively, based on analyzing a variant of the Exp3 algorithm. When the adversary is allowed a bounded memory of size 1, we show that a matching lower bound of $\tilde{\Omega}(T^{2/3})$ is achieved in the case of full-information feedback. We also study the particular loss structure of an oblivious adversary with switching costs, and show that in such a setting, non-revealing strongly-observable feedback graphs achieve a lower bound of $\tilde{\Omega}(T^{2/3})$, as well.
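The abstract attributes the upper bounds to a variant of Exp3 but does not spell it out. The sketch below is one plausible instance under stated assumptions, not the authors' algorithm: it combines blocking (committing to one action for `block_size` rounds, the mini-batching idea commonly used against bounded-memory adversaries) with an importance-weighted loss estimate in the style of Exp3 for feedback graphs, where each revealed loss is divided by the probability that it would have been observed. The names `blocked_exp3_graph`, `count_switches`, and `LossFn`, the parameters `block_size`, `eta`, and `gamma`, and the uniform exploration scheme are all hypothetical choices made for illustration.

```python
import math
import random
from typing import Callable, List, Sequence, Set

# Hypothetical adversary interface: loss_fn(t, past_actions)[j] is the loss of
# playing action j at round t given the player's earlier actions, so the
# adversary may adapt to the player's history (nonoblivious).
LossFn = Callable[[int, List[int]], Sequence[float]]


def blocked_exp3_graph(
    n_actions: int,
    out_neighbors: Sequence[Set[int]],
    loss_fn: LossFn,
    horizon: int,
    block_size: int,
    eta: float,
    gamma: float = 0.0,
    seed: int = 0,
) -> List[int]:
    """Blocked Exp3 with graph feedback (illustrative sketch, not the paper's code).

    out_neighbors[i] is the set of actions whose losses are revealed when
    action i is played; losses are assumed to lie in [0, 1].
    Returns the sequence of actions played, one per round.
    """
    rng = random.Random(seed)
    weights = [1.0] * n_actions
    history: List[int] = []
    t = 0
    while t < horizon:
        total = sum(weights)
        # Exponential weights mixed with uniform exploration at rate gamma.
        probs = [(1 - gamma) * w / total + gamma / n_actions for w in weights]
        action = rng.choices(range(n_actions), weights=probs)[0]

        # Commit to one action for the whole block and accumulate the losses
        # of every action that the feedback graph reveals.
        length = min(block_size, horizon - t)
        block_loss = [0.0] * n_actions
        for _ in range(length):
            losses = loss_fn(t, history)
            for j in out_neighbors[action]:
                block_loss[j] += losses[j]
            history.append(action)
            t += 1

        # Importance-weighted update: divide each revealed block-average loss
        # by the probability that action j's loss would have been observed.
        for j in out_neighbors[action]:
            p_obs = sum(probs[i] for i in range(n_actions) if j in out_neighbors[i])
            weights[j] *= math.exp(-eta * (block_loss[j] / length) / p_obs)

        # Rescale so the weights do not underflow over long horizons.
        top = max(weights)
        weights = [w / top for w in weights]
    return history


def count_switches(history: Sequence[int]) -> int:
    """Number of rounds at which the played action changes."""
    return sum(1 for a, b in zip(history, history[1:]) if a != b)
```

Passing `[set(range(n))] * n` as `out_neighbors` gives full-information feedback, and `[{i} for i in range(n)]` gives bandit feedback. Because actions change only at block boundaries, the number of switches is at most $\lceil T/\tau \rceil - 1$ for block size $\tau$, which limits how much a bounded-memory adversary can exploit the player's changes of action. Analyses of this kind often take $\tau$ on the order of $T^{1/3}$; the tuning that yields the rates stated above is not reproduced here.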
Similar resources
Online Learning with Switching Costs and Other Adaptive Adversaries
We study the power of different types of adaptive (nonoblivious) adversaries in the setting of prediction with expert advice, under both full-information and bandit feedback. We measure the player’s performance using a new notion of regret, also known as policy regret, which better captures the adversary’s adaptiveness to the player’s behavior. In a setting where losses are allowed to drift, we...
Online formative assessments: exploring their educational value
Introduction: Online formative assessments (OFA’s) have been increasingly recognised in medical education as resources that promote self-directed learning. Formative assessments are used to support the self-directed learning of students. Online formative assessments have been identified to be less time consuming with automated feedback. This pilot study aimed to determine whether participation and pe...
The Potentiality of Dynamic Assessment in Massive Open Online Courses (MOOCs): The Case of Listening Comprehension MOOCs
Massive Open Online Courses (MOOCs) as a new shaking educational development provide the scene for achieving social inclusion and dissemination of knowledge. Anyhow, facilitating network learning experiences through creating an adaptive learning environment can pave the way for this open and energetic way to learning. The present study aimed to explore the possible role of Dynamic Assessment (D...
Adaptive fuzzy pole placement for stabilization of non-linear systems
A new approach for pole placement of nonlinear systems using state feedback and fuzzy system is proposed. We use a new online fuzzy training method to identify and to obtain a fuzzy model for the unknown nonlinear system using only the system input and output. Then, we linearized this identified model at each sampling time to have an approximate linear time varying system. In order to stabilize...
Online Bandit Learning against an Adaptive Adversary: from Regret to Policy Regret
Online learning algorithms are designed to learn even when their input is generated by an adversary. The widely-accepted formal definition of an online algorithm’s ability to learn is the game-theoretic notion of regret. We argue that the standard definition of regret becomes inadequate if the adversary is allowed to adapt to the online algorithm’s actions. We define the alternative notion of p...
Publication date: 2018