Online learning with graph-structured feedback against adaptive adversaries

Authors

Zhili Feng

Po-Ling Loh

Abstract

We derive upper and lower bounds for the policy regret of T-round online learning problems with graph-structured feedback, where the adversary is nonoblivious but assumed to have a bounded memory. We obtain upper bounds of Õ(T^{2/3}) and Õ(T^{3/4}) for strongly-observable and weakly-observable graphs, respectively, based on analyzing a variant of the Exp3 algorithm. When the adversary is allowed a bounded memory of size 1, we show that a matching lower bound of Ω̃(T^{2/3}) is achieved in the case of full-information feedback. We also study the particular loss structure of an oblivious adversary with switching costs, and show that in such a setting, non-revealing strongly-observable feedback graphs achieve a lower bound of Ω̃(T^{2/3}), as well.
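For orientation, the following is a minimal sketch of a generic Exp3-style learner with graph-structured feedback. It is not the variant analyzed in the paper, which is not specified in this abstract. The function name `exp3_graph_feedback`, the `observes` mapping, and the learning rate `eta` are illustrative assumptions. The sketch further assumes that every action observes its own loss (a self-loop in the feedback graph) and that losses come from a fixed table, i.e. an oblivious adversary rather than the bounded-memory adaptive adversary studied here.

```python
import math
import random


def exp3_graph_feedback(losses, observes, eta, rng=None):
    """Exponential weights with importance-weighted loss estimates.

    losses[t][i]  -- loss in [0, 1] of action i at round t (fixed in advance;
                     an assumption, the paper allows adaptive adversaries)
    observes[i]   -- set of actions whose losses are revealed when i is played
                     (assumed to contain i itself)
    eta           -- learning rate (hypothetical tuning parameter)
    """
    rng = rng or random.Random(0)
    k = len(observes)
    log_w = [0.0] * k  # log-weights, kept in log space for numerical stability
    total_loss = 0.0
    plays = []
    for loss_t in losses:
        # Turn log-weights into a sampling distribution.
        top = max(log_w)
        w = [math.exp(v - top) for v in log_w]
        z = sum(w)
        p = [v / z for v in w]

        action = rng.choices(range(k), weights=p)[0]
        plays.append(action)
        total_loss += loss_t[action]

        # Update every action whose loss was revealed by the feedback graph.
        for j in observes[action]:
            # q_j: probability that the loss of j is observed this round.
            q_j = sum(p[i] for i in range(k) if j in observes[i])
            log_w[j] -= eta * loss_t[j] / q_j
    return plays, total_loss
```

Setting `observes[i]` to the full action set recovers full-information feedback, while `observes[i] = {i}` recovers the bandit setting; other graphs interpolate between the two.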

Similar resources

Online Learning with Switching Costs and Other Adaptive Adversaries

We study the power of different types of adaptive (nonoblivious) adversaries in the setting of prediction with expert advice, under both full-information and bandit feedback. We measure the player’s performance using a new notion of regret, also known as policy regret, which better captures the adversary’s adaptiveness to the player’s behavior. In a setting where losses are allowed to drift, we...

Online formative assessments: exploring their educational value

Introduction: Online formative assessments (OFAs) have been increasingly recognised in medical education as resources that promote self-directed learning. Formative assessments are used to support the self-directed learning of students. Online formative assessments have been identified to be less time consuming with automated feedback. This pilot study aimed to determine whether participation and pe...

The Potentiality of Dynamic Assessment in Massive Open Online Courses (MOOCs): The Case of Listening Comprehension MOOCs

Massive Open Online Courses (MOOCs), a disruptive new educational development, set the scene for achieving social inclusion and dissemination of knowledge. Moreover, facilitating networked learning experiences by creating an adaptive learning environment can pave the way for this open and energetic approach to learning. The present study aimed to explore the possible role of Dynamic Assessment (D...

Adaptive fuzzy pole placement for stabilization of non-linear systems

A new approach for pole placement of nonlinear systems using state feedback and a fuzzy system is proposed. We use a new online fuzzy training method to identify and obtain a fuzzy model for the unknown nonlinear system using only the system input and output. Then, we linearize this identified model at each sampling time to obtain an approximate linear time-varying system. In order to stabilize...

Online Bandit Learning against an Adaptive Adversary: from Regret to Policy Regret

Online learning algorithms are designed to learn even when their input is generated by an adversary. The widely-accepted formal definition of an online algorithm’s ability to learn is the game-theoretic notion of regret. We argue that the standard definition of regret becomes inadequate if the adversary is allowed to adapt to the online algorithm’s actions. We define the alternative notion of p...

Publication date: 2018
