Sequential Learning without Feedback

Authors

  • Manjesh Kumar Hanawal
  • Csaba Szepesvári
  • Venkatesh Saligrama
Abstract

In many security and healthcare systems, a sequence of features/sensors/tests is used for detection and diagnosis. Each test outputs a prediction of the latent state and carries an inherent cost. Our objective is to learn strategies for selecting tests that trade off accuracy against cost. Unfortunately, it is often impossible to acquire ground-truth annotations in situ, and we are left with the problem of unsupervised sensor selection (USS). We pose USS as a version of the stochastic partial monitoring problem with an unusual reward structure (even noisy annotations are unavailable). Unsurprisingly, no learner can achieve sublinear regret without further assumptions. To this end, we propose the notion of weak dominance. This is a condition on the joint probability distribution of test outputs and latent state, and it says that whenever a test is accurate on an example, a later test in the sequence is likely to be accurate as well. We empirically verify that weak dominance holds on real datasets and prove that it is a maximal condition for achieving sublinear regret. We reduce USS to a special case of the multi-armed bandit problem with side information and develop polynomial-time algorithms that achieve sublinear regret.
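To make the dominance idea concrete, here is a minimal Python sketch. It is not the authors' algorithm, and it assumes an idealized, stronger version of the condition: errors are strictly nested, so a later test errs only on examples where every earlier test also errs. Under that assumption, and with binary labels, the difference in error rates between two tests equals the rate at which their outputs disagree, a quantity that can be estimated without ground truth. The function names, error rates, and sample size are all hypothetical choices for illustration.

```python
import random

def simulate_nested_tests(n, error_rates, seed=0):
    """Simulate binary labels and test outputs with nested errors.

    Assumption (illustrative, stronger than weak dominance): error_rates is
    decreasing and test k errs exactly when a shared uniform draw u falls
    below error_rates[k], so a later test errs only if earlier tests err too.
    """
    rng = random.Random(seed)
    labels = []
    outputs = [[] for _ in error_rates]
    for _ in range(n):
        y = rng.randint(0, 1)   # latent binary state
        u = rng.random()        # shared "difficulty" of this example
        labels.append(y)
        for k, rate in enumerate(error_rates):
            wrong = u < rate
            outputs[k].append(1 - y if wrong else y)
    return labels, outputs

def disagreement_rate(a, b):
    """Fraction of examples on which two tests disagree (needs no labels)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def error_rate(pred, labels):
    """Fraction of examples a test gets wrong (needs labels)."""
    return sum(p != y for p, y in zip(pred, labels)) / len(labels)

labels, outputs = simulate_nested_tests(10000, [0.30, 0.20, 0.05])

# Error-rate gap between the first and last test, computed with labels.
gap_from_labels = error_rate(outputs[0], labels) - error_rate(outputs[2], labels)
# The same gap estimated from disagreements alone, without labels.
gap_unsupervised = disagreement_rate(outputs[0], outputs[2])

print(gap_from_labels, gap_unsupervised)
```

In this idealized setting the two printed quantities coincide, which suggests why a learner might compare tests using only their mutual disagreements. When errors are not nested, the disagreement rate overestimates the gap by twice the probability that the earlier test is right while the later one is wrong.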

Similar resources

The effect of self-control feedback on the learning of generalized motor program and parameters during physical and observational practice

The purpose of this study was to examine the effect of self-control feedback on the learning of generalized motor program and parameters during physical and observational practice. Participants (n=90) were randomly assigned to physical and observational practice (self-control, yoked and instructor KR) groups. They practiced a sequential timing task. The task required participants to press four k...

Impact of immediate feedback on the learning of medical students in pharmacology

Introduction: Providing feedback to students is an essential component in medical education and has been shown to improve the students’ learning. The purpose of this study is to evaluate the effect of computer-based immediate feedback on the medical students’ learning in a pharmacology course. Methods: In this prospective intervention study some feedback modules in pharmacology (FMP) were prepared in...

Sequential Learning for Dialog Act Classification in Tutorial Dialog

Dialog act classification or tagging is the task of assigning labels such as “question”, “assertion”, “positive feedback” and “negative feedback” to the turns in a dialog. In this project, we study the dialog act classification task as applied to human-human tutoring dialogs in the domain of thermodynamics. We initially establish a baseline by posing the task as a classification problem and app...

Learning Sequential Tasks by Incrementally Adding Higher Orders

An incremental, higher-order, non-recurrent network combines two properties found to be useful for learning sequential tasks: higher-order connections and incremental introduction of new units. The network adds higher orders when needed by adding new units that dynamically modify connection weights. Since the new units modify the weights at the next time-step with information from the previous s...

Augmenting Reinforcement Learning with Human Feedback

As computational agents are increasingly used beyond research labs, their success will depend on their ability to learn new skills and adapt to their dynamic, complex environments. If human users — without programming skills — can transfer their task knowledge to agents, learning can accelerate dramatically, reducing costly trials. The TAMER framework guides the design of agents whose behavior ...

A Reinforcement-and-Generalization Model of Sequential Effects in Identification Learning

Responses in identification-learning tasks depend on events from recent trials. A model for these sequential effects is proposed, based on previous work in category learning and founded on theories of reinforcement learning and generalization. The model is compared to two other theories in their predictions of the influence of previous stimuli and previous feedback. Two experimental paradigms a...

Journal:
  • CoRR

Volume abs/1610.05394

Publication date: 2016