Search results for: regret minimization

Number of results: 37822

Journal: CoRR 2017
Susan Athey Stefan Wager

We consider the problem of using observational data to learn treatment assignment policies that satisfy certain constraints specified by a practitioner, such as budget, fairness, or functional form constraints. This problem has previously been studied in economics, statistics, and computer science, and several regret-consistent methods have been proposed. However, several key analytical compone...

2014
Noam Brown Tuomas Sandholm

Regret matching is a widely used algorithm for learning how to act. We begin by proving that regrets on actions in one setting (game) can be transferred to warm start the regrets for solving a different setting with the same structure but different payoffs that can be written as a function of parameters. We prove how this can be done by carefully discounting the prior regrets. This provides, to our...
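
For readers unfamiliar with the algorithm named in this abstract, the following is a minimal sketch of plain regret matching, not the warm-start or discounting method of the paper, whose details are cut off above. The function names, the rock-paper-scissors payoffs, and the fixed opponent are all illustrative assumptions.

```python
import random


def regret_matching_strategy(cum_regret):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    n = len(cum_regret)
    if total <= 0:
        # No action has positive regret: fall back to the uniform strategy.
        return [1.0 / n] * n
    return [p / total for p in positive]


def update_regrets(cum_regret, payoffs, action):
    """Add, for every action, how much better it would have done than `action`."""
    return [r + payoffs[a] - payoffs[action] for a, r in enumerate(cum_regret)]


def run(rounds=200, seed=0):
    """Hypothetical demo: the opponent always plays rock (an assumption)."""
    rng = random.Random(seed)
    payoffs = [0.0, 1.0, -1.0]  # our payoff for rock, paper, scissors vs. rock
    cum_regret = [0.0, 0.0, 0.0]
    for _ in range(rounds):
        strategy = regret_matching_strategy(cum_regret)
        action = rng.choices(range(3), weights=strategy)[0]
        cum_regret = update_regrets(cum_regret, payoffs, action)
    return regret_matching_strategy(cum_regret)


if __name__ == "__main__":
    print(run())  # the strategy should concentrate on paper (index 1)
```

Against this fixed opponent, only paper accumulates positive regret in the long run, so the strategy drifts toward always playing paper.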

2017
Joon Kwon Vianney Perchet

Blackwell approachability is an online learning setup generalizing the classical problem of regret minimization by allowing, for instance, multi-criteria optimization, global (online) optimization of a convex loss, or online linear optimization under some cumulative constraint. We consider partial monitoring where the decision maker does not necessarily observe the outcomes of his decision (unlik...

2016
Alexander Rakhlin Karthik Sridharan

We present efficient algorithms for the problem of contextual bandits with i.i.d. covariates, an arbitrary sequence of rewards, and an arbitrary class of policies. Our algorithm BISTRO requires d calls to the empirical risk minimization (ERM) oracle per round, where d is the number of actions. The method uses unlabeled data to make the problem computationally simple. When the ERM problem itself...

2011
Shota Yasutake Kohei Hatano Shuji Kijima Eiji Takimoto Masayuki Takeda

This paper proposes an algorithm for the online linear optimization problem over permutations; the objective of the online algorithm is to find a permutation of {1, ..., n} at each trial so as to minimize the “regret” for T trials. The regret of our algorithm is O(n √(T ln n)) in expectation for any input sequence. A naive implementation requires more than exponential time. On the other hand, our ...

2013
Shivani Agarwal Rohit Vaish

We learnt that under certain conditions on weights, a weighted-average plug-in classifier (or any learning algorithm that outputs such a classifier for the same training sample) is universally Bayes consistent w.r.t. the 0-1 loss. One might wonder for what other learning algorithms similar statements can be made. Can some of the other commonly studied/used learning algorithms be shown to be Bayes co...

2009
Katrina Ligett Anupam Gupta R. Ravi Eva Tardos Avrim Blum Eyal Even-Dar Mohammad Taghi Hajiaghayi

Computer systems increasingly involve the interaction of multiple self-interested agents. The designers of these systems have objectives they wish to optimize, but by allowing selfish agents to interact in the system, they lose the ability to directly control behavior. What is lost by this lack of centralized control? What are the likely outcomes of selfish behavior? In this work, we consider l...
