Search results for: minimax regret

Number of results: 12162

2011
Jennifer Wortman Vaughan

$\sum_{i=1} p_i \log \frac{1}{p_i}$. To do this, we must bound the two terms on the right-hand side of the bound above. Step 1: Bounding the Range of the Regularizer. We begin by deriving upper and lower bounds on the entropy function $H(\vec{p})$. The lower bound is easy. Since for all $i$, $0 \le p_i \le 1$, we have $p_i \log \frac{1}{p_i} \ge 0$. (Remember that we define $0 \log(1/0)$ to be 0 by convention.) As we discussed before, $H(\vec{p}) = 0$ is ach...
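The lower bound in the excerpt above can be checked numerically. The following is a minimal sketch, not taken from the cited notes: the function name `entropy` and the two example distributions are illustrative assumptions, and the natural logarithm is assumed.

```python
import math


def entropy(p):
    """Entropy H(p) = sum_i p_i * log(1 / p_i), using the natural log.

    Terms with p_i == 0 are skipped, which implements the convention
    0 * log(1/0) = 0 mentioned in the excerpt.
    """
    return sum(pi * math.log(1.0 / pi) for pi in p if pi > 0)


# Hypothetical example distributions (assumptions for illustration only).
point_mass = [1.0, 0.0, 0.0]   # all probability on one outcome
uniform = [1 / 3, 1 / 3, 1 / 3]

h_point = entropy(point_mass)
h_uniform = entropy(uniform)

# Every individual term p_i * log(1 / p_i) is non-negative on [0, 1].
terms_nonnegative = all(
    pi * math.log(1.0 / pi) >= 0 for pi in uniform if pi > 0
)
```

Here the point mass gives an entropy of exactly zero, while the uniform distribution gives a strictly positive value, consistent with the non-negativity argument.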

2008
Janyl Jumadinova, Prithviraj Dasgupta

In this paper, we consider the problem of dynamic pricing by a set of competing sellers in an information economy where buyers differentiate products along multiple attributes, and buyer preferences can change temporally. Previous research in this area has either focused on dynamic pricing along a limited number of (e.g. binary) attributes, or assumes that each seller has access to private inf...

2006
Shie Mannor, Nahum Shimkin

We consider online learning in repeated decision problems, within the framework of a repeated game against an arbitrary opponent. For repeated matrix games, well known results establish the existence of no-regret strategies; such strategies secure a long-term average payoff that comes close to the maximal payoff that could be obtained, in hindsight, by playing any fixed action against the obser...
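The notion of regret against the best fixed action in hindsight can be made concrete with a small computation. This is a sketch under stated assumptions, not the authors' method: the function `average_regret`, the 2x2 payoff matrix, and the action sequences are all hypothetical, and the code only measures regret for a given play history rather than implementing a no-regret strategy.

```python
def average_regret(payoff, my_actions, opp_actions):
    """Average regret versus the best fixed action in hindsight.

    payoff[a][b] is the player's payoff when the player picks action a
    and the opponent picks action b (an assumed matrix-game setup).
    """
    rounds = len(my_actions)
    # Total payoff actually collected by the player.
    realized = sum(payoff[a][b] for a, b in zip(my_actions, opp_actions))
    # Best total payoff obtainable by one fixed action against the
    # observed opponent sequence.
    best_fixed = max(
        sum(payoff[a][b] for b in opp_actions) for a in range(len(payoff))
    )
    return (best_fixed - realized) / rounds


# Hypothetical 2x2 game: the player earns 1 when the actions match.
payoff = [[1, 0], [0, 1]]
my_actions = [0, 0, 0, 1]
opp_actions = [0, 1, 1, 1]

avg_regret = average_regret(payoff, my_actions, opp_actions)
```

In this example the player collects 2, while always playing action 1 would have collected 3, so the average regret over four rounds is 0.25. A no-regret strategy is one for which this quantity tends to zero (or below) as the number of rounds grows.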

Journal: CoRR 2017
Palash Dey

Lu and Boutilier proposed a novel approach based on "minimax regret" to use classical score-based voting rules in the setting where preferences can be any partial (instead of complete) orders over the set of alternatives. We show here that such an approach is vulnerable to a new kind of manipulation which was not present in the classical (where preferences are complete orders) world of voting. We...

2004
Craig Boutilier, Tuomas Sandholm, Rob Shields

Recent algorithms provide powerful solutions to the problem of determining cost-minimizing (or revenue-maximizing) allocations of items in combinatorial auctions. However, in many settings, criteria other than cost (e.g., the number of winners, the delivery date of items, etc.) are also relevant in judging the quality of an allocation. Furthermore, the bid taker is usually uncertain about her p...

Journal: CoRR 2011
Vianney Perchet, Philippe Rigollet

We consider a multi-armed bandit problem in a setting where each arm produces a noisy reward realization which depends on an observable random covariate. As opposed to the traditional static multi-armed bandit problem, this setting allows for dynamically changing rewards that better describe applications where side information is available. We adopt a nonparametric model where the expected rewa...

2015
Vianney Perchet, Philippe Rigollet, Sylvain Chassang, Erik Snowberg

Motivated by practical applications, chiefly clinical trials, we study the regret achievable for stochastic bandits under the constraint that the employed policy must split trials into a small number of batches. We propose a simple policy, and show that a very small number of batches gives close to minimax optimal regret bounds. As a byproduct, we derive optimal policies with low switching cost...

2013
H. Brendan McMahan, Jacob D. Abernethy

We design and analyze minimax-optimal algorithms for online linear optimization games where the player’s choice is unconstrained. The player strives to minimize regret, the difference between his loss and the loss of a post-hoc benchmark strategy. While the standard benchmark is the loss of the best strategy chosen from a bounded comparator set, we consider a very broad range of benchmark funct...
