نتایج جستجو برای: minimax regret
تعداد نتایج: 12162 فیلتر نتایج به سال:
i=1 pi log 1 pi . To do this, we must bound the two terms on the right hand side of the bound above. Step 1: Bounding the Range of the Regularizer We begin by deriving upper and lower bounds on the entropy function H(~ p). The lower bound is easy. Since for all i, 0 ≤ pi ≤ 1 , pi log 1 p i ≥ 0. (Remember that we define 0 log(1/0) to be 0 by convention.) As we discussed before, H(~ p) = 0 is ach...
In this paper, we consider the problem of dynamic pricing by a set of competing sellers in an information economy where buyers differentiate products along multiple attributes, and buyer preferences can change temporally. Previous research in this area has either focused on dynamic pricing along a limited number of (e.g. binary) attributes, or, assumes that each seller has access to private inf...
We consider online learning in repeated decision problems, within the framework of a repeated game against an arbitrary opponent. For repeated matrix games, well known results establish the existence of no-regret strategies; such strategies secure a long-term average payoff that comes close to the maximal payoff that could be obtained, in hindsight, by playing any fixed action against the obser...
Lu and Boutilier proposed a novel approach based on"minimax regret"to use classical score based voting rules in the setting where preferences can be any partial (instead of complete) orders over the set of alternatives. We show here that such an approach is vulnerable to a new kind of manipulation which was not present in the classical (where preferences are complete orders) world of voting. We...
Recent algorithms provide powerful solutions to the problem of determining cost-minimizing (or revenue-maximizing) allocations of items in combinatorial auctions. However, in many settings, criteria other than cost (e.g., the number of winners, the delivery date of items, etc.) are also relevant in judging the quality of an allocation. Furthermore, the bid taker is usually uncertain about her p...
We consider a multi-armed bandit problem in a setting where each arm produces a noisy reward realization which depends on an observable random covariate. As opposed to the traditional static multi-armed bandit problem, this setting allows for dynamically changing rewards that better describe applications where side information is available. We adopt a nonparametric model where the expected rewa...
Motivated by practical applications, chiefly clinical trials, we study the regret achievable for stochastic bandits under the constraint that the employed policy must split trials into a small number of batches. We propose a simple policy, and show that a very small number of batches gives close to minimax optimal regret bounds. As a byproduct, we derive optimal policies with low switching cost...
We design and analyze minimax-optimal algorithms for online linear optimization games where the player’s choice is unconstrained. The player strives to minimize regret, the difference between his loss and the loss of a post-hoc benchmark strategy. While the standard benchmark is the loss of the best strategy chosen from a bounded comparator set, we consider a very broad range of benchmark funct...
نمودار تعداد نتایج جستجو در هر سال
با کلیک روی نمودار نتایج را به سال انتشار فیلتر کنید