نتایج جستجو برای: minimax regret

تعداد نتایج: 12162  

2014
Mariusz MAKUCHOWSKI

The problem of finding a robust spanning tree has been analysed. The problem consists of determining a minimum spanning tree of a graph with uncertain edge costs. We should determine a spanning tree that minimizes the difference in costs between the tree selected and the optimal tree. While doing this, all possible realizations of the edge costs should be taken into account. This issue belongs ...

2005
Bikramjit Banerjee Jing Peng

We present new results on the efficiency of no-regret algorithms in the context of multiagent learning. We use a known approach to augment a large class of no-regret algorithms to allow stochastic sampling of actions and observation of scalar reward of only the action played. We show that the average actual payoffs of the resulting learner gets (1) close to the best response against (eventually...

2011
Kevin Regan Craig Boutilier

Imprecise-reward Markov decision processes (IRMDPs) are MDPs in which the reward function is only partially specified (e.g., by some elicitation process). Recent work using minimax regret to solve IRMDPs has shown, despite their theoretical intractability, how the set of policies that are nondominated w.r.t. reward uncertainty can be exploited to accelerate regret computation. However, the numb...

2004
Jun-ichi Takeuchi

We study the problem of data compression, gambling and prediction of a sequence zn = z1z2 ... z, from a certain alphabet X , in terms of regret [4] and redundancy with respect to a general exponential family, a general smooth family, and also Markov sources. In particular, we show that variants of Jeffreys mixture asymptotically achieve their minimax values. These results are generalizations of...

2011
Wojciech Kotlowski Peter Grünwald

The paper considers sequential prediction of individual sequences with log loss (online density estimation) using an exponential family of distributions. We first analyze the regret of the maximum likelihood (“follow the leader”) strategy. We find that this strategy is (1) suboptimal and (2) requires an additional assumption about boundedness of the data sequence. We then show that both problem...

2014
H. Brendan McMahan Francesco Orabona

We study algorithms for online linear optimization in Hilbert spaces, focusing on the case where the player is unconstrained. We develop a novel characterization of a large class of minimax algorithms, recovering, and even improving, several previous results as immediate corollaries. Moreover, using our tools, we develop an algorithm that provides a regret bound ofO ( U √ T log(U √ T log T + 1)...

2018
Lilian Besson Emilie Kaufmann

An online reinforcement learning algorithm is anytime if it does not need to know in advance the horizon T of the experiment. A well-known technique to obtain an anytime algorithm from any nonanytime algorithm is the “Doubling Trick”. In the context of adversarial or stochastic multi-armed bandits, the performance of an algorithm is measured by its regret, and we study two families of sequences...

2011
Wlodzimierz Ogryczak Patrice Perny Paul Weng

In this paper, we propose an exact solution method to generate fair policies in Multiobjective Markov Decision Processes (MMDPs). MMDPs consider n immediate reward functions, representing either individual payoffs in a multiagent problem or rewards with respect to different objectives. In this context, we focus on the determination of a policy that fairly shares regrets among agents or objectiv...

2010
Rene Saran Roberto Serrano

In contexts in which players have no priors, we analyze a learning process based on ex-post regret as a guide to understand how to play games of incomplete information under private values. The conclusions depend on whether players interact within a fixed set (fixed matching) or they are randomly matched to play the game (random matching). The relevant long run predictions are minimal sets that...

2015
Wenjun Ma Weiru Liu Kevin McAreavey

Game-theoretic security resource allocation problems have generated significant interest in the area of designing and developing security systems. These approaches traditionally utilize the Stackelberg game model for security resource scheduling in order to improve the protection of critical assets. The basic assumption in Stackelberg games is that a defender will act first, then an attacker wi...

نمودار تعداد نتایج جستجو در هر سال

با کلیک روی نمودار نتایج را به سال انتشار فیلتر کنید