Search results for: minimax regret
Number of results: 12162
The problem of finding a robust spanning tree has been analysed. It consists of determining a minimum spanning tree of a graph with uncertain edge costs: we seek a spanning tree that minimizes the difference in cost between the selected tree and the optimal tree, taking all possible realizations of the edge costs into account. This issue belongs ...
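To make the minimax-regret objective concrete, here is a minimal Python sketch (not the paper's algorithm) for evaluating the regret of one candidate tree under interval edge costs. It relies on the standard observation that, under interval uncertainty, the worst-case scenario for a tree T puts the edges of T at their upper bounds and all other edges at their lower bounds; the graph, intervals and candidate tree below are hypothetical.

```python
def mst_cost(n, edges):
    """Kruskal's algorithm; edges is a list of (cost, u, v); returns total MST cost."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total = 0.0
    for cost, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            total += cost
    return total


def regret_of_tree(n, interval_edges, tree):
    """interval_edges maps (u, v) -> (low, high); tree is a set of edges (u, v)."""
    scenario = []
    tree_cost = 0.0
    for (u, v), (low, high) in interval_edges.items():
        cost = high if (u, v) in tree else low   # worst case for the chosen tree
        scenario.append((cost, u, v))
        if (u, v) in tree:
            tree_cost += cost
    # Regret = cost of the chosen tree minus the optimal cost in this scenario.
    return tree_cost - mst_cost(n, scenario)


# Hypothetical triangle graph with interval costs on the edges.
intervals = {(0, 1): (1.0, 3.0), (1, 2): (1.0, 2.0), (0, 2): (2.0, 4.0)}
print(regret_of_tree(3, intervals, {(0, 1), (1, 2)}))   # -> 1.0
```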
We present new results on the efficiency of no-regret algorithms in the context of multiagent learning. We use a known approach to augment a large class of no-regret algorithms to allow stochastic sampling of actions and observation of the scalar reward of only the action played. We show that the average actual payoffs of the resulting learner get (1) close to the best response against (eventually...
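As a rough illustration of the kind of augmentation described above (though not the authors' specific construction), the EXP3-style sketch below turns an exponential-weights learner into a bandit learner: actions are sampled from the learner's distribution, only the scalar reward of the played action is observed, and an importance-weighted estimate is fed back. The reward function and parameters are hypothetical.

```python
import math
import random

def exp3(n_actions, horizon, reward, eta=0.1, gamma=0.05):
    """Exponential weights with bandit feedback via importance weighting."""
    weights = [1.0] * n_actions
    probs = [1.0 / n_actions] * n_actions
    for _ in range(horizon):
        total = sum(weights)
        # Mix in uniform exploration so every action keeps probability >= gamma / n.
        probs = [(1 - gamma) * w / total + gamma / n_actions for w in weights]
        action = random.choices(range(n_actions), weights=probs)[0]
        r = reward(action)                 # only the played action's scalar reward is seen
        estimate = r / probs[action]       # unbiased importance-weighted reward estimate
        weights[action] *= math.exp(eta * estimate)
        m = max(weights)
        weights = [w / m for w in weights]  # rescale to keep the weights bounded
    return probs

# Hypothetical environment: action 1 has a higher expected reward than action 0.
random.seed(0)
print(exp3(2, 5000, lambda a: random.random() * (0.4 if a == 0 else 0.6)))
```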
Imprecise-reward Markov decision processes (IRMDPs) are MDPs in which the reward function is only partially specified (e.g., by some elicitation process). Recent work using minimax regret to solve IRMDPs has shown, despite their theoretical intractability, how the set of policies that are nondominated w.r.t. reward uncertainty can be exploited to accelerate regret computation. However, the numb...
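For a finite, tabular stand-in for this setting, the sketch below picks the policy with the smallest maximum regret over a small set of plausible reward functions, given the value of each candidate policy under each reward function. The policies, reward functions and value table are hypothetical and do not reflect the paper's nondominated-policy machinery.

```python
def minimax_regret_policy(values):
    """values[policy][reward] = value of that policy under that reward function."""
    rewards = next(iter(values.values())).keys()
    # Best achievable value under each plausible reward function.
    best = {r: max(values[p][r] for p in values) for r in rewards}
    # Maximum regret each policy can suffer over the uncertainty set.
    max_regret = {p: max(best[r] - values[p][r] for r in rewards) for p in values}
    winner = min(max_regret, key=max_regret.get)
    return winner, max_regret

# Hypothetical values of three candidate policies under two plausible reward functions.
values = {
    "pi_a": {"r1": 10.0, "r2": 2.0},
    "pi_b": {"r1": 7.0,  "r2": 6.0},
    "pi_c": {"r1": 4.0,  "r2": 8.0},
}
print(minimax_regret_policy(values))   # -> 'pi_b', whose maximum regret is 3.0
```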
We study the problem of data compression, gambling and prediction of a sequence z^n = z_1 z_2 ... z_n from a certain alphabet X, in terms of regret [4] and redundancy with respect to a general exponential family, a general smooth family, and also Markov sources. In particular, we show that variants of the Jeffreys mixture asymptotically achieve their minimax values. These results are generalizations of...
The paper considers sequential prediction of individual sequences with log loss (online density estimation) using an exponential family of distributions. We first analyze the regret of the maximum likelihood (“follow the leader”) strategy. We find that this strategy is (1) suboptimal and (2) requires an additional assumption about boundedness of the data sequence. We then show that both problem...
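The contrast between the two strategies can be illustrated for the Bernoulli family: the sketch below compares the log loss of the maximum-likelihood ("follow the leader") plug-in predictor with the Jeffreys (Krichevsky-Trofimov) mixture, which adds 1/2 to each symbol count. The binary sequence is hypothetical; note that the plug-in predictor already suffers infinite loss on this short sequence, one symptom of the problems mentioned above.

```python
import math

def log_loss(sequence, predictor):
    """Cumulative log loss of a sequential binary predictor on a 0/1 sequence."""
    ones = zeros = 0
    total = 0.0
    for bit in sequence:
        p_one = predictor(ones, zeros)
        p = p_one if bit == 1 else 1.0 - p_one
        total += float("inf") if p == 0.0 else -math.log(p)
        ones, zeros = ones + bit, zeros + (1 - bit)
    return total

# "Follow the leader": plug in the maximum-likelihood estimate of P(next bit = 1).
ml = lambda ones, zeros: 0.5 if ones + zeros == 0 else ones / (ones + zeros)
# Jeffreys (Krichevsky-Trofimov) mixture: add 1/2 to each count.
kt = lambda ones, zeros: (ones + 0.5) / (ones + zeros + 1.0)

sequence = [0, 1, 1, 0, 1, 1, 1, 0, 1, 1]   # hypothetical binary sequence
print(log_loss(sequence, ml), log_loss(sequence, kt))
```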
Unconstrained Online Linear Learning in Hilbert Spaces: Minimax Algorithms and Normal Approximations
We study algorithms for online linear optimization in Hilbert spaces, focusing on the case where the player is unconstrained. We develop a novel characterization of a large class of minimax algorithms, recovering, and even improving, several previous results as immediate corollaries. Moreover, using our tools, we develop an algorithm that provides a regret bound of O(U √T log(U √T log T + 1)...
An online reinforcement learning algorithm is anytime if it does not need to know the horizon T of the experiment in advance. A well-known technique for obtaining an anytime algorithm from any non-anytime algorithm is the “Doubling Trick”. In the context of adversarial or stochastic multi-armed bandits, the performance of an algorithm is measured by its regret, and we study two families of sequences...
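As a sketch of the doubling trick itself (under assumed interfaces, not any particular paper's algorithm), the code below runs a horizon-dependent learner on geometrically growing epochs, restarting it whenever the current guess of the horizon is exhausted; `make_algorithm` and the act/update interface are hypothetical.

```python
def anytime_via_doubling(make_algorithm, environment, total_rounds):
    """Run a horizon-dependent learner without knowing total_rounds in advance."""
    t = 0
    epoch = 0
    while t < total_rounds:
        horizon = 2 ** epoch                  # guessed horizon for this epoch: 1, 2, 4, ...
        algo = make_algorithm(horizon)        # fresh learner tuned to the guessed horizon
        for _ in range(min(horizon, total_rounds - t)):
            action = algo.act()
            reward = environment(t, action)
            algo.update(action, reward)
            t += 1
        epoch += 1
```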
In this paper, we propose an exact solution method to generate fair policies in Multiobjective Markov Decision Processes (MMDPs). MMDPs consider n immediate reward functions, representing either individual payoffs in a multiagent problem or rewards with respect to different objectives. In this context, we focus on the determination of a policy that fairly shares regrets among agents or objectiv...
In contexts in which players have no priors, we analyze a learning process based on ex-post regret as a guide to understanding how to play games of incomplete information under private values. The conclusions depend on whether players interact within a fixed set (fixed matching) or are randomly matched to play the game (random matching). The relevant long-run predictions are minimal sets that...
Game-theoretic security resource allocation problems have generated significant interest in the area of designing and developing security systems. These approaches traditionally utilize the Stackelberg game model for security resource scheduling in order to improve the protection of critical assets. The basic assumption in Stackelberg games is that a defender will act first, then an attacker wi...