Search results for: minimax regret
Number of results: 12162
The problem of finding a robust spanning tree has been analysed. It consists of determining a minimum spanning tree of a graph with uncertain edge costs: we seek a spanning tree that minimizes the difference in cost between the selected tree and the optimal tree, taking all possible realizations of the edge costs into account. This issue belongs ...
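To make the minimax-regret objective concrete, here is a minimal Python sketch (not the paper's algorithm) for evaluating the regret of one candidate tree under interval edge costs. It relies on the standard observation that, under interval uncertainty, the worst-case scenario for a tree T puts the edges of T at their upper bounds and all other edges at their lower bounds; the graph, intervals and candidate tree below are hypothetical.

```python
def mst_cost(n, edges):
    """Kruskal's algorithm; edges is a list of (cost, u, v); returns total MST cost."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total = 0.0
    for cost, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            total += cost
    return total


def regret_of_tree(n, interval_edges, tree):
    """interval_edges maps (u, v) -> (low, high); tree is a set of edges (u, v)."""
    scenario = []
    tree_cost = 0.0
    for (u, v), (low, high) in interval_edges.items():
        cost = high if (u, v) in tree else low   # worst case for the chosen tree
        scenario.append((cost, u, v))
        if (u, v) in tree:
            tree_cost += cost
    # Regret = cost of the chosen tree minus the optimal cost in this scenario.
    return tree_cost - mst_cost(n, scenario)


# Hypothetical triangle graph with interval costs on the edges.
intervals = {(0, 1): (1.0, 3.0), (1, 2): (1.0, 2.0), (0, 2): (2.0, 4.0)}
print(regret_of_tree(3, intervals, {(0, 1), (1, 2)}))   # -> 1.0
```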
We present new results on the efficiency of no-regret algorithms in the context of multiagent learning. We use a known approach to augment a large class of no-regret algorithms to allow stochastic sampling of actions and observation of the scalar reward of only the action played. We show that the average actual payoffs of the resulting learner get (1) close to the best response against (eventually...
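As a rough illustration of the kind of augmentation described above (though not the authors' specific construction), the EXP3-style sketch below turns an exponential-weights learner into a bandit learner: actions are sampled from the learner's distribution, only the scalar reward of the played action is observed, and an importance-weighted estimate is fed back. The reward function and parameters are hypothetical.

```python
import math
import random

def exp3(n_actions, horizon, reward, eta=0.1, gamma=0.05):
    """Exponential weights with bandit feedback via importance weighting."""
    weights = [1.0] * n_actions
    probs = [1.0 / n_actions] * n_actions
    for _ in range(horizon):
        total = sum(weights)
        # Mix in uniform exploration so every action keeps probability >= gamma / n.
        probs = [(1 - gamma) * w / total + gamma / n_actions for w in weights]
        action = random.choices(range(n_actions), weights=probs)[0]
        r = reward(action)                 # only the played action's scalar reward is seen
        estimate = r / probs[action]       # unbiased importance-weighted reward estimate
        weights[action] *= math.exp(eta * estimate)
        m = max(weights)
        weights = [w / m for w in weights]  # rescale to keep the weights bounded
    return probs

# Hypothetical environment: action 1 has a higher expected reward than action 0.
random.seed(0)
print(exp3(2, 5000, lambda a: random.random() * (0.4 if a == 0 else 0.6)))
```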
Imprecise-reward Markov decision processes (IRMDPs) are MDPs in which the reward function is only partially specified (e.g., by some elicitation process). Recent work using minimax regret to solve IRMDPs has shown, despite their theoretical intractability, how the set of policies that are nondominated w.r.t. reward uncertainty can be exploited to accelerate regret computation. However, the numb...
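For a finite, tabular stand-in for this setting, the sketch below picks the policy with the smallest maximum regret over a small set of plausible reward functions, given the value of each candidate policy under each reward function. The policies, reward functions and value table are hypothetical and do not reflect the paper's nondominated-policy machinery.

```python
def minimax_regret_policy(values):
    """values[policy][reward] = value of that policy under that reward function."""
    rewards = next(iter(values.values())).keys()
    # Best achievable value under each plausible reward function.
    best = {r: max(values[p][r] for p in values) for r in rewards}
    # Maximum regret each policy can suffer over the uncertainty set.
    max_regret = {p: max(best[r] - values[p][r] for r in rewards) for p in values}
    winner = min(max_regret, key=max_regret.get)
    return winner, max_regret

# Hypothetical values of three candidate policies under two plausible reward functions.
values = {
    "pi_a": {"r1": 10.0, "r2": 2.0},
    "pi_b": {"r1": 7.0,  "r2": 6.0},
    "pi_c": {"r1": 4.0,  "r2": 8.0},
}
print(minimax_regret_policy(values))   # -> 'pi_b', whose maximum regret is 3.0
```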
We study the problem of data compression, gambling and prediction of a sequence z^n = z_1 z_2 ... z_n from a certain alphabet X, in terms of regret [4] and redundancy with respect to a general exponential family, a general smooth family, and also Markov sources. In particular, we show that variants of the Jeffreys mixture asymptotically achieve their minimax values. These results are generalizations of...
The paper considers sequential prediction of individual sequences with log loss (online density estimation) using an exponential family of distributions. We first analyze the regret of the maximum likelihood (“follow the leader”) strategy. We find that this strategy is (1) suboptimal and (2) requires an additional assumption about boundedness of the data sequence. We then show that both problem...
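The contrast between the two strategies can be illustrated for the Bernoulli family: the sketch below compares the log loss of the maximum-likelihood ("follow the leader") plug-in predictor with the Jeffreys (Krichevsky-Trofimov) mixture, which adds 1/2 to each symbol count. The binary sequence is hypothetical; note that the plug-in predictor already suffers infinite loss on this short sequence, one symptom of the problems mentioned above.

```python
import math

def log_loss(sequence, predictor):
    """Cumulative log loss of a sequential binary predictor on a 0/1 sequence."""
    ones = zeros = 0
    total = 0.0
    for bit in sequence:
        p_one = predictor(ones, zeros)
        p = p_one if bit == 1 else 1.0 - p_one
        total += float("inf") if p == 0.0 else -math.log(p)
        ones, zeros = ones + bit, zeros + (1 - bit)
    return total

# "Follow the leader": plug in the maximum-likelihood estimate of P(next bit = 1).
ml = lambda ones, zeros: 0.5 if ones + zeros == 0 else ones / (ones + zeros)
# Jeffreys (Krichevsky-Trofimov) mixture: add 1/2 to each count.
kt = lambda ones, zeros: (ones + 0.5) / (ones + zeros + 1.0)

sequence = [0, 1, 1, 0, 1, 1, 1, 0, 1, 1]   # hypothetical binary sequence
print(log_loss(sequence, ml), log_loss(sequence, kt))
```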
Unconstrained Online Linear Learning in Hilbert Spaces: Minimax Algorithms and Normal Approximations
We study algorithms for online linear optimization in Hilbert spaces, focusing on the case where the player is unconstrained. We develop a novel characterization of a large class of minimax algorithms, recovering, and even improving, several previous results as immediate corollaries. Moreover, using our tools, we develop an algorithm that provides a regret bound of O(U √T log(U √T log T + 1)...
An online reinforcement learning algorithm is anytime if it does not need to know the horizon T of the experiment in advance. A well-known technique for obtaining an anytime algorithm from any non-anytime algorithm is the “Doubling Trick”. In the context of adversarial or stochastic multi-armed bandits, the performance of an algorithm is measured by its regret, and we study two families of sequences...
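As a sketch of the doubling trick itself (under assumed interfaces, not any particular paper's algorithm), the code below runs a horizon-dependent learner on geometrically growing epochs, restarting it whenever the current guess of the horizon is exhausted; `make_algorithm` and the act/update interface are hypothetical.

```python
def anytime_via_doubling(make_algorithm, environment, total_rounds):
    """Run a horizon-dependent learner without knowing total_rounds in advance."""
    t = 0
    epoch = 0
    while t < total_rounds:
        horizon = 2 ** epoch                  # guessed horizon for this epoch: 1, 2, 4, ...
        algo = make_algorithm(horizon)        # fresh learner tuned to the guessed horizon
        for _ in range(min(horizon, total_rounds - t)):
            action = algo.act()
            reward = environment(t, action)
            algo.update(action, reward)
            t += 1
        epoch += 1
```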
In this paper, we propose an exact solution method to generate fair policies in Multiobjective Markov Decision Processes (MMDPs). MMDPs consider n immediate reward functions, representing either individual payoffs in a multiagent problem or rewards with respect to different objectives. In this context, we focus on the determination of a policy that fairly shares regrets among agents or objectiv...
In contexts in which players have no priors, we analyze a learning process based on ex-post regret as a guide to understanding how to play games of incomplete information under private values. The conclusions depend on whether players interact within a fixed set (fixed matching) or are randomly matched to play the game (random matching). The relevant long-run predictions are minimal sets that...
Game-theoretic security resource allocation problems have generated significant interest in the area of designing and developing security systems. These approaches traditionally utilize the Stackelberg game model for security resource scheduling in order to improve the protection of critical assets. The basic assumption in Stackelberg games is that a defender will act first, then an attacker wi...