Search results for: individual regret

Number of results: 446840

2006
Amy Greenwald Zheng Li Casey Marks

We introduce a general class of learning algorithms, regret-matching algorithms, and a regret-based framework for analyzing their performance in online decision problems. Our analytic framework is based on a set Φ of transformations over the set of actions. Specifically, we calculate a Φ-regret vector by comparing the average reward obtained by an agent over some finite sequence of rounds to th...
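The abstract above describes regret-matching algorithms in general terms only. A minimal Python sketch of the basic regret-matching idea (external regret, i.e. Φ restricted to constant transformations) may help; the reward callback, action count, and full-information feedback are illustrative assumptions, not details taken from the paper:

```python
import random

def regret_matching(reward_fn, n_actions, n_rounds, seed=0):
    """External-regret matching sketch: each round, play actions with
    probability proportional to positive cumulative regret.
    `reward_fn(t, a)` returns the reward action `a` would have earned
    in round t (full-information feedback assumed)."""
    rng = random.Random(seed)
    cum_regret = [0.0] * n_actions
    total_reward = 0.0
    for t in range(n_rounds):
        pos = [max(r, 0.0) for r in cum_regret]
        s = sum(pos)
        if s > 0:
            probs = [p / s for p in pos]
        else:
            probs = [1.0 / n_actions] * n_actions  # no positive regret: play uniformly
        a = rng.choices(range(n_actions), weights=probs)[0]
        rewards = [reward_fn(t, i) for i in range(n_actions)]
        total_reward += rewards[a]
        for i in range(n_actions):
            # regret vs. always playing i instead of the chosen action
            cum_regret[i] += rewards[i] - rewards[a]
    return total_reward, cum_regret
```

Against a fixed reward vector, the positive-regret weighting quickly concentrates play on the best action.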

2016
Alex Slivkins

Problem 1: rewards from a small interval. Consider a version of the problem in which all the realized rewards are in the interval [1/2, 1/2 + ε] for some ε ∈ (0, 1/2). Devise versions of UCB1 and Successive Elimination that attain improved regret bounds (both logarithmic and root-T) that depend on ε. Hint: Use a more efficient version of the Hoeffding Inequality in the slides from the first lecture. It ...
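One way the exercise's hint can play out: when rewards lie in an interval of width ε, Hoeffding's inequality for bounded ranges lets the confidence radius scale with ε, shrinking the regret accordingly. A sketch of such a UCB1 variant, under the assumption that ε is known to the algorithm (the function names and reward model are illustrative, not the exercise's official solution):

```python
import math, random

def ucb_small_interval(pull, n_arms, horizon, eps, seed=0):
    """UCB1 variant for rewards in [1/2, 1/2 + eps]: the Hoeffding
    confidence radius is scaled by the interval width eps (assumed known),
    so the resulting regret bounds pick up a factor of eps."""
    rng = random.Random(seed)
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            a = t - 1  # play each arm once to initialize
        else:
            ucb = [means[i] + eps * math.sqrt(2 * math.log(horizon) / counts[i])
                   for i in range(n_arms)]
            a = max(range(n_arms), key=lambda i: ucb[i])
        r = pull(a, rng)                       # realized reward in [1/2, 1/2 + eps]
        counts[a] += 1
        means[a] += (r - means[a]) / counts[a]  # running mean update
    return counts
```

With the radius scaled by ε, a suboptimal arm with gap Δ is eliminated after roughly ε²·log(T)/Δ² pulls rather than log(T)/Δ².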

Journal: :Theor. Comput. Sci. 2008
Jan Poland

The nonstochastic multi-armed bandit problem, first studied by Auer, Cesa-Bianchi, Freund, and Schapire in 1995, is a game of repeatedly choosing one decision from a set of decisions (“experts”), under partial observation: In each round t, only the cost of the decision played is observable. A regret minimization algorithm plays this game while achieving sublinear regret relative to each decisi...
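For readers unfamiliar with the setting, a minimal sketch of an Exp3-style algorithm (the canonical regret minimizer for this partial-observation game, due to Auer et al.) may be useful; the learning-rate tuning and loss callback here are standard textbook choices, not details from this particular paper:

```python
import math, random

def exp3(loss_fn, n_arms, horizon, seed=0):
    """Exp3 sketch for the nonstochastic bandit: exponential weights over
    importance-weighted loss estimates. Only the played arm's loss is
    observed; `loss_fn(t, a)` returns a loss in [0, 1]."""
    rng = random.Random(seed)
    eta = math.sqrt(2 * math.log(n_arms) / (horizon * n_arms))  # standard tuning
    cum_est = [0.0] * n_arms   # cumulative importance-weighted loss estimates
    total_loss = 0.0
    for t in range(horizon):
        w = [math.exp(-eta * c) for c in cum_est]
        s = sum(w)
        probs = [x / s for x in w]
        a = rng.choices(range(n_arms), weights=probs)[0]
        loss = loss_fn(t, a)              # partial feedback: only arm a's loss
        total_loss += loss
        cum_est[a] += loss / probs[a]     # unbiased estimate of the full loss vector
    return total_loss
```

The importance weighting (dividing by the play probability) is what makes the estimates unbiased despite seeing only one arm per round.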

Journal: :IEEE Trans. Information Theory 2000
Qun Xie Andrew R. Barron

For problems of data compression, gambling, and prediction of individual sequences x_1, …, x_n, the following questions arise. Given a target family of probability mass functions p(x_1, …, x_n | θ), how do we choose a probability mass function q(x_1, …, x_n) so that it approximately minimizes the maximum regret max_{x_1,…,x_n} ( log 1/q(x_1, …, x_n) − log 1/p(x_1, …, x_n | θ̂) ), where θ̂ is the maximum-likelihood estimate, and so that it achieves the best constant in the asymptot...
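The maximum regret above can be computed exactly in small cases. As a sketch, for the Bernoulli family the minimax pointwise regret equals log of the normalized-maximum-likelihood constant C_n, a sum of maximized likelihoods over all sequences (grouped here by the number of ones); this is a standard computation, offered as illustration rather than as this paper's method:

```python
import math

def nml_regret_bernoulli(n):
    """Minimax pointwise regret log C_n for the Bernoulli family,
    C_n = sum over x^n of max_theta p_theta(x^n)
        = sum_k C(n, k) * (k/n)^k * (1 - k/n)^(n - k),
    since the MLE for a sequence with k ones is theta-hat = k/n."""
    c = 0.0
    for k in range(n + 1):
        p = k / n
        # maximized likelihood of any single sequence with k ones
        ml = (p ** k) * ((1 - p) ** (n - k)) if 0 < k < n else 1.0
        c += math.comb(n, k) * ml
    return math.log(c)
```

The value grows like (1/2) log n plus a constant, matching the asymptotics studied in the paper.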

2017
Gabriel de Oliveira Ramos Bruno Castro da Silva Ana L. C. Bazzan

Reinforcement learning (RL) is a challenging task, especially in highly competitive multiagent scenarios. We consider the route choice problem, in which self-interested drivers aim at choosing routes that minimise their travel times. Employing RL here is challenging because agents must adapt to each others’ decisions. In this paper, we investigate how agents can overcome such condition by minim...

Journal: :Cognition & emotion 2012
Daniel P Weisberg Sarah R Beck

Previous research found that children first experience regret at 5 years and relief at 7. In two experiments, we explored three possibilities for this lag: (1) relief genuinely develops later than regret; (2) tests of relief have previously been artefactually difficult; or (3) evidence for regret resulted from false positives. In Experiment 1 (N=162 4- to 7-year-olds) children chose one of two ...

2000
MARCEL ZEELENBERG

This article deals with the rationality and functionality of the existence of regret and its influence on decision making. First, regret is defined as a negative, cognitively based emotion that we experience when realizing or imagining that our present situation would have been better had we acted differently. Next, it is discussed whether this experience can be considered rational and it is ar...

Thesis: Ministry of Science, Research and Technology - Shiraz University - Faculty of Theology, 1392

Different names for the Day of Resurrection are given in the Qur'an and the narrations. Each of these names expresses one dimension of that day and can, on its own, convey much about it. According to the late "Feyz Kashani", a secret lies beneath each of these names, and an important meaning is expressed in each description; one must strive to grasp these meanings and uncover these secrets. Books and articles have been written on the subject of the names of the Resurrection ...

2016
Fanny Yang

In this section we show how the refined upper bound on the regret of the EXP algorithm proved using the potential function approach (KL divergence) also gives us a better bound for the expert game setup with bandit feedback. Last lecture we showed how in the case of expert prediction with bandit feedback using the Exp3 algorithm, the regret is upper bounded by T^{2/3} n^{1/3} using a rough upper boun...
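To see concretely how much the refined analysis buys, one can compare the two bound shapes numerically (constants are omitted here, so this is only an order-of-magnitude illustration, not the lecture's exact expressions):

```python
import math

def rough_bound(T, n):
    """Regret bound from the crude Exp3 analysis: T^(2/3) * n^(1/3)."""
    return T ** (2 / 3) * n ** (1 / 3)

def refined_bound(T, n):
    """Refined bound via the KL-divergence potential: sqrt(T * n * log n)."""
    return math.sqrt(T * n * math.log(n))
```

For T = 10^6 rounds and n = 10 experts, the rough bound is about 2.2 × 10^4 while the refined one is under 5 × 10^3, and the gap widens as T grows.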

2013
Wei Han Alexander Rakhlin Karthik Sridharan

We study the problem of online learning with a notion of regret defined with respect to a set of strategies. We develop tools for analyzing the minimax rates and for deriving regret-minimization algorithms in this scenario. While the standard methods for minimizing the usual notion of regret fail, through our analysis we demonstrate existence of regret-minimization methods that compete with suc...

Chart of the number of search results per year
