Search results for: minimax regret
Number of results: 12162
The design of a minimum risk classifier based on data usually stems from the stationarity assumption that the conditions during training and test are the same: the misclassification costs assumed during training must be in agreement with real costs, and the same statistical process must have generated both training and test data. Unfortunately, in real world applications, these assumptions may ...
Bandit convex optimization is a special case of online convex optimization with partial information. In this setting, a player attempts to minimize a sequence of adversarially generated convex loss functions, while only observing the value of each function at a single point. In some cases, the minimax regret of these problems is known to be strictly worse than the minimax regret in the correspo...
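The single-function-value feedback described above can be illustrated with the classic one-point gradient estimator: observing only f(x + δu) for a random unit direction u, the quantity (d/δ)·f(x + δu)·u estimates the gradient of a smoothed version of f. The function f, the dimension, and the step size below are illustrative choices, not taken from the paper:

```python
import math
import random

def one_point_grad(f, x, delta=0.1):
    """Estimate grad f(x) from a single function evaluation.

    Draws a random unit direction u and returns (d/delta) * f(x + delta*u) * u,
    which in expectation equals the gradient of a delta-smoothed version of f.
    """
    d = len(x)
    u = [random.gauss(0, 1) for _ in range(d)]
    norm = math.sqrt(sum(ui * ui for ui in u))
    u = [ui / norm for ui in u]                          # random unit direction
    fx = f([xi + delta * ui for xi, ui in zip(x, u)])    # the single observed value
    return [(d / delta) * fx * ui for ui in u]

# Sanity check on f(x) = ||x||^2, whose gradient at (1, -2) is (2, -4).
# A single estimate is very noisy; averaging many shows it is unbiased here.
f = lambda x: sum(xi * xi for xi in x)
x = [1.0, -2.0]
random.seed(0)
n = 200_000
est = [0.0, 0.0]
for _ in range(n):
    g = one_point_grad(f, x)
    est = [e + gi / n for e, gi in zip(est, g)]
print(est)
```

The high variance of this estimator (each sample has magnitude on the order of d/δ) is exactly why bandit feedback can make the attainable minimax regret strictly worse than in the full-information setting.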
We consider a sequential learning problem with Gaussian payoffs and side observations: after selecting an action i, the learner receives information about the payoff of every action j in the form of Gaussian observations whose mean is the same as the mean payoff, but the variance depends on the pair (i, j) (and may be infinite). The setup allows a more refined information transfer from one acti...
We consider decision problems under complete ignorance and extend the minimax regret principle to situations where, after taking an action, the decision maker does not necessarily learn the state of the world. For example, if she only learns what the outcome is, then all she knows is that the actual state is one of the (possibly several) states that yield the observed outcome under the chosen action. We refer to this situation as imperfect ex-post information. We show that, giv...
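The basic minimax regret principle that the abstract above extends can be sketched on a small, hypothetical payoff matrix (the numbers below are illustrative, not from the paper):

```python
# Payoff of each action in each of three states of the world (hypothetical).
payoffs = {
    "a1": [10, 2, 0],
    "a2": [7, 5, 4],
    "a3": [3, 3, 3],
}

n_states = 3
# Best achievable payoff in each state, over all actions.
best_per_state = [max(p[s] for p in payoffs.values()) for s in range(n_states)]

# Regret of an action in a state = best payoff in that state minus its payoff.
regret = {a: [best_per_state[s] - p[s] for s in range(n_states)]
          for a, p in payoffs.items()}

# The minimax-regret action minimizes the worst-case (maximum) regret.
choice = min(regret, key=lambda a: max(regret[a]))
print(choice, max(regret[choice]))  # → a2 3
```

Here a1 is best in state 1 but risks regret 4 elsewhere, while a2's worst-case regret is only 3, so the minimax regret criterion selects a2.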
We consider the problem of provably optimal exploration in reinforcement learning for finite horizon MDPs. We show that an optimistic modification to value iteration achieves a regret bound of Õ(√(HSAT) + H²S²A + H√T), where H is the time horizon, S the number of states, A the number of actions, and T the number of timesteps. This result improves over the best previously known bound Õ(HS√(AT)) achiev...
Bayesian methods suffer from the problem of how to specify prior beliefs. One interesting idea is to consider worst-case priors. This requires solving a stochastic zero-sum game. In this paper, we extend well-known results from bandit theory in order to discover minimax-Bayes policies and discuss when they are practical.
The paper considers a classical optimization problem on a network whose arc costs are partially known. It is assumed that an interval estimate is given for each arc cost and no further information about the statistical distribution of the true value of the arc cost is known. In this context, given a spanning arborescence in the network, its cost can take on different values according to the ch...
In the last three lectures we have been discussing online learning algorithms in which we receive an instance x and then its label y for t = 1, ..., T. Specifically, in the last lecture we talked about online learning from experts and online prediction. We saw many algorithms, such as the Halving algorithm, the Weighted Majority (WM) algorithm, and lastly the Weighted Majority Continuous (WMC) algorithm. We a...
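The deterministic Weighted Majority algorithm mentioned above can be sketched in a few lines: keep a weight per expert, predict by weighted vote, and multiplicatively shrink the weights of experts that err. The expert predictions below are synthetic, for illustration only:

```python
def weighted_majority(expert_preds, labels, beta=0.5):
    """Run deterministic Weighted Majority; return the learner's mistake count.

    expert_preds[t][i] is expert i's {0, 1} prediction in round t;
    labels[t] is the true label for round t.
    """
    n = len(expert_preds[0])
    w = [1.0] * n          # one weight per expert, initialized to 1
    mistakes = 0
    for preds, y in zip(expert_preds, labels):
        vote1 = sum(wi for wi, p in zip(w, preds) if p == 1)
        vote0 = sum(wi for wi, p in zip(w, preds) if p == 0)
        yhat = 1 if vote1 >= vote0 else 0
        if yhat != y:
            mistakes += 1
        # Multiplicatively penalize every expert that was wrong this round.
        w = [wi * beta if p != y else wi for wi, p in zip(w, preds)]
    return mistakes

# Expert 0 is always right; experts 1 and 2 are noisy.
rounds = [([1, 0, 1], 1), ([0, 0, 1], 0), ([1, 1, 0], 1), ([0, 1, 0], 0)]
preds = [r[0] for r in rounds]
labels = [r[1] for r in rounds]
print(weighted_majority(preds, labels))
```

The standard analysis bounds the learner's mistakes in terms of the best expert's mistakes; with β = 1/2 the bound is roughly 2.41·(m* + log₂ n) for n experts and a best expert with m* mistakes.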