Search results for: regret minimization

Number of results: 37822

2016
Noam Brown Tuomas Sandholm

Counterfactual Regret Minimization (CFR) is a popular iterative algorithm for approximating Nash equilibria in imperfect-information multi-step two-player zero-sum games. We introduce the first general, principled method for warm starting CFR. Our approach requires only a strategy for each player, and accomplishes the warm start at the cost of a single traversal of the game tree. The method pro...

Journal: CoRR 2017
Peter H. Jin Sergey Levine Kurt Keutzer

Deep reinforcement learning algorithms that estimate state and state-action value functions have been shown to be effective in a variety of challenging domains, including learning control strategies from raw image pixels. However, algorithms that estimate state and state-action value functions typically assume a fully observed state and must compensate for partial or non-Markovian observations ...

2016
Huanan Zhang Cong Shi Chao Qin Cheng Hua

Problems with Nonstationary Demands Huanan Zhang∗, Cong Shi∗, Chao Qin†, Cheng Hua‡ ∗ Industrial and Operations Engineering, University of Michigan, Ann Arbor, MI 48109 {zhanghn, shicong}@umich.edu † Industrial Engineering and Management Sciences, Northwestern University, Evanston, IL 60208 ‡ Yale School of Management, Yale University, New Haven, CT 06511, cheng....

2014
Adrian Aloysius Bock

In this thesis, we present new approximation algorithms as well as hardness of approximation results for NP-hard vehicle routing problems related to public transportation. We consider two different problem classes that also occur frequently in areas such as logistics, robotics, or distribution systems. For the first problem class, the goal is to visit as many locations in a network as possible...

2012
Michael Johanson Nolan Bard Marc Lanctot Richard G. Gibson Michael H. Bowling

Recently, there has been considerable progress towards algorithms for approximating Nash equilibrium strategies in extensive games. One such algorithm, Counterfactual Regret Minimization (CFR), has proven to be effective in two-player zero-sum poker domains. While the basic algorithm is iterative and performs a full game traversal on each iteration, sampling-based approaches are possible. For i...

2016
Oren Anava Shie Mannor

We address the problem of sequential prediction in the heteroscedastic setting, when both the signal and its variance are assumed to depend on explanatory variables. By applying regret minimization techniques, we devise an efficient online learning algorithm for the problem, without assuming that the error terms comply with a specific distribution. We show that our algorithm can be adjusted to ...

2013
Oren Anava Elad Hazan Shie Mannor Ohad Shamir

In this paper we address the problem of predicting a time series using the ARMA (autoregressive moving average) model, under minimal assumptions on the noise terms. Using regret minimization techniques, we develop effective online learning algorithms for the prediction problem, without assuming that the noise terms are Gaussian, identically distributed or even independent. Furthermore, we show ...

2017
Pierre Ménard Aurélien Garivier

We propose the kl-UCB++ algorithm for regret minimization in stochastic bandit models with exponential families of distributions. We prove that it is simultaneously asymptotically optimal (in the sense of Lai and Robbins’ lower bound) and minimax optimal. This is the first algorithm proved to enjoy these two properties at the same time. This work thus merges two different lines of research with s...
