Search results for: q policy

Number of results: 381585

Journal: CoRR 2017
Pierre H. Richemond Brendan Maginnis

Two main families of reinforcement learning algorithms, Q-learning and policy gradients, have recently been proven to be equivalent when using a softmax relaxation on one part, and an entropic regularization on the other. We relate this result to the well-known convex duality of Shannon entropy and the softmax function. Such a result is also known as the Donsker-Varadhan formula. This provides ...
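The duality this abstract invokes can be stated concretely: log-sum-exp (the "softmax" in the relaxed Bellman operator) is the convex conjugate of negative Shannon entropy over the probability simplex. This is a standard identity; the RL-flavored notation Q(s, a) below is an assumption, since the abstract is truncated:

```latex
\log \sum_{a} e^{Q(s,a)}
  \;=\; \max_{\pi \in \Delta}
        \Big( \sum_{a} \pi(a)\, Q(s,a) + H(\pi) \Big),
\qquad H(\pi) = -\sum_{a} \pi(a) \log \pi(a),
```

with the maximum attained at the softmax policy $\pi^*(a) = e^{Q(s,a)} / \sum_{a'} e^{Q(s,a')}$, which is the finite-space form of the Donsker-Varadhan variational formula.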

Journal: Automatica 2008
Shalabh Bhatnagar K. Mohan Babu

We propose two algorithms for Q-learning that use the two timescale stochastic approximation methodology. The first of these updates Q-values of all feasible state-action pairs at each instant while the second updates Q-values of states with actions chosen according to the ‘current’ randomized policy updates. A proof of convergence of the algorithms is shown. Finally, numerical experiments usin...
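The abstract is truncated, but the two-timescale structure it describes can be sketched as follows. The step-size exponents, the softmax policy target, and the toy `env_step` interface are illustrative assumptions, not the paper's algorithm:

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def two_timescale_q(env_step, n_states, n_actions, n_iters=20000, gamma=0.9):
    """Two-timescale sketch: Q-values move on a fast step size a_t,
    the randomized policy on a slower step size b_t, with b_t/a_t -> 0."""
    Q = np.zeros((n_states, n_actions))
    pi = np.full((n_states, n_actions), 1.0 / n_actions)  # randomized policy
    s = 0
    for t in range(1, n_iters + 1):
        a_t = t ** -0.6          # fast timescale for the Q-values
        b_t = 1.0 / t            # slow timescale for the policy
        a = np.random.choice(n_actions, p=pi[s])
        s_next, r = env_step(s, a)
        Q[s, a] += a_t * (r + gamma * Q[s_next].max() - Q[s, a])
        # track the softmax of the current Q-values on the slow timescale
        pi[s] = (1 - b_t) * pi[s] + b_t * softmax(Q[s])
        s = s_next
    return Q, pi
```

Because the policy averages much more slowly than the Q-values change, the Q-update effectively sees a quasi-static policy, which is what the two-timescale convergence argument exploits.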

2011
Marco Battaglini Salvatore Nunnari Thomas Palfrey

Legislative Bargaining and the Dynamics of Public Investment. We present a legislative bargaining model of the provision of a durable public good over an infinite horizon. In each period, there is a societal endowment which can either be invested in the public good or consumed. We characterize the optimal public policy, defined by the tim...

2013
Sherief Abdallah Michael Kaisers

Q-learning is a very popular reinforcement learning algorithm that has been proven to converge to optimal policies in Markov decision processes. However, Q-learning exhibits artifacts in non-stationary environments: e.g., the probability of playing the optimal action may decrease if Q-values deviate significantly from the true values, a situation that may arise in the initial phase as well as after change...
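For reference, the update rule this abstract builds on, plus a softmax action-selection scheme under which the described artifact is visible. The two-state setting and parameter values are purely illustrative:

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.95):
    """Standard tabular Q-learning step: move Q(s, a) toward the
    bootstrap target r + gamma * max_a' Q(s_next, a')."""
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    return Q

def boltzmann(q_values, temp=1.0):
    """Softmax action selection: if a suboptimal Q-value is pushed up
    (deviates from its true value), the probability assigned to the
    truly optimal action drops -- the artifact described above."""
    z = np.exp((q_values - q_values.max()) / temp)
    return z / z.sum()
```

For example, with true-ish values `[1.0, 0.0]` the optimal action gets probability ≈ 0.73, but if the suboptimal estimate drifts up to 0.8 that probability falls to ≈ 0.55 even though the greedy choice is unchanged.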

Journal: Cancer discovery 2014
Howard Koh American Association Cancer

To mark the 50th anniversary of the Surgeon General's first report on smoking and health, and to promote this year's report, Howard Koh, MD, MPH, assistant secretary for health at the U.S. Department of Health and Human Services, spoke about the importance of and the continuing need for tobacco-control efforts.

2008
Bibhas Chakraborty Victor Strecher Susan Murphy

We consider finite-horizon fitted Q-iteration with linear function approximation to learn a policy from a training set of trajectories. We show that fitted Q-iteration can give biased estimates and invalid confidence intervals for the parameters that feature in the policy. We propose a regularized estimator called soft-threshold estimator, derive it as an approximate empirical Bayes estimator, ...
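The soft-threshold operator at the heart of such an estimator has a standard closed form, shown below as a generic sketch; how the paper applies it to the policy parameters is truncated here, and the threshold `lam` is illustrative:

```python
import numpy as np

def soft_threshold(x, lam):
    """S(x, lam) = sign(x) * max(|x| - lam, 0): shrinks every
    coefficient toward zero by lam and sets small ones exactly to zero,
    which is what stabilizes the near-nondifferentiable max operator."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)
```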

Journal: IJSDS 2011
Veena Goswami G. B. Mund

This paper analyzes a discrete-time infinite-buffer Geo/Geo/2 queue, in which the number of servers can be adjusted depending on the number of customers in the system one at a time at arrival or at service completion epoch. Analytical closed-form solutions of the infinite-buffer Geo/Geo/2 queueing system operating under the triadic (0, Q, N, M) policy are derived. The total expected cost functio...

Journal: IEEE transactions on neural networks 1999
Junhong Nie Simon Haykin

One of the fundamental issues in the operation of a mobile communication system is the assignment of channels to cells and to calls. Since the number of channels allocated to a mobile communication system is limited, efficient utilization of these communication channels by using efficient channel assignment strategies is not only desirable but also imperative. This paper presents a novel approa...

2000
Fabio C. Bagliano Alberto Dalmazzo Giancarlo Marini

In a model of oligopolistic competition in the banking sector, we analyse how the monetary policy rule chosen by the Central Bank can influence the incentive of banks to set high interest rates on loans over the business cycle. We exploit the basic model to investigate the potential impact of EMU implementation on collusion among banks. In particular, we consider the possible effects of the Europ...

2016
Matthew Hausknecht

Temporal-difference-based deep-reinforcement learning methods have typically been driven by off-policy, bootstrap Q-Learning updates. In this paper, we investigate the effects of using on-policy, Monte Carlo updates. Our empirical results show that for the DDPG algorithm in a continuous action space, mixing on-policy and off-policy update targets exhibits superior performance and stability comp...
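One concrete reading of "mixing on-policy and off-policy update targets" is a per-step convex combination of the Monte Carlo return and the one-step bootstrap target. The mixing weight `beta` and the names below are assumptions, since the abstract is truncated and the paper's exact scheme is not shown:

```python
import numpy as np

def mixed_target(rewards, q_boot, gamma=0.99, beta=0.2):
    """Blend on-policy Monte Carlo returns with off-policy one-step
    bootstrap targets for a finished trajectory.

    rewards: array r_0 .. r_{T-1}
    q_boot:  target-network values Q'(s_{t+1}, mu(s_{t+1})), length T,
             with q_boot[-1] = 0 for a terminal state."""
    T = len(rewards)
    G = np.zeros(T)                  # Monte Carlo returns G_t
    running = 0.0
    for t in reversed(range(T)):
        running = rewards[t] + gamma * running
        G[t] = running
    y = rewards + gamma * np.asarray(q_boot)   # bootstrap targets y_t
    return beta * G + (1.0 - beta) * y
```

The Monte Carlo term injects low-bias on-policy information, while the bootstrap term keeps the low-variance off-policy signal, which is one plausible mechanism for the stability gain the abstract reports.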

[Chart: number of search results per year; click the chart to filter results by publication year]