Q-learning

Deep Reinforcement Learning with Double Q-Learning

2016

Hado van Hasselt Arthur Guez David Silver

The popular Q-learning algorithm is known to overestimate action values under certain conditions. It was not previously known whether, in practice, such overestimations are common, whether this harms performance, and whether they can generally be prevented. In this paper, we answer all these questions affirmatively. In particular, we first show that the recent DQN algorithm, which combines Q-le...
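
The overestimation described above can be reproduced without any environment. The sketch below is minimal and rests on two assumptions made only for illustration: every true action value is 0, and the estimates carry independent Gaussian noise. Under those assumptions, the maximum of one set of noisy estimates is biased upward. Choosing the action with one estimate and scoring it with an independent second estimate (the double-estimator idea that Double Q-learning builds on) is unbiased in this setting. The function `max_estimator_bias` and its parameters are hypothetical and are not code from the paper.

```python
import random
import statistics


def max_estimator_bias(n_actions=10, noise_std=1.0, trials=20000, seed=0):
    """Compare single vs. double estimators of max_a Q(a) when every true Q(a) is 0.

    Returns (mean single-estimator value, mean double-estimator value).
    The true maximum is 0, so any systematic nonzero mean is bias.
    """
    rng = random.Random(seed)
    single, double = [], []
    for _ in range(trials):
        # Two independent noisy estimates of the same all-zero action values.
        q_a = [rng.gauss(0.0, noise_std) for _ in range(n_actions)]
        q_b = [rng.gauss(0.0, noise_std) for _ in range(n_actions)]
        # Single estimator: select and evaluate with the same noisy estimates.
        single.append(max(q_a))
        # Double estimator: select with q_a, evaluate with the independent q_b.
        best = max(range(n_actions), key=q_a.__getitem__)
        double.append(q_b[best])
    return statistics.fmean(single), statistics.fmean(double)


if __name__ == "__main__":
    s, d = max_estimator_bias()
    print(f"single estimator: {s:.3f}, double estimator: {d:.3f}")
```

With 10 actions and unit noise, the single-estimator mean lands well above the true maximum of 0. The double-estimator mean stays close to 0.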

Double Q($\sigma$) and Q($\sigma, \lambda$): Unifying Reinforcement Learning Control Algorithms

2017

Markus Dumke

Temporal-difference (TD) learning is an important field in reinforcement learning. Sarsa and Q-Learning are among the most used TD algorithms. The Q(σ) algorithm (Sutton and Barto (2017)) unifies both. This paper extends the Q(σ) algorithm to an online multi-step algorithm Q(σ, λ) using eligibility traces and introduces Double Q(σ) as the extension of Q(σ) to double learning. Experiments sugges...
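
The one-step form of Q(σ) in Sutton and Barto mixes a sampled (Sarsa) backup with an expected backup, weighted by σ:

target = r + γ [σ Q(s', a') + (1 − σ) Σ_a π(a|s') Q(s', a)]

The sketch below assumes a tabular setting, with the target-policy probabilities for the next state passed in as a list. The function name and its arguments are hypothetical. It does not cover the paper's eligibility-trace extension Q(σ, λ) or Double Q(σ).

```python
from typing import Sequence


def q_sigma_target(reward: float, gamma: float, sigma: float,
                   q_next: Sequence[float], next_action: int,
                   policy_next: Sequence[float]) -> float:
    """One-step Q(sigma) target (a sketch, not the paper's implementation).

    q_next:      action values Q(s', .) for the next state
    next_action: the action a' actually sampled in s'
    policy_next: target-policy probabilities pi(. | s')
    """
    if not 0.0 <= sigma <= 1.0:
        raise ValueError("sigma must lie in [0, 1]")
    # Sarsa-style component: value of the sampled next action.
    sampled = q_next[next_action]
    # Expected-Sarsa-style component: policy-weighted average over next actions.
    expected = sum(p * q for p, q in zip(policy_next, q_next))
    return reward + gamma * (sigma * sampled + (1.0 - sigma) * expected)
```

With σ = 1, the target uses only the sampled next action, as in Sarsa. With σ = 0 and a greedy target policy, the expectation picks out max_a Q(s', a), which is the Q-learning target.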

Vision-based Robotic Arm Imitation by Human Gesture

Journal: CoRR 2017

Zhiqiang Tang Jinxin Xu

One of the most efficient ways for a learning-based robotic arm to learn complex tasks as humans do is to learn directly by observing how humans complete those tasks and then imitating them. Our idea builds on the success of the Deep Q-Learning (DQN) algorithm in reinforcement learning, which we then extend to the Deep Deterministic Policy Gradient (DDPG) algorithm. We developed a learning-based ...

Learning Optimal Dialogue Strategies: A Case Study of a Spoken Dialogue Agent for Email

1998

Marilyn A. Walker Jeanne Frommer Shrikanth S. Narayanan

This paper describes a novel method by which a dialogue agent can learn to choose an optimal dialogue strategy. While it is widely agreed that dialogue strategies should be formulated in terms of communicative intentions, there has been little work on automatically optimizing an agent's choices when there are multiple ways to realize a communicative intention. Our method is based on a combinati...

Unsupervised training of adaptation rate using q-learning in large vocabulary continuous speech recognition

2007

Masafumi Nishida Yasuo Horiuchi Akira Ichikawa

This paper describes a novel approach based on unsupervised training of the MAP adaptation rate using Q-learning. Q-learning is a reinforcement learning technique and a computational approach to learning whereby an agent tries to maximize the total amount of reward it receives when interacting with a complex, uncertain environment. The proposed method defines the likelihood of the adapted mod...

Learning to Explore with Meta-Policy Gradient

2018

Tianbing Xu Qiang Liu Liang Zhao Jian Peng

The performance of off-policy learning, including deep Q-learning and deep deterministic policy gradient (DDPG), critically depends on the choice of the exploration policy. Existing exploration methods are mostly based on adding noise to the ongoing actor policy and can only explore local regions close to what the actor policy dictates. In this work, we develop a simple meta-policy gradient al...
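
The baseline the abstract contrasts with, perturbing the current actor's action with noise, can be sketched as follows. This minimal sketch assumes uncorrelated Gaussian noise and box-shaped action bounds. The original DDPG work used temporally correlated Ornstein-Uhlenbeck noise, so the Gaussian choice here is a swapped-in simplification. `noisy_action` and its parameters are hypothetical. The sketch is not the paper's meta-policy gradient method.

```python
import random
from typing import Callable, List


def noisy_action(actor: Callable[[List[float]], List[float]],
                 state: List[float], noise_std: float,
                 low: float, high: float,
                 rng: random.Random) -> List[float]:
    """Exploration by action noise: actor output plus Gaussian noise, clipped to [low, high]."""
    action = actor(state)  # deterministic action proposed by the actor
    # Perturb each action dimension independently, then clip back into bounds.
    return [min(high, max(low, a + rng.gauss(0.0, noise_std))) for a in action]
```

Because the noise is centered on the actor's own output, sampled actions stay near what the actor already prefers. This is the locality limitation the abstract points out.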

Convergence of Optimistic and Incremental Q-Learning

2001

Eyal Even-Dar Yishay Mansour

We show the convergence of two deterministic variants of Q-learning. The first is the widely used optimistic Q-learning, which initializes the Q-values to large values and then follows a greedy policy with respect to the Q-values. We show that setting the initial value sufficiently large guarantees convergence to an ε-optimal policy. The second is a novel algo...
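
The first variant, greedy Q-learning with optimistic initial values, is sketched below on a hypothetical five-state chain. The environment, `q_init`, `alpha`, `gamma`, and both function names are assumptions made for illustration. This is not the paper's second algorithm and not its analysis setup. There is no random exploration: the agent always acts greedily, and the large initial values alone push it toward untried actions.

```python
from typing import List


def optimistic_q_learning(n_states: int = 5, q_init: float = 10.0,
                          alpha: float = 0.5, gamma: float = 0.9,
                          episodes: int = 300, max_steps: int = 1000) -> List[List[float]]:
    """Greedy tabular Q-learning with optimistic initial values on a toy chain.

    Hypothetical environment: states 0..n_states-1, each episode starts at 0.
    Action 0 moves left (staying put at state 0); action 1 moves right.
    Entering the last state gives reward 1 and ends the episode; all other
    steps give reward 0.
    """
    goal = n_states - 1
    q = [[q_init, q_init] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            a = 0 if q[s][0] >= q[s][1] else 1  # purely greedy; ties go left
            s_next = max(s - 1, 0) if a == 0 else s + 1
            done = s_next == goal
            reward = 1.0 if done else 0.0
            bootstrap = 0.0 if done else max(q[s_next])  # no bootstrap from terminal
            q[s][a] += alpha * (reward + gamma * bootstrap - q[s][a])
            s = s_next
            if done:
                break
    return q


def greedy_policy(q: List[List[float]]) -> List[int]:
    """Greedy action for each non-terminal state (0 = left, 1 = right)."""
    return [0 if row[0] >= row[1] else 1 for row in q[:-1]]
```

In this deterministic chain, q_init = 10 is above every true action value. Each update then keeps the estimates at or above those values. An action the agent keeps repeating without reward therefore loses value until a less-tried, still-optimistic action overtakes it. Once the goal has been reached often enough, the greedy policy moves right everywhere.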

A Q-learning Based Continuous Tuning of Fuzzy Wall Tracking

Journal: International Journal of Engineering 2012

A. Ebrahimzadeh, Sepideh Valiollahi

A simple, easy-to-implement algorithm is proposed to address the wall-tracking task of an autonomous robot. The robot should navigate in unknown environments, find the nearest wall, and track it based solely on locally sensed data. The proposed method couples fuzzy logic with Q-learning to meet the requirements of autonomous navigation. Fuzzy if-then rules provide a reliable decision maki...

An Analysis of the Pheromone Q-Learning Algorithm

2002

Dorothy Ndedi Monekosso Paolo Remagnino

The Phe-Q machine learning technique, a modified Q-learning technique, was developed to enable co-operating agents to communicate in learning to solve a problem. The Phe-Q learning technique combines Q-learning with synthetic pheromone to improve on the speed of convergence. The Phe-Q update equation includes a belief factor that reflects the confidence the agent has in the pheromone (the commu...

Hybrid Q-Learning Applied to Ubiquitous recommender system

Journal: CoRR 2013

Djallel Bouneffouf

Ubiquitous information access is becoming increasingly important, and research aims to adapt it to users. Our work consists of applying machine learning techniques to address some of the problems concerning users' acceptance of the system. To achieve this, we propose a fundamental shift in how we model the learning of recommender systems: ins...
