Search results for: q learning

Number of results: 717428

2016
Hado van Hasselt, Arthur Guez, David Silver

The popular Q-learning algorithm is known to overestimate action values under certain conditions. It was not previously known whether, in practice, such overestimations are common, whether this harms performance, and whether they can generally be prevented. In this paper, we answer all these questions affirmatively. In particular, we first show that the recent DQN algorithm, which combines Q-le...
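The overestimation this abstract describes can be illustrated with a toy experiment. This is an illustrative sketch, not the paper's setup. Suppose every true action value is 0 but each estimate carries zero-mean noise. Taking the max over those same noisy estimates, as the Q-learning target does, is then biased upward. Selecting the action with one estimate and evaluating it with an independent second estimate (the double-estimator idea) removes that bias. The function name, the Gaussian noise model, and all parameters are assumptions chosen for illustration.

```python
import random


def single_vs_double_estimate(n_actions=10, trials=10000, seed=0):
    """Toy bias check: all true action values are 0, estimates are noisy.

    Returns (mean of max over one set of estimates,
             mean of double estimate: select with set A, evaluate with set B).
    """
    rng = random.Random(seed)
    single_total = 0.0
    double_total = 0.0
    for _ in range(trials):
        # Two independent noisy estimates of each action value (true value 0).
        qa = [rng.gauss(0.0, 1.0) for _ in range(n_actions)]
        qb = [rng.gauss(0.0, 1.0) for _ in range(n_actions)]
        # Single estimator: the same numbers both pick and score the action.
        single_total += max(qa)
        # Double estimator: pick the argmax under qa, score it with qb.
        best = max(range(n_actions), key=lambda a: qa[a])
        double_total += qb[best]
    return single_total / trials, double_total / trials


single, double = single_vs_double_estimate()
print(f"single estimator: {single:.3f}, double estimator: {double:.3f}")
```

With 10 actions, the single-estimator average comes out well above the true maximum of 0. The expected maximum of 10 standard normal draws is roughly 1.54. The double-estimator average stays close to 0.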

2017
Markus Dumke

Temporal-difference (TD) learning is an important field in reinforcement learning. Sarsa and Q-Learning are among the most widely used TD algorithms. The Q(σ) algorithm (Sutton and Barto (2017)) unifies both. This paper extends the Q(σ) algorithm to an online multi-step algorithm Q(σ, λ) using eligibility traces and introduces Double Q(σ) as the extension of Q(σ) to double learning. Experiments sugges...
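The following is a rough sketch of how Q(σ) unifies the two algorithms. It uses the one-step form from Sutton and Barto, not this paper's multi-step Q(σ, λ) with eligibility traces. The target blends the sampled Sarsa term with the Expected Sarsa term, weighted by σ:

- σ = 1 recovers Sarsa.
- σ = 0 recovers Expected Sarsa.
- With a greedy target policy, σ = 0 gives the Q-learning target.

The function name and argument layout are hypothetical.

```python
def q_sigma_target(reward, gamma, q_next, next_action, target_probs, sigma,
                   terminal=False):
    """One-step Q(sigma) target (hypothetical helper).

    q_next:       action values Q(s', .) at the next state
    next_action:  the action a' actually sampled at s' (the Sarsa term)
    target_probs: target-policy probabilities pi(. | s') (the expected term)
    sigma:        1.0 -> Sarsa, 0.0 -> Expected Sarsa
    """
    if terminal:
        return reward
    sarsa_part = q_next[next_action]
    expected_part = sum(p * q for p, q in zip(target_probs, q_next))
    return reward + gamma * (sigma * sarsa_part + (1.0 - sigma) * expected_part)


# Greedy target policy over Q(s', .) = [1.0, 3.0]; the sampled action was 0.
q_next, greedy = [1.0, 3.0], [0.0, 1.0]
print(q_sigma_target(1.0, 0.5, q_next, 0, greedy, sigma=1.0))  # Sarsa-style
print(q_sigma_target(1.0, 0.5, q_next, 0, greedy, sigma=0.0))  # Q-learning-style
```

Intermediate values of σ interpolate linearly between the two targets.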

Journal: CoRR 2017
Zhiqiang Tang, Jinxin Xu

One of the most efficient ways for a learning-based robotic arm to learn complex tasks as humans do is to learn directly by observing how humans complete those tasks and then imitating them. Our idea builds on the success of the Deep Q-Network (DQN) algorithm in reinforcement learning and then extends it to the Deep Deterministic Policy Gradient (DDPG) algorithm. We developed a learning-based ...

1998
Marilyn A. Walker, Jeanne Frommer, Shrikanth S. Narayanan

This paper describes a novel method by which a dialogue agent can learn to choose an optimal dialogue strategy. While it is widely agreed that dialogue strategies should be formulated in terms of communicative intentions, there has been little work on automatically optimizing an agent's choices when there are multiple ways to realize a communicative intention. Our method is based on a combinati...

2007
Masafumi Nishida, Yasuo Horiuchi, Akira Ichikawa

This paper describes a novel approach based on unsupervised training of the MAP adaptation rate using Q-learning. Q-learning is a reinforcement learning technique and a computational approach to learning whereby an agent tries to maximize the total amount of reward it receives when interacting with a complex, uncertain environment. The proposed method defines the likelihood of the adapted mod...

2018
Tianbing Xu, Qiang Liu, Liang Zhao, Jian Peng

The performance of off-policy learning, including deep Q-learning and deep deterministic policy gradient (DDPG), critically depends on the choice of the exploration policy. Existing exploration methods are mostly based on adding noise to the on-going actor policy and can only explore local regions close to what the actor policy dictates. In this work, we develop a simple meta-policy gradient al...

2001
Eyal Even-Dar, Yishay Mansour

We show the convergence of two deterministic variants of Q-learning. The first is the widely used optimistic Q-learning, which initializes the Q-values to large initial values and then follows a greedy policy with respect to the Q-values. We show that setting the initial value sufficiently large guarantees convergence to an ε-optimal policy. The second is a novel algo...
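The following is a minimal sketch of the optimistic variant described above, run on a hypothetical four-state deterministic chain. The environment, rewards, and parameters are assumptions. The sketch also omits the paper's learning-rate conditions and ε-optimality analysis. All action values start at an optimistic value, and the agent always acts greedily. Untried actions therefore look attractive until their estimates are driven down. Because the chain is deterministic, a step size of 1 is used by default.

```python
def optimistic_q_learning(n_states=4, gamma=0.9, q_init=1.0, alpha=1.0,
                          episodes=50, max_steps=100):
    """Greedy tabular Q-learning with optimistic initial values.

    Hypothetical deterministic chain: states 0..n_states-1, the last one
    terminal. Action 0 moves left (clamped at 0), action 1 moves right.
    Entering the terminal state gives reward 1; every other step gives 0.
    """
    goal = n_states - 1
    # Optimistic start: every non-terminal action value begins at q_init.
    q = [[q_init, q_init] for _ in range(goal)]

    def greedy(s):
        # Pure greedy choice with no random exploration; ties go to action 0.
        return max(range(2), key=lambda a: q[s][a])

    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            a = greedy(s)
            s_next = max(s - 1, 0) if a == 0 else s + 1
            if s_next == goal:
                reward, bootstrap = 1.0, 0.0  # terminal: no bootstrap
            else:
                reward, bootstrap = 0.0, max(q[s_next])
            # Q-learning update toward r + gamma * max_a' Q(s', a').
            q[s][a] += alpha * (reward + gamma * bootstrap - q[s][a])
            if s_next == goal:
                break
            s = s_next
    policy = [greedy(s) for s in range(goal)]
    return q, policy


q, policy = optimistic_q_learning()
print(policy)
```

Any `q_init` at or above the largest achievable discounted return (1 here) keeps the estimates optimistic. With the defaults above, the greedy policy ends up moving right (action 1) in every non-terminal state.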

A simple, easy-to-implement algorithm is proposed to address the wall-tracking task of an autonomous robot. The robot should navigate in unknown environments, find the nearest wall, and track it solely based on locally sensed data. The proposed method benefits from coupling fuzzy logic and Q-learning to meet the requirements of autonomous navigation. Fuzzy if-then rules provide a reliable decision maki...

2002
Dorothy Ndedi Monekosso, Paolo Remagnino

The Phe-Q machine learning technique, a modified Q-learning technique, was developed to enable co-operating agents to communicate in learning to solve a problem. The Phe-Q learning technique combines Q-learning with synthetic pheromone to improve on the speed of convergence. The Phe-Q update equation includes a belief factor that reflects the confidence the agent has in the pheromone (the commu...

Journal: CoRR 2013
Djallel Bouneffouf

Ubiquitous information access is becoming increasingly important, and research aims to adapt it to users. Our work consists in applying machine learning techniques to solve some of the problems concerning the acceptance of the system by users. To achieve this, we propose a fundamental shift in how we model the learning of recommender systems: ins...
