Search results for: Temporal Difference Learning

Number of results: 1,222,164

2006
David Stracuzzi, Nima Asgharbeygi

The ability to transfer knowledge from one domain to another is an important aspect of learning. Knowledge transfer increases learning efficiency by freeing the learner from duplicating past efforts. In this paper, we demonstrate how reinforcement learning agents can use relational representations to transfer knowledge across related domains.

1997
David J. Foster, Richard G. M. Morris, Peter Dayan

We provide a model of the standard watermaze task, and of a more challenging task involving novel platform locations, in which rats exhibit one-trial learning after a few days of training. The model uses hippocampal place cells to support reinforcement learning, and also, in an integrated manner, to build and use allocentric coordinates.
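
A toy sketch of the feature layer such a model implies: Gaussian place-cell tuning curves whose activations could feed a TD-trained critic. The cell count, field centers, and width below are illustrative choices, not the paper's parameters.

    import numpy as np

    # Toy sketch: Gaussian place-cell activations as features for a TD critic.
    # Centers and width are illustrative, not the paper's values.
    rng = np.random.default_rng(0)
    centers = rng.uniform(0.0, 1.0, size=(50, 2))  # place-field centers in a unit arena
    sigma = 0.1                                    # place-field width

    def place_cells(pos):
        """Firing rates of all place cells at 2-D position pos."""
        d2 = np.sum((centers - np.asarray(pos)) ** 2, axis=1)
        return np.exp(-d2 / (2.0 * sigma ** 2))

    # A critic would then estimate V(pos) = w @ place_cells(pos),
    # with w updated by the usual TD rule.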

2015
Mayank Daswani, Jan Leike

What is happiness for reinforcement learning agents? We seek a formal definition satisfying a list of desiderata. Our proposed definition of happiness is the temporal difference error, i.e. the difference between the value of the obtained reward and observation and the agent’s expectation of this value. This definition satisfies most of our desiderata and is compatible with empirical research o...
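
In code, the proposed definition is just the standard temporal difference error. A minimal sketch, assuming a scalar reward, a discount factor, and a state-value estimate (the names and numbers are illustrative, not from the paper):

    # Minimal sketch: "happiness" as the temporal difference error,
    # i.e. what the agent obtained minus what it expected.
    def happiness(reward, value_next, value_current, gamma=0.99):
        return reward + gamma * value_next - value_current

    # An agent that expected 1.0 but obtained 0.5 plus a next-state value of 0.6:
    print(happiness(0.5, 0.6, 1.0))  # 0.094 -> slightly happier than expected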

2013
Clement Gehring, Doina Precup

Exploration is still one of the crucial problems in reinforcement learning, especially for agents acting in safety-critical situations. We propose a new directed exploration method, based on a notion of state controllability. Intuitively, if an agent wants to stay safe, it should seek out states where the effects of its actions are easier to predict; we call such states more controllable. Our ma...
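
The abstract is cut off, but the idea admits a small sketch. Assuming controllability is tracked as a running estimate of absolute prediction error per state (this scoring rule is our assumption for illustration, not necessarily the authors' measure; lower error means more controllable), an exploration bonus could look like:

    import numpy as np

    # Hedged sketch: track how predictable each state's transitions are and
    # favor states with low running prediction error ("more controllable").
    class ControllabilityBonus:
        def __init__(self, n_states, alpha=0.1):
            self.err = np.zeros(n_states)  # running |prediction error| per state
            self.alpha = alpha

        def update(self, state, prediction_error):
            self.err[state] += self.alpha * (abs(prediction_error) - self.err[state])

        def bonus(self, state):
            return -self.err[state]  # add to value estimates when choosing actions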

Journal: IEEE Transactions on Automatic Control, 2021

Value functions derived from Markov decision processes arise as a central component of algorithms, as well as performance metrics, in many statistics and engineering applications of machine learning. Computation of the solution to the associated Bellman equations is challenging in most practical cases of interest. A popular class of approximation techniques, known as temporal difference (TD) learning algorithms, are an impo...
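
For context, the textbook member of this class is linear TD(0), which nudges a weight vector along the TD error of the Bellman equation. This is the standard formulation, not this article's specific algorithm:

    import numpy as np

    # Standard linear TD(0): approximate V(s) = w @ phi(s) and move w
    # along the TD error. phi_s / phi_next are feature vectors for the
    # current and next state.
    def td0_step(w, phi_s, phi_next, reward, gamma=0.99, alpha=0.01):
        td_error = reward + gamma * w @ phi_next - w @ phi_s
        return w + alpha * td_error * phi_s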

2002
Ari Shapiro, Gil Fuchs, Robert Levinson

This paper demonstrates the use of pattern-weights in order to develop a strategy for an automated player of a non-cooperative version of the game of Diplomacy. Diplomacy is a multi-player, zerosum and simultaneous move game with imperfect information. Patternweights represent stored knowledge of various aspects of a game that are learned through experience. An automated computer player is deve...
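
The truncated abstract suggests a weighted pattern evaluation trained from experience. A hedged sketch of that idea, where a position's score is the sum of weights of its matched patterns and the weights are adjusted by a TD-style update (the representation and update rule here are our assumptions, not the paper's implementation):

    # Hedged sketch: evaluate a position as the sum of weights of matched
    # patterns, and adjust those weights from experienced outcomes.
    def evaluate(position_patterns, weights):
        return sum(weights.get(p, 0.0) for p in position_patterns)

    def td_update(weights, patterns_now, patterns_next, reward,
                  gamma=0.95, alpha=0.05):
        error = (reward + gamma * evaluate(patterns_next, weights)
                 - evaluate(patterns_now, weights))
        for p in patterns_now:
            weights[p] = weights.get(p, 0.0) + alpha * error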

2017
Christopher Lockhart

Application of Temporal Difference Learning to the Game of Snake

2002
Eswar Sivaraman, Martin T. Hagan

Submitted in partial fulfillment of the course requirements for "Neural Networks" (ECEN 5733), May 2000.

Chart: number of search results per year
