Search results for: geo grid reinforcement

Number of results: 139042

Journal: Neural Networks: The Official Journal of the International Neural Network Society, 2010
Marek Grzes Daniel Kudenko

Potential-based reward shaping has been shown to be a powerful method to improve the convergence rate of reinforcement learning agents. It is a flexible technique to incorporate background knowledge into temporal-difference learning in a principled way. However, the question remains of how to compute the potential function which is used to shape the reward that is given to the learning agent. I...

2003
Paul A. Crook Gillian Hayes

Due to the unavoidable fact that a robot’s sensors will be limited in some manner, it is entirely possible that it can find itself unable to distinguish between differing states of the world (the world is in effect partially observable). If reinforcement learning is used to train the robot, then this confounding of states can have a serious effect on its ability to learn optimal and stable poli...

Journal: JIP, 2014
Ming Xiang Quan Bai William Liu

Smart Grid is the trend in next-generation electrical power systems, making the power grid intelligent and energy efficient. It requires a high level of network reliability to support two-way communication among electrical services, electrical units such as smart meters, and applications. The wireless mesh network infrastructure can provide redundant routes for the Smart Grid communication...

2004
Myriam Abramson

An explicit exploration strategy is necessary in reinforcement learning (RL) to balance the need to reduce the uncertainty associated with the expected outcome of an action and the need to converge to a solution. This dependency is more acute in on-policy reinforcement learning where the exploration guides the search for an optimal solution. The need for a self-regulating exploration is manifes...

2016
Rahul Desai B P Patil

This paper describes and evaluates the performance of various reinforcement learning algorithms alongside the shortest path algorithms that are widely used for routing packets through a network. Shortest path routing is the simplest policy for routing packets along the path with the minimum number of hops. In high traffic or high mobility conditions, the shortest path gets flooded with huge numb...

2002
Ralf Schoknecht Artur Merke

Convergence for iterative reinforcement learning algorithms like TD(0) depends on the sampling strategy for the transitions. However, in practical applications it is convenient to take transition data from arbitrary sources without losing convergence. In this paper we investigate the problem of repeated synchronous updates based on a fixed set of transitions. Our main theorem yields sufficient ...

1997
Stephan Pareigis

We propose local error estimates together with algorithms for adaptive a-posteriori grid and time refinement in reinforcement learning. We consider a deterministic system with continuous state and time with infinite horizon discounted cost functional. For grid refinement we follow the procedure of numerical methods for the Bellman-equation. For time refinement we propose a new criterion, based on ...
