Search results for: geo grid reinforcement
Number of results: 139042
Potential-based reward shaping has been shown to be a powerful method to improve the convergence rate of reinforcement learning agents. It is a flexible technique to incorporate background knowledge into temporal-difference learning in a principled way. However, the question remains of how to compute the potential function which is used to shape the reward that is given to the learning agent. I...
Due to the unavoidable fact that a robot’s sensors will be limited in some manner, it is entirely possible that it can find itself unable to distinguish between differing states of the world (the world is in effect partially observable). If reinforcement learning is used to train the robot, then this confounding of states can have a serious effect on its ability to learn optimal and stable poli...
The Smart Grid is the trend for the next-generation electrical power system, making the power grid intelligent and energy efficient. It requires a high level of network reliability to support two-way communication among electrical services, electrical units such as smart meters, and applications. The wireless mesh network infrastructure can provide redundant routes for the Smart Grid communication...
An explicit exploration strategy is necessary in reinforcement learning (RL) to balance the need to reduce the uncertainty associated with the expected outcome of an action and the need to converge to a solution. This dependency is more acute in on-policy reinforcement learning where the exploration guides the search for an optimal solution. The need for a self-regulating exploration is manifes...
This paper describes and evaluates the performance of various reinforcement learning algorithms alongside the shortest path algorithms that are widely used for routing packets through the network. Shortest path routing is the simplest policy for routing packets along the path with the minimum number of hops. In high traffic or high mobility conditions, the shortest path gets flooded with huge numb...
Convergence for iterative reinforcement learning algorithms like TD(0) depends on the sampling strategy for the transitions. However, in practical applications it is convenient to take transition data from arbitrary sources without losing convergence. In this paper we investigate the problem of repeated synchronous updates based on a fixed set of transitions. Our main theorem yields sufficient ...
We propose local error estimates together with algorithms for adaptive a posteriori grid and time refinement in reinforcement learning. We consider a deterministic system with continuous state and time with infinite horizon discounted cost functional. For grid refinement we follow the procedure of numerical methods for the Bellman-equation. For time refinement we propose a new criterion, based on ...