Search results for: policy iterations
Number of results: 276392
In this paper, we report a performance bound for the widely used least-squares policy iteration (LSPI) algorithm. We first consider the problem of policy evaluation in reinforcement learning, that is, learning the value function of a fixed policy, using the least-squares temporal-difference (LSTD) learning method, and report finite-sample analysis for this algorithm. To do so, we first derive a...
The backpressure routing policy is known to be a throughput optimal policy that supports any feasible traffic demand in data networks, but may have poor delay performance when packets traverse loops in the network. In this paper, we study loop-free backpressure routing policies that forward packets along directed acyclic graphs (DAGs) to avoid the looping problem. These policies use link revers...
Most locomotion control strategies are developed for flat terrain. We explore the use of reinforcement learning to develop motor skills for the highly dynamic traversal of terrains having sequences of gaps, walls, and steps. Results are demonstrated using simulations of a 21-link planar dog and a 7-link planar biped. Our approach is characterized by: non-parametric representation of the value f...
In a previous paper, we introduced a coordinate-splitting (CS) form of the equations of motion for multibody systems which together with a modified nonlinear iteration (CM), is particularly effective in the solution of certain nonlinear highly oscillatory systems. In this paper, we examine the convergence of the CS and CM iterations and explain the improved convergence of the CM iteration. An exa...