Search results for: q policy
Number of results: 381585
$y(I-Q)^{-1}$, where the $j$th entry of the row vector $y$ is the probability that the system state seen by the first arrival during a busy period is $j$ and $(I-Q)^{-1}$ is the fundamental matrix associated with the standard GI/M/1 queue. In this paper, we present the entries of $(I-Q)^{-1}$ explicitly. Also, we illustrate how to find $y$ by examples such as the $N$-policy GI/M/1 queue with or wi...
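As a rough illustration of the quantity in the snippet above, the following sketch computes a fundamental matrix $(I-Q)^{-1}$ and the product of a row vector $y$ with it, using exact rational arithmetic. The 2x2 matrix `Q`, the vector `y`, and the helper names `fundamental_matrix` and `row_times_matrix` are hypothetical and chosen only for illustration; they are not taken from the paper, which derives the entries explicitly for the GI/M/1 queue.

```python
from fractions import Fraction


def fundamental_matrix(Q):
    """Return (I - Q)^{-1} by Gauss-Jordan elimination on [I - Q | I]."""
    n = len(Q)
    # Build the augmented matrix [I - Q | I] with exact fractions.
    A = [
        [Fraction(int(i == j)) - Fraction(Q[i][j]) for j in range(n)]
        + [Fraction(int(i == k)) for k in range(n)]
        for i in range(n)
    ]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero pivot.
        pivot = next((r for r in range(col, n) if A[r][col] != 0), None)
        if pivot is None:
            raise ValueError("I - Q is singular")
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    # The right half of the augmented matrix is now the inverse.
    return [row[n:] for row in A]


def row_times_matrix(y, M):
    """Multiply the row vector y by the matrix M."""
    return [sum(y[i] * M[i][j] for i in range(len(y))) for j in range(len(M[0]))]


# Hypothetical substochastic matrix and initial distribution (illustration only).
Q = [[Fraction(1, 2), Fraction(1, 4)],
     [Fraction(1, 4), Fraction(1, 2)]]
y = [Fraction(1, 2), Fraction(1, 2)]

N = fundamental_matrix(Q)
result = row_times_matrix(y, N)
```

For this toy input, `N` works out to [[8/3, 4/3], [4/3, 8/3]] and `result` to [2, 2]. Exact fractions avoid rounding noise on small examples; for larger matrices a numerical linear solver would be the more practical choice.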
This paper explores the performance of fitted neural Q iteration for reinforcement learning in several partially observable environments, using three recurrent neural network architectures: Long Short-Term Memory [7], Gated Recurrent Unit [3] and MUT1, a recurrent neural architecture evolved from a pool of several thousand candidate architectures [8]. A variant of fitted Q iteration, based on A...
We review the deep reinforcement learning setting, in which an agent receiving high-dimensional input from an environment learns a control policy without supervision using multilayer neural networks. We then extend the Neural Fitted Q Iteration value-based reinforcement learning algorithm (Riedmiller et al.) by introducing a novel variation which we call Regularized Convolutional Neural Fitted Q...
This paper presents the first actor-critic algorithm for off-policy reinforcement learning. Our algorithm is online and incremental, and its per-time-step complexity scales linearly with the number of learned weights. Previous work on actor-critic algorithms is limited to the on-policy setting and does not take advantage of the recent advances in off-policy gradient temporal-difference learning....