Search results for: bellman
Number of results: 4956
In this paper, we obtain some new Gronwall-Bellman type integral inequalities and give some consequences of our results.
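For orientation, the classical Gronwall-Bellman inequality (the textbook integral form from which such "Gronwall-Bellman type" results take their name, not one of the new inequalities obtained in the paper) reads:

```latex
\text{Let } u, b \text{ be continuous on } [0, T],\ b \ge 0,\ a \in \mathbb{R}.
\text{If } u(t) \le a + \int_0^t b(s)\,u(s)\,ds \quad \text{for all } t \in [0, T],
\text{then } u(t) \le a \exp\!\Big(\int_0^t b(s)\,ds\Big) \quad \text{for all } t \in [0, T].
```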
Reinforcement learning algorithms often work by finding functions that satisfy the Bellman equation. For prediction with Markov chains and for control of a Markov decision process (MDP) with finitely many states and actions, this yields an optimal solution. The approach is also frequently applied to Markov chains and MDPs with infinitely many states. We show that, in this case, the Bellman equat...
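To make "finding functions that satisfy the Bellman equation" concrete in the finite case, here is a minimal value-iteration sketch. The 2-state, 2-action MDP, its rewards, and the discount factor are hypothetical numbers chosen for illustration; nothing here comes from the paper itself.

```python
"""Value iteration for a small finite MDP (illustrative sketch).

Assumptions (not from the abstract): a hypothetical 2-state, 2-action MDP
with made-up transition probabilities and rewards, discount factor 0.9.
"""

# P[s][a] is a list of (next_state, probability) pairs.
P = {
    0: {0: [(0, 0.9), (1, 0.1)], 1: [(1, 1.0)]},
    1: {0: [(1, 1.0)], 1: [(0, 0.5), (1, 0.5)]},
}
# R[s][a] is the expected immediate reward for taking action a in state s.
R = {
    0: {0: 0.0, 1: 1.0},
    1: {0: 2.0, 1: 0.0},
}
GAMMA = 0.9


def bellman_backup(V, s):
    """Apply the Bellman optimality operator at state s (max over actions)."""
    return max(
        R[s][a] + GAMMA * sum(p * V[s2] for s2, p in P[s][a])
        for a in P[s]
    )


def value_iteration(tol=1e-10, max_iter=10_000):
    """Iterate V <- T V until successive iterates differ by less than tol."""
    V = {s: 0.0 for s in P}
    for _ in range(max_iter):
        V_new = {s: bellman_backup(V, s) for s in P}
        if max(abs(V_new[s] - V[s]) for s in P) < tol:
            return V_new
        V = V_new
    return V


V_star = value_iteration()
# V_star approximately satisfies V(s) == bellman_backup(V, s) for every s.
```

Because the Bellman operator is a contraction for a discount factor below 1, the iteration converges to the unique fixed point when the state space is finite; the abstract's point is that this guarantee need not carry over to infinitely many states.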
The paper is concerned with fully nonlinear second-order Hamilton-Jacobi-Bellman-Isaacs equations of elliptic type in separable Hilbert spaces that have unbounded first- and second-order terms. The viscosity solution approach is adapted to the equations under consideration, and the existence and uniqueness of viscosity solutions are proved. A stochastic optimal control problem driven by a paraboli...
In a previous paper, we showed how classical ideas for dynamic programming in discrete networks can be adapted to hybrid systems. The approach is based on discretizing the continuous Bellman inequality, which gives a lower bound on the optimal cost. This lower bound is maximized by linear programming to obtain an approximation of the optimal solution. In this paper, we apply ideas from infinite...
In this paper we give explicit error bounds for approximations of the optimal policy function in the stochastic dynamic programming problem. The approximated policy function is obtained by using the Bellman equation with an approximated value function and the error bounds depend on the primitive data of the problem. Neither differentiability of the return function nor interiority of solutions i...
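The step "the approximated policy function is obtained by using the Bellman equation with an approximated value function" amounts to acting greedily with respect to a one-step Bellman look-ahead. A minimal sketch, assuming a hypothetical finite MDP and a made-up approximation `V_approx`; the paper's error bounds are not reproduced here.

```python
"""Greedy policy extraction from an approximate value function (sketch).

Hypothetical 2-state, 2-action MDP; V_approx is a made-up approximation.
"""

GAMMA = 0.9
# P[s][a]: list of (next_state, probability); R[s][a]: expected reward.
P = {0: {0: [(0, 0.9), (1, 0.1)], 1: [(1, 1.0)]},
     1: {0: [(1, 1.0)], 1: [(0, 0.5), (1, 0.5)]}}
R = {0: {0: 0.0, 1: 1.0}, 1: {0: 2.0, 1: 0.0}}


def q_value(V, s, a):
    """One-step Bellman look-ahead: r(s, a) + gamma * E[V(s')]."""
    return R[s][a] + GAMMA * sum(p * V[s2] for s2, p in P[s][a])


def greedy_policy(V):
    """In each state, pick an action maximising the look-ahead under V."""
    return {s: max(P[s], key=lambda a: q_value(V, s, a)) for s in P}


V_approx = {0: 18.0, 1: 21.0}  # hypothetical approximation of the true values
policy = greedy_policy(V_approx)
```

Here the approximation is off by 1 in each state, yet the greedy policy still picks action 1 in state 0 and action 0 in state 1; error bounds of the kind described in the abstract quantify how far such a policy can be from optimal in general.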
We consider a special case of partially observable Markov decision processes that arises when state information is perfect but arrives with a delay. We first formulate the decision process in its standard form and derive the Bellman equation that corresponds to it. We then introduce a second decision process that has a much simpler Bellman equation than the first, and is therefore, in general, much...
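A common way to handle perfect but delayed state information (assumed here, not taken from the abstract) is to use the last observed state together with the actions taken since as the information state; pushing it through the transition matrices gives a distribution over the current, not-yet-observed state. The transition numbers below are hypothetical.

```python
"""Belief over the current state when observations arrive with a delay d.

Sketch assuming a known finite MDP: the information state is the last
observed state plus the d actions taken since then. Numbers are hypothetical.
"""

# T[a][s][s2] = probability of moving from s to s2 under action a.
T = {
    0: [[0.8, 0.2], [0.1, 0.9]],
    1: [[0.5, 0.5], [0.0, 1.0]],
}


def delayed_belief(last_observed_state, actions_since):
    """Return P(s_t = s | s_{t-d}, a_{t-d}, ..., a_{t-1}) as a list indexed by s."""
    n = len(T[0])
    # Start from a point mass on the last state that was actually observed.
    belief = [1.0 if s == last_observed_state else 0.0 for s in range(n)]
    for a in actions_since:  # propagate the belief one step per action
        belief = [sum(belief[s] * T[a][s][s2] for s in range(n)) for s2 in range(n)]
    return belief


b = delayed_belief(0, [0, 1])  # state 0 observed, then actions 0 and 1 taken
```

Since this belief is fully determined by the observed state and the action history, no separate observation model is needed, which is one reason the delayed case is simpler than a general POMDP.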
In this paper we present two versions of AntNet, a novel approach to adaptive learning of routing tables in wide area best-effort datagram networks. AntNet is a distributed multi-agent system inspired by the stigmergy model of communication observed in ant colonies. We report simulation results for AntNet on realistically sized networks using as performance measures throughput, packet delays an...
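To illustrate what "adaptive learning of routing tables" can look like, the sketch below applies the reinforcement rule commonly described for AntNet to one routing-table row: the probability of the neighbour an ant used is raised by r(1 - p), and every other entry is scaled down by r p. This rule is stated here as an assumption; the abstract does not give it, and in AntNet the reinforcement r is derived from measured trip times rather than passed in as a constant.

```python
"""Reinforcement-style update of one AntNet-like routing-table row (sketch).

Assumed rule: p_f <- p_f + r(1 - p_f) for the chosen neighbour f and
p_n <- p_n - r p_n for every other neighbour n. Here r is given directly.
"""


def reinforce(probs, chosen, r):
    """Raise the probability of `chosen` and scale the others down.

    probs maps neighbour -> probability of forwarding toward one fixed
    destination; the update keeps the row summing to 1.
    """
    if not 0.0 < r <= 1.0:
        raise ValueError("r must lie in (0, 1]")
    return {
        n: p + r * (1.0 - p) if n == chosen else p - r * p
        for n, p in probs.items()
    }


row = {"A": 0.5, "B": 0.3, "C": 0.2}  # hypothetical neighbours and probabilities
row = reinforce(row, "B", 0.5)
```

The increase on the chosen entry, r(1 - p_f), exactly equals the total decrease r(1 - p_f) spread over the other entries, so the row remains a probability distribution.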
$v_1^{(0)}(t)$, we obtain a system of linear equations whose solution is clearly $x_i = x_i^{(0)}$. It follows that a set of forcing functions that minimize $G$ subject to the linear equations of (2.1), together with the original constraints, yields a value of $G$ that is at most $G(x_1^{(0)}(T), \dots, x_N^{(0)}(T))$. The general result follows inductively. This monotonicity is not surprising, since we are using the ...
We present two dynamic programming strategies for a general class of decision processes. Each of these algorithms includes, among others, the following graph-theoretic optimization algorithms as special cases: the Ford-Bellman strategy for optimal paths in acyclic digraphs and the greedy method for optimal forests and spanning trees in undirected graphs. In our general decision model, we define sever...
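The Ford-Bellman special case named above can be sketched on its own: in an acyclic digraph, evaluating the Bellman recursion d(v) = min over arcs (u, v) of d(u) + w(u, v) in topological order gives shortest-path distances in a single pass. The graph and weights are hypothetical, and the paper's general decision model is not reproduced here.

```python
"""Shortest paths in an acyclic digraph via the Bellman recursion (sketch).

d(v) = min over arcs (u, v) of d(u) + w(u, v), evaluated in topological
order so each d(u) is final before it is used. Graph data is hypothetical.
"""
import math
from graphlib import TopologicalSorter

# edges[(u, v)] = weight of arc u -> v
edges = {
    ("s", "a"): 2, ("s", "b"): 5,
    ("a", "b"): 1, ("a", "t"): 6,
    ("b", "t"): 2,
}


def dag_shortest_paths(edges, source):
    """Return the shortest distance from source to every node (inf if unreachable)."""
    preds = {}  # node -> list of (predecessor, arc weight)
    for (u, v), w in edges.items():
        preds.setdefault(v, []).append((u, w))
        preds.setdefault(u, [])
    # TopologicalSorter expects a mapping node -> its predecessors.
    order = TopologicalSorter({v: [u for u, _ in ps] for v, ps in preds.items()}).static_order()
    dist = {v: math.inf for v in preds}
    dist[source] = 0
    for v in order:
        for u, w in preds[v]:  # relax every incoming arc of v
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    return dist


dist = dag_shortest_paths(edges, "s")
```

Because the topological order guarantees every predecessor is finalized first, each arc is relaxed exactly once, and negative arc weights are also handled correctly on an acyclic graph.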