Search results for: Policy iterations

Number of results: 276392

Journal: :Journal of AI and Data Mining 2015
F. Tatari M. B. Naghibi-Sistani

In this paper, the optimal adaptive leader-follower consensus of linear continuous-time multi-agent systems is considered. The error dynamics of each player depend on its neighbors’ information. Detailed analysis of online optimal leader-follower consensus under known and unknown dynamics is presented. The introduced reinforcement learning-based algorithms learn online the approximate solution...

Journal: :Formal Methods in System Design 2015
Pierre Roux Pierre-Loïc Garoche

Policy iterations is a technique based on game theory that relies on a sequence of numerical optimization queries to compute the fixpoint of a set of equations. It has been proposed to support the static analysis of programs as an alternative to widening, when the latter is ineffective. This happens for instance with highly numerical codes, such as found at cores of control command applications...

Journal: :Nonlinear Analysis: Hybrid Systems 2017

2013
Pierre Roux Pierre-Loïc Garoche

Among precise abstract interpretation methods developed during the last decade, policy iterations is one of the most promising. Despite its efficiency, it has not yet seen a broad usage in static analyzers. We believe the main explanation to this restrictive use, beside the novelty of the technique, lies in its lack of integration in the classic abstract domain framework. This prevents an easy ...

Journal: :Oper. Res. Lett. 2014
Eugene A. Feinberg Jefferson Huang

This note provides a simple example demonstrating that, if exact computations are allowed, the number of iterations required for the value iteration algorithm to find an optimal policy for discounted dynamic programming problems may grow arbitrarily quickly with the size of the problem. In particular, the number of iterations can be exponential in the number of actions. Thus, unlike policy iter...
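
As a rough illustration of what "the number of iterations required for value iteration to find an optimal policy" means, the sketch below counts Bellman sweeps until the greedy policy matches a known optimal one. The three-state MDP, the function names, and the data layout (`P[s][a]` as a list of `(next_state, probability)` pairs, `R[s][a]` as a reward) are all assumptions made for this example; this is not the construction used in the note above.

```python
from fractions import Fraction


def q_value(P, R, v, s, a, gamma):
    """One-step lookahead value of taking action a in state s."""
    return R[s][a] + gamma * sum(p * v[t] for t, p in P[s][a])


def greedy_policy(P, R, v, gamma):
    """Pick, in every state, the lowest-index action maximising the lookahead."""
    return [
        max(range(len(P[s])), key=lambda a: q_value(P, R, v, s, a, gamma))
        for s in range(len(P))
    ]


def sweeps_until_policy(P, R, gamma, target, max_sweeps=1000):
    """Count Bellman sweeps (from v = 0) until the greedy policy equals target."""
    v = [Fraction(0)] * len(P)
    for k in range(max_sweeps + 1):
        if greedy_policy(P, R, v, gamma) == target:
            return k
        # Synchronous Bellman optimality backup with exact arithmetic.
        v = [
            max(q_value(P, R, v, s, a, gamma) for a in range(len(P[s])))
            for s in range(len(P))
        ]
    return None


# Hypothetical toy MDP: in state 0, action 0 pays 1 once and moves to a dead
# end (state 2); action 1 pays nothing now but reaches state 1, which pays 1
# on every step. With gamma = 3/4 the patient action 1 is optimal.
P = [
    [[(2, 1)], [(1, 1)]],  # state 0: two actions
    [[(1, 1)]],            # state 1: single self-loop
    [[(2, 1)]],            # state 2: single self-loop
]
R = [[1, 0], [1], [0]]
gamma = Fraction(3, 4)

sweeps = sweeps_until_policy(P, R, gamma, target=[1, 0, 0])
print(sweeps)
```

Exact rational arithmetic (`fractions.Fraction`) is used so that comparisons between action values are not affected by floating-point rounding.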

1999
Omid Madani

We describe a few structural properties enjoyed by the policy space of problems such as infinite-horizon MDPs. From these properties we derive constraints limiting the number of iterations of algorithms such as the policy iteration algorithm for infinite-horizon MDPs and the Hoffman-Karp algorithm for simple stochastic games. An open problem is to characterize the growth of the worst-case number o...

Journal: :IEEE Transactions on Automatic Control 2023

The infinite-horizon optimal control problem for nonlinear systems is studied. In the context of model-based, iterative learning strategies, we propose an alternative definition and construction of the temporal difference error arising in Policy Iteration strategies. In such architectures, the error is computed via the evolution of the Hamiltonian function (or, possibly, its integral) along the trajectories of the closed-loop s...

2016
Shivaram Kalyanakrishnan Utkarsh Mall Ritish Goyal

Policy Iteration (PI) is a widely-used family of algorithms for computing an optimal policy for a given Markov Decision Problem (MDP). Starting with an arbitrary initial policy, PI repeatedly updates to a dominating policy until an optimal policy is found. The update step involves switching the actions corresponding to a set of “improvable” states, which are easily identified. Whereas progress ...
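
The loop described in this abstract (evaluate the current policy, find the improvable states, switch their actions, repeat) can be sketched as follows. This is a minimal illustration of one member of the PI family, the variant that switches every improvable state to a greedy action. The two-state MDP, the function names, and the data layout (`P[s][a]` as a list of `(next_state, probability)` pairs, `R[s][a]` as a reward) are assumptions made for the example and are not taken from the paper.

```python
from fractions import Fraction


def evaluate_policy(P, R, policy, gamma):
    """Solve (I - gamma * P_pi) v = r_pi exactly by Gauss-Jordan elimination."""
    n = len(P)
    # Augmented matrix [I - gamma * P_pi | r_pi].
    A = [[Fraction(0)] * (n + 1) for _ in range(n)]
    for s in range(n):
        a = policy[s]
        A[s][s] += 1
        for t, p in P[s][a]:
            A[s][t] -= gamma * p
        A[s][n] = Fraction(R[s][a])
    for col in range(n):
        # The system is nonsingular for gamma < 1, so a pivot always exists.
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        scale = A[col][col]
        A[col] = [x / scale for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
    return [A[s][n] for s in range(n)]


def q_value(P, R, v, s, a, gamma):
    """One-step lookahead value of taking action a in state s."""
    return R[s][a] + gamma * sum(p * v[t] for t, p in P[s][a])


def policy_iteration(P, R, gamma, policy):
    """Switch every improvable state to a greedy action until none remain."""
    policy = list(policy)
    while True:
        v = evaluate_policy(P, R, policy, gamma)
        improved = False
        for s in range(len(P)):
            best = max(range(len(P[s])),
                       key=lambda a: q_value(P, R, v, s, a, gamma))
            # A state is "improvable" if some action strictly beats v[s].
            if q_value(P, R, v, s, best, gamma) > v[s]:
                policy[s] = best
                improved = True
        if not improved:
            return policy, v


# Hypothetical two-state MDP: action 0 stays put, action 1 moves to the other
# state; only staying in state 1 earns a reward.
P = [
    [[(0, 1)], [(1, 1)]],
    [[(1, 1)], [(0, 1)]],
]
R = [[0, 0], [1, 0]]
gamma = Fraction(1, 2)

policy, v = policy_iteration(P, R, gamma, [0, 1])
print(policy, v)
```

Because evaluation is exact, the strict-improvement test cannot be triggered by rounding noise, so the loop stops as soon as no state is improvable. Other PI variants differ only in which subset of the improvable states they switch at each step.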

Journal: :CoRR 2015
Assalé Adjé Pierre-Loïc Garoche Victor Magron

In order to address the imprecision often introduced by widening operators, policy iteration based on min-computations amounts to considering the characterization of reachable states of a program as an iterative computation of policies, starting from a post-fixpoint. Computing each policy and the associated invariant relies on a sequence of numerical optimizations. While the early papers rely on L...

2014
Bruno Scherrer

We consider the infinite-horizon discounted optimal control problem formalized by Markov Decision Processes. We focus on several approximate variations of the Policy Iteration algorithm: Approximate Policy Iteration (API) (Bertsekas & Tsitsiklis, 1996), Conservative Policy Iteration (CPI) (Kakade & Langford, 2002), a natural adaptation of the Policy Search by Dynamic Programming algorithm (Bagn...
