Policy iterations

Search results for: Policy iterations

Number of results: 276392

Optimal Adaptive Leader-Follower Consensus of Linear Multi-Agent Systems: Known and Unknown Dynamics

Journal: Journal of AI and Data Mining, 2015

F. Tatari, M. B. Naghibi-Sistani

In this paper, the optimal adaptive leader-follower consensus of linear continuous-time multi-agent systems is considered. The error dynamics of each player depend on its neighbors’ information. A detailed analysis of online optimal leader-follower consensus under known and unknown dynamics is presented. The introduced reinforcement learning-based algorithms learn online the approximate solution...

Practical policy iterations - A practical use of policy iterations for static analysis: the quadratic case

Journal: Formal Methods in System Design, 2015

Pierre Roux, Pierre-Loïc Garoche

Policy iterations is a technique based on game theory that relies on a sequence of numerical optimization queries to compute the fixpoint of a set of equations. It has been proposed to support the static analysis of programs as an alternative to widening, when the latter is ineffective. This happens for instance with highly numerical codes, such as found at cores of control command applications...

A Sums-of-Squares extension of policy iterations

Journal: Nonlinear Analysis: Hybrid Systems, 2017

Integrating Policy Iterations in Abstract Interpreters

2013

Pierre Roux, Pierre-Loïc Garoche

Among precise abstract interpretation methods developed during the last decade, policy iterations is one of the most promising. Despite its efficiency, it has not yet seen a broad usage in static analyzers. We believe the main explanation to this restrictive use, beside the novelty of the technique, lies in its lack of integration in the classic abstract domain framework. This prevents an easy ...

The value iteration algorithm is not strongly polynomial for discounted dynamic programming

Journal: Oper. Res. Lett., 2014

Eugene A. Feinberg, Jefferson Huang

This note provides a simple example demonstrating that, if exact computations are allowed, the number of iterations required for the value iteration algorithm to find an optimal policy for discounted dynamic programming problems may grow arbitrarily quickly with the size of the problem. In particular, the number of iterations can be exponential in the number of actions. Thus, unlike policy iter...

On Constraints on the Search Path of Policy Iteration

1999

Omid Madani

We describe a few structural properties enjoyed by the policy space of problems such as infinite-horizon MDPs. From these properties we derive constraints limiting the number of iterations of algorithms such as the policy iteration algorithm for infinite-horizon MDPs and the Hoffman-Karp algorithm for simple stochastic games. An open problem is to characterize the growth of the worst-case number o...

Model-Based Policy Iterations for Nonlinear Systems via Controlled Hamiltonian Dynamics

Journal: IEEE Transactions on Automatic Control, 2023

The infinite-horizon optimal control problem for nonlinear systems is studied. In the context of model-based, iterative learning strategies, we propose an alternative definition and construction of the temporal difference error arising in Policy Iteration strategies. In such architectures, the error is computed via the evolution of the Hamiltonian function (or, possibly, of its integral) along the trajectories of the closed-loop s...

Batch-Switching Policy Iteration

2016

Shivaram Kalyanakrishnan, Utkarsh Mall, Ritish Goyal

Policy Iteration (PI) is a widely-used family of algorithms for computing an optimal policy for a given Markov Decision Problem (MDP). Starting with an arbitrary initial policy, PI repeatedly updates to a dominating policy until an optimal policy is found. The update step involves switching the actions corresponding to a set of “improvable” states, which are easily identified. Whereas progress ...
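The update loop described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' batch-switching method: it uses the classic variant that switches every improvable state at once, evaluates each policy by simple iterative sweeps, and runs on a hypothetical two-state MDP. The names `P`, `R`, `gamma`, and `eps` are illustrative and do not come from the paper.

```python
def q_value(s, a, V, P, R, gamma):
    """One-step lookahead value of taking action a in state s."""
    return R[s][a] + gamma * sum(p * V[t] for t, p in enumerate(P[s][a]))


def evaluate(policy, P, R, gamma, tol=1e-10):
    """Approximate the value of a fixed policy by repeated in-place sweeps."""
    V = [0.0] * len(policy)
    while True:
        delta = 0.0
        for s, a in enumerate(policy):
            v = q_value(s, a, V, P, R, gamma)
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V


def policy_iteration(P, R, gamma, eps=1e-8):
    """Switch all improvable states each round until none remain."""
    policy = [0] * len(P)  # arbitrary initial policy
    while True:
        V = evaluate(policy, P, R, gamma)
        improvable = {}
        for s in range(len(P)):
            best = max(range(len(P[s])),
                       key=lambda a: q_value(s, a, V, P, R, gamma))
            # A state is improvable if some action beats the current one.
            if q_value(s, best, V, P, R, gamma) > V[s] + eps:
                improvable[s] = best
        if not improvable:
            return policy, V
        for s, a in improvable.items():
            policy[s] = a


# Hypothetical MDP: P[s][a] is a list of next-state probabilities,
# R[s][a] is the immediate reward.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # state 0: action 0 stays, action 1 moves to 1
    [[0.0, 1.0], [1.0, 0.0]],  # state 1: action 0 stays, action 1 moves to 0
]
R = [
    [0.0, 1.0],
    [2.0, 0.0],
]

policy, V = policy_iteration(P, R, gamma=0.9)
print(policy, [round(v, 3) for v in V])
```

In this toy problem the only improvable state under the initial policy is state 0, so a single switch already yields a policy with no improvable states.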

A Sums-of-Squares Extension of Policy Iterations

Journal: CoRR, 2015

Assalé Adjé, Pierre-Loïc Garoche, Victor Magron

In order to address the imprecision often introduced by widening operators, policy iteration based on min-computations amounts to considering the characterization of reachable states of a program as an iterative computation of policies, starting from a post-fixpoint. Computing each policy and the associated invariant relies on a sequence of numerical optimizations. While the early papers rely on L...

Approximate Policy Iteration Schemes: A Comparison

2014

Bruno Scherrer

We consider the infinite-horizon discounted optimal control problem formalized by Markov Decision Processes. We focus on several approximate variations of the Policy Iteration algorithm: Approximate Policy Iteration (API) (Bertsekas & Tsitsiklis, 1996), Conservative Policy Iteration (CPI) (Kakade & Langford, 2002), a natural adaptation of the Policy Search by Dynamic Programming algorithm (Bagn...
