Search results for: policy iterations

Number of results: 276392

2015
Víctor Uc-Cetina Francisco Moo-Mena Rafael Hernandez-Ucan

We propose a Markov decision process model for solving the Web service composition (WSC) problem. Iterative policy evaluation, value iteration, and policy iteration algorithms are used to experimentally validate our approach, with artificial and real data. The experimental results show the reliability of the model and the methods employed, with policy iteration being the best one in terms of th...
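For readers unfamiliar with the policy iteration algorithm named in this abstract, here is a minimal sketch on a hypothetical two-state, two-action MDP. It is not the authors' WSC model: the arrays `P` and `R`, the discount `gamma`, and the function name are all assumptions made for illustration. The loop alternates exact policy evaluation (a linear solve) with greedy policy improvement until the policy stops changing.

```python
import numpy as np

def policy_iteration(P, R, gamma=0.9):
    """Policy iteration for a finite MDP (illustrative sketch).

    P[a, s, t] is the probability of moving from state s to t under action a.
    R[s, a] is the expected immediate reward for taking action a in state s.
    """
    n_actions, n_states, _ = P.shape
    states = np.arange(n_states)
    policy = np.zeros(n_states, dtype=int)
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly.
        P_pi = P[policy, states]          # row s is P[policy[s], s, :]
        r_pi = R[states, policy]
        v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
        # Policy improvement: pick the greedy action under the current values.
        q = R + gamma * np.einsum("ast,t->sa", P, v)
        new_policy = q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            return policy, v
        policy = new_policy

# Hypothetical toy MDP: action 0 stays put, action 1 switches state.
# Staying in state 1 pays reward 1; everything else pays 0.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 1.0], [1.0, 0.0]]])
R = np.array([[0.0, 0.0],
              [1.0, 0.0]])

policy, v = policy_iteration(P, R, gamma=0.9)
```

On this toy problem the greedy policy moves from state 0 to state 1 and then stays there. Value iteration would replace the linear solve with repeated Bellman backups.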

2011
Alexander Hans Steffen Udluft

Reinforcement learning (RL) methods employing powerful function approximators like neural networks have become an interesting approach for optimal control. Since they learn a policy from observations, they are also applicable when no analytical description of the system is available. Although impressive results have been reported, their handling in practice is still hard, as they can fail at re...

2004
Asok Ray Jinbo Fu

This paper presents optimal supervisory control of dynamical systems that can be represented by deterministic finite state automaton (DFSA) models. The performance index for the optimal policy is obtained by combining a measure of the supervised plant language with (possible) penalty on disabling of controllable events. The signed real measure quantifies the behaviour of controlled sublanguages...

Kazuhiko Himuro, Masayuki Sasaki, Shingo Baba, Shinichi Awamoto, Yoshiyuki Umezu, Yuji Tsutsui

Objective(s): We evaluated edge artifacts in relation to phantom diameter and reconstruction parameters in point spread function (PSF)-based positron emission tomography (PET) image reconstruction. Methods: PET data were acquired from an original cone-shaped phantom filled with 18F solution (21.9 kBq/mL) for 10 min using a Biograph mCT scanner. The images were reconstructed using the baseline or...

Journal: :JORS 2015
Edmund J. Collins

We introduce a class of models for multidimensional control problems which we call skip-free Markov decision processes on trees. We describe and analyse an algorithm applicable to Markov decision processes of this type that are skip-free in the negative direction. Starting with the finite average cost case, we show that the algorithm combines the advantages of both value iteration and policy it...

Journal: :Automatica 2005
Constantino M. Lagoa Jinbo Fu Asok Ray

This paper presents an algorithm for robust optimal control of regular languages under specified uncertainty bounds on the event cost parameters of the language measure that has been recently reported in literature. The performance index for the proposed robust optimal policy is obtained by combining the measure of the supervised plant language with uncertainty. The performance of a control...

2016
Egor George Karpenkov David Monniaux Philipp Wendler

We present local policy iteration (LPI), a new algorithm for deriving numerical invariants that combines the precision of max-policy iteration with the flexibility and scalability of conventional Kleene iterations. It is defined in the Configurable Program Analysis (CPA) framework, thus allowing inter-analysis communication. LPI uses adjustable-block encoding in order to traverse loop-free prog...

2001
Marianne Akian Agnès Sulem Michael I. Taksar

We study the optimal investment policy for an investor who has available one bank account and n risky assets modeled by log-normal diffusions. The objective is to maximize the long-run average growth of wealth for a logarithmic utility function in the presence of proportional transaction costs. This problem is formulated as an ergodic singular stochastic control problem and interpreted as the l...

2011
Stéphane Ross Geoffrey J. Gordon J. Andrew Bagnell

Sequential prediction problems such as imitation learning, where future observations depend on previous predictions (actions), violate the common i.i.d. assumptions made in statistical learning. This leads to poor performance in theory and often in practice. Some recent approaches (Daumé III et al., 2009; Ross and Bagnell, 2010) provide stronger guarantees in this setting, but remain somewhat u...

Journal: :CoRR 2017
Alexander J. T. Gurney

Some iterative calculations can be carried out by parallel communicating processors, and yield the same results whether or not the processors are synchronized. We show that this is the case if and only if the iteration is a contraction that is strict on orbits, with respect to an ultrametric defined on the state space. The maximum number of independent processors is given by the dimension of th...
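The phenomenon described in this abstract, an iteration reaching the same result with or without synchronization, can be illustrated with a small sketch. The example below uses shortest-path relaxation on a hypothetical four-node graph. It does not implement the ultrametric characterization from the abstract, and the graph `EDGES`, the update schedule, and all function names are assumptions made for illustration only.

```python
INF = float("inf")

# Hypothetical toy graph: EDGES[u] lists (v, weight) for each edge u -> v.
EDGES = {0: [(1, 4), (2, 1)], 1: [(3, 1)], 2: [(1, 2), (3, 5)], 3: []}

def relax(dist, node):
    """Best current estimate for `node` via any incoming edge; node 0 is the source."""
    if node == 0:
        return 0
    return min(
        (dist[u] + w for u, out in EDGES.items() for v, w in out if v == node),
        default=INF,
    )

def synchronous(n_sweeps=10):
    """Every node is recomputed from the same previous snapshot (lock-step)."""
    dist = {n: INF for n in EDGES}
    for _ in range(n_sweeps):
        dist = {n: relax(dist, n) for n in EDGES}
    return dist

def asynchronous(schedule):
    """Nodes are updated one at a time, in place, in the given order."""
    dist = {n: INF for n in EDGES}
    for n in schedule:
        dist[n] = relax(dist, n)
    return dist

sync_result = synchronous()
async_result = asynchronous([3, 2, 1, 0] * 5)  # a deliberately awkward order
```

In this toy case both runs settle on the same distances, because each relaxation can only lower an estimate toward the same fixed point.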
