policy iterations

Composition of Web Services Using Markov Decision Processes and Dynamic Programming

2015

Víctor Uc-Cetina, Francisco Moo-Mena, Rafael Hernandez-Ucan

We propose a Markov decision process model for solving the Web service composition (WSC) problem. Iterative policy evaluation, value iteration, and policy iteration algorithms are used to experimentally validate our approach, with artificial and real data. The experimental results show the reliability of the model and the methods employed, with policy iteration being the best one in terms of th...
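As a rough illustration of the policy iteration method named in this abstract, the sketch below runs it on a tiny made-up MDP. The states, actions, rewards, and discount factor are hypothetical and are not taken from the paper; the code only shows the generic alternation between iterative policy evaluation and greedy policy improvement.

```python
# Hypothetical toy MDP (not from the paper): P[s][a] is a list of
# (probability, next_state, reward) triples.
P = {
    0: {"a": [(1.0, 1, 0.0)], "b": [(0.5, 0, 0.0), (0.5, 2, 1.0)]},
    1: {"a": [(1.0, 2, 2.0)], "b": [(1.0, 0, 0.0)]},
    2: {"a": [(1.0, 2, 0.0)]},  # absorbing state with zero reward
}
GAMMA = 0.9  # assumed discount factor


def q_value(s, a, V):
    """Expected return of taking action a in state s, then following V."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[s][a])


def evaluate(policy, V, theta=1e-10):
    """Iterative policy evaluation: sweep in place until changes fall below theta."""
    while True:
        delta = 0.0
        for s in P:
            v = q_value(s, policy[s], V)
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < theta:
            return V


def policy_iteration():
    # Start from a deliberately poor policy: the alphabetically last action.
    policy = {s: sorted(P[s])[-1] for s in P}
    V = {s: 0.0 for s in P}
    while True:
        evaluate(policy, V)
        stable = True
        for s in P:
            best = max(sorted(P[s]), key=lambda a: q_value(s, a, V))
            # Switch only on a strict improvement, so ties cannot cause cycling.
            if q_value(s, best, V) > q_value(s, policy[s], V) + 1e-12:
                policy[s] = best
                stable = False
        if stable:
            return policy, V


policy, values = policy_iteration()
print(policy, values)
```

On this toy problem the loop needs three evaluation rounds before the policy stops changing. Value iteration would instead fold the maximisation over actions into every sweep.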


Ensemble Usage for More Reliable Policy Identification in Reinforcement Learning

2011

Alexander Hans, Steffen Udluft

Reinforcement learning (RL) methods employing powerful function approximators like neural networks have become an interesting approach for optimal control. Since they learn a policy from observations, they are also applicable when no analytical description of the system is available. Although impressive results have been reported, their handling in practice is still hard, as they can fail at re...


Optimal supervisory control of finite state automata

2004

Asok Ray, Jinbo Fu

This paper presents optimal supervisory control of dynamical systems that can be represented by deterministic finite state automaton (DFSA) models. The performance index for the optimal policy is obtained by combining a measure of the supervised plant language with (possible) penalty on disabling of controllable events. The signed real measure quantifies the behaviour of controlled sublanguages...


Edge Artifacts in Point Spread Function-based PET Reconstruction in Relation to Object Size and Reconstruction Parameters

Journal: Asia Oceania Journal of Nuclear Medicine and Biology 2017

Kazuhiko Himuro, Masayuki Sasaki, Shingo Baba, Shinichi Awamoto, Yoshiyuki Umezu, Yuji Tsutsui

Objective(s): We evaluated edge artifacts in relation to phantom diameter and reconstruction parameters in point spread function (PSF)-based positron emission tomography (PET) image reconstruction. Methods: PET data were acquired from an original cone-shaped phantom filled with 18F solution (21.9 kBq/mL) for 10 min using a Biograph mCT scanner. The images were reconstructed using the baseline or...


Models and algorithms for skip-free Markov decision processes on trees

Journal: JORS 2015

Edmund J. Collins

We introduce a class of models for multidimensional control problems which we call skip-free Markov decision processes on trees. We describe and analyse an algorithm applicable to Markov decision processes of this type that are skip-free in the negative direction. Starting with the finite average cost case, we show that the algorithm combines the advantages of both value iteration and policy it...


Robust optimal control of regular languages

Journal: Automatica 2005

Constantino M. Lagoa, Jinbo Fu, Asok Ray

This paper presents an algorithm for robust optimal control of regular languages under specified uncertainty bounds on the event cost parameters of the language measure that has been recently reported in literature. The performance index for the proposed robust optimal policy is obtained by combining the measure of the supervised plant language with uncertainty. The performance of a control...


Program Analysis with Local Policy Iteration

2016

Egor George Karpenkov, David Monniaux, Philipp Wendler

We present local policy iteration (LPI), a new algorithm for deriving numerical invariants that combines the precision of max-policy iteration with the flexibility and scalability of conventional Kleene iterations. It is defined in the Configurable Program Analysis (CPA) framework, thus allowing inter-analysis communication. LPI uses adjustable-block encoding in order to traverse loop-free prog...


Dynamic Optimization of Long-term Growth Rate for a Portfolio with Transaction Costs and Logarithmic Utility

2001

Marianne Akian, Agnès Sulem, Michael I. Taksar

We study the optimal investment policy for an investor who has available one bank account and n risky assets modeled by log-normal diffusions. The objective is to maximize the long-run average growth of wealth for a logarithmic utility function in the presence of proportional transaction costs. This problem is formulated as an ergodic singular stochastic control problem and interpreted as the l...


A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning

2011

Stéphane Ross, Geoffrey J. Gordon, J. Andrew Bagnell

Sequential prediction problems such as imitation learning, where future observations depend on previous predictions (actions), violate the common i.i.d. assumptions made in statistical learning. This leads to poor performance in theory and often in practice. Some recent approaches (Daumé III et al., 2009; Ross and Bagnell, 2010) provide stronger guarantees in this setting, but remain somewhat u...


Asynchronous iterations in ultrametric spaces

Journal: CoRR 2017

Alexander J. T. Gurney

Some iterative calculations can be carried out by parallel communicating processors, and yield the same results whether or not the processors are synchronized. We show that this is the case if and only if the iteration is a contraction that is strict on orbits, with respect to an ultrametric defined on the state space. The maximum number of independent processors is given by the dimension of th...
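The phenomenon described in this abstract can be illustrated with a familiar iteration, shortest-path distance estimation, which reaches the same fixed point whether all nodes update together or one at a time in random order. The graph and function names below are hypothetical. The sketch simulates asynchrony sequentially, without communication delays or stale reads, and does not implement the paper's ultrametric characterisation.

```python
import random

# Hypothetical weighted digraph: EDGES[u] maps each neighbour v to the cost of u -> v.
EDGES = {
    "a": {"b": 1, "c": 4},
    "b": {"c": 2, "d": 6},
    "c": {"d": 1},
    "d": {},
}
TARGET = "d"
INF = float("inf")


def update(node, dist):
    """Recompute one node's distance estimate from its neighbours' current estimates."""
    if node == TARGET:
        return 0
    return min((w + dist[v] for v, w in EDGES[node].items()), default=INF)


def synchronous():
    """All nodes update at once from the previous round's estimates."""
    dist = {n: INF for n in EDGES}
    while True:
        new = {n: update(n, dist) for n in EDGES}
        if new == dist:
            return dist
        dist = new


def asynchronous(seed):
    """One randomly chosen node updates at a time, using whatever estimates exist."""
    rng = random.Random(seed)
    dist = {n: INF for n in EDGES}
    nodes = list(EDGES)
    # Keep going while at least one node's estimate is still inconsistent.
    while any(update(n, dist) != dist[n] for n in nodes):
        n = rng.choice(nodes)
        dist[n] = update(n, dist)
    return dist


print(synchronous())
print(all(asynchronous(seed) == synchronous() for seed in range(5)))
```

Both schedules settle on the same distances here because the update only ever lowers an estimate toward a unique fixed point. The abstract's result concerns exactly when such schedule independence holds in general.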
