Search results for: finite planning horizon

Number of results: 479354

2016
Shuai Ma Jia Yuan Yu

This paper studies Value-at-Risk problems in finite-horizon Markov decision processes (MDPs) with a finite state space and two forms of reward function. First, we study the effect of the reward function on two criteria in a short-horizon MDP. Second, for long-horizon MDPs, we estimate the total reward distribution in a finite-horizon Markov chain (MC) with the help of spectral theory and the centr...

Journal: Auton. Robots 2009
Ruben Martinez-Cantin Nando de Freitas Eric Brochu José A. Castellanos Arnaud Doucet

We address the problem of online path planning for optimal sensing with a mobile robot. The objective of the robot is to learn the most about its pose and the environment given time constraints. We use a POMDP with a utility function that depends on the belief state to model the finite horizon planning problem. We replan as the robot progresses throughout the environment. The POMDP is highdimen...

2012
Thomas Furmston David Barber

Parametric policy search algorithms are one of the methods of choice for the optimisation of Markov Decision Processes, with Expectation Maximisation and natural gradient ascent being popular methods in this field. In this article we provide a unifying perspective of these two algorithms by showing that their search-directions in the parameter space are closely related to the search-direction of...

2005
Daniel Szer François Charpillet Shlomo Zilberstein

We present multi-agent A* (MAA*), the first complete and optimal heuristic search algorithm for solving decentralized partially observable Markov decision problems (DEC-POMDPs) with finite horizon. The algorithm is suitable for computing optimal plans for a cooperative group of agents that operate in a stochastic environment such as multi-robot coordination, network traffic control, or distributed...

2010
Tichakorn Wongpiromsarn Ufuk Topcu Necmiye Ozay Huan Xu Richard M. Murray

This paper describes TuLiP, a Python-based software toolbox for the synthesis of embedded control software that is provably correct with respect to an expressive subset of linear temporal logic (LTL) specifications. TuLiP combines routines for (1) finite state abstraction of control systems, (2) digital design synthesis from LTL specifications, and (3) receding horizon planning. The underlying ...

Journal: Robotics and Autonomous Systems 2016
Mikko Lauri Risto Ritala

A robotic agent is tasked to explore an a priori unknown environment. The objective is to maximize the amount of information about the partially observable state. The problem is formulated as a partially observable Markov decision process (POMDP) with an information-theoretic objective function, further approximated to a form suitable for robotic exploration. An open-loop approximation is applie...

2006
Hongyan Li Joern Meissner

Lot-sizing and capacity planning are important supply chain decisions, and competition and cooperation affect the performance of these decisions. In this paper, we look into the dynamic lot-sizing and resource competition problem of an industry consisting of multiple firms. A capacity competition model combining the complexity of time-varying demand with cost functions and economies of scale ar...

Journal: Communications in Computer and Information Science 2021

Previous work on planning as active inference addresses finite horizon problems and solutions valid for online planning. We propose solving the general Stochastic Shortest-Path Markov Decision Process (SSP MDP) as probabilistic inference. Furthermore, we discuss offline methods for planning under uncertainty. In an SSP MDP, the horizon is indefinite and unknown a priori. SSP MDPs generalize finite and infinite horizon MDPs and are widely used in artificial ...

2014
Heather Brown Marjon van der Pol

Evidence suggests that maternal and offspring smoking behaviour is correlated. Little is known about the mechanisms through which this intergenerational transfer occurs. This paper explores the role of time preferences. Although time preference is likely to be heritable and correlated with health investments, its role in the intergenerational transmission of smoking has not been explored previo...

2003
Michael O. Duff

Given a Markov decision process (MDP) with expressed prior uncertainties in the process transition probabilities, we consider the problem of computing a policy that optimizes expected total (finite-horizon) reward. Implicitly, such a policy would effectively resolve the "exploration-versus-exploitation tradeoff" faced, for example, by an agent that seeks to optimize total reinforcement obtained...
