Steady-State Planning in Expected Reward Multichain MDPs

Authors

Abstract

The planning domain has experienced increased interest in the formal synthesis of decision-making policies. This typically entails finding a policy which satisfies specifications in the form of some well-defined logic. While many such logics have been proposed with varying degrees of expressiveness and complexity in their capacity to capture desirable agent behavior, their value is limited when deriving policies that satisfy certain types of asymptotic behavior in general system models. In particular, we are interested in specifying constraints on the steady-state behavior of an agent, which captures the proportion of time the agent spends in each state as it interacts for an indefinite period with its environment. This is sometimes called the average or expected behavior of the agent, and the associated planning problem has faced significant challenges unless strong restrictions are imposed on the underlying model in terms of the connectivity of its graph structure. In this paper, we explore this steady-state planning problem, which consists of deriving a decision-making policy such that constraints on the agent's steady-state behavior are satisfied. A linear programming solution for the general case of multichain Markov Decision Processes (MDPs) is proposed, and we prove that optimal solutions to the proposed programs yield stationary policies with rigorous guarantees of behavior.
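
The abstract does not reproduce the programs themselves, so the following is only a minimal sketch of the kind of occupation-measure linear program used for steady-state planning, restricted to the recurrent (unichain) case and solved with scipy; the paper's multichain formulation requires additional variables to handle transient states and multiple recurrent classes. The toy MDP, the steady-state bounds (at least 30% of the time in state 0, at most 20% in state 2), and all names below are illustrative assumptions, not taken from the paper.

```python
# Sketch of a steady-state planning LP in occupation-measure form for a small
# MDP (recurrent/unichain case only).  Variables x[s, a] are long-run
# state-action frequencies; the example model and bounds are illustrative.
import numpy as np
from scipy.optimize import linprog

n_s, n_a = 3, 2
# P[a, s, s'] = transition probability, R[s, a] = expected reward (assumed data).
P = np.array([
    [[0.7, 0.2, 0.1],
     [0.1, 0.8, 0.1],
     [0.2, 0.3, 0.5]],
    [[0.1, 0.1, 0.8],
     [0.3, 0.4, 0.3],
     [0.6, 0.2, 0.2]],
])
R = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [0.5, 0.5]])

idx = lambda s, a: s * n_a + a      # flatten (s, a) into a variable index
n_var = n_s * n_a

# Balance (stationarity) constraints: sum_a x[s,a] = sum_{s',a'} P[a'][s'][s] x[s',a'],
# plus normalization sum_{s,a} x[s,a] = 1.
A_eq = np.zeros((n_s + 1, n_var))
b_eq = np.zeros(n_s + 1)
for s in range(n_s):
    for a in range(n_a):
        A_eq[s, idx(s, a)] += 1.0
        for sp in range(n_s):
            A_eq[s, idx(sp, a)] -= P[a, sp, s]
A_eq[n_s, :] = 1.0
b_eq[n_s] = 1.0

# Steady-state specification (illustrative): at least 30% of time in state 0,
# at most 20% of time in state 2, expressed as bounds on sum_a x[s, a].
A_ub = np.zeros((2, n_var))
b_ub = np.array([-0.3, 0.2])
A_ub[0, [idx(0, a) for a in range(n_a)]] = -1.0   # -sum_a x[0,a] <= -0.3
A_ub[1, [idx(2, a) for a in range(n_a)]] = 1.0    #  sum_a x[2,a] <=  0.2

# Maximize expected average reward  <=>  minimize its negation.
c = -np.array([R[s, a] for s in range(n_s) for a in range(n_a)])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * n_var, method="highs")
x = res.x.reshape(n_s, n_a)

# Recover a stationary policy pi(a|s) = x[s,a] / sum_a x[s,a] on visited states.
state_occ = x.sum(axis=1)
policy = np.where(state_occ[:, None] > 1e-9,
                  x / np.maximum(state_occ[:, None], 1e-9),
                  1.0 / n_a)
print("average reward:", -res.fun)
print("steady-state distribution:", state_occ)
print("policy:", policy)
```

The recovered policy is stationary and randomized; how such LP solutions translate into policies with guaranteed steady-state behavior in the general multichain setting is precisely what the paper addresses.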

Similar articles

Total Expected Discounted Reward MDPs: Existence of Optimal Policies

This article describes the results on the existence of optimal and nearly optimal policies for Markov Decision Processes (MDPs) with total expected discounted rewards. The problem of optimization of total expected discounted rewards for MDPs is also known under the name of discounted dynamic programming.
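
As a concrete reminder of what the discounted criterion computes, here is a minimal value-iteration sketch for total expected discounted reward; the toy MDP, discount factor, and names are illustrative assumptions and are not drawn from the article, which is concerned with existence results rather than algorithms.

```python
# Value iteration for the total expected discounted reward criterion on a toy
# two-state, two-action MDP (illustrative data only).
import numpy as np

P = np.array([                 # P[a, s, s']
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.0, 1.0]],
])
R = np.array([[1.0, 0.0],      # R[s, a]
              [0.0, 2.0]])
gamma = 0.95

V = np.zeros(2)
for _ in range(10_000):
    # Bellman optimality update: Q(s,a) = R(s,a) + gamma * sum_s' P(s'|s,a) V(s')
    Q = R + gamma * np.einsum("ast,t->sa", P, V)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        V = V_new
        break
    V = V_new

policy = Q.argmax(axis=1)      # greedy stationary deterministic policy
print("V* ~", V, "policy:", policy)
```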

Exploiting separability in multiagent planning with continuous-state MDPs

Recent years have seen significant advances in techniques for optimally solving multiagent problems represented as decentralized partially observable Markov decision processes (Dec-POMDPs). A new method achieves scalability gains by converting Dec-POMDPs into continuous state MDPs. This method relies on the assumption of a centralized planning phase that generates a set of decentralized policie...

Steady state behavior and maintenance planning of bleaching system in a paper plant

This paper presents the steady state behavior and maintenance planning of the bleaching system in a paper plant. The paper plant comprises various systems including feeding, chipping, digesting, washing, bleaching, screening, stock preparation and paper making, etc. One of the most important functionaries of a paper plant, on which the quality of paper depends, is the bleaching system, where rem...

Steady-state analysis of shortest expected delay routing

We consider a queueing system consisting of two non-identical exponential servers, where each server has its own dedicated queue and serves the customers in that queue FCFS. Customers arrive according to a Poisson process and join the queue promising the shortest expected delay, which is a natural and near-optimal policy for systems with non-identical servers. This system can be modeled as an i...
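
The routing rule itself is easy to state in code: an arrival joins the server minimizing its expected delay given that server's current queue length and service rate. The sketch below, including the rates and the crude jump-chain simulation loop, is an illustrative assumption rather than the model analyzed in the paper.

```python
# Shortest-expected-delay (SED) routing between two non-identical exponential
# servers, with a toy jump-chain simulation (all parameters are illustrative).
import random

mu = (1.0, 2.0)            # service rates of the two servers
lam = 2.2                  # Poisson arrival rate (stable: lam < mu[0] + mu[1])
queue_len = [0, 0]         # customers at each server, including the one in service
routed = [0, 0]            # how many arrivals each server received

def sed_route(queue_len, mu):
    """Pick the server minimizing expected delay (n_i + 1) / mu_i."""
    delays = [(queue_len[i] + 1) / mu[i] for i in (0, 1)]
    return min((0, 1), key=lambda i: delays[i])

random.seed(0)
for _ in range(100_000):
    # Sample the next event of the embedded jump chain proportional to its rate.
    rates = [lam, mu[0] * (queue_len[0] > 0), mu[1] * (queue_len[1] > 0)]
    u = random.random() * sum(rates)
    if u < rates[0]:                       # arrival: route by SED
        i = sed_route(queue_len, mu)
        queue_len[i] += 1
        routed[i] += 1
    elif u < rates[0] + rates[1]:          # departure from server 0
        queue_len[0] -= 1
    else:                                  # departure from server 1
        queue_len[1] -= 1

print("arrivals routed to each server:", routed)
```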

Robust Online Optimization of Reward-Uncertain MDPs

Imprecise-reward Markov decision processes (IRMDPs) are MDPs in which the reward function is only partially specified (e.g., by some elicitation process). Recent work using minimax regret to solve IRMDPs has shown, despite their theoretical intractability, how the set of policies that are nondominated w.r.t. reward uncertainty can be exploited to accelerate regret computation. However, the numb...


Journal

Journal title: Journal of Artificial Intelligence Research

Year: 2021

ISSN: 1076-9757, 1943-5037

DOI: https://doi.org/10.1613/jair.1.12611