Search results for: markov decision process graph theory

Number of results: 2,385,831

2017
Gary J. Koehler

— In this paper we present a generalized Markov decision process that subsumes the traditional discounted, infinite horizon, finite state and action Markov decision process, Veinott's discounted decision processes, and Koehler's generalization of these two problem classes.
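
To make the classical special case named above concrete, here is a minimal value-iteration sketch for a discounted, infinite-horizon, finite state and action MDP. The two-state transition and reward tables are invented for illustration and are not from the paper.

```python
import numpy as np

# Hypothetical 2-state, 2-action discounted MDP.
# P[a][s, s'] = transition probability, R[a][s] = expected reward.
P = np.array([[[0.9, 0.1],
               [0.2, 0.8]],
              [[0.5, 0.5],
               [0.7, 0.3]]])
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])
gamma = 0.95  # discount factor

# Standard value iteration: V <- max_a (R_a + gamma * P_a V).
V = np.zeros(2)
for _ in range(1000):
    Q = R + gamma * P @ V          # Q[a, s]
    V_new = Q.max(axis=0)
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new

print("optimal values:", V)
print("greedy policy:", Q.argmax(axis=0))
```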

Journal: Transactions of the Society of Instrument and Control Engineers, 1967

Journal: Journal of Machine Learning Research, 2015
Yangbo He Jinzhu Jia Bin Yu

When learning a directed acyclic graph (DAG) model via observational data, one generally cannot identify the underlying DAG, but can potentially obtain a Markov equivalence class. The size (the number of DAGs) of a Markov equivalence class is crucial to infer causal effects or to learn the exact causal DAG via further interventions. Given a set of Markov equivalence classes, the distribution of...
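
A sketch of the quantity being discussed: by the Verma–Pearl characterization, two DAGs are Markov equivalent iff they share the same skeleton and the same v-structures, so the size of each class can be counted by brute force on small graphs. The three-node enumeration below is illustrative and is not the paper's method.

```python
from itertools import permutations, combinations
from collections import defaultdict

nodes = (0, 1, 2)
all_arcs = [(i, j) for i in nodes for j in nodes if i != j]

def is_dag(edges):
    # Acyclic iff some ordering of the nodes makes every arc point forward.
    return any(all(order.index(i) < order.index(j) for i, j in edges)
               for order in permutations(nodes))

def signature(edges):
    # Verma-Pearl: same skeleton plus same v-structures
    # (colliders i -> k <- j with i and j non-adjacent).
    skel = frozenset(frozenset(e) for e in edges)
    vs = frozenset((frozenset({i, j}), k)
                   for i, k in edges for j, k2 in edges
                   if k == k2 and i != j
                   and frozenset({i, j}) not in skel)
    return skel, vs

classes = defaultdict(int)
for r in range(len(all_arcs) + 1):
    for edges in combinations(all_arcs, r):
        if is_dag(edges):
            classes[signature(edges)] += 1

print("DAGs on 3 nodes:", sum(classes.values()))         # 25
print("Markov equivalence classes:", len(classes))        # 11
print("class sizes:", sorted(classes.values(), reverse=True))
```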

2010
Itamar Arel Andrew S. Davis

This paper presents a formalism for determining the episode duration distribution in fixed-policy Markov decision processes (MDP). To achieve this goal, we borrow the notion of obtaining the n-step first visit probability from queuing theory, apply it to a Markov chain derived from the MDP, and arrive at the distribution of the episode durations between any two arbitrary states. We illustrate t...
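
One way to make the n-step first-visit idea concrete is to make the target state absorbing in the policy-induced chain, so the first-visit (episode-duration) distribution falls out as the increment of the absorption probability. The 4-state chain below is made up for illustration; it is not the paper's construction.

```python
import numpy as np

# Hypothetical 4-state Markov chain induced by a fixed policy.
P = np.array([[0.1, 0.6, 0.3, 0.0],
              [0.0, 0.2, 0.5, 0.3],
              [0.4, 0.0, 0.1, 0.5],
              [0.3, 0.3, 0.2, 0.2]])
start, target = 0, 3

# Make the target absorbing; P(T = n) is then the increment
# of the absorption probability at step n.
A = P.copy()
A[target] = 0.0
A[target, target] = 1.0

dist = []
v = np.zeros(len(P)); v[start] = 1.0
prev = 0.0
for n in range(1, 31):
    v = v @ A
    dist.append(v[target] - prev)   # P(first visit at exactly step n)
    prev = v[target]

print("P(T = n) for n = 1..5:", np.round(dist[:5], 4))
print("P(T <= 30):", round(prev, 4))
```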

Journal: Universität Trier, Mathematik/Informatik, Forschungsbericht, 2000
Lothar Breuer

In 1995, Pacheco and Prabhu introduced the class of so–called Markov–additive processes of arrivals in order to provide a general class of arrival processes for queueing theory. In this paper, the above class is generalized considerably, including time–inhomogeneous arrival rates, general phase spaces and the arrival space being a general vector space (instead of the finite–dimensional Euclidea...
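
A Markov-modulated Poisson process is one of the simplest members of this class of arrival processes: arrivals are Poisson at a rate that depends on the phase of an underlying Markov chain. A minimal simulation sketch with invented switching and arrival rates:

```python
import random

# Hypothetical 2-phase MMPP: the environment switches between phases with
# exponential holding times; arrivals are Poisson at the phase's rate.
switch_rate = [0.5, 1.0]   # rate of leaving phase 0 / phase 1
arrival_rate = [2.0, 8.0]  # Poisson arrival rate in each phase

def simulate(horizon, seed=0):
    rng = random.Random(seed)
    t, phase, arrivals = 0.0, 0, []
    while t < horizon:
        # Competing exponentials: next phase switch vs. next arrival
        # (valid restart because of memorylessness).
        t_switch = rng.expovariate(switch_rate[phase])
        t_arrival = rng.expovariate(arrival_rate[phase])
        if t_arrival < t_switch:
            t += t_arrival
            if t < horizon:
                arrivals.append(t)
        else:
            t += t_switch
            phase = 1 - phase
    return arrivals

print(len(simulate(horizon=100.0)), "arrivals over 100 time units")
```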

Journal: Math. Oper. Res., 2004
Vladimir Ejov Jerzy A. Filar Minh-Tuan Nguyen

We consider the Hamiltonian cycle problem embedded in a singularly perturbed Markov decision process. We also consider a functional on the space of deterministic policies of the process that consists of the (1,1)-entry of the fundamental matrices of the Markov chains induced by the same policies. We show that when the perturbation parameter, ε, is less than or equal to 1/N², the Hamiltonian cycl...
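
As a rough illustration of the objects involved, the sketch below builds the chain induced by a deterministic policy on a 4-node graph, applies a generic uniform perturbation of size ε (not necessarily the paper's singular perturbation), and evaluates the (1,1)-entry of the fundamental matrix Z = (I − P + 1π)⁻¹ for a Hamiltonian and a non-Hamiltonian policy.

```python
import numpy as np

N = 4
eps = 1.0 / N**2   # the eps <= 1/N^2 threshold quoted above

def perturbed_chain(policy):
    # Deterministic policy arcs, blended with a uniform perturbation.
    P = np.zeros((N, N))
    for s, t in enumerate(policy):
        P[s, t] = 1.0
    return (1 - eps) * P + (eps / N) * np.ones((N, N))

def fundamental_11(P):
    # Stationary distribution pi, then Z = (I - P + 1 pi^T)^{-1};
    # the math (1,1)-entry is Z[0, 0] in 0-based indexing.
    w, v = np.linalg.eig(P.T)
    pi = np.real(v[:, np.argmax(np.real(w))])
    pi = pi / pi.sum()
    Z = np.linalg.inv(np.eye(N) - P + np.outer(np.ones(N), pi))
    return Z[0, 0]

hamiltonian = [1, 2, 3, 0]   # the cycle 0 -> 1 -> 2 -> 3 -> 0
two_cycles  = [1, 0, 3, 2]   # two disjoint 2-cycles, no Hamiltonian cycle
for name, pol in [("Hamiltonian", hamiltonian), ("non-Hamiltonian", two_cycles)]:
    print(f"{name}: (1,1)-entry = {fundamental_11(perturbed_chain(pol)):.4f}")
```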

2010
Gergely Neu András György Csaba Szepesvári

We consider a stochastic extension of the loop-free shortest path problem with adversarial rewards. In this episodic Markov decision problem an agent traverses an acyclic graph with random transitions: at each step of an episode the agent chooses an action, receives some reward, and arrives at a random next state, where the reward and the distribution of the next state depend on the act...
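
A minimal sketch of the episodic setting described above, with a hypothetical three-state loop-free MDP in which freshly drawn per-episode rewards stand in for the adversary's choices:

```python
import random

# Tiny loop-free episodic MDP (hypothetical): every episode runs from
# "s0" to the terminal state "goal", as in the shortest-path setting.
# transitions[state][action] = list of (probability, next_state)
transitions = {
    "s0": {"a": [(0.7, "s1"), (0.3, "s2")], "b": [(0.4, "s1"), (0.6, "s2")]},
    "s1": {"a": [(1.0, "goal")], "b": [(1.0, "goal")]},
    "s2": {"a": [(1.0, "goal")], "b": [(1.0, "goal")]},
}

def run_episode(policy, rewards, rng):
    # rewards[(state, action)] may be re-chosen adversarially every episode.
    state, total = "s0", 0.0
    while state != "goal":
        action = policy[state]
        total += rewards[(state, action)]
        r, acc = rng.random(), 0.0
        for p, nxt in transitions[state][action]:
            acc += p
            if r < acc:
                break
        state = nxt
    return total

rng = random.Random(0)
policy = {"s0": "a", "s1": "b", "s2": "a"}
for ep in range(3):
    # Stand-in for the adversary: fresh rewards each episode.
    rewards = {(s, a): rng.uniform(0, 1)
               for s in transitions for a in transitions[s]}
    print(f"episode {ep}: return {run_episode(policy, rewards, rng):.3f}")
```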

Journal: Journal of Computer and Robotics
Samaneh Assar, Faculty of Computer and Information Technology Engineering, Qazvin Branch, Islamic Azad University, Qazvin, Iran; Behrooz Masoumi, Faculty of Computer and Information Technology Engineering, Qazvin Branch, Islamic Azad University, Qazvin, Iran

Multi-agent Markov decision processes (MMDPs), as the generalization of Markov decision processes to the multi-agent case, have long been used for modeling multi-agent systems and are a suitable framework for multi-agent reinforcement learning. In this paper, a generalized learning-automata-based algorithm for finding optimal policies in an MMDP is proposed. In the proposed algorithm, MMDP ...
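
The basic building block of such algorithms is a learning automaton. The sketch below implements the standard linear reward-inaction (L_R-I) update on a two-action toy problem; the parameters are illustrative, and the paper's exact scheme may differ.

```python
import random

def lri_update(probs, chosen, reward, a=0.1):
    # Linear reward-inaction: on success, shift probability mass toward
    # the chosen action; on failure, leave probabilities unchanged.
    if reward:
        for i in range(len(probs)):
            if i == chosen:
                probs[i] += a * (1 - probs[i])
            else:
                probs[i] *= (1 - a)
    return probs

# Toy environment: action 1 succeeds 80% of the time, action 0 only 30%.
rng = random.Random(1)
success = [0.3, 0.8]
probs = [0.5, 0.5]
for _ in range(500):
    chosen = 0 if rng.random() < probs[0] else 1
    probs = lri_update(probs, chosen, rng.random() < success[chosen])
print("action probabilities after 500 plays:", [round(p, 3) for p in probs])
```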

2011
David C. Parkes Ariel D. Procaccia

Social choice theory provides insights into a variety of collective decision making settings, but nowadays some of its tenets are challenged by Internet environments, which call for dynamic decision making under constantly changing preferences. In this paper we model the problem via Markov decision processes (MDP), where the states of the MDP coincide with preference profiles and a (determinist...
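
A toy rendering of the state space being described: with 2 voters ranking 3 alternatives, the MDP has 6² = 36 states, and a deterministic social choice function labels each state with a winner. Everything below is a hypothetical instance, not the paper's model.

```python
from itertools import permutations, product

# States of the MDP are preference profiles: one ranking per voter.
alternatives = ("a", "b", "c")
rankings = list(permutations(alternatives))
profiles = list(product(rankings, repeat=2))   # 6^2 = 36 states

def plurality(profile):
    # Deterministic social choice function; ties broken alphabetically.
    tally = {alt: 0 for alt in alternatives}
    for ranking in profile:
        tally[ranking[0]] += 1
    return max(sorted(tally), key=lambda alt: tally[alt])

print("number of MDP states:", len(profiles))
print("example state:", profiles[0], "-> winner:", plurality(profiles[0]))
```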

2014
Krishnendu Chatterjee Martin Chmelik Ayush Kanodia

– Visualize the textual input POMDP using the DOT language.
– Reduce a POMDP with a parity objective to an equivalent POMDP with a coBüchi objective (a parity objective with only 2 priorities), and visualize the result.
– Given a POMDP with a coBüchi objective, construct its belief-observation POMDP Ĝ and visualize it.
– Given the belief-observation POMDP Ĝ, if there exists an almost-sur...
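
A minimal sketch of the first step in this list: emitting a small, hypothetical POMDP's transition structure in the DOT language from Python. The tool's actual input format is not reproduced here.

```python
# Hypothetical POMDP transition structure (state, action) -> [(next, prob)].
pomdp = {
    ("s0", "go"):   [("s1", 0.8), ("s2", 0.2)],
    ("s0", "stay"): [("s0", 1.0)],
    ("s1", "go"):   [("s2", 1.0)],
    ("s2", "go"):   [("s0", 0.5), ("s1", 0.5)],
}

# Emit one DOT edge per transition, labeled "action/probability".
lines = ["digraph pomdp {"]
for (state, action), successors in pomdp.items():
    for nxt, prob in successors:
        lines.append(f'  {state} -> {nxt} [label="{action}/{prob}"];')
lines.append("}")
print("\n".join(lines))   # pipe into `dot -Tpng` to render
```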

Chart: number of search results per year