Search results for: mdp

Number of results: 3240

Journal: CoRR 2016
Yichen Chen, Mengdi Wang

We study the online estimation of the optimal policy of a Markov decision process (MDP). We propose a class of Stochastic Primal-Dual (SPD) methods which exploit the inherent minimax duality of Bellman equations. The SPD methods update a few coordinates of the value and policy estimates as a new state transition is observed. These methods require only small storage and have low computational complexity p...
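The Bellman fixed point underlying such methods can be illustrated with plain value iteration on a toy MDP. This is a standard baseline, not the authors' SPD method, and every number below is made up for illustration:

```python
import numpy as np

# Toy MDP: 2 states, 2 actions (hypothetical numbers).
# P[a][s, s'] = transition probability, R[a][s] = expected reward.
P = [np.array([[0.9, 0.1], [0.2, 0.8]]),
     np.array([[0.5, 0.5], [0.7, 0.3]])]
R = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
gamma = 0.9  # discount factor

V = np.zeros(2)
for _ in range(1000):
    # Bellman optimality backup:
    # V(s) <- max_a [ R(s,a) + gamma * sum_s' P(s'|s,a) V(s') ]
    Q = np.array([R[a] + gamma * P[a] @ V for a in range(2)])
    V_new = Q.max(axis=0)
    if np.abs(V_new - V).max() < 1e-10:
        V = V_new
        break
    V = V_new

policy = Q.argmax(axis=0)  # greedy policy w.r.t. the converged values
print(V, policy)
```

The SPD idea in the abstract replaces this full sweep over all states and actions with cheap stochastic updates of a few coordinates per observed transition.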

Journal: Science 2007
John W Day, Donald F Boesch, Ellis J Clairain, G Paul Kemp, Shirley B Laska, William J Mitsch, Kenneth Orth, Hassan Mashriqui, Denise J Reed, Leonard Shabman, Charles A Simenstad, Bill J Streever, Robert R Twilley, Chester C Watson, John T Wells, Dennis F Whigham

Hurricanes Katrina and Rita showed the vulnerability of coastal communities and how human activities that caused deterioration of the Mississippi Deltaic Plain (MDP) exacerbated this vulnerability. The MDP was formed by dynamic interactions between river and coast at various temporal and spatial scales, and human activity has reduced these interactions at all scales. Restoration efforts aim to re-e...

1999
Ernst Nordström

In this paper we study the Call Admission Control (CAC) and routing issue for ATM networks which carry integrated CBR/VBR and ABR traffic. The integration of CBR/VBR and ABR traffic is assumed to be based on the max-min fairness criterion. The CAC and routing task is formulated as a Markov decision problem (MDP) where the objective is to maximize the revenue from carried calls. The MDP is solve...

2003
Hyeong Soo Chang

We consider discrete-time factorial Markov Decision Processes (MDPs) in a multiple-decision-maker environment under the infinite-horizon average-reward criterion, with a general joint reward structure but a factorial joint state-transition structure. We introduce the “localization” concept, whereby a global MDP is localized for each agent such that each agent needs to consider a local MDP defined only with...

Journal: CoRR 2017
Yichen Chen, Mengdi Wang

We study the computational complexity of the infinite-horizon discounted-reward Markov Decision Problem (MDP) with a finite state space S and a finite action space A. We show that any randomized algorithm needs a running time of at least Ω(|S||A|) to compute an ε-optimal policy with high probability. We consider two variants of the MDP where the input is given in specific data structures, including...

Journal: American journal of respiratory cell and molecular biology 2000
W Witt, I Kolleck, B Rüstow

Numerous communications have indicated that specific binding proteins for high density lipoprotein (HDL) exist in addition to the well characterized candidate HDL receptor SR-BI, but structural information was presented only in a few cases, and most of the work was aimed at the liver and steroidogenic glands. In this study, we purified two HDL-binding proteins by standard procedures from rat lu...

2006
I. J. Fidler, Z. Barnes, W. E. Fogler, R. Kirsh, P. Bugelski

Liposomes containing encapsulated lymphokines or muramyl dipeptide (MDP), when injected i.v. into C57BL/6 mice, produce significant destruction of established lung and lymph node metastases from a s.c. highly metastatic B16-BL6 melanoma. We present evidence that eradication of the metastases is mediated by the activation of host macrophages to the tumoricidal state. Results from three separa...

Journal: Applied Intelligence 2021

Spatial puzzles composed of rigid objects, flexible strings and holes offer interesting challenges for reasoning about spatial entities that are common in everyday human activities. This motivates their use as the domains of study in this work. The goal of this paper is to investigate the automated solution of this kind of problem by extending an algorithm that combines Answer Set Programming (ASP) with Markov Decision Process (...

2012
Hamid R. Chinaei, Brahim Chaib-draa, Luc Lamontagne

The SmartWheeler project aims at developing an intelligent wheelchair for handicapped people. In this paper, we model the dialogue manager of SmartWheeler in MDP and POMDP frameworks using its collected dialogues. First, we learn the model components of the dialogue MDP based on our previous works. Then, we extend the dialogue MDP to a dialogue POMDP, by proposing two observation models learned...
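Learning the components of a dialogue MDP from logged dialogues typically includes estimating a transition model from observed (state, action, next-state) triples. A minimal maximum-likelihood sketch, using hypothetical dialogue states and counts (not the SmartWheeler data or the authors' exact procedure):

```python
from collections import Counter, defaultdict

# Hypothetical logged dialogue transitions: (state, action, next_state).
transitions = [
    ("greet", "ask_dest", "got_dest"),
    ("greet", "ask_dest", "greet"),      # user utterance not understood
    ("greet", "ask_dest", "got_dest"),
    ("got_dest", "confirm", "done"),
]

counts = defaultdict(Counter)
for s, a, s2 in transitions:
    counts[(s, a)][s2] += 1

# Maximum-likelihood estimate: P(s' | s, a) = count(s, a, s') / count(s, a)
P = {sa: {s2: n / sum(c.values()) for s2, n in c.items()}
     for sa, c in counts.items()}
print(P[("greet", "ask_dest")])
```

Extending this to a POMDP, as the abstract describes, additionally requires an observation model relating noisy speech-recognition output to the hidden dialogue state.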

2011
Hélène Soubaras, Christophe Labreuche, Pierre Savéant

This paper proposes a new model, the EMDP (Evidential Markov Decision Process). It is an MDP (Markov Decision Process) for belief functions in which rewards are defined for each state transition, as in a classical MDP, whereas the transitions are modeled as in an EMC (Evidential Markov Chain), i.e., they are transitions between sets of states instead of transitions between individual states. The EMDP can fit to more applications t...
