reactive policies

Importance Sampling Estimates for Policies with Memory

2001

Christian R. Shelton

Importance sampling has recently become a popular method for computing off-policy Monte Carlo estimates of returns. It has been known that importance sampling ratios can be computed for POMDPs when the sampled and target policies are both reactive (memoryless). We extend that result to show how they can also be efficiently computed for policies with memory state (finite state controllers) witho...
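In the reactive case the abstract mentions, a trajectory's likelihood ratio reduces to a product of per-step action-probability ratios. The POMDP's transition and observation probabilities appear identically under the target and sampling policies, so they cancel.

The Python below is a minimal sketch under our own assumptions:

- Policies are callables `(observation, action) -> probability`.
- Trajectories are lists of `(observation, action, reward)` tuples.
- The return is the undiscounted sum of rewards.
- The function names are illustrative.

It does not cover the finite-state-controller extension the paper describes.

```python
from typing import Callable, Sequence, Tuple

# Illustrative types (assumptions): a reactive policy gives P(action | observation).
Policy = Callable[[str, str], float]
Step = Tuple[str, str, float]  # (observation, action, reward)


def is_ratio(traj: Sequence[Step], target: Policy, behavior: Policy) -> float:
    """Likelihood ratio of one trajectory under target vs. behavior policy.

    Environment terms cancel, so only the policies' action probabilities remain.
    behavior(obs, act) must be > 0 wherever target(obs, act) > 0.
    """
    ratio = 1.0
    for obs, act, _ in traj:
        ratio *= target(obs, act) / behavior(obs, act)
    return ratio


def is_estimate(trajs: Sequence[Sequence[Step]], target: Policy,
                behavior: Policy, weighted: bool = False) -> float:
    """Off-policy Monte Carlo estimate of the target policy's expected return."""
    ratios = [is_ratio(t, target, behavior) for t in trajs]
    returns = [sum(r for _, _, r in t) for t in trajs]
    total = sum(w * g for w, g in zip(ratios, returns))
    # Weighted importance sampling normalizes by the sum of the ratios.
    return total / sum(ratios) if weighted else total / len(trajs)


# Toy usage: data collected by a uniform policy, evaluated for a policy preferring "R".
uniform = lambda obs, act: 0.5
mostly_right = lambda obs, act: 0.9 if act == "R" else 0.1
data = [[("o1", "R", 1.0)], [("o1", "L", 0.0)]]
print(is_estimate(data, mostly_right, uniform))                 # 0.9
print(is_estimate(data, mostly_right, uniform, weighted=True))  # 0.9
```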

Removal of Reactive Dyes from Textile Wastewater Using Multi-Walled Carbon Nanotubes Modified with Fe3O4 Magnetic Nanoparticles

Thesis: Ministry of Science, Research and Technology - Semnan University - Faculty of Chemistry, 1393 (Solar Hijri, i.e. 2014/15)

Mohammad Bagher Rahimi, Naghi Saadatjoo, Firouzeh Nemati

In the present work, three anionic dyes were selected for removal: Reactive Yellow 145, Reactive Blue 19 and Reactive Red 195. Removal was carried out with an adsorbent made of carbon nanotubes modified with Fe3O4 magnetic nanoparticles. The prepared nanoparticles were characterized by TEM, XRD and VSM. The prepared magnetic adsorbent disperses well in water and is easily separated from the medium with a magnet. Variables that usually bear on the efficiency of the pro...

HQ-Learning

Journal: Adaptive Behaviour 1997

Marco Wiering Jürgen Schmidhuber

HQ-learning is a hierarchical extension of Q(λ)-learning designed to solve certain types of partially observable Markov decision problems (POMDPs). HQ automatically decomposes POMDPs into sequences of simpler subtasks that can be solved by memoryless policies learnable by reactive subagents. HQ can solve partially observable mazes with more states than those used in most previous POMDP work.
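The decomposition HQ relies on can be illustrated by how control passes between reactive subagents at run time.

This is a hypothetical sketch. Each `Subagent` holds a hand-written observation-to-action table and a subgoal observation. When the active subagent perceives its subgoal, control passes to the next subagent. HQ-learning learns the subagent policies and the decomposition itself; that learning is not shown here.

The toy corridor is our own assumption. Interior cells all emit the observation `corridor`, so no single memoryless policy can first go left and then right.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Subagent:
    policy: Dict[str, str]  # hypothetical reactive table: observation -> "L" or "R"
    subgoal: str            # observation that hands control to the next subagent


def observe(pos: int, length: int) -> str:
    # Toy aliased corridor (assumption): all interior cells look identical.
    if pos == 0:
        return "left_end"
    if pos == length - 1:
        return "right_end"
    return "corridor"


def run(subagents: Sequence[Subagent], start: int = 2,
        length: int = 5, max_steps: int = 50) -> List[int]:
    """Execute a fixed sequence of reactive subagents; return visited positions."""
    pos, idx, path = start, 0, [start]
    last = len(subagents) - 1
    for _ in range(max_steps):
        obs = observe(pos, length)
        # Pass control along while the active subagent sees its subgoal.
        while idx < last and obs == subagents[idx].subgoal:
            idx += 1
        if idx == last and obs == subagents[idx].subgoal:
            break  # final subgoal reached
        step = -1 if subagents[idx].policy[obs] == "L" else 1
        pos = min(length - 1, max(0, pos + step))
        path.append(pos)
    return path


two_stage = [
    Subagent({"corridor": "L"}, subgoal="left_end"),
    Subagent({"corridor": "R", "left_end": "R"}, subgoal="right_end"),
]
print(run(two_stage))                                           # [2, 1, 0, 1, 2, 3, 4]
print(run([Subagent({"corridor": "R"}, subgoal="right_end")]))  # [2, 3, 4]
```

The two-stage sequence visits the left end before the right one. A lone reactive agent must map `corridor` to a single action, so it can only head one way.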

Learning Dynamic Policies from Demonstration

2013

Byron Boots Dieter Fox

We address the problem of learning a policy directly from expert demonstrations. Typically, this problem is solved with a supervised learning method such as regression or classification to learn a reactive policy. Unfortunately, reactive policies lack the ability to model long-range dependencies and this omission can result in suboptimal performance. So, we take a different approach. We observe...
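The supervised baseline this abstract describes can be sketched as plain regression from the current observation to the demonstrated action.

This sketch assumes:

- Scalar observations and actions.
- A linear policy `a = w * o + b` fitted by ordinary least squares.
- An illustrative function name and toy data.

Because the fitted policy sees only the current observation, it is reactive in the sense the abstract criticizes: nothing in it carries information across time steps.

```python
from statistics import fmean
from typing import Callable, Sequence


def fit_linear_policy(observations: Sequence[float],
                      actions: Sequence[float]) -> Callable[[float], float]:
    """Least-squares fit of a reactive policy a = w * o + b (illustrative)."""
    mo, ma = fmean(observations), fmean(actions)
    cov = sum((o - mo) * (a - ma) for o, a in zip(observations, actions))
    var = sum((o - mo) ** 2 for o in observations)
    if var == 0:
        raise ValueError("observations must vary to fit a slope")
    w = cov / var
    b = ma - w * mo
    # The returned policy depends on the current observation only.
    return lambda o: w * o + b


demo_obs = [0.0, 1.0, 2.0, 3.0]
demo_act = [1.0, -1.0, -3.0, -5.0]  # toy expert following a = -2 * o + 1
policy = fit_linear_policy(demo_obs, demo_act)
print(policy(4.0))  # -7.0
```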

A Reinforcement-Learning Approach to Reactive Control Policy Design for Autonomous Robots

1994

Andrew H. Fagg David Lotspeich George A. Bekey

Within the field of robotics, much recent attention has been given to control techniques that have been termed reactive or behavior-based. The design of such control systems for even a remotely interesting task is typically a laborious effort, requiring many hours of experimental "tweaking" as the actual behavior of the system is observed by the system designer. In this paper, we present a neur...

Comparing Simulation Output Accuracy of Discrete Event and Agent Based Models: A Quantitative Approach

2009

Mazlina Abdul Majid Uwe Aickelin

In our research we investigate the output accuracy of discrete-event simulation models and agent-based simulation models when studying human-centric complex systems. In this paper we focus on human reactive behaviour as it is possible in both modelling approaches to implement human reactive behaviour in the model by using standard methods. As a case study we have chosen the retail sector, and h...

Comparing Simulation Output Accuracy of Discrete Event and Agent Based Models: A Quantitative Approach

Journal: CoRR 2010

Mazlina Abdul Majid Uwe Aickelin Peer-Olaf Siebers

In our research we investigate the output accuracy of discrete-event simulation models and agent-based simulation models when studying human-centric complex systems. In this paper we focus on human reactive behaviour as it is possible in both modelling approaches to implement human reactive behaviour in the model by using standard methods. As a case study we have chosen the retail sector, and h...

Governance and the Gulf of Mexico Coast: How Are Current Policies Contributing to Sustainability?

2013

Stephen Jordan William Benson

The quality of life and economies of coastal communities depend, to a great degree, on the ecological integrity of coastal ecosystems. Paradoxically, as more people are drawn to the coasts, these ecosystems and the services they provide are increasingly stressed by development and human use. Employing the coastal Gulf of Mexico as an example, we explore through three case studies how government...

Internal State GPOMDP with Trace Filtering

2007

Douglas Aberdeen Jonathan Baxter Peter L. Bartlett

GPOMDP is an algorithm for estimating the gradient of the average reward for arbitrary Partially Observable Markov Decision Processes (POMDPs) controlled by parameterized stochastic policies. It applies to purely reactive (memoryless) policies, or policies that generate actions as a function of finite histories of observations. Based on the fact that maintenance of a belief state is sufficient ...
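For a reactive policy, GPOMDP maintains two quantities:

- An eligibility trace, z_{t+1} = β z_t + ∇ log μ(u_t | θ, y_t).
- A running gradient estimate, Δ_{t+1} = Δ_t + (r_{t+1} z_{t+1} - Δ_t) / (t + 1).

Here β in [0, 1) trades bias for variance.

This sketch assumes a tabular softmax policy with one preference vector per observation. It also assumes a hypothetical environment callback `step(obs, action, rng) -> (next_obs, reward)`. It covers only the reactive case, not the internal-state variant named in this entry's title.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

Theta = Dict[str, List[float]]  # observation -> action preferences (assumption)
StepFn = Callable[[str, int, random.Random], Tuple[str, float]]


def softmax(prefs: List[float]) -> List[float]:
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def gpomdp(theta: Theta, step: StepFn, obs0: str, n_steps: int,
           beta: float = 0.9, seed: int = 0) -> Theta:
    """GPOMDP estimate of the average-reward gradient for a reactive softmax policy."""
    rng = random.Random(seed)
    z = {o: [0.0] * len(p) for o, p in theta.items()}      # eligibility trace
    delta = {o: [0.0] * len(p) for o, p in theta.items()}  # gradient estimate
    obs = obs0
    for t in range(n_steps):
        probs = softmax(theta[obs])
        action = rng.choices(range(len(probs)), weights=probs)[0]
        next_obs, reward = step(obs, action, rng)
        # z <- beta * z + grad log mu; the softmax score is 1{k == action} - p_k
        for trace in z.values():
            for k in range(len(trace)):
                trace[k] *= beta
        for k, p in enumerate(probs):
            z[obs][k] += (1.0 if k == action else 0.0) - p
        # Delta <- Delta + (reward * z - Delta) / (t + 1)
        for o, est in delta.items():
            for k in range(len(est)):
                est[k] += (reward * z[o][k] - est[k]) / (t + 1)
        obs = next_obs
    return delta


def aliased_bandit(obs: str, action: int, rng: random.Random) -> Tuple[str, float]:
    # Toy problem (assumption): one observation; action 1 pays 1, action 0 pays 0.
    return "o", float(action == 1)


grad = gpomdp({"o": [0.0, 0.0]}, aliased_bandit, "o", n_steps=20000, beta=0.5)
print(grad["o"])  # close to [-0.25, 0.25], the exact gradient at theta = 0
```

At θ = 0 both actions have probability 0.5. The average reward is p_1, whose gradient with respect to the two preferences is (-p_0 p_1, p_0 p_1) = (-0.25, 0.25).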

Learning Policies for Embodied Virtual Agents through Demonstration

2007

Jonathan Dinerstein Parris K. Egbert Dan Ventura

Although many powerful AI and machine learning techniques exist, it remains difficult to quickly create AI for embodied virtual agents that produces visually lifelike behavior. This is important for applications (e.g., games, simulators, interactive displays) where an agent must behave in a manner that appears human-like. We present a novel technique for learning reactive policies that mimic de...