q policy

Letter to the Editor: Applications Air Q Model on Estimate Health Effects Exposure to Air Pollutants

Journal: Archives of Hygiene Sciences 2016

Elaheh Jame Porazmey, Gholamreza Goudarzi, Mohammad Javad Mohammadi, Sahar Geravandi,

Epidemiologic studies worldwide have measured increases in mortality and morbidity associated with air pollution (1-3). Quantifying the effects of air pollution on human health in urban areas is an increasingly critical component of policy discussion (4-6). The Air Q model has proved to be a valid and reliable tool to predict health effects related to criteria pollutants (particula...

When the Best Move Isn't Optimal: Q-learning with Exploration

1994

George H. John

The most popular delayed reinforcement learning technique, Q-learning (Watkins 1989), estimates the future reward expected from executing each action in every state. If these estimates are correct, then an agent can use them to select the action with maximal expected future reward in each state, and thus perform optimally. Watkins has proved that Q-learning produces an optimal policy (the funct...

A Deep Policy Inference Q-Network for Multi-Agent Systems

Journal: CoRR 2017

Zhang-Wei Hong Shih-Yang Su Tzu-Yun Shann Yi-Hsiang Chang Chun-Yi Lee

We present DPIQN, a deep policy inference Q-network that targets multi-agent systems composed of controllable agents, collaborators, and opponents that interact with each other. We focus on one challenging issue in such systems, modeling agents with varying strategies, and propose to employ “policy features” learned from raw observations (e.g., raw images) of collaborators and opponents by inferr...

Continuous-Review Tracking Policies for Dynamic Control of Stochastic Networks

Journal: Queueing Syst. 2003

Constantinos Maglaras

This paper is concerned with dynamic control of stochastic processing networks. Specifically, it follows the so-called “heavy traffic approach,” where a Brownian approximating model is formulated, an associated Brownian optimal control problem is solved, the solution of which is then used to define an implementable policy for the original system. A major challenge is the step of policy translat...

Localizing Policy Gradient Estimates to Action

2007

Gregory Z. Grudic Lyle H. Ungar

Function Approximation (FA) representations of the state-action value function Q have been proposed in order to reduce variance in performance gradient estimates, and thereby improve performance of Policy Gradient (PG) reinforcement learning in large continuous domains (e.g., the PIFA algorithm of Sutton et al. (in press)). We show empirically that although PIFA converges significantly faster t...

Smoothed Action Value Functions for Learning Gaussian Policies

2018

Ofir Nachum Mohammad Norouzi George Tucker Dale Schuurmans

State-action value functions (i.e., Q-values) are ubiquitous in reinforcement learning (RL), giving rise to popular algorithms such as SARSA and Q-learning. We propose a new notion of action value defined by a Gaussian-smoothed version of the expected Q-value. We show that such smoothed Q-values still satisfy a Bellman equation, making them learnable from experience sampled from an environment. ...

Localizing Policy Gradient Estimates to Action Transitions

2000

Gregory Z. Grudic Lyle H. Ungar

Function Approximation (FA) representations of the state-action value function Q have been proposed in order to reduce variance in performance gradient estimates, and thereby improve performance of Policy Gradient (PG) reinforcement learning in large continuous domains (e.g., the PIFA algorithm of Sutton et al. (in press)). We show empirically that although PIFA converges significantly faster ...

The Development of the Human Aspects of Information Security Questionnaire (HAIS-Q)

2013

Kathryn Parsons Agata McCormac Marcus Butavicius

The Human Aspects of Information Security Questionnaire (HAIS-Q) is being developed using a hybrid inductive, exploratory approach, for the purpose of evaluating information security threats caused by employees within organisations. This study reports on the conceptual development and pre-testing of the HAIS-Q. Results from 500 Australian employees were then used to examine the reliability of t...

Off-Policy Temporal Difference Learning with Function Approximation

2001

Doina Precup Richard S. Sutton Sanjoy Dasgupta

We introduce the first algorithm for off-policy temporal-difference learning that is stable with linear function approximation. Off-policy learning is of interest because it forms the basis for popular reinforcement learning methods such as Q-learning, which has been known to diverge with linear function approximation, and because it is critical to the practical utility of multi-scale, multi-goa...

Approximate Policy Iteration for Markov Control Revisited

2012

Abhijit Gosavi

Q-Learning is based on value iteration and remains the most popular choice for solving Markov Decision Problems (MDPs) via reinforcement learning (RL), where the goal is to bypass the transition probabilities of the MDP. Approximate policy iteration (API) is another RL technique, not as widely used as Q-Learning, based on modified policy iteration. In this paper, we present and analyze an API a...
