Search results for: q policy
Number of results: 381585
Letter to the Editor: Applications Air Q Model on Estimate Health Effects Exposure to Air Pollutants
Epidemiologic studies worldwide have measured increases in mortality and morbidity associated with air pollution (1-3). Quantifying the effects of air pollution on human health in urban areas has become an increasingly critical component of policy discussion (4-6). The Air Q model has proved to be a valid and reliable tool to predict health effects related to criteria pollutants (particula...
The most popular delayed reinforcement learning technique, Q-learning (Watkins 1989), estimates the future reward expected from executing each action in every state. If these estimates are correct, then an agent can use them to select the action with maximal expected future reward in each state, and thus perform optimally. Watkins has proved that Q-learning produces an optimal policy (the funct...
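The idea in the abstract above (estimate the expected future reward of each action in each state, then act greedily on those estimates) can be illustrated with a minimal tabular sketch. Everything specific here is an assumption made for illustration and does not come from the cited paper: the five-state chain environment, the `step` helper, the epsilon-greedy exploration, and all hyperparameter values.

```python
import random


def step(state, action, n_states):
    """Hypothetical chain environment: action 1 moves right, action 0 moves left.

    Reaching the last state yields reward 1 and ends the episode.
    """
    nxt = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if nxt == n_states - 1 else 0.0
    return nxt, reward


def q_learning(n_states=5, episodes=2000, alpha=0.1, gamma=0.9,
               epsilon=0.2, max_steps=1000, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration (illustrative values)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    terminal = n_states - 1
    for _episode in range(episodes):
        s = 0
        for _t in range(max_steps):
            if s == terminal:
                break
            if rng.random() < epsilon:
                a = rng.randrange(2)  # explore
            else:
                best = max(q[s])
                # Break ties at random so the initial all-zero table still explores.
                a = rng.choice([i for i, v in enumerate(q[s]) if v == best])
            s2, r = step(s, a, n_states)
            # The terminal row is never updated, so it contributes 0 to the target.
            target = r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
    return q


def greedy_policy(q):
    """Pick the action with the largest estimated value in each state."""
    return [max(range(len(row)), key=lambda a: row[a]) for row in q]


if __name__ == "__main__":
    table = q_learning()
    print(greedy_policy(table)[:-1])  # greedy actions for the non-terminal states
```

In this toy chain the only reward sits at the right end, so a correctly learned table should make "move right" the greedy action in every non-terminal state.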
We present DPIQN, a deep policy inference Q-network that targets multi-agent systems composed of controllable agents, collaborators, and opponents that interact with each other. We focus on one challenging issue in such systems, modeling agents with varying strategies, and propose to employ “policy features” learned from raw observations (e.g., raw images) of collaborators and opponents by inferr...
This paper is concerned with dynamic control of stochastic processing networks. Specifically, it follows the so-called “heavy traffic approach,” where a Brownian approximating model is formulated, an associated Brownian optimal control problem is solved, the solution of which is then used to define an implementable policy for the original system. A major challenge is the step of policy translat...
Function Approximation (FA) representations of the state-action value function Q have been proposed in order to reduce variance in performance gradient estimates, and thereby improve performance of Policy Gradient (PG) reinforcement learning in large continuous domains (e.g., the PIFA algorithm of Sutton et al. (in press)). We show empirically that although PIFA converges significantly faster t...
State-action value functions (i.e., Q-values) are ubiquitous in reinforcement learning (RL), giving rise to popular algorithms such as SARSA and Q-learning. We propose a new notion of action value defined by a Gaussian-smoothed version of the expected Q-value. We show that such smoothed Q-values still satisfy a Bellman equation, making them learnable from experience sampled from an environment. ...
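As a rough illustration of what a Gaussian-smoothed action value might look like, the sketch below estimates the average of Q(s, a) over actions drawn from a Gaussian centred on a mean action. The abstract is truncated before it gives the exact definition, so this reading is an assumption, as are the `smoothed_q` and `toy_q` names, the one-dimensional action, and the quadratic toy Q-function.

```python
import random


def smoothed_q(q_fn, state, mean_action, sigma, n_samples=100_000, seed=0):
    """Monte Carlo estimate of E[Q(state, a)] for a ~ N(mean_action, sigma^2).

    Assumed reading of "Gaussian-smoothed Q-value"; not the paper's definition.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += q_fn(state, rng.gauss(mean_action, sigma))
    return total / n_samples


def toy_q(state, action):
    """Hypothetical Q-function, quadratic in the action and independent of state."""
    return -(action - 1.0) ** 2


if __name__ == "__main__":
    # For this quadratic, the smoothed value is -(mean - 1)^2 - sigma^2 in closed form.
    print(smoothed_q(toy_q, None, mean_action=0.0, sigma=0.5))
```

For the quadratic toy function the expectation has a closed form, which gives a simple check on the sampler: smoothing lowers the value at every mean action by sigma squared.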
The Human Aspects of Information Security Questionnaire (HAIS-Q) is being developed using a hybrid inductive, exploratory approach, for the purpose of evaluating information security threats caused by employees within organisations. This study reports on the conceptual development and pre-testing of the HAIS-Q. Results from 500 Australian employees were then used to examine the reliability of t...
We introduce the first algorithm for off-policy temporal-difference learning that is stable with linear function approximation. Off-policy learning is of interest because it forms the basis for popular reinforcement learning methods such as Q-learning, which has been known to diverge with linear function approximation, and because it is critical to the practical utility of multi-scale, multi-goa...
Q-Learning is based on value iteration and remains the most popular choice for solving Markov Decision Problems (MDPs) via reinforcement learning (RL), where the goal is to bypass the transition probabilities of the MDP. Approximate policy iteration (API) is another RL technique, not as widely used as Q-Learning, based on modified policy iteration. In this paper, we present and analyze an API a...