Search results for: action value function

Number of results: 2342819

Thesis: Ministry of Science, Research and Technology - University of Sistan and Baluchestan, 1390

Change in today's organizational world is so extensive that instability might be called the most stable characteristic of organizations. If change management was once regarded as an added value for an organization, today it has become the foundation of organizational survival. The definition of the entrepreneur as one who identifies opportunities to exploit a...

2004
Craig Boutilier

Markov decision processes (MDPs) have become the de facto standard model for decision-theoretic planning problems. However, classic dynamic programming algorithms for MDPs [22] require explicit state and action enumeration. For example, the classical representation of a value function is a table or vector associating a value with each system state; such value functions are produced by iterating...
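The tabular value function this abstract describes — one value per enumerated state, updated by iterating Bellman backups — can be sketched as classic value iteration. The 3-state, 2-action MDP below is an illustrative assumption, not taken from the cited work:

```python
import numpy as np

# Hypothetical MDP: P[s, a, s'] are transition probabilities, R[s, a] rewards.
P = np.array([
    [[0.8, 0.2, 0.0], [0.1, 0.9, 0.0]],
    [[0.0, 0.5, 0.5], [0.3, 0.0, 0.7]],
    [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],   # state 2 is absorbing
])
R = np.array([[1.0, 0.0], [0.0, 2.0], [0.0, 0.0]])
gamma = 0.9

# Value iteration: V is exactly the "table associating a value with each
# system state" that the abstract mentions.
V = np.zeros(3)
for _ in range(500):
    Q = R + gamma * (P @ V)      # state-action values, shape (3, 2)
    V_new = Q.max(axis=1)        # greedy Bellman backup
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new
```

The explicit state and action enumeration is visible in the shapes of `P` and `R`; this is precisely what becomes infeasible for large state spaces.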

Journal: CoRR 2017
Seyed Sajad Mousavi Michael Schukat Peter Corcoran Enda Howley

Recent advances in combining deep neural network architectures with reinforcement learning techniques have shown promising results in solving complex control problems with high-dimensional state and action spaces. Inspired by these successes, in this paper we build two kinds of reinforcement learning algorithms: deep policy-gradient and value-function based agents which can predict t...
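The value-function-based agent family the abstract mentions can be illustrated in its simplest, tabular form (rather than the deep version): Q-learning with epsilon-greedy exploration. The function names and parameters below are illustrative assumptions:

```python
import random
from collections import defaultdict

def epsilon_greedy(Q, s, actions, eps=0.1):
    """Pick a random action with probability eps, else the greedy one."""
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    """One tabular Q-learning backup toward r + gamma * max_a' Q(s', a')."""
    best_next = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
```

A deep variant replaces the `Q` dictionary with a neural network that generalizes across high-dimensional states, which is the step this line of work takes.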

Journal: Japanese Sociological Review 1974

2008
KEVIN ROSS

A singular stochastic control problem with state constraints in two dimensions is studied. We show that the value function is C^1 and its directional derivatives are the value functions of certain optimal stopping problems. Guided by the optimal stopping problem, we then introduce the associated no-action region and the free boundary and show that, under appropriate conditions, an optimally contr...
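For orientation, an optimal stopping value function has the generic discounted form below; this is an illustrative template only, and the paper's precise dynamics, constraints, and gain function may differ:

```latex
u(x) = \sup_{\tau} \, \mathbb{E}_x\!\left[ e^{-\alpha \tau} \, g(X_\tau) \right],
```

where the supremum runs over stopping times $\tau$, $X$ is the controlled diffusion, $\alpha > 0$ a discount rate, and $g$ a gain function. The result described above identifies directional derivatives of the control value function with functions of this type.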

2014
Bilal Piot Matthieu Geist Olivier Pietquin

Large Markov Decision Processes are usually solved using Approximate Dynamic Programming methods such as Approximate Value Iteration or Approximate Policy Iteration. The main contribution of this paper is to show that, alternatively, the optimal state-action value function can be estimated using Difference of Convex functions (DC) Programming. To do so, we study the minimization of a norm of th...
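The optimal state-action value function referred to here is the fixed point of the Bellman optimality equation:

```latex
Q^*(s, a) \;=\; r(s, a) \;+\; \gamma \sum_{s'} P(s' \mid s, a) \, \max_{a'} Q^*(s', a'),
```

where $r$ is the reward function, $P$ the transition kernel, and $\gamma \in [0, 1)$ the discount factor. The paper's contribution is to estimate $Q^*$ by minimizing a norm of the associated Bellman residual via DC programming rather than by approximate value or policy iteration.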

2008
Jun Ma Warren B. Powell

Most of the current theory for dynamic programming algorithms focuses on finite state, finite action Markov decision problems, with a paucity of theory for the convergence of approximation algorithms with continuous states. In this paper we propose a policy iteration algorithm for infinite-horizon Markov decision problems where the state and action spaces are continuous and the expectation cann...

[Chart: number of search results per publication year]