Search results for: action value function

Number of results: 2342819

Journal: Desert 2013
M. Karimpour Reihan S. Feiznia A. Salehpour Jam M.K. Kianian

Investigating desertification trends requires understanding the phenomena that create change, whether acting singly or through action and reaction together, such that these changes end in land degradation. To investigate the pedological criterion of land degradation in Quaternary rock units, a part of the Rude-Shoor watershed area was first selected. After distinguishing the target area, maps of slope class...

1999
Stuart I. Reynolds

Reinforcement learning agents attempt to learn and construct a decision policy which maximises some reward signal. In turn, this policy is directly derived from long-term value estimates of state-action pairs. In environments with real-valued state-spaces, however, it is impossible to enumerate the value of every state-action pair, necessitating the use of a function approximator in order to in...
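The need for a function approximator over real-valued states can be illustrated with a minimal sketch. It is not taken from the listed paper: it assumes a one-dimensional state, two discrete actions, radial-basis features, and a semi-gradient Q-learning update, and every name and parameter in it (`features`, `q_value`, `q_learning_update`, `centers`, `width`) is hypothetical.

```python
import math

def features(state, action, n_actions=2, centers=(0.0, 0.5, 1.0), width=0.25):
    """Radial-basis features of a real-valued state, one block per action (assumed design)."""
    rbf = [math.exp(-((state - c) ** 2) / (2 * width ** 2)) for c in centers]
    phi = [0.0] * (len(centers) * n_actions)
    start = action * len(centers)
    phi[start:start + len(centers)] = rbf
    return phi

def q_value(weights, state, action):
    """Linear estimate of the state-action value: dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, features(state, action)))

def q_learning_update(weights, s, a, r, s_next, alpha=0.1, gamma=0.9, n_actions=2):
    """One semi-gradient Q-learning step; returns a new weight list."""
    target = r + gamma * max(q_value(weights, s_next, b) for b in range(n_actions))
    td_error = target - q_value(weights, s, a)
    phi = features(s, a)
    return [w + alpha * td_error * x for w, x in zip(weights, phi)]

# Usage: start from zero weights and apply a single rewarded transition.
weights = [0.0] * 6
weights = q_learning_update(weights, s=0.0, a=1, r=1.0, s_next=0.5)
```

Because the weights are shared across nearby states, one update changes the value estimate for a whole neighbourhood of states instead of a single table entry.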

2012
A. Al-Tamimi

Convergence is proven of the value-iteration-based algorithm to find the optimal controller in the case of general non-affine in input nonlinear systems. That is, it is shown that algorithm converges to the optimal control and the optimal value function. It is assumed that at each iteration the value and action update equations can be exactly solved. Then two standard neural networks (NN) are u...
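The alternation between a value update and an action update can be sketched on a finite problem. This is a deliberate simplification and not the paper's method, which addresses continuous non-affine nonlinear systems with neural networks: the sketch below assumes a small deterministic MDP stored in tables, and the names `value_iteration`, `transitions`, and `rewards` are hypothetical.

```python
def value_iteration(transitions, rewards, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Tabular value iteration for a deterministic MDP.

    transitions[s][a] is the next state and rewards[s][a] the immediate reward.
    """
    n_states = len(transitions)
    values = [0.0] * n_states
    policy = [0] * n_states
    for _ in range(max_iters):
        new_values = []
        policy = []
        for s in range(n_states):
            # Action update: evaluate every action against the current value estimate.
            q = [rewards[s][a] + gamma * values[transitions[s][a]]
                 for a in range(len(transitions[s]))]
            best = max(range(len(q)), key=q.__getitem__)
            policy.append(best)
            # Value update: keep the value of the greedy action.
            new_values.append(q[best])
        delta = max(abs(n - o) for n, o in zip(new_values, values))
        values = new_values
        if delta < tol:
            break
    return values, policy

# Usage: two states; staying in state 1 pays 2 per step, moving from 0 to 1 pays 1.
transitions = [[0, 1], [1, 0]]
rewards = [[0.0, 1.0], [2.0, 0.0]]
values, policy = value_iteration(transitions, rewards)
```

In this toy problem the exact updates are trivial to compute, which mirrors the abstract's assumption that the value and action update equations can be solved exactly at each iteration.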

2006
Sébastien Jodogne Justus H. Piater

We target the problem of closed-loop learning of control policies that map visual percepts to continuous actions. Our algorithm, called Reinforcement Learning of Joint Classes (RLJC), adaptively discretizes the joint space of visual percepts and continuous actions. In a sequence of attempts to remove perceptual aliasing, it incrementally builds a decision tree that applies tests either in the i...

2010
Jan Hendrik Metzen Frank Kirchner

Scaling Reinforcement Learning (RL) to real-world problems with continuous state and action spaces remains a challenge. This is partly due to the reason that the optimal value function can become quite complex in continuous domains. In this paper, we propose to avoid learning the optimal value function at all but to use direct policy search methods in combination with model-based RL instead.

Journal: Journal of Machine Learning Research 2006
Jelle R. Kok Nikos A. Vlassis

In this article we describe a set of scalable techniques for learning the behavior of a group of agents in a collaborative multiagent setting. As a basis we use the framework of coordination graphs of Guestrin, Koller, and Parr (2002a) which exploits the dependencies between agents to decompose the global payoff function into a sum of local terms. First, we deal with the single-state case and d...

Journal: Neuron 2012
Moonsang Seo Eunjeong Lee Bruno B. Averbeck

The role that frontal-striatal circuits play in normal behavior remains unclear. Two of the leading hypotheses suggest that these circuits are important for action selection or reinforcement learning. To examine these hypotheses, we carried out an experiment in which monkeys had to select actions in two different task conditions. In the first (random) condition, actions were selected on the bas...

Journal: International Journal of Mathematical Modelling and Computations
J. Rashidinia (Department of Mathematics, Islamic Azad University, Central Tehran Branch, Iran) N. Taher (Islamic Republic of Iran)

In this work, we study the performance of the sinc-collocation method for solving Bratu's problem. For different choices of step size, we consider the maximum absolute errors in the solutions at sinc grid points, which are tabulated. The comparison of the obtained results verified that this method converges to the exact solution rapidly and with

Journal: Computational Methods for Differential Equations
Kamal Shah (University of Malakand) Salman Zeb (Department of Mathematics, University of Malakand) Rahmat Ali Khan (Dean of Science, University of Malakand)

This article is devoted to the study of existence and multiplicity of positive solutions to a class of nonlinear fractional order multi-point boundary value problems of the type $-D^{q}_{0+} u(t) = f(t, u(t))$, $1 < q \le 2$, $0 < t < 1$, $u(0) = 0$, $u(1) = \sum_{i=1}^{m-2} \delta_i u(\eta_i)$, where $D^{q}_{0+}$ represents the standard Riemann-Liouville fractional derivative, $\delta_i, \eta_i \in (0, 1)$ with $\sum_{i=1}^{m-2} \delta_i \eta_i^{q-1} < 1$, and f : [0, 1] × [0, ∞) → [0, ∞...

Thesis: Ministry of Science, Research and Technology - Payame Noor University - Payame Noor University of Tehran Province - Faculty of Law 1389

Abstract: The third millennium has started, but the world is facing serious challenges in achieving international security and peace. Various human rights violations have led states to find means to protect human rights. Also, Article 55 of the United Nations Charter introduces respect for human rights and fundamental freedoms as the most suitable way to realize peace and security. ...
