Search results for: q value

Number of results: 842664

1994
George H. John

The most popular delayed reinforcement learning technique, Q-learning (Watkins 1989), estimates the future reward expected from executing each action in every state. If these estimates are correct, then an agent can use them to select the action with maximal expected future reward in each state, and thus perform optimally. Watkins has proved that Q-learning produces an optimal policy (the funct...
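As a rough illustration of the idea in this abstract, the following is a minimal sketch of standard tabular Q-learning, not code from the cited paper. The four-state chain environment, the function names `q_learning` and `chain_step`, and the values chosen for `alpha`, `gamma`, and `epsilon` are all assumptions made for the example.

```python
import random

def q_learning(n_states, n_actions, step, episodes=500, alpha=0.1,
               gamma=0.9, epsilon=0.1, max_steps=100, seed=0):
    """Tabular Q-learning on a hypothetical episodic task starting in state 0."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            if rng.random() < epsilon:
                # Explore: pick a uniformly random action.
                a = rng.randrange(n_actions)
            else:
                # Exploit: pick a greedy action, breaking ties at random.
                best = max(q[s])
                a = rng.choice([i for i, v in enumerate(q[s]) if v == best])
            s2, r, done = step(s, a)
            # Bootstrapped target: reward plus discounted best next-state value.
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
            if done:
                break
    return q

def chain_step(s, a):
    """Assumed toy environment: states 0..3, action 1 moves right, action 0 left.
    Entering state 3 yields reward 1 and ends the episode."""
    s2 = min(s + 1, 3) if a == 1 else max(s - 1, 0)
    done = s2 == 3
    return s2, (1.0 if done else 0.0), done

q = q_learning(4, 2, chain_step)
# Greedy policy read off the learned table for the non-terminal states.
greedy = [max(range(2), key=lambda a: q[s][a]) for s in range(3)]
```

On this toy chain the greedy policy read from the table should choose action 1 (move right) in every non-terminal state, which is the "select the action with maximal expected future reward" step the abstract describes.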

Journal: Automatica 2008
Shalabh Bhatnagar K. Mohan Babu

We propose two algorithms for Q-learning that use the two timescale stochastic approximation methodology. The first of these updates Q-values of all feasible state-action pairs at each instant while the second updates Q-values of states with actions chosen according to the ‘current’ randomized policy updates. A proof of convergence of the algorithms is shown. Finally, numerical experiments usin...

2007
Liling Fu Suiqiong Li Kewei Zhang I-Hsuan Chen Valery A. Petrenko Zhongyang Cheng

The magnetostrictive microcantilever (MSMC) as a high-performance transducer was introduced for the development of biosensors. The principle and characterization of MSMC are presented. The MSMC is wireless and can be easily actuated and sensed using magnetic field/signal. More importantly, the MSMC exhibits a high Q value and works well in liquid. The resonance behavior of MSMC is characterized...

2002
Abhijit Gosavi

We present a variant of the relative value iteration algorithm for solving average reward Markov decision problems. We also present its simulation-based counterpart, which is also called Q-Learning.

2015
Ying Qu Mengru Li Lingling Guo

Being an important aspect of sustainable development, sustainable consumption has attracted great attention among Chinese politicians and academia, and Chinese governments have established policies that encourage sustainable consumption behaviors. However, unsustainable consumption behavior still remains predominant in China. This paper aims to classify consumers with similar traits, in terms o...

1994
Minoru Asada Eiji Uchibe Shoichi Noda Sukoya Tawaratsumida Koh Hosoda

A method is proposed which accomplishes a whole task consisting of plural subtasks by coordinating multiple behaviors acquired by a vision-based reinforcement learning. First, individual behaviors which achieve the corresponding subtasks are independently acquired by Q-learning, a widely used reinforcement learning method. Each learned behavior can be represented by an action-value function in ...

Journal: Informatics in Education 2007
Dalia Baziukaite

We investigate the possibility of applying Q-learning, a known machine learning algorithm, in the domain of a Virtual Learning Environment (VLE). It is important in this problem domain to have algorithms that learn their optimal values in a rather short time, expressed in terms of the number of iterations. The problem domain is a VLE in which an agent plays the role of the teacher. With time it moves to...

Journal: CoRR 2017
John Schulman Pieter Abbeel Xi Chen

Two of the leading approaches for model-free reinforcement learning are policy gradient methods and Q-learning methods. Q-learning methods can be effective and sample-efficient when they work; however, it is not well understood why they work, since empirically the Q-values they estimate are very inaccurate. A partial explanation may be that Q-learning methods are secretly implementing policy g...

1997
Michael Sullivan

and |g′′(x)| = M. We will show that |f(x)| ≤ g(x) on [0, 1]. The desired conclusion follows. Suppose however that this is false, that there is a number q ∈ [0, 1] for which f(q) > g(q). (The other case, f(q) < −g(q), is similar.) Our strategy will be to show that there are real numbers s and t with s < t such that f ′(s) > g′(s) and f ′(t) < g′(t). See figure. We will then apply the Mean Value ...

2018
Tomasz Kociumaka Jakub Radoszewski Wojciech Rytter Tomasz Walen

We investigate the function L(h, p, q), called here the threshold function, related to periodicity of partial words (words with holes). The value L(h, p, q) is defined as the minimum length threshold which guarantees that a natural extension of the periodicity lemma is valid for partial words with h holes and (strong) periods p, q. We show how to evaluate the threshold function in O(log p + log...
