q algorithm

Convergence of a Q-learning Variant for Continuous States and Actions

Journal: :J. Artif. Intell. Res. 2014

S. W. Carden

This paper presents a reinforcement learning algorithm for solving infinite horizon Markov Decision Processes under the expected total discounted reward criterion when both the state and action spaces are continuous. This algorithm is based on Watkins’ Q-learning, but uses Nadaraya-Watson kernel smoothing to generalize knowledge to unvisited states. As expected, continuity conditions must be im...
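As an illustration of the smoothing step named in this abstract, the following is a minimal sketch of a one-dimensional Nadaraya-Watson estimator. It is not the paper's algorithm: the Gaussian kernel, the `bandwidth` value, and the sample data are assumptions chosen for the example, and the function names are hypothetical.

```python
import math


def gaussian_kernel(u):
    """Unnormalized Gaussian kernel; the constant factor cancels in the ratio below."""
    return math.exp(-0.5 * u * u)


def nadaraya_watson(x, xs, ys, bandwidth=0.5):
    """Return the kernel-weighted average of ys at the query point x (1-D inputs)."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    weights = [gaussian_kernel((x - xi) / bandwidth) for xi in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase the bandwidth")
    return sum(w * y for w, y in zip(weights, ys)) / total


# Hypothetical Q-value samples at three visited states, for one fixed action.
visited_states = [0.0, 1.0, 2.0]
q_samples = [1.0, 3.0, 2.0]

# Estimate the value at an unvisited state from the nearby samples.
estimate = nadaraya_watson(1.5, visited_states, q_samples)
```

The estimate is always a convex combination of the observed values, so it stays within their range; a smaller bandwidth weights the nearest visited states more heavily.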

Empirical Q-Value Iteration

Journal: :CoRR 2014

Dileep M. Kalathil Vivek S. Borkar Rahul Jain

We propose a new simple and natural algorithm for learning the optimal Q-value function of a discounted-cost Markov Decision Process (MDP) when the transition kernels are unknown. Unlike the classical learning algorithms for MDPs, such as Q-learning and ‘actor-critic’ algorithms, this algorithm doesn’t depend on a stochastic approximation-based method. We show that our algorithm, which we call ...

Isomorphisms of Algebraic Number Fields

2012

Mark van Hoeij Vivek Pal

Let Q(α) and Q(β) be algebraic number fields. We describe a new method to find (if they exist) all isomorphisms, Q(β) → Q(α). The algorithm is particularly efficient if there is only one isomorphism.

Soft Decision Decoding Algorithm of Reed-Solomon Codes

2002

Emmanuelle Delpeyroux Jérôme Lacan

This paper proposes a new soft decoding algorithm for the q-ary image of some q-ary Reed-Solomon codes. The algorithm uses permutations to improve the performance of a usual soft-decision decoder.

Using number fields to compute logarithms in finite fields

Journal: :Math. Comput. 2000

Oliver Schirokauer

We describe an adaptation of the number field sieve to the problem of computing logarithms in a finite field. We conjecture that the running time of the algorithm, when restricted to finite fields of an arbitrary but fixed degree, is $L_q[1/3; (64/9)^{1/3} + o(1)]$, where $q$ is the cardinality of the field, $L_q[s; c] = \exp(c(\log q)^{s}(\log\log q)^{1-s})$, and the $o(1)$ is for $q \to \infty$. The number field sieve fact...
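To get a feel for the conjectured running time, the L-notation in this abstract can be evaluated numerically. The sketch below drops the o(1) term, which is an assumption that makes the result only a rough heuristic; the function name `l_notation` and the sample field size are hypothetical.

```python
import math


def l_notation(q, s, c):
    """Evaluate exp(c * (ln q)^s * (ln ln q)^(1 - s)), ignoring any o(1) term."""
    if q <= math.e:
        raise ValueError("q must exceed e so that log log q is positive")
    log_q = math.log(q)
    return math.exp(c * log_q ** s * math.log(log_q) ** (1 - s))


# Constant from the conjectured bound, (64/9)^(1/3), roughly 1.923.
NFS_CONSTANT = (64 / 9) ** (1 / 3)

# Heuristic operation count for a field of about 2^1024 elements (assumed size).
heuristic_cost = l_notation(2 ** 1024, 1 / 3, NFS_CONSTANT)
```

The two endpoints of the parameter s give a sanity check: s = 1 yields q^c, which is exponential in log q, while s = 0 yields (log q)^c, which is polynomial in log q. The value s = 1/3 lies between them, which is why the bound is called subexponential.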

Isomorphisms of Algebraic Number Fields

Journal: :CoRR 2010

Mark van Hoeij Vivek Pal

Let Q(α) and Q(β) be algebraic number fields. We describe a new method to find (if they exist) all isomorphisms, Q(β) → Q(α). The algorithm is particularly efficient if there is only one isomorphism.

Optimal Parallel Algorithm for String Matching on Mesh Network Structure

2006

S. Viswanadha Raju

In this paper we consider a string matching algorithm based on a two-dimensional mesh. It has applications in areas such as string databases, cellular automata, and computational biology. The main use of this method is to reduce the time spent on comparisons in string matching by using a mesh-connected network, which achieves constant time for a mismatch in a text string, and we obtained O(¥. -ti...

A New Algorithm for Directed Quantum Search

2005

Tathagat Tulsi Lov Grover Apoorva Patel

Quantum searching requires precise knowledge of problem parameters (such as the fraction of target states) for efficient operation. Recently an algorithm has been discovered, referred to as the Phase-π/3 search algorithm, which gets around this limitation. This algorithm can search a database with the fraction of target states equal to 1 − ε so that in q queries it produces a probability of err...

User-Based Vehicle Route Guidance in Urban Networks Based on Intelligent Multi-Agent Systems and the Ant-Q Algorithm

Journal: :International Journal of Transportation Engineering 0

Alireza Eydi (University of Kurdistan) Susan Panahi (Tarbiat Modares University) Isa Nakhai (University of Kurdistan)

Guiding vehicles to their destination under dynamic traffic conditions is an important topic in the field of intelligent transportation systems (ITS). Nowadays, many complex systems can be controlled using multi-agent systems. Adaptation to current conditions is an important feature of the agents. In this research, the formulation of dynamic guidance for vehicles has been investigated based...

Two-Timescale Q-Learning with an Application to Routing in Communication Networks

2006

Mohan Babu Shalabh Bhatnagar

We propose two variants of the Q-learning algorithm that (both) use two timescales. One of these updates Q-values of all feasible state-action pairs at each instant while the other updates Q-values of states with actions chosen according to the ‘current’ randomized policy updates. A sketch of convergence of the algorithms is shown. Finally, numerical experiments using the proposed algorithms fo...
