Search results for: q) policy

Number of results: 381,585

2009
Dominic Thomas Elliot Bendoly Monica Capra

How do social networks motivate people to connect not only to their previously existing friends but also to novel or blind new contacts? We report the results of an experiment to identify the value that participants give to alternative network characteristics when deciding to connect to a social network. We focus on network tie characteristics because they represent information that potentially...

Journal: مدیریت زنجیره تأمین (Supply Chain Management)
Zohreh Kaheh Reza Baradaran Kazemzadeh

In this paper, tender problems in an automobile company for procuring needed items from potential suppliers are resolved using the Q-learning algorithm. The purchaser, based on proposals received from potential suppliers (including price and delivery time), assigns orders for the needed parts to suppliers. The buyer's objective is minimizing the procurement costs thr...

2015
Rezwana Reaz Muqeet Ali Mohamed G. Gouda Marijn Heule Ehab S. Elmallah

A computing policy is a sequence of rules, where each rule consists of a predicate and an action, and where each action is either “accept” or “reject”. A policy P is said to accept (or reject, respectively) a request iff the action of the first rule in P that is matched by the request is “accept” (or “reject”, respectively). A pair of policies (P, Q) is called an accept-implication pair iff ...
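The first-match semantics described in this abstract can be sketched as follows; this is an illustrative sketch, not the authors' implementation, and the rule set, the dictionary-based request format, and the default "reject" when no rule matches are all assumptions.

```python
# Illustrative sketch of a computing policy: an ordered list of
# (predicate, action) rules with first-match semantics.

def evaluate(policy, request):
    """Return the action of the first rule whose predicate matches."""
    for predicate, action in policy:
        if predicate(request):
            return action
    return "reject"  # assumed default when no rule matches

# Hypothetical example policy over requests with a "port" field.
P = [
    (lambda r: r["port"] == 22, "reject"),   # block SSH
    (lambda r: r["port"] < 1024, "accept"),  # allow other well-known ports
]
```

Note that rule order matters: a request with port 22 hits the first rule and is rejected even though it would also match the second.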

2001
Eyal Even-Dar Yishay Mansour

We show the convergence of two deterministic variants of Q-learning. The first is the widely used optimistic Q-learning, which initializes the Q-values to large initial values and then follows a greedy policy with respect to the Q-values. We show that setting the initial value sufficiently large guarantees convergence to an ε-optimal policy. The second is a new and novel algo...
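Optimistic Q-learning as described here (large initial Q-values, greedy action selection) can be sketched on a toy problem; the two-state MDP, the constants, and the update schedule below are illustrative assumptions, not from the paper.

```python
import numpy as np

# Hedged sketch of optimistic Q-learning on a toy deterministic MDP.
n_states, n_actions = 2, 2
Q = np.full((n_states, n_actions), 10.0)  # optimistic large initial values

def step(s, a):
    # Toy MDP: the action chooses the next state; action 1 yields reward 1.
    return a, (1.0 if a == 1 else 0.0)

gamma, alpha = 0.9, 0.1
s = 0
for _ in range(2000):
    a = int(np.argmax(Q[s]))  # greedy w.r.t. the optimistic Q-values
    s2, r = step(s, a)
    Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
    s = s2
```

The optimism drives exploration: overestimated actions are tried, found wanting, and their Q-values shrink toward the true values, so the greedy policy eventually settles on the optimal action.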

Journal: CoRR 2016
Shixiang Gu Timothy P. Lillicrap Zoubin Ghahramani Richard E. Turner Sergey Levine

Model-free deep reinforcement learning (RL) methods have been successful in a wide variety of simulated domains. However, a major obstacle facing deep RL in the real world is the high sample complexity of such methods. Unbiased batch policy-gradient methods offer stable learning, but at the cost of high variance, which often requires large batches, while TD-style methods, such as off-policy act...

2007
Christian Larsen

We develop an algorithm to compute an optimal Q(s,S) policy for the joint replenishment problem when demands follow a compound correlated Poisson process. It is a non-trivial generalization of the work by Nielsen and Larsen (2005). We make some numerical analyses on two-item problems where we compare the optimal Q(s,S) policy to the optimal uncoordinated (s,S) policies. The results indicate tha...
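The uncoordinated (s,S) policy mentioned in this abstract has a simple order-up-to rule per item; the sketch below is illustrative (the parameter values are hypothetical, and the compound correlated Poisson demand model from the paper is not reproduced here).

```python
# Hedged sketch of a single-item (s,S) replenishment rule: when inventory
# falls to or below the reorder point s, order up to the level S.

def order_quantity(inventory, s, S):
    """Quantity to order under an (s,S) policy; 0 if above the reorder point."""
    return S - inventory if inventory <= s else 0
```

Under the coordinated Q(s,S) variant, by contrast, replenishments of different items are triggered jointly to share fixed ordering costs, which is what the paper's algorithm optimizes.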

2016
Brendan O'Donoghue Remi Munos Koray Kavukcuoglu Volodymyr Mnih

Policy gradient is an efficient technique for improving a policy in a reinforcement learning setting. However, vanilla online variants are on-policy only and not able to take advantage of off-policy data. In this paper we describe a new technique that combines policy gradient with off-policy Q-learning, drawing experience from a replay buffer. This is motivated by making a connection between th...

Journal: CoRR 2017
John Schulman Pieter Abbeel Xi Chen

Two of the leading approaches for model-free reinforcement learning are policy gradient methods and Q-learning methods. Q-learning methods can be effective and sample-efficient when they work, however, it is not well-understood why they work, since empirically, the Q-values they estimate are very inaccurate. A partial explanation may be that Q-learning methods are secretly implementing policy g...
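One way to see the policy/Q connection discussed in this abstract is the entropy-regularized setting, where a stochastic policy can be read off the Q-values as a softmax; the temperature value and the example Q-vector below are illustrative assumptions.

```python
import numpy as np

# Hedged sketch: a Boltzmann (softmax) policy derived from Q-values.

def softmax_policy(q_values, tau=1.0):
    """Action probabilities proportional to exp(Q / tau)."""
    z = (q_values - q_values.max()) / tau  # shift for numerical stability
    p = np.exp(z)
    return p / p.sum()

pi = softmax_policy(np.array([1.0, 2.0, 3.0]))
```

Higher-valued actions get more probability mass, and as tau shrinks the policy approaches the greedy argmax over Q.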

2012
Christopher R. Dance Onno R. Zoeter Haengju Lee

We consider the stochastic joint replenishment problem in which several items must be ordered in the face of stochastic demand. Previous authors proposed multiple heuristic policies for this economically-important problem. We show that several such policies are not good approximations to an optimal policy, since as some items grow more expensive than others, the cost rate of the heuristic polic...

2012
Landon Kraemer Bikramjit Banerjee

Decentralized partially observable Markov decision processes (Dec-POMDPs) offer a formal model for planning in cooperative multi-agent systems where agents operate with noisy sensors and actuators and local information. While many techniques have been developed for solving Dec-POMDPs exactly and approximately, they have been primarily centralized and reliant on knowledge of the model parameters....
