q policy

International Journal of Management and Transformation: Vol.5, No.2

2012

Nita H. Shah

This study examines how a temporary price discount, offered by the supplier for an order larger than the regular order, affects the retailer's ordering policy. The demand is assumed to be stock-dependent. The units in inventory deteriorate at a constant rate. The results help the retailer decide whether to adopt a regular or a special order policy. The optimum special qu...

Finding the Cost of Inventory in Make to Order Supply Chain under Vendor Managed Inventory Program

Journal: :Industrial Management (مدیریت صنعتی)

Rasoul Haji (Sharif University of Technology), Mohammad Mohsen Moarefdoost (Sharif University of Technology), Seyed Babak Ebrahimi (Sharif University of Technology)

This study considers a two-echelon supply chain system consisting of one manufacturer and one retailer. Under a vendor managed inventory (VMI) program, the manufacturer is authorized to manage inventories of agreed-upon stock keeping units at the retailer's locations. In this paper, we assume that the manufacturer monitors inventory levels at the retailer's location, replenishes her stock under (...

Computer Science Technical Report Approximating a Policy Can be Easier Than Approximating a Value Function

2000

Charles W. Anderson

Value functions can speed the learning of a solution to Markov Decision Problems by providing a prediction of reinforcement against which received reinforcement is compared. Once the learned values relatively reflect the optimal ordering of actions, further learning is not necessary. In fact, further learning can lead to the disruption of the optimal policy if the value function is implemented wi...

Optimal and Approximate Q-value Functions for Decentralized POMDPs

Journal: :J. Artif. Intell. Res. 2008

Frans A. Oliehoek Matthijs T. J. Spaan Nikos A. Vlassis

Decision-theoretic planning is a popular approach to sequential decision making problems, because it treats uncertainty in sensing and acting in a principled way. In single-agent frameworks like MDPs and POMDPs, planning can be carried out by resorting to Q-value functions: an optimal Q-value function Q* is computed in a recursive manner by dynamic programming, and then an optimal policy is extr...

Action Time Sharing Policies for Ergodic Control of Markov Chains

Journal: :SIAM J. Control and Optimization 2012

Amarjit Budhiraja Xin Liu Adam Shwartz

Ergodic control for discrete time controlled Markov chains with a locally compact state space and a compact action space is considered under suitable stability, irreducibility and Feller continuity conditions. A flexible family of controls, called action time sharing (ATS) policies, associated with a given continuous stationary Markov control, is introduced. It is shown that the long term avera...

On Learning Soccer

1997

Marco Wiering A. Germond M. Hasler

We use simulated soccer to study multiagent learning. Each team's players (agents) share action set and policy but may behave differently due to position-dependent inputs. All agents making up a team are rewarded or punished collectively in case of goals. We conduct simulations with varying team sizes, and compare two learning algorithms: TD-Q learning with linear neural networks (TD-Q) and Pro...

Q Learning based Reinforcement Learning Approach to Bipedal Walking Control

2013

Sudhir Raj Cheruvu Siva Kumar

Reinforcement learning has been an active research area in recent years, not only in machine learning but also in control engineering, operations research and robotics. It is a model-free learning control method that can solve Markov decision problems. Q-learning is an incremental dynamic programming procedure that determines the optimal policy in a step-by-step manner. It is an online procedure for...
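
As an illustration of the incremental, step-by-step nature of Q-learning described in this abstract, here is a minimal tabular sketch. It is not taken from the paper: the function names, the dictionary-based Q-table, and the default learning rate `alpha` and discount factor `gamma` are all assumptions chosen for illustration.

```python
def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """Apply one standard tabular Q-learning update in place and return the new value.

    q is a dict mapping (state, action) to a value; missing entries count as 0.0.
    alpha (learning rate) and gamma (discount factor) are illustrative defaults.
    """
    # Value of the best action available in the next state.
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    # Move the old estimate toward the one-step bootstrapped target.
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


def greedy_policy(q, states, actions):
    """Extract the greedy policy: the highest-valued action in each state."""
    return {s: max(actions, key=lambda a: q.get((s, a), 0.0)) for s in states}
```

Calling `q_learning_update` once per observed transition and then `greedy_policy` on the resulting table gives the policy implied by the current value estimates.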

Behavior Based Control and Fuzzy Q-learning for Autonomous Mobile Robot Navigation

2013

Khairul Anam Son Kuswadi

This paper presents a collaboration of behavior based control and fuzzy Q-learning for mobile robot navigation systems. Many fuzzy Q-learning algorithms have been proposed to yield individual behaviors such as obstacle avoidance, target finding and so on. However, for complicated tasks, all behaviors need to be combined in one control schema using behavior based control. Based this fac...

Origins of Armenia's Foreign Policy and Its Foreign Policy towards Iran

Thesis: Ministry of Science, Research and Technology - Allameh Tabataba'i University 1388 (Solar Hijri calendar)

Yazdan Keikhosro Doulatyari, Rahman Ghahremanpour, Atousa Goudarzi

Foreign policy is rooted in complicated matters, and this may be even more true of Armenia. Although the new government of Armenia is less than 20 years old, the people of this territory were the first to officially accept Christianity. In ancient times, these people were part of great empires such as Iran, Rome, and Byzantium. Armenia is regarded as a nation with a privileged hist...

An exact solution procedure for multi-item two-echelon spare parts inventory control problem with batch ordering in the central warehouse

Journal: :Oper. Res. Lett. 2010

Engin Topan Z. Pelin Bayindir Tarkan Tan

We consider a multi-item two-echelon inventory system in which the central warehouse operates under a (Q,R) policy, and the local warehouses implement basestock policy. An exact solution procedure is proposed to find the inventory control policy parameters that minimize the system-wide inventory holding and fixed ordering cost subject to an aggregate mean response time constraint at each facility.
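
To make the (Q,R) rule mentioned in this abstract concrete, the sketch below simulates a single stocking point that orders a fixed batch Q whenever its inventory position falls to or below the reorder point R. This is only an illustration under simplifying assumptions (one item, one location, periodic review, a deterministic lead time, full backordering); it is not the exact solution procedure of the paper, and the function and parameter names are hypothetical.

```python
def simulate_qr_policy(demands, batch_q, reorder_r, initial_stock, lead_time=1):
    """Simulate a (Q,R) policy over a sequence of per-period demands.

    Assumptions (illustrative only): inventory is reviewed once per period,
    an order placed in period t arrives at the start of period t + lead_time,
    and unmet demand is backordered (on-hand stock may go negative).
    Returns the end-of-period net stock history and the number of orders placed.
    """
    on_hand = initial_stock
    pipeline = []  # outstanding orders as (arrival_period, quantity)
    orders = 0
    history = []
    for t, demand in enumerate(demands):
        # Receive every outstanding order that is due by this period.
        on_hand += sum(qty for due, qty in pipeline if due <= t)
        pipeline = [(due, qty) for due, qty in pipeline if due > t]
        # Serve demand; a negative value represents backorders.
        on_hand -= demand
        # Inventory position = net stock plus stock on order.
        position = on_hand + sum(qty for _, qty in pipeline)
        # Order batches of size Q until the position is above R.
        while position <= reorder_r:
            pipeline.append((t + lead_time, batch_q))
            position += batch_q
            orders += 1
        history.append(on_hand)
    return history, orders
```

For example, `simulate_qr_policy([3, 3, 3, 3], batch_q=5, reorder_r=2, initial_stock=6)` places its first order in the second period, when the inventory position first reaches the reorder point.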
