Search results for: markov reward models

Number of results: 981365

2016
Hui Gao Xueqing Zhang Yashuai Li

The reward criterion is an important decision factor in a Markov-based road maintenance optimization model. At present, either the average reward criterion or the discounted reward criterion is widely used to optimize the life cycle costs of road maintenance. However, the former cannot reflect the time value of life cycle costs, whereas the latter tends to neglect the costs accumulated in the later periods ove...
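To make the contrast between the two criteria concrete, here is a minimal sketch on a toy Markov reward model. The three road states, transition probabilities and yearly costs are hypothetical and are not taken from the paper. The discounted criterion is computed by value iteration, and the average criterion from the stationary distribution found by power iteration, which assumes an ergodic, aperiodic chain.

```python
"""Average vs. discounted reward criteria on a toy Markov reward model.

All numbers are hypothetical, chosen only for illustration."""

# Hypothetical 3-state road model: 0 = good, 1 = fair, 2 = poor.
P = [[0.8, 0.2, 0.0],
     [0.0, 0.7, 0.3],
     [0.6, 0.0, 0.4]]   # a "poor" road is rebuilt to "good" with prob. 0.6
COST = [1.0, 3.0, 10.0]  # hypothetical yearly cost incurred in each state


def discounted_values(P, r, gamma, tol=1e-12):
    """Expected discounted total cost v(s) = r(s) + gamma * sum_t P[s][t] * v(t),
    computed by fixed-point (value) iteration."""
    n = len(r)
    v = [0.0] * n
    while True:
        new_v = [r[s] + gamma * sum(P[s][t] * v[t] for t in range(n))
                 for s in range(n)]
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v


def average_reward(P, r, iters=10_000):
    """Long-run average cost g = sum_s pi(s) * r(s); the stationary
    distribution pi is found by power iteration (ergodic chain assumed)."""
    n = len(r)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[s] * P[s][t] for s in range(n)) for t in range(n)]
    return sum(p * c for p, c in zip(pi, r))


if __name__ == "__main__":
    print("discounted (gamma=0.95):", discounted_values(P, COST, 0.95))
    print("average per year       :", average_reward(P, COST))
```

Averaging the discounted values under the stationary distribution gives exactly g / (1 - gamma), because pi P = pi. The two criteria therefore agree on long-run totals, but for a given starting state the discounted criterion geometrically down-weights costs that occur far in the future, while the average criterion ignores the transient phase entirely.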

Journal: Proceedings of the ... AAAI Conference on Artificial Intelligence 2021

At the working heart of policy iteration algorithms commonly used and studied in the discounted setting of reinforcement learning, the policy evaluation step estimates the value of states with samples from a Markov reward process induced by following a policy in a Markov decision process. We propose a simple and efficient estimator, called the loop estimator, that exploits the regenerative structure of Markov reward processes without explicitly estimating a full model. Our method enj...
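The regenerative idea can be illustrated with a minimal sketch. It relies on the identity v(s) = E[R] / (1 - E[gamma^T]), where T is the number of steps until the chain first returns to s and R is the discounted reward collected before that return; the identity follows from the strong Markov property, since the process restarts afresh at s after each loop. The functions `simulate` and `loop_value_estimate` are hypothetical names, and this illustrates the identity rather than reproducing the paper's exact estimator or its guarantees.

```python
import random


def simulate(P, r, start, n_steps, rng):
    """Sample states s_0..s_{n-1} of a Markov reward process and the
    rewards r(s_t) received in them."""
    states, rewards = [], []
    s = start
    for _ in range(n_steps):
        states.append(s)
        rewards.append(r[s])
        s = rng.choices(range(len(P)), weights=P[s])[0]
    return states, rewards


def loop_value_estimate(states, rewards, s0, gamma):
    """Regenerative estimate of v(s0) = E[R_loop] / (1 - E[gamma**T_loop]).

    A loop starts at a visit to s0 and ends just before the next visit;
    R_loop is the discounted reward collected inside the loop and T_loop
    is its length. The trailing incomplete loop is discarded."""
    visits = [t for t, s in enumerate(states) if s == s0]
    if len(visits) < 2:
        raise ValueError("need at least one complete loop through s0")
    loop_returns, loop_discounts = [], []
    for begin, end in zip(visits, visits[1:]):
        length = end - begin
        loop_returns.append(sum(gamma ** k * rewards[begin + k]
                                for k in range(length)))
        loop_discounts.append(gamma ** length)
    mean_return = sum(loop_returns) / len(loop_returns)
    mean_discount = sum(loop_discounts) / len(loop_discounts)
    return mean_return / (1.0 - mean_discount)


if __name__ == "__main__":
    # Hypothetical two-state chain: reward 1 in state 0, reward 0 in state 1.
    P = [[0.9, 0.1], [0.5, 0.5]]
    r = [1.0, 0.0]
    states, rewards = simulate(P, r, start=0, n_steps=200_000,
                               rng=random.Random(0))
    print(loop_value_estimate(states, rewards, s0=0, gamma=0.9))
```

For this chain, solving the two Bellman equations gives the exact value v(0) = 11 / 1.28 = 8.59375, so the printed estimate should land close to it. Note that only visit times and per-loop sums are stored; no transition matrix is estimated.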

روشنایی, قدرت اله, صادقی فر, مجید, صفری, ملیحه, ظهیری, علی,

Background and Objectives: Tuberculosis is a chronic bacterial disease and a major cause of morbidity and mortality. It is caused by Mycobacterium tuberculosis. Knowing the incidence and the number of new cases of the disease provides valuable information for revising implemented programs and development indicators. Time series and regression are commonly used models for prediction, but these m...

2011
Kevin Regan Craig Boutilier

Specifying the reward function of a Markov decision process (MDP) can be demanding, requiring human assessment of the precise quality of, and tradeoffs among, various states and actions. However, reward functions often possess considerable structure which can be leveraged to streamline their specification. We develop new, decision-theoretically sound heuristics for eliciting rewards for factored...

Journal: Computer Networks 2012
Dario Bruneo Salvatore Distefano Francesco Longo Antonio Puliafito Marco Scarpa

Wireless sensor networks consist of a large number of tiny sensor nodes randomly distributed over a geographical region. To reduce power consumption, nodes alternate between active and sleep periods, which, however, limit their ability to send and receive data. The aim of this paper is to analyze the longevity of a battery-powered sensor node. A battery discharge model able to capture bo...
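As a rough illustration of how a Markov reward model turns a duty cycle into a lifetime figure, the sketch below treats the node as a two-state continuous-time Markov chain (active, sleep) whose reward rate is the current drawn in each state. It assumes an ideal battery that simply delivers its nominal capacity, so it deliberately ignores the non-linear discharge effects a realistic battery model would capture. All function names and parameter values are hypothetical.

```python
"""Back-of-the-envelope lifetime of a duty-cycled sensor node viewed as a
two-state Markov reward model (reward rate = current draw in each state).

Assumption: ideal battery (no rate-capacity or recovery effects).
All parameter values are hypothetical."""


def stationary_active_fraction(rate_sleep_to_active, rate_active_to_sleep):
    """Long-run probability of the 'active' state in a two-state CTMC."""
    return rate_sleep_to_active / (rate_sleep_to_active + rate_active_to_sleep)


def average_current(p_active, i_active_ma, i_sleep_ma):
    """Long-run reward rate: per-state current weighted by the stationary
    distribution."""
    return p_active * i_active_ma + (1.0 - p_active) * i_sleep_ma


def ideal_lifetime_hours(capacity_mah, avg_current_ma):
    """Lifetime of an ideal battery drained at a constant average current."""
    return capacity_mah / avg_current_ma


if __name__ == "__main__":
    # Hypothetical duty cycle: mean active period 1 s, mean sleep period 9 s,
    # so the transition rates are 1/9 per s (sleep->active) and 1 per s
    # (active->sleep).
    p = stationary_active_fraction(1 / 9, 1.0)
    i = average_current(p, i_active_ma=20.0, i_sleep_ma=0.02)
    print(f"active fraction {p:.3f}, mean current {i:.3f} mA, "
          f"ideal lifetime ~{ideal_lifetime_hours(2000.0, i):.0f} h")
```

The gap between this ideal estimate and a measured lifetime is exactly what a dedicated battery discharge model is meant to close.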

Journal: Stochastic Processes and their Applications 1993

Journal: CoRR 2018
Jaden B. Travnik Kory Wallace Mathewson Richard S. Sutton Patrick M. Pilarski

The relationship between a reinforcement learning (RL) agent and an asynchronous environment is often ignored. Frequently used models of the interaction between an agent and its environment, such as Markov Decision Processes (MDP) or Semi-Markov Decision Processes (SMDP), do not capture the fact that, in an asynchronous environment, the state of the environment may change during computation per...

Journal: IEEE Trans. Circuits Syst. Video Techn. 1997
Jie Wei Ze-Nian Li

Zhang and Zafar proposed a video compression scheme based on the wavelet representation and multiresolution motion compensation (MRMC). In this letter, an additional masking module is introduced to further enhance its efficiency. Specifically, the masking module is inserted between the wavelet decomposition and MRMC modules, where it constructs binary images based on the difference o...

In this study, we model daily rainfall occurrence using the Markov Chain Analogue Year model (MCAYM), and the intensity, or amount, of daily rainfall using three different probability distributions: gamma, exponential and mixed exponential. Combining the occurrence and intensity models, we obtain the Markov Chain Analogue Year gamma model (MCAYGM), the Markov Chain Analogue Year exponentia...
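How an occurrence model and an intensity distribution combine can be shown with a generic chain-dependent rainfall generator. This minimal sketch uses a plain first-order two-state (wet/dry) Markov chain for occurrence, not the Analogue Year variant from the study, and plugs in each of the three intensity distributions named above. All function names and parameter values are hypothetical.

```python
import random


def gamma_intensity(shape, scale):
    """Wet-day amount ~ Gamma(shape, scale); mean = shape * scale."""
    return lambda rng: rng.gammavariate(shape, scale)


def exponential_intensity(mean):
    """Wet-day amount ~ Exponential with the given mean."""
    return lambda rng: rng.expovariate(1.0 / mean)


def mixed_exponential_intensity(weight, mean1, mean2):
    """With probability `weight` draw from an exponential with mean1,
    otherwise from an exponential with mean2."""
    def draw(rng):
        mean = mean1 if rng.random() < weight else mean2
        return rng.expovariate(1.0 / mean)
    return draw


def simulate_daily_rainfall(n_days, p_wet_given_dry, p_wet_given_wet,
                            intensity, seed=None):
    """Return daily rainfall amounts; 0.0 marks a dry day.

    Occurrence: first-order two-state Markov chain, assumed to start dry.
    Intensity: `intensity(rng)` is called once for every wet day."""
    rng = random.Random(seed)
    wet = False
    amounts = []
    for _ in range(n_days):
        p_wet = p_wet_given_wet if wet else p_wet_given_dry
        wet = rng.random() < p_wet
        amounts.append(intensity(rng) if wet else 0.0)
    return amounts


if __name__ == "__main__":
    # Hypothetical parameters: P(wet | dry) = 0.3, P(wet | wet) = 0.6.
    models = [("gamma", gamma_intensity(0.8, 10.0)),
              ("exponential", exponential_intensity(8.0)),
              ("mixed exp.", mixed_exponential_intensity(0.7, 3.0, 20.0))]
    for name, model in models:
        series = simulate_daily_rainfall(3650, 0.3, 0.6, model, seed=42)
        wet_days = [a for a in series if a > 0]
        print(f"{name:12s} wet fraction {len(wet_days) / len(series):.2f}, "
              f"mean wet-day amount {sum(wet_days) / len(wet_days):.1f}")
```

With these parameters the long-run wet-day fraction is P(wet | dry) / (P(wet | dry) + 1 - P(wet | wet)) = 0.3 / 0.7, about 0.43, regardless of which intensity distribution is chosen; the distributions differ only in the spread and tail of wet-day amounts.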
