Search results for: markov reward models
Number of results: 981,365
The high expectations of performance and availability for wireless mobile systems have presented great challenges in the modelling and design of fault-tolerant wireless systems. The appropriate methodology for studying the degradation of such systems is so-called performability modelling. In this paper, we give an overview of approaches for the construction and the solution of performability model...
In this chapter we study Markov decision processes (MDPs) with finite state and action spaces. This is the classical theory developed since the end of the fifties. We consider finite and infinite horizon models. For the finite horizon model the utility function of the total expected reward is commonly used. For the infinite horizon the utility function is less obvious. We consider several criteria: total...
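For reference, the criteria this abstract alludes to can be written in standard MDP notation (the notation below is mine, not quoted from the chapter). For a finite horizon of length N the usual objective is the total expected reward; for the infinite horizon the common alternatives are the total, discounted, and average reward criteria:
\[
v_N(s) = \mathbb{E}_s\Big[\sum_{t=0}^{N-1} r(X_t, A_t)\Big], \qquad
\mathbb{E}_s\Big[\sum_{t=0}^{\infty} r(X_t, A_t)\Big], \qquad
\mathbb{E}_s\Big[\sum_{t=0}^{\infty} \beta^t\, r(X_t, A_t)\Big]\ (0 \le \beta < 1), \qquad
\lim_{N\to\infty} \tfrac{1}{N}\, \mathbb{E}_s\Big[\sum_{t=0}^{N-1} r(X_t, A_t)\Big].
\]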
In competitive motor tasks such as table tennis, mastering the task is not merely a matter of perfect execution of a specific movement pattern. Here, a higher-level strategy is required in order to win the game. The data-driven identification of basic strategies in interactive tasks, such as table tennis, is a largely unexplored problem. In order to automatically extract expert knowledge on effe...
We study the use of inverse reinforcement learning (IRL) as a tool for recognizing agents' behavior from observations of their sequential decisions while interacting with the environment. We model the problem faced by the agents as a Markov decision process (MDP) and model the agents' observed behavior as forward planning in the MDP. We use IRL to learn reward...
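As a rough illustration of the behavior-recognition setting described above (a minimal sketch under assumed inputs, not the cited paper's actual method), one can score an observed trajectory against a set of candidate reward functions and pick the one under which near-optimal planning explains the trajectory best. All names here (P, rewards_by_class, trajectory) are hypothetical.

import numpy as np
from scipy.special import logsumexp

# Minimal sketch (not the cited paper's algorithm): decide which of several
# candidate reward functions best explains an observed (state, action)
# trajectory, assuming the agent plans near-optimally in a known MDP.
# P: transition tensor of shape (A, S, S); each r: reward vector of shape (S,).

def soft_q_values(P, r, gamma=0.95, iters=500):
    # Soft value iteration: Q[a, s] = r[s] + gamma * sum_s' P[a, s, s'] * V[s'].
    n_actions, n_states, _ = P.shape
    Q = np.zeros((n_actions, n_states))
    for _ in range(iters):
        V = logsumexp(Q, axis=0)           # soft maximum over actions
        Q = r[None, :] + gamma * (P @ V)   # Bellman backup, one row per action
    return Q

def trajectory_log_likelihood(Q, trajectory):
    # Log-likelihood of (state, action) pairs under a Boltzmann policy on Q.
    log_z = logsumexp(Q, axis=0)
    return sum(Q[a, s] - log_z[s] for s, a in trajectory)

def recognize(P, rewards_by_class, trajectory):
    # Return the label of the candidate reward that best explains the trajectory.
    scores = {label: trajectory_log_likelihood(soft_q_values(P, r), trajectory)
              for label, r in rewards_by_class.items()}
    return max(scores, key=scores.get)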
Goal-directed Markov Decision Process models (GDMDPs) are good models for many decision-theoretic planning tasks. They have been used in conjunction with two different reward structures, namely the goal-reward representation and the action-penalty representation. We apply GDMDPs to planning tasks in the presence of traps such as steep slopes for outdoor robots or staircases for indoor robots, a...
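The two reward structures named in this abstract are standard in the goal-directed MDP literature; a minimal sketch of both is shown here (function and variable names are illustrative, not taken from the paper).

def goal_reward(state, action, next_state, goal_states):
    # Goal-reward representation: the agent is rewarded only upon reaching a goal.
    return 1.0 if next_state in goal_states else 0.0

def action_penalty(state, action, next_state, goal_states):
    # Action-penalty representation: every action costs one unit until a goal is reached.
    return 0.0 if state in goal_states else -1.0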
Temporal difference learning models propose that phasic dopamine signaling encodes reward prediction errors that drive learning. This is supported by studies in which optogenetic stimulation of dopamine neurons can stand in lieu of actual reward. Nevertheless, a large body of data also shows that dopamine is not necessary for learning, and that dopamine depletion primarily affects task performance. We ...
The tool OpenSESAME offers an easy-to-use modeling framework which enables realistic availability and reliability analysis of fault-tolerant systems. Our symbolic engine, which is based on an extension of binary decision diagrams (BDDs), is capable of analyzing Markov reward models consisting of more than 10 system states. In this paper, we introduce a tool chain where OpenSESAME is employed for...
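For context (independent of the OpenSESAME tool chain itself), the basic quantity a Markov reward model delivers in availability analysis is the expected steady-state reward rate: with generator matrix $Q$, steady-state distribution $\pi$ solving $\pi Q = 0$, $\sum_i \pi_i = 1$, and a reward rate $r_i$ attached to each state $i$,
\[
\mathbb{E}[R] = \sum_i \pi_i\, r_i .
\]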
The asymptotic bias and variance are important determinants of the quality of a simulation run. In particular, the asymptotic bias can be used to approximate the bias introduced by starting the collection of a measure in a particular state distribution, and the asymptotic variance can be used to compute the simulation time required to obtain a statistically significant estimate of a measure. Whil...
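In the standard formulation for Markov reward models (notation mine, not quoted from the abstract), let $\bar{Y}_t = \frac{1}{t}\int_0^t r(X_u)\,du$ be the time-averaged reward, $\pi$ the steady-state distribution, and $\bar{r} = \sum_i \pi_i r_i$ the steady-state reward rate. The two quantities are then
\[
b(\alpha) = \lim_{t\to\infty} t\,\big(\mathbb{E}_{\alpha}[\bar{Y}_t] - \bar{r}\big), \qquad
\sigma^2 = \lim_{t\to\infty} t\,\operatorname{Var}[\bar{Y}_t],
\]
where $\alpha$ is the initial state distribution; $b(\alpha)$ is the asymptotic bias and $\sigma^2$ the asymptotic variance.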
... to both control the fluid flow, and have the discrete control decisions be affected by the observed fluid flow. We have provided a formal definition of FSPNs and developed the rules for their dynamic evolution. We have derived coupled systems of partial differential equations for the transient and the steady-state behavior of FSPNs. Spectral representation of the FSPNs with a single continuous place can be a...
Despite their many empirical successes, approximate value-function-based approaches to reinforcement learning suffer from a paucity of theoretical guarantees on the performance of the policy generated from the value function. In this paper we pursue an alternative approach: first compute the gradient of the average reward with respect to the parameters controlling the state transitions i...
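The gradient referred to here is usually stated via the policy gradient theorem for the average-reward setting (the standard form is shown below as a reference point; the paper's own estimator may differ in detail): with stationary state distribution $d^{\pi_\theta}$ and differential action-value function $Q^{\pi_\theta}$,
\[
\nabla_\theta\, \eta(\theta) = \sum_{s} d^{\pi_\theta}(s) \sum_{a} \nabla_\theta\, \pi_\theta(a \mid s)\, Q^{\pi_\theta}(s, a).
\]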