reward beta model

Slothrop: Knuth-Bendix Completion with a Modern Termination Checker

2006

Ian Wehrman, Aaron Stump, Edwin M. Westbrook

A Knuth-Bendix completion procedure is parametrized by a reduction ordering used to ensure termination of intermediate and resulting rewriting systems. While in principle any reduction ordering can be used, modern completion tools typically implement only Knuth-Bendix and path orderings. Consequently, the theories for which completion can possibly yield a decision procedure are limited to those...

The Effect of Climate on Cardiovascular and Respiratory Diseases in the City of Khorramabad

Thesis: Ministry of Science, Research and Technology - Yazd University - Faculty of Humanities, 1392 SH

فاطمه سوخته زار, احمد مزیدی, بهروز پروانه

Abstract: Climate has had, and continues to have, a strong influence on human life, and both the human and the natural environment are broadly affected by climatic conditions. Throughout history, people have made great efforts to understand, control, and adapt to climate. The aim of this study is to examine the effect of climate on cardiovascular and respiratory diseases in the city of Khorramabad. To carry out the study, data on visits by cardiac and respiratory patients (individuals under 12 years of age) from the University of ...

The Effectiveness of a Reward-Based Task on the Affective Levels of Depressed Individuals

Journal: Iranian Journal of Psychiatry and Clinical Psychology 2018

اعتمادی چهارده, نیلوفر, بخشی پور, عباس, کریم پور وظیفه خورانی, علیرضا, کمالی قاسم آبادی, حسین

Objectives: The present study examined the effects of a reward-driven task on improving affective levels in individuals with depressive symptoms. Methods: The study used an experimental design with pretest, posttest, and follow-up, and included a control group. The study population was the students of Tabriz University in the 2016-2017 semester. The sample comprised 40 students who had visited t...

A Fully Syntactic AC-RPO

Journal: Inf. Comput. 1999

Albert Rubio

We present the first fully syntactic (i.e., non-interpretation-based) AC-compatible recursive path ordering (RPO). It is simple, and hence easy to implement, and its behaviour is intuitive as in the standard RPO. The ordering is AC-total and defined uniformly for both ground and nonground terms, as well as for partial precedences. More important, it is the first one that can deal incrementally ...

Delayed Reward Discounting and Alcohol Misuse: The Roles of Response Consistency and Reward Magnitude.

Journal: Journal of Experimental Psychopathology 2011

Michael Amlung, James MacKillop

Delayed reward discounting (DRD) is a common index of impulsivity that refers to an individual's devaluation of rewards based on delay of receipt and has been linked to alcohol misuse and other maladaptive behaviors. The current study investigated response consistency and reward magnitude effects in two measures of DRD in a sample of 111 undergraduates who consumed an average of 10.7 drinks/wee...

Online Control With Least-Squares Methods

2007

Least-squares techniques for policy evaluation (such as LSTD and iLSTD) have been shown to estimate the value of a policy with far less data than traditional TD techniques. Unfortunately, they make use of policy-dependent statistics that have to be discarded when the policy changes. This makes it difficult to use the techniques for online control problems. In this paper, we explore the effect...

Double Q($\sigma$) and Q($\sigma, \lambda$): Unifying Reinforcement Learning Control Algorithms

2017

Markus Dumke

Temporal-difference (TD) learning is an important field in reinforcement learning. Sarsa and Q-Learning are among the most used TD algorithms. The Q(σ) algorithm (Sutton and Barto (2017)) unifies both. This paper extends the Q(σ) algorithm to an online multi-step algorithm Q(σ, λ) using eligibility traces and introduces Double Q(σ) as the extension of Q(σ) to double learning. Experiments sugges...

Alcohol demand, delayed reward discounting, and craving in relation to drinking and alcohol use disorders.

Journal: Journal of Abnormal Psychology 2010

James MacKillop, Robert Miranda, Peter M Monti, Lara A Ray, James G Murphy, Damaris J Rohsenow, John E McGeary, Robert M Swift, Jennifer W Tidey, Chad J Gwaltney

A behavioral economic approach to alcohol use disorders (AUDs) emphasizes both individual and environmental determinants of alcohol use. The current study examined individual differences in alcohol demand (i.e., motivation for alcohol under escalating conditions of price) and delayed reward discounting (i.e., preference for immediate small rewards compared to delayed larger rewards) in 61 heavy...

Improvement in Game Agent Control Using State-Action Value Scaling

2008

Leo Galway, Darryl Charles, Michaela M. Black

The aim of this paper is to enhance the performance of a reinforcement learning game agent controller, within a dynamic game environment, through the retention of learned information over a series of consecutive games. Using a variation of the classic arcade game Pac-Man, the Sarsa algorithm has been utilised for the control of the Pac-Man game agent. The results indicate the use of state-action...

Maximum relevancy maximum complementary feature selection for multi-sensor activity recognition

Journal: Expert Syst. Appl. 2015

Saisakul Chernbumroong, Shuang Cang, Hongnian Yu

In the multi-sensor activity recognition domain, the input space is often large and contains irrelevant and overlapping features. It is important to perform feature selection in order to select the smallest number of features that can describe the outputs. This paper proposes a new feature selection algorithm using the maximal relevance and maximal complementary criteria (MRMC) based on neural...
