Search results for: model reward beta
Number of results: 336113
A Knuth-Bendix completion procedure is parametrized by a reduction ordering used to ensure termination of intermediate and resulting rewriting systems. While in principle any reduction ordering can be used, modern completion tools typically implement only Knuth-Bendix and path orderings. Consequently, the theories for which completion can possibly yield a decision procedure are limited to those...
Abstract: Climate has had, and continues to have, a major influence on human life, and both the human and natural environment are broadly affected by climatic conditions. Throughout history, people have made many efforts to understand, control, and adapt to climate. The aim of this study is to examine the effect of climate on cardiac and respiratory diseases in the city of Khorramabad. To carry out the study, data on visits by cardiac and respiratory patients (individuals under 12 years of age) from the University of ...
Objectives: The present study examined the effects of a reward-driven task on improving affective levels in individuals with depressive symptoms. Methods: The present study is an experimental study with a pretest-posttest and follow-up design and a control group. The study population comprised students at Tabriz University in the 2016-2017 semester. The sample consisted of 40 students who had visited t...
We present the first fully syntactic (i.e., non-interpretation-based) AC-compatible recursive path ordering (RPO). It is simple, and hence easy to implement, and its behaviour is intuitive as in the standard RPO. The ordering is AC-total and defined uniformly for both ground and nonground terms, as well as for partial precedences. More important, it is the first one that can deal incrementally ...
Delayed reward discounting (DRD) is a common index of impulsivity that refers to an individual's devaluation of rewards based on delay of receipt and has been linked to alcohol misuse and other maladaptive behaviors. The current study investigated response consistency and reward magnitude effects in two measures of DRD in a sample of 111 undergraduates who consumed an average of 10.7 drinks/wee...
Policy evaluation using least-squares techniques (such as LSTD and iLSTD) has been shown to estimate the value of a policy with far less data than traditional TD techniques. Unfortunately, these techniques rely on policy-dependent statistics that have to be discarded when the policy changes, which makes them difficult to use for online control problems. In this paper, we explore the effect...
Temporal-difference (TD) learning is an important field in reinforcement learning. Sarsa and Q-Learning are among the most used TD algorithms. The Q(σ) algorithm (Sutton and Barto (2017)) unifies both. This paper extends the Q(σ) algorithm to an online multi-step algorithm Q(σ, λ) using eligibility traces and introduces Double Q(σ) as the extension of Q(σ) to double learning. Experiments sugges...
A behavioral economic approach to alcohol use disorders (AUDs) emphasizes both individual and environmental determinants of alcohol use. The current study examined individual differences in alcohol demand (i.e., motivation for alcohol under escalating conditions of price) and delayed reward discounting (i.e., preference for immediate small rewards compared to delayed larger rewards) in 61 heavy...
The aim of this paper is to enhance the performance of a reinforcement learning game agent controller, within a dynamic game environment, through the retention of learned information over a series of consecutive games. Using a variation of the classic arcade game Pac-Man, the Sarsa algorithm has been utilised for the control of the Pac-Man game agent. The results indicate the use of state-action...
In the multi-sensor activity recognition domain, the input space is often large and contains irrelevant and overlapping features. It is important to perform feature selection in order to select the smallest number of features that can describe the outputs. This paper proposes a new feature selection algorithm using the maximal relevance and maximal complementary criteria (MRMC) based on neural...