Search results for: reward penalty scheme

Number of results: 265788

2008
Young-Ro Yoon

Can valuable information be disclosed intentionally by the informed agent even within a competitive environment? In this article, we bring our interest into the asymmetry in reward and penalty in the payoff structure and explore its effects on the strategic disclosure of valuable information. According to our results, the asymmetry in reward and penalty is a necessary condition for the disclosu...

Journal: Neural Networks, 1996
Richard Stuart Neville T. John Stonham

This article presents an investigation which studied how training of sigma-pi networks with the associative reward-penalty (A_R-P) regime may be enhanced by using two networks in parallel. The technique uses what has been termed an unsupervised "adaptive critic element" (ACE) to give critical advice to the supervised sigma-pi network. We utilise the conventions that the sigma-pi neuron mod...

2014
Yi Jiang Sung-il Kim Mimi Bong

This study investigates differential neural activation patterns in response to reward-related feedback depending on various reward contingencies. Three types of reward contingencies were compared: a "gain" contingency (a monetary reward for correct answer/no monetary penalty for incorrect answer); a "lose" contingency (no monetary reward for correct answer/a monetary penalty for incorrect answe...

2000
B. John Oommen Mariana Agache

A Learning Automaton is an automaton that interacts with a random environment, having as its goal the task of learning the optimal action based on its acquired experience. Many learning automata have been proposed, with the class of Estimator Algorithms being among the fastest ones. Thathachar and Sastry [23], through the Pursuit Algorithm, introduced the concept of learning algorithms that pur...
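As a rough illustration of the learning-automaton idea described in this abstract, the sketch below implements a plain linear reward-inaction update, a standard textbook scheme, and not the Pursuit or Estimator algorithms the abstract refers to. The names `lri_update` and `run_automaton`, the learning rate, and the Bernoulli environment are all assumptions made for this example.

```python
import random


def lri_update(probs, chosen, rewarded, rate=0.1):
    """One linear reward-inaction step; returns a new action-probability list.

    On reward, probability mass moves toward the chosen action.
    On penalty, the probabilities are left unchanged ("inaction").
    """
    if not rewarded:
        return list(probs)
    return [
        p + rate * (1.0 - p) if i == chosen else p * (1.0 - rate)
        for i, p in enumerate(probs)
    ]


def run_automaton(reward_probs, steps=5000, rate=0.05, seed=0):
    """Let the automaton interact with a hypothetical random environment.

    reward_probs[i] is the (assumed) chance that action i is rewarded.
    """
    rng = random.Random(seed)
    n = len(reward_probs)
    probs = [1.0 / n] * n  # start with no preference among actions
    for _ in range(steps):
        chosen = rng.choices(range(n), weights=probs)[0]
        rewarded = rng.random() < reward_probs[chosen]
        probs = lri_update(probs, chosen, rewarded, rate)
    return probs


if __name__ == "__main__":
    print(run_automaton([0.2, 0.8]))
```

With a small learning rate, a scheme of this kind tends to concentrate probability on the action that is rewarded most often, which is the "learning the optimal action" behaviour the abstract describes.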

Journal: Journal of Vision, 2007
Michael S Landy Ross Goutcher Julia Trommershäuser Pascal Mamassian

We investigate whether observers take into account their visual uncertainty in an optimal manner in a perceptual estimation task with explicit rewards and penalties for performance. Observers judged the mean orientation of a briefly presented texture consisting of a collection of line segments. The mean and, in some experiments, the variance of the distribution of line orientations changed from...

Journal: Frontiers in Marine Science, 2021

Credit systems for mitigation of bycatch and habitat impact, as incentive-based approaches, incentivize changes in fishery operator behavior and decision-making and allow flexibility in a least-cost method. Three types of credit systems, originally developed to address environmental pollution, are presented and evaluated as currently underutilized approaches. The first, a cap-and-trade approach, evolved out of direct re...

2009
Behdis Eslamnour Maciej Zawodniok

Single channel based wireless networks have limited bandwidth and throughput and the bandwidth utilization decreases due to congestion and interference from other sources. In order to increase the throughput, transmission in multiple channels is considered as an option. In this paper, we propose a distributed dynamic channel allocation scheme using adaptive learning automata for wireless networ...

2017
Xiao-qing Zhang Xi-gang Yuan

and Carbon Emission Constraints. Xiao-qing Zhang (1), Xi-gang Yuan (2). (1, 2: The School of Statistics, Southwestern University of Finance and Economics, Chengdu 611130, P.R. China.) Abstract: In this paper, we discuss the government's reward and penalty mechanism in the presence of asymmetric information a...

Journal: IEEE Transactions on Systems, Man, and Cybernetics. Part B, Cybernetics: a publication of the IEEE Systems, Man, and Cybernetics Society, 2002
Anastasios A. Economides Athanasios Kehagias

We present the STack ARchitecture (STAR) automaton. It is a fixed structure, multiaction, reward-penalty learning automaton, characterized by a star-shaped state transition diagram. Each branch of the star contains D states associated with a particular action. The branches are connected to a central "neutral" state. The most general version of STAR involves probabilistic state transitions in re...

2018
Daniel A. Abolafia Mohammad Norouzi Jonathan Shen Rui Zhao Quoc V. Le

We consider the task of program synthesis in the presence of a reward function over the output of programs, where the goal is to find programs with maximal rewards. We introduce an iterative optimization scheme, where we train an RNN on a dataset of K best programs from a priority queue of the generated programs so far. Then, we synthesize new programs and add them to the priority queue by samp...
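The bookkeeping behind "K best programs from a priority queue" can be sketched as a bounded min-heap. This is only an illustration of the queue; it does not include the RNN or the sampling step, and the class name `TopKQueue` and its methods are hypothetical, not taken from the paper.

```python
import heapq


class TopKQueue:
    """Keeps the k highest-reward programs seen so far (illustrative only)."""

    def __init__(self, k):
        self.k = k
        self._heap = []  # min-heap of (reward, program); worst kept item on top

    def push(self, program, reward):
        """Add a program, evicting the current worst one if the queue is full."""
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, (reward, program))
        elif reward > self._heap[0][0]:
            heapq.heapreplace(self._heap, (reward, program))

    def best(self):
        """Return the stored programs ordered from highest to lowest reward."""
        return [program for _, program in sorted(self._heap, reverse=True)]


if __name__ == "__main__":
    queue = TopKQueue(k=2)
    for program, reward in [("a", 1.0), ("b", 3.0), ("c", 2.0)]:
        queue.push(program, reward)
    print(queue.best())
```

In a loop of the kind the abstract outlines, the contents of `best()` would serve as the training set for the next round, and newly generated programs would be pushed back into the same queue.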
