reward penalty scheme

Search results for: reward penalty scheme

Number of results: 265788

Planning rapid movements to maximize gain in scenes with multiple regions carrying reward or penalty

Journal: Journal of Vision 2004

Deep Reinforcement Learning-Based Smart Joint Control Scheme for On/Off Pumping Systems in Wastewater Treatment Plants

Journal: IEEE Access 2021

In this paper, we propose a deep reinforcement learning (DRL) based predictive control scheme for reducing the energy consumption and cost of pumping systems in wastewater treatment plants (WWTP), in which pumps are operated in binary mode using on/off signals. As global energy consumption increases, the efficient operation of energy-intensive facilities has also become important. A WWTP in Busan, Republic of Korea, is used as the targ...

The time-mean circulation in the Agulhas region determined with the ensemble smoother

2007

Peter Jan Van Leeuwen

The time-mean circulation in the Agulhas Retroflection area is determined by combining TOPEX/POSEIDON data and a two-layer quasi-geostrophic model using the ensemble smoother. By taking the time-mean circulation as the unknown in the data assimilation procedure, the time-varying altimeter signal is used to constrain the time-mean field. The quasi-geostrophic model is applied as a strong constra...

A Successive Penalty-Based Asymptotic-Preserving Scheme for Kinetic Equations

Journal: SIAM J. Scientific Computing 2013

Bokai Yan, Shi Jin

We propose an asymptotic-preserving (AP) scheme for kinetic equations that is efficient also in the hydrodynamic regimes. This scheme is based on the BGK-penalty method introduced by Filbet-Jin [14], but uses the penalization successively to achieve the desired asymptotic property. This method possesses a stronger AP property than the original method of Filbet-Jin, with the additional feature o...

Reinforcement Learning: Model-free

2012

Chris R. Sims

Simply put, reinforcement learning (RL) is a term used to indicate a large family of different algorithms that all share two key properties. First, the objective of RL is to learn appropriate behavior through trial-and-error experience in a task. Second, in RL, the feedback available to the learning agent is restricted to a reward signal that indicates how well the agent is behaving, but does ...
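The two properties above (trial-and-error interaction, and a scalar reward as the only feedback) can be illustrated with tabular Q-learning, one common model-free algorithm. This is a minimal sketch, not taken from the cited chapter: the five-state corridor, the reward of +1 at the goal, and the hyperparameters (`alpha`, `gamma`, `epsilon`) are hypothetical choices. The agent never reads the transition rule; it only sees the next state and the reward returned by `step`.

```python
import random

# Hypothetical 1-D corridor: states 0..4, state 4 is the goal (illustrative only).
N_STATES = 5
ACTIONS = (-1, +1)  # move left, move right


def step(state, action):
    """Environment dynamics, hidden from the agent: it only sees (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward, next_state == N_STATES - 1


def q_learning(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                # Exploration: try a random action (trial and error).
                action = rng.choice(ACTIONS)
            else:
                # Exploitation: greedy action, ties broken at random.
                best = max(q[(state, a)] for a in ACTIONS)
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            next_state, reward, done = step(state, action)
            # Model-free update: uses only the observed reward and next state.
            target = reward if done else reward + gamma * max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q


q = q_learning()
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}
print(policy)
```

With random tie-breaking the untrained agent performs a random walk, so early episodes reach the goal by chance; as the reward propagates backward through the Q-table, the greedy policy should settle on moving right in every non-goal state.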

Nonlinear Lagrangian Functions and Applications to Semi-Infinite Programs

Journal: Annals OR 2001

X. Q. Yang, Kok Lay Teo

In this paper, a nonlinear penalty method via a nonlinear Lagrangian function is introduced for semi-infinite programs. A convergence result is established which shows that the sequence of optimal values of nonlinear penalty problems converges to that of semi-infinite programs. Moreover, a conceptual convergence result of a discretization method with an adaptive scheme for solving semi-infinite p...
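To make the penalty idea concrete, the sketch below applies a classical quadratic (exterior) penalty to a toy semi-infinite program after discretizing its index set. This is a swapped-in, simpler technique, not the nonlinear Lagrangian of Yang and Teo; the problem, the grid size, and the helper names (`grid`, `solve_penalty`) are hypothetical. The toy problem's optimum is f* = 1 at x* = 1, and the penalized optima should approach that value from below as `mu` grows, mirroring the kind of convergence result the abstract describes.

```python
# Hypothetical semi-infinite program (illustrative only, not from the paper):
#   minimize   f(x) = (x - 2)^2
#   subject to g(x, t) = t * x - 1 <= 0   for every t in T = [0.5, 1]
# The binding constraint is t = 1, so x* = 1 and f* = 1.


def f(x):
    return (x - 2.0) ** 2


def grid(n):
    """Discretize the index set T = [0.5, 1] into n evenly spaced points."""
    return [0.5 + 0.5 * j / (n - 1) for j in range(n)]


def penalized_derivative(x, mu, ts):
    """Derivative of f(x) + mu * sum_j max(0, t_j * x - 1)^2 with respect to x."""
    d = 2.0 * (x - 2.0)
    for t in ts:
        violation = t * x - 1.0
        if violation > 0:
            d += 2.0 * mu * violation * t
    return d


def solve_penalty(mu, ts, lo=-10.0, hi=10.0, iters=200):
    """The penalized objective is convex in x, so its derivative is nondecreasing;
    bisection on the derivative locates the minimizer."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if penalized_derivative(mid, mu, ts) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


ts = grid(51)
results = []
for mu in (1, 10, 100, 1000, 10000):
    x_mu = solve_penalty(mu, ts)
    value = f(x_mu) + mu * sum(max(0.0, t * x_mu - 1.0) ** 2 for t in ts)
    results.append((mu, x_mu, value))
    print(f"mu={mu:>6}  x_mu={x_mu:.5f}  penalized optimum={value:.5f}")
```

Bisection is used only because the toy problem is one-dimensional and convex; a realistic semi-infinite program would need a proper unconstrained solver at each penalty level.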

A lemons market? An incentive scheme to induce truth-telling in third party logistics providers

Journal: European Journal of Operational Research 2000

Wei Shi Lim

In this paper, we develop a game-theoretic model that studies the contract design problem of a third party logistics buyer when he is faced with a third party logistics provider and the quality of service and the cost of providing the service are private information to the latter. We apply the Revelation Principle to our analysis and characterise the optimal contract. We show that the contract ...

A Penalty Method for Rank Minimization Problems in Symmetric Matrices

2017

Xin Shen, John E. Mitchell

The problem of minimizing the rank of a symmetric positive semidefinite matrix subject to constraints can be cast equivalently as a semidefinite program with complementarity constraints (SDCMPCC). The formulation requires two positive semidefinite matrices to be complementary. We investigate calmness of locally optimal solutions to the SDCMPCC formulation and hence show that any locally optimal...

Dynamic Pricing and Logistics Service Decisions for Crowd Logistics Platforms with Social Delivery Capacity

Journal: Mathematical Problems in Engineering 2022

With the development of the sharing economy, more and more enterprises choose crowd logistics for distribution. Because the platform uses social freelancers, service quality is difficult to guarantee. Considering a reward-penalty mechanism, dynamic differential game models are constructed to study optimal pricing and service decisions under stochastic demand, based on control theory and the Pontryagin maximum principle. The numerical...

Efficient calculation of fitness function by calculating reward-penalty for a GA-based Network Intrusion Detection System

2015

H. M. Diwanji

Our network is facing a rapidly evolving threat landscape full of modern applications, exploits, malware and attack strategies that are capable of evading traditional methods of detection. Intrusion detection can perform the task of monitoring systems to detect any appearance of insecure states. To overcome the above-mentioned issues, we have employed a genetic algorithm to improve detectio...
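The snippet does not give the paper's fitness formula, so the following is only a minimal sketch of a reward-penalty fitness under stated assumptions. Each chromosome is a hypothetical detection rule over three categorical fields with `*` as a wildcard. A rule earns `REWARD` for every attack record it matches and loses `PENALTY` for every normal record it matches, and a small GA (truncation selection, one-point crossover, random mutation) searches for a high-scoring rule. The records, field values, and weights are invented for illustration.

```python
import random

# Hypothetical toy connection records: (protocol, service, flag) -> label.
# Values and weights are illustrative assumptions, not the paper's data.
RECORDS = [
    (("tcp", "http", "SF"), "normal"),
    (("tcp", "http", "S0"), "attack"),
    (("tcp", "ftp", "S0"), "attack"),
    (("udp", "dns", "SF"), "normal"),
    (("tcp", "smtp", "SF"), "normal"),
    (("icmp", "ecr_i", "SF"), "attack"),
]
REWARD = 1.0   # credit for each attack record the rule flags
PENALTY = 2.0  # cost for each normal record the rule flags (false alarm)
WILDCARD = "*"

# Allowed values per field: those seen in the records, plus the wildcard.
GENES = [sorted({feats[i] for feats, _ in RECORDS}) + [WILDCARD] for i in range(3)]


def matches(rule, features):
    return all(r == WILDCARD or r == v for r, v in zip(rule, features))


def fitness(rule):
    """Reward-penalty fitness: reward true positives, penalize false positives."""
    score = 0.0
    for features, label in RECORDS:
        if matches(rule, features):
            score += REWARD if label == "attack" else -PENALTY
    return score


def random_rule(rng):
    return tuple(rng.choice(g) for g in GENES)


def crossover(a, b, rng):
    cut = rng.randrange(1, len(a))  # one-point crossover
    return a[:cut] + b[cut:]


def mutate(rule, rng, rate=0.2):
    return tuple(rng.choice(GENES[i]) if rng.random() < rate else gene
                 for i, gene in enumerate(rule))


def evolve(pop_size=20, generations=30, seed=1):
    rng = random.Random(seed)
    population = [random_rule(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # truncation selection (keeps the elite)
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=fitness)


best = evolve()
print(best, fitness(best))
```

Weighting false alarms more heavily than detections (here 2 to 1) is one way to keep the GA from converging on over-general rules such as `('*', '*', '*')`, which matches every record and scores -3 on this toy data.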
