Search results for: reward penalty scheme
Number of results: 265788
Planning rapid movements to maximize gain in scenes with multiple regions carrying reward or penalty
In this paper, we propose a deep reinforcement learning (DRL) based predictive control scheme for reducing the energy consumption and cost of pumping systems in wastewater treatment plants (WWTP), whose pumps are operated in binary mode using on/off signals. As global increases, efficient operation of energy-intensive facilities has also become important. A WWTP in Busan, Republic of Korea is used as targ...
The time-mean circulation in the Agulhas Retroflection area is determined by combining TOPEX/POSEIDON data and a two-layer quasi-geostrophic model using the ensemble smoother. By taking the time-mean circulation as the unknown in the data assimilation procedure, the time-varying altimeter signal is used to constrain the time-mean field. The quasi-geostrophic model is applied as a strong constra...
We propose an asymptotic-preserving (AP) scheme for kinetic equations that is efficient also in the hydrodynamic regimes. This scheme is based on the BGK-penalty method introduced by Filbet-Jin [14], but uses the penalization successively to achieve the desired asymptotic property. This method possesses a stronger AP property than the original method of Filbet-Jin, with the additional feature o...
Simply put, reinforcement learning (RL) is a term used to indicate a large family of different algorithms that all share two key properties. First, the objective of RL is to learn appropriate behavior through trial-and-error experience in a task. Second, in RL, the feedback available to the learning agent is restricted to a reward signal that indicates how well the agent is behaving, but does ...
In this paper a nonlinear penalty method via a nonlinear Lagrangian function is introduced for semi-infinite programs. A convergence result is established which shows that the sequence of optimal values of nonlinear penalty problems converges to that of semi-infinite programs. Moreover a conceptual convergence result of a discretization method with an adaptive scheme for solving semi-infinite p...
In this paper, we develop a game-theoretic model that studies the contract design problem of a third party logistics buyer when he is faced with a third party logistics provider and the quality of service and the cost of providing the service are private information to the latter. We apply the Revelation Principle to our analysis and characterise the optimal contract. We show that the contract ...
The problem of minimizing the rank of a symmetric positive semidefinite matrix subject to constraints can be cast equivalently as a semidefinite program with complementarity constraints (SDCMPCC). The formulation requires two positive semidefinite matrices to be complementary. We investigate calmness of locally optimal solutions to the SDCMPCC formulation and hence show that any locally optimal...
With the development of the sharing economy, more and more enterprises choose crowd logistics for distribution. Because the platform uses social freelancers, service quality is difficult to guarantee. Considering a reward-penalty mechanism, dynamic differential game models are constructed to study the optimal pricing of services under stochastic demand, based on control theory and the Pontryagin maximum principle. The numerical...
Our network is facing a rapidly evolving threat landscape full of modern applications, exploits, malware and attack strategies that are capable of avoiding traditional methods of detection. Intrusion detection can perform the task of monitoring system usability to detect any appearance of insecure states. To overcome the above-mentioned issues, we have employed a genetic algorithm to improve detectio...