Particle Value Functions
Abstract
The policy gradients of the expected return objective can react slowly to rare rewards. Yet, in some cases agents may wish to emphasize the low or high returns regardless of their probability. Borrowing from the economics and control literature, we review the risk-sensitive value function that arises from an exponential utility and illustrate its effects on an example. This risk-sensitive value function is not always applicable to reinforcement learning problems, so we introduce the particle value function defined by a particle filter over the distributions of an agent’s experience, which bounds the risk-sensitive one. We illustrate the benefit of the policy gradients of this objective in Cliffworld.
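As a rough illustration of the exponential-utility objective mentioned in the abstract, the sketch below evaluates the standard form (1/beta) * log E[exp(beta * R)] on a toy return distribution with one rare, large reward. The function name `risk_sensitive_value`, the toy distribution, and the chosen `beta` values are assumptions made for this example and are not taken from the paper. The particle value function itself is not implemented here.

```python
import math

def risk_sensitive_value(returns, probs, beta):
    """Exponential-utility value (1/beta) * log E[exp(beta * R)].

    `returns` and `probs` describe a discrete return distribution
    (a hypothetical toy setup). beta == 0 is treated as the limit,
    which is the ordinary expected return.
    """
    if beta == 0.0:
        return sum(p * r for p, r in zip(probs, returns))
    # Work in log space (log-sum-exp) to avoid overflow for large |beta * R|.
    terms = [math.log(p) + beta * r for p, r in zip(probs, returns) if p > 0]
    m = max(terms)
    log_mean_exp = m + math.log(sum(math.exp(t - m) for t in terms))
    return log_mean_exp / beta

# Toy example: a reward of 10 that occurs with probability 0.01.
returns = [0.0, 10.0]
probs = [0.99, 0.01]

expected = risk_sensitive_value(returns, probs, 0.0)   # plain expected return
seeking = risk_sensitive_value(returns, probs, 1.0)    # beta > 0 emphasizes high returns
averse = risk_sensitive_value(returns, probs, -1.0)    # beta < 0 emphasizes low returns

print(f"expected={expected:.4f} seeking={seeking:.4f} averse={averse:.4f}")
```

With a positive `beta` the rare high return dominates the value, while with a negative `beta` it is nearly ignored, which is the sense in which this objective can emphasize low or high returns regardless of their probability.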
Similar resources
Particle swarm optimization for a bi-objective web-based convergent product networks
Here, a collection of base functions and sub-functions configure the nodes of a web-based (digital) network representing functionalities. Each arc in the network is to be assigned as the link between two nodes. The aim is to find an optimal tree of functionalities in the network adding value to the product in the web environment. First, a purification process is performed in the product network ...
Optimum allocation of Iranian oil and gas resources using multi-objective linear programming and particle swarm optimization in resistive economy conditions
This research presents a model for optimal allocation of Iranian oil and gas resources in sanction condition based on stochastic linear multi-objective programming. The general policies of the resistive economy include expanding exports of gas, electricity, petrochemical and petroleum products, expanding the strategic oil and gas reserves, increasing added value through completing the petroleum...
An Interactive Fuzzy Satisfying Method Based on Particle Swarm Optimization for Multi-Objective Function in Reactive Power Market
Reactive power plays an important role in supporting real power transmission, maintaining system voltages within proper limits and overall system reliability. In this paper, the production cost of reactive power, cost of the system transmission loss, investment cost of capacitor banks and absolute value of total voltage deviation (TVD) are included into the objective function of the power flow ...
A self-guided Particle Swarm Optimization with Independent Dynamic Inertia Weights Setting on Each Particle
In the standard PSO algorithm, each particle in swarm has the same inertia weight settings and its values decrease from generation to generation, which can induce the decreasing of population diversity. As a result, it may fall into the local optimum. Besides, the decreasing of weights values is restricted by the maximum evolutionary generation, which has an influence on the convergence speed a...
Reproducing Polynomial(Singularity) Particle Methods and Adaptive Meshless Methods for 2-Dim Elliptic Boundary Value Problems
Oh et al ([25]) introduced the reproducing polynomial particle (RPP) shape functions that are piecewise polynomial and satisfy the Kronecker delta property. In this paper, we introduce RPPM (Reproducing Polynomial Particle Methods) that is the Galerkin approximation method associated with the use of the RPP approximation space. Planting particles in the computation domain in a patchwise uniform...
Journal: CoRR
Volume: abs/1703.05820
Publication date: 2017