Foresee then Evaluate: Decomposing Value Estimation with Latent Future Prediction

Abstract

Value function is the central notion of Reinforcement Learning (RL). Value estimation, especially with function approximation, can be challenging since it involves the stochasticity of environmental dynamics and reward signals that can be sparse and delayed in some cases. A typical model-free RL algorithm usually estimates the values of a policy by Temporal Difference (TD) or Monte Carlo (MC) algorithms directly from rewards, without explicitly taking dynamics into consideration. In this paper, we propose Value Decomposition with Future Prediction (VDFP), providing an explicit two-step understanding of the value estimation process: 1) first foresee the latent future, 2) then evaluate it. We analytically decompose the value function into a latent future dynamics part and a policy-independent trajectory return part, inducing a way to model latent dynamics and returns separately in value estimation. Further, we derive a practical deep RL algorithm, consisting of a convolutional model to learn compact trajectory representation from past experiences, a conditional variational auto-encoder to predict the latent future dynamics, and a convex return model that evaluates trajectory representation. In experiments, we empirically demonstrate the effectiveness of our approach for both off-policy and on-policy RL in several OpenAI Gym continuous control tasks as well as a few challenging variants with delayed reward.
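The idea of a policy-independent trajectory return part can be illustrated with a toy sketch. It is not the VDFP algorithm from the paper. It assumes, purely for illustration, that each step's reward is linear in a hand-made feature vector, so the discounted return is a linear function of a discounted feature sum that stands in for a learned trajectory representation. All names (`trajectory_representation`, `linear_return_model`, the weights `w`) are hypothetical.

```python
# Toy illustration only: names and setup are hypothetical, not taken from the paper.

def discounted_return(rewards, gamma):
    """Monte Carlo return: sum over t of gamma^t * r_t, accumulated backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

def trajectory_representation(features, gamma):
    """Discounted sum of per-step feature vectors.

    A hand-made stand-in for a learned compact trajectory representation.
    """
    m = [0.0] * len(features[0])
    for phi in reversed(features):
        m = [p + gamma * x for p, x in zip(phi, m)]
    return m

def linear_return_model(m, w):
    """Evaluator that maps a representation to a return; it does not depend on the policy."""
    return sum(wi * mi for wi, mi in zip(w, m))

gamma = 0.9
w = [1.0, -0.5]  # assumed reward weights, so that r_t = w . phi_t
features = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]  # one short trajectory
rewards = [linear_return_model(phi, w) for phi in features]

# Step-by-step return computed directly from rewards.
direct = discounted_return(rewards, gamma)

# Two-step view: build the trajectory representation, then evaluate it.
m = trajectory_representation(features, gamma)
decomposed = linear_return_model(m, w)

print(direct, decomposed)
```

Under this linearity assumption the two numbers agree, which is why the evaluator can be learned once and reused while only the prediction of the future representation depends on the policy. The abstract describes a convex return model and a conditional variational auto-encoder for the prediction step; neither is implemented here.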

Similar resources

Decomposing Parameter Estimation Problems

We propose a technique for decomposing the parameter learning problem in Bayesian networks into independent learning problems. Our technique applies to incomplete datasets and exploits variables that are either hidden or observed in the given dataset. We show empirically that the proposed technique can lead to orders-of-magnitude savings in learning time. We explain, analytically and empiricall...

Latent Attention For If-Then Program Synthesis

Automatic translation from natural language descriptions into programs is a longstanding challenging problem. In this work, we consider a simple yet important sub-problem: translation from textual descriptions to If-Then programs. We devise a novel neural network architecture for this task which we train end-to-end. Specifically, we introduce Latent Attention, which computes multiplicative weigh...

Analysing value substitution and confidence estimation for value prediction

Value Prediction is one of the newest techniques used to break down ILP limits. Despite being under continuous study during the last few years, a few aspects related to this emerging technique remain unanalysed in depth. Exhaustively investigated in the context of control speculation, confidence estimation has usually played a secondary role on value prediction and speculation. Closely linked t...

Prediction Outcome History-Based Confidence Estimation for Load Value Prediction

Load instructions occasionally incur very long latencies that can significantly affect system performance. Load value prediction alleviates this problem by allowing the CPU to speculatively continue processing without having to wait for the slow memory access to complete. Current load value predictors can only correctly predict about forty to seventy percent of the fetched load values. To avoid...

Projects under Foresee Uncertainty

Uncertainty appears as a significant barrier to projects attaining their intended performance goals; thereby contributing to project failure. Literature on project management under uncertainty has recommended a contingency however based on the premise that the level of uncertainty is static over project duration. We relaxed the assumption by considering variation in the level of uncertainty wit...

Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2021

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v35i11.17182