Model Selection in Reinforcement Learning with General Function Approximations

Authors

Abstract

We consider model selection for classic Reinforcement Learning (RL) environments – Multi-Armed Bandits (MABs) and Markov Decision Processes (MDPs) – under general function approximations. In the model selection framework, we do not know the function classes, denoted by $$\mathcal{F}$$ and $$\mathcal{M}$$, where the true models – the reward generating function for MABs and the transition kernel for MDPs – lie, respectively. Instead, we are given M nested (hypothesis) classes such that the true model is contained in at least one such class. In this paper, we propose and analyze efficient model selection algorithms for MABs and MDPs that adapt to the smallest class (among the nested M classes) containing the true underlying model. Under a separability assumption on the nested hypothesis classes, we show that the cumulative regret of our adaptive algorithms matches that of an oracle which knows the correct function classes (i.e., $$\mathcal{F}^*$$ and $$\mathcal{M}^*$$) a priori. Furthermore, in both settings, the cost of model selection is an additive term in the regret with weak (logarithmic) dependence on the learning horizon T.
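The paper's algorithms are in the full text; as a loose illustrative sketch of the nested-class idea (our construction, not the authors' method), the loop below runs a bandit over polynomial reward classes of increasing degree and escalates to the next class when a goodness-of-fit test suggests the current class is misspecified. The arm features, thresholds, and test schedule are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup (assumed, not from the paper): arms have scalar features,
# and the nested hypothesis classes F_1 < F_2 < ... are polynomial reward
# models of increasing degree. The true reward is quadratic, so the smallest
# class containing it is the degree-2 class.
arm_features = np.linspace(-1.0, 1.0, 20)
def true_reward(x):
    return 0.3 - 0.8 * x + 1.2 * x**2   # lies in the degree-2 class

def fit_class(degree, xs, ys):
    """Least-squares fit within the class of degree-`degree` polynomials."""
    coefs = np.polyfit(xs, ys, degree)
    mse = np.mean((ys - np.polyval(coefs, xs)) ** 2)
    return coefs, mse

T = 5000
degree = 0            # start in the smallest hypothesis class
test_gap = 0.05       # misspecification threshold (stand-in for a separability margin)
xs, ys = [], []
total_reward = 0.0

for t in range(T):
    if t < 50 or t % 100 == 0:   # occasional forced exploration
        arm = rng.integers(len(arm_features))
    else:
        coefs, _ = fit_class(degree, np.array(xs), np.array(ys))
        arm = int(np.argmax(np.polyval(coefs, arm_features)))  # greedy w.r.t. current model
    x = arm_features[arm]
    r = true_reward(x) + 0.1 * rng.standard_normal()
    xs.append(x); ys.append(r); total_reward += r

    # Model-selection test: if the current class fits markedly worse than the
    # next larger one, the true model is likely outside it, so escalate.
    if t >= 100 and t % 100 == 0:
        _, err_cur = fit_class(degree, np.array(xs), np.array(ys))
        _, err_next = fit_class(degree + 1, np.array(xs), np.array(ys))
        if err_cur - err_next > test_gap:
            degree += 1

print(f"selected class: degree {degree}, average reward {total_reward / T:.3f}")
```

On data generated by the quadratic reward above, the test escalates from degree 0 through degree 2 and then stops, illustrating adaptation to the smallest class containing the true model.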


Similar Articles

Abstraction Selection in Model-based Reinforcement Learning

Abstraction Selection in Model-Based Reinforcement Learning. Nan Jiang, Alex Kulesza, Satinder Singh. Computer Science & Engineering, University of Michigan.


Convergence of Reinforcement Learning with General Function Approximators

A key open problem in reinforcement learning is to assure convergence when using a compact hypothesis class to approximate the value function. Although the standard temporal-difference learning algorithm has been shown to converge when the hypothesis class is a linear combination of fixed basis functions, it may diverge with a general (nonlinear) hypothesis class. This paper describes the Bridg...
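A minimal sketch of the convergent case mentioned above (our construction; the paper's Bridge algorithm is not shown here): TD(0) policy evaluation on a 5-state random-walk Markov reward process, with the value function approximated as a linear combination of fixed basis functions, which is the setting in which temporal-difference learning is known to converge.

```python
import numpy as np

rng = np.random.default_rng(1)
n_states = 5
features = np.eye(n_states)   # one-hot basis: a linear hypothesis class
w = np.zeros(n_states)        # weights of the linear approximator
gamma, alpha = 0.9, 0.05

def step(s):
    """Random walk with reflecting ends; reward +1 on reaching the right end."""
    s2 = min(max(s + (1 if rng.random() < 0.5 else -1), 0), n_states - 1)
    return s2, 1.0 if s2 == n_states - 1 else 0.0

s = 2
for _ in range(100_000):
    s2, r = step(s)
    td_error = r + gamma * (features[s2] @ w) - (features[s] @ w)
    w += alpha * td_error * features[s]   # TD(0) semi-gradient update
    s = s2

print("estimated state values:", np.round(features @ w, 2))
```

With a nonlinear approximator in place of the fixed linear basis, the same update can diverge, which is the failure mode the paper addresses.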


PAC-Bayesian Model Selection for Reinforcement Learning

This paper introduces the first set of PAC-Bayesian bounds for the batch reinforcement learning problem in finite state spaces. These bounds hold regardless of the correctness of the prior distribution. We demonstrate how such bounds can be used for model-selection in control problems where prior information is available either on the dynamics of the environment, or on the value of actions. Our...
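For context (a standard supervised-learning statement of the Seeger/Maurer form, not the paper's RL bounds): with probability at least $$1-\delta$$ over an i.i.d. sample of size n, simultaneously for every posterior Q over hypotheses, $$\mathrm{kl}\big(\hat{R}_n(Q)\,\|\,R(Q)\big) \le \frac{\mathrm{KL}(Q\,\|\,P) + \ln(2\sqrt{n}/\delta)}{n}$$, where P is any prior fixed before seeing the data. The bound remains valid even when P is badly chosen, which is the sense in which PAC-Bayesian bounds hold regardless of the correctness of the prior.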


Value-Aware Loss Function for Model Learning in Reinforcement Learning

We consider the problem of estimating the transition probability kernel to be used by a model-based reinforcement learning (RL) algorithm. We argue that estimating a generative model that minimizes a probabilistic loss, such as the log-loss, might be an overkill because such a probabilistic loss does not take into account the underlying structure of the decision problem and the RL algorithm tha...
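As a rough paraphrase of the contrast being drawn (our wording, not a quote from the paper): rather than the log-loss, a value-aware loss scores a candidate kernel $$\hat{P}$$ by the worst-case error it induces in Bellman backups over a value-function class $$\mathcal{V}$$, e.g. $$c(\hat{P},P)(s,a) = \sup_{V \in \mathcal{V}} \Big| \int \big(P(ds' \mid s,a) - \hat{P}(ds' \mid s,a)\big) V(s') \Big|$$, so that model errors with no effect on value estimates are not penalized.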


Nonparametric General Reinforcement Learning

Reinforcement learning problems are often phrased in terms of Markov decision processes (MDPs). In this thesis we go beyond MDPs and consider reinforcement learning in environments that are non-Markovian, non-ergodic and only partially observable. Our focus is not on practical algorithms, but rather on the fundamental underlying problems: How do we balance exploration and exploitation? How do w...



Journal

Journal title: Lecture Notes in Computer Science

Year: 2023

ISSN: 1611-3349, 0302-9743

DOI: https://doi.org/10.1007/978-3-031-26412-2_10