Search results for: atari

Number of results: 829

2005
Markus Levy

Acorn – the Beginning. The year was 1979. Atari introduced a coin-operated version of Asteroids. The programming language Ada was born. 3Com, Oracle, and Seagate were founded. TI entered the computer market. Hayes marketed its first modem, which became the industry standard for modems. The Motorola 68K and Intel 8088 were released. And Hermann Hauser and Chris Curry, with the support of a group ...

Journal: CoRR 2015
Ivan Sorokin, Alexey Seleznev, Mikhail Pavlov, Aleksandr Fedorov, Anastasiia Ignateva

A deep learning approach to reinforcement learning led to a general learner able to train on visual input to play a variety of arcade games at human and superhuman levels. Its creators at Google DeepMind called the approach Deep Q-Network (DQN). We present an extension of DQN with “soft” and “hard” attention mechanisms. Tests of the proposed Deep Attention Recurrent Q-Network (DAR...

2014
Karol Gregor, Ivo Danihelka, Andriy Mnih, Charles Blundell, Daan Wierstra

We introduce a deep, generative autoencoder capable of learning hierarchies of distributed representations from data. Successive deep stochastic hidden layers are equipped with autoregressive connections, which enable the model to be sampled from quickly and exactly via ancestral sampling. We derive an efficient approximate parameter estimation method based on the minimum description length (MD...

Journal: CoRR 2017
Kavosh Asadi, Cameron Allen, Melrose Roderick, Abdel-rahman Mohamed, George Konidaris, Michael L. Littman

We propose a new algorithm, Mean Actor-Critic (MAC), for discrete-action continuous-state reinforcement learning. MAC is a policy gradient algorithm that uses the agent’s explicit representation of all action values to estimate the gradient of the policy, rather than using only the actions that were actually executed. This significantly reduces variance in the gradient updates and removes the n...
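As a rough illustration of the idea in the abstract above (not the authors' implementation), the sketch below contrasts a gradient estimate that sums over all action values with one that uses only the executed action. It assumes a single state and a softmax policy over logits; the function names and example numbers are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def all_action_gradient(logits, q_values):
    """Gradient of sum_a pi(a) * Q(a) with respect to the softmax logits.

    Uses every action value, so no action needs to be sampled.
    """
    pi = softmax(logits)
    expected_q = sum(p * q for p, q in zip(pi, q_values))
    return [p * (q - expected_q) for p, q in zip(pi, q_values)]


def sampled_action_gradient(logits, q_values, action):
    """Single-sample estimate Q(a) * grad log pi(a) for the executed action only."""
    pi = softmax(logits)
    return [
        q_values[action] * ((1.0 if k == action else 0.0) - p)
        for k, p in enumerate(pi)
    ]


if __name__ == "__main__":
    logits = [0.5, -1.0, 2.0]    # assumed policy parameters for one state
    q_values = [1.0, 3.0, -2.0]  # assumed action-value estimates
    print("all-action gradient:", all_action_gradient(logits, q_values))
    for a in range(len(logits)):
        print("sampled, action", a, ":", sampled_action_gradient(logits, q_values, a))
```

Weighting the sampled estimates by the policy probabilities recovers the all-action gradient, so in this setting the two agree in expectation while the all-action form carries no sampling noise over actions.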

2015
Tristan Cazenave, Jialin Liu, Olivier Teytaud

Recently, a methodology has been proposed for boosting the computational intelligence of randomized game-playing programs. We modify this methodology by working on rectangular, rather than square, matrices, and we apply it to the game of Domineering. At CIG 2015, we propose a demo in the case of Go. Hence, players on site can contribute to the scientific validation by playing (in a double blind man...

2016
Ian Osband, Charles Blundell, Alexander Pritzel, Benjamin Van Roy

Efficient exploration in complex environments remains a major challenge for reinforcement learning. We propose bootstrapped DQN, a simple algorithm that explores in a computationally and statistically efficient manner through use of randomized value functions. Unlike dithering strategies such as ε-greedy exploration, bootstrapped DQN carries out temporally-extended (or deep) exploration; this ca...

2017
Aimore R. R. Dutra, Artur S. d'Avila Garcez

Deep Reinforcement Learning (DRL) has had several breakthroughs, from helicopter control and Atari games to the success of AlphaGo. Despite these successes, DRL still lacks several important features of human intelligence, such as transfer learning, planning and interpretability. We compare two DRL approaches at learning and generalization: Deep Q-Networks and Deep Symbolic Reinforcement Learni...

2016
Hado P. van Hasselt, Arthur Guez, Matteo Hessel, Volodymyr Mnih, David Silver

Most learning algorithms are not invariant to the scale of the signal that is being approximated. We propose to adaptively normalize the targets used in the learning updates. This is important in value-based reinforcement learning, where the magnitude of appropriate value approximations can change over time when we update the policy of behavior. Our main motivation is prior work on learning to ...
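To make the notion of adaptively normalizing targets concrete, here is a minimal sketch that tracks running statistics of the targets and rescales them before a learning update. It is not the method from the paper above: the exponential-moving-average scheme, the class name `RunningTargetNormalizer`, and its parameters are assumptions chosen for illustration.

```python
import math


class RunningTargetNormalizer:
    """Tracks a running mean and second moment of targets (hypothetical helper)."""

    def __init__(self, step_size=0.01, min_std=1e-4):
        self.step_size = step_size  # assumed update rate for the running statistics
        self.min_std = min_std      # lower bound to avoid division by zero
        self.mean = 0.0
        self.second_moment = 1.0

    def update(self, target):
        """Move the running statistics a small step toward the new target."""
        b = self.step_size
        self.mean = (1.0 - b) * self.mean + b * target
        self.second_moment = (1.0 - b) * self.second_moment + b * target * target

    @property
    def std(self):
        variance = self.second_moment - self.mean ** 2
        return max(math.sqrt(max(variance, 0.0)), self.min_std)

    def normalize(self, target):
        """Map a raw target to the normalized scale used for the update."""
        return (target - self.mean) / self.std

    def denormalize(self, value):
        """Map a normalized prediction back to the raw target scale."""
        return value * self.std + self.mean


if __name__ == "__main__":
    normalizer = RunningTargetNormalizer()
    for step in range(5000):
        normalizer.update(90.0 if step % 2 == 0 else 110.0)
    print("mean:", normalizer.mean, "std:", normalizer.std)
    print("normalized 110:", normalizer.normalize(110.0))
```

A learner would regress onto `normalize(target)` and call `denormalize` when a value on the original scale is needed; how the approximator's outputs are kept consistent as the statistics shift is left out of this sketch.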

2015
John Schulman, Sergey Levine, Pieter Abbeel, Michael I. Jordan, Philipp Moritz

We describe an iterative procedure for optimizing policies, with guaranteed monotonic improvement. By making several approximations to the theoretically-justified procedure, we develop a practical algorithm, called Trust Region Policy Optimization (TRPO). This algorithm is similar to natural policy gradient methods and is effective for optimizing large nonlinear policies such as neural networks...

Journal: The Astrophysical Journal 2022

We have developed a chemo-dynamical approach to assign 36,010 metal-poor SkyMapper stars to various Galactic stellar populations. Using two independent techniques (velocity and action space behavior), Gaia EDR3 astrometry, and photometric metallicities, we selected stars with the characteristics of the "metal-weak" thick disk population by minimizing contamination from canonical or other structures. This sample co...
