Search results for: atari
Number of results: 829
Acorn – the Beginning The year was 1979. Atari introduced a coin-operated version of Asteroids. The programming language Ada was born. 3Com, Oracle, and Seagate were founded. TI entered the computer market. Hayes marketed its first modem, which became the industry standard. The Motorola 68K and Intel 8088 were released. And Hermann Hauser and Chris Curry, with the support of a group ...
A deep learning approach to reinforcement learning led to a general learner able to train on visual input to play a variety of arcade games at human and superhuman levels. Its creators at Google DeepMind called the approach Deep Q-Network (DQN). We present an extension of DQN with "soft" and "hard" attention mechanisms. Tests of the proposed Deep Attention Recurrent Q-Network (DARQN) ...
We introduce a deep, generative autoencoder capable of learning hierarchies of distributed representations from data. Successive deep stochastic hidden layers are equipped with autoregressive connections, which enable the model to be sampled from quickly and exactly via ancestral sampling. We derive an efficient approximate parameter estimation method based on the minimum description length (MDL) ...
We propose a new algorithm, Mean Actor-Critic (MAC), for discrete-action continuous-state reinforcement learning. MAC is a policy gradient algorithm that uses the agent’s explicit representation of all action values to estimate the gradient of the policy, rather than using only the actions that were actually executed. This significantly reduces variance in the gradient updates and removes the n...
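The all-action gradient estimator this snippet describes can be sketched for a softmax policy at a single state (a minimal illustration under that assumption, not the authors' code; `theta` and `q_values` are hypothetical names):

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # stabilize the exponentials
    e = np.exp(z)
    return e / e.sum()

def mac_gradient(theta, q_values):
    """Policy-gradient estimate that averages over ALL actions,
    weighted by the policy, instead of using only the sampled action:
        grad = sum_a pi(a) * grad log pi(a) * Q(s, a)
    For softmax logits, grad log pi(a) = e_a - pi, so the sum
    collapses to the closed form below."""
    pi = softmax(theta)
    baseline = pi @ q_values            # E_pi[Q(s, .)]
    return pi * (q_values - baseline)   # no sampling over actions needed
```

Because every action contributes through its probability, no action needs to be sampled at a visited state, which is the source of the variance reduction the abstract refers to.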
Recently, a methodology has been proposed for boosting the computational intelligence of randomized game-playing programs. We modify this methodology by working on rectangular, rather than square, matrices, and we apply it to the game of Domineering. At CIG 2015, we propose a demo in the case of Go. Hence, players on site can contribute to the scientific validation by playing (in a double-blind man...
Efficient exploration in complex environments remains a major challenge for reinforcement learning. We propose bootstrapped DQN, a simple algorithm that explores in a computationally and statistically efficient manner through the use of randomized value functions. Unlike dithering strategies such as ε-greedy exploration, bootstrapped DQN carries out temporally-extended (or deep) exploration; this ca...
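The exploration mechanism this snippet describes can be illustrated in tabular form (a sketch under the assumption of a small discrete MDP; the paper trains K bootstrapped heads on a shared deep network, not independent Q-tables, and all names here are illustrative):

```python
import numpy as np

class BootstrappedQ:
    """Tabular stand-in for bootstrapped DQN's K heads: one head is
    sampled per episode and followed greedily, giving temporally-extended
    exploration instead of per-step dithering."""

    def __init__(self, n_states, n_actions, k=10, lr=0.1, gamma=0.99, seed=0):
        rng = np.random.default_rng(seed)
        # random initialization diversifies the heads
        self.q = rng.normal(size=(k, n_states, n_actions))
        self.k, self.lr, self.gamma = k, lr, gamma
        self.head = 0

    def start_episode(self, rng):
        # commit to one randomly drawn head for the whole episode
        self.head = int(rng.integers(self.k))

    def act(self, s):
        # act greedily with respect to the episode's head
        return int(np.argmax(self.q[self.head, s]))

    def update(self, s, a, r, s2, mask):
        # each head trains only on its own bootstrapped subsample (mask)
        for h in range(self.k):
            if mask[h]:
                target = r + self.gamma * self.q[h, s2].max()
                self.q[h, s, a] += self.lr * (target - self.q[h, s, a])
```

The per-episode commitment to a single head is what makes the exploration "deep": a head that is optimistic about a distant state keeps steering the agent toward it for the whole episode.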
Deep Reinforcement Learning (DRL) has had several breakthroughs, from helicopter control and Atari games to the AlphaGo success. Despite these successes, DRL still lacks several important features of human intelligence, such as transfer learning, planning, and interpretability. We compare two DRL approaches at learning and generalization: Deep Q-Networks and Deep Symbolic Reinforcement Learni...
Most learning algorithms are not invariant to the scale of the signal that is being approximated. We propose to adaptively normalize the targets used in the learning updates. This is important in value-based reinforcement learning, where the magnitude of appropriate value approximations can change over time when we update the policy of behavior. Our main motivation is prior work on learning to ...
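The adaptive normalization this snippet proposes can be sketched with running first and second moments of the targets (illustrative only; the paper's full method additionally rewrites the output layer so the network's unnormalized predictions are preserved when the statistics change):

```python
import numpy as np

class TargetNormalizer:
    """Adaptively rescale regression targets toward zero mean / unit
    variance using exponentially weighted running statistics, so the
    learner always sees targets of roughly constant magnitude."""

    def __init__(self, beta=0.01):
        self.mean, self.sq_mean, self.beta = 0.0, 1.0, beta

    def update(self, target):
        # track running first and second moments of the raw targets
        self.mean += self.beta * (target - self.mean)
        self.sq_mean += self.beta * (target ** 2 - self.sq_mean)

    @property
    def std(self):
        # clamp the variance estimate to keep division well-defined
        var = max(self.sq_mean - self.mean ** 2, 1e-12)
        return np.sqrt(var)

    def normalize(self, target):
        return (target - self.mean) / self.std

    def denormalize(self, y):
        return y * self.std + self.mean
```

Normalized targets keep update magnitudes stable even when the behavior policy changes and the scale of the true values drifts over time, which is the setting the abstract describes.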
We describe an iterative procedure for optimizing policies, with guaranteed monotonic improvement. By making several approximations to the theoretically-justified procedure, we develop a practical algorithm, called Trust Region Policy Optimization (TRPO). This algorithm is similar to natural policy gradient methods and is effective for optimizing large nonlinear policies such as neural networks...
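For a single state with a discrete action set, the surrogate objective and the trust-region constraint behind TRPO reduce to the two sums below (a toy sketch; names are illustrative, and the real algorithm maximizes the surrogate subject to a mean-KL bound across states using a conjugate-gradient step):

```python
import numpy as np

def surrogate_and_kl(pi_old, pi_new, advantages):
    """TRPO quantities for one state with discrete action probabilities.

    pi_old, pi_new : action distributions before/after the update
    advantages     : advantage estimates A(s, a) for each action
    """
    ratio = pi_new / pi_old                           # importance weights
    surrogate = np.sum(pi_old * ratio * advantages)   # E_old[ratio * A]
    kl = np.sum(pi_old * np.log(pi_old / pi_new))     # KL(pi_old || pi_new)
    return surrogate, kl
```

Bounding the KL term is what yields the monotonic-improvement guarantee the abstract mentions: the surrogate only approximates the true objective near `pi_old`, so steps are restricted to a region where that approximation holds.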
We have developed a chemo-dynamical approach to assign 36,010 metal-poor SkyMapper stars to various Galactic stellar populations. Using two independent techniques (velocity and action-space behavior), $Gaia$ EDR3 astrometry, and photometric metallicities, we selected stars with the characteristics of the "metal-weak" thick disk population by minimizing contamination from the canonical thick disk or other structures. This sample co...