Search results for: atari

Number of results: 829

Journal: CoRR 2018
Jiaming Song Yuhuai Wu

Deep reinforcement learning methods have shown tremendous success in a large variety of tasks, such as Go [Silver et al., 2016], Atari [Mnih et al., 2013], and continuous control [Lillicrap et al., 2015, Schulman et al., 2015]. Policy gradient methods [Williams, 1992] are an important family of methods in model-free reinforcement learning, and the current state-of-the-art policy gradient methods ar...

2016
Yunshu Du Gabriel V. de la Cruz James Irwin Matthew E. Taylor

As one of the first successful models that combine reinforcement learning techniques with deep neural networks, the Deep Q-network (DQN) algorithm has gained attention as it bridges the gap between high-dimensional sensor inputs and autonomous agent learning. However, one main drawback of DQN is the long training time required to train a single task. This work aims to leverage transfer learning...

Journal: CoRR 2016
William F. Whitney Michael Chang Tejas D. Kulkarni Joshua B. Tenenbaum

We introduce a neural network architecture and a learning algorithm to produce factorized symbolic representations. We propose to learn these concepts by observing consecutive frames, letting all the components of the hidden representation except a small discrete set (gating units) be predicted from the previous frame, and letting the factors of variation in the next frame be represented entirely b...

Journal: CoRR 2016
Andrei A. Rusu Neil C. Rabinowitz Guillaume Desjardins Hubert Soyer James Kirkpatrick Koray Kavukcuoglu Razvan Pascanu Raia Hadsell

Learning to solve complex sequences of tasks—while both leveraging transfer and avoiding catastrophic forgetting—remains a key obstacle to achieving human-level intelligence. The progressive networks approach represents a step forward in this direction: they are immune to forgetting and can leverage prior knowledge via lateral connections to previously learned features. We evaluate this archite...

2011
Lee Smolin

LEE SMOLIN, a theoretical physicist, is concerned with quantum gravity, "the name we give to the theory that unifies all the physics now under construction." More specifically, he is a co-inventor of an approach called loop quantum gravity. In 2001, he became a founding member and research physicist of the Perimeter Institute for Theoretical Physics, in Waterloo, Ontario. Smolin is the author of...

2015
Junhyuk Oh Xiaoxiao Guo Honglak Lee Richard Lewis Satinder Singh

The network architectures of the proposed models and the baselines are illustrated in Figure 1. The weight of LSTM is initialized from a uniform distribution of [−0.08, 0.08]. The weight of the fully-connected layer from the encoded feature to the factored layer and from the action to the factored layer are initialized from a uniform distribution of [−1, 1] and [−0.1, 0.1] respectively.
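The initialization ranges quoted in this snippet can be illustrated with a minimal sketch. It uses only the Python standard library, and the layer sizes (`ENC_DIM`, `ACT_DIM`, `FACTOR_DIM`, `LSTM_DIM`) and the helper `uniform_matrix` are hypothetical, since the snippet does not state the dimensions or the framework used; only the three uniform ranges come from the text.

```python
import random


def uniform_matrix(rows, cols, low, high, rng):
    """Return a rows x cols matrix (list of lists) with entries drawn from U[low, high]."""
    return [[rng.uniform(low, high) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)  # fixed seed, only so the sketch is reproducible

# Hypothetical layer sizes; the snippet does not give them.
ENC_DIM, ACT_DIM, FACTOR_DIM, LSTM_DIM = 8, 4, 6, 8

# LSTM weights: uniform in [-0.08, 0.08].
lstm_w = uniform_matrix(LSTM_DIM, LSTM_DIM, -0.08, 0.08, rng)

# Encoded feature -> factored layer: uniform in [-1, 1].
enc_to_factor_w = uniform_matrix(FACTOR_DIM, ENC_DIM, -1.0, 1.0, rng)

# Action -> factored layer: uniform in [-0.1, 0.1].
act_to_factor_w = uniform_matrix(FACTOR_DIM, ACT_DIM, -0.1, 0.1, rng)
```

In a deep learning framework the same ranges would be passed to that framework's uniform initializer; the plain-list version here only shows which range applies to which weight matrix.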

1999
J. J. Blanco-Pillado R. A. Vázquez E. Zas

We discuss recent models in which neutrinos, which are assumed to have mass in the eV range, originate the highest energy cosmic rays by interaction with the enhanced density in the galactic halo of the relic cosmic neutrino background. We make an analytical calculation of the required neutrino fluxes

Journal: IJTM 2008
Mats R. K. Lindstedt Juuso Liesiö Ahti Salo

The development of a product portfolio is a strategic decision which is often complicated by the large number of competing products, product interactions and high uncertainties about how successful the products will be in the marketplace. These decisions are commonly supported either by financially oriented approaches (e.g., net present value) or more qualitative approaches (e.g., scoring model...

2010
Jonathan Goodman

The dynamic replication strategy of Black and Scholes is important enough that it is worth repeating from last week. Recall the setup. From day $k-1$ to day $k$, the stock (risky asset price) either goes up, $S_{k-1} \to S_k = uS_{k-1}$, or goes down, $S_k = dS_{k-1}$ (recall that we did not actually need $u > 1$ or $d < 1$, but it is convenient to think of $u$ as up and $d$ as down). The replicating portfolio is ...
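The snippet is cut off before it describes the replicating portfolio, so the following is only a sketch of the standard one-step textbook construction and may differ from the author's notation. It assumes a portfolio of `delta` shares plus `cash` in a riskless account that grows by a factor `growth` per day; the function name, the `growth` parameter, and the example numbers are all hypothetical.

```python
def replicate_one_step(s_prev, u, d, v_up, v_down, growth=1.0):
    """Find (delta, cash) matching the target values in both next-day states.

    Solves the two linear equations
        delta * u * s_prev + cash * growth = v_up
        delta * d * s_prev + cash * growth = v_down
    where growth is the assumed one-day riskless growth factor.
    """
    if u == d:
        raise ValueError("u and d must differ for the two states to be distinct")
    delta = (v_up - v_down) / ((u - d) * s_prev)
    cash = (v_up - delta * u * s_prev) / growth
    return delta, cash


# Hypothetical example: a payoff worth 10 after an up move and 0 after a down move.
s_prev, u, d = 100.0, 1.1, 0.9
delta, cash = replicate_one_step(s_prev, u, d, v_up=10.0, v_down=0.0)

# Cost of setting up the portfolio on day k-1.
value_today = delta * s_prev + cash
```

With these numbers the portfolio holds about half a share, borrows about 45 in cash, and costs about 5 to set up. By construction it is worth 10 after an up move and 0 after a down move.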
