نتایج جستجو برای: atari
تعداد نتایج: 829 فیلتر نتایج به سال:
Deep reinforcement learning methods have shown tremendous success in a large variety tasks, such as Go [Silver et al., 2016], Atari [Mnih et al., 2013], and continuous control [Lillicrap et al., 2015, Schulman et al., 2015]. Policy gradient methods [Williams, 1992] is an important family of methods in model-free reinforcement learning, and the current state-of-the-art policy gradient methods ar...
As one of the first successful models that combines reinforcement learning technique with deep neural networks, the Deep Q-network (DQN) algorithm has gained attention as it bridges the gap between high-dimensional sensor inputs and autonomous agent learning. However, one main drawback of DQN is the long training time required to train a single task. This work aims to leverage transfer learning...
We introduce a neural network architecture and a learning algorithm to produce factorized symbolic representations. We propose to learn these concepts by observing consecutive frames, letting all the components of the hidden representation except a small discrete set (gating units) be predicted from the previous frame, and let the factors of variation in the next frame be represented entirely b...
Learning to solve complex sequences of tasks—while both leveraging transfer and avoiding catastrophic forgetting—remains a key obstacle to achieving human-level intelligence. The progressive networks approach represents a step forward in this direction: they are immune to forgetting and can leverage prior knowledge via lateral connections to previously learned features. We evaluate this archite...
LEE SMOLIN, a theoretical physicist, is concerned with quantum gravity,"the name we give to the theory that unifies all the physics now under construction." More specifically, he is a co-inventor of an approach called loop quantum gravity. In 2001, he became a founding member and research physicist of the Perimeter Institute for Theoretical Physics, in Waterloo, Ontario. Smolin is the author of...
The network architectures of the proposed models and the baselines are illustrated in Figure 1. The weight of LSTM is initialized from a uniform distribution of [−0.08, 0.08]. The weight of the fully-connected layer from the encoded feature to the factored layer and from the action to the factored layer are initialized from a uniform distribution of [−1, 1] and [−0.1, 0.1] respectively.
We discuss recent models in which neutrinos, which are assumed to have mass in the eV range, originate the highest energy cosmic rays by interaction with the enhanced density in the galactic halo of the relic cosmic neutrino background. We make an analytical calculation of the required neutrino fluxes
The development of a product portfolio is a strategic decision which is often complicated by the large number of competing products, product interactions and high uncertainties about how successful the products will be in the marketplace. These decisions are commonly supported either by financially oriented approaches (e.g., net present value) or more qualitative approaches (e.g., scoring model...
The dynamic replication strategy of Black and Scholes is important enough that it is worth repeating from last week. Recall the setup. From day k − 1 to day k, the stock (risky asset price) either goes up Sk−1 → Sk = uSk or goes down Sk = dSk−1 (recall that we actually did not necessarily need u > 1 or d < 1, but it is convenient to think of u as up and d as down.) The replicating portfolio is ...
نمودار تعداد نتایج جستجو در هر سال
با کلیک روی نمودار نتایج را به سال انتشار فیلتر کنید