Reinforcement Learning for Datacenter Congestion Control

Abstract

We approach the task of network congestion control in datacenters using Reinforcement Learning (RL). Successful congestion control algorithms can dramatically improve latency and overall network throughput. Until today, no such learning-based algorithms have shown practical potential in this domain. Evidently, the most popular recent deployments rely on rule-based heuristics that are tested on a predetermined set of benchmarks. Consequently, these heuristics do not generalize well to newly-seen scenarios. Contrarily, we devise an RL-based algorithm with the aim of generalizing to different configurations of real-world datacenter networks. We overcome challenges such as partial-observability, nonstationarity, and multi-objectiveness. We further propose a policy gradient algorithm that leverages the analytical structure of the reward function to approximate its derivative and improve stability. We show that this scheme outperforms alternative popular RL approaches, and generalizes to scenarios that were not seen during training. Our experiments, conducted on a realistic simulator that emulates communication networks' behavior, exhibit improved performance concurrently on the multiple considered metrics compared to the popular algorithms deployed today in real datacenters. Our algorithm is being productized to replace heuristics in some of the largest datacenters in the world.

Similar resources

Flow and Congestion Control for Datacenter Networks

The limits of power dissipation and Moore's law are leading toward increasing parallelism and a shift of focus from CPUs to interconnection networks. This trend is also reflected in the rise of blade-based datacenters, which cluster server and storage units packaged as blades, with several networks. We begin with the trends and requirements of datacenter interconnection networks. Next, we show ...

TIMELY: RTT-based congestion control for the datacenter – Public Review

The context is datacenter congestion control. Traditional TCP transport stacks fare poorly in this environment, which has led to considerable interest in recent years in developing specialized transports that aim to deliver high bandwidth utilization at extremely low, microsecond-level packet latency. This is important for demanding datacenter applications such as cloud storage and near-realtim...

Xavier: A Reinforcement-Learning Approach to TCP Congestion Control

Controlling congestion is a fundamental problem in computer networks. If the input load is greater than the output bandwidth at a particular switch, the bottleneck's queue begins to fill up and we say that it is congested. In pathological scenarios and under certain protocols, the saturation of buffers, or bufferbloat [5], can lead to congestion collapse, a condition in which congestion reaches...

Centralized Congestion Control and Scheduling in a Datacenter

We consider the problem of designing a packet-level congestion control and scheduling policy for datacenter networks. Current datacenter networks primarily inherit the principles that went into the design of the Internet, where congestion control and scheduling are distributed. While a distributed architecture provides robustness, it suffers in terms of performance. Unlike the Internet, the data center is fu...

Reinforcement Learning for Control

Reinforcement learning (RL) offers a principled way to control nonlinear stochastic systems with partly or even fully unknown dynamics. Recent advances in areas such as deep learning and adaptive dynamic programming (ADP) have led to significant inroads in applications ranging from robotics, automotive systems, and smart grids to game playing and traffic control. This open track provides a forum of interac...

Journal

Journal title: Performance Evaluation Review

Year: 2022

ISSN: 1557-9484, 0163-5999

DOI: https://doi.org/10.1145/3512798.3512815