Optimal Operator State Migration for Elastic Data Stream Processing
Abstract
A cloud-based data stream management system (DSMS) handles fast data by utilizing the massively parallel processing capabilities of the underlying platform. An important property of such a DSMS is elasticity, meaning that nodes can be dynamically added to or removed from an application to match the application's workload, which may fluctuate unpredictably. For an application involving stateful operations such as aggregates, the addition or removal of nodes necessitates the migration of operator states. Although the importance of migration has been recognized in existing systems, two key problems remain largely neglected: how to migrate, i.e., the migration mechanism that reduces synchronization overhead and result delay during migration, and what to migrate, i.e., the selection of the optimal task assignment that minimizes migration costs. Consequently, migration in current systems typically incurs a high spike in result delay caused by expensive synchronization barriers and suboptimal task assignments. Motivated by this, we present the first comprehensive study on efficient operator state migration, and propose designs and algorithms that enable live, progressive, and optimized migrations. Extensive experiments using real data justify our performance claims.
Similar resources
Resource Allocation Strategies for In-Network Stream Processing
In this paper we consider the operator mapping problem for in-network stream processing applications. In-network stream processing consists in applying a tree of operators in steady-state to multiple data objects that are continually updated at various locations on a network. Examples of in-network stream processing include the processing of data in a sensor network, or of continuous queries on...
Latency-aware Elastic Scaling for Distributed Data Stream Processing
Elastic scaling allows a data stream processing system to react to a dynamically changing query or event workload by automatically scaling in or out. Thereby, both unpredictable load peaks as well as underload situations can be handled. However, each scaling decision comes with a latency penalty due to the required operator movements. Therefore, in practice an elastic system might be able to im...
Scalable and Fault-tolerant Stateful Stream Processing
As users of “big data” applications expect fresh results, we witness a new breed of stream processing systems (SPS) that are designed to scale to large numbers of cloud-hosted machines. Such systems face new challenges: (i) to benefit from the “pay-as-you-go” model of cloud computing, they must scale out on demand, acquiring additional virtual machines (VMs) and parallelising operators when the...
Elasticutor: Rapid Elasticity for Realtime Stateful Stream Processing
Elasticity is highly desirable for stream processing systems to guarantee low latency against workload dynamics, such as surges in data arrival rate and fluctuations in data distribution. Existing systems achieve elasticity following a resource-centric approach that uses dynamic key partitioning across the parallel instances, i.e. executors, to balance the workload and scale operators. However,...
Elastic and Secure Energy Forecasting in Cloud Environments
Although cloud computing offers many advantages with regard to adaptation of resources, we witness either strong resistance to or very slow adoption of those new offerings. One reason for the resistance is that (i) many technologies such as stream processing systems still lack appropriate mechanisms for elasticity in order to fully harness the power of the cloud, and (ii) do not provide mech...
Journal: CoRR
Volume: abs/1501.03619
Publication date: 2015