Inter-Operator Feedback in Data Stream Management Systems via Punctuation
Abstract
High-volume, high-speed data streams may overwhelm the capabilities of stream processing systems; techniques such as data prioritization, avoidance of unnecessary processing, and on-demand result production may be necessary to reduce processing requirements. However, the dynamic nature of data streams, in terms of both rate and content, makes the application of such techniques challenging. Such techniques have been addressed in the context of static and centralized query optimization; however, they have not been fully addressed for data-stream management systems. In this work, we present a comprehensive framework designed to support prioritization, avoidance of unnecessary work, and on-demand result production over distributed, unreliable, bursty, disordered data sources, typical of many streams. We propose a form of inter-operator feedback, which flows against the stream direction, to communicate the information needed to enable execution of these techniques. This feedback leverages punctuations to describe the subsets of the data that are of interest. We identify potential sources of feedback information, characterize new types of punctuation to support feedback, and describe the roles of producers, exploiters, and relayers of feedback that query operators may implement. We also present initial experimental observations using the NiagaraST data-stream system.
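The abstract describes feedback that flows upstream, uses punctuations to describe subsets of interest, and is handled by operators acting as producers, exploiters, or relayers. The Python sketch below illustrates only the "avoid unnecessary work" case under simplifying assumptions; it is not the NiagaraST implementation. The names `Punctuation`, `Operator`, `produce_feedback`, and `receive_feedback`, the wildcard/range pattern encoding, and the single "skip this subset" feedback kind are all hypothetical choices made for illustration, and the paper's own punctuation types may differ.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

WILDCARD = "*"  # hypothetical encoding: matches any attribute value


@dataclass
class Punctuation:
    """Describes a subset of tuples. Each attribute maps to WILDCARD,
    an exact value, or an inclusive (low, high) range tuple."""
    patterns: Dict[str, Any]

    def matches(self, tup: Dict[str, Any]) -> bool:
        for attr, pat in self.patterns.items():
            value = tup.get(attr)
            if pat == WILDCARD:
                continue
            if isinstance(pat, tuple):  # simplification: tuples always mean ranges
                low, high = pat
                if value is None or not (low <= value <= high):
                    return False
            elif value != pat:
                return False
        return True


class Operator:
    """A query operator that can produce, exploit, and relay feedback."""

    def __init__(self, name: str, upstream: Optional["Operator"] = None) -> None:
        self.name = name
        self.upstream = upstream
        self.ignored: List[Punctuation] = []

    def produce_feedback(self, punct: Punctuation) -> None:
        """Producer role: send feedback against the stream direction."""
        if self.upstream is not None:
            self.upstream.receive_feedback(punct)

    def receive_feedback(self, punct: Punctuation) -> None:
        # Exploiter role: remember the subset so matching tuples are dropped early.
        self.ignored.append(punct)
        # Relayer role: forward the same feedback further upstream.
        self.produce_feedback(punct)

    def process(self, tuples: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        """Pass through only tuples not described by any received feedback."""
        return [t for t in tuples if not any(p.matches(t) for p in self.ignored)]


# Hypothetical plan: scan -> select -> aggregate (data flows left to right).
scan = Operator("scan")
select = Operator("select", upstream=scan)
aggregate = Operator("aggregate", upstream=select)

# The aggregate decides tuples from sensor 7 are no longer of interest.
aggregate.produce_feedback(Punctuation({"sensor": 7, "ts": WILDCARD}))

data = [{"sensor": 7, "ts": 1}, {"sensor": 3, "ts": 2}, {"sensor": 7, "ts": 5}]
print(scan.process(data))  # [{'sensor': 3, 'ts': 2}]
```

Because the relayed feedback reaches the scan, unwanted tuples are discarded at the source instead of being carried through the select and dropped later. Prioritization and on-demand result production would need further feedback kinds that this sketch does not model.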
Journal: CoRR
Volume: abs/0909.2062
Publication date: 2009