Computing the throughput of probabilistic streaming applications with replication
Authors
Abstract
In this paper, we investigate how to compute the throughput of probabilistic streaming applications. Given a streaming application whose dependence graph is a linear chain, a fully heterogeneous target platform, and a one-to-one mapping of the application onto the platform (a processor is assigned at most one application stage), how can we compute the throughput of the application, i.e., the rate at which data sets can be processed? The problem is easy when workflow stages are not replicated, i.e., when each stage is assigned to a single processor: in that case the throughput is dictated by the critical hardware resource. However, when stages are replicated, i.e., assigned to several processors, the problem becomes surprisingly complicated. Even when execution and communication times are deterministic, there are examples where the optimal period (i.e., the inverse of the throughput) is larger than the largest cycle-time of any resource. We model the problem as a timed Petri net to compute the optimal throughput in the general case, and we show how the throughput is impacted when execution and communication times are no longer deterministic but follow probability distributions. Finally, we prove that the problem of finding a one-to-one mapping with replication which maximizes the throughput is NP-complete, even with no communication costs.
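The easy case mentioned in the abstract (no replication, deterministic times) can be illustrated with a minimal sketch. It is not the paper's method. It assumes that computations and communications overlap, so that each processor and each link between consecutive stages counts as a separate resource, and that the period is the largest cycle-time among them. The function names and the parameters `work`, `speeds`, `data_sizes` and `bandwidths` are hypothetical and chosen for illustration only.

```python
from fractions import Fraction


def period_without_replication(work, speeds, data_sizes, bandwidths):
    """Period of a linear chain mapped one-to-one, without replication.

    Assumptions (not taken from the paper): stage i runs on its own
    processor and takes work[i] / speeds[i] time units; the data passed
    from stage i to stage i+1 takes data_sizes[i] / bandwidths[i] time
    units; computations and communications overlap, so every processor
    and every link is an independent resource.
    """
    if len(work) != len(speeds):
        raise ValueError("one speed is needed per stage")
    if len(data_sizes) != len(bandwidths) or len(data_sizes) != len(work) - 1:
        raise ValueError("one link is needed between consecutive stages")

    # Cycle-time of each processor and of each link, kept exact.
    compute_times = [Fraction(w) / Fraction(s) for w, s in zip(work, speeds)]
    comm_times = [Fraction(d) / Fraction(b) for d, b in zip(data_sizes, bandwidths)]

    # The critical resource is the one with the largest cycle-time.
    return max(compute_times + comm_times)


def throughput_without_replication(work, speeds, data_sizes, bandwidths):
    """Throughput is the inverse of the period."""
    return 1 / period_without_replication(work, speeds, data_sizes, bandwidths)
```

For a three-stage chain with `work=[4, 6, 2]`, `speeds=[2, 1, 1]`, `data_sizes=[3, 8]` and `bandwidths=[1, 4]`, the second processor is the critical resource under these assumptions. As the abstract stresses, this simple maximum no longer gives the period once stages are replicated.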
Similar resources
Data Replication-Based Scheduling in Cloud Computing Environment
Abstract— High-performance computing and vast storage are two key factors required for executing data-intensive applications. In comparison with traditional distributed systems like data grid, cloud computing provides these factors in a more affordable, scalable and elastic platform. Furthermore, accessing data files is critical for performing such applications. Sometimes accessing data becomes...
Improving Mobile Grid Performance Using Fuzzy Job Replica Count Determiner
Grid computing is a term referring to the combination of computer resources from multiple administrative domains to reach a common computational platform. Mobile computing is a generic term for the use of movable, handheld devices with wireless communication to process data. Mobile computing focuses on providing access to data, information, services and communications anywhere an...
E2DR: Energy Efficient Data Replication in Data Grid
Abstract— Data grids are an important branch of grid computing which provide mechanisms for the management of large volumes of distributed data. Energy efficiency has recently emerged as a hot topic in large distributed systems. The development of computing systems is traditionally focused on performance improvements driven by the demand of client's applications in scientific and business domai...
Exploiting Throughput for Pipeline Execution in Streaming Image Processing Applications
There is a large range of image processing applications that act on an input sequence of image frames that are continuously received. Throughput is a key performance measure to be optimized when executing them. In this paper we propose a new task replication methodology for optimizing throughput for an image processing application in the field of medicine. The results show that by applying the ...
Publication date: 2010