Distributed Caching Using the HTCondor CacheD
Abstract
A batch processing job in a distributed system has three clear steps: stage-in, execution, and stage-out. As data sizes have increased, so has stage-in time. To optimize stage-in time for shared inputs, we propose the CacheD, a caching mechanism for high-throughput computing. Along with caching on worker nodes for rapid transfers, we introduce a novel transfer method that uses BitTorrent to distribute shared caches to multiple worker nodes. We show that our caching method significantly improves workflow completion times by minimizing stage-in time while remaining non-intrusive to the computational resources, which allows opportunistic resources to use it.
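To make the stage-in saving concrete, the following is a minimal toy model, not the CacheD implementation. It assumes a single worker node that runs jobs one after another and keeps shared inputs in a local cache. The names `WorkerNode`, `stage_in`, `total_stage_in`, and the transfer costs are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class WorkerNode:
    """Hypothetical worker node holding a local cache of input file names."""
    cache: set = field(default_factory=set)

    def stage_in(self, inputs, transfer_cost):
        """Return the stage-in cost, transferring only inputs not yet cached."""
        cost = 0.0
        for name in inputs:
            if name not in self.cache:
                cost += transfer_cost[name]  # pay for the transfer once
                self.cache.add(name)         # later jobs reuse the local copy
        return cost


def total_stage_in(jobs, transfer_cost, use_cache):
    """Sum stage-in cost over jobs run sequentially on one node."""
    node = WorkerNode()
    total = 0.0
    for inputs in jobs:
        if not use_cache:
            node.cache.clear()  # without caching, every job re-transfers its inputs
        total += node.stage_in(inputs, transfer_cost)
    return total


# Three jobs sharing one input; the cost of 10.0 is an arbitrary illustrative unit.
jobs = [["shared.db"], ["shared.db"], ["shared.db"]]
costs = {"shared.db": 10.0}
with_cache = total_stage_in(jobs, costs, use_cache=True)
without_cache = total_stage_in(jobs, costs, use_cache=False)
```

In this toy setting the shared input is transferred once when caching is enabled and once per job otherwise. The model ignores the BitTorrent distribution between nodes that the abstract describes.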
Similar resources
On the Design of Scalable Peer-to-Peer Video Caching
Peer-to-Peer (P2P) video caching is a promising approach to accommodate asynchronous requests from cached content at individual peers. However, coherently managing a distributed, heterogeneous, dynamic and potentially large scale cache space is a challenging task. In particular, a key challenge is to effectively control the number of cached copies for popular streams in order to accommodate the...
CS380L Project Writeup: Distributed Completion Service
Task parallelism is difficult to implement in a distributed setting due to machine unreliability and communication latency. HTCondor, an existing distributed computation framework, is insufficient for addressing these shortcomings. In this report, we present a high level abstraction built on top of HTCondor called the Distributed Completion Service (DCS). The DCS uses multiple different methods...
Data Sufficiency for Queries on Cache
In distributed computing environments, replication of data provides improved availability, isolation between workloads with different characteristics, and improved performance through local access to data. The "real data" is server resident and by "local data" we refer to cached client data. We examine which data should be cached on behalf of a cached query. The minimum requirement for cached da...
Caching schemes for DCOP search algorithms
Distributed Constraint Optimization (DCOP) is useful for solving agent-coordination problems. Any-space DCOP search algorithms require only a small amount of memory but can be sped up by caching information. However, their current caching schemes do not exploit the cached information when deciding which information to preempt from the cache when a new piece of information needs to be cached. Ou...
Publication date: 2015