Homomorphic Self-repairing Codes for Agile Maintenance of Distributed Storage Systems
Abstract
Distributed data storage systems are essential for storing massive volumes of data. To make such a system fault-tolerant, some form of redundancy is crucial. This redundancy incurs various overheads, the most prominent being storage space and maintenance bandwidth. Erasure codes provide a storage-efficient alternative to replication-based redundancy in storage systems. They do, however, entail high communication overhead for maintenance in a networked setting, when some of the encoded fragments are lost due to failure of storage devices and need to be replenished on new ones. Such overheads arise from the fundamental need in storage systems to first recreate (or keep separately) a copy of the whole object before any individual encoded fragment can be generated and replenished. Traditional erasure codes, originally designed for communication over lossy channels, are optimized for recreating the original message (object), but not for regenerating individual lost encoded parts. We propose as an alternative a new family of erasure codes called self-repairing codes (SRC), which take into account the peculiarities of distributed storage systems, specifically to improve the maintenance process. SRC has the following salient features: (a) encoded fragments can be repaired directly from other subsets of encoded fragments by downloading less data than the size of the complete object, ensuring that (b) a fragment is repaired from a fixed number of encoded fragments, the number depending only on how many encoded blocks are missing and independent of which specific blocks are missing. This paper lays the foundations by defining the novel self-repairing codes and elaborating why the defined characteristics are desirable for distributed storage systems.
A concrete family of such codes, namely homomorphic self-repairing codes (HSRC), is then proposed. Its various aspects and properties are studied in detail and compared, quantitatively or qualitatively as suitable, with other codes, including traditional erasure codes as well as other recent codes designed specifically for storage applications.
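The repair property in (a) and (b) can be illustrated with a small sketch. The abstract does not spell out the construction, so the following assumes one common way to obtain a homomorphic code: fragments are evaluations of a linearized polynomial p(X) = sum_i c_i X^(2^i) over GF(2^m), for which p(a + b) = p(a) + p(b). The field size, the irreducible polynomial, the coefficient values, and all function names are illustrative choices, not details taken from the paper.

```python
# Toy field GF(2^4) with irreducible polynomial x^4 + x + 1 (an assumed, illustrative choice).
M = 4
IRREDUCIBLE = 0b10011


def gf_mul(a, b):
    """Multiply two elements of GF(2^M), represented as integers below 2^M."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M):
            a ^= IRREDUCIBLE  # reduce modulo the irreducible polynomial
    return result


def gf_pow(a, e):
    """Raise a to the non-negative integer power e by repeated multiplication."""
    result = 1
    for _ in range(e):
        result = gf_mul(result, a)
    return result


def encode_fragment(coeffs, x):
    """Evaluate the linearized polynomial sum_i coeffs[i] * x^(2^i) at x."""
    value = 0
    for i, c in enumerate(coeffs):
        value ^= gf_mul(c, gf_pow(x, 2 ** i))
    return value


def repair(lost, live):
    """Rebuild the fragment at point `lost` from one live pair (a, b) with a ^ b == lost.

    Returns None if no such pair is currently live.
    """
    for a in live:
        b = a ^ lost
        if b in live:
            return live[a] ^ live[b]  # p(a) + p(b) = p(a + b) in characteristic 2
    return None


# Hypothetical object of k = 3 field symbols, encoded into 15 fragments.
coeffs = [0b0011, 0b0101, 0b1001]
fragments = {x: encode_fragment(coeffs, x) for x in range(1, 2 ** M)}

# Simulate the loss of two fragments and repair one of them from two live ones.
live = {x: v for x, v in fragments.items() if x not in (5, 9)}
repaired = repair(5, live)
```

In this toy setting a lost fragment is rebuilt by XOR-ing two surviving fragments, which is less data than the k = 3 symbols of the whole object, and many different pairs can serve as the repair set.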
Similar resources
Self-repairing Codes: Local Repairability for Cheap & Fast Maintenance of Erasure Coded Data
Networked distributed data storage systems are essential to deal with the needs of storing massive volumes of data. Dependability of such a system relies on its fault tolerance (data should be available in case of node failures) as well as its maintainability (its ability to repair lost data to ensure redundancy replenishment over time). Erasure codes provide a storage efficient alternative to ...
A Non-MDS Erasure Code Scheme for Storage Applications
This paper investigates the use of redundancy and self repairing against node failures in distributed storage systems using a novel non-MDS erasure code. In the replication method, access to one replication node is adequate to reconstruct a lost node, while in MDS erasure coded systems, which are optimal in terms of the redundancy-reliability tradeoff, a single node failure is repaired after recovering the ...
Hybrid Regenerating Codes for Distributed Storage Systems
Distributed storage systems are mainly justified due to their ability to store data reliably over some unreliable nodes such that the system can have long term durability. Recently, regenerating codes are proposed to make a balance between the repair bandwidth and the storage capacity per node. This is achieved through using the notion of network coding approach. In this paper, a new variation ...
Self-Repairing Codes for distributed storage - A projective geometric construction
Self-Repairing Codes (SRC) are codes designed to suit the need of coding for distributed networked storage: they not only allow stored data to be recovered even in the presence of node failures, they also provide a repair mechanism where as little as two live nodes can be contacted to regenerate the data of a failed node. In this paper, we propose a new instance of self-repairing codes, based o...
Repairing Erasure Codes
Distributed storage systems introduce redundancy to increase reliability. When erasure coding is used, the exact repair problem arises: if a node storing encoded information fails, in order to maintain the same level of reliability we need to create encoded information at a new node. This amounts to a partial recovery of the code, whereas conventional erasure coding focuses on the complete reco...
Journal: CoRR
Volume: abs/1107.3129
Publication date: 2011