In Search of I/O-Optimal Recovery from Disk Failures
Authors
Abstract
We address the problem of minimizing the I/O needed to recover from disk failures in erasure-coded storage systems. The principal result is an algorithm that finds the I/O-optimal recovery from an arbitrary number of disk failures for any XOR-based erasure code. We also describe a family of codes with high fault tolerance and low recovery I/O; for example, one instance tolerates up to 11 failures and recovers a lost block in 4 I/Os. While we have determined I/O-optimal recovery for any given code, it remains an open problem to identify codes with the best recovery properties. We describe our ongoing efforts toward characterizing space overhead versus recovery I/O tradeoffs and generating codes that realize these bounds.
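To make the notion of recovery I/O concrete, here is a minimal sketch, not the paper's algorithm. It assumes each stored block is described by a bitmask of the data symbols XORed into it, and it exhaustively searches for the smallest set of surviving blocks whose XOR reproduces a lost block. The names `TOY_CODE` and `min_io_recovery`, and the toy layout itself, are hypothetical and chosen only for illustration. The search is exponential in the number of blocks, so it suits only very small codes.

```python
from functools import reduce
from itertools import combinations
from operator import xor

# Hypothetical toy XOR code: 4 data blocks and 3 parity blocks.
# Each value is a bitmask; bit i set means data symbol d_i is XORed in.
TOY_CODE = {
    "d0": 0b0001,
    "d1": 0b0010,
    "d2": 0b0100,
    "d3": 0b1000,
    "p0": 0b0011,  # d0 ^ d1
    "p1": 0b1100,  # d2 ^ d3
    "p2": 0b1111,  # d0 ^ d1 ^ d2 ^ d3
}


def min_io_recovery(blocks, failed, target):
    """Return a smallest tuple of surviving block names whose XOR equals
    the target block, or None if the target cannot be rebuilt.

    Brute force: tries subsets in order of increasing size, so the first
    match reads the fewest blocks (one I/O per block is assumed).
    """
    survivors = [name for name in blocks if name not in failed]
    goal = blocks[target]
    for size in range(1, len(survivors) + 1):
        for subset in combinations(survivors, size):
            if reduce(xor, (blocks[name] for name in subset), 0) == goal:
                return subset
    return None


if __name__ == "__main__":
    # Single failure: d0 is rebuilt from two reads.
    print(min_io_recovery(TOY_CODE, {"d0"}, "d0"))
    # Double failure (d0 and p0): the cheapest rebuild now needs three reads.
    print(min_io_recovery(TOY_CODE, {"d0", "p0"}, "d0"))
```

In this toy layout, losing both `d0` and `d1` makes `d0` unrecoverable, because every surviving block contains either both symbols or neither, and the function returns `None` in that case.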
Similar resources
S-Code: Lowest Density MDS Array Codes for RAID-6
RAID, a storage architecture designed to exploit I/O parallelism and provide data reliability, has been deployed widely in computing systems as a storage building block. In large scale storage systems, in particular, RAID-6 is gradually replacing RAID-5 as the dominant form of disk arrays due to its capability of tolerating concurrent failures of any two disks. MDS (maximum distance separable) ...
Flat Datacenter Storage
Flat Datacenter Storage (FDS) is a high-performance, fault-tolerant, large-scale, locality-oblivious blob store. Using a novel combination of full bisection bandwidth networks, data and metadata striping, and flow control, FDS multiplexes an application’s large-scale I/O across the available throughput and latency budget of every disk in a cluster. FDS therefore makes many optimizations around ...
RAMCube: Exploiting Network Proximity for RAM-Based Key-Value Store
Disk-based storage is becoming increasingly problematic in meeting the needs of large-scale cloud applications. Recently, RAM-based storage has been proposed by aggregating the RAM of thousands of commodity servers in data center networks (DCN). These studies focus on improving performance with high throughput I/O, low latency RPC and fast failure recovery. RAM-based storage brings great DCN-related c...
Hierarchical RAID: Design, performance, reliability, and recovery
Hierarchical RAID (HRAID) extends the RAID paradigm to mask the failure of whole Storage Nodes (SNs) or bricks, where each SN is a disk array with a certain RAID level. HRAIDk/l with N SNs and M disks per SN tolerates k SN failures and l disk failures per SN with Maximum Distance Separable (MDS) erasure codes, which introduce the minimum level of redundancy at each level. For N = M there are k i...
Using Disk Add-Ons to Withstand Simultaneous Disk Failures with Fewer Replicas
Contemporary storage systems that utilize replication often maintain more than two replicas of each data item, reducing the risk of permanent data loss due to simultaneous disk failures. The price of the additional copies is smaller usable storage space, increased network traffic, and higher power consumption. We propose to alleviate this problem with SIMFAIL, a storage system that maintains on...
Publication date: 2011