DataGarage: Warehousing Massive Amounts of Performance Data on Commodity Servers

Authors

  • Charles Loboz
  • Slawek Smyl
  • Suman Nath
Abstract

Contemporary datacenters house tens of thousands of servers. The servers are closely monitored for operating conditions and utilizations by collecting their performance data (e.g., CPU utilization). In this paper, we show that existing database and file-system solutions are not suitable for warehousing performance data collected from a large number of servers because of the scale and the complexity of performance data. We describe the design and implementation of DataGarage, a performance data warehousing system that we have developed at Microsoft. DataGarage is a hybrid solution that combines benefits of DBMSs, file-systems, and MapReduce systems to address unique requirements of warehousing performance data. We describe how DataGarage allows efficient storage and analysis of years of historical performance data collected from hundreds of thousands of servers—on commodity servers. We also report DataGarage’s performance on a real dataset and a 32-node, 256-core shared-nothing cluster and our experience of using DataGarage at Microsoft for the last nine months.

Similar resources

Mist: Efficient Dissemination of Erasure-coded Data in Data Centers

Data centers store a massive amount of data in a large number of servers built by commodity hardware. To maintain data integrity against server failures, erasure codes have been extensively deployed in modern data centers to provide a higher level of failure tolerance with less storage overhead than replication. Yet, compared to replication, disseminating erasure-coded data from a source server...

Serving cartography raster data in the Internet, a performance study

Map servers are applications designed to show geographic information on a web site. These pieces of software cooperate with other applications such as web servers, geographically enabled databases, etc. On the other hand, the evolution of massive image storage and compression algorithms allows access to huge amounts of geographic raster information faster and more efficiently than fe...

Data warehousing with Oracle

With the emergence of data warehousing, Decision Support Systems have evolved to their best. At the core of these warehousing systems lies a good database management system. The database server used for data warehousing is responsible for providing robust data management, scalability, high-performance query processing, and integration with other servers. Oracle being the initiator in warehousing server...

Understanding Dimension Volatility in Data Warehouses (or Bin There Done That)

Data warehousing has become an increasingly important technology in many organizations, integrating disparate sources of data for decision-making, planning, and policy formulation. Data warehousing applications can be sources of competitive advantage. Even if the raw data is widely available, the strategies for integrating, analyzing, and acting on the information can be differenti...

Publication date: 2010