Insert-aware Partitioning and Indexing Techniques for Skewed Database Workloads
Abstract
Many data-intensive websites are characterized by a dataset that grows much faster than the rate at which users access the data, and possibly by high insertion rates. In such systems, the growing size of the dataset makes indexes more expensive to maintain and access, even as the query workload becomes increasingly skewed. Additionally, index update costs can be a non-trivial proportion of the overall system cost. Shinobi introduces a cost model that takes index update costs into account, and proposes database design algorithms that optimally partition tables, drop indexes from partitions that are not queried often, and maintain these partitions as workloads change. We show a 60x performance improvement over traditionally indexed tables using a real-world query workload derived from a traffic monitoring application, and an over 8x improvement for a Wikipedia workload.
Thesis Supervisor: Samuel Madden
Title: Associate Professor of Electrical Engineering and Computer Science
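To make the trade-off in the abstract concrete, the following is a minimal sketch of a per-partition indexing decision that weighs index update cost against query savings. It is not Shinobi's actual cost model: the `Partition` fields, the three unit-cost constants, and the linear cost formulas are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical unit costs in seconds; these are assumptions, not measured values.
SCAN_COST_PER_ROW = 1e-6   # cost of reading one row during a full partition scan
INDEX_LOOKUP_COST = 1e-4   # cost of answering one query through an index
INDEX_UPDATE_COST = 5e-5   # extra cost an index adds to each insert


@dataclass
class Partition:
    name: str
    rows: int
    queries_per_sec: float
    inserts_per_sec: float


def cost_with_index(p: Partition) -> float:
    """Per-second cost when the partition is indexed: cheap lookups, taxed inserts."""
    return p.queries_per_sec * INDEX_LOOKUP_COST + p.inserts_per_sec * INDEX_UPDATE_COST


def cost_without_index(p: Partition) -> float:
    """Per-second cost when the index is dropped: every query scans the partition."""
    return p.queries_per_sec * p.rows * SCAN_COST_PER_ROW


def should_index(p: Partition) -> bool:
    """Keep the index only if it lowers the estimated total cost."""
    return cost_with_index(p) < cost_without_index(p)


def choose_indexed_partitions(partitions: list[Partition]) -> list[str]:
    """Return the names of partitions whose index is worth maintaining."""
    return [p.name for p in partitions if should_index(p)]


# A frequently queried partition and a rarely queried, insert-heavy one.
hot = Partition("recent", rows=1_000_000, queries_per_sec=100.0, inserts_per_sec=10.0)
cold = Partition("archive", rows=1_000_000, queries_per_sec=0.0001, inserts_per_sec=1000.0)
indexed = choose_indexed_partitions([hot, cold])
```

Under these assumed costs, the index stays on the frequently queried partition and is dropped from the rarely queried, insert-heavy one, which is the skew-driven behavior the abstract describes.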
Similar resources
Scaling transactional workloads on the cloud
In this paper, we address the problem of transparently scaling out transactional (OLTP) workloads on relational databases, to support database-as-a-service in cloud computing environment. The primary challenges in supporting such workloads include choosing how to partition the data across a large number of machines, minimizing the number of distributed transactions, providing high data availabi...
Slalom: Coasting Through Raw Data via Adaptive Partitioning and Indexing
The constant flux of data and queries alike has been pushing the boundaries of data analysis systems. The increasing size of raw data files has made data loading an expensive operation that delays the data-to-insight time. Hence, recent in-situ query processing systems operate directly over raw data, alleviating the loading cost. At the same time, analytical workloads have increasing number of ...
Improved Content Aware Image Retargeting Using Strip Partitioning
Based on rapid upsurge in the demand and usage of electronic media devices such as tablets, smart phones, laptops, personal computers, etc. and its different display specifications including the size and shapes, image retargeting became one of the key components of communication technology and internet. The existing techniques in image resizing cannot save the most valuable information of image...
Indexing Highly Dynamic Hierarchical Data
Maintaining and querying hierarchical data in a relational database system is an important task in many business applications. This task is especially challenging when considering dynamic use cases with a high rate of complex, possibly skewed structural updates. Labeling schemes are widely considered the indexing technique of choice for hierarchical data, and many different schemes have been pr...
GraphTwist: Fast Iterative Graph Computation with Two-tier Optimizations
Large-scale real-world graphs are known to have highly skewed vertex degree distribution and highly skewed edge weight distribution. Existing vertex-centric iterative graph computation models suffer from a number of serious problems: (1) poor performance of parallel execution due to inherent workload imbalance at vertex level; (2) inefficient CPU resource utilization due to short execution time...
Publication date: 2010