Interacting with Large Distributed Datasets Using Sketch
Abstract
We present Sketch, a library and distributed runtime for building interactive tools that explore large datasets distributed across multiple machines. We have built several sophisticated applications using this framework; in this paper we describe two of them: a billion-row spreadsheet and a distributed-systems performance analyzer. Sketch applications allow interactive and responsive exploration of complex distributed datasets, scaling effectively to take advantage of large computational resources.
Similar resources
Bias-Aware Sketches
Count-Sketch [6] and Count-Median [11] are two widely used sketching algorithms for processing large-scale distributed and streaming datasets, such as finding frequent elements, computing frequency moments, performing point queries, etc. The errors of Count-Sketch and Count-Median are expressed in terms of the sum of coordinates of the input vector excluding those largest ones, or, the mass on ...
Scaling Graph-based Semi Supervised Learning to Large Number of Labels Using Count-Min Sketch
Graph-based Semi-supervised learning (SSL) algorithms have been successfully used in a large number of applications. These methods classify initially unlabeled nodes by propagating label information over the structure of graph starting from seed nodes. Graph-based SSL algorithms usually scale linearly with the number of distinct labels (m), and require O(m) space on each node. Unfortunately, th...
Classification of Photo and Sketch Images Using Convolutional Neural Networks
In this study we propose a Convolutional Neural Network (CNN) that can classify hand-drawn sketch images. Though CNNs are known to be very effective at classifying realistic images, there are few studies on CNNs dealing with non-photorealistic images, and even fewer on images in which those types are mixed. Classifying non-photorealistic images is difficult mainly because there are no large datasets. In t...
Large Scale Distributed Semi-Supervised Learning Using Streaming Approximation
Traditional graph-based semi-supervised learning (SSL) approaches, even though widely applied, are not suited for massive data and large label scenarios since they scale linearly with the number of edges |E| and distinct labels m. To deal with the large label size problem, recent works propose sketch-based methods to approximate the distribution on labels per node thereby achieving a space redu...
An Overview of Data Privacy in Multi-Agent Learning Systems
Public and private sector entities continuously produce, store, and transact in large amounts of data. However, combined with the growth of the internet, such datasets get stored and accessed on multiple devices, locations, and across the globe. Therefore, the necessity for autonomous agents that can learn across distributed systems to extract knowledge from large datasets while at the same tim...
Publication date: 2016