Tupleware: Distributed Machine Learning on Small Clusters
نویسندگان
چکیده
There is a fundamental discrepancy between the targeted and actual users of current analytics frameworks. Most systems are designed for the challenges of the Googles and Facebooks of the world— petabytes of data distributed across large cloud deployments consisting of thousands of cheap commodity machines. Yet, the vast majority of users operate clusters ranging from a few to a few dozen nodes, analyze relatively small datasets of up to several terabytes in size, and perform primarily compute-intensive operations. Targeting these users fundamentally changes the way we should build analytics systems. This paper describes our vision for the design of Tupleware, a new system specifically aimed at performing complex analytics (e.g., distributed machine learning) on small clusters. Tupleware’s architecture brings together ideas from the database and compiler communities to create a powerful end-to-end solution for data analysis. Our preliminary results show orders of magnitude performance improvement over alternative systems.
منابع مشابه
Tupleware: "Big" Data, Big Analytics, Small Clusters
There is a fundamental discrepancy between the targeted and actual users of current analytics frameworks. Most systems are designed for the challenges of the Googles and Facebooks of the world— processing petabytes of data distributed across large cloud deployments consisting of thousands of cheap commodity machines. Yet, the vast majority of users analyze relatively small datasets of up to sev...
متن کاملTupleware: A Distributed Tuple Space for the Development and Execution of Array-based Applications in a Cluster Computing Environment
This thesis describes Tupleware, an implementation of a distributed tuple space which acts as a scalable and efficient cluster middleware for computationally intensive numerical and scientific applications. Tupleware is based on the Linda coordination language (Gelernter 1985), and incorporates additional techniques such as peer-to-peer communications and exploitation of data locality in order ...
متن کاملTupleware: Redefining Modern Analytics
There is a fundamental discrepancy between the targeted and actual users of current analytics frameworks. Most systems are designed for the data and infrastructure of the Googles and Facebooks of the world—petabytes of data distributed across large cloud deployments consisting of thousands of cheap commodity machines. Yet, the vast majority of users operate clusters ranging from a few to a few ...
متن کاملStock Price Prediction using Machine Learning and Swarm Intelligence
Background and Objectives: Stock price prediction has become one of the interesting and also challenging topics for researchers in the past few years. Due to the non-linear nature of the time-series data of the stock prices, mathematical modeling approaches usually fail to yield acceptable results. Therefore, machine learning methods can be a promising solution to this problem. Methods: In this...
متن کاملAn Architecture for Compiling UDF-centric Workflows
Data analytics has recently grown to include increasingly sophisticated techniques, such as machine learning and advanced statistics. Users frequently express these complex analytics tasks as workflows of user-defined functions (UDFs) that specify each algorithmic step. However, given typical hardware configurations and dataset sizes, the core challenge of complex analytics is no longer sheer d...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- IEEE Data Eng. Bull.
دوره 37 شماره
صفحات -
تاریخ انتشار 2014