Optimizing Hadoop* Deployments
Author: not recorded
Abstract
Intel is a major contributor to open source initiatives, such as Linux*, Apache*, and Xen*, and has also devoted resources to Hadoop analysis, testing, and performance characterizations, both internally and with fellow travelers such as HP and Cloudera. Through these technical efforts, Intel has observed many practical trade-offs in hardware, software, and system settings that have real-world impacts.

EXECUTIVE SUMMARY

This paper provides guidance, based on extensive lab testing conducted with Hadoop* at Intel, to organizations as they make key choices in the planning stages of Hadoop deployments. It begins with best practices for establishing server hardware specifications, helping architects choose optimal combinations of components. Next, it discusses the server software environment, including choosing the OS and version of Hadoop. Finally, it introduces some configuration and tuning advice that can help improve results in Hadoop environments.
Similar resources
ALOJA: A Benchmarking and Predictive Platform for Big Data Performance Analysis
The main goals of the ALOJA research project from BSC-MSR are to explore and automate the characterization of the cost-effectiveness of Big Data deployments. The development of the project over its first year has resulted in an open-source benchmarking platform, an online public repository of results with over 42,000 Hadoop job runs, and web-based analytic tools to gather insights about system’s cos...
Multi-Agent Distributed Adaptive Resource Allocation (MADARA)
The component placement problem involves mapping a component to a particular location and maximizing component utility in grid and cloud systems. It is also an NP-hard resource allocation and deployment problem, so many common grid and cloud computing libraries, such as MPICH and Hadoop, do not address this problem, even though large performance gains can occur by optimizing communications betw...
Mochi: Visual Log-Analysis Based Tools for Debugging Hadoop
Mochi, a new visual, log-analysis based debugging tool, correlates Hadoop’s behavior in space, time and volume, and extracts a causal, unified control- and dataflow model of Hadoop across the nodes of a cluster. Mochi’s analysis produces visualizations of Hadoop’s behavior that users can use to reason about and debug performance issues. We provide examples of Mochi’s value in revealing a Hadoop jo...
ALOJA: A Framework for Benchmarking and Predictive Analytics in Big Data Deployments
This article presents the ALOJA project and its analytics tools, which leverage machine learning to interpret Big Data benchmark performance data and tuning. ALOJA is part of a long-term collaboration between BSC and Microsoft to automate the characterization of cost-effectiveness on Big Data deployments, currently focusing on Hadoop. Hadoop presents a complex run-time environment, where costs...
Publication date: 2010