Search results for: spark assisted performance
Number of results: 1176099
The aim of this research is to study the influence of ethanol-gasoline fuel blends on the performance characteristics of a spark ignition engine and to compare the results with those obtained using base gasoline. An electric generator driven by an SI (spark ignition, four stroke, single cylinder, air cooled) engine was used for conducting this study. The tested fuels were gasoline (E0) and gasoline-...
Java 8 has introduced new capabilities such as lambda expressions and streams which simplify data-parallel computing. However, as a base language for Big Data systems, it still lacks a number of important capabilities such as processing very large datasets and distributing the computation over multiple machines. This paper gives an overview of the Java 8 Streams API and proposes extensions to a...
A survey of the development of the spark chamber from the spark counter is given. An effort has been made to clarify the early history of the device. Some unpublished recent Russian work on spark chambers is summarized. A consistent terminology for the various times relevant to spark chamber operation is suggested.
Given a signal $S \in \mathbb{R}^{N}$ and a full-rank matrix $D \in \mathbb{R}^{N \times L}$ with $N < L$, we define the signal's overcomplete representations as all $\alpha \in \mathbb{R}^{L}$ satisfying $S = D\alpha$. Among all the possible solutions, we have special interest in the sparsest one: the one minimizing $\|\alpha\|_0$. Previous work has established that a representation is unique if it is sparse enough, requiring $\|\alpha\|_0 < \mathrm{Spark}(D)/2$. The measure Spark(D) stan...
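To make the uniqueness bound in the abstract above concrete, the following is a minimal sketch that computes Spark(D) by brute force. It assumes the standard definition of the spark as the smallest number of columns of D that are linearly dependent (the abstract is cut off before stating it). The helper names `rank` and `spark` are illustrative, not taken from the paper, and the exhaustive search is exponential in L, so it is only practical for tiny matrices.

```python
from fractions import Fraction
from itertools import combinations


def rank(rows):
    """Rank of a matrix (given as a list of rows) via exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    r = 0  # number of pivots found so far
    for c in range(n_cols):
        # Find a row at or below position r with a nonzero entry in column c.
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        # Eliminate column c from all rows below the pivot row.
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def spark(D):
    """Smallest number of linearly dependent columns of D (assumed definition).

    Returns None if all columns are linearly independent, which cannot
    happen when D has more columns than rows (N < L).
    """
    n_rows, n_cols = len(D), len(D[0])
    for k in range(1, n_cols + 1):
        for cols in combinations(range(n_cols), k):
            sub = [[D[i][j] for j in cols] for i in range(n_rows)]
            if rank(sub) < k:  # these k columns are linearly dependent
                return k
    return None


# Illustrative 2x4 dictionary: columns 2 and 4 are parallel.
D = [[1, 0, 1, 0],
     [0, 1, 1, 2]]
s = spark(D)
print("Spark(D) =", s)
print("uniqueness holds for representations with ||alpha||_0 <", Fraction(s, 2))
```

For this illustrative D, the bound only covers the trivial zero representation, because two of its columns are parallel; a dictionary whose columns are pairwise non-parallel has a spark of at least 3.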
The Berkeley Data Analytics Stack (BDAS) is an emerging framework for big data analytics. It consists of the Spark analytics framework, the Tachyon in-memory filesystem, and the Mesos cluster manager. Spark was designed as an in-memory replacement for Hadoop that can in some cases improve performance by up to 100X. In this paper, we describe our experiences running BDAS on the new Cray Urika-XA...
Data partitioning significantly improves query performance in distributed database systems. A large number of techniques have been proposed to efficiently partition a dataset, often focusing on finding the best partitioning for a particular query workload. However, many modern analytic applications involve ad-hoc or exploratory analysis where users do not have a representative query workload. F...
Data frames in scripting languages are essential abstractions for processing structured data. However, existing data frame solutions are either not distributed (e.g., Pandas in Python) and therefore have limited scalability, or they are not tightly integrated with array computations (e.g., Spark SQL). This paper proposes a novel compiler-based approach where we integrate data frames into the Hi...
The k-Nearest Neighbors classifier is a simple yet effective and widely renowned method in data mining. Applying this model directly in the big data domain is not feasible due to time and memory restrictions. Several distributed alternatives based on MapReduce have been proposed to enable this method to handle large-scale data. However, their performance can be further improved with new des...
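As a point of reference for the abstract above, here is a minimal single-machine sketch of a k-Nearest Neighbors classifier. It is not the distributed design the paper proposes; the function name `knn_predict` and the toy data are assumptions made for illustration. It shows where the time and memory cost comes from: every query is compared against every stored training point.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (features, label) pairs; distances are Euclidean.
    """
    # Full scan: one distance computation per training point, per query.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data set (hypothetical): two well-separated clusters.
train = [
    ((0.0, 0.0), "a"),
    ((0.0, 1.0), "a"),
    ((5.0, 5.0), "b"),
    ((6.0, 5.0), "b"),
    ((5.0, 6.0), "b"),
]
print(knn_predict(train, (0.2, 0.1), k=3))
print(knn_predict(train, (5.5, 5.5), k=3))
```

Because the whole training set must be held in memory and scanned for each query, the cost grows linearly with the data size, which is the restriction that distributed variants aim to relieve.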