Search results for: apache spark

Number of results: 18089

2016
Diego García-Gil Sergio Ramírez-Gallego Salvador García Francisco Herrera

Department of Computer Science and Artificial Intelligence, CITIC-UGR (Research Center on Information and Communications Technology), University of Granada, Calle Periodista Daniel Saucedo Aranda, 18071 Granada, Spain. Abstract: The large amounts of data have created a need for new fram...

2016
Timo Bingmann Michael Axtmann Emanuel Jöbstl Sebastian Lamm Huyen Chau Nguyen Alexander Noe Sebastian Schlag Matthias Stumpp Tobias Sturm Peter Sanders

We present the design and a first performance evaluation of Thrill – a prototype of a general purpose big data processing framework with a convenient data-flow style programming interface. Thrill is somewhat similar to Apache Spark and Apache Flink with at least two main differences. First, Thrill is based on C++ which enables performance advantages due to direct native code compilation, a more...

2015
Martin Maas Timothy L. Harris Krste Asanovic John Kubiatowicz

Cloud systems such as Hadoop, Spark and Zookeeper are frequently written in Java or other garbage-collected languages. However, GC-induced pauses can have a significant impact on these workloads. Specifically, GC pauses can reduce throughput for batch workloads, and cause high tail-latencies for interactive applications. In this paper, we show that distributed applications suffer from each node...

2015
Ahsan Javed Awan Mats Brorsson Vladimir Vlassov Eduard Ayguadé

The sheer increase in the volume of data over the last decade has triggered research in cluster computing frameworks that enable web enterprises to extract big insights from big data. While Apache Spark is gaining popularity for exhibiting superior scale-out performance on commodity machines, the impact of data volume on the performance of Spark based data analytics in scale-up configuration is not...

2015
Matteo Interlandi Kshitij Shah Sai Deep Tetali Muhammad Ali Gulzar Seunghyun Yoo Miryung Kim Todd D. Millstein Tyson Condie

Debugging data processing logic in Data-Intensive Scalable Computing (DISC) systems is a difficult and time-consuming effort. Today's DISC systems offer very little tooling for debugging programs, and as a result programmers spend countless hours collecting evidence (e.g., from log files) and performing trial-and-error debugging. To aid this effort, we built Titian, a library that enables data ...

2017
Abbas Rehman Ali Abbas Muhammad Atif Sarwar Javed Ferzund

Next Generation Sequencing has resulted in the generation of large amounts of omics data at a speed that was not possible before. This data is only useful if it can be stored and analyzed at the same speed. Big Data platforms and tools like Apache Hadoop and Spark have solved this problem. However, most of the algorithms used in bioinformatics for Pairwise alignment, Multiple Alignment and...

2016
Michael F. Ringenburg Shuxia Zhang Kristyn J. Maschhoff Bill Sparks Evan Racah

This paper describes an investigation of the performance characteristics of high performance data analytics (HPDA) workloads on the Cray XC40™, with a focus on commonly-used open source analytics frameworks like Apache Spark. We look at two types of Spark workloads: the Spark benchmarks from the Intel HiBench 4.0 suite and a CX matrix decomposition algorithm. We study performance from both the...

Journal: CoRR 2015
Zubair Nabi

The synergy between Big Data and Open Data has the potential to revolutionize information access in the developing world. Following this mantra, we present the analysis of more than a decade's worth of open judgements and orders from the Supreme Court of Pakistan. Our overarching goal is to discern the presence of judicial activism in the country in the wake of the Lawyers’ Movement. Using Apache...

2016
Miguel Nuñez-del-Prado Edgardo Bravo Miguel Sierra Isaias Hoyos Miguel Canchay

In this work, we present a knowledge tier platform to collect information from cities in the form of graphs. This platform enables people to share information about the area where they live, allowing them to report on pollution, crime levels, traffic jams, street topology, businesses, markets, etc. The main objective is to provide information, stored in Elastic about a city to find sp...
