apache spark

Search results for: apache spark

Number of results: 18089

Apache Spark and Apache Kafka at the Rescue of Distributed RDF Stream Processing Engines

2016

Xiangnan Ren Olivier Curé Houda Khrouf Zakia Kazi-Aoul Yousra Chabchoub

Due to the growing need to process data produced in the Semantic Web in a timely manner and to derive valuable information and knowledge from it, RDF stream processing (RSP) has emerged as an important research domain. In this paper, we describe the design of an RSP engine that is built upon state-of-the-art Big Data frameworks, namely Apache Kafka and Apache Spark. Together, they support the implementation of a...


FITS Data Source for Apache Spark

Journal: Computing and Software for Big Science, 2018


Big data analytics on Apache Spark

Journal: International Journal of Data Science and Analytics, 2016


Kira: Processing Astronomy Imagery Using Big Data Technology

2016

Zhao Zhang Kyle Barbary Frank Austin Nothaft Evan R. Sparks Oliver Zahn Michael J. Franklin David A. Patterson Saul Perlmutter

Scientific analyses commonly compose multiple single-process programs into a dataflow. An end-to-end dataflow of single-process programs is known as a many-task application. Typically, HPC tools are used to parallelize these analyses. In this work, we investigate an alternate approach that uses Apache Spark—a modern platform for data intensive computing—to parallelize many-task applications. We...


InferSpark: Statistical Inference at Scale

Journal: CoRR, 2015

Zhuoyue Zhao Eric Lo Kenny Q. Zhu Chris Liu

The Apache Spark stack has enabled fast large-scale data processing. Despite a rich library of statistical models and inference algorithms, it does not give domain users the ability to develop their own models. The emergence of probabilistic programming languages has shown the promise of developing sophisticated probabilistic models in a succinct and programmatic way. These frameworks have the...


Conquering Big Data with Spark

2015

Ion Stoica

Today, big and small organizations alike collect huge amounts of data, and they do so with one goal in mind: to extract "value" through sophisticated exploratory analysis, and use it as the basis to make decisions as varied as personalized treatment and ad targeting. To address this challenge, we have developed the Berkeley Data Analytics Stack (BDAS), an open source data analytics stack for big data ...


Ddup - Towards a Deduplication Framework Utilising Apache Spark

2015

Niklas Wilcke

This paper is about a new framework called DeduPlication (DduP). DduP aims to solve large scale deduplication problems on arbitrary data tuples. DduP tries to bridge the gap between big data, high performance and duplicate detection. At the moment a first prototype exists but the overall project status is work in progress. DduP utilises the promising successor of Apache Hadoop MapReduce [Had14]...


Large-scale virtual screening on public cloud resources with Apache Spark

2017

Marco Capuccini Laeeq Ahmed Wesley Schaal Erwin Laure Ola Spjuth

BACKGROUND: Structure-based virtual screening is an in-silico method to screen a target receptor against a virtual molecular library. Applying docking-based screening to large molecular libraries can be computationally expensive; however, it constitutes a trivially parallelizable task. Most of the available parallel implementations are based on message passing interface, relying on low failure ra...


SparkTrails: A MapReduce Implementation of HypTrails for Comparing Hypotheses About Human Trails

2016

Martin Becker Hauke Mewes Andreas Hotho Dimitar Dimitrov Florian Lemmerich Markus Strohmaier

HypTrails is a Bayesian approach for comparing different hypotheses about human trails on the web. While a standard implementation exists, it exhibits performance issues when working with large-scale data. In this paper, we propose a distributed implementation of HypTrails based on Apache Spark taking advantage of several structural properties inherent to HypTrails. The performance improves subs...


Streaming Twitter Data Analysis Using Spark for Effective Job Search

2015

Lekha R. Nair Sujala D. Shetty

Near real time Big Data from social network sites like Twitter or Facebook has been an interesting source for analytics by researchers in recent years owing to various factors including its up-to-date-ness, availability and popularity, though there may be a compromise in genuineness or accuracy. Apache Spark, the trendy big data processing engine that offers faster solutions compared to Hadoop,...

