apache spark

Search results for: apache spark

Number of results: 18089

SPARQL query processing with Apache Spark

Journal: CoRR 2016

Hubert Naacke, Olivier Curé, Bernd Amann

The number and the size of linked open data graphs keep growing at a fast pace and confront semantic RDF services with problems characterized as Big data. Distributed query processing is one of them and needs to be efficiently addressed with execution guaranteeing scalability, high availability and fault tolerance. RDF data management systems requiring these properties are rarely built from sc...


Benchmarking Apache Spark with Machine Learning Applications

2016

Jinliang Wei, Jin Kyu Kim, Garth A. Gibson

We benchmarked Apache Spark with a popular parallel machine learning training application, Distributed Stochastic Gradient Descent for Matrix Factorization [5] and compared the Spark implementation with alternative approaches for communicating model parameters, such as scheduled pipelining using POSIX socket or MPI, and distributed shared memory (e.g. parameter server [13]). We found that Spark...


Performance Evaluation of Query Plan Recommendation with Apache Hadoop and Apache Spark

Journal: Mathematics 2022

Access plan recommendation is a query optimization approach that executes new queries using previously created query execution plans (QEPs). In this method, the optimizer divides the space into clusters. However, traditional clustering algorithms take a significant amount of time for such large datasets. The MapReduce distributed computing model provides efficient solutions for storing and processing vast quant...


SpaRC: scalable sequence clustering using Apache Spark

Journal: Bioinformatics 2018


Efficient Distributed SPARQL Queries on Apache Spark

Journal: International Journal of Advanced Computer Science and Applications 2019


Real-time Text Analytics Pipeline Using Open-source Big Data Tools

Journal: CoRR 2017

Hassan Nazeer, Waheed Iqbal, Fawaz S. Bokhari, Faisal Bukhari, Shuja Ur Rehman Baig

Real-time text processing systems are required in many domains to quickly identify patterns, trends, sentiments, and insights. Nowadays, social networks, e-commerce stores, blogs, scientific experiments, and server logs are the main sources generating huge text data. However, processing huge text data in real time requires building a data processing pipeline. The main challenge in building such pip...


SparkCL: A Unified Programming Framework for Accelerators on Heterogeneous Clusters

Journal: CoRR 2015

Oren Segal, Philip Colangelo, Nasibeh Nasiri, Zhuo Qian, Martin Margala

We introduce SparkCL, an open source unified programming framework based on Java, OpenCL and the Apache Spark framework. The motivation behind this work is to bring unconventional compute cores such as FPGAs/GPUs/APUs/DSPs and future core types into mainstream programming use. The framework allows equal treatment of different computing devices under the Spark framework and introduces the abilit...


Bridging the Gap between HPC and Big Data frameworks

Journal: PVLDB 2017

Michael J. Anderson, Shaden Smith, Narayanan Sundaram, Mihai Capota, Zheguang Zhao, Subramanya Dulloor, Nadathur Satish, Theodore L. Willke

Apache Spark is a popular framework for data analytics with attractive features such as fault tolerance and interoperability with the Hadoop ecosystem. Unfortunately, many analytics operations in Spark are an order of magnitude or more slower compared to native implementations written with high performance computing tools such as MPI. There is a need to bridge the performance gap while retainin...


Verifying Equivalence of Spark Programs

2017

Shelly Grossman, Sara Cohen, Shachar Itzhaky, Noam Rinetzky, Shmuel Sagiv

Apache Spark is a popular framework for writing large scale data processing applications. Our long term goal is to develop automatic tools for reasoning about Spark programs. This is challenging because Spark programs combine database-like relational algebraic operations and aggregate operations, corresponding to (nested) loops, with User Defined Functions (UDFs). In this paper, we present a no...


An Algorithmic Way to Generate Simplexes for Topological Data Analysis

2016

Krzysztof Rykaczewski, Piotr Wisniewski, Krzysztof Stencel

In this article we present a new algorithm for creating simplicial Vietoris-Rips complexes that is easily parallelizable using computation models like MapReduce and Apache Spark. The algorithm does not involve any computation in homology spaces.

