Search results for: apache spark

Number of results: 18089

Journal: CoRR 2016
Hubert Naacke Olivier Curé Bernd Amann

The number and size of linked open data graphs keep growing at a fast pace and confront semantic RDF services with problems characterized as Big Data. Distributed query processing is one of them and needs to be addressed efficiently, with execution that guarantees scalability, high availability and fault tolerance. RDF data management systems requiring these properties are rarely built from sc...

2016
Jinliang Wei Jin Kyu Kim Garth A. Gibson

We benchmarked Apache Spark with a popular parallel machine learning training application, Distributed Stochastic Gradient Descent for Matrix Factorization [5], and compared the Spark implementation with alternative approaches for communicating model parameters, such as scheduled pipelining using POSIX sockets or MPI, and distributed shared memory (e.g. parameter server [13]). We found that Spark...

Journal: Mathematics 2022

Access plan recommendation is a query optimization approach that executes new queries using previously created query execution plans (QEPs). In this method, the optimizer divides the space into clusters. However, traditional clustering algorithms take a significant amount of time for such large datasets. The MapReduce distributed computing model provides efficient solutions for storing and processing vast quant...

Journal: International Journal of Advanced Computer Science and Applications 2019

Journal: CoRR 2017
Hassan Nazeer Waheed Iqbal Fawaz S. Bokhari Faisal Bukhari Shuja Ur Rehman Baig

Real-time text processing systems are required in many domains to quickly identify patterns, trends, sentiments, and insights. Nowadays, social networks, e-commerce stores, blogs, scientific experiments, and server logs are the main sources generating huge volumes of text data. However, processing such data in real time requires building a data processing pipeline. The main challenge in building such pip...

Journal: CoRR 2015
Oren Segal Philip Colangelo Nasibeh Nasiri Zhuo Qian Martin Margala

We introduce SparkCL, an open-source unified programming framework based on Java, OpenCL and the Apache Spark framework. The motivation behind this work is to bring unconventional compute cores such as FPGAs/GPUs/APUs/DSPs and future core types into mainstream programming use. The framework allows equal treatment of different computing devices under the Spark framework and introduces the abilit...

Journal: PVLDB 2017
Michael J. Anderson Shaden Smith Narayanan Sundaram Mihai Capota Zheguang Zhao Subramanya Dulloor Nadathur Satish Theodore L. Willke

Apache Spark is a popular framework for data analytics with attractive features such as fault tolerance and interoperability with the Hadoop ecosystem. Unfortunately, many analytics operations in Spark are an order of magnitude or more slower than native implementations written with high-performance computing tools such as MPI. There is a need to bridge the performance gap while retainin...

2017
Shelly Grossman Sara Cohen Shachar Itzhaky Noam Rinetzky Shmuel Sagiv

Apache Spark is a popular framework for writing large-scale data processing applications. Our long-term goal is to develop automatic tools for reasoning about Spark programs. This is challenging because Spark programs combine database-like relational algebraic operations and aggregate operations, corresponding to (nested) loops, with User Defined Functions (UDFs). In this paper, we present a no...

2016
Krzysztof Rykaczewski Piotr Wisniewski Krzysztof Stencel

In this article we present a new algorithm for creating simplicial Vietoris-Rips complexes that is easily parallelizable using computation models like MapReduce and Apache Spark. The algorithm does not involve any computation in homology spaces.
