apache spark

Search results for: apache spark

Number of results: 18089

A configurable and executable model of Spark Streaming on Apache YARN

Journal: International Journal of Grid and Utility Computing, 2020

Automating cluster creation and management for Apache Spark in Openstack cloud

Journal: Proceedings of the Institute for System Programming of RAS, 2014

A General and Parallel Platform for Mining Co-Movement Patterns over Large-scale Trajectories

Journal: PVLDB, 2016

Qi Fan, Dongxiang Zhang, Huayu Wu, Kian-Lee Tan

Discovering co-movement patterns from large-scale trajectory databases is an important mining task and has a wide spectrum of applications. Previous studies have identified several types of interesting co-movement patterns and showcased their usefulness. In this paper, we make two key contributions to this research field. First, we propose a more general co-movement pattern to unify those defin...

An Information Theoretic Feature Selection Framework for Big Data under Apache Spark

Journal: CoRR, 2016

Sergio Ramírez-Gallego, Héctor Mouriño-Talín, David Martínez-Rego, Verónica Bolón-Canedo, José Manuel Benítez, Amparo Alonso-Betanzos, Francisco Herrera

With the advent of extremely high dimensional datasets, dimensionality reduction techniques are becoming mandatory. Among many techniques, feature selection has been growing in interest as an important tool to identify relevant features on huge datasets (both in number of instances and features). The purpose of this work is to demonstrate that standard feature selection methods can be paralleli...

On the Equivalence of CoCoA+ and DisDCA

Journal: CoRR, 2015

Ching-pei Lee

Here we compare the codes CoCoA+ and Birds. CoCoA+ is the code released by the authors of Ma et al. (2015) implementing their algorithm in Apache Spark. As indicated in Ma et al. (2015), it is available in http://github.com/gingsmith/cocoa/. Birds is the code released by the author of Yang (2013) implementing their practical variant of DisDCA proposed in that work using C++ and MPI. It is avai...

Supporting Data Provenance in Data-Intensive Scalable Computing Systems

Journal: IEEE Data Eng. Bull., 2018

Matteo Interlandi, Tyson Condie

Debugging data processing logic in Data-Intensive Scalable Computing (DISC) systems is a difficult and time-consuming effort. Data provenance support is a key building block in libraries that aim to provide debugging support for data processing pipelines. In this paper we report our experience in building Titian: a data provenance system targeting the Apache Spark framework. Our focus here is t...

Low Latency Geo-distributed Data Analytics – Public Review

2015

Mohammad Alizadeh

Large cloud service providers ingest massive amounts of data in geographically distributed sites spread across the globe. Analytics for such planetary-scale datasets is an important emerging challenge. The current practice is to copy all data to a central location, where it can be dealt with locally by standard data analytics stacks such as Hadoop and Spark. However, transferring large volumes ...

Prajna: Cloud Service and Interactive Big Data Analytics

2015

Jin Li, Sanjeev Mehrotra, Weirong Zhu

Apache Spark has attracted broad attention in both academia and industry. When people talk about Spark, the first thing that comes to mind is the Resilient Distributed Datasets (RDDs), which let programmers perform in-memory computations on large clusters in a fault-tolerant manner. While RDD is certainly a great contribution, an overlooked aspect of Spark lies in its harness of functional pro...

Large-scale Analysis of Event Data

2015

Stefan Hagedorn, Kai-Uwe Sattler, Michael Gertz

With the availability of numerous sources and the development of sophisticated text analysis and information retrieval techniques, more and more spatio-temporal data are extracted from texts such as news documents or social network data. Temporal and geographic information obtained this way often form some kind of event, describing when and where something happened. An important task in the con...

FAIR: A Hadoop-based Hybrid Model for Faculty Information Retrieval System

Journal: CoRR, 2017

Noopur Gupta, Rakesh K. Lenka, Rabindra K. Barik, Harishchandra Dubey

In an era of ever-expanding data and knowledge, we lack a centralized system that maps all the faculties to their research works. This problem has not been addressed in the past, and it becomes challenging for students to connect with the right faculty of their domain. Since we have so many colleges and faculties, this falls into the category of big data problems. In this paper, we present a model which...
