Search results for: apache spark

Number of results: 18089

Journal: International Journal of Grid and Utility Computing 2020

Journal: Proceedings of the Institute for System Programming of RAS 2014

Journal: PVLDB 2016
Qi Fan Dongxiang Zhang Huayu Wu Kian-Lee Tan

Discovering co-movement patterns from large-scale trajectory databases is an important mining task and has a wide spectrum of applications. Previous studies have identified several types of interesting co-movement patterns and showcased their usefulness. In this paper, we make two key contributions to this research field. First, we propose a more general co-movement pattern to unify those defin...

Journal: CoRR 2016
Sergio Ramírez-Gallego Héctor Mouriño-Talín David Martínez-Rego Verónica Bolón-Canedo José Manuel Benítez Amparo Alonso-Betanzos Francisco Herrera

With the advent of extremely high dimensional datasets, dimensionality reduction techniques are becoming mandatory. Among many techniques, feature selection has been growing in interest as an important tool to identify relevant features on huge datasets (both in number of instances and features). The purpose of this work is to demonstrate that standard feature selection methods can be paralleli...

Journal: CoRR 2015
Ching-pei Lee

Here we compare the codes CoCoA+ and Birds. CoCoA+ is the code released by the authors of Ma et al. (2015) implementing their algorithm in Apache Spark. As indicated in Ma et al. (2015), it is available at http://github.com/gingsmith/cocoa/. Birds is the code released by the author of Yang (2013) implementing their practical variant of DisDCA proposed in that work using C++ and MPI. It is avai...

Journal: IEEE Data Eng. Bull. 2018
Matteo Interlandi Tyson Condie

Debugging data processing logic in Data-Intensive Scalable Computing (DISC) systems is a difficult and time-consuming effort. Data provenance support is a key building block in libraries that aim to provide debugging support for data processing pipelines. In this paper we report our experience in building Titian: a data provenance system targeting the Apache Spark framework. Our focus here is t...

2015
Mohammad Alizadeh

Large cloud service providers ingest massive amounts of data in geographically distributed sites spread across the globe. Analytics for such planetary-scale datasets is an important emerging challenge. The current practice is to copy all data to a central location, where it can be dealt with locally by standard data analytics stacks such as Hadoop and Spark. However, transferring large volumes ...

2015
Jin Li Sanjeev Mehrotra Weirong Zhu

Apache Spark has attracted broad attention in both academia and industry. When people talk about Spark, the first thing that comes to mind is Resilient Distributed Datasets (RDDs), which let programmers perform in-memory computations on large clusters in a fault-tolerant manner. While RDD is certainly a great contribution, an overlooked aspect of Spark lies in its harness of functional pro...

2015
Stefan Hagedorn Kai-Uwe Sattler Michael Gertz

With the availability of numerous sources and the development of sophisticated text analysis and information retrieval techniques, more and more spatio-temporal data are extracted from texts such as news documents or social network data. Temporal and geographic information obtained this way often forms some kind of event, describing when and where something happened. An important task in the con...

Journal: CoRR 2017
Noopur Gupta Rakesh K. Lenka Rabindra K. Barik Harishchandra Dubey

In an era of ever-expanding data and knowledge, we lack a centralized system that maps all faculty members to their research works. This problem has not been addressed in the past, and it is challenging for students to connect with the right faculty in their domain. Since there are so many colleges and faculty members, this falls into the category of big data problems. In this paper, we present a model which...
