Calculating Large All-Pair-Shortest-Path Matrices under Google MapReduce: A project report of CS 736 Fall 2012
Abstract
The all-pair-shortest-path (APSP) problem has applications in many research fields, and different applications have different performance requirements. A traffic safety project that identifies spatially close crashes benefits from a pre-calculated APSP matrix between all reference sites in the roadway network. However, at the actual size of a practical statewide roadway network, the targeted APSP matrix can be so large that a single commodity computer cannot hold it in memory, or even finish the calculation without disk I/O. Once disk I/O is involved, the calculation can be extremely time-consuming. To make things worse, every time the topology of the roadway network changes, the lengthy calculation of a new APSP matrix must start again. To reduce the time spent calculating large APSP matrices that necessarily involve disk I/O, the project team aims to implement two APSP algorithms in a distributed way: Dijkstra's shortest path algorithm and the Floyd-Warshall algorithm. Google App Engine (GAE) servers are chosen as the underlying distributed environment. On GAE, the Google MapReduce library is used as the main API to convert the two algorithms into MapReduce tasks, which are distribution-ready.
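For orientation, the two algorithms named in the abstract can be sketched on a single machine as follows. This is a minimal illustration and not the project team's implementation: it does not use GAE or the Google MapReduce library, and the names `dijkstra_row`, `apsp_dijkstra`, `apsp_floyd_warshall`, and the toy graph `ROADS` are hypothetical. It assumes non-negative edge weights, which Dijkstra's algorithm requires.

```python
import heapq
from math import inf


def dijkstra_row(adj, source):
    """Return one row of the APSP matrix: distances from `source` to every node."""
    dist = {v: inf for v in adj}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry; a shorter path to u was already settled
        for v, w in adj[u].items():
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def apsp_dijkstra(adj):
    """APSP by running Dijkstra once per source; each row is independent of the others."""
    return {s: dijkstra_row(adj, s) for s in adj}


def apsp_floyd_warshall(adj):
    """APSP by Floyd-Warshall: allow each node k in turn as an intermediate stop."""
    nodes = list(adj)
    dist = {
        u: {v: (0.0 if u == v else adj[u].get(v, inf)) for v in nodes}
        for u in nodes
    }
    for k in nodes:
        for i in nodes:
            d_ik = dist[i][k]
            if d_ik == inf:
                continue  # no path i -> k, so k cannot improve any i -> j
            for j in nodes:
                if d_ik + dist[k][j] < dist[i][j]:
                    dist[i][j] = d_ik + dist[k][j]
    return dist


# Hypothetical four-node directed network with edge lengths.
ROADS = {
    "A": {"B": 4.0, "C": 1.0},
    "B": {"D": 1.0},
    "C": {"B": 2.0, "D": 5.0},
    "D": {},
}

if __name__ == "__main__":
    for source, row in apsp_dijkstra(ROADS).items():
        print(source, row)
```

Because each call to `dijkstra_row` depends only on the graph and one source, the per-source runs are a natural unit of parallel work, whereas Floyd-Warshall must finish iteration `k` before starting iteration `k + 1`. How the report actually maps either algorithm onto MapReduce tasks is not stated in the abstract.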
Similar resources
A MapReduce-based approach for shortest path problem in large-scale networks
The cloud computing allows to use virtually infinite resources, and seems to be a new promising opportunity to solve scientific computing problems. The MapReduce parallel programming model is a new framework favoring the design of algorithms for cloud computing. Such framework favors processing of problems across huge datasets using a large number of heterogeneous computers over the web. In thi...
Shortest Paths in Microseconds
Computing shortest paths is a fundamental primitive for several social network applications including socially-sensitive ranking, location-aware search, social auctions and social network privacy. Since these applications compute paths in response to a user query, the goal is to minimize latency while maintaining feasible memory requirements. We present ASAP, a system that achieves this goal by ...
MapReduce Functions on GasDay Data Using Hadoop
The GasDay lab at Marquette University forecasts natural gas consumption for 26 Local Distributing Companies around the United States. We have a very large amount of data that has accumulated over the past 19 years, and the lab needs a way to select and process from all of this data to gain insight into our forecasting methods. MapReduce is a pair of functions originally proposed by Jeffrey Dea...
Motivating a Distributed System of Commodity Machines
This report examines the price/performance benefit of using a large cluster of commodity machines rather than server level hardware for certain large scale software applications. A number of tools are presented which make it easier to produce software that runs across large clusters of commodity machines. These tools are the Chubby locking service, the Google file system, MapReduce and BigTable...
RDFPath: Path Query Processing on Large RDF Graphs with MapReduce
The MapReduce programming model has gained traction in different application areas in recent years, ranging from the analysis of log files to the computation of the RDFS closure. Yet, for most users the MapReduce abstraction is too low-level since even simple computations have to be expressed as Map and Reduce phases. In this paper we propose RDFPath, an expressive RDF path query language geare...
Publication date: 2012