A runtime estimation framework for ALICE
Authors
Phond Phunchongharn (corresponding author)
Abstract
The European Organization for Nuclear Research (CERN) is the largest research organization for particle physics. ALICE, short for A Large Ion Collider Experiment, is one of the main detectors at CERN and produces approximately 15 petabytes of data each year. The computing associated with an ALICE experiment consists of both online and offline processing. An online cluster retrieves data while an offline cluster farm performs a broad range of data analysis. Online processing occurs as collision events are streamed from the detector to the online cluster; it compresses and calibrates the data before storing it in a data storage system for subsequent offline processing, e.g., event reconstruction. Due to the large volume of stored data, offline processing seeks to minimize the execution time and data-staging time of applications via a two-tier offline cluster: the Event Processing Node (EPN) as the first tier and the Worldwide LHC Computing Grid (WLCG) as the second tier. This two-tier cluster requires a smart job scheduler to manage application execution efficiently. We therefore propose a runtime estimation method for offline processing in the ALICE environment. Our approach exploits application profiles to predict the runtime of a high-performance computing (HPC) application without requiring any additional metadata. To evaluate the proposed framework, we performed our experiments on actual ALICE applications. In addition, we tested the efficacy of our runtime estimation method in predicting the run times of HPC applications on the Amazon EC2 cloud. The results show that our approach generally delivers accurate predictions, i.e., low error percentages.
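The abstract does not spell out the prediction model, so the sketch below is only a hypothetical illustration of profile-based runtime estimation and of the error-percentage metric used for evaluation: it predicts a job's runtime from the most similar historical profiles and reports the absolute percentage error. The feature names (input size in GB, event count), the nearest-neighbour model, and all numbers are assumptions made for illustration, not the paper's actual method.

```python
# Hypothetical sketch of profile-based runtime prediction (not the paper's
# actual model): estimate a new job's runtime from the k most similar
# historical profiles, then score the estimate with the error percentage.
from math import sqrt

# Illustrative history: (input_size_GB, n_events) -> observed runtime in seconds.
history = [
    ((120.0, 5.0e5), 1800.0),
    ((240.0, 1.0e6), 3500.0),
    ((360.0, 1.5e6), 5300.0),
]

def predict_runtime(features, history, k=2):
    """Mean runtime of the k nearest historical profiles (features scaled
    by their maxima so no single dimension dominates the distance)."""
    maxima = [max(abs(h[0][i]) for h in history) or 1.0 for i in range(len(features))]
    scaled = [f / m for f, m in zip(features, maxima)]

    def dist(entry):
        hf = [v / m for v, m in zip(entry[0], maxima)]
        return sqrt(sum((a - b) ** 2 for a, b in zip(scaled, hf)))

    nearest = sorted(history, key=dist)[:k]
    return sum(runtime for _, runtime in nearest) / k

def percentage_error(predicted, actual):
    """Error percentage used to judge prediction accuracy."""
    return abs(predicted - actual) / actual * 100.0

if __name__ == "__main__":
    est = predict_runtime((300.0, 1.2e6), history)
    print(f"estimated runtime: {est:.0f} s")
    print(f"error vs. a hypothetical observed 4600 s run: {percentage_error(est, 4600.0):.1f}%")
```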
Similar resources
A Software Data Transport Framework for Trigger Applications on Clusters
In the future ALICE heavy-ion experiment at CERN’s Large Hadron Collider, input data rates of up to 25 GB/s have to be handled by the High Level Trigger (HLT) system, which has to scale them down to at most 1.25 GB/s before the data is written to permanent storage. The HLT system being designed to cope with these data rates consists of a large PC cluster, on the order of 1000 nodes, conne...
A modular and fault-tolerant data transport framework
The High Level Trigger (HLT) of the future ALICE heavy-ion experiment has to reduce its input data rate of up to 25 GB/s to at most 1.25 GB/s of output before the data is written to permanent storage. To cope with these data rates, a large PC cluster system is being designed to scale to several thousand nodes, connected by a fast network. For the software that will run on these nodes, a flexible dat...
Adaptive Online Performance and Power estimation Framework for Dynamic Reconfigurable Embedded Systems
Runtime dynamic reconfiguration of field-programmable gate arrays (FPGAs), and of devices incorporating both microprocessors and FPGAs, has been successfully utilized to increase performance and reduce power consumption in embedded applications. Previous approaches primarily used design-time information to schedule the reconfiguration process. While these methods are successful, they do not...
ROSIE: Runtime Optimization of SPARQL Queries Using Incremental Evaluation
Relational databases are widely adopted in RDF (Resource Description Framework) data management. For efficient SPARQL query evaluation, the legacy query optimizer needs reconsideration. One vital problem is how to tackle suboptimal query plans caused by error-prone cardinality estimation. Considering the schema-free nature of RDF data and the join-intensive character of SPARQL queries, dete...
A New Framework for Distributed Multivariate Feature Selection
Feature selection is considered an important issue in the classification domain. Selecting features with maximum relevance to the class label and minimum redundancy among themselves improves classification accuracy. However, most current feature selection algorithms work only in a centralized fashion. In this paper, we suggest a distributed version of the mRMR featu...
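The snippet above alludes to the mRMR (max-relevance, min-redundancy) criterion; the following sketch is only a generic illustration of that scoring rule, not the distributed algorithm that paper proposes. Features are greedily chosen by mutual-information relevance to the class label minus mean redundancy with the already-selected features; the toy data and feature names are hypothetical.

```python
# Generic illustration of the mRMR criterion (not the distributed method the
# paper proposes): greedily pick the feature maximising relevance to the class
# label minus mean redundancy with the features already selected.
from collections import Counter
from math import log2

def mutual_info(xs, ys):
    """Mutual information I(X;Y) in bits for two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def mrmr_select(features, labels, k):
    """features: dict of name -> discrete value list; returns k selected names."""
    selected, remaining = [], list(features)
    while remaining and len(selected) < k:
        def score(name):
            relevance = mutual_info(features[name], labels)
            redundancy = (sum(mutual_info(features[name], features[s]) for s in selected)
                          / len(selected)) if selected else 0.0
            return relevance - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

if __name__ == "__main__":
    labels = [0, 0, 1, 1, 0, 1, 1, 0]
    features = {
        "f1": [0, 0, 1, 1, 0, 1, 0, 0],  # informative
        "f2": [0, 1, 1, 0, 0, 1, 1, 0],  # complements f1
        "f3": [0, 0, 1, 1, 0, 1, 0, 0],  # duplicate of f1, hence redundant
    }
    print(mrmr_select(features, labels, k=2))  # the duplicate f3 is skipped
```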
Journal: Future Generation Comp. Syst.
Volume 72, Issue -
Pages -
Publication date: 2017