Fast data series indexing for in-memory data
Authors
Abstract
Data series similarity search is a core operation for several data series analysis applications across many different domains. However, the state-of-the-art techniques fail to deliver the time performance required for interactive exploration, or analysis of large data series collections. In this work, we propose MESSI, the first data series index designed for in-memory operation on modern hardware. Our index takes advantage of the modern hardware parallelization opportunities (i.e., SIMD instructions, multi-socket and multi-core architectures), in order to accelerate both index construction and similarity search processing times. Moreover, it benefits from a careful design in the setup and coordination of the parallel workers and data structures, so that it maximizes its performance for in-memory operations. MESSI supports similarity search using both the Euclidean and dynamic time warping (DTW) distances. Our experiments with synthetic and real datasets demonstrate that overall MESSI is up to 4x faster at index construction and up to 11x faster at query answering than the state-of-the-art parallel approach. MESSI is the first to answer exact similarity search queries on 100GB datasets in \(\sim\)50 ms (30–75 ms across diverse datasets), which enables real-time, interactive data exploration on very large data series collections.
Similar resources
Missing data imputation in multivariable time series data
Multivariate time series data are found in a variety of fields such as bioinformatics, biology, genetics, astronomy, geography and finance. Many time series datasets contain missing data. Multivariate time series missing data imputation is a challenging topic and needs to be carefully considered before learning or predicting time series. Frequent researches have been done on the use of diffe...
Pattern recognition in maintenance data using data mining methodologies (case study: Isfahan Regional Power Electric Company)
Maintenance and repair activities generate information that can help determine downtimes and provide a schedule or failure alerts to maintenance personnel. When the volume of generated data is large, understanding the relationships between variables becomes very difficult. This thesis addresses an application of data mining to explore multidimensional databases in the maintenance domain, in order to find failures that cause...
Efficient In-memory Data Structures for n-grams Indexing
Indexing n-gram phrases from text has many practical applications, such as plagiarism detection, comparison of DNA sequences, or spam detection. In this paper we describe several data structures, such as hash tables and B+ trees, that could store n-grams for searching. We perform tests that show their advantages and disadvantages. One neglected data structure for this purpose, the ternary search tree, is deep...
Embedded Data Indexing for Fast Stream Interception by Internet Appliances
Interception of a data stream is central to any intelligent and dynamic processing of web information. It is perhaps as fundamental to Internet services’ overall architecture as the design of disk scheduling to the conventional machine architecture. In this paper we discuss an IPv6 based indexing protocol that can facilitate random access into multilevel hierarchically encoded content streams a...
Journal
Journal title: The VLDB Journal
Year: 2021
ISSN: ['0949-877X', '1066-8888']
DOI: https://doi.org/10.1007/s00778-021-00677-2