Fast data series indexing for in-memory data

Authors

Abstract

Data series similarity search is a core operation for several data series analysis applications across many different domains. However, the state-of-the-art techniques fail to deliver the time performance required for interactive exploration, or analysis, of large data series collections. In this work, we propose MESSI, the first data series index designed for in-memory operation on modern hardware. Our index takes advantage of the modern hardware parallelization opportunities (i.e., SIMD instructions, multi-socket and multi-core architectures), in order to accelerate both index construction and similarity search processing times. Moreover, it benefits from a careful design in the setup and coordination of the parallel workers and data structures, so that it maximizes its performance for in-memory operations. MESSI supports similarity search using both the Euclidean and dynamic time warping (DTW) distances. Our experiments with synthetic and real datasets demonstrate that overall MESSI is up to 4x faster at index construction and up to 11x faster at query answering than the state-of-the-art parallel approach. MESSI is the first to answer exact similarity search queries on 100GB datasets in approximately 50 ms (30–75 ms across diverse datasets), which enables real-time, interactive data exploration on very large data series collections.
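To make the query type concrete, here is a minimal sketch of exact nearest-neighbor search under the two distances named in the abstract. It is not MESSI: it is a serial, brute-force scan with no index, no SIMD and no multi-threading, i.e. the kind of baseline an index is meant to beat. The function names, the optional warping `window` parameter, and the use of squared point-wise costs inside DTW are assumptions made for illustration only.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a, b, window=None):
    """Dynamic time warping distance with an optional warping window.

    `window` limits how far the alignment may stray from the diagonal
    (a Sakoe-Chiba style band); None means unconstrained warping.
    """
    n, m = len(a), len(b)
    if window is None:
        window = max(n, m)
    # The band must be wide enough to reach the final cell.
    window = max(window, abs(n - m))

    # prev[j] holds the best cumulative cost for the previous row.
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        curr = [math.inf] * (m + 1)
        lo = max(1, i - window)
        hi = min(m, i + window)
        for j in range(lo, hi + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return math.sqrt(prev[m])


def exact_nearest_neighbor(query, collection, distance=euclidean):
    """Return (index, distance) of the series closest to `query`.

    Linear scan over every series, so the answer is exact by construction.
    """
    best_index, best_dist = -1, math.inf
    for idx, series in enumerate(collection):
        d = distance(query, series)
        if d < best_dist:
            best_index, best_dist = idx, d
    return best_index, best_dist


if __name__ == "__main__":
    collection = [[0, 0, 0, 0], [1, 2, 3, 4], [4, 3, 2, 1]]
    query = [1, 2, 3, 5]
    print(exact_nearest_neighbor(query, collection, euclidean))
    print(exact_nearest_neighbor(query, collection, lambda q, s: dtw(q, s, window=1)))
```

The scan costs one distance computation per series in the collection, which is why exact search over large collections needs an index and parallel hardware to reach interactive response times.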


Related resources

Missing data imputation in multivariable time series data

Multivariate time series data are found in a variety of fields such as bioinformatics, biology, genetics, astronomy, geography and finance. Many time series datasets contain missing data. Imputing missing data in multivariate time series is a challenging topic and needs to be carefully considered before learning from or predicting time series. Considerable research has been done on the use of diffe...


Pattern recognition in maintenance data using data mining methodologies (case study: Isfahan regional power electric company)

Maintenance and repair activities generate information that can help determine downtimes and provide maintenance personnel with a schedule or with failure alerts. When the volume of generated data is large, understanding the relationships between variables becomes very difficult. This thesis addresses an application of data mining to explore multidimensional databases in the maintenance and repair domain, in order to find the failures that cause...


Efficient In-memory Data Structures for n-grams Indexing

Indexing n-gram phrases from text has many practical applications, such as plagiarism detection, DNA sequence comparison and spam detection. In this paper we describe several data structures, such as hash tables and B+ trees, that could store n-grams for searching. We perform tests that show their advantages and disadvantages. One neglected data structure for this purpose, the ternary search tree, is deep...


A New Fast Data Encryption Algorithm (FDE)


Embedded Data Indexing for Fast Stream Interception by Internet Appliances

Interception of a data stream is central to any intelligent and dynamic processing of web information. It is perhaps as fundamental to Internet services’ overall architecture as the design of disk scheduling to the conventional machine architecture. In this paper we discuss an IPv6 based indexing protocol that can facilitate random access into multilevel hierarchically encoded content streams a...



Journal

Journal title: The VLDB Journal

Year: 2021

ISSN: 0949-877X, 1066-8888

DOI: https://doi.org/10.1007/s00778-021-00677-2