Search results for: data sampling

Number of results: 2525723

Journal: :CoRR 2012
Hadassa Daltrophe Shlomi Dolev Zvi Lotker

Given a large set of measurement sensor data, in order to identify a simple function that captures the essence of the data gathered by the sensors, we suggest representing the data by (spatial) functions, in particular by polynomials. Given a (sampled) set of values, we interpolate the datapoints to define a polynomial that would represent the data. The interpolation is challenging, since in pr...

2009
Rocky K. C. Chang Edmond W. W. Chan Xiapu Luo

In this paper, we present preliminary results of measuring TCP data-path quality using a new measurement tool called OneProbe. Unlike the existing tools, OneProbe uses legitimate TCP data probes to profile TCP data-path quality by sampling round-trip delay, one-way loss rate, and one-way reordering rate at the same time. This paper presents a set of recent measurement studies on a set of web se...

Journal: :Journal of Computing and Information Technology 2021

Fraud detection has received considerable attention from academic research and industry worldwide due to its increasing popularity. Insurance datasets are enormous, with skewed distributions and high dimensionality. Skewed class distribution and data volume are considered significant problems when analyzing insurance datasets, as these issues increase the misclassification rates. Although sampling appro...

2014
David J. Dittman Taghi M. Khoshgoftaar Randall Wald Amri Napolitano

Class imbalance is a frequent problem found in bioinformatics datasets. Unfortunately, the minority class is usually also the class of interest. One of the methods to improve this situation is data sampling. There are a number of different data sampling methods, each with their own strengths and weaknesses, which makes choosing one a difficult prospect. In our work we compare three data samplin...
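As a rough illustration of what data sampling for class imbalance can look like, the sketch below implements plain random undersampling in Python. The abstract is truncated before it names the three methods it compares, so this is not necessarily one of them; the function name `random_undersample` and its parameters are hypothetical and chosen only for this example.

```python
import random
from collections import defaultdict


def random_undersample(samples, labels, rng=None):
    """Randomly drop instances so every class matches the smallest class size.

    Illustrative sketch only: the names and interface are assumptions,
    not taken from the paper summarized above.
    """
    rng = rng or random.Random()

    # Group the instances by class label.
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    # The minority class size sets the target size for every class.
    n_min = min(len(xs) for xs in by_class.values())

    balanced_x, balanced_y = [], []
    for y, xs in by_class.items():
        # Draw n_min instances without replacement from each class.
        for x in rng.sample(xs, n_min):
            balanced_x.append(x)
            balanced_y.append(y)
    return balanced_x, balanced_y
```

Called with 10 minority and 90 majority instances, this returns 10 of each. Undersampling discards majority-class data, which is one of the trade-offs that makes choosing a sampling method difficult.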

2010
Ioannis Kosmidis

The problem of clustering large data sets has attracted a lot of current research. The approaches taken are mainly based either on the more efficient implementation or modification of existing methods and/or on the construction of clusters from a small sub-sample of the data and then the assignment of all observations to those clusters. The current paper focuses on the latter direction. An alte...

Journal: :JIDM 2015
Tiago Rodrigo Kepe Eduardo Cunha de Almeida Thomas Cerqueus

Data sampling over data streams is common practice to allow the analysis of data in real-time. However, sampling over data streams becomes complex when the stream does not fit in memory, and worse yet, when the length of the stream is unknown. A well-known technique for sampling data streams is reservoir sampling. It requires a fixed-size reservoir that corresponds to the resulting sample s...
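For readers unfamiliar with the technique named in this abstract, here is a minimal Python sketch of the classic single-pass reservoir sampling scheme (often called Algorithm R). It shows only the textbook baseline, not whatever variant the paper itself proposes, since the abstract is cut off; the function name `reservoir_sample` and its parameters are assumptions made for this example.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Keep a uniform random sample of up to k items from a stream of unknown length.

    Textbook Algorithm R; names are illustrative assumptions.
    """
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the fixed-size reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir
```

The stream is consumed once and only k items are held in memory, so the length of the stream never needs to be known in advance. If the stream has fewer than k items, the whole stream is returned.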

1996
George H. John Pat Langley

As data warehouses grow to the point where one hundred gigabytes is considered small, the computational efficiency of data-mining algorithms on large databases becomes increasingly important. Using a sample from the database can speed up the data-mining process, but this is only acceptable if it does not reduce the quality of the mined knowledge. To this end, we introduce the “Probably Close Eno...

Journal: :Data Knowl. Eng. 2008
Hüseyin Akcan Alex Astashyn Hervé Brönnimann

Processing and extracting meaningful knowledge from count data is an important problem in data mining. The volume of data is increasing dramatically as the data is generated by day-to-day activities such as market basket data, web clickstream data or network data. Most mining and analysis algorithms require multiple passes over the data, which requires extreme amounts of time. One solution to s...
