A Note on the Covariance for Categorical Data
Abstract
A generalization of the covariance concept is discussed for use with both categorical and numerical data. Gini’s definition of the variance for categorical data provides a starting point for this problem. The value difference in the original definition is replaced by a vector in the value space, and a new definition of covariance is then introduced for categorical and numerical data. When applied to typical contingency tables, it yields natural and reasonable correlation coefficients.
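As an illustration of the starting point named in the abstract, the following is a minimal Python sketch of Gini's pairwise definition of variance, in which the variance is half the mean squared difference over all ordered pairs of observations. For categorical data the difference is assumed here to be 1 for distinct categories and 0 otherwise. The function names and sample data are hypothetical, and the sketch covers only the variance, not the covariance definition proposed in the paper.

```python
from collections import Counter
from itertools import product


def pairwise_variance(values, diff):
    """Return (1 / (2 n^2)) * sum of diff(a, b)**2 over all ordered pairs."""
    n = len(values)
    total = sum(diff(a, b) ** 2 for a, b in product(values, repeat=2))
    return total / (2 * n * n)


def numeric_diff(a, b):
    # Ordinary difference for numerical values.
    return a - b


def categorical_diff(a, b):
    # Assumed convention: distance 1 between different categories, 0 otherwise.
    return 0 if a == b else 1


def gini_variance_closed_form(values):
    """Closed form of the categorical case: (1 - sum of p_i^2) / 2."""
    n = len(values)
    return 0.5 * (1 - sum((c / n) ** 2 for c in Counter(values).values()))


# Hypothetical sample data, for illustration only.
colors = ["red", "red", "blue", "green"]
heights = [1.0, 2.0, 3.0, 4.0]

categorical_var = pairwise_variance(colors, categorical_diff)
closed_form_var = gini_variance_closed_form(colors)
numeric_var = pairwise_variance(heights, numeric_diff)

print(categorical_var, closed_form_var, numeric_var)
```

With the numerical difference, the same pairwise formula reproduces the usual population variance, which is why it is a convenient common base for both kinds of data.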
Related resources
Presenting a Clustering Algorithm for Categorical Data by Combining Measures
Clustering is one of the main techniques in data mining. It is a process that partitions a data set into groups so that the data within a cluster are as similar as possible and the data in different clusters are as different as possible. Clustering algorithms are divided into two categories according to the type of data: Clustering algorithms for numerical data and clustering algor...
Covariance and PCA for Categorical Variables
Covariances from categorical variables are defined using a regular simplex expression for categories. The method follows the variance definition by Gini, and it gives the covariance as a solution of simultaneous equations. The calculated results give reasonable values for test data. A method of principal component analysis (RS-PCA) is also proposed using regular simplex expressions, which allow...
Simultaneous Monitoring of Multivariate Process Mean and Variability in the Presence of Measurement Error with Linearly Increasing Variance under Additive Covariate Model (RESEARCH NOTE)
In recent years, some research has been done on simultaneous monitoring of the multivariate process mean vector and covariance matrix. However, the effect of measurement error, which exists in many practical applications, on the performance of these control charts is not well studied. In this paper, the effect of measurement error with linearly increasing variance on the performance of ELR contr...
Analysis of Dynamic Longitudinal Categorical Data in Incomplete Contingency Tables Using Capture-Recapture Sampling: A Case Study of Semi-Concentrated Doctoral Exam
Abstract. In this paper, dynamic longitudinal categorical data and estimation of their parameters in incomplete contingency tables are evaluated. To apply the proposed method, a study has been conducted on the data of the semi-concentrated doctoral exam of the National Organization for Educational Testing (NOET). The results of studies such as the obtained confidence intervals and calculating t...
Town trip forecasting based on data mining techniques
In this paper, a data mining approach is proposed for duration prediction of the town trips (travel time) in New York City. In this regard, at first, two novel approaches, including a mathematical and a statistical approach, are proposed for grouping categorical variables with a huge number of levels. The proposed approaches work based on the cost matrix generated by repetitive post-hoc tests f...
Publication date: 2000