Influence of Pattern of Missing Data on Performance of Imputation Methods: An Example from National Data on Drug Injection in Prisons

Authors

  • Ali-Akbar Haghdoost Regional Knowledge Hub for HIV/AIDS Surveillance, Institute for Futures Studies in Health, Kerman University of Medical Sciences, Kerman, Iran
  • Azam Rastegari Social Determinant of Health Research Center, Institute for Futures Studies in Health, Kerman University of Medical Sciences, Kerman, Iran
  • Mohammad Reza Baneshi Research Center for Modeling in Health, Institute for Futures Studies in Health, Kerman University of Medical Sciences, Kerman, Iran
  • Saiedeh Haji-Maghsoudi Regional Knowledge Hub for HIV/AIDS Surveillance, Institute for Futures Studies in Health, Kerman University of Medical Sciences, Kerman, Iran
Abstract:

Background: Policy makers need models that can detect groups at high risk of HIV infection. Incomplete records and dirty data are common in national data sets, and missing data complicate model development. Several studies have suggested that imputation methods perform acceptably when the missing rate is moderate. One issue that has received less attention, and which is addressed here, is the role of the pattern of missing data.

Methods: We used data on 2720 prisoners. Results from fitting a regression model to the complete data served as the gold standard. Missing data were then generated so that 10%, 20% and 50% of data were lost. In scenario 1, we generated missing values, at these rates, in a single variable that was significant in the gold-standard model (age). In scenario 2, a small proportion of each independent variable was removed. Four imputation methods were compared, under different events per variable (EPV) values, in terms of selection of important variables and parameter estimation.

Results: In scenario 2, bias in the estimates was low and all methods for handling missing data performed similarly. All methods at all missing rates detected the significance of age. In scenario 1, bias in the estimates increased, particularly at the 50% missing rate. Here, at EPVs of 10 and 5, the imputation methods failed to capture the effect of age.

Conclusion: In scenario 2, all imputation methods at all missing rates detected age as significant. This was not the case in scenario 1. Our results show that the performance of imputation methods depends on the pattern of missing data.
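The two missing-data patterns described in the Methods can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not state how values were removed, so the code assumes values are deleted completely at random, and it assumes that in scenario 2 each affected record loses exactly one randomly chosen independent variable. The function names (`mask_single_variable`, `mask_spread`, `events_per_variable`) and the column names in the usage note are hypothetical. EPV is computed with its usual definition, the number of outcome events divided by the number of candidate variables.

```python
import random


def mask_single_variable(rows, column, rate, seed=0):
    """Scenario 1 (sketch): blank `rate` of the values in ONE column.

    Assumption: records are chosen completely at random.
    """
    rng = random.Random(seed)
    n_missing = round(rate * len(rows))
    chosen = set(rng.sample(range(len(rows)), n_missing))
    # Return copies so the complete ("gold standard") data stay untouched.
    return [
        {**row, column: None} if i in chosen else dict(row)
        for i, row in enumerate(rows)
    ]


def mask_spread(rows, columns, rate, seed=0):
    """Scenario 2 (sketch): spread the same number of missing cells
    across several columns.

    Assumption: each chosen record loses one randomly picked column,
    so every column ends up with only a small share of missing values.
    """
    rng = random.Random(seed)
    n_missing = round(rate * len(rows))
    chosen = rng.sample(range(len(rows)), n_missing)
    masked = [dict(row) for row in rows]
    for i in chosen:
        masked[i][rng.choice(columns)] = None
    return masked


def events_per_variable(outcomes, n_variables):
    """EPV: number of events (outcome == 1) per candidate variable."""
    return sum(outcomes) / n_variables
```

For example, `mask_single_variable(rows, "age", 0.5)` concentrates all missingness in `age`, while `mask_spread(rows, ["age", "sex", "education"], 0.5)` leaves each variable with roughly a third of that loss. This is the contrast the abstract reports as decisive for whether the effect of age is still detected.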

Similar resources

Influence of pattern of missing data on performance of imputation methods: an example using national data on drug injection in prisons.

BACKGROUND Policy makers need models to be able to detect groups at high risk of HIV infection. Incomplete records and dirty data are frequently seen in national data sets. Presence of missing data challenges the practice of model development. Several studies suggested that performance of imputation methods is acceptable when missing rate is moderate. One of the issues which was of less concern...

Impact of imputation of missing data on estimation of survival rates: an example in breast cancer

Background: Multifactorial regression models are frequently used in medicine to estimate the survival rate of patients across risk groups. However, their results are not generalisable if the assumptions required in model development are not satisfied. Missing data is a common problem in pathology. The aim of this paper is to address the danger of excluding cases with missing data, and to h...

Data mining rules and classification methods in insurance: the case of collision insurance

Assigning premiums to insurance contracts in Iran has mostly been based on old rules authorized by the government; in such a situation, predicting premiums by analyzing the database and its characteristics would be a big mistake. Therefore, the most beneficial information one can gather from these data is the amount of loss that occurs during one contract, for predicting insurance ...

Missing data imputation in multivariable time series data

Multivariate time series data are found in a variety of fields such as bioinformatics, biology, genetics, astronomy, geography and finance. Many time series datasets contain missing data. Imputing missing data in multivariate time series is a challenging topic that needs to be carefully considered before learning from or predicting time series. Much research has been done on the use of diffe...

Journal title

Volume 1, Issue 1

Pages 69-77

Publication date 2013-06-03
