Transfer and transport: incorporating causal methods for improving predictive models.

Authors

  • Kyle W Singleton
  • Alex A T Bui
  • William Hsu
Abstract

Predicting patient outcome is an important task in medical decision making, as clinician expectations of outcome drive testing and treatment decisions. Accurate models can assist clinicians by capitalizing on information from a broad spectrum of features to predict outcome. In an article in this journal, ‘A study in transfer learning: leveraging data from multiple hospitals to enhance hospital-specific predictions,’ Wiens, Guttag, and Horvitz explore the use of transfer learning for improving a predictive model of Clostridium difficile infection (CDI). Their discussion focuses on the need to aggregate data for studying rare diseases but notes the failure of global models to predict accurately for specific institutions. Transfer learning attempts to rectify the generalizability problem by applying evidence from multiple sources to a related target task. Their work demonstrates how transfer learning can be used to create a ‘source+target’ model that matches or outperforms models trained with source or target data alone.

The authors raise a number of important considerations when pooling data. Here, we note the need for further discussion by revisiting two points raised in their paper that affect transfer: (1) feature similarity and (2) feature selection. We briefly discuss limitations of transfer learning tied to a lack of causal knowledge and posit that causal information can complement transfer learning to improve model generalization.

Wiens et al combined datasets by comparing the overlap of features in data collected across hospitals. However, overlapping features do not guarantee feature similarity. This issue was explored in the section ‘Not all transfer is created equal’: hospital B was determined to be the most different of the three, transferring evidence poorly to target tasks at hospitals A and C. When differences are minimal, the task of transfer learning is straightforward and data can be aggregated freely.
But uninformed assumptions of similarity could have a detrimental effect on model accuracy through population and confounding effects. Population differences are either systematic collection differences or inherent differences in the given populations at each location. Confounding differences are tied to the causal interactions and subsequent correlations between chosen features in the model. Wiens et al discuss the apparent differences among the three hospitals examined, yet they provide no explanation of why such differences exist or of their probable source (population or confounding). Consideration should be given to feature similarity early in the modeling task to appropriately constrain the model. Poor assumptions may introduce spurious associations or undermine generalization.

To enhance feature similarity exploration, causal assumptions between variables can be drawn from expert knowledge and previous research. In the past scientific literature (eg, randomized clinical trials), for example, CDI risk is associated with increased age, duration of hospitalization, and exposure to antimicrobial agents. Knowledge about age, hospitalization, and antimicrobial drugs can be combined with other empirical evidence to define causal assumptions and construct a causal graph, providing a linked consideration of feature effects. Transportability theory, introduced by Pearl and Bareinboim, offers a basis for using causal graph relations to describe which variables’ probability distributions ‘transport’, or are more likely to generalize, between populations. Transport encompasses transfer learning in attempting to use statistical evidence from a source on a target, but differs by incorporating causal assumptions derived from a combination of empirical source data and outside domain knowledge (table 1). The final product of transportability analysis is a formula dictating what information should be combined from the considered domains.
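As a hedged illustration of what such a formula can look like, one simple case discussed in the transportability literature is re-weighting: if the source and target populations are assumed to differ only in the distribution of a covariate Z, stratum-specific effects estimated in the source can be re-weighted by the target's distribution of Z, that is, P*(y | do(x)) = Σ_z P(y | do(x), z) P*(z). The Python sketch below applies this case. The age strata, all probabilities, and the function name `transport_by_reweighting` are hypothetical and are not taken from Wiens et al or from this correspondence.

```python
# Minimal sketch of a re-weighting transport formula.
# ASSUMPTION: source and target differ only in the distribution of Z (an age stratum).
# All numbers are made up for illustration.

# P(y | do(x), z) estimated in the SOURCE population, per age stratum z.
source_risk_by_stratum = {"young": 0.02, "old": 0.10}

# Distribution of Z in each population.
source_z = {"young": 0.7, "old": 0.3}
target_z = {"young": 0.4, "old": 0.6}


def transport_by_reweighting(risk_by_stratum, z_distribution):
    """Sum stratum-specific risks weighted by a population's distribution of Z."""
    if abs(sum(z_distribution.values()) - 1.0) > 1e-9:
        raise ValueError("z_distribution must sum to 1")
    return sum(risk_by_stratum[z] * p for z, p in z_distribution.items())


# Risk in the source population itself, and the risk transported to the target.
source_risk = transport_by_reweighting(source_risk_by_stratum, source_z)
target_risk = transport_by_reweighting(source_risk_by_stratum, target_z)

print(f"source risk: {source_risk:.3f}")
print(f"transported target risk: {target_risk:.3f}")
```

In this toy setting the pooled source estimate would understate the target risk, because the target population is older; only the stratum-specific quantities are carried over unchanged. Other causal graph structures lead to different formulas, or to no valid formula at all.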
Transport techniques allow information to be pulled from appropriate data resources: a larger source can strengthen a weaker target, or a strong target can build on portions of a source. Outside domain knowledge can also define assumptions or strengthen weak empirical findings. Depending on the causal network structure, features can be deemed ‘nontransportable’, indicating that source data will never accurately predict the target task despite overlap. When multiple source tasks are available, causal considerations for each source can be combined to yield a transport formula using the expanded rubrics from meta-transportability.

Feature selection is another important consideration of the learning task that received limited discussion. Causal assumptions cannot be properly described until a modeler understands the variables available to the model. Chosen predictive features should make sense as causal indicators for the medical task of raising a CDI alert. For example, physician and location features, as suggested in the paper, may be predictive of increased CDI risk, but these features are not independent: location strongly mediates the physician (ie, a physician is associated with a location). As such, the physician feature may be unstable. Feature selection is also unstable when source and target locations cannot supply enough data; additional evidence must then be obtained before transfer or transport techniques can proceed.

Large feature spaces can also help in finding predictive correlations, but they are challenging for considering causal relationships and require larger datasets. Consider Wiens et al’s model with 256 features common to all sites and features specific to the target task. The final set of selected features and the predictive weights of individual variables are not described, making it difficult for a reader to ascertain which available features contribute to prediction.
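The dependence between the physician and location features noted earlier in this passage can be probed directly in data before both are offered to a model. The following is a minimal Python sketch, not the method used by Wiens et al: for each physician it computes the fraction of encounters that occur at that physician's most common location. The encounter records, identifiers, and the function name `location_concentration` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical encounter records: (physician_id, location). Not real data.
encounters = [
    ("dr_a", "icu"), ("dr_a", "icu"), ("dr_a", "icu"),
    ("dr_b", "ward_1"), ("dr_b", "ward_1"), ("dr_b", "icu"),
    ("dr_c", "ward_2"), ("dr_c", "ward_2"),
]


def location_concentration(records):
    """Map each physician to the share of encounters at their most common location."""
    by_physician = defaultdict(Counter)
    for physician, location in records:
        by_physician[physician][location] += 1
    return {
        physician: max(counts.values()) / sum(counts.values())
        for physician, counts in by_physician.items()
    }


concentration = location_concentration(encounters)
for physician, share in sorted(concentration.items()):
    print(f"{physician}: {share:.2f}")
```

Values close to 1 suggest that a physician is seen almost entirely at a single location, so the two features carry largely overlapping information and their individual weights should be interpreted with caution.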
Applying a deeper understanding of available features avoids blind application of large data sets. Such understanding guides feature selection and indicates which associations may be influenced by other factors. Striking a balance between big data and causal feature selection methods will be important for developing future learning techniques.

Additionally, the transfer method demonstrated by Wiens et al removes all source-only features from the model during feature selection. However, indiscriminately removing source features may decrease predictive performance. Including a source-only feature can be advantageous when data are difficult or expensive to obtain in routine care. Genetic phenotypes, for example, include important information about disease and are increasingly collected by academic research hospitals. Rural clinics, however, lack the ability to measure these same data. By considering population

Table 1 Definitions of ‘Transfer’ and ‘Transport’ terminology used in this correspondence

Journal title:
  • Journal of the American Medical Informatics Association : JAMIA

Volume 21 e2

Publication date 2014