Exploiting Features for Data Source Quality Estimation
Authors
Abstract
We revisit data fusion, i.e., the problem of integrating noisy data from multiple sources by estimating the source accuracies, and show that the simple model of logistic regression can capture most existing approaches for solving data fusion. This allows us to put data fusion on a solid statistical footing and obtain solutions with rigorous theoretical guarantees. Expanding on logistic regression, we introduce SLiMFast, a framework that converts data fusion to a learning and inference problem over discriminative probabilistic models. In contrast to previous approaches that rely on complex generative models, discriminative models allow us to decouple the specification of a data fusion model from the algorithm used to learn the model’s parameters. This allows us to extend data fusion to take into account domain-specific features that are indicative of the accuracy of data sources, and design data fusion approaches that yield source accuracy estimates with 5× lower error than competing baselines. We also design an optimizer to automatically select the best algorithm for learning the model’s parameters. We validate our optimizer on multiple real datasets and show that it chooses the best algorithm for learning in almost all cases.
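To make the link between logistic regression and data fusion concrete, the following is a minimal sketch, not SLiMFast itself. It assumes that a source's accuracy is the sigmoid of a weighted sum of its features, that sources err independently, and that each source's vote for a value counts with weight log(a / (1 - a)). The function names, the feature vectors, and the hand-picked weights are all hypothetical; in practice the weights would be learned from data.

```python
import math
from collections import defaultdict


def sigmoid(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def source_accuracy(weights, features):
    """Accuracy of a source as a logistic function of its features.

    Assumption for illustration: accuracy = sigmoid(w . x).
    """
    return sigmoid(sum(w * x for w, x in zip(weights, features)))


def fuse(claims, accuracies):
    """Posterior over the candidate values of one object.

    claims: dict mapping source name -> claimed value.
    accuracies: dict mapping source name -> accuracy in (0, 1).
    Each claim adds the log-odds of its source's accuracy to the score of
    the claimed value; scores are normalized with a softmax.
    """
    scores = defaultdict(float)
    for source, value in claims.items():
        a = accuracies[source]
        scores[value] += math.log(a / (1.0 - a))
    top = max(scores.values())  # subtract the max for numerical stability
    exp_scores = {v: math.exp(s - top) for v, s in scores.items()}
    total = sum(exp_scores.values())
    return {v: e / total for v, e in exp_scores.items()}


# Hypothetical weights: [bias, weight of one domain-specific feature].
weights = [1.0, 2.0]
# Hypothetical feature vectors: [1.0 for the bias term, feature value].
source_features = {"A": [1.0, 1.0], "B": [1.0, -0.5], "C": [1.0, -0.5]}
accuracies = {s: source_accuracy(weights, x) for s, x in source_features.items()}

# Source A disagrees with the majority formed by B and C.
claims = {"A": "x", "B": "y", "C": "y"}
posterior = fuse(claims, accuracies)
best_value = max(posterior, key=posterior.get)
```

In this toy setup the features give A a high estimated accuracy and give B and C an accuracy of 0.5, so the fused result favors A's value even though it is outvoted. This is the kind of behavior that feature-based accuracy estimates are meant to enable.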
Similar resources
Document-level translation quality estimation: exploring discourse and pseudo-references
Predicting the quality of machine translations is a challenging topic. Quality estimation (QE) of translations is based on features of the source and target texts (without the need for human references), and on supervised machine learning methods to build prediction models. Engineering well-performing features is therefore crucial in QE modelling. Several features have been used so far, but the...
Estimation of kinematic source parameters and frequency independent shear wave quality factor around Bushehr
In this paper, the shear wave quality factor and source parameters in the near field are estimated by analyzing acceleration data from the Zagros region. Accelerograms recorded by the strong ground motion network of the Building and Housing Research Center were used. The data cover events with magnitudes of 4.7 to 6.3, collected from 1999 to 2014. In this approach, the theoretical S-wave displ...
TranscRater: a Tool for Automatic Speech Recognition Quality Estimation
We present TranscRater, an open-source tool for automatic speech recognition (ASR) quality estimation (QE). The tool allows users to perform ASR evaluation bypassing the need of reference transcripts and confidence information, which is common to current assessment protocols. TranscRater includes: i) methods to extract a variety of quality indicators from (signal, transcription) pairs and ii) m...
On the mutual information of glottal source estimation techniques for the automatic detection of speech pathologies
detection of speech pathologies by exploiting the estimation of the glottal source. Three methods of estimation are compared and time and spectral features are extracted. The relevancy of these features is assessed by means of information theory-based measures. This allows an intuitive interpretation in terms of discrimination power and redundancy between the features. It is discussed which fea...
Video quality monitoring for mobile multicast peers using distributed source coding
We consider a peer-to-peer multicast video streaming system in which untrusted intermediaries transcode video streams for heterogeneous mobile peers. Many different legitimate versions of the video might exist. However, there is the risk that the untrusted intermediaries might tamper with the video content. Quality estimation and tampering detection are important in this scenario. We propose th...
Journal: CoRR
Volume: abs/1512.06474
Publication date: 2015