Unsupervised Semantic Role Induction via Split-Merge Clustering
Authors
Abstract
In this paper we describe an unsupervised method for semantic role induction that holds promise for relieving the data acquisition bottleneck associated with supervised role labelers. We present an algorithm that iteratively splits and merges clusters representing semantic roles, thereby leading from an initial clustering to a final clustering of better quality. The method is simple, surprisingly effective, and allows linguistic knowledge to be integrated transparently. By combining role induction with a rule-based component for argument identification, we obtain an unsupervised end-to-end semantic role labeling system. Evaluation on the CoNLL 2008 benchmark dataset demonstrates that our method outperforms competitive unsupervised approaches by a wide margin.
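The split-then-merge idea in the abstract can be illustrated with a small Python sketch. The abstract does not say how clusters are split or how their similarity is scored, so everything specific here is an assumption made for illustration: the syntactic keys such as `subj_active`, the cosine similarity over head-word counts, the greedy pairwise merging, and the `threshold` parameter are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bags of head words."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def split(instances):
    """Split phase (assumed): one cluster per syntactic key.

    `instances` is a list of (key, head_word) pairs; the key is a
    hypothetical syntactic cue for the argument.
    """
    clusters = defaultdict(list)
    for key, head in instances:
        clusters[key].append(head)
    return list(clusters.values())


def merge(clusters, threshold=0.5):
    """Merge phase (assumed): repeatedly merge the most similar pair of
    clusters until no pair reaches `threshold`."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        best, pair = threshold, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = cosine(Counter(clusters[i]), Counter(clusters[j]))
                if sim >= best:
                    best, pair = sim, (i, j)
        if pair is None:
            break
        i, j = pair  # j > i, so popping j leaves index i valid
        clusters[i].extend(clusters.pop(j))
    return clusters
```

Calling `merge(split(instances))` on a list of `(key, head_word)` pairs returns a list of clusters, each intended to stand for one induced role. Starting from many fine-grained clusters and only merging on strong evidence is one plausible reading of "from an initial clustering to a final clustering of better quality"; the published scoring function may differ.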
Similar resources
Learning semantic hierarchy with distributed representations for unsupervised spoken language understanding
We study the problem of unsupervised ontology learning for semantic understanding in spoken dialogue systems, in particular, learning the hierarchical semantic structure from the data. Given unlabelled conversations, we augment a frame-semantic based unsupervised slot induction approach with hierarchical agglomerative clustering to merge topically-related slots (e.g., both slots “direction” and...
Measuring unsupervised acoustic clustering through phoneme pair merge-and-split tests
Subphonetic discovery through segmental clustering is a central step in building a corpus-based synthesizer. To help decide which clustering algorithm to use, we employed merge-and-split tests on English fricatives. Compared to a reference of 2%, Gaussian EM achieved a misclassification rate of 6% and K-means 10%, while predictive CART trees performed poorly.
A soft-clustering algorithm for automatic induction of semantic classes
In this paper, we propose a soft-decision, unsupervised clustering algorithm that generates semantic classes automatically using the probability of class membership for each word, rather than deterministically assigning a word to a semantic class. Semantic classes are induced using an unsupervised, automatic procedure that uses a context-based similarity distance to measure semantic similarity ...
Speaker Clustering Using Direct Maximisation of the MLLR-adapted Likelihood
In this paper speaker clustering schemes are investigated in the context of improving unsupervised adaptation for broadcast news transcription. The various techniques are presented within a framework of top-down split-and-merge clustering. Since these schemes are to be used for MLLR-based adaptation, a natural evaluation metric for clustering is the increase in data likelihood from adaptation. ...
Utilizing the One-Sense-per-Discourse Constraint for Fully Unsupervised Word Sense Induction and Disambiguation
Recent advances in word sense induction rely on clustering related words. In this paper, instead of using a clustering algorithm, we suggest performing a Singular Value Decomposition (SVD), which can be guaranteed to always find a global optimum. However, in order to apply this method to the problem of word sense induction, a semantic interpretation of the dimensions computed by the SVD is requi...
Publication date: 2011