Collocation Lattices and Maximum Entropy Models
Author
Abstract
The maximum entropy framework has proved to be expressive and powerful for statistical language modelling, but it suffers from the computational expense of model building. The iterative scaling algorithm used for parameter estimation is computationally expensive, and the feature selection process may require the parameters of the model to be estimated many times for many candidate features. In this paper we present a novel approach to building maximum entropy models. Our approach uses a feature collocation lattice and selects the atomic features without resorting to iterative scaling. After the atomic features have been selected, we use iterative scaling to compile a fully saturated model for the maximal constraint space and then begin to eliminate the most specific constraints. Since at every point during constraint deselection we have a fully fitted maximum entropy model, we rank the constraints on the basis of their weights in the model. We therefore do not have to use iterative scaling during constraint ranking and apply it only for linear model regression. Another important improvement is that, since the simplified model deviates from the previous, larger model in only a small number of constraints, we use the parameters of the old model as the initial values for the iterative scaling of the new one. This proved to decrease the number of required iterations by about tenfold. As practical results we discuss how our method has been applied to several language modelling tasks, such as sentence boundary disambiguation, part-of-speech tagging and automatic document abstracting.
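The warm-start idea in the abstract (initialising iterative scaling with the previous model's parameters) can be illustrated with a minimal sketch of Generalized Iterative Scaling over a small, fully enumerated event space. This is not the authors' implementation; the function name `gis`, the dense feature matrix representation, and the appended slack feature are assumptions made for the illustration.

```python
import numpy as np

def gis(F, p_emp, lam0=None, tol=1e-6, max_iter=10000):
    """Generalized Iterative Scaling over a finite event space.

    F      -- (n_events, n_feats) binary feature matrix, one row per event
    p_emp  -- empirical expectation of each feature
    lam0   -- optional warm-start parameters (e.g. from a previous, larger model)
    Returns the fitted parameters and the number of iterations used.
    """
    F = np.asarray(F, dtype=float)
    p_emp = np.asarray(p_emp, dtype=float)
    n_events, n_feats = F.shape
    C = F.sum(axis=1).max()  # GIS constant: max number of active features
    # Append a slack feature so every event has exactly C active features,
    # as GIS requires.
    Fa = np.hstack([F, (C - F.sum(axis=1))[:, None]])
    pe = np.append(p_emp, C - p_emp.sum())
    lam = np.zeros(n_feats + 1)
    if lam0 is not None:
        lam[:n_feats] = lam0  # warm start from the previous model's weights
    it = 0
    for it in range(1, max_iter + 1):
        logp = Fa @ lam
        logp -= logp.max()               # stabilise the exponentiation
        p = np.exp(logp)
        p /= p.sum()
        E = p @ Fa                       # model expectation of each feature
        step = np.log(np.maximum(pe, 1e-12) / np.maximum(E, 1e-12)) / C
        lam += step
        if np.abs(step).max() < tol:
            break
    return lam[:n_feats], it
```

When only a few constraints are removed between successive models, passing the old parameters as `lam0` places the optimiser close to the new optimum, which is the mechanism behind the roughly tenfold reduction in iterations reported in the abstract.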
Similar Articles
Feature Lattices for Maximum Entropy Modelling
The maximum entropy framework has proved to be expressive and powerful for statistical language modelling, but it suffers from the computational expense of model building. The iterative scaling algorithm used for parameter estimation is computationally expensive, and the feature selection process might require the parameters of many candidate features to be estimated many times. In ...
Full text

Feature Lattices and Maximum Entropy Models
The maximum entropy framework has proved to be expressive and powerful for statistical language modelling, but it suffers from the computational expensiveness of model building. The iterative scaling algorithm that is used for parameter estimation is rather slow, while the feature selection process might require parameters for many candidate features to be estimated many times. In this paper we p...
Full text

A Note on the Bivariate Maximum Entropy Modeling
Let X = (X1, X2) be a continuous random vector. Under the assumption that the marginal distributions of X1 and X2 are given, we develop models for the vector X when there is partial information about the dependence structure between X1 and X2. The models, which are obtained based on the well-known principle of maximum entropy, are called the maximum entropy (ME) mo...
Full text

Using a maximum entropy model to build segmentation lattices for MT
Recent work has shown that translating segmentation lattices (lattices that encode alternative ways of breaking the input to an MT system into words), rather than text in any particular segmentation, improves translation quality for languages whose orthography does not mark morpheme boundaries. However, much of this work has relied on multiple segmenters that perform differently on the same inpu...
Full text

Evaluation of Dynamical Spectra for Zero-Temperature Quantum Monte Carlo Simulations: Hubbard Lattices and Continuous Systems
Dynamical spectra for Hubbard lattices and simple atoms are obtained using ground state projection (zero-temperature) quantum Monte Carlo and the maximum entropy method. For Hubbard lattices we show that results are equivalent to those obtained from maximum entropy deconvolutions of low-temperature grand canonical quantum Monte Carlo data. These calculations are resolution limited and fail to p...
Full text