Probabilistic Models over Ordered Partitions with Applications in Document Ranking and Collaborative Filtering
نویسندگان
چکیده
Ranking is an important task for handling a large amount of content. Ideally, training data for supervised ranking would include a complete rank of documents (or other objects such as images or videos) for a particular query. However, this is only possible for small sets of documents. In practice, one often resorts to document rating, in that a subset of documents is assigned with a small number indicating the degree of relevance. This poses a general problem of modelling and learning rank data with ties. In this paper, we propose a probabilistic generative model, that models the process as permutations over partitions. This results in super-exponential combinatorial state space with unknown numbers of partitions and unknown ordering among them. We approach the problem from the discrete choice theory, where subsets are chosen in a stagewise manner, reducing the state space per each stage significantly. Further, we show that with suitable parameterisation, we can still learn the models in linear time. We evaluate the proposed models on two application areas: (i) document ranking with the data from the recently held Yahoo! challenge, and (ii) collaborative filtering with movie data. The results demonstrate that the models are competitive against well-known rivals.
منابع مشابه
A Theory of Information Matching ( TIM ) ∗
In this work, we propose a theory for information matching. It is motivated by the observation that retrieval is about the relevance matching between two sets of properties (features), namely, the information need representation and information item representation. However, many probabilistic retrieval models rely on fixing one representation and optimizing the other (e.g. fixing the single inf...
متن کاملA Theory of Information Matching
In this work, we propose a theory for information matching. It is motivated by the observation that retrieval is about the relevance matching between two sets of properties (features), namely, the information need representation and information item representation. However, many probabilistic retrieval models rely on fixing one representation and optimizing the other (e.g. fixing the single inf...
متن کاملLearning From Ordered Sets and Applications in Collaborative Ranking
Ranking over sets arise when users choose between groups of items. For example, a group may be of those movies deemed 5 stars to them, or a customized tour package. It turns out, to model this data type properly, we need to investigate the general combinatorics problem of partitioning a set and ordering the subsets. Here we construct a probabilistic log-linear model over a set of ordered subset...
متن کاملA User-Item Relevance Model for Log-Based Collaborative Filtering
Implicit acquisition of user preferences makes log-based collaborative filtering favorable in practice to accomplish recommendations. In this paper, we follow a formal approach in text retrieval to re-formulate the problem. Based on the classic probability ranking principle, we propose a probabilistic user-item relevance model. Under this formal model, we show that user-based and item-based app...
متن کاملLearning to Make Social Recommendations: A Model-Based Approach
Social recommendation, predicting people who match other people for friendship or as potential partners in life or work, has recently become an important task in many social networking sites. Traditional content-based and collaborative filtering methods are not sufficient for people-to-people recommendation because a good match depends on the preferences of both sides. We proposed a framework f...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2011