Applying Naive Bayes Classification to Google Play Apps Categorization
Author
Abstract
There are over one million apps and over half a million publishers on Google Play Store. Numbers this large pose a challenge to app users and to new publishers on the store. Apps that are not published in the right category are harder to discover, which in turn reduces earnings for their developers. At the same time, with over 41 categories on Google Play Store, choosing the right category for an app can be difficult for developers. Machine learning has proven useful in classification problems such as sentiment analysis, document classification and spam detection. The same techniques can be applied to app categorization on Google Play Store, suggesting appropriate categories to publishers based on details of their applications. In this project, we built two variations of the Naïve Bayes classifier using open metadata from top developer apps on Google Play Store in order to classify new apps on the store. The classifiers are then evaluated using several evaluation methods and their results are compared against each other. The results show that the Naïve Bayes algorithm performs well on our classification problem and can potentially automate app categorization for Android app publishers on Google Play Store.
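As a rough illustration of the kind of classifier the abstract describes, the sketch below trains a multinomial Naïve Bayes model with add-one (Laplace) smoothing on app descriptions and predicts a category for a new description. The abstract does not say which two variations the authors built or which metadata fields they used, so the multinomial variant, the whitespace tokenizer, the function names and the four toy apps are all assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def train(samples):
    """Count documents per category and words per category.

    `samples` is a list of (description, category) pairs.
    """
    doc_counts = Counter()              # category -> number of apps
    word_counts = defaultdict(Counter)  # category -> word -> count
    vocab = set()
    for text, category in samples:
        doc_counts[category] += 1
        for word in tokenize(text):
            word_counts[category][word] += 1
            vocab.add(word)
    return doc_counts, word_counts, vocab


def predict(model, text):
    """Return the category with the highest log posterior for `text`."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, -math.inf
    for category, n_docs in doc_counts.items():
        # Log prior: share of training apps in this category.
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[category].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # words never seen in training carry no evidence
            # Add-one smoothing keeps unseen (word, category) pairs non-zero.
            count = word_counts[category][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Hypothetical toy data, not taken from the paper's dataset.
TOY_APPS = [
    ("edit photos with filters and camera effects", "PHOTOGRAPHY"),
    ("camera app with photo collage and filters", "PHOTOGRAPHY"),
    ("track expenses budget and bank accounts", "FINANCE"),
    ("stock market prices and budget planner", "FINANCE"),
]

model = train(TOY_APPS)
prediction = predict(model, "photo filters for your camera")
print(prediction)
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied, and the smoothing term means a single unseen word does not rule out a category entirely.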
Similar resources
A New Approach for Text Documents Classification with Invasive Weed Optimization and Naive Bayes Classifier
With the fast increase of the documents, using Text Document Classification (TDC) methods has become a crucial matter. This paper presented a hybrid model of Invasive Weed Optimization (IWO) and Naive Bayes (NB) classifier (IWO-NB) for Feature Selection (FS) in order to reduce the big size of features space in TDC. TDC includes different actions such as text processing, feature extraction, form...
Improving the performance of Naive Bayes multinomial in e-mail foldering by introducing distribution-based balance of datasets
E-mail foldering or e-mail classification into user predefined folders can be viewed as a text classification/categorization problem. However, it has some intrinsic properties that make it more difficult to deal with, mainly the large cardinality of the class variable (i.e. the number of folders), the different number of e-mails per class state and the fact that this is a dynamic problem, in th...
In silico prediction of anticancer peptides by TRAINER tool
Cancer is one of the causes of death in the world. Several treatment methods exist against cancer cells such as radiotherapy and chemotherapy. Since traditional methods have side effects on normal cells and are expensive, identification and developing a new method to cancer therapy is very important. Antimicrobial peptides, present in a wide variety of organisms, such as plants, amphibians and ...
Integrating Multiple Internet Directories by Instance-based Learning
Finding desired information on the Internet is becoming increasingly difficult. Internet directories such as Yahoo!, which organize web pages into hierarchical categories, provide one solution to this problem; however, such directories are of limited use because some bias is applied both in the collection and categorization of pages. We propose a method for integrating multiple Internet directo...
Journal title:
- CoRR
Volume abs/1608.08574, Issue -
Pages -
Publication date 2016