Multi-label Classification using Logistic Regression Models for NTCIR-7 Patent Mining Task

Authors

  • Akinori Fujino
  • Hideki Isozaki
Abstract

We design a multi-label classification system based on a machine learning approach for the NTCIR-7 Patent Mining Task. Our system employs one logistic regression model per International Patent Classification (IPC) code, and each model determines whether that IPC code is assigned to a research paper. The models are trained on patent documents provided by the task organizers. To mitigate overfitting of the models to the patent documents, we design the feature vectors of the patent documents with feature weighting and component selection methods that utilize a research paper set. Using a test collection for the Japanese subtask of the NTCIR-7 Patent Mining Task, we confirmed the effectiveness of our multi-label classification system.
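The per-code setup described in the abstract can be illustrated with a minimal sketch: one independent binary logistic regression per IPC code, trained by plain gradient descent, with a code assigned whenever its model's probability passes a threshold. This is not the authors' implementation. The toy feature vectors, the two IPC codes used as placeholders, the 0.5 threshold, and the helper names (`train_binary_lr`, `train_one_vs_rest`, `predict_codes`) are all assumptions made for illustration, and the feature weighting and component selection steps from the abstract are not modeled here.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_binary_lr(X, y, epochs=500, lr=0.5, l2=0.01):
    """Fit one L2-regularized logistic regression by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # The L2 penalty applies to the weights only, not to the bias.
        w = [wj - lr * (gw / n + l2 * wj) for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def train_one_vs_rest(X, label_sets, codes):
    """Train one binary model per code; a document is positive if it has the code."""
    return {
        code: train_binary_lr(X, [1.0 if code in s else 0.0 for s in label_sets])
        for code in codes
    }


def predict_codes(models, x, threshold=0.5):
    """Return every code whose model probability reaches the threshold."""
    assigned = []
    for code, (w, b) in models.items():
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
        if p >= threshold:
            assigned.append(code)
    return sorted(assigned)


# Hypothetical toy data: three term features, two placeholder IPC codes.
X_train = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_train = [{"G06F"}, {"G06F"}, {"H04L"}, {"H04L"}, {"G06F", "H04L"}, set()]
models = train_one_vs_rest(X_train, y_train, ["G06F", "H04L"])
```

Because each code has its own model, a document can receive zero, one, or several codes, which is what makes the combined system multi-label. Calling `predict_codes(models, [1, 1, 0])` on the toy data scores both models independently and returns whichever codes clear the threshold.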

Similar resources

Using the Multi-level Classification Method in the Patent Mining Task at NTCIR-7

A patent includes a great deal of practical technical information and plays an important role in promoting scientific development. Research on patent classification and retrieval has significant application value. A patent is a special technical text with a strict hierarchical classification system and a normalized structure, and there are a number of relations between patents and their consti...

Multi-label Patent Classification at NTT Communication Science Laboratories

We design a multi-label classification system based on a combination of binary classifiers for the classification subtask of the NTCIR-6 Patent Retrieval Task. In our system, we design one binary classifier per F-term that determines whether the F-term is assigned to a patent document. Hybrid classifiers are employed as binary classifiers so that the multiple components of patent documents are used eff...

Exploiting Associations between Class Labels in Multi-label Classification

Multi-label classification has many applications in text categorization, biology, and medical diagnosis, in which multiple class labels can be assigned to each training instance simultaneously. As it is often the case that there are relationships between the labels, extracting the existing relationships between the labels and taking advantage of them during the training or prediction phases ...

KNN and Re-ranking Models for English Patent Mining at NTCIR-7

This paper describes our English patent mining system for the NTCIR-7 Patent Mining Task, which maps a research paper abstract into the IPC taxonomy. Our system is built on the k-nearest neighbor (kNN) framework, in which various similarity calculation and ranking methods are used. We employ two re-ranking techniques that use richer features to improve performance. Our systems performed wel...

Overview of the Patent Mining Task at the NTCIR-7 Workshop

This paper introduces the Patent Mining Task of the Seventh NTCIR Workshop and the test collections produced in this task. The task's goal was the classification of research papers written in either Japanese or English in terms of the International Patent Classification (IPC) system, which is a global standard. For this task, 12 participant groups submitted 49 runs. In this paper, we also report...

Publication date: 2008