Multimedia Event Detection – Strong by Multi-Modality Integration

Author

  • SPEAKER

Abstract

We present our Multimedia Event Detection system for the setting with positive video exemplars (the event query is defined by 10 or 100 positive videos). The system achieves state-of-the-art performance by using different fusion strategies for different modalities. First, in the visual system, the standard fusion strategy is to average the probability scores obtained from different features. This strategy achieves reasonable results, e.g., a relative mAP improvement of 5% in the 100-positive-video case when fusing improved dense trajectory and high-level concept features. However, we show that using an inverse joint probability instead of the standard strategy to combine the concept feature and improved dense trajectory improves performance further, with a relative mAP improvement of 7% and a better AP on 16 of the 20 pre-specified multimedia events of the MED15 full evaluation set. The main reason is that classifiers trained on high-level concept features and on improved dense trajectories can be complementary, and with average fusion a low score from one classifier downgrades a possibly relevant video. With the inverse joint probability, only videos that receive a low score from both classifiers are put at the bottom of the list. Besides combining visual information, we combine speech (ASR) and textual (OCR) information. Our ASR and OCR systems are tuned for high precision and only retrieve videos that almost certainly contain the event. These results are used to rank those videos higher in the ranked list than before. Our results show that the OCR system improves performance by 5.8% and 2% relative mAP in the 10- and 100-positive-video cases, respectively. The ASR system, on the other hand, is not precise enough: overall performance does not increase when ASR results are added. This indicates that ASR and OCR may each be useful for some events but not for others.
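
The two visual fusion rules and the text-based re-ranking described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation. The abstract gives no formula for the inverse joint probability, so we assume it means one minus the probability that both classifiers reject the video, 1 - (1 - p_concept)(1 - p_idt), treating the two classifiers as independent; this matches the stated property that only videos scored low by both classifiers fall to the bottom. The function names, the example scores, and the `text_hits` set (videos flagged by the high-precision ASR or OCR systems) are hypothetical.

```python
from typing import Dict, List, Set


def average_fusion(p_concept: float, p_idt: float) -> float:
    """Standard late fusion: the mean of the two classifier probabilities."""
    return (p_concept + p_idt) / 2.0


def inverse_joint_fusion(p_concept: float, p_idt: float) -> float:
    """Assumed 'inverse joint probability': 1 minus the probability that
    both classifiers reject the video, treating them as independent.
    The fused score is low only when both input scores are low."""
    return 1.0 - (1.0 - p_concept) * (1.0 - p_idt)


def rank_videos(scores: Dict[str, float], text_hits: Set[str]) -> List[str]:
    """Sort videos by fused visual score (highest first), but place videos
    flagged by the high-precision ASR/OCR systems (hypothetical `text_hits`)
    ahead of all unflagged ones. Within each group the visual order holds."""
    return sorted(scores, key=lambda v: (v not in text_hits, -scores[v]))


if __name__ == "__main__":
    # Hypothetical (concept, IDT) probabilities for three videos.
    videos = {
        "one_strong": (0.9, 0.1),  # only the concept classifier fires
        "lukewarm": (0.5, 0.5),    # both classifiers undecided
        "both_low": (0.1, 0.1),    # both classifiers reject
    }
    avg = {v: average_fusion(*p) for v, p in videos.items()}
    inv = {v: inverse_joint_fusion(*p) for v, p in videos.items()}
    print({v: round(s, 2) for v, s in avg.items()})
    # {'one_strong': 0.5, 'lukewarm': 0.5, 'both_low': 0.1}
    print({v: round(s, 2) for v, s in inv.items()})
    # {'one_strong': 0.91, 'lukewarm': 0.75, 'both_low': 0.19}
    print(rank_videos(inv, text_hits={"lukewarm"}))
    # ['lukewarm', 'one_strong', 'both_low']
```

With these example scores, average fusion ties the one-sided video (0.9, 0.1) with the undecided one (0.5, 0.5) at 0.5, whereas the assumed inverse joint rule scores them 0.91 and 0.75, so a video backed strongly by only one classifier is no longer dragged down by the other. The ASR/OCR step here simply moves flagged videos above all others; how the original system merges the text results into the ranked list is not specified in the abstract.
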
Our systems are thus not the best individual systems, but by combining multiple sources we can outperform systems that use only one source of information.

This paper was presented at the TRECVID 2015 Workshop, November 16-18, 2015, Gaithersburg, MD, USA.

Supervisor: Prof. C. W. Ngo

Research Interests: Multimedia Event Detection; Multimedia Content Analysis; Semantic Concept Indexing

Publication date: 2016