Automatic Content Filtering
KDDI R&D Laboratories Inc.
Background
• UGC (User Generated Content) is very popular and makes up a growing share of online volume.
• Industry sources report that YouTube content submissions are approaching 5M minutes of new uploads per day.
• A large variety of video and image formats, resolutions, and sizes is uploaded to the internet daily.
How can a company check all of this picture and movie content?
Drawbacks of manual checking:
• Subjective evaluation consumes time and resources.
• Subjective evaluation introduces fluctuations in results.
What are the key drivers for automatic content filtering?
• High speed
• High accuracy
Content Filtering Block Diagram
[Block diagram]
• Offline: OK Image Database and NG Image Database → Feature Extraction → Training (iSVM) → Dictionary
• Online: Input Images → Feature Extraction → Detection (using the Dictionary) → OK Image / NG Image
Performance: can operate at 55 pictures/sec using only a laptop PC.
Strong Point-1: Adopts proprietary image features.
Strong Point-2: Fast training by introducing iSVM.
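The offline/online split in the block diagram can be sketched in a few lines. This is only an illustration of the data flow: the real image features are proprietary and the real training step is iSVM, so the grey-level histogram feature, the nearest-centroid "dictionary", and all function names below are stand-in assumptions, not the actual system.

```python
from typing import Dict, List, Sequence

Image = Sequence[int]  # toy image: flat list of grey levels in 0..255 (assumption)


def extract_features(image: Image, bins: int = 4) -> List[float]:
    """Stand-in feature: normalised grey-level histogram (not the proprietary feature)."""
    hist = [0.0] * bins
    for px in image:
        hist[min(px * bins // 256, bins - 1)] += 1.0
    total = len(image) or 1
    return [h / total for h in hist]


def train_dictionary(ok_images: List[Image], ng_images: List[Image]) -> Dict[str, List[float]]:
    """Offline phase: build a 'dictionary' holding one mean feature vector per class.

    A nearest-centroid model stands in for the iSVM training step.
    """
    def centroid(images: List[Image]) -> List[float]:
        feats = [extract_features(im) for im in images]
        return [sum(col) / len(feats) for col in zip(*feats)]

    return {"OK": centroid(ok_images), "NG": centroid(ng_images)}


def detect(image: Image, dictionary: Dict[str, List[float]]) -> str:
    """Online phase: extract features, then label by the closest class centroid."""
    feats = extract_features(image)

    def squared_distance(label: str) -> float:
        return sum((a - b) ** 2 for a, b in zip(feats, dictionary[label]))

    return min(dictionary, key=squared_distance)


# Hypothetical toy data: dark images are treated as OK, bright images as NG.
ok_db = [[10, 20, 30, 40], [0, 50, 60, 5]]
ng_db = [[200, 220, 250, 255], [210, 230, 240, 199]]
dictionary = train_dictionary(ok_db, ng_db)
```

The point of the split is that the expensive step (training) runs offline once, while the online path only needs feature extraction plus a lookup against the dictionary.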
High-Speed Training by Incremental SVM (iSVM™)
SVM (Support Vector Machine): Concept and Problem
• Concept: Map the data to a multidimensional space and determine the boundary between OK and NG.
• Problem: Training on huge datasets requires a huge amount of calculation. Conventional SVM cannot handle a huge training dataset.
There is a strong need for a fast training algorithm that maintains high accuracy.
Incremental SVM (iSVM): Concept, Features, Benefit
• Introducing KDDI R&D Labs' proprietary adaptive training algorithm: iSVM.
• With iSVM, calculation cost increases in proportion to the amount of data.
• With conventional SVM methods, calculation cost grows with the cube of the amount of data.
• We have confirmed that iSVM accelerates calculation by up to 8X for 5,000,000 training samples.
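To make the "cost proportional to the amount of data" idea concrete, here is a minimal sketch of a linear SVM that learns one sample at a time. It is not iSVM, whose algorithm is proprietary and not described here; it is a generic Pegasos-style hinge-loss update, and the class name, the regularisation value, and the toy data are all assumptions for illustration. Each call to `partial_fit` does a fixed amount of work per feature, so total training cost grows linearly with the number of samples.

```python
from typing import List, Sequence


class OnlineLinearSVM:
    """Linear SVM trained incrementally (Pegasos-style hinge-loss SGD).

    A generic stand-in, not KDDI R&D Labs' iSVM.
    """

    def __init__(self, n_features: int, reg: float = 0.01) -> None:
        self.w: List[float] = [0.0] * n_features
        self.reg = reg    # L2 regularisation strength (assumed value)
        self.steps = 0    # number of samples consumed so far

    def decision(self, x: Sequence[float]) -> float:
        """Signed score: positive side is OK, negative side is NG."""
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def partial_fit(self, x: Sequence[float], y: int) -> None:
        """Consume one labelled sample; y is +1 (OK) or -1 (NG)."""
        self.steps += 1
        eta = 1.0 / (self.reg * self.steps)         # decaying step size
        violated = y * self.decision(x) < 1.0       # inside the margin or misclassified?
        shrink = 1.0 - eta * self.reg
        self.w = [shrink * wi for wi in self.w]     # regularisation shrink
        if violated:
            self.w = [wi + eta * y * xi for wi, xi in zip(self.w, x)]

    def predict(self, x: Sequence[float]) -> int:
        return 1 if self.decision(x) >= 0.0 else -1


# Hypothetical 2-D feature vectors: +1 = OK, -1 = NG.
samples = [
    ((2.0, 2.0), 1), ((3.0, 2.0), 1), ((2.0, 3.0), 1), ((3.0, 3.0), 1),
    ((-2.0, -2.0), -1), ((-3.0, -2.0), -1), ((-2.0, -3.0), -1), ((-3.0, -3.0), -1),
]

model = OnlineLinearSVM(n_features=2)
for _ in range(5):                 # five passes over the toy data
    for x, y in samples:
        model.partial_fit(x, y)
```

Because the model is updated per sample, new OK/NG images can be folded in later without retraining from scratch, which is the general appeal of incremental training.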
Performance Comparison
[Bar charts: KDDI R&D vs. other product]
Accuracy (recall and precision; scale 0.0 to 1.0, higher is better):
• KDDI R&D: 0.97 and 0.975
• Other: 0.697 and 0.643
Speed (msec/content; lower is faster):
• Other: 90
• KDDI R&D: 18
5X faster than the other product.
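The figures on this slide follow from standard definitions, sketched below. The precision and recall formulas are the usual ones; the function names are illustrative, and treating NG as the "positive" class is an assumption. The speed numbers (90 and 18 msec/content) come from the chart.

```python
def precision(tp: int, fp: int) -> float:
    """Share of items flagged NG that really are NG."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Share of real NG items that were flagged NG."""
    return tp / (tp + fn)


def throughput_per_sec(msec_per_content: float) -> float:
    """Convert a per-item latency in milliseconds to items per second."""
    return 1000.0 / msec_per_content


def speedup(slow_msec: float, fast_msec: float) -> float:
    """How many times faster the second system is than the first."""
    return slow_msec / fast_msec


other_msec = 90.0   # msec/content, other product (from the chart)
kddi_msec = 18.0    # msec/content, KDDI R&D (from the chart)

kddi_throughput = throughput_per_sec(kddi_msec)   # roughly 55 contents per second
relative_speed = speedup(other_msec, kddi_msec)   # 5.0
```

At 18 msec per item the throughput works out to about 55 items per second, which matches the 55 pictures/sec quoted for a laptop PC, and 90 / 18 gives the stated 5X.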
Demo
• Training datasets
  • Top half: training images for OK.
  • Bottom half: training images for NG.
• Input images obtained from the internet
  • 200 images were obtained arbitrarily.
• Detection result using the other product
  • Some NG pictures are detected as OK.
  • About 10% in this case.
• Detection result using KDDI R&D Labs' filtering
  • Almost all NG pictures are detected as NG.
  • Accuracy is far better than the other product.