
Toward Multimedia: A String Pattern-based Passage Ranking Model for Video Question Answering



    1. Toward Multimedia: A String Pattern-based Passage Ranking Model for Video Question Answering

    2. 2 Introduction With the rapid expansion of video data, there is an increasing demand for retrieving and browsing videos. Current video retrieval techniques support only retrieving related “documents”. Providing multimedia Q/A implies: video content extraction (objects, sounds, speech, images, motions, etc.) and text-based Q/A (pinpointing exact answers rather than returning whole documents).

    3. 3 Related Works (Video) Extracting video content is a very difficult but important task: objects, sounds, speech, images, motions, etc. Among these, text in videos, especially closed captions, is the most powerful feature; common OCR (optical character recognition) outperforms SR (speech recognition). The well-known Informedia project (Wactlar, 2000) and the TRECVID tracks (Over et al., 2005) serve simple retrieval only, e.g., “Find shots of [a ship or boat]”.

    4. 4 Related Works (Text Q/A) TREC-QA provided the pilot competition on extracting answers from huge document corpora. Most top-performing Q/A systems required combining many domain- and language-dependent resources: parsers (Charniak, 2002), named entity taggers (Florian et al., 2003), elaborate ontologies (Yang et al., 2003), and WordNet (almost every Q/A study). These are very difficult to port to different languages or domains.

    5. 5 Related Works (Video Q/A) Lin et al. (2001) presented the earliest work: simple OCR techniques combined with simple term weighting schemes; the OCR was not advanced and the thesaurus was hand-created. Yang et al. (2003) proposed an early video Q/A system that made use of many linguistic resources (NER, parser, WordNet, WWW, ...), applied news articles to correct speech errors, and used keyword-frequency-based answer selection. Cao et al. (2004) designed a domain-dependent Q/A system for online education with pattern-based (manually constructed) answer selection. Wu et al. (2004) showed the first cross-language video Q/A system, applying a density-based method for answer selection and converting each language into English (it supports English queries only). Zhang and Nunamaker (2004) developed a video Q/A technique based on retrieving short clips; the clips were segmented manually and a simple TF-IDF-like weighting was applied.

    6. 6 In this paper We propose a passage ranking algorithm that extends text Q/A to video Q/A. Users interact with our system through natural language questions, and passages that answer the question are returned. Lin et al. (2003) showed that users prefer passages over short answers since passages contain context. Our method is multilingually portable and effective.

    7. 7 Outline Introduction Related works Our videoQ/A Method Video Processing Passage Ranking Algorithm Experiments Settings Results Conclusion

    8. 8 System Architecture

    9. 9 Video Processing

    10. 10 Video Processing Text localization: Purpose: Localize the text areas in frames Related works: Top-down (Cai et al., 2002) Bottom-up (Fan et al., 2001)

    11. 11 Video Processing Extraction & Tracking: Purpose: Extract text color and multi-frame integration Related works: Text extraction (Ryu et al., 2005) Multi-frame integration

    12. 12 Video Processing OCR Purpose: Recognizing the characters in text components Related works: Simple OCR (Wu et al., 2004; Hong et al., 1995)

    13. 13 System Architecture

    14. 14 Chinese word segmentation There is no explicit boundary between words in most Asian languages (Chinese, Japanese, Korean, etc.). We can adopt two approaches to extract words from such text: a well-trained Chinese word segmenter (SIGHAN bakeoff; see Levow, 2006) or N-grams (widely used for NTCIR cross-language retrieval; see Kishida et al., 2007).
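The character-N-gram alternative above can be sketched as follows (a minimal illustration, not the authors' implementation; the function name and defaults are ours):

```python
def char_ngrams(text, n=2):
    """Split unsegmented text into overlapping character n-grams.

    A simple alternative to a trained word segmenter for languages
    without explicit word boundaries (illustrative sketch only).
    """
    # Drop whitespace so n-grams never span token gaps
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]
```

Applied to an unsegmented Chinese sentence, this yields overlapping character bigrams that can serve directly as index terms, with no trained segmenter required.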

    15. 15 An example

    16. 16 System Architecture

    17. 17 What is a sentence?

    18. 18 Document Retrieval and Passage Segmentation Passage segmentation: a sliding window of size 3 with a one-sentence overlap with the previous passage. Initial retrieval model: Okapi BM25 (Robertson et al., 2000; Savoy, 2005), keeping the top-1000 relevant passages for further re-ranking. One can replace BM25 with better retrieval models.
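A minimal sketch of the sliding-window segmentation described above, assuming sentences arrive as a list of strings (the function name and joining are our assumptions):

```python
def segment_passages(sentences, size=3, overlap=1):
    """Group sentences into overlapping fixed-size passages.

    With size=3 and overlap=1, each passage repeats the last
    sentence of the previous passage as its first sentence.
    """
    step = size - overlap
    passages = []
    for start in range(0, len(sentences), step):
        window = sentences[start:start + size]
        passages.append(" ".join(window))
        if start + size >= len(sentences):
            break  # last window already reaches the end of the document
    return passages
```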

    19. 19 System Architecture

    20. 20 Ranking Algorithm Related works Introduction Limitations The importance of N-gram and word density Our method Suffix Tree Algorithms for finding the best match sequence Preprocessing Re-tokenization and Weighting

    21. 21 Ranking Algorithm (Related works) The ranking model receives the segmented passages and ranks the top-N passages in response to the question. Tellex et al. (2003) compared seven portable passage retrieval algorithms; density-based methods are the best. Cui et al. (2005) further improved the density-based method by 17% in relative MRR (mean reciprocal rank), but it requires preparing training data, WordNet, and parsers first.

    22. 22 Ranking Algorithm (Related works) Parsing is very complex work, particularly for Chinese: word segmentation, part-of-speech tagging, and constituent/dependency parsing. Full parsing is also very slow: one sentence costs 0.8-1.3 seconds (Charniak's parser). Besides, developing labeled corpora is laborious, and porting a trained passage ranker to another language is also very expensive. And what about OCR errors?

    23. 23 Ranking Algorithm (N-gram) Traditional ranking algorithms are biased toward giving more weight to high-frequency words than to N-grams. N-grams are useful and far less ambiguous than their individual unigrams: for example, the N-gram “Optical Character Recognition” is much more specific than the separate words “Optical”, “Character”, and “Recognition”.

    24. 24 Ranking Algorithm (Density) A dense “distinct” word distribution is useful: if a passage contains abundant “identical” question words, potential answer words might occur nearby (the basic assumption of density-based algorithms). Note that the “distinct” word distribution differs from the classic word distribution: classical density-based methods simply count matched word occurrences (the first term of SiteQ's method is keyword frequency). In comparison, we focus on finding the single best-fit match word for each question term.

    25. 25 Ranking Algorithm (Frequency) Frequency is not always useful: a passage usually contains Chinese stopwords and punctuation, and in our case many unrecognizable or false-alarm OCR words also appear.

    26. 26 Ranking Algorithm Our ranking algorithm takes both “views” into account: in other words, it finds the best match sequence for the passage, yielding “long” N-gram matches and a “dense” N-gram distribution. In addition, each match word is restricted to appear at most once in the sentence.

    27. 27 Ranking Algorithm Unfortunately, finding the best match is an NP-complete problem => O(2^n) (match or mismatch for each word). Thus we propose an algorithm that approximately finds the best-fit match sequence to be scored. Outline: a probabilistic view to score the importance of a passage; an algorithm to find the match sequence; an example of estimating the score; time complexity analysis; comparison with the “density” and “frequency” methods.

    28. 28 System Architecture

    29. 29 Question Analysis First, we remove all Chinese stopwords from the given question using a maximum-N-gram matching algorithm: decreasingly check the N-gram, (N-1)-gram, ..., 2-gram, and 1-gram in the sentence. The stoplist was built by estimating N-gram (N = 1, 2, 3) frequencies, sorting them, and having a native Chinese expert make the selection: 897 entries = 571 (English stopwords) + 326 (semi-manual).
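The longest-first matching step above can be sketched as follows (an illustrative sketch only; the tokenization and space-joined stoplist format are our assumptions):

```python
def remove_stopwords(tokens, stoplist, max_n=3):
    """Greedy maximum-n-gram stopword removal.

    At each position, try to match the longest stopword n-gram
    (up to max_n tokens) before falling back to shorter ones.
    """
    kept, i = [], 0
    while i < len(tokens):
        for n in range(max_n, 0, -1):
            gram = tokens[i:i + n]
            if len(gram) == n and " ".join(gram) in stoplist:
                i += n  # skip the matched stopword n-gram
                break
        else:
            kept.append(tokens[i])  # no stopword matched here
            i += 1
    return kept
```

Matching longest-first means a multi-word stop phrase is removed as a unit before its individual words are tested, which is the point of checking the N-gram before the (N-1)-gram.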

    30. 30 Question Suffix Tree

    31. 31 Passage Suffix Tree

    32. 32 String Matching By inserting the question string into the Passage Suffix Tree, we can find the common subsequences of the question string.

    33. 33 String Matching Hence we observe the following common subsequences. Similarly, we can insert the passage string into the Question Suffix Tree to obtain:
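The effect of this suffix-tree matching can be illustrated with a brute-force stand-in that returns the maximal common substrings of two strings (the system itself uses suffix trees for efficiency; this sketch and its names are ours):

```python
def common_substrings(question, passage, min_len=2):
    """Find maximal common substrings shared by question and passage.

    A brute-force stand-in for suffix-tree matching: same output,
    much worse asymptotic cost.
    """
    found = set()
    for i in range(len(question)):
        for j in range(len(passage)):
            # Extend the match at (i, j) as far as it goes
            k = 0
            while (i + k < len(question) and j + k < len(passage)
                   and question[i + k] == passage[j + k]):
                k += 1
            if k >= min_len:
                found.add(question[i:i + k])
    # Keep only maximal matches (drop substrings of longer matches)
    return {s for s in found
            if not any(s != t and s in t for t in found)}
```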

    34. 34 Scoring Function The passage score combines two terms; λ is used to adjust the importance of the density score. QW_Density(Q, P) estimates the question-word density in P, and QW_Weight(Q, P) measures the sum of the weights of the matched question words in P.
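One plausible reading of this combination is a linear interpolation of the two terms (the exact form and the default λ below are assumptions; the slide only states that λ weights the density score):

```python
def passage_score(qw_density, qw_weight, lam=0.5):
    """Combine the density and weight terms into one passage score.

    The interpolation form and the default lam are illustrative
    assumptions, not the paper's published formula.
    """
    return lam * qw_density + (1.0 - lam) * qw_weight
```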

    35. 35 QW_Density Quantifies the weighted word density distribution; it modifies SiteQ's second term. In line with our hypothesis, we favor long string patterns.

    36. 36 Discriminative Power The discriminative power should also be taken into consideration.

    37. 37 QW_Density By re-tokenizing and re-weighting, the QW_Density can be computed as follows

    38. 38 QW_Weight This term estimates how much content information the passage carries given the question.

    39. 39 Combining Density and Weight We further take the first two or last two sentences into account, since answers might occur before/after the sentences that contain useful term matches.

    40. 40 Outline Introduction Related works Our videoQ/A method Experiments Settings Results Conclusion

    41. 41 Settings The test question set (about 250 questions) was mainly collected from Web logs. We use MRR, precision, and pattern-recall scores to evaluate the proposed Q/A method (pattern-recall: the number of answer patterns found in the top-5 ranks). To compare with the state of the art, we adopted six effective and multilingually portable ranking algorithms: TFIDF, BM25, language model, INQUERY, cosine, and SiteQ.

    42. 42 For askers

    43. 43 For askers

    44. 44 Statistics of the collected Discovery videos

    45. 45 Comparison

    46. 46 Results (character-level)

    47. 47 Results (word-level)

    48. 48 Large-scale experiments

    49. 49 Auto-Translate into English

    50. 50 Re-ranking the six retrieval models

    51. 51 Conclusion This paper proposes a new passage ranking algorithm for Chinese video Q/A. 250 collected questions were evaluated over 75.6 hours of video; the method outperforms BM25, the language model, INQUERY, etc. Applying word segmentation to video Q/A is not a good idea: it drops scores by about 10% on average for most retrieval models. Can we parse OCR transcripts as ordinary articles? Word segmentation (0.94), POS tagging (0.91-0.92), parsing (0.846).

    52. 52 Future Directions Speech is another important clue; we are now investigating some well-known toolkits (CMU's Sphinx, Cambridge's HTK). Other directions: effectively parsing the transcripts (especially for Asian languages), improving mis-recognized and false-alarm words, and domain adaptation (from news articles to video).

    53. 53 Thanks Prof. Yue-Shi Lee and Prof. Chia-Hui Chang gave a great amount of comments and support. Thanks also to the Database Lab (National Central Univ.) and the Data Mining Lab (Ming Chuan Univ.) for usability testing and candid comments.

    54. 54 References
    Yang, H., Chaisorn, L., Zhao, Y., Neo, S. Y., & Chua, T. S. (2003). VideoQA: question answering on news video. In Proceedings of the 11th ACM International Conference on Multimedia (ACM MM) (pp. 632-641).
    Zhang, D., & Nunamaker, J. (2004). A natural language approach to content-based video indexing and retrieval for interactive e-learning. IEEE Transactions on Multimedia, 6, 450-458.
    Wu, Y. C., Lee, Y. S., & Chang, C. H. (2004). CLVQ: cross-language video question/answering system. In Proceedings of the 6th IEEE International Symposium on Multimedia Software Engineering (MSE) (pp. 294-301).
    Lyu, M. R., Song, J., & Cai, M. (2005). A comprehensive method for multilingual video text detection, localization, and extraction. IEEE Transactions on Circuits and Systems for Video Technology, 15, 243-255.
    Lienhart, R., & Wernicke, A. (2002). Localizing and segmenting text in images and videos. IEEE Transactions on Circuits and Systems for Video Technology, 12, 256-268.
    Lin, C. J., Liu, C. C., & Chen, H. H. (2001). A simple method for Chinese video OCR and its application to question answering. Computational Linguistics and Chinese Language Processing, 6, 11-30.
    Cao, J., & Nunamaker, J. F. (2004). Question answering on lecture videos: a multifaceted approach. In Proceedings of the Joint Conference on Digital Libraries (JCDL) (pp. 214-215).
    Cao, J., Roussinov, D., Robles, J., & Nunamaker, J. F. (2005). Automated question answering from videos: NLP vs. pattern matching. In Proceedings of the Hawaii International Conference on System Sciences (HICSS).
    Kishida, K., Chen, K. H., Lee, S., Kuriyama, K., Kando, N., Chen, H. H., & Myaeng, S. H. (2007). Overview of the CLIR task at the sixth NTCIR workshop. In Proceedings of the 6th NTCIR Workshop.

    55. 55 SPVQA system

    56. 56 SPVQA system

    57. 57 SPVQA system

    58. 58 SPVQA system

    59. 59 Online-Demonstration

    60. 60 Discussions Our method outperforms TF-based and density-based methods and is suitable for video OCR transcripts, even when OCR errors appear within keywords. (A Chinese question/answer example followed; its characters were lost in the transcript.)

    61. 61

    62. 62 Error Analysis OCR errors in key question words: for “Where is the headquarters of the FBI?”, most occurrences of “FBI” were incorrectly recognized as “98I” or “28I”. Synonyms and anaphora: our method focuses on surface terms and fails to resolve “it”, “he”, “she”, etc.

    63. 63 Error Analysis Lack of language-dependent analysis: Chinese word tokenization and Chinese stopword removal (the Chinese examples were lost in the transcript). Machine translation errors: out-of-vocabulary words, e.g., “Hotshepsut” was translated as “conspicuous Zhai”.

    64. 64 In natural language, this is quite uncommon

    65. 65 Repeated patterns do not hurt Q/A Most repeated words are meaningless words, punctuation, or OCR false-alarm words. Experiments also demonstrate that employing a simple longest common subsequence (LCS) performs the same as the proposed method (or as enumerating all of the state sequences).
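For reference, the simple longest-common-subsequence baseline mentioned above can be computed with the classic dynamic program (a textbook sketch, not the authors' code):

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    # Classic O(len(a) * len(b)) dynamic-programming table:
    # dp[i][j] = LCS length of a[:i] and b[:j]
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if x == y:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    return dp[len(a)][len(b)]
```

Here `a` and `b` can be the question and passage as token lists (or character strings), so the score naturally tolerates insertions such as OCR false alarms between matched words.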

    66. 66 Video Processing (Experiments) We use a small subset of the Discovery videos: 30 short clips, NTSC 352x240 MPEG-1, 1684 frames (sampled at 2 frames per second), and 2166 text areas.

    67. 67 Experimental Result (Text detection)

    68. 68 Experimental Result (OCR)

    69. 69 VideoOCR Efficiency Analysis
