This study presents a method to align scholarly documents and their presentations using text, visual content, and images. The aim is to provide a comprehensive summary that includes important technical details often missed in slides. Previous works and error analysis are discussed, noting a substantial improvement in alignment accuracy from incorporating visual features. Various alignment modalities, textual content analysis, text extraction, and image classification methods are explored. Experiments demonstrate the effectiveness of different alignment strategies, with a focus on enhancing alignment quality and incorporating graphical elements. Future tasks include expanding the dataset, refining text similarity measures, optimizing content weights, supporting additional file formats, and developing a GUI for viewing aligned documents.
Multimodal Alignment of Scholarly Documents and Their Presentations • Bamdad Bahrani • JCDL 2013 Submission • Feb 2013
Motivation • How many papers do you read every week? • How many do you read deeply? • How many do you just skim? • Title, abstract and conclusion: enough? • A summary of the paper: the most important issues
Motivation • Slide presentation as a summary • It includes important content from the paper • It is made by the same author • But • Not detailed enough • Misses some technical parts of the paper
Introduction • The Paper • and its Slide Presentation • Alignment map
Previous Works • Hayama et al. • 2005 • Japanese technical papers and presentation sheets • Using HMM • Kan • 2007 • SlideSeer • Crawling of paper-presentation pairs, aligning them, and a GUI • Beamer and Girju • 2009 • Detailed analysis of different similarity measures • Only textual content
Error Analysis • Around 70% show “Evaluation and Result”
Alignment Modalities • Text Similarity • Between each slide and each section • The core aligner unit • The baseline • A cosine similarity measure: TF-IDF • Linear Ordering • The ordering between slides and sections is monotonic • Visual appearance of slides
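The text similarity baseline on this slide can be sketched as follows. This is a minimal illustration, not the system's actual code: it assumes whitespace tokenization and a smoothed IDF, and the names `tfidf_vectors`, `cosine`, and `align_by_text` are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict) per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF (an assumption): terms found in every document keep a nonzero weight.
        vectors.append(
            {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def align_by_text(slides, sections):
    """For each slide text, return the index of the most similar section text."""
    tokenized = [text.lower().split() for text in slides + sections]
    vecs = tfidf_vectors(tokenized)
    slide_vecs, section_vecs = vecs[: len(slides)], vecs[len(slides):]
    return [
        max(range(len(sections)), key=lambda j: cosine(sv, section_vecs[j]))
        for sv in slide_vecs
    ]
```

Each slide is assigned independently here; the linear ordering and visual modalities would adjust these raw scores rather than replace them.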
Text Extraction Unit • Presentation: Slides, MS PowerPoint, VB compiler; outputs: Slide Title text, Slide Body text, Slide Number • Paper: PDF, PDFx, XML, Parser (via Python); outputs: Section Title, Section Body
Slide Image Classifier Unit • Pipeline: Slides, Take Snapshot, Image, Image Classifier • Classes: • 1. Text • 2. Outline • 3. Drawing • 4. Results
Image Class Instructions • 1. Text • Text similarity alignment weight: increase 2/3 • 2. Outline • Text similarity alignment weight: decrease 1/3 • Linear ordering alignment weight: decrease 1/3 • 3. Drawing • Uniform probability for all weights • 4. Results • Exceptional rule: align directly to “Experiment and Result” section
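One possible reading of these per-class rules is sketched below. Several choices are assumptions rather than details from the slides: "increase 2/3" and "decrease 1/3" are read as multiplicative changes to a weight, the base weights of 0.5 are made up, "uniform" is read as equal weights for both modalities, and the names `modality_weights` and `align_slide` are hypothetical.

```python
# Hypothetical starting weights for the two score-producing modalities.
BASE_WEIGHTS = {"text": 0.5, "ordering": 0.5}


def modality_weights(image_class, base=BASE_WEIGHTS):
    """Adjust modality weights according to the slide's image class."""
    w = dict(base)
    if image_class == "text":
        w["text"] *= 1 + 2 / 3       # assumed meaning of "increase 2/3"
    elif image_class == "outline":
        w["text"] *= 1 - 1 / 3       # assumed meaning of "decrease 1/3"
        w["ordering"] *= 1 - 1 / 3
    elif image_class == "drawing":
        w = {k: 1.0 / len(w) for k in w}  # equal weight for every modality
    return w


def align_slide(image_class, text_scores, order_scores, section_titles,
                result_title="Experiment and Result"):
    """Return the index of the section a slide is aligned to."""
    # Exceptional rule: result slides go straight to the results section.
    if image_class == "result" and result_title in section_titles:
        return section_titles.index(result_title)
    w = modality_weights(image_class)
    fused = [w["text"] * t + w["ordering"] * o
             for t, o in zip(text_scores, order_scores)]
    return max(range(len(fused)), key=fused.__getitem__)
```

With text scores `[0.2, 0.6, 0.2]` and ordering scores `[0.7, 0.2, 0.1]`, a slide classified as text follows the text evidence, while a drawing slide, whose text is less reliable, follows the ordering evidence.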
Image Classifier Experiment and Result • 750 manually annotated slides • Linear SVM • Feature extraction: Histogram of Oriented Gradients • Blurring filters • Normalization • 10-fold cross-validation
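The 10-fold cross-validation protocol over the 750 annotated slides can be sketched as a plain index splitter. This is only an illustration of the evaluation scheme: the shuffling seed and the name `k_fold_indices` are assumptions, and the feature extraction and SVM training steps are deliberately left out.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    # Shuffle once with a fixed seed so the folds are reproducible.
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# With 750 slides and 10 folds, every round trains on 675 slides and tests on 75.
splits = list(k_fold_indices(750, k=10))
```

Each slide appears in exactly one test fold, so averaging the per-fold accuracy uses every annotated slide once for evaluation.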
Experiments • Experiment 1: • Baseline • Paragraph-to-slide alignment • Only textual data • Experiment 2: • Section-to-slide alignment • Only textual data
Experiments • Experiment 3: • The effect of Linear Ordering alignment was added. • Textual data and ordering information • Experiment 4: • The effect of Image Classification was added. • Textual data, ordering information and visual content
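The linear ordering idea in Experiment 3 can be illustrated with a small dynamic program that picks the best alignment in which section indices never decrease from one slide to the next. This is a sketch under assumptions: it enforces monotonicity as a hard constraint, whereas the system described here treats ordering as one weighted signal, and `monotonic_alignment` is a hypothetical name.

```python
def monotonic_alignment(sim):
    """Return one section index per slide, non-decreasing, maximizing total similarity.

    sim[i][j] is the similarity between slide i and section j.
    """
    n_slides, n_sections = len(sim), len(sim[0])
    best = [row[:] for row in sim]                      # best[i][j]: top score ending at (i, j)
    back = [[0] * n_sections for _ in range(n_slides)]  # back-pointers for path recovery
    for i in range(1, n_slides):
        prev_best, prev_arg = float("-inf"), 0
        for j in range(n_sections):
            # Running maximum over all sections j' <= j for the previous slide.
            if best[i - 1][j] > prev_best:
                prev_best, prev_arg = best[i - 1][j], j
            best[i][j] = sim[i][j] + prev_best
            back[i][j] = prev_arg
    # Trace back from the best final cell.
    j = max(range(n_sections), key=best[-1].__getitem__)
    path = [j]
    for i in range(n_slides - 1, 0, -1):
        j = back[i][j]
        path.append(j)
    return path[::-1]
```

For the matrix `[[0.9, 0.1, 0.0], [0.1, 0.2, 0.7], [0.6, 0.3, 0.2]]`, picking each slide's best section independently would send the third slide back to section 0; the monotonic version keeps it at or after the second slide's section.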
Results • [Chart comparing Baseline, Section, Ordering, and Image Class; annotated value: 25%]
Conclusion • Many slides with images and drawings • Textual data is not enough • Taking advantage of graphical features of slides
Future Tasks • Bigger dataset • More efficient text similarity measures • Differentiate between Title and Body text weights • Support more input file formats • A GUI to view aligned documents
System Architecture • [Diagram labels: Input: Presentation • Input: Document • Text Extraction • Slide Image Classifier (1. Text, 2. Index, 3. Drawing, 4. Results, nil) • Textual Similarity • Linear Ordering • Multimodal Fusion • Output: Alignment]