Enhancing Scholarly Document Alignment via Multimodal Fusion for Presentation Summaries

Multimodal Alignment of Scholarly Documents and Their Presentations • Bamdad Bahrani • JCDL 2013 Submission • Feb 2013

Motivation • How many papers do you read every week? • How many do you read deeply? • How many do you just skim? • Title, abstract and conclusion → Enough? • A summary of the paper → Most important issues

Motivation • Slide presentation as a summary • It includes important content from the paper • It is made by the same author • But • Not detailed enough • Misses some technical parts of the paper

Introduction • The Paper • and its Slide Presentation • Alignment map

Previous Works • Hayama et al. • 2005 • Japanese technical papers and presentation sheets • Using HMM • Kan • 2007 • SlideSeer • Crawling of paper-presentation pairs, aligning them and GUI • Beamer and Girju • 2009 • Detailed analysis of different similarity measures • Only textual content

Slide Analysis

Error Analysis • Around 70% are showing “Evaluation and Result”

Alignment Modalities • Text Similarity • Between each slide and each section • The core aligner unit • The baseline • A cosine similarity measure: TF-IDF • Linear Ordering • Ordering between slides and sections is monotonic • Visual appearance of slides
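The text similarity baseline on this slide can be illustrated with a minimal sketch. It assumes a simple letter-only tokenizer, a smoothed IDF, and IDF statistics computed over slides and sections together; none of these details are stated in the talk, and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep runs of letters; a deliberately simple tokenizer."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF (an assumption) so common terms keep a small weight.
        vectors.append({t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1.0)
                        for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def align_by_text(slides, sections):
    """For each slide, return the index of the most similar section."""
    vectors = tfidf_vectors(sections + slides)
    section_vecs = vectors[:len(sections)]
    slide_vecs = vectors[len(sections):]
    return [max(range(len(sections)), key=lambda j: cosine(sv, section_vecs[j]))
            for sv in slide_vecs]
```

Calling `align_by_text(slide_texts, section_texts)` returns one section index per slide. Each slide is scored independently here, so the result is not guaranteed to respect slide order.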

Text Extraction Unit • Presentation: Slides (MS PowerPoint) → VB compiler → Slide Title text, Slide Body text, Slide Number • Paper: PDF → PDFx → XML → Parser (via Python) → Section Title, Section Body

Slide Image Classifier Unit • Slides → Take Snapshot → Image → Image Classifier • Classes: 1. Text • 2. Outline • 3. Drawing • 4. Results

Image Class Instructions • 1. Text • Text similarity alignment weight → Increase 2/3 • 2. Outline • Text similarity alignment weight → Decrease 1/3 • Linear ordering alignment weight → Decrease 1/3 • 3. Drawing • Uniform probability for all weights • 4. Results • Exceptional rule: Align directly to “Experiment and Result” section
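The class-dependent rules on this slide could be combined roughly as follows. The slide does not say whether "2/3" and "1/3" are target weights or change amounts, so the weight pairs below are one assumed reading, and all names (`CLASS_WEIGHTS`, `fuse_scores`, `results_section`) are hypothetical.

```python
# Hypothetical labels mirroring the four slide classes.
TEXT, OUTLINE, DRAWING, RESULTS = "text", "outline", "drawing", "results"

# Assumed (w_text, w_order) pairs; one possible reading of the slide.
CLASS_WEIGHTS = {
    TEXT: (2 / 3, 1 / 3),     # text similarity weight raised to 2/3
    OUTLINE: (1 / 3, 1 / 3),  # both weights lowered to 1/3
    DRAWING: (1 / 2, 1 / 2),  # uniform weights
}


def fuse_scores(slide_class, text_scores, order_scores, results_section=None):
    """Pick a section index for one slide from per-section scores.

    text_scores[j] and order_scores[j] are the text-similarity and
    linear-ordering scores of section j for this slide.
    """
    # Exceptional rule: a Results slide goes straight to the results section.
    if slide_class == RESULTS and results_section is not None:
        return results_section
    # Unknown classes (and Results without a known section) fall back to uniform.
    w_text, w_order = CLASS_WEIGHTS.get(slide_class, CLASS_WEIGHTS[DRAWING])
    combined = [w_text * t + w_order * o
                for t, o in zip(text_scores, order_scores)]
    return max(range(len(combined)), key=combined.__getitem__)
```

With these assumed values, Outline and Drawing slides select the same section, because scaling both weights equally does not change which section scores highest. The lower Outline weights would only matter if combined scores were compared across slides or against a threshold.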

Image Classifier Experiment and Result • 750 manually annotated slides • Linear SVM • Feature extraction: Histogram of Oriented Gradients • Blurring filters • Normalization • 10-fold cross validation
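To give a feel for the feature named on this slide, here is a simplified sketch of an orientation histogram. It treats the whole image as a single cell, whereas full Histogram of Oriented Gradients descriptors compute one histogram per cell and normalize over blocks. The bin count, the function name, and the list-of-rows image format are assumptions, and the blurring, SVM training and cross-validation steps are not shown.

```python
import math


def orientation_histogram(image, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations.

    `image` is a list of rows of grayscale values. This is a single-cell
    simplification of HOG, not the full descriptor.
    """
    height, width = len(image), len(image[0])
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Central differences approximate horizontal and vertical gradients.
            gx = image[y][x + 1] - image[y][x - 1]
            gy = image[y + 1][x] - image[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0.0:
                continue
            # Unsigned orientation in [0, 180) degrees.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # Clamp guards against rounding that lands exactly on 180.
            hist[min(int(angle // bin_width), n_bins - 1)] += magnitude
    # L2-normalize so the descriptor is less sensitive to overall contrast.
    norm = math.sqrt(sum(h * h for h in hist))
    return [h / norm for h in hist] if norm > 0 else hist
```

A text-heavy slide and a slide dominated by a plot would typically produce different orientation profiles, which is the kind of signal a linear classifier can separate.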

Experiments • Experiment 1: • Baseline • Paragraph-to-slide alignment • Only textual data • Experiment 2: • Section-to-slide alignment • Only textual data

Experiments • Experiment 3: • The effect of Linear Ordering alignment was added. • Textual data and ordering information • Experiment 4: • The effect of Image Classification was added. • Textual data, ordering information and visual content
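One way to add ordering information, as in Experiment 3, is to require that section indices never decrease as the slides advance and to pick the best such assignment by dynamic programming. This is a sketch of that general idea, not necessarily the method used in the talk; `monotonic_alignment` and the score matrix layout are assumptions.

```python
def monotonic_alignment(scores):
    """Assign each slide a section so that section indices never decrease.

    scores[i][j] is the similarity of slide i to section j. Returns the
    assignment that maximizes the total similarity under that constraint.
    """
    n_slides, n_sections = len(scores), len(scores[0])
    best = [[0.0] * n_sections for _ in range(n_slides)]
    back = [[0] * n_sections for _ in range(n_slides)]
    best[0] = list(scores[0])
    for i in range(1, n_slides):
        # Running maximum over predecessor sections 0..j as j grows.
        best_prev, best_prev_j = best[i - 1][0], 0
        for j in range(n_sections):
            if best[i - 1][j] > best_prev:
                best_prev, best_prev_j = best[i - 1][j], j
            best[i][j] = scores[i][j] + best_prev
            back[i][j] = best_prev_j
    # Backtrack from the best section for the last slide.
    j = max(range(n_sections), key=lambda k: best[-1][k])
    path = [j]
    for i in range(n_slides - 1, 0, -1):
        j = back[i][j]
        path.append(j)
    return path[::-1]
```

For a score matrix where the third slide looks most like the first section, independent matching would jump backwards, while this routine keeps the assignment in order.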

Results • [Chart comparing Baseline, Section, Ordering and Image Class; labeled value: 25%]

Conclusion • Many slides with images and drawings • Textual data is not enough • Taking advantage of graphical features of slides

Future Tasks • Bigger dataset • More efficient text similarity measures • Differentiate between Title and Body text weights • Support more input file formats • A GUI to view aligned documents

Thank you…!

System Architecture • Input: Presentation → Text Extraction; Slide Image Classifier (1. Text, 2. Index, 3. Drawing, 4. Results, nil) • Input: Document → Text Extraction • Textual Similarity, Linear Ordering → Multimodal Fusion • Output: Alignment
