Novel Keyframe Selection Methods in Passive Visual Lifelogs

Investigating Keyframe Selection Methods in the Novel Domain of Passively Captured Visual Lifelogs. Presenter: 陳冠州, 2012.09.26

Outline • Introduction • Background and related work • Keyframe approaches • Experiment overview • Results and discussion • Conclusions

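1. Introduction • Goals • Capture photos automatically • Retrieve personal activities • Web browsing, receiving email, conversations, participation in activities • Hardware: Microsoft SenseCam • A wearable digital camera that passively records daily experience • Records a stream of photos • Images with similar content • The stream can be segmented into discrete units, called events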

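1. Introduction (cont.) • Large volume of images • Grows by about 1,900 images/day • About 22 events/day • Picking out the meaningful information to manage is challenging • Choosing an image that represents each event (a keyframe) is important • The wearer can quickly review the generated content and judge whether it is highly relevant • Content generated every day must be processed efficiently • A large proportion of the captured photos are of poor quality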

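2. Background and Related Work • Event segmentation • Lifelog events need to be segmented automatically • Event segmentation is harder for lifelogs than for video • Lifelog images are about 50 seconds apart, whereas video is continuous (so event boundaries are easy to find) • The task corresponds to scene boundary detection • Not shot boundary detection • A scene usually consists of several shots • Because events or activities carry intrinsic meaning • Applying this approach is impractical, and its performance cannot be predicted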

2. Background and Related Work (cont.) • Combining image features (content) with sensor readings (context) is recommended • F-measure of 0.6237 (closer to 1 is better) • F-measure = 2PR / (P + R) http://botonnote.blogspot.tw/2011/10/retrieval-precision-recall-f-measure.html • P: precision, the fraction of retrieved items that are relevant • R: recall, the fraction of all relevant items that are retrieved
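
The F-measure formula above can be checked with a few lines of Python. This is a minimal sketch; the function name `f_measure` is an illustrative choice, and the inputs are the precision and recall that this deck reports for the segmentation method [5].

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall: 2PR / (P + R)."""
    if precision + recall == 0:
        return 0.0  # avoid dividing by zero when nothing relevant was found
    return 2 * precision * recall / (precision + recall)


# Precision 62.57% and recall 62.17%, as reported for the segmentation method.
score = f_measure(0.6257, 0.6217)
print(round(score, 4))  # 0.6237, matching the F-measure quoted on the slide
```

Because the F-measure is a harmonic mean, it stays low unless both precision and recall are high.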

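2. Background and Related Work (cont.) • Keyframe selection • An event consists of many images • One image per event must be chosen as the keyframe • Compare different keyframe selection methods • Baseline • Select the image in the event that is closest to the other images in that event • Select the image in the event that is closest to the other images in that event but most different from the images in the other events • The latter two are more expensive to compute and can be accelerated with a GPU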

3. Keyframe Approaches • Events must be segmented before keyframes are selected • The paper covers both traditional and novel techniques • Event segmentation • First split the data into one large block per day • Each image is described by MPEG-7 descriptors and the SenseCam sensor values • MPEG-7 descriptors [1] • Colour layout • Colour structure • Scalable colour • Edge histogram

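3. Keyframe Approaches (cont.) • Procedure for segmenting a day's images [5] • Split each image into several large blocks, then compare it with the other images • Set a threshold; a large change in the visual or sensor values marks an event boundary • Remove event boundaries that are too close together • Performance of this method • Precision: 62.57% • Recall: 62.17% [5] A. R. Doherty and A. F. Smeaton. Automatically segmenting lifelog data into events. In WIAMIS 2008 - 9th International Workshop on Image Analysis for Multimedia Interactive Services, 2008.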
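
The thresholding and boundary clean-up steps above can be sketched as follows. This is a simplified illustration, not the procedure of [5]: it assumes each image is already reduced to one feature vector, compares only adjacent images, and uses hypothetical parameters `threshold` and `min_gap`.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def segment_day(features: List[Sequence[float]],
                threshold: float, min_gap: int) -> List[int]:
    """Return the indices at which a new event starts.

    A boundary is placed before image i when the change from image i-1
    exceeds `threshold` (assumed value). Boundaries closer than `min_gap`
    images to the previously kept boundary are dropped.
    """
    boundaries: List[int] = []
    for i in range(1, len(features)):
        change = euclidean(features[i - 1], features[i])
        if change <= threshold:
            continue
        if boundaries and i - boundaries[-1] < min_gap:
            continue  # too close to the previous boundary, so remove it
        boundaries.append(i)
    return boundaries


day = [[0.0], [0.1], [5.0], [5.1], [9.0], [9.1], [9.2]]
print(segment_day(day, threshold=1.0, min_gap=3))  # [2]
print(segment_day(day, threshold=1.0, min_gap=2))  # [2, 4]
```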

3. Keyframe Approaches (cont.) • Traditional keyframe selection techniques • Three approaches (explained on the following slides) • Middle image • Most representative of a given event • Most representative of a given event but also most different to the other events • Experiment • Uses the three approaches above • 101 events, 8,247 images • 1-5 Likert scale

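3. Keyframe Approaches (cont.) • Exhaustive image comparison vs. comparison against the event average only • Select the image that is closest to all other images in the event. (Requires n × n comparisons, where n is the number of images in the event) • Every pairwise distance is computed; pick the image whose total distance to the others is smallest • Select the image that is closest to the average of all the other images in a given event. (Requires only n comparisons) • Only the event average needs to be computed; pick the image closest to it • Conclusion • 1-5 Likert scale: 3.35 vs. 3.33 • The difference is slight, but the former takes much longer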
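
The two variants compared above can be sketched side by side. This is a minimal illustration that assumes each image is a feature vector and that Euclidean distance is used; the function names are hypothetical, and the average here includes the candidate image itself for simplicity.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closest_to_all(event: List[Vector]) -> int:
    """Exhaustive variant: about n * n distance computations."""
    totals = [sum(euclidean(img, other) for other in event) for img in event]
    return min(range(len(event)), key=totals.__getitem__)


def closest_to_average(event: List[Vector]) -> int:
    """Average variant: one mean vector plus n distance computations."""
    n = len(event)
    centre = [sum(column) / n for column in zip(*event)]
    return min(range(n), key=lambda i: euclidean(event[i], centre))


event = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [4.0, 0.0]]
print(closest_to_all(event), closest_to_average(event))  # 2 2
```

On this toy event both variants pick the same image, which is in line with the near-identical Likert scores reported on the slide.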

3. Keyframe Approaches (cont.) • Cooper & Foote's method vs. the method in this paper • Select the image that is closest to all the other images in a given event, but most different to all the other images in the other events [4]. • The first part requires N × N comparisons and the second N × M, for N × (N + M) in total (N: number of images in the event; M: number of images in a day ≈ 1,900) • Selection of the image that is most representative of an event by being closest to the average value of that event, but also what distinguishes it best from other events (by comparing against the average value of each of the other events). • For each event, the image closest to the average is found first, which makes the comparison faster • This method needs only N × E comparisons (E: number of events in a day ≈ 22)
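
The average-based cross-event selection can be sketched as below. The slide does not say how closeness to the own event and difference from the other events are combined, so the score used here (mean distance to the other events' averages minus distance to the own average) is an assumption made for illustration, as are the function names.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_vector(event: List[Vector]) -> List[float]:
    n = len(event)
    return [sum(column) / n for column in zip(*event)]


def cross_event_keyframe(events: List[List[Vector]], target: int) -> int:
    """Pick the image of events[target] that is near its own event average
    and far from the averages of the other events (assumed scoring rule)."""
    means = [mean_vector(e) for e in events]  # one average per event

    def score(img: Vector) -> float:
        own = euclidean(img, means[target])
        others = [euclidean(img, m) for k, m in enumerate(means) if k != target]
        separation = sum(others) / len(others) if others else 0.0
        return separation - own  # E distance computations per image

    event = events[target]
    return max(range(len(event)), key=lambda i: score(event[i]))


events = [
    [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]],
    [[1.0, 10.0], [1.0, 11.0], [1.0, 12.0]],
]
print(cross_event_keyframe(events, target=0))  # 1
```

Each image is compared against E averages rather than against every other image of the day, which is where the N × E count on the slide comes from.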

3. Keyframe Approaches (cont.) • Conclusion • 3.01 vs. 3.14 • The method in this paper needs only 4% of the computation of Cooper & Foote's method [4] M. Cooper and J. Foote. Discriminative techniques for keyframe selection. In ICME 2005 - IEEE International Conference on Multimedia and Expo, 2005.

3. Keyframe Approaches (cont.) • Best vector distance metric • All of the preceding approaches need to compute and compare distances, so the authors compared the following metrics • Histogram Intersection (Likert score 3.35) • Kullback-Leibler (3.30) • Manhattan (3.54) • Euclidean (3.59) • Euclidean approach performed best • Give more weight to images in the middle of the event • Selecting an image from the start or end of an event is riskier • Likert score: 3.59 vs. 3.25
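
For reference, the four metrics named above can be written as follows. This is a sketch using common textbook definitions, assuming both inputs are histograms of equal length; the small `eps` in the Kullback-Leibler divergence is an added guard against empty bins and is not taken from the paper.

```python
import math
from typing import Sequence

Hist = Sequence[float]


def histogram_intersection_distance(p: Hist, q: Hist) -> float:
    """1 minus the normalised overlap; 0 for identical histograms."""
    overlap = sum(min(a, b) for a, b in zip(p, q))
    return 1.0 - overlap / sum(q)


def kullback_leibler(p: Hist, q: Hist, eps: float = 1e-12) -> float:
    """KL divergence of q from p; eps (assumed) avoids log(0)."""
    return sum(a * math.log((a + eps) / (b + eps)) for a, b in zip(p, q))


def manhattan(p: Hist, q: Hist) -> float:
    return sum(abs(a - b) for a, b in zip(p, q))


def euclidean(p: Hist, q: Hist) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


p, q = [0.5, 0.5], [0.25, 0.75]
for metric in (histogram_intersection_distance, kullback_leibler,
               manhattan, euclidean):
    print(metric.__name__, round(metric(p, q), 4))
```

Note that the Kullback-Leibler divergence is not symmetric, so the order of the two histograms matters for that metric.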

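3. Keyframe Approaches (cont.) • Normalization and data fusion • Normalization • v' = (v − Min) / (Max − Min) • Normalizes values to the range 0 to 1 • Data fusion • Uses CombSUM • Sums the normalized scores from the different sources and ranks by the total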
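
Min-max normalization followed by CombSUM can be sketched as below. The function names are illustrative, and the handling of a constant score list (all zeros) is an assumption, since the slide does not cover that case.

```python
from typing import List, Sequence


def min_max_normalise(values: Sequence[float]) -> List[float]:
    """Scale values into [0, 1] with v' = (v - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # assumed behaviour for constant input
    return [(v - lo) / (hi - lo) for v in values]


def comb_sum(score_lists: Sequence[Sequence[float]]) -> List[float]:
    """CombSUM: normalise each source, then add the scores per item."""
    normalised = [min_max_normalise(scores) for scores in score_lists]
    return [sum(column) for column in zip(*normalised)]


def rank(fused: Sequence[float]) -> List[int]:
    """Item indices ordered from highest to lowest fused score."""
    return sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)


fused = comb_sum([[1, 2, 3], [30, 10, 20]])
print(fused)        # [1.0, 0.5, 1.5]
print(rank(fused))  # [2, 0, 1]
```

Normalizing first matters because the two sources are on very different scales; without it the second list would dominate the sum.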

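3. Keyframe Approaches (cont.) • Image quality measures • Contrast measure • Convert the colour space (RGB → YUV) (*1) • Divide the whole image into 8×8 blocks • In each block, subtract the minimum Y (luminance) value from the maximum to get that block's contrast • The average of all block contrasts is the contrast of the whole image *1. YUV: a colour space in which Y is the luminance and U and V are the chrominance components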
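
The block-based contrast measure can be sketched in plain Python. The luminance weights below are the common BT.601 values, which is an assumption since the slide does not say which RGB to YUV conversion is used; partial blocks at the image border are simply treated as smaller blocks.

```python
from typing import List


def rgb_to_luma(r: float, g: float, b: float) -> float:
    """Y component of YUV, using BT.601 weights (assumed)."""
    return 0.299 * r + 0.587 * g + 0.114 * b


def contrast(y: List[List[float]], block: int = 8) -> float:
    """Average over all blocks of (max Y - min Y) within the block."""
    height, width = len(y), len(y[0])
    block_contrasts = []
    for top in range(0, height, block):
        for left in range(0, width, block):
            values = [y[r][c]
                      for r in range(top, min(top + block, height))
                      for c in range(left, min(left + block, width))]
            block_contrasts.append(max(values) - min(values))
    return sum(block_contrasts) / len(block_contrasts)


# Two 8x8 blocks: a flat one (contrast 0) and a striped one (contrast 100).
image = [[10] * 8 + [0, 100] * 4 for _ in range(8)]
print(contrast(image))  # 50.0
```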

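3. Keyframe Approaches (cont.) • YUV colour space • [Figure: an example image shown as YUV, as the Y channel only, and as the UV channels only] Image source: http://nauful.com/pages/imagecompression.html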

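3. Keyframe Approaches (cont.) • Colour variance • Measures how rich the colours are • Compute the distance from every pixel to each of the 8 primary colours (*2) of the colour space • Assign each pixel to the cluster of the primary colour at the smallest distance (*3) • Compute the average variance of the primary-colour clusters; empirically, the average variance is 20% higher than the threshold *2. The eight primary colours are black, white, red, green, blue, yellow, cyan, and magenta *3. Euclidean distance: d(p, q) = sqrt(Σ_i (p_i − q_i)²)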
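
The clustering step above can be sketched as follows. This assumes 8-bit RGB pixels and takes "variance" of a cluster to mean the mean squared Euclidean distance of its pixels to the cluster mean; both choices, and the function names, are assumptions for illustration. The 20% threshold remark on the slide is not modelled.

```python
from typing import Dict, List, Sequence, Tuple

Pixel = Tuple[int, int, int]

PRIMARIES: Dict[str, Pixel] = {
    "black": (0, 0, 0), "white": (255, 255, 255),
    "red": (255, 0, 0), "green": (0, 255, 0), "blue": (0, 0, 255),
    "yellow": (255, 255, 0), "cyan": (0, 255, 255), "magenta": (255, 0, 255),
}


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_primary(pixel: Pixel) -> str:
    """Name of the primary colour with the smallest Euclidean distance."""
    return min(PRIMARIES, key=lambda name: squared_distance(pixel, PRIMARIES[name]))


def colour_variance(pixels: List[Pixel]) -> float:
    """Average, over the non-empty clusters, of the per-cluster variance."""
    if not pixels:
        return 0.0
    clusters: Dict[str, List[Pixel]] = {}
    for p in pixels:
        clusters.setdefault(nearest_primary(p), []).append(p)
    variances = []
    for members in clusters.values():
        n = len(members)
        mean = [sum(channel) / n for channel in zip(*members)]
        variances.append(sum(squared_distance(m, mean) for m in members) / n)
    return sum(variances) / len(variances)


print(nearest_primary((250, 10, 10)))                 # red
print(colour_variance([(250, 0, 0), (254, 0, 0)]))    # 4.0
```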

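3. Keyframe Approaches (cont.) • Noise measure • To estimate the amount of noise in the whole image, every pixel has to be examined • Within each 3×3 block, compute the Euclidean distance between each pixel's value and the mean of the block • If the distance of the centre pixel is the largest (compared with its 8 neighbours), that pixel is marked as noise • Finally, compute the percentage of all pixels that are marked as noise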
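
The noise measure can be sketched for a greyscale image, where the Euclidean distance between two values reduces to an absolute difference. Skipping the border pixels (which have no full 3×3 block) and requiring the centre distance to be strictly the largest are assumptions of this sketch.

```python
from typing import List


def noise_percentage(img: List[List[float]]) -> float:
    """Percentage of all pixels whose deviation from the 3x3 block mean
    is strictly larger than that of each of their 8 neighbours."""
    height, width = len(img), len(img[0])
    noisy = 0
    for r in range(1, height - 1):          # border pixels are skipped
        for c in range(1, width - 1):
            block = [img[rr][cc]
                     for rr in (r - 1, r, r + 1)
                     for cc in (c - 1, c, c + 1)]
            mean = sum(block) / 9
            distances = [abs(v - mean) for v in block]
            centre = distances[4]           # index 4 is the centre pixel
            neighbours = distances[:4] + distances[5:]
            if centre > max(neighbours):
                noisy += 1
    return 100.0 * noisy / (height * width)


image = [[10] * 5 for _ in range(5)]
image[2][2] = 200  # a single outlier pixel in a flat image
print(noise_percentage(image))  # 4.0, i.e. 1 of 25 pixels
```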

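3. Keyframe Approaches (cont.) • Global sharpness [17] • Apply a vertical Sobel filter to find the edges • Scan each row of the image in turn • The start and end positions of an edge are defined as the positions of the local extrema closest to the edge • The edge width is the difference between the end and start positions, and it is taken as the local blur measure for that edge • The blur measure of the whole image is the average of the local blur measures [17] P. Marziliano, F. Dufaux, S. Winkler, and T. Ebrahimi. A no-reference perceptual blur metric. In Proceedings of the 2002 International Conference on Image Processing, vol. 3, 2002.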

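Flowchart of the blur measure: • Find the vertical edges • Set number of edges = 0 and total blur measurement = 0 • Check whether the current position contains a vertical edge • If it does, find the start and end positions of the edge (local extrema), compute the local blur (edge width), then set total blur = total blur + edge width and number of edges = number of edges + 1 • If this is not the last pixel, move to the next pixel and repeat • Otherwise, blur = total blur / number of edges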
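
The flow above can be sketched for a greyscale image as follows. This is a simplified illustration in the spirit of [17], not a faithful reimplementation: an edge is declared wherever the horizontal Sobel response exceeds a hypothetical `threshold`, and the edge width is found by walking along the row to the nearest local extrema.

```python
from typing import List


def sobel_x(img: List[List[float]], r: int, c: int) -> float:
    """Horizontal gradient, which responds to vertical edges."""
    right = img[r - 1][c + 1] + 2 * img[r][c + 1] + img[r + 1][c + 1]
    left = img[r - 1][c - 1] + 2 * img[r][c - 1] + img[r + 1][c - 1]
    return right - left


def edge_width(row: List[float], c: int, rising: bool) -> int:
    """Distance between the local extrema on either side of position c."""
    start = c
    while start > 0 and (row[start - 1] < row[start] if rising
                         else row[start - 1] > row[start]):
        start -= 1
    end = c
    while end < len(row) - 1 and (row[end + 1] > row[end] if rising
                                  else row[end + 1] < row[end]):
        end += 1
    return end - start


def blur_measure(img: List[List[float]], threshold: float = 50) -> float:
    """Average edge width over all detected edge pixels (0.0 if none)."""
    total_blur, edge_count = 0, 0
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            g = sobel_x(img, r, c)
            if abs(g) > threshold:          # assumed edge criterion
                total_blur += edge_width(img[r], c, rising=g > 0)
                edge_count += 1
    return total_blur / edge_count if edge_count else 0.0


sharp = [[0, 0, 0, 0, 100, 100, 100, 100] for _ in range(4)]
ramp = [[0, 0, 25, 50, 75, 100, 100, 100] for _ in range(4)]
print(blur_measure(sharp), blur_measure(ramp))  # 1.0 4.0
```

A wider transition between dark and bright yields a larger value, so a higher score indicates a blurrier image.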

3. Keyframe Approaches (cont.) • Selecting a quality approach • Comprehensive evaluation (mixing and matching the image quality measures) • Contrast, Colour Variance, Global Sharpness, Noise, Saliency, Accelerometer, Light Sensor • 8,248 images in 101 events • Uses normalized low-level image features and/or sensor values • One annotator rated the results on a five-point Likert scale (*1) (*1) Likert scale: 1. Strongly disagree 2. Disagree 3. Neither agree nor disagree 4. Agree 5. Strongly agree

3. Keyframe Approaches (cont.) • 11 quality approaches • Sensor Values (2.91) • Accelerometer Sensor, Light Sensor • Basic Quality (2.21) • Blur, Noise, Colour Variance • Weighted Approach 1 (2.72) • Blur (0.2), Noise (0.2), Colour Variance (0.6) • All Quality Measures (3.67) • Blur, Noise, Colour Variance, Contrast, Salience • All Quality & Sensor (3.40) • Accelerometer, Light, Blur, Noise, Colour Variance, Contrast, Salience

3. Keyframe Approaches (cont.) • Combination Approach 1 (3.42) • Accelerometer Sensor, Noise, Colour Variance, Contrast, Salience • Combination Approach 2 (3.72) • Blur, Noise, Colour Variance, Light Sensor, Salience • Simple Approach 1 (3.72) • Contrast, Salience • Simple Approach 2 (3.49) • Blur, Contrast, Salience • Simple Approach 3 (3.67) • Blur, Colour Variance, Contrast, Salience • Weighted Approach 2 (3.70) • Blur, Noise (0.25) • Contrast, Salience (0.75)

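3. Keyframe Approaches (cont.) • Conclusion • Approaches 4, 7, 8, 10, and 11 (numbered in the order listed) give the better results; 7 and 8 are the best • Replacing Blur with the accelerometer sensor (4 → 6) • 3.67 vs. 3.42: worse • Replacing Contrast with the light sensor (4 → 7) • 3.67 vs. 3.72: better • Approach 8 is faster to compute than approach 7 • Figure 4: Comparison of performance as either better than, equal to or worse than the average performance of unique keyframes selected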

3. Keyframe Approaches (cont.) • Approaches for investigation (keyframe selection) • Middle Image (baseline) • Select middle image. • Within Event (stays within the event) • Select the image within the event that is closest to the average value of all images in the event. • Cross Event (looks beyond the event) • Select the image within the event that is closest to the average value of all the images in this event, but most different to the average value of all the images in the other events of that same day. • That is, it meets the Within Event criterion and is also most different from the averages of the other events • Image Quality • Select the image with the highest quality.

3. Keyframe Approaches (cont.) • Within Event and Image Quality Fusion • Select the image that is most representative of the event, but which also has a good quality score. • Cross Event and Image Quality Fusion • Select the image that is most representative of the event, that also has a good image quality, and is finally distinguishable from the images in the other events.
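
One way to picture the Within Event and Image Quality fusion is to turn both criteria into normalized scores and add them. The sketch below assumes min-max normalization and an unweighted sum; the function names and inputs (per-image distance to the event average and per-image quality score) are hypothetical.

```python
from typing import List, Sequence


def min_max_normalise(values: Sequence[float]) -> List[float]:
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def fuse_and_select(distances_to_average: Sequence[float],
                    quality_scores: Sequence[float]) -> int:
    """Index of the image with the best combined score.

    A small distance to the event average means a representative image,
    so the normalised distance is inverted before being added to quality.
    """
    representativeness = [1.0 - d for d in min_max_normalise(distances_to_average)]
    quality = min_max_normalise(quality_scores)
    fused = [r + q for r, q in zip(representativeness, quality)]
    return max(range(len(fused)), key=fused.__getitem__)


# Image 0 is the most representative but has the worst quality;
# image 1 is nearly as representative and has the best quality.
print(fuse_and_select([0.0, 1.0, 4.0], [0.1, 0.9, 0.5]))  # 1
```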

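4. Experiment Overview • Events are segmented with the segmentation method of [5] • There are 6 keyframe selection methods • So up to 6 different keyframes may be selected for one event • SenseCam users rate, on a five-point Likert scale (1 to 5), how well the keyframe chosen by each of the 6 methods represents the event • If different methods select the same image, it is judged only once • Consistency • Reduces the number of judgments (5,597)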

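4. Experiment Overview (cont.) • Keyframe annotation tool • Annotation progress is shown at the top • The image to annotate is on the left; all images of that event are on the right • The five-point Likert scale rating control is at the bottom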

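5. Results and Discussion • Five users completed the judgments for 13,410 keyframe requests • Image Quality combined with Within Event or Cross Event • The most effective approach (3.99) • 8.4% better than the baseline • The former (the Within Event combination) is more efficient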

5. Results and Discussion (cont.) • Quality measures • Little effect for User 3 • Very effective for Users 1 and 5 • The performance of this approach varies, but it is effective when combined with other approaches

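5. Results and Discussion (cont.) • Overall daily average performance • “Within Event” or “Cross Event” combined with image quality wins for most events • In 80% of cases at least one approach is better than the baseline • The quality measures perform well overall, but not necessarily on any single day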

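5. Results and Discussion (cont.) • Difficulty in selecting the correct keyframe • An event may contain more than one activity that could serve as its keyframe, so choosing is hard • When annotating, users try to mark the interesting moments • Events with a large amount of visual change • Example: an event in which the wearer walks from home to a nearby convenience store • It includes going down the stairs, opening the door, walking along the road, approaching the store, and arriving at the store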

5. Results and Discussion (cont.) • Selection of events with high visual change • MPEG-7 features are used to identify the events with a large amount of visual change

5. Results and Discussion (cont.) • Performance of approaches on events with high visual variability • Compares the six approaches on events with high visual variability • Performance on these events is lower than on all events • Performance of image quality in this case

5. Results and Discussion (cont.) • Although some approaches combine quality with other methods, on events with high visual variability quality alone performs very well for Users 1 and 5

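6. Conclusions • Choosing a suitable image as the keyframe is very challenging • Up to 40% of all images are of poor quality • Traditional methods do not take image quality into account • The only drawback is the higher computational cost • Quality performs better than Middle (the baseline) in 69.92% of cases • 6.07% better than the overall average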
