
Step 1. Semi-supervised

  1. Step 1. Semi-supervised
  • Given a region where a primitive event happens
  • Given the beginning and end time of each instance of the primitive event
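The annotation in this semi-supervised setup (an event region plus start/end times per instance) can be sketched as a small data structure. This is an illustrative sketch; the field names are not from the slides.

```python
from dataclasses import dataclass

@dataclass
class PrimitiveEventAnnotation:
    """Hypothetical annotation: where a primitive event happens and when."""
    region: tuple    # (x, y, width, height) bounding box of the event region
    instances: list  # list of (t_start, t_end) pairs, one per event instance

ann = PrimitiveEventAnnotation(
    region=(120, 80, 64, 48),
    instances=[(3.0, 7.5), (12.0, 15.2)],
)

# Total labeled duration across all instances of this primitive event.
total = sum(t1 - t0 for t0, t1 in ann.instances)
print(round(total, 1))  # 7.7
```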

  2. Two resolutions:
  • Far field (encoded using trajectories)
  • Near and middle field (represented using appearance)

  3. Near/Mid Field
  In this region, two types of primitive events are shown as exemplars: vehicles driving and humans walking.

  4. Far Field

  5. Basic event in far field

  6. Learn motion from exemplar clips (a toy example)
  [Figure: motion clips to learn from, annotated with variations in clothing, position, scale, unstable camera, background, and fast lighting changes]
  Challenges:
  • Variety in object appearance (humans in different clothes)
  • Variety in background clutter and lighting changes
  • Small disturbances in the object's position and/or scale
  • Small differences in motion speed

  7. Learning motion templates (a toy example)
  A motion template is learned key-frame by key-frame. At each key-frame, a static active basis template is learned by pooling over images of the "same" gesture from multiple videos. Since motion varies in position, speed, and scale, the correspondence is found by taking a local maximum over a small spatio-temporal neighborhood.
  • Pooling over the "same" gesture across videos
  • Local maximum: perturbation over a small spatio-temporal neighborhood [dx, dy, dt, dScale]
  [Figure: key-frames of Motion 1 (faster) and Motion 3 (slower) plotted on x, y, t axes; frame 5 of motion 1 corresponds to frame 7 of motion 3, since motion 3 is slower.]
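The local-maximum step above can be sketched as a brute-force search over the perturbation [dx, dy, dt, dScale]. The score function here is a toy stand-in, not the actual active basis template match score.

```python
import itertools

def local_max(score, dx_range=(-2, 2), dy_range=(-2, 2),
              dt_range=(-2, 2), dscale_range=(-1, 1)):
    """Return the perturbation (dx, dy, dt, ds) maximizing score
    over a small spatio-temporal neighborhood."""
    best, best_p = float("-inf"), None
    for dx, dy, dt, ds in itertools.product(
            range(dx_range[0], dx_range[1] + 1),
            range(dy_range[0], dy_range[1] + 1),
            range(dt_range[0], dt_range[1] + 1),
            range(dscale_range[0], dscale_range[1] + 1)):
        s = score(dx, dy, dt, ds)
        if s > best:
            best, best_p = s, (dx, dy, dt, ds)
    return best_p, best

# Toy score peaked at dx=1, dy=0, dt=-1, ds=0 (negative squared distance).
toy = lambda dx, dy, dt, ds: -((dx - 1)**2 + dy**2 + (dt + 1)**2 + ds**2)
p, s = local_max(toy)
print(p)  # (1, 0, -1, 0)
```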

  8. Results (A toy example)

  9. (An animation will be added to this slide: the top shows a common spatio-temporal trajectory; the bottom shows animated examples corresponding to different instances.)

  10. Zhenyu, at the beginning we wanted to use the object in every 5 (or more) frames and align them in the spatial domain using EM active basis; now I think we should align them in the spatio-temporal domain.
  • The simplest case: one instance takes 10 seconds and another takes 6; if we take 2 samples, the first should be sampled every 5 seconds and the second every 3 seconds.
  • A slightly more complex case: completing a turn, one vehicle enters the turn fast and exits slowly, while another enters slowly and exits fast.
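The simple alignment case above (sampling intervals proportional to each instance's duration) can be sketched as follows. This handles only uniform speed differences; the turn example, where speed varies within one instance, would need spatio-temporal alignment such as dynamic time warping.

```python
def sample_times(duration, n_samples):
    """Return n_samples evenly spaced sample times spanning one instance,
    so instances of different lengths yield the same number of key samples."""
    step = duration / n_samples
    return [step * (i + 1) for i in range(n_samples)]

# A 10 s instance is sampled every 5 s, a 6 s instance every 3 s.
print(sample_times(10.0, 2))  # [5.0, 10.0]
print(sample_times(6.0, 2))   # [3.0, 6.0]
```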

  11. A further issue: since we use coding, the representative frames should be chosen where the change is sharpest, as in one of Chellappa's papers, which splits a trajectory into segments each describable by a linear equation. Our representative elements should be the breakpoints between these linear segments.
  • Learning this selection should yield something interesting; it reminds me of the CMU work in the paper slides I sent you, which segments actions manually based on human priors. Ours could make that process automatic (covering both complex shape changes and intermediate trajectory changes).
  • This also exposes the implementation difference between the two methods: the method using active basis as contours may work better by matching the Active Basis contours of corresponding frames across different objects, rather than frame-to-frame correspondence within one object; the trajectory-based method, since it must match trajectories across different objects, in practice needs frame-to-frame correspondence within the same object.
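Choosing the breakpoints of piecewise-linear trajectory segments can be sketched with a simple recursive split (in the spirit of Douglas-Peucker; this is an assumed illustration, not the exact method of the cited work): split at the point farthest from the straight line between the endpoints, and recurse while the deviation exceeds a tolerance.

```python
def max_deviation(points, i, j):
    """Index and distance of the point between i and j farthest
    from the straight line through points[i] and points[j]."""
    (x0, y0), (x1, y1) = points[i], points[j]
    dx, dy = x1 - x0, y1 - y0
    norm = (dx * dx + dy * dy) ** 0.5 or 1.0
    best_k, best_d = i, 0.0
    for k in range(i + 1, j):
        x, y = points[k]
        d = abs(dy * (x - x0) - dx * (y - y0)) / norm
        if d > best_d:
            best_k, best_d = k, d
    return best_k, best_d

def breakpoints(points, i=0, j=None, tol=1.0):
    """Indices where the trajectory bends more than tol away from linear;
    these breakpoints serve as the representative elements."""
    if j is None:
        j = len(points) - 1
    k, d = max_deviation(points, i, j)
    if d <= tol:
        return []
    return breakpoints(points, i, k, tol) + [k] + breakpoints(points, k, j, tol)

# An L-shaped trajectory: the corner at index 3 is the breakpoint.
traj = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2)]
print(breakpoints(traj))  # [3]
```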

  12. Step 2. Unsupervised
