Tuesday, April 13, 2010

IUI '08 (assignment): Temporal semantic compression for video browsing

Authors:
Brett Adams Curtin University of Technology, Perth, W. Australia
Stewart Greenhill Curtin University of Technology, Perth, W. Australia
Svetha Venkatesh Curtin University of Technology, Perth, W. Australia

Paper Link:
http://portal.acm.org/citation.cfm?id=1378773.1378813&coll=ACM&dl=ACM&type=series&idx=SERIES823&part=series&WantType=Proceedings&title=IUI&CFID=81639924&CFTOKEN=12013848

Adams et al. present a video browsing approach called Temporal Semantic Compression (TSC), which lets the user browse and play video in new ways based on tempo and interest algorithms.

Interest algorithms, which can be installed in the browser as customizable plug-ins, filter a video according to what the user is looking for. An interesting application highlighted in the paper is applying different interest algorithms depending on the genre.

For example, we could use:
excitement algorithms for sports
anxiety for home surveillance and news story change
attention for home videos
etc.
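
To make the plug-in idea concrete, here is a rough sketch of how genre-specific interest algorithms might be registered and applied. This is only my illustration, not the paper's implementation: the feature names (motion, volume, scene_change, faces) and the scoring functions are made up.

```python
from typing import Callable, Dict, List

# A shot is represented by a dict of per-shot features.
# The feature names used below are hypothetical.
Shot = Dict[str, float]
InterestFn = Callable[[Shot], float]


def excitement(shot: Shot) -> float:
    # Assumption: lots of motion and loud audio read as "exciting".
    return shot.get("motion", 0.0) + shot.get("volume", 0.0)


def anxiety(shot: Shot) -> float:
    # Assumption: abrupt scene changes are what we want to flag.
    return shot.get("scene_change", 0.0)


def attention(shot: Shot) -> float:
    # Assumption: shots with people in them hold attention.
    return shot.get("faces", 0.0)


# Registry mapping a genre to the interest plug-in used for it.
PLUGINS: Dict[str, InterestFn] = {
    "sports": excitement,
    "surveillance": anxiety,
    "news": anxiety,
    "home_video": attention,
}


def score_shots(genre: str, shots: List[Shot]) -> List[float]:
    """Score every shot with the plug-in registered for the genre."""
    interest = PLUGINS[genre]
    return [interest(shot) for shot in shots]
```

Swapping the genre then changes which shots score highest without touching the browser itself.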



The temporal-compression video browser uses a 2D spatial control on the display screen: the horizontal axis sets the position in the video, and the vertical axis sets the compression level. (Compression is described as the amount of the video remaining from the original, i.e. 20% compression leaves 20% of the shots from the original video, while at 100% compression only the "most interesting" frame of the video is left.)
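
A minimal sketch of how such a 2D control could drive shot selection, assuming normalised pointer coordinates and a list of per-shot interest scores. The function names, the keep_fraction parameter and the choice that moving up keeps fewer shots are my own assumptions, not details from the paper.

```python
import math
from typing import List, Tuple


def control_to_params(x: float, y: float, duration: float) -> Tuple[float, float]:
    """Map a pointer position (x, y in [0, 1]) to
    (playback time in seconds, fraction of shots to keep)."""
    x = min(max(x, 0.0), 1.0)
    y = min(max(y, 0.0), 1.0)
    # Assumption: y = 0 keeps everything, y = 1 keeps as little as possible.
    return x * duration, 1.0 - y


def select_shots(scores: List[float], keep_fraction: float) -> List[int]:
    """Return the indices of the highest-scoring shots, in temporal order.
    At least one shot is kept whenever any shots exist."""
    k = max(1, math.ceil(keep_fraction * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])
```

For example, with four shots and keep_fraction = 0.5, the two highest-scoring shots are kept and played in their original order.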



The main measure of interest used to decide which frames survive compression is tempo. Tempo is set by the director, who uses action, music and dialog to shape the audience's sense of time in the film. The browser estimates tempo from pan, tilt and volume.
The calculation is as follows:



3 timescales:
1) Frame level - features are on the timescale of the original movie. This level adjusts the playback point.
2) Shot level - features are on a timescale that treats all shot durations as equal.
3) Compression level - where the compression function can be changed.
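
As a sketch of how the first two timescales might relate, the snippet below converts between movie time and a shot-level scale on which each shot counts for one unit regardless of its real length. That reading of the shot-level scale is my interpretation, and the function names and the boundaries list are hypothetical.

```python
from bisect import bisect_right
from typing import List


def frame_to_shot_time(t: float, boundaries: List[float]) -> float:
    """Convert movie time t (seconds) to shot time.
    boundaries = [start of shot 0, start of shot 1, ..., end of last shot],
    assumed strictly increasing."""
    n = len(boundaries) - 1
    if t <= boundaries[0]:
        return 0.0
    if t >= boundaries[-1]:
        return float(n)
    i = bisect_right(boundaries, t) - 1  # index of the shot containing t
    start, end = boundaries[i], boundaries[i + 1]
    return i + (t - start) / (end - start)


def shot_to_frame_time(u: float, boundaries: List[float]) -> float:
    """Inverse mapping: shot time u (in shots) back to movie time."""
    n = len(boundaries) - 1
    u = min(max(u, 0.0), float(n))
    i = min(int(u), n - 1)
    start, end = boundaries[i], boundaries[i + 1]
    return start + (u - i) * (end - start)
```

With boundaries [0, 10, 12, 30], second 11 of the movie sits halfway through the short middle shot, so it maps to shot time 1.5.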

Example compression functions:

Default (linear) - playback proceeds at a linear pace, much like regular playback and fast-forward.

Midshot - takes a constant amount from each shot (section) chosen by the pacing algorithm

Pace Proportional - uses the pacing tempo to continuously vary the playback speed. When the tempo is low the playback speed increases, so more of the playback time goes to higher-tempo sections (i.e. the more important sections are favored for playback).

Interesting shots - applies speed-up and compression, and entire shots with low tempo are left out.
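
The three non-default functions can be sketched roughly as follows. This is only an illustration under my own assumptions: tempo is taken as a per-shot value already normalised to [0, 1], the speed law is a simple linear one, and clip_len, max_speed and threshold are hypothetical parameters rather than anything specified in the paper.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def midshot(shots: List[Segment], clip_len: float) -> List[Segment]:
    """Take the same amount of time from the middle of every shot
    (or the whole shot if it is shorter than clip_len)."""
    clips = []
    for start, end in shots:
        half = min(clip_len, end - start) / 2.0
        mid = (start + end) / 2.0
        clips.append((mid - half, mid + half))
    return clips


def pace_proportional_speed(tempo: float, max_speed: float = 4.0) -> float:
    """Playback speed multiplier: 1x at the highest tempo,
    max_speed at the lowest. Assumes tempo is normalised to [0, 1]."""
    tempo = min(max(tempo, 0.0), 1.0)
    return max_speed - (max_speed - 1.0) * tempo


def interesting_shots(shots: List[Segment], tempos: List[float],
                      threshold: float) -> List[Segment]:
    """Drop whole shots whose tempo falls below the threshold."""
    return [shot for shot, tempo in zip(shots, tempos) if tempo >= threshold]
```

The surviving shots from interesting_shots could then be played back at the speed given by pace_proportional_speed, which combines removal and speed-up.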

Adams et al. tested their system on several movies, news shows, commercials, cartoons and talk shows, and found that their compression algorithm could successfully pull out meaningful and interesting chunks of shots from the clips.


__________
My Spill:

From my perspective, the Temporal Semantic Compression scheme is a great idea. Most media players only support regular playback, fast-forward and scene selection; I've never seen a browsing tool for picking out the interesting parts of a video.
That's really cool.

The pluggable functions would let the user search for different points of interest. (Maybe I just want to find the action scenes in a movie.)

The real improvement to their interface would be to reduce the number of metrics shown so that screen space can be maximized.
