Who's Better, Who's Best: Skill Determination in Video using Deep Ranking

March 2017

Hazel Doughty, Dima Damen, Walterio Mayol-Cuevas

We present a method for assessing skill from video across a variety of tasks, ranging from drawing to surgery and rolling dough. We formulate the problem as pairwise and overall ranking of video collections, and propose a supervised deep ranking model that characterises the relative differences in performance between a pair of videos. Each pair consists of two videos, one of which human annotators ranked higher than the other in terms of the skill displayed. By assigning each video a relative skill score for the given task, we can create a skill ranking for a set of videos. We utilise a two-stream Temporal Segment Network to capture both the type and quality of motions and the evolving task state.
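To make the pairwise formulation concrete, the sketch below shows one common way to train on ordered pairs: a margin ranking loss that is zero once the higher-skill video outscores the lower-skill one by at least a margin. This is only an illustration under stated assumptions. The abstract does not specify the loss, and the function names, the margin value, and the example scores are hypothetical. In the full method the scores would come from the network rather than being given directly.

```python
def margin_ranking_loss(score_better, score_worse, margin=1.0):
    """Hinge-style loss for one annotated pair (assumed form, not from the source).

    The loss is 0 when the video annotated as more skilled outscores the
    other by at least `margin`, and grows linearly otherwise.
    """
    return max(0.0, margin - (score_better - score_worse))


def rank_videos(scores):
    """Order video ids from highest to lowest skill score."""
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical per-video scores, standing in for model outputs.
example_scores = {"video_a": 0.1, "video_b": 0.9, "video_c": 0.5}
example_ranking = rank_videos(example_scores)
```

Sorting by the per-video score is what turns pairwise supervision into an overall ranking for the collection.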

Results demonstrate that our method is applicable to a variety of tasks, with the percentage of correctly ordered pairs of videos ranging from 70% to 82% across four datasets. We demonstrate the robustness of our approach via sensitivity analysis of its parameters. We see this work as an effort toward the automated and objective organisation of how-to videos and, overall, toward generic skill determination in video.
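The reported metric, the percentage of correctly ordered pairs, can be sketched as follows. This is a minimal illustration, assuming each annotated pair is stored as a (better, worse) tuple of video ids and that each video has a single predicted score. The function name, data layout, and numbers are hypothetical and do not come from the paper's datasets.

```python
def pairwise_accuracy(scores, annotated_pairs):
    """Fraction of annotated pairs whose predicted order matches the annotation.

    scores: dict mapping video id -> predicted skill score.
    annotated_pairs: list of (better_id, worse_id) tuples from annotators.
    """
    if not annotated_pairs:
        raise ValueError("at least one annotated pair is required")
    correct = sum(
        1 for better, worse in annotated_pairs if scores[better] > scores[worse]
    )
    return correct / len(annotated_pairs)


# Hypothetical scores and annotations for four videos.
scores = {"v1": 0.9, "v2": 0.4, "v3": 0.6, "v4": 0.1}
pairs = [("v1", "v2"), ("v2", "v3"), ("v3", "v4"), ("v1", "v4")]
accuracy = pairwise_accuracy(scores, pairs)  # 3 of 4 pairs are ordered correctly
```

Ties count as incorrect in this sketch because the comparison is strict; a different convention would change the number slightly.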

Skill Determination Overview


Hazel Doughty, Dima Damen and Walterio Mayol-Cuevas (2017). Who's Better, Who's Best: Skill Determination in Video using Deep Ranking. PDF | arXiv


The datasets for the Drawing and Chopstick-Using tasks will be made publicly available, alongside the annotated pairs for each dataset.