Video analysis has been moving towards more detailed interpretation (e.g., segmentation) with encouraging progress. These tasks, however, increasingly rely on training data that is densely annotated in both space and time. Since such annotation is labor-intensive, few videos annotated with region boundaries exist. This work aims to resolve this dilemma by learning to automatically generate region boundaries for all frames of a video from sparse...