Estimating Head Orientation with Stereo Vision
Abstract
Interpretation of human behaviors in video data is essential for natural and intuitive human-computer interfaces. In this context, the estimation of a person’s head pose plays a major role, since heads and faces are continuously used in interaction between people. In this work we present a method for estimating a person’s head pose with a stereo camera. Our approach focuses on the application of human-robot interaction, where people may be further away from the camera and may move freely around a room. First, the 3D scene is reconstructed from the images of a stereo camera by calculating depth information. Subsequently, the face is extracted with a color-based face tracking approach. Finally, the resulting 3D face model is preprocessed by a number of normalization algorithms. The estimation is based on neural networks, which are trained to compute the head pose from grayscale and depth information. We show that depth information not only improves the accuracy of the pose estimation, but also makes the system more robust to changes in lighting conditions. The system can handle pan and tilt rotations from −90° to +90° and achieves high accuracy in a realistic environment. It does not require any manual initialization and does not suffer from drift during an image sequence. Moreover, the system is capable of real-time processing.

Acknowledgements
This work was conducted at the Interactive Systems Labs as part of my studies at the Universität Karlsruhe (TH). I would like to thank all members of the laboratory for participating in the various data collections performed during this work. I am particularly grateful for the help of Kai Nickel, who was always there for advice regarding the stereo camera and implementation details. Furthermore, I thank my advisor Rainer Stiefelhagen for his constant support.
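The first stage named in the abstract, recovering depth from a stereo pair, can be illustrated with the standard relation for a rectified pinhole stereo rig, Z = f · B / d. The sketch below is only an illustration under that assumption, not the method used in this work: the function name `disparity_to_depth` and the focal length and baseline values are hypothetical, and the abstract does not say how the disparity map is computed.

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m):
    """Convert a disparity map (pixels) to depth (metres).

    Assumes a rectified stereo pair and the pinhole relation
    Z = f * B / d. All parameter names are hypothetical.
    """
    disparity = np.asarray(disparity, dtype=float)
    # Start with NaN everywhere: pixels without a valid match get no depth.
    depth = np.full(disparity.shape, np.nan)
    valid = disparity > 0  # zero or negative disparity carries no depth
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth

# Hypothetical camera: 500 px focal length, 10 cm baseline.
depth = disparity_to_depth([[10.0, 0.0]], focal_px=500.0, baseline_m=0.1)
```

Because depth is inversely proportional to disparity, depth resolution falls off quickly with distance, which matters in the scenario the abstract describes, where people may be far from the camera.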
Similar resources
Real-time Head Pose Estimation with Stereo Vision
Head pose estimation is an important task for many applications such as human-computer interaction and human action understanding since a person’s head direction has an important role in representing his/her intention. In this paper, we propose a real-time head pose estimation method with stereo vision, which does not stress users and is easily applied to a lot of users. We use the degree of th...
Relative Orientation of Two Disparity Maps in Stereo Vision
Two methods to solve the relative orientation of two disparity maps measured by a movable stereo head from distinct viewpoints are presented. The first method is based on modeled features such as plane normals, axes of cones and cylinders, and vertices of cones. In the second method, the first disparity map is projected onto the second one and the difference between the projected and the second map ...
Motion and Structure Estimation Using Fusion of Inertial and Vision Data for Helmet Tracker
For weapon cueing and Head-Mounted Display (HMD), it is essential to continuously estimate the motion of the helmet. The problem of estimating and predicting the position and orientation of the helmet is approached by fusing measurements from inertial sensors and a stereo vision system. The sensor fusion approach in this paper is based on nonlinear filtering, especially the extended Kalman filter (EKF...
Calibrating the Eye Motion of an Humanoid Robot
In the human visual system, the projective relationship between the images seen in each eye with each other changes with their motion as the viewer attends to different points in space. The active vision heads built for many humanoid robots approximate human gaze behavior and share this property. Knowledge of this projective relationship is used in stereo vision tasks and is captured entirely b...
Binocular photometric stereo acquisition and reconstruction for 3d talking head applications
In order to render a high quality, versatile 3D talking head, a stable, high frame rate AV data acquisition system is constructed. It can capture 3D position, surface orientation and albedo texture of the talking head video images along with the corresponding speech signals. The system consists of a computer controlled LED lighting subsystem; high speed stereo cameras; a microphone; and a compu...
Publication date: 2003