Implementation of 3D Object Reconstruction Using Multiple Kinect Cameras
Abstract
Three-dimensional (3D) object reconstruction represents real objects in a virtual space, allowing viewers to observe the objects from arbitrary viewpoints with a realistic sense of presence. Recently, Microsoft released an RGB-D camera, the Kinect, at a reasonable price, and it has been exploited for this purpose in various fields such as education, culture, and art. In this paper, we propose a 3D object reconstruction method using multiple Kinect cameras. First, we acquire raw color and depth images from three Kinect cameras placed in front of the object in a convergent configuration. Since raw depth images include hole regions that have no depth values, we fill these holes with a depth-weighted joint bilateral filter that uses the depth differences between the center and neighboring pixels in the filter kernel. In addition, a color mismatch problem occurs among the color images of the multi-view data. We therefore apply a color correction method based on 3D multi-view geometry to adjust the color tones of each image. After matching the correspondences between the source and target images by 3D image warping, we obtain a color correction matrix for the target image via polynomial regression. To evaluate the proposed depth refinement method, we estimate the bad pixel rate (BPR) of the depth images. Our results show that the BPR of the refined depth image is lower than that of the raw depth image. The test results also show that the 3D object reconstructed by the proposed method is more natural, in terms of both color and shape, than the 3D object built from the raw images.

Introduction

The real world is a three-dimensional (3D) space with X, Y, and Z axes. We can reproduce the real world in a virtual environment using 3D reconstruction techniques. Recently, 3D reconstruction has played an important role in various fields, e.g., games, films, advertising, construction, and art.
For example, 3D facial reconstruction is used for medical counseling, and 3D human body reconstruction is applied to real-time clothes-fitting services [1]. Another important example is the reconstruction of cultural assets. When a fire broke out in 2008 at Sungnyemun Gate, South Korea's foremost cultural landmark, pre-scanned 3D information of the gate assisted its rebuilding. After the accident, the Cultural Properties Administration of South Korea acknowledged the importance of virtual 3D reconstruction and proceeded with 3D scanning of most of the cultural assets of South Korea. In addition, 3D reconstruction has been used in the investigation of fire accidents: unlike the existing practice of recording pictures or video clips, it records 3D information of the scene, which aids the investigation.

In this paper, we propose 3D object reconstruction using low-cost Kinect cameras, so that users can quickly obtain accurate 3D reconstructions at a reasonable price. The contributions of this paper are the implementation of a 3D object reconstruction method using multiple Kinect cameras and the acquisition of color-corrected multi-view color images together with the corresponding multi-view depth images. To achieve this goal, we first propose a multiple Kinect camera system that captures multi-view color and depth videos at 30 frames per second; the resolution of both the color and depth images is 640×480. Second, we propose a depth image refinement method for the captured multi-view depth images. The proposed algorithm is based on the joint bilateral filter [2]: we analyze the existing joint bilateral filter and add a depth weighting factor determined by the difference between the depth values of the center and neighboring pixels in the filter kernel. Third, we propose a color correction method using 3D multi-view geometry. To perform color correction on the target image, we find the correspondences between the source and target images via 3D image warping.
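The depth refinement idea above can be illustrated with a small sketch. This is a minimal illustration, not the authors' implementation: the kernel radius, the three sigma values, and the use of the mean of the valid neighboring depths as the reference for the depth weight (a hole pixel has no depth of its own) are all assumptions.

```python
import numpy as np

def depth_weighted_jbf(color, depth, radius=5, sigma_s=3.0, sigma_r=20.0, sigma_d=30.0):
    """Fill hole pixels (depth == 0) guided by the color image.

    Each valid neighbor is weighted by spatial distance, color similarity,
    and depth similarity (the depth-weighting factor)."""
    h, w = depth.shape
    out = depth.astype(np.float64).copy()
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(xs ** 2 + ys ** 2) / (2 * sigma_s ** 2))  # spatial Gaussian
    gray = color.mean(axis=2) if color.ndim == 3 else color.astype(np.float64)
    for y in range(h):
        for x in range(w):
            if depth[y, x] != 0:        # refine only hole pixels
                continue
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            nd = depth[y0:y1, x0:x1].astype(np.float64)
            valid = nd > 0
            if not valid.any():
                continue                # no valid neighbor to fill this hole
            ng = gray[y0:y1, x0:x1]
            sp = spatial[y0 - y + radius:y1 - y + radius,
                         x0 - x + radius:x1 - x + radius]
            # range weight from color difference (joint bilateral part)
            w_color = np.exp(-(ng - gray[y, x]) ** 2 / (2 * sigma_r ** 2))
            # depth weight from depth difference to a local reference depth
            ref = nd[valid].mean()
            w_depth = np.exp(-(nd - ref) ** 2 / (2 * sigma_d ** 2))
            wgt = sp * w_color * w_depth * valid
            s = wgt.sum()
            if s > 0:
                out[y, x] = (wgt * nd).sum() / s
    return out
```

Valid (non-hole) pixels pass through untouched; only zero-depth pixels receive a weighted average of their valid neighbors, so depth edges supported by the color image are preserved.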
Next, we eliminate outliers caused by occlusions from the correspondence set, using the difference between the two matched points. We then apply polynomial regression to the correspondence set and finally obtain the color correction matrix for the target image. The overall contribution of this paper is the generation of high-quality 3D objects in the virtual space; specifically, depth maps from the Kinect are refined and color tones are matched among the multi-view images.

©2016 Society for Imaging Science and Technology. DOI: 10.2352/ISSN.2470-1173.2016.21.3DIPM-408. IS&T International Symposium on Electronic Imaging 2016: 3D Image Processing, Measurement (3DIPM), and Applications 2016, 3DIPM-408.1.

Microsoft Kinect

Kinect is a motion-sensing input device developed by Microsoft for the Xbox 360 and Xbox One video game consoles and for Windows PCs. It allows users to control and interact with their game console or computer using only their own bodies, through a natural user interface based on hand and body gestures and spoken commands. The appearance of the Kinect is shown in Fig. 1.

Figure 1. Appearance of Kinect

The first-generation Kinect was introduced in November 2010 in an attempt to enlarge the Xbox 360's user base, and Kinect for Windows was released on February 1, 2012. Microsoft released the Kinect software development kit for Windows 7 on June 16, 2011, which was meant to allow developers to build Kinect applications and thereby broaden the platform. The Kinect sensor is a horizontal bar connected to a small base with a motorized pivot. The device consists of an RGB camera, a depth sensor, and a multi-array microphone running proprietary software. Figure 2 shows the structure of the Kinect camera [3].

Figure 2. Structure of Kinect camera

Thanks to these components, Kinect can provide full-body 3D motion capture, facial recognition, and voice recognition capabilities. Table 1 shows the device specification of Kinect.

Table 1.
Device specification of Kinect

Viewing angle: vertical 43°, horizontal 57°
Vertical tilt range: ±27°
Color image size: 640×480
Depth image size: 640×480
Frame rate (color and depth streams): 30 frames per second
Audio format: 16-kHz, 24-bit mono pulse code modulation (PCM)
Interface: USB 2.0
Power: AC adapter

The sensor has a field of view of 43° vertically and 57° horizontally, and the motorized pivot can tilt the sensor up to 27° either up or down. The various Kinect sensors output video at frame rates from 9 Hz to 30 Hz, depending on the resolution. Although the RGB video stream usually uses VGA resolution (640×480), the hardware can capture video at up to 1280×1024 in RGB or YUV color formats. The depth video stream is in VGA resolution with 11-bit depth (2,048 depth levels). Kinect also includes an infrared camera and can stream the IR view at VGA resolution, or at 1280×1024 at a lower frame rate. It has a practical ranging limit of 1.2–3.5 m when connected to an Xbox console. The microphone array has four microphone capsules, each channel processing 24-bit mono pulse code modulation (PCM) at a sampling rate of 16 kHz. Because Kinect's tilting system requires more power than the Xbox 360's USB ports can supply, the device uses USB communication with additional power supplied by the main AC adapter.

Proposed 3D object reconstruction method

In this paper, we propose a 3D object reconstruction method using multiple Kinect cameras. Figure 3 shows the system structure of the proposed method. We placed three Kinect cameras in front of the object in a convergent configuration and set the position of the 3D object within the field of view of all the Kinect cameras. Figure 4 shows the flowchart of the proposed method.

Figure 3. System structure of the proposed method

Figure 4. Flowchart of the proposed 3D object reconstruction method

First of all, we perform camera calibration on the Kinect cameras as a preprocessing step.
The camera calibration is performed using a planar chessboard pattern: we capture images of the chessboard in various poses and run the calibration procedure of [4] on them, obtaining the intrinsic and extrinsic parameters of each camera. Next, we refine the color and depth images from the Kinect cameras as the main process. First, we need to refine the depth image, because the raw depth image contains many hole regions [5]; in this paper, we refine it using the depth-weighted joint bilateral filter. Second, we need to adjust the color images, since the raw color images from the multi-view system suffer from a color mismatch problem. When we construct the 3D model in the virtual space, the color ...
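The color correction stage described above (outlier rejection over the matched correspondences followed by polynomial regression) can be sketched as below. This is a hedged illustration rather than the paper's exact procedure: it assumes the correspondences have already been matched by 3D image warping, substitutes a simple color-distance threshold for the paper's occlusion-outlier test, and fits an independent second-degree polynomial per channel; all parameter values are illustrative.

```python
import numpy as np

def fit_color_correction(src_pts, tgt_pts, max_diff=40.0, degree=2):
    """Fit per-channel polynomials mapping target colors to source colors.

    src_pts, tgt_pts: (N, 3) arrays of RGB values at matched correspondences.
    Pairs whose color distance exceeds max_diff are treated as occlusion
    outliers and discarded (a crude stand-in for the paper's outlier test).
    Returns a (3, degree + 1) coefficient matrix, one row per channel."""
    src = np.asarray(src_pts, dtype=np.float64)
    tgt = np.asarray(tgt_pts, dtype=np.float64)
    keep = np.linalg.norm(src - tgt, axis=1) < max_diff
    src, tgt = src[keep], tgt[keep]
    coeffs = []
    for c in range(3):
        A = np.vander(tgt[:, c], degree + 1)          # columns: t^2, t, 1
        coef, *_ = np.linalg.lstsq(A, src[:, c], rcond=None)
        coeffs.append(coef)
    return np.array(coeffs)

def apply_color_correction(image, coeffs):
    """Apply the fitted per-channel polynomials to a whole target image."""
    flat = image.reshape(-1, 3).astype(np.float64)
    corrected = np.stack([np.polyval(coeffs[c], flat[:, c]) for c in range(3)],
                         axis=1)
    return np.clip(corrected, 0, 255).round().astype(np.uint8).reshape(image.shape)
```

Once fitted against one source view, the same coefficient matrix can be applied to every frame of the target camera, bringing the color tones of the multi-view streams into agreement before 3D reconstruction.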
Similar resources
3D Content Capturing and Reconstruction Using Microsoft Kinect Depth Camera
The main motivation behind this thesis was to create a new method of interaction between physical and virtual space. The objective was to be achieved by creating a 3D scanner capable of producing 360-degree scans of real-world objects to be visualized and used inside virtual spaces. The implementation of this objective began by conducting background research about the existing methods f...
Simultaneous Calibration: A Joint Optimization Approach for Multiple Kinect and External Cameras
Camera calibration is a crucial problem in many applications, such as 3D reconstruction, structure from motion, object tracking and face alignment. Numerous methods have been proposed to solve the above problem with good performance in the last few decades. However, few methods are targeted at joint calibration of multi-sensors (more than four devices), which normally is a practical issue in th...
Indoor 3D Video Monitoring Using Multiple Kinect Depth-Cameras
This article describes the design and development of a system for remote indoor 3D monitoring using an undetermined number of Microsoft® Kinect sensors. In the proposed client-server system, the Kinect cameras can be connected to different computers, thus addressing the hardware limitation of one sensor per USB controller. The reason behind this limitation is the high bandwidth needed by th...
3D Object Detection with Multiple Kinects
Categorizing and localizing multiple objects in 3D space is a challenging but essential task for many robotics and assisted living applications. While RGB cameras as well as depth information have been widely explored in computer vision, there is surprisingly little recent work combining multiple cameras and depth information. Given the recent emergence of consumer depth cameras such as Kinect w...
Semantic Structure from Motion: A Novel Framework for Joint Object Recognition and 3D Reconstruction
Conventional rigid structure from motion (SFM) addresses the problem of recovering the camera parameters (motion) and the 3D locations (structure) of scene points, given observed 2D image feature points. In this chapter, we propose a new formulation called Semantic Structure From Motion (SSFM). In addition to the geometrical constraints provided by SFM, SSFM takes advantage of both semantic and...
Publication date: 2016