Develop a depth estimation method that can learn the visual depth cues humans rely on (such as semantic meaning, blur, or texture), and apply it to background-extracted 2D images and their reconstruction in 3D.
- 3D Computer Vision
- Geometric Deep Learning
- Video Recognition/Detection/Segmentation
What you get
- A challenging assignment within a practical environment
- Professional guidance
- Courses aimed at your graduation period
- Support from our academic Research center at your disposal
What you will do
- 65% Research
- 10% Analyze, design, realize
- 25% Documentation
Monocular depth estimation has been one of the most researched problems in computer vision since the field's early days. Although humans do it naturally, estimating depth from a single 2D image computationally is an ill-posed problem: many different 3D point clouds project onto the same 2D image. Despite this, the topic has received sustained attention over the past decade, and recent methods based on deep learning have achieved remarkable results.
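The ambiguity can be made concrete with a small sketch. It assumes an ideal pinhole camera with points expressed in the camera frame; the function name `project`, the focal length, and the random point cloud are illustrative choices, not part of the assignment.

```python
import numpy as np

def project(points, focal_length=1.0):
    """Pinhole projection of N x 3 camera-frame points (Z > 0) to N x 2 image coordinates."""
    points = np.asarray(points, dtype=float)
    return focal_length * points[:, :2] / points[:, 2:3]

rng = np.random.default_rng(0)

# A random point cloud in front of the camera (Z between 1 and 5).
cloud = rng.uniform(low=[-1, -1, 1], high=[1, 1, 5], size=(100, 3))

# Move every point along its own viewing ray by an arbitrary positive factor.
ray_scales = rng.uniform(0.5, 2.0, size=(100, 1))
other_cloud = cloud * ray_scales

# The two clouds differ in 3D but produce the same 2D image.
same_image = np.allclose(project(cloud), project(other_cloud))
same_cloud = np.allclose(cloud, other_cloud)
print(same_image, same_cloud)
```

Because every point can slide along its viewing ray without changing the image, a single image alone cannot determine depth. Additional cues or learned priors are needed to choose among the candidate point clouds.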
The neural networks used seem to infer semantic meaning from pictures correctly most of the time: for instance, a well-outlined person is generally predicted to be closer to the camera than the background. However, such semantic meaning, like the other visual depth cues that humans use, is never explicitly taught. Learning these implicit features therefore depends heavily on the quality of the training data. As a result, most state-of-the-art depth estimators still fail on images that differ strongly from the training set, for instance in the camera’s focal length or viewpoint.
Background-extracted images usually show landscapes: scenes in which several elements recur across images and have general depth characteristics that are known in advance, such as the ground plane, the sky, or a group of buildings. Their uniform depth properties (the sky lying at optical infinity, or the ground being a horizontal plane) could be derived explicitly through image segmentation and classification. Exploring these and other biological cues, such as defocus and depth of field, or texture gradients, which vary consistently with distance, could help improve current depth estimation methods when applied to these kinds of scenes. This would facilitate their reconstruction in 3D, which could have a broad range of applications in fields such as architecture and civil engineering (spatial modelling), healthcare (3D-aware vision assistant for the visually impaired), or security (crowd planning and control), among many others.
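As a rough illustration of how segmentation labels could be turned into explicit depth values, the sketch below assigns infinite depth to sky pixels and a geometric depth to ground pixels. It rests on strong assumptions that are not part of the assignment: a pinhole camera with a horizontal optical axis and no roll, a flat ground plane, a known camera height, and a segmentation map from some upstream model. The label ids and the function name `depth_prior` are hypothetical.

```python
import numpy as np

# Hypothetical label ids produced by some segmentation model.
SKY, GROUND, OTHER = 0, 1, 2

def depth_prior(labels, focal_px, cam_height_m, horizon_row):
    """Per-pixel depth prior (metres along the optical axis) from class labels.

    Sky pixels get infinite depth. Ground pixels below the horizon get
    Z = f * h / (row - horizon_row), which follows from projecting a flat
    ground plane through a level pinhole camera. Other pixels stay NaN.
    """
    labels = np.asarray(labels)
    rows = np.arange(labels.shape[0], dtype=float)[:, None]
    rows = np.broadcast_to(rows, labels.shape)

    depth = np.full(labels.shape, np.nan)
    depth[labels == SKY] = np.inf

    ground_below = (labels == GROUND) & (rows > horizon_row)
    depth[ground_below] = focal_px * cam_height_m / (rows[ground_below] - horizon_row)
    return depth

# Tiny 4 x 2 example: sky on top, an unknown object, then two rows of ground.
labels = np.array([[SKY, SKY],
                   [OTHER, OTHER],
                   [GROUND, GROUND],
                   [GROUND, GROUND]])
depth = depth_prior(labels, focal_px=100.0, cam_height_m=1.5, horizon_row=1)
print(depth)
```

Such a prior would only cover part of the image; the remaining pixels (NaN here) would still need a learned estimator, which could use the explicit values as constraints or supervision.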
About Info Support Research Center
We anticipate upcoming and future challenges and ensure that our engineers develop cutting-edge solutions based on the latest scientific insights. Our research community proactively tackles emerging technologies. We do this in cooperation with renowned scientists, making sure that research teams are positioned and embedded throughout our organisation and our community, so that their insights are applied directly to our business. We truly believe in sharing knowledge, so we want to do this without any restrictions.
Read more about Info Support Research here.