At Univrses, we conduct original research into perception capabilities. We develop algorithms to tackle hard problems in computer vision and machine learning, often with state-of-the-art performance in terms of accuracy and robustness. Our technical focus lies in areas such as 3D Positioning, 3D Mapping, 3D Localization, Spatial Deep Learning and Sensor Fusion. We have published at many top conferences and regularly share our ideas and discoveries with the scientific community. We also support the growth of young talent by sponsoring Master's students during their theses. Please get in touch if you would like to know more.
Depth is crucial for many computer vision and robotics applications, such as navigation, mapping and object grasping. Monocular Depth Estimation refers to the task of inferring a depth map from a single image, without relying on expensive laser scanners, multi-camera arrays or depth sensors. Many architectures can generate depth maps from monocular images, but they often lack precision in ambiguous environments or when they encounter objects rarely seen during training. In response to this problem, we employed a traditional Visual Odometry algorithm to provide a structural prior of the environment that enhances the final depth estimate.
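To make the idea of a Visual Odometry prior concrete, here is a minimal sketch of one common way to combine the two signals: monocular networks often predict depth only up to an unknown scale, and a handful of sparse metric depths triangulated by a VO front end can fix that scale. The function name and the median-scaling strategy are illustrative assumptions, not a description of the Univrses pipeline.

```python
import numpy as np

def align_depth_to_vo(pred_depth, vo_points):
    """Rescale a relative monocular depth map using sparse VO depths.

    pred_depth: (H, W) network prediction, valid up to an unknown scale.
    vo_points:  list of (row, col, metric_depth) triangulated by a VO front end.
    """
    rows, cols, z = (np.array(v) for v in zip(*vo_points))
    # Ratio between metric VO depths and predicted depths at the same pixels;
    # the median is robust to VO outliers.
    scale = np.median(z / pred_depth[rows, cols])
    return scale * pred_depth

# Toy example: a prediction that is exactly half the true metric depth.
truth = np.fromfunction(lambda r, c: 1.0 + r + c, (4, 4))
pred = 0.5 * truth
sparse = [(0, 0, truth[0, 0]), (1, 2, truth[1, 2]), (3, 3, truth[3, 3])]
aligned = align_depth_to_vo(pred, sparse)
```

Median scaling is only the simplest form of such a prior; denser VO structure can also be fed to the network directly as an extra input channel.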
In Deep Learning, we usually focus on a single task: the network is trained to maximize or minimize one specific metric or benchmark, and so learns only task-specific descriptors.
Training a neural network on multiple different - but intrinsically related - tasks can enable the network to generalize better on the original task. Moreover, a multi-task network with a shared feature extractor requires a single forward pass at prediction time for all tasks, making it considerably more efficient during deployment. Here at Univrses, we exploited the synergy between stereo matching and semantic segmentation to achieve high-accuracy, real-time performance even on mobile devices.
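The efficiency argument above can be sketched in a few lines: heavy feature extraction runs once, and each task head is a cheap projection of the shared features. This is a toy NumPy stand-in (random weights, dense layers instead of a real CNN), not the actual Univrses architecture; the head names merely echo the stereo-matching and segmentation tasks mentioned above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared feature extractor and two task heads; weights are random stand-ins.
W_shared = rng.standard_normal((64, 128))
W_disp = rng.standard_normal((128, 1))    # stereo-matching (disparity) head
W_seg = rng.standard_normal((128, 10))    # semantic-segmentation head, 10 classes

def forward(x):
    # One shared pass computes the expensive features ...
    feat = np.maximum(x @ W_shared, 0.0)  # ReLU features
    # ... and every lightweight head reuses them, so adding a task
    # adds only a small projection, not a second backbone pass.
    return feat @ W_disp, feat @ W_seg

x = rng.standard_normal((5, 64))          # a batch of 5 input vectors
disparity, seg_logits = forward(x)
```

At training time, the shared weights receive gradients from both task losses, which is where the cross-task regularization effect comes from.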
Convolutional Neural Networks (CNNs) have proven to be an exceptional tool in the field of computer vision. Their performance on traditional computer vision tasks such as semantic segmentation and object classification has surpassed that of previous methods. However, CNNs require significant quantities of annotated data to perform at such high levels, and producing these training datasets is time-consuming and costly.
In response to this problem, many have considered using a synthetic environment to generate training data, significantly reducing the cost and time of production. However, CNNs suffer huge drops in performance if the environment in which the network is deployed differs significantly from the dataset used to train it. This shortcoming is the major reason why synthetic data has not been a viable option for generating training data.
Domain Adaptation techniques focus on providing a solution to this weakness. Typically, effort is focused in two areas: either the network is trained using domain-agnostic features, or a process of data translation is performed whereby an image produced in a synthetic environment is enhanced to more closely resemble the deployment environment. This translation is usually achieved by training Generative Adversarial Networks (GANs) or Variational Autoencoders (VAEs). At Univrses, we combined semantic segmentation with a GAN to achieve best-in-class performance when generating photorealistic images from a simulated environment.
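As a small illustration of the adversarial mechanism behind such data translation, the snippet below computes the standard non-saturating GAN losses from raw discriminator logits. It is a generic textbook formulation, not the Univrses model: the discriminator learns to score real deployment-domain images high and translated synthetic images low, while the generator is trained to make its translations score high.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gan_losses(d_real_logits, d_fake_logits, eps=1e-7):
    """Non-saturating GAN losses from raw discriminator logits."""
    p_real = sigmoid(d_real_logits)   # discriminator belief that real is real
    p_fake = sigmoid(d_fake_logits)   # discriminator belief that fake is real
    # Discriminator: push real toward 1 and fake toward 0.
    d_loss = -np.mean(np.log(p_real + eps)) - np.mean(np.log(1.0 - p_fake + eps))
    # Generator: push fake toward 1 (non-saturating form).
    g_loss = -np.mean(np.log(p_fake + eps))
    return d_loss, g_loss

# A discriminator that confidently separates real from translated images
# has a low loss, while the generator loss is correspondingly high.
d_loss, g_loss = gan_losses(np.array([4.0, 5.0]), np.array([-4.0, -5.0]))
```

In a segmentation-aware variant, a per-class or per-region term can be added so the translation preserves the semantic layout of the synthetic scene.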
Semantic segmentation of an image yields enhanced understanding of a scene. Common tasks include classifying elements within the scene into semantic categories (e.g. car, bus, human) and detecting specific instances of these categories (e.g. human #1, human #2, etc.). Typically, a CNN is trained to carry out these tasks. The output of such a network also has the potential to improve the performance of several classic geometrical computer vision applications.
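Concretely, a segmentation CNN emits a grid of per-pixel class scores, and the label map consumed by downstream geometry is just the per-pixel argmax. The sketch below uses random logits and the class names from the example above purely for illustration.

```python
import numpy as np

CLASSES = ["car", "bus", "human"]        # categories from the example above

rng = np.random.default_rng(1)
# Stand-in for CNN output: per-pixel class logits of shape (classes, H, W).
logits = rng.standard_normal((len(CLASSES), 4, 4))

# Per-pixel softmax gives class probabilities; argmax gives the label map.
exp = np.exp(logits - logits.max(axis=0, keepdims=True))
probs = exp / exp.sum(axis=0, keepdims=True)
label_map = probs.argmax(axis=0)         # (H, W) integer class ids
confidence = probs.max(axis=0)           # (H, W) per-pixel confidence
```

The confidence map is often as useful as the labels themselves, e.g. for down-weighting uncertain pixels when masking dynamic objects in a SLAM front end.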
At Univrses, we study and design algorithms that extend the capabilities of geometry-based SLAM pipelines. Using CNNs to improve scene understanding enables hybrid algorithms that combine the strengths of neural networks with the precision of geometrical approaches. This leads to algorithms that are more robust and achieve more precise relocalization than purely geometrical methods.
Obtaining an accurate estimate of the camera model is a fundamental problem for any modern computer vision algorithm. Even small errors in the camera model parameters can cause significant performance drops in even the best algorithms. Such parameters can be hard to estimate; they often require labour-intensive, strict calibration protocols as well as an expensive calibration rig. As a result, calibration is performed only occasionally and usually offline.
At Univrses, we have developed a state-of-the-art algorithm that efficiently estimates the relative position and orientation of sensors (with respect to each other) dynamically as a system operates in the real world. Our algorithm uses a dual-quaternion formulation of the camera model parameter estimation problem. This approach allows us to translate the generally non-convex problem of estimating six degrees of freedom into a convex one-dimensional root-finding problem. We have shown it to be comparable in accuracy to classical approaches, while achieving lower computation time and guaranteed optimality.
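To illustrate why the reduction to one dimension buys guaranteed optimality: the derivative of a convex 1-D cost is monotone, so its root, the global minimizer, can be bracketed and found by bisection with certain convergence. The sketch below runs this on a toy quadratic cost; the actual dual-quaternion cost function is not reproduced here.

```python
def bisect_root(f, lo, hi, tol=1e-10):
    """Root of a monotone scalar function f on [lo, hi] by bisection.

    For a convex 1-D cost, the derivative is monotone, so once a sign
    change is bracketed, bisection converges to the global minimizer
    with a guaranteed accuracy after a known number of halvings.
    """
    assert f(lo) < 0.0 < f(hi), "root must be bracketed"
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Toy convex cost c(t) = (t - 1.3)**2 + 0.5; its derivative is monotone
# and vanishes exactly at the global minimizer t = 1.3.
dc = lambda t: 2.0 * (t - 1.3)
t_star = bisect_root(dc, -10.0, 10.0)
```

No such guarantee exists for generic six-degree-of-freedom optimization, where a local solver can stall in a spurious minimum; that contrast is what makes the 1-D reformulation attractive.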