Mixed Reality and Artificial Intelligence

At our last Scientific Advisory Board Meeting of the year, we had the pleasure of hosting Prof. Marc Pollefeys, head of the Computer Vision and Geometry lab at ETH Zürich and director of the Microsoft Mixed Reality and Artificial Intelligence Zurich lab. Prof. Pollefeys actively conducts research in computer vision, robotics, machine learning and computer graphics, focusing predominantly on flexible approaches to capturing visual representations of real-world objects, scenes and events. He provided us with valuable insights into the HoloLens, the next-generation mixed reality technology for businesses, to which his team of scientists at Microsoft contributes.

Mixed reality
Recent advances in computer vision, graphics processing power, display technology, and input systems have paved the way for mixed reality (MR), a new kind of interaction between humans, their environment and computers. MR lies at the intersection of augmented and virtual reality – it merges real and virtual worlds, producing new environments and visualizations in which physical and digital objects co-exist and interact in real time. Holographic devices, such as the HoloLens, overlay precisely placed and oriented virtual objects on the real-world environment, as if they were actually there.

The Microsoft HoloLens is an untethered holographic computer that takes the form of AR smart glasses. Released in February this year, the new edition offers a more comfortable, immersive and business-friendly experience, paired with additional options for collaborating in mixed reality.

The device is ergonomic and designed for use over extended periods of time: the head-mounted hardware is light both in weight and in the heat dissipated from the built-in power source. Holographic processing units – application-specific integrated circuits implementing sophisticated deep neural network architectures – are embedded in the HoloLens, enabling its many analytical capabilities and real-time inference. Multiple cameras are integrated into the device: four perform environment scanning, a depth camera precisely detects objects in front of the user's eyes and assists with depth perception, and two inward-facing cameras track eye movement for biometric identification and imagery optimization. Much attention is given to keeping the tracking delay caused by sudden shifts of the user's head or eye position below the threshold of human perception (~10 ms), so that holograms appear rigidly attached to the world. Multiple microphones, some placed close to the user's mouth, ensure that the device can understand voice commands in loud environments. This would not work if all microphones were equally close to the mouth: the five-microphone array, two of which sit near the mouth, exploits these differing distances to suppress environmental noise.
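The noise-suppression idea behind a microphone array can be illustrated with a minimal delay-and-sum sketch in numpy. This is not Microsoft's actual algorithm; it assumes the desired speech is already time-aligned across all five microphones, while environmental noise arrives uncorrelated at each one, so simple averaging boosts the signal-to-noise ratio roughly fivefold:

```python
import numpy as np

rng = np.random.default_rng(0)
n_mics, n_samples, sample_rate = 5, 16000, 16000

# Desired speech: identical (already time-aligned) across the array.
speech = np.sin(2 * np.pi * 220 * np.arange(n_samples) / sample_rate)

# Environmental noise: uncorrelated between microphones.
noise = rng.normal(0.0, 1.0, size=(n_mics, n_samples))

mic_signals = speech + noise          # shape (5, 16000)

# Delay-and-sum beamforming: averaging keeps the coherent speech intact
# but attenuates uncorrelated noise power by a factor of ~n_mics.
output = mic_signals.mean(axis=0)

snr_in = speech.var() / noise[0].var()
snr_out = speech.var() / (output - speech).var()
print(f"SNR gain from the array: {snr_out / snr_in:.1f}x")
```

In a real device the microphone signals must first be re-aligned for the speech direction (the "delay" step) before summing; the differing mic-to-mouth distances are exactly what makes that alignment discriminate speech from ambient noise.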

One of the key ingredients of a powerful deep learning model is the data used for its training – the more abundant and diverse the data, the better the network approximates the underlying function. Devices such as the HoloLens can collect new data samples in real time, store and process them on the device, or share them with remote servers for computationally intensive tasks.
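That split between on-device storage and remote processing can be sketched as a simple batching policy. The class and batch size below are hypothetical, not part of any HoloLens API: frames accumulate in a small local buffer, and a full batch stands in for one upload to a remote training server:

```python
from dataclasses import dataclass, field


@dataclass
class SampleCollector:
    """Hypothetical on-device collector: buffers samples locally and
    hands off full batches for remote, compute-heavy processing."""
    batch_size: int = 4
    buffer: list = field(default_factory=list)
    uploaded: list = field(default_factory=list)

    def add(self, frame):
        self.buffer.append(frame)
        if len(self.buffer) >= self.batch_size:
            # Stand-in for a network upload to a remote server.
            self.uploaded.append(list(self.buffer))
            self.buffer.clear()


collector = SampleCollector()
for i in range(10):
    collector.add(f"frame-{i}")
print(len(collector.uploaded), len(collector.buffer))  # 2 batches sent, 2 frames buffered
```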

Virtual reality and visual anchoring
MR devices also allow for the simulation of artificial environments, which users can employ for entertainment or education (e.g. surgeons can practice complex operations, pilots can use the device instead of a flight simulator). Visual anchoring refers to the process of creating virtual reality (VR) environments and holograms and embedding them into real-world locations. This allows other people to manipulate and interact with shared virtual objects, or to immerse themselves fully in VR environments.
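At its core, a shared anchor is a pose (a rotation and a translation) that maps anchor-local coordinates into a world frame common to all devices. The sketch below is a geometric toy, not any vendor's anchoring API: once two devices resolve the same anchor, a hologram placed one meter in front of it lands at the same world position for both users.

```python
import numpy as np


def make_anchor(yaw_rad, position):
    """A spatial anchor as a pose: rotation about the z-axis + translation."""
    c, s = np.cos(yaw_rad), np.sin(yaw_rad)
    R = np.array([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])
    return R, np.asarray(position, dtype=float)


def anchor_to_world(anchor, local_point):
    """Map a point expressed in anchor-local coordinates into the world frame."""
    R, t = anchor
    return R @ np.asarray(local_point, dtype=float) + t


# Any device that resolves this anchor computes the same world position
# for a hologram placed 1 m along the anchor's local x-axis.
anchor = make_anchor(np.pi / 2, [2.0, 0.0, 0.0])
hologram_world = anchor_to_world(anchor, [1.0, 0.0, 0.0])
print(hologram_world)  # [2. 1. 0.]
```

Real anchoring systems additionally solve the hard part this sketch assumes away: estimating each device's own pose relative to the anchor from camera imagery.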

Semantic modelling
Semantic modelling is the challenging task of building a dense geometric model from images and videos while simultaneously inferring the semantic class of each individual pixel in the reconstructed model. Prof. Pollefeys demonstrated impressive examples of real-time semantic annotation and dense 3D environment reconstruction – the software generated rich representations of encountered scenes, understanding much more than just their geometric layout. In addition to remote control of robotic devices, explorative mapping of inaccessible areas and simulation-assisted learning of human motion, semantic modelling has applications in many other areas, such as autonomous driving and navigation.
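One common ingredient of such systems, shown here as a deliberately simplified sketch rather than the method from the demonstrations, is fusing per-frame semantic predictions into the reconstructed geometry: each voxel accumulates label votes across frames, so a single misclassified frame gets outvoted over time.

```python
from collections import Counter

# Toy semantic fusion: voxel index -> vote counts over semantic labels.
votes = {}


def integrate(voxel, label):
    """Record one frame's predicted label for a reconstructed voxel."""
    votes.setdefault(voxel, Counter())[label] += 1


def fused_label(voxel):
    """The fused semantic class is the majority vote across frames."""
    return votes[voxel].most_common(1)[0][0]


# Three observations of the same voxel; one frame misclassifies it.
for label in ["road", "road", "sidewalk"]:
    integrate((4, 7, 0), label)
print(fused_label((4, 7, 0)))  # road
```

Production systems typically accumulate per-class probabilities (or log-odds) instead of hard votes, but the principle of temporal fusion is the same.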

Remote collaboration
Another exciting application of cutting-edge mixed reality technology is remote collaboration, where on-site employees in remote locations can work side by side with experts to solve challenging problems in real time. An operator's field of view can be shared instantly with people thousands of kilometers away, who can then create and place holographic objects in the real-world environment to assist and guide the operator's work.