Animals’ eyes are extremely efficient, and researchers have been trying to reproduce their function in robots for many years. However, despite the latest improvements in artificial intelligence, a computer’s visual data processing still cannot compete with the speed and accuracy of biological visual systems.
When it comes to the visual perception of autonomous robots, such as self-driving cars and drones that need to accurately “see” their surroundings, one of the fundamental issues is motion estimation. How can we make sure that a robot correctly perceives how objects move through three-dimensional space?
One way to address this issue is to use event cameras, bio-inspired sensors that produce motion information to give a robot a sense of depth and movement. These “robotic eyes” generally offer an advantage over the traditional video cameras we are all familiar with, as they allow for better visual efficiency, speed, and accuracy, even in the dark.
However, this increased accuracy comes with an increased computational cost, which slows the system down and effectively cancels out the speed advantage the sensors provide.
In order to solve this problem, researchers Shintaro Shiba and Yoshimitsu Aoki from Keio University in Japan, and Guillermo Gallego from the Science of Intelligence Excellence Cluster at TU Berlin, Germany, developed a new method that allows robots to estimate motion as accurately as before without compromising speed.
It uses event camera data, just like previous methods, but also incorporates something called “prior knowledge”, a sort of robotic common sense that automatically discards data deemed “unrealistic”, thereby lightening the processing load and reducing computational effort. This discovery is important for future research and may find applications in areas such as driverless cars and autonomous drones.
Silicon retinas
Often called “silicon retinas”, event cameras mimic the visual systems of animals. Much like the photoreceptor cells in the human retina, each pixel in an event camera produces precisely timed outputs called events, which differ from the familiar still images generated by conventional cameras.
“The cameras naturally respond to the moving parts of the scene and to changes in illumination,” said Gallego, head of the Robotic Interactive Perception laboratory at the Science of Intelligence Excellence Cluster (TU Berlin).
“Event cameras offer many advantages over their traditional, video-based counterparts, such as a very high dynamic range, resolution in the order of microseconds, low power consumption, and data redundancy suppression,” he continued. “We work on designing efficient motion estimation methods, which comes with great challenges.”
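To make the contrast with conventional frames concrete, below is a minimal sketch of the event representation commonly used in the literature, in which each pixel independently emits a timestamped event whenever the brightness it observes changes by more than a threshold. The threshold value and the frame-based simulation are illustrative assumptions, not a model of the real sensor circuitry.

```python
# Minimal illustration of an event stream (not a real sensor model): each
# pixel fires an "event" (x, y, timestamp, polarity) whenever the brightness
# it observes changes by more than an assumed contrast threshold C.
import numpy as np

C = 0.2  # contrast threshold (illustrative value)

def frames_to_events(log_frames, timestamps):
    """Turn a stack of log-brightness frames (T, H, W) into a list of events."""
    events = []
    reference = log_frames[0].copy()          # last brightness each pixel "remembers"
    for frame, t in zip(log_frames[1:], timestamps[1:]):
        diff = frame - reference
        ys, xs = np.where(np.abs(diff) >= C)  # pixels whose change crossed the threshold
        for x, y in zip(xs, ys):
            polarity = 1 if diff[y, x] > 0 else -1
            events.append((int(x), int(y), float(t), polarity))
            reference[y, x] = frame[y, x]     # that pixel resets its reference level
    return events
```

Because only changing pixels report anything, a static scene produces almost no data, which is where the redundancy suppression and low power consumption mentioned above come from.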
Trade-off between accuracy and speed
Even though the event camera itself is fast, a robot, as a complex system, does not always process information as quickly as the sensor delivers it, because slow algorithms create a bottleneck in data processing.
As a result, existing motion estimation methods for event cameras tend to be either accurate but slow, or fast but inaccurate. The techniques that make these algorithms stable are often computationally expensive, which makes it difficult for them to run in real time.
To solve this issue, Shiba, Aoki, and Gallego improved a framework called Contrast Maximization so that it achieves state-of-the-art accuracy in motion estimation without sacrificing execution time.
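To give a sense of what Contrast Maximization does, here is a deliberately simplified sketch: events are warped back to a common time according to a candidate motion, accumulated into an image, and the candidate that yields the sharpest (highest-variance) image wins. The single global flow vector and the brute-force search are illustrative simplifications, not the authors’ implementation.

```python
# Simplified Contrast Maximization for a single global optic-flow vector.
# `events` is an (N, 3) array of (x, y, t); the grid search is illustrative.
import numpy as np

def image_of_warped_events(events, flow, shape=(180, 240)):
    """Warp events to t = 0 with a constant flow (vx, vy) and accumulate them."""
    x, y, t = events[:, 0], events[:, 1], events[:, 2]
    xw = np.round(x - flow[0] * t).astype(int)
    yw = np.round(y - flow[1] * t).astype(int)
    iwe = np.zeros(shape)
    valid = (xw >= 0) & (xw < shape[1]) & (yw >= 0) & (yw < shape[0])
    np.add.at(iwe, (yw[valid], xw[valid]), 1.0)
    return iwe

def contrast(events, flow):
    """Sharper warped-event images (higher variance) mean the motion fits better."""
    return image_of_warped_events(events, flow).var()

def estimate_flow(events, candidates):
    """Pick the candidate flow whose warped-event image is sharpest."""
    return max(candidates, key=lambda f: contrast(events, f))
```

A motion estimate that correctly “undoes” the blur in the event stream produces a sharp image, which is exactly what the variance term rewards.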
The key idea is that some motions are more realistic than others, and this can be used as additional information in the framework. Experimental results show that the proposed method is two to four times faster than previous approaches.
The researchers also demonstrated an application of the proposed method that estimates time-to-collision in driving scenarios, which is useful for advanced driver-assistance systems.
“We started by analyzing failure cases that did not produce realistic motion speeds, caused by the existence of multiple sub-optimal solutions,” said Shiba, a Ph.D. candidate and author of the study. “Contrast Maximization is a useful motion estimation method, but it needs additional computational workload to run stably in certain scenarios, such as when the robot is moving forward. We tried to find the root cause of the failure and managed to measure it based on geometric principles of the visual data.”
In other words, the researchers were able to quantify the failure by calculating how fast objects in the camera’s image change in size (which would suggest that an object is getting closer or moving away).
Using this measurement, the proposed method prevents the failure cases in question. “Event cameras have a great potential, and with this method, we can further leverage this potential and get a step closer to real-time motion estimation,” said Aoki, Professor at the Department of Electronics and Electrical Engineering of Keio University. “Our method is the only effective solution against these failure cases without trading off workload, and this is a critical step towards mobile-robot applications for event cameras and the Contrast Maximization framework.”
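As a rough illustration of how such a measurement can be folded into the objective, the sketch below scores a candidate warp by its contrast minus a penalty on how fast it shrinks or expands the event cloud. The zoom-style motion model, the quadratic penalty, and the weight lam are assumptions for illustration, not the regularizer defined in the paper.

```python
# Illustrative regularized objective: reward sharp warped-event images, but
# penalize warps that change the image "size" too quickly. Without such a
# term, an extreme contraction that piles all events onto a few pixels can
# score a deceptively high contrast (the failure case discussed above).
import numpy as np

def regularized_contrast(events, h, shape=(180, 240), lam=10.0):
    """`events` is (N, 3) with columns (x, y, t); `h` is the expansion rate
    of a simple zoom-about-the-centre motion model (positive = expanding)."""
    cx, cy = shape[1] / 2.0, shape[0] / 2.0
    x, y, t = events[:, 0], events[:, 1], events[:, 2]
    scale = 1.0 - h * t                      # first-order zoom warp back to t = 0
    xw = np.round(cx + (x - cx) * scale).astype(int)
    yw = np.round(cy + (y - cy) * scale).astype(int)
    iwe = np.zeros(shape)
    valid = (xw >= 0) & (xw < shape[1]) & (yw >= 0) & (yw < shape[0])
    np.add.at(iwe, (yw[valid], xw[valid]), 1.0)
    return iwe.var() - lam * h ** 2          # contrast minus "unrealistic motion" penalty
```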
Improving driverless cars
The researchers applied the method to estimate time-to-collision, one of the most important parameters in advanced driver-assistance systems, also known as ADAS. The camera, placed in the car, computes motion information faster than before and can warn the driver before a collision.
The method has also proven useful for estimating scene depth when the vehicle’s speed is known, which gives the car better visual accuracy.
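For a rough sense of the arithmetic involved (under the usual constant-relative-velocity assumption, not the authors’ estimator): time-to-collision follows from how fast an object’s image expands, and a known vehicle speed then turns it into a depth estimate.

```python
# Back-of-the-envelope relations behind the two applications above; the
# constant-relative-velocity assumption and the numbers are illustrative.

def time_to_collision(image_size, image_growth_rate):
    """TTC = s / (ds/dt): seconds until contact if the closing speed stays constant."""
    return image_size / image_growth_rate

def depth_from_speed(ttc_seconds, vehicle_speed_mps):
    """Distance to the object = vehicle speed x time-to-collision."""
    return vehicle_speed_mps * ttc_seconds

# Example: an object's image grows by 5% of its size per second -> TTC = 20 s;
# at 10 m/s the object is roughly 200 m ahead.
print(time_to_collision(1.0, 0.05))   # 20.0
print(depth_from_speed(20.0, 10.0))   # 200.0
```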
“Through our latest work, we have now moved one step further, and we are excited to achieve even faster algorithms and real-time applications on autonomous robots. We aim to make them just as fast and accurate as flies’ or cats’ eyes,” said Shiba.
In the next phase of their research, Shiba, Aoki, and Gallego plan to extend the method to more complex scenes with multiple independently moving objects, such as a flock of birds or even crowds of people.
Reference: Shintaro Shiba, Yoshimitsu Aoki, and Guillermo Gallego, A Fast Geometric Regularizer to Mitigate Event Collapse in the Contrast Maximization Framework, Advanced Intelligent Systems (2022). DOI: 10.1002/aisy.202200251