Robotic Stereo Vision with the STM32N657 Neuroprocessor
Artificial Neural Networks for Robots to see where they're going
My project is artificial vision for robots: giving them the ability to see and detect objects, obstacles, and the nearby environment, so that they can interpret the scene, move without colliding, follow the best possible path to their destination, and map their route in real time.
Stereo vision with only one camera: a "triangle mirror" is placed in front of a single camera to split the captured image into right and left halves. These halves look into two additional mirrors that face forward but are separated by a distance X, producing two different viewing angles of the same scene (a stereo view) from a single sensor.
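The splitting step described above can be sketched in a few lines of NumPy. This is a minimal illustration under an assumed mirror layout (left view on the left half of the sensor, right view on the right half); the real optics may mirror-flip one or both halves, which would need an extra flip before matching.

```python
import numpy as np

def split_stereo_frame(frame):
    """Split a single mirror-composited frame into left and right views.

    Assumes (hypothetically) that the triangle mirror places the left
    viewpoint in the left half of the sensor and the right viewpoint
    in the right half.
    """
    h, w = frame.shape[:2]
    half = w // 2
    left = frame[:, :half]          # left virtual camera
    right = frame[:, half:half * 2] # right virtual camera
    return left, right

# Dummy 480x640 grayscale frame standing in for a real capture
frame = np.zeros((480, 640), dtype=np.uint8)
left, right = split_stereo_frame(frame)
print(left.shape, right.shape)  # (480, 320) (480, 320)
```

On the STM32N6 the same slicing would be done on the camera buffer before feeding each half to the matching network.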
Stereo vision works by comparing two images of the same scene taken from slightly different angles to find the disparity (the difference in position of the same points in both images). To do this, a point correspondence is first established, identifying which pixel in the left image corresponds to the same physical point in the right image. This is done using matching algorithms based on textures, edges, or specific scene features, a task for which convolutional neural networks, such as those the new STM32N6 convolutional neuroprocessor can run, are ideal. Once the disparity is found, depth is calculated through triangulation. Knowing the distance between the two cameras (or in this case, between the two views obtained with mirrors) and the focal length, the formula Z = B·f / d is used, where Z is the depth, B is the distance between the views (baseline), f is the focal length of the camera, and d is the disparity. The greater the disparity (the larger the shift of a point between the two images), the shorter the distance to the object, allowing the reconstruction of a depth map of the scene.
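The triangulation formula Z = B·f / d can be verified numerically. The calibration values below (B = 0.10 m, f = 700 px) are hypothetical placeholders; real values come from calibrating the mirror rig.

```python
import numpy as np

B = 0.10   # assumed baseline between the two virtual viewpoints (metres)
f = 700.0  # assumed focal length (pixels)

def depth_from_disparity(disparity_px):
    """Triangulate depth Z = B*f/d; d in pixels, Z in metres."""
    d = np.asarray(disparity_px, dtype=float)
    with np.errstate(divide="ignore"):
        # Zero disparity means the point is at infinity (or unmatched)
        z = np.where(d > 0, B * f / d, np.inf)
    return z

print(depth_from_disparity(70))          # 1.0 (m)
print(depth_from_disparity([140, 35, 0]))  # [0.5  2.0  inf]
```

Note the inverse relationship: doubling the disparity (140 px) halves the depth, exactly as the text states.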
The three-dimensional reconstruction of these points yields the equivalent of a point cloud or depth map (depending on the storage format). With a 3D representation of the scene, the free distance from the camera to any of these points can be computed, so we know the depth and can tell whether the robot fits through a gap without colliding.
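A simple clearance test on a depth map might look like the sketch below. It checks whether a central corridor of pixels, as wide as the robot's silhouette at the safety distance, is entirely free; the corridor width and safety distance are hypothetical parameters, not values from the project.

```python
import numpy as np

def can_pass(depth_map, robot_px_width, safe_distance):
    """Return True if a central corridor of the depth map is clear.

    depth_map      : 2-D array of depths in metres (inf = no obstacle)
    robot_px_width : robot silhouette width in image pixels at the
                     safety distance (assumed calibration)
    safe_distance  : minimum free depth required, in metres
    """
    h, w = depth_map.shape
    lo = (w - robot_px_width) // 2
    corridor = depth_map[:, lo:lo + robot_px_width]
    return bool(np.all(corridor > safe_distance))

# Toy 4x8 depth map: walls at 0.4 m on both sides, 3 m free in the middle
d = np.full((4, 8), 3.0)
d[:, :2] = 0.4
d[:, -2:] = 0.4
print(can_pass(d, 4, 1.0))  # True: the central 4-pixel corridor is clear
print(can_pass(d, 6, 1.0))  # False: a 6-pixel corridor reaches the walls
```

A real planner would also convert pixel widths to metres using the depth and focal length, but the principle is the same: free depth in the robot's projected footprint means the robot fits.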