What Is Machine Vision Processing Meant to Achieve?
Thanks to advances in artificial intelligence, machine learning, and enabling technologies such as computer vision, robots are getting better every day at seeing, analyzing, and making decisions the way humans do. Developing such vision solutions involves analytics logic that can determine the orientation of objects, handle moving objects, and perform navigation. This starts with two important foundational tasks:
First, preprocess the data that sensors collect from the real world so that it is in a more usable form for each subsystem.
Second, perform feature detection to extract visual features from the data, such as corners and edges.
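The two tasks above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production pipeline: the function names, the toy 6x6 image, and the threshold are assumptions made for the example, and the Sobel operator is used here as one common choice of edge detector.

```python
def to_grayscale(rgb_image):
    """Preprocessing: convert rows of (r, g, b) values in 0..255 to floats in 0..1."""
    return [[(0.299 * r + 0.587 * g + 0.114 * b) / 255.0 for (r, g, b) in row]
            for row in rgb_image]


def sobel_edges(gray, threshold=0.5):
    """Feature detection: return the (row, col) pixels with a strong Sobel gradient."""
    kx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal gradient kernel
    ky = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical gradient kernel
    height, width = len(gray), len(gray[0])
    edges = set()
    # Border pixels are skipped because the 3x3 kernel does not fit there.
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = sum(kx[j][i] * gray[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(ky[j][i] * gray[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            if (gx * gx + gy * gy) ** 0.5 > threshold:
                edges.add((y, x))
    return edges


# Toy input (an assumption for this example): left half black, right half white.
image = [[(0, 0, 0)] * 3 + [(255, 255, 255)] * 3 for _ in range(6)]
edges = sobel_edges(to_grayscale(image))
```

On this toy image the detected edge pixels sit on the two columns either side of the black-to-white boundary, which is the kind of feature the higher-level functions build on.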
With these systems in place, the robot can move on to higher-level vision functions, namely object detection and classification, and object tracking and navigation.
Detecting Objects and Orientation
Object detection and classification have traditionally been challenging because of changes in viewpoint, differently sized images, and dynamic lighting conditions. One solution that can help is to use a neural network trained to detect and classify objects.
A popular approach is to use a convolutional neural network (CNN), where small sections of the image are fed to the network in a process known as "sliding windows". Another task is to determine the orientation of objects, which is important for both object interaction and navigation. The main challenge here is determining the orientation of an object, or of the robot itself, in 3D world space. A popular approach is to apply homography algorithms (such as a linear least squares solver, random sample consensus (RANSAC), and least median of squares) to compute corresponding points between frames of 2D imagery. Once objects have been detected, they can be assigned metadata such as an ID and a bounding box, which can then be used during object tracking and navigation.
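As a rough sketch of the sliding-window idea and of attaching metadata to detections, the following assumes a hypothetical `classify` callable standing in for a trained CNN. The `Detection` fields, the window size, and the toy classifier are illustrative assumptions, not part of any particular library.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    object_id: int   # metadata: identifier assigned at detection time
    label: str       # metadata: class name returned by the classifier
    bbox: tuple      # metadata: bounding box as (x, y, width, height)
    score: float


def sliding_windows(width, height, win, stride):
    """Yield square (x, y, win, win) boxes that step across the image."""
    for y in range(0, height - win + 1, stride):
        for x in range(0, width - win + 1, stride):
            yield (x, y, win, win)


def detect(image, classify, win=2, stride=1, min_score=0.5):
    """Feed each window to `classify` and keep the confident results."""
    height, width = len(image), len(image[0])
    detections = []
    for (x, y, w, h) in sliding_windows(width, height, win, stride):
        patch = [row[x:x + w] for row in image[y:y + h]]
        label, score = classify(patch)
        if score >= min_score:
            detections.append(Detection(len(detections), label, (x, y, w, h), score))
    return detections


def toy_classifier(patch):
    """Hypothetical stand-in for a CNN: the score is the mean brightness."""
    values = [v for row in patch for v in row]
    return "bright_object", sum(values) / len(values)


# Toy 4x4 grayscale image with one bright 2x2 object in the middle.
image = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
detections = detect(image, toy_classifier, win=2, stride=1, min_score=0.9)
```

In a real system the classifier would be a trained network and overlapping detections would normally be merged, but the flow of window, score, and metadata is the same.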
Robots can detect and identify objects and people.
Once objects and aspects of the environment have been identified, the robot needs to track them. Because objects can move and the robot's viewport changes as it navigates, developers need a mechanism to track these elements over time and across the frames captured by cameras and other sensors. Because this mechanism must be fast enough to run on every frame, many algorithms have been devised over the years that approach the problem in different ways.
Centroid tracking, for example, computes the center point of the bounding box around an identified object in each frame, and then computes the distance between those points as they change, under the assumption that an object moves only a limited distance from one frame to the next. Another approach is to use a Kalman filter, which uses statistics gathered over time to predict the position of an object.
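A minimal sketch of centroid tracking follows, assuming bounding boxes in `(x, y, width, height)` form and a greedy nearest-neighbor match. The class name, the `max_distance` parameter, and the policy of dropping unmatched tracks immediately are simplifying assumptions for illustration.

```python
import math


def centroid(bbox):
    """Center point of an (x, y, width, height) bounding box."""
    x, y, w, h = bbox
    return (x + w / 2.0, y + h / 2.0)


class CentroidTracker:
    def __init__(self, max_distance=50.0):
        self.max_distance = max_distance  # largest movement allowed per frame
        self.next_id = 0
        self.objects = {}                 # object id -> last known centroid

    def update(self, bboxes):
        """Match this frame's boxes to known objects and return id -> centroid."""
        new_centroids = [centroid(b) for b in bboxes]
        assigned = {}
        unmatched = set(range(len(new_centroids)))
        # All (distance, old id, new index) pairs, closest first.
        pairs = sorted(
            (math.dist(old, new_centroids[j]), oid, j)
            for oid, old in self.objects.items()
            for j in range(len(new_centroids))
        )
        for distance, oid, j in pairs:
            if distance > self.max_distance:
                break  # every remaining pair is farther apart still
            if oid in assigned or j not in unmatched:
                continue
            assigned[oid] = new_centroids[j]
            unmatched.discard(j)
        # Detections that matched nothing become new objects.
        for j in sorted(unmatched):
            assigned[self.next_id] = new_centroids[j]
            self.next_id += 1
        # Simplification: objects with no match this frame are dropped.
        self.objects = assigned
        return dict(assigned)


tracker = CentroidTracker(max_distance=20.0)
frame1 = tracker.update([(0, 0, 10, 10), (100, 100, 10, 10)])
# The same two objects, slightly moved and reported in a different order.
frame2 = tracker.update([(103, 101, 10, 10), (4, 2, 10, 10)])
```

Even though the second frame lists the boxes in a different order, each object keeps its ID because the match is made on distance rather than on list position.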
Alternatively, the mean shift algorithm is an approach that essentially finds the mean of some aspect of the image (for example, a color histogram) within a sub-region of a frame. It then looks for the same description in the next frame by seeking to maximize the similarity of the features. This allows it to account for changes such as scale and orientation, and ultimately to track where the object is.
Because these techniques only need to track a subset of the original features, they can usually handle changes such as orientation or occlusion successfully and efficiently, which makes them very effective for robot vision processing.
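The core mean shift iteration can be sketched as follows. The sketch assumes the color-histogram comparison has already been reduced to a per-pixel similarity map (`weights`), so only the window-shifting step is shown; the function name and the toy map are assumptions for this example, and scale or orientation adaptation is not covered.

```python
import math


def mean_shift(weights, window, max_iter=20):
    """Shift an (x, y, w, h) window toward the weighted mean of `weights`.

    `weights` is a 2D list of per-pixel similarity scores (higher means the
    pixel looks more like the tracked object).
    """
    x, y, w, h = window
    rows, cols = len(weights), len(weights[0])
    for _ in range(max_iter):
        total = sum_x = sum_y = 0.0
        for j in range(y, min(y + h, rows)):
            for i in range(x, min(x + w, cols)):
                weight = weights[j][i]
                total += weight
                sum_x += weight * i
                sum_y += weight * j
        if total == 0:
            break  # nothing resembling the target inside the window
        # Re-center the window on the weighted mean, rounding half up.
        new_x = int(math.floor(sum_x / total - (w - 1) / 2.0 + 0.5))
        new_y = int(math.floor(sum_y / total - (h - 1) / 2.0 + 0.5))
        new_x = max(0, min(new_x, cols - w))
        new_y = max(0, min(new_y, rows - h))
        if (new_x, new_y) == (x, y):
            break  # converged: the window no longer moves
        x, y = new_x, new_y
    return (x, y, w, h)


# Toy similarity map: a 3x3 object occupying rows 4..6 and columns 5..7.
weights = [[0.0] * 10 for _ in range(10)]
for row in range(4, 7):
    for col in range(5, 8):
        weights[row][col] = 1.0

# Start from the object's previous position, which only partly overlaps it.
tracked = mean_shift(weights, (3, 3, 3, 3))
```

Starting from a window that only partly overlaps the object, the window climbs toward the region of highest similarity and settles on the object's new position.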
But objects are not the only things that need to be tracked. The robot itself should be able to navigate its environment successfully, and this is where simultaneous localization and mapping (SLAM) comes in. SLAM seeks to estimate the robot's position and to derive a map of the environment. It can be implemented using a number of algorithms (for example, Kalman filters). SLAM is usually implemented by fusing data from multiple sensors, and when visual data is involved, the process is often referred to as visual-inertial simultaneous localization and mapping (VISLAM).
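To give a feel for how a Kalman filter fuses sensor data, here is a minimal one-dimensional sketch that combines a motion estimate (for example, from odometry) with a noisy position measurement (for example, from a camera). It covers only the localization half of the problem, not map building, and the class name, noise values, and sample readings are assumptions chosen for illustration.

```python
class Kalman1D:
    """One-dimensional Kalman filter tracking a position along a line."""

    def __init__(self, x0, p0, process_var, measurement_var):
        self.x = x0                # estimated position
        self.p = p0                # variance (uncertainty) of the estimate
        self.q = process_var       # noise added by each motion step
        self.r = measurement_var   # noise of the position sensor

    def predict(self, motion):
        """Apply the commanded motion; uncertainty grows."""
        self.x += motion
        self.p += self.q

    def update(self, measurement):
        """Blend in a measurement; uncertainty shrinks."""
        gain = self.p / (self.p + self.r)   # Kalman gain, between 0 and 1
        self.x += gain * (measurement - self.x)
        self.p *= (1.0 - gain)
        return self.x


# Hypothetical data: the robot advances 1.0 m per step and a noisy sensor
# reports its position after each step.
kf = Kalman1D(x0=0.0, p0=1.0, process_var=0.01, measurement_var=0.25)
estimates = []
for measured in [1.1, 1.9, 3.2, 3.9, 5.05]:
    kf.predict(motion=1.0)
    estimates.append(kf.update(measured))
```

The estimate stays close to the true position while the variance shrinks with each update. A full SLAM system extends the same predict-and-update cycle to a state that also contains the map.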
Tracking information is collected by applying multiple filters to data from multiple sensors. SLAM, of course, is only as good as what the robot can sense, so developers should choose high-quality cameras and sensors, and find ways to ensure that they are not blocked from capturing data. From a safety standpoint, developers should also design fail-safes in case data cannot be acquired (for example, when the camera is covered).
Next-generation robots will be more advanced because they use computer vision and machine learning to "see" their environment, "analyze" dynamic scenes or changing conditions, and "decide" what to do. This will require developers to master advanced robot vision functions and tools for object detection and classification, and for object tracking and navigation.
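One simple way to sketch such a fail-safe is to check each frame before acting on it. The heuristic below (a frame that is almost uniformly dark is treated as unusable), the thresholds, and the action names are all assumptions for illustration; a real robot would combine several health checks.

```python
def frame_is_usable(gray_frame, min_mean=0.05, min_spread=0.02):
    """Heuristic check that a grayscale frame (values 0..1) carries real data."""
    values = [v for row in gray_frame for v in row]
    if not values:
        return False  # an empty frame carries no data at all
    mean = sum(values) / len(values)
    spread = max(values) - min(values)
    # A covered lens typically yields a dark, nearly uniform image.
    return mean >= min_mean and spread >= min_spread


def choose_action(gray_frame, planned_action):
    """Return the planned action only if the camera data can be trusted."""
    if gray_frame is None or not frame_is_usable(gray_frame):
        return "stop"  # fail-safe: do not move without usable vision data
    return planned_action


covered = [[0.0] * 4 for _ in range(4)]        # lens covered: all black
normal = [[0.1, 0.9], [0.5, 0.3]]              # ordinary scene
action_when_covered = choose_action(covered, "forward")
action_when_normal = choose_action(normal, "forward")
action_when_missing = choose_action(None, "forward")
```

The important design choice is that the default on missing or suspicious data is the safe action, so a sensor failure can never be mistaken for a clear path.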