In recent years, research has focused on the development of intelligent robots that are aware of their environment and begin to link their perceptions to the meaning that humans attribute to objects. One way to bridge the gap between robotic and human understanding of a shared environment is the concept of affordances, which links the perception of objects to the actions of robots.
Allowing a robot to learn object affordances from a human tutor or autonomously requires a representation of what it knows, so that it can reason about what it can learn, how to act so as to learn it, execute those actions and then learn to fill its knowledge gaps.
This work presents a perception system for a cognitive robot, which represents the structure of objects to link perception to robot actions.
To capture what objects afford to the robot, a model based on piecewise planar surface patches is proposed. Planar patches are detected from tracked interest points in image sequences. For this, we formalize model selection with Minimum Description Length (MDL) in an incremental manner. In each iteration, tracked planes and new planes computed from randomly sampled interest points are evaluated. The hypotheses that best explain the scene are retained and their supporting points are marked, so that in the next iteration random sampling is guided towards unexplained points. Hence, it is possible to represent the remaining finer details of the scene.
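The incremental selection loop described above can be illustrated with a minimal sketch. It is not the authors' formulation: it assumes 3D points rather than tracked image interest points, fits planes through three sampled points, and replaces the MDL criterion with a simple greedy score (newly explained points minus a fixed per-model cost). All names and parameters (`incremental_plane_selection`, `tol`, `model_cost`, and so on) are hypothetical.

```python
import numpy as np

def fit_plane(p):
    """Plane through three 3D points: unit normal n and offset d with n.x = d."""
    n = np.cross(p[1] - p[0], p[2] - p[0])
    norm = np.linalg.norm(n)
    if norm < 1e-9:
        return None  # degenerate (collinear) sample
    n = n / norm
    return n, float(n @ p[0])

def inliers(points, plane, tol):
    """Boolean mask of points closer than tol to the plane."""
    n, d = plane
    return np.abs(points @ n - d) < tol

def incremental_plane_selection(points, tracked=(), iterations=5, samples=50,
                                tol=0.02, model_cost=10.0, seed=0):
    """Assumed simplification of incremental, MDL-style plane selection."""
    rng = np.random.default_rng(seed)
    kept = list(tracked)  # planes carried over from earlier frames
    for _ in range(iterations):
        # Mark points already explained by the retained hypotheses.
        explained = np.zeros(len(points), dtype=bool)
        for plane in kept:
            explained |= inliers(points, plane, tol)
        unexplained = np.flatnonzero(~explained)

        # Candidates: retained planes plus new ones sampled from unexplained points.
        candidates = list(kept)
        if len(unexplained) >= 3:
            for _ in range(samples):
                idx = rng.choice(unexplained, 3, replace=False)
                plane = fit_plane(points[idx])
                if plane is not None:
                    candidates.append(plane)

        # Greedy selection: keep a plane only if the points it newly explains
        # outweigh the cost of adding one more model.
        masks = [inliers(points, plane, tol) for plane in candidates]
        covered = np.zeros(len(points), dtype=bool)
        kept = []
        while masks:
            gains = np.array([np.count_nonzero(m & ~covered) - model_cost
                              for m in masks])
            best = int(np.argmax(gains))
            if gains[best] <= 0:
                break
            kept.append(candidates[best])
            covered |= masks[best]
    return kept
```

Because each round samples only from points that no retained plane explains, later iterations concentrate on smaller structures that the dominant planes missed, which is the guiding effect the text describes.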
Planar patches are stored in a spatio-temporal graph and tracked to subsequent images. After reconstruction of the planes, their 3D motion is analyzed and initial object hypotheses are created.
These object hypotheses are verified by the robot, which pushes the corresponding objects.
If planar patches start moving independently, a split event is triggered: the spatio-temporal object graph is traced back, and visible as well as occluded planes are assigned to the most probable split object.
Furthermore, we developed probabilistic measures for observed detection success, predicted detection success, and the completeness of learned models, where learning is incremental and online. These measures allow the robot to decide when to add a new keyframe to its view-based object model and where to look next in order to complete the model, to predict the probability of successful object detection given the model trained so far, and to know when to stop learning.
We demonstrate that the proposed planar patches form basic, meaningful features from which the robot can start to explore the scene. We further show that the approximate structure of objects can be reconstructed through interaction with planar parts, and that the proposed completeness measure explains what has been seen and where exploration should continue.