Vision and Actions for Robots

One of the most important open problems in robotics is arguably improving robots' ability to understand and interact with the environment around them: a robot needs to perceive, detect and locate the objects in its surroundings, and then plan and execute actions to manipulate them.



To decide and act in unstructured environments, the robot needs to be able to perceive its surroundings, as it is generally not feasible to provide the robot with all the information it needs a priori. This might be because of hardware limitations, such as limited storage space, or because the information is simply not available at the time (robotic space exploration being the most striking example). My research focusses on the intersection of computer vision and robotics.

In the past years I worked on the iCub humanoid robot in Lugano, Switzerland. My goal was to make the iCub see, that is, to develop computer vision algorithms for object detection, identification and localisation. The overarching goal is to create robots with more autonomous and more adaptive behaviours, in short, more "intelligent" robots. This requires advances along the complete processing pipeline, from sensing through to learning and interaction. The aim of this research is a closer integration of the sensory and motor sides, creating a tightly coupled action-perception loop.


Perception / Vision

Perception is a key requirement for robots to be useful in a wide range of scenarios. In robotic applications, cameras take the role of eyes, capturing light from the environment to enable visual sensing. The focus on vision is motivated twofold: firstly, by the sensing capabilities of typical robotic platforms (cameras are cheap), and secondly, by vision being the most important sense for humans. Much of my research focusses on developing frameworks that allow rapid prototyping of detectors for both object detection and identification, as well as the autonomous learning of such object models.
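To give a flavour of what such a rapidly prototyped detector can look like, here is a minimal, self-contained Python sketch. It is not the actual icVision code; the `BlobDetector` class and its colour-range interface are illustrative assumptions, standing in for the kind of modular filter the framework lets you plug in.

```python
import numpy as np

class BlobDetector:
    """Toy detector: locates an object as the centroid of all pixels
    falling inside an RGB colour range (a stand-in for a learned model)."""

    def __init__(self, lower, upper):
        self.lower = np.array(lower)
        self.upper = np.array(upper)

    def detect(self, image):
        """Return the (row, col) centroid of matching pixels, or None."""
        mask = np.all((image >= self.lower) & (image <= self.upper), axis=-1)
        ys, xs = np.nonzero(mask)
        if len(xs) == 0:
            return None
        return (ys.mean(), xs.mean())

# Synthetic 100x100 test image with a reddish square at rows/cols 30..39.
img = np.zeros((100, 100, 3), dtype=np.uint8)
img[30:40, 30:40] = (200, 20, 20)

detector = BlobDetector(lower=(150, 0, 0), upper=(255, 60, 60))
print(detector.detect(img))  # centroid at (34.5, 34.5)
```

In a real system the thresholding step would be replaced by a trained object model, but the interface, an image in and an image-plane location out, stays the same, which is what makes swapping detectors in and out cheap.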

Robot Videos

Motion / Action

Even with great advances in sensing technologies, the robot still needs to control its motion to perform manipulation tasks. Directly programming robots by writing code can be tedious, error prone, and inaccessible to non-experts. Robots used in human environments cannot be expected to know the state of their surroundings with certainty and will need to adapt. Through learning, robots may reduce the burden of programming and, in addition, continue to adapt even after being deployed, providing the level of autonomy needed for ubiquitous robot manipulation.

When attempting to create behaviours on a complex robot like the iCub or Baxter, state-of-the-art machine learning and control theories can be tested and shortcomings can be discovered and addressed. For example, Hart et al. [2006] showed that a developmental approach can be used for a robot to learn to reach and grasp.


The sensory and motor sides each provide substantial capabilities on their own, yet most demonstrable behaviours for (humanoid) robots do not effectively integrate elements from both. Consequently, tasks that seem trivial to us humans, such as picking up a specific object in a cluttered environment, remain beyond the state-of-the-art in experimental robotics. Grasping objects successfully while avoiding obstacles requires continuous tracking of the obstacles and the target object, enabling the creation of a reactive reaching behaviour that adapts, in real time, to changes in the environment.
Interfaces between our motor and sensory frameworks allow a continuous, vision-based localisation of the detected objects to be propagated into the world model. Specific detectors (aka filters) for each object and for the hand are used to update their positions. This basic approach to eye-hand coordination allows the reaching behaviour to adapt to changing circumstances while it is being executed, improving the autonomy of the humanoid robot.
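The loop described above can be sketched in a few lines of Python. This is a deliberately simplified illustration, not our actual implementation: the exponential-smoothing `PositionFilter` stands in for the per-object filters, and `reach_step` stands in for the real reaching controller; all names and parameter values are assumptions.

```python
import numpy as np

class PositionFilter:
    """Exponential smoothing of noisy visual localisations, a toy
    stand-in for the per-object filters feeding the world model."""

    def __init__(self, alpha=0.5):
        self.alpha = alpha
        self.estimate = None

    def update(self, measurement):
        m = np.asarray(measurement, dtype=float)
        if self.estimate is None:
            self.estimate = m                    # first observation
        else:
            self.estimate = self.alpha * m + (1 - self.alpha) * self.estimate
        return self.estimate

def reach_step(hand, target, gain=0.2):
    """One proportional step of the hand towards the filtered target."""
    return hand + gain * (target - hand)

# Closed loop: noisy observations are filtered, the hand tracks the estimate.
rng = np.random.default_rng(0)
true_target = np.array([0.3, 0.1, 0.25])  # metres, in a hypothetical root frame
filt = PositionFilter(alpha=0.5)
hand = np.zeros(3)
for _ in range(50):
    noisy = true_target + rng.normal(scale=0.01, size=3)  # simulated vision
    hand = reach_step(hand, filt.update(noisy))
print("final error:", np.linalg.norm(hand - true_target))
```

The key property of the loop, and the reason the behaviour is reactive, is that the target position is re-read every cycle: if the object moves mid-reach, the filtered estimate follows and the hand trajectory bends towards the new position without any re-planning at the task level.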

A full schematic of the current system shows the various interconnected parts:
Action/Vision and SensoriMotor Coordination

Robot Videos

Software Frameworks

icVision, my modular computer vision system used in our cognitive robotics research, was recently released. Have a look at the project page if you are interested. It is an easy-to-use, modular framework performing vision-related tasks on the iCub humanoid robot for research experiments.

MoBeE (to be added) is at the core of the described framework for eye-hand coordination. It is a solid, reusable toolkit for prototyping behaviours on our humanoids. MoBeE represents the state-of-the-art in humanoid robot control and is similar in conception to the control system that runs DLR's Justin [de Santis et al., 2007]. The goal of MoBeE is to facilitate the close integration of planning and motion control. Inspired by Brooks [1991], it aims to embody the planner, provide safe and robust action primitives, and perform real-time re-planning. This facilitates exploratory behaviour on a real robot, with MoBeE acting as a supervisor that prevents collisions. It also supports multiple interacting robots, and its behavioural components are portable and reusable thanks to their weak coupling.
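The supervisor idea, a layer sitting between the behaviour and the robot that vetoes or attenuates dangerous commands, can be illustrated with a small Python sketch. This is not MoBeE's actual interface; the function name, distance thresholds, and Cartesian-velocity formulation are all illustrative assumptions.

```python
import numpy as np

def supervise(hand_pos, velocity, obstacles, safe_dist=0.10, stop_dist=0.03):
    """Scale down (or veto) a commanded Cartesian velocity as the hand
    approaches the nearest obstacle: a toy MoBeE-style supervisor."""
    if not obstacles:
        return velocity
    d = min(np.linalg.norm(hand_pos - np.asarray(o)) for o in obstacles)
    if d <= stop_dist:
        return np.zeros_like(velocity)   # too close: veto the motion entirely
    if d >= safe_dist:
        return velocity                  # free space: pass the command through
    scale = (d - stop_dist) / (safe_dist - stop_dist)
    return scale * velocity              # in between: brake smoothly

hand = np.array([0.0, 0.0, 0.0])
v = np.array([0.1, 0.0, 0.0])
print(supervise(hand, v, obstacles=[np.array([1.0, 0.0, 0.0])]))   # unchanged
print(supervise(hand, v, obstacles=[np.array([0.02, 0.0, 0.0])]))  # zeroed
```

Because the check runs on every control cycle, an exploratory behaviour can issue arbitrary commands and rely on the supervisor to keep the robot out of collision, which is exactly what makes learning on real hardware practical.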

LEOGrasper is now available as well. It is our light-weight, easy-to-use, one-shot grasping system for the iCub. It's been used extensively at IDSIA, especially for the IM-CLeVeR video.


come back in a bit to find code here :)

in the meantime check here