Within the scope of this diploma thesis, we investigated the possibilities of multimodal human-robot interaction with current technologies. For this purpose, a pre-built low-cost robot head, called Eva, was used, and a software system was developed that integrates state-of-the-art algorithms from the fields of speech recognition, face detection, face recognition, and object classification. Special attention was given to enabling natural communication between users and the robot through current speech recognition technology: multiple systems were integrated into our implementation and evaluated, before one of them was chosen for the complete set-up. To find and identify individual people, well-known algorithms were implemented and compared with each other. These include two variants of the Viola-Jones algorithm for face detection as well as Eigenfaces, Fisherfaces, and Local Binary Pattern histograms for face recognition. Combined with face tracking, realized by coupling the Viola-Jones detector with either a Kalman filter or Lucas-Kanade optical flow estimation, they provide a further component of the multimodal interaction between Eva and the user. Object classification gives the robot the ability to perform further analysis when interacting with objects; one method for this, based on random decision forests, is explained as well.
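To illustrate the kind of texture descriptor behind the Local Binary Pattern histograms mentioned above, the following is a minimal NumPy sketch of the basic 8-neighbour LBP operator and its normalized histogram. It is an illustrative simplification, not the thesis implementation: it uses a single 3x3 neighbourhood, no uniform-pattern mapping, and no spatial grid of sub-histograms.

```python
import numpy as np

def lbp_image(gray):
    """Compute basic 8-neighbour LBP codes for the interior pixels of a grayscale image."""
    c = gray[1:-1, 1:-1]
    code = np.zeros_like(c, dtype=np.uint8)
    # Neighbour offsets in clockwise order; each comparison contributes one bit.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    h, w = gray.shape
    for bit, (dy, dx) in enumerate(offsets):
        nb = gray[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        code |= (nb >= c).astype(np.uint8) << bit
    return code

def lbp_histogram(gray):
    """Normalized 256-bin histogram of LBP codes, usable as a simple face descriptor."""
    codes = lbp_image(gray)
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()
```

In an LBPH-style recognizer, such histograms are typically computed per grid cell of the face image, concatenated, and compared between a probe face and stored reference faces (e.g. via a chi-squared distance).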
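The coupling of a detector with a Kalman filter can be sketched as follows: a constant-velocity filter smooths the noisy face positions reported by the detector and can predict the position when detection fails. The state layout and noise values here are illustrative assumptions, not the parameters used in the thesis.

```python
import numpy as np

class ConstantVelocityKalman:
    """Minimal 2D constant-velocity Kalman filter; state vector is [x, y, vx, vy]."""

    def __init__(self, dt=1.0, process_var=1e-2, meas_var=1.0):
        # State transition: position advances by velocity * dt each step.
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        # Measurement model: the detector observes position only.
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)
        self.Q = process_var * np.eye(4)   # process noise (assumed)
        self.R = meas_var * np.eye(2)      # measurement noise (assumed)
        self.x = np.zeros(4)
        self.P = 1e3 * np.eye(4)           # large initial uncertainty

    def predict(self):
        """Propagate the state; usable as the track position when detection fails."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, z):
        """Correct the state with a detected face position z = (x, y)."""
        y = np.asarray(z, dtype=float) - self.H @ self.x   # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)           # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]
```

In a tracking loop, `predict()` would be called once per frame and `update()` only on frames where the Viola-Jones detector returns a face, which bridges short detection gaps.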