The ability to autonomously recognize objects and estimate their pose is a key component of robotic agents. It enables autonomous systems to interact with and understand their environment and, because of the important role of objects within our society and industries, it increases their usefulness. The necessity of such a component, combined with the difficult and interesting challenges posed by this problem, has made object recognition an important field of research in recent years. Despite recent advances -- in terms of sensing technologies and algorithms -- the object recognition problem remains, in general, unsolved. In particular, recognizing a large number of partially occluded objects in complex, cluttered scenes has proven especially challenging. Moreover, the diverse recognition-relevant properties of objects (e.g., textured or texture-less, geometrically unique or common) as well as several artefacts associated with current sensing capabilities (e.g., noisy or missing data) pose additional problems. Aiming at increasing the recognition capabilities of autonomous systems, this thesis investigates and proposes improvements related to different object recognition paradigms. Because the different paradigms present unique characteristics that make them suitable for different scene configurations and/or different types of objects, we argue that the parallel deployment of multiple paradigms broadens the range of situations in which a recognition system can be successfully applied. While this results in more objects being correctly recognized (together with their pose in the scene), it inevitably incurs the generation of wrong object hypotheses. To maximize the positive effects of multiple recognition pipelines (correct recognitions) while minimizing their negative effects (wrong hypotheses), this thesis proposes a novel hypotheses verification stage.
The goal of this stage is simple: reject wrong hypotheses while preserving correct ones, effectively raising the operating point of the overall system. This is achieved by selecting the subset of object hypotheses that best represents the scene under consideration. A unique trait of the proposed verification stage is that all hypotheses are considered simultaneously, instead of one at a time, which results in a global model of the scene. We formalize this stage as the minimization (over the object hypotheses) of a global cost function enforcing geometrical and appearance cues as well as physical constraints. We show that the proposed recognition system achieves excellent performance on six different benchmark datasets presenting heterogeneous recognition scenarios (in terms of sensory data, scene configurations and types of objects). To the best of our knowledge, the proposed framework is the first algorithm able to outperform the state of the art on such a vast and diverse set of benchmark datasets.
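To make the formulation above concrete, the following minimal Python sketch casts hypothesis verification as the selection of an active subset of hypotheses minimizing a global cost. The unary and pairwise cost values here are hypothetical placeholders standing in for the geometrical, appearance, and physical-constraint terms described in the thesis, and exhaustive search is used only for illustration; a realistic instance would require a local-search or metaheuristic optimizer.

```python
import itertools

# Hypothetical costs for three object hypotheses. In a real system these
# would be computed from geometric fit, appearance agreement, and physical
# constraints (e.g., interpenetration between hypothesized objects).
unary = [-5.0, -3.0, 2.0]    # negative = hypothesis explains the scene well
pairwise = {(0, 1): 4.0}     # penalty when two active hypotheses conflict

def cost(active):
    """Global cost of a set of simultaneously active hypotheses."""
    c = sum(unary[i] for i in active)
    c += sum(p for (i, j), p in pairwise.items()
             if i in active and j in active)
    return c

def best_subset(n):
    """Exhaustively minimize the global cost over all 2^n activation
    vectors (feasible only for small n; shown for clarity)."""
    return min((frozenset(s)
                for r in range(n + 1)
                for s in itertools.combinations(range(n), r)),
               key=cost)

print(sorted(best_subset(3)))  # -> [0]: hypothesis 1 is suppressed by the
                               # conflict penalty, hypothesis 2 by its cost
```

The key point the sketch illustrates is the global nature of the decision: whether a hypothesis is accepted depends not only on its own score but on its interactions with every other candidate.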