Driven by the need for faster and more efficient workflows in the digitization of urban environments, the availability of affordable 3D data-acquisition systems for buildings has increased drastically in recent years: laser scanners and photogrammetric methods both produce millions of 3D points within minutes of acquisition time. They are applied both at street level and from above using drones, and are used to enhance traditional tachymetric measurements in surveying. However, these 3D data points are not the only available information: metadata extracted from images, simulation results (e.g., from light simulations), 2D floor plans, and semantic tags, especially from the emerging Building Information Modeling (BIM) systems, are becoming increasingly important. The challenges this multimodality poses during the reconstruction of CAD-ready 3D buildings are manifold: apart from handling the enormous amount of data collected during the acquisition steps, the different data sources must also be registered to each other in order to be applicable in a common context, which can be difficult when information is missing or erroneous. Nevertheless, the potential for improving both the workflow efficiency and the quality of the reconstruction results is huge: missing information can be substituted by data from other sources, information about spatial or semantic relations can be utilized to overcome limitations, and the complexity of interactive modeling can be reduced (e.g., by limiting interactions to a two-dimensional space). In this thesis, four publications are presented which aim at providing freely combinable “building blocks” for the creation of helpful methods and tools for advancing the field of Multimodal Urban Reconstruction.
First, two efficient methods for the calculation of shadows cast by area light sources are presented: one with a focus on the most efficient generation of physically accurate penumbras, and the other with the goal of reusing soft shadow information in consecutive frames to avoid costly recalculations. Then, a novel, optimization-supported reconstruction and modeling tool is presented, which employs sketch-based interactions and snapping techniques to create watertight 3D building models. An extension to this system is demonstrated subsequently: there, 2D photos act as the only interaction canvas for the simple, sketch-based creation of building geometry and the corresponding textures. Together, these methods form a solid foundation for the creation of common, multimodal environments targeted at the reconstruction of 3D building models.