In this paper, we present a localization algorithm for estimating the 3D position and orientation (pose) of a moving vehicle from visual and inertial measurements. The main advantage of the proposed method is that it provides precise pose estimates at low computational cost. This is achieved through a two-layer estimation architecture that processes measurements according to their information content. In the first layer, a Multi-State-Constraint Kalman Filter locally processes inertial measurements and feature tracks between consecutive images, providing high-rate estimates of the vehicle's motion. The second layer is an iterative bundle-adjustment estimator that operates intermittently in order to (i) reduce the effect of linearization errors and (ii) update the state estimates every time an area is revisited and features are re-detected (loop closure). Through this process, reliable state estimates are available continuously, while the estimation errors remain bounded during long-term operation. The performance of the developed system is demonstrated in large-scale experiments involving a vehicle localizing within an urban area.
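The reason an intermittent batch layer bounds the error that a high-rate incremental layer accumulates can be illustrated with a deliberately simplified 1D toy. This is only a sketch under strong assumptions, not the method of this paper. The first layer is replaced by plain dead reckoning of biased relative-motion measurements rather than a Multi-State-Constraint Kalman Filter. The second layer is replaced by a linear least-squares adjustment over all poses, with one loop-closure constraint, rather than full 3D bundle adjustment. The function names `dead_reckon`, `solve_linear`, and `batch_adjust`, the prior weight, and the example data are all hypothetical.

```python
"""Hypothetical 1D toy: incremental dead reckoning vs. batch adjustment with a loop closure."""
from typing import List, Tuple


def dead_reckon(x0: float, odom: List[float]) -> List[float]:
    """Layer-1 stand-in (assumption): integrate relative motion measurements one by one."""
    xs = [x0]
    for u in odom:
        xs.append(xs[-1] + u)
    return xs


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def batch_adjust(odom: List[float],
                 loops: List[Tuple[int, int, float]],
                 prior_w: float = 1e6) -> List[float]:
    """Layer-2 stand-in (assumption): least squares over all poses.

    Minimizes sum (x[j] - x[i] - d)^2 over odometry edges (i, i+1, odom[i])
    and loop-closure edges (i, j, d), plus prior_w * x[0]^2 to fix the gauge.
    """
    n = len(odom) + 1
    h = [[0.0] * n for _ in range(n)]  # normal-equation matrix J^T W J
    g = [0.0] * n                      # right-hand side J^T W d

    def add_edge(i: int, j: int, d: float, w: float = 1.0) -> None:
        # Residual r = x[j] - x[i] - d has Jacobian -1 at i and +1 at j.
        h[i][i] += w
        h[j][j] += w
        h[i][j] -= w
        h[j][i] -= w
        g[i] -= w * d
        g[j] += w * d

    for i, u in enumerate(odom):
        add_edge(i, i + 1, u)
    for i, j, d in loops:
        add_edge(i, j, d)
    h[0][0] += prior_w
    return solve_linear(h, g)


# Usage: the vehicle drives 4 steps out and 4 steps back (true final position 0),
# but every odometry step is biased by +0.1.
odom = [1.1] * 4 + [-0.9] * 4
dr = dead_reckon(0.0, odom)
ba = batch_adjust(odom, loops=[(0, 8, 0.0)])  # pose 8 re-visits pose 0
print(round(dr[-1], 3), round(ba[-1], 3))
```

In this toy, dead reckoning ends about 0.8 away from the start. The batch solution spreads that discrepancy over the whole cycle of nine constraints and ends about 0.09 away. Repeating the adjustment whenever a place is revisited is what keeps such drift from growing without bound, which is the role the abstract assigns to the second layer.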