In this paper, we study the problem of Cooperative Localization (CL) for two robots, each equipped with an Inertial Measurement Unit (IMU) and a camera. We present an algorithm that enables the robots to exploit common features, observed over a sliding-window time horizon, to improve the localization accuracy of both vehicles. In contrast to existing CL methods, which require robot-to-robot distance and/or bearing measurements to resolve the robots' relative pose (position and orientation), our approach recovers the relative pose indirectly from the commonly observed features. Moreover, we analyze the system's observability properties to determine how many degrees of freedom (d.o.f.) of the relative transformation can be computed under different measurement scenarios. Lastly, we present simulation results that evaluate the performance of the proposed method.