Activity detection (e.g., recognizing people's behavior and intent) suffers from high false detection rates when applied across a wide range of applications. Moreover, activity detection confined to the 2D image domain (symbolic space) is limited to qualitative activities. Symbolic features are measured in apparent dimensions (i.e., pixels), which vary with distance and viewing angle. One way to improve performance is to work in physical space, where object features are represented by their physical dimensions (e.g., inches or centimeters) and are invariant to distance and viewing angle. In this paper, we propose an approach that constructs a 3D site model and co-registers the video with it to obtain a real-time physical reference at every pixel in the video. We present a novel approach that creates the 3D site model by fusing data from a laser range sensor and a single camera. Experimental results demonstrate the approach.
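The contrast between apparent (pixel) and physical dimensions can be illustrated with a minimal sketch. It assumes an ideal pinhole camera with hypothetical intrinsics (`fx`, `fy`, `cx`, `cy`). It also assumes that a per-pixel depth is available, which in the setting above would be looked up from the co-registered site model. The class and function names are illustrative only. This sketch does not reproduce the paper's fusion or co-registration method.

```python
from dataclasses import dataclass


@dataclass
class PinholeCamera:
    """Hypothetical ideal pinhole camera (no lens distortion); intrinsics in pixels."""
    fx: float  # focal length along x, in pixels
    fy: float  # focal length along y, in pixels
    cx: float  # principal point x, in pixels
    cy: float  # principal point y, in pixels

    def project(self, x: float, y: float, z: float) -> tuple[float, float]:
        """Project a camera-frame 3D point (metres, z > 0) to pixel coordinates."""
        return (self.fx * x / z + self.cx, self.fy * y / z + self.cy)

    def back_project(self, u: float, v: float, depth: float) -> tuple[float, float, float]:
        """Lift pixel (u, v) to a camera-frame 3D point, given its depth in metres.

        Assumption: `depth` would come from the site model co-registered with
        the video; here it is simply passed in.
        """
        x = (u - self.cx) * depth / self.fx
        y = (v - self.cy) * depth / self.fy
        return (x, y, depth)


def physical_height(cam: PinholeCamera, u: float, v_top: float, v_bottom: float, depth: float) -> float:
    """Vertical extent in metres between two pixels in one image column.

    Assumes both pixels lie at the same depth (a fronto-parallel object).
    """
    _, y_top, _ = cam.back_project(u, v_top, depth)
    _, y_bottom, _ = cam.back_project(u, v_bottom, depth)
    return abs(y_bottom - y_top)


if __name__ == "__main__":
    cam = PinholeCamera(fx=800.0, fy=800.0, cx=320.0, cy=240.0)
    # A hypothetical 1.8 m tall person standing 5 m, then 10 m, from the camera.
    for depth in (5.0, 10.0):
        _, v_top = cam.project(0.0, -0.9, depth)
        _, v_bottom = cam.project(0.0, 0.9, depth)
        pixels = v_bottom - v_top
        metres = physical_height(cam, 320.0, v_top, v_bottom, depth)
        print(f"depth {depth} m: {pixels:.0f} px tall, {metres:.2f} m tall")
```

When the distance doubles, the person's apparent height halves from 288 px to 144 px. The recovered physical height stays at 1.80 m in both cases. This sketch handles only distance. Invariance to viewing angle would additionally require the camera pose relative to the site model.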