<p><b>Abstract</b>—<tmath>$W^4$</tmath> is a real-time visual surveillance system for detecting and tracking multiple people and monitoring their activities in an outdoor environment. It operates on monocular gray-scale video imagery, or on video imagery from an infrared camera. <tmath>$W^4$</tmath> employs a combination of shape analysis and tracking to locate people and their parts (head, hands, feet, torso) and to create models of people's appearance so that they can be tracked through interactions such as occlusions. It can determine whether a foreground region contains multiple people, segment the region into its constituent people, and track them. <tmath>$W^4$</tmath> can also determine whether people are carrying objects, segment those objects from the people's silhouettes, and construct appearance models for them so that they can be identified in subsequent frames. <tmath>$W^4$</tmath> can recognize events between people and objects, such as depositing an object, exchanging bags, or removing an object. It runs at 25 Hz on 320<tmath>$\times$</tmath>240 resolution images on a 400 MHz dual-Pentium II PC.</p>