Airborne platforms record large amounts of video data, and extracting the relevant events from these data is a time-demanding task for analysts. The sensors record hours of video in which only a fraction of the footage contains events of interest, which makes it hard for an analyst to retrieve such events by hand. A way to extract this information more automatically is to detect all humans within the scene. This can be done in a real-time scenario (both on board and at the ground station) for strategic and tactical purposes, and in an offline scenario in which the information is analyzed after recording to acquire intelligence (e.g., daily life patterns).
In this paper, we evaluate three different methods for object detection from a moving airborne platform. The first is a static person detection algorithm. Its main advantage is that it operates on single frames and therefore does not depend on stabilization of the platform; its main disadvantage is that it requires a relatively large number of pixels on the target. The second method is based on the detection of motion-in-motion: the background is stabilized, and clusters of pixels that move with respect to this stabilized background are detected as moving objects. Its main advantage is that all moving objects are detected; its main disadvantage is that it depends heavily on the quality of the stabilization. The third method combines the two previous detection methods.
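The motion-in-motion idea can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the global platform motion between two frames is a known integer translation (a real system would estimate a full stabilization transform from the imagery), compensates it, and clusters the remaining difference pixels into candidate moving objects.

```python
import numpy as np

def _components(mask, min_pixels):
    """4-connected components of a boolean mask; returns bounding boxes
    (row_min, row_max, col_min, col_max) of components with enough pixels."""
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    boxes = []
    for r in range(h):
        for c in range(w):
            if mask[r, c] and not seen[r, c]:
                stack, pixels = [(r, c)], []
                seen[r, c] = True
                while stack:
                    y, x = stack.pop()
                    pixels.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            stack.append((ny, nx))
                if len(pixels) >= min_pixels:
                    ys, xs = zip(*pixels)
                    boxes.append((min(ys), max(ys), min(xs), max(xs)))
    return boxes

def detect_motion_in_motion(prev, curr, shift=(0, 0), thresh=30, min_pixels=4):
    """Compensate the (assumed known) global shift of the platform, then
    detect clusters of pixels that still move w.r.t. the stabilized background."""
    dy, dx = shift  # hypothetical: in practice this comes from image registration
    stabilized = np.roll(np.roll(prev, dy, axis=0), dx, axis=1)
    diff = np.abs(curr.astype(np.int16) - stabilized.astype(np.int16))
    return _components(diff > thresh, min_pixels)
```

Note that, as in any frame-differencing scheme, an object that moved leaves a "ghost" cluster at its old position as well as a cluster at its new one, so a subsequent tracking stage has to cope with such double detections.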
The detections are tracked with a histogram-based tracker, so that missed detections can be filled in and a trajectory can be determined for every object. We demonstrate the tracking performance with the three different detection methods on the publicly available UCF-ARG aerial dataset. The performance is evaluated for two human actions (running and digging) and for varying object sizes. It is shown that the combined detection approach (static person detection and motion-in-motion detection) gives better tracking results for both human actions than either detector alone. Furthermore, it can be concluded that humans must be at least 20 pixels high to guarantee good tracking performance.
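The histogram-based tracking step can be sketched under simplifying assumptions (grayscale intensity histograms, a fixed-size box, and an exhaustive local search; the function names and parameters are illustrative, not the paper's): given the object's appearance histogram from an earlier frame, the tracker re-locates the object in a new frame even when the detector misses it, which is how gaps in the trajectory are filled.

```python
import numpy as np

def intensity_hist(patch, bins=16):
    """Normalized grayscale intensity histogram of an image patch."""
    h, _ = np.histogram(patch, bins=bins, range=(0, 256))
    return h / max(h.sum(), 1)

def relocate(frame, ref_hist, prev_box, search=5, bins=16):
    """Slide a fixed-size window around the last known box (r, c, height, width)
    and return the position whose histogram best matches ref_hist,
    scored by histogram intersection."""
    r0, c0, bh, bw = prev_box
    best_score, best_box = -1.0, prev_box
    for r in range(max(0, r0 - search), min(frame.shape[0] - bh, r0 + search) + 1):
        for c in range(max(0, c0 - search), min(frame.shape[1] - bw, c0 + search) + 1):
            score = np.minimum(intensity_hist(frame[r:r + bh, c:c + bw], bins), ref_hist).sum()
            if score > best_score:
                best_score, best_box = score, (r, c, bh, bw)
    return best_box
```

In a full pipeline the reference histogram would be updated from confirmed detections, and the matched position would be used to bridge frames in which the detectors fire on neither the static appearance nor the motion cue.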