The Caltech Pedestrian Dataset is one of the most popular pedestrian detection benchmarks today. It offers insight into both the data itself and contemporary detectors. The dataset, toolbox, and survey paper can all be found on the project homepage.
Below are my notes on the survey paper, listing some points I find worth attention.
Evaluation
- 16 detectors
- pedestrians at varying levels of scale and occlusion
- localization accuracy and runtime analysis
Dataset
- resolution: 640×480
- 250k frames, 350k bounding boxes, 2300 unique pedestrians
- annotation tools provided on project website
- occluded object’s bbox is estimated
- 3 labels
- Person: individual pedestrians (~1900)
- People: large groups of pedestrians (~300)
- 'Person?': hard to identify (~110)
Overall Description
nearly 50% of frames contain no pedestrians (could serve as a negative set?)
Scale statistics
Cutoffs for near/far scales
| Scale | near | medium | far |
|---|---|---|---|
| height (px) | larger than 80 | between 30 and 80 | smaller than 30 |
Observation: ~69% of pedestrians lie in the medium scale
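The cutoffs in the table above can be sketched as a tiny classifier. This is my own illustration, not toolbox code; I treat the boundaries as "80 px and above is near, 30 px and above is medium", since the survey does not pin down the boundary cases here.

```python
def scale_bucket(height_px):
    """Classify a pedestrian bbox by pixel height using the survey's
    near/medium/far cutoffs (80 px and 30 px).  Boundary handling
    (>= vs >) is my assumption, not stated in these notes."""
    if height_px >= 80:
        return "near"
    elif height_px >= 30:
        return "medium"
    else:
        return "far"
```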
- Geometry context
An object's pixel height h is inversely proportional to its distance d from the camera,
i.e. $h \approx H \cdot f / d$, where H is the pedestrian's physical height and f is the camera focal length; H and f are roughly constant.
- The medium scale is the main concern of safety systems, yet it is not well handled by current detectors. This is a mismatch between research efforts and the requirements of real systems.
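The pinhole relation above is easy to check numerically. The numbers below (pedestrian height 1.8 m, focal length 1000 px, distance 45 m) are illustrative values I chose, not figures from the survey:

```python
def pixel_height(H_m, f_px, d_m):
    """Approximate on-image pedestrian height in pixels under a pinhole
    camera model, h ~ H * f / d.  H and d in meters, f in pixels."""
    return H_m * f_px / d_m

# e.g. a 1.8 m pedestrian, f = 1000 px, 45 m away:
# 1.8 * 1000 / 45 = 40 px, i.e. medium scale
```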
Occlusion
- occlusion is not uniform.
- 7 types of occlusion cover ~97% of all occlusions.
Position
- Many objects, not only pedestrians, concentrate in the same region.
- About half of all detections, both true positives and false positives, occur in the same band.
Training and test data
- For the Caltech dataset, six sessions (S0–S5) are used for training and five (S6–S10) for testing
- Detectors can be trained with external data and tested on CALTECH
- 4 evaluation scenarios
Comparison of pedestrian datasets
Refer to Table 1
Evaluation
- public evaluation code on project webpage
- evaluate on the full image, following the PASCAL criterion
- plot miss rate vs. false positives per image (log-log plots)
- detection within a ROI, see Eqn. (2)
- per window vs. full image evaluation
- performance under the two protocols is somewhat correlated, but the rankings differ substantially.
- Authors prefer full image evaluation over PW
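The PASCAL criterion counts a detection as correct when its overlap with a ground-truth box exceeds 0.5. A minimal sketch of that overlap measure, assuming boxes in (x1, y1, x2, y2) corner form (my convention, not the toolbox's):

```python
def iou(a, b):
    """PASCAL-style intersection-over-union for two axis-aligned boxes
    given as (x1, y1, x2, y2).  A detection matching a ground-truth box
    with iou > 0.5 counts as a true positive."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / float(union)
```

Sweeping the detector's score threshold and recomputing (miss rate, false positives per image) at each setting yields the log-log curves the survey plots.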
Detection algorithms
- History:
sliding window + SVM/boosting
-> gradient-based features
-> motion features (challenging)
-> additional features to complement HOG
-> learning frameworks
- Successful detection systems follow a common paradigm:
- sliding window
- detection -> binary classification
- dense multiscale scanning
- NMS
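The last step of that paradigm, NMS, merges the many overlapping windows that fire on one pedestrian. A greedy sketch of the idea (my own illustration; real detectors differ in details such as the suppression threshold):

```python
def _iou(a, b):
    # Intersection-over-union for boxes given as (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / float(union)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression: visit boxes in descending score
    order and keep a box only if it does not overlap any already-kept
    box by more than iou_thresh.  Returns indices of kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(_iou(boxes[i], boxes[j]) < iou_thresh for j in keep):
            keep.append(i)
    return keep
```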
For a detailed comparison of the 16 detectors, refer to Table 2.
Performance Evaluation
For detailed figures, refer to Section 5 of the paper.
Discussion
Overall performance
- overall performance is not yet satisfying for real-life applications
- performance is especially poor on small-scale or partially occluded targets
Research directions
- small scales
- occlusion
- motion features
only MULTIFTR+MOTION among the 16 detectors uses optical-flow features, which primarily work at large scales
- Temporal integration
- Context
- Novel features
- Data
Use large datasets like Caltech to improve performance; the survey also studies how performance scales with the amount of training data.