The Caltech Pedestrian Dataset is one of the most popular pedestrian detection benchmarks today. It offers insight into both the data itself and contemporary detectors. The dataset, toolbox, and survey paper can all be found on the project homepage.
Below are my notes on the survey paper, listing some points I find worth attention.
Evaluation
- 16 detectors
- pedestrians at varying levels of scale and occlusion
- localization accuracy and runtime analysis
Dataset
- resolution: 640×480
- 250k frames, 350k bounding boxes, 2300 unique pedestrians
- annotation tools provided on project website
- occluded object’s bbox is estimated
- 3 labels
- Person: individual pedestrians (~1900)
- People: large groups of pedestrians (~300)
- 'Person?': hard to identify (~110)
Overall Description
nearly 50% of frames contain no pedestrians (could serve as a negative set?)
Scale statistics
Cutoffs for near/far scales
| Scale | near | medium | far |
|---|---|---|---|
| height (px) | larger than 80 | between 30 and 80 | smaller than 30 |
Observation: ~69% of pedestrians lie in the medium scale
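The cutoffs in the table above can be sketched as a tiny classifier. This is my own illustration, not toolbox code; I treat the boundaries as "80 px and above is near, 30 px and above is medium", since the survey does not pin down the boundary cases here.

```python
def scale_bucket(height_px):
    """Classify a pedestrian bbox by pixel height using the survey's
    near/medium/far cutoffs (80 px and 30 px).  Boundary handling
    (>= vs >) is my assumption, not stated in these notes."""
    if height_px >= 80:
        return "near"
    elif height_px >= 30:
        return "medium"
    else:
        return "far"
```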
- Geometry context
An object's pixel height h is inversely proportional to its distance d from the camera,
i.e. $h \approx H \cdot f / d$, where H is the pedestrian's physical height and f is the camera focal length; H and f are roughly constant.
- The medium scale is the main concern of safety systems, yet it is not well handled by current detectors. This is a mismatch between research efforts and the requirements of real systems.
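The pinhole relation above is easy to check numerically. The numbers below (pedestrian height 1.8 m, focal length 1000 px, distance 45 m) are illustrative values I chose, not figures from the survey:

```python
def pixel_height(H_m, f_px, d_m):
    """Approximate on-image pedestrian height in pixels under a pinhole
    camera model, h ~ H * f / d.  H and d in meters, f in pixels."""
    return H_m * f_px / d_m

# e.g. a 1.8 m pedestrian, f = 1000 px, 45 m away:
# 1.8 * 1000 / 45 = 40 px, i.e. medium scale
```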
Occlusion
- occlusion is not uniform.
- 7 types of occlusion cover ~97% of all occlusions.
Position
- Many objects, not only pedestrians, concentrate in the same region.
- About half of all detections, both true positives and false positives, occur in the same band.
Training and test data
- For the Caltech dataset, six sessions (S0–S5) are used for training and five (S6–S10) for testing
- Detectors can be trained with external data and tested on CALTECH
- 4 evaluation scenarios
Comparison of pedestrian datasets
Refer to Table 1
Evaluation
- public evaluation code on project webpage
- evaluate on the full image, following the PASCAL criterion
- plot miss rate vs. false positives per image (log-log plots)
- detection within a ROI, see Eqn. (2)
- per window vs. full image evaluation
- performance under the two protocols is somewhat correlated, but the rankings differ substantially.
- Authors prefer full image evaluation over PW
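The PASCAL criterion counts a detection as correct when its overlap with a ground-truth box exceeds 0.5. A minimal sketch of that overlap measure, assuming boxes in (x1, y1, x2, y2) corner form (my convention, not the toolbox's):

```python
def iou(a, b):
    """PASCAL-style intersection-over-union for two axis-aligned boxes
    given as (x1, y1, x2, y2).  A detection matching a ground-truth box
    with iou > 0.5 counts as a true positive."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / float(union)
```

Sweeping the detector's score threshold and recomputing (miss rate, false positives per image) at each setting yields the log-log curves the survey plots.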
Detection algorithms
- History:
sliding window + SVM/boosting
-> gradient-based features
-> motion features (challenging)
-> additional features to complement HOG
-> learning frameworks
- Successful detection systems follow a common paradigm:
- sliding window
- detection -> binary classification
- dense multiscale scanning
- NMS
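The last step of that paradigm, NMS, merges the many overlapping windows that fire on one pedestrian. A greedy sketch of the idea (my own illustration; real detectors differ in details such as the suppression threshold):

```python
def _iou(a, b):
    # Intersection-over-union for boxes given as (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / float(union)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression: visit boxes in descending score
    order and keep a box only if it does not overlap any already-kept
    box by more than iou_thresh.  Returns indices of kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(_iou(boxes[i], boxes[j]) < iou_thresh for j in keep):
            keep.append(i)
    return keep
```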
For a detailed comparison of the 16 detectors, refer to Table 2.
Performance Evaluation
For detailed figures, refer to Section 5 of the paper.
Discussion
Overall performance
- overall performance is not yet satisfying for real-life applications
- performance is especially poor on small-scale or partially occluded targets
Research directions
- small scales
- occlusion
- motion features
only MULTIFTR+MOTION among the 16 detectors uses optical-flow features, which primarily work at large scales
- Temporal integration
- Context
- Novel features
- Data
Use large datasets like Caltech to improve performance; the survey also studies how performance scales with the amount of training data.