Figure 1: Visualization of features of the PANDA dataset. (a) The scale variation of pedestrians in a large-scale scene. (b) Three fine-grained bounding-box annotations of the human body. (c) Annotations of five types of human poses. (d) Group information along with the intra-group interactions (TK=Talking, PC=Physical contact), where the circle and short line denote a pedestrian and their face orientation.
Note that PANDA (a gigapixel-level human-centric video dataset) contains 600 gigapixel images and 20 gigapixel videos covering various real-world scenes. The ground-truth annotations include 10,218.4k bounding boxes, 111.8k fine-grained attribute labels, 8.4k trajectories, 1.5k groups and 2.9k interactions. As shown in Fig. 2, a representative scene in PANDA may cover a ∼1 km² area with high-resolution details (∼gigapixel-level/frame). Owing to the vast variation in pedestrian pose, scale, occlusion and trajectory, existing approaches will be highly challenged in both accuracy and efficiency. Therefore, we propose to hold challenges on large-scale pedestrian detection and long-term multi-object tracking on PANDA.
Challenge 1: Large-scale pedestrian detection in gigapixel image.
There will be three sub-tasks: visible-body, full-body, and head detection for pedestrians. The evaluation metrics are Average Precision (AP.50) and Average Recall (AR).
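To make the detection metric concrete, the sketch below computes the IoU of two axis-aligned boxes and an all-point Average Precision at a single IoU threshold of 0.5. This is illustrative only, not the official evaluation code: the `(x1, y1, x2, y2)` box format, the greedy best-IoU matching policy, and the function names are assumptions for this example.

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2).
    NOTE: illustrative helper; the challenge's exact box convention may differ."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def average_precision(detections, gt_boxes, iou_thr=0.5):
    """All-point AP at one IoU threshold (AP.50 when iou_thr=0.5).
    detections: list of (score, box); gt_boxes: list of boxes."""
    if not gt_boxes:
        return 0.0
    # Rank detections by confidence, then greedily match each to the
    # unmatched ground-truth box with the highest IoU.
    detections = sorted(detections, key=lambda d: d[0], reverse=True)
    matched = [False] * len(gt_boxes)
    tp = []
    for score, box in detections:
        best, best_i = 0.0, -1
        for i, gt in enumerate(gt_boxes):
            if not matched[i]:
                o = iou(box, gt)
                if o > best:
                    best, best_i = o, i
        if best >= iou_thr:
            matched[best_i] = True
            tp.append(1)
        else:
            tp.append(0)
    # Accumulate precision/recall down the ranked list and integrate.
    ap, tp_cum, prev_recall = 0.0, 0, 0.0
    for k, hit in enumerate(tp):
        tp_cum += hit
        recall = tp_cum / len(gt_boxes)
        precision = tp_cum / (k + 1)
        ap += precision * (recall - prev_recall)
        prev_recall = recall
    return ap
```

Average Recall (AR) is analogous but measures the maximum recall achievable over the ranked detections rather than the precision-weighted integral.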
Challenge 2: Long-term multi-object tracking in gigapixel video.
There will be two sub-tasks: multi-pedestrian tracking with and without public detection results. The metrics of MOTChallenge are adopted.
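As a rough illustration of the headline MOTChallenge metric, MOTA aggregates misses (false negatives), false positives, and identity switches over all frames relative to the total number of ground-truth objects. The sketch below is a minimal reimplementation for intuition, not the official evaluation code; the per-frame tuple layout is an assumption.

```python
def mota(frames):
    """Multiple Object Tracking Accuracy:
        MOTA = 1 - (FN + FP + IDSW) / GT
    frames: iterable of per-frame counts
        (misses, false_positives, id_switches, gt_objects).
    Counts are summed over all frames before the ratio is taken."""
    fn = fp = idsw = gt = 0
    for m, f, s, g in frames:
        fn += m
        fp += f
        idsw += s
        gt += g
    return 1.0 - (fn + fp + idsw) / gt if gt else 0.0
```

A perfect tracker scores 1.0; note that MOTA can go negative when the error count exceeds the number of ground-truth objects. The official toolkit also reports complementary metrics such as MOTP and IDF1.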
By proposing the 2nd GigaVision workshop at ECCV 2020, we hope to bridge the gap between computational photography and computer vision, and to draw the community's attention to the visual analysis of complicated behaviors and interactions of crowds in large-scale real-world scenes.
More details about the challenges will be announced soon!