Gigapixel videography, which goes beyond the resolution of a single camera and of human visual perception, aims to capture large-scale dynamic scenes at extremely high resolution. The combination of high resolution and wide field of view (FoV) creates new challenges and opportunities for many computer vision tasks. Among them, object detection, which labels the objects (especially humans and vehicles) in an image or video, is a typical task. It is widely used in daily life and is already changing how industries operate in many scenarios, e.g., counting the number of people attending an event, supporting staffing and resource allocation, and monitoring high-traffic areas.
However, accurate object detection in large-scale scenes remains difficult due to the low image quality of distant instances. Although gigapixel videography captures both the wide-FoV scene and high-resolution local details, how to process such high-resolution data efficiently is still not well studied and remains an open question demanding prompt solutions. Thus, in this challenge, we present our unprecedented dataset "PANDA", the gigaPixel-level humAN-centric viDeo dAtaset (published at CVPR 2020), aiming at object detection research on large-scale gigapixel videography.
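One common way to handle images at this resolution (a general practice, not a method prescribed by this challenge) is overlapping tiled inference: a detector is run on fixed-size crops, and tile-local detections are then mapped back to full-image coordinates. A minimal sketch of the tiling geometry, with illustrative tile and overlap sizes:

```python
def tile_origins(size, tile, overlap):
    """1-D tile start positions covering [0, size) with the given overlap."""
    step = tile - overlap
    starts = list(range(0, max(size - tile, 0) + 1, step))
    if starts[-1] + tile < size:  # ensure the last tile reaches the image edge
        starts.append(size - tile)
    return starts

def make_tiles(width, height, tile=1024, overlap=128):
    """Overlapping (x0, y0, x1, y1) crops covering a width x height image.
    The overlap keeps objects that straddle a tile border fully visible
    in at least one crop."""
    return [(x, y, min(x + tile, width), min(y + tile, height))
            for y in tile_origins(height, tile, overlap)
            for x in tile_origins(width, tile, overlap)]

def to_global(box, tile_origin):
    """Shift a tile-local box (x0, y0, x1, y1) into full-image coordinates."""
    x0, y0, x1, y1 = box
    ox, oy = tile_origin
    return (x0 + ox, y0 + oy, x1 + ox, y1 + oy)
```

Because the tiles overlap, the same object may be detected in several crops, so a cross-tile deduplication step such as non-maximum suppression on the globally mapped boxes is typically applied afterwards.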