GigaVision Workshop of ICME 2019

GigaVision

The 1st Workshop of Computer Vision in Gigapixel Videography

Background and Relevance

Figure 1: Illustration of representative imaging systems. (a) A single-camera imaging system faces the contradiction between wide FOV and high resolution; (b) a single-scale camera-array imaging system [1, 2] relies on image stitching [3]; (c) a structured multi-scale camera array (AWARE2 [4]) adopts a two-stage optical imaging design; (d) an unstructured multi-scale camera array [5] (denoted as UnstructureCam).

With the development of deep learning theory and technology, computer vision algorithms for tasks including object detection and tracking, face recognition, and 3D reconstruction have made tremendous progress. Deep-learning-based computer vision algorithms have surpassed human-level performance on many tasks, such as object recognition [6] and face verification [7]. However, computer vision technology relies on valid information from the input images and videos, and the performance of an algorithm is essentially constrained by the quality of the source image/video. For example, it has been widely observed in object detection systems that the resolution of input images has a significant impact on detection accuracy, especially for faraway objects [8]. To achieve satisfactory performance in real-world applications, high-quality visual information is required, i.e., images/videos with high resolution and high dynamic range in the spatial, temporal, angular and spectral dimensions.

Recent gigapixel videography, beyond the resolution of a single camera and of human visual perception, aims to capture large-scale dynamic scenes at extremely high resolution. Restricted by the spatio-temporal bandwidth product of the optical system, size, weight, power and cost are the central challenges in gigapixel video. More explicitly, as shown in Fig. 1(a), the most popular single-lens camera consists of a one-stage optical imaging system and suffers from the inherent contradiction between high resolution and wide field-of-view (FOV). The single-scale multi-camera/camera-array system in Fig. 1(b) resolves this contradiction through a panoramic stitching pipeline, e.g., Microsoft ICE [9], Autopano Giga [10], Gigapan [11], and the Point Grey Ladybug 360 camera. Such stitching-based schemes always require a certain overlapping region between adjacent images/cameras, leading to redundant usage of CCD/CMOS sensors in the camera array system.

The recent multiscale optical designs [4, 12] adopt a spherical objective lens as the first-stage optical imaging system, while the secondary imaging system uses multiple identical micro-optics to divide the whole FOV into small, circular, overlapping regions, as shown in Fig. 1(c). Although this substantially reduces the size and weight of gigapixel-scale optics, the volume and weight of the camera electronics in video operation are more than 10× larger than the optics [4, 13]. More importantly, such systems usually adopt a delicately structured camera array design, which faces the challenges of complicated optical, electronic and mechanical design, laborious calibration, massive data processing, etc.

Aiming for scalable, efficient and economical gigapixel videography, Yuan et al. [5] present a novel gigapixel videography system with an unstructured multi-scale camera array design, denoted as ‘UnstructureCam’ in Fig. 1(d). Here ‘unstructured’ indicates that the overall structure of the camera array does not follow a fixed or particular design, and thus requires no precise assembly or careful calibration in advance. ‘Multi-scale’ means that not only do the parameters of the global-view camera differ from those of the local-view cameras, but the parameters of the local-view cameras may also differ from one another. For example, in UnstructureCam, the reference/global camera (with a wide-angle lens to capture the global scene) works together with local cameras (with telephoto lenses to capture local details). Such a setting enables gigapixel videography by warping each local-view video to the reference video independently and in parallel, without troublesome calibration among the local-view cameras, which further allows a flexible, compressible, adaptive and movable local-view camera setup during data capture, as sketched below.
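For intuition only, the following is a minimal sketch of such reference-guided warping using off-the-shelf OpenCV feature matching and a single homography per local view. It is not the cross-resolution matching pipeline of [5]; the file names, feature choice (ORB) and thresholds are illustrative assumptions.

```python
import cv2
import numpy as np

# Hypothetical inputs: one global-view (reference) frame and one local-view
# (telephoto) frame captured at the same time instant.
reference = cv2.imread("global_view.jpg")
local = cv2.imread("local_view.jpg")

gray_ref = cv2.cvtColor(reference, cv2.COLOR_BGR2GRAY)
gray_loc = cv2.cvtColor(local, cv2.COLOR_BGR2GRAY)

# Detect and describe keypoints in both views (ORB chosen here for simplicity).
orb = cv2.ORB_create(nfeatures=5000)
kp_loc, des_loc = orb.detectAndCompute(gray_loc, None)
kp_ref, des_ref = orb.detectAndCompute(gray_ref, None)

# Match local-view descriptors against the reference view.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des_loc, des_ref), key=lambda m: m.distance)

# Estimate a homography from the local view to the reference view with RANSAC.
src = np.float32([kp_loc[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp_ref[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
H, _ = cv2.findHomography(src, dst, cv2.RANSAC, ransacReprojThreshold=5.0)

# Warp the high-detail local frame into the reference coordinate frame.
# In a real system the target canvas would be the gigapixel mosaic, and each
# local camera would be processed independently and in parallel.
h_ref, w_ref = reference.shape[:2]
warped = cv2.warpPerspective(local, H, (w_ref, h_ref))
cv2.imwrite("local_warped_to_reference.jpg", warped)
```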

Figure 2: (a) the prototype of UnstructureCam, (b) the corresponding multi-scale videos captured by UnstructureCam.

In addition to the existing gigapixel camera arrays for capturing outdoor large-scale dynamic scenes, large-scale imaging of biological dynamics at high spatiotemporal resolution is indispensable in the study of systems biology. However, with conventional microscopes one has to compromise between a large field-of-view (FOV) and high spatial resolution, owing to the inherently limited space-bandwidth product (SBP). In addition, no existing imaging system offers sufficient data throughput to record such a huge amount of information. Dai et al. break these bottlenecks with a flat-curved-flat strategy, in which the sample plane is magnified onto a large spherical image surface and then seamlessly conjugated to multiple planar sensors through a relay lens array. Accordingly, they develop a customized objective with globally uniform 0.92 μm resolution across a 10 mm × 12 mm FOV, and an accompanying camera array for high-throughput recording at 5.1 gigapixels per second. They demonstrate the first reported video-rate gigapixel imaging of biological dynamics at centimeter scale and micron resolution, including brain-wide structural imaging and functional imaging in awake, behaving mice. Given such gigapixel images/videos, the corresponding data-processing tasks in the microscopy domain, such as image segmentation, tumor detection and cell tracking (illustrated in Fig. 4), remain tough problems, as simply applying existing computer vision algorithms cannot handle such high-resolution, large-scale and high-throughput imaging results.

Along with the emergence of novel camera array designs for extremely high-resolution gigapixel video capture, the corresponding processing tasks, such as compression, transmission and understanding, are urgently demanded. In particular, the understanding of gigapixel video through classical computer vision tasks such as detection, recognition, tracking and segmentation remains an open question, despite the extensive progress of the computer vision community over the past few years. More specifically, the opportunities and challenges that arise when computer vision meets gigapixel videography are summarized as follows.

  • Huge data throughput: A gigapixel camera system usually captures billions of pixels every second; such a huge mass of data brings great challenges in compression, transmission and processing. In particular, unlike traditional videos, gigapixel videography may have spatially varying resolution, quality and importance, so smarter video coding and streaming schemes need to be designed for it.
  • High resolution: The extremely high resolution of gigapixel videography raises many problems for existing computer vision applications. For example, images/videos with gigapixel-level resolution can hardly be fed into existing neural networks directly. Simple down-sampling leads to severe loss of resolution and information, which significantly degrades tasks such as face detection/recognition and semantic segmentation, while naively dividing the gigapixel image into blocks does not bound the computational cost accordingly (see the tiled-inference sketch after this list).
  • Large scale: Benefiting from both the wide FOV and the high resolution of gigapixel videography, large-scale dynamic scenes can be captured in full, containing enough objects and activities to provide richer, potentially useful information for video surveillance. However, more objects mean more occlusions and more complex scenes, which bring great challenges to computer vision algorithms such as multi-target tracking and anomaly detection.
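As a rough illustration of the block-wise processing mentioned above, the sketch below tiles a very large image with overlap, runs a placeholder detector on each tile, maps the detections back to global coordinates, and prunes duplicates with non-maximum suppression. The detector callback (`run_detector`), tile size and thresholds are hypothetical stand-ins, not part of any system described here.

```python
import numpy as np

def tile_image(image, tile=1024, overlap=128):
    """Yield (x0, y0, crop) patches covering a very large image with overlap,
    so objects cut by one tile border still appear whole in a neighbouring tile."""
    h, w = image.shape[:2]
    step = tile - overlap
    for y0 in range(0, max(h - overlap, 1), step):
        for x0 in range(0, max(w - overlap, 1), step):
            yield x0, y0, image[y0:y0 + tile, x0:x0 + tile]

def nms(boxes, scores, iou_thr=0.5):
    """Greedy non-maximum suppression over [x1, y1, x2, y2] boxes."""
    if len(boxes) == 0:
        return boxes
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_o = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                 (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + area_o - inter + 1e-9)
        order = order[1:][iou < iou_thr]
    return boxes[keep]

def detect_gigapixel(image, run_detector, tile=1024, overlap=128):
    """Run a user-supplied detector on each tile and merge results globally.
    `run_detector(crop)` is assumed to return (x1, y1, x2, y2, score) tuples
    in tile-local coordinates."""
    boxes, scores = [], []
    for x0, y0, crop in tile_image(image, tile, overlap):
        for (x1, y1, x2, y2, s) in run_detector(crop):
            boxes.append([x1 + x0, y1 + y0, x2 + x0, y2 + y0])
            scores.append(s)
    return nms(np.array(boxes), np.array(scores), iou_thr=0.5)
```

The overlap keeps objects near tile borders visible in at least one tile, and the final NMS removes the duplicate detections that the overlap inevitably produces.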

Figure 3: A representative dynamic scene on the Tsinghua campus captured by UnstructureCam.

Figure 4: Illustration of leukocyte trafficking along the brain vasculature network of Cx3Cr1-GFP mice. The red dots are migrating immune cells.

References

1. B. Wilburn, N. Joshi, V. Vaish, E.-V. Talvala, E. Antunez, A. Barth, A. Adams, M. Horowitz, and M. Levoy, “High performance imaging using large camera arrays,” ACM TOG, vol.24, no.3, pp. 765–776, 2005.
2. F. Perazzi, A. Sorkine-Hornung, H. Zimmer, P. Kaufmann, O. Wang, S. Watson, and M. Gross, “Panoramic video from unstructured camera arrays,” in CGF, vol. 34, no. 2, 2015, pp. 57–68.
3. M. Brown and D. G. Lowe, “Automatic panoramic image stitching using invariant features,” International journal of computer vision, vol. 74, no. 1, pp. 59–73, 2007.
4. D. Brady, M. Gehm, R. Stack, D. Marks, D. Kittle, D. Golish, E. Vera, and S. Feller, “Multiscale gigapixel photography,” Nature, vol. 486, no. 7403, pp. 386–389, 2012.
5. X. Yuan, L. Fang, Q. Dai, D. J. Brady, and Y. Liu, “Multiscale gigapixel video: A cross resolution image matching and warping approach,” in Computational Photography (ICCP), 2017 IEEE International Conference on. IEEE, 2017, pp. 1–9.
6. K. He, X. Zhang, S. Ren, and J. Sun, “Delving deep into rectifiers: Surpassing human-level performance on imagenet classification,” in 2015 IEEE International Conference on Computer Vision, 2015, pp. 1026–1034.
7. F. Schroff, D. Kalenichenko, and J. Philbin, “Facenet: A unified embedding for face recognition and clustering,” in IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 815–823.
8. T.-Y. Lin, P. Dollár, R. B. Girshick, K. He, B. Hariharan, and S. J. Belongie, “Feature pyramid networks for object detection,” in IEEE Conference on Computer Vision and Pattern Recognition, 2017.
9. Microsoft Research, “Image Composite Editor: An advanced panoramic image stitcher.”
10. Kolor, “Autopano Giga.”
11. “Gigapan,” http://www.gigapan.com/.
12. O. S. Cossairt, D. Miau, and S. K. Nayar, “Gigapixel computational imaging,” in IEEE ICCP, 2011, pp. 1–8.
13. J. Nichols, K. Judd, C. Olson, K. Novak, J. Waterman, S. Feller, S. McCain, J. Anderson, and D. Brady, “Range performance of the DARPA AWARE wide field-of-view visible imager,” Applied Optics, vol. 55, no. 16, pp. 4478–4484, 2016.

Important Dates (Tentative)

February 1, 2019 Challenge registration start
April 1, 2019 Paper submission deadline
April 1, 2019 Track 1 (crowd counting with our dataset used for training only) model & paper submission deadline
April 8, 2019 Track 2 (crowd counting with other datasets used for training) model & paper submission deadline
April 22, 2019 Paper acceptance notification
April 22, 2019 Evaluation results announcement
April 29, 2019 Camera-ready paper submission deadline

Technical Program (Tentative)

09:00 – 09:15 Workshop Opening
09:15 – 10:15 Plenary talk given by David J. Brady
10:15 – 11:00 Invited talk given by Jiashi Feng
11:00 – 12:00 3 Oral presentations
12:00 – 13:30 Lunch
13:30 – 14:30 Plenary talk given by Qionghai Dai
14:30 – 15:15 Invited talk given by Boqing Gong
15:15 – 16:15 3 Oral presentations

Sponsor