In conjunction with the 2019 IEEE International Conference on Multimedia and Expo (ICME)
GigaVision is the first workshop on computer vision for gigapixel videography. It aims to introduce the latest image/video datasets with gigapixel-level resolution and high dynamic range across the spatial, temporal, angular, and spectral dimensions, and to promote the application of such big data in computer vision tasks.
A crowd-counting challenge on gigapixel videography will be run in this workshop. The workshop invites submissions of original, high-quality contributions. Relevant work that has recently been published, is in progress, or is to be presented at other venues (including the ICME main conference) is also welcome. Topics of interest include, but are not limited to:
Figure 1: Illustration of representative imaging systems. (a) A single-camera imaging system faces the contradiction between wide FOV and high resolution; (b) single-scale camera-array imaging [1, 2] relies on image stitching; (c) the structured multi-scale camera array (AWARE2) adopts a two-stage optical imaging design; (d) the unstructured multi-scale camera array (denoted as UnstructureCam).
With the development of deep learning theory and technology, the performance of computer vision algorithms, including object detection and tracking, face recognition, and 3D reconstruction, has improved tremendously. Deep-learning-based computer vision algorithms have surpassed human-level performance on many tasks, such as object recognition [6] and face verification [7]. However, computer vision technology relies on valid information in the input images and videos, and the performance of an algorithm is essentially constrained by the quality of the source image/video. For example, it has been widely observed in object detection systems that the resolution of the input images has a significant impact on detection accuracy, especially for objects far away [8]. To achieve satisfactory performance in real-world applications, high-quality visual information is needed, which requires image/video with high resolution and high dynamic range across the spatial, temporal, angular, and spectral dimensions.
Recent gigapixel videography, beyond the resolution of a single camera and of human visual perception, aims to capture large-scale dynamic scenes at extremely high resolution. Restricted by the spatial-temporal bandwidth product of the optical system, size, weight, power, and cost are the central challenges in gigapixel video. More explicitly, as shown in Fig. 1(a), the most popular single-lens camera is composed of a one-stage optical imaging system and suffers from the inherent contradiction between high resolution and wide field-of-view (FOV). The single-scale multi-camera/camera-array system in Fig. 1(b) resolves this contradiction through a panoramic stitching pipeline, as in Microsoft ICE [9], Autopano Giga [10], Gigapan [11], the PointGrey Ladybug 360 camera, etc. Such stitching-based schemes always require a certain overlapping region among nearby images/cameras, leading to redundant usage of the CCD/CMOS sensors in the camera-array system.
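The resolution/FOV contradiction can be made concrete with a back-of-the-envelope pixel budget. The numbers below are illustrative assumptions (not taken from the text): covering a wide field of view finely enough to put about 30 pixels across a 15 cm face at 100 m already approaches a gigapixel per frame, far beyond a single sensor.

```python
import math

# Illustrative pixel budget (assumed numbers): how many pixels does it take
# to cover a 120 x 60 degree field of view finely enough to put ~30 pixels
# across a 15 cm face at 100 m?
face_size_m, distance_m, pixels_per_face = 0.15, 100.0, 30

face_angle_deg = math.degrees(math.atan(face_size_m / distance_m))  # angle a face subtends
angular_res_deg = face_angle_deg / pixels_per_face                  # degrees per pixel

fov_h_deg, fov_v_deg = 120.0, 60.0
total_pixels = (fov_h_deg / angular_res_deg) * (fov_v_deg / angular_res_deg)
print(f"{total_pixels / 1e9:.2f} gigapixels")  # roughly 0.9 gigapixels for this FOV
```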
The recent multiscale optical designs [4, 2] instead adopt a spherical objective lens as the first-stage optical imaging system, while the secondary imaging system uses multiple identical micro-optics to divide the whole FOV into small, circular, overlapping regions, as shown in Fig. 1(c). Although this substantially reduces the size and weight of gigapixel-scale optical systems, the volume and weight of the camera electronics in video operation are more than 10× larger than the optics [4, 13]. More importantly, it usually adopts a delicately structured camera-array design, which faces the challenges of complicated optical, electronic, and mechanical design, laborious calibration, massive data processing, etc.
Aiming for scalable, efficient, and economical gigapixel videography, Yuan et al. [5] present a novel gigapixel videography system with an unstructured multi-scale camera-array design, denoted 'UnstructureCam' in Fig. 1(d). Here 'unstructured' indicates that the overall structure of the camera array does not follow a fixed or particular design, and thus requires no precise assembly or careful calibration in advance. 'Multi-scale' means that not only do the parameters of the global-view camera differ from those of the local-view cameras, but the parameters of the local-view cameras can also differ from each other. For example, in UnstructureCam, a reference/global camera (with a wide-angle lens to capture the global scene) works together with local cameras (with telephoto lenses to capture local details). This setting enables gigapixel videography by warping each local-view video to the reference video independently and in parallel, without troublesome camera calibration among the local-view cameras, which further allows flexible, compressible, adaptive, and movable local-view camera settings during data capture.
Figure 2: (a) The prototype of UnstructureCam; (b) the corresponding multi-scale videos captured by UnstructureCam.
In addition to the existing gigapixel camera arrays capturing outdoor large-scale dynamic scenes, large-scale imaging of biological dynamics at high spatiotemporal resolution is indispensable in the study of systems biology. With conventional microscopes, however, one has to compromise between a large field-of-view (FOV) and high spatial resolution, owing to the inherently limited space-bandwidth product (SBP). In addition, no imaging system yet offers sufficient data throughput to record such huge amounts of information. Dai et al. break these bottlenecks with a flat-curved-flat strategy, in which the sample plane is magnified onto a large spherical image surface and then seamlessly conjugated to multiple planar sensors through a relay lens array. Accordingly, they develop a customized objective with globally uniform 0.92 μm resolution across a 10 mm × 12 mm FOV, and an accompanying camera array for high-throughput recording at 5.1 gigapixels per second. They demonstrate the first reported video-rate gigapixel imaging of biological dynamics at centimeter scale and micron resolution, including brain-wide structural and functional imaging in awake, behaving mice. Given such gigapixel images/videos, the corresponding data-processing tasks in the microscopy domain, such as image segmentation, tumor detection, and cell tracking (illustrated in Fig. 4), remain tough problems, as simply applying existing computer vision algorithms cannot handle such high-resolution, large-scale, high-throughput imaging results.
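The reported figures can be roughly sanity-checked under an assumed Nyquist sampling model (two pixels per resolvable spot; the instrument's actual sensor layout may differ):

```python
# Rough consistency check of the reported figures, assuming Nyquist sampling.
resolution_um = 0.92
pixel_pitch_um = resolution_um / 2           # Nyquist: two pixels per resolvable spot
fov_w_um, fov_h_um = 10_000.0, 12_000.0      # 10 mm x 12 mm field of view

pixels_per_frame = (fov_w_um / pixel_pitch_um) * (fov_h_um / pixel_pitch_um)
print(f"{pixels_per_frame / 1e9:.2f} gigapixels per frame")
# At the reported 5.1 gigapixels/s, that corresponds to roughly 9 frames per
# second, i.e. consistent with "video-rate" imaging.
print(f"{5.1e9 / pixels_per_frame:.1f} fps")
```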
Along with the emergence of novel camera-array designs for extremely high-resolution gigapixel video capture, the corresponding processing, such as compression, transmission, and understanding, is urgently demanded. In particular, understanding gigapixel video via classical computer vision tasks such as detection, recognition, tracking, and segmentation remains an open question, despite the extensive progress in the computer vision community over the past few years. More specifically, the opportunities and challenges that arise when computer vision meets gigapixel videography are summarized as follows.
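As one concrete illustration of why existing algorithms do not transfer directly: a gigapixel frame cannot be fed to a standard detector in one pass, so a common (if only partial) workaround is tiled inference over overlapping crops. A minimal sketch of the tiling step, with illustrative tile and overlap sizes:

```python
def iter_tiles(height, width, tile=1024, overlap=128):
    """Yield (y0, y1, x0, x1) crop windows covering a huge frame, with
    overlap so objects on tile borders are seen whole at least once."""
    step = tile - overlap
    for y0 in range(0, max(height - overlap, 1), step):
        for x0 in range(0, max(width - overlap, 1), step):
            yield y0, min(y0 + tile, height), x0, min(x0 + tile, width)

# A gigapixel-scale frame (e.g. ~30k x 35k pixels, about 1 gigapixel)
# becomes on the order of a thousand detector-sized crops; per-tile
# detections are mapped back with the (y0, x0) offsets and merged by
# non-maximum suppression.
tiles = list(iter_tiles(30_000, 35_000))
print(len(tiles))
```

Even so, this sketch sidesteps only the memory problem; the scale gap between tiny far-field objects and huge near-field ones, and the sheer data throughput, remain open challenges.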
Figure 3: A representative dynamic scene on the Tsinghua campus captured by UnstructureCam.
Figure 4: Illustration of leukocyte trafficking along the brain vasculature network of Cx3Cr1-GFP mice. The red dots are migrating immune cells.
1. B. Wilburn, N. Joshi, V. Vaish, E.-V. Talvala, E. Antunez, A. Barth, A. Adams, M. Horowitz, and M. Levoy, "High performance imaging using large camera arrays," ACM TOG, vol. 24, no. 3, pp. 765–776, 2005.
2. F. Perazzi, A. Sorkine-Hornung, H. Zimmer, P. Kaufmann, O. Wang, S. Watson, and M. Gross, "Panoramic video from unstructured camera arrays," in CGF, vol. 34, no. 2, 2015, pp. 57–68.
3. M. Brown and D. G. Lowe, "Automatic panoramic image stitching using invariant features," International Journal of Computer Vision, vol. 74, no. 1, pp. 59–73, 2007.
4. D. Brady, M. Gehm, R. Stack, D. Marks, D. Kittle, D. Golish, E. Vera, and S. Feller, "Multiscale gigapixel photography," Nature, vol. 486, no. 7403, pp. 386–389, 2012.
5. X. Yuan, L. Fang, Q. Dai, D. J. Brady, and Y. Liu, "Multiscale gigapixel video: A cross resolution image matching and warping approach," in IEEE International Conference on Computational Photography (ICCP), 2017, pp. 1–9.
6. K. He, X. Zhang, S. Ren, and J. Sun, "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification," in IEEE International Conference on Computer Vision, 2015, pp. 1026–1034.
7. F. Schroff, D. Kalenichenko, and J. Philbin, "FaceNet: A unified embedding for face recognition and clustering," in IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 815–823.
8. T.-Y. Lin, P. Dollár, R. B. Girshick, K. He, B. Hariharan, and S. J. Belongie, "Feature pyramid networks for object detection," in IEEE Conference on Computer Vision and Pattern Recognition, 2017.
9. Microsoft Research, "Image Composite Editor: An advanced panoramic image stitcher."
10. Kolor, "Autopano Giga."
11. "Gigapan," http://www.gigapan.com/.
12. O. S. Cossairt, D. Miau, and S. K. Nayar, "Gigapixel computational imaging," in IEEE ICCP, 2011, pp. 1–8.
13. J. Nichols, K. Judd, C. Olson, K. Novak, J. Waterman, S. Feller, S. McCain, J. Anderson, and D. Brady, "Range performance of the DARPA AWARE wide field-of-view visible imager," Applied Optics, vol. 55, no. 16, pp. 4478–4484, 2016.
To this end, we solicit original and ongoing research addressing the topics listed below (but not limited to):
| Date | Milestone |
| --- | --- |
| February 1, 2019 | Challenge registration opens |
| April 1, 2019 | Paper submission deadline |
| April 1, 2019 | Track 1 (crowd counting with our dataset used for training only): model & paper submission deadline |
| April 8, 2019 | Track 2 (crowd counting with other datasets used for training): model & paper submission deadline |
| April 22, 2019 | Paper acceptance notification |
| April 22, 2019 | Evaluation results announcement |
| April 29, 2019 | Camera-ready paper submission deadline |
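Both challenge tracks are scored on crowd counts. The exact metric is not specified here, but crowd-counting benchmarks conventionally report mean absolute error (MAE) and root mean squared error (RMSE) over per-image counts; a minimal sketch of these (assumed) metrics:

```python
import math

def count_metrics(pred_counts, gt_counts):
    """MAE and RMSE over per-image people counts, the metrics crowd-counting
    benchmarks conventionally report (assumed here, not stated by the page)."""
    n = len(gt_counts)
    mae = sum(abs(p - g) for p, g in zip(pred_counts, gt_counts)) / n
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in zip(pred_counts, gt_counts)) / n)
    return mae, rmse

# Toy example: predicted vs. ground-truth head counts on four images.
mae, rmse = count_metrics([110, 95, 42, 300], [100, 90, 40, 320])
print(mae, rmse)  # 9.25 11.5
```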
| Time | Session |
| --- | --- |
| 09:00 – 09:15 | Workshop opening |
| 09:15 – 10:15 | Plenary talk by David J. Brady |
| 10:15 – 11:00 | Invited talk by Jiashi Feng |
| 11:00 – 12:00 | 3 oral presentations |
| 12:00 – 13:30 | Lunch |
| 13:30 – 14:30 | Plenary talk by Qionghai Dai |
| 14:30 – 15:15 | Invited talk by Boqing Gong |
| 15:15 – 16:15 | 3 oral presentations |