The PASCAL VOC and ImageNet ILSVRC challenges have enabled significant progress in object recognition over the past decade. Beginning with CVPR 2015, we borrowed this mechanism to accelerate progress in scene understanding via the LSUN workshop. Complementary to the object-centric ImageNet ILSVRC challenge hosted at ICCV/ECCV every year, we propose to continue hosting this scene-centric challenge at CVPR every year. Our challenge focuses on major tasks in scene understanding, including scene object retrieval, outdoor scene segmentation, RGB-D 3D object detection, and saliency prediction. Inspired by recent successes with big data, such as deep learning, we focus on providing benchmarks that are significantly larger and more diverse than existing ones, to support training these data-hungry algorithms. By providing a set of large-scale benchmarks in an annual challenge format, we expect significant progress to continue for scene understanding in the coming years. Building on the experience of our previous workshops, we are updating all of our existing tasks and rolling out new ones.
In this task, an algorithm needs to report the single (top-1) most likely scene category for each image. You can now download the preliminary released data from the links below; the final training set may differ. Besides the training set, we also provide 300 images per category for validation. There are 1,000 images for each category in the testing set.
The data can be downloaded with the provided script. Please check the README for documentation and demo code. Contact Fisher Yu with requests for the original images and other questions. The submission deadline is July 15, 2017.
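The top-1 evaluation described above is straightforward: a submission is scored by the fraction of test images whose single predicted category matches the ground truth. The following is a minimal sketch of that computation; the image IDs, category names, and dictionary-based submission format are illustrative assumptions, not the official evaluation code.

```python
# Hypothetical sketch of top-1 scene-classification scoring.
# Submission format (dicts keyed by image ID) is an assumption.

def top1_accuracy(predictions, ground_truth):
    """Fraction of images whose predicted category equals the true one.

    predictions:  dict mapping image ID -> predicted category name
    ground_truth: dict mapping image ID -> true category name
    """
    correct = sum(1 for img, label in ground_truth.items()
                  if predictions.get(img) == label)
    return correct / len(ground_truth)

# Toy example: 2 of 3 images classified correctly.
preds = {"img_0001": "bedroom", "img_0002": "kitchen", "img_0003": "tower"}
truth = {"img_0001": "bedroom", "img_0002": "church_outdoor", "img_0003": "tower"}
print(top1_accuracy(preds, truth))  # 0.666...
```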
LSUN Dataset: more information about the LSUN dataset can be found at the project webpage, lsun.yf.io.
| Category | Training Set | Validation Set |
|---|---|---|
| Bedroom | 3,033,042 images (43 GB) | 300 images |
| Bridge | 818,687 images (16 GB) | 300 images |
| Church Outdoor | 126,227 images (2.3 GB) | 300 images |
| Classroom | 168,103 images (3.1 GB) | 300 images |
| Conference Room | 229,069 images (3.8 GB) | 300 images |
| Dining Room | 657,571 images (11 GB) | 300 images |
| Kitchen | 2,212,277 images (34 GB) | 300 images |
| Living Room | 1,315,802 images (22 GB) | 300 images |
| Restaurant | 626,331 images (13 GB) | 300 images |
| Tower | 708,264 images (12 GB) | 300 images |

Testing Set: 10,000 images (173 MB)
This task comprises two separate challenges based on the novel Mapillary Vistas Dataset: semantic image segmentation and instance-specific semantic image segmentation of street-level images. The Mapillary Vistas research edition contains 25,000 densely annotated street-level images (66 object classes with pixel-accurate, polygon-based annotations, including instance-specific annotations for 37 categories), featuring locations from all around the world. The image data visually covers parts of Europe, North and South America, Asia, and Australia, and consequently spans a broad range of object appearances. For performance assessment, commonly used metrics are applied: average intersection-over-union for pixel-level segmentation and average precision for instance-specific segmentation. We expect strong interest from the object recognition community and hope this challenge helps push the boundaries of state-of-the-art models. Participation details for these challenges can be found at https://research.mapillary.com/lsun.html.
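The average intersection-over-union metric mentioned above computes, for each class, the overlap between predicted and ground-truth pixels divided by their union, then averages over classes. A minimal numpy sketch follows; the toy label maps and two-class setup are illustrative assumptions, not the official Mapillary toolkit.

```python
import numpy as np

# Illustrative sketch of mean intersection-over-union (mIoU) for
# pixel-level semantic segmentation. Not the official evaluation code.

def mean_iou(pred, gt, num_classes):
    """Average per-class IoU over classes present in pred or gt.

    pred, gt: integer label maps of the same shape.
    """
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))

# Toy 2x3 label maps with two classes; one pixel is mislabeled.
gt   = np.array([[0, 0, 1], [0, 1, 1]])
pred = np.array([[0, 1, 1], [0, 1, 1]])
print(mean_iou(pred, gt, num_classes=2))  # mean of 2/3 and 3/4
```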
In this task, an algorithm needs to predict where humans look in a scene. The SALICON dataset (based on mouse tracking) is provided. An evaluation toolkit in both Matlab and Python will be released with the benchmark. This task is co-hosted with the UMN VIP Lab.
SALICON: the data is collected via mouse-cursor tracking in a new psychophysical paradigm on Amazon Mechanical Turk by the UMN VIP Lab. All images are from the MS COCO dataset. For each image, we provide the image content in JPG, the image resolution, and the ground truth (mouse trajectory, fixation points, and saliency mask; for the training and validation sets only). Please refer to the SALICON page for more details.
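Saliency predictions such as those above are typically compared against fixation points with metrics like Normalized Scanpath Saliency (NSS), which normalizes the predicted map to zero mean and unit variance and averages it at fixated pixels. The sketch below is a hedged illustration of that idea with toy arrays; it is not the official SALICON evaluation toolkit.

```python
import numpy as np

# Hedged sketch of Normalized Scanpath Saliency (NSS), one common
# fixation-based saliency metric. Arrays here are toy assumptions.

def nss(saliency_map, fixations):
    """Mean normalized saliency value at fixated pixels.

    saliency_map: 2-D float array (predicted saliency).
    fixations:    2-D binary array of the same shape, 1 at fixations.
    """
    s = (saliency_map - saliency_map.mean()) / saliency_map.std()
    return float(s[fixations.astype(bool)].mean())

# Toy 2x2 example: one fixation landing on the most salient pixel.
sal = np.array([[0.9, 0.1], [0.1, 0.1]])
fix = np.array([[1, 0], [0, 0]])
print(nss(sal, fix))  # positive: prediction agrees with the fixation
```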
July 26, 2017, Room 304 AB
| Time | Session | Speaker |
|---|---|---|
| 13:30 - 13:35 | Welcome | Fisher Yu |
| 13:35 - 14:05 | Keynote Talk | Prof. Hao Jiang |
| 14:05 - 14:25 | Classification Winner Talk: Deep Pyramidal Residual Networks | Jiwhan Kim |
| 14:25 - 14:40 | Mapillary Scene Parsing Task | Peter Kontschieder |
| 14:40 - 15:10 | Keynote Talk | Prof. Devi Parikh |
| 15:10 - 15:30 | Contributed Talk | Scene Parsing Winner |
| 15:30 - 16:15 | Coffee Break | |
| 16:15 - 16:20 | Saliency Detection Task | Qi Zhao |
| 16:20 - 16:40 | Saliency Winner Talk 1 | Prof. Roberto Vezzani |
| 16:40 - 17:00 | Saliency Winner Talk 2 | Samuel Dodge |