Introduction

The PASCAL VOC and ImageNet ILSVRC challenges have enabled significant progress in object recognition over the past decade. We plan to borrow this mechanism to speed up progress in scene understanding as well. Complementary to the object-centric ImageNet ILSVRC challenge hosted at ICCV/ECCV every year, we are hosting a scene-centric challenge at CVPR every year. Our challenge focuses on four major tasks in scene understanding: scene classification, saliency prediction, room layout estimation, and caption generation (hosted by MS COCO). Inspired by the recent success of data-hungry methods such as deep learning, we focus on providing benchmarks that are at least several times bigger than existing ones, to support training such algorithms. By providing a set of large-scale benchmarks in an annual challenge format, we expect significant progress to be made in scene understanding in the coming years. The details of last year's challenge can be found at LSUN 2015.

Submission: The details of each task and the submission format are provided below. You can submit results once every 5 days, and the submission with the best performance from each team will appear in the final ranking. Please email the results, with "LSUN2016" in the subject line and the completed submission form, to princeton.vision@gmail.com. For the classification task, you can attach the text file containing the results to the email. For the other tasks, please upload the results to cloud storage such as Dropbox and send us the download link, because the submission files can be large.
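
Purely as an illustration, below is a hypothetical sketch in Python of writing a classification results file. The one-prediction-per-line layout and the function name are our own assumptions, not the official format; follow the task details and submission form below.

    # Hypothetical sketch: write top-1 classification results to a text
    # file for email submission. The "<image_id> <category>" line format
    # is an assumption, NOT the official format.
    def write_submission(predictions, path="lsun2016_classification.txt"):
        """predictions: dict mapping image identifier -> predicted category."""
        with open(path, "w") as f:
            for image_id, category in sorted(predictions.items()):
                f.write("%s %s\n" % (image_id, category))

    write_submission({"img_0001": "bedroom", "img_0002": "kitchen"})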

Results: The challenge results are listed in our leaderboard.


Keynote Speakers


Jitendra Malik
University of California, Berkeley
Yann LeCun
Facebook AI Research & NYU

Scene Classification

In this task, an algorithm needs to report the top-1 most likely scene category for each image. You can now download the preliminary release of the data from the links below; the final training set may differ. Besides the training set, we also provide 300 images per category for validation. There are 1,000 images for each category in the testing set. The data can be downloaded with the provided script. Please check the README for documentation and demo code. Contact Fisher Yu to request the original images or with any other questions.

LSUN Dataset: More information about the LSUN dataset can be found at the project webpage, lsun.yf.io.
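
For readers who want to inspect the data programmatically, here is a minimal sketch, assuming the download script produces LMDB databases (the directory name below is an example) whose values are compressed image bytes, as in the LSUN reference code. It requires the lmdb, numpy, and opencv-python packages.

    import lmdb
    import numpy as np
    import cv2

    # Open one category database read-only.
    env = lmdb.open("bedroom_train_lmdb", readonly=True, lock=False)
    with env.begin() as txn:
        for key, value in txn.cursor():
            # Each value holds compressed image bytes; decode to a BGR array.
            img = cv2.imdecode(np.frombuffer(value, dtype=np.uint8),
                               cv2.IMREAD_COLOR)
            print(key.decode(), img.shape)
            break  # inspect only the first record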

Category        | Training                 | Validation
Bedroom         | 3,033,042 images (43 GB) | 300 images
Bridge          | 818,687 images (16 GB)   | 300 images
Church Outdoor  | 126,227 images (2.3 GB)  | 300 images
Classroom       | 168,103 images (3.1 GB)  | 300 images
Conference Room | 229,069 images (3.8 GB)  | 300 images
Dining Room     | 657,571 images (11 GB)   | 300 images
Kitchen         | 2,212,277 images (34 GB) | 300 images
Living Room     | 1,315,802 images (22 GB) | 300 images
Restaurant      | 626,331 images (13 GB)   | 300 images
Tower           | 708,264 images (12 GB)   | 300 images

Testing Set: 10,000 images (173 MB)

Saliency Prediction

In this task, an algorithm needs to predict where humans look in a scene. Two datasets are provided: iSUN (eye-tracking based) and SALICON (mouse-tracking based). All submissions will be evaluated on both datasets, and we will have a winner for each dataset. Evaluation toolkits in both Matlab and Python will be released with the benchmark. The challenge for this task is co-hosted with the NUS VIP Lab and the Bethge Lab.

iSUN: The data is collected by gaze tracking from Amazon Mechanical Turk using a webcam. All our images are from the SUN database. For each image, we provide the image content in JPG, the image resolution, the scene category, and the ground truth (including gaze trajectory, fixation points, and saliency mask, for the training and validation sets only). Please refer to the iSUN project page for more details about how this data was collected.

SALICON: The data is collected via mouse-cursor tracking in a new psychophysical paradigm from Amazon Mechanical Turk by the NUS VIP Lab. All the images are from the MS COCO dataset. For each image, we provide the image content in JPG, the image resolution, and the ground truth (including mouse trajectory, fixation points, and saliency mask, for the training and validation sets only). Please refer to the SALICON page for more details.
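
The released toolkits define the official scores. As an illustration of the kind of metric they compute, below is a minimal sketch of normalized scanpath saliency (NSS), which standardizes the predicted saliency map and averages it at the human fixation locations; the function name and data layout are our own assumptions.

    import numpy as np

    def nss(saliency_map, fixations):
        """saliency_map: 2-D array; fixations: list of (row, col) points.

        Normalize the map to zero mean and unit standard deviation, then
        average its values at the fixation locations.
        """
        s = (saliency_map - saliency_map.mean()) / saliency_map.std()
        return float(np.mean([s[r, c] for r, c in fixations]))

    # Toy example: a map peaked where the fixation lands scores high.
    smap = np.zeros((4, 4))
    smap[1, 2] = 1.0
    print(nss(smap, [(1, 2)]))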

iSUN                        | Download
Training Set (6,000 images) | Image List and Labels
Validation Set (926 images) | Image List and Labels
Testing Set (2,000 images)  | Image List
Fixation Ground Truth       | Zip File
Saliency Map Ground Truth   | Zip File (12 GB)
All Images in JPG           | Zip File (2 GB)

SALICON                       | Download
Training Set (10,000 images)  | Image List and Labels
Validation Set (5,000 images) | Image List and Labels
Testing Set (5,000 images)    | Image List
Fixation Ground Truth         | Zip File
Saliency Map Ground Truth     | Zip File (19 GB)
All Images in JPG             | Zip File (3 GB)

Matlab Toolkit | Download
Python Toolkit | Download
Documentation  | PDF

Room Layout Estimation

In this task, an algorithm needs to estimate the room layout from a single indoor scene image. All the images are indoor scenes from the SUN database and our LSUN scene classification database. We assume that the room shown in an image can be represented by part of a 3D box, so room layout estimation is formulated as predicting the positions of the intersections between the planar walls, ceiling, and floor. There are 4,000 images for training, 394 images for validation, and 1,000 images for testing. All the images have a valid room layout that can be clearly annotated by a human. The annotation was done in-house by the organizers from the Princeton Vision Group. For each image, we provide the image content, the scene category, and the room layout annotation (for the training and validation sets only). There are eight scene categories in our dataset: bedroom, hotel room, dining room, dinette home, living room, office, conference room, and classroom. The scene categories for the images in the testing set are also provided. A Matlab toolkit is provided for visualization and evaluation.
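
The Matlab toolkit defines the official evaluation. Purely as an illustration of one common way to score layouts, below is a minimal sketch of a pixelwise layout error, assuming predicted and ground-truth layouts are rendered as per-pixel surface labelings (e.g. floor, ceiling, and individual walls); the function name and data layout are our own assumptions.

    import numpy as np

    def pixel_error(pred, gt):
        """pred, gt: integer arrays of per-pixel surface labels, same shape.

        Returns the fraction of pixels whose predicted surface label
        disagrees with the ground-truth labeling.
        """
        assert pred.shape == gt.shape
        return float(np.mean(pred != gt))

    # Toy example: two 2x2 labelings disagreeing on one pixel -> 0.25 error.
    print(pixel_error(np.array([[0, 1], [2, 2]]),
                      np.array([[0, 1], [2, 0]])))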

Dataset                     | Download
Training Set (4,000 images) | Image List and Labels
Validation Set (394 images) | Image List and Labels
Testing Set (1,000 images)  | Image List
Layout Ground Truth         | Zip File
Evaluation Toolkit          | Matlab Toolkit
All Images in JPG           | Zip File (2 GB)
Documentation               | PDF

Schedule

June 26, 2016 at Augustus I - II

13:30 - 13:35 | Welcome                                              | Jianxiong Xiao
13:35 - 13:50 | Introduction to LSUN dataset and classification task | Fisher Yu
13:50 - 14:05 | Classification winner talk 1                         | Bowen Zhang (Team SIAT-MMLAB)
14:05 - 14:20 | Classification winner talk 2                         | Wen-Sheng Chu (Team SJTU-ReadSense)
14:20 - 14:25 | Introduction to saliency prediction task             | Yinda Zhang
14:25 - 14:35 | Saliency evaluation and toolkit                      | Matthias Kümmerer
14:35 - 14:50 | Saliency winner talk                                 | Srinivas Kruthiventi (Team VAL)
14:50 - 14:55 | Introduction to room layout task                     | Yinda Zhang
14:55 - 15:10 | Room layout winner talk                              | Yuzhuo Ren (Team CF)
15:10 - 15:45 | Coffee break                                         |
15:45 - 16:15 | Keynote Talk                                         | Jitendra Malik
16:15 - 16:45 | Keynote Talk                                         | Yann LeCun
16:45 - 16:50 | Award Session                                        |
16:50 - 17:00 | Closing remarks                                      |

Organizers

  • Fisher Yu* - Princeton University
  • Yinda Zhang* - Princeton University
  • Matthias Kümmerer - University of Tübingen
  • Ming Jiang - NUS
  • Qi Zhao - NUS
  • Matthias Bethge - University of Tübingen
  • Jianxiong Xiao - AutoX, Inc.
* indicates equal contribution