What is Amodal Panoptic Segmentation?


To navigate through their daily lives, humans rely on their ability to perceive the complete physical structure of objects even when they are only partially visible. This ability, known as amodal perception, serves as the link that connects our perception of the world to its cognitive understanding. However, unlike humans, robots are limited to modal perception, which restricts their ability to emulate the visual experience that humans have. In this work, we bridge this gap by proposing the amodal panoptic segmentation task.

Any given scene can broadly be categorized into two components: stuff and thing. Regions that are amorphous or uncountable belong to stuff classes (e.g., sky, road, sidewalk, etc.), and the countable objects of the scene belong to thing classes (e.g., cars, trucks, pedestrians, etc.). The amodal panoptic segmentation task aims to concurrently predict the pixel-wise semantic segmentation labels of visible regions of stuff classes, and instance segmentation labels of both the visible and occluded regions of thing classes. We believe this task is the ultimate frontier of visual recognition and will immensely benefit the robotics community. For example, in automated driving, perceiving the whole structure of traffic participants at all times, irrespective of partial occlusions, will minimize the risk of accidents. Moreover, by inferring the relative depth ordering of objects in a scene, robots can make complex decisions such as in which direction to move relative to the object of interest to obtain a clearer view without additional sensor feedback.

Amodal panoptic segmentation is substantially more challenging as it entails all the challenges of its modal counterpart (scale variations, illumination changes, cluttered backgrounds, etc.) while simultaneously requiring more complex occlusion reasoning. This becomes even harder for non-rigid classes such as pedestrians. These challenges are also reflected in the groundtruth annotation effort that the task necessitates. In essence, this task requires an approach to fully grasp the structure of objects and how they interact with other objects in the scene, in order to segment occluded regions even in cases that seem ambiguous.


APSNet Architecture

Network architecture
Figure: (a) Illustration of our proposed APSNet architecture, consisting of a shared backbone and parallel semantic and amodal instance segmentation heads, followed by a fusion module that fuses the outputs of both heads to yield the amodal panoptic segmentation output. (b) and (c) present the topologies of our proposed amodal instance segmentation head and semantic segmentation head, respectively.

APSNet follows the top-down approach. It consists of a shared backbone, comprising an encoder and a 2-way Feature Pyramid Network (FPN), followed by the semantic segmentation head and the amodal instance segmentation head. We employ the RegNet architecture as the encoder (depicted in red). It is built from a standard residual bottleneck block with group convolutions, repeated within each of its five stages, and it has fewer parameters than comparable encoders while offering higher representational capacity. After the 2-way FPN, our network splits into two parallel branches. The first branch consists of the Region Proposal Network (RPN) and ROI align layers that take the 2-way FPN output as input; the extracted ROI features are propagated to the amodal instance segmentation head. The second branch consists of the semantic segmentation head, which is connected to the fourth stage of the encoder.
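
As a rough illustration of this dataflow, the following PyTorch-style sketch wires the two branches together. The module names (encoder, two_way_fpn, rpn, roi_align, amodal_instance_head, semantic_head) and their signatures are hypothetical placeholders, not the released APSNet implementation.

# Minimal PyTorch-style sketch of the APSNet forward pass described above.
# All module names and signatures are illustrative placeholders.
import torch.nn as nn

class APSNetSketch(nn.Module):
    def __init__(self, encoder, two_way_fpn, rpn, roi_align,
                 amodal_instance_head, semantic_head):
        super().__init__()
        self.encoder = encoder              # RegNet-style encoder (5 stages)
        self.two_way_fpn = two_way_fpn      # 2-way Feature Pyramid Network
        self.rpn = rpn                      # Region Proposal Network
        self.roi_align = roi_align          # ROI align over the FPN levels
        self.amodal_instance_head = amodal_instance_head
        self.semantic_head = semantic_head  # connected to encoder stage 4

    def forward(self, images):
        # Shared backbone: multi-scale encoder features + 2-way FPN features.
        enc_feats = self.encoder(images)          # assumed dict: stage1 ... stage5
        fpn_feats = self.two_way_fpn(enc_feats)   # multi-scale pyramid

        # Branch 1: proposals -> ROI features -> amodal instance segmentation.
        proposals = self.rpn(fpn_feats)
        roi_feats = self.roi_align(fpn_feats, proposals)
        amodal_instances = self.amodal_instance_head(roi_feats)

        # Branch 2: semantic segmentation from stage-4 features, fused with FPN features.
        semantic_logits = self.semantic_head(enc_feats["stage4"], fpn_feats)

        # A fusion module (not shown) merges both outputs into the
        # final amodal panoptic segmentation.
        return semantic_logits, amodal_instances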


Our proposed amodal instance segmentation head comprises three parts, each focusing on one of the critical requirements for amodal reasoning. First, the visible mask head learns to predict the visible region of the target object in a class-specific manner. Simultaneously, an occluder head class-agnostically predicts the regions that occlude the target object. In other words, for a given proposal, the visible mask head learns to segment the target object (the background with respect to its occluders), while the occluder head learns to segment the occluding foreground objects. The occluder head provides an initial global estimate of where the occluded region of the target object lies. With the features from both the visible and occluder mask heads, the amodal instance segmentation head can reason about the presence of the occluded region as well as its shape. This is achieved by employing an occlusion mask head that predicts the occluded region of the target object given the visible and occluder features. Subsequently, the concatenated visible, occluder, and occlusion mask head features are further processed by a series of convolutions followed by a spatio-channel attention block, which together model the inherent relationship between the visible, occluder, and occlusion features. The amodal mask head then predicts the final amodal mask for the target object. Additionally, the visible mask is refined by a second visible mask head that takes the concatenated amodal and visible features as input to predict the final inmodal mask.
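
The wiring of these mask heads can be sketched as follows. All sub-modules (visible_head, occluder_head, occlusion_head, fuse_convs, sc_attention, amodal_head, visible_refine_head) are hypothetical placeholders that only illustrate the dataflow described above, not the actual layer configurations.

# Simplified sketch of the amodal instance segmentation head wiring.
# Sub-module internals (mask heads, spatio-channel attention) are placeholders.
import torch
import torch.nn as nn

class AmodalInstanceHeadSketch(nn.Module):
    def __init__(self, visible_head, occluder_head, occlusion_head,
                 fuse_convs, sc_attention, amodal_head, visible_refine_head):
        super().__init__()
        self.visible_head = visible_head            # class-specific visible mask
        self.occluder_head = occluder_head          # class-agnostic occluder mask
        self.occlusion_head = occlusion_head        # occluded region of the target
        self.fuse_convs = fuse_convs                # convs over concatenated features
        self.sc_attention = sc_attention            # spatio-channel attention block
        self.amodal_head = amodal_head              # final amodal mask
        self.visible_refine_head = visible_refine_head  # refined visible (inmodal) mask

    def forward(self, roi_feats):
        # Each placeholder head is assumed to return (features, mask logits).
        vis_feats, vis_mask = self.visible_head(roi_feats)
        occ_feats, occluder_mask = self.occluder_head(roi_feats)

        # The occlusion head reasons over the visible and occluder features.
        occl_feats, occlusion_mask = self.occlusion_head(
            torch.cat([vis_feats, occ_feats], dim=1))

        # Fuse all three feature streams and apply spatio-channel attention.
        fused = self.fuse_convs(torch.cat([vis_feats, occ_feats, occl_feats], dim=1))
        attended = self.sc_attention(fused)

        amodal_feats, amodal_mask = self.amodal_head(attended)

        # Refine the visible mask using the concatenated amodal and visible features.
        refined_visible_mask = self.visible_refine_head(
            torch.cat([amodal_feats, vis_feats], dim=1))

        return amodal_mask, refined_visible_mask, occluder_mask, occlusion_mask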


The semantic head takes the x16 downsampled feature maps from stage 4 of the RegNet encoder as input. We employ a block identical to stage 5 of RegNet, with the dilation factor of its 3x3 convolutions set to 2, which we refer to as the dilated RegNet block. Subsequently, we employ a DPC module to process the output of the dilated block. We then upsample the output to the x8 and x4 downsampling factors using bilinear interpolation. After each upsampling stage, we concatenate the output with the corresponding features from the 2-way FPN at the same resolution and employ two 3x3 depth-wise separable convolutions to fuse the concatenated features. Finally, we use a 1x1 convolution to reduce the number of output channels to the number of semantic classes, followed by bilinear interpolation to upsample the output to the input image resolution.
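
The dataflow of the semantic head can be summarized with the following sketch. The dilated_block and dpc modules are left abstract, and the assumed dictionary keys ("x8", "x4") for the 2-way FPN outputs, as well as the channel widths, are illustrative assumptions rather than the actual configuration.

# Sketch of the semantic head dataflow: dilated RegNet block -> DPC ->
# progressive upsampling with 2-way FPN fusion -> per-pixel class logits.
# dilated_block and dpc are placeholders for the modules described above.
import torch
import torch.nn as nn
import torch.nn.functional as F

def separable_conv(in_ch, out_ch):
    # 3x3 depth-wise separable convolution used in the fusion steps.
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch),
        nn.Conv2d(in_ch, out_ch, 1),
        nn.ReLU(inplace=True),
    )

class SemanticHeadSketch(nn.Module):
    def __init__(self, dilated_block, dpc, fpn_channels, num_classes, channels=256):
        super().__init__()
        self.dilated_block = dilated_block  # stage-5-style RegNet block, dilation 2
        self.dpc = dpc                      # Dense Prediction Cell module
        self.fuse_x8 = nn.Sequential(
            separable_conv(channels + fpn_channels, channels),
            separable_conv(channels, channels))
        self.fuse_x4 = nn.Sequential(
            separable_conv(channels + fpn_channels, channels),
            separable_conv(channels, channels))
        self.classifier = nn.Conv2d(channels, num_classes, 1)

    def forward(self, stage4_feats, fpn_feats, image_size):
        x = self.dpc(self.dilated_block(stage4_feats))        # x16 resolution

        # Upsample to x8, fuse with the x8 FPN features.
        x = F.interpolate(x, size=fpn_feats["x8"].shape[-2:],
                          mode="bilinear", align_corners=False)
        x = self.fuse_x8(torch.cat([x, fpn_feats["x8"]], dim=1))

        # Upsample to x4, fuse with the x4 FPN features.
        x = F.interpolate(x, size=fpn_feats["x4"].shape[-2:],
                          mode="bilinear", align_corners=False)
        x = self.fuse_x4(torch.cat([x, fpn_feats["x4"]], dim=1))

        # Per-pixel class logits, upsampled to the input image resolution.
        logits = self.classifier(x)
        return F.interpolate(logits, size=image_size,
                             mode="bilinear", align_corners=False)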

KITTI-360-APS Dataset


We extend the KITTI-360 dataset, which provides semantic and instance labels, with amodal panoptic annotations, and name it the KITTI-360-APS dataset. It consists of nine sequences of urban street scenes with annotations for 61,168 images at a resolution of 1408x376 pixels. Our dataset comprises 10 stuff classes. We define a class as stuff if it has amorphous regions or is incapable of movement at any point in time. Road, sidewalk, building, wall, fence, pole, traffic sign, vegetation, terrain, and sky are the stuff classes. Further, the dataset consists of 7 thing classes, namely car, pedestrian, cyclist, two-wheeler, van, truck, and other vehicles. Please note that we merge the bicycle and motorcycle classes into a single two-wheeler class.
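
For reference, the class split described above can be written as a simple Python mapping. The dictionary layout and names are illustrative and do not reflect the dataset's official label format.

# Class split of the KITTI-360-APS dataset as a plain Python mapping;
# illustrative only, not the dataset's official label definition.
KITTI_360_APS_CLASSES = {
    "stuff": ["road", "sidewalk", "building", "wall", "fence", "pole",
              "traffic sign", "vegetation", "terrain", "sky"],
    "thing": ["car", "pedestrian", "cyclist", "two-wheeler", "van",
              "truck", "other vehicles"],
}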

License Agreement

The data is provided for non-commercial use only. By downloading the data, you accept the license agreement which can be downloaded here. If you report results based on the KITTI-360-APS dataset, please consider citing the paper mentioned in the Publications section.

BDD100K-APS Dataset


Our BDD100K-APS dataset extends the Berkeley DeepDrive (BDD100K) instance segmentation dataset with amodal instance and stuff semantic segmentation groundtruth labels. We provide amodal panoptic annotations for 10 stuff classes and 6 thing classes. The stuff classes include road, sidewalk, building, fence, pole, traffic sign, terrain, vegetation, and sky, while pedestrian, car, truck, rider, bicycle, and bus are the thing classes.

License Agreement

The data is provided for non-commercial use only. By downloading the data, you accept the license agreement which can be downloaded here. If you report results based on the BDD100K-APS dataset, please consider citing the paper mentioned in the Publications section.

Videos

Coming soon!

Code and Models

To be added

Publications

Rohit Mohan and Abhinav Valada
Amodal Panoptic Segmentation
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022.

(PDF) (BibTeX)


People