TAPVid-3D is a dataset and benchmark for evaluating the task of long-range Tracking Any Point in 3D (TAP-3D). The dataset consists of 4,000+ real-world videos and 2.1 million metric 3D point trajectories, spanning a variety of object types, motion patterns, and indoor and outdoor environments.
While point tracking in two dimensions (TAP-2D) has many benchmarks measuring performance on real-world videos, such as TAPVid-DAVIS [2], benchmarks for three-dimensional point tracking on real-world videos were lacking. To fill this gap, we built a new benchmark for 3D point tracking leveraging existing footage.
To measure performance on the TAP-3D task, we formulated a Jaccard-based metric to handle the complexities of ambiguous depth scales across models, occlusions, and multi-track spatio-temporal smoothness.
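As a rough illustration of how a Jaccard-style 3D tracking metric can be computed, the sketch below scores predicted tracks against ground truth at several distance thresholds and averages the per-threshold Jaccard scores. This is only a minimal sketch: the function name, threshold values, and array layout are assumptions, and it omits the depth-rescaling step the benchmark uses to handle ambiguous depth scales across models.

```python
import numpy as np

def jaccard_3d(gt_xyz, gt_visible, pred_xyz, pred_visible,
               thresholds=(0.1, 0.2, 0.4, 0.8, 1.6)):
    """Hypothetical sketch of a Jaccard-style metric for 3D point tracks.

    gt_xyz, pred_xyz: (num_tracks, num_frames, 3) metric 3D positions.
    gt_visible, pred_visible: (num_tracks, num_frames) boolean visibility.
    Returns the Jaccard score averaged over the distance thresholds.
    """
    # Euclidean distance between predicted and ground-truth 3D positions.
    dist = np.linalg.norm(gt_xyz - pred_xyz, axis=-1)
    jaccards = []
    for thr in thresholds:
        within = dist < thr
        # True positive: predicted visible, actually visible, and close enough.
        tp = np.sum(gt_visible & pred_visible & within)
        # False positive: predicted visible but not a correct visible match.
        fp = np.sum(pred_visible & ~(gt_visible & within))
        # False negative: actually visible but not correctly tracked as such.
        fn = np.sum(gt_visible & ~(pred_visible & within))
        jaccards.append(tp / max(tp + fp + fn, 1))
    return float(np.mean(jaccards))
```

A perfect prediction scores 1.0 at every threshold; a track that drifts far from the ground truth, or mispredicts visibility, is penalized as a false positive or false negative.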
In the paper, we assess the current state of the TAP-3D task by constructing competitive baselines using existing tracking models, such as SpatialTracker. You can read more and find out how to download and generate the data using the GitHub link above. We hope you'll find the benchmark useful!
| #clips | #trajs per clip | #frames per clip | #videos | #scenes | resolution | fps |
|---|---|---|---|---|---|---|
| 4569 | 50–1024 | 25–300 | 2828 | 255 | Multiple | 10 / 30 |
The annotations and code to generate TAPVid-3D are released under a slightly modified Apache 2.0 license, as described in the LICENSE file on GitHub. In particular, to use the code and annotations for a given data subset (Waymo Open, Aria Digital Twin, or Panoptic Studio), you must agree to and adhere to the license and terms of use of the corresponding source data and annotations.
1. Kubric: A scalable dataset generator is a data generation pipeline for creating semi-realistic synthetic multi-object videos with rich annotations such as instance segmentation masks, depth maps, and optical flow.
2. TAP-Vid: A Benchmark for Tracking Any Point in a Video builds an evaluation dataset with 2D points tracked across real videos.
3. Tracking Everything Everywhere All at Once presents a test-time optimization method for estimating dense and long-range motion from a video sequence.
4. PointOdyssey: A Large-Scale Synthetic Dataset for Long-Term Point Tracking proposes a large-scale synthetic dataset and data generation framework for 3D dynamic scenes.
5. SpatialTracker: Tracking Any 2D Pixels in 3D Space estimates point trajectories in 3D space and runs 2D tracking evaluation and 3D tracking evaluation on synthetic videos.