TAPVid-3D is a dataset and benchmark for evaluating the task of long-range Tracking Any Point in 3D (TAP-3D). The dataset consists of 4,000+ real-world videos and 2.1 million metric 3D point trajectories, spanning a variety of object types, motion patterns, and indoor and outdoor environments.
While point tracking in two dimensions (TAP-2D) has many benchmarks measuring performance on real-world videos, such as TAPVid-DAVIS [2], benchmarks for three-dimensional point tracking on real-world videos were lacking. To fill this gap, we built a new benchmark for 3D point tracking leveraging existing footage.
To measure performance on the TAP-3D task, we formulated a Jaccard-based metric to handle the complexities of ambiguous depth scales across models, occlusions, and multi-track spatio-temporal smoothness.
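As a rough illustration of how a Jaccard-style 3D tracking metric can be computed, the sketch below scores predicted tracks against ground truth at several distance thresholds and averages the per-threshold Jaccard scores. This is only a minimal sketch: the function name, threshold values, and array layout are assumptions, and it omits the depth-rescaling step the benchmark uses to handle ambiguous depth scales across models.

```python
import numpy as np

def jaccard_3d(gt_xyz, gt_visible, pred_xyz, pred_visible,
               thresholds=(0.1, 0.2, 0.4, 0.8, 1.6)):
    """Hypothetical sketch of a Jaccard-style metric for 3D point tracks.

    gt_xyz, pred_xyz: (num_tracks, num_frames, 3) metric 3D positions.
    gt_visible, pred_visible: (num_tracks, num_frames) boolean visibility.
    Returns the Jaccard score averaged over the distance thresholds.
    """
    # Euclidean distance between predicted and ground-truth 3D positions.
    dist = np.linalg.norm(gt_xyz - pred_xyz, axis=-1)
    jaccards = []
    for thr in thresholds:
        within = dist < thr
        # True positive: predicted visible, actually visible, and close enough.
        tp = np.sum(gt_visible & pred_visible & within)
        # False positive: predicted visible but not a correct visible match.
        fp = np.sum(pred_visible & ~(gt_visible & within))
        # False negative: actually visible but not correctly tracked as such.
        fn = np.sum(gt_visible & ~(pred_visible & within))
        jaccards.append(tp / max(tp + fp + fn, 1))
    return float(np.mean(jaccards))
```

A perfect prediction scores 1.0 at every threshold; a track that drifts far from the ground truth, or mispredicts visibility, is penalized as a false positive or false negative.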
In the paper, we assess the current state of the TAP-3D task by constructing competitive baselines using existing tracking models, such as SpatialTracker. You can read more and find out how to download and generate the data using the GitHub link above. We hope you'll find the benchmark useful!
| #clips | #trajs per clip | #frames per clip | #videos | #scenes | resolution | fps |
|---|---|---|---|---|---|---|
| 4569 | 50–1024 | 25–300 | 2828 | 255 | Multiple | 10 / 30 |
The annotations and code to generate TAPVid-3D are released under a slightly modified Apache 2.0 license, as described in the LICENSE file on GitHub. In particular, to use the code and annotations for a given data subset (Waymo Open, Aria Digital Twin, or Panoptic Studio), you must agree to and adhere to the license and terms of use of the corresponding source data and annotations.
1. Kubric: A scalable dataset generator is a data generation pipeline for creating semi-realistic synthetic multi-object videos with rich annotations such as instance segmentation masks, depth maps, and optical flow.
2. TAP-Vid: A Benchmark for Tracking Any Point in a Video builds an evaluation dataset with 2D points tracked across real videos.
3. Tracking Everything Everywhere All at Once presents a test-time optimization method for estimating dense and long-range motion from a video sequence.
4. PointOdyssey: A Large-Scale Synthetic Dataset for Long-Term Point Tracking proposes a large-scale synthetic dataset and data generation framework for 3D dynamic scenes.
5. SpatialTracker: Tracking Any 2D Pixels in 3D Space estimates point trajectories in 3D space and runs 2D tracking evaluation and 3D tracking evaluation on synthetic videos.