We present our latest TLD system, designed for tracking rigid 3D objects in video. The system combines neural networks with a 3D model and achieves a promising trade-off between precision and robustness, all without camera calibration.

Stages


The system operates in three stages. First, the object is detected across multiple scales. The second stage locates object parts within these detections. The third stage performs data association and aligns a 3D model to the located parts using an L1 loss.
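
To illustrate the third stage, here is a minimal sketch of aligning 3D model points to 2D part locations under an L1 loss. It is not the actual TLD3 implementation. It assumes that data association is already done (row i of the model points matches row i of the image points), that an uncalibrated affine camera is a reasonable stand-in for "without camera calibration", and that iteratively reweighted least squares (IRLS) is an acceptable solver. The function name and parameters are hypothetical.

```python
import numpy as np


def fit_affine_camera_l1(model_pts, image_pts, iters=100, eps=1e-6):
    """Fit a 2x4 affine camera P that approximately minimizes
    sum_i |P @ [X_i, 1] - x_i|_1 using IRLS.

    model_pts: (n, 3) 3D model points (assumed already associated).
    image_pts: (n, 2) detected 2D part locations.
    """
    n = model_pts.shape[0]
    X = np.hstack([model_pts, np.ones((n, 1))])  # homogeneous, (n, 4)

    # Ordinary least-squares initialisation.
    P = np.linalg.lstsq(X, image_pts, rcond=None)[0].T  # (2, 4)

    for _ in range(iters):
        residuals = image_pts - X @ P.T  # (n, 2)
        for k in range(2):  # the two image coordinates decouple
            # L1 weights: 1 / |residual|, floored by eps for stability.
            w = 1.0 / np.maximum(np.abs(residuals[:, k]), eps)
            sw = np.sqrt(w)
            P[k] = np.linalg.lstsq(
                X * sw[:, None], image_pts[:, k] * sw, rcond=None
            )[0]
    return P


# Synthetic usage: a few grossly wrong part locations should barely
# affect the L1 fit, which is the motivation for using it over L2.
rng = np.random.default_rng(0)
model = rng.normal(size=(30, 3))
P_true = np.array([[100.0, 5.0, -20.0, 320.0],
                   [-8.0, 90.0, 15.0, 240.0]])
observed = np.hstack([model, np.ones((30, 1))]) @ P_true.T
observed[:3] += 80.0  # simulate three mislocated parts
P_est = fit_affine_camera_l1(model, observed)
print(np.abs(P_est - P_true).max())
```

The L1 loss matters here because part detections occasionally land far from the true location. A squared loss would let those outliers pull the model away from the object, while the L1 fit stays close to the majority of consistent parts.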

Attributes


Our system can also estimate additional attributes of the object, such as the name of the team. This information can be presented as text, or as an additional 3D object that accurately follows the motion of the car.

CG Bridge


The live video stream can be paused at any point, and the view can instantly zoom into the CG world. Here it is possible to point out specific details of the car or give more context about the scene.

Generality


The system can be applied to any rigid object; in this case, a toy model of a Porsche 911.

The following video shows the system in action.

Human Head
2017-01-17
video
Our journey to TLD3 started with a system called HeadTLD, which was our first attempt to solve object detection and pose estimation simultaneously. Our work was recognized by NVIDIA as part of the Inception Program Contest.

Porsche 911
2017-10-20
video
Later on, we switched to a different target, more rigid but also much faster: the Porsche 911. This revealed a number of new challenges, mostly related to mutual occlusion. It also clearly indicated that the quality of the 3D model affects the quality of the tracking, so accurate modeling of the target is our preference going forward.

The highest class of auto racing
2017-10-20
video
The next logical step was to apply the system to something even faster: the most prestigious motor race in the world. Improvements made for this use case resulted in the final version of TLD3.0.

Keyboard
2017-11-24
video
A proof of concept for indoor AR/VR applications that require high robustness for certain types of objects. The system was trained to estimate the 6 DoF pose of a keyboard. Along the way, we developed a marker-based tool that captures training data within several minutes for any object that fits on our work desk.

[Drone]
Under development.