We present our latest TLD system designed tracking of rigid 3D objects in video. The system is based on a
combination of neural networks and a 3D model and achieves promising trade-off
between precision and robustness, all without camera calibration.
Stages
The system operates in 3 stages. First, the object location is detected across multiple scales. The second
stage locates object parts on these detections. The third stage perform data association and aligns a 3D model
using L1 loss.
Attributes
Our system also allows to estimate additional attributes about the object, such as the name of the team. This
information can be presented as a text, or as additional 3D object which accurately follows the motion of the
car.
CG Bridge
The live video stream can be paused at any point and instantly zoom in into the CG world. Here it is possible
to point out to some specific details about the car, or give more context about the scene.
Generality
The system can be applied to any rigid object. In this case, a toy model of Porsche 911.
Our journey to TLD3 started with a system called HeadTLD which was our first attempt to simultaneously solve
object detection and pose estimation. Our work was recognized by NVIDIA as part of Inception
Program Contest.
Later on, we switched to a different target, more rigid but also much faster -- Porsche 911. This revealed a
number of new challenges; mostly related to mutual occlusion. But also clearly indicated that the quality of 3D
model impacts the quality of the tracking. So accurate modeling of the target is our preference for the future.
The next logical step was to apply the system to something even faster -- The most prestigious motor race in the
world. Improvements done on this use case resulted in the final version of TLD3.0.
A proof of concept for indoor AR/VR applications which require high robustness for certain
types of objects. The system was trained to estimate 6 DoF pose of a keyboard. On the way we developed a
marker-based tool which allows to capture training data within several minutes for any object we can put on our
work desk.