xTLD Pipeline: Detection component

This blog post describes the detection component of xTLD, a technology that received an award in the NVIDIA Inception Program contest. xTLD is still under development.
2017-01-11, Zdenek Kalal, TLD Vision

Submitted video (examples)

Object detection is a key technology in computer vision, the field of AI that deals with visual perception. While current research focuses on detecting a large number of object classes, industrial problems often require detecting only a single object class, but with significantly higher robustness and accuracy.

To meet this demand, we spent the past year developing a technology codenamed xTLD. Like other modern detection methods, xTLD uses deep neural networks for learning and inference, but instead of spreading the network's capacity across many classes, it concentrates it on a single class. This choice increases detection robustness and speed, and it also frees capacity for additional visual intelligence. In particular, we simultaneously estimate the 3D pose of the object, which is a challenging task in its own right.

xTLD is general and can be adapted to many object classes that are worth detecting in practice. We start with the human head and refer to this instance as HeadTLD. In the near future, we plan to apply xTLD to other classes (e.g. cars, pedestrians), depending on actual demand.

Current features

This image shows the accuracy of the alignment. The face is detected and aligned regardless of its rotation, even when it covers an area of just 20x20 pixels.

Training data


Our technology builds on relatively low-level libraries to keep the system as flexible as possible.


Example detections

Target applications

Our prize has arrived. Thank you, NVIDIA!