Object Detection

Object detection is a computer vision task that identifies and localises objects within an image or video frame by predicting a bounding box and a class label for each one, so that a system knows both what objects are present and where they are.
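As a rough illustration of what a detector returns, the sketch below models one detection as a label, a confidence score, and a box, and computes intersection over union (IoU), a common measure of how well two boxes overlap. The `Detection` class, the corner-based box format, and the `iou` helper are assumptions made for this example, not part of any particular library.

```python
from dataclasses import dataclass
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed format


@dataclass(frozen=True)
class Detection:
    """One detector output: what the object is, how confident, and where."""
    label: str
    score: float
    box: Box


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes (0.0 if disjoint)."""
    # Corners of the overlapping rectangle, if there is one.
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


predicted = Detection(label="ball", score=0.92, box=(0.0, 0.0, 10.0, 10.0))
ground_truth: Box = (5.0, 0.0, 15.0, 10.0)
overlap = iou(predicted.box, ground_truth)  # 50 / 150, i.e. one third
```

Other detectors express boxes as centre, width, and height instead; the overlap computation is the same once the boxes are converted to corners.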

Object detection has evolved from hand-crafted features (HOG + SVM) through two-stage detectors (Faster R-CNN) to single-stage architectures (YOLO, SSD) and most recently to end-to-end detection transformers (DETR, RT-DETR).

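Detection transformers remove the need for hand-tuned components such as anchor boxes and non-maximum suppression (NMS). They predict a fixed-size set of detections in parallel, and during training a set-based loss uses Hungarian matching to assign each ground-truth object to exactly one prediction, so duplicate predictions are penalised during training rather than filtered out afterwards. RT-DETR (Real-Time Detection Transformer) reports speed comparable to YOLO-family detectors with higher accuracy, and is open-source under the Apache 2.0 licence.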
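The one-to-one matching behind the set-based loss can be sketched as follows. This is a simplified illustration under stated assumptions: the cost combines only a class-probability term and an L1 box distance (published DETR variants typically add a generalised-IoU term), and the optimal assignment is found by brute force over permutations rather than with the Hungarian algorithm, which returns the same assignment far more efficiently. The function names and the example data are hypothetical.

```python
from itertools import permutations


def l1_distance(box_a, box_b):
    """Sum of absolute coordinate differences between two boxes."""
    return sum(abs(a - b) for a, b in zip(box_a, box_b))


def match_cost(pred, gt, box_weight=1.0):
    """Lower is better: high probability for the true class, small box error."""
    class_probs, pred_box = pred
    gt_label, gt_box = gt
    return -class_probs.get(gt_label, 0.0) + box_weight * l1_distance(pred_box, gt_box)


def match_predictions(preds, gts):
    """Return (pred_index, gt_index) pairs with minimum total cost.

    Brute force stands in for the Hungarian algorithm here, so this is
    only practical for a handful of objects.
    """
    assert len(preds) >= len(gts), "need at least one prediction per object"
    best_cost, best_perm = float("inf"), ()
    for perm in permutations(range(len(preds)), len(gts)):
        cost = sum(match_cost(preds[p], gts[g]) for g, p in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return [(p, g) for g, p in enumerate(best_perm)]


# Three predictions ("queries") and two ground-truth objects.
# Boxes are (cx, cy, w, h) in normalised image coordinates.
preds = [
    ({"ball": 0.9, "player": 0.1}, (0.50, 0.50, 0.10, 0.10)),
    ({"ball": 0.2, "player": 0.7}, (0.30, 0.40, 0.20, 0.50)),
    ({"ball": 0.1, "player": 0.1}, (0.80, 0.80, 0.10, 0.10)),
]
gts = [
    ("player", (0.32, 0.41, 0.20, 0.48)),
    ("ball", (0.52, 0.50, 0.10, 0.10)),
]
matches = match_predictions(preds, gts)  # [(1, 0), (0, 1)]
```

Prediction 2 is left unmatched; in DETR-style training, unmatched predictions are pushed towards a "no object" class, which is why no NMS step is needed at inference. For realistic sizes, `scipy.optimize.linear_sum_assignment` solves the same assignment problem on a cost matrix.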

In production, object detection models must handle diverse conditions: varying lighting, camera angles, partial occlusion, and class imbalance. Deployment considerations include model quantisation (INT8, FP16), batch inference, and integration with tracking systems for video applications.
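To make the INT8 option concrete, the sketch below shows the arithmetic of symmetric per-tensor quantisation: floats are mapped to integer codes in [-127, 127] through a single scale factor, then mapped back. It is a minimal illustration only; the function names are hypothetical, and real deployment toolchains typically add calibration data, per-channel scales, and hardware-specific kernels.

```python
def quantize_int8(values):
    """Map floats to int8 codes using one shared scale (symmetric scheme)."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # The largest magnitude maps to 127; guard against an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from int8 codes."""
    return [c * scale for c in codes]


weights = [0.5, -1.27, 0.0, 1.0]
codes, scale = quantize_int8(weights)   # codes: [50, -127, 0, 100]
restored = dequantize(codes, scale)
# Rounding error per value is at most half a quantisation step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

The trade-off is that each weight now takes one byte instead of four, at the cost of an error bounded by `scale / 2` per value; whether detection accuracy survives that is an empirical question for each model.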

Datameister deploys detection transformers for real-time applications in sports analytics, industrial inspection, and autonomous systems, typically as the first stage of a larger visual intelligence pipeline.

Related Capabilities

Computer Vision

From the Blog

Why DETRs are replacing YOLOs for real-time object detection

Lab Perception

Detection Transformers (DETRs) have matured into real-time capable object detectors, rivaling YOLOs in both speed and accuracy. Despite early challenges, advancements like deformable attention, denoising training, and top-k query selection paved the way for the first real-time Detection Transformer, RT-DETR, introduced by a team of Baidu researchers and published at CVPR 2024. Recent innovations like D-FINE's fine-grained localization and DEIMv2's foundation-model backbones push accuracy even further. Additionally, these DETR models and their weights are released under the permissive Apache 2.0 License, enabling free use and commercial adaptation. At Datameister, we integrate these models into our vision library to build high-performance, adaptable, production-ready detection systems for complex, specific problems.

Larsen D'hiet, November 21, 2025

Related Terms

Computer Vision, MLOps, LiDAR
