How to Combine YOLO Object Detection, Object Tracking, and Drone Control for Autonomous Navigation

Fedir Shumko
5 min read · Oct 4, 2024


Drone Control for Autonomous Navigation

Autonomous drones equipped with computer vision capabilities are revolutionizing industries ranging from aerial photography and surveillance to disaster relief and agriculture. By combining YOLO (You Only Look Once) object detection, object tracking, and drone control, you can create a smart, autonomous drone that detects, tracks, and follows objects automatically.

This article will guide you through the steps required to merge these technologies, discuss key components of the system, and provide sample code that demonstrates how these components work together.

Components of the System

There are three major parts to creating a drone that can autonomously follow objects:

  1. Object Detection with YOLO: The first step is to detect objects in real time using a pre-trained YOLO model, which excels at identifying objects in individual frames with high speed and accuracy.
  2. Object Tracking: Once objects are detected, a tracking algorithm helps maintain consistency by predicting the object’s position in future frames. This reduces noise from object detection errors and provides smoother control input to the drone.
  3. Drone Control: The final piece is using the detected and tracked object’s position to generate movement commands that allow the drone to follow or approach the object.

Step 1: Object Detection with YOLO

YOLO (You Only Look Once) is a state-of-the-art deep learning model for real-time object detection. YOLO models, particularly the later versions like YOLOv5 or YOLOv8, can be trained on custom datasets or used out-of-the-box to detect common objects like people, cars, or animals.

Here’s how YOLO works in our drone system:

  • The drone streams a video feed using its onboard camera.
  • Each frame of the video is passed to the YOLO model.
  • YOLO detects objects within the frame, providing bounding boxes and class labels for each object.

Example of YOLO object detection code:

import cv2
import torch

# Load YOLO model (assume YOLOv5 is used)
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')

# Get a video feed frame (a generic OpenCV capture here; with a drone
# you would read from its stream instead)
cap = cv2.VideoCapture(0)
ret, frame = cap.read()

# Run inference (YOLOv5 expects RGB, while OpenCV delivers BGR)
results = model(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))

# Extract object detection results (bounding boxes, labels, confidence)
detections = results.xyxy[0]  # x1, y1, x2, y2, confidence, class
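
If you only care about one class of object — say, the people that the end-to-end demo later in this article follows — you can filter the detections by class index (0 is “person” in the COCO classes YOLOv5s is trained on). The 0.5 confidence threshold below is just an illustrative choice:

# Keep only 'person' detections (COCO class 0) above a confidence threshold
persons = detections[(detections[:, 5] == 0) & (detections[:, 4] > 0.5)]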

Step 2: Object Tracking

Object detection only works on a frame-by-frame basis, which can result in jumpy behavior if objects move quickly or if detection confidence fluctuates. By combining YOLO with a tracking algorithm, such as SORT (Simple Online and Realtime Tracking) or DeepSORT, the drone can track objects over time.

Tracking predicts where the object will be in future frames, maintaining continuity and making drone navigation smoother.

The key steps in object tracking are:

  • Initialize the tracker: After YOLO detects an object, the bounding box coordinates are passed to a tracker.
  • Update the tracker: In subsequent frames, the tracker adjusts the bounding box to account for the object’s movement, even if YOLO does not detect the object in a particular frame.

Here’s an example of integrating object tracking:

# Initialize SORT object tracker
from sort import Sort

tracker = Sort()

# SORT expects an (N, 5) NumPy array of [x1, y1, x2, y2, score],
# so drop the class column from the YOLO output first
dets = detections[:, :5].cpu().numpy()

# Update the tracker; each returned row is [x1, y1, x2, y2, track_id]
tracked_objects = tracker.update(dets)

Now, tracked_objects will contain the predicted bounding boxes for the objects across multiple frames, each tagged with a persistent track ID, which the drone can use to plan its movements.
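
The track IDs make it easy to stay locked onto a single target even when several objects are in view. Here is a minimal sketch of that idea (the lock-on logic is an illustrative assumption, not part of SORT itself):

# Lock onto the first track we see and ignore all other objects afterwards
target_id = None
target_box = None

for x1, y1, x2, y2, track_id in tracked_objects:
    if target_id is None:
        target_id = track_id   # adopt the first tracked object as the target
    if track_id == target_id:
        target_box = (x1, y1, x2, y2)  # box the controller will follow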

Step 3: Drone Control Using Detected and Tracked Data

Once the drone has detected and tracked an object, the next step is to move toward or follow the object. To do this, we must translate the position of the tracked object into movement commands for the drone.

Control Strategy

The basic idea is to calculate the error between the object’s current position in the camera frame and the center of the frame, which represents the drone’s point of view. The error is then used to adjust the drone’s position so that it moves toward the object.

For example:

  • If the object is to the left of the frame center, the drone should move left.
  • If the object is higher than the center, the drone should ascend.
  • If the object is too far or too close, the drone adjusts its forward or backward speed.

This approach often uses Proportional Control (P-Control) or more advanced controllers like PID (Proportional-Integral-Derivative) to ensure smooth and stable movement.

Here is an example of how the control logic can be implemented:

# Get the drone's camera frame dimensions
frame_height, frame_width = frame.shape[:2]
frame_center_x = frame_width / 2
frame_center_y = frame_height / 2

# Get the center of the tracked object's bounding box (x1, y1, x2, y2)
object_center_x = (tracked_object[0] + tracked_object[2]) / 2
object_center_y = (tracked_object[1] + tracked_object[3]) / 2

# Calculate the error between the object's position and the frame center
error_x = object_center_x - frame_center_x
error_y = object_center_y - frame_center_y

# Apply proportional control to determine the drone's movement
Kp = 0.1  # proportional gain
speed_x = int(Kp * error_x)    # left/right velocity
speed_y = int(-Kp * error_y)   # up/down velocity (image y grows downward)

# Clamp to the Tello's accepted velocity range of [-100, 100]
speed_x = max(-100, min(100, speed_x))
speed_y = max(-100, min(100, speed_y))

# Send control commands to the drone (assuming the DJI Tello SDK, whose
# arguments are left/right, forward/backward, up/down, and yaw velocity)
drone.send_rc_control(speed_x, 0, speed_y, 0)

In this example:

  • speed_x controls the left-right movement of the drone.
  • speed_y controls the up-down movement; its sign is flipped because image y-coordinates grow downward, so an object above the frame center produces a positive climb velocity.

You can expand this control logic to include forward-backward speed (for example, using the bounding-box area as a rough proxy for distance) and yaw adjustment, ensuring the drone orients itself toward the object and keeps it at a steady range.
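
If plain P-control oscillates or drifts, the same structure extends naturally to the PID controller mentioned above. The class below is a minimal sketch, and the gains are illustrative placeholders that would need tuning on real hardware:

import time

class PID:
    """Minimal PID controller: output = Kp*e + Ki*integral(e) + Kd*de/dt."""
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None
        self.prev_time = None

    def update(self, error):
        now = time.monotonic()
        if self.prev_time is None:
            dt, derivative = 0.0, 0.0
        else:
            dt = now - self.prev_time
            derivative = (error - self.prev_error) / dt if dt > 0 else 0.0
        self.integral += error * dt
        self.prev_error, self.prev_time = error, now
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# One controller per axis; gains are placeholder values
pid_x = PID(kp=0.1, ki=0.01, kd=0.05)
speed_x = max(-100, min(100, int(pid_x.update(error_x))))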

Tools and Frameworks

To implement YOLO-based object detection, object tracking, and drone control, you need a combination of tools and libraries:

  1. YOLOv5 or YOLOv8: For object detection, available through frameworks like PyTorch or OpenCV.
  2. SORT or DeepSORT: For object tracking. These algorithms can be integrated with Python and work seamlessly with YOLO detections.
  3. DJI Tello SDK: If you’re using a DJI Tello drone, the Tello SDK provides an easy-to-use interface for sending control commands to the drone.
  4. DroneKit or ROS: For more advanced drone control (e.g., if you’re using drones with ArduPilot or PX4); a minimal DroneKit sketch follows below.
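
To give a flavor of the DroneKit route, here is a sketch of sending a body-frame velocity setpoint over MAVLink, following the pattern from the DroneKit guide. The connection string is an assumption (it points at a local SITL simulator) and would change for real hardware:

from dronekit import connect
from pymavlink import mavutil

# Connect to the vehicle (this string assumes a local SITL simulator)
vehicle = connect('udp:127.0.0.1:14550', wait_ready=True)

def send_velocity(vx, vy, vz):
    """Send a body-frame velocity setpoint in m/s."""
    msg = vehicle.message_factory.set_position_target_local_ned_encode(
        0, 0, 0,                                    # time_boot_ms, target system/component
        mavutil.mavlink.MAV_FRAME_BODY_OFFSET_NED,  # velocities relative to heading
        0b0000111111000111,                         # type_mask: only velocities enabled
        0, 0, 0,                                    # x, y, z positions (ignored)
        vx, vy, vz,                                 # x, y, z velocities
        0, 0, 0,                                    # accelerations (ignored)
        0, 0)                                       # yaw, yaw_rate (ignored)
    vehicle.send_mavlink(msg)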

Example End-to-End System

The following code snippet shows a simplified end-to-end system that detects, tracks, and follows a person using a DJI Tello drone and YOLOv5:

import cv2
import torch
from djitellopy import Tello
from sort import Sort

# Initialize the drone, YOLO model, and tracker
drone = Tello()
drone.connect()
drone.streamon()
drone.takeoff()

model = torch.hub.load('ultralytics/yolov5', 'yolov5s')
tracker = Sort()

Kp = 0.1  # proportional gain

while True:
    # Get frame from the drone's camera
    frame = drone.get_frame_read().frame

    # Detect objects (YOLOv5 expects RGB; the Tello frame is typically BGR)
    results = model(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))

    # Keep only 'person' detections (COCO class 0) and update the tracker
    det = results.xyxy[0]
    det = det[det[:, 5] == 0][:, :5].cpu().numpy()
    tracked_objects = tracker.update(det)

    # Control logic (move drone based on object center and frame center)
    if len(tracked_objects):
        x1, y1, x2, y2 = tracked_objects[0][:4]  # follow the first track
        object_center_x = (x1 + x2) / 2
        object_center_y = (y1 + y2) / 2

        frame_center_x = frame.shape[1] / 2
        frame_center_y = frame.shape[0] / 2

        error_x = object_center_x - frame_center_x
        error_y = object_center_y - frame_center_y

        # Proportional control, clamped to the Tello's [-100, 100] range
        speed_x = max(-100, min(100, int(Kp * error_x)))
        speed_y = max(-100, min(100, int(-Kp * error_y)))  # image y grows downward

        drone.send_rc_control(speed_x, 0, speed_y, 0)
    else:
        # No target in view: hover in place
        drone.send_rc_control(0, 0, 0, 0)

    # Show the feed; exit on 'q'
    cv2.imshow('Tello', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Stop moving, land the drone, and close the stream
drone.send_rc_control(0, 0, 0, 0)
drone.land()
drone.streamoff()
cv2.destroyAllWindows()

Result

Combining YOLO object detection, object tracking, and drone control allows you to build autonomous drones capable of detecting and following objects in real time. With advances in deep learning and affordable drone hardware, these systems are increasingly accessible to hobbyists and professionals alike.

By following this guide and using the sample code provided, you can develop your own smart drone that autonomously navigates and interacts with its environment in exciting ways. Whether you’re building an intelligent surveillance system or simply experimenting with autonomous flight, this setup provides a solid foundation for further innovation.
