
Computer Vision – How YOLO Works

Introduction

This post discusses the steps involved in using YOLO. It builds on an earlier post on YOLO, which you can access at this link: Computer Vision – YOLO 11 for Object Detection in a Video Feed. Here, we show a workflow for using YOLO to build a computer vision model. A follow-up post will cover how to use Label-Studio to annotate images for YOLO, so keep in touch to get notified.

The figure below illustrates the general process flow for using YOLO at a high level. The specific details and architectures vary with the YOLO version (e.g., YOLOv3, YOLOv4, YOLOv5, YOLOv7) and the implementation. We discuss each step in further detail below.

Discussion of the Steps

  1. Input Image: The original image is the starting point for the YOLO process. It is usually either a static image or a frame from a video stream.

  2. Preprocessing: The image undergoes preprocessing steps to prepare it for the neural network. This typically includes:

    • Resizing: The image is resized to a fixed size that the YOLO model expects.
    • Normalization: Pixel values are normalized (e.g., scaled to a range between 0 and 1) so that inputs are on a consistent scale for training and inference.
  3. Feature Extraction (Backbone): A convolutional neural network (CNN) backbone (e.g., Darknet, ResNet) extracts features from the preprocessed image. The backbone is often pre-trained on a large dataset (such as ImageNet) so that it has already learned general image features.

  4. Feature Maps: The backbone produces a set of feature maps that represent the image at different scales and levels of abstraction. These feature maps capture spatial information and object characteristics.

  5. Detection (Head): The detection head of YOLO processes the feature maps to:

    • Bounding Box Regression: Predict the location and size of bounding boxes around potential objects.
    • Class Prediction: Predict the class label (e.g., person, car, bicycle) for each detected object.
    • Confidence Score Prediction: Predict a confidence score indicating how likely it is that a bounding box contains an actual object.
  6. Postprocessing: The raw outputs from the detection head are further processed:

    • Non-Maximum Suppression (NMS): NMS eliminates duplicate bounding boxes for the same object. Among boxes that overlap heavily, it keeps the one with the highest confidence score and discards the rest.
  7. Output: The final output of the YOLO process consists of:

    • Bounding boxes around detected objects.
    • Class labels for each detected object.
    • Confidence scores for each detection.
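
To make the preprocessing step (step 2 above) more concrete, here is a minimal sketch in plain Python. It is an illustration only, not how any particular YOLO implementation does it: the function names (resize_nearest, normalize, preprocess), the toy 2x2 image and the 4x4 target size are all assumptions made for this example. Real pipelines normally rely on an image library for this, and many YOLO implementations pad the image to preserve its aspect ratio rather than stretching it as done here.

```python
def resize_nearest(image, out_h, out_w):
    """Resize a grid of pixels using nearest-neighbour sampling."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def normalize(image):
    """Scale 8-bit channel values from the range 0..255 to 0.0..1.0."""
    return [[tuple(ch / 255.0 for ch in px) for px in row] for row in image]


def preprocess(image, size=(4, 4)):
    """Resize to the fixed size the model expects, then normalize."""
    return normalize(resize_nearest(image, size[0], size[1]))


# A toy 2x2 RGB image (hypothetical data): black, red, green, white.
image = [
    [(0, 0, 0), (255, 0, 0)],
    [(0, 255, 0), (255, 255, 255)],
]

tensor = preprocess(image, size=(4, 4))
print(len(tensor), len(tensor[0]))  # 4 4
print(tensor[0][3])                 # (1.0, 0.0, 0.0)
```

The order matters less than the result: whatever the input resolution, the model receives a grid of a fixed size whose values all lie between 0 and 1.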

Key Concepts

Here are a few important concepts to note:

  • Backbone: The CNN used for feature extraction.
  • Head: The part of YOLO that performs detection (bounding box regression, class prediction, and confidence score prediction).
  • Bounding Box: A rectangular region that encloses a detected object.
  • Class Label: The category or type of the detected object.
  • Confidence Score: A measure of how certain YOLO is that a bounding box contains a valid object.
  • Non-Maximum Suppression (NMS): A technique for removing redundant bounding boxes.
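
The bounding box, confidence score and NMS concepts above fit together in a few lines of code. The sketch below is a simplified, plain-Python version of greedy NMS and is not taken from any YOLO codebase. The (x1, y1, x2, y2) box format, the threshold values of 0.25 and 0.5, the per-class comparison and the sample detections are all assumptions chosen for illustration. Overlap is measured with Intersection over Union (IoU), the area shared by two boxes divided by the area they cover together.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(detections, conf_threshold=0.25, iou_threshold=0.5):
    """Greedy NMS over a list of (box, score, label) tuples."""
    # Drop low-confidence detections, then visit the rest from most
    # to least confident.
    candidates = sorted(
        (d for d in detections if d[1] >= conf_threshold),
        key=lambda d: d[1],
        reverse=True,
    )
    kept = []
    for box, score, label in candidates:
        # Keep a box only if it does not heavily overlap an already
        # kept box of the same class.
        if all(label != k[2] or iou(box, k[0]) <= iou_threshold for k in kept):
            kept.append((box, score, label))
    return kept


# Hypothetical raw detections: (box, confidence score, class label).
detections = [
    ((10, 10, 110, 110), 0.90, "person"),
    ((12, 12, 112, 112), 0.75, "person"),  # near-duplicate of the first
    ((200, 50, 300, 150), 0.60, "car"),
    ((205, 55, 305, 155), 0.10, "car"),    # below the confidence threshold
]

kept = nms(detections)
for box, score, label in kept:
    print(label, score, box)
```

With these sample values, only the 0.90 person box and the 0.60 car box remain. Comparing boxes only within the same class is one common design choice; a class-agnostic variant would simply drop the label check.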

 

Start Learning Python Now

If you would like to learn how to use YOLO someday, why not start by learning the Python programming language now? You can register on our (currently) free course. It introduces the basic concepts you will need to get started with computer programming and, especially, with the exciting world of Artificial Intelligence (AI) and Computer Vision.

Start small, take your time and learn the concepts. That’s how we started, and it paid off by enabling us to assimilate and entrench the basic concepts that we need for learning to code.

You can register, log in, read through the concepts, read the exercises and solutions, and take a break (quite important when learning programming). You can also ask the expert tutor for guidance. Get onto the course now by clicking on this link: Python for Beginners
