Creating Datasets for YOLOv8
Before labeling, you need a dataset of images containing the objects you want to detect.
Ensure a diverse dataset (different backgrounds, angles, lighting).
Use at least 1,000–5,000 images for decent results.
Resize images (if necessary) to a standard resolution (e.g., 640x640).
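If you do need to resize, a minimal sketch using Pillow (the folder names here are placeholders, not from the original guide):

```python
from pathlib import Path
from PIL import Image

SRC = Path("raw_images")       # placeholder: folder with original images
DST = Path("resized_images")   # placeholder: output folder
DST.mkdir(parents=True, exist_ok=True)

for img_path in SRC.glob("*.jpg"):
    img = Image.open(img_path)
    img = img.resize((640, 640))       # match the training image size
    img.save(DST / img_path.name)
```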
To train a YOLOv8 model, you need to annotate your images with bounding boxes.
You can use various tools for manual labeling of your dataset. The typical workflow is:
Open the labeling tool and load your images.
Draw bounding boxes around each object.
Assign a class label to each object.
Save the annotations in YOLO format.
Each image should have a corresponding .txt file with the same name, containing one line per object in this format: class_id x_center y_center width height, where:
class_id: Index of the object class (starting from 0).
x_center, y_center: Bounding box center (normalized, 0 to 1).
width, height: Bounding box dimensions (normalized).
Example for a cat (class 0) and dog (class 1):
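The coordinate values below are illustrative:

```
0 0.45 0.55 0.30 0.40
1 0.70 0.60 0.25 0.35
```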
💡 Tip: Always normalize bounding box coordinates relative to image size.
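If your labeling tool exports pixel coordinates, a minimal conversion sketch (the function name and example values are illustrative):

```python
def to_yolo(x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space box to normalized YOLO format."""
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return x_center, y_center, width, height

# Example: a 320x400 px box at (100, 80) in a 1280x720 image
print(to_yolo(100, 80, 420, 480, 1280, 720))
```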
Your dataset should be structured as follows:
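A common layout (folder and file names follow the usual Ultralytics convention; adjust paths to your project):

```
dataset/
├── images/
│   ├── train/
│   └── val/
├── labels/
│   ├── train/
│   └── val/
└── data.yaml
```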
Create a data.yaml file that defines your dataset:
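A minimal sketch, assuming the folder layout above (paths are placeholders):

```yaml
path: dataset           # dataset root
train: images/train     # training images (relative to path)
val: images/val         # validation images
nc: 2                   # number of classes
names: ["cat", "dog"]   # class names
```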
Replace "cat", "dog"
with your actual class names.
Make sure you have installed Ultralytics YOLOv8:
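The standard installation is via pip:

```bash
pip install ultralytics
```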
Run the following command to train your model:
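A minimal sketch using the Ultralytics Python API with the parameters described below (the data.yaml path assumes the layout above):

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")   # pretrained nano checkpoint
model.train(
    data="data.yaml",        # dataset definition created above
    epochs=50,
    imgsz=640,
    batch=16,
    device="cuda",           # or "cpu" if no GPU is available
)
```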
"yolov8n.pt"
→ Start with YOLOv8 nano; switch to yolov8s.pt
, yolov8m.pt
, or yolov8l.pt
for better accuracy.
epochs=50
→ Number of training cycles (increase for better results).
imgsz=640
→ Image size (should match dataset).
batch=16
→ Number of images per batch (adjust based on GPU memory).
device="cuda"
→ Use GPU (change to "cpu"
if needed).
After training, you can evaluate the model:
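A minimal sketch, reusing the model object from the training step:

```python
metrics = model.val()     # runs validation on the val split from data.yaml
print(metrics.box.map)    # mAP50-95
print(metrics.box.map50)  # mAP50
```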
mAP (Mean Average Precision) is the key metric for object detection.
Precision & Recall help measure how well the model detects objects.
Run inference on a test image:
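A minimal sketch; the image path is a placeholder:

```python
results = model.predict("test.jpg", save=True, show=True)
```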
This will display and save the output with detected objects.
You can export the trained model to different formats:
Supported formats include ONNX, TensorRT, CoreML, and TFLite, among others.
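For example, a minimal sketch exporting to ONNX with the Ultralytics API:

```python
model.export(format="onnx")   # writes an .onnx file alongside the trained weights
```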
✔ More Data = Better Model
✔ Augment Data = Improve Generalization (flip, rotate, color jitter)
✔ Higher Epochs = Better Accuracy (but watch for overfitting)
✔ Monitor Loss & mAP = Track Performance (tensorboard --logdir=runs)