Python | OpenCV Multi-Object Detection and Tracking

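📚 Introduction

In the previous post, Object Classification and Localization, we used MobileNet SSD to recognize objects in images and videos and mark their positions.
This post goes one step further and covers multi-object detection and tracking.
Detection finds the positions of several objects in a single frame. Tracking keeps updating those positions from frame to frame and keeps each object's ID stable. Together they are a core building block for smart surveillance and traffic analysis.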

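🎨 Sample Image and Video

Image

Source: Pexels - Street Image, free to download and use under the Pexels license.
Content: a street scene with vehicles and pedestrians, well suited to testing multi-object detection.
After downloading, rename the file to street.jpg and place it in the project's assets/ directory.

Video

Source: Pexels - Street Video, free to download and use under the Pexels license.
Content: vehicles and pedestrians moving along a street, well suited to testing real-time multi-object tracking.
After downloading, rename the file to street.mp4 and place it in the project's assets/ directory.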

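🔎 How It Works

Multi-object detection: outputs several bounding boxes and class labels for the same frame.
Tracking: keeps updating each object's position across frames and assigns it a unique, persistent ID.
Common methods:
- YOLO (You Only Look Once): real-time multi-object detection.
- SSD (Single Shot Detector): a lightweight detection model.
- DeepSORT: a tracker that takes the detector's output and keeps object IDs consistent across frames.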
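
To make "keeping an ID" concrete, here is a minimal sketch of the simplest possible association step: match each new box to the previous box it overlaps most (IoU), and issue a new ID when nothing overlaps enough. The class name `NaiveIouTracker` and its threshold are hypothetical and exist only for illustration. This is not DeepSORT, which adds motion prediction and appearance features on top of this kind of matching.

```python
from itertools import count


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


class NaiveIouTracker:
    """Hypothetical toy tracker: greedy IoU matching, no motion or appearance model."""

    def __init__(self, iou_threshold=0.3):
        self.iou_threshold = iou_threshold
        self.tracks = {}          # track_id -> last known box
        self._next_id = count(1)  # IDs are handed out once and never reused

    def update(self, boxes):
        """Match this frame's boxes to existing tracks and return {track_id: box}."""
        assigned = {}
        unmatched = dict(self.tracks)
        for box in boxes:
            # Pick the still-unmatched track that overlaps this detection the most.
            best_id, best_iou = None, self.iou_threshold
            for track_id, prev in unmatched.items():
                score = iou(prev, box)
                if score > best_iou:
                    best_id, best_iou = track_id, score
            if best_id is None:
                best_id = next(self._next_id)  # nothing overlaps enough: new object
            else:
                del unmatched[best_id]
            assigned[best_id] = box
        self.tracks = assigned  # tracks without a match are dropped immediately
        return assigned


tracker = NaiveIouTracker()
frame1 = tracker.update([(10, 10, 50, 50), (100, 100, 150, 150)])
# The same two objects, slightly moved and listed in a different order.
frame2 = tracker.update([(104, 102, 154, 152), (12, 11, 52, 51)])
print(frame1)
print(frame2)
```

Even though the detections arrive in a different order in the second frame, each box keeps the ID it received in the first. A toy like this loses the ID as soon as an object is missed for one frame, which is the gap that trackers such as DeepSORT are designed to close.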

📂 Model Download and Usage

YOLOv3-tiny (Darknet)

Files:
- yolov3-tiny.cfg
- yolov3-tiny.weights
Download source:
- YOLO official site: pjreddie.com/darknet/yolo

Usage:

net = cv2.dnn.readNetFromDarknet("models/yolov3-tiny.cfg", "models/yolov3-tiny.weights")

💡 YOLOv3-tiny is the lightweight variant and is suited to real-time applications.

🧠 Functions and Parameters

📌 `cv2.dnn.readNetFromDarknet()`

Loads a YOLO model.

net = cv2.dnn.readNetFromDarknet(cfg, weights)

cfg: path to the network architecture file.
weights: path to the trained weights file.
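
A missing or half-downloaded model file tends to surface as a low-level OpenCV error at load time. The sketch below checks both paths first using only the standard library. The helper name `missing_model_files` is hypothetical and not part of OpenCV.

```python
from pathlib import Path


def missing_model_files(cfg, weights):
    """Return the paths from (cfg, weights) that do not exist or are empty."""
    return [str(p) for p in (Path(cfg), Path(weights))
            if not p.is_file() or p.stat().st_size == 0]


missing = missing_model_files("models/yolov3-tiny.cfg", "models/yolov3-tiny.weights")
if missing:
    print("Download these files before loading the network:", ", ".join(missing))
else:
    print("Model files found; cv2.dnn.readNetFromDarknet() can be called.")
```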

📌 `cv2.dnn.blobFromImage()`

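Converts an image into the input format expected by the DNN module.

blob = cv2.dnn.blobFromImage(image, scalefactor, size, mean, swapRB, crop)

image: the input image.
scalefactor: multiplier applied to pixel values; 1/255.0 is the usual choice for YOLO.
size: network input size as (width, height), for example (416, 416).
mean: mean value subtracted from each channel; (0, 0, 0) is common.
swapRB: whether to swap the R and B channels; usually True, because OpenCV loads images as BGR.
crop: whether to crop the image after resizing; usually False.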
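
The parameters are easier to read once you see the transformation they describe. The NumPy sketch below approximates what `blobFromImage` does for an image that is already at the network size, so it leaves out the resize and crop steps. The function name `to_blob` is hypothetical, and this is an illustration of the layout change, not OpenCV's actual implementation.

```python
import numpy as np


def to_blob(image_bgr, scalefactor=1 / 255.0, mean=(0, 0, 0), swap_rb=True):
    """Approximate blobFromImage for an image already at network size (no resize/crop)."""
    img = image_bgr.astype(np.float32)
    if swap_rb:
        img = img[:, :, ::-1]                 # BGR -> RGB
    # Subtract the per-channel mean first, then apply the scale factor.
    img = (img - np.array(mean, dtype=np.float32)) * scalefactor
    chw = np.transpose(img, (2, 0, 1))        # HWC -> CHW
    return chw[np.newaxis, ...]               # add the batch axis -> NCHW


# A 2x3 test image where every pixel is B=255, G=0, R=51.
image = np.zeros((2, 3, 3), dtype=np.uint8)
image[:, :, 0] = 255
image[:, :, 2] = 51

blob = to_blob(image)
print(blob.shape)        # (batch, channels, height, width)
print(blob[0, :, 0, 0])  # channel values of the top-left pixel, now in RGB order
```

After the swap, channel 0 holds the red value (51/255) and channel 2 holds the blue value (255/255), which is the order a YOLO network trained on RGB input expects.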

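💻 Example: YOLO Multi-Object Detection (Image)

# yolo_detection_image.py
import cv2

# Load YOLOv3-tiny from its Darknet config and weights.
net = cv2.dnn.readNetFromDarknet("models/yolov3-tiny.cfg", "models/yolov3-tiny.weights")
# Ask OpenCV for the output layer names directly; this avoids indexing
# differences in getUnconnectedOutLayers() between OpenCV versions.
output_layers = net.getUnconnectedOutLayersNames()

img = cv2.imread("assets/street.jpg")
if img is None:
    raise FileNotFoundError("Could not read assets/street.jpg")
(h, w) = img.shape[:2]
# Scale pixels to [0, 1], resize to 416x416, and swap BGR -> RGB.
blob = cv2.dnn.blobFromImage(img, 1/255.0, (416, 416), swapRB=True, crop=False)

net.setInput(blob)
outputs = net.forward(output_layers)

for output in outputs:
    for detection in output:
        # Each row: [cx, cy, w, h, objectness, class scores...], box values normalized to [0, 1].
        scores = detection[5:]
        class_id = int(scores.argmax())
        confidence = scores[class_id]
        if confidence > 0.5:
            # Scale the normalized box back to the original image size.
            box = detection[0:4] * [w, h, w, h]
            (centerX, centerY, width, height) = box.astype("int")
            # Convert the box center to its top-left corner.
            x = int(centerX - width / 2)
            y = int(centerY - height / 2)
            cv2.rectangle(img, (x, y), (x + int(width), y + int(height)), (0, 255, 0), 2)

cv2.imshow("YOLO Object Detection", img)
cv2.waitKey(0)
cv2.destroyAllWindows()

Figure: multi-object detection on an image, with several objects boxed.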
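
The box arithmetic in the loop above can be isolated into a small pure-Python helper, which makes it easier to test and to clamp boxes that spill over the image border. This is a sketch under the assumption that the first four values of a detection row are a center-based box normalized to [0, 1]. The helper name `yolo_to_pixel_box` is hypothetical.

```python
def yolo_to_pixel_box(detection, frame_w, frame_h):
    """Convert a normalized (cx, cy, w, h) box to a clamped pixel box (x, y, w, h)."""
    cx, cy, bw, bh = detection[:4]
    # Corner coordinates in pixels, before clamping.
    x1 = int(round((cx - bw / 2) * frame_w))
    y1 = int(round((cy - bh / 2) * frame_h))
    x2 = int(round((cx + bw / 2) * frame_w))
    y2 = int(round((cy + bh / 2) * frame_h))
    # Keep the box inside the frame so drawing and cropping stay valid.
    x1, x2 = max(0, min(x1, frame_w)), max(0, min(x2, frame_w))
    y1, y2 = max(0, min(y1, frame_h)), max(0, min(y2, frame_h))
    return x1, y1, x2 - x1, y2 - y1


centered = yolo_to_pixel_box((0.5, 0.5, 0.2, 0.4), 640, 480)
near_edge = yolo_to_pixel_box((0.05, 0.5, 0.2, 0.2), 640, 480)
print(centered)   # box in the middle of a 640x480 frame
print(near_edge)  # box whose left side would fall outside the frame
```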

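💻 Example: YOLO Multi-Object Detection (Video)

# yolo_detection_video.py
import cv2

# Load YOLOv3-tiny from its Darknet config and weights.
net = cv2.dnn.readNetFromDarknet("models/yolov3-tiny.cfg", "models/yolov3-tiny.weights")
# Ask OpenCV for the output layer names directly; this avoids indexing
# differences in getUnconnectedOutLayers() between OpenCV versions.
output_layers = net.getUnconnectedOutLayersNames()

cap = cv2.VideoCapture("assets/street.mp4")
if not cap.isOpened():
    raise FileNotFoundError("Could not open assets/street.mp4")

while True:
    ret, frame = cap.read()
    if not ret:
        break  # end of the video or a read error

    (h, w) = frame.shape[:2]
    # Scale pixels to [0, 1], resize to 416x416, and swap BGR -> RGB.
    blob = cv2.dnn.blobFromImage(frame, 1/255.0, (416, 416), swapRB=True, crop=False)

    net.setInput(blob)
    outputs = net.forward(output_layers)

    for output in outputs:
        for detection in output:
            # Each row: [cx, cy, w, h, objectness, class scores...], box values normalized to [0, 1].
            scores = detection[5:]
            class_id = int(scores.argmax())
            confidence = scores[class_id]
            if confidence > 0.5:
                # Scale the normalized box back to the frame size.
                box = detection[0:4] * [w, h, w, h]
                (centerX, centerY, width, height) = box.astype("int")
                # Convert the box center to its top-left corner.
                x = int(centerX - width / 2)
                y = int(centerY - height / 2)
                cv2.rectangle(frame, (x, y), (x + int(width), y + int(height)), (0, 255, 0), 2)

    cv2.imshow("YOLO Video Detection", frame)

    # Wait about 30 ms per frame; press q to quit.
    if cv2.waitKey(30) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

Figure: multi-object detection on a video, with objects boxed frame by frame.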
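
The detection loops above draw every row that passes the confidence threshold, so one object can end up with several overlapping boxes. A common remedy is non-maximum suppression (NMS). The pure-Python sketch below shows the idea with hypothetical helper names and thresholds. In practice OpenCV's own `cv2.dnn.NMSBoxes` would be the more usual choice.

```python
def iou_xywh(a, b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2 = min(a[0] + a[2], b[0] + b[2])
    iy2 = min(a[1] + a[3], b[1] + b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, score_threshold=0.5, iou_threshold=0.4):
    """Return indices of the boxes to keep, highest score first."""
    # Consider only confident boxes, strongest first.
    order = sorted(
        (i for i, s in enumerate(scores) if s >= score_threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a stronger kept box.
        if all(iou_xywh(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(100, 100, 50, 80), (104, 98, 50, 80), (300, 120, 60, 60)]
scores = [0.9, 0.75, 0.6]
kept = non_max_suppression(boxes, scores)
print(kept)  # the second box duplicates the first and is suppressed
```

Applied per frame, this would mean collecting boxes and confidences in the inner loop and drawing only the kept indices afterwards.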

💻 Example: YOLO + DeepSORT Multi-Object Tracking

# yolo_deepsort_tracking.py
from deep_sort_realtime.deepsort_tracker import DeepSort
import cv2

# Load YOLOv3-tiny from its Darknet config and weights.
net = cv2.dnn.readNetFromDarknet("models/yolov3-tiny.cfg", "models/yolov3-tiny.weights")
# Ask OpenCV for the output layer names directly; this avoids indexing
# differences in getUnconnectedOutLayers() between OpenCV versions.
output_layers = net.getUnconnectedOutLayersNames()

# max_age: number of consecutive missed frames before a track is deleted.
tracker = DeepSort(max_age=30)

cap = cv2.VideoCapture("assets/street.mp4")
if not cap.isOpened():
    raise FileNotFoundError("Could not open assets/street.mp4")

while True:
    ret, frame = cap.read()
    if not ret:
        break  # end of the video or a read error

    (h, w) = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(frame, 1/255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    outputs = net.forward(output_layers)

    # Collect detections as ([left, top, width, height], confidence, class_id),
    # the tuple format that update_tracks() expects.
    detections = []
    for output in outputs:
        for detection in output:
            scores = detection[5:]
            class_id = int(scores.argmax())
            confidence = scores[class_id]
            if confidence > 0.5:
                box = detection[0:4] * [w, h, w, h]
                (centerX, centerY, width, height) = box.astype("int")
                x = int(centerX - width / 2)
                y = int(centerY - height / 2)
                detections.append(([x, y, int(width), int(height)], float(confidence), class_id))

    # Associate this frame's detections with existing tracks.
    tracks = tracker.update_tracks(detections, frame=frame)
    for track in tracks:
        # Skip tentative tracks that have not been seen in enough frames yet.
        if not track.is_confirmed():
            continue
        x1, y1, x2, y2 = track.to_ltrb()  # left, top, right, bottom
        track_id = track.track_id
        cv2.rectangle(frame, (int(x1), int(y1)), (int(x2), int(y2)), (0, 255, 255), 2)
        cv2.putText(frame, f"ID {track_id}", (int(x1), int(y1)-10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0,255,255), 2)

    cv2.imshow("YOLO + DeepSORT Tracking", frame)

    # Wait about 30 ms per frame; press q to quit.
    if cv2.waitKey(30) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

Figure: multi-object detection and tracking on a video, following several objects while keeping each ID unique.

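🛠️ DeepSORT Troubleshooting

Integrating YOLO with DeepSORT often runs into environment or package-compatibility problems:

ModuleNotFoundError: No module named 'pkg_resources'
- Cause: pkg_resources is provided by setuptools, and virtual environments created with Python 3.12+ no longer install setuptools by default, so DeepSORT fails during initialization.
- Fix: install setuptools in the environment, or create the virtual environment with Python 3.10 or 3.11 and install setuptools there.
ModuleNotFoundError: No module named 'torch'
- Cause: PyTorch is not installed.
- Fix: install PyTorch and TorchVision inside the virtual environment:
  pip install torch torchvision
ModuleNotFoundError: No module named 'torchvision'
- Cause: TorchVision is not installed.
- Fix: install TorchVision:
  pip install torchvision
Environment conflicts or incomplete installs
- Symptom: the package is installed, but the import still fails.
- Fix: check that the Python interpreter path points at the environment you expect. If needed, delete .venv, create a clean virtual environment, and install all dependencies in one go (the activate command below is for Windows; on macOS or Linux use source .venv310/bin/activate):
  python3.10 -m venv .venv310
  .\.venv310\Scripts\activate
  pip install --upgrade pip wheel setuptools
  pip install numpy opencv-python torch torchvision deep-sort-realtime

💡 Consider keeping a requirements.txt so that all dependencies are installed together instead of being patched in one at a time.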
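
Before running the tracking script, it can save time to ask the active interpreter which modules it can actually see. The sketch below uses only the standard library. The helper name `missing_modules` and the `REQUIRED` list are assumptions based on the packages named in this section, so adjust them to your own setup.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that the current interpreter cannot find."""
    return [name for name in names if find_spec(name) is None]


# Import names (not pip package names) of the dependencies used in this post.
REQUIRED = ["numpy", "cv2", "torch", "torchvision", "deep_sort_realtime", "pkg_resources"]

missing = missing_modules(REQUIRED)
print("Missing modules:", ", ".join(missing) if missing else "none")
```

If a module shows up as missing even though pip reports it as installed, pip and the script are most likely running under different interpreters.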

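⚠️ Notes

The YOLO model files must be downloaded and placed in the project directory before running the examples.
YOLOv3-tiny is fast enough for real-time use, but its accuracy is limited.
DeepSORT relies on an appearance (ReID) embedding model to keep object IDs stable. deep-sort-realtime uses a MobileNet-based embedder by default, which is why PyTorch and TorchVision are needed.
Detection speed and accuracy vary from model to model.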

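📊 Use Cases

Traffic monitoring: track many vehicles at once.
Crowd analysis: count people and record their movement paths.
Smart retail: follow customer behavior in a store.
Security surveillance: identify and track suspicious persons.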
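
As one illustration of how track IDs feed these use cases, the sketch below counts how many tracked objects cross a horizontal line, for example a doorway or a lane marker. It works on a plain dictionary of centroid positions per track ID. The function name, the data layout, and the sample coordinates are all hypothetical.

```python
def count_line_crossings(centroid_history, line_y):
    """Count tracks whose centroid moved from above line_y to on or below it."""
    crossed = 0
    for points in centroid_history.values():
        ys = [y for _, y in points]
        # Compare each position with the next one to detect a downward crossing.
        if any(prev < line_y <= cur for prev, cur in zip(ys, ys[1:])):
            crossed += 1
    return crossed


# Hypothetical centroid positions (x, y) recorded per track ID over a few frames.
history = {
    "1": [(120, 80), (122, 140), (125, 210)],   # moves down across y=200
    "2": [(300, 260), (310, 250), (320, 240)],  # starts below the line and moves up
    "3": [(50, 90), (60, 120)],                 # never reaches the line
}
print(count_line_crossings(history, line_y=200))
```

In a real pipeline the history would be filled inside the tracking loop, using the center of each confirmed track's box.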

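🎯 Conclusion

In this post we used OpenCV with a YOLO model for multi-object detection, and combined it with DeepSORT to track objects and keep their IDs.
The next post, Integration with Deep Learning Frameworks, looks at how to use OpenCV together with frameworks such as PyTorch and TensorFlow.

📖 If you have questions along the way or want to explore related topics, see the Python | OpenCV series guide for the full chapter list so you can quickly find what you need.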

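Note: this post draws on the following references
OpenCV official documentation: Tutorials
OpenCV official documentation: Python Tutorials
YOLO official site: pjreddie.com/darknet/yolo
deep-sort-realtime GitHub
Pexels: free image and video assets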


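This article was written by J.J. Huang and is licensed under CC BY 3.0 TW. It may be freely reposted and quoted, provided the author is credited and the source is cited.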

J.J. Huang | 2026-03-01 | Python OpenCV | 07. Object Detection and Recognition
