Week 4
In Week 4, we faced major challenges in Optical Character Recognition (OCR) for license plates. The initial OCR strategy delivered poor results due to multiple factors, including motion blur, occlusion, perspective distortion, and character misalignment. To overcome these issues, we implemented license plate character correction, refined segmentation techniques, and integrated more advanced text detection models. Eventually, we selected the CRAFT model for robust text detection and further optimized our pipeline to ensure high accuracy. Additionally, we extended our work to video processing to enable real-time application.
Challenges in OCR and Initial Solutions
1. Factors Affecting OCR Performance
Our initial OCR implementation struggled due to:
- Blurriness and Occlusion: Vehicles at varying distances caused blurred license plate images, while partial occlusions further degraded recognition accuracy.
- Perspective Distortion: Different camera angles led to skewed and curved license plates, making character segmentation difficult.
- Logo Interference: Some plates contained additional elements like car brand logos, which interfered with character extraction.
To address these challenges, we implemented license plate character correction to preprocess input images before OCR.
2. License Plate Character Correction
At first, we relied on traditional computer vision techniques to highlight character regions. This approach encountered issues similar to those in Week 1, such as inadequate segmentation under challenging conditions. As a result, we pivoted towards license plate character segmentation.
Refining Character Segmentation
1. Pixel Projection-Based Segmentation
- Horizontal and Vertical Projections: By analyzing pixel density along both axes, we located the peaks, which correspond to characters, and the valleys between them, which mark character boundaries.
- Bounding Box Extraction: Using peak positions, we determined approximate character locations and extracted individual characters.
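The projection idea above can be sketched in a few lines of NumPy. This is a minimal illustration, not the project's actual code: it assumes the plate has already been binarized so that character pixels are nonzero, and the function names and the `min_ink` / `min_width` thresholds are hypothetical.

```python
import numpy as np

def segment_columns(binary, min_ink=1, min_width=2):
    """Return (start, end) column ranges, end exclusive, for each run of inked columns."""
    binary = np.asarray(binary)
    # Vertical projection: number of character pixels in each column.
    profile = (binary > 0).sum(axis=0)
    has_ink = profile >= min_ink
    # Pad with False so every run has both a rising and a falling edge.
    padded = np.concatenate(([False], has_ink, [False]))
    edges = np.diff(padded.astype(np.int8))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    # Drop runs that are too narrow to be a character (assumed noise).
    return [(int(s), int(e)) for s, e in zip(starts, ends) if e - s >= min_width]

def character_boxes(binary):
    """Return (x0, y0, x1, y1) boxes, using a horizontal projection inside each column run."""
    binary = np.asarray(binary)
    boxes = []
    for x0, x1 in segment_columns(binary):
        # Horizontal projection restricted to this run: rows that contain ink.
        rows = np.flatnonzero((binary[:, x0:x1] > 0).any(axis=1))
        if rows.size:
            boxes.append((x0, int(rows[0]), x1, int(rows[-1]) + 1))
    return boxes
```

Because the split relies on empty columns between characters, a sketch like this breaks down when characters touch or the plate is tilted, which is consistent with the limitations described in this section.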
2. License Plate Rotation Correction
To further improve character alignment, we developed a license plate rotation correction module:
- Otsu Thresholding: Applied for binarization, improving edge detection.
- Hough Line Detection: Identified horizontal and vertical edges to determine the plate's tilt angle.
- Affine and Perspective Transformation: Aligned the license plate to a standard orientation before applying OCR.
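As a rough sketch of the tilt estimation and the affine step, the code below takes line segments in `(x1, y1, x2, y2)` form, which is the layout a probabilistic Hough transform such as OpenCV's `cv2.HoughLinesP` returns, and derives a rotation matrix from their median angle. The function names and the 45-degree cutoff are assumptions for illustration. The matrix is built in NumPy and is intended to follow the same layout as `cv2.getRotationMatrix2D` with scale 1, so it could be passed to `cv2.warpAffine`. The perspective transformation is not covered here.

```python
import numpy as np

def estimate_tilt_deg(segments, max_abs_deg=45.0):
    """Median angle in degrees of near-horizontal segments given as (x1, y1, x2, y2)."""
    seg = np.asarray(segments, dtype=float).reshape(-1, 4)
    dx = seg[:, 2] - seg[:, 0]
    dy = seg[:, 3] - seg[:, 1]
    angles = np.degrees(np.arctan2(dy, dx))
    # Fold into [-90, 90) so a segment and its reverse give the same angle.
    angles = (angles + 90.0) % 180.0 - 90.0
    # Keep only roughly horizontal edges (assumed to be the plate's top and bottom borders).
    keep = np.abs(angles) <= max_abs_deg
    if not np.any(keep):
        return 0.0
    return float(np.median(angles[keep]))

def rotation_matrix(center, angle_deg):
    """2x3 affine matrix that rotates about `center` so a line at `angle_deg` becomes horizontal."""
    cx, cy = center
    a = np.radians(angle_deg)
    cos_a, sin_a = np.cos(a), np.sin(a)
    return np.array([
        [cos_a, sin_a, (1.0 - cos_a) * cx - sin_a * cy],
        [-sin_a, cos_a, sin_a * cx + (1.0 - cos_a) * cy],
    ])
```

Using the median rather than the mean keeps a few stray segments, for example from a logo or the bumper, from dominating the estimate.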
Adopting CRAFT for Robust Text Detection
Given the limitations of traditional methods, we integrated the CRAFT (Character Region Awareness for Text detection) model.
Advantages of CRAFT
- Resilient to Skewed Plates: Unlike projection-based methods, CRAFT does not assume axis-aligned text, so it can still detect characters on plates with perspective distortion.
- Better Character Localization: CRAFT outputs character bounding boxes with high precision, improving segmentation.
Challenges with CRAFT and Optimization Strategies
While CRAFT provided robust text detection, it occasionally detected logos and unrelated characters as part of the license plate. To refine the output, we:
- Fine-Tuned the Model: We retrained CRAFT using a dedicated license plate dataset.
- Applied Height-Based Filtering: Removed bounding boxes that were too large or too small relative to typical characters.
- Corrected Skewed Bounding Boxes: Calculated each box’s inclination angle and applied perspective transformation.
- Merged Character Boxes: Concatenated properly aligned characters to form a complete license plate string.
After these refinements, we achieved a 0.97 confidence score in license plate recognition.
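The height-based filter and the merge step can be illustrated with a short sketch. It assumes each detection has already been recognized and is stored as a `(char, x, y, w, h)` tuple. The tuple layout, the function names, and the 0.6 and 1.4 ratio bounds are hypothetical choices for this example, not values taken from our pipeline.

```python
from statistics import median

def filter_by_height(dets, low=0.6, high=1.4):
    """Keep detections whose height is close to the median character height."""
    if not dets:
        return []
    ref = median(d[4] for d in dets)
    # Boxes far taller (logos) or far shorter (specks, bolts) than typical are dropped.
    return [d for d in dets if low * ref <= d[4] <= high * ref]

def merge_characters(dets):
    """Concatenate characters left to right, ordered by the box's x coordinate."""
    return "".join(d[0] for d in sorted(dets, key=lambda d: d[1]))

detections = [
    ("B", 30, 10, 18, 40),
    ("A", 10, 10, 18, 41),
    ("*", 0, 2, 60, 90),   # oversized box, e.g. a brand logo
    ("1", 50, 11, 10, 39),
    ("'", 70, 10, 4, 8),   # undersized box, e.g. a screw head
]
plate_text = merge_characters(filter_by_height(detections))
```

Using the median as the reference height makes the filter tolerant of a few outliers, as long as most boxes are real characters.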
Expanding the Pipeline to Video Processing
Following the success of static image OCR, we extended our pipeline to video streams:
- Frame Extraction: Processed video by extracting frames at fixed intervals.
- Applying the Pipeline: Performed vehicle detection, license plate localization, character segmentation, and OCR on each frame.
- Generating Output Frames: Created output frames containing:
  - Vehicle Bounding Boxes: Indicating detected vehicles with confidence scores.
  - License Plate Bounding Boxes: Highlighting detected plates with confidence scores.
  - OCR Results: Displaying extracted license plate text along with confidence scores.
We successfully deployed this system on a Raspberry Pi. It provided live feedback, though at a reduced frame rate because of the pipeline's computational demands.
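The fixed-interval sampling and per-frame processing described above can be sketched as follows. The sketch works on any iterable of frames. In practice the frames would probably come from a video reader such as OpenCV's `cv2.VideoCapture`, and `stages` would hold the detection, localization, segmentation, and OCR steps. All names here are hypothetical.

```python
def sample_frames(frames, step):
    """Yield (index, frame) for every `step`-th frame of an iterable."""
    if step < 1:
        raise ValueError("step must be at least 1")
    for index, frame in enumerate(frames):
        if index % step == 0:
            yield index, frame

def run_pipeline(frames, step, stages):
    """Pass each sampled frame through the stages in order and collect the results."""
    results = []
    for index, frame in sample_frames(frames, step):
        data = frame
        for stage in stages:
            # Each stage consumes the previous stage's output.
            data = stage(data)
        results.append((index, data))
    return results
```

A larger `step` lowers the processing load on constrained hardware at the cost of missing vehicles that are only visible for a few frames.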
Future Optimization Strategies
To improve real-time performance and reduce processing time per frame, we identified two key enhancements:
SORT Object Tracking:
- Implement a Simple Online Realtime Tracker (SORT) to track detected vehicles across frames.
- Avoid redundant re-processing of the same vehicle, reducing computational load and improving FPS.
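To show how tracking could avoid repeated OCR, here is a deliberately simplified stand-in for SORT. The real SORT algorithm combines a Kalman filter for motion prediction with Hungarian assignment. This sketch only does greedy IoU matching between consecutive frames, and it drops a track as soon as it goes unmatched. The class, the `ocr` callable, and the 0.3 threshold are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

class IouTracker:
    """Greedy IoU matcher; a simplification of SORT without motion prediction."""

    def __init__(self, iou_threshold=0.3):
        self.iou_threshold = iou_threshold
        self.tracks = {}   # track id -> box from the previous frame
        self.next_id = 0

    def update(self, boxes):
        """Return one track id per input box, in input order."""
        pairs = sorted(
            ((iou(tb, b), tid, i)
             for tid, tb in self.tracks.items()
             for i, b in enumerate(boxes)),
            reverse=True,
        )
        assigned, used = {}, set()
        for score, tid, i in pairs:
            if score < self.iou_threshold:
                break  # remaining pairs overlap even less
            if tid in used or i in assigned:
                continue
            assigned[i] = tid
            used.add(tid)
        ids, new_tracks = [], {}
        for i, box in enumerate(boxes):
            tid = assigned.get(i)
            if tid is None:  # no sufficiently overlapping track: start a new one
                tid = self.next_id
                self.next_id += 1
            new_tracks[tid] = box
            ids.append(tid)
        self.tracks = new_tracks
        return ids

def read_plates(tracker, boxes, ocr, cache):
    """Run `ocr` only for track ids that have no cached result yet."""
    texts = []
    for tid, box in zip(tracker.update(boxes), boxes):
        if tid not in cache:
            cache[tid] = ocr(box)
        texts.append(cache[tid])
    return texts
```

With a cache keyed by track id, the expensive plate localization and OCR stages run once per vehicle instead of once per frame, which is where the expected FPS gain would come from.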
Pipeline A/B Voting Mechanism:
- Develop an alternative OCR pipeline (Pipeline B) alongside our current system (Pipeline A).
- Compare results from both pipelines and use a voting mechanism to determine the most reliable OCR output.
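One possible form of the voting step is sketched below. With only two pipelines a plain majority vote cannot settle a disagreement, so this sketch assumes a confidence-weighted vote in which readings from both pipelines, and optionally from several frames of the same vehicle, are pooled. The `(text, confidence)` input format and the function name are hypothetical.

```python
from collections import defaultdict

def vote(readings):
    """Pick the plate text with the highest summed confidence.

    `readings` is an iterable of (text, confidence) pairs.
    Returns (best_text, share_of_total_confidence), or (None, 0.0) if empty.
    """
    scores = defaultdict(float)
    for text, confidence in readings:
        scores[text] += confidence
    if not scores:
        return None, 0.0
    total = sum(scores.values())
    best = max(scores, key=scores.get)
    share = scores[best] / total if total > 0 else 0.0
    return best, share

readings = [
    ("ABC123", 0.9),  # Pipeline A, frame 1
    ("A8C123", 0.6),  # Pipeline B, frame 1
    ("ABC123", 0.7),  # Pipeline A, frame 2
]
best_text, share = vote(readings)
```

The returned share gives a rough measure of agreement, which could be used to flag plates where the two pipelines disagree strongly.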