We ran the Japanese model of the open-source OCR engine "EasyOCR" on the "RB3 Gen2 Lite" evaluation board equipped with the Qualcomm QCS5430.
By utilizing the board's AI accelerator (NPU), Japanese OCR processing runs entirely on the device without a cloud connection.
• Supported characters: kanji, hiragana, katakana, and alphanumerics (2,214 characters in total)
• NPU inference time: Detector 159 ms + Recognizer 22 ms
• No cloud required: processing completes on the device.
Why run OCR on the edge?
OCR (Optical Character Recognition) is a technology that recognizes characters in an image and converts them into text data.
It is used in a wide range of fields, including inspection work in manufacturing, reading invoices in logistics, and digitizing documents in offices.
While cloud-based OCR services have become more sophisticated in recent years, "edge AI" that completes processing on the device is required in the following cases:
• Security requirements — Documents containing confidential information or personal data cannot be sent to the cloud.
• Communication environment limitations — Environments where a stable network cannot be secured, such as factories, warehouses, and construction sites.
• Real-time capabilities — Situations requiring low-latency processing, such as in-line inspection.
• Running costs — Avoiding pay-as-you-go cloud API charges in favor of fixed operating costs.
AI model used
The EasyOCR used in this study is an open-source OCR engine developed by JaidedAI that supports over 80 languages. The model is also publicly available on Qualcomm AI Hub and consists of a two-stage pipeline: text area detection (Detector) and character recognition (Recognizer).
The EasyOCR model (*2) available on AI Hub (*1) only supports English. Therefore, in this demo, we applied EasyOCR's Japanese weights and converted the model to a format compatible with Qualcomm SoCs, enabling support for 2,214 characters including kanji, hiragana, and katakana.
*1) Overview of Qualcomm AI Hub and registration
Qualcomm AI Hub - Semiconductor Business - Macnica
*2) Qualcomm AI Hub EasyOCR model page
EasyOCR - Qualcomm AI Hub
Processing pipeline
• Detector (CRAFT) – Detects text regions in the input image and outputs bounding boxes.
• Text area extraction – Each detected region is cropped as a separate image.
• Recognizer – Recognizes text in the cropped images and outputs the text and a confidence score.
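The pipeline's end result is a list of (bounding box, text, confidence) tuples, which is the shape that `easyocr.Reader.readtext()` returns. As a minimal sketch of consuming such results, with the sample values below being hypothetical:

```python
# Each OCR result pairs a detected region with the recognized text and a
# confidence score, mirroring the (bbox, text, confidence) tuples that
# easyocr.Reader.readtext() returns. The sample values are hypothetical.

# bbox: four (x, y) corner points of the detected text region
results = [
    ([[10, 10], [120, 10], [120, 40], [10, 40]], "マクニカ", 0.93),
    ([[10, 50], [200, 50], [200, 80], [10, 80]], "半導体商社", 0.88),
    ([[10, 90], [80, 90], [80, 110], [10, 110]], "???", 0.31),
]

# Keep only confident recognitions; a low score usually means noise or
# a region the recognizer could not decode.
CONF_THRESHOLD = 0.5
texts = [text for _bbox, text, conf in results if conf >= CONF_THRESHOLD]
print(texts)
```

A threshold around 0.5 is a common starting point; the right value depends on the images and fonts in your workload.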
In this demo, we applied Japanese language weights (2,214 characters in total, including kanji, hiragana, katakana, and alphanumeric characters) to the AI Hub base model and performed quantization and optimization using the Qualcomm AI Runtime SDK.
*Qualcomm AI Runtime SDK (QAIRT SDK) is a software development tool for edge AI development provided by Qualcomm.
Testing environment
• Evaluation board: "RB3 Gen2 Lite" equipped with the Qualcomm QCS5430
• AI accelerator (NPU): HTP (Hexagon Tensor Processor)
• Host PC: Ubuntu 22.04 (WSL2)
• Qualcomm AI Runtime SDK: v2.44
• Supported languages: Japanese + English
Implementation in Edge AI
To enable high-speed inference of EasyOCR's PyTorch model on Qualcomm's NPU, we performed model conversion, quantization, and optimization.
Model Optimization Process
PyTorch (.pth) → ONNX (.onnx) → QNN conversion / w8a8 quantization → Context binary (.bin) → Device deployment
We used the QNN SDK to quantize both weights and activations to INT8 (w8a8). For the calibration data required for quantization, we used images containing actual Japanese text.
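In the w8a8 scheme, both weights and activations are mapped to 8-bit integers using a scale and zero-point derived from the value range observed during calibration. The sketch below illustrates that arithmetic in plain Python; it is a simplified illustration of affine quantization, not the actual QAIRT tooling:

```python
# Illustrative w8a8 affine quantization: a calibration pass records the
# value range, which fixes the scale and zero-point used on-device.

def quant_params(vals, num_bits=8):
    """Compute scale and zero-point for asymmetric uint8 quantization."""
    lo, hi = min(min(vals), 0.0), max(max(vals), 0.0)  # range must cover 0
    qmax = (1 << num_bits) - 1                          # 255 for 8 bits
    scale = (hi - lo) / qmax or 1.0
    zero_point = round(-lo / scale)
    return scale, zero_point

def quantize(vals, scale, zero_point):
    return [max(0, min(255, round(v / scale) + zero_point)) for v in vals]

def dequantize(qvals, scale, zero_point):
    return [(q - zero_point) * scale for q in qvals]

# "Calibration": the range seen in representative activations fixes the params.
acts = [-1.0, -0.25, 0.0, 0.5, 2.0]
scale, zp = quant_params(acts)
q = quantize(acts, scale, zp)
deq = dequantize(q, scale, zp)
err = max(abs(a, ) if False else abs(a - d) for a, d in zip(acts, deq))
print(q, scale, zp)
```

The round-trip error stays within one quantization step (the scale), which is why a representative calibration set matters: a range that is too wide inflates the scale and, with it, the error on every value.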
Operation results
We performed OCR processing on an image containing Japanese text using the RB3 Gen2 Lite.
Recognition result
Source: Wikipedia "Macnica"
The EasyOCR demo was run using an image with text like the one shown on the left as the inference target.
The green bounding boxes in the image show the text area detection results, and the terminal shows the character recognition results (text + confidence score).
Inference time (when using the NPU)
• Detector: 159 ms
• Recognizer: 22 ms
• Total: 181 ms
This is the average value obtained from multiple measurements of the same input image. Hardware acceleration using the NPU ensures sufficient speed for real-time processing on edge devices.
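As a quick sanity check on the "real-time" claim, the measured latencies translate to roughly 5.5 frames per second end to end:

```python
# Back-of-the-envelope throughput from the measured NPU latencies above.
detector_ms = 159
recognizer_ms = 22
total_ms = detector_ms + recognizer_ms   # 181 ms per frame, end to end
fps = 1000 / total_ms
print(f"{total_ms} ms/frame = {fps:.1f} fps")
```

A few frames per second is ample for document capture or inspection stations where text changes at human speed, though a higher-rate conveyor would need further optimization.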
Summary
This article demonstrated how to run the EasyOCR Japanese model with NPU acceleration on an evaluation board equipped with the Qualcomm QCS5430.
• Ran edge inference with an open-source OCR engine (EasyOCR) on a Qualcomm SoC.
• Optimized the model with INT8 quantization using the Qualcomm AI Runtime SDK, achieving high-speed inference on the NPU.
• Supported 2,214 characters, including Japanese, with processing completed entirely on the device, no cloud connection required.
• Achieved inference speed sufficient for real-time processing on edge devices.
Inquiry
If you have any questions about the contents of this page or would like detailed product information, please contact us here.