This repository contains instructions to build the two stages of an OCR pipeline from Edge Impulse for your target platform, as well as a Python script that runs both models and displays the result on the screen. Inference is performed with the Edge Impulse Linux Python SDK - so, starting from this example, you can include the OCR logic (or any Edge Impulse model inference) in your own Python application.
This OCR repository requires two models in EIM format:
- A text detector model (object detection, single class). This finds areas of text. For this application, the easiest option is the generic PaddleOCR detector, but you can swap it out for a custom one (e.g. a YOLO-Pro based license plate detector). Either import a bounding box model into Edge Impulse through Bring-Your-Own-Model, or train one from scratch.
- A text recognizer model, which interprets the bounding boxes found by model 1. The only model type supported in this application is PaddleOCR recognizers. Import them into your Edge Impulse project through Bring-Your-Own-Model, and set output type "Freeform" (the parsing of the output tensor is done in this application).
This repository grabs data from your camera and stitches these models together into a complete OCR application (incl. a nice demo view).
Note: Prebuilt models for Apple-silicon macOS, aarch64 Linux boards, and aarch64 Linux boards w/ Qualcomm QNN optimizations (e.g. Rubik Pi, RB3 Gen 2 Vision Kit) are in `models/`.
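For reference, inference against an `.eim` file is only a few lines with the Edge Impulse Linux Python SDK. Below is a minimal sketch that runs just the detector stage on a single image - the model path and test image are assumptions, and the full two-stage pipeline lives in `python/python-inference.py`:

```python
# Minimal sketch: run one EIM model on a single frame with the
# Edge Impulse Linux Python SDK. The model path and test image are
# placeholders; see python/python-inference.py for the full pipeline.
import cv2
from edge_impulse_linux.image import ImageImpulseRunner

with ImageImpulseRunner('./models/mac-arm64/detect-v3-640-480-i8.eim') as runner:
    model_info = runner.init()
    print('Loaded', model_info['project']['name'])

    img = cv2.imread('test.jpg')                 # any test image
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)   # the SDK expects RGB

    # Resize/crop to the model's input size, then run inference
    features, cropped = runner.get_features_from_image(img)
    res = runner.classify(features)

    # Object detection models return bounding boxes
    for bb in res['result']['bounding_boxes']:
        print("%s (%.2f): x=%d y=%d w=%d h=%d" % (
            bb['label'], bb['value'], bb['x'], bb['y'], bb['width'], bb['height']))
```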
You can replace this stage with any other object detection model, as long as it has a single class (or change `classify-camera-webserver.ts` to ignore other classes).
- Download the PaddleOCR detector model in ONNX format (HF: monkt/paddleocr-onnx).
- Create a new Edge Impulse project, e.g. name it "PaddleOCR detector (pretrained)".
- Click Dashboard > Upload your model.
- On the 'Step 1: Upload pretrained model' screen:
  - Under "Upload your trained model" select `det.onnx`.
  - Under "Set input shape for ONNX file" set `1, 3, 480, 640` (you can change this if you want higher/lower resolution).
  - Optional (to quantize the model): Under "Upload representative features" select `source_models/repr_dataset_480_640.npy` (from this repo). If you want to use another resolution, you'll need to create a new representative dataset (see the sketch after this list). Run from this repository:

    ```bash
    # 1) Create a new venv, and install the dependencies in source_models/requirements.txt
    #    e.g. on macOS/Linux via:
    #    cd source_models && python3 -m venv .venv && source .venv/bin/activate && pip3 install -r requirements.txt && cd ..

    # 2) Download an OpenImages subset
    oi_download_images --base_dir=source_models/openimages --labels Car --limit 200

    # 3) Create a representative dataset from the OpenImages 'car' class, scaled -1..1
    python3 source_models/create_representative_dataset.py --height 480 --width 640 --limit 30
    ```

  - Click "Upload file".
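For reference, a script along these lines can produce such a representative dataset. This is a hedged sketch, not the repo's actual `create_representative_dataset.py` - in particular, the flattened `(N, H*W*3)` layout is an assumption based on the -1..1 pixel scaling chosen on the upload screen:

```python
# Hypothetical sketch of building a representative dataset (.npy) from a
# folder of images - not the repo's actual create_representative_dataset.py.
import glob
import cv2
import numpy as np

HEIGHT, WIDTH, LIMIT = 480, 640, 30

samples = []
for path in sorted(glob.glob('source_models/openimages/**/*.jpg', recursive=True))[:LIMIT]:
    img = cv2.imread(path)
    if img is None:
        continue
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img = cv2.resize(img, (WIDTH, HEIGHT))
    # Scale uint8 pixels (0..255) to float32 in -1..1, matching the
    # 'Pixels range -1..1' setting on the upload screen
    samples.append(img.astype(np.float32) / 127.5 - 1.0)

arr = np.stack(samples).reshape(len(samples), -1)  # flatten to (N, H*W*3)
np.save(f'source_models/repr_dataset_{HEIGHT}_{WIDTH}.npy', arr)
print('Saved', arr.shape)
```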
- On the 'Step 2: Process "det.onnx"' screen:
  - Under "Model input" select 'Image'.
  - Under "How is your input scaled?" select 'Pixels range -1..1 (not normalized)'.
  - Under "Model output" select 'Object detection'.
  - Under "Output layer" select 'PaddleOCR detector'.
  - You can now upload an image under 'Check model behavior', and optionally tune the thresholds to perfectly match your text (the defaults should be pretty good).
  - Click "Save model".
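At runtime, the application hands each detected box to the recognizer. Below is a hedged sketch of that hand-off (the exact logic in `python/python-inference.py` may differ; the box fields follow the SDK's bounding-box format, and the recognizer's 320x48 input is set up in the next section):

```python
# Sketch: crop each detected bounding box out of the frame and squash-resize
# it to the recognizer's 320x48 input. The real hand-off in
# python/python-inference.py may differ (padding, ordering, thresholds).
import cv2

REC_WIDTH, REC_HEIGHT = 320, 48

def crops_for_recognizer(frame, bounding_boxes, min_confidence=0.5):
    """frame: the image the detector ran on; bounding_boxes: SDK result boxes."""
    crops = []
    for bb in bounding_boxes:
        if bb['value'] < min_confidence:
            continue
        x, y, w, h = bb['x'], bb['y'], bb['width'], bb['height']
        crop = frame[y:y + h, x:x + w]
        if crop.size == 0:
            continue
        # 'Squash' resize: scale to 320x48 ignoring aspect ratio, matching
        # the recognizer's 'Squash' resize mode
        crops.append(cv2.resize(crop, (REC_WIDTH, REC_HEIGHT)))
    return crops
```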
- Download a PaddleOCR recognizer model (English) in ONNX format (other languages are available on HF: monkt/paddleocr-onnx). If you want to switch languages, also download the `dict.txt` file for that language (e.g. `languages/korean`) and place it in the `source_models` folder of this repository.
- Create a new Edge Impulse project, e.g. name it "PaddleOCR recognizer (pretrained)".
- Click Dashboard > Upload your model.
- On the 'Step 1: Upload pretrained model' screen:
  - Under "Upload your trained model" select `rec.onnx`.
  - Under "Set input shape for ONNX file" set `1, 3, 48, 320` (you can change this if you want higher/lower resolution).
  - Optional (to quantize the model): Under "Upload representative features" select `source_models/repr_dataset_32_320.npy` (from this repo). If you want to use another resolution, you'll need to create a new representative dataset. Run from this repository:

    ```bash
    # 1) Create a new venv, and install the dependencies in source_models/requirements.txt
    #    e.g. on macOS/Linux via:
    #    cd source_models && python3 -m venv .venv && source .venv/bin/activate && pip3 install -r requirements.txt && cd ..

    # 2) Download an OpenImages subset
    oi_download_images --base_dir=source_models/openimages --labels Car --limit 200

    # 3) Create a representative dataset from the OpenImages 'car' class, scaled -1..1
    python3 source_models/create_representative_dataset.py --height 48 --width 320 --limit 60
    ```

  - Click "Upload file".
- On the 'Step 2: Process "rec.onnx"' screen:
  - Under "Model input" select 'Image'.
  - Under "How is your input scaled?" select 'Pixels range -1..1 (not normalized)'.
  - Under "Resize mode" select 'Squash'.
  - Under "Model output" select 'Freeform'.
  - Click "Save model".
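Because the output type is "Freeform", this application parses the raw output tensor itself. The usual PaddleOCR-style parsing is a CTC greedy decode; the sketch below assumes logits of shape `(T, num_classes)` with the blank token at index 0, which may differ from the repo's actual implementation:

```python
# Sketch of CTC greedy decoding for a PaddleOCR recognizer's raw output.
# Assumes logits of shape (T, num_classes) with the blank token at index 0
# and a character dictionary like source_models/rec_en_dict.txt; the repo's
# actual parsing may differ.
import numpy as np

def load_charset(path):
    with open(path, encoding='utf-8') as f:
        return [line.rstrip('\n') for line in f]

def ctc_greedy_decode(logits, charset, blank=0):
    ids = logits.argmax(axis=-1)  # best class per timestep
    chars, prev = [], blank
    for i in ids:
        # Collapse repeated symbols and drop blanks, per standard CTC decoding
        if i != blank and i != prev:
            chars.append(charset[i - 1])  # dict has no blank, so offset by 1
        prev = i
    return ''.join(chars)
```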
From the device where you want to run your model (so the right hardware optimizations are loaded):
- Install the Edge Impulse Linux CLI.
- Download the detector model:

  ```bash
  # Download the f32 model
  # When prompted, log in, and select "PaddleOCR detector (pretrained)"
  edge-impulse-linux-runner --download ./detect-v3-640-480-f32.eim --force-variant float32 --clean

  # Download the i8 model as well (if you've quantized before)
  edge-impulse-linux-runner --download ./detect-v3-640-480-i8.eim --force-variant int8
  ```
- Download the recognizer model:

  ```bash
  # Download the f32 model
  # When prompted, log in, and select "PaddleOCR recognizer (pretrained)"
  edge-impulse-linux-runner --download ./recognizer-320-48-f32.eim --force-variant float32 --clean

  # Download the i8 model as well (if you've quantized before)
  edge-impulse-linux-runner --download ./recognizer-320-48-i8.eim --force-variant int8
  ```
This repository also includes a Python implementation that runs inference on the camera stream and, if required, displays the result on the screen.
- Install Python dependencies (recommended: use the repo's `env/` venv or create your own):

  ```bash
  pip3 install -r python/requirements.txt
  ```

- Run the Python app:

  ```bash
  python3 python/python-inference.py \
      --detect-file ./models/mac-arm64/detect-v3-640-480-i8.eim \
      --predict-file ./models/mac-arm64/recognizer-320-48-f32.eim \
      --dict-file ./source_models/rec_en_dict.txt
  ```
Notes:

- Use `--display` if you want to display the camera feed with the results overlay.
- If you have multiple cameras you can select the OpenCV device index via `--camera`:

  ```bash
  python3 python/python-inference.py --camera 1 --detect-file ... --predict-file ... --dict-file ... --display
  ```
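If you're unsure which index OpenCV assigned to each camera, a quick probe like this lists the indices that open successfully (a hypothetical helper, not part of this repository):

```python
# Quick probe for available OpenCV camera indices (hypothetical helper,
# not part of this repository).
import cv2

for index in range(5):
    cap = cv2.VideoCapture(index)
    if cap.isOpened():
        ok, _ = cap.read()
        print(f"Camera {index}: {'working' if ok else 'opens, but no frames'}")
        cap.release()
```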
You can develop and build locally, then sync to another machine via `sync.sh`. E.g.:

```bash
bash sync.sh ubuntu@rubikpi
```

Then ssh into your remote machine and just run the already-built script:

```bash
cd ocr-demo-linux
python3 python/python-inference.py --camera 1 --detect-file ... --predict-file ... --dict-file ...
```

