edgeimpulse/example-ocr-linux-python

Two-stage OCR for Linux (Edge Impulse)

This repository contains instructions for building a two-stage OCR pipeline with Edge Impulse for your target platform, plus a Python script that runs both models and displays the result on screen. Inference is performed with the Edge Impulse Linux Python SDK, so you can use this example to embed the OCR logic (or any Edge Impulse model inference) in your own Python application.

Demo

This OCR repository requires two models in EIM format:

  1. A text detector model (object detection, single class), which finds areas of text. For this application, the easiest option is the generic PaddleOCR detector, but you can swap it out for a custom one (e.g. a YOLO-Pro based license plate detector). Either import a bounding box model into Edge Impulse through Bring-Your-Own-Model or train one from scratch.
  2. A text recognizer model, which interprets the bounding boxes found by model 1. The only model type supported in this application is PaddleOCR recognizers. Import one into Edge Impulse through Bring-Your-Own-Model and set the output type to "Freeform" (the output tensor is parsed in this application).

This repository grabs frames from your camera and stitches the two models together into a complete OCR application (including a nice demo view).
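To make the hand-off between the two stages concrete, here is a minimal sketch of cropping one detected text region out of a camera frame and squash-resizing it to the recognizer's 48x320 input. The `(x, y, width, height)` box format is an assumption for illustration; the actual structure returned by the Edge Impulse runner may differ, and the repo's python-inference.py is the source of truth.

```python
import numpy as np

def crop_and_squash(image, box, out_h=48, out_w=320):
    """Crop one detected text region and squash-resize it to the
    recognizer's input size using nearest-neighbor sampling.
    `box` is assumed to be (x, y, width, height) in pixels (hypothetical
    format; check the runner's actual bounding-box output)."""
    x, y, w, h = box
    crop = image[y:y + h, x:x + w]
    # Nearest-neighbor squash resize (no aspect-ratio preservation),
    # matching the 'Squash' resize mode selected for the recognizer.
    rows = np.arange(out_h) * crop.shape[0] // out_h
    cols = np.arange(out_w) * crop.shape[1] // out_w
    return crop[rows[:, None], cols]

# Example: a dummy 480x640 RGB frame and one detected box
frame = np.zeros((480, 640, 3), dtype=np.uint8)
patch = crop_and_squash(frame, (100, 200, 120, 30))
print(patch.shape)  # (48, 320, 3)
```

Each patch produced this way is then fed to the recognizer model as stage two.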

Uploading your models

Note: Prebuilt models for Apple-silicon macOS, aarch64 Linux boards and aarch64 Linux boards w/ Qualcomm QNN optimizations (e.g. Rubik Pi, RB3 Gen 2 Vision Kit) are in models/.

Text detector (pretrained PaddleOCR model)

You can replace this stage with any other object detection model, as long as it has a single class (or change classify-camera-webserver.ts to ignore other classes).

  1. Download PaddleOCR detector model in ONNX format (HF: monkt/paddleocr-onnx).
  2. Create a new Edge Impulse project, e.g. name it "PaddleOCR detector (pretrained)".
  3. Click Dashboard > Upload your model.
  4. On the 'Step 1: Upload pretrained model' screen:
    1. Under "Upload your trained model" select det.onnx.

    2. Under "Set input shape for ONNX file" set 1, 3, 480, 640 (you can change this if you want higher/lower resolution).

    3. Optional (to quantize the model): Under "Upload representative features" select source_models/repr_dataset_480_640.npy (from this repo).

      If you want to use another resolution, you'll need to create a new representative dataset. Run from this repository:

      # 1) create a new venv, and install dependencies from source_models/requirements.txt
      # e.g. on macOS/Linux via 'cd source_models && python3 -m venv .venv && source .venv/bin/activate && pip3 install -r requirements.txt && cd ..'
      
      # 2) download an OpenImages subset
      oi_download_images --base_dir=source_models/openimages --labels Car --limit 200
      
      # 3) create a representative dataset from the OpenImages 'Car' class, scaled -1..1
      python3 source_models/create_representative_dataset.py --height 480 --width 640 --limit 30
    4. Click "Upload file".

  5. On the 'Step 2: Process "det.onnx"' screen:
    1. Under "Model input" select 'Image'.

    2. Under "How is your input scaled?" select 'Pixels range -1..1 (not normalized)'.

    3. Under "Model output" select 'Object detection'.

    4. Under "Output layer" select 'PaddleOCR detector'.

    5. You can now upload an image under 'Check model behavior', and optionally tune the thresholds to perfectly match your text (the defaults should be pretty good).

      PaddleOCR detector in Edge Impulse

    6. Click Save model.
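For intuition, the representative dataset used for quantization is just a .npy file holding a batch of images scaled to the -1..1 range selected above. Here is a hedged sketch with random data; the tensor layout shown (NHWC) and the sample count are assumptions for illustration — the repo's source_models/create_representative_dataset.py defines the real format.

```python
import numpy as np

# Assumed layout: (num_samples, height, width, channels). Real images from
# a dataset like OpenImages should be used instead of random noise.
h, w, n = 480, 640, 30
images_u8 = np.random.randint(0, 256, size=(n, h, w, 3), dtype=np.uint8)

# Scale 0..255 pixels to -1..1, matching the
# 'Pixels range -1..1 (not normalized)' input scaling.
images = images_u8.astype(np.float32) / 127.5 - 1.0

np.save("repr_dataset_480_640.npy", images)
print(images.shape, float(images.min()), float(images.max()))
```

A representative dataset built at a different resolution just changes `h` and `w` to match the input shape you set in step 4.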

Text recognizer (pretrained PaddleOCR recognizer)

  1. Download a PaddleOCR recognizer model (English) in ONNX format (other languages available on HF: monkt/paddleocr-onnx).

    If you want to switch languages, also download the dict.txt file for that language (e.g. languages/korean) and place it in the source_models folder of this repository.

  2. Create a new Edge Impulse project, e.g. name it "PaddleOCR recognizer (pretrained)".

  3. Click Dashboard > Upload your model.

  4. On the 'Upload pretrained model' screen:

    1. Under "Upload your trained model" select rec.onnx.

    2. Under "Set input shape for ONNX file" set 1, 3, 48, 320 (you can change this if you want higher/lower resolution).

    3. Optional (to quantize the model): Under "Upload representative features" select source_models/repr_dataset_32_320.npy (from this repo).

      If you want to use another resolution, you'll need to create a new representative dataset. Run from this repository:

      # 1) create a new venv, and install dependencies from source_models/requirements.txt
      # e.g. on macOS/Linux via 'cd source_models && python3 -m venv .venv && source .venv/bin/activate && pip3 install -r requirements.txt && cd ..'
      
      # 2) download an OpenImages subset
      oi_download_images --base_dir=source_models/openimages --labels Car --limit 200
      
      # 3) create a representative dataset from the OpenImages 'Car' class, scaled -1..1
      python3 source_models/create_representative_dataset.py --height 48 --width 320 --limit 60
    4. Click "Upload file".

  5. On the 'Step 2: Process "rec.onnx"' screen:

    1. Under "Model input" select 'Image'.
    2. Under "How is your input scaled?" select 'Pixels range -1..1 (not normalized)'.
    3. Under "Resize mode" select 'Squash'.
    4. Under "Model output" select 'Freeform'.
    5. Click Save model.
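Because the recognizer's output type is "Freeform", the application itself must parse the output tensor. PaddleOCR recognizers are typically decoded with greedy CTC decoding: take the argmax per time step, collapse repeats, drop blanks, and map the remaining indices through the dict file. The sketch below shows that general technique; the exact tensor shape and blank index used by this repo's python-inference.py are assumptions here.

```python
import numpy as np

def ctc_greedy_decode(logits, charset, blank=0):
    """Greedy CTC decoding of a PaddleOCR-style recognizer output.
    `logits` has shape (time_steps, num_classes); index 0 is assumed to
    be the CTC blank, with indices 1..N mapping to lines of the dict
    file (e.g. rec_en_dict.txt)."""
    best = logits.argmax(axis=1)
    chars = []
    prev = blank
    for idx in best:
        # Collapse repeated indices and drop blanks, per the CTC rule.
        if idx != blank and idx != prev:
            chars.append(charset[idx - 1])
        prev = idx
    return "".join(chars)

# Toy example: 3 classes (blank, 'A', 'B') over 6 time steps
charset = ["A", "B"]
logits = np.array([
    [0.1, 0.8, 0.1],   # A
    [0.1, 0.8, 0.1],   # A (repeat, collapsed)
    [0.9, 0.05, 0.05], # blank
    [0.1, 0.8, 0.1],   # A (new emission after blank)
    [0.1, 0.1, 0.8],   # B
    [0.9, 0.05, 0.05], # blank
])
print(ctc_greedy_decode(logits, charset))  # AAB
```

Swapping languages only changes the dict file, not the decoding logic.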

Downloading your models in EIM format

From the device where you want to run your model (so the right hardware optimizations are loaded):

  1. Install the Edge Impulse Linux CLI.

  2. Download the detector model:

    # Download f32 model
    # When prompted, log in, and select "PaddleOCR detector (pretrained)"
    edge-impulse-linux-runner --download ./detect-v3-640-480-f32.eim --force-variant float32 --clean
    
    # Download i8 model as well (if you've quantized before)
    edge-impulse-linux-runner --download ./detect-v3-640-480-i8.eim --force-variant int8
  3. Download the recognizer model:

    # Download f32 model
    # When prompted, log in, and select "PaddleOCR recognizer (pretrained)"
    edge-impulse-linux-runner --download ./recognizer-320-48-f32.eim --force-variant float32 --clean
    
    # Download i8 model as well (if you've quantized before)
    edge-impulse-linux-runner --download ./recognizer-320-48-i8.eim --force-variant int8

Running the OCR application (Python)

This repository also includes a Python implementation that runs inference on the camera stream and optionally displays the result on screen.

  1. Install Python dependencies (recommended: use the repo's env/ venv or create your own):

    pip3 install -r python/requirements.txt
  2. Run the Python app:

    python3 python/python-inference.py --detect-file ./models/mac-arm64/detect-v3-640-480-i8.eim --predict-file ./models/mac-arm64/recognizer-320-48-f32.eim --dict-file ./source_models/rec_en_dict.txt

    Notes:

    • Use --display if you want to display the camera feed with the results overlay

    If you have multiple cameras you can select the OpenCV device index:

    python3 python/python-inference.py --camera 1 --detect-file ... --predict-file ... --dict-file ... --display

Developing locally, running remote

You can develop and build locally, then sync to another machine via sync.sh. E.g.:

bash sync.sh ubuntu@rubikpi

Then ssh into your remote machine and run the synced script:

cd ocr-demo-linux
python3 python/python-inference.py --camera 1 --detect-file ... --predict-file ... --dict-file ...
