ML Inference

Generic inference module powered by ONNX Runtime. Loads any .onnx model file, reads its port layout from an optional model-description.md companion file, and maps model inputs/outputs to Usine ports in real time using a dedicated background worker thread.

ONNX (Open Neural Network Exchange) is an open format for machine-learning models. Models trained in PyTorch, TensorFlow, scikit-learn, or any compatible framework can be exported to .onnx and run directly in Usine without any Python runtime. The module handles all type conversions between Usine flow types and the flat float tensors expected by the model.

Settings

model

File path to the .onnx model file. Click to open a file browser filtered to *.onnx. When a file is selected, the module:

  1. Loads the model into ONNX Runtime.
  2. Looks for a model-description.md file in the same directory as the .onnx file.
  3. Configures input and output ports according to the description (or falls back to array ports if no description is found).
  4. Starts a background inference thread.

The model is reloaded automatically when the threads setting changes.
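
Step 2 of the load sequence can be pictured with the short Python sketch below. It is an illustration only, not the module's actual code: the helper names description_path and find_description are hypothetical, and the only fact taken from this manual is the companion file name.

```python
from pathlib import Path
from typing import Optional

DESCRIPTION_FILENAME = "model-description.md"  # file name given in this manual


def description_path(model_path: str) -> Path:
    """Path of the companion file in the same directory as the .onnx file."""
    return Path(model_path).with_name(DESCRIPTION_FILENAME)


def find_description(model_path: str) -> Optional[Path]:
    """Return the companion file if it exists, else None (array-port fallback)."""
    candidate = description_path(model_path)
    return candidate if candidate.is_file() else None
```

When find_description returns None, the module behaviour described above applies: ports fall back to array flow.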

threads

Number of CPU threads allocated to ONNX Runtime's intra-op parallelism. Range 1 – 8, default 1. Increase for large models or on multi-core systems; for small models a single thread usually gives lower latency.

Inputs

The module provides 2 input ports (input-0, input-1). Their captions and flow types are set automatically when a model is loaded.

Port state              Meaning
──────────────────────  ──────────────────────────────────────────────────────────────
not used                No model loaded, or the model has fewer inputs than this slot.
port name from model    Configured by the model-description.md frontmatter.

Port flow types depend on the flow field in the model description:

Flow type  Usine type     Notes
─────────  ─────────────  ──────────────────────────────────────────────────────────────
array      FT_ARRAY       Default. Float array of fixed size.
data       FT_DATA_FLOAT  Single float value (size always 1).
audio      FT_AUDIO       Audio signal. Automatic resampling and windowing. See Audio ports below.
video      FT_VIDEO       Video frame. Automatic resize and pixel-to-float conversion. See Video ports below.

Outputs

The module provides 5 output ports (output-0 to output-4). Their captions and flow types are configured by the model description, following the same rules as inputs.

time

Inference latency in milliseconds, measured from the moment inputs are submitted to the worker thread until the result is written back. Useful for performance profiling.

model-description.md

Place a file named model-description.md next to your .onnx file to control port layout, captions, and flow types. Without this file the module still works but all ports default to array flow and use the raw ONNX tensor names as captions.

The file uses a YAML frontmatter block (between --- markers). Any text after the second --- is loaded as the model description and shown in the module's Properties panel.

Frontmatter structure

---
usine:
  version: 1
  name: "My Model"
  description: "Short description shown in Properties."

  inputs:
    - name: input_tensor_name     # must match the ONNX tensor name exactly
      label: "friendly caption"   # optional, shown on the port
      flow: array                 # array | data | audio | video
      size: 128                   # number of float elements

  outputs:
    - name: output_tensor_name
      label: "result"
      flow: data
---

Longer description shown in the Properties panel.
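
To make the file layout concrete, here is a minimal Python sketch that separates the frontmatter block from the description text. It only illustrates the --- convention described above and is not the module's parser; split_frontmatter is a hypothetical name. The returned frontmatter string could then be handed to a YAML parser (for example PyYAML's yaml.safe_load, if that library is available).

```python
from typing import Tuple


def split_frontmatter(text: str) -> Tuple[str, str]:
    """Split a description file into (frontmatter_yaml, body_text).

    Returns ("", text) when the file does not start with a --- block.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            frontmatter = "\n".join(lines[1:i])
            body = "\n".join(lines[i + 1:]).strip()
            return frontmatter, body
    return "", text  # no closing marker: treat everything as body


example = """---
usine:
  version: 1
  name: "My Model"
---

Longer description shown in the Properties panel.
"""

frontmatter, body = split_frontmatter(example)
print(body)  # Longer description shown in the Properties panel.
```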

Audio-specific fields

When flow: audio, additional fields control windowing and resampling:

Field        Description
───────────  ──────────────────────────────────────────────────────────────
size         Window size in model samples (e.g. 16000 for 1 s at 16 kHz).
sample_rate  Model sample rate in Hz. Usine resamples from its own sample rate automatically.
hop          Hop size in model samples. When set, a new inference is triggered every hop samples rather than every size samples (overlapping windows). If omitted, defaults to size (no overlap).
normalize    Normalization applied before submitting to the model. Currently supported: zero-mean-unit-variance.

  inputs:
    - name: audio_input
      label: "microphone"
      flow: audio
      size: 16000        # 1 s window at 16 kHz
      sample_rate: 16000
      hop: 8000          # inference every 0.5 s (50% overlap)
      normalize: zero-mean-unit-variance
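
As an illustration of what zero-mean-unit-variance normalization does to a window, here is a small Python sketch. It assumes the population variance and a small eps guard against silent (constant) windows; both choices are assumptions, since this manual does not specify them.

```python
import math
from typing import List


def zero_mean_unit_variance(window: List[float], eps: float = 1e-8) -> List[float]:
    """Shift a window to mean 0 and scale it to variance ~1.

    eps (an assumed safeguard) avoids dividing by zero on constant input.
    """
    if not window:
        return []
    mean = sum(window) / len(window)
    variance = sum((x - mean) ** 2 for x in window) / len(window)
    scale = 1.0 / math.sqrt(variance + eps)
    return [(x - mean) * scale for x in window]


normalized = zero_mean_unit_variance([1.0, 2.0, 3.0, 4.0])
```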

Video-specific fields

When flow: video, the input frame is resized to the model's expected resolution and converted to a flat float tensor in [0, 1]:

Field     Description
────────  ──────────────────────────────────────────────────────────────
width     Expected frame width in pixels.
height    Expected frame height in pixels.
channels  Number of color channels: 1 (grayscale), 3 (RGB), or 4 (RGBA).
format    Tensor layout: chw (channels-first) or hwc (channels-last, default).

  inputs:
    - name: image
      label: "camera"
      flow: video
      width: 224
      height: 224
      channels: 3
      format: chw       # PyTorch convention
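
The difference between the hwc and chw layouts can be shown with a short Python sketch. It assumes the frame is available as nested sequences frame[y][x][c] with 8-bit values; frame_to_tensor is a hypothetical helper, not part of the module.

```python
from typing import List, Sequence

Pixel = Sequence[int]  # one pixel: channel values in 0..255


def frame_to_tensor(frame: Sequence[Sequence[Pixel]], fmt: str = "hwc") -> List[float]:
    """Flatten frame[y][x][c] into floats in [0, 1] using the given layout."""
    height = len(frame)
    width = len(frame[0])
    channels = len(frame[0][0])
    if fmt == "hwc":  # channels-last: all channels of a pixel are adjacent
        return [frame[y][x][c] / 255.0
                for y in range(height) for x in range(width) for c in range(channels)]
    if fmt == "chw":  # channels-first: one full plane per channel
        return [frame[y][x][c] / 255.0
                for c in range(channels) for y in range(height) for x in range(width)]
    raise ValueError(f"unknown format: {fmt}")


# A 1 x 2 RGB frame: one red pixel followed by one green pixel.
tiny = [[(255, 0, 0), (0, 255, 0)]]
print(frame_to_tensor(tiny, "hwc"))  # [1.0, 0.0, 0.0, 0.0, 1.0, 0.0]
print(frame_to_tensor(tiny, "chw"))  # [1.0, 0.0, 0.0, 1.0, 0.0, 0.0]
```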

How it works

Inference pipeline

Usine Process thread          Background worker thread
─────────────────────         ─────────────────────────
[input data ready]
  → copy to worker buffer
  → signal worker
                              ← wake up
                              ← run ONNX session
                              ← write outputs to buffer
[next process cycle]
  → check HasOutput
  → copy outputs to ports
  → write InferenceTime

The worker thread runs asynchronously: the Process thread never blocks. If inference is slower than the process block rate, the module silently skips frames. This ensures Usine's audio engine is never stalled.
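
The hand-off between the two threads can be sketched in Python as follows. This is a simplified model of the diagram above, not the module's implementation: the class and method names (AsyncInference, submit, poll) are hypothetical, run_model stands in for the ONNX session, and a short lock is used where real-time audio code would more likely use a lock-free exchange.

```python
import threading
import time
from typing import Callable, List, Optional


class AsyncInference:
    """Process thread submits and polls; a worker thread runs the model."""

    def __init__(self, run_model: Callable[[List[float]], List[float]]):
        self._run_model = run_model          # stand-in for the ONNX session
        self._lock = threading.Lock()        # held only for short copies
        self._wake = threading.Event()
        self._pending: Optional[List[float]] = None
        self._result: Optional[List[float]] = None
        self._busy = False
        self.inference_time_ms = 0.0
        self.skipped = 0                     # frames dropped while busy
        threading.Thread(target=self._worker, daemon=True).start()

    def submit(self, inputs: List[float]) -> bool:
        """Copy inputs and signal the worker; never waits for inference."""
        with self._lock:
            if self._busy:
                self.skipped += 1            # worker still running: skip frame
                return False
            self._pending = list(inputs)
            self._busy = True
        self._wake.set()
        return True

    def poll(self) -> Optional[List[float]]:
        """Return new outputs once, or None if nothing is ready yet."""
        with self._lock:
            result, self._result = self._result, None
            return result

    def _worker(self) -> None:
        while True:
            self._wake.wait()
            self._wake.clear()
            with self._lock:
                inputs = self._pending
            start = time.perf_counter()
            outputs = self._run_model(inputs)
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            with self._lock:
                self._result = outputs
                self.inference_time_ms = elapsed_ms
                self._busy = False
```

In this sketch a process cycle calls submit when input data is ready and poll at the start of the next cycle; a None result simply means the previous outputs stay on the ports.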

Audio windowing

For audio ports, samples are written into a ring buffer as they arrive. When the ring buffer has accumulated enough samples to fill one window:

  1. The window is extracted and optionally normalized.
  2. If the model sample rate differs from Usine's rate, the window is resampled using cubic interpolation.
  3. The resampled buffer is submitted to the worker thread.
  4. The next window starts hop samples later.
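
The accumulation and hop logic in the steps above can be sketched in Python like this. For brevity the sketch uses a plain list instead of a true ring buffer and assumes hop is not larger than size; WindowBuffer is a hypothetical name.

```python
from typing import List


class WindowBuffer:
    """Accumulates samples and emits a window of `size` every `hop` samples."""

    def __init__(self, size: int, hop: int = 0):
        self.size = size
        self.hop = hop or size  # hop defaults to size (no overlap)
        self._samples: List[float] = []

    def push(self, block: List[float]) -> List[List[float]]:
        """Append one process block; return every window that became complete."""
        self._samples.extend(block)
        windows = []
        while len(self._samples) >= self.size:
            windows.append(self._samples[:self.size])
            del self._samples[:self.hop]  # next window starts hop samples later
        return windows


buf = WindowBuffer(size=4, hop=2)
print(buf.push([0, 1, 2]))        # [] (not enough samples yet)
print(buf.push([3, 4, 5, 6, 7]))  # [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7]]
```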

Output audio is resampled back to Usine's sample rate after inference.
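
Resampling by cubic interpolation can be sketched as below. This manual does not say which cubic kernel is used, so the Catmull-Rom spline here is an assumption, as are the clamped edges and the absence of any anti-aliasing filter when the target rate is lower; resample_cubic is a hypothetical name.

```python
from typing import List, Sequence


def resample_cubic(samples: Sequence[float], src_rate: float, dst_rate: float) -> List[float]:
    """Resample with Catmull-Rom cubic interpolation (edges are clamped)."""
    n_out = int(len(samples) * dst_rate / src_rate)
    step = src_rate / dst_rate  # source samples advanced per output sample
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * step
        i1 = min(int(pos), last)
        t = pos - i1            # fractional position between p1 and p2
        p0 = samples[max(i1 - 1, 0)]
        p1 = samples[i1]
        p2 = samples[min(i1 + 1, last)]
        p3 = samples[min(i1 + 2, last)]
        out.append(0.5 * (2 * p1
                          + (p2 - p0) * t
                          + (2 * p0 - 5 * p1 + 4 * p2 - p3) * t * t
                          + (3 * p1 - p0 - 3 * p2 + p3) * t ** 3))
    return out


# Halving the rate of a ramp keeps every second sample.
print(resample_cubic([0, 1, 2, 3, 4, 5, 6, 7], 2, 1))  # [0.0, 2.0, 4.0, 6.0]
```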

Video resize

If the incoming frame dimensions differ from the model's expected width × height, the frame is rescaled using a fast low-quality resize on the worker thread. Pixel values are converted to [0, 1] floats (dividing by 255). Channel ordering follows the format field.
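
A fast, low-quality resize is often done with nearest-neighbour sampling; the Python sketch below shows that technique. Whether the module uses exactly this method is an assumption, and resize_nearest is a hypothetical name.

```python
from typing import List, Sequence, TypeVar

P = TypeVar("P")  # pixel type: a number or a tuple of channel values


def resize_nearest(frame: Sequence[Sequence[P]], width: int, height: int) -> List[List[P]]:
    """Rescale frame[y][x] to width x height by picking the nearest source pixel."""
    src_h = len(frame)
    src_w = len(frame[0])
    return [
        [frame[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


small = [[1, 2],
         [3, 4]]
print(resize_nearest(small, 4, 4))
# [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```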

Workflow — step by step

1. Export your model to ONNX

From PyTorch:

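import torch

# model is your trained torch.nn.Module; dummy_input is a tensor with the input shape it expects.
torch.onnx.export(model, dummy_input, "my_model.onnx",
                  input_names=["features"],
                  output_names=["label_scores"])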

2. Write a model-description.md

Create model-description.md next to my_model.onnx:

---
usine:
  version: 1
  name: "My Classifier"
  inputs:
    - name: features
      flow: array
      size: 32
  outputs:
    - name: label_scores
      flow: array
      size: 10
---
Classifies a 32-element feature vector into 10 categories.

3. Load in Usine

  • Drop an ML Inference module into your patch.
  • In Settings, click model and select my_model.onnx.
  • The input and output ports reconfigure to match your description.
  • Wire your data source to the input port and read the output array.

4. Monitor

  • Connect the time output to a display to watch inference latency.
  • Check the Usine trace panel for load errors or fallback messages.

Notes and limits

Port captions and flow types are set at load time. Changing the model file reconfigures all ports immediately. Wires connected to ports whose flow type changes will be disconnected automatically.

The maximum number of inputs is 2 and the maximum number of outputs is 5. If your model has more tensors, only the first ones (up to the limit) will be exposed.

If no model-description.md is found, all ports default to array flow using the raw ONNX tensor names. This is sufficient for simple array-to-array models.
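
The fallback and the port limits can be summarized in a few lines of Python. The dictionary layout and the name fallback_ports are hypothetical; only the limits (2 inputs, 5 outputs), the array default, and the use of raw tensor names come from this manual.

```python
from typing import Dict, List

MAX_INPUTS = 2   # input-0, input-1
MAX_OUTPUTS = 5  # output-0 to output-4


def fallback_ports(tensor_names: List[str], limit: int) -> List[Dict[str, str]]:
    """Expose the first `limit` tensors as array ports captioned with their raw names."""
    return [{"name": name, "label": name, "flow": "array"}
            for name in tensor_names[:limit]]


# A model with three input tensors: only the first two get a port.
print(fallback_ports(["a", "b", "c"], MAX_INPUTS))
```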

Audio input ports accumulate samples across process blocks. There is an inherent latency equal to one window size at minimum. For real-time audio processing, prefer small window sizes and use hop to control the inference rate.
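
The arithmetic behind this trade-off is simple enough to check with a few lines of Python. The function names are hypothetical; the numbers come from the audio example earlier in this manual, and inference time itself is not included.

```python
from typing import Optional


def window_latency_s(size: int, sample_rate: int) -> float:
    """Time needed to fill one window: the minimum added latency, in seconds."""
    return size / sample_rate


def inference_interval_s(size: int, sample_rate: int, hop: Optional[int] = None) -> float:
    """Time between two inferences; hop defaults to size when omitted."""
    return (size if hop is None else hop) / sample_rate


# 1 s window at 16 kHz with a hop of 8000 samples.
print(window_latency_s(16000, 16000))            # 1.0
print(inference_interval_s(16000, 16000, 8000))  # 0.5
```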

Common Settings

info

show manual

Opens the web browser to display information or help about the selected object, if it exists.

For more details about information/help creation, see create-help-file.

description

Description of the module for internal help purposes only. The description is not displayed in the interface.

IDs

Visible only in god mode; see setup-panel-tab-expert.

unique ID

Current private ID for this control used to identify the object.

preset ID

Current private preset ID for this control used for presets.

recreate ID

If you experience difficulties in Polyphonic mode, use this button to recreate the ID(s).

repair IDs

Each Patch shared on the local network uses its own ID (identification number). If Patches send information to the wrong target, this button rebuilds all of these IDs.

Object Remote Address

absolute

Absolute remote address. See objects-address.

local

Remote address local to the current patch. See objects-address.

user addr

User-defined remote address. See objects-address.

version 7.0.250121
