Backends
Backends are configured via the backends config field. All configured backends
run in order and their results are merged.
Default: ["imageproc"]
{ "backends": ["atspi", "imageproc", "ocrs"] }
Merge pipeline
AT-SPI ──────┐
imageproc ───┤── merge ──→ filter tiny ──→ overlap cull ──→ label
ocrs ────────┘
When multiple backends produce overlapping children:
- Text references from fallback backends reclassify BFS components (Element → Text) when overlap exceeds 95%
- The original backend Text references are then discarded; only BFS components survive
- Pairwise overlap culling prefers Text over Element
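The two merge rules can be sketched as follows. This is a minimal illustration, not the project's code: `Rect`, `Child`, `reclassify`, and `cull` are hypothetical names. The overlap metric (intersection divided by the smaller box's area) and the culling threshold are assumptions, because the source only gives the 95% figure for reclassification.

```rust
/// Axis-aligned box in window coordinates (hypothetical type for this sketch).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Rect { pub x: i32, pub y: i32, pub w: i32, pub h: i32 }

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ChildKind { Element, Text }

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Child { pub rect: Rect, pub kind: ChildKind }

impl Rect {
    fn area(&self) -> i64 { self.w.max(0) as i64 * self.h.max(0) as i64 }

    fn intersection_area(&self, o: &Rect) -> i64 {
        let w = (self.x + self.w).min(o.x + o.w) - self.x.max(o.x);
        let h = (self.y + self.h).min(o.y + o.h) - self.y.max(o.y);
        if w <= 0 || h <= 0 { 0 } else { w as i64 * h as i64 }
    }

    /// Assumed metric: intersection area divided by the smaller box's area.
    pub fn overlap(&self, o: &Rect) -> f64 {
        let denom = self.area().min(o.area());
        if denom == 0 { 0.0 } else { self.intersection_area(o) as f64 / denom as f64 }
    }
}

/// Turn a BFS component into Text when any fallback text reference overlaps it
/// by more than `threshold` (0.95 per the rule above). The text references
/// themselves are not returned: only the BFS components survive.
pub fn reclassify(mut bfs: Vec<Child>, text_refs: &[Rect], threshold: f64) -> Vec<Child> {
    for c in bfs.iter_mut() {
        if text_refs.iter().any(|t| c.rect.overlap(t) > threshold) {
            c.kind = ChildKind::Text;
        }
    }
    bfs
}

/// Greedy pairwise cull: when a child overlaps an already-kept child by more
/// than `threshold`, keep only one of them, preferring Text over Element.
/// Between two children of the same kind, the earlier one wins (assumption).
pub fn cull(children: &[Child], threshold: f64) -> Vec<Child> {
    let mut kept: Vec<Child> = Vec::new();
    for c in children {
        match kept.iter().position(|k| k.rect.overlap(&c.rect) > threshold) {
            None => kept.push(*c),
            Some(i) => {
                if kept[i].kind == ChildKind::Element && c.kind == ChildKind::Text {
                    kept[i] = *c;
                }
            }
        }
    }
    kept
}
```

The greedy cull compares each child only against the first kept child it collides with. That keeps it O(n²) and simple, but it is an approximation of "pairwise" culling.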
AT-SPI (async D-Bus)
Connects to the system’s accessibility bus via atspi + zbus. Walks the
accessibility tree of the focused window.
- Queries roles, states, and geometry via batched async calls
- Max recursion depth: 20
- Max children per level: 500
- Filters elements by accessible state (Sensitive + Showing + Visible)
- Excludes roles: PANEL, FRAME, MENU_BAR, TOOL_BAR, LIST, SCROLL_PANE, TABLE, etc.
- Timeout: 150ms tokio timeout + 250ms hard deadline
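The walk limits above can be illustrated with a synchronous sketch over an in-memory tree. It deliberately leaves out D-Bus, the async batching, and the atspi crate. `Node`, the trimmed-down `Role` enum, and `collect_nodes` are hypothetical. Two behaviours are assumptions the source does not state: a node that fails the state filter is skipped together with its subtree, and an excluded-role node is not emitted but its children are still visited.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Role { Frame, Panel, MenuBar, ToolBar, List, ScrollPane, Table, PushButton, Label, Other }

#[derive(Debug)]
pub struct Node {
    pub role: Role,
    pub sensitive: bool,
    pub showing: bool,
    pub visible: bool,
    pub children: Vec<Node>,
}

const MAX_DEPTH: usize = 20;
const MAX_CHILDREN: usize = 500;
// Subset of the excluded container roles listed above.
const EXCLUDED: &[Role] = &[
    Role::Frame, Role::Panel, Role::MenuBar, Role::ToolBar,
    Role::List, Role::ScrollPane, Role::Table,
];

/// Depth-first walk that honours the depth and per-level child limits.
pub fn collect_nodes<'a>(node: &'a Node, depth: usize, out: &mut Vec<&'a Node>) {
    if depth > MAX_DEPTH {
        return;
    }
    // Assumption: failing the state filter prunes the whole subtree.
    if !(node.sensitive && node.showing && node.visible) {
        return;
    }
    // Assumption: excluded containers are not emitted, but their children are visited.
    if !EXCLUDED.contains(&node.role) {
        out.push(node);
    }
    for child in node.children.iter().take(MAX_CHILDREN) {
        collect_nodes(child, depth + 1, out);
    }
}
```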
Text role detection
The following AT-SPI roles produce ChildKind::Text:
Label(36), Text(74), DocumentText(87), Static(116), Paragraph(73), Heading(83)
Everything else becomes ChildKind::Element.
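The role-to-kind mapping reduces to a single `match`. In the sketch below, `Role` is a hypothetical local enum that carries only a few variants without numeric discriminants. The real code presumably matches on the atspi crate's role type instead.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Role { Label, Text, DocumentText, Static, Paragraph, Heading, PushButton, Link, Other }

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ChildKind { Element, Text }

/// Text-bearing roles become Text hints; everything else is an Element.
pub fn child_kind(role: Role) -> ChildKind {
    match role {
        Role::Label
        | Role::Text
        | Role::DocumentText
        | Role::Static
        | Role::Paragraph
        | Role::Heading => ChildKind::Text,
        _ => ChildKind::Element,
    }
}
```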
Requirements
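The AT-SPI D-Bus bus must be running. Check it with:
systemctl --user status at-spi-dbus-bus.service
Some applications (VS Code, browsers) may not expose accessibility information unless launched with the appropriate flags (for example, Chromium-based browsers accept --force-renderer-accessibility).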
Imageproc (computer vision)
Screenshot-based detection. Runs when "imageproc" is in the backends list.
Pipeline
- Screenshot: X11 GetImage on the window region
- Grayscale conversion: dual pass, max-of-RGB (for edges) and weighted luminance (for text detection)
- Canny edge detection: configurable min/max thresholds and detection scale
- Text word detection: horizontal projection → text line bands → vertical projection per line → word segments
- Dilation: morphological dilation to connect nearby edges
- BFS: 4-direction flood fill → connected components
- Filter: remove components larger than 50% of the window
- Overlap analysis: BFS components overlapping text words by more than 95% are reclassified as Text
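The projection-based word detection step can be sketched on a binary "ink" mask, one `bool` per pixel. How the luminance image is thresholded into that mask is not described above, so the mask is taken as an input. `detect_words`, `runs`, and the `min_word_gap` parameter (the smallest blank-column gap that separates two words) are all hypothetical.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Rect { pub x: usize, pub y: usize, pub w: usize, pub h: usize }

/// Split a 1-D projection profile into half-open runs [start, end) of nonzero
/// entries, merging runs separated by fewer than `min_gap` zero entries.
fn runs(profile: &[u32], min_gap: usize) -> Vec<(usize, usize)> {
    let mut out: Vec<(usize, usize)> = Vec::new();
    let mut i = 0;
    while i < profile.len() {
        if profile[i] == 0 {
            i += 1;
            continue;
        }
        let start = i;
        while i < profile.len() && profile[i] > 0 {
            i += 1;
        }
        // Gap of zeros between the previous run's end and this run's start.
        if let Some(last) = out.last_mut() {
            if start - last.1 < min_gap {
                last.1 = i;
                continue;
            }
        }
        out.push((start, i));
    }
    out
}

/// Horizontal projection finds text-line bands; a vertical projection inside
/// each band splits the line into word segments.
pub fn detect_words(ink: &[bool], width: usize, height: usize, min_word_gap: usize) -> Vec<Rect> {
    let mut words = Vec::new();
    // Horizontal projection: number of ink pixels in each row.
    let rows: Vec<u32> = (0..height)
        .map(|y| ink[y * width..(y + 1) * width].iter().filter(|&&p| p).count() as u32)
        .collect();
    for (y0, y1) in runs(&rows, 1) {
        // Vertical projection restricted to this text-line band.
        let cols: Vec<u32> = (0..width)
            .map(|x| (y0..y1).filter(|&y| ink[y * width + x]).count() as u32)
            .collect();
        for (x0, x1) in runs(&cols, min_word_gap) {
            words.push(Rect { x: x0, y: y0, w: x1 - x0, h: y1 - y0 });
        }
    }
    words
}
```

Passing `min_gap = 1` for rows means bands are never merged. For columns, gaps narrower than `min_word_gap` (typically the spacing between letters) are bridged, so each word comes out as a single box.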
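The dilation, BFS, and filter steps can be sketched on a binary edge mask, such as the output of Canny. The fixed 3×3 square kernel and the use of the bounding-box area for the 50% test are assumptions; the real kernel size comes from `kernel_size`. `dilate` and `components` are hypothetical names.

```rust
use std::collections::VecDeque;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Rect { pub x: usize, pub y: usize, pub w: usize, pub h: usize }

/// 3x3 square dilation: every set pixel also sets its 8 neighbours.
pub fn dilate(mask: &[bool], width: usize, height: usize) -> Vec<bool> {
    let mut out = vec![false; mask.len()];
    for y in 0..height {
        for x in 0..width {
            if !mask[y * width + x] {
                continue;
            }
            for ny in y.saturating_sub(1)..=(y + 1).min(height - 1) {
                for nx in x.saturating_sub(1)..=(x + 1).min(width - 1) {
                    out[ny * width + nx] = true;
                }
            }
        }
    }
    out
}

/// 4-connected BFS labelling. Returns each component's bounding box and drops
/// components whose box covers more than half of the window.
pub fn components(mask: &[bool], width: usize, height: usize) -> Vec<Rect> {
    let mut seen = vec![false; mask.len()];
    let mut boxes = Vec::new();
    let window_area = width * height;
    for start in 0..mask.len() {
        if !mask[start] || seen[start] {
            continue;
        }
        seen[start] = true;
        let mut queue = VecDeque::from([start]);
        let (mut x0, mut y0, mut x1, mut y1) = (usize::MAX, usize::MAX, 0, 0);
        while let Some(i) = queue.pop_front() {
            let (x, y) = (i % width, i / width);
            x0 = x0.min(x);
            y0 = y0.min(y);
            x1 = x1.max(x);
            y1 = y1.max(y);
            let mut push = |j: usize| {
                if mask[j] && !seen[j] {
                    seen[j] = true;
                    queue.push_back(j);
                }
            };
            // Left, right, up, down only (4-direction flood fill).
            if x > 0 { push(i - 1); }
            if x + 1 < width { push(i + 1); }
            if y > 0 { push(i - width); }
            if y + 1 < height { push(i + width); }
        }
        let rect = Rect { x: x0, y: y0, w: x1 - x0 + 1, h: y1 - y0 + 1 };
        if rect.w * rect.h * 2 <= window_area {
            boxes.push(rect);
        }
    }
    boxes
}
```

Dilating before labelling is what lets edge fragments separated by a one-pixel gap (for example, the outline of a button) merge into a single component.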
Output
- BFS connected components → ChildKind::Element
- Text line/word segments → ChildKind::Text
- Debug images are saved to /tmp/qhints_debug/ when dev.save_debug_images is enabled
Per-app tuning
{
"application_rules": {
"firefox": {
"canny_min_val": 20,
"canny_max_val": 50,
"kernel_size": 3,
"detection_scale": 1.0
}
}
}
Higher canny_min_val = fewer edges (sparser detection).
Higher detection_scale = upscale before detection (more detail, slower).
Timeout: 5 seconds.
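Resolving per-app rules amounts to a lookup with field-by-field fallback. In this sketch, `CvParams`, `AppRule`, `params_for`, and `scaled_size` are hypothetical. The global defaults are passed in rather than invented here, and matching the rule key exactly against an application name string is an assumption.

```rust
use std::collections::HashMap;

/// Effective detection parameters (field names mirror the config keys above).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct CvParams {
    pub canny_min_val: f32,
    pub canny_max_val: f32,
    pub kernel_size: u32,
    pub detection_scale: f32,
}

/// One `application_rules` entry; unset fields fall back to the global values.
#[derive(Clone, Copy, Debug, Default)]
pub struct AppRule {
    pub canny_min_val: Option<f32>,
    pub canny_max_val: Option<f32>,
    pub kernel_size: Option<u32>,
    pub detection_scale: Option<f32>,
}

pub fn params_for(app: &str, rules: &HashMap<String, AppRule>, global: CvParams) -> CvParams {
    match rules.get(app) {
        None => global,
        Some(r) => CvParams {
            canny_min_val: r.canny_min_val.unwrap_or(global.canny_min_val),
            canny_max_val: r.canny_max_val.unwrap_or(global.canny_max_val),
            kernel_size: r.kernel_size.unwrap_or(global.kernel_size),
            detection_scale: r.detection_scale.unwrap_or(global.detection_scale),
        },
    }
}

/// Size of the image fed to detection; a scale above 1.0 upscales it.
pub fn scaled_size(w: u32, h: u32, scale: f32) -> (u32, u32) {
    ((w as f32 * scale).round() as u32, (h as f32 * scale).round() as u32)
}
```

Because detection work grows roughly with pixel count, a `detection_scale` of 1.5 processes about 2.25× as many pixels. That is consistent with the "more detail, slower" trade-off above.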
OCR (optional)
Feature-gated. Build with --features ocr to enable.
Uses ocrs + rten for text detection and recognition. Downloads pre-trained
models (~35MB) from AWS S3 on first run to ~/.cache/qhints/ocrs/.
Note: Requires clang + libclang-dev system packages.
Pipeline
- Screenshot: same X11 GetImage as imageproc
- OCR: ocrs::OcrEngine detects word bounding boxes
- Word boxes → ChildKind::Text
- BFS gap-filling: Canny edge detection + dilation + BFS on the same screenshot to find non-text elements
- Filter: BFS components overlapping OCR words by more than 30% are removed
- Merge: BFS components are appended first, then word boxes
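The filter and merge steps can be sketched as follows. `merge_ocr` and the helper functions are hypothetical. Measuring "overlap" as the fraction of the BFS component's own area covered by a single word box is an assumption; the source only gives the 30% threshold.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Rect { pub x: i32, pub y: i32, pub w: i32, pub h: i32 }

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ChildKind { Element, Text }

const OVERLAP_LIMIT: f64 = 0.30;

fn intersection(a: &Rect, b: &Rect) -> i64 {
    let w = (a.x + a.w).min(b.x + b.w) - a.x.max(b.x);
    let h = (a.y + a.h).min(b.y + b.h) - a.y.max(b.y);
    if w <= 0 || h <= 0 { 0 } else { w as i64 * h as i64 }
}

/// Fraction of `a`'s area covered by `b` (assumed overlap metric).
fn coverage(a: &Rect, b: &Rect) -> f64 {
    let area = a.w as i64 * a.h as i64;
    if area <= 0 { 0.0 } else { intersection(a, b) as f64 / area as f64 }
}

/// Drop BFS components that mostly sit on OCR words, then emit the surviving
/// components (as Elements) followed by the word boxes (as Text).
pub fn merge_ocr(bfs: &[Rect], words: &[Rect]) -> Vec<(Rect, ChildKind)> {
    let mut out: Vec<(Rect, ChildKind)> = bfs
        .iter()
        .filter(|c| !words.iter().any(|w| coverage(c, w) > OVERLAP_LIMIT))
        .map(|&c| (c, ChildKind::Element))
        .collect();
    out.extend(words.iter().map(|&w| (w, ChildKind::Text)));
    out
}
```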
Models
- text-detection.rten: finds text regions
- text-recognition.rten: recognizes characters (not used for positioning)
Timeout: 15 seconds.