Theory
The pipeline separates detection from replacement. Detectors return spans; strategies replace those spans. This lets the same detection report feed masking, hashing, tokenisation, or dropping.
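To make the split concrete, here is a minimal Python sketch. It assumes spans are half-open `[start, end)` character offsets and that each strategy is a plain function. `Span`, `redact`, the strategy functions and the in-memory token vault are hypothetical illustrations, not the pipeline's actual API. The hash strategy uses a truncated SHA-256 digest only as an example.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Span:
    """One detection: half-open character offsets [start, end) plus a label."""
    start: int
    end: int
    label: str


# A strategy receives the matched text and its span and returns the
# replacement string, or None to drop the text entirely.
Strategy = Callable[[str, Span], Optional[str]]


def mask(text: str, span: Span) -> Optional[str]:
    return "*" * len(text)  # same length, no information kept


def hash_value(text: str, span: Span) -> Optional[str]:
    # Truncated SHA-256, chosen only for illustration.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


_vault: dict[str, str] = {}  # hypothetical in-memory token store


def tokenise(text: str, span: Span) -> Optional[str]:
    # The same input value always receives the same token.
    if text not in _vault:
        _vault[text] = f"<{span.label}_{len(_vault) + 1}>"
    return _vault[text]


def drop(text: str, span: Span) -> Optional[str]:
    return None


def redact(source: str, spans: list[Span], strategy: Strategy) -> str:
    """Apply one strategy to every span.

    Assumes spans are already sorted and non-overlapping. The output is built
    from slices of the unmodified source, so no offset is ever shifted.
    """
    parts: list[str] = []
    cursor = 0
    for span in spans:
        parts.append(source[cursor:span.start])
        replacement = strategy(source[span.start:span.end], span)
        if replacement is not None:
            parts.append(replacement)
        cursor = span.end
    parts.append(source[cursor:])
    return "".join(parts)
```

Calling `redact` with `mask`, `tokenise`, and `drop` over the same `spans` list reuses one detection report for three different outputs, which is what keeping detection and replacement separate buys.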
Overlap policy
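When detections overlap, the engine sorts them by start offset (lowest first) and breaks ties by length (longest first). It then discards any later detection that overlaps a span it has already kept. Because this ordering is deterministic, the output is stable across runs.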
```mermaid
flowchart LR
    A[Detector outputs] --> B[Sort by offset]
    B --> C[Keep first non-overlapping span]
    C --> D[Apply replacements right-to-left]
```
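The overlap policy and the flowchart above can be sketched in Python as follows. This is a minimal illustration, not the engine's implementation: `Span`, `resolve_overlaps`, and `apply_right_to_left` are hypothetical names, and spans are assumed to use half-open `[start, end)` character offsets.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Span:
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str


def resolve_overlaps(spans: list[Span]) -> list[Span]:
    """Keep the first non-overlapping span in (start, longest-first) order."""
    # sorted() is stable, so exact ties keep the detectors' original order.
    ordered = sorted(spans, key=lambda s: (s.start, -(s.end - s.start)))
    kept: list[Span] = []
    for span in ordered:
        # Kept spans are sorted and disjoint, so only the last one can overlap.
        if kept and span.start < kept[-1].end:
            continue  # later overlapping detection: discard
        kept.append(span)
    return kept


def apply_right_to_left(source: str, spans: list[Span],
                        replace: Callable[[Span], str]) -> str:
    """Replace spans starting from the end of the string.

    Editing the rightmost span first means a replacement of a different
    length never shifts the offsets of spans still waiting to be applied.
    """
    out = source
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        out = out[:span.start] + replace(span) + out[span.end:]
    return out
```

For example, given `"Call Ana Silva at 555-0100"` with detections for `Ana`, `Silva`, `Ana Silva`, and the phone number, the longer `Ana Silva` span wins its tie at offset 5, both name fragments are discarded, and labelling the survivors yields `"Call [PERSON] at [PHONE]"`.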
Hash namespace
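For a truncated hash namespace of size m (a b-bit prefix gives m = 2^b) holding n distinct values, the approximate collision probability follows the birthday bound:

$$p \approx 1 - \exp\!\left(-\frac{n(n-1)}{2m}\right)$$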
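Use a longer prefix as the number of distinct values grows, or when pseudonymous joins are operationally important: a collision maps two different inputs to the same pseudonym, so joins on that pseudonym silently merge unrelated records.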
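A small helper can turn the prefix-length decision into arithmetic. This sketch assumes the truncated hashes behave like uniform random values over m = 2^b outcomes and uses the standard birthday approximation p ≈ 1 − exp(−n(n−1)/(2m)) for n distinct values. `collision_probability` and `min_prefix_bits` are hypothetical names, not part of the pipeline.

```python
import math


def collision_probability(n: int, bits: int) -> float:
    """Approximate chance that at least two of n distinct values share a
    truncated hash, assuming hashes are uniform over m = 2**bits values."""
    m = 2 ** bits
    # 1 - exp(-x), computed with expm1 to stay accurate when x is tiny.
    return -math.expm1(-n * (n - 1) / (2 * m))


def min_prefix_bits(n: int, max_probability: float) -> int:
    """Smallest prefix length whose approximate collision risk is within budget."""
    if not 0 < max_probability < 1:
        raise ValueError("max_probability must be in (0, 1)")
    bits = 1
    while collision_probability(n, bits) > max_probability:
        bits += 1
    return bits
```

Under these assumptions, 65,536 distinct values in a 32-bit prefix already carry roughly a 39% chance of at least one collision, and keeping the risk at or below one in a million for a million values takes a 59-bit prefix.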