Code Duplication Detection
Primus Security v2.2.0 adds token-based code duplication detection (Phase 7), using the same class of algorithm as SonarQube's Copy-Paste Detector (CPD).
How it works
- Tokenise every `.cs` file, discarding whitespace and comments.
- Normalise tokens: identifiers → `$ID`, string literals → `$STR`, numbers → `$NUM`, so renamed variables don't break matching.
- Sliding window: compute a Rabin-Karp rolling hash over each window of `MinBlockTokens` tokens (set by `DuplicationMinBlockTokens`).
- Index: map each fingerprint (window hash) to the locations where it occurs.
- Merge adjacent or overlapping blocks from the same file pair.
- Report duplicate blocks with file, line range, and duplication percentage.
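The tokenise, normalise, rolling-hash and index steps above can be sketched in C# roughly as follows. This is a minimal illustration under stated assumptions, not the Primus Security implementation: the regex lexer and the keyword list are hypothetical simplifications, `window` stands in for `DuplicationMinBlockTokens`, and per-token values come from `string.GetHashCode()`, which is only stable within a single process. Because different windows can share a hash, each bucket is re-checked by comparing the normalised tokens. It is written as a .NET 6+ console program with top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical, simplified C# lexer (no char literals, verbatim/raw strings or
// preprocessor lines). Comments and whitespace are matched only so they can be dropped.
var tokenPattern = new Regex(
    @"(?<comment>//[^\n]*|/\*.*?\*/)|(?<str>""(?:[^""\\\n]|\\.)*"")" +
    @"|(?<num>\d+(?:\.\d+)?)|(?<id>[A-Za-z_]\w*)|(?<ws>\s+)|(?<punct>\S)",
    RegexOptions.Singleline);

// Partial keyword list (assumption): keywords stay verbatim; only identifiers become $ID.
var keywords = new HashSet<string>
    { "int", "var", "return", "if", "else", "for", "foreach", "while", "new", "void", "string" };

const int window = 10; // stands in for DuplicationMinBlockTokens (default 100)

var files = new Dictionary<string, List<(string Norm, int Line)>>
{
    ["A.cs"] = Tokenise("int total = price * 2; // tax\nreturn total;"),
    ["B.cs"] = Tokenise("int sum = cost * 3;\n\n/* renamed */ return sum;"),
};

var index = BuildIndex(files, window);

// Keep only windows whose normalised tokens really match (guards against hash collisions).
var duplicates = index.Values
    .Where(locs => locs.Count > 1)
    .Select(locs => locs.Where(l => TokensAt(l).SequenceEqual(TokensAt(locs[0]))).ToList())
    .Where(locs => locs.Count > 1)
    .ToList();

foreach (var dupGroup in duplicates)
    Console.WriteLine(string.Join(", ", dupGroup.Select(l => $"{l.File}:{files[l.File][l.Start].Line}")));

// Tokenise and normalise: identifiers -> $ID, string literals -> $STR, numbers -> $NUM.
List<(string Norm, int Line)> Tokenise(string source)
{
    var tokens = new List<(string Norm, int Line)>();
    int line = 1;
    foreach (Match m in tokenPattern.Matches(source))
    {
        if (m.Groups["str"].Success) tokens.Add(("$STR", line));
        else if (m.Groups["num"].Success) tokens.Add(("$NUM", line));
        else if (m.Groups["id"].Success) tokens.Add((keywords.Contains(m.Value) ? m.Value : "$ID", line));
        else if (m.Groups["punct"].Success) tokens.Add((m.Value, line));
        line += m.Value.Count(c => c == '\n'); // track lines; comment/ws matches are otherwise dropped
    }
    return tokens;
}

// Rabin-Karp: hash every window of `size` tokens in O(1) per step, arithmetic modulo 2^64.
static Dictionary<ulong, List<(string File, int Start)>> BuildIndex(
    Dictionary<string, List<(string Norm, int Line)>> source, int size)
{
    const ulong Base = 1_000_003;
    ulong topPower = 1; // Base^(size-1): weight of the token leaving the window
    for (int i = 1; i < size; i++) topPower = unchecked(topPower * Base);

    var fingerprints = new Dictionary<ulong, List<(string File, int Start)>>();
    foreach (var (file, toks) in source)
    {
        ulong h = 0;
        for (int i = 0; i < toks.Count; i++)
        {
            unchecked
            {
                if (i >= size) h -= Val(toks[i - size].Norm) * topPower; // drop oldest token
                h = h * Base + Val(toks[i].Norm);                         // append newest token
            }
            if (i < size - 1) continue; // window not full yet
            if (!fingerprints.TryGetValue(h, out var bucket)) fingerprints[h] = bucket = new();
            bucket.Add((file, i - size + 1));
        }
    }
    return fingerprints;
}

// Per-token value; string.GetHashCode is stable within one process, which is all this needs.
static ulong Val(string token) => unchecked((uint)token.GetHashCode());

// Normalised tokens of the window starting at `loc`.
IEnumerable<string> TokensAt((string File, int Start) loc) =>
    files[loc.File].Skip(loc.Start).Take(window).Select(t => t.Norm);
```

The two sample files differ only in names, literal values and comments, so both normalise to the same ten tokens and share one fingerprint bucket. With the default of 100 tokens, a real block would need to span roughly ten lines before it could match.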
Enabling
Duplication detection is opt-in because it adds scan time proportional to codebase size. Enable it in `appsettings.json`:

```json
{
  "PrimusSecurity": {
    "EnableDuplicationDetection": true,
    "DuplicationMinBlockTokens": 100,
    "QualityGate": {
      "MaxDuplicateBlocks": 10
    }
  }
}
```
Or via the CLI:

```shell
primus-scan ./MyApp --duplication --max-duplication 10
```
Reading results
```csharp
// `scanner` is a configured Primus Security scanner with EnableDuplicationDetection = true.
var result = await scanner.ScanAsync("./MyApp");
var dup = result.DuplicationReport; // null when duplication detection is not enabled

if (dup != null)
{
    Console.WriteLine($"Duplicate blocks: {dup.DuplicateBlocks.Count}");
    Console.WriteLine($"Duplicated tokens: {dup.DuplicatedPercent:F1}%");

    foreach (var block in dup.DuplicateBlocks)
    {
        Console.WriteLine("Duplicate block:");
        foreach (var loc in block.Locations)
            Console.WriteLine($"  {loc.FilePath}:{loc.StartLine}-{loc.EndLine}");
    }
}
```
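This section does not spell out how `DuplicatedPercent` is computed. One plausible definition, assumed here, is the share of all tokens covered by at least one duplicate block, with overlapping or adjacent ranges in the same file merged first so that no token is counted twice (similar in spirit to the merge step under How it works). The ranges and token totals below are made-up sample data, and `Merge` is a hypothetical helper, not part of the Primus Security API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Made-up sample data: duplicated token ranges per file as half-open [Start, End) offsets.
var ranges = new Dictionary<string, List<(int Start, int End)>>
{
    ["A.cs"] = new() { (0, 100), (50, 180), (400, 500) },
    ["B.cs"] = new() { (10, 110) },
};
var totalTokens = new Dictionary<string, int> { ["A.cs"] = 1000, ["B.cs"] = 1000 };

var merged = ranges.ToDictionary(kv => kv.Key, kv => Merge(kv.Value));
int duplicated = merged.Values.Sum(list => list.Sum(r => r.End - r.Start));
double percent = 100.0 * duplicated / totalTokens.Values.Sum();
Console.WriteLine($"Duplicated tokens: {percent:F1}%");

// Sort by start, then fold each range into the previous one if they overlap or touch.
static List<(int Start, int End)> Merge(List<(int Start, int End)> input)
{
    var result = new List<(int Start, int End)>();
    foreach (var range in input.OrderBy(x => x.Start))
    {
        int last = result.Count - 1;
        if (last >= 0 && range.Start <= result[last].End)
            result[last] = (result[last].Start, Math.Max(result[last].End, range.End));
        else
            result.Add(range);
    }
    return result;
}
```

Here `A.cs` collapses to `[0, 180)` and `[400, 500)`, so 380 of 2,000 tokens count as duplicated. Summing the raw block lengths instead would double-count the 50 overlapping tokens.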
Configuration reference
| Option | Default | Description |
|---|---|---|
| `EnableDuplicationDetection` | `false` | Opt-in; disabled by default. |
| `DuplicationMinBlockTokens` | `100` | Minimum block size in tokens (roughly 10 lines of code). Increase to reduce noise. |
| `DuplicationMaxBlocks` | `-1` | Quality-gate threshold for duplicate blocks; `-1` disables the gate. |
| `QualityGate.MaxDuplicateBlocks` | `-1` | Equivalent threshold, set inside the `QualityGate` object. |
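The `-1` default acts as a sentinel that switches the gate off. A minimal sketch of that reading follows; `PassesDuplicationGate` is a hypothetical name, and treating a count equal to the threshold as passing is an assumption rather than documented behaviour.

```csharp
using System;

// Hypothetical helper: any negative threshold (the -1 default) disables the gate.
// Assumption: a count equal to the threshold still passes.
static bool PassesDuplicationGate(int duplicateBlocks, int maxDuplicateBlocks) =>
    maxDuplicateBlocks < 0 || duplicateBlocks <= maxDuplicateBlocks;

Console.WriteLine(PassesDuplicationGate(25, -1)); // True: gate disabled
Console.WriteLine(PassesDuplicationGate(12, 10)); // False: 12 blocks exceed the limit of 10
```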
SARIF output
When duplication detection is enabled, the SARIF `run.properties` bag includes:

```json
{
  "duplicateBlocks": 3,
  "duplicationPercent": 4.2
}
```
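To consume these metrics from a SARIF file (for example in a CI step), a sketch using `System.Text.Json` might look like this. The embedded log is trimmed to the fields the sketch reads, so it is not schema-valid SARIF (a real run also carries a `tool` object), and `ReadDuplication` is a hypothetical helper, not part of Primus Security.

```csharp
using System;
using System.Text.Json;

// Trimmed sample: only the fields this sketch reads (not schema-valid SARIF).
const string sarif = @"{
  ""version"": ""2.1.0"",
  ""runs"": [ { ""properties"": { ""duplicateBlocks"": 3, ""duplicationPercent"": 4.2 } } ]
}";

var metrics = ReadDuplication(sarif);
Console.WriteLine(metrics.HasValue
    ? $"Duplicate blocks: {metrics.Value.Blocks}, duplicated: {metrics.Value.Percent:F1}%"
    : "No duplication metrics in this log");

// Returns the metrics from the first run whose property bag carries them, else null.
static (int Blocks, double Percent)? ReadDuplication(string sarifJson)
{
    using var doc = JsonDocument.Parse(sarifJson);
    foreach (var run in doc.RootElement.GetProperty("runs").EnumerateArray())
    {
        if (run.TryGetProperty("properties", out var props) &&
            props.TryGetProperty("duplicateBlocks", out var blocks) &&
            props.TryGetProperty("duplicationPercent", out var pct))
            return (blocks.GetInt32(), pct.GetDouble());
    }
    return null;
}
```

Returning `null` rather than throwing keeps a pipeline step working when a scan ran without duplication detection and the properties are absent.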
Performance notes
- Runs after SAST analysis in `ScanAsync()`.
- Scales linearly with total token count.
- A 50,000-line codebase typically completes in under 3 seconds.
- Set `DuplicationMinBlockTokens` to `200` for large repos to reduce noise.