
Scanning & Detection Overview

Scanning is the core capability of Slim.io. The platform discovers and classifies sensitive data in cloud storage by processing files through a multi-layered detection pipeline. This section covers how scans work, the detection engine architecture, and the configuration options available.

Scan Types

| Type | Trigger | Scope | Use Case |
| --- | --- | --- | --- |
| Full Scan | Manual or scheduled | All files in the connector scope | Initial baseline, periodic re-scan |
| Incremental Scan | Manual or scheduled | Files modified since the last scan | Ongoing monitoring with lower cost |
| Event-Driven Scan | Cloud storage event | Single file or batch | Real-time detection on new uploads |
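As a rough illustration of how the scan types differ in scope, the Python sketch below selects objects for a full or an incremental scan. `StoredObject`, `select_for_scan`, and the fall-back-to-full behavior when no previous scan exists are hypothetical, not part of a Slim.io API. Event-driven scans are left out because they receive their files directly from the storage event.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class StoredObject:
    """Hypothetical view of one object inside a connector's scope."""
    key: str
    last_modified: datetime


def select_for_scan(
    objects: Iterable[StoredObject],
    scan_type: str,
    last_scan_at: Optional[datetime] = None,
) -> List[str]:
    """Return the object keys a 'full' or 'incremental' scan would process."""
    if scan_type == "full":
        # Full scan: every object in the connector scope.
        return [o.key for o in objects]
    if scan_type == "incremental":
        if last_scan_at is None:
            # Assumption: with no previous scan to compare against, behave like a full scan.
            return [o.key for o in objects]
        # Incremental scan: only objects modified after the previous scan's timestamp.
        return [o.key for o in objects if o.last_modified > last_scan_at]
    raise ValueError(f"unsupported scan type: {scan_type!r}")


last_scan = datetime(2024, 1, 1, tzinfo=timezone.utc)
objects = [
    StoredObject("customers.csv", datetime(2023, 12, 30, tzinfo=timezone.utc)),
    StoredObject("orders.json", datetime(2024, 1, 2, tzinfo=timezone.utc)),
]
print(select_for_scan(objects, "incremental", last_scan))  # ['orders.json']
```

Comparing modification times against a stored timestamp is what keeps incremental scans cheap: unchanged files are never read.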

Scan Lifecycle

Every scan progresses through a defined lifecycle:

Created → Queued → Running → Completed / Failed / Cancelled
  • Created — Scan job is initialized
  • Queued — Waiting for available workers within tier limits
  • Running — Workers are actively processing files
  • Completed — All files processed, findings stored
  • Failed — Unrecoverable error (e.g., credential issue, infrastructure failure)
  • Cancelled — Manually stopped by the user
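The lifecycle can be modeled as a small state machine. In this sketch the state names come from the list above, while `ScanJob`, the transition table, and the assumption that a scan may also fail or be cancelled before it reaches Running are illustrative, not taken from the platform.

```python
from enum import Enum


class ScanState(Enum):
    CREATED = "created"
    QUEUED = "queued"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"


TERMINAL = {ScanState.COMPLETED, ScanState.FAILED, ScanState.CANCELLED}

# Allowed transitions. Assumption: Failed and Cancelled are reachable from any non-terminal state.
TRANSITIONS = {
    ScanState.CREATED: {ScanState.QUEUED, ScanState.FAILED, ScanState.CANCELLED},
    ScanState.QUEUED: {ScanState.RUNNING, ScanState.FAILED, ScanState.CANCELLED},
    ScanState.RUNNING: {ScanState.COMPLETED, ScanState.FAILED, ScanState.CANCELLED},
}


class ScanJob:
    """Hypothetical scan job that only moves along legal lifecycle edges."""

    def __init__(self) -> None:
        self.state = ScanState.CREATED
        self.history = [self.state]

    def advance(self, new_state: ScanState) -> None:
        # Terminal states have no entry in TRANSITIONS, so every move out of them is rejected.
        allowed = TRANSITIONS.get(self.state, set())
        if new_state not in allowed:
            raise ValueError(f"illegal transition {self.state.name} -> {new_state.name}")
        self.state = new_state
        self.history.append(new_state)

    @property
    def is_terminal(self) -> bool:
        return self.state in TERMINAL


job = ScanJob()
for step in (ScanState.QUEUED, ScanState.RUNNING, ScanState.COMPLETED):
    job.advance(step)
print(job.is_terminal)  # True
```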

Detection Pipeline

Each file passes through a multi-stage detection pipeline:

  1. Probabilistic Pre-Screen — A fast statistical check that quickly eliminates files unlikely to contain sensitive data, focusing scan effort on the highest-value targets.
  2. Classifier Execution — Files that pass the pre-screen are analyzed by all active classifiers (pattern, dictionary, proximity, checksum, and ML-assisted).
  3. Confidence Scoring — Each match receives a confidence score based on classifier type, pattern specificity, and contextual signals from surrounding fields.
  4. AI Disambiguation (Optional) — Findings that fall in a configurable ambiguous range are escalated to a multi-provider AI pipeline with automatic failover, which adjudicates the final classification.
  5. Deduplication — Overlapping findings from multiple classifiers are merged into a single canonical finding.
  6. Finding Storage — Final findings are persisted with full provenance metadata, including detection method and classifier version.
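Stages 4 and 5 come down to two operations on the scored matches: routing ambiguous findings to AI review and merging overlapping ones. The sketch below assumes a finding is a character span with a data type and a confidence in [0, 1]. The `Finding` class, the 0.4 to 0.7 ambiguous band, and the rule that only overlapping spans of the same data type are merged are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Finding:
    """One classifier match covering characters [start, end) of a file."""
    data_type: str      # illustrative labels such as "SSN" or "EMAIL"
    start: int
    end: int
    confidence: float   # 0.0 to 1.0
    classifier: str


# Hypothetical ambiguous band; the real range is configurable in the platform.
AMBIGUOUS_LOW, AMBIGUOUS_HIGH = 0.4, 0.7


def needs_ai_review(f: Finding) -> bool:
    """Stage 4 routing: only findings inside the ambiguous band are escalated."""
    return AMBIGUOUS_LOW <= f.confidence < AMBIGUOUS_HIGH


def deduplicate(findings: List[Finding]) -> List[Finding]:
    """Stage 5: merge overlapping findings of the same data type.

    The merged finding covers the union of the spans and keeps the highest
    confidence together with the classifier that produced it.
    """
    merged: List[Finding] = []
    # Python's sort is stable, so ties keep their input order.
    for f in sorted(findings, key=lambda x: (x.data_type, x.start)):
        last = merged[-1] if merged else None
        if last and last.data_type == f.data_type and f.start < last.end:
            # Overlap: extend the span and keep the stronger signal.
            last.end = max(last.end, f.end)
            if f.confidence > last.confidence:
                last.confidence = f.confidence
                last.classifier = f.classifier
        else:
            # Copy so the caller's findings are never mutated.
            merged.append(Finding(f.data_type, f.start, f.end, f.confidence, f.classifier))
    return merged


raw = [
    Finding("SSN", 10, 21, 0.65, "pattern"),
    Finding("SSN", 10, 21, 0.95, "checksum"),
    Finding("EMAIL", 40, 58, 0.55, "pattern"),
]
escalated = [(f.data_type, f.classifier) for f in raw if needs_ai_review(f)]
print(escalated)  # [('SSN', 'pattern'), ('EMAIL', 'pattern')]
canonical = deduplicate(raw)
print([(f.data_type, f.classifier) for f in canonical])  # [('EMAIL', 'pattern'), ('SSN', 'checksum')]
```

Keeping the winning classifier on the merged finding is one way to preserve the provenance that stage 6 stores.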

The probabilistic pre-screen reduces processing time and cost by skipping low-value files before the full classifier stack runs. Files that are likely to contain sensitive data still receive full classification, so recall on those files is not reduced.
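One way to build a cheap, probabilistic pre-screen is to sample a few random windows of a large file and look for signals of sensitive data. This is a swapped-in technique for illustration only; Slim.io's actual pre-screen statistics are not described here, and `prescreen`, the regex signals, and the sampling parameters are hypothetical.

```python
import random
import re

# Hypothetical signals: SSN-like numbers, email addresses, and 13 to 16 digit card-like runs.
SIGNAL = re.compile(rb"\d{3}[- ]?\d{2}[- ]?\d{4}|[\w.+-]+@[\w-]+\.[\w.]+|\d{13,16}")


def prescreen(data: bytes, samples: int = 8, window: int = 4096, seed: int = 0) -> bool:
    """Return True if the file should continue to the full classifier stack.

    Large files are checked by reading a few random windows, so the cost is
    bounded by samples * window bytes regardless of file size.
    """
    if len(data) <= samples * window:
        # Small file: checking every byte is cheaper than sampling.
        return SIGNAL.search(data) is not None
    rng = random.Random(seed)  # seeded so the decision is reproducible
    for _ in range(samples):
        offset = rng.randrange(0, len(data) - window)
        if SIGNAL.search(data, offset, offset + window):
            return True
    return False


print(prescreen(b"name,ssn\nalice,123-45-6789\n"))  # True
print(prescreen(b"lorem ipsum " * 10))  # False
```

In this sketch a sparse match can fall between the sampled windows of a large file, so `samples` and `window` trade cost against recall and would need tuning.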

Supported File Formats

Slim.io can process a wide range of file formats:

| Category | Formats |
| --- | --- |
| Structured Data | CSV, TSV, JSON, JSONL, Parquet, Avro, ORC |
| Documents | PDF, DOCX, XLSX, TXT, RTF |
| Configuration | YAML, TOML, XML, INI, ENV |
| Logs | Plain text, structured JSON logs |
| Archives | ZIP, GZIP, TAR (contents extracted and scanned) |
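Archive handling amounts to recursive expansion before classification. The sketch below uses only the Python standard library (`zipfile`, `tarfile`, `gzip`) to unpack ZIP, TAR, and GZIP content in memory and map each leaf file to a category from the table by extension. The extension map (including the extra `yml` and `log` entries), the `max_depth` guard, and the function names are assumptions; parsing formats such as Parquet or PDF needs separate libraries and is not shown.

```python
import gzip
import io
import tarfile
import zipfile
from typing import Iterator, Tuple

# Illustrative extension map built from the table above; "yml" and "log" are assumed extras.
CATEGORIES = {
    "Structured Data": {"csv", "tsv", "json", "jsonl", "parquet", "avro", "orc"},
    "Documents": {"pdf", "docx", "xlsx", "txt", "rtf"},
    "Configuration": {"yaml", "yml", "toml", "xml", "ini", "env"},
    "Logs": {"log"},
}


def extension(path: str) -> str:
    """Lower-cased text after the last dot of the file name ('.env' -> 'env')."""
    name = path.rsplit("/", 1)[-1].lower()
    return name.rsplit(".", 1)[-1] if "." in name else ""


def category(path: str) -> str:
    ext = extension(path)
    for cat, exts in CATEGORIES.items():
        if ext in exts:
            return cat
    return "Unsupported"


def expand(path: str, data: bytes, depth: int = 0, max_depth: int = 3) -> Iterator[Tuple[str, bytes]]:
    """Yield (path, bytes) for each leaf file, unpacking ZIP, TAR, and GZIP recursively."""
    ext = extension(path)
    if depth < max_depth:  # hypothetical guard against deeply nested archives
        if ext == "zip":
            with zipfile.ZipFile(io.BytesIO(data)) as zf:
                for info in zf.infolist():
                    if not info.is_dir():
                        yield from expand(f"{path}/{info.filename}", zf.read(info), depth + 1, max_depth)
            return
        if ext in ("tar", "tgz") or path.lower().endswith(".tar.gz"):
            # "r:*" lets tarfile detect gzip, bz2, or xz compression itself.
            with tarfile.open(fileobj=io.BytesIO(data), mode="r:*") as tf:
                for member in tf.getmembers():
                    if member.isfile():
                        fh = tf.extractfile(member)
                        if fh is not None:
                            yield from expand(f"{path}/{member.name}", fh.read(), depth + 1, max_depth)
            return
        if ext == "gz":
            # A bare .gz wraps one file: drop the suffix and classify the inner name.
            yield from expand(path[:-3], gzip.decompress(data), depth + 1, max_depth)
            return
    yield path, data


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("users.csv", "name,email\nalice,alice@example.com\n")
    zf.writestr("conf/.env", "API_KEY=abc123\n")

for inner_path, content in expand("export.zip", buf.getvalue()):
    print(inner_path, "->", category(inner_path))
```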

Scan Tier Limits

Scan capacity is governed by your subscription tier:

| Tier | Max Files / Scan | Max Concurrent Workers | Scans / Month |
| --- | --- | --- | --- |
| Free | 10,000 | 2 | 10 |
| Starter | 100,000 | 5 | 50 |
| Professional | 1,000,000 | 20 | Unlimited |
| Enterprise | Unlimited | Custom | Unlimited |
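A pre-flight check against these limits might look like the following. The numbers are copied from the table; `None` stands for Unlimited, or for Custom in the case of Enterprise workers, whose value is not published here. `check_scan`, `free_worker_slots`, and the choice to reject rather than truncate an oversized scan are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TierLimits:
    max_files_per_scan: Optional[int]  # None = unlimited
    max_workers: Optional[int]         # None = custom (not published here)
    scans_per_month: Optional[int]     # None = unlimited


TIERS = {
    "free": TierLimits(10_000, 2, 10),
    "starter": TierLimits(100_000, 5, 50),
    "professional": TierLimits(1_000_000, 20, None),
    "enterprise": TierLimits(None, None, None),
}


def check_scan(tier: str, file_count: int, scans_this_month: int) -> List[str]:
    """Return the reasons a new scan would be rejected; an empty list means it may start."""
    limits = TIERS[tier]
    problems = []
    if limits.max_files_per_scan is not None and file_count > limits.max_files_per_scan:
        problems.append(f"{file_count:,} files exceed the {limits.max_files_per_scan:,}-file limit")
    # ">=" because starting one more scan would go past the monthly quota.
    if limits.scans_per_month is not None and scans_this_month >= limits.scans_per_month:
        problems.append(f"monthly quota of {limits.scans_per_month} scans already used")
    return problems


def free_worker_slots(tier: str, busy: int) -> Optional[int]:
    """Workers still available to a Queued scan; None when the cap is custom."""
    cap = TIERS[tier].max_workers
    return None if cap is None else max(cap - busy, 0)


print(check_scan("free", 12_000, 3))  # ['12,000 files exceed the 10,000-file limit']
print(check_scan("professional", 500_000, 200))  # []
```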
