# Scanning & Detection Overview
Scanning is the core capability of Slim.io. The platform discovers and classifies sensitive data in cloud storage by processing files through a multi-layered detection pipeline. This section covers how scans work, the detection engine architecture, and the configuration options available.
## Scan Types
| Type | Trigger | Scope | Use Case |
|---|---|---|---|
| Full Scan | Manual or scheduled | All files in the connector scope | Initial baseline, periodic re-scan |
| Incremental Scan | Manual or scheduled | Files modified since the last scan | Ongoing monitoring with lower cost |
| Event-Driven Scan | Cloud storage event | Single file or batch | Real-time detection on new uploads |
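The incremental scope ("files modified since the last scan") is essentially a timestamp filter. The sketch below is illustrative only: `select_files_for_scan` and the file-record shape are hypothetical, not part of the Slim.io API.

```python
from datetime import datetime, timezone

def select_files_for_scan(files, scan_type, last_scan_at=None):
    """Pick which files a scan should process (hypothetical helper).

    files: list of dicts with 'path' and 'modified_at' (tz-aware datetime).
    scan_type: 'full' scans everything; 'incremental' only files
    modified after last_scan_at (falls back to a full scan if no
    previous scan timestamp is known).
    """
    if scan_type == "full" or last_scan_at is None:
        return [f["path"] for f in files]
    return [f["path"] for f in files if f["modified_at"] > last_scan_at]
```

An event-driven scan would skip this selection step entirely: the cloud storage event already names the file (or batch) to process.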
## Scan Lifecycle
Every scan progresses through a defined lifecycle:
```
Created → Queued → Running → Completed
                       ↓
             Failed / Cancelled
```

- Created — Scan job is initialized
- Queued — Waiting for available workers within tier limits
- Running — Workers are actively processing files
- Completed — All files processed, findings stored
- Failed — Unrecoverable error (e.g., credential issue, infrastructure failure)
- Cancelled — Manually stopped by the user
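The lifecycle above is effectively a small state machine. A minimal sketch of the legal transitions (the `advance` helper and the exact transition set are illustrative assumptions, not Slim.io internals):

```python
# Allowed scan-state transitions, mirroring the lifecycle diagram above.
# Terminal states (completed, failed, cancelled) allow no further moves.
TRANSITIONS = {
    "created":   {"queued", "cancelled"},
    "queued":    {"running", "cancelled", "failed"},
    "running":   {"completed", "failed", "cancelled"},
    "completed": set(),
    "failed":    set(),
    "cancelled": set(),
}

def advance(state, new_state):
    """Move a scan to new_state, rejecting illegal transitions."""
    if new_state not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state} -> {new_state}")
    return new_state
```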
## Detection Pipeline
Each file passes through a multi-stage detection pipeline:
1. Probabilistic Pre-Screen — A fast statistical check that eliminates files unlikely to contain sensitive data, focusing scan effort on the highest-value targets.
2. Classifier Execution — Files that pass the pre-screen are analyzed by all active classifiers (pattern, dictionary, proximity, checksum, and ML-assisted).
3. Confidence Scoring — Each match receives a confidence score based on classifier type, pattern specificity, and contextual signals from surrounding fields.
4. AI Disambiguation (Optional) — Findings that fall in a configurable ambiguous range are escalated to a multi-provider AI pipeline with automatic failover, which adjudicates the final classification.
5. Deduplication — Overlapping findings from multiple classifiers are merged into a single canonical finding.
6. Finding Storage — Final findings are persisted with full provenance metadata, including detection method and classifier version.
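The deduplication stage merges findings that cover overlapping spans of the same file, keeping the highest-confidence classification. A minimal sketch (`dedupe_findings` and the finding shape are assumptions for illustration, not Slim.io's actual data model):

```python
def dedupe_findings(findings):
    """Merge overlapping findings into one canonical finding per span.

    findings: list of dicts with 'file', 'start', 'end', 'label',
    'confidence'. Overlapping spans in the same file are merged; the
    higher-confidence label wins and the span is widened to cover both.
    """
    findings = sorted(findings, key=lambda f: (f["file"], f["start"]))
    merged = []
    for f in findings:
        last = merged[-1] if merged else None
        if last and last["file"] == f["file"] and f["start"] < last["end"]:
            # Overlap: extend the span, keep the higher-confidence label.
            last["end"] = max(last["end"], f["end"])
            if f["confidence"] > last["confidence"]:
                last["label"], last["confidence"] = f["label"], f["confidence"]
        else:
            merged.append(dict(f))
    return merged
```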
The probabilistic pre-screen significantly reduces processing time and cost by skipping low-value files before invoking the full classifier stack — without compromising recall on files that are likely to contain sensitive data.
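A pre-screen of this kind is typically a cheap heuristic pass over a sample of the file. The sketch below is illustrative only; the regexes and thresholds are assumptions, not Slim.io's actual statistical model:

```python
import re

# Hypothetical pre-screen heuristics. Real systems would tune these
# against labeled data; the patterns and 15% digit-density threshold
# here are purely illustrative.
DIGIT_RUN = re.compile(r"\d{6,}")            # long digit runs (card/ID-like)
EMAIL_HINT = re.compile(r"@[\w.-]+\.\w{2,}")  # email-shaped tokens

def prescreen(sample: str) -> bool:
    """Return True if the sampled content warrants the full classifier stack."""
    if DIGIT_RUN.search(sample) or EMAIL_HINT.search(sample):
        return True
    # Fallback: heavily numeric content is higher risk even without
    # an obvious pattern hit.
    digits = sum(ch.isdigit() for ch in sample)
    return len(sample) > 0 and digits / len(sample) > 0.15
```

Because the pre-screen only rejects files that trip none of the heuristics, recall on likely-sensitive files is preserved at the cost of some false positives passed on to the classifiers.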
## Supported File Formats
Slim.io can process a wide range of file formats:
| Category | Formats |
|---|---|
| Structured Data | CSV, TSV, JSON, JSONL, Parquet, Avro, ORC |
| Documents | PDF, DOCX, XLSX, TXT, RTF |
| Configuration | YAML, TOML, XML, INI, ENV |
| Logs | Plain text, structured JSON logs |
| Archives | ZIP, GZIP, TAR (contents extracted and scanned) |
## Scan Tier Limits
Scan capacity is governed by your subscription tier:
| Tier | Max Files / Scan | Max Concurrent Workers | Scans / Month |
|---|---|---|---|
| Free | 10,000 | 2 | 10 |
| Starter | 100,000 | 5 | 50 |
| Professional | 1,000,000 | 20 | Unlimited |
| Enterprise | Unlimited | Custom | Unlimited |
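The tier table above can be checked before submitting a scan. This is a client-side sketch, not a Slim.io API: `can_start_scan` and the tuple layout are hypothetical, with `None` standing in for "Unlimited"/"Custom".

```python
# (max_files_per_scan, max_concurrent_workers, scans_per_month);
# None means unlimited (or custom, for Enterprise workers).
TIER_LIMITS = {
    "free":         (10_000,    2,    10),
    "starter":      (100_000,   5,    50),
    "professional": (1_000_000, 20,   None),
    "enterprise":   (None,      None, None),
}

def can_start_scan(tier, file_count, scans_this_month):
    """Check a proposed scan against the tier's file and monthly quotas."""
    max_files, _workers, max_scans = TIER_LIMITS[tier]
    if max_files is not None and file_count > max_files:
        return False
    if max_scans is not None and scans_this_month >= max_scans:
        return False
    return True
```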
## Learn More
- Scan Management — Starting, controlling, and monitoring scans
- Parallel Scanning Engine — Distributed scan architecture
- Scanner Fleet — Agentless and connector-based scanning infrastructure
- Classifiers — Detection rule types and configuration (170 built-in rules across 50+ countries)
- Smart Scanning Modes — Full, Incremental, Smart, and Bootstrap scan modes
- PII Detection Engine — Multi-stage detection pipeline architecture
- Detection-as-Code — YAML-based classifier definitions
- LLM Assist — AI-powered false positive reduction
- Event-Driven Scanning — Real-time scan triggers